Migrate from CSV

This page has instructions for migrating data from CSV files into CockroachDB using IMPORT.

The examples below use the employees data set that is also used in the MySQL docs.

The examples below pull real data from Amazon S3. They use the employees data set that is also used in the MySQL docs, dumped as a set of CSV files.

Note that CockroachDB's IMPORT does not support importing data into existing tables.

Step 1. Export data to CSV

Please refer to the documentation of your database for instructions on exporting data to CSV.

You will need to export one CSV file per table, with the following requirements:

  • Files must be in valid CSV format, with the caveat that the delimiter must be a single character. To use a character other than comma (such as a tab), set a custom delimiter using the delimiter option.
  • Files must be UTF-8 encoded.
  • If one of the following characters appears in a field, the field must be enclosed by double quotes:
    • delimiter (, by default)
    • double quote (")
    • newline (\n)
    • carriage return (\r)
  • If double quotes are used to enclose fields, then a double quote appearing inside a field must be escaped by preceding it with another double quote. For example: "aaa","b""bb","ccc"
  • If a column is of type BYTES, it can either be a valid UTF-8 string or a hex-encoded byte literal beginning with \x. For example, a field whose value should be the bytes 1, 2 would be written as \x0102.

Step 2. Host the files where the cluster can access them

Each node in the CockroachDB cluster needs to have access to the files being imported. There are several ways for the cluster to access the data; for a complete list of the types of storage IMPORT can pull from, see Import File URLs.

Tip:

We strongly recommend using cloud storage such as Amazon S3 or Google Cloud to host the data files you want to import.

Step 3. Import the CSV

You will need to write an IMPORT TABLE statement that matches the schema of the table data you're importing.

For example, to import the data from employees.csv into an employees table, issue the following statement:

  1. > IMPORT TABLE employees (
  2. emp_no INT PRIMARY KEY,
  3. birth_date DATE NOT NULL,
  4. first_name STRING NOT NULL,
  5. last_name STRING NOT NULL,
  6. gender STRING NOT NULL,
  7. hire_date DATE NOT NULL
  8. ) CSV DATA ('https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/csv/employees.csv.gz');
  1. job_id | status | fraction_completed | rows | index_entries | system_records | bytes
  2. --------------------+-----------+--------------------+--------+---------------+----------------+----------
  3. 381866942129111041 | succeeded | 1 | 300024 | 0 | 0 | 13258389
  4. (1 row)

Repeat the above for each CSV file you want to import.

Note that you will need to run ALTER TABLE … ADD CONSTRAINT to add any foreign key relationships.

Configuration Options

The following options are available to IMPORT … CSV:

Column delimiter

The delimiter option is used to set the Unicode character that marks where each column ends. Default: ,.

Example usage:

  1. > IMPORT TABLE employees (
  2. emp_no INT PRIMARY KEY,
  3. birth_date DATE NOT NULL,
  4. first_name STRING NOT NULL,
  5. last_name STRING NOT NULL,
  6. gender STRING NOT NULL,
  7. hire_date DATE NOT NULL
  8. )
  9. CSV DATA ('s3://acme-co/employees.csv?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456')
  10. WITH delimiter = e'\t';

Comment syntax

The comment option determines which Unicode character marks the rows in the data to be skipped.

Example usage:

  1. > IMPORT TABLE employees (
  2. emp_no INT PRIMARY KEY,
  3. birth_date DATE NOT NULL,
  4. first_name STRING NOT NULL,
  5. last_name STRING NOT NULL,
  6. gender STRING NOT NULL,
  7. hire_date DATE NOT NULL
  8. )
  9. CSV DATA ('s3://acme-co/employees.csv?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456')
  10. WITH comment = '#';

Skip header rows

The skip option determines the number of header rows to skip when importing a file.

Example usage:

  1. > IMPORT TABLE employees (
  2. emp_no INT PRIMARY KEY,
  3. birth_date DATE NOT NULL,
  4. first_name STRING NOT NULL,
  5. last_name STRING NOT NULL,
  6. gender STRING NOT NULL,
  7. hire_date DATE NOT NULL
  8. )
  9. CSV DATA ('s3://acme-co/employees.csv?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456')
  10. WITH skip = '2';

Null strings

The nullif option defines which string should be converted to NULL.

Example usage:

  1. > IMPORT TABLE employees (
  2. emp_no INT PRIMARY KEY,
  3. birth_date DATE NOT NULL,
  4. first_name STRING NOT NULL,
  5. last_name STRING NOT NULL,
  6. gender STRING NOT NULL,
  7. hire_date DATE NOT NULL
  8. )
  9. CSV DATA ('s3://acme-co/employees.csv?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456')
  10. WITH nullif = '';

File compression

The compress option defines which decompression codec should be used on the CSV file to be imported. Options include:

  • gzip: Uses the gzip algorithm to decompress the file.
  • bzip: Uses the bzip algorithm to decompress the file.
  • none: Disables decompression.
  • auto: Default. Guesses based on file extension ('none' for .csv, 'gzip' for .gz, 'bzip' for .bz and .bz2).
    Example usage:
  1. > IMPORT TABLE employees (
  2. emp_no INT PRIMARY KEY,
  3. birth_date DATE NOT NULL,
  4. first_name STRING NOT NULL,
  5. last_name STRING NOT NULL,
  6. gender STRING NOT NULL,
  7. hire_date DATE NOT NULL
  8. )
  9. CSV DATA ('s3://acme-co/employees.csv.gz?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456')
  10. WITH compress = 'gzip';

See also

Was this page helpful?
YesNo