Import data into TimescaleDB from .csv

Tip: First make sure that you have properly installed and set up TimescaleDB within your PostgreSQL instance.

If you have data stored in an external .csv file, you can import it into TimescaleDB:

  1. Create a new empty table with the same schema as the data file and convert the table to a hypertable.
  2. Insert the data from the file.

1. Creating the new empty table

Creating the empty table requires knowing the schema of the data in the file in advance, but is otherwise the same as creating any new hypertable. Our example uses a database named new_db and a data file named old_db.csv.

First create a new empty PostgreSQL table:

  -- Assuming the data file's columns are time, location, temperature
  CREATE TABLE conditions (
    time        TIMESTAMPTZ      NOT NULL,
    location    TEXT             NOT NULL,
    temperature DOUBLE PRECISION NULL
  );

Then convert that table into a hypertable using create_hypertable:

  SELECT create_hypertable('conditions', 'time');
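
create_hypertable also accepts an explicit chunk interval, which can be useful if you already know your data's time range and ingest rate. A minimal variant of the call above; the one-day interval is an illustrative value, not a general recommendation:

  -- Optional: set the chunk interval explicitly
  -- (1 day is only an example value; tune it to your workload)
  SELECT create_hypertable('conditions', 'time',
    chunk_time_interval => INTERVAL '1 day');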

2. Inserting data into the hypertable

To bulk insert data into the new table, we recommend using our open-source Go program, timescaledb-parallel-copy, which can speed up large data migrations by running multiple COPYs concurrently. For example, to use 4 workers:

  timescaledb-parallel-copy --db-name new_db --table conditions \
    --file old_db.csv --workers 4 --copy-options "CSV"

In addition to parallelizing the workload, the tool offers flags to tune the copy process. See the repo on GitHub for full details.
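
For instance, the tool can batch rows and report progress while it runs. A sketch, assuming the --batch-size and --reporting-period flags documented in the repo; verify them against timescaledb-parallel-copy --help for your installed version:

  # Copy in 5000-row batches and print progress every 30 seconds
  # (--batch-size and --reporting-period are assumed here; check --help)
  timescaledb-parallel-copy --db-name new_db --table conditions \
    --file old_db.csv --workers 4 --copy-options "CSV" \
    --batch-size 5000 --reporting-period 30s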

Tip: We recommend not setting the number of workers higher than the number of available CPU cores on the machine. Beyond that, workers tend to compete with each other for resources and reduce the performance gains.

Using PostgreSQL’s COPY

Although we recommend our open-source Go program for better bulk insert performance, we can also use PostgreSQL's bulk insert command COPY to copy data from the .csv into our new database:

  psql -d new_db -c "\COPY conditions FROM old_db.csv CSV"

This method is straightforward and requires no extra tools, but for large datasets it can be impractical and time-consuming because COPY is single-threaded. For a faster method that can utilize more of the CPU, use timescaledb-parallel-copy as described above.
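
If your .csv file starts with a header row, you can pass the standard PostgreSQL options clause to skip it; for example:

  # Same COPY, but treating the file's first row as a header
  psql -d new_db -c "\COPY conditions FROM old_db.csv WITH (FORMAT CSV, HEADER)"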

Migration from InfluxDB to TimescaleDB using Outflux

Outflux is an open-source tool for batch migrating data from InfluxDB to TimescaleDB. Anyone currently running an InfluxDB instance can migrate their workload to TimescaleDB with a single command: outflux migrate. You must also have TimescaleDB installed and a means to connect to it.

With Outflux, users can pipe exported data directly into TimescaleDB. Outflux manages schema discovery, validation, and creation.
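
As a sketch of that single command, assuming an InfluxDB database named old_influx_db (a hypothetical name) and the --input-server and --output-conn flags described in the Outflux README; verify them with outflux migrate --help:

  # Migrate all measurements from the hypothetical old_influx_db into new_db
  outflux migrate old_influx_db \
    --input-server=http://localhost:8086 \
    --output-conn="dbname=new_db user=postgres"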

For more information on how to get started, follow this tutorial. Then check out some common hypertable commands for exploring your data.