About timescaledb-parallel-copy

To speed up bulk inserts of data, Timescale provides an open source parallel importer called timescaledb-parallel-copy. The program parallelizes migration by using several workers to run multiple COPY commands concurrently.

PostgreSQL’s native COPY command is transactional and single-threaded, which makes it a poor fit for ingesting large amounts of data. If the file is ordered chronologically with respect to the time dimension of the hypertable, timescaledb-parallel-copy improves performance by splitting the work across parallel COPY operations. This enables you to take full advantage of your hardware resources.
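Because the performance gain depends on the input being ordered by time, it can be worth checking the file before you import it. The following is a minimal sketch and not part of timescaledb-parallel-copy. `is_time_ordered` is a hypothetical helper, and it assumes that the timestamp is the first column, that timestamps are ISO 8601 text in a single time zone (so byte order matches chronological order), that the first column contains no quoted commas, and that the file has no header row.

```shell
# Hypothetical helper, not part of timescaledb-parallel-copy.
# Succeeds (exit status 0) when column 1 of the CSV file is in
# non-decreasing byte order; fails otherwise.
# Assumes: timestamp in column 1, ISO 8601 text, no header row,
# no quoted commas in column 1.
is_time_ordered() {
  cut -d, -f1 "$1" | LC_ALL=C sort -c
}
```

For example, `is_time_ordered data.csv && echo "ordered"` prints `ordered` only when the check passes. Here `data.csv` is a placeholder file name.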

timescaledb-parallel-copy ingests data efficiently by preserving the order of the rows. Sharing inserts between parallel workers in round-robin fashion ensures that the database switches between chunks less often. This improves memory management and keeps operations on the disk as sequential as possible.
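To make the round-robin idea concrete, the sketch below hands out consecutive batches to workers in rotation. It illustrates round-robin assignment in general and is not the tool's actual implementation. `WORKERS`, `BATCHES`, and `assignments` are hypothetical names and values chosen for the example.

```shell
# Illustration only: assign consecutive batches to workers in rotation.
# WORKERS and BATCHES are hypothetical example values.
WORKERS=3
BATCHES=7
assignments=""
i=0
while [ "$i" -lt "$BATCHES" ]; do
  worker=$((i % WORKERS))                  # batch i goes to worker i mod WORKERS
  assignments="$assignments$i:$worker "    # record as batch:worker
  i=$((i + 1))
done
echo "$assignments"
```

With these values, batches 0, 3, and 6 go to worker 0. The batches being written at any moment are therefore neighbors in the file, so when the file is ordered by time they tend to fall into the same or adjacent chunks.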

Before you begin

Importing data using timescaledb-parallel-copy

  1. Install timescaledb-parallel-copy from the GitHub repository:

    go install github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy@latest
  2. Check the version of timescaledb-parallel-copy:

    timescaledb-parallel-copy --version
  3. Change to the directory that contains the .csv files to import.

  4. Import the data into the tsdb database on your cloud service. Set <NUM_WORKERS> to twice the number of CPUs in your database. For example, if you have 4 CPUs, <NUM_WORKERS> should be 8.

    timescaledb-parallel-copy \
    --connection "host=<CLOUD_HOST> \
    user=tsdbadmin password=<CLOUD_PASSWORD> \
    port=<CLOUD_PORT> \
    sslmode=require" \
    --db-name tsdb \
    --table <TABLE_NAME> \
    --file <FILE_NAME>.csv \
    --workers <NUM_WORKERS> \
    --reporting-period 30s
    Note

    To import data into a tsdb database on localhost, set the connection parameter to "host=localhost user=postgres sslmode=disable".
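The worker-count rule from step 4 can be computed in the shell rather than by hand. This is a small sketch under one assumption: `DB_CPUS` is a hypothetical variable that you set to the CPU count of the database, which is not necessarily the CPU count of the machine that runs the import.

```shell
# DB_CPUS is a hypothetical variable: the number of CPUs in the database,
# not in the machine running timescaledb-parallel-copy.
DB_CPUS=4
NUM_WORKERS=$((DB_CPUS * 2))   # twice the number of database CPUs
echo "$NUM_WORKERS"
```

You can then pass the result to the import command as `--workers "$NUM_WORKERS"`.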