About timescaledb-parallel-copy
To speed up bulk inserts, Timescale provides an open source parallel importer called timescaledb-parallel-copy. The program parallelizes the import by using several workers to run multiple COPY commands concurrently.
PostgreSQL's native COPY command is transactional and single-threaded, so it is not well suited to ingesting large amounts of data. If the rows in the file are at least roughly in chronological order with respect to the hypertable's time dimension, timescaledb-parallel-copy improves performance by parallelizing this operation, which lets you take full advantage of your hardware resources.
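Because the speedup depends on the input being roughly time-ordered, it can help to check a file before importing it. The following Python sketch, which is not part of timescaledb-parallel-copy, counts rows whose timestamp is earlier than that of the row just before it. It assumes a header row, the time value in the first column, and ISO 8601 timestamps; the column names in the sample are hypothetical.

```python
import csv
import io
from datetime import datetime


def count_out_of_order(lines, time_column=0, has_header=True):
    """Count rows whose timestamp is earlier than that of the row just before it.

    Assumes the time column holds ISO 8601 timestamps that
    datetime.fromisoformat can parse, such as 2023-01-01T00:00:00.
    """
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the header row
    previous = None
    out_of_order = 0
    for row in reader:
        if not row:
            continue  # ignore blank lines
        ts = datetime.fromisoformat(row[time_column])
        if previous is not None and ts < previous:
            out_of_order += 1
        previous = ts
    return out_of_order


# Hypothetical sample data: the third data row is out of order.
sample = io.StringIO(
    "time,device_id,temperature\n"
    "2023-01-01T00:00:00,a,21.5\n"
    "2023-01-01T00:01:00,a,21.7\n"
    "2023-01-01T00:00:30,b,19.0\n"
)
print(count_out_of_order(sample))  # prints 1
```

To check a real file, open it with open("data.csv", newline="") and pass the file object instead of the sample; data.csv is a placeholder name. A low count relative to the total number of rows indicates that the file is mostly in chronological order.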
timescaledb-parallel-copy ingests data efficiently by preserving the order of the rows. Because inserts are shared between the parallel workers in round-robin fashion, the database switches between chunks less often. This improves memory management and keeps disk operations as sequential as possible.
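The round-robin idea can be illustrated with a small Python sketch. It splits time-ordered rows into consecutive batches and deals batch i to worker i % num_workers, so at any moment the workers are copying neighboring slices of time. This is a simplified model of the behavior described above, not the tool's actual Go implementation; the batch size, the in-memory queues, and the copy_batch stub (which stands in for a real COPY call) are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batches(rows, batch_size):
    """Yield consecutive batches of rows, preserving file order."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def assign_round_robin(rows, batch_size, num_workers):
    """Deal batches to workers in turn: batch i goes to worker i % num_workers."""
    queues = [[] for _ in range(num_workers)]
    for i, batch in enumerate(batches(rows, batch_size)):
        queues[i % num_workers].append(batch)
    return queues


def copy_batch(worker_id, batch):
    # Hypothetical stand-in for a COPY ... FROM STDIN call on the worker's
    # own database connection; here it only reports the rows it would send.
    return (worker_id, batch[0], batch[-1])


def run_workers(rows, batch_size, num_workers):
    """Run one thread per worker; each worker copies its batches in order."""
    queues = assign_round_robin(rows, batch_size, num_workers)

    def worker(worker_id):
        return [copy_batch(worker_id, batch) for batch in queues[worker_id]]

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(worker, range(num_workers)))


# Rows 0..9 stand in for time-ordered CSV rows.
for result in run_workers(range(10), batch_size=2, num_workers=3):
    print(result)
```

Worker 0 receives rows 0-1 and 6-7, worker 1 receives rows 2-3 and 8-9, and worker 2 receives rows 4-5. Because neighboring batches cover neighboring time ranges, the concurrent COPY calls tend to target the same few chunks instead of scattering across the hypertable.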
Before you begin
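- Install the Go runtime, version 1.16 or later. Go 1.16 is the first release whose go install command accepts a version suffix such as @latest.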
- Install TimescaleDB or create a TimescaleDB service on Timescale Cloud.
- Gather the connection details for TimescaleDB.
- Create a hypertable in the TimescaleDB database to hold the imported data. Make sure the table schema matches the data in your .csv file.
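One quick way to catch a schema mismatch before a long import is to confirm that every CSV record has one field per table column. The sketch below is a hypothetical pre-flight check in Python, not a feature of timescaledb-parallel-copy; the column list stands in for your own hypertable's columns, and the sample assumes a file without a header row.

```python
import csv
import io


def rows_with_wrong_width(lines, table_columns):
    """Return 1-based record numbers of CSV rows whose field count differs
    from the number of columns in the target table.

    Without an explicit column list, COPY maps CSV fields to table columns
    by position, so each record needs one field per column.
    """
    expected = len(table_columns)
    return [
        record_no
        for record_no, row in enumerate(csv.reader(lines), start=1)
        if len(row) != expected
    ]


# Hypothetical hypertable columns, listed in table order.
columns = ["time", "device_id", "temperature"]
sample = io.StringIO(
    "2023-01-01T00:00:00,a,21.5\n"
    "2023-01-01T00:01:00,b\n"
    "2023-01-01T00:02:00,c,20.1\n"
)
print(rows_with_wrong_width(sample, columns))  # prints [2]
```

The csv module handles quoted fields, so a value such as "x,y" counts as a single field.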
Importing data using timescaledb-parallel-copy
1. Install timescaledb-parallel-copy from the GitHub repository:
go install github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy@latest
2. Check the version of timescaledb-parallel-copy:
timescaledb-parallel-copy --version
3. Change to the directory that contains the .csv files to import.
4. To import data into the tsdb database on Timescale Cloud, run the following command. Set <NUM_WORKERS> to twice the number of CPUs in your database. For example, if you have 4 CPUs, <NUM_WORKERS> should be 8.
timescaledb-parallel-copy \
--connection "host=<CLOUD_HOST> \
user=tsdbadmin password=<CLOUD_PASSWORD> \
port=<CLOUD_PORT> \
sslmode=require" \
--db-name tsdb \
--table <TABLE_NAME> \
--file <FILE_NAME>.csv \
--workers <NUM_WORKERS> \
--reporting-period 30s
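If you script imports, you can build the same command from Python and pass it to subprocess.run as a list, which avoids shell quoting of the connection string. This sketch uses only the flags shown in the command above and applies the twice-the-CPU-count rule for --workers. The table and file names are hypothetical, and num_cpus should be the CPU count of the database, not of the machine running the script.

```python
import os
import shlex


def parallel_copy_args(connection, table, csv_file, num_cpus=None,
                       db_name="tsdb", reporting_period="30s"):
    """Build the argument list for a timescaledb-parallel-copy run.

    Uses only the flags from the documented command. num_cpus should be the
    CPU count of the database server; os.cpu_count() counts the machine
    running this script, so it is only a fallback.
    """
    if num_cpus is None:
        num_cpus = os.cpu_count() or 1
    num_workers = 2 * num_cpus  # twice the number of CPUs
    return [
        "timescaledb-parallel-copy",
        "--connection", connection,
        "--db-name", db_name,
        "--table", table,
        "--file", csv_file,
        "--workers", str(num_workers),
        "--reporting-period", reporting_period,
    ]


args = parallel_copy_args(
    "host=localhost user=postgres sslmode=disable",
    table="conditions",         # hypothetical table name
    csv_file="conditions.csv",  # hypothetical file name
    num_cpus=4,
)
# Print a shell-quoted version of the command for review.
print(shlex.join(args))
```

To start the import, import subprocess and call subprocess.run(args, check=True); with check=True, a non-zero exit status raises CalledProcessError.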
Note: To import data into a tsdb database on localhost, set the connection parameter to "host=localhost user=postgres sslmode=disable".
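The --connection value is a libpq-style keyword/value connection string. If a value such as a password contains spaces, single quotes, or backslashes, it must be single-quoted, with any single quote or backslash inside it escaped by a backslash. This Python sketch builds such strings for the cloud and localhost cases; libpq_conninfo is a hypothetical helper, and the angle-bracket placeholders are left for you to replace.

```python
def libpq_conninfo(**params):
    """Build a libpq keyword/value connection string from keyword arguments.

    Empty values and values containing whitespace, single quotes, or
    backslashes are wrapped in single quotes, and any single quote or
    backslash inside them is escaped with a backslash.
    """
    parts = []
    for key, value in params.items():
        value = str(value)
        if value == "" or any(c.isspace() or c in "'\\" for c in value):
            escaped = value.replace("\\", "\\\\").replace("'", "\\'")
            value = f"'{escaped}'"
        parts.append(f"{key}={value}")
    return " ".join(parts)


local = libpq_conninfo(host="localhost", user="postgres", sslmode="disable")
print(local)  # prints host=localhost user=postgres sslmode=disable

cloud = libpq_conninfo(
    host="<CLOUD_HOST>",
    user="tsdbadmin",
    password="<CLOUD_PASSWORD>",
    port="<CLOUD_PORT>",
    sslmode="require",
)
print(cloud)
```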