Using TimescaleDB

Backup & Restore

Backing up TimescaleDB takes advantage of the reliable functionality already available through PostgreSQL. There are several ways to accomplish this: physical backups with pg_basebackup or another tool, or logical backups with pg_dump and pg_restore. Physical backups may also be used with Write-Ahead Log (WAL) archiving to achieve an ongoing backup.

Performing Physical Backups

For full instance physical backups (which are especially useful for starting up new replicas), pg_basebackup works with all TimescaleDB installations. You can also use any of several external backup and restore managers such as pg_backrest, barman, or wal-e. These allow you to take online, hot physical backups of your entire instance, and many offer incremental backups and other automation options.

Using pg_dump and pg_restore

In this section, we cover how to backup and restore an entire database or individual hypertables using the native PostgreSQL pg_dump and pg_restore commands.

TIP:Upgrades between different versions of TimescaleDB can be done in place; you don’t need to backup/restore your data. Follow these updating instructions.

Entire database

To backup a database named tutorial, run from the command line:

  1. pg_dump -Fc -f tutorial.bak tutorial

Restoring data from a backup currently requires some additional procedures, which need to be run from psql:

  1. CREATE DATABASE tutorial;
  2. \c tutorial --connect to the db where we'll perform the restore
  3. CREATE EXTENSION timescaledb;
  4. SELECT timescaledb_pre_restore();
  5. -- execute the restore (or from a shell)
  6. \! pg_restore -Fc -d tutorial tutorial.bak
  7. SELECT timescaledb_post_restore();

WARNING:PostgreSQL’s pg_dump does not currently specify the version of the extension in its backup, which leads to problems if you are restoring into a database instance with a more recent extension version installed. (In particular, the backup could be for some version 1.1, but then the CREATE EXTENSION timescaledb command just installs the latest (say, 1.3), and thus does not have the opportunity to run our upgrade scripts.) We are looking into submitting a fix for pg_dump.

The workaround is that when restoring from a backup, you need to restore to a PostgreSQL instance with the same extension version installed, and then upgrade the version.

WARNING:When restoring from versions before 1.3, you must follow the instructions for restoring from earlier versions. You can select docs for previous versions of the database in the sidebar.

WARNING:These instructions do not work if you use flags to selectively choose tables (-t) or schemas (--schema), and so cannot be used to backup only an individual hypertable. In particular, even if you explicitly specify both the hypertable and all of its constituent chunks, this dump would still lack necessary information that TimescaleDB stores in the database catalog about the relation between these tables.

You can, however, explicitly exclude tables from this whole database backup (-T), as well as continue to selectively backup plain tables (i.e., non-hypertable) as well.

Individual hypertables

Below is the procedure for performing a backup and restore of hypertable conditions.

Backup

Backup the hypertable schema:

  1. pg_dump -s -d old_db --table conditions -N _timescaledb_internal | \
  2. grep -v _timescaledb_internal > schema.sql

Backup the hypertable data to a CSV file.

  1. psql -d old_db \
  2. -c "\COPY (SELECT * FROM conditions) TO data.csv DELIMITER ',' CSV"

Restore

Restore the schema:

  1. psql -d new_db < schema.sql

Recreate the hypertables:

  1. psql -d new_db -c "SELECT create_hypertable('conditions', 'time')"

TIP:The parameters to create_hypertable do not need to be the same as in the old db, so this is a good way to re-organize your hypertables (e.g., change partitioning key, number of partitions, chunk interval sizes, etc.).

Restore the data:

  1. psql -d new_db -c "\COPY conditions FROM data.csv CSV"

TIP:The standard COPY command in PostgreSQL is single threaded. So to speed up importing larger amounts of data, we recommend using our parallel importer instead.

Continuous archiving and recovery with Docker & WAL-E

When using TimescaleDB in a containerized environment, it is possible to do continuous archiving using a WAL-E “sidecar” container (i.e., a container that runs alongside the main container). For this purpose, we provide a WAL-E sidecar image that works with TimescaleDB as well as regular PostgreSQL. In the following example, we will setup archiving to the local filesystem using one main TimescaleDB container called timescaledb, and one WAL-E sidecar called wale. For production deployments, this example can be adapted to do archiving against, e.g., AWS S3, and run in an orchestration framework like Kubernetes.

Running the TimescaleDB container

To make TimescaleDB use the WAL-E sidecar for archiving, the two containers need to share a network. Create a Docker network like so:

  1. docker network create timescaledb-net

Then launch TimescaleDB with archiving turned on, using the newly created network:

  1. docker run \
  2. --name timescaledb \
  3. --network timescaledb-net \
  4. -e POSTGRES_PASSWORD=insecure \
  5. -e POSTGRES_INITDB_WALDIR=/var/lib/postgresql/data/pg_wal \
  6. -e PGDATA=/var/lib/postgresql/data/pg_data \
  7. timescale/timescaledb:latest-pg10 postgres \
  8. -cwal_level=archive \
  9. -carchive_mode=on \
  10. -carchive_command="/usr/bin/wget wale/wal-push/%f -O -" \
  11. -carchive_timeout=600 \
  12. -ccheckpoint_timeout=700 \
  13. -cmax_wal_senders=1

We explicitly set the location of the write-ahead log (POSTGRES_INITDB_WALDIR) and data directory (PGDATA) so that we can share these with the WAL-E sidecar. Both must reside in a Docker volume (a volume is created for /var/lib/postgresql/data by default).

It is now possible to log into the database and create tables and data:

  1. docker exec -it timescaledb psql -U postgres

Running the WAL-E sidecar

Our WAL-E Docker image runs a small Web endpoint that accepts WAL-E commands via a HTTP API. This allows PostgreSQL to communicate with the WAL-E sidecar over the internal network to trigger archiving. It is, of course, also possible to use the container to invoke WAL-E directly. The Docker image accepts the standard WAL-E environment variables to configure the archiving backend (e.g., AWS S3) and more. See WAL-E’s documentation for more details.

To enable the WAL-E docker image to perform archiving, it needs to use the network and data volume(s) of the TimescaleDB container. It also needs to know the location of the write-ahead log and data directories. Thus, launch the WAL-E sidecar as follows:

  1. docker run \
  2. --name wale \
  3. --network timescaledb-net \
  4. --volumes-from timescaledb \
  5. -v ~/backups:/backups \
  6. -e WALE_LOG_DESTINATION=stderr \
  7. -e PGWAL=/var/lib/postgresql/data/pg_wal \
  8. -e PGDATA=/var/lib/postgresql/data/pg_data \
  9. -e PGHOST=timescaledb \
  10. -e PGPASSWORD=insecure \
  11. -e PGUSER=postgres \
  12. -e WALE_FILE_PREFIX=file://localhost/backups \
  13. timescale/timescaledb-wale:latest

This will make the WAL-E image listen on commands on port 80 on the timescaledb-net internal network and write backups to ~/backups on the Docker host.

To do the initial base backup, execute the following command in the running WAL-E container (assuming the container’s name is timescaledb-wale):

  1. docker exec wale wal-e backup-push /var/lib/postgresql/data/pg_data

Alternatively, do it via the sidecar’s HTTP endpoint (this requires exposing the sidecar’s port 80 on the Docker host by mapping it to, e.g., port 8080):

  1. curl http://localhost:8080/backup-push

Base backups should be done at regular intervals (e.g., every day) to minimize the amount of WAL replay, making recoveries faster. To make new base backups, simply re-trigger a base backup as shown above, either manually or on a schedule (e.g., via a CRON job). If you run TimescaleDB on Kubernetes, there is built-in support for scheduling cron jobs that can invoke base backups via, e.g., the WAL-E container’s HTTP API.

Recovery

To recover the database instance from the backup archive, create a new TimescaleDB container:

  1. docker create \
  2. --name timescaledb-recovered \
  3. --network timescaledb-net \
  4. -e POSTGRES_PASSWORD=insecure \
  5. -e POSTGRES_INITDB_WALDIR=/var/lib/postgresql/data/pg_wal \
  6. -e PGDATA=/var/lib/postgresql/data/pg_data \
  7. timescale/timescaledb:latest-pg10 postgres

Now restore the database files from the base backup:

  1. docker run -it --rm \
  2. -v ~/backups:/backups \
  3. --volumes-from timescaledb-recovered \
  4. -e WALE_LOG_DESTINATION=stderr \
  5. -e WALE_FILE_PREFIX=file://localhost/backups \
  6. timescale/timescaledb-wale:latest \wal-e \
  7. backup-fetch /var/lib/postgresql/data/pg_data LATEST

Recreate some configuration files (normally, these are backed up configuration files from the old database instance):

  1. docker run -it --rm \
  2. --volumes-from timescaledb-recovered \
  3. timescale/timescaledb:latest-pg10 \
  4. cp /usr/local/share/postgresql/pg_ident.conf.sample /var/lib/postgresql/data/pg_data/pg_ident.conf
  5. docker run -it --rm \
  6. --volumes-from timescaledb-recovered \
  7. timescale/timescaledb:latest-pg10 \
  8. cp /usr/local/share/postgresql/postgresql.conf.sample /var/lib/postgresql/data/pg_data/postgresql.conf
  9. docker run -it --rm \
  10. --volumes-from timescaledb-recovered \
  11. timescale/timescaledb:latest-pg10 \
  12. sh -c 'echo "local all postgres trust" > /var/lib/postgresql/data/pg_data/pg_hba.conf'

Now create a recovery.conf that tells PostgreSQL how to recover:

  1. docker run -it --rm \
  2. --volumes-from timescaledb-recovered \
  3. timescale/timescaledb:latest-pg10 \
  4. sh -c 'echo "restore_command='\''/usr/bin/wget wale/wal-fetch/%f -O -'\''" > /var/lib/postgresql/data/pg_data/recovery.conf'

Then run the WAL-E sidecar again (you may have to remove the old one first). It will be used to replay the last WAL segments that may not be reflected in the base backup:

  1. docker run \
  2. --name wale \
  3. --network timescaledb-net \
  4. -v ~/backups:/backups \
  5. --volumes-from timescaledb-recovered \
  6. -e WALE_LOG_DESTINATION=stderr \
  7. -e PGWAL=/var/lib/postgresql/data/pg_wal \
  8. -e PGDATA=/var/lib/postgresql/data/pg_data \
  9. -e PGHOST=timescaledb \
  10. -e PGPASSWORD=insecure \
  11. -e PGUSER=postgres \
  12. -e WALE_FILE_PREFIX=file://localhost/backups \
  13. timescale/timescaledb-wale:latest

Finally, launch the TimescaleDB docker container:

  1. docker start timescaledb-recovered

Verify that the database started up and recovered successfully:

  1. docker logs timescaledb-recovered

Note that it is normal to see some archive recovery “errors” at the end as the recovery will be complete when no further files can be found in the archive. See the PostgreSQL documentation on continuous archiving for more information.