Writing data
If you are familiar with SQL, then the commands for writing to the database will be familiar to you. TimescaleDB uses standard SQL commands for writing data, including INSERT, UPDATE, and DELETE as well as UPSERTs through ON CONFLICT statements; and it all works as expected with changes to hypertables propagating down to individual chunks.
INSERT
Data can be inserted into a hypertable using the standard INSERT
SQL command (PostgreSQL docs).
INSERT INTO conditions(time, location, temperature, humidity)
VALUES (NOW(), 'office', 70.0, 50.0);
You can also insert multiple rows into a hypertable using a single INSERT
call, even thousands at a time. This is typically much more efficient than inserting data row-by-row, and is recommended when possible.
INSERT INTO conditions
VALUES
(NOW(), 'office', 70.0, 50.0),
(NOW(), 'basement', 66.5, 60.0),
(NOW(), 'garage', 77.0, 65.2);
TIP:The rows that belong to a single batch INSERT command do not need to belong to the same chunk (by time interval or partitioning key). Upon receiving an
INSERT
command for multiple rows, the TimescaleDB engine will determine which rows (sub-batches) belong to which chunks, and will write them accordingly to each chunk in a single transaction.
You can also specify that INSERT
returns some or all of the inserted data via the RETURNING
statement:
INSERT INTO conditions
VALUES (NOW(), 'office', 70.1, 50.1) RETURNING *;
time | location | temperature | humidity
-------------------------------+----------+-------------+----------
2017-07-28 11:42:42.846621+00 | office | 70.1 | 50.1
(1 row)
UPDATE Commands
Updates in TimescaleDB work as expected in standard SQL (PostgreSQL docs).
UPDATE conditions SET temperature = 70.2, humidity = 50.0
WHERE time = '2017-07-28 11:42:42.846621+00' AND location = 'office';
An update command can touch many rows at once, i.e., the following will modify all rows found in a 10-minute block of data.
UPDATE conditions SET temperature = temperature + 0.1
WHERE time >= '2017-07-28 11:40' AND time < '2017-07-28 11:50';
WARNING:TimescaleDB achieves much higher insert performance compared to vanilla PostgreSQL when inserts are localized to the most recent time interval (or two). If your workload is heavily based on
UPDATE
s to old time intervals instead, you may observe significantly lower write throughput.
UPSERT Functionality
TimescaleDB supports UPSERTs in the same manner as PostgreSQL via the optional ON CONFLICT
clause (PostgreSQL docs). If such a clause is provided, rather than cause an error, an inserted row that conflicts with another can either (a) do nothing or (b) result in a subsequent update of that existing row.
In order to create a conflict, an insert must be performed on identical value(s) in column(s) covered by a unique index or constraint. Such an index is created automatically when marking column(s) as PRIMARY KEY
or with a UNIQUE
constraint.
Following the examples given above, an INSERT
with an identical timestamp and location as an existing row will succeed and create an additional row in the database.
If, however, the conditions
table had been created with a UNIQUE constraint defined on one or more of the columns (either at table creation time or via an ALTER
command):
CREATE TABLE conditions (
time TIMESTAMPTZ NOT NULL,
location TEXT NOT NULL,
temperature DOUBLE PRECISION NULL,
humidity DOUBLE PRECISION NULL,
UNIQUE (time, location)
);
then the second attempt to insert to this same time will normally return an error.
The above UNIQUE
statement during table creation internally is similar to:
CREATE UNIQUE INDEX on conditions (time, location);
Both of these result in a unique index for the table:
# \d+ conditions;
Table "public.conditions"
Column | Type | Modifiers | Storage | Stats target | Description
-------------+--------------------------+-----------+----------+--------------+-------------
time | timestamp with time zone | not null | plain | |
location | text | not null | extended | |
temperature | double precision | | plain | |
humidity | double precision | | plain | |
Indexes:
"conditions_time_location_idx" UNIQUE, btree ("time", location)
Now, however, the INSERT
command can specify that nothing be done on a conflict. This is particularly important when writing many rows as one batch, as otherwise the entire transaction will fail (as opposed to just skipping the row that conflicts).
INSERT INTO conditions
VALUES ('2017-07-28 11:42:42.846621+00', 'office', 70.1, 50.0)
ON CONFLICT DO NOTHING;
Alternatively, one can specify how to update the existing data:
INSERT INTO conditions
VALUES ('2017-07-28 11:42:42.846621+00', 'office', 70.2, 50.1)
ON CONFLICT (time, location) DO UPDATE
SET temperature = excluded.temperature,
humidity = excluded.humidity;
TIP:Unique constraints must include all partitioning keys. For example, if the table just uses time partitioning, the system requires
time
as part of the constraint:UNIQUE(time)
,UNIQUE(time, location)
,UNIQUE(location, time)
, etc. On the other hand,UNIQUE(location)
is not a valid constraint.If the schema were to have an additional column like
device
that is used as an additional partition dimension, then the constraint would have to beUNIQUE(time, device)
orUNIQUE(time, device, location)
. In such scenarios then,UNIQUE(time, location)
would no longer be a valid constraint.WARNING:TimescaleDB does not yet support using
ON CONFLICT ON CONSTRAINT
with a named key (e.g.,conditions_time_location_idx
), but much of this functionality can be captured by specifying the same columns as above with a unique index/constraint. This limitation will be removed in a future version.
DELETE
Data can be deleted from a hypertable using the standard DELETE
SQL command (PostgreSQL docs), which will propagate down to the appropriate chunks that comprise the hypertable.
DELETE FROM conditions WHERE temperature < 35 OR humidity < 60;
DELETE FROM conditions WHERE time < NOW() - INTERVAL '1 month';
After running a large DELETE
operation, users are recommended to VACUUM
or VACUUM FULL
the hypertable to reclaim storage occupied by deleted or obsoleted rows (PostgreSQL docs).
TIP:For deleting old data, such as in the second example above, we recommend using the TimescaleDB function
drop_chunks
instead. This feature is much more performant: it deletes entire chunks of data (basically, removing files), rather than at the individual row level (necessitating vacuuming). See the section on data retention.