Overview
Update data
Update data with same tags and time index
Updates can be efficiently performed by inserting new data. If rows of data have the same tags and time index, the old data will be replaced with the new data. This means that you can only update columns with a field type. To update data, simply insert new data with the same tag and time index as the existing data.
For more information about column types, please refer to the Data Model.
Note
Excessive updates may negatively impact query performance, even though the performance of updates is the same as insertion.
Update all fields in a table
By default, when updating data, all fields will be overwritten with the new values, except for InfluxDB line protocol, which only updates the specified fields. The following example using SQL demonstrates the behavior of overwriting all fields in a table.
Assuming you have a table named monitor
with the following schema. The host
column represents the tag and the ts
column represents the time index.
CREATE TABLE monitor (
host STRING,
ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP() TIME INDEX,
cpu FLOAT64,
memory FLOAT64,
PRIMARY KEY(host)
);
Insert a new row into the monitor
table:
INSERT INTO monitor (host, ts, cpu, memory)
VALUES ("127.0.0.1", "2024-07-11 20:00:00", 0.8, 0.1);
Check the data in the table:
SELECT * FROM monitor;
+-----------+---------------------+------+--------+
| host | ts | cpu | memory |
+-----------+---------------------+------+--------+
| 127.0.0.1 | 2024-07-11 20:00:00 | 0.8 | 0.1 |
+-----------+---------------------+------+--------+
1 row in set (0.00 sec)
To update the data, you can use the same host
and ts
values as the existing data and set the new cpu
value to 0.5
:
INSERT INTO monitor (host, ts, cpu, memory)
-- The same tag `127.0.0.1` and the same time index 2024-07-11 20:00:00
VALUES ("127.0.0.1", "2024-07-11 20:00:00", 0.5, 0.1);
The new data will be:
SELECT * FROM monitor;
+-----------+---------------------+------+--------+
| host | ts | cpu | memory |
+-----------+---------------------+------+--------+
| 127.0.0.1 | 2024-07-11 20:00:00 | 0.5 | 0.1 |
+-----------+---------------------+------+--------+
1 row in set (0.01 sec)
With the default merge policy, if columns are omitted in the INSERT INTO
statement, they will be overwritten with the default values.
For example:
INSERT INTO monitor (host, ts, cpu)
VALUES ("127.0.0.1", "2024-07-11 20:00:00", 0.5);
The default value of the memory
column in the monitor
table is NULL
. Therefore, the new data will be:
SELECT * FROM monitor;
+-----------+---------------------+------+--------+
| host | ts | cpu | memory |
+-----------+---------------------+------+--------+
| 127.0.0.1 | 2024-07-11 20:00:00 | 0.5 | NULL |
+-----------+---------------------+------+--------+
1 row in set (0.01 sec)
Update specific fields in a table
This update policy is supported by default in the InfluxDB line protocol. You can also enable this behavior by specifying the merge_mode
option as last_non_null
when creating a table using SQL. Here’s an example:
CREATE TABLE monitor (
host STRING,
ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP() TIME INDEX,
cpu FLOAT64,
memory FLOAT64,
PRIMARY KEY(host)
) WITH ('merge_mode'='last_non_null');
INSERT INTO monitor (host, ts, cpu, memory)
VALUES ("127.0.0.1", "2024-07-11 20:00:00", 0.8, 0.1);
To update specific fields in the monitor
table, you can insert new data with only the fields you want to update. For example:
INSERT INTO monitor (host, ts, cpu)
VALUES ("127.0.0.1", "2024-07-11 20:00:00", 0.5);
This will update the cpu
field while leaving the memory
field unchanged. The result will be:
+-----------+---------------------+------+--------+
| host | ts | cpu | memory |
+-----------+---------------------+------+--------+
| 127.0.0.1 | 2024-07-11 20:00:00 | 0.5 | 0.1 |
+-----------+---------------------+------+--------+
Notice that the last_non_null
merge mode cannot update the old value to NULL
. For example:
INSERT INTO monitor (host, ts, cpu, memory)
VALUES ("127.0.0.1", "2024-07-11 20:00:01", 0.8, 0.1);
INSERT INTO monitor (host, ts, cpu)
VALUES ("127.0.0.1", "2024-07-11 20:00:01", NULL);
That will not update anything:
+-----------+---------------------+------+--------+
| host | ts | cpu | memory |
+-----------+---------------------+------+--------+
| 127.0.0.1 | 2024-07-11 20:00:01 | 0.8 | 0.1 |
+-----------+---------------------+------+--------+
For more information about the merge_mode
option, please refer to the CREATE TABLE statement.
Avoid updating data by creating table with append_mode
option
GreptimeDB supports an append_mode
option when creating a table, which always inserts new data to the table. This is especially useful when you want to keep all historical data, such as logs.
You can only create a table with the append_mode
option using SQL. After successfully creating the table, all protocols ingest data to the table will always insert new data.
For example, you can create an app_logs
table with the append_mode
option as follows. The host
and log_level
columns represent tags, and the ts
column represents the time index.
CREATE TABLE app_logs (
ts TIMESTAMP TIME INDEX,
host STRING,
api_path STRING FULLTEXT,
log_level STRING,
log STRING FULLTEXT,
PRIMARY KEY (host, log_level)
) WITH ('append_mode'='true');
Insert a new row into the app_logs
table:
INSERT INTO app_logs (ts, host, api_path, log_level, log)
VALUES ('2024-07-11 20:00:10', 'host1', '/api/v1/resource', 'ERROR', 'Connection timeout');
Check the data in the table:
SELECT * FROM app_logs;
The output will be:
+---------------------+-------+------------------+-----------+--------------------+
| ts | host | api_path | log_level | log |
+---------------------+-------+------------------+-----------+--------------------+
| 2024-07-11 20:00:10 | host1 | /api/v1/resource | ERROR | Connection timeout |
+---------------------+-------+------------------+-----------+--------------------+
1 row in set (0.01 sec)
You can insert new data with the same tag and time index:
INSERT INTO app_logs (ts, host, api_path, log_level, log)
-- The same tag `host1` and `ERROR`, the same time index 2024-07-11 20:00:10
VALUES ('2024-07-11 20:00:10', 'host1', '/api/v1/resource', 'ERROR', 'Connection reset');
Then you will find two rows in the table:
SELECT * FROM app_logs;
+---------------------+-------+------------------+-----------+--------------------+
| ts | host | api_path | log_level | log |
+---------------------+-------+------------------+-----------+--------------------+
| 2024-07-11 20:00:10 | host1 | /api/v1/resource | ERROR | Connection reset |
| 2024-07-11 20:00:10 | host1 | /api/v1/resource | ERROR | Connection timeout |
+---------------------+-------+------------------+-----------+--------------------+
2 rows in set (0.01 sec)
Delete Data
You can effectively delete data by specifying tags and time index. Deleting data without specifying the tag and time index columns is not efficient, as it requires two steps: querying the data and then deleting it by tag and time index. For more information about column types, please refer to the Data Model.
Warning
Excessive deletions can negatively impact query performance.
You can only delete data using SQL. For example, to delete a row from the monitor
table with tag host
and timestamp index ts
:
DELETE FROM monitor WHERE host='127.0.0.2' AND ts=1667446798450;
The output will be:
Query OK, 1 row affected (0.00 sec)
For more information about the DELETE
statement, please refer to the SQL DELETE.
Truncate Table
To delete all data in a table, you can use the TRUNCATE TABLE
statement in SQL. For example, to truncate the monitor
table:
TRUNCATE TABLE monitor;
For more information about the TRUNCATE TABLE
statement, refer to the SQL TRUNCATE TABLE documentation.
Manage data retention with TTL policies
You can use Time to Live (TTL) policies to automatically remove stale data from your databases. TTL allows you to set policies to periodically delete data from tables. Setting TTL policies has the following benefits:
- Decrease storage costs by cleaning out obsolete data.
- Reduce the number of rows the database has to scan for some queries, potentially increasing query performance.
You can set TTL for every table when creating it. For example, the following SQL statement creates a table named monitor
with a TTL policy of 7 days:
CREATE TABLE monitor (
host STRING,
ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP() TIME INDEX,
cpu FLOAT64,
memory FLOAT64,
PRIMARY KEY(host)
) WITH ('ttl'='7d');
You can also create a database-level TTL policy. For example, the following SQL statement creates a database named test
with a TTL policy of 7 days:
CREATE DATABASE test WITH ('ttl'='7d');
You can set TTL policies at both the table level and the database level simultaneously. If a table has its own TTL policy, it will take precedence over the database TTL policy. Otherwise, the database TTL policy will be applied to the table.
For more information about TTL policies, please refer to the CREATE statement.