Handle duplicate data points
InfluxDB identifies unique data points by their measurement, tag set, and timestamp (each a part of Line protocol used to write data to InfluxDB).
web,host=host2,region=us_west firstByte=15.0 1559260800000000000
--- -------------------------                -------------------
 |              |                                     |
Measurement  Tag set                              Timestamp
Duplicate data points
For points that have the same measurement name, tag set, and timestamp, InfluxDB creates a union of the old and new field sets. For any matching field keys, InfluxDB uses the field value of the new point. For example:
# Existing data point
web,host=host2,region=us_west firstByte=24.0,dnsLookup=7.0 1559260800000000000
# New data point
web,host=host2,region=us_west firstByte=15.0 1559260800000000000
After you submit the new data point, InfluxDB overwrites firstByte with the new field value and leaves the field dnsLookup alone:
# Resulting data point
web,host=host2,region=us_west firstByte=15.0,dnsLookup=7.0 1559260800000000000
from(bucket: "example-bucket")
|> range(start: 2019-05-31T00:00:00Z, stop: 2019-05-31T12:00:00Z)
|> filter(fn: (r) => r._measurement == "web")
Table: keys: [_measurement, host, region]
_time _measurement host region dnsLookup firstByte
-------------------- ------------ ----- ------- --------- ---------
2019-05-31T00:00:00Z web host2 us_west 7 15
Preserve duplicate points
To preserve both old and new field values in duplicate points, use one of the following strategies:
Add an arbitrary tag
Add an arbitrary tag with unique values so InfluxDB reads the duplicate points as unique.
For example, add a uniq tag to each data point:
# Existing point
web,host=host2,region=us_west,uniq=1 firstByte=24.0,dnsLookup=7.0 1559260800000000000
# New point
web,host=host2,region=us_west,uniq=2 firstByte=15.0 1559260800000000000
It is not necessary to retroactively add the unique tag to the existing data point. Tag sets are evaluated as a whole, so the arbitrary uniq tag on the new point lets InfluxDB recognize it as a unique point. However, this causes the schemas of the two points to differ, which may lead to challenges when querying the data.
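As a sketch, the new point and its uniq tag could be written with the same Python client (connection values are placeholders as before):

from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="example-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

# uniq=2 makes this tag set differ from the existing point's,
# so InfluxDB stores it as a separate point instead of merging.
point = (
    Point("web")
    .tag("host", "host2")
    .tag("region", "us_west")
    .tag("uniq", "2")
    .field("firstByte", 15.0)
    .time(1559260800000000000, WritePrecision.NS)
)
write_api.write(bucket="example-bucket", record=point)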
After writing the new point to InfluxDB:
from(bucket: "example-bucket")
|> range(start: 2019-05-31T00:00:00Z, stop: 2019-05-31T12:00:00Z)
|> filter(fn: (r) => r._measurement == "web")
Table: keys: [_measurement, host, region, uniq]
_time _measurement host region uniq firstByte dnsLookup
-------------------- ------------ ----- ------- ---- --------- ---------
2019-05-31T00:00:00Z web host2 us_west 1 24 7
Table: keys: [_measurement, host, region, uniq]
_time _measurement host region uniq firstByte
-------------------- ------------ ----- ------- ---- ---------
2019-05-31T00:00:00Z web host2 us_west 2 15
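Because the tag sets now differ, the two points land in separate tables. If that split complicates downstream processing, one option is to drop the uniq column at query time so the rows regroup into a single table. A sketch using the Python client's query API (the drop() call is standard Flux; connection values are placeholders):

from influxdb_client import InfluxDBClient

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="example-org")

# Dropping uniq removes it from the group key, so both points
# regroup into one table.
flux = '''
from(bucket: "example-bucket")
    |> range(start: 2019-05-31T00:00:00Z, stop: 2019-05-31T12:00:00Z)
    |> filter(fn: (r) => r._measurement == "web")
    |> drop(columns: ["uniq"])
'''
for table in client.query_api().query(flux):
    for record in table.records:
        print(record.values)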
Increment the timestamp
Increment the timestamp by a nanosecond to enforce the uniqueness of each point.
# Existing data point
web,host=host2,region=us_west firstByte=24.0,dnsLookup=7.0 1559260800000000000
# New data point
web,host=host2,region=us_west firstByte=15.0 1559260800000000001
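As a sketch, this write could be issued with the same Python client (placeholder connection values; note the timestamp's trailing 1):

from influxdb_client import InfluxDBClient
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="example-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

# One nanosecond later than the existing point, so this is stored
# as a distinct point rather than merged with it.
write_api.write(
    bucket="example-bucket",
    record="web,host=host2,region=us_west firstByte=15.0 1559260800000000001",
)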
After writing the new point to InfluxDB:
from(bucket: "example-bucket")
|> range(start: 2019-05-31T00:00:00Z, stop: 2019-05-31T12:00:00Z)
|> filter(fn: (r) => r._measurement == "web")
Table: keys: [_measurement, host, region]
_time _measurement host region firstByte dnsLookup
------------------------------ ------------ ----- ------- --------- ---------
2019-05-31T00:00:00.000000000Z web host2 us_west 24 7
2019-05-31T00:00:00.000000001Z web host2 us_west 15
The output of the example queries in this article has been modified to clearly show the different approaches and results for handling duplicate data.