---
title: Configuring commit lag of out-of-order (O3) data
sidebar_label: Out-of-order commit lag
description:
  This document describes server configuration parameters for out-of-order
  commit lag, along with details of when and why these settings should be
  applied
image: /img/guides/out-of-order-commit-lag/o3-data.jpeg
---
Server configuration may be applied when ingesting data over InfluxDB Line Protocol (ILP) to allow users to control how the system processes and commits late-arriving data for optimum throughput.
## Background
As of version 6.0, QuestDB supports out-of-order (O3) data ingestion. The skew and latency of out-of-order data are likely to be relatively constant, so users may configure ingestion based on the characteristics of the data.

Most real-time out-of-order data patterns are caused by the delivery mechanism and hardware jitter, and therefore the timestamp distribution will be contained within some boundary.
If any new timestamp value has a high probability of arriving within 10 seconds of the previously received value, the boundary for this data is 10 seconds, and we name this boundary the lag.

When the order of timestamp values follows this pattern, it will be recognized by our out-of-order algorithm and prioritized using an optimized processing path. A commit of this data re-orders uncommitted rows and then commits all rows up to the boundary; the remaining rows stay in memory to be committed later.
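As a simplified sketch of this idea, consider the following hypothetical arrival order of timestamps with a 10-second lag:

```text
# Arrival order of timestamps (lag = 10 seconds):
10:00:05  10:00:02  10:00:08  10:00:03  10:00:15

# At commit time, uncommitted rows are re-ordered:
10:00:02  10:00:03  10:00:05  10:00:08  10:00:15

# The newest timestamp is 10:00:15, so the boundary is 10:00:05.
# Rows up to 10:00:05 are committed; 10:00:08 and 10:00:15 stay in
# memory, since rows older than them may still arrive.
```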
## Out-of-order (O3) commit parameters
Commit parameters allow for specifying that commits of out-of-order data should occur:

- when they are outside the window of time for which they are expected to be out-of-order, or
- when the row count passes a certain threshold.
:::info
Commit parameters are user-configurable for ingestion using InfluxDB line protocol only. This is because commits over Postgres wire protocol are invoked client-side, and commits via REST API occur either row-by-row or after a CSV import is complete.
:::
The following server configuration parameters are user-configurable:

```ini
# the maximum number of uncommitted rows
cairo.max.uncommitted.rows=X
# the expected maximum time lag for out-of-order rows, in milliseconds
cairo.commit.lag=X
# the maximum time between ILP jobs that commit uncommitted rows
line.tcp.maintenance.job.interval=X
```
These parameters are enforced so that a commit occurs if any one of these conditions is met; out-of-order commits therefore occur based either on the age of out-of-order records or on record count.
An out-of-order commit will occur:

- every `cairo.max.uncommitted.rows` rows, or
- if records haven't been committed for `line.tcp.maintenance.job.interval`.

If a commit occurs because `cairo.max.uncommitted.rows` has been reached, then `cairo.commit.lag` will be applied.
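As a hypothetical illustration of how these settings interact:

```ini
# Hypothetical example: commit once 1000 rows are buffered, treating
# rows up to 5 seconds (5000 ms) late as in-lag.
cairo.max.uncommitted.rows=1000
cairo.commit.lag=5000
```

With these values, when the 1000th uncommitted row arrives a commit is triggered: the buffered rows are re-ordered, rows older than the 5-second lag window are committed, and newer rows remain in memory to absorb further late arrivals.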
## When to change out-of-order commit configuration
The defaults for the out-of-order algorithm are optimized for real-world usage and should cover most patterns for timestamp arrival. The default configuration is as follows:

```ini
cairo.commit.lag=300000
cairo.max.uncommitted.rows=500000
line.tcp.maintenance.job.interval=30000
```
Users should modify out-of-order parameters if there is a known or expected pattern for:
- The length of time by which most records are late
- The frequency of incoming records and the expected throughput
For optimal ingestion performance, the number of commits of out-of-order data should be minimized. For this reason, if throughput is low and timestamps are expected to be consistently delayed by up to thirty seconds, the following configuration settings can be applied:
```ini
cairo.commit.lag=30000
cairo.max.uncommitted.rows=500
```
For high-throughput scenarios, a lower commit lag and a larger number of uncommitted rows may be more appropriate. The following settings assume a throughput of ten thousand records per second with a likely maximum of one second of lateness for timestamp values:
```ini
cairo.commit.lag=1000
cairo.max.uncommitted.rows=10000
```
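These numbers follow a rough sizing heuristic (our assumption, not an official formula): size `cairo.max.uncommitted.rows` near the expected throughput multiplied by the expected lag, so that a row-count-triggered commit covers roughly one lag window:

```ini
# Assumed sizing sketch, not an official formula:
#   throughput ~10,000 rows/s, expected lateness ~1 s
#   max uncommitted rows ≈ 10,000 rows/s × 1 s = 10,000 rows
cairo.commit.lag=1000            # 1 second, in milliseconds
cairo.max.uncommitted.rows=10000
```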
## How to configure out-of-order ingestion
### Server-wide configuration
These settings may be applied via the server configuration file:

```ini
cairo.max.uncommitted.rows=500
cairo.commit.lag=10000
line.tcp.maintenance.job.interval=1000
```
As with other server configuration parameters, these settings may be passed as environment variables:

- `QDB_LINE_TCP_MAINTENANCE_JOB_INTERVAL`
- `QDB_CAIRO_MAX_UNCOMMITTED_ROWS`
- `QDB_CAIRO_COMMIT_LAG`
To set this configuration for the current shell:

```shell
export QDB_CAIRO_MAX_UNCOMMITTED_ROWS=1000
export QDB_CAIRO_COMMIT_LAG=20000
questdb start
```
Passing the environment variables via Docker is done using the `-e` flag:
```shell
docker run -p 9000:9000 \
  -p 9009:9009 \
  -p 8812:8812 \
  -p 9003:9003 \
  -e QDB_CAIRO_MAX_UNCOMMITTED_ROWS=1000 \
  -e QDB_CAIRO_COMMIT_LAG=20000 \
  questdb/questdb
```
### Per-table lag and maximum uncommitted rows
It's possible to set out-of-order values per table when creating a new table as part of the `PARTITION BY` clause. Configuration is passed using the `WITH` keyword with the following two parameters:

- `maxUncommittedRows` - equivalent to `cairo.max.uncommitted.rows`
- `commitLag` - equivalent to `cairo.commit.lag`
```questdb-sql
CREATE TABLE my_table (timestamp TIMESTAMP) timestamp(timestamp)
PARTITION BY DAY WITH maxUncommittedRows=250000, commitLag=240s
```
Checking the values per-table may be done using the `tables()` function:

```questdb-sql
select id, name, maxUncommittedRows, commitLag from tables();
```
| id  | name        | maxUncommittedRows | commitLag |
| --- | ----------- | ------------------ | --------- |
| 1   | my_table    | 250000             | 240000000 |
| 2   | device_data | 10000              | 30000000  |
Note that `tables()` reports `commitLag` in microseconds. The values can be changed for each table with:
```questdb-sql
ALTER TABLE my_table SET PARAM maxUncommittedRows = 10000
```

and

```questdb-sql
ALTER TABLE my_table SET PARAM commitLag = 20s
```
For more information on checking table metadata, see the meta functions documentation page.
### INSERT lag and batch size
The `INSERT` keyword may be passed parameters for handling the expected lag of out-of-order records and a batch size for the number of rows to process and insert at once. The following query shows an `INSERT AS SELECT` operation with lag and batch size applied:
```questdb-sql
INSERT batch 100000 lag 180000000 INTO trades
SELECT ts, instrument, quantity, price
FROM unordered_trades
```
:::info
Using the lag and batch size parameters during `INSERT AS SELECT` statements is a convenient strategy to load and order large datasets from CSV in bulk. This strategy, along with an example workflow, is described in the importing data guide.
:::
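A minimal sketch of that workflow, assuming the CSV has already been imported into an unordered staging table named `unordered_trades` (table and column names here are hypothetical):

```questdb-sql
-- Create the ordered target table with a designated timestamp:
CREATE TABLE trades (
  ts TIMESTAMP,
  instrument SYMBOL,
  quantity DOUBLE,
  price DOUBLE
) timestamp(ts) PARTITION BY DAY;

-- Re-order and insert in batches of 100,000 rows, with a lag of
-- 180000000 (assumed to be microseconds, i.e. 180 seconds):
INSERT batch 100000 lag 180000000 INTO trades
SELECT ts, instrument, quantity, price FROM unordered_trades;
```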