Migrate from InfluxDB

This guide will help you understand the differences between the data models of GreptimeDB and InfluxDB, and guide you through the migration process.

Data model differences

To understand the differences between the data models of InfluxDB and GreptimeDB, please refer to the Data Model in the Ingest Data documentation.

Database connection information

Before you begin writing or querying data, it’s crucial to comprehend the differences in database connection information between InfluxDB and GreptimeDB.

Token: The InfluxDB API token is used for authentication and maps to the GreptimeDB username and password. When interacting with GreptimeDB using InfluxDB’s client libraries or HTTP API, you can use <greptime_user>:<greptimedb_password> as the token.
Organization: Unlike InfluxDB, GreptimeDB does not require an organization for connection.
Bucket: In InfluxDB, a bucket serves as a container for time series data, which is equivalent to the database name in GreptimeDB.
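
As a rough sketch of the mapping above, the hypothetical helper below (not part of any GreptimeDB or InfluxDB library) turns GreptimeDB connection details into the settings an InfluxDB v2 client expects. The function name, the default port argument, and the sample values are assumptions for illustration; the URL path is the one used in the client examples in this guide.

```python
def influx_settings_for_greptimedb(host, db_name, user, password, port=4000):
    """Map GreptimeDB connection details onto InfluxDB v2 client settings."""
    return {
        # Base URL of GreptimeDB's InfluxDB-compatible HTTP endpoint.
        "url": f"http://{host}:{port}/v1/influxdb",
        # The InfluxDB "token" carries the GreptimeDB username and password.
        "token": f"{user}:{password}",
        # GreptimeDB does not require an organization, so it stays empty.
        "org": "",
        # The InfluxDB "bucket" is the GreptimeDB database name.
        "bucket": db_name,
    }


# Sample values only; replace them with your own connection details.
settings = influx_settings_for_greptimedb("localhost", "public", "alice", "secret")
print(settings)
```

The returned dictionary can be passed to whichever InfluxDB client library you already use.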

Ingest data

GreptimeDB is compatible with both v1 and v2 of InfluxDB’s line protocol format, facilitating a seamless migration from InfluxDB to GreptimeDB.

HTTP API

To write a measurement to GreptimeDB, you can use either of the following HTTP API requests. The first uses the InfluxDB line protocol v2 endpoint and the second uses the v1 endpoint:

curl -X POST 'http://<host>:4000/v1/influxdb/api/v2/write?db=<db-name>' \
  -H 'authorization: token <greptime_user:greptimedb_password>' \
  -d 'census,location=klamath,scientist=anderson bees=23 1566086400000000000'

curl 'http://<host>:4000/v1/influxdb/write?db=<db-name>&u=<greptime_user>&p=<greptimedb_password>' \
  -d 'census,location=klamath,scientist=anderson bees=23 1566086400000000000'
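
The v2 request can also be assembled programmatically. The following is a minimal sketch using only the Python standard library; the helper name `build_v2_write_request` and the sample credentials are assumptions for illustration. It builds the request without sending it, so it can be inspected before pointing it at a real server.

```python
import urllib.request
from urllib.parse import urlencode


def build_v2_write_request(host, db_name, user, password, lines):
    """Build (but do not send) a line protocol write request for GreptimeDB."""
    url = f"http://{host}:4000/v1/influxdb/api/v2/write?" + urlencode({"db": db_name})
    # Line protocol records are separated by newlines in the request body.
    body = "\n".join(lines).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        # Same "token <user>:<password>" scheme as the curl example.
        headers={"Authorization": f"token {user}:{password}"},
    )


request = build_v2_write_request(
    "localhost", "public", "alice", "secret",
    ["census,location=klamath,scientist=anderson bees=23 1566086400000000000"],
)
print(request.get_method(), request.full_url)
# To send it against a running server:
#     urllib.request.urlopen(request)
```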

Telegraf

GreptimeDB’s support for the InfluxDB line protocol makes it compatible with Telegraf. To configure Telegraf, add the GreptimeDB URL to the Telegraf configuration.

For detailed configuration instructions, please refer to the Ingest Data via Telegraf documentation.

Client libraries

Writing data to GreptimeDB is a straightforward process when using InfluxDB client libraries. Simply include the URL and authentication details in the client configuration.

For example:

The snippets below cover, in order, Node.js, Python, Go, Java, and PHP.

'use strict'
/** @module write
**/
import { InfluxDB, Point } from '@influxdata/influxdb-client'
/** Environment variables **/
const url = 'http://<host>:4000/v1/influxdb'
const token = '<greptime_user>:<greptimedb_password>'
const org = ''
const bucket = '<db-name>'
const influxDB = new InfluxDB({ url, token })
const writeApi = influxDB.getWriteApi(org, bucket)
writeApi.useDefaultTags({ region: 'west' })
const point1 = new Point('temperature')
  .tag('sensor_id', 'TLM01')
  .floatField('value', 24.0)
writeApi.writePoint(point1)
// Flush buffered points before the process exits.
writeApi.close().catch((e) => console.error(e))

import influxdb_client
from influxdb_client.client.write_api import SYNCHRONOUS
bucket = "<db-name>"
org = ""
token = "<greptime_user>:<greptimedb_password>"
url="http://<host>:4000/v1/influxdb"
client = influxdb_client.InfluxDBClient(
    url=url,
    token=token,
    org=org
)
# Write script
write_api = client.write_api(write_options=SYNCHRONOUS)
p = influxdb_client.Point("my_measurement").tag("location", "Prague").field("temperature", 25.3)
write_api.write(bucket=bucket, org=org, record=p)

bucket := "<db-name>"
org := ""
token := "<greptime_user>:<greptimedb_password>"
url := "http://<host>:4000/v1/influxdb"
client := influxdb2.NewClient(url, token)
writeAPI := client.WriteAPIBlocking(org, bucket)
p := influxdb2.NewPoint("stat",
    map[string]string{"unit": "temperature"},
    map[string]interface{}{"avg": 24.5, "max": 45},
    time.Now())
writeAPI.WritePoint(context.Background(), p)
client.Close()

private static String url = "http://<host>:4000/v1/influxdb";
private static String org = "";
private static String bucket = "<db-name>";
private static char[] token = "<greptime_user>:<greptimedb_password>".toCharArray();
public static void main(final String[] args) {
    InfluxDBClient influxDBClient = InfluxDBClientFactory.create(url, token, org, bucket);
    WriteApiBlocking writeApi = influxDBClient.getWriteApiBlocking();
    Point point = Point.measurement("temperature")
            .addTag("location", "west")
            .addField("value", 55D)
            .time(Instant.now().toEpochMilli(), WritePrecision.MS);
    writeApi.writePoint(point);
    influxDBClient.close();
}

$client = new Client([
    "url" => "http://<host>:4000/v1/influxdb",
    "token" => "<greptime_user>:<greptimedb_password>",
    "bucket" => "<db-name>",
    "org" => "",
    "precision" => InfluxDB2\Model\WritePrecision::S
]);
$writeApi = $client->createWriteApi();
$dateTimeNow = new DateTime('NOW');
$point = Point::measurement("weather")
        ->addTag("location", "Denver")
        ->addField("temperature", rand(0, 20))
        ->time($dateTimeNow->getTimestamp());
$writeApi->write($point);

In addition to the languages previously mentioned, GreptimeDB also accommodates client libraries for other languages supported by InfluxDB. You can code in your language of choice by referencing the connection information and code snippets provided earlier.

Query data

GreptimeDB does not support Flux or InfluxQL, opting instead for SQL and PromQL.

SQL is a universal language designed for managing and manipulating relational databases. It offers flexible capabilities for data retrieval, manipulation, and analytics, and it reduces the learning curve for users who are already familiar with SQL.

PromQL (Prometheus Query Language) allows users to select and aggregate time series data in real time. The result of an expression can be shown as a graph, viewed as tabular data in Prometheus’s expression browser, or consumed by external systems via the HTTP API.

Suppose you are querying the maximum CPU usage from the monitor table, recorded over the past 24 hours. In InfluxQL, the query might look something like this:

SELECT 
   MAX("cpu") 
FROM 
   "monitor" 
WHERE 
   time > now() - 24h 
GROUP BY 
   time(1h)

This InfluxQL query computes the maximum value of the cpu field from the monitor table, considering only the data where the time is within the last 24 hours. The results are then grouped into one-hour intervals.

In Flux, the query might look something like this:

from(bucket: "public")
  |> range(start: -24h)
  |> filter(fn: (r) => r._measurement == "monitor" and r._field == "cpu")
  |> aggregateWindow(every: 1h, fn: max)

A similar query in GreptimeDB SQL would be:

SELECT
    ts,
    host,
    MAX(cpu) RANGE '1h' AS max_cpu
FROM
    monitor
WHERE
    ts > NOW() - INTERVAL '24 hours'
ALIGN '1h' TO NOW
ORDER BY ts DESC;

In this SQL query, the RANGE clause determines the time window for the MAX(cpu) aggregation function, while the ALIGN clause sets the alignment time for the time series data. For more information on time window grouping, please refer to the Aggregate data by time window document.

A similar query in PromQL would be something like:

max_over_time(monitor[1h])

To query time series data from the last 24 hours, run this PromQL query through the HTTP API and use the start and end parameters to define the time range. For more information on PromQL, please refer to the PromQL document.
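
To illustrate how the start and end parameters bound the query, here is a hedged sketch that builds a range-query URL for the last 24 hours. The path `/v1/prometheus/api/v1/query_range` and the `step` parameter are assumptions based on the Prometheus-compatible HTTP API; confirm the endpoint for your GreptimeDB version in the PromQL document. The helper name and sample values are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def promql_range_url(host, expr, end, hours=24, step="1h"):
    """Build a range-query URL covering the `hours` before `end`."""
    start = end - timedelta(hours=hours)
    params = urlencode({
        "query": expr,                    # PromQL expression, URL-encoded
        "start": int(start.timestamp()),  # Unix seconds
        "end": int(end.timestamp()),
        "step": step,                     # resolution of the returned series
    })
    # Assumed endpoint path; verify it against your GreptimeDB version.
    return f"http://{host}:4000/v1/prometheus/api/v1/query_range?{params}"


# A fixed end time keeps the example reproducible; use datetime.now(timezone.utc) in practice.
end_time = datetime(2024, 1, 2, tzinfo=timezone.utc)
url = promql_range_url("localhost", "max_over_time(monitor[1h])", end_time)
print(url)
```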

Visualize data

We recommend using Grafana to visualize data in GreptimeDB. Please refer to the Grafana documentation for details on configuring GreptimeDB.

Migrate data

For a seamless migration of data from InfluxDB to GreptimeDB, you can follow these steps:

1. Write data to both GreptimeDB and InfluxDB to avoid data loss during migration.
2. Export all historical data from InfluxDB and import the data into GreptimeDB.
3. Stop writing data to InfluxDB and remove the InfluxDB server.

Write data to both GreptimeDB and InfluxDB simultaneously

Writing data to both GreptimeDB and InfluxDB simultaneously is a practical strategy to avoid data loss during migration. By utilizing InfluxDB’s client libraries, you can set up two client instances - one for GreptimeDB and another for InfluxDB. For guidance on writing data to GreptimeDB using the InfluxDB line protocol, please refer to the Ingest Data section.

If retaining all historical data isn’t necessary, you can simultaneously write data to both GreptimeDB and InfluxDB for a specific period to accumulate the required recent data. Subsequently, cease writing to InfluxDB and continue exclusively with GreptimeDB. If a complete migration of all historical data is needed, please proceed with the following steps.
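
The dual-write idea can be sketched independently of any particular client library. In the hypothetical helper below, each writer is a plain callable that accepts a line protocol payload; in a real setup, one would wrap an InfluxDB client pointed at InfluxDB and the other a client pointed at GreptimeDB. The names `dual_write` and `received` are assumptions for illustration.

```python
def dual_write(lines, writers):
    """Send the same line protocol payload to every writer.

    `writers` maps a label (for example "influxdb" or "greptimedb") to a
    callable that accepts the payload string and raises on failure.
    Returns a dict of label -> exception for the writers that failed.
    """
    payload = "\n".join(lines)
    failures = {}
    for label, write in writers.items():
        try:
            write(payload)
        except Exception as exc:
            # Keep going so an outage on one side does not block the other.
            failures[label] = exc
    return failures


# In-memory stand-ins for the two databases, for demonstration only.
received = {"influxdb": [], "greptimedb": []}
failures = dual_write(
    ["census,location=klamath,scientist=anderson bees=23 1566086400000000000"],
    {label: sink.append for label, sink in received.items()},
)
print(failures)
```

Recording failures per target, rather than stopping at the first error, makes it easier to see which side needs a replay before you cut over.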

Export data from InfluxDB v1 Server

Create a temporary directory to store the exported data of InfluxDB.

mkdir -p /path/to/export

Use the influx_inspect export command of InfluxDB to export data.

influx_inspect export \
  -database <db-name> \
  -end <end-time> \
  -lponly \
  -datadir /var/lib/influxdb/data \
  -waldir /var/lib/influxdb/wal \
  -out /path/to/export/data

The -database flag specifies the database to be exported.
The -end flag specifies the end time of the data to be exported. Must be in RFC3339 format, such as 2024-01-01T00:00:00Z. You can use the time at which you started writing data to both GreptimeDB and InfluxDB as the end time.
The -lponly flag specifies that only the Line Protocol data should be exported.
The -datadir flag specifies the path to the data directory, as configured in the InfluxDB data settings.
The -waldir flag specifies the path to the WAL directory, as configured in the InfluxDB data settings.
The -out flag specifies the path of the output file.
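
If you script the export, the end time has to be rendered in RFC3339. The sketch below formats a timestamp and assembles the same command line as above, using only the flags already described. The helper names and the sample database name `mydb` are assumptions for illustration.

```python
from datetime import datetime, timezone


def rfc3339(moment):
    """Format a timezone-aware datetime as an RFC3339 UTC timestamp."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def influx_v1_export_command(database, end, out_path,
                             datadir="/var/lib/influxdb/data",
                             waldir="/var/lib/influxdb/wal"):
    """Return the influx_inspect export command as an argument list."""
    return [
        "influx_inspect", "export",
        "-database", database,
        "-end", rfc3339(end),
        "-lponly",
        "-datadir", datadir,
        "-waldir", waldir,
        "-out", out_path,
    ]


# Hypothetical moment at which dual writing began.
dual_write_started = datetime(2024, 1, 1, tzinfo=timezone.utc)
command = influx_v1_export_command("mydb", dual_write_started, "/path/to/export/data")
print(" ".join(command))
# On the InfluxDB host, run it with: subprocess.run(command, check=True)
```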

The exported data in InfluxDB line protocol looks like the following:

disk,device=disk1s5s1,fstype=apfs,host=bogon,mode=ro,path=/ inodes_used=356810i 1714363350000000000
diskio,host=bogon,name=disk0 iops_in_progress=0i 1714363350000000000
disk,device=disk1s6,fstype=apfs,host=bogon,mode=rw,path=/System/Volumes/Update inodes_used_percent=0.0002391237988702021 1714363350000000000
...

Export data from InfluxDB v2 Server

Create a temporary directory to store the exported data of InfluxDB.

mkdir -p /path/to/export

Use the influxd inspect export-lp command of InfluxDB to export data in the bucket to line protocol.

influxd inspect export-lp \
  --bucket-id <bucket-id> \
  --engine-path /var/lib/influxdb2/engine/ \
  --end <end-time> \
  --output-path /path/to/export/data

The --bucket-id flag specifies the bucket ID to be exported.
The --engine-path flag specifies the path to the engine directory, as configured in the InfluxDB data settings.
The --end flag specifies the end time of the data to be exported. Must be in RFC3339 format, such as 2024-01-01T00:00:00Z. You can use the time at which you started writing data to both GreptimeDB and InfluxDB as the end time.
The --output-path flag specifies the path of the output file.

The output looks like the following:

{"level":"info","ts":1714377321.4795408,"caller":"export_lp/export_lp.go:219","msg":"exporting TSM files","tsm_dir":"/var/lib/influxdb2/engine/data/307013e61d514f3c","file_count":1}
{"level":"info","ts":1714377321.4940555,"caller":"export_lp/export_lp.go:315","msg":"exporting WAL files","wal_dir":"/var/lib/influxdb2/engine/wal/307013e61d514f3c","file_count":1}
{"level":"info","ts":1714377321.4941633,"caller":"export_lp/export_lp.go:204","msg":"export complete"}

The exported data in InfluxDB line protocol looks like the following:

cpu,cpu=cpu-total,host=bogon usage_idle=80.4448912910468 1714376180000000000
cpu,cpu=cpu-total,host=bogon usage_idle=78.50167052182304 1714376190000000000
cpu,cpu=cpu-total,host=bogon usage_iowait=0 1714375700000000000
cpu,cpu=cpu-total,host=bogon usage_iowait=0 1714375710000000000
...
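
Before importing, it can help to sanity-check the export by counting records per measurement, so the totals can later be compared with what arrives in GreptimeDB. The following is a minimal sketch, not a full line protocol parser: it only extracts the measurement name, treating a backslash before a comma or space as an escape. The function names are hypothetical.

```python
from collections import Counter


def measurement_of(line):
    """Return the measurement name of one line protocol record.

    The name ends at the first comma or space that is not escaped
    with a backslash.
    """
    name = []
    escaped = False
    for ch in line:
        if escaped:
            # Only "\," and "\ " are escapes; other backslashes stay literal.
            name.append(ch if ch in ", " else "\\" + ch)
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch in ", ":
            break
        else:
            name.append(ch)
    return "".join(name)


def count_measurements(lines):
    """Count records per measurement, skipping blank and comment lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            counts[measurement_of(line)] += 1
    return counts


sample = [
    "cpu,cpu=cpu-total,host=bogon usage_idle=80.4448912910468 1714376180000000000",
    "cpu,cpu=cpu-total,host=bogon usage_iowait=0 1714375700000000000",
    "diskio,host=bogon,name=disk0 iops_in_progress=0i 1714363350000000000",
]
print(count_measurements(sample))
```

For a real export, pass an open file object instead of the sample list, for example `count_measurements(open("/path/to/export/data"))`.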

Import data to GreptimeDB

Before importing data to GreptimeDB, if the data file is too large, it’s recommended to split the data file into multiple slices:

split -l 100000 -d -a 10 data data.
# -l [line_count]    Create split files line_count lines in length.
# -d                 Use a numeric suffix instead of an alphabetic suffix.
# -a [suffix_length] Use suffix_length letters to form the suffix of the file name.
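
On systems where `split` does not offer the `-d` option, the same slicing can be approximated in a few lines of Python. This is a sketch under the assumption that the export is a plain newline-delimited file; the function name `split_file` is hypothetical, and the demonstration runs on a small temporary file rather than a real export.

```python
import os
import tempfile


def split_file(path, lines_per_slice=100000, suffix_length=10):
    """Split `path` into numbered slices named like `data.0000000000`."""
    slice_paths = []
    out = None
    with open(path, "rb") as source:
        for index, line in enumerate(source):
            if index % lines_per_slice == 0:
                # Start a new slice every `lines_per_slice` lines.
                if out is not None:
                    out.close()
                slice_path = f"{path}.{len(slice_paths):0{suffix_length}d}"
                out = open(slice_path, "wb")
                slice_paths.append(slice_path)
            out.write(line)
    if out is not None:
        out.close()
    return slice_paths


# Demonstration: three records, two per slice.
with tempfile.TemporaryDirectory() as workdir:
    data_path = os.path.join(workdir, "data")
    with open(data_path, "w") as f:
        f.write("a v=1 1\nb v=2 2\nc v=3 3\n")
    names = [os.path.basename(p) for p in split_file(data_path, lines_per_slice=2)]
print(names)  # ['data.0000000000', 'data.0000000001']
```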

You can import data using the HTTP API as described in the Ingest data section. The script provided below reads data from the files and imports it into GreptimeDB.

Suppose you are in the directory where the data files are stored:

.
├── data.0000000000
├── data.0000000001
├── data.0000000002
...

Replace the following placeholders with your GreptimeDB connection information to set up the environment variables:

export GREPTIME_USERNAME=<greptime_username>
export GREPTIME_PASSWORD=<greptime_password>
export GREPTIME_HOST=<host>
export GREPTIME_DB=<db-name>

Import the data from the files into GreptimeDB:

for file in data.*; do
  curl -i --retry 3 \
    -X POST "http://${GREPTIME_HOST}:4000/v1/influxdb/write?db=${GREPTIME_DB}&u=${GREPTIME_USERNAME}&p=${GREPTIME_PASSWORD}" \
    --data-binary "@${file}"
  sleep 1
done
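
The shell loop above places the username and password directly in the URL, which can break if they contain characters such as `&` or `@`. As a hedged alternative, the sketch below URL-encodes the credentials and retries each slice a few times. It uses only the Python standard library; the function names, the retry count, and the delay are assumptions for illustration.

```python
import time
import urllib.error
import urllib.request
from urllib.parse import urlencode


def v1_write_url(host, db_name, user, password):
    """Build the v1 write URL with URL-encoded credentials."""
    query = urlencode({"db": db_name, "u": user, "p": password})
    return f"http://{host}:4000/v1/influxdb/write?{query}"


def import_file(path, url, send=urllib.request.urlopen, retries=3, delay=1.0):
    """POST one exported slice; return True on success, False otherwise."""
    with open(path, "rb") as f:
        body = f.read()
    request = urllib.request.Request(url, data=body, method="POST")
    for attempt in range(1, retries + 1):
        try:
            # urlopen raises HTTPError (a URLError) for non-2xx responses.
            with send(request):
                return True
        except urllib.error.URLError as exc:
            print(f"{path}: attempt {attempt} failed: {exc}")
            if attempt < retries:
                time.sleep(delay)
    return False


print(v1_write_url("localhost", "public", "alice", "p@ss&word"))
# Typical use, from the directory that holds the slices:
#     import glob
#     url = v1_write_url("<host>", "<db-name>", "<greptime_username>", "<greptime_password>")
#     failed = [p for p in sorted(glob.glob("data.*")) if not import_file(p, url)]
```

Collecting the slices that still fail after all retries gives you a short list to re-run once the underlying problem is fixed.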