Getting Started with Outflux
Designed to help users seamlessly migrate from InfluxDB to TimescaleDB, Outflux enables users to pipe exported data directly into TimescaleDB. Outflux manages the schema discovery, validation, and creation for you. It’s easy to use, configurable, and most importantly, it’s fast.
Installation
Before we start
Before we start, you will need the following setup:
- A running instance of InfluxDB at a known location and a means to connect to it
- TimescaleDB installed and a means to connect to it
- And if you need to import some sample data, the InfluxDB Command Line Interface
Setting up Outflux
If all the pre-requirements are met, you can start installing Outflux. Outflux is an open-source tool and the code is available on GitHub as a public repository. We publish builds for Linux, Windows and MacOS.
- Visit the releases section of the repository
- Download the latest compressed tarball for your platform
- Extract it to a preferred location
If you navigate to where you extracted the archive and execute:
$ ./outflux --help
Outflux offers the capabilities to migrate an InfluxDB database, or specific measurements to TimescaleDB.
It can also allow a user to transfer only the schema of a database or measurement to TimescaleDB
Usage:
outflux [command]
Available Commands:
help Help about any command
migrate Migrate the schema and data from InfluxDB measurements into TimescaleDB hypertables
schema-transfer Discover the schema of measurements and validate or prepare a TimescaleDB hyper-table
with the discovered schema
You will see the help output for Outflux, a brief explanation of what it can do, the usage, and available commands.
For instructions on how to set up Outflux from source you can visit the README.
Importing sample data
If you don’t already have an existing InfluxDB database, you can try Outflux by importing this example file with data written in the Influx Line Protocol found at https://timescaledata.blob.core.windows.net/datasets/outflux_taxi.txt
You can use the Influx CLI client to load the data. The file will first create the “outflux_tutorial” database and then do 2741 inserts.
$ influx -import -path=outflux_taxit.txt -database=outflux_tutorial
2019/03/27 11:39:11 Processed 1 commands
2019/03/27 11:39:11 Processed 2741 inserts
2019/03/27 11:39:11 Failed 0 inserts
The data in the file is without a timestamp so the current time of the Influx server will be used at the time of insert. All the data points belong to one measurement taxi
. The points are tagged with location, rating, and vendor. Four fields are recorded: fare, mta_tax, tip, and tolls. The influx
client assumes the server is available at http://localhost:8086
by default.
Schema discovery, validation, and transfer
One of Outflux’s features is the ability to discover the schema of an InfluxDB measurement and either validate that a TimescaleDB table exists that can hold the transferred data, or create a new table that can satisfy the schema requirements.
We can now create a TimescaleDB hypertable ready to receive the demo data we inserted into the InfluxDB instance. If you followed the tutorial and inserted the data from the example, there should be a taxi
measurement in the outflux_tutorial
database in the InfluxDB instance.
The schema-transfer
command of Outflux can work with 4 schema strategies:
ValidateOnly
: checks if the TimescaleDB extension is installed, a specified database has a hypertable with the proper columns, and if it’s partitioned properly, but will not perform modificationsCreateIfMissing
: runs all checks thatValidateOnly
does and creates and properly partitions any missing hypertablesDropAndCreate
: drops any existing table with the same name as the measurement, and creates a new hypertable and partitions it properlyDropCascadeAndCreate
: performs the same action as DropAndCreate with the additional strength of executing a cascade table drop if there is an existing table with the same name as the measurement
The schema-transfer
command can be called with parameters that specify the schema strategy, and if the user wants the tags or fields of a measure to be transferred as a single JSONB column. By default each tag and each field is created as a separate column.
We can run schema-transfer
with Outflux on our example data with:
$ outflux schema-transfer outflux_tutorial taxi \
--input-server=http://localhost:8086 \
--output-conn="dbname=postgres user=postgres"
The schema-transfer
command is executed by specifying the database (outflux_tutorial
) and then the measurements (taxi
). If no measurements are specified, all measurements in a database will be transferred. The location of the InfluxDB server is specified with the --input-server
flag. The target database and other connection options are specified with the --output-conn
flag. Here we’re using the postgres
user and database to connect to our server. How to specify usernames, passwords and many more configuration options about the input and output connections (including which environment variables are recognized) can be discovered on the GitHub repo for Outflux. By default, schema-transfer
executes with the CreateIfMissing
strategy.
Here’s an example output of running Outflux schema-transfer
with the DropAndCreate
strategy and having all tags in a single JSONB column:
$ outflux schema-transfer outflux_tutorial taxi \
--input-server=http://localhost:8086 \
--output-conn="dbname=postgres user=postgres" \
--schema-strategy=DropAndCreate \
--tags-as-json
2019/03/27 12:10:30 Selected input database: outflux_tutorial
2019/03/27 12:10:30 Overriding PG environment variables for connection with: dbname=postgres
user=postgres
2019/03/27 12:10:30 Tags for measure 'taxi' will be combined into a single JSONB column
2019/03/27 12:10:30 pipe_taxi starting execution
2019/03/27 12:10:30 Discovering influx schema for measurement: taxi
2019/03/27 12:10:30 Discovered: DataSet { Name: taxi, Columns: [Column... 2019/03/27 12:10:30
Selected Schema Strategy: DropAndCreate
2019/03/27 12:10:30 Table taxi exists, dropping it
2019/03/27 12:10:30 Executing: DROP TABLE taxi
2019/03/27 12:10:30 Table taxi ready to be created
2019/03/27 12:10:30 Creating table with:
CREATE TABLE "taxi"("time" TIMESTAMP, "tags" JSONB, "fare" FLOAT, "mta_tax" FLOAT, "tip" FLOAT, "tolls" FLOAT)
2019/03/27 12:10:30 Preparing TimescaleDB extension:
CREATE EXTENSION IF NOT EXISTS timescaledb
2019/03/27 12:10:30 Creating hypertable with: SELECT create_hypertable('"taxi"', 'time');
2019/03/27 12:10:30 No data transfer will occur
2019/03/27 12:10:30 Schema Transfer complete in: 0.056 seconds
Data Migration
Schema transfer is useful, but it’s not what we built Outflux for. You can do schema-transfer and data migration in one with the migrate
command. The connection options available are the same (and you can check them out on the public repo). You can transfer a complete InfluxDB database with each measurement being exported as a separate table, or you can select which measurements to export.
You can transfer all of the example data from the taxi
measurement in the outflux_tutorial
database with the command:
$ outflux migrate outflux_tutorial taxi \
--input-server=http://localhost:8086 \
--output-conn="dbname=postgres user=postgres" \
--schema-strategy=DropAndCreate
Here we’re using the DropAndCreate strategy that will drop any previous table named cpu
and create it before piping the data. The migrate command supports several flags that offer the user flexibility in the selection of data to be migrated. One of them is the --limit
flag that will only export the first N rows from the InfluxDB database ordered by time. The output of the migrate command with a N=10 limit should look like this:
$ outflux migrate outflux_tutorial taxi \
--input-server=http://localhost:8086 \
--output-conn="dbname=postgres user=postgres" \
--schema-strategy=ValidateOnly --limit=10
2019/03/27 12:15:01 All pipelines scheduled
2019/03/27 12:15:01 Overriding PG environment variables for connection with: dbname=postgres
user=postgres
2019/03/27 12:15:01 pipe_taxi starting execution
2019/03/27 12:15:01 Discovering influx schema for measurement: taxi
2019/03/27 12:15:01 Discovered: DataSet { Name: taxi, Columns: [Column {... 2019/03/27 12:15:01
Selected Schema Strategy: ValidateOnly
2019/03/27 12:15:01 Table taxi exists. Proceeding only with validation
2019/03/27 12:15:01 existing hypertable 'taxi' is partitioned properly
2019/03/27 12:15:01 Starting extractor 'pipe_taxi_ext' for measure: taxi
2019/03/27 12:15:01 Starting data ingestor 'pipe_taxi_ing'
2019/03/27 12:15:01 pipe_taxi_ext: Extracting data from database 'outflux_tutorial'
2019/03/27 12:15:01 pipe_taxi_ext: SELECT "time", "location_id", "rating", "vendor", "fare",
"mta_tax", "tip", "tolls"
FROM "taxi"
LIMIT 10
2019/03/27 12:15:01 pipe_taxi_ext:Pulling chunks with size 15000
2019/03/27 12:15:01 Will batch insert 8000 rows at once. With commit strategy: CommitOnEachBatch
2019/03/27 12:15:01 pipe_taxi_ext: Extracted 10 rows from Influx
2019/03/27 12:15:01 pipe_taxi_ing: Complete. Inserted 10 rows.
2019/03/27 12:15:01 All pipelines finished
2019/03/27 12:15:01 Migration execution time: 0.055 seconds
Another way to select the exported data are the --from
and --to
flags to specify a narrower time-window to export. To export data only after January 1, 2020 execute the command:
$ outflux migrate outflux_tutorial cpu \
--input-server=http://localhost:8086 \
--output-conn="dbname=postgres user=postgres" \
--schema-strategy=ValidateOnly --from=2020-01-01T00:00:00Z
If you follow the output closely, you can see that the data is pulled from the InfluxDB server in chunks, by default sized 15000, but can be changed by specifying the --chunk-size
flag. The data is inserted in batches of 8000 rows (by default), which can also be modified by the flag --batch-size
. All the possible flags for the migrate command are listed in the GitHub documentation (https://github.com/timescale/outflux#migrate) or you can see them by executing:
$ outflux migrate --help
Next Steps
Once you have migrated to TimescaleDB, you can begin to familiarize yourself with our architecture and API reference.
Additionally, we have several other tutorials available for you to explore as you become accustomed to working with TimescaleDB.