Getting Started with Outflux

Designed to help users seamlessly migrate from InfluxDB to TimescaleDB, Outflux enables users to pipe exported data directly into TimescaleDB. Outflux manages the schema discovery, validation, and creation for you. It’s easy to use, configurable, and most importantly, it’s fast.

Installation

Before we start

Before we start, you will need the following setup:

Setting up Outflux

If all the pre-requirements are met, you can start installing Outflux. Outflux is an open-source tool and the code is available on GitHub as a public repository. We publish builds for Linux, Windows and MacOS.

  1. Visit the releases section of the repository
  2. Download the latest compressed tarball for your platform
  3. Extract it to a preferred location

If you navigate to where you extracted the archive and execute:

  1. $ ./outflux --help
  2. Outflux offers the capabilities to migrate an InfluxDB database, or specific measurements to TimescaleDB.
  3. It can also allow a user to transfer only the schema of a database or measurement to TimescaleDB
  4. Usage:
  5. outflux [command]
  6. Available Commands:
  7. help Help about any command
  8. migrate Migrate the schema and data from InfluxDB measurements into TimescaleDB hypertables
  9. schema-transfer Discover the schema of measurements and validate or prepare a TimescaleDB hyper-table
  10. with the discovered schema

You will see the help output for Outflux, a brief explanation of what it can do, the usage, and available commands.

For instructions on how to set up Outflux from source you can visit the README.

Importing sample data

If you don’t already have an existing InfluxDB database, you can try Outflux by importing this example file with data written in the Influx Line Protocol found at https://timescaledata.blob.core.windows.net/datasets/outflux_taxi.txt

You can use the Influx CLI client to load the data. The file will first create the “outflux_tutorial” database and then do 2741 inserts.

  1. $ influx -import -path=outflux_taxit.txt -database=outflux_tutorial
  2. 2019/03/27 11:39:11 Processed 1 commands
  3. 2019/03/27 11:39:11 Processed 2741 inserts
  4. 2019/03/27 11:39:11 Failed 0 inserts

The data in the file is without a timestamp so the current time of the Influx server will be used at the time of insert. All the data points belong to one measurement taxi. The points are tagged with location, rating, and vendor. Four fields are recorded: fare, mta_tax, tip, and tolls. The influx client assumes the server is available at http://localhost:8086 by default.

Schema discovery, validation, and transfer

One of Outflux’s features is the ability to discover the schema of an InfluxDB measurement and either validate that a TimescaleDB table exists that can hold the transferred data, or create a new table that can satisfy the schema requirements.

We can now create a TimescaleDB hypertable ready to receive the demo data we inserted into the InfluxDB instance. If you followed the tutorial and inserted the data from the example, there should be a taxi measurement in the outflux_tutorial database in the InfluxDB instance.

The schema-transfer command of Outflux can work with 4 schema strategies:

  • ValidateOnly: checks if the TimescaleDB extension is installed, a specified database has a hypertable with the proper columns, and if it’s partitioned properly, but will not perform modifications
  • CreateIfMissing: runs all checks that ValidateOnly does and creates and properly partitions any missing hypertables
  • DropAndCreate: drops any existing table with the same name as the measurement, and creates a new hypertable and partitions it properly
  • DropCascadeAndCreate: performs the same action as DropAndCreate with the additional strength of executing a cascade table drop if there is an existing table with the same name as the measurement

The schema-transfer command can be called with parameters that specify the schema strategy, and if the user wants the tags or fields of a measure to be transferred as a single JSONB column. By default each tag and each field is created as a separate column.

We can run schema-transfer with Outflux on our example data with:

  1. $ outflux schema-transfer outflux_tutorial taxi \
  2. --input-server=http://localhost:8086 \
  3. --output-conn="dbname=postgres user=postgres"

The schema-transfer command is executed by specifying the database (outflux_tutorial) and then the measurements (taxi). If no measurements are specified, all measurements in a database will be transferred. The location of the InfluxDB server is specified with the --input-server flag. The target database and other connection options are specified with the --output-conn flag. Here we’re using the postgres user and database to connect to our server. How to specify usernames, passwords and many more configuration options about the input and output connections (including which environment variables are recognized) can be discovered on the GitHub repo for Outflux. By default, schema-transfer executes with the CreateIfMissing strategy.

Here’s an example output of running Outflux schema-transfer with the DropAndCreate strategy and having all tags in a single JSONB column:

  1. $ outflux schema-transfer outflux_tutorial taxi \
  2. --input-server=http://localhost:8086 \
  3. --output-conn="dbname=postgres user=postgres" \
  4. --schema-strategy=DropAndCreate \
  5. --tags-as-json
  6. 2019/03/27 12:10:30 Selected input database: outflux_tutorial
  7. 2019/03/27 12:10:30 Overriding PG environment variables for connection with: dbname=postgres
  8. user=postgres
  9. 2019/03/27 12:10:30 Tags for measure 'taxi' will be combined into a single JSONB column
  10. 2019/03/27 12:10:30 pipe_taxi starting execution
  11. 2019/03/27 12:10:30 Discovering influx schema for measurement: taxi
  12. 2019/03/27 12:10:30 Discovered: DataSet { Name: taxi, Columns: [Column... 2019/03/27 12:10:30
  13. Selected Schema Strategy: DropAndCreate
  14. 2019/03/27 12:10:30 Table taxi exists, dropping it
  15. 2019/03/27 12:10:30 Executing: DROP TABLE taxi
  16. 2019/03/27 12:10:30 Table taxi ready to be created
  17. 2019/03/27 12:10:30 Creating table with:
  18. CREATE TABLE "taxi"("time" TIMESTAMP, "tags" JSONB, "fare" FLOAT, "mta_tax" FLOAT, "tip" FLOAT, "tolls" FLOAT)
  19. 2019/03/27 12:10:30 Preparing TimescaleDB extension:
  20. CREATE EXTENSION IF NOT EXISTS timescaledb
  21. 2019/03/27 12:10:30 Creating hypertable with: SELECT create_hypertable('"taxi"', 'time');
  22. 2019/03/27 12:10:30 No data transfer will occur
  23. 2019/03/27 12:10:30 Schema Transfer complete in: 0.056 seconds

Data Migration

Schema transfer is useful, but it’s not what we built Outflux for. You can do schema-transfer and data migration in one with the migrate command. The connection options available are the same (and you can check them out on the public repo). You can transfer a complete InfluxDB database with each measurement being exported as a separate table, or you can select which measurements to export.

You can transfer all of the example data from the taxi measurement in the outflux_tutorial database with the command:

  1. $ outflux migrate outflux_tutorial taxi \
  2. --input-server=http://localhost:8086 \
  3. --output-conn="dbname=postgres user=postgres" \
  4. --schema-strategy=DropAndCreate

Here we’re using the DropAndCreate strategy that will drop any previous table named cpu and create it before piping the data. The migrate command supports several flags that offer the user flexibility in the selection of data to be migrated. One of them is the --limit flag that will only export the first N rows from the InfluxDB database ordered by time. The output of the migrate command with a N=10 limit should look like this:

  1. $ outflux migrate outflux_tutorial taxi \
  2. --input-server=http://localhost:8086 \
  3. --output-conn="dbname=postgres user=postgres" \
  4. --schema-strategy=ValidateOnly --limit=10
  5. 2019/03/27 12:15:01 All pipelines scheduled
  6. 2019/03/27 12:15:01 Overriding PG environment variables for connection with: dbname=postgres
  7. user=postgres
  8. 2019/03/27 12:15:01 pipe_taxi starting execution
  9. 2019/03/27 12:15:01 Discovering influx schema for measurement: taxi
  10. 2019/03/27 12:15:01 Discovered: DataSet { Name: taxi, Columns: [Column {... 2019/03/27 12:15:01
  11. Selected Schema Strategy: ValidateOnly
  12. 2019/03/27 12:15:01 Table taxi exists. Proceeding only with validation
  13. 2019/03/27 12:15:01 existing hypertable 'taxi' is partitioned properly
  14. 2019/03/27 12:15:01 Starting extractor 'pipe_taxi_ext' for measure: taxi
  15. 2019/03/27 12:15:01 Starting data ingestor 'pipe_taxi_ing'
  16. 2019/03/27 12:15:01 pipe_taxi_ext: Extracting data from database 'outflux_tutorial'
  17. 2019/03/27 12:15:01 pipe_taxi_ext: SELECT "time", "location_id", "rating", "vendor", "fare",
  18. "mta_tax", "tip", "tolls"
  19. FROM "taxi"
  20. LIMIT 10
  21. 2019/03/27 12:15:01 pipe_taxi_ext:Pulling chunks with size 15000
  22. 2019/03/27 12:15:01 Will batch insert 8000 rows at once. With commit strategy: CommitOnEachBatch
  23. 2019/03/27 12:15:01 pipe_taxi_ext: Extracted 10 rows from Influx
  24. 2019/03/27 12:15:01 pipe_taxi_ing: Complete. Inserted 10 rows.
  25. 2019/03/27 12:15:01 All pipelines finished
  26. 2019/03/27 12:15:01 Migration execution time: 0.055 seconds

Another way to select the exported data are the --from and --to flags to specify a narrower time-window to export. To export data only after January 1, 2020 execute the command:

  1. $ outflux migrate outflux_tutorial cpu \
  2. --input-server=http://localhost:8086 \
  3. --output-conn="dbname=postgres user=postgres" \
  4. --schema-strategy=ValidateOnly --from=2020-01-01T00:00:00Z

If you follow the output closely, you can see that the data is pulled from the InfluxDB server in chunks, by default sized 15000, but can be changed by specifying the --chunk-size flag. The data is inserted in batches of 8000 rows (by default), which can also be modified by the flag --batch-size. All the possible flags for the migrate command are listed in the GitHub documentation (https://github.com/timescale/outflux#migrate) or you can see them by executing:

  1. $ outflux migrate --help

Next Steps

Once you have migrated to TimescaleDB, you can begin to familiarize yourself with our architecture and API reference.

Additionally, we have several other tutorials available for you to explore as you become accustomed to working with TimescaleDB.