Quick Start: Python and TimescaleDB
Goal
This quick start guide is designed to get the Python developer up and running with TimescaleDB as their database. In this tutorial, you’ll learn how to:
- Connect to TimescaleDB
- Create a relational table
- Generate a Hypertable
- Insert a batch of rows into your Timescale database
- Execute a query on your Timescale database
Pre-requisites
To complete this tutorial, you will need a cursory knowledge of the Structured Query Language (SQL). The tutorial will walk you through each SQL command, but it will be helpful if you’ve seen SQL before.
To start, install TimescaleDB. Once your installation is complete, we can proceed to ingesting or creating sample data and finishing the tutorial.
You will also need:
- The psycopg2 library. See here for installation instructions.
- An existing Python virtual environment. To set one up, follow this tutorial.
Connect Python to TimescaleDB
Step 1: Import needed libraries
Add the following import statement to the top of your Python script:
import psycopg2
Step 2: Compose a connection string
Locate your TimescaleDB credentials. You will use them to compose a connection string that psycopg2 uses to connect to your TimescaleDB instance.
You’ll need the following credentials:
- password
- username
- host URL
- port
- database name
Next compose your connection string variable, as a libpq connection string, using the following format:
CONNECTION = "postgres://username:password@host:port/dbname"
If you’re using a hosted version of TimescaleDB, or generally require an SSL connection, use this version instead:
CONNECTION = "postgres://username:password@host:port/dbname?sslmode=require"
Alternatively, you can specify each parameter in the connection string as follows:
CONNECTION = "dbname=tsdb user=tsdbadmin password=secret host=host.com port=5432 sslmode=require"
WARNING: The above method of composing a connection string is for test or development purposes only. For production, store sensitive details such as your password, hostname, and port number in environment variables.
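One way to follow the warning above is to assemble the connection string from environment variables. The sketch below is illustrative only: the variable names (`TSDB_USER`, `TSDB_PASSWORD`, and so on) and the helper `build_connection_string` are assumptions, not names defined by TimescaleDB or psycopg2. The user name and password are percent-encoded because special characters in a libpq URI must be escaped.

```python
import os
from urllib.parse import quote


def build_connection_string(env=os.environ):
    """Assemble a libpq URI from environment variables (hypothetical names)."""
    # Percent-encode credentials so characters such as "@" or "/" are safe.
    user = quote(env["TSDB_USER"], safe="")
    password = quote(env["TSDB_PASSWORD"], safe="")
    host = env["TSDB_HOST"]
    port = env.get("TSDB_PORT", "5432")  # assumed default port
    dbname = env["TSDB_NAME"]
    sslmode = env.get("TSDB_SSLMODE", "require")  # assumed default
    return f"postgres://{user}:{password}@{host}:{port}/{dbname}?sslmode={sslmode}"


# Usage sketch:
# CONNECTION = build_connection_string()
```

A missing required variable raises `KeyError`, which surfaces a configuration mistake before any connection attempt is made.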
Step 3: Connect to Timescale database using the Psycopg2 connect function
We’ll use the psycopg2 connect function to create a new database session. In your main function, add the following lines:
def main():
    …
    with psycopg2.connect(CONNECTION) as conn:
        #Call the function that needs the database connection
        func_1(conn)
Alternatively, you can create a connection object as follows and then pass that object around as needed, for example to open a cursor and perform database operations:
def main():
    conn = psycopg2.connect(CONNECTION)
    insert_data(conn)
    cur = conn.cursor()
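Note that in psycopg2, using a connection in a `with` block ends the transaction (commit on success, rollback on an exception) but does not close the connection. Below is a minimal sketch of one way to get both behaviours, assuming a hypothetical helper named `run_with_connection`. The `connect` argument would be `psycopg2.connect` in real use; it is passed in here only so that the sketch has no hard dependency on a running database.

```python
from contextlib import closing


def run_with_connection(connect, dsn, work):
    """Open a connection, run work(conn) in a transaction, then close it.

    connect: a DB-API connect callable, e.g. psycopg2.connect (assumption).
    work:    any function that takes the open connection.
    """
    # closing() guarantees conn.close() is called when the block exits.
    with closing(connect(dsn)) as conn:
        # The inner "with conn" commits on success and rolls back on error.
        with conn:
            return work(conn)


# Usage sketch (needs psycopg2 and a reachable database):
# run_with_connection(psycopg2.connect, CONNECTION, func_1)
```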
Congratulations, you’ve successfully connected to TimescaleDB using Python!
Create a relational table
Step 1: Formulate your SQL statement
First, compose a string which contains the SQL statement that you would use to create a relational table. In the example below, we create a table called sensors, with columns id, type and location:
query_create_sensors_table = "CREATE TABLE sensors (id SERIAL PRIMARY KEY, type VARCHAR(50), location VARCHAR(50));"
Step 2: Execute the SQL statement and commit changes
Next, we execute our CREATE TABLE statement by opening a cursor, executing the query from Step 1, and committing so that the changes persist in the database. Afterward, we close the cursor to clean up:
cur = conn.cursor()
#see definition in Step 1
cur.execute(query_create_sensors_table)
conn.commit()
cur.close()
Congratulations, you’ve successfully created a relational table in TimescaleDB using Python!
Generate hypertable
In TimescaleDB, the primary point of interaction with your data is a hypertable, the abstraction of a single continuous table across all space and time intervals, such that one can query it via standard SQL.
Virtually all user interactions with TimescaleDB are with hypertables. Creating tables and indexes, altering tables, inserting data, selecting data, etc. can (and should) all be executed on the hypertable.
A hypertable is defined by a standard schema with column names and types, with at least one column specifying a time value.
Step 1: Formulate the CREATE TABLE SQL statement for your hypertable
First, we create a variable which houses our CREATE TABLE SQL statement for our hypertable. Notice how the hypertable has the compulsory time column:
#create sensor data hypertable
query_create_sensordata_table = """CREATE TABLE sensor_data (
                                        time TIMESTAMPTZ NOT NULL,
                                        sensor_id INTEGER,
                                        temperature DOUBLE PRECISION,
                                        cpu DOUBLE PRECISION,
                                        FOREIGN KEY (sensor_id) REFERENCES sensors (id)
                                    );"""
Step 2: Formulate create hypertable SELECT statement for your hypertable
Next we formulate the SELECT statement to convert the table we created in Step 1 into a hypertable. Note that we must specify the name of the table we wish to convert and the name of its time column as the two arguments, as mandated by the create_hypertable docs:
query_create_sensordata_hypertable = "SELECT create_hypertable('sensor_data', 'time');"
Step 3: Execute Statements from Step 1 and Step 2 and commit changes
Now we bring it all together: open a cursor with our connection, execute the statement from Step 1, execute the statement from Step 2, commit our changes, and close the cursor:
cur = conn.cursor()
cur.execute(query_create_sensordata_table)
cur.execute(query_create_sensordata_hypertable)
#commit changes to the database to make changes persistent
conn.commit()
cur.close()
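Running the steps above a second time fails, because the table already exists. Below is a minimal sketch of a re-runnable variant, assuming a hypothetical helper named `create_sensor_data_hypertable`. It relies on PostgreSQL's `CREATE TABLE IF NOT EXISTS` and on the `if_not_exists` argument of `create_hypertable`; check the create_hypertable docs for the version you run before depending on it.

```python
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS sensor_data (
    time TIMESTAMPTZ NOT NULL,
    sensor_id INTEGER,
    temperature DOUBLE PRECISION,
    cpu DOUBLE PRECISION,
    FOREIGN KEY (sensor_id) REFERENCES sensors (id)
);
"""

CREATE_HYPERTABLE = (
    "SELECT create_hypertable('sensor_data', 'time', if_not_exists => TRUE);"
)


def create_sensor_data_hypertable(conn):
    """Create the table and convert it to a hypertable, then commit."""
    # The cursor context manager closes the cursor when the block exits.
    with conn.cursor() as cur:
        cur.execute(CREATE_TABLE)
        cur.execute(CREATE_HYPERTABLE)
    # Commit once so both statements take effect together.
    conn.commit()
```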
Congratulations, you’ve successfully created a hypertable in your Timescale database using Python!
Insert rows into TimescaleDB
How to insert rows using Psycopg2
Here’s a typical pattern you’d use to insert some data into a table. In the example below, we insert the relational data in the list sensors into the relational table named sensors.
First, we open a cursor with our connection to the database. Then, using prepared statements, we formulate our INSERT SQL statement and execute it:
sensors = [('a', 'floor'), ('a', 'ceiling'), ('b', 'floor'), ('b', 'ceiling')]
cur = conn.cursor()
for sensor in sensors:
    try:
        cur.execute("INSERT INTO sensors (type, location) VALUES (%s, %s);",
                    (sensor[0], sensor[1]))
    except psycopg2.Error as error:
        #pgerror holds the error message returned by the database
        print(error.pgerror)
conn.commit()
A cleaner way to pass variables to the cur.execute function is below, where we separate the formulation of our SQL statement, SQL, from the data being passed with it into the prepared statement, data:
SQL = "INSERT INTO sensors (type, location) VALUES (%s, %s);"
for sensor in sensors:
    try:
        data = (sensor[0], sensor[1])
        cur.execute(SQL, data)
    except psycopg2.Error as error:
        #pgerror holds the error message returned by the database
        print(error.pgerror)
conn.commit()
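The loops above call `execute` once per row. As a sketch, the DB-API `executemany` method, which psycopg2 cursors provide, accepts the whole list in a single call. The helper name `insert_sensors` is hypothetical. This is mainly a readability convenience: `executemany` is not necessarily faster than the explicit loop.

```python
SQL = "INSERT INTO sensors (type, location) VALUES (%s, %s);"


def insert_sensors(conn, sensors):
    """Insert (type, location) tuples and commit once at the end."""
    with conn.cursor() as cur:
        # Each tuple in sensors supplies the values for the two placeholders.
        cur.executemany(SQL, sensors)
    conn.commit()
    return len(sensors)


# Usage sketch with the list defined earlier:
# insert_sensors(conn, sensors)
```

If any row fails, the exception propagates before `commit` is reached, so the caller decides whether to roll back or retry.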
Congratulations, you’ve successfully inserted data into TimescaleDB using Python!
How to insert rows fast using pgcopy
While using psycopg2 by itself may be sufficient for you to insert rows into your hypertable, if you need quicker performance, you can use pgcopy. To do this, install pgcopy using pip3 or the like and then add this line to your list of import statements:
from pgcopy import CopyManager
Here’s some sample code which shows how to insert data into Timescale using pgcopy, using the example of sensor data from four sensors:
#insert using pgcopy
def fast_insert(conn):
    cur = conn.cursor()
    #for sensors with ids 1-4 (range excludes its upper bound)
    for sensor_id in range(1, 5):
        data = (sensor_id,)
        #create random data
        simulate_query = """SELECT generate_series(now() - interval '24 hour', now(), interval '5 minute') AS time,
                            %s as sensor_id,
                            random()*100 AS temperature,
                            random() AS cpu
                         """
        cur.execute(simulate_query, data)
        values = cur.fetchall()
        #define columns names of the table you're inserting into
        cols = ('time', 'sensor_id', 'temperature', 'cpu')
        #create copy manager with the target table and insert!
        mgr = CopyManager(conn, 'sensor_data', cols)
        mgr.copy(values)
    #commit after all sensor data is inserted
    #could also commit after each sensor insert is done
    conn.commit()
    #check if it worked
    cur.execute("SELECT * FROM sensor_data LIMIT 5;")
    print(cur.fetchall())
    cur.close()
Step 1: Get data to insert into database
First we generate random sensor data. In a real application, you would replace this step with the data coming from your own pipeline.
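If your readings are produced in Python rather than by a SQL query, the rows only need to be tuples whose order matches the column list. Below is a minimal sketch, assuming a hypothetical helper named `simulate_readings` that mirrors the `generate_series` query in the sample code (one reading every 5 minutes over 24 hours). The timestamps are timezone-aware to match the TIMESTAMPTZ column.

```python
import random
from datetime import datetime, timedelta, timezone


def simulate_readings(sensor_id, end=None, hours=24, step_minutes=5):
    """Return (time, sensor_id, temperature, cpu) tuples, oldest first."""
    end = end or datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    step = timedelta(minutes=step_minutes)
    rows = []
    current = start
    while current <= end:  # inclusive of both ends, like generate_series
        rows.append((current, sensor_id, random.random() * 100, random.random()))
        current += step
    return rows


# Usage sketch with the CopyManager from the sample code:
# mgr.copy(simulate_readings(1))
```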
Step 2: Define columns of table you’re inserting data into
Then we define the column names of the table we want to insert data into. In this case, we’re using the sensor_data hypertable that we created in the “Generate a Hypertable” section above. This hypertable consists of the columns named time, sensor_id, temperature and cpu. We define these column names in a tuple of strings called cols.
Step 3: Instantiate a CopyManager with your target table and column definition
Lastly we create an instance of the pgcopy CopyManager, mgr, and pass it our connection variable, hypertable name, and tuple of column names. Then we use the copy function of the CopyManager to insert the data into the database efficiently, and commit when we’re done. The sample code also includes a query to check that the insert worked.
Congratulations, you’ve successfully inserted data into TimescaleDB quickly using Python and the pgcopy library!
Execute a query
Step 1: Define your query in SQL
First, define the SQL query you’d like to run on the database. The example below is a simple SELECT statement from our Hello Timescale tutorial.
query = "SELECT * FROM rates;"
Step 2: Execute the query
Next we’ll open a cursor from our existing database connection, conn, and then execute the query we defined in Step 1:
cur = conn.cursor()
query = "SELECT * FROM rates;"
cur.execute(query)
Step 3: Access results returned by query
To access all the rows returned by your query, we’ll use one of psycopg2’s results retrieval methods, such as fetchall() or fetchmany(). In the example below, we simply print the results of our query, row by row. Note that the result of fetchall() is a list of tuples, so you can handle them accordingly:
cur = conn.cursor()
query = "SELECT * FROM rates;"
cur.execute(query)
for i in cur.fetchall():
    print(i)
cur.close()
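`fetchall()` builds one Python list containing every row. When you would rather process rows in batches, `fetchmany()` can be called repeatedly instead. Below is a minimal sketch, assuming a hypothetical generator named `iter_rows`; the batch size of 1000 is arbitrary. Keep in mind that a default psycopg2 cursor still transfers the full result to the client when the query runs, so fetching incrementally from the server requires a named (server-side) cursor.

```python
def iter_rows(cur, batch_size=1000):
    """Yield rows from an executed cursor, fetching batch_size at a time."""
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:  # an empty list means no rows are left
            break
        yield from batch


# Usage sketch, after cur.execute(query):
# for row in iter_rows(cur):
#     print(row)
```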
Executing queries using prepared statements
For more complex queries than a simple SELECT *, we can use prepared statements to ensure our queries are executed safely against the database. We write our query using placeholders as shown in the sample code below. For more on how to properly use placeholders in psycopg2, see the basic module usage document.
#query with placeholders
cur = conn.cursor()
query = """
        SELECT time_bucket('5 minutes', time) AS five_min, avg(cpu)
        FROM sensor_data
        JOIN sensors ON sensors.id = sensor_data.sensor_id
        WHERE sensors.location = %s AND sensors.type = %s
        GROUP BY five_min
        ORDER BY five_min DESC;
        """
#example values, matching one of the sensors inserted earlier
location = 'floor'
sensor_type = 'a'
data = (location, sensor_type)
cur.execute(query, data)
results = cur.fetchall()
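The rows in `results` are plain tuples, so columns are accessed by position. Below is a minimal sketch of pairing each value with its column name through the DB-API `cursor.description` attribute, assuming a hypothetical helper named `rows_as_dicts`. For the query above, the unaliased `avg(cpu)` column would typically appear under the name `avg`.

```python
def rows_as_dicts(cur, rows):
    """Map each row tuple to a dict keyed by column name."""
    # Each entry of cur.description describes one result column;
    # its first item is the column name.
    names = [column[0] for column in cur.description]
    return [dict(zip(names, row)) for row in rows]


# Usage sketch, after cur.execute(query, data):
# for record in rows_as_dicts(cur, cur.fetchall()):
#     print(record["five_min"], record["avg"])
```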
Congratulations, you’ve successfully executed a query on TimescaleDB using Python! For more information on how to execute more complex queries, see the psycopg2 documentation.
Next steps
Now that you’re able to connect, read, and write to a TimescaleDB instance from your Python application, and generate the scaffolding necessary to build a new application from an existing TimescaleDB instance, be sure to check out these advanced TimescaleDB tutorials: