- Quick Start: Ruby and TimescaleDB
- Goal
- Prerequisites
- Connect Ruby to TimescaleDB
- Create a relational table
- Generate hypertable
- Insert rows into TimescaleDB
- Execute a query
- Generating requests
- Counting requests per minute
- Creating scopes to reuse
- Add performance and path attributes to PageLoad
- Hooking application controller to collect performance data
- Exploring aggregation functions
Quick Start: Ruby and TimescaleDB
Goal
This quick start guide is designed to get the Rails developer up and running with TimescaleDB as their database. In this tutorial, you’ll learn how to:
- Connect to TimescaleDB
- Create a relational table
- Generate a Hypertable
- Insert a batch of rows into your Timescale database
- Execute a query on your Timescale database
Prerequisites
To complete this tutorial, you will need a cursory knowledge of the Structured Query Language (SQL). The tutorial will walk you through each SQL command, but it will be helpful if you’ve seen SQL before.
To start, install TimescaleDB. Once your installation is complete, we can proceed to ingesting or creating sample data and finishing the tutorial.
You will also need to install Rails.
Connect Ruby to TimescaleDB
Step 1: Create a new Rails application
Let’s start by creating a new Rails application configured to use PostgreSQL as the database. TimescaleDB is a PostgreSQL extension.
rails new my_app -d=postgresql
Rails will finish creating and bundling your application, installing all required Gems in the process.
Step 2: Configure the TimescaleDB database
Locate your TimescaleDB credentials in order to connect to your TimescaleDB instance.
You’ll need the following credentials:
- password
- username
- host URL
- port
- database name
In the default
section of the config/database.yml
section, configure your database:
default: &default
adapter: postgresql
encoding: unicode
# For details on connection pooling, see Rails configuration guide
# http://guides.rubyonrails.org/configuring.html#database-pooling
pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
host: [your hostname]
port: [your port]
username: [your username]
password: [your password]
warning
Experienced Rails developers will want to set and retrieve environment variables for the username and password of the database. For the purposes of this quick start, we will hard code the host
, port
, username
, and password
. This is _not_ advised for code or databases of consequence.
Then configure the database name in the development
, test
, and production
sections. Let’s call our database my_app_db
like so:
development:
<<: *default
database: my_app_db
Repeat the step for the test
and production
sections further down this file.
Your final file should look like this (without all the automatically generated comments):
default: &default
adapter: postgresql
encoding: unicode
# For details on connection pooling, see Rails configuration guide
# http://guides.rubyonrails.org/configuring.html#database-pooling
pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
host: [your hostname]
port: [your port]
username: [your username]
password: [your password]
development:
<<: *default
database: my_app_db
test:
<<: *default
database: my_app_db
production:
<<: *default
database: my_app_db
#### Create the database
Now we can run the following rake
command to create the database in TimescaleDB:
rails db:create
This will create the my_app_db
database in your TimescaleDB instance and a schema.rb
file that represents the state of your TimescaleDB database.
Create a relational table
Step 1: Add TimescaleDB to your Rails migration
First, let’s setup our database to include the TimescaleDB extension. We will start by creating a migration:
rails generate migration add_timescale
In your db/migrate
project folder, you’ll see a new migration file for [some sequence of numbers]_add_timescale.rb
. Replace the contents of that file with the following to instruct the database to load the TimescaleDB extension to PostgreSQL:
class AddTimescale < ActiveRecord::Migration[5.2]
def change
enable_extension("timescaledb") unless extensions.include? "timescaledb"
end
end
Step 2: Run database migrations
Now we can run the following rails
command to add the TimescaleDB extension to our database:
rails db:migrate
warning
In order for the command to work, you will need to make sure there is a database named postgres
in your TimescaleDB deployment. This database is sometimes not present by default.
With rails dbconsole
you can test that the extension has been added by running the \dx
command:
echo "\dx" | rails dbconsole
The output should be something like the following:
List of installed extensions
Name | Version | Schema | Description
-------------+---------+------------+-------------------------------------------------------------------
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
timescaledb | 2.1.1 | public | Enables scalable inserts and complex queries for time-series data
(2 rows)
Step 3: Create a table
Suppose we wanted to create a table to store the user agent (browser) and time whenever a visitor loads our page. You could easily extend this simple example to store a host of additional web analytics of interest to you. We can generate a Rails scaffold to represent this information in a table:
rails generate scaffold PageLoads user_agent:string
TimescaleDB requires that any UNIQUE
or PRIMARY KEY
indexes on your table include all partitioning columns, which in our case is the time column. A new Rails model will include a PRIMARY KEY
index for id
by default, so we need to either remove the column or make sure that the index includes time as part of a “composite key”.
tip
Composite keys aren’t supported natively by Rails, but if you need to keep your id
column around for some reason you can add support for them with the composite_primary_keys
gem.
To satisfy this TimescaleDB requirement, we need to change the migration code to not create a PRIMARY KEY
using the id
column when create_table
is used. To do this we can change the migration implementation:
class CreatePageLoads < ActiveRecord::Migration[6.0]
def change
create_table :page_loads, id: false do |t|
t.string :user_agent
t.timestamps
end
end
end
Rails generates all the helper files and a database migration. We can then run a rails db:migrate
command again to create the table in our database.
rails db:migrate
Now, we can confirm that our table exists using and the model is properly mapped using a simple rails runner
command:
rails runner 'p PageLoad.count'
0
And we can view the structure of the page_loads
table combining the \d page_loads
command in the rails dbconsole
output:
echo "\d page_loads" | rails dbconsole
Table "public.page_loads"
Column | Type | Collation | Nullable | Default
------------+--------------------------------+-----------+----------+---------
user_agent | character varying | | |
created_at | timestamp(6) without time zone | | not null |
updated_at | timestamp(6) without time zone | | not null |
Generate hypertable
In TimescaleDB, the primary point of interaction with your data is a hypertable, the abstraction of a single continuous table across all space and time intervals, such that one can query it via standard SQL.
Virtually all user interactions with TimescaleDB are with hypertables. Creating tables and indexes, altering tables, inserting data, selecting data, etc. can (and should) all be executed on the hypertable.
A hypertable is defined by a standard schema with column names and types, with at least one column specifying a time value.
tip
The TimescaleDB documentation on schema management and indexing explains this in further detail.
Let’s create this migration to modify the page_loads
database and create a hypertable by first running the following command:
rails generate migration add_hypertable
In your db/migrate
project folder, you’ll see a new migration file for [some sequence of numbers]_add_hypertable
.
Then we can write the migration to first remove the id
column and then add our hypertable like so:
class AddHypertable < ActiveRecord::Migration[5.2]
def change
remove_column :page_loads, :id
execute "SELECT create_hypertable('page_loads', 'created_at');"
end
end
When we run rails db:migrate
we will generate the hypertable.
We can confirm this in psql
by running the \d page_loads
command and seeing the following:
Table "public.page_loads"
Column | Type | Collation | Nullable | Default
------------+-----------------------------+-----------+----------+---------
user_agent | character varying | | |
time | timestamp without time zone | | not null |
created_at | timestamp without time zone | | not null |
updated_at | timestamp without time zone | | not null |
Indexes:
"page_loads_time_idx" btree ("time" DESC)
Triggers:
ts_insert_blocker BEFORE INSERT ON page_loads FOR EACH ROW EXECUTE PROCEDURE _timescaledb_internal.insert_blocker()
Insert rows into TimescaleDB
Let’s create a new view and controller so that we can insert a value into the database and see our results. When our view displays, we will store the user agent and time into our database.
rails generate controller static_pages home
This will generate the view and controller files for a page called /static_pages/home
in our site. Let’s first add a line to the static_pages_controller.rb
file to retrieve the user agent of the site visitor’s browser:
class StaticPagesController < ApplicationController
def home
@agent = request.user_agent
end
end
Subsequently, in the home.html.erb
file, we will print the @agent
variable we just created:
<h1>StaticPages#home</h1>
<p>Find me in app/views/static_pages/home.html.erb</p>
<p>
Request: <%= @agent %>
</p>
Start your Rails server on the command line:
rails s
And, in your browser, visit http://localhost:3000/static_pages/home
. You should see a printout of the user agent for your browser.
Now that we’ve successfully obtained the user agent and passed it as a variable to the view, we can create a PageLoad
object, store the user agent information and time, and save the object to our TimescaleDB database. In the static_pages_controller.rb
controller file, add the following:
class StaticPagesController < ApplicationController
def home
PageLoad.create(user_agent: request.user_agent)
end
end
Go back to your browser and refresh the page several times. You should see commit messages in your Rails console window, like so:
Started GET "/static_pages/home" for ::1 at 2020-04-15 14:02:18 -0700
Processing by StaticPagesController#home as HTML
(79.5ms) BEGIN
↳ app/controllers/static_pages_controller.rb:6
PageLoad Create (79.9ms) INSERT INTO "page_loads" ("user_agent", "time", "created_at", "updated_at") VALUES ($1, $2, $3, $4) [["user_agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15"], ["time", "2020-04-15 21:02:18.106769"], ["created_at", "2020-04-15 21:02:18.187643"], ["updated_at", "2020-04-15 21:02:18.187643"]]
↳ app/controllers/static_pages_controller.rb:6
(80.0ms) COMMIT
↳ app/controllers/static_pages_controller.rb:6
Rendering static_pages/home.html.erb within layouts/application
Rendered static_pages/home.html.erb within layouts/application (0.5ms)
Completed 200 OK in 266ms (Views: 20.9ms | ActiveRecord: 239.4ms)
You can view these entries in TimescaleDB by running the following command in psql
:
SELECT * FROM page_loads ORDER BY time DESC;
The output should look like this:
user_agent | created_at | updated_at
-----------------------------------------------------------------------------------------------------------------------+----------------------------+----------------------------
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15 | 2020-04-15 21:02:18.187643 | 2020-04-15 21:02:18.187643
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15 | 2020-04-15 21:02:17.404137 | 2020-04-15 21:02:17.404137
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15 | 2020-04-15 21:02:14.82468 | 2020-04-15 21:02:14.82468
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15 | 2020-04-15 21:02:12.957934 | 2020-04-15 21:02:12.957934
(4 rows)
Execute a query
So far, we’ve created a TimescaleDB table and inserted data into it. Now, let’s retrieve data and display it.
In our static_pages_controller.rb
file let’s modify the home
method and use Active Record to query all items in the page_load
database and store them in an array:
class StaticPagesController < ApplicationController
def home
PageLoad.create(:user_agent => request.user_agent)
end
end
And we can modify our home.html.erb
view to iterate over the array and display each item:
<h1>Static Pages requests: <%= PageLoad.count %> </h1>
Now, each time we refresh our page, we can see that a record is being inserted into the my_app_db
TimescaleDB database, and the counter is incremented on the page.
Generating requests
We need to have a lot of page loads to continue our research and explore the time_bucket function.
Let’s use Apache Bench aka ab
to request 50,000 times parallelizing 10 times.
ab -n 50000 -c 10 http://localhost:3000/static_pages/home
Now, you can grab a tea and relax while it creates thousands of records in your first hypertable. You’ll be able to count how many ‘empty requests’ your Rails will support.
Counting requests per minute
Once the ab
command begins running, we can start a rails console and try some queries using the time_bucket function.
rails console
Now, let’s start counting how many requests we have per minute:
PageLoad
.select("time_bucket('1 minute', created_at) as time, count(1) as total")
.group('time').order('time')
.map {|result| [result.time, result.total]}
# => [
# [2021-04-14 20:38:00 UTC, 11770],
# [2021-04-14 20:39:00 UTC, 11668], ...]
It works! Now, let’s create some useful scopes that can help to summarize and easily access the time_bucket
function:
Creating scopes to reuse
Scopes are very useful for decomposing complex SQL statements into Ruby objects. It also allow to introduce params and reuse queries as you need.
Examples of scopes:
class PageLoad < ApplicationRecord
scope :last_month, -> { where('created_at > ?', 1.month.ago) }
scope :last_week, -> { where('created_at > ?', 1.week.ago) }
scope :last_hour, -> { where('created_at > ?', 1.hour.ago) }
scope :yesterday, -> { where('DATE(created_at) = ?', 1.day.ago.to_date) }
scope :today, -> { where('DATE(created_at) = ?', Date.today) }
end
And you can also combine the scopes with other ActiveRecord methods:
PageLoad.last_week.count # Total of requests from last week
PageLoad.last_hour.first # First request from last hour
PageLoad.last_hour.all # All requests from last hour
PageLoad.last_hour.limit(10) # 10 requests from last hour
# Count chrome users from last hour
PageLoad.last_hour.where("user_agent ilikes '%Chrome%'").count
Now, let’s introduce a scope that counts per time dimension:
class PageLoad < ApplicationRecord
scope :counts_per, -> (time_dimension) {
select("time_bucket('#{time_dimension}', created_at) as time, count(1) as total")
.group(:time).order(:time)
.map {|result| [result.time, result.total]}
}
end
Exploring other time frames:
PageLoad.counts_per('1 hour')
# PageLoad Load (1037.3ms) SELECT time_bucket('1 hour', created_at) as time, count(1) as total FROM "page_loads" WHERE (created_at > '2021-04-08 12:03:14.800902') GROUP BY time ORDER BY time
# => [[2021-04-14 21:00:00 UTC, 185836],
# [2021-04-14 22:00:00 UTC, 155286], ... ]
Add performance and path attributes to PageLoad
Let’s get deeper in requests, moving our example to watch all server requests and store the endpoint path and the time necessary to return the response.
First, we need to add columns to the database using rails migrations:
rails g migration add_performance_to_page_load path:string performance:float
The Rails generator is smart enough to understand the naming convention of the migration and the extra params to suggest a code like this:
class AddPerformanceToPageLoad < ActiveRecord::Migration[6.0]
def change
add_column :page_loads, :path, :string
add_column :page_loads, :performance, :float
end
end
And, now we can run migrations with rails db:migrate
to get the two columns in the database.
Hooking application controller to collect performance data
Our next step is make the PageLoad record creation happen in any request happening in the system. So, let’s hook the application controller with some around_action hook.
class ApplicationController < ActionController::Base
around_action do |controller, action|
performance = Benchmark.measure(&action.method(:call))
PageLoad.create(path: request.path,
performance: performance.real,
user_agent: request.user_agent)
end
end
We’re using only the real performance from benchmark but you can collect additional metrics to see more details about your system.
You can refresh the page and check the latest record in the rails console:
PageLoad.order(:created_at).last
# PageLoad Load (1.7ms) SELECT "page_loads".* FROM "page_loads" ORDER BY "page_loads"."created_at" DESC LIMIT $1 [["LIMIT", 1]]
# => #<PageLoad:0x00007fdafc5c69d8 path: "/static_pages/home", performance: 0.049275, ...>
Exploring aggregation functions
Now that we know what pages exist, we can explore page by page (or all the pages together), grouping by path or not.
class PageLoad < ApplicationRecord
scope :time_bucket, -> (time_dimension, value: 'count(1)') {
select(<<~SQL)
time_bucket('#{time_dimension}', created_at) as time, path,
#{value} as value
SQL
.group('time, path').order('path, time')
}
end
And we can build scopes reusing previous scopes to have easy names for the most used queries:
scope :per_minute, -> { time_bucket('1 minute') }
scope :per_hour, -> { time_bucket('1 hour') }
scope :per_day, -> { time_bucket('1 day') }
scope :per_week, -> { time_bucket('1 week') }
scope :per_month, -> { time_bucket('1 month') }
Create some average response depending on the timeframe:
scope :average_response_time_per_minute, -> { time_bucket('1 minute', value: 'avg(performance)') }
scope :average_response_time_per_hour, -> { time_bucket('1 hour', value: 'avg(performance)') }
And also, understand the limits max
and min
of the requests:
scope :worst_response_time_last_minute, -> { time_bucket('1 minute', value: 'max(performance)') }
scope :worst_response_time_last_hour, -> { time_bucket('1 hour', value: 'max(performance)') }
scope :best_response_time_last_hour, -> { time_bucket('1 hour', value: 'min(performance)') }
Finally, let’s build some useful method that can create a resume for every different path. So, first step is collect unique paths from the page loads:
scope :paths, -> { distinct.pluck(:path) }
Testing on Ruby console we have:
PageLoad.paths # => ["/page_loads/new", "/static_pages/home"]
# (151.6ms) SELECT DISTINCT "page_loads"."path" FROM "page_loads"
Now, let’s have a look in the actual metrics we generate for the response time filtering by methods that contains response_time
.
PageLoad.methods.grep /response_time/
# => [:average_response_time_per_minute,
# :worst_response_time_last_minute,
# :average_response_time_per_hour,
# :worst_response_time_last_hour,
# :best_response_time_last_hour]
Now, it’s time to build our summary based on every single page:
def self.resume_for(path)
filter = where(path: path)
get = -> (scope_name) { filter.send(scope_name).first.value}
metrics.each_with_object({}) do |metric, resume|
resume[metric] = get[metric]
end
end
def self.metrics
methods.grep /response_time/
end
Trying it on console:
PageLoad.resume_for("/page_loads/new")
# => {:average_response_time_per_minute=>0.967591333319433,
# :worst_response_time_last_minute=>2.892941999947652,
# :average_response_time_per_hour=>0.48624183332625154,
# :worst_response_time_last_hour=>2.892941999947652,
# :best_response_time_last_hour=>0.0030219999607652426}
And you can keep combining other filters like:
PageLoad.last_week.resume_for("/page_loads/new")
PageLoad.yesterday.resume_for("/page_loads/new")
PageLoad.today.resume_for("/page_loads/new")
The last step is to recursively navigate into all of the pages and build a summary for each:
def self.statistics
paths.each_with_object({}) do |path, resume|
resume[path] = resume_for(path)
end
end
And now, testing again on console:
PageLoad.statistics
# => {"/page_loads/new"=>
# {:average_response_time_per_minute=>0.967591333319433,
# :worst_response_time_last_minute=>2.892941999947652,
# :average_response_time_per_hour=>0.48624183332625154,
# :worst_response_time_last_hour=>2.892941999947652,
# :best_response_time_last_hour=>0.0030219999607652426},
# "/static_pages/home"=>
# {:average_response_time_per_minute=> ...}
And you can also explore different data frames for the statistics:
PageLoad.yesterday.statistics # => {...}
PageLoad.last_week.statistics # => {...}
As you can see in the console, every single query is being executed independent, which is suboptimal but covers different options.
Now that you get some basics of the TimescaleDB instance from your Rails application, be sure to check out these advanced TimescaleDB tutorials: