Estimate Vitals Storage in PostgreSQL
Vitals data can be divided into two categories: Kong Gateway node statistics and request response codes.
Kong Gateway node statistics
This category covers metrics such as proxy latency, upstream latency, and cache hit/miss. Kong Gateway node statistics are stored in tables like the following:
- `vitals_stats_seconds_timestamp` stores 1 new row for every second Kong runs
- `vitals_stats_minutes` stores 1 new row for every minute Kong runs
- `vitals_stats_days` stores 1 new row for every day Kong runs
Kong Gateway node statistics are not associated with specific Kong Gateway entities like Workspaces, Services, or Routes. They’re designed to represent the cluster’s state over time. This means the tables gain new rows regardless of whether Kong Gateway is routing traffic or idle.
The tables do not grow indefinitely. Each one holds data for a fixed duration:
- `vitals_stats_seconds_timestamp` holds data for 1 hour (3600 rows)
- `vitals_stats_minutes` holds data for 25 hours (1500 rows)
- `vitals_stats_days` holds data for 2 years (730 rows)
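As a quick check of these limits, the following sketch derives each row cap by dividing the retention window by the row interval. The `RETENTION` mapping and the `max_rows` helper are hypothetical names used only for illustration, and 2 years is taken as 730 days to match the figure above.

```python
from datetime import timedelta

# (row interval, retention window) per table, as described in the text.
# 2 years is assumed to mean 730 days.
RETENTION = {
    "vitals_stats_seconds_timestamp": (timedelta(seconds=1), timedelta(hours=1)),
    "vitals_stats_minutes": (timedelta(minutes=1), timedelta(hours=25)),
    "vitals_stats_days": (timedelta(days=1), timedelta(days=730)),
}


def max_rows(table: str) -> int:
    """Upper bound on rows: retention window divided by row interval."""
    interval, window = RETENTION[table]
    return window // interval


for name in RETENTION:
    print(name, max_rows(name))
```

Note that 25 hours at one row per minute works out to 1500 rows; 90000 would be the number of seconds in 25 hours.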
Request response codes
Request response codes are stored in the other group of tables following a different rationale. Tables in this group share the same structure (`entity_id`, `at`, `duration`, `status_code`, `count`):
- `vitals_code_classes_by_workspace`
- `vitals_code_classes_by_cluster`
- `vitals_codes_by_route`
The `entity_id` column does not exist in `vitals_code_classes_by_cluster`, as this table doesn’t store entity-specific information. In the `vitals_code_classes_by_workspace` table, `entity_id` is `workspace_id`. In the `vitals_codes_by_route` table, `entity_id` is `service_id` and `route_id`.
`at` is a timestamp. It logs the start of the period a row represents, while `duration` is the length of that period in seconds.
`status_code` and `count` record how many times a given HTTP status code (200, 300, 400, 500) was observed in the period represented by a row.
While Kong Gateway node statistic tables grow only according to time, status code tables only have new rows when Kong Gateway proxies traffic, and the number of new rows depends on the traffic itself.
Example
Consider a brand new Kong Gateway that hasn’t proxied any traffic yet. Kong Gateway node statistic tables have rows but status codes tables don’t.
When Kong Gateway proxies its first request at `t`, returning status code 200, the following rows are added:
vitals_codes_by_cluster
[second(t), 1, 200, 1]
[minute(t), 60, 200, 1]
[day(t), 86400, 200, 1]
The second, minute, and day values are trimmed in the following way:
- `second(t)` is `t` trimmed to seconds, for example: `second(2021-01-01 20:21:30.234)` would be `2021-01-01 20:21:30`.
- `minute(t)` is `t` trimmed to minutes, for example: `minute(2021-01-01 20:21:30.234)` would be `2021-01-01 20:21:00`.
- `day(t)` is `t` trimmed to day, for example: `day(2021-01-01 20:21:30.234)` would be `2021-01-01 00:00:00`.
vitals_codes_by_workspace
[workspace_id, second(t), 1, 200, 1]
[workspace_id, minute(t), 60, 200, 1]
[workspace_id, day(t), 86400, 200, 1]
vitals_codes_by_route
[service_id, route_id, second(t), 1, 200, 1]
[service_id, route_id, minute(t), 60, 200, 1]
[service_id, route_id, day(t), 86400, 200, 1]
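The trimming used for the `at` column can be sketched with Python's `datetime.replace`. The function names `second`, `minute`, and `day` simply mirror the notation in this example; they are illustrative and are not taken from Kong Gateway's implementation.

```python
from datetime import datetime


def second(t: datetime) -> datetime:
    """Trim t to the start of its second."""
    return t.replace(microsecond=0)


def minute(t: datetime) -> datetime:
    """Trim t to the start of its minute."""
    return t.replace(second=0, microsecond=0)


def day(t: datetime) -> datetime:
    """Trim t to the start of its day."""
    return t.replace(hour=0, minute=0, second=0, microsecond=0)


t = datetime(2021, 1, 1, 20, 21, 30, 234000)  # 2021-01-01 20:21:30.234

# Rows in the cluster-level table after one request returning 200:
# (at, duration in seconds, status_code, count)
rows = [
    (second(t), 1, 200, 1),
    (minute(t), 60, 200, 1),
    (day(t), 24 * 60 * 60, 200, 1),  # one day is 86400 seconds
]
for row in rows:
    print(row)
```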
Let’s consider what happens when new requests are proxied in some scenarios.
Scenario where no rows are inserted
If we make the same request again at the same `t` and it also receives 200, no new rows will be inserted.
In this case, the existing rows have their count column incremented accordingly:
vitals_codes_by_cluster
[second(t), 1, 200, 2]
[minute(t), 60, 200, 2]
[day(t), 86400, 200, 2]
vitals_codes_by_workspace
[workspace_id, second(t), 1, 200, 2]
[workspace_id, minute(t), 60, 200, 2]
[workspace_id, day(t), 86400, 200, 2]
vitals_codes_by_route
[service_id, route_id, second(t), 1, 200, 2]
[service_id, route_id, minute(t), 60, 200, 2]
[service_id, route_id, day(t), 86400, 200, 2]
Scenario where new rows are inserted
If the second request instead receives a 500 status code, new rows are inserted:
vitals_codes_by_cluster
[second(t), 1, 200, 1]
[minute(t), 60, 200, 1]
[day(t), 86400, 200, 1]
[second(t), 1, 500, 1]
[minute(t), 60, 500, 1]
[day(t), 86400, 500, 1]
vitals_codes_by_workspace
[workspace_id, second(t), 1, 200, 1]
[workspace_id, minute(t), 60, 200, 1]
[workspace_id, day(t), 86400, 200, 1]
[workspace_id, second(t), 1, 500, 1]
[workspace_id, minute(t), 60, 500, 1]
[workspace_id, day(t), 86400, 500, 1]
vitals_codes_by_route
[service_id, route_id, second(t), 1, 200, 1]
[service_id, route_id, minute(t), 60, 200, 1]
[service_id, route_id, day(t), 86400, 200, 1]
[service_id, route_id, second(t), 1, 500, 1]
[service_id, route_id, minute(t), 60, 500, 1]
[service_id, route_id, day(t), 86400, 500, 1]
Scenario where a second row is inserted
Assume that at `t + 5s`, where `minute(t)==minute(t + 5s)`, Kong Gateway proxies the same request returning 200. Since `minute()` and `day()` for both `t` and `t + 5s` are the same, minute and day rows should just be updated. Since `second()` is different for the two instants, a new second row should be inserted in each table.
vitals_codes_by_cluster
[second(t), 1, 200, 1]
[second(t + 5s), 1, 200, 1]
[minute(t), 60, 200, 2]
[day(t), 86400, 200, 2]
vitals_codes_by_workspace
[workspace_id, second(t), 1, 200, 1]
[workspace_id, second(t + 5s), 1, 200, 1]
[workspace_id, minute(t), 60, 200, 2]
[workspace_id, day(t), 86400, 200, 2]
vitals_codes_by_route
[service_id, route_id, second(t), 1, 200, 1]
[service_id, route_id, second(t + 5s), 1, 200, 1]
[service_id, route_id, minute(t), 60, 200, 2]
[service_id, route_id, day(t), 86400, 200, 2]
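The insert-or-increment behavior in these scenarios can be modeled with a small in-memory simulation. This is only a sketch: the `record` helper, the `Counter`-based table, and the `"ws1"` identifier are hypothetical, and the real tables live in PostgreSQL. Unlike the scenarios above, which each start from the state after the first request, the calls below run one after another on the same table.

```python
from collections import Counter
from datetime import datetime, timedelta

# (duration in seconds, function trimming a timestamp to the period start)
PERIODS = [
    (1, lambda t: t.replace(microsecond=0)),
    (60, lambda t: t.replace(second=0, microsecond=0)),
    (24 * 60 * 60, lambda t: t.replace(hour=0, minute=0, second=0, microsecond=0)),
]


def record(table: Counter, entity_id: str, t: datetime, status_code: int) -> None:
    """Insert a row per period if the key is new, otherwise increment its count."""
    for duration, trim in PERIODS:
        table[(entity_id, trim(t), duration, status_code)] += 1


table = Counter()  # key: (entity_id, at, duration, status_code), value: count
t = datetime(2021, 1, 1, 20, 21, 30, 234000)
sizes = []

record(table, "ws1", t, 200)                          # first request: 3 new rows
sizes.append(len(table))
record(table, "ws1", t, 200)                          # same t, same code: counts only
sizes.append(len(table))
record(table, "ws1", t, 500)                          # new status code: 3 new rows
sizes.append(len(table))
record(table, "ws1", t + timedelta(seconds=5), 200)   # new second: 1 new row
sizes.append(len(table))

print(sizes)
```

Keying rows by entity, period start, duration, and status code is what makes the row count depend on the variety of traffic rather than its volume.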
In summary, the number of rows in those status codes tables is directly related to:
- The number of status codes observed in Kong Gateway proxied requests
- The number of Kong Gateway entities involved in those requests
- The constant flow of proxied requests
To estimate the number of rows, consider a Kong Gateway cluster with the following characteristics:
- A constant flow of requests returning all 5 possible groups of status codes (1xx, 2xx, 3xx, 4xx and 5xx).
- Just 1 workspace, 1 service, and 1 route
After 24 hours of traffic, the status codes tables will have this number of rows:
Status code table name | Day | Minute | Seconds | Total |
---|---|---|---|---|
vitals_codes_by_cluster | 5 | 7200 | 18000 | 25205 |
vitals_codes_by_workspace | 5 | 7200 | 18000 | 25205 |
vitals_codes_by_route | 5 | 7200 | 18000 | 25205 |
It’s important to note that this assumes that all 5 groups of status codes had been observed in those 24 hours of traffic. This is why quantities were multiplied by 5.
With this baseline scenario, it’s easier to calculate what happens when the number of Kong Gateway entities (Workspaces and Routes) involved in the traffic increases. The row counts of `vitals_codes_by_workspace` and `vitals_codes_by_route` grow with the number of workspaces and routes, respectively.
If the above Kong Gateway cluster is expanded to have 10 workspaces with 1 route each (10 routes total) and it proxies traffic for 24 hours and returns all 5 status codes, `vitals_codes_by_workspace` and `vitals_codes_by_route` would each have roughly 252,000 rows (10 times the single-entity total).
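The estimate can be reproduced with a short calculation. The sketch below assumes that the status code tables retain second rows for 1 hour and minute rows for 25 hours, like the node statistics tables, and that the traffic falls within whole calendar days. The `estimate_rows` function and its parameters are hypothetical names. Because it also counts the 5 day rows per entity, the totals come out slightly above the rounded figures in the text.

```python
import math


def estimate_rows(code_groups: int = 5, entities: int = 1, hours: int = 24) -> dict:
    """Estimate rows per granularity for one status code table.

    Assumes constant traffic, so every period has a row for every code group
    and every entity.
    """
    per_series = {
        "day": math.ceil(hours / 24),     # one row per calendar day of traffic
        "minute": min(hours, 25) * 60,    # minute rows kept for 25 hours
        "second": min(hours, 1) * 3600,   # second rows kept for 1 hour
    }
    return {name: n * code_groups * entities for name, n in per_series.items()}


baseline = estimate_rows()            # 1 workspace, 1 route
scaled = estimate_rows(entities=10)   # 10 workspaces, or 10 routes

print(baseline, sum(baseline.values()))
print(scaled, sum(scaled.values()))
```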