approx_count()
Returns an estimate of the number of times that the text item
was seen by the Count-Min Sketch agg
.
approx_count (
item TEXT,
agg CountMinSketch
) RETURNS INTEGER
warning
Experimental features could have bugs. They might not be backwards compatible, and could be removed in future releases. Use these features at your own risk, and do not use any experimental features in production.
Required arguments
Name | Type | Description |
---|---|---|
item | TEXT | The text whose frequency you want to estimate |
agg | CountMinSketch | The aggregate to use for estimating the frequency of item |
Returns
Column | Type | Description |
---|---|---|
approx_count | INTEGER | The estimated number of times item was seen by the sketch |
Sample usage
Create a Count-Min Sketch of the stock symbols seen in your tick data. With this aggregate, you’ll then be able to estimate how often any text value appears in the symbol
column.
SELECT count_min_sketch(symbol, 0.01, 0.01) AS symbol_sketch
FROM stocks_real_time;
In this example, the first 0.01
dictates that your frequency estimates have a relative error of 1%. A relative error of 1% means that the approximate count of an item is overestimated by at most 1% of the total number of (non-NULL
) rows in the column you aggregated. (The Count-Min Sketch is a biased estimator of the true frequency because it may overestimate the count of an item, but it cannot underestimate that count.)
The second 0.01
means that your estimated frequency falls outside those error bounds 1% of the time (on average).
You can then pass this aggregate into the approx_count
function. Doing so gives you an estimate of how many times a given symbol appears in the symbol
column.
For example, if you wanted to know approximately how many of the quotes in the tick data were for the AAPL
stock, you would then do the following:
WITH t AS (
SELECT count_min_sketch(symbol, 0.01, 0.01) AS symbol_sketch
FROM stocks_real_time
)
SELECT toolkit_experimental.approx_count('AAPL', symbol_sketch)
FROM t;