Sqlite User-Defined Functions
The sqlite_udf
playhouse module contains a number of user-defined functions, aggregates, and table-valued functions, which you may find useful. The functions are grouped in collections and you can register these user-defined extensions individually, by collection, or register everything.
Scalar functions are functions which take a number of parameters and return a single value. For example, converting a string to upper-case, or calculating the MD5 hex digest.
Aggregate functions are like scalar functions that operate on multiple rows of data, producing a single result. For example, calculating the sum of a list of integers, or finding the smallest value in a particular column.
Table-valued functions are simply functions that can return multiple rows of data. For example, a regular-expression search function that returns all the matches in a given string, or a function that accepts two dates and generates all the intervening days.
Note
To use table-valued functions, you will need to build the playhouse._sqlite_ext
C extension.
Registering user-defined functions:
db = SqliteDatabase('my_app.db')
# Register *all* functions.
register_all(db)
# Alternatively, you can register individual groups. This will just
# register the DATE and MATH groups of functions.
register_groups(db, 'DATE', 'MATH')
# If you only wish to register, say, the aggregate functions for a
# particular group or groups, you can:
register_aggregate_groups(db, 'DATE')
Using a library function (“hostname”):
# Assume we have a model, Link, that contains lots of arbitrary URLs.
# We want to discover the most common hosts that have been linked.
query = (Link
.select(fn.hostname(Link.url).alias('host'), fn.COUNT(Link.id))
.group_by(fn.hostname(Link.url))
.order_by(fn.COUNT(Link.id).desc())
.tuples())
# Print the hostname along with number of links associated with it.
for host, count in query:
print('%s: %s' % (host, count))
Functions, listed by collection name
Scalar functions are indicated by (f)
, aggregate functions by (a)
, and table-valued functions by (t)
.
CONTROL_FLOW
if_then_else
(cond, truthy[, falsey=None])
Simple ternary-type operator, where, depending on the truthiness of the cond
parameter, either the truthy
or falsey
value will be returned.
DATE
strip_tz
(date_str)
Parameters: | date_str – A datetime, encoded as a string. |
---|---|
Returns: | The datetime with any timezone info stripped off. |
The time is not adjusted in any way, the timezone is simply removed.
humandelta
(nseconds[, glue=’, ‘])
Parameters: |
|
---|---|
Returns: | Easy-to-read description of timedelta. |
Example, 86471 -> “1 day, 1 minute, 11 seconds”
mintdiff
(datetime_value)
Parameters: | datetime_value – A date-time. |
---|---|
Returns: | Minimum difference between any two values in list. |
Aggregate function that computes the minimum difference between any two datetimes.
avgtdiff
(datetime_value)
Parameters: | datetime_value – A date-time. |
---|---|
Returns: | Average difference between values in list. |
Aggregate function that computes the average difference between consecutive values in the list.
duration
(datetime_value)
Parameters: | datetime_value – A date-time. |
---|---|
Returns: | Duration from smallest to largest value in list, in seconds. |
Aggregate function that computes the duration from the smallest to the largest value in the list, returned in seconds.
date_series
(start, stop[, step_seconds=86400])
Parameters: |
|
---|
Table-value function that returns rows consisting of the date/+time values encountered iterating from start to stop, step_seconds
at a time.
Additionally, if start does not have a time component and step_seconds is greater-than-or-equal-to one day (86400 seconds), the values returned will be dates. Conversely, if start does not have a date component, values will be returned as times. Otherwise values are returned as datetimes.
Example:
SELECT * FROM date_series('2017-01-28', '2017-02-02');
value
-----
2017-01-28
2017-01-29
2017-01-30
2017-01-31
2017-02-01
2017-02-02
FILE
file_ext
(filename)
Parameters: | filename (str) – Filename to extract extension from. |
---|---|
Returns: | Returns the file extension, including the leading “.”. |
file_read
(filename)
Parameters: | filename (str) – Filename to read. |
---|---|
Returns: | Contents of the file. |
HELPER
gzip
(data[, compression=9])
Parameters: |
|
---|---|
Returns: | Compressed binary data. |
gunzip
(data)
Parameters: | data (bytes) – Compressed data. |
---|---|
Returns: | Uncompressed binary data. |
hostname
(url)
Parameters: | url (str) – URL to extract hostname from. |
---|---|
Returns: | hostname portion of URL |
toggle
(key)
Parameters: | key – Key to toggle. |
---|
Toggle a key between True/False state. Example:
>>> toggle('my-key')
True
>>> toggle('my-key')
False
>>> toggle('my-key')
True
setting
(key[, value=None])
Parameters: |
|
---|---|
Returns: | Value associated with key. |
Store/retrieve a setting in memory and persist during lifetime of application. To get the current value, only specify the key. To set a new value, call with key and new value.
clear_toggles
()
Clears all state associated with the toggle()
function.
clear_settings
()
Clears all state associated with the setting()
function.
MATH
randomrange
(start[, stop=None[, step=None]])
Parameters: |
|
---|
Return a random integer between [start, end)
.
gauss_distribution
(mean, sigma)
Parameters: |
|
---|
sqrt
(n)
Calculate the square root of n
.
tonumber
(s)
Parameters: | s (str) – String to convert to number. |
---|---|
Returns: | Integer, floating-point or NULL on failure. |
mode
(val)
Parameters: | val – Numbers in list. |
---|---|
Returns: | The mode, or most-common, number observed. |
Aggregate function which calculates mode of values.
minrange
(val)
Parameters: | val – Value |
---|---|
Returns: | Min difference between two values. |
Aggregate function which calculates the minimal distance between two numbers in the sequence.
avgrange
(val)
Parameters: | val – Value |
---|---|
Returns: | Average difference between values. |
Aggregate function which calculates the average distance between two consecutive numbers in the sequence.
range
(val)
Parameters: | val – Value |
---|---|
Returns: | The range from the smallest to largest value in sequence. |
Aggregate function which returns range of values observed.
median
(val)
Parameters: | val – Value |
---|---|
Returns: | The median, or middle, value in a sequence. |
Aggregate function which calculates the middle value in a sequence.
Note
Only available if you compiled the _sqlite_udf
extension.
STRING
substr_count
(haystack, needle)
Returns number of times needle
appears in haystack
.
strip_chars
(haystack, chars)
Strips any characters in chars
from beginning and end of haystack
.
damerau_levenshtein_dist
(s1, s2)
Computes the edit distance from s1 to s2 using the damerau variant of the levenshtein algorithm.
Note
Only available if you compiled the _sqlite_udf
extension.
levenshtein_dist
(s1, s2)
Computes the edit distance from s1 to s2 using the levenshtein algorithm.
Note
Only available if you compiled the _sqlite_udf
extension.
str_dist
(s1, s2)
Computes the edit distance from s1 to s2 using the standard library SequenceMatcher’s algorithm.
Note
Only available if you compiled the _sqlite_udf
extension.
regex_search
(regex, search_string)
Parameters: |
|
---|
Table-value function that searches a string for substrings that match the provided regex
. Returns rows for each match found.
Example:
SELECT * FROM regex_search('\w+', 'extract words, ignore! symbols');
value
-----
extract
words
ignore
symbols