User-Defined Functions
Table of Contents
CREATE OR REPLACE
CrateDB supports user-defined functions. See CREATE FUNCTION for a full syntax description.
These functions can be created like so:
cr> CREATE FUNCTION my_subtract_function(integer, integer)
... RETURNS integer
... LANGUAGE JAVASCRIPT
... AS 'function my_subtract_function(a, b) { return a - b; }';
CREATE OK, 1 row affected (... sec)
cr> SELECT doc.my_subtract_function(3, 1);
+--------------------------------+
| doc.my_subtract_function(3, 1) |
+--------------------------------+
| 2 |
+--------------------------------+
SELECT 1 row in set (... sec)
OR REPLACE
can be used to replace an existing function:
cr> CREATE OR REPLACE FUNCTION log10(long)
... RETURNS double
... LANGUAGE JAVASCRIPT
... AS 'function log10(a) { return Math.log(a)/Math.log(10); }';
CREATE OK, 1 row affected (... sec)
cr> SELECT doc.log10(10);
+---------------+
| doc.log10(10) |
+---------------+
| 1.0 |
+---------------+
SELECT 1 row in set (... sec)
Arguments can be named in the function definition.
For example, if you wanted two geo_point
arguments named start_point
and end_point
, you would do it like this:
cr> CREATE OR REPLACE FUNCTION calculate_distance(start_point geo_point, end_point geo_point)
... RETURNS float
... LANGUAGE JAVASCRIPT
... AS 'function calculate_distance(start_point, end_point){
... return Math.sqrt( Math.pow(end_point[0] - start_point[0], 2), Math.pow(end_point[1] - start_point[1], 2));
... }';
CREATE OK, 1 row affected (... sec)
Note
Argument names are used for query documentation purposes only. You cannot reference arguments by name in the function body.
Optionally, you can specify a schema for the function. If you omit the schema, the current session schema is used.
You can explicitly assign a schema like this:
cr> CREATE OR REPLACE FUNCTION my_schema.log10(long)
... RETURNS double
... LANGUAGE JAVASCRIPT
... AS 'function log10(a) { return Math.log(a)/Math.log(10); }';
CREATE OK, 1 row affected (... sec)
Warning
Snapshots can’t be used to backup functions, because snapshots contain table data only.
Supported Types
The argument types, and the return type of the function can be any of the CrateDB supported Data Types. Data types of values passed into a function must strictly correspond to its argument data types.
Note
The value returned by the function will be casted to the return type provided in the definition if required. An exception will be thrown if the cast is not successful.
Overloading
Within a specific schema, you can overload functions by defining two functions with the same name that have a different set of arguments:
cr> CREATE FUNCTION my_schema.my_multiply(integer, integer)
... RETURNS integer
... LANGUAGE JAVASCRIPT
... AS 'function my_multiply(a, b) { return a * b; }';
CREATE OK, 1 row affected (... sec)
This would overload our my_multiply
function with different argument types:
cr> CREATE FUNCTION my_schema.my_multiply(long, long)
... RETURNS long
... LANGUAGE JAVASCRIPT
... AS 'function my_multiply(a, b) { return a * b; }';
CREATE OK, 1 row affected (... sec)
This would overload our my_multiply
function with more arguments:
cr> CREATE FUNCTION my_schema.my_multiply(long, long, long)
... RETURNS long
... LANGUAGE JAVASCRIPT
... AS 'function my_multiply(a, b, c) { return a * b * c; }';
CREATE OK, 1 row affected (... sec)
Caution
It is considered bad practice to create functions that have the same name as the CrateDB built-in functions!
Note
If you call a function without a schema name, CrateDB will look it up in the built-in functions first and only then in the user-defined functions available in the search_path.
Therefore a built-in function with the same name as a user-defined function will hide the latter, even if it contains a different set of arguments! However, such functions can still be called if the schema name is explicitly provided.
Determinism
Caution
User-defined functions need to be deterministic, meaning that they must always return the same result value when called with the same argument values, because CrateDB might cache the returned values and reuse the value if the function is called multiple times with the same arguments.
DROP FUNCTION
Functions can be dropped like this:
cr> DROP FUNCTION doc.log10(long);
DROP OK, 1 row affected (... sec)
Adding IF EXISTS
prevents from raising an error if the function doesn’t exist:
cr> DROP FUNCTION IF EXISTS doc.log10(integer);
DROP OK, 1 row affected (... sec)
Optionally, argument names can be specified within the drop statement:
cr> DROP FUNCTION IF EXISTS doc.calculate_distance(start_point geo_point, end_point geo_point);
DROP OK, 1 row affected (... sec)
Optionally, you can provide a schema:
cr> DROP FUNCTION my_schema.log10(long);
DROP OK, 1 row affected (... sec)
Supported Languages
CrateDB currently only supports the UDF language javascript
.
JavaScript
The UDF language javascript
supports the ECMAScript 5.1 standard.
Note
The JavaScript language is an enterprise feature.
CrateDB uses the Java built-in JavaScript engine Nashorn to interpret and execute functions written in JavaScript. The engine is initialized using the --no-java
option which basically restricts all access to Java APIs from within the JavaScript context. CrateDB’s engine also does not allow non-standard syntax extensions (--no-syntax-extensions
).
This, however, does not mean that JavaScript is securely sandboxed.
Also, even though Nashorn runs ECMA-complient JavaScript, objects that are normally accessible with a web browser (e.g. window
, console
and so on) are are not available.
Caution
The JavaScript language is an experimental feature and is disabled by default. You can enable the Javascript Language via the configuration file.
Supported Types
JavaScript functions can handle all CrateDB data types. However, for some return types the function output must correspond to the certain format.
If a function requires geo_point
as a return type, then the JavaScript function must return a double array
of size 2, WKT
string or GeoJson
object.
Here is an example of a JavaScript function returning a double array
:
cr> CREATE FUNCTION rotate_point(point geo_point, angle float)
... RETURNS geo_point
... LANGUAGE JAVASCRIPT
... AS 'function rotate_point(point, angle) {
... var cos = Math.cos(angle);
... var sin = Math.sin(angle);
... var x = cos * point[0] - sin * point[1];
... var y = sin * point[0] + cos * point[1];
... return [x, y];
... }';
CREATE OK, 1 row affected (... sec)
Below is an example of a JavaScript function returning a WKT
string, which will be cast to geo_point
:
cr> CREATE FUNCTION symmetric_point(point geo_point)
... RETURNS geo_point
... LANGUAGE JAVASCRIPT
... AS 'function symmetric_point (point, angle) {
... var x = - point[0],
... y = - point[1];
... return "POINT (\" + x + \", \" + y +\")";
... }';
CREATE OK, 1 row affected (... sec)
Similarly, if the function specifies the geo_shape
return data type, then the JavaScript function should return a GeoJson
object or``WKT`` string:
cr> CREATE FUNCTION line(start_point array(double), end_point array(double))
... RETURNS object
... LANGUAGE JAVASCRIPT
... AS 'function line(start_point, end_point) {
... return { "type": "LineString", "coordinates" : [start_point, end_point] };
... }';
CREATE OK, 1 row affected (... sec)
Note
If the return value of the JavaScript function is undefined
, it is converted to NULL
.
Working with NUMBERS
The JavaScript engine Nashorn interprets numbers as java.lang.Double
, java.lang.Long
, or java.lang.Integer
, depending on the computation performed. In most cases, this is not an issue, since the return type of the JavaScript function will be cast to the return type specified in the CREATE FUNCTION
statement, although cast might result in a loss of precision.
However, when you try to cast DOUBLE
to TIMESTAMP
, it will be interpreted as UTC seconds and will result in a wrong value:
cr> CREATE FUNCTION utc(long, long, long)
... RETURNS TIMESTAMP
... LANGUAGE JAVASCRIPT
... AS 'function utc(year, month, day) {
... return Date.UTC(year, month, day, 0, 0, 0);
... }';
CREATE OK, 1 row affected (... sec)
cr> SELECT date_format(utc(2016,04,6)) as epoque;
+------------------------------+
| epoque |
+------------------------------+
| 48314-07-22T00:00:00.000000Z |
+------------------------------+
SELECT 1 row in set (... sec)
To avoid this behavior, the numeric value should be divided by 1000 before it is returned:
cr> CREATE FUNCTION utc(long, long, long)
... RETURNS TIMESTAMP
... LANGUAGE JAVASCRIPT
... AS 'function utc(year, month, day) {
... return Date.UTC(year, month, day, 0, 0, 0)/1000;
... }';
CREATE OK, 1 row affected (... sec)
cr> SELECT date_format(utc(2016,04,6)) as epoque;
+-----------------------------+
| epoque |
+-----------------------------+
| 2016-05-06T00:00:00.000000Z |
+-----------------------------+
SELECT 1 row in set (... sec)
Working With Array
Methods
The JavaScript Array
object has a number of prototype methods you can use, such as join, map, sort, slice, reduce, and so on.
Normally, you can call these methods directly from an Array
object, like so:
function array_join(a, b) {
return a.join(b);
}
However, when writing JavaScript for use with CrateDB, you must explicitly use the prototype method:
function array_join(a, b) {
return Array.prototype.join.call(a, b);
}
You must do it like this because arguments are not passed as Array
objects, and so do not have the associated prototype methods available. Arguments are instead passed as array-like objects.