10.5. Functions
Plugin Implementation
The function framework is used to implement SQL functions. Presto includes a number of built-in functions. To implement new functions, you can write a plugin that returns one or more functions from getFunctions():
- public class ExampleFunctionsPlugin
-         implements Plugin
- {
-     @Override
-     public Set<Class<?>> getFunctions()
-     {
-         return ImmutableSet.<Class<?>>builder()
-                 .add(ExampleNullFunction.class)
-                 .add(IsNullFunction.class)
-                 .add(IsEqualOrNullFunction.class)
-                 .add(ExampleStringFunction.class)
-                 .add(ExampleAverageFunction.class)
-                 .build();
-     }
- }
Note that the ImmutableSet class is a utility class from Guava. The getFunctions() method contains all of the classes for the functions that we will implement below in this tutorial.
For a full example in the codebase, see either the presto-ml module for machine learning functions or the presto-teradata-functions module for Teradata-compatible functions, both in the root of the Presto source.
Scalar Function Implementation
The function framework uses annotations to indicate relevant information about functions, including name, description, return type and parameter types. Below is a sample function which implements is_null:
- public class ExampleNullFunction
- {
-     @ScalarFunction(value = "is_null", calledOnNullInput = true)
-     @Description("Returns TRUE if the argument is NULL")
-     @SqlType(StandardTypes.BOOLEAN)
-     public static boolean isNull(@SqlNullable @SqlType(StandardTypes.VARCHAR) Slice string)
-     {
-         return (string == null);
-     }
- }
The function is_null takes a single VARCHAR argument and returns a BOOLEAN indicating if the argument was NULL. Note that the argument to the function is of type Slice. VARCHAR uses Slice, which is essentially a wrapper around byte[], rather than String for its native container type.
@SqlType:
The @SqlType annotation is used to declare the return type and the argument types. Note that the return type and arguments of the Java code must match the native container types of the corresponding annotations.
@SqlNullable:
The @SqlNullable annotation indicates that the argument may be NULL. Without this annotation the framework assumes that all functions return NULL if any of their arguments are NULL. When working with a Type that has a primitive native container type, such as BigintType, use the object wrapper for the native container type when using @SqlNullable. The method must be annotated with @SqlNullable if it can return NULL when the arguments are non-null.
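The two null conventions can be imitated in plain Java. The sketch below is not Presto code: NullConventionSketch, callDefault and callNullable are hypothetical names that only mimic what the framework is described as doing around an annotated method. It shows why a function under the default convention can take a primitive long, while a @SqlNullable-style argument needs the boxed Long.

```java
import java.util.function.Function;
import java.util.function.LongUnaryOperator;

// Hypothetical stand-in for the framework's call logic; these names do not exist in Presto.
final class NullConventionSketch
{
    // Default convention (no @SqlNullable): a NULL argument short-circuits to a
    // NULL result, so the function body only ever receives a primitive value.
    static Long callDefault(Long argument, LongUnaryOperator function)
    {
        if (argument == null) {
            return null;
        }
        return function.applyAsLong(argument);
    }

    // @SqlNullable-style convention: the function is always called and receives
    // the boxed type, so it can observe NULL itself.
    static Boolean callNullable(Long argument, Function<Long, Boolean> function)
    {
        return function.apply(argument);
    }

    public static void main(String[] args)
    {
        System.out.println(callDefault(null, x -> x + 1));      // null
        System.out.println(callDefault(41L, x -> x + 1));       // 42
        System.out.println(callNullable(null, x -> x == null)); // true
    }
}
```

In the first call the lambda is never invoked, which mirrors the statement above that the function will not be called when an argument is NULL.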
Parametric Scalar Functions
Scalar functions that have type parameters have some additional complexity. To make our previous example work with any type we need the following:
- @ScalarFunction(name = "is_null", calledOnNullInput = true)
- @Description("Returns TRUE if the argument is NULL")
- public final class IsNullFunction
- {
-     @TypeParameter("T")
-     @SqlType(StandardTypes.BOOLEAN)
-     public static boolean isNullSlice(@SqlNullable @SqlType("T") Slice value)
-     {
-         return (value == null);
-     }
-     @TypeParameter("T")
-     @SqlType(StandardTypes.BOOLEAN)
-     public static boolean isNullLong(@SqlNullable @SqlType("T") Long value)
-     {
-         return (value == null);
-     }
-     @TypeParameter("T")
-     @SqlType(StandardTypes.BOOLEAN)
-     public static boolean isNullDouble(@SqlNullable @SqlType("T") Double value)
-     {
-         return (value == null);
-     }
-     // ...and so on for each native container type
- }
@TypeParameter:
The @TypeParameter annotation is used to declare a type parameter which can be used in the @SqlType annotations of the arguments or in the return type of the function. It can also be used to annotate a parameter of type Type. At runtime, the engine will bind the concrete type to this parameter. @OperatorDependency may be used to declare that an additional function for operating on the given type parameter is needed. For example, the following function will only bind to types which have an equals function defined:
- @ScalarFunction(name = "is_equal_or_null", calledOnNullInput = true)
- @Description("Returns TRUE if arguments are equal or both NULL")
- public final class IsEqualOrNullFunction
- {
-     @TypeParameter("T")
-     @SqlType(StandardTypes.BOOLEAN)
-     public static boolean isEqualOrNullSlice(
-             @OperatorDependency(operator = OperatorType.EQUAL, returnType = StandardTypes.BOOLEAN, argumentTypes = {"T", "T"}) MethodHandle equals,
-             @SqlNullable @SqlType("T") Slice value1,
-             @SqlNullable @SqlType("T") Slice value2)
-             throws Throwable
-     {
-         if (value1 == null && value2 == null) {
-             return true;
-         }
-         if (value1 == null || value2 == null) {
-             return false;
-         }
-         // MethodHandle.invokeExact is declared to throw Throwable
-         return (boolean) equals.invokeExact(value1, value2);
-     }
-     // ...and so on for each native container type
- }
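The MethodHandle call in the function above can be tried out with only the JDK. In the sketch below, MethodHandleEqualsSketch, longEquals and evaluate are hypothetical names; the handle is looked up by hand rather than injected by the engine, and boxed Long values stand in for a type whose native container is long. It illustrates two details of invokeExact: the call site must match the handle's signature exactly, and the call is declared to throw Throwable.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical JDK-only illustration; in Presto the engine supplies the handle.
final class MethodHandleEqualsSketch
{
    // Stand-in for a type's equals operator.
    static boolean longEquals(long left, long right)
    {
        return left == right;
    }

    // Same shape as the function above, written for boxed Long arguments.
    static boolean isEqualOrNull(MethodHandle equals, Long value1, Long value2)
            throws Throwable
    {
        if (value1 == null && value2 == null) {
            return true;
        }
        if (value1 == null || value2 == null) {
            return false;
        }
        // invokeExact needs the call site to be exactly (long, long)boolean,
        // hence the explicit unboxing and the cast on the result.
        return (boolean) equals.invokeExact(value1.longValue(), value2.longValue());
    }

    // Convenience wrapper so callers need not declare Throwable.
    static boolean evaluate(Long value1, Long value2)
    {
        try {
            MethodHandle equals = MethodHandles.lookup().findStatic(
                    MethodHandleEqualsSketch.class,
                    "longEquals",
                    MethodType.methodType(boolean.class, long.class, long.class));
            return isEqualOrNull(equals, value1, value2);
        }
        catch (Throwable e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args)
    {
        System.out.println(evaluate(1L, 1L));     // true
        System.out.println(evaluate(1L, null));   // false
        System.out.println(evaluate(null, null)); // true
    }
}
```

Passing the boxed values directly to invokeExact would compile but fail at runtime with a WrongMethodTypeException, because the call site type would then be (Long, Long)boolean.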
Another Scalar Function Example
The lowercaser function takes a single VARCHAR argument and returns a VARCHAR, which is the argument converted to lower case:
- public class ExampleStringFunction
- {
-     @ScalarFunction("lowercaser")
-     @Description("converts the string to lower case")
-     @SqlType(StandardTypes.VARCHAR)
-     public static Slice lowercaser(@SqlType(StandardTypes.VARCHAR) Slice slice)
-     {
-         String argument = slice.toStringUtf8();
-         return Slices.utf8Slice(argument.toLowerCase());
-     }
- }
Note that for most common string functions, including converting a string to lower case, the Slice library also provides implementations that work directly on the underlying byte[], which have much better performance. This function has no @SqlNullable annotations, meaning that if the argument is NULL, the result will automatically be NULL (the function will not be called).
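To illustrate why working on the bytes directly can be cheaper, here is a minimal JDK-only sketch. It is not the Slice library's implementation, and AsciiLowerCaseSketch and toLowerAscii are hypothetical names. It assumes UTF-8 input and folds only ASCII letters, which avoids decoding the bytes into a String and encoding the result again.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of byte-level case folding; ASCII letters only.
final class AsciiLowerCaseSketch
{
    // Lower-cases ASCII letters in a copy of the UTF-8 bytes. Bytes of
    // multi-byte sequences are negative in Java and are left untouched,
    // so non-ASCII characters are NOT case-folded here.
    static byte[] toLowerAscii(byte[] utf8)
    {
        byte[] result = utf8.clone();
        for (int i = 0; i < result.length; i++) {
            byte b = result[i];
            if (b >= 'A' && b <= 'Z') {
                result[i] = (byte) (b + ('a' - 'A'));
            }
        }
        return result;
    }

    public static void main(String[] args)
    {
        byte[] input = "Hello PRESTO".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(toLowerAscii(input), StandardCharsets.UTF_8)); // hello presto
    }
}
```

The String conversions in main exist only to print the result; the folding itself never leaves the byte array.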
Aggregation Function Implementation
Aggregation functions use a similar framework to scalar functions, but are a bit more complex.
AccumulatorState:
All aggregation functions accumulate input rows into a state object; this object must implement AccumulatorState. For simple aggregations, just extend AccumulatorState into a new interface with the getters and setters you want, and the framework will generate all the implementations and serializers for you. If you need a more complex state object, you will need to implement AccumulatorStateFactory and AccumulatorStateSerializer and provide these via the AccumulatorStateMetadata annotation.
The following code implements the aggregation function avg_double which computes the average of a DOUBLE column:
- @AggregationFunction("avg_double")
- public class AverageAggregation
- {
-     @InputFunction
-     public static void input(LongAndDoubleState state, @SqlType(StandardTypes.DOUBLE) double value)
-     {
-         state.setLong(state.getLong() + 1);
-         state.setDouble(state.getDouble() + value);
-     }
-     @CombineFunction
-     public static void combine(LongAndDoubleState state, LongAndDoubleState otherState)
-     {
-         state.setLong(state.getLong() + otherState.getLong());
-         state.setDouble(state.getDouble() + otherState.getDouble());
-     }
-     @OutputFunction(StandardTypes.DOUBLE)
-     public static void output(LongAndDoubleState state, BlockBuilder out)
-     {
-         long count = state.getLong();
-         if (count == 0) {
-             out.appendNull();
-         }
-         else {
-             double value = state.getDouble();
-             DOUBLE.writeDouble(out, value / count);
-         }
-     }
- }
The average has two parts: the sum of the DOUBLE values in each row of the column and the long count of the number of rows seen. LongAndDoubleState is an interface which extends AccumulatorState:
- public interface LongAndDoubleState
-         extends AccumulatorState
- {
-     long getLong();
-     void setLong(long value);
-     double getDouble();
-     void setDouble(double value);
- }
As stated above, for simple AccumulatorState objects, it is sufficient to just define the interface with the getters and setters, and the framework will generate the implementation for you.
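The idea of deriving an implementation from nothing but getters and setters can be demonstrated with the JDK's java.lang.reflect.Proxy. This is an illustration only: the text above does not say how Presto generates its implementations, and Proxy is swapped in here purely because it is available without extra dependencies. GeneratedStateSketch and newState are hypothetical names, and the interface omits the AccumulatorState supertype so the example stays self-contained.

```java
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not how the framework is documented to work.
final class GeneratedStateSketch
{
    // Mirrors the LongAndDoubleState interface above, minus the Presto supertype.
    interface LongAndDoubleState
    {
        long getLong();
        void setLong(long value);
        double getDouble();
        void setDouble(double value);
    }

    // Builds an implementation at runtime: setX stores a field, getX reads it back.
    static LongAndDoubleState newState()
    {
        Map<String, Object> fields = new HashMap<>();
        fields.put("Long", 0L);      // initial value returned by getLong()
        fields.put("Double", 0.0);   // initial value returned by getDouble()
        return (LongAndDoubleState) Proxy.newProxyInstance(
                LongAndDoubleState.class.getClassLoader(),
                new Class<?>[] {LongAndDoubleState.class},
                (proxy, method, arguments) -> {
                    String name = method.getName();
                    if (name.startsWith("set")) {
                        fields.put(name.substring(3), arguments[0]);
                        return null;
                    }
                    if (name.startsWith("get")) {
                        return fields.get(name.substring(3));
                    }
                    throw new UnsupportedOperationException(name);
                });
    }

    public static void main(String[] args)
    {
        LongAndDoubleState state = newState();
        state.setLong(state.getLong() + 2);
        state.setDouble(state.getDouble() + 5.0);
        System.out.println(state.getDouble() / state.getLong()); // 2.5
    }
}
```

No class implementing LongAndDoubleState is written by hand here, which is the point the paragraph above makes about simple state objects.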
An in-depth look at the various annotations relevant to writing an aggregation function follows:
@InputFunction:
The @InputFunction annotation declares the function which accepts input rows and stores them in the AccumulatorState. Similar to scalar functions, you must annotate the arguments with @SqlType. Note that, unlike in the above scalar example where Slice is used to hold VARCHAR, the primitive double type is used for the argument to input. In this example, the input function simply keeps track of the running count of rows (via setLong()) and the running sum (via setDouble()).
@CombineFunction:
The @CombineFunction annotation declares the function used to combine two state objects. This function is used to merge all the partial aggregation states. It takes two state objects, and merges the results into the first one (in the above example, just by adding them together).
@OutputFunction:
The @OutputFunction annotation declares the last function called when computing an aggregation. It takes the final state object (the result of merging all partial states) and writes the result to a BlockBuilder.
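The order in which the three functions run can be traced with a small JDK-only simulation. Everything in it is a stand-in: AverageLifecycleSketch, its State class and the average driver are hypothetical, each input array plays the role of the rows seen by one worker, and output returns a boxed Double (null for no rows) instead of writing to a BlockBuilder.

```java
// Hypothetical simulation of input -> combine -> output; not Presto code.
final class AverageLifecycleSketch
{
    // Hand-written stand-in for the generated state implementation.
    static final class State
    {
        long count;
        double sum;
    }

    // Plays the role of the @InputFunction: fold one row into a state.
    static void input(State state, double value)
    {
        state.count++;
        state.sum += value;
    }

    // Plays the role of the @CombineFunction: merge other into state.
    static void combine(State state, State other)
    {
        state.count += other.count;
        state.sum += other.sum;
    }

    // Plays the role of the @OutputFunction: null stands in for SQL NULL.
    static Double output(State state)
    {
        if (state.count == 0) {
            return null;
        }
        return state.sum / state.count;
    }

    // Driver: one partial state per partition, merged into a final state.
    static Double average(double[]... partitions)
    {
        State merged = new State();
        for (double[] partition : partitions) {
            State partial = new State();
            for (double value : partition) {
                input(partial, value);
            }
            combine(merged, partial);
        }
        return output(merged);
    }

    public static void main(String[] args)
    {
        System.out.println(average(new double[] {1.0, 2.0}, new double[] {3.0, 6.0})); // 3.0
        System.out.println(average());                                                 // null
    }
}
```

Keeping the count and the sum in the state, rather than a running average, is what lets combine merge partial results by simple addition.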
- Where does serialization happen, and what is GroupedAccumulatorState?
The @InputFunction is usually run on a different worker from the @CombineFunction, so the state objects are serialized and transported between these workers by the aggregation framework. GroupedAccumulatorState is used when performing a GROUP BY aggregation, and an implementation will be automatically generated for you if you don’t specify an AccumulatorStateFactory