Scalar User Defined Functions (UDFs)

Description

User-Defined Functions (UDFs) are user-programmable routines that act on one row. This documentation lists the classes that are required for creating and registering UDFs. It also contains examples that demonstrate how to define and register UDFs and invoke them in Spark SQL.

UserDefinedFunction

A user-defined function's properties can be configured with the methods defined in this class. Note that each method returns a new UserDefinedFunction rather than modifying the one it is called on, so the result must be captured or registered.

  • asNonNullable(): UserDefinedFunction

    Returns a copy of this UDF whose output is marked as non-nullable.

  • asNondeterministic(): UserDefinedFunction

    Returns a copy of this UDF marked as nondeterministic, i.e. it may return different results for the same input.

  • withName(name: String): UserDefinedFunction

    Returns a copy of this UDF with the given name assigned.
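These methods can be chained. As a minimal sketch (the local master, the object wrapper, and the name "plus_one" are illustrative assumptions, not part of the examples below):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object UdfBuilderSketch {
  def main(args: Array[String]): Unit = {
    // Local master for illustration only; a real job usually gets its
    // master from spark-submit.
    val spark = SparkSession.builder()
      .appName("UDF builder methods sketch")
      .master("local[*]")
      .getOrCreate()

    // Each builder method returns a new UserDefinedFunction, so chain
    // them and register (or keep) the final result.
    val plusOne = udf((x: Int) => x + 1)
      .withName("plus_one")   // name used in column names and query plans
      .asNonNullable()        // declare that the result is never null

    spark.udf.register("plus_one", plusOne)
    spark.sql("SELECT plus_one(41)").show()

    spark.stop()
  }
}
```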

Examples

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions.udf

  val spark = SparkSession
    .builder()
    .appName("Spark SQL UDF scalar example")
    .getOrCreate()

  // Define and register a zero-argument non-deterministic UDF.
  // A UDF is deterministic by default, i.e. it produces the same result for the same input.
  val random = udf(() => Math.random())
  spark.udf.register("random", random.asNondeterministic())
  spark.sql("SELECT random()").show()
  // +-------+
  // |  UDF()|
  // +-------+
  // |xxxxxxx|
  // +-------+

  // Define and register a one-argument UDF
  val plusOne = udf((x: Int) => x + 1)
  spark.udf.register("plusOne", plusOne)
  spark.sql("SELECT plusOne(5)").show()
  // +------+
  // |UDF(5)|
  // +------+
  // |     6|
  // +------+

  // Define a two-argument UDF and register it with Spark in one step
  spark.udf.register("strLenScala", (_: String).length + (_: Int))
  spark.sql("SELECT strLenScala('test', 1)").show()
  // +--------------------+
  // |strLenScala(test, 1)|
  // +--------------------+
  // |                   5|
  // +--------------------+

  // UDF in a WHERE clause
  spark.udf.register("oneArgFilter", (n: Int) => { n > 5 })
  spark.range(1, 10).createOrReplaceTempView("test")
  spark.sql("SELECT * FROM test WHERE oneArgFilter(id)").show()
  // +---+
  // | id|
  // +---+
  // |  6|
  // |  7|
  // |  8|
  // |  9|
  // +---+

Find full example code at “examples/src/main/scala/org/apache/spark/examples/sql/UserDefinedScalar.scala” in the Spark repo.

  import org.apache.spark.sql.*;
  import org.apache.spark.sql.api.java.UDF1;
  import org.apache.spark.sql.expressions.UserDefinedFunction;
  import static org.apache.spark.sql.functions.udf;
  import org.apache.spark.sql.types.DataTypes;

  SparkSession spark = SparkSession
    .builder()
    .appName("Java Spark SQL UDF scalar example")
    .getOrCreate();

  // Define and register a zero-argument non-deterministic UDF.
  // A UDF is deterministic by default, i.e. it produces the same result for the same input.
  UserDefinedFunction random = udf(
    () -> Math.random(), DataTypes.DoubleType
  );
  // asNondeterministic() returns a new UserDefinedFunction, so register its result
  spark.udf().register("random", random.asNondeterministic());
  spark.sql("SELECT random()").show();
  // +-------+
  // |  UDF()|
  // +-------+
  // |xxxxxxx|
  // +-------+

  // Define and register a one-argument UDF
  spark.udf().register("plusOne",
    (UDF1<Integer, Integer>) x -> x + 1, DataTypes.IntegerType);
  spark.sql("SELECT plusOne(5)").show();
  // +----------+
  // |plusOne(5)|
  // +----------+
  // |         6|
  // +----------+

  // Define and register a two-argument UDF
  UserDefinedFunction strLen = udf(
    (String s, Integer x) -> s.length() + x, DataTypes.IntegerType
  );
  spark.udf().register("strLen", strLen);
  spark.sql("SELECT strLen('test', 1)").show();
  // +------------+
  // |UDF(test, 1)|
  // +------------+
  // |           5|
  // +------------+

  // UDF in a WHERE clause
  spark.udf().register("oneArgFilter",
    (UDF1<Long, Boolean>) x -> x > 5, DataTypes.BooleanType);
  spark.range(1, 10).createOrReplaceTempView("test");
  spark.sql("SELECT * FROM test WHERE oneArgFilter(id)").show();
  // +---+
  // | id|
  // +---+
  // |  6|
  // |  7|
  // |  8|
  // |  9|
  // +---+

Find full example code at “examples/src/main/java/org/apache/spark/examples/sql/JavaUserDefinedScalar.java” in the Spark repo.