UDF Function
UDF stands for user-defined function. Hive ships with built-in functions such as count() and sum(). When data processing needs a function that is not built in, you can write it yourself as a UDF.
Overall steps for a jar package UDF
- Write the UDF class locally and package it as a jar file
- [Scriptis >> Workspace] Upload the jar to the corresponding directory in the workspace
- [Management Console >> UDF Function] Create the UDF (loaded by default)
- Use it in task code (takes effect only in newly started engines)
Step1 Writing jar packages locally
Hive UDF Example:
- Add the Hive dependency
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>3.1.3</version>
    </dependency>
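- Create the UDF class
    import org.apache.hadoop.hive.ql.exec.UDF;

    // Simple Hive UDF that adds 1 to an integer value.
    public class UDFExample extends UDF {
        // Hive calls evaluate() once per row; a NULL input yields NULL.
        public Integer evaluate(Integer value) {
            return value == null ? null : value + 1;
        }
    }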
- Package
    mvn package
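Before packaging, it can help to pin down what evaluate is expected to return, especially for NULL input. The following is a plain-Python model of the same logic, not Hive code; the name evaluate_model is hypothetical and exists only for this illustration.

```python
from typing import Optional


def evaluate_model(value: Optional[int]) -> Optional[int]:
    """Model of UDFExample.evaluate: NULL in, NULL out; otherwise add 1."""
    return None if value is None else value + 1


# Sample column values, with None standing in for SQL NULL.
rows = [1, None, 41]
results = [evaluate_model(v) for v in rows]
print(results)
```

The NULL case is the one most often forgotten: without the explicit check, the Java method would throw a NullPointerException when unboxing a null Integer.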
Step2 [Scriptis >> Workspace] Upload the jar package
Select the corresponding folder, right-click it, and choose Upload.
Step3 [Management Console >> UDF Function] Create UDF
- Function name: any name that conforms to the naming rules, such as test_udf_jar; this is the name used in SQL and other scripts
- Function Type: General
- Script path: select the shared directory path where the jar package is stored, such as ../..//wds_functions_1_0_0.jar
- Registration format: the fully qualified class name (package name + class name), such as com.webank.wedatasphere.willink.bdp.udf.ToUpperCase
- Usage format: the input and return types must be consistent with the definition in the jar package
- Classification: choose from the drop-down list, or enter a custom directory (which creates a new directory at the target level under Personal Functions)
Note: a newly created UDF is loaded by default. It is listed on the [Scriptis >> UDF Functions] page, so you can look it up while editing Scriptis tasks. A checked UDF is loaded and available for use.
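The registration format of a jar UDF is a package name plus a class name. A quick sanity check on that shape before submitting the form might look like the sketch below. The helper name looks_like_fqcn and the regular expression are assumptions made for illustration; they are not part of DSS or Scriptis, and passing the check does not prove the class exists in the jar.

```python
import re

# One or more dotted package segments followed by a class name.
_FQCN = re.compile(r"^(?:[A-Za-z_$][\w$]*\.)+[A-Za-z_$][\w$]*$")


def looks_like_fqcn(registration: str) -> bool:
    """Return True if the string has the shape package.name.ClassName."""
    return _FQCN.match(registration) is not None


print(looks_like_fqcn("com.webank.wedatasphere.willink.bdp.udf.ToUpperCase"))
print(looks_like_fqcn("ToUpperCase"))  # class name only, package missing
```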
Step4 Use the UDF
Use the UDF created in the steps above in task code. The function name is the one entered under [Create UDF].
In pyspark:
    print(sqlContext.sql("select test_udf_jar(name1) from stacyyan_ind.result_sort_1_20200226").collect())
Overall steps for a Scala script UDF
- Create a new Spark script file in the desired directory under [Scriptis >> Workspace]
- Create the UDF in [Management Console >> UDF Function] (loaded by default)
- Use it in task code (takes effect only in newly started engines)
Step1 dss-scriptis: create a new Scala script
    def helloWorld(str: String): String = "hello, " + str
Step2 Create UDF
- Function name: any name that conforms to the naming rules, such as test_udf_scala
- Function type: spark
- Script path: ../..// B
- Registration format: must match the name of the function defined in the script exactly, such as helloWorld; the input and return types must be consistent with the definition
- Classification: select from the drop-down list a first-level directory that exists under dss-scriptis UDF Function > Personal Functions, or enter a custom directory (which creates a new directory at the target level under Personal Functions)
Step3 Use the UDF
Use the UDF created in the steps above in task code. The function name is the one entered under [Create UDF].
- In scala
    val s = sqlContext.sql("select test_udf_scala(name1) from stacyyan_ind.result_sort_1_20200226")
    show(s)
- In pyspark
    print(sqlContext.sql("select test_udf_scala(name1) from stacyyan_ind.result_sort_1_20200226").collect())
- In sql
    select test_udf_scala(name1) from stacyyan_ind.result_sort_1_20200226;
Overall steps for a Python script UDF
- Create a new Python script file in the desired directory under [Scriptis >> Workspace]
- Create the UDF in [Management Console >> UDF Function] (loaded by default)
- Use it in task code (takes effect only in newly started engines)
Step1 dss-scriptis: create a new pyspark script
    def addition(a, b):
        return a + b
Step2 Create UDF
- Function name: any name that conforms to the naming rules, such as test_udf_py
- Function type: spark
- Script path: ../..// A
- Registration format: must match the name of the function defined in the script exactly, such as addition
- Usage format: the input and return types must be consistent with the definition
- Classification: select from the drop-down list a first-level directory that exists under dss-scriptis UDF Function > Personal Functions, or enter a custom directory (which creates a new directory at the target level under Personal Functions)
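Because the registration format has to match the function name in the script exactly, a single mistyped letter is enough to break the UDF. One way to check this locally is to list the functions a script actually defines, as in the sketch below. The helper defined_functions is a hypothetical name, and the script text is inlined here only for illustration.

```python
import ast


def defined_functions(script_source: str) -> set:
    """Return the names of all top-level functions defined in a Python script."""
    tree = ast.parse(script_source)
    return {node.name for node in tree.body if isinstance(node, ast.FunctionDef)}


script = "def addition(a, b):\n    return a + b\n"
names = defined_functions(script)

print("addition" in names)  # the registration name that matches the script
print("addation" in names)  # a one-letter typo does not match
```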
Step3 Use the UDF
Use the UDF created in the steps above in task code. The function name is the one entered under [Create UDF].
- In pyspark
    print(sqlContext.sql("select test_udf_py(pv,impression) from neiljianliu_ind.alias where entityid=504059 limit 50").collect())
- In sql
    select test_udf_py(pv,impression) from neiljianliu_ind.alias where entityid=504059 limit 50
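Columns passed to a Python UDF can contain NULL, which reaches the function as None, and a + b then raises a TypeError. The sketch below shows a null-safe variant of the function from Step1. The name addition_null_safe is hypothetical, and returning None for NULL input is one possible convention, not something the source prescribes.

```python
def addition(a, b):
    """The function from Step1: fails if either argument is None."""
    return a + b


def addition_null_safe(a, b):
    """Return None when either input is None (SQL NULL); otherwise add."""
    if a is None or b is None:
        return None
    return a + b


print(addition_null_safe(3, 4))
print(addition_null_safe(None, 4))
```

If the null-safe variant is registered instead, the registration format must use its name, since it has to match the function defined in the script.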
Overall steps for a Scala custom function
- Create a new Spark Scala script file in the desired directory under [Scriptis >> Workspace]
- Create the function in [Management Console >> UDF Function] (loaded by default)
- Use it in task code (takes effect only in newly started engines)
Step1 dss-scriptis: create a new Scala script
    def hellozdy(str: String): String = "hellozdy,haha " + str
Step2 Create the function
- Function name: must be strictly consistent with the defined function name, such as hellozdy
- Function type: Custom Function
- Script path: ../..// D
- Usage format: the input and return types must be consistent with the definition
- Classification: select from the drop-down list a first-level directory that exists under dss-scriptis Method Function > Personal Functions, or enter a custom directory (which creates a new directory at the target level under Personal Functions)
Step3 Use the function
Use the function created in the steps above in task code. The function name is the one entered under [Create UDF].
    val a = hellozdy("abcd");
    print(a)
"FAILED: SemanticException [Error 10011]: Invalid function xxxx"
First, check that the UDF configuration is correct:
- The registration format must be the function's path name.
- On the Scriptis UDF Functions page, check whether the function is checked for loading. An unchecked UDF is not loaded when the engine starts.
- Check whether the engine has loaded the UDF. If not, start another engine or restart the current one. Note: UDFs are loaded only when the engine is initialized. A UDF added while an engine is running is not detected or loaded by that engine.
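The checks above can be written down as a small decision routine. The sketch below is a simplified model: the UdfCheck fields and the diagnose function are hypothetical names, and the timestamps stand in for information you would gather by hand. It does not call any real Scriptis or engine API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class UdfCheck:
    """Hypothetical snapshot of the facts the checklist asks about."""
    registration_matches: bool   # registration format equals the function's path name
    load_checked: bool           # function is checked on the Scriptis UDF page
    udf_created_at: datetime     # when the UDF was created
    engine_started_at: datetime  # when the current engine was initialized


def diagnose(check: UdfCheck) -> List[str]:
    """Return the checklist items that could explain 'Invalid function'."""
    hints = []
    if not check.registration_matches:
        hints.append("fix the registration format")
    if not check.load_checked:
        hints.append("check the function for loading")
    if check.udf_created_at > check.engine_started_at:
        # UDFs are loaded only at engine initialization.
        hints.append("restart the engine or start a new one")
    return hints


example = UdfCheck(
    registration_matches=True,
    load_checked=True,
    udf_created_at=datetime(2020, 2, 26, 10, 30),
    engine_started_at=datetime(2020, 2, 26, 9, 0),
)
print(diagnose(example))
```

In this example the configuration is correct, but the UDF was created after the engine started, so the only remaining hint is to restart the engine or start a new one.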