Hive Engine

This article mainly introduces the installation, usage and configuration of the Hive engine plugin in Linkis.

If you want to use the Hive engine on your server, make sure the following environment variables are set correctly and are available to the engine startup user.

It is strongly recommended that you verify these environment variables for the executing user before running Hive tasks.

| Environment variable name | Environment variable content | Remarks |
|---------------------------|------------------------------|---------|
| JAVA_HOME                 | JDK installation path        | Required |
| HADOOP_HOME               | Hadoop installation path     | Required |
| HADOOP_CONF_DIR           | Hadoop configuration path    | Required |
| HIVE_CONF_DIR             | Hive configuration path      | Required |
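For a quick sanity check, a sketch like the following confirms that the variables are visible to the engine startup user:

```shell
# print the four required variables; any empty value indicates a missing setting
echo "JAVA_HOME: ${JAVA_HOME}"
echo "HADOOP_HOME: ${HADOOP_HOME}"
echo "HADOOP_CONF_DIR: ${HADOOP_CONF_DIR}"
echo "HIVE_CONF_DIR: ${HIVE_CONF_DIR}"
```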
You can further verify the configuration by connecting with the Hive client:

```shell
# connect to hive
bin/hive

# test command
show databases;

# Being able to connect successfully and list the databases means the environment is configured correctly
hive (default)> show databases;
OK
databases_name
default
```

The binary installation package released by Linkis includes the Hive engine plugin by default, so users do not need to install it separately.

Hive 1.x and 2.x are supported. The default mode is Hive on MapReduce. Linkis is also compatible with Hive on Tez; switching to it requires the following steps:

- Copy the Tez-related dependencies to `{LINKIS_HOME}/lib/linkis-engineconn-plugins/hive/dist/3.1.3/lib` (note: the `dist` directory, not the `plugin` directory). Alternatively, you can modify the hive EC pom to add the Tez dependencies at compile time. See the sketch after this list.
- Edit `{LINKIS_HOME}/lib/linkis-engineconn-plugins/hive/dist/3.1.3/conf/linkis-engineconn.properties` and set `linkis.hive.engine.type=tez`.
- Restart the manager: `sh linkis-daemon.sh restart linkis-cg-manager`
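A minimal sketch of the steps above, assuming Tez is installed under /usr/local/tez and LINKIS_HOME points at the Linkis installation directory (both are illustrative):

```shell
# 1. copy the Tez dependencies into the dist lib directory (not the plugin directory)
cp /usr/local/tez/*.jar ${LINKIS_HOME}/lib/linkis-engineconn-plugins/hive/dist/3.1.3/lib/

# 2. switch the engine type to tez (assumes the key is not already present in the file)
echo "linkis.hive.engine.type=tez" >> ${LINKIS_HOME}/lib/linkis-engineconn-plugins/hive/dist/3.1.3/conf/linkis-engineconn.properties

# 3. restart the manager so the change takes effect
sh linkis-daemon.sh restart linkis-cg-manager
```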

The Hive version supported by default is 3.1.3. If you want to use a different Hive version, find the linkis-engineplugin-hive module, modify the `<hive.version>` tag, and then compile this module separately.
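A minimal sketch, assuming the Linkis source tree is checked out locally, the hive engine module lives under `linkis-engineconn-plugins/hive` (the path may differ across Linkis versions), and 2.3.3 is an illustrative target version:

```shell
cd linkis-engineconn-plugins/hive
# point <hive.version> in pom.xml at the target version
sed -i 's#<hive.version>3.1.3</hive.version>#<hive.version>2.3.3</hive.version>#' pom.xml
# compile this module separately
mvn clean install -DskipTests
```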

For how to install the compiled engine plugin, refer to: EngineConnPlugin engine plugin installation

```shell
sh ./bin/linkis-cli -engineType hive-3.1.3 \
    -codeType hql -code "show databases" \
    -submitUser hadoop -proxyUser hadoop
```

For more Linkis-Cli command parameters, refer to: Linkis-Cli usage

Linkis provides Java and Scala SDKs for submitting tasks to the Linkis server. For details, refer to the JAVA SDK Manual. For Hive tasks, you only need to modify the EngineConnType and CodeType parameters in the demo:

```java
Map<String, Object> labels = new HashMap<String, Object>();
labels.put(LabelKeyConstant.ENGINE_TYPE_KEY, "hive-3.1.3"); // required engineType Label
labels.put(LabelKeyConstant.USER_CREATOR_TYPE_KEY, "hadoop-IDE"); // required execute user and creator
labels.put(LabelKeyConstant.CODE_TYPE_KEY, "hql"); // required codeType
```
| Configuration | Default | Required | Description |
|---------------|---------|----------|-------------|
| wds.linkis.rm.instance | 10 | no | Maximum engine concurrency |
| wds.linkis.engineconn.java.driver.memory | 1g | no | Engine initialization memory size |
| wds.linkis.engineconn.max.free.time | 1h | no | Engine idle exit time |

Hive's MapReduce tasks require yarn resources, so a queue needs to be set.

(Figure: yarn queue configuration)
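A queue can also be passed per task on submission. A sketch, assuming the `-confMap` option is available in your Linkis-Cli version, `wds.linkis.rm.yarnqueue` is the queue key in your deployment, and `dws` is a queue that exists on your cluster:

```shell
sh ./bin/linkis-cli -engineType hive-3.1.3 \
    -codeType hql -code "show databases" \
    -confMap wds.linkis.rm.yarnqueue=dws \
    -submitUser hadoop -proxyUser hadoop
```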

If the default parameters do not meet your needs, some basic parameters can be configured in the following ways.

(Figure: hive engine configuration in the management console)

Note: After modifying the configuration under the IDE tag, you need to specify `-creator IDE` for it to take effect (other tags are similar), for example:

```shell
sh ./bin/linkis-cli -creator IDE \
    -engineType hive-3.1.3 -codeType hql \
    -code "show databases" \
    -submitUser hadoop -proxyUser hadoop
```

When submitting a task through the task interface, configuration is passed via the `params.configuration.runtime` parameter.

Example of HTTP request parameters:

```json
{
    "executionContent": {"code": "show databases;", "runType": "sql"},
    "params": {
        "variable": {},
        "configuration": {
            "runtime": {
                "wds.linkis.rm.instance": "10"
            }
        }
    },
    "labels": {
        "engineType": "hive-3.1.3",
        "userCreator": "hadoop-IDE"
    }
}
```
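A sketch of submitting this body with curl, assuming the gateway listens on 127.0.0.1:9001 and token authentication is enabled (the token value may differ in your deployment):

```shell
curl -X POST 'http://127.0.0.1:9001/api/rest_j/v1/entrance/submit' \
  -H 'Content-Type: application/json' \
  -H 'Token-Code: test-token' \
  -H 'Token-User: hadoop' \
  -d '{
        "executionContent": {"code": "show databases;", "runType": "sql"},
        "params": {"configuration": {"runtime": {"wds.linkis.rm.instance": "10"}}},
        "labels": {"engineType": "hive-3.1.3", "userCreator": "hadoop-IDE"}
      }'
```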

Linkis manages engines through tags. The database tables involved are as follows.

1. linkis_ps_configuration_config_key: the keys and default values of the engine's configuration parameters
2. linkis_cg_manager_label: the engine label, such as hive-3.1.3
3. linkis_ps_configuration_category: the directory association of the engine configuration
4. linkis_ps_configuration_config_value: the configuration values to be displayed for the engine
5. linkis_ps_configuration_key_engine_relation: the relationship between configuration items and the engine

The initial engine-related data in these tables is as follows:

```sql
-- set variable
SET @HIVE_LABEL="hive-3.1.3";
SET @HIVE_ALL=CONCAT('*-*,',@HIVE_LABEL);
SET @HIVE_IDE=CONCAT('*-IDE,',@HIVE_LABEL);

-- engine label
insert into `linkis_cg_manager_label` (`label_key`, `label_value`, `label_feature`, `label_value_size`, `update_time`, `create_time`) VALUES ('combined_userCreator_engineType', @HIVE_ALL, 'OPTIONAL', 2, now(), now());
insert into `linkis_cg_manager_label` (`label_key`, `label_value`, `label_feature`, `label_value_size`, `update_time`, `create_time`) VALUES ('combined_userCreator_engineType', @HIVE_IDE, 'OPTIONAL', 2, now(), now());
select @label_id := id from linkis_cg_manager_label where `label_value` = @HIVE_IDE;
insert into linkis_ps_configuration_category (`label_id`, `level`) VALUES (@label_id, 2);

-- configuration key
INSERT INTO `linkis_ps_configuration_config_key` (`key`, `description`, `name`, `default_value`, `validate_type`, `validate_range`, `is_hidden`, `is_advanced`, `level`, `treeName`, `engine_conn_type`) VALUES ('wds.linkis.rm.instance', 'Range: 1-20, unit: instance', 'hive engine maximum concurrency', '10', 'NumInterval', '[1,20]', '0', '0', '1', 'Queue resources', 'hive');
INSERT INTO `linkis_ps_configuration_config_key` (`key`, `description`, `name`, `default_value`, `validate_type`, `validate_range`, `is_hidden`, `is_advanced`, `level`, `treeName`, `engine_conn_type`) VALUES ('wds.linkis.engineconn.java.driver.memory', 'Value range: 1-10, unit: G', 'hive engine initialization memory size', '1g', 'Regex', '^([1-9]|10)(G|g)$', '0', '0', '1', 'hive engine settings', 'hive');
INSERT INTO `linkis_ps_configuration_config_key` (`key`, `description`, `name`, `default_value`, `validate_type`, `validate_range`, `is_hidden`, `is_advanced`, `level`, `treeName`, `engine_conn_type`) VALUES ('hive.client.java.opts', 'hive client process parameters', 'jvm parameters when the hive engine starts', '', 'None', NULL, '1', '1', '1', 'hive engine settings', 'hive');
INSERT INTO `linkis_ps_configuration_config_key` (`key`, `description`, `name`, `default_value`, `validate_type`, `validate_range`, `is_hidden`, `is_advanced`, `level`, `treeName`, `engine_conn_type`) VALUES ('mapred.reduce.tasks', 'Range: -1-10000, unit: number', 'number of reduce tasks', '-1', 'NumInterval', '[-1,10000]', '0', '1', '1', 'hive resource settings', 'hive');
INSERT INTO `linkis_ps_configuration_config_key` (`key`, `description`, `name`, `default_value`, `validate_type`, `validate_range`, `is_hidden`, `is_advanced`, `level`, `treeName`, `engine_conn_type`) VALUES ('wds.linkis.engineconn.max.free.time', 'Value range: 3m,15m,30m,1h,2h', 'Engine idle exit time', '1h', 'OFT', '[\"1h\",\"2h\",\"30m\",\"15m\",\"3m\"]', '0', '0', '1', 'hive engine settings', 'hive');

-- key engine relation
insert into `linkis_ps_configuration_key_engine_relation` (`config_key_id`, `engine_type_label_id`)
(select config.id as `config_key_id`, label.id AS `engine_type_label_id` FROM linkis_ps_configuration_config_key config
INNER JOIN linkis_cg_manager_label label ON config.engine_conn_type = 'hive' and label_value = @HIVE_ALL);

-- engine default configuration
insert into `linkis_ps_configuration_config_value` (`config_key_id`, `config_value`, `config_label_id`)
(select `relation`.`config_key_id` AS `config_key_id`, '' AS `config_value`, `relation`.`engine_type_label_id` AS `config_label_id` FROM linkis_ps_configuration_key_engine_relation relation
INNER JOIN linkis_cg_manager_label label ON relation.engine_type_label_id = label.id AND label.label_value = @HIVE_ALL);
```
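A sketch for verifying the seeded records, assuming the Linkis metadata database and user are both named `linkis` (adjust for your deployment):

```shell
# list the hive configuration keys and their default values
mysql -u linkis -p -D linkis -e \
  "SELECT \`key\`, default_value FROM linkis_ps_configuration_config_key WHERE engine_conn_type = 'hive';"
```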

By default, the log interface does not display the application_id or the number of completed tasks; users can enable this output as needed. The code blocks that need to be modified in the engine's log4j2-engineconn.xml / log4j2.xml configuration file are as follows.

1. Add the following under the appenders component:

```xml
<Send name="SendPackage" >
    <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} %-5level [%t] %logger{36} %L %M - %msg%xEx%n"/>
</Send>
```

2. Add the following under the root component:

```xml
<appender-ref ref="SendPackage"/>
```

3. Add the following under the loggers component:

```xml
<logger name="org.apache.hadoop.hive.ql.exec.StatsTask" level="info" additivity="true">
    <appender-ref ref="SendPackage"/>
</logger>
```

After making the above modifications, the log will include task progress information, displayed in the following style:

```
2022-04-08 11:06:50.228 INFO [Linkis-Default-Scheduler-Thread-3] SessionState 1111 printInfo - Status: Running (Executing on YARN cluster with App id application_1631114297082_432445)
2022-04-08 11:06:50.248 INFO [Linkis-Default-Scheduler-Thread-3] SessionState 1111 printInfo - Map 1: -/- Reducer 2: 0/1
2022-04-08 11:06:52.417 INFO [Linkis-Default-Scheduler-Thread-3] SessionState 1111 printInfo - Map 1: 0/1 Reducer 2: 0/1
2022-04-08 11:06:55.060 INFO [Linkis-Default-Scheduler-Thread-3] SessionState 1111 printInfo - Map 1: 0(+1)/1 Reducer 2: 0/1
2022-04-08 11:06:57.495 INFO [Linkis-Default-Scheduler-Thread-3] SessionState 1111 printInfo - Map 1: 1/1 Reducer 2: 0(+1)/1
2022-04-08 11:06:57.899 INFO [Linkis-Default-Scheduler-Thread-3] SessionState 1111 printInfo - Map 1: 1/1 Reducer 2: 1/1
```

An example of a complete xml configuration file is as follows:

```xml
<!--
  ~ Copyright 2019 WeBank
  ~
  ~ Licensed under the Apache License, Version 2.0 (the "License");
  ~ you may not use this file except in compliance with the License.
  ~ You may obtain a copy of the License at
  ~
  ~ http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing, software
  ~ distributed under the License is distributed on an "AS IS" BASIS,
  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  ~ See the License for the specific language governing permissions and
  ~ limitations under the License.
  -->
<configuration status="error" monitorInterval="30">
    <appenders>
        <Console name="Console" target="SYSTEM_OUT">
            <ThresholdFilter level="INFO" onMatch="ACCEPT" onMismatch="DENY"/>
            <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} %-5level [%t] %logger{36} %L %M - %msg%xEx%n"/>
        </Console>
        <Send name="Send" >
            <Filters>
                <ThresholdFilter level="WARN" onMatch="ACCEPT" onMismatch="DENY" />
            </Filters>
            <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} %-5level [%t] %logger{36} %L %M - %msg%xEx%n"/>
        </Send>
        <Send name="SendPackage" >
            <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} %-5level [%t] %logger{36} %L %M - %msg%xEx%n"/>
        </Send>
        <Console name="stderr" target="SYSTEM_ERR">
            <ThresholdFilter level="ERROR" onMatch="ACCEPT" onMismatch="DENY" />
            <PatternLayout pattern="%d{HH:mm:ss.SSS} %-5level %class{36} %L %M - %msg%xEx%n"/>
        </Console>
    </appenders>
    <loggers>
        <root level="INFO">
            <appender-ref ref="stderr"/>
            <appender-ref ref="Console"/>
            <appender-ref ref="Send"/>
            <appender-ref ref="SendPackage"/>
        </root>
        <logger name="org.apache.hadoop.hive.ql.exec.StatsTask" level="info" additivity="true">
            <appender-ref ref="SendPackage"/>
        </logger>
        <logger name="org.springframework.boot.diagnostics.LoggingFailureAnalysisReporter" level="error" additivity="true">
            <appender-ref ref="stderr"/>
        </logger>
        <logger name="com.netflix.discovery" level="warn" additivity="true">
            <appender-ref ref="Send"/>
        </logger>
        <logger name="org.apache.hadoop.yarn" level="warn" additivity="true">
            <appender-ref ref="Send"/>
        </logger>
        <logger name="org.springframework" level="warn" additivity="true">
            <appender-ref ref="Send"/>
        </logger>
        <logger name="org.apache.linkis.server.security" level="warn" additivity="true">
            <appender-ref ref="Send"/>
        </logger>
        <logger name="org.apache.hadoop.hive.ql.exec.mr.ExecDriver" level="fatal" additivity="true">
            <appender-ref ref="Send"/>
        </logger>
        <logger name="org.apache.hadoop.hdfs.KeyProviderCache" level="fatal" additivity="true">
            <appender-ref ref="Send"/>
        </logger>
        <logger name="org.spark_project.jetty" level="ERROR" additivity="true">
            <appender-ref ref="Send"/>
        </logger>
        <logger name="org.eclipse.jetty" level="ERROR" additivity="true">
            <appender-ref ref="Send"/>
        </logger>
        <logger name="org.springframework" level="ERROR" additivity="true">
            <appender-ref ref="Send"/>
        </logger>
        <logger name="org.reflections.Reflections" level="ERROR" additivity="true">
            <appender-ref ref="Send"/>
        </logger>
        <logger name="org.apache.hadoop.ipc.Client" level="ERROR" additivity="true">
            <appender-ref ref="Send"/>
        </logger>
    </loggers>
</configuration>
```