Hive

This article introduces the installation, usage, and configuration of the Hive engine connector in Linkis.

If you want to use the Hive engine connector on your server, make sure the following environment variables are set correctly and that the user who starts the engine connector has access to them.

It is strongly recommended that you check these environment variables for the executing user before running any Hive task; a quick check is sketched below the table.

| Environment variable | Value | Remark |
| --- | --- | --- |
| JAVA_HOME | JDK installation path | Required |
| HADOOP_HOME | Hadoop installation path | Required |
| HADOOP_CONF_DIR | Hadoop configuration path | Required |
| HIVE_CONF_DIR | Hive configuration path | Required |
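A minimal check sketch, assuming a bash shell (the expected paths depend on your deployment):

```bash
# Print the required environment variables for the executing user;
# any empty value must be set before starting the Hive engine connector.
for var in JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR HIVE_CONF_DIR; do
  echo "${var}=$(printenv "${var}")"
done
```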
You can then verify that Hive itself works:

```bash
# connect to hive
bin/hive

# test command
show databases;

# if the connection succeeds and database information is printed, the environment is configured correctly
hive (default)> show databases;
OK
databases_name
default
```

The Hive engine plugin is included in the Linkis binary installation package by default, so users do not need to install it separately.

Hive 1.x and Hive 2.x are supported, and Hive on MapReduce is supported by default. If you want to switch to Hive on Tez, Linkis is compatible with Hive on Tez; the following steps are required (consolidated in the sketch after this list):

  • Copy the Tez-related dependencies into {LINKIS_HOME}/lib/linkis-engineconn-plugins/hive/dist/3.1.3/lib (note that this is the dist directory, not the plugin directory). Alternatively, you can add the Tez dependency to the hive engine connector's pom and recompile.
  • Edit {LINKIS_HOME}/lib/linkis-engineconn-plugins/hive/dist/3.1.3/conf/linkis-engineconn.properties and set linkis.hive.engine.type=tez
  • Restart the service: sh linkis-daemon.sh restart linkis-cg-manager
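The steps above, consolidated into a hedged shell sketch (TEZ_HOME is a hypothetical placeholder for your Tez installation; adjust paths to your deployment, and edit the property in place if the key already exists):

```bash
# 1. Copy the Tez dependencies into the hive engine connector's dist lib directory
cp ${TEZ_HOME}/*.jar ${LINKIS_HOME}/lib/linkis-engineconn-plugins/hive/dist/3.1.3/lib/

# 2. Switch the hive engine type to tez (assumes the key is not present yet)
echo "linkis.hive.engine.type=tez" >> ${LINKIS_HOME}/lib/linkis-engineconn-plugins/hive/dist/3.1.3/conf/linkis-engineconn.properties

# 3. Restart the manager service so the change takes effect
sh ${LINKIS_HOME}/sbin/linkis-daemon.sh restart linkis-cg-manager
```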

The default supported Hive version is 3.1.3. If you want to change the Hive version, you can locate the linkis-engineplugin-hive module, modify the <hive.version> tag, and then compile this module separately.
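A hedged sketch of the recompile step (the module path is illustrative and may differ across Linkis source trees):

```bash
# After changing <hive.version> in the module's pom.xml,
# rebuild only the hive engine plugin module.
cd linkis-engineconn-plugins/hive
mvn clean install -DskipTests
```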

To deploy the compiled engine plugin, refer to: EngineConnPlugin Engine Plugin Installation

You can submit a Hive task through Linkis-Cli:

```bash
sh ./bin/linkis-cli -engineType hive-3.1.3 \
-codeType hql -code "show databases" \
-submitUser hadoop -proxyUser hadoop
```

For more Linkis-Cli command parameters, refer to: Linkis-Cli Usage

Linkis provides Java and Scala SDKs to submit tasks to the Linkis server. For details, refer to the JAVA SDK Manual. For Hive tasks, you only need to modify the EngineConnType and CodeType parameters in the Demo:

```java
Map<String, Object> labels = new HashMap<String, Object>();
labels.put(LabelKeyConstant.ENGINE_TYPE_KEY, "hive-3.1.3"); // required engineType Label
labels.put(LabelKeyConstant.USER_CREATOR_TYPE_KEY, "hadoop-IDE"); // required execute user and creator
labels.put(LabelKeyConstant.CODE_TYPE_KEY, "hql"); // required codeType
```
Commonly used configuration parameters of the engine connector:

| Configuration | Default | Required | Description |
| --- | --- | --- | --- |
| wds.linkis.rm.instance | 10 | No | Maximum number of concurrent engine connectors |
| wds.linkis.engineconn.java.driver.memory | 1g | No | Initial memory size of the engine connector |
| wds.linkis.engineconn.max.free.time | 1h | No | Idle exit time of the engine connector |

Hive's MapReduce tasks require yarn resources, so a queue needs to be set.

[Figure: yarn queue configuration]
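Besides setting the queue in the management console, it can also be passed at submission time. A hedged Linkis-Cli example (the wds.linkis.rm.yarnqueue key and the -confMap option follow common Linkis usage and may differ in your version; dws is a hypothetical queue name):

```bash
sh ./bin/linkis-cli -engineType hive-3.1.3 \
-codeType hql -code "show databases" \
-submitUser hadoop -proxyUser hadoop \
-confMap "wds.linkis.rm.yarnqueue=dws"
```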

If the default parameters are not sufficient, basic parameters can be configured in the following ways:

[Figure: hive engine connector configuration]

Note: after modifying the configuration under the IDE tag, you need to specify -creator IDE for it to take effect (the same applies to other tags), for example:

```bash
sh ./bin/linkis-cli -creator IDE \
-engineType hive-3.1.3 -codeType hql \
-code "show databases" \
-submitUser hadoop -proxyUser hadoop
```

For the task submission interface, configuration is passed through the params.configuration.runtime parameter:

Example http request parameters:

```json
{
    "executionContent": {"code": "show databases;", "runType": "sql"},
    "params": {
        "variable": {},
        "configuration": {
            "runtime": {
                "wds.linkis.rm.instance": "10"
            }
        }
    },
    "labels": {
        "engineType": "hive-3.1.3",
        "userCreator": "hadoop-IDE"
    }
}
```
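A hedged example of submitting this request with curl (the gateway host/port, the entrance submit path /api/rest_j/v1/entrance/submit, and the session cookie name are assumptions based on common Linkis deployments; authenticate first and adjust to your environment):

```bash
# Hypothetical gateway address and session ticket; replace with your own.
curl -X POST "http://linkis-gateway:9001/api/rest_j/v1/entrance/submit" \
  -H "Content-Type: application/json" \
  -H "Cookie: linkis_user_session_ticket_id_v1=<your-ticket>" \
  -d '{
        "executionContent": {"code": "show databases;", "runType": "sql"},
        "params": {"configuration": {"runtime": {"wds.linkis.rm.instance": "10"}}},
        "labels": {"engineType": "hive-3.1.3", "userCreator": "hadoop-IDE"}
      }'
```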

Linkis manages engine connectors through labels; the database tables involved are listed below.

  1. linkis_ps_configuration_config_key: keys and default values of the engine connector's configuration parameters
  2. linkis_cg_manager_label: engine connector labels, such as hive-3.1.3
  3. linkis_ps_configuration_category: category associations of the engine connector configuration
  4. linkis_ps_configuration_config_value: configurations to be displayed for the engine connector
  5. linkis_ps_configuration_key_engine_relation: associations between configuration items and the engine connector

The initial data related to the engine connector in these tables is as follows:

```sql
-- set variable
SET @HIVE_LABEL="hive-3.1.3";
SET @HIVE_ALL=CONCAT('*-*,',@HIVE_LABEL);
SET @HIVE_IDE=CONCAT('*-IDE,',@HIVE_LABEL);

-- engine label
insert into `linkis_cg_manager_label` (`label_key`, `label_value`, `label_feature`, `label_value_size`, `update_time`, `create_time`) VALUES ('combined_userCreator_engineType', @HIVE_ALL, 'OPTIONAL', 2, now(), now());
insert into `linkis_cg_manager_label` (`label_key`, `label_value`, `label_feature`, `label_value_size`, `update_time`, `create_time`) VALUES ('combined_userCreator_engineType', @HIVE_IDE, 'OPTIONAL', 2, now(), now());

select @label_id := id from linkis_cg_manager_label where `label_value` = @HIVE_IDE;
insert into linkis_ps_configuration_category (`label_id`, `level`) VALUES (@label_id, 2);

-- configuration key
INSERT INTO `linkis_ps_configuration_config_key` (`key`, `description`, `name`, `default_value`, `validate_type`, `validate_range`, `is_hidden`, `is_advanced`, `level`, `treeName`, `engine_conn_type`) VALUES ('wds.linkis.rm.instance', '范围:1-20,单位:个', 'hive引擎连接器最大并发数', '10', 'NumInterval', '[1,20]', '0', '0', '1', '队列资源', 'hive');
INSERT INTO `linkis_ps_configuration_config_key` (`key`, `description`, `name`, `default_value`, `validate_type`, `validate_range`, `is_hidden`, `is_advanced`, `level`, `treeName`, `engine_conn_type`) VALUES ('wds.linkis.engineconn.java.driver.memory', '取值范围:1-10,单位:G', 'hive引擎连接器初始化内存大小','1g', 'Regex', '^([1-9]|10)(G|g)$', '0', '0', '1', 'hive引擎连接器设置', 'hive');
INSERT INTO `linkis_ps_configuration_config_key` (`key`, `description`, `name`, `default_value`, `validate_type`, `validate_range`, `is_hidden`, `is_advanced`, `level`, `treeName`, `engine_conn_type`) VALUES ('hive.client.java.opts', 'hive客户端进程参数', 'hive引擎连接器启动时jvm参数','', 'None', NULL, '1', '1', '1', 'hive引擎连接器设置', 'hive');
INSERT INTO `linkis_ps_configuration_config_key` (`key`, `description`, `name`, `default_value`, `validate_type`, `validate_range`, `is_hidden`, `is_advanced`, `level`, `treeName`, `engine_conn_type`) VALUES ('mapred.reduce.tasks', '范围:-1-10000,单位:个', 'reduce数', '-1', 'NumInterval', '[-1,10000]', '0', '1', '1', 'hive资源设置', 'hive');
INSERT INTO `linkis_ps_configuration_config_key` (`key`, `description`, `name`, `default_value`, `validate_type`, `validate_range`, `is_hidden`, `is_advanced`, `level`, `treeName`, `engine_conn_type`) VALUES ('wds.linkis.engineconn.max.free.time', '取值范围:3m,15m,30m,1h,2h', '引擎连接器空闲退出时间','1h', 'OFT', '[\"1h\",\"2h\",\"30m\",\"15m\",\"3m\"]', '0', '0', '1', 'hive引擎连接器设置', 'hive');

-- key engine relation
insert into `linkis_ps_configuration_key_engine_relation` (`config_key_id`, `engine_type_label_id`)
(select config.id as `config_key_id`, label.id AS `engine_type_label_id` FROM linkis_ps_configuration_config_key config
INNER JOIN linkis_cg_manager_label label ON config.engine_conn_type = 'hive' and label_value = @HIVE_ALL);

-- engine default configuration
insert into `linkis_ps_configuration_config_value` (`config_key_id`, `config_value`, `config_label_id`)
(select `relation`.`config_key_id` AS `config_key_id`, '' AS `config_value`, `relation`.`engine_type_label_id` AS `config_label_id` FROM linkis_ps_configuration_key_engine_relation relation
INNER JOIN linkis_cg_manager_label label ON relation.engine_type_label_id = label.id AND label.label_value = @HIVE_ALL);
```

By default, the log interface does not display the application_id or the number of completed tasks; users can enable this output as needed. The code blocks that need to be modified in the engine connector's log4j2-engineconn.xml/log4j2.xml configuration file are as follows.

1. Add under the appenders component:

```xml
<Send name="SendPackage" >
    <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} %-5level [%t] %logger{36} %L %M - %msg%xEx%n"/>
</Send>
```

2. Add under the root component:

```xml
<appender-ref ref="SendPackage"/>
```

3. Add under the loggers component:

```xml
<logger name="org.apache.hadoop.hive.ql.exec.StatsTask" level="info" additivity="true">
    <appender-ref ref="SendPackage"/>
</logger>
```

After the above changes, the log can include task progress information, displayed in the following style:

```text
2022-04-08 11:06:50.228 INFO [Linkis-Default-Scheduler-Thread-3] SessionState 1111 printInfo - Status: Running (Executing on YARN cluster with App id application_1631114297082_432445)
2022-04-08 11:06:50.248 INFO [Linkis-Default-Scheduler-Thread-3] SessionState 1111 printInfo - Map 1: -/- Reducer 2: 0/1
2022-04-08 11:06:52.417 INFO [Linkis-Default-Scheduler-Thread-3] SessionState 1111 printInfo - Map 1: 0/1 Reducer 2: 0/1
2022-04-08 11:06:55.060 INFO [Linkis-Default-Scheduler-Thread-3] SessionState 1111 printInfo - Map 1: 0(+1)/1 Reducer 2: 0/1
2022-04-08 11:06:57.495 INFO [Linkis-Default-Scheduler-Thread-3] SessionState 1111 printInfo - Map 1: 1/1 Reducer 2: 0(+1)/1
2022-04-08 11:06:57.899 INFO [Linkis-Default-Scheduler-Thread-3] SessionState 1111 printInfo - Map 1: 1/1 Reducer 2: 1/1
```

A complete example of the xml configuration file is as follows:

```xml
<!--
  ~ Copyright 2019 WeBank
  ~
  ~ Licensed under the Apache License, Version 2.0 (the "License");
  ~ you may not use this file except in compliance with the License.
  ~ You may obtain a copy of the License at
  ~
  ~ http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing, software
  ~ distributed under the License is distributed on an "AS IS" BASIS,
  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  ~ See the License for the specific language governing permissions and
  ~ limitations under the License.
  -->
<configuration status="error" monitorInterval="30">
    <appenders>
        <Console name="Console" target="SYSTEM_OUT">
            <ThresholdFilter level="INFO" onMatch="ACCEPT" onMismatch="DENY"/>
            <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} %-5level [%t] %logger{36} %L %M - %msg%xEx%n"/>
        </Console>
        <Send name="Send" >
            <Filters>
                <ThresholdFilter level="WARN" onMatch="ACCEPT" onMismatch="DENY" />
            </Filters>
            <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} %-5level [%t] %logger{36} %L %M - %msg%xEx%n"/>
        </Send>
        <Send name="SendPackage" >
            <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} %-5level [%t] %logger{36} %L %M - %msg%xEx%n"/>
        </Send>
        <Console name="stderr" target="SYSTEM_ERR">
            <ThresholdFilter level="ERROR" onMatch="ACCEPT" onMismatch="DENY" />
            <PatternLayout pattern="%d{HH:mm:ss.SSS} %-5level %class{36} %L %M - %msg%xEx%n"/>
        </Console>
    </appenders>
    <loggers>
        <root level="INFO">
            <appender-ref ref="stderr"/>
            <appender-ref ref="Console"/>
            <appender-ref ref="Send"/>
            <appender-ref ref="SendPackage"/>
        </root>
        <logger name="org.apache.hadoop.hive.ql.exec.StatsTask" level="info" additivity="true">
            <appender-ref ref="SendPackage"/>
        </logger>
        <logger name="org.springframework.boot.diagnostics.LoggingFailureAnalysisReporter" level="error" additivity="true">
            <appender-ref ref="stderr"/>
        </logger>
        <logger name="com.netflix.discovery" level="warn" additivity="true">
            <appender-ref ref="Send"/>
        </logger>
        <logger name="org.apache.hadoop.yarn" level="warn" additivity="true">
            <appender-ref ref="Send"/>
        </logger>
        <logger name="org.springframework" level="warn" additivity="true">
            <appender-ref ref="Send"/>
        </logger>
        <logger name="org.apache.linkis.server.security" level="warn" additivity="true">
            <appender-ref ref="Send"/>
        </logger>
        <logger name="org.apache.hadoop.hive.ql.exec.mr.ExecDriver" level="fatal" additivity="true">
            <appender-ref ref="Send"/>
        </logger>
        <logger name="org.apache.hadoop.hdfs.KeyProviderCache" level="fatal" additivity="true">
            <appender-ref ref="Send"/>
        </logger>
        <logger name="org.spark_project.jetty" level="ERROR" additivity="true">
            <appender-ref ref="Send"/>
        </logger>
        <logger name="org.eclipse.jetty" level="ERROR" additivity="true">
            <appender-ref ref="Send"/>
        </logger>
        <logger name="org.springframework" level="ERROR" additivity="true">
            <appender-ref ref="Send"/>
        </logger>
        <logger name="org.reflections.Reflections" level="ERROR" additivity="true">
            <appender-ref ref="Send"/>
        </logger>
        <logger name="org.apache.hadoop.ipc.Client" level="ERROR" additivity="true">
            <appender-ref ref="Send"/>
        </logger>
    </loggers>
</configuration>
```