常见问题

常见问题

Kerberos

连接 Kerberos 认证的 Hive Metastore 报错：GSS initiate failed

通常是因为 Kerberos 认证信息填写不正确导致的，可以通过以下步骤排查：
1. 1.2.1 之前的版本中，Doris 依赖的 libhdfs3 库没有开启 gsasl。请更新至 1.2.2 之后的版本。
2. 确认对各个组件，设置了正确的 keytab 和 principal，并确认 keytab 文件存在于所有 FE、BE 节点上。
  1. hadoop.kerberos.keytab/hadoop.kerberos.principal：用于 Hadoop hdfs 访问，填写 hdfs 对应的值。
  2. hive.metastore.kerberos.principal：用于 hive metastore。
3. 尝试将 principal 中的 ip 换成域名（不要使用默认的 _HOST 占位符）
4. 确认 /etc/krb5.conf 文件存在于所有 FE、BE 节点上。
通过 Hive Catalog 连接 Hive 数据库报错：RemoteException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS].

如果在 show databases 和 show tables 都是没问题的情况下，查询的时候出现上面的错误，我们需要进行下面两个操作：
- fe/conf、be/conf 目录下需放置 core-site.xml 和 hdfs-site.xml
- BE 节点执行 Kerberos 的 kinit 然后重启 BE ，然后再去执行查询即可.
查询配置了Kerberos的外表，遇到该报错：GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos Ticket)，一般重启FE和BE能够解决该问题。
- 重启所有节点前可在"${DORIS_HOME}/be/conf/be.conf"中的JAVA_OPTS参数里配置-Djavax.security.auth.useSubjectCredsOnly=false，通过底层机制去获取JAAS credentials信息，而不是应用程序。
- 在JAAS Troubleshooting中可获取更多常见JAAS报错的解决方法。
在Catalog中配置Kerberos时，报错Unable to obtain password from user的解决方法：
- 用到的principal必须在klist中存在，使用klist -kt your.keytab检查。
- 检查catalog配置是否正确，比如漏配yarn.resourcemanager.principal。
- 若上述检查没问题，则当前系统yum或者其他包管理软件安装的JDK版本存在不支持的加密算法，建议自行安装JDK并设置JAVA_HOME环境变量。
使用 KMS 访问 HDFS 时报错：java.security.InvalidKeyException: Illegal key size

升级 JDK 版本到 >= Java 8 u162 的版本。或者下载安装 JDK 相应的 JCE Unlimited Strength Jurisdiction Policy Files。
在Catalog中配置Kerberos时，如果报错SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]，那么需要将core-site.xml文件放到"${DORIS_HOME}/be/conf"目录下。

如果访问HDFS报错No common protection layer between client and server，检查客户端和服务端的hadoop.rpc.protection属性，使他们保持一致。
```
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <property>
            <name>hadoop.security.authentication</name>
            <value>kerberos</value>
        </property>
    </configuration>
```
在使用Broker Load时，配置了Kerberos，如果报错Cannot locate default realm.。

将 -Djava.security.krb5.conf=/your-path 配置项添加到Broker Load启动脚本的 start_broker.sh 的 JAVA_OPTS里。
当在Catalog里使用Kerberos配置时，不能同时使用hadoop.username属性。

JDBC Catalog

通过 JDBC Catalog 连接 SQLServer 报错：unable to find valid certification path to requested target

请在 jdbc_url 中添加 trustServerCertificate=true 选项。
通过 JDBC Catalog 连接 MySQL 数据库，中文字符乱码，或中文字符条件查询不正确

请在 jdbc_url 中添加 useUnicode=true&characterEncoding=utf-8

注：1.2.3 版本后，使用 JDBC Catalog 连接 MySQL 数据库，会自动添加这些参数。
通过 JDBC Catalog 连接 MySQL 数据库报错：Establishing SSL connection without server's identity verification is not recommended

请在 jdbc_url 中添加 useSSL=true
使用JDBC Catalog将MySQL数据同步到Doris中，日期数据同步错误。需要校验下MySQL的版本是否与MySQL的驱动包是否对应，比如MySQL8以上需要使用驱动com.mysql.cj.jdbc.Driver。

Hive Catalog

通过 Hive Metastore 访问 Iceberg 表报错：failed to get schema 或 Storage schema reading not supported

在 Hive 的 lib/ 目录放上 iceberg 运行时有关的 jar 包。

在 hive-site.xml 配置：
```
metastore.storage.schema.reader.impl=org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader
```
配置完成后需要重启Hive Metastore。

连接 Hive Catalog 报错：Caused by: java.lang.NullPointerException

如 fe.log 中有如下堆栈：

Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook.getFilteredObjects(AuthorizationMetaStoreFilterHook.java:78) ~[hive-exec-3.1.3-core.jar:3.1.3]
    at org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook.filterDatabases(AuthorizationMetaStoreFilterHook.java:55) ~[hive-exec-3.1.3-core.jar:3.1.3]
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getAllDatabases(HiveMetaStoreClient.java:1548) ~[doris-fe.jar:3.1.3]
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getAllDatabases(HiveMetaStoreClient.java:1542) ~[doris-fe.jar:3.1.3]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_181]

可以尝试在 create catalog 语句中添加 "metastore.filter.hook" = "org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl" 解决。

如果创建 Hive Catalog 后能正常show tables，但查询时报java.net.UnknownHostException: xxxxx

可以在 CATALOG 的 PROPERTIES 中添加
```
'fs.defaultFS' = 'hdfs://<your_nameservice_or_actually_HDFS_IP_and_port>'
```
Hive 1.x 的 orc 格式的表可能会遇到底层 orc 文件 schema 中列名为 _col0，_col1，_col2… 这类系统列名，此时需要在 catalog 配置中添加 hive.version 为 1.x.x，这样就会使用 hive 表中的列名进行映射。
```
CREATE CATALOG hive PROPERTIES (
    'hive.version' = '1.x.x'
);
```
使用Catalog查询表数据时发现与Hive Metastore相关的报错：Invalid method name，需要设置hive.version参数。
```
CREATE CATALOG hive PROPERTIES (
    'hive.version' = '2.x.x'
);
```
查询 ORC 格式的表，FE 报错 Could not obtain block 或 Caused by: java.lang.NoSuchFieldError: types

对于 ORC 文件，在默认情况下，FE 会访问 HDFS 获取文件信息，进行文件切分。部分情况下，FE 可能无法访问到 HDFS。可以通过添加以下参数解决：

"hive.exec.orc.split.strategy" = "BI"

其他选项：HYBRID（默认），ETL。

在hive上可以查到hudi表分区字段的值，但是在doris查不到。

doris和hive目前查询hudi的方式不一样，doris需要在hudi表结构的avsc文件里添加上分区字段,如果没加，就会导致doris查询partition_val为空（即使设置了hoodie.datasource.hive_sync.partition_fields=partition_val也不可以）

{
    "type": "record",
    "name": "record",
    "fields": [{
        "name": "partition_val",
        "type": [
            "null",
            "string"
            ],
        "doc": "Preset partition field, empty string when not partitioned",
        "default": null
        },
        {
        "name": "name",
        "type": "string",
        "doc": "名称"
        },
        {
        "name": "create_time",
        "type": "string",
        "doc": "创建时间"
        }
    ]
}

查询hive外表，遇到该报错：java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found

去hadoop环境搜索hadoop-lzo-*.jar放在"${DORIS_HOME}/fe/lib/"目录下并重启fe。

从 2.0.2 版本起，可以将这个文件放置在BE的 custom_lib/ 目录下（如不存在，手动创建即可），以防止升级集群时因为 lib 目录被替换而导致文件丢失。
创建hive表指定serde为 org.apache.hadoop.hive.contrib.serde2.MultiDelimitserDe，访问表时报错：storage schema reading not supported

在hive-site.xml文件中增加以下配置，并重启hms服务：
```
<property>
   <name>metastore.storage.schema.reader.impl</name>
   <value>org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader</value>
</property>
```

报错：java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty

FE日志中完整报错信息如下：

org.apache.doris.common.UserException: errCode = 2, detailMessage = S3 list path failed. path=s3://bucket/part-*,msg=errors while get file status listStatus on s3://bucket: com.amazonaws.SdkClientException: Unable to execute HTTP request: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty: Unable to execute HTTP request: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
org.apache.doris.common.UserException: errCode = 2, detailMessage = S3 list path exception. path=s3://bucket/part-*, err: errCode = 2, detailMessage = S3 list path failed. path=s3://bucket/part-*,msg=errors while get file status listStatus on s3://bucket: com.amazonaws.SdkClientException: Unable to execute HTTP request: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty: Unable to execute HTTP request: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
org.apache.hadoop.fs.s3a.AWSClientIOException: listStatus on s3://bucket: com.amazonaws.SdkClientException: Unable to execute HTTP request: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty: Unable to execute HTTP request: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
Caused by: javax.net.ssl.SSLException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
Caused by: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
Caused by: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty

尝试更新FE节点CA证书，使用 update-ca-trust（CentOS/RockyLinux），然后重启FE进程即可。

HDFS

访问 HDFS 3.x 时报错：java.lang.VerifyError: xxx

1.2.1 之前的版本中，Doris 依赖的 Hadoop 版本为 2.8。需更新至 2.10.2。或更新 Doris 至 1.2.2 之后的版本。
使用 Hedged Read 优化 HDFS 读取慢的问题。

在某些情况下，HDFS 的负载较高可能导致读取某个 HDFS 上的数据副本的时间较长，从而拖慢整体的查询效率。HDFS Client 提供了 Hedged Read 功能。该功能可以在一个读请求超过一定阈值未返回时，启动另一个读线程读取同一份数据，哪个先返回就是用哪个结果。

注意：该功能可能会增加 HDFS 集群的负载，请酌情使用。

可以通过以下两种方式开启这个功能：
- 在创建 Catalog 的参数中指定：
```
create catalog regression properties (
    'type'='hms',
    'hive.metastore.uris' = 'thrift://172.21.16.47:7004',
    'dfs.client.hedged.read.threadpool.size' = '128',
    'dfs.client.hedged.read.threshold.millis' = "500"
);
```
  dfs.client.hedged.read.threadpool.size 表示用于 Hedged Read 的线程数，这些线程由一个 HDFS Client 共享。通常情况下，针对一个 HDFS 集群，BE 节点会共享一个 HDFS Client。
  
  dfs.client.hedged.read.threshold.millis 是读取阈值，单位毫秒。当一个读请求超过这个阈值未返回时，会触发 Hedged Read。

开启后，可以在 Query Profile 中看到相关参数：
`TotalHedgedRead`: 发起 Hedged Read 的次数。
`HedgedReadWins`：Hedged Read 成功的次数（发起并且比原请求更快返回的次数）
注意，这里的值是单个 HDFS Client 的累计值，而不是单个查询的数值。同一个 HDFS Client 会被多个查询复用。

DLF Catalog

使用DLF Catalog时，BE读在取JindoFS数据出现Invalid address，需要在/ets/hosts中添加日志中出现的域名到IP的映射。
读取数据无权限时，使用hadoop.username属性指定有权限的用户。
DLF Catalog中的元数据和DLF保持一致。当使用DLF管理元数据时，Hive新导入的分区，可能未被DLF同步，导致出现DLF和Hive元数据不一致的情况，对此，需要先保证Hive元数据被DLF完全同步。