Doris-Operator supports mounting PVs (Persistent Volumes) on the pods of the various Doris components.

PVs are generally created by the Kubernetes system administrator. Doris-Operator does not use PVs directly when deploying Doris services. Instead, it declares its storage requirements through a PVC (PersistentVolumeClaim) to request a PV from the Kubernetes cluster. When a PVC is created, Kubernetes attempts to bind it to an available PV that meets the requirements. A StorageClass spares administrators from creating PVs manually: when no existing PV satisfies a PVC, a PV can be allocated dynamically based on the StorageClass.

PVs can be backed by a variety of storage types, which fall into two main categories: network storage and local storage. Because the two differ in principle and implementation, they offer different performance and usage characteristics. Users can choose according to the type of containerized service and their own needs.
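As a point of reference for the binding process described above, a standalone PVC in plain Kubernetes looks roughly like the sketch below. This is only an illustration of the resource type: the claim name and StorageClass name are hypothetical, and with Doris-Operator the claims are declared through the DorisCluster spec rather than written by hand.

```yaml
# Illustrative PersistentVolumeClaim; names are hypothetical.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-claim            # hypothetical claim name
spec:
  # Omit storageClassName to fall back to the cluster's default StorageClass.
  storageClassName: example-sc   # hypothetical StorageClass name
  accessModes:
  - ReadWriteOnce                # mounted read-write by a single node
  resources:
    requests:
      storage: 100Gi             # Kubernetes binds a PV with at least this capacity
```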

If no PVC is configured during deployment, Doris-Operator uses emptyDir mode by default to store metadata, data files, and run logs. An emptyDir volume lives only as long as its pod, so this data is lost when the pod is deleted or recreated.

Recommended directories to persist for each node type:

  • FE: doris-meta, log
  • BE: storage, log
  • CN: storage, log
  • BROKER: log

Doris-Operator outputs logs to the console and to the specified directory at the same time. If the Kubernetes cluster has complete log collection capabilities, Doris logs at the INFO level (the default) can be collected from the console output. It is still recommended to configure a PVC to persist the log files, because besides the INFO-level logs there are also fe.out, be.out, audit.log, and garbage collection logs. Keeping them makes it easier to locate problems quickly and to trace back through audit logs.

ConfigMap is a Kubernetes resource object used to store configuration files. It allows configuration files to be mounted dynamically and decouples them from applications, making configuration management more flexible and maintainable. Like PVCs, ConfigMaps can be referenced by Pods so that the application can use the configuration data.
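To make "referenced by Pods" concrete, the following is a generic Kubernetes sketch of a Pod that mounts both a ConfigMap and a PVC. It is not part of a Doris deployment: every name, the image, and the paths are placeholders, and Doris-Operator is expected to set up the corresponding mounts itself from the DorisCluster spec.

```yaml
# Illustrative Pod; not something to deploy alongside Doris.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod              # hypothetical name
spec:
  containers:
  - name: app
    image: busybox               # placeholder image
    command: ["sleep", "3600"]
    volumeMounts:
    - name: conf
      mountPath: /etc/example    # hypothetical path; each ConfigMap key appears here as a file
    - name: data
      mountPath: /data           # hypothetical path backed by the PV bound to the claim
  volumes:
  - name: conf
    configMap:
      name: example-config       # hypothetical ConfigMap in the same namespace
  - name: data
    persistentVolumeClaim:
      claimName: example-claim   # hypothetical PVC in the same namespace
```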

StorageClass

By default, Doris-Operator uses the Kubernetes default StorageClass for FE and BE data storage, and the storage path (mountPath) follows the default configuration in the image. To specify a StorageClass yourself, set persistentVolumeClaimSpec.storageClassName in spec.feSpec.persistentVolumes (and likewise in spec.beSpec.persistentVolumes), as shown below:

    apiVersion: doris.selectdb.com/v1
    kind: DorisCluster
    metadata:
      labels:
        app.kubernetes.io/name: doriscluster
      name: doriscluster-sample-storageclass1
    spec:
      feSpec:
        replicas: 3
        image: selectdb/doris.fe-ubuntu:2.0.2
        limits:
          cpu: 8
          memory: 16Gi
        requests:
          cpu: 8
          memory: 16Gi
        persistentVolumes:
        - mountPath: /opt/apache-doris/fe/doris-meta
          name: storage0
          persistentVolumeClaimSpec:
            # To use a specific StorageClass, set storageClassName to its name.
            storageClassName: ${your_storageclass}
            accessModes:
            - ReadWriteOnce
            resources:
              # Note: if the storage size is less than 5G, FE will not start normally.
              requests:
                storage: 100Gi
        - mountPath: /opt/apache-doris/fe/log
          name: storage1
          persistentVolumeClaimSpec:
            # To use a specific StorageClass, set storageClassName to its name.
            storageClassName: ${your_storageclass}
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 100Gi
      beSpec:
        replicas: 3
        image: selectdb/doris.be-ubuntu:2.0.2
        limits:
          cpu: 8
          memory: 16Gi
        requests:
          cpu: 8
          memory: 16Gi
        persistentVolumes:
        - mountPath: /opt/apache-doris/be/storage
          name: storage2
          persistentVolumeClaimSpec:
            # To use a specific StorageClass, set storageClassName to its name.
            storageClassName: ${your_storageclass}
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 100Gi
        - mountPath: /opt/apache-doris/be/log
          name: storage3
          persistentVolumeClaimSpec:
            # To use a specific StorageClass, set storageClassName to its name.
            storageClassName: ${your_storageclass}
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 100Gi
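If the cluster's default StorageClass is good enough, a smaller variant is to leave storageClassName out entirely. The fragment below is a sketch of that case for the FE metadata volume only. It relies on standard Kubernetes behavior (a claim without storageClassName falls back to the StorageClass annotated as default) and assumes the cluster actually has a default StorageClass defined.

```yaml
# Fragment of a DorisCluster spec, not a complete manifest.
spec:
  feSpec:
    persistentVolumes:
    - mountPath: /opt/apache-doris/fe/doris-meta
      name: storage0
      persistentVolumeClaimSpec:
        # storageClassName is omitted on purpose: the cluster's default StorageClass
        # (annotated storageclass.kubernetes.io/is-default-class: "true") is used.
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
```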

Customized ConfigMap

Doris uses Kubernetes ConfigMaps to decouple configuration files from services. Before deploying a doriscluster, deploy the ConfigMaps it will use in the same namespace. In the following example, FE uses a ConfigMap named fe-configmap and BE uses a ConfigMap named be-configmap. The related YAML is as follows:

ConfigMap sample for FE

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: fe-configmap
      labels:
        app.kubernetes.io/component: fe
    data:
      fe.conf: |
        CUR_DATE=`date +%Y%m%d-%H%M%S`
        # the output dir of stderr and stdout
        LOG_DIR = ${DORIS_HOME}/log
        JAVA_OPTS="-Djavax.security.auth.useSubjectCredsOnly=false -Xss4m -Xmx8192m -XX:+UseMembar -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=7 -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:-CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0 -Xloggc:$DORIS_HOME/log/fe.gc.log.$CUR_DATE"
        # For jdk 9+, this JAVA_OPTS will be used as default JVM options
        JAVA_OPTS_FOR_JDK_9="-Djavax.security.auth.useSubjectCredsOnly=false -Xss4m -Xmx8192m -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=7 -XX:+CMSClassUnloadingEnabled -XX:-CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0 -Xlog:gc*:$DORIS_HOME/log/fe.gc.log.$CUR_DATE:time"
        # INFO, WARN, ERROR, FATAL
        sys_log_level = INFO
        # NORMAL, BRIEF, ASYNC
        sys_log_mode = NORMAL
        # Default dirs to put jdbc drivers,default value is ${DORIS_HOME}/jdbc_drivers
        # jdbc_drivers_dir = ${DORIS_HOME}/jdbc_drivers
        http_port = 8030
        rpc_port = 9020
        query_port = 9030
        edit_log_port = 9010
        enable_fqdn_mode = true

Note that when FE uses a ConfigMap, enable_fqdn_mode = true must be added to fe.conf. For the specific reasons, refer to the document here.

ConfigMap sample for BE

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: be-configmap
      labels:
        app.kubernetes.io/component: be
    data:
      be.conf: |
        CUR_DATE=`date +%Y%m%d-%H%M%S`
        PPROF_TMPDIR="$DORIS_HOME/log/"
        JAVA_OPTS="-Xmx1024m -DlogPath=$DORIS_HOME/log/jni.log -Xloggc:$DORIS_HOME/log/be.gc.log.$CUR_DATE -Djavax.security.auth.useSubjectCredsOnly=false -Dsun.java.command=DorisBE -XX:-CriticalJNINatives -DJDBC_MIN_POOL=1 -DJDBC_MAX_POOL=100 -DJDBC_MAX_IDLE_TIME=300000 -DJDBC_MAX_WAIT_TIME=5000"
        # For jdk 9+, this JAVA_OPTS will be used as default JVM options
        JAVA_OPTS_FOR_JDK_9="-Xmx1024m -DlogPath=$DORIS_HOME/log/jni.log -Xlog:gc:$DORIS_HOME/log/be.gc.log.$CUR_DATE -Djavax.security.auth.useSubjectCredsOnly=false -Dsun.java.command=DorisBE -XX:-CriticalJNINatives -DJDBC_MIN_POOL=1 -DJDBC_MAX_POOL=100 -DJDBC_MAX_IDLE_TIME=300000 -DJDBC_MAX_WAIT_TIME=5000"
        # since 1.2, the JAVA_HOME need to be set to run BE process.
        # JAVA_HOME=/path/to/jdk/
        # https://github.com/apache/doris/blob/master/docs/zh-CN/community/developer-guide/debug-tool.md#jemalloc-heap-profile
        # https://jemalloc.net/jemalloc.3.html
        JEMALLOC_CONF="percpu_arena:percpu,background_thread:true,metadata_thp:auto,muzzy_decay_ms:15000,dirty_decay_ms:15000,oversize_threshold:0,lg_tcache_max:20,prof:false,lg_prof_interval:32,lg_prof_sample:19,prof_gdump:false,prof_accum:false,prof_leak:false,prof_final:false"
        JEMALLOC_PROF_PRFIX=""
        # INFO, WARNING, ERROR, FATAL
        sys_log_level = INFO
        # ports for admin, web, heartbeat service
        be_port = 9060
        webserver_port = 8040
        heartbeat_service_port = 9050
        brpc_port = 8060

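doriscluster deployment example using the two ConfigMaps above (it also references a ConfigMap named broker-configmap for the broker, which must be deployed in advance in the same way):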

    apiVersion: doris.selectdb.com/v1
    kind: DorisCluster
    metadata:
      labels:
        app.kubernetes.io/name: doriscluster
      name: doriscluster-sample-configmap
    spec:
      feSpec:
        replicas: 3
        image: selectdb/doris.fe-ubuntu:2.0.2
        limits:
          cpu: 8
          memory: 16Gi
        requests:
          cpu: 8
          memory: 16Gi
        configMapInfo:
          # use kubectl create configmap fe-configmap --from-file=fe.conf
          configMapName: fe-configmap
          resolveKey: fe.conf
      beSpec:
        replicas: 3
        image: selectdb/doris.be-ubuntu:2.0.2
        limits:
          cpu: 8
          memory: 16Gi
        requests:
          cpu: 8
          memory: 16Gi
        configMapInfo:
          # use kubectl create configmap be-configmap --from-file=be.conf
          configMapName: be-configmap
          resolveKey: be.conf
      brokerSpec:
        replicas: 3
        image: selectdb/doris.broker-ubuntu:2.0.2
        limits:
          cpu: 2
          memory: 4Gi
        requests:
          cpu: 2
          memory: 4Gi
        configMapInfo:
          # use kubectl create configmap broker-configmap --from-file=apache_hdfs_broker.conf
          configMapName: broker-configmap
          resolveKey: apache_hdfs_broker.conf

Here, resolveKey is the name of the configuration file passed in through the ConfigMap. It must be fe.conf, be.conf, or apache_hdfs_broker.conf (CN nodes also use be.conf). Doris-Operator parses this file to guide the customized deployment of the doriscluster.
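The deployment example references a ConfigMap named broker-configmap, for which no sample is given. A minimal sketch of what it could look like follows, mainly to show that the key under data must match the resolveKey. The label and the file content are assumptions: broker_ipc_port = 8000 is, to the best of my knowledge, the stock broker port setting, so check it against the apache_hdfs_broker.conf shipped with your broker image before use.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: broker-configmap
  labels:
    app.kubernetes.io/component: broker   # assumed, by analogy with the fe and be labels
data:
  # The key must equal the resolveKey used in brokerSpec.configMapInfo.
  apache_hdfs_broker.conf: |
    # assumed content: the thrift RPC port of the broker
    broker_ipc_port = 8000
```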

Add special configuration files to the conf directory

This section is for reference when a containerized deployment needs additional files placed in the conf directory of a Doris node, for example the HDFS/Hive configuration files commonly required by the data lake Multi-Catalog feature.

The following example takes BE's ConfigMap and adds a core-site.xml file to it:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: be-configmap
      labels:
        app.kubernetes.io/component: be
    data:
      be.conf: |
        be_port = 9060
        webserver_port = 8040
        heartbeat_service_port = 9050
        brpc_port = 8060
      core-site.xml: |
        <?xml version="1.0" encoding="UTF-8"?>
        <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
        <configuration>
          <property>
            <name>hadoop.security.authentication</name>
            <value>kerberos</value>
          </property>
        </configuration>
      ...

Note that data is a mapping of key-value pairs, in which each key is a file name and each value is the content of that file:

    data:
      file_name_1:
        file_content_1
      file_name_2:
        file_content_2
      file_name_3:
        file_content_3
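On the cluster side, a ConfigMap with extra files is referenced in the same way as before. The fragment below is a sketch under the assumption, implied by this section, that the other keys of the ConfigMap (such as core-site.xml) are placed in the BE conf directory alongside be.conf; it is not a complete manifest.

```yaml
# Fragment of a DorisCluster spec, not a complete manifest.
spec:
  beSpec:
    configMapInfo:
      configMapName: be-configmap
      # resolveKey still names the Doris configuration file itself;
      # core-site.xml is simply another key in the same ConfigMap.
      resolveKey: be.conf
```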

BE multi-disk configuration

Doris' BE service supports mounting multiple disks, which addresses the mismatch between computing resources and storage resources seen in the server era. Using multiple disks can also greatly improve the storage efficiency of Doris. On Kubernetes, Doris can likewise mount multiple disks to maximize storage efficiency, and doing so requires a configuration file. To decouple the service from its configuration, Doris uses a ConfigMap to carry the configuration and mounts the configuration file dynamically for the service. The following doriscluster configuration has the BE service use a ConfigMap for its configuration file and mount two disks:

    apiVersion: doris.selectdb.com/v1
    kind: DorisCluster
    metadata:
      labels:
        app.kubernetes.io/name: doriscluster
      name: doriscluster-sample-storageclass1
    spec:
      feSpec:
        replicas: 3
        image: selectdb/doris.fe-ubuntu:2.0.2
        limits:
          cpu: 8
          memory: 16Gi
        requests:
          cpu: 8
          memory: 16Gi
        persistentVolumes:
        - mountPath: /opt/apache-doris/fe/doris-meta
          name: storage0
          persistentVolumeClaimSpec:
            # To use a specific StorageClass, uncomment storageClassName and set it to the class name.
            #storageClassName: openebs-jiva-csi-default
            accessModes:
            - ReadWriteOnce
            resources:
              # Note: if the storage size is less than 5G, FE will not start normally.
              requests:
                storage: 100Gi
        - mountPath: /opt/apache-doris/fe/log
          name: storage1
          persistentVolumeClaimSpec:
            # To use a specific StorageClass, uncomment storageClassName and set it to the class name.
            #storageClassName: openebs-jiva-csi-default
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 100Gi
      beSpec:
        replicas: 3
        image: selectdb/doris.be-ubuntu:2.0.2
        limits:
          cpu: 8
          memory: 16Gi
        requests:
          cpu: 8
          memory: 16Gi
        configMapInfo:
          configMapName: be-configmap
          resolveKey: be.conf
        persistentVolumes:
        - mountPath: /opt/apache-doris/be/storage
          name: storage2
          persistentVolumeClaimSpec:
            # To use a specific StorageClass, uncomment storageClassName and set it to the class name.
            #storageClassName: openebs-jiva-csi-default
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 100Gi
        - mountPath: /opt/apache-doris/be/storage1
          name: storage3
          persistentVolumeClaimSpec:
            # To use a specific StorageClass, uncomment storageClassName and set it to the class name.
            #storageClassName: openebs-jiva-csi-default
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 100Gi
        - mountPath: /opt/apache-doris/be/log
          name: storage4
          persistentVolumeClaimSpec:
            # To use a specific StorageClass, uncomment storageClassName and set it to the class name.
            #storageClassName: openebs-jiva-csi-default
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 100Gi

Compared with the default example, this configuration adds configMapInfo and one more persistentVolumeClaimSpec for the second disk. persistentVolumeClaimSpec fully follows the definition format of the native Kubernetes PVC spec. In the example, configMapInfo identifies which ConfigMap in the same namespace, and which key within it, provides the content used as the configuration file once BE is deployed; the key must be be.conf. The following is the ConfigMap that must be deployed in advance for the doriscluster above:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: be-configmap
      labels:
        app.kubernetes.io/component: be
    data:
      be.conf: |
        CUR_DATE=`date +%Y%m%d-%H%M%S`
        PPROF_TMPDIR="$DORIS_HOME/log/"
        JAVA_OPTS="-Xmx1024m -DlogPath=$DORIS_HOME/log/jni.log -Xloggc:$DORIS_HOME/log/be.gc.log.$CUR_DATE -Djavax.security.auth.useSubjectCredsOnly=false -Dsun.java.command=DorisBE -XX:-CriticalJNINatives -DJDBC_MIN_POOL=1 -DJDBC_MAX_POOL=100 -DJDBC_MAX_IDLE_TIME=300000 -DJDBC_MAX_WAIT_TIME=5000"
        # For jdk 9+, this JAVA_OPTS will be used as default JVM options
        JAVA_OPTS_FOR_JDK_9="-Xmx1024m -DlogPath=$DORIS_HOME/log/jni.log -Xlog:gc:$DORIS_HOME/log/be.gc.log.$CUR_DATE -Djavax.security.auth.useSubjectCredsOnly=false -Dsun.java.command=DorisBE -XX:-CriticalJNINatives -DJDBC_MIN_POOL=1 -DJDBC_MAX_POOL=100 -DJDBC_MAX_IDLE_TIME=300000 -DJDBC_MAX_WAIT_TIME=5000"
        # since 1.2, the JAVA_HOME need to be set to run BE process.
        # JAVA_HOME=/path/to/jdk/
        # https://github.com/apache/doris/blob/master/docs/zh-CN/community/developer-guide/debug-tool.md#jemalloc-heap-profile
        # https://jemalloc.net/jemalloc.3.html
        JEMALLOC_CONF="percpu_arena:percpu,background_thread:true,metadata_thp:auto,muzzy_decay_ms:15000,dirty_decay_ms:15000,oversize_threshold:0,lg_tcache_max:20,prof:false,lg_prof_interval:32,lg_prof_sample:19,prof_gdump:false,prof_accum:false,prof_leak:false,prof_final:false"
        JEMALLOC_PROF_PRFIX=""
        # INFO, WARNING, ERROR, FATAL
        sys_log_level = INFO
        # ports for admin, web, heartbeat service
        be_port = 9060
        webserver_port = 8040
        heartbeat_service_port = 9050
        brpc_port = 8060
        storage_root_path = /opt/apache-doris/be/storage,medium:ssd;/opt/apache-doris/be/storage1,medium:ssd

When using multiple disks, each path in the value of storage_root_path in the ConfigMap must correspond to a mountPath of a persistentVolumes entry in the doriscluster. For the syntax rules of storage_root_path, refer to the document in the link. When using cloud disks, set the medium uniformly to SSD.
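To illustrate the correspondence, here is a sketch of how the two resources might change together if a third data disk were added. The third mount path (/opt/apache-doris/be/storage2) and its volume name are hypothetical, and both documents are fragments showing only the parts that must stay in sync, not complete manifests.

```yaml
# Fragment of the DorisCluster: one persistentVolumes entry per data disk.
apiVersion: doris.selectdb.com/v1
kind: DorisCluster
metadata:
  name: doriscluster-sample-storageclass1
spec:
  beSpec:
    configMapInfo:
      configMapName: be-configmap
      resolveKey: be.conf
    persistentVolumes:
    - mountPath: /opt/apache-doris/be/storage
      name: storage2
      persistentVolumeClaimSpec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
    - mountPath: /opt/apache-doris/be/storage1
      name: storage3
      persistentVolumeClaimSpec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
    - mountPath: /opt/apache-doris/be/storage2   # hypothetical third disk
      name: storage5                             # hypothetical volume name
      persistentVolumeClaimSpec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
---
# Fragment of be-configmap: storage_root_path lists exactly the mount paths above.
apiVersion: v1
kind: ConfigMap
metadata:
  name: be-configmap
data:
  be.conf: |
    storage_root_path = /opt/apache-doris/be/storage,medium:ssd;/opt/apache-doris/be/storage1,medium:ssd;/opt/apache-doris/be/storage2,medium:ssd
```

A path listed in storage_root_path without a matching mountPath would presumably end up on the container's ephemeral filesystem rather than on a PV, so it is worth checking the two lists against each other whenever a disk is added or removed.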