In a Kubernetes (k8s) environment, a service may enter the CrashLoopBackOff state for unexpected reasons. You can view the pod status and pod_name in a given namespace with the kubectl get pod --namespace ${namespace} command.
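As a minimal sketch, the pods currently in this state can be picked out of that listing. The helper name crashlooping_pods is hypothetical, and it assumes the default table output of kubectl get pod, where STATUS is the third column:

```shell
# Print the names of pods whose STATUS column reads CrashLoopBackOff.
# Reads the default table output of `kubectl get pod` on stdin
# (columns: NAME READY STATUS RESTARTS AGE) and skips the header row.
crashlooping_pods() {
  awk 'NR > 1 && $3 == "CrashLoopBackOff" { print $1 }'
}
```

Usage: kubectl get pod --namespace ${namespace} | crashlooping_pods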
In this state, the cause of the problem often cannot be determined from the describe and logs commands alone. When the service enters the CrashLoopBackOff state, a mechanism is needed that lets the pod running the service reach the Running state, so that users can enter the container with exec and debug it.
doris-operator provides a debug running mode. In essence, the debug process occupies the health-probe port of the corresponding node, which bypasses the k8s probe mechanism and keeps the container running stably so that users can enter it and locate the problem.
The following describes how to enter debug mode for manual debugging when the service enters CrashLoopBackOff, and how to return to the normal startup state after the problem is solved.
Start Debug mode
When a pod of the service enters CrashLoopBackOff or cannot start normally during operation, take the following steps to put the service into debug mode and start it manually to find the problem.
1. Use the following command to add an annotation to the problematic pod.
$ kubectl annotate pod ${pod_name} --namespace ${namespace} selectdb.com.doris/runmode=debug
The next time the service restarts, it detects the annotation that marks debug-mode startup and starts in debug mode, and the pod status becomes Running.
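Before waiting for the restart, it can help to confirm that the annotation was actually stored on the pod. The following is a sketch only: the helper name runmode_of and the KUBECTL override variable are hypothetical, and the jsonpath expression assumes the annotation key used in the command above.

```shell
# Allow the kubectl binary to be overridden (hypothetical convenience variable).
KUBECTL=${KUBECTL:-kubectl}

# runmode_of POD_NAME NAMESPACE
# Prints the value of the selectdb.com.doris/runmode annotation on the pod,
# or nothing if the annotation is not set. Dots in the key are escaped for jsonpath.
runmode_of() {
  "$KUBECTL" get pod "$1" --namespace "$2" \
    -o jsonpath='{.metadata.annotations.selectdb\.com\.doris/runmode}'
}
```

Usage: runmode_of ${pod_name} ${namespace} should print debug once the annotation has been applied.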
2. Once the service is in debug mode, its pod is shown in a normal state. Users can enter the pod with the following command.
$ kubectl --namespace ${namespace} exec -ti ${pod_name} -- bash
3. Manually start the service in debug mode. After entering the pod, modify the port in the corresponding configuration file, then manually run the start_xx.sh script. The script is located in /opt/apache-doris/xx/bin.
FE needs to modify query_port, and BE needs to modify heartbeat_service_port. The main purpose is to prevent traffic from being routed through the k8s Service to the crashed node while it is in debug mode.
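The port change in step 3 can be scripted from inside the pod. The following is a minimal sketch rather than an official tool: the helper name set_conf_port is hypothetical, it assumes the configuration file uses key = value lines, and the replacement port in the usage line is an arbitrary example.

```shell
# set_conf_port FILE KEY VALUE
# Rewrites an existing "KEY = ..." line in FILE to "KEY = VALUE".
# If no uncommented KEY line exists, the setting is appended instead.
# A backup of the original file is kept as FILE.bak when a line is rewritten.
set_conf_port() {
  file=$1
  key=$2
  value=$3
  if grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
    sed -i.bak -E "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$file"
  else
    printf '%s = %s\n' "$key" "$value" >> "$file"
  fi
}
```

Usage: set_conf_port /opt/apache-doris/fe/conf/fe.conf query_port 19030, then run the start_xx.sh script as described in step 3.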
Exit Debug mode
Once the problem has been located, the service needs to exit debug mode. To do so, simply delete the corresponding pod with the following command, and the service will start in normal mode.
$ kubectl delete pod ${pod_name} --namespace ${namespace}
Notes
After entering the pod, you need to modify the port information of the configuration file before you can manually start the corresponding Doris component.
- FE needs to modify the query_port=9030 configuration; the default path is /opt/apache-doris/fe/conf/fe.conf.
- BE needs to modify the heartbeat_service_port=9050 configuration; the default path is /opt/apache-doris/be/conf/be.conf.