Troubleshooting hosted control planes
- Gathering information to troubleshoot hosted control planes
- Checking why worker nodes did not join the hosted cluster

If you encounter issues with hosted control planes, see the following information to guide you through troubleshooting.

Hosted control planes is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

Gathering information to troubleshoot hosted control planes

When you need to troubleshoot an issue with hosted control plane clusters, you can gather information by running the hypershift dump cluster command. The command generates output for the management cluster and the hosted cluster.

The output for the management cluster contains the following content:

- Cluster-scoped resources: These resources are node definitions of the management cluster.
- The hypershift-dump compressed file: This file is useful if you need to share the content with other people.
- Namespaced resources: These resources include all of the objects from the relevant namespaces, such as config maps, services, events, and logs.
- Network logs: These logs include the OVN northbound and southbound databases and the status for each one.
- Hosted clusters: This level of output involves all of the resources inside of the hosted cluster.

The output for the hosted cluster contains the following content:

- Cluster-scoped resources: These resources include all of the cluster-wide objects, such as nodes and CRDs.
- Namespaced resources: These resources include all of the objects from the relevant namespaces, such as config maps, services, events, and logs.

Although the output does not contain any secret objects from the cluster, it can contain references to the names of secrets.

Prerequisites

- You must have cluster-admin access to the management cluster.
- You need the name value for the HostedCluster resource and the namespace where the CR is deployed.
- You must have the hcp command line interface installed. For more information, see Installing the hosted control planes command line interface.
- You must have the OpenShift CLI (oc) installed.
- You must ensure that the kubeconfig file is loaded and is pointing to the management cluster.

Procedure

To gather output for troubleshooting, enter the following commands:

$ CLUSTERNAME="samplecluster"

$ CLUSTERNS="clusters"

$ mkdir clusterDump-${CLUSTERNS}-${CLUSTERNAME}

$ hypershift dump cluster \
    --name ${CLUSTERNAME} \
    --namespace ${CLUSTERNS} \
    --dump-guest-cluster \
    --artifact-dir clusterDump-${CLUSTERNS}-${CLUSTERNAME}

Example output

2023-06-06T12:18:20+02:00   INFO    Archiving dump  {"command": "tar", "args": ["-cvzf", "hypershift-dump.tar.gz", "cluster-scoped-resources", "event-filter.html", "namespaces", "network_logs", "timestamp"]}
2023-06-06T12:18:21+02:00   INFO    Successfully archived dump  {"duration": "1.519376292s"}
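
Before you share the archive, it can help to confirm what it contains. The following is a minimal sketch, not part of the hypershift CLI. The helper name list_dump_top_level is hypothetical, and the example path assumes that hypershift-dump.tar.gz is written inside the artifact directory, which you should verify on your system.

```shell
# Hypothetical helper: print the unique top-level entries of a dump archive.
# $1 = path to the compressed archive
list_dump_top_level() {
    # tar -tzf lists member paths; strip a leading "./", keep the first
    # path component, drop empty lines, and remove duplicates.
    tar -tzf "$1" | sed 's|^\./||' | cut -d/ -f1 | grep -v '^$' | sort -u
}

# Example call (assumed archive location):
# list_dump_top_level "clusterDump-${CLUSTERNS}-${CLUSTERNAME}/hypershift-dump.tar.gz"
```

Based on the archive arguments in the example output, the listing would be expected to include entries such as cluster-scoped-resources, namespaces, and network_logs.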

To configure the command-line interface so that it impersonates all of the queries against the management cluster by using a username or service account, enter the hypershift dump cluster command with the --as flag.

The service account must have enough permissions to query all of the objects from the namespaces, so the cluster-admin role is recommended. The service account must be located in, or have permissions to query, the namespace of the HostedControlPlane resource.

If your username or service account does not have enough permissions, the output contains only the objects that you have permissions to access. During that process, you might see forbidden errors.
- To use impersonation by using a service account, enter the following commands. Replace values as necessary:
```
$ CLUSTERNAME="samplecluster"
```
```
$ CLUSTERNS="clusters"
```
```
$ SA="samplesa"
```
```
$ SA_NAMESPACE="default"
```
```
$ mkdir clusterDump-${CLUSTERNS}-${CLUSTERNAME}
```
```
$ hypershift dump cluster \
    --name ${CLUSTERNAME} \
    --namespace ${CLUSTERNS} \
    --dump-guest-cluster \
    --as "system:serviceaccount:${SA_NAMESPACE}:${SA}" \
    --artifact-dir clusterDump-${CLUSTERNS}-${CLUSTERNAME}
```
- To use impersonation by using a username, enter the following commands. Replace values as necessary:
```
$ CLUSTERNAME="samplecluster"
```
```
$ CLUSTERNS="clusters"
```
```
$ CLUSTERUSER="cloud-admin"
```
```
$ mkdir clusterDump-${CLUSTERNS}-${CLUSTERNAME}
```
```
$ hypershift dump cluster \
    --name ${CLUSTERNAME} \
    --namespace ${CLUSTERNS} \
    --dump-guest-cluster \
    --as "${CLUSTERUSER}" \
    --artifact-dir clusterDump-${CLUSTERNS}-${CLUSTERNAME}
```
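
Because missing permissions show up only as forbidden errors during the dump, you might want to check access first. The following is a rough sketch under stated assumptions: sa_identity and check_dump_access are hypothetical helper names, the resource list is only a sample of what the dump collects, and the namespace in the example call assumes that the HostedControlPlane resource is in a namespace named from the hosted cluster namespace and name joined by a hyphen. It relies on the oc auth can-i command with the --as flag.

```shell
# Build the impersonation identity for a service account, in the form passed to --as.
# $1 = service account namespace, $2 = service account name
sa_identity() {
    printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Hypothetical pre-check: ask the API server whether the identity can list a
# sample of resource types in a namespace. Prints "yes" or "no" per resource.
# $1 = identity to impersonate, $2 = namespace to check
check_dump_access() {
    for resource in pods configmaps services events; do
        printf '%s: ' "$resource"
        oc auth can-i list "$resource" -n "$2" --as "$1"
    done
}

# Example call, reusing the variables from the procedure (namespace is an assumption):
# check_dump_access "$(sa_identity "$SA_NAMESPACE" "$SA")" "${CLUSTERNS}-${CLUSTERNAME}"
```

A "no" answer for any resource suggests that the dump output would be incomplete for that identity.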

Checking why worker nodes did not join the hosted cluster

If your control plane API endpoint is available, but worker nodes did not join the hosted cluster on AWS, you can debug the worker node issues by checking the following information.

Hosted control planes on AWS is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

Prerequisites

- You have configured the hosting cluster on AWS.
- Your control plane API endpoint is available.

Procedure

Address any error messages in the status of the HostedCluster and NodePool resources:
1. Check the status of the HostedCluster resource by running the following command:
```
$ oc get hc -n <hosted_cluster_namespace> <hosted_cluster_name> -o jsonpath='{.status}'
```
2. Check the status of the NodePool resource by running the following command:
```
$ oc get nodepool -n <hosted_cluster_namespace> <node_pool_name> -o jsonpath='{.status}'
```
  If you did not find any error messages in the status of the HostedCluster and NodePool resources, proceed to the next step.
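
The raw status output can be long, so it might be easier to read one condition per line. The following is a minimal sketch, assuming that the resource reports its state in a .status.conditions list with type, status, and message fields. The helper names print_conditions and conditions_with_status are hypothetical. Whether a False status indicates a problem depends on the condition type, so treat the filter as a way to narrow the list, not as a verdict.

```shell
# Print one line per condition: TYPE<TAB>STATUS<TAB>MESSAGE.
# $1 = resource kind (for example hc or nodepool), $2 = namespace, $3 = resource name
print_conditions() {
    oc get "$1" -n "$2" "$3" \
        -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}'
}

# Keep only the lines whose STATUS column equals the requested value.
# $1 = status to keep (True, False, or Unknown); reads tab-separated lines on stdin
conditions_with_status() {
    awk -F '\t' -v want="$1" '$2 == want'
}

# Example call:
# print_conditions hc <hosted_cluster_namespace> <hosted_cluster_name> | conditions_with_status False
```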

Check if your worker machines are created by running the following commands, replacing values as necessary:

$ HC_NAMESPACE="clusters"
$ HC_NAME="cluster_name"
$ CONTROL_PLANE_NAMESPACE="${HC_NAMESPACE}-${HC_NAME}"
$ oc get machines.cluster.x-k8s.io -n $CONTROL_PLANE_NAMESPACE
$ oc get awsmachines -n $CONTROL_PLANE_NAMESPACE
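
To see at a glance whether the machines are progressing, you can summarize them by phase. The following is a sketch only, assuming that each Cluster API machine reports its phase in the .status.phase field. The helper names list_machine_phases and count_by_phase are hypothetical.

```shell
# Print NAME<TAB>PHASE for each Cluster API machine in a namespace.
# $1 = control plane namespace
list_machine_phases() {
    oc get machines.cluster.x-k8s.io -n "$1" \
        -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'
}

# Count machines per phase; reads NAME<TAB>PHASE lines on stdin
# and prints "PHASE COUNT" lines in sorted order.
count_by_phase() {
    awk -F '\t' 'NF >= 2 { n[$2]++ } END { for (p in n) print p, n[p] }' | sort
}

# Example call:
# list_machine_phases "$CONTROL_PLANE_NAMESPACE" | count_by_phase
```

If the summary prints nothing, no machines exist in the namespace, which corresponds to the next check in this procedure.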

If worker machines do not exist, check if the machinedeployment and machineset resources are created by running the following commands:
```
$ oc get machinedeployment -n $CONTROL_PLANE_NAMESPACE
$ oc get machineset -n $CONTROL_PLANE_NAMESPACE
```
If the machinedeployment and machineset resources do not exist, check logs of the HyperShift Operator by running the following command:
```
$ oc logs deployment/operator -n hypershift
```
If worker machines exist but are not provisioned in the hosted cluster, check the log of the cluster API provider by running the following command:
```
$ oc logs deployment/capi-provider -c manager -n $CONTROL_PLANE_NAMESPACE
```
If worker machines exist and are provisioned in the cluster, ensure that the machines are initialized through Ignition successfully. To do so, check the system console logs of every machine with the console-logs utility by running the following command:
```
$ ./bin/hypershift console-logs aws --name $HC_NAME --aws-creds ~/.aws/credentials --output-dir /tmp/console-logs
```
You can access the system console logs in the /tmp/console-logs directory. The control plane exposes the Ignition endpoint. If you see an error related to the Ignition endpoint, then the Ignition endpoint is not accessible from the worker nodes through HTTPS.
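
When there are many machines, searching the collected files can be quicker than reading each log. The following is a minimal sketch with a hypothetical helper name, find_ignition_lines. The exact wording of Ignition errors is not specified here, so the sketch only finds lines that mention Ignition and leaves the interpretation to you.

```shell
# Hypothetical helper: show lines that mention Ignition in the collected console logs.
# $1 = directory that contains the console log files
find_ignition_lines() {
    # -r: recurse into the directory, -i: ignore case,
    # -n: show line numbers, -H: always show the file name
    grep -rinH 'ignition' "$1"
}

# Example call:
# find_ignition_lines /tmp/console-logs
```
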
If worker machines are provisioned and initialized through Ignition successfully, you can extract and access the journal logs of every worker machine by creating a bastion machine. A bastion machine allows you to access worker machines by using SSH.
1. Create a bastion machine by running the following command:
```
$ ./bin/hypershift create bastion aws --aws-creds ~/.aws/credentials --name $HC_NAME --ssh-key-file /tmp/ssh/id_rsa.pub
```
2. Optional: If you used the --generate-ssh flag when creating the cluster, you can extract the public and private key for the cluster by running the following commands:
```
$ mkdir /tmp/ssh
$ oc get secret -n clusters ${HC_NAME}-ssh-key -o jsonpath='{ .data.id_rsa }' | base64 -d > /tmp/ssh/id_rsa
$ oc get secret -n clusters ${HC_NAME}-ssh-key -o jsonpath='{ .data.id_rsa\.pub }' | base64 -d > /tmp/ssh/id_rsa.pub
```
3. Extract journal logs from every worker machine by running the following commands:
```
$ mkdir /tmp/journals
$ export INFRAID="$(oc get hc -n clusters $HC_NAME -o jsonpath='{ .spec.infraID }')"
$ export SSH_PRIVATE_KEY=/tmp/ssh/id_rsa
$ ./test/e2e/util/dump/copy-machine-journals.sh /tmp/journals
```
  The journal logs are placed in the /tmp/journals directory in a compressed format. Check for the error that indicates why kubelet did not join the cluster.