- Troubleshooting
- I can’t access some resources when installing Karmada
- Member cluster health checking does not work
- x509: certificate signed by unknown authority issue when using karmadactl init
- karmada-webhook keeps on crashing due to “too many open files”
- ServiceAccounts deployed to the Karmada control-plane don’t generate tokens
- Schedule failed due to “cluster(s) did not have the API resource”
Troubleshooting
I can’t access some resources when installing Karmada
Pull images from Kubernetes Image Registry (registry.k8s.io).
You can run the following commands to change the image registry in Mainland China.
sed -i'' -e "s#registry.k8s.io#registry.aliyuncs.com/google_containers#g" artifacts/deploy/karmada-etcd.yaml
sed -i'' -e "s#registry.k8s.io#registry.aliyuncs.com/google_containers#g" artifacts/deploy/karmada-apiserver.yaml
sed -i'' -e "s#registry.k8s.io#registry.aliyuncs.com/google_containers#g" artifacts/deploy/kube-controller-manager.yaml
To speed up downloading Go packages in Mainland China, run the following command before installation.
export GOPROXY=https://goproxy.cn
Member cluster health checking does not work
If your environment is similar to the following: after registering a member cluster to Karmada in push mode, kubectl get cluster showed the cluster status as ready. Then, after enabling the firewall between the member cluster and Karmada and waiting for a long time, the cluster status was still ready instead of changing to fail.
The cause of the problem is that the firewall did not close the already-established TCP connections between the member cluster and Karmada.
- Log in to the node where the member cluster's apiserver is located.
- Use the tcpkill command to close the TCP connections.
# ens192 is the name of the network card used by the member cluster to communicate with Karmada.
tcpkill -9 -i ens192 src host ${KARMADA_APISERVER_IP} and dst port ${MEMBER_CLUSTER_APISERVER_PORT}
x509: certificate signed by unknown authority issue when using karmadactl init
When using the karmadactl init command to install Karmada, the init command raises an error log as follows:
deploy.go:55] Post "https://192.168.24.211:32443/api/v1/namespaces": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "karmada")
Cause: Karmada has been installed on the cluster before. karmada-etcd uses hostPath mode to mount local storage, so residual data remains after karmada-etcd is uninstalled. You need to delete the files in the default directory /var/lib/karmada-etcd. If the karmadactl --etcd-data parameter was used, delete the corresponding directory instead.
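For the default setup, the cleanup amounts to removing the leftover data directory on the node where karmada-etcd ran (run as root on that node):

```shell
# Remove residual etcd data left by a previous Karmada installation.
# /var/lib/karmada-etcd is the default directory; if you installed with a
# custom `karmadactl init --etcd-data` path, remove that directory instead.
rm -rf /var/lib/karmada-etcd
```

After the directory is gone, rerunning karmadactl init generates a fresh certificate chain without the stale "karmada" CA.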
karmada-webhook keeps on crashing due to “too many open files”
When using hack/local-up-karmada
to install Karmada, karmada-webhook keeps crashing, raising the error log as follows:
I1121 06:33:46.144605 1 webhook.go:83] karmada-webhook version: version.Info{GitVersion:"v1.3.0-425-gf7cac365", GitCommit:"f7cac365d743e5e40493f9ad90352f30123f7f1d", GitTreeState:"dirty", BuildDate:"2022-11-21T06:25:19Z", GoVersion:"go1.19.3", Compiler:"gc", Platform:"linux/amd64"}
I1121 06:33:46.167045 1 webhook.go:113] registering webhooks to the webhook server
I1121 06:33:46.169425 1 internal.go:362] "Starting server" path="/metrics" kind="metrics" addr="[::]:8080"
I1121 06:33:46.169569 1 internal.go:362] "Starting server" kind="health probe" addr="[::]:8000"
I1121 06:33:46.169670 1 shared_informer.go:285] caches populated
I1121 06:33:46.169828 1 internal.go:567] "Stopping and waiting for non leader election runnables"
I1121 06:33:46.169848 1 internal.go:571] "Stopping and waiting for leader election runnables"
I1121 06:33:46.169856 1 internal.go:577] "Stopping and waiting for caches"
I1121 06:33:46.169883 1 internal.go:581] "Stopping and waiting for webhooks"
I1121 06:33:46.169899 1 internal.go:585] "Wait completed, proceeding to shutdown the manager"
E1121 06:33:46.169909 1 webhook.go:132] webhook server exits unexpectedly: too many open files
E1121 06:33:46.169926 1 run.go:74] "command failed" err="too many open files"
It’s a resource exhaustion issue caused by inotify limits that are too low. You can fix it with:
sysctl -w fs.inotify.max_user_watches=100000
sysctl -w fs.inotify.max_user_instances=100000
Related Issue: https://github.com/kubernetes-sigs/kind/issues/2928
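Note that sysctl -w changes are lost on reboot. To make the limits persistent, you can also drop them into a sysctl configuration file (the filename below is just an example) and reload with sysctl --system:

```
# /etc/sysctl.d/99-karmada-inotify.conf (example filename)
fs.inotify.max_user_watches=100000
fs.inotify.max_user_instances=100000
```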
ServiceAccounts deployed to the Karmada control-plane don’t generate tokens
To improve token security and scalability, the Kubernetes community proposed KEP-1205, which introduces a mechanism for issuing ServiceAccount tokens instead of directly mounting secrets generated for a ServiceAccount into Pods. For details, see ServiceAccount automation. This feature is called BoundServiceAccountTokenVolume and reached GA in Kubernetes v1.22.
With the GA of the BoundServiceAccountTokenVolume feature, the Kubernetes community considers it unnecessary and insecure to automatically generate tokens for ServiceAccounts. Therefore, KEP-2799 was proposed. One purpose of this KEP is to stop automatically generating token secrets for ServiceAccounts, and the other is to clean up token secrets generated for unused ServiceAccounts.
For the first purpose, Kubernetes provides the LegacyServiceAccountTokenNoAutoGeneration feature gate, which entered the Beta phase in Kubernetes v1.24. This is why the Karmada control-plane does not generate tokens: Karmada uses karmada-apiserver v1.24. If you still want to generate a token secret for a ServiceAccount in the previous way, you can refer to this section.
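If you do need a long-lived token, Kubernetes still issues one when you create a Secret of type kubernetes.io/service-account-token annotated with the ServiceAccount's name. A minimal sketch (my-sa and my-sa-token are placeholder names; the ServiceAccount must already exist in the same namespace):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-sa-token          # placeholder name
  namespace: default
  annotations:
    kubernetes.io/service-account.name: my-sa   # existing ServiceAccount
type: kubernetes.io/service-account-token
```

After this Secret is applied to karmada-apiserver, the token controller should populate its data.token field.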
Schedule failed due to “cluster(s) did not have the API resource”
The Karmada detector focuses only on resources in the karmada-apiserver preferred version.
For example, assuming karmada-apiserver is at v1.25, its HPA resource is served in both the autoscaling/v1 and autoscaling/v2 versions. However, since the preferred version of HPA is autoscaling/v2, the detector only lists/watches the autoscaling/v2 version. If a user creates an HPA with autoscaling/v1, Kubernetes generates create events for both versions, but only the autoscaling/v2 create event is watched by the detector.
Given this background, you need to pay attention to the following two points:
- When writing a propagation policy, its resourceSelector field only supports resources in the karmada-apiserver preferred version.
- The member cluster apiserver should support the resource version that karmada-apiserver prefers.
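You can check which version karmada-apiserver prefers for a group through API group discovery. The snippet below parses a sample discovery response; against a real control plane you would feed it the output of kubectl get --raw /apis/autoscaling instead of the canned JSON (whose values here are illustrative):

```shell
# An API group discovery document reports the server's preferred version in
# its .preferredVersion field. The JSON below mimics an apiserver's answer
# for the autoscaling group; against karmada-apiserver, replace the echo
# with: kubectl get --raw /apis/autoscaling
discovery='{"kind":"APIGroup","name":"autoscaling","preferredVersion":{"groupVersion":"autoscaling/v2","version":"v2"}}'
echo "$discovery" | python3 -c 'import json,sys; print(json.load(sys.stdin)["preferredVersion"]["groupVersion"])'
# prints: autoscaling/v2
```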
More specifically, still taking HPA as an example, you are advised to use the autoscaling/v2 HPA in both the resource template and the propagation policy, just like:
Propagate autoscaling/v2 by selecting autoscaling/v2
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: test-hpa
namespace: default
spec:
behavior:
scaleUp:
policies:
- type: Percent
value: 100
periodSeconds: 15
- type: Pods
value: 4
periodSeconds: 15
selectPolicy: Max
stabilizationWindowSeconds: 0
maxReplicas: 10
minReplicas: 1
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: d1
---
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: test-hpa-pp
spec:
placement:
clusterAffinity:
clusterNames:
- member1
resourceSelectors:
- apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
name: test-hpa
namespace: default
However, if you insist on propagating an autoscaling/v1 HPA template, you can still succeed if you define the resourceSelector in the propagation policy with apiVersion: autoscaling/v2, just like:
Propagate autoscaling/v1 by selecting autoscaling/v2
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
name: test-hpa
spec:
maxReplicas: 5
minReplicas: 1
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: nginx
targetCPUUtilizationPercentage: 10
---
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
name: test-hpa-pp
spec:
placement:
clusterAffinity:
clusterNames:
- member1
resourceSelectors:
- apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
name: test-hpa
namespace: default
Then Karmada finally propagates the autoscaling/v2 HPA to member clusters. If your member clusters don’t support the autoscaling/v2 HPA, you will get a propagation failure event like “cluster(s) did not have the API resource”.