Application-level failover

In a multi-cluster scenario, user workloads may be deployed in multiple clusters to improve service availability. Karmada already supports multi-cluster failover when it detects a cluster fault, which addresses failures from the cluster perspective. However, some cluster failures affect only specific applications, so affected and unaffected applications in the same cluster may need to be distinguished. An application may also be unavailable while the control plane of its cluster is healthy. Therefore, Karmada needs to provide a means of fault migration from the application perspective.

Why Application-level Failover is Required

The following describes some scenarios of application-level failover:

  • The administrator deploys an application in multiple clusters with preemptive scheduling. When cluster resources are in short supply, low-priority applications that were running normally are preempted and cannot run for a long time. In this case, the applications cannot self-heal within the cluster, and users want to schedule them to another cluster so that the service remains available.
  • The administrator uses a cloud vendor’s spot instances to deploy the application. With spot instances, the application may fail to run because the resources are reclaimed. In this scenario, the amount of resources perceived by the scheduler is the size of the resource quota, not the resources actually available. Users then want to schedule the application to a cluster other than the one where it previously failed.
  • ….

How Do I Enable the Feature?

When an application migrates from one cluster to another, its dependencies must be migrated together with it. Therefore, you need to ensure that the PropagateDeps feature gate is enabled and that propagateDeps: true is set in the propagation policy. The PropagateDeps feature gate has been Beta since Karmada v1.4 and is enabled by default.
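As a minimal sketch of the propagateDeps setting described above (the policy name test-propagation is borrowed from the later examples on this page, and the rest of the spec is elided):

```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: test-propagation
spec:
  # Propagate resources that the workload references (for example
  # ConfigMaps and Secrets) together with the workload itself.
  propagateDeps: true
  #...
```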

Also, whether the application needs to be migrated depends on its health status. Karmada’s Resource Interpreter Framework is designed for interpreting resource structure. It provides an interpreter operation that tells Karmada how to determine the health status of a specific object, so it is up to users to decide when to reschedule. Before you use the feature, you need to ensure that the interpretHealth rules for the application are configured.
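One way to supply a health rule is a declarative ResourceInterpreterCustomization. The following is only an illustrative sketch: the object name is hypothetical, and the rule (all replicas ready) is an assumed definition of "healthy" that you would adapt to your own resource kind.

```yaml
apiVersion: config.karmada.io/v1alpha1
kind: ResourceInterpreterCustomization
metadata:
  name: deployment-health-example   # hypothetical name
spec:
  target:
    apiVersion: apps/v1
    kind: Deployment
  customizations:
    healthInterpretation:
      # Assumed rule: the object is healthy when every desired replica is ready.
      luaScript: |
        function InterpretHealth(observedObj)
          return observedObj.status.readyReplicas == observedObj.spec.replicas
        end
```

A rule like this is mainly needed for resource kinds whose health Karmada cannot already interpret; check the Resource Interpreter documentation for the kinds your version covers.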

Application-level failover is controlled by the Failover feature gate. The Failover feature gate has been Beta since Karmada v1.4 and is enabled by default. In addition, if you use a purge mode with graceful eviction, the GracefulEviction feature gate should be enabled. The GracefulEviction feature gate has also been Beta since Karmada v1.4 and is enabled by default.
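If you need to set the feature gates explicitly, they are passed to karmada-controller-manager through its --feature-gates flag. The fragment below is a sketch that assumes the controller manager runs as a Deployment with the layout shown; the binary path and the surrounding fields are assumptions about your installation.

```yaml
# Fragment of the karmada-controller-manager Deployment (assumed layout).
spec:
  template:
    spec:
      containers:
        - name: karmada-controller-manager
          command:
            - /bin/karmada-controller-manager   # assumed binary path
            #...
            # Enable the three gates this feature relies on.
            - --feature-gates=PropagateDeps=true,Failover=true,GracefulEviction=true
```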

Configure Application Failover

The .spec.failover.application field of PropagationPolicy defines the rules of application failover.

It has three fields:

  • DecisionConditions
  • PurgeMode
  • GracePeriodSeconds

Configure Decision Conditions

DecisionConditions specifies the conditions that must be met before failover is performed. Failover is performed only when all conditions are met. Currently, it includes tolerationSeconds, the length of time the application is tolerated in an unhealthy state. It defaults to 300s.

PropagationPolicy can be configured as follows:

    apiVersion: policy.karmada.io/v1alpha1
    kind: PropagationPolicy
    metadata:
      name: test-propagation
    spec:
      #...
      failover:
        application:
          decisionConditions:
            tolerationSeconds: 300
          #...

Configure PurgeMode

PurgeMode represents how to deal with the legacy application on the cluster from which the application is migrated. Karmada supports three purge modes for eviction:

  • Immediately represents that Karmada will immediately evict the legacy application.
  • Graciously represents that Karmada will wait for the application to become healthy on the new cluster, or for a timeout to be reached, before evicting the legacy application. This mode also requires GracePeriodSeconds: if the application on the new cluster cannot reach a Healthy state, Karmada deletes the legacy application once GracePeriodSeconds has elapsed. It defaults to 600s.
  • Never represents that Karmada will not evict the application; users manually confirm how to clean up redundant copies.

PropagationPolicy can be configured as follows:

    apiVersion: policy.karmada.io/v1alpha1
    kind: PropagationPolicy
    metadata:
      name: test-propagation
    spec:
      #...
      failover:
        application:
          decisionConditions:
            tolerationSeconds: 300
          gracePeriodSeconds: 600
          purgeMode: Graciously
      #...

Or

    apiVersion: policy.karmada.io/v1alpha1
    kind: PropagationPolicy
    metadata:
      name: test-propagation
    spec:
      #...
      failover:
        application:
          decisionConditions:
            tolerationSeconds: 300
          purgeMode: Never
      #...
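
For completeness, the remaining mode, Immediately, could be configured along the same lines. This is a sketch that follows the pattern of the two policies above; no grace period is set because the legacy application is evicted without waiting.

```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: test-propagation
spec:
  #...
  failover:
    application:
      decisionConditions:
        tolerationSeconds: 300
      # Evict the legacy application as soon as failover is triggered.
      purgeMode: Immediately
  #...
```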

Example

Assume that you have configured the following PropagationPolicy and Deployment:

    apiVersion: policy.karmada.io/v1alpha1
    kind: PropagationPolicy
    metadata:
      name: nginx-propagation
    spec:
      failover:
        application:
          decisionConditions:
            tolerationSeconds: 120
          purgeMode: Never
      resourceSelectors:
        - apiVersion: apps/v1
          kind: Deployment
          name: nginx
      placement:
        clusterAffinity:
          clusterNames:
            - member1
            - member2
            - member3
        spreadConstraints:
          - maxGroups: 1
            minGroups: 1
            spreadByField: cluster
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
            - image: nginx
              name: nginx

Now the application is scheduled to member2 and both replicas run normally. Next, mark all nodes in member2 as unschedulable and delete the pods to put the application into an abnormal state.

    # mark node "member2-control-plane" as unschedulable in cluster member2
    kubectl --context member2 cordon member2-control-plane
    # delete the pod in cluster member2
    kubectl --context member2 delete pod -l app=nginx

The ResourceBinding immediately shows that the deployment is now unhealthy.

    #...
    status:
      aggregatedStatus:
      - applied: true
        clusterName: member2
        health: Unhealthy
        status:
          availableReplicas: 0
          readyReplicas: 0
          replicas: 2

After tolerationSeconds is reached, you will find that the deployment in member2 has been evicted and it’s re-scheduled to member1.

    #...
    spec:
      clusters:
      - name: member1
        replicas: 2
      gracefulEvictionTasks:
      - creationTimestamp: "2023-05-08T09:29:02Z"
        fromCluster: member2
        producer: resource-binding-application-failover-controller
        reason: ApplicationFailure
        suppressDeletion: true

You can edit suppressDeletion to false in gracefulEvictionTasks to evict the application in the failed cluster after you confirm the failure.
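
As a sketch of that edit, the eviction task from the example would look as follows once the change is saved. All values are copied from the ResourceBinding output above except suppressDeletion.

```yaml
#...
spec:
  gracefulEvictionTasks:
  - creationTimestamp: "2023-05-08T09:29:02Z"
    fromCluster: member2
    producer: resource-binding-application-failover-controller
    reason: ApplicationFailure
    # Changed from true: allow Karmada to remove the legacy application in member2.
    suppressDeletion: false
```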

Note

Application failover is still a work in progress. We are in the process of gathering use cases. If you are interested in this feature, please feel free to open an enhancement issue to let us know.