Workflow Working Mechanism

This document briefly introduces the core mechanisms of KubeVela Workflow.

A workflow runs in one of two modes: DAG mode or StepByStep mode. In DAG mode, all steps in the workflow execute concurrently, and KubeVela automatically builds a dependency graph from the inputs and outputs declared in each step's configuration. A step whose dependencies are not all satisfied waits until they are. In StepByStep mode, all steps are executed in order. In KubeVela v1.2+, the default running mode is StepByStep. DAG mode is not supported in versions before KubeVela v1.5.
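The dependency resolution in DAG mode can be illustrated with a minimal sketch. It is not KubeVela's implementation: the step names, the simplified `inputs`/`outputs` lists, and the helper functions `build_dependencies` and `ready_steps` are all hypothetical and only model the idea that a step waits for the steps producing its inputs.

```python
from typing import Dict, List, Set

# Hypothetical, simplified step specs: only the value names each step
# produces ("outputs") and consumes ("inputs") are modelled here.
STEPS: Dict[str, Dict[str, List[str]]] = {
    "apply-db": {"outputs": ["db-endpoint"]},
    "apply-server": {"inputs": ["db-endpoint"]},
    "notify": {},
}


def build_dependencies(steps: Dict[str, Dict[str, List[str]]]) -> Dict[str, Set[str]]:
    """Map each step to the set of steps whose outputs it consumes."""
    # First pass: record which step produces each named output.
    producers: Dict[str, str] = {}
    for name, spec in steps.items():
        for output in spec.get("outputs", []):
            producers[output] = name
    # Second pass: a step depends on the producer of each of its inputs.
    deps: Dict[str, Set[str]] = {}
    for name, spec in steps.items():
        deps[name] = {producers[i] for i in spec.get("inputs", []) if i in producers}
    return deps


def ready_steps(deps: Dict[str, Set[str]], finished: Set[str]) -> List[str]:
    """Return unfinished steps whose dependencies have all finished."""
    return sorted(
        name for name, needed in deps.items()
        if name not in finished and needed <= finished
    )


if __name__ == "__main__":
    deps = build_dependencies(STEPS)
    print(ready_steps(deps, set()))
    print(ready_steps(deps, {"apply-db", "notify"}))
```

With these sample steps, `apply-db` and `notify` are ready immediately, while `apply-server` becomes ready only after `apply-db` has finished.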

A workflow retries steps or suspends for different reasons:

  1. If a step fails or is waiting for conditions, the workflow retries it after a backoff time. The backoff time increases with the number of retries.
  2. If a step fails too many times, the workflow enters the suspending state and stops retrying.
  3. If a step is waiting for manual approval, the workflow enters the suspending state immediately.
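The three rules above can be summarized as a small decision function. This is a sketch under stated assumptions, not the controller's code: the `StepStatus` and `Action` enums and the `next_action` function are hypothetical names, and the exact comparison against the retry limit (`failures >= max_failures`) is an assumption about where the boundary lies.

```python
from enum import Enum


class StepStatus(Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    WAITING = "waiting"                # waiting for conditions
    NEEDS_APPROVAL = "needs-approval"  # waiting for manual approval


class Action(Enum):
    CONTINUE = "continue"
    RETRY = "retry-after-backoff"
    SUSPEND = "suspend"


def next_action(status: StepStatus, failures: int, max_failures: int = 10) -> Action:
    """Decide what the workflow does next for a step (illustrative only)."""
    if status is StepStatus.NEEDS_APPROVAL:
        # Rule 3: manual approval suspends the workflow immediately.
        return Action.SUSPEND
    if status is StepStatus.FAILED:
        # Rule 2: too many failures suspend the workflow (assumed boundary);
        # rule 1: otherwise retry after a backoff.
        return Action.SUSPEND if failures >= max_failures else Action.RETRY
    if status is StepStatus.WAITING:
        # Rule 1: a step waiting for conditions is retried after a backoff.
        return Action.RETRY
    return Action.CONTINUE
```

Modelling the outcome as an explicit enum keeps the three cases separate: only failures count toward the retry limit, while waiting steps keep retrying.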

The backoff time before a workflow retry is calculated as int(0.05 * 2^(n-1)) seconds, where n is the number of retries. The minimum backoff time is 1 second, so the first ten backoff times are:

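| Times | 2^(n-1) | 0.05*2^(n-1) | Requeue After(s) |
|-------|---------|--------------|------------------|
| 1     | 1       | 0.05         | 1                |
| 2     | 2       | 0.1          | 1                |
| 3     | 4       | 0.2          | 1                |
| 4     | 8       | 0.4          | 1                |
| 5     | 16      | 0.8          | 1                |
| 6     | 32      | 1.6          | 1                |
| 7     | 64      | 3.2          | 3                |
| 8     | 128     | 6.4          | 6                |
| 9     | 256     | 12.8         | 12               |
| 10    | 512     | 25.6         | 25               |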
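The formula and the values above can be reproduced with a few lines of Python. This is only a worked check of the arithmetic, not the controller's code; the function name `backoff_seconds` is a hypothetical choice, and the 1-second floor is applied with `max` as described in the text.

```python
def backoff_seconds(n: int) -> int:
    """Backoff before the n-th retry: int(0.05 * 2^(n-1)), at least 1 second."""
    if n < 1:
        raise ValueError("n must be a positive retry count")
    # int() truncates toward zero, e.g. 3.2 -> 3 and 25.6 -> 25.
    return max(1, int(0.05 * 2 ** (n - 1)))


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, backoff_seconds(n))
```

The first six retries all fall below one second before the floor is applied, which is why they share the same 1-second requeue time.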

If a workflow step is waiting, the maximum backoff time is 60 seconds. You can change it by setting --max-workflow-wait-backoff-time in the bootstrap parameters of the KubeVela controller.

If a workflow step has failed, the maximum backoff time is 300 seconds. You can change it by setting --max-workflow-failed-backoff-time in the bootstrap parameters of the KubeVela controller.
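How the two caps interact with the exponential formula can be sketched as follows. This is an illustration under assumptions, not KubeVela's implementation: `capped_backoff` and its parameters are hypothetical names, with defaults that mirror the 60-second and 300-second limits described above.

```python
def capped_backoff(n: int, failed: bool, max_wait: int = 60, max_failed: int = 300) -> int:
    """Backoff before the n-th retry, limited by the cap for the step's state.

    max_wait and max_failed stand in for the values of
    --max-workflow-wait-backoff-time and --max-workflow-failed-backoff-time.
    """
    if n < 1:
        raise ValueError("n must be a positive retry count")
    raw = max(1, int(0.05 * 2 ** (n - 1)))  # uncapped value, at least 1 second
    cap = max_failed if failed else max_wait
    return min(raw, cap)


if __name__ == "__main__":
    print(capped_backoff(11, failed=False))  # still below the waiting cap
    print(capped_backoff(12, failed=False))  # limited by the waiting cap
    print(capped_backoff(14, failed=True))   # limited by the failed cap
```

Under these defaults, a waiting step reaches the 60-second cap at the 12th retry, since the uncapped value there would be 102 seconds.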

In the failure case, the workflow retries at most 10 times by default and then enters the suspending state. You can change the retry limit by setting --max-workflow-step-error-retry-times in the bootstrap parameters of the KubeVela controller.

Note that if a workflow step is unhealthy, it is marked as waiting rather than failed, and it waits until it becomes healthy.

When a workflow enters the running state or suspends while waiting for a condition, the KubeVela application routinely re-applies the resources it has already applied to prevent configuration drift. This process is called State Keep in KubeVela. By default, the State Keep interval is 5 minutes; you can configure it by setting --application-re-sync-period in the bootstrap parameters of the KubeVela controller. To disable State Keep, use the apply-once policy in the application.

Last updated on Aug 4, 2023 by Daniel Higuero