Introduction to Chaos Mesh Workflow

When you use Chaos Mesh to simulate real system faults, you often need to validate your system continuously. Instead of performing individual Chaos injections, you might want to build a series of faults on the Chaos Mesh platform.

To meet this need, Chaos Mesh provides Chaos Mesh Workflow, a built-in workflow engine. With this engine, you can run different Chaos experiments in serial or in parallel to simulate production-level errors.

Currently, Chaos Mesh Workflow supports the following features:

  • Serial orchestration
  • Parallel orchestration
  • Customized tasks
  • Conditional branches

Typical user scenarios:

  • Use parallel orchestration to inject multiple NetworkChaos faults to simulate complex web environments.
  • Use serial orchestration to perform health checks, and use conditional branches to determine whether to perform the remaining steps.

The design of Chaos Mesh Workflow is, to some extent, inspired by Argo Workflows. If you are familiar with Argo Workflows, you can also quickly get started with Chaos Mesh Workflow.

More workflow examples are available in the Chaos Mesh GitHub repository.

Create a workflow using a YAML file and kubectl

Like the various types of Chaos objects, workflows exist in a Kubernetes cluster as a CRD. You can therefore create a Chaos Mesh workflow with kubectl create -f <workflow.yaml>. To create a workflow from a local YAML file, run:

```shell
kubectl create -f <workflow.yaml>
```

Create a workflow using a YAML file from the network:

```shell
kubectl create -f https://raw.githubusercontent.com/chaos-mesh/chaos-mesh/master/examples/workflow/serial.yaml
```

The following is a simple workflow YAML file. This workflow injects StressChaos, NetworkChaos and PodChaos in parallel, and the PodChaos runs on a schedule:

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: Workflow
metadata:
  name: try-workflow-parallel
spec:
  entry: the-entry
  templates:
    - name: the-entry
      templateType: Parallel
      deadline: 240s
      children:
        # each child must match the name of a template below
        - workflow-stress-chaos
        - workflow-network-chaos
        - workflow-pod-chaos-schedule
    - name: workflow-network-chaos
      templateType: NetworkChaos
      deadline: 20s
      networkChaos:
        direction: to
        action: delay
        mode: all
        selector:
          labelSelectors:
            "app": "hello-kubernetes"
        delay:
          latency: "90ms"
          correlation: "25"
          jitter: "90ms"
    - name: workflow-pod-chaos-schedule
      templateType: Schedule
      deadline: 40s
      schedule:
        schedule: "@every 2s"
        # the kind of Chaos that this schedule creates
        type: "PodChaos"
        podChaos:
          action: pod-kill
          mode: one
          selector:
            labelSelectors:
              "app": "hello-kubernetes"
    - name: workflow-stress-chaos
      templateType: StressChaos
      deadline: 20s
      stressChaos:
        mode: one
        selector:
          labelSelectors:
            "app": "hello-kubernetes"
        stressors:
          cpu:
            workers: 1
            load: 20
            options: ["--cpu 1", "--timeout 600"]
```

In the YAML above, the templates field defines the steps of the experiment. The entry field names the template where the workflow starts when it runs.

Each element in templates represents a workflow step. For example:

```yaml
name: the-entry
templateType: Parallel
deadline: 240s
children:
  - workflow-stress-chaos
  - workflow-network-chaos
  - workflow-pod-chaos
```

templateType: Parallel means that this is a parallel node. deadline: 240s means that all parallel experiments under this node are expected to finish within 240 seconds. If they do not, they time out. children lists the names of the other templates to be executed in parallel.
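Changing templateType from Parallel to Serial makes the children run one after another instead of at the same time. The sketch below shows a two-step serial workflow. It is not taken from the Chaos Mesh examples. The workflow name my-serial-workflow is hypothetical, and the sketch assumes that Pods labeled app: hello-kubernetes exist in the cluster.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: Workflow
metadata:
  name: my-serial-workflow            # hypothetical name
spec:
  entry: the-entry                    # the template that runs first
  templates:
    - name: the-entry
      templateType: Serial            # children run one after another
      deadline: 120s                  # upper bound for the whole serial node
      children:
        - workflow-network-chaos      # runs first
        - workflow-pod-chaos          # starts after the previous step finishes
    - name: workflow-network-chaos
      templateType: NetworkChaos
      deadline: 20s
      networkChaos:
        direction: to
        action: delay
        mode: all
        selector:
          labelSelectors:
            'app': 'hello-kubernetes'
        delay:
          latency: '90ms'
    - name: workflow-pod-chaos
      templateType: PodChaos
      deadline: 40s
      podChaos:
        action: pod-kill
        mode: one
        selector:
          labelSelectors:
            'app': 'hello-kubernetes'
```

The serial node's deadline (120s) is set above the sum of its children's deadlines (20s + 40s), so the whole sequence can finish before the node times out.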

Another example is a template that defines a single Chaos experiment:

```yaml
name: workflow-pod-chaos
templateType: PodChaos
deadline: 40s
podChaos:
  action: pod-kill
  mode: one
  selector:
    labelSelectors:
      'app': 'hello-kubernetes'
```

templateType: PodChaos means that this node is a PodChaos experiment. deadline: 40s means that the current Chaos experiment lasts for 40 seconds. podChaos contains the definition of the PodChaos experiment.

Creating workflows with a YAML file and kubectl is flexible. You can nest parallel and serial orchestrations to declare complex orchestrations. You can even combine orchestrations with conditional branches to create a loop.
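To illustrate nesting, the following sketch places a Serial node inside a Parallel node. CPU stress starts immediately. Meanwhile, the serial branch first waits in a Suspend step, assumed here to simply pause until its deadline elapses, and then injects network delay. The names my-nested-workflow, warm-up-then-delay and pause-10s are hypothetical, and the example assumes Pods labeled app: hello-kubernetes.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: Workflow
metadata:
  name: my-nested-workflow            # hypothetical name
spec:
  entry: the-entry
  templates:
    # Top level: two branches that run at the same time
    - name: the-entry
      templateType: Parallel
      deadline: 120s
      children:
        - warm-up-then-delay          # a Serial node nested inside the Parallel node
        - workflow-stress-chaos
    # Nested level: steps that run one after another
    - name: warm-up-then-delay
      templateType: Serial
      deadline: 90s
      children:
        - pause-10s
        - workflow-network-chaos
    - name: pause-10s
      templateType: Suspend           # waits until its deadline elapses
      deadline: 10s
    - name: workflow-network-chaos
      templateType: NetworkChaos
      deadline: 30s
      networkChaos:
        action: delay
        mode: all
        selector:
          labelSelectors:
            'app': 'hello-kubernetes'
        delay:
          latency: '90ms'
    - name: workflow-stress-chaos
      templateType: StressChaos
      deadline: 60s
      stressChaos:
        mode: one
        selector:
          labelSelectors:
            'app': 'hello-kubernetes'
        stressors:
          cpu:
            workers: 1
            load: 20
```

A child node only refers to another template by name, so any Serial or Parallel template can appear in the children list of another one.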

Field description

Workflow field description

| Parameter | Type | Description | Default value | Required | Example |
| --- | --- | --- | --- | --- | --- |
| entry | string | Declares the entry of the workflow. Its value is a name of a template. | None | Yes | |
| templates | []Template | Declares the behavior of each step executable in the workflow. See Template field description for details. | None | Yes | |

Template field description

| Parameter | Type | Description | Default value | Required | Example |
| --- | --- | --- | --- | --- | --- |
| name | string | The name of the template, which must meet the DNS-1123 requirements. | None | Yes | any-name |
| templateType | string | The type of the template. Valid values are Task, Serial, Parallel, Suspend, Schedule, AwsChaos, DNSChaos, GcpChaos, HTTPChaos, IOChaos, JVMChaos, KernelChaos, NetworkChaos, PodChaos, StressChaos, and TimeChaos. | None | Yes | PodChaos |
| deadline | string | The duration of the template. | None | No | '5m30s' |
| children | []string | Declares the subtasks under this template. You need to configure this field when the type is Serial or Parallel. | None | No | ["any-chaos-1", "another-serial-2", "any-schedule"] |
| task | Task | Configures the customized task. You need to configure this field when the type is Task. See Task field description for details. | None | No | |
| conditionalBranches | []ConditionalBranch | Configures the conditional branches that are evaluated after the customized task completes. You need to configure this field when the type is Task. See Conditional branch field description for details. | None | No | |
| awsChaos | object | Configures AwsChaos. You need to configure this field when the type is AwsChaos. See the Simulate AWS Faults document for details. | None | No | |
| dnsChaos | object | Configures DNSChaos. You need to configure this field when the type is DNSChaos. See the Simulate DNS Faults document for details. | None | No | |
| gcpChaos | object | Configures GcpChaos. You need to configure this field when the type is GcpChaos. See the Simulate GCP Faults document for details. | None | No | |
| httpChaos | object | Configures HTTPChaos. You need to configure this field when the type is HTTPChaos. See the Simulate HTTP Faults document for details. | None | No | |
| ioChaos | object | Configures IOChaos. You need to configure this field when the type is IOChaos. See the Simulate File I/O Faults document for details. | None | No | |
| jvmChaos | object | Configures JVMChaos. You need to configure this field when the type is JVMChaos. See the Simulate JVM Application Faults document for details. | None | No | |
| kernelChaos | object | Configures KernelChaos. You need to configure this field when the type is KernelChaos. See the Simulate Kernel Faults document for details. | None | No | |
| networkChaos | object | Configures NetworkChaos. You need to configure this field when the type is NetworkChaos. See the Simulate Network Faults document for details. | None | No | |
| podChaos | object | Configures PodChaos. You need to configure this field when the type is PodChaos. See the Simulate Pod Faults document for details. | None | No | |
| stressChaos | object | Configures StressChaos. You need to configure this field when the type is StressChaos. See the Simulate Heavy Stress on Kubernetes document for details. | None | No | |
| timeChaos | object | Configures TimeChaos. You need to configure this field when the type is TimeChaos. See the Simulate Time Faults document for details. | None | No | |
| schedule | object | Configures Schedule. You need to configure this field when the type is Schedule. See the Define Scheduling Rules document for details. | None | No | |

:::note

When you create a Chaos experiment with a duration in a workflow, set the duration in the outer deadline field of the template. Do not use the duration field of the Chaos object.

:::
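For example, a NetworkChaos step that should last 5 minutes and 30 seconds might be written as follows. This is a single entry from a templates list rather than a complete Workflow, and the template name and selector are illustrative.

```yaml
# One entry in a workflow's templates list.
# The experiment length comes from the template-level deadline.
- name: workflow-network-chaos
  templateType: NetworkChaos
  deadline: 5m30s                 # the NetworkChaos lasts for 5m30s
  networkChaos:
    action: delay
    mode: all
    selector:
      labelSelectors:
        'app': 'hello-kubernetes'
    delay:
      latency: '90ms'
    # No duration field here: inside a workflow, deadline above replaces it.
```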

Task field description

| Parameter | Type | Description | Default value | Required | Example |
| --- | --- | --- | --- | --- | --- |
| container | object | Defines a customized task container. See Container field description for details. | None | No | |
| volumes | array | If you need to mount a volume in the customized task container, declare the volume in this field. For the detailed definition of a volume, see Kubernetes documentation - corev1.Volume. | None | No | |
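The Task template below is a sketch of how container and volumes fit together. It mounts a ConfigMap into the task container and runs a script from it. The ConfigMap workflow-check-script, the script check.sh and the template name are hypothetical. The ConfigMap is assumed to exist in the namespace where the task Pod runs. Because container follows corev1.Container, the mount itself is declared with the standard volumeMounts field.

```yaml
# One entry in a workflow's templates list.
- name: run-check-script                  # hypothetical template name
  templateType: Task
  task:
    container:
      name: run-check-script
      image: busybox:latest
      command: ['sh', '/scripts/check.sh'] # check.sh is a hypothetical script
      volumeMounts:
        - name: check-script              # must match the volume name below
          mountPath: /scripts
    volumes:                              # each entry is a corev1.Volume
      - name: check-script
        configMap:
          name: workflow-check-script     # hypothetical ConfigMap with key check.sh
```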

Conditional branch field description

| Parameter | Type | Description | Default value | Required | Example |
| --- | --- | --- | --- | --- | --- |
| target | string | The name of the template to be executed by the current conditional branch. | None | Yes | another-chaos |
| expression | string | A boolean expression. When the customized task completes and the expression evaluates to true, the current conditional branch is executed. If this field is not set, the conditional branch is executed directly after the customized task completes. | None | No | exitCode == 0 |

Currently, the following two context variables are available in expression:

  • exitCode: the exit code of the customized task.
  • stdout: the standard output of the customized task.

More context variables will be added in later releases.

Refer to this document to learn how to write expressions.
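The following sketch combines a customized task with a conditional branch, which matches the health-check scenario described at the beginning of this document. A busybox container requests a URL, and the StressChaos step runs only if the request succeeds (exitCode == 0). If the expression evaluates to false, the branch target does not run. The workflow name is hypothetical. The URL is the one used in the Container field example, and the example assumes Pods labeled app: hello-kubernetes.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: Workflow
metadata:
  name: my-branch-workflow              # hypothetical name
spec:
  entry: the-entry
  templates:
    - name: the-entry
      templateType: Serial
      deadline: 240s
      children:
        - health-check
    - name: health-check
      templateType: Task
      task:
        container:
          name: health-check
          image: busybox:latest
          command: ['wget', '-q', 'http://httpbin.org/status/201']
      conditionalBranches:
        # evaluated after the task container exits
        - target: workflow-stress-chaos
          expression: 'exitCode == 0'
    - name: workflow-stress-chaos
      templateType: StressChaos
      deadline: 20s
      stressChaos:
        mode: one
        selector:
          labelSelectors:
            'app': 'hello-kubernetes'
        stressors:
          cpu:
            workers: 1
            load: 20
```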

Container field description

The following table lists only the commonly used fields. For the definitions of other fields, see Kubernetes documentation - corev1.Container.

| Parameter | Type | Description | Default value | Required | Example |
| --- | --- | --- | --- | --- | --- |
| name | string | Container name | None | Yes | task |
| image | string | Image name | None | Yes | busybox:latest |
| command | []string | Container commands | None | No | ["wget", "-q", "http://httpbin.org/status/201"] |