Simulate Azure Faults

This document describes how to use Chaos Mesh to simulate Azure faults.

AzureChaos introduction

AzureChaos can help you simulate fault scenarios on the specified Azure instance. Currently, AzureChaos supports the following fault types:

  • VM Stop: stops the specified VM instance.
  • VM Restart: restarts the specified VM instance.
  • Disk Detach: detaches the data disk from the specified VM instance.

Secret file

To easily connect to the Azure cluster, you can create a Kubernetes Secret file to store the authentication information in advance.

A Secret file sample is as follows:

    apiVersion: v1
    kind: Secret
    metadata:
      name: cloud-key-secret
      namespace: chaos-testing
    type: Opaque
    stringData:
      client_id: your-client-id
      client_secret: your-client-secret
      tenant_id: your-tenant-id

  • name is the name of the Kubernetes Secret object.
  • namespace is the namespace of the Kubernetes Secret object.
  • client_id stores the Application (client) ID of the Azure app registration.
  • client_secret stores the Application (client) secret value of the Azure app registration.
  • tenant_id stores the Directory (tenant) ID of the Azure app registration.

For details on client_id and client_secret, refer to Confidential client application.
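
As an alternative to writing the Secret file by hand, the same three keys can be supplied on the command line. The following is a minimal sketch, assuming kubectl is already configured for the target cluster; the function name create_azure_secret and its argument order are hypothetical and not part of Chaos Mesh.

```shell
# create_azure_secret NAME NAMESPACE CLIENT_ID CLIENT_SECRET TENANT_ID
# Hypothetical helper: creates an Opaque Secret with the same keys
# (client_id, client_secret, tenant_id) as the sample Secret file.
create_azure_secret() {
  kubectl create secret generic "$1" \
    --namespace "$2" \
    --from-literal=client_id="$3" \
    --from-literal=client_secret="$4" \
    --from-literal=tenant_id="$5"
}
```

For example, create_azure_secret cloud-key-secret chaos-testing your-client-id your-client-secret your-tenant-id. Values passed this way may end up in shell history, so the Secret file approach can be preferable for real credentials.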

Note: Make sure that the app registration referenced in the Secret file has been added as a Contributor or Owner in the access control (IAM) of the VM instance.
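
The role assignment described in the note can also be made with the Azure CLI instead of the portal. This is a sketch under assumptions: the az CLI is installed and logged in with permission to assign roles, the Contributor role is the one you want, and the VM name (which this document does not otherwise mention) is known. The function name grant_vm_access is hypothetical.

```shell
# grant_vm_access CLIENT_ID SUBSCRIPTION_ID RESOURCE_GROUP VM_NAME
# Hypothetical helper: assigns the Contributor role to the app registration,
# scoped to a single VM. VM_NAME is an assumed input.
grant_vm_access() {
  az role assignment create \
    --assignee "$1" \
    --role "Contributor" \
    --scope "/subscriptions/$2/resourceGroups/$3/providers/Microsoft.Compute/virtualMachines/$4"
}
```

Scoping the assignment to the single VM rather than the whole resource group keeps the credentials stored in the Secret as narrow as the experiment needs.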

Create experiments using Chaos Dashboard

Before you create an experiment using Chaos Dashboard, make sure the following requirements are met:

  1. Chaos Dashboard is installed.

  2. Chaos Dashboard can be accessed via kubectl port-forward:

    kubectl port-forward -n chaos-testing svc/chaos-dashboard 2333:2333

    Then you can access the dashboard via http://localhost:2333 in your browser.


  1. Open Chaos Dashboard, and click NEW EXPERIMENT on the page to create a new experiment.

  2. In the Choose a Target area, choose Azure FAULT and select a specific behavior, such as VM STOP.

  3. Fill out the experiment information, and specify the experiment scope and the scheduled experiment duration.

  4. Submit the experiment information.

Create experiments using the YAML file

A vm-stop configuration example

  1. Write the experiment configuration to the azurechaos-vm-stop.yaml file, as shown below:

    apiVersion: chaos-mesh.org/v1alpha1
    kind: AzureChaos
    metadata:
      name: vm-stop-example
      namespace: chaos-testing
    spec:
      # Fault type to inject
      action: vm-stop
      # Kubernetes Secret that stores the Azure authentication information
      secretName: 'cloud-key-secret'
      subscriptionID: 'your-subscription-id'
      resourceGroupName: 'your-resource-group-name'
      # How long the fault lasts
      duration: '5m'

    Based on this configuration example, Chaos Mesh injects the vm-stop fault into the specified VM instance, so the VM instance is unavailable for 5 minutes.

    For more information about stopping VM instances, refer to the Azure documentation - Start or stop a VM.

  2. After the configuration file is prepared, use kubectl to create an experiment:

    kubectl apply -f azurechaos-vm-stop.yaml
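
After applying the file, it can help to confirm that the experiment object exists and to read its status and events. The sketch below assumes that kubectl resolves the resource name azurechaos from the AzureChaos kind installed by Chaos Mesh; the function name show_experiment is hypothetical.

```shell
# show_experiment NAME NAMESPACE
# Hypothetical helper: lists the AzureChaos object, then prints its
# detailed status and events if the object was found.
show_experiment() {
  kubectl get azurechaos "$1" --namespace "$2" &&
    kubectl describe azurechaos "$1" --namespace "$2"
}
```

For the example above, this would be called as show_experiment vm-stop-example chaos-testing.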

A vm-restart configuration example

  1. Write the experiment configuration to the azurechaos-vm-restart.yaml file:

    apiVersion: chaos-mesh.org/v1alpha1
    kind: AzureChaos
    metadata:
      name: vm-restart-example
      namespace: chaos-testing
    spec:
      action: vm-restart
      secretName: 'cloud-key-secret'
      subscriptionID: 'your-subscription-id'
      resourceGroupName: 'your-resource-group-name'

    Based on this configuration example, Chaos Mesh injects the vm-restart fault into the specified VM instance, so the VM instance is restarted.

    For more information about restarting the VM instance, refer to the Azure documentation - Restart a VM.

  2. After the configuration file is prepared, use kubectl to create an experiment:

    kubectl apply -f azurechaos-vm-restart.yaml
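
To observe the effect of a vm-stop or vm-restart fault from the Azure side, one option is to query the VM power state with the Azure CLI. This is a sketch, assuming the az CLI is logged in to the same subscription and that the VM name is known; the function name vm_power_state is hypothetical.

```shell
# vm_power_state RESOURCE_GROUP VM_NAME
# Hypothetical helper: prints the current power state of the VM
# as reported by Azure.
vm_power_state() {
  az vm show --show-details \
    --resource-group "$1" \
    --name "$2" \
    --query powerState \
    --output tsv
}
```

Running it before, during, and after the experiment gives a rough picture of when the fault took effect and when the VM recovered.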

A disk-detach configuration example

  1. Write the experiment configuration to the azurechaos-disk-detach.yaml file:

    apiVersion: chaos-mesh.org/v1alpha1
    kind: AzureChaos
    metadata:
      name: disk-detach-example
      namespace: chaos-testing
    spec:
      action: disk-detach
      secretName: 'cloud-key-secret'
      subscriptionID: 'your-subscription-id'
      resourceGroupName: 'your-resource-group-name'
      lun: 'your-disk-lun'
      diskName: 'your-disk-name'
      duration: '5m'

    Based on this configuration example, Chaos Mesh injects the disk-detach fault into the specified VM instance, so the specified data disk is detached from the VM instance for 5 minutes.

    For more information about detaching an Azure data disk, refer to the Azure documentation - Detach a data disk.

  2. After the configuration file is prepared, use kubectl to create an experiment:

    kubectl apply -f azurechaos-disk-detach.yaml
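
The lun and diskName values for a disk-detach experiment have to match a data disk that is actually attached to the VM. One way to look them up is sketched below, assuming the az CLI is logged in and the VM name is known; the function name list_data_disks is hypothetical.

```shell
# list_data_disks RESOURCE_GROUP VM_NAME
# Hypothetical helper: prints the name and LUN of every data disk
# currently attached to the VM.
list_data_disks() {
  az vm show \
    --resource-group "$1" \
    --name "$2" \
    --query "storageProfile.dataDisks[].{name:name, lun:lun}" \
    --output table
}
```

The same call can be repeated while the experiment is running to check that the disk has disappeared from the list, and again afterwards to check that it has been reattached.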

Field description

The following table shows the fields in the YAML configuration file.

| Parameter | Type | Description | Default value | Required | Example |
| --- | --- | --- | --- | --- | --- |
| action | string | Indicates the specific type of fault. Only vm-stop, vm-restart, and disk-detach are supported. | vm-stop | Yes | vm-stop |
| mode | string | Specifies the mode of the experiment. The mode options include one (selecting a random Pod), all (selecting all eligible Pods), fixed (selecting a specified number of eligible Pods), fixed-percent (selecting a specified percentage of Pods from the eligible Pods), and random-max-percent (selecting the maximum percentage of Pods from the eligible Pods). | N/A | Yes | one |
| value | string | Provides parameters for the mode configuration, depending on mode. For example, when mode is set to fixed-percent, value specifies the percentage of Pods. | N/A | No | 1 |
| secretName | string | Specifies the name of the Kubernetes Secret that stores the Azure authentication information. | N/A | No | cloud-key-secret |
| subscriptionID | string | Specifies the subscription ID of the VM instance. | N/A | Yes | your-subscription-id |
| resourceGroupName | string | Specifies the resource group of the VM instance. | N/A | Yes | your-resource-group-name |
| lun | string | Specifies the LUN (Logical Unit Number) of the data disk. Required when action is disk-detach. | N/A | No | 0 |
| diskName | string | Specifies the name of the data disk. Required when action is disk-detach. | N/A | No | DATADISK_0 |
| duration | string | Specifies the duration of the experiment. | N/A | Yes | 30s |
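
Before creating an experiment, the fields in a configuration file can be checked against the cluster without changing anything. The sketch below uses the server-side dry-run mode of kubectl apply; it assumes the Chaos Mesh CRDs are installed in the cluster, and the function name validate_experiment is hypothetical.

```shell
# validate_experiment FILE
# Hypothetical helper: asks the API server to validate the manifest
# without persisting the object, so no fault is injected.
validate_experiment() {
  kubectl apply --dry-run=server -f "$1"
}
```

For example, validate_experiment azurechaos-vm-stop.yaml should report an error if a field name is misspelled or a required field is missing, depending on how strictly the installed CRD schema validates the object.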