Simulate File I/O Faults

This document describes how to create IOChaos experiments in Chaos Mesh.

IOChaos introduction

IOChaos is a type of fault in Chaos Mesh. By creating an IOChaos experiment, you can simulate file system faults. Currently, IOChaos supports the following fault types:

  • latency: delays file system calls
  • fault: returns an error for file system calls
  • attrOverride: modifies file properties
  • mistake: makes file reads or writes use wrong values

For specific features, refer to Create experiments using the YAML files.

Notes

  1. Before creating an IOChaos experiment, make sure that the Chaos Mesh Controller Manager is not running on the target Pod.

  2. IOChaos may damage your data. Use IOChaos with caution in the production environment.

Create experiments using Chaos Dashboard

  1. Open Chaos Dashboard, and click NEW EXPERIMENT on the page to create a new experiment:

    [Screenshot: Create a New Experiment]

  2. In the Choose a Target area, choose FILE SYSTEM INJECTION and select a specific fault type, such as LATENCY.

    [Screenshot: IOChaos Experiments]

  3. Fill out the experiment information, and specify the experiment scope and the scheduled experiment duration.

    [Screenshot: Experiment Information]

  4. Submit the experiment information.

Create experiments using the YAML files

Latency example

  1. Write the experiment configuration to the io-latency.yaml file, as shown below:

    apiVersion: chaos-mesh.org/v1alpha1
    kind: IOChaos
    metadata:
      name: io-latency-example
      namespace: chaos-testing
    spec:
      action: latency
      mode: one
      selector:
        labelSelectors:
          app: etcd
      volumePath: /var/run/etcd
      path: '/var/run/etcd/**/*'
      delay: '100ms'
      percent: 50
      duration: '400s'

    In this configuration example, Chaos Mesh injects a delay into the directory /var/run/etcd. Each file system operation in this directory (reading, writing, listing directory contents, and so on) has a 50% probability of being delayed by 100 milliseconds.

  2. After the configuration file is prepared, use kubectl to create an experiment:

    kubectl apply -f ./io-latency.yaml
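
One way to check that the delay is in effect is to time a few file system calls under the injected path. The following is a minimal sketch, not part of Chaos Mesh: time_stat_calls is a hypothetical helper, and it assumes that Python is available wherever the path /var/run/etcd from the example is mounted. Outside such an environment it falls back to a temporary directory so that it still runs.

```python
import os
import tempfile
import time


def time_stat_calls(directory, samples=20):
    """Return the duration in seconds of `samples` os.stat calls on entries of `directory`."""
    entries = [os.path.join(directory, name) for name in os.listdir(directory)]
    if not entries:
        entries = [directory]
    durations = []
    for i in range(samples):
        target = entries[i % len(entries)]
        start = time.perf_counter()
        os.stat(target)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Assumption: the path matches volumePath in the example above.
    # If it does not exist here, use a temporary directory instead.
    path = "/var/run/etcd" if os.path.isdir("/var/run/etcd") else tempfile.mkdtemp()
    durations = time_stat_calls(path)
    slow = sum(1 for d in durations if d >= 0.1)
    print(f"{slow}/{len(durations)} calls took at least 100 ms")
```

The share of slow calls does not have to match percent exactly, because a single os.stat call can translate into more than one file system call.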

Fault example

  1. Write the experiment configuration to the io-fault.yaml file, as shown below:

    apiVersion: chaos-mesh.org/v1alpha1
    kind: IOChaos
    metadata:
      name: io-fault-example
      namespace: chaos-testing
    spec:
      action: fault
      mode: one
      selector:
        labelSelectors:
          app: etcd
      volumePath: /var/run/etcd
      path: /var/run/etcd/**/*
      errno: 5
      percent: 50
      duration: '400s'

    In this example, Chaos Mesh injects a file fault into the directory /var/run/etcd. Each file system operation under this directory has a 50% probability of failing with error code 5 (Input/output error).

  2. After the configuration file is prepared, use kubectl to create an experiment:

    kubectl apply -f ./io-fault.yaml

attrOverride example

  1. Write the experiment configuration to the io-attr.yaml file:

    apiVersion: chaos-mesh.org/v1alpha1
    kind: IOChaos
    metadata:
      name: io-attr-example
      namespace: chaos-testing
    spec:
      action: attrOverride
      mode: one
      selector:
        labelSelectors:
          app: etcd
      volumePath: /var/run/etcd
      path: /var/run/etcd/**/*
      attr:
        perm: 72
      percent: 10
      duration: '400s'

    In this configuration example, Chaos Mesh injects an attrOverride fault into the directory /var/run/etcd. Each file system operation in this directory has a 10% probability of changing the permissions of the target file to 72 (110 in octal). With these permissions, only the owner and the group can execute the file, and no other access is granted.

  2. After the configuration file is prepared, use kubectl to create an experiment:

    kubectl apply -f ./io-attr.yaml
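
Because perm takes a decimal value while file permissions are usually written in octal, a quick conversion helps avoid mistakes. The following sketch uses only the Python standard library to show how the value 72 from the example maps to octal and to an ls-style permission string. The mode 644 at the end is an illustrative value and does not appear in the example.

```python
import stat

perm = 72  # decimal value used in the attrOverride example

# 72 in decimal is 110 in octal.
print(oct(perm))  # 0o110

# Render the permission bits as an `ls -l` style string for a regular file.
print(stat.filemode(stat.S_IFREG | perm))  # ---x--x---

# The other direction: convert a familiar octal mode to the decimal value for `perm`.
print(int("644", 8))  # 420
```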

Mistake example

  1. Write the experiment configuration to the io-mistake.yaml file:

    apiVersion: chaos-mesh.org/v1alpha1
    kind: IOChaos
    metadata:
      name: io-mistake-example
      namespace: chaos-testing
    spec:
      action: mistake
      mode: one
      selector:
        labelSelectors:
          app: etcd
      volumePath: /var/run/etcd
      path: /var/run/etcd/**/*
      mistake:
        filling: zero
        maxOccurrences: 1
        maxLength: 10
      methods:
        - READ
        - WRITE
      percent: 10
      duration: '400s'

    In this configuration example, Chaos Mesh injects read and write faults into the directory /var/run/etcd. Each read or write operation under this directory has a 10% probability of being corrupted. In an affected operation, at most one run of up to 10 bytes, at a random position, is replaced with zero bytes.

  2. After the configuration file is prepared, use kubectl to create an experiment:

    kubectl apply -f ./io-mistake.yaml

Field description

General fields

| Parameter | Type | Description | Default value | Required | Example |
| --- | --- | --- | --- | --- | --- |
| action | string | Indicates the specific type of faults. Only latency, fault, attrOverride, and mistake are supported. | | Yes | latency |
| mode | string | Specifies the mode of the experiment. The mode options include one (selecting a Pod at random), all (selecting all eligible Pods), fixed (selecting a specified number of eligible Pods), fixed-percent (selecting a specified percentage of the eligible Pods), and random-max-percent (selecting the maximum percentage of the eligible Pods). | None | Yes | one |
| selector | struct | Specifies the target Pod. For details, refer to Define the experiment scope. | None | Yes | |
| value | string | Provides parameters for the mode configuration, depending on mode. For example, when mode is set to fixed-percent, value specifies the percentage of Pods. | | No | 1 |
| volumePath | string | The mount point of volume in the target container. Must be the root directory of the mount. | | Yes | /var/run/etcd |
| path | string | The valid range of fault injections, either a wildcard or a single file. | Valid for all files by default | No | /var/run/etcd/**/* |
| methods | []string | Type of the file system call that requires injecting fault. For more information about supported types, refer to Appendix A. | All Types | No | READ |
| percent | int | Probability of failure per operation, in %. | 100 | No | 100 |
| containerNames | []string | Specifies the name of the container into which the fault is injected. | | No | |
| duration | string | Specifies the duration of the experiment. | | Yes | 30s |
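
The constraints in the table above can be checked before a manifest is submitted. The following is a minimal sketch, not part of Chaos Mesh: check_iochaos_spec is a hypothetical helper that takes the spec section as a Python mapping and covers only the general fields. The 0 to 100 range for percent is an assumption based on the field being a percentage.

```python
ACTIONS = {"latency", "fault", "attrOverride", "mistake"}
MODES = {"one", "all", "fixed", "fixed-percent", "random-max-percent"}
# Fields marked as required in the general fields table.
REQUIRED = ("action", "mode", "selector", "volumePath", "duration")


def check_iochaos_spec(spec):
    """Return a list of problems found in an IOChaos `spec` mapping."""
    problems = [f"missing required field: {name}" for name in REQUIRED if name not in spec]
    if "action" in spec and spec["action"] not in ACTIONS:
        problems.append(f"unsupported action: {spec['action']}")
    if "mode" in spec and spec["mode"] not in MODES:
        problems.append(f"unsupported mode: {spec['mode']}")
    percent = spec.get("percent", 100)  # the table lists 100 as the default
    if not 0 <= percent <= 100:  # assumed valid range for a percentage
        problems.append(f"percent out of range: {percent}")
    return problems


spec = {
    "action": "latency",
    "mode": "one",
    "selector": {"labelSelectors": {"app": "etcd"}},
    "volumePath": "/var/run/etcd",
    "path": "/var/run/etcd/**/*",
    "delay": "100ms",
    "percent": 50,
    "duration": "400s",
}
print(check_iochaos_spec(spec))  # []
print(check_iochaos_spec({"action": "slow", "mode": "one"}))
```

The second call reports the unsupported action together with the missing selector, volumePath, and duration fields.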

The following fields are specific to each action:

  • latency

    | Parameter | Type | Description | Default value | Required | Example |
    | --- | --- | --- | --- | --- | --- |
    | delay | string | Specific delay time | | Yes | 100ms |

  • fault

    | Parameter | Type | Description | Default value | Required | Example |
    | --- | --- | --- | --- | --- | --- |
    | errno | int | Returned error number | | Yes | 22 |

    For common error numbers, see Appendix B.

  • attrOverride

    | Parameter | Type | Description | Default value | Required | Example |
    | --- | --- | --- | --- | --- | --- |
    | attr | AttrOverrideSpec | Specific property override rules | | Yes | As follows |

    AttrOverrideSpec is defined as follows:

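    | Parameter | Type | Description | Default value | Required | Example |
    | --- | --- | --- | --- | --- | --- |
    | ino | int | Inode number | | No | |
    | size | int | File size | | No | |
    | blocks | int | Number of blocks that the file uses | | No | |
    | atime | TimeSpec | Last access time | | No | |
    | mtime | TimeSpec | Last modified time | | No | |
    | ctime | TimeSpec | Last status change time | | No | |
    | kind | string | File type, see fuser::FileType | | No | |
    | perm | int | File permissions in decimal | | No | 72 (110 in octal) |
    | nlink | int | Number of hard links | | No | |
    | uid | int | User ID of the owner | | No | |
    | gid | int | Group ID of the owner | | No | |
    | rdev | int | Device ID | | No | |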

    TimeSpec is defined as follows:

    | Parameter | Type | Description | Default value | Required | Example |
    | --- | --- | --- | --- | --- | --- |
    | sec | int | Timestamp in seconds | | No | |
    | nsec | int | Timestamp in nanoseconds | | No | |

    For the specific meaning of parameters, you can refer to man stat.

  • mistake

    | Parameter | Type | Description | Default value | Required | Example |
    | --- | --- | --- | --- | --- | --- |
    | mistake | MistakeSpec | Specific error rules | | Yes | |

    MistakeSpec is defined as follows:

    | Parameter | Type | Description | Default value | Required | Example |
    | --- | --- | --- | --- | --- | --- |
    | filling | string | The wrong data to be filled. Only zero (fill 0) or random (fill random bytes) are supported. | | Yes | |
    | maxOccurrences | int | Maximum number of errors in each operation. | | Yes | 1 |
    | maxLength | int | Maximum length of each error (in bytes). | | Yes | 1 |

Warning: It is suggested that you only use mistake on READ and WRITE file system calls. Using mistake on other file system calls may lead to unexpected consequences, including but not limited to file system damage and program crashes.
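
To make the meaning of filling, maxOccurrences, and maxLength more concrete, the following sketch models their effect on a single buffer. It is a simplified illustration of the semantics described in the MistakeSpec table, not the code that Chaos Mesh or toda runs. apply_mistake is a hypothetical function, and the way positions and lengths are drawn is an assumption.

```python
import random


def apply_mistake(data, filling, max_occurrences, max_length, rng=random):
    """Return a copy of `data` with up to `max_occurrences` runs overwritten."""
    if filling not in ("zero", "random"):
        raise ValueError("filling must be 'zero' or 'random'")
    corrupted = bytearray(data)
    if not corrupted:
        return bytes(corrupted)
    # Assumption: at least one and at most max_occurrences runs are damaged.
    for _ in range(rng.randint(1, max_occurrences)):
        start = rng.randrange(len(corrupted))
        # Each run is at most max_length bytes and never extends past the buffer.
        length = min(rng.randint(1, max_length), len(corrupted) - start)
        for i in range(start, start + length):
            corrupted[i] = 0 if filling == "zero" else rng.randrange(256)
    return bytes(corrupted)


original = b"etcd-data-" * 4
damaged = apply_mistake(original, "zero", max_occurrences=1, max_length=10)
changed = sum(a != b for a, b in zip(original, damaged))
print(len(damaged) == len(original), changed <= 10)  # True True
```

With the values from the mistake example (filling zero, maxOccurrences 1, maxLength 10), the buffer keeps its length and at most 10 consecutive bytes become zero.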

Local debugging

If you are not sure about the effect of a particular fault, you can use toda to test it locally. Chaos Mesh itself uses toda to implement IOChaos.

Appendix A: methods type

  • lookup
  • forget
  • getattr
  • setattr
  • readlink
  • mknod
  • mkdir
  • unlink
  • rmdir
  • symlink
  • rename
  • link
  • open
  • read
  • write
  • flush
  • release
  • fsync
  • opendir
  • readdir
  • releasedir
  • fsyncdir
  • statfs
  • setxattr
  • getxattr
  • listxattr
  • removexattr
  • access
  • create
  • getlk
  • setlk
  • bmap

For more information, refer to fuser::Filesystem.

Appendix B: Common Error Numbers

  • 1: Operation not permitted
  • 2: No such file or directory
  • 5: I/O error
  • 6: No such device or address
  • 12: Out of memory
  • 16: Device or resource busy
  • 17: File exists
  • 20: Not a directory
  • 22: Invalid argument
  • 24: Too many open files
  • 28: No space left on device

For more information, refer to Linux source code.
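
For error numbers that are not listed above, the Python standard library can look up the symbolic name and message. The following sketch prints both for the numbers in this appendix. The output reflects the platform where the script runs, so run it on Linux to match the values that the errno field expects.

```python
import errno
import os

for number in (1, 2, 5, 6, 12, 16, 17, 20, 22, 24, 28):
    # errno.errorcode maps a number to its symbolic name, for example 5 -> "EIO".
    print(number, errno.errorcode[number], os.strerror(number))
```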