Simulate JVM Application Faults

Chaos Mesh simulates faults in JVM applications through Byteman. The supported fault types are as follows:

  • Throw custom exceptions
  • Trigger garbage collection
  • Increase method latency
  • Modify return values of a method
  • Trigger faults by setting Byteman configuration files
  • Increase JVM pressure

This document describes how to use Chaos Mesh to create JVM experiments of the fault types listed above.

Note: Your Linux kernel must be v4.1 or later.

Create experiments using Chaos Dashboard

  1. Open Chaos Dashboard, and click NEW EXPERIMENT on the page to create a new experiment.

  2. In the Choose a Target area, choose JVM FAULT, and select a specific behavior, such as RETURN. Then, fill out the detailed configurations.

    For information about how to fill out the configurations, refer to the Field description section.

  3. Fill out the experiment information, and specify the experiment scope and the scheduled experiment duration.

  4. Submit the experiment information.

Create experiments using YAML files

The following example shows how to use JVMChaos and what effect it has, by overriding the return value of a method. The YAML files referred to in the following steps can be found in examples/jvm, which is also the default working directory for these steps. The default namespace where Chaos Mesh is installed is chaos-testing.

Step 1. Create the target application

Helloworld is a simple Java application that serves as the target application in this section. It is defined in examples/jvm/app.yaml as follows:

    apiVersion: v1
    kind: Pod
    metadata:
      name: helloworld
      namespace: helloworld
    spec:
      containers:
        - name: helloworld
          # source code: https://github.com/WangXiangUSTC/byteman-example/tree/main/example.helloworld
          # this application will print log like this below:
          # 0. Hello World
          # 1. Hello World
          # ...
          image: xiang13225080/helloworld:v1.0
          imagePullPolicy: IfNotPresent

  1. Create the namespace for the target application:

        kubectl create namespace helloworld

  2. Create the application Pod:

        kubectl apply -f app.yaml

  3. Execute kubectl -n helloworld get pods. You should find a Pod named helloworld in the helloworld namespace.

        kubectl -n helloworld get pods

    The result is as follows:

        NAME         READY   STATUS    RESTARTS   AGE
        helloworld   1/1     Running   0          2m

    After the READY column turns to 1/1, you can proceed to the next step.

Step 2. Observe application behaviors before injecting faults

You can observe the behavior of the helloworld application before injecting faults, for example:

    kubectl -n helloworld logs -f helloworld

The result is as follows:

    0. Hello World
    1. Hello World
    2. Hello World
    3. Hello World
    4. Hello World
    5. Hello World

You can see that helloworld outputs a line of Hello World every second, and that the number on each line increases by one.

Step 3. Inject JVMChaos and check

  1. The JVMChaos with a specified return value is as follows:

        apiVersion: chaos-mesh.org/v1alpha1
        kind: JVMChaos
        metadata:
          name: return
          namespace: helloworld
        spec:
          action: return
          class: Main
          method: getnum
          value: '9999'
          mode: all
          selector:
            namespaces:
              - helloworld

    JVMChaos changes the return value of the getnum method to the number 9999, which means that the number of each line in the helloworld output is set to 9999.

  2. Inject JVMChaos with a specified value:

        kubectl apply -f ./jvm-return-example.yaml

  3. Check the latest log of helloworld:

        kubectl -n helloworld logs -f helloworld

    The log is as follows:

        Rule.execute called for return_0:0
        return execute
        caught ReturnException
        9999. Hello World

Field description

| Parameter | Type | Description | Default value | Required | Example |
| --- | --- | --- | --- | --- | --- |
| action | string | Indicates the specific fault type. The available fault types include latency, return, exception, stress, gc, and ruleData. | None | Yes | return |
| mode | string | Indicates how to select Pods. The supported modes include one, all, fixed, fixed-percent, and random-max-percent. | None | Yes | one |

The meanings of the different action values are as follows:

| Value | Meaning |
| --- | --- |
| latency | Increase method latency |
| return | Modify return values of a method |
| exception | Throw custom exceptions |
| stress | Increase CPU usage of the Java process, or cause memory overflow (heap overflow and stack overflow are supported) |
| gc | Trigger garbage collection |
| ruleData | Trigger faults by setting Byteman configuration files |

Each action value has its own set of configuration items, described below.

Parameters for latency

| Parameter | Type | Description | Required |
| --- | --- | --- | --- |
| class | string | The name of the Java class | Yes |
| method | string | The name of the method | Yes |
| latency | int | The latency added to the method, in milliseconds. | Yes |
| port | int | The port ID attached to the Java process agent. The faults are injected into the Java process through this ID. | No |
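
As an illustration, a latency experiment against the helloworld application from the YAML walkthrough might look like the following sketch. It reuses the Main class and getnum method from that example. The resource name latency and the 1000 ms value are illustrative assumptions, not taken from the example files.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: JVMChaos
metadata:
  name: latency          # illustrative name
  namespace: helloworld
spec:
  action: latency
  class: Main
  method: getnum
  latency: 1000          # added delay in milliseconds (assumed value)
  mode: all
  selector:
    namespaces:
      - helloworld
```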

Parameters for return

| Parameter | Type | Description | Required |
| --- | --- | --- | --- |
| class | string | The name of the Java class | Yes |
| method | string | The name of the method | Yes |
| value | string | Specifies the return value of the method. Currently, the value can be of a numeric or string type. If the return value is a string, double quotes are required, like "chaos". | Yes |
| port | int | The port ID attached to the Java process agent. The faults are injected into the Java process through this ID. | No |
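
The quoting rule for string return values can be easy to get wrong in YAML, because the double quotes must be part of the value itself. The sketch below shows one way to write it, wrapping the double-quoted text in single quotes. The method name getName is hypothetical (the helloworld example only documents getnum, which returns a number), and the resource name is illustrative.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: JVMChaos
metadata:
  name: return-string    # illustrative name
  namespace: helloworld
spec:
  action: return
  class: Main
  method: getName        # hypothetical method that returns a String
  value: '"chaos"'       # single quotes are YAML syntax; the double quotes stay in the value
  mode: all
  selector:
    namespaces:
      - helloworld
```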

Parameters for exception

| Parameter | Type | Description | Required |
| --- | --- | --- | --- |
| class | string | The name of the Java class | Yes |
| method | string | The name of the method | Yes |
| exception | string | The thrown custom exception, such as 'java.io.IOException("BOOM")'. | Yes |
| port | int | The port ID attached to the Java process agent. The faults are injected into the Java process through this ID. | No |
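
A minimal sketch of an exception experiment, using the example value from the table above, could look as follows. It targets Main.getnum from the helloworld walkthrough purely for illustration and assumes the target method is able to throw this exception type. The method's signature is not shown in this document, so you may need a different method or exception class in practice.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: JVMChaos
metadata:
  name: exception        # illustrative name
  namespace: helloworld
spec:
  action: exception
  class: Main
  method: getnum         # assumed target; pick a method that can throw the exception
  exception: 'java.io.IOException("BOOM")'
  mode: all
  selector:
    namespaces:
      - helloworld
```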

Parameters for stress

| Parameter | Type | Description | Required |
| --- | --- | --- | --- |
| cpuCount | int | The number of CPU cores used for increasing CPU stress. You must configure either cpuCount or memType. | No |
| memType | string | The type of OOM. Currently, both 'stack' and 'heap' OOM types are supported. You must configure either cpuCount or memType. | No |
| port | int | The port ID attached to the Java process agent. The faults are injected into the Java process through this ID. | No |
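
For example, a stress experiment that loads CPU cores might be written as in the sketch below. The resource name and the core count of 2 are illustrative assumptions. The commented line shows the memory alternative described in the table.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: JVMChaos
metadata:
  name: stress           # illustrative name
  namespace: helloworld
spec:
  action: stress
  cpuCount: 2            # number of CPU cores to stress (assumed value)
  # memType: heap        # alternative to cpuCount: 'heap' or 'stack'
  mode: all
  selector:
    namespaces:
      - helloworld
```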

Parameters for gc

| Parameter | Type | Description | Required |
| --- | --- | --- | --- |
| port | int | The port ID attached to the Java process agent. The faults are injected into the Java process through this ID. | No |
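
Because gc has no required parameters of its own, a manifest for it only needs the common fields. The following is a minimal sketch against the helloworld namespace, with an illustrative resource name.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: JVMChaos
metadata:
  name: gc               # illustrative name
  namespace: helloworld
spec:
  action: gc
  mode: all
  selector:
    namespaces:
      - helloworld
```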

Parameters for ruleData

| Parameter | Type | Description | Required |
| --- | --- | --- | --- |
| ruleData | string | Specifies the Byteman configuration data | Yes |
| port | int | The port ID attached to the Java process agent. The faults are injected into the Java process through this ID. | No |

When you write the rule configuration file, take into account the specific Java program and the Byteman rule language. For example:

    RULE modify return value
    CLASS Main
    METHOD getnum
    AT ENTRY
    IF true
    DO
        return 9999
    ENDRULE

You need to replace the line breaks in the configuration file with the escaped newline character "\n", and use the escaped text as the value of ruleData, as follows:

    "\nRULE modify return value\nCLASS Main\nMETHOD getnum\nAT ENTRY\nIF true\nDO return 9999\nENDRULE\n"
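
Put together, the escaped rule could be placed in a JVMChaos manifest roughly as in the sketch below. The resource name is an illustrative assumption. The rule text is written as a double-quoted YAML scalar, in which \n is the escape sequence for a line break.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: JVMChaos
metadata:
  name: rule-data        # illustrative name
  namespace: helloworld
spec:
  action: ruleData
  # Double-quoted scalar: each \n is read as a line break in the Byteman rule.
  ruleData: "\nRULE modify return value\nCLASS Main\nMETHOD getnum\nAT ENTRY\nIF true\nDO return 9999\nENDRULE\n"
  mode: all
  selector:
    namespaces:
      - helloworld
```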