Extend Chaos Daemon Interface

In Add a new chaos experiment type, you added HelloWorldChaos, which prints Hello world! in the logs of Chaos Controller Manager.

To enable HelloWorldChaos to inject faults into the target Pod, you need to extend the Chaos Daemon interface.

Tip: It is recommended that you read about the architecture of Chaos Mesh before you continue.

This document covers:

  • Selector
  • Implement the gRPC interface
  • Verify the output of HelloWorldChaos
  • Next steps

Selector

In api/v1alpha1/helloworldchaos_type.go, you have defined HelloWorldChaosSpec, which includes ContainerSelector:

    // HelloWorldChaosSpec defines the desired state of HelloWorldChaos
    type HelloWorldChaosSpec struct {
        // ContainerSelector specifies the target for injection
        ContainerSelector `json:",inline"`

        // Duration represents the duration of the chaos
        // +optional
        Duration *string `json:"duration,omitempty"`

        // RemoteCluster represents the remote cluster where the chaos will be deployed
        // +optional
        RemoteCluster string `json:"remoteCluster,omitempty"`
    }

    //...

    // GetSelectorSpecs is a getter for selectors
    func (obj *HelloWorldChaos) GetSelectorSpecs() map[string]interface{} {
        return map[string]interface{}{
            ".": &obj.Spec.ContainerSelector,
        }
    }

In Chaos Mesh, a Selector defines the scope of a chaos experiment, such as the target namespaces, annotations, and labels.

A Selector can also be more specific (for example, AWSSelector in AWSChaos). Normally, each chaos experiment needs only one selector. NetworkChaos is an exception: it sometimes needs two selectors, one for each side of a network partition.

You can refer to Define the Scope of Chaos Experiments for more information about Selector.
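
To make the role of GetSelectorSpecs more concrete, here is a minimal, self-contained sketch. The types containerSelector, helloWorldChaos and twoSidedChaos are simplified stand-ins invented for this example, not the real Chaos Mesh definitions, and the keys ".Source" and ".Target" are hypothetical. Only the shape of the returned map (selector name mapped to a pointer to the selector) follows the snippet above.

```go
package main

import (
	"fmt"
	"sort"
)

// containerSelector is a simplified stand-in for the real ContainerSelector.
type containerSelector struct {
	Namespaces     []string
	ContainerNames []string
}

// helloWorldChaos mimics an experiment that needs a single selector.
type helloWorldChaos struct {
	Selector containerSelector
}

// GetSelectorSpecs exposes the only selector under the key ".".
func (c *helloWorldChaos) GetSelectorSpecs() map[string]interface{} {
	return map[string]interface{}{".": &c.Selector}
}

// twoSidedChaos mimics an experiment that needs two selectors,
// for example the two sides of a network partition.
type twoSidedChaos struct {
	Source containerSelector
	Target containerSelector
}

// GetSelectorSpecs exposes both selectors under hypothetical key names.
func (c *twoSidedChaos) GetSelectorSpecs() map[string]interface{} {
	return map[string]interface{}{
		".Source": &c.Source,
		".Target": &c.Target,
	}
}

// selectorKeys returns the selector names in sorted order.
func selectorKeys(specs map[string]interface{}) []string {
	keys := make([]string, 0, len(specs))
	for k := range specs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	one := &helloWorldChaos{Selector: containerSelector{Namespaces: []string{"busybox"}}}
	two := &twoSidedChaos{}

	fmt.Println(selectorKeys(one.GetSelectorSpecs()))
	fmt.Println(selectorKeys(two.GetSelectorSpecs()))
}
```

Returning a map rather than a single value lets code that consumes the selectors treat one-selector and two-selector experiments the same way: it iterates over whatever entries the experiment type reports.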

Implement the gRPC interface

To allow Chaos Daemon to accept the requests from Chaos Controller Manager, you need to implement a new gRPC interface.

  1. Add the RPC in pkg/chaosdaemon/pb/chaosdaemon.proto:

        service ChaosDaemon {
            ...
            rpc ExecHelloWorldChaos(ExecHelloWorldRequest) returns (google.protobuf.Empty) {}
        }

        message ExecHelloWorldRequest {
            string container_id = 1;
        }

    Then you need to update the related chaosdaemon.pb.go file by running the following command:

        make proto

  2. Implement gRPC services in Chaos Daemon.

    In the pkg/chaosdaemon directory, create a file named helloworld_server.go with the following contents:

        package chaosdaemon

        import (
            "context"

            "github.com/golang/protobuf/ptypes/empty"

            "github.com/chaos-mesh/chaos-mesh/pkg/bpm"
            "github.com/chaos-mesh/chaos-mesh/pkg/chaosdaemon/pb"
        )

        // ExecHelloWorldChaos runs ps aux for the target container and logs the output.
        func (s *DaemonServer) ExecHelloWorldChaos(ctx context.Context, req *pb.ExecHelloWorldRequest) (*empty.Empty, error) {
            log := s.getLoggerFromContext(ctx)
            log.Info("ExecHelloWorldChaos", "request", req)

            // Resolve the PID of the container from its container ID.
            pid, err := s.crClient.GetPidFromContainerID(ctx, req.ContainerId)
            if err != nil {
                return nil, err
            }

            // Build a command that runs ps aux in the mount namespace of that PID.
            cmd := bpm.DefaultProcessBuilder("sh", "-c", "ps aux").
                SetContext(ctx).
                SetNS(pid, bpm.MountNS).
                Build(ctx)

            out, err := cmd.Output()
            if err != nil {
                return nil, err
            }
            if len(out) != 0 {
                log.Info("cmd output", "output", string(out))
            }

            return &empty.Empty{}, nil
        }

    After chaos-daemon receives the ExecHelloWorldChaos request, it logs the list of processes in the target container.

  3. Send a gRPC request when applying the chaos experiment.

    Every chaos experiment has a life cycle: apply, then recover. However, some chaos experiments cannot be recovered by default (for example, PodKill in PodChaos, and HelloWorldChaos). These are called OneShot experiments. They are marked with +chaos-mesh:oneshot=true, which you have already defined in the HelloWorldChaos schema.

    Chaos Controller Manager needs to send a request to Chaos Daemon when HelloWorldChaos is in the apply phase. To do this, update controllers/chaosimpl/helloworldchaos/types.go:

        // Apply asks Chaos Daemon to run HelloWorldChaos against the selected container.
        func (impl *Impl) Apply(ctx context.Context, index int, records []*v1alpha1.Record, obj v1alpha1.InnerObject) (v1alpha1.Phase, error) {
            impl.Log.Info("Apply helloworld chaos")

            // Decode the record into a gRPC client and a container ID.
            decodedContainer, err := impl.decoder.DecodeContainerRecord(ctx, records[index], obj)
            if err != nil {
                return v1alpha1.NotInjected, err
            }
            pbClient := decodedContainer.PbClient
            containerId := decodedContainer.ContainerId

            // Send the request to Chaos Daemon.
            _, err = pbClient.ExecHelloWorldChaos(ctx, &pb.ExecHelloWorldRequest{
                ContainerId: containerId,
            })
            if err != nil {
                return v1alpha1.NotInjected, err
            }

            return v1alpha1.Injected, nil
        }

        // Recover has nothing to undo because HelloWorldChaos is a OneShot experiment.
        func (impl *Impl) Recover(ctx context.Context, index int, records []*v1alpha1.Record, obj v1alpha1.InnerObject) (v1alpha1.Phase, error) {
            impl.Log.Info("Recover helloworld chaos")
            return v1alpha1.NotInjected, nil
        }

    Info: There is no need to recover HelloWorldChaos because it is a OneShot experiment. For the type of chaos experiment you develop, you can implement the recovery logic as needed.
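
The apply and recover contract from step 3 can be tried out without a cluster. The following is a minimal sketch under stated assumptions: phase, daemonClient, fakeDaemon, apply and recoverChaos are hypothetical stand-ins written for this example (they replace v1alpha1.Phase, the generated pb client, and the Impl methods), and the phase strings are illustrative. It only mirrors the control flow shown above: a failed request leaves the experiment not injected, a successful one marks it injected, and recovery is a no-op.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// phase is a simplified stand-in for v1alpha1.Phase.
type phase string

const (
	notInjected phase = "NotInjected"
	injected    phase = "Injected"
)

// errUnreachable simulates a failing Chaos Daemon in this sketch.
var errUnreachable = errors.New("daemon unreachable")

// daemonClient is a hypothetical, reduced view of the generated gRPC client.
type daemonClient interface {
	ExecHelloWorldChaos(ctx context.Context, containerID string) error
}

// fakeDaemon records the container IDs it receives and returns err.
type fakeDaemon struct {
	err   error
	calls []string
}

func (f *fakeDaemon) ExecHelloWorldChaos(_ context.Context, containerID string) error {
	f.calls = append(f.calls, containerID)
	return f.err
}

// apply mirrors the control flow of Impl.Apply: send the request,
// then report injected only if the daemon accepted it.
func apply(ctx context.Context, c daemonClient, containerID string) (phase, error) {
	if err := c.ExecHelloWorldChaos(ctx, containerID); err != nil {
		return notInjected, err
	}
	return injected, nil
}

// recoverChaos mirrors Impl.Recover for a OneShot experiment: nothing to undo.
func recoverChaos(_ context.Context) (phase, error) {
	return notInjected, nil
}

func main() {
	ctx := context.Background()

	p, err := apply(ctx, &fakeDaemon{}, "docker://abc")
	fmt.Println(p, err)

	p, err = apply(ctx, &fakeDaemon{err: errUnreachable}, "docker://abc")
	fmt.Println(p, err)

	p, err = recoverChaos(ctx)
	fmt.Println(p, err)
}
```

Hiding the daemon behind a small interface is a design choice of this sketch. It lets the apply logic be exercised with a fake in unit tests before any image is built.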

Verify the output of HelloWorldChaos

Now you can verify the output of HelloWorldChaos:

  1. Build the Docker images as described in Add a new chaos experiment type, then load them into your cluster.

    Note: If you’re using minikube, some versions of minikube cannot overwrite existing images that have the same tag. You may need to delete the existing images before loading the new ones.

  2. Update Chaos Mesh:

  3. Deploy a Pod for testing:

        kubectl apply -f https://raw.githubusercontent.com/chaos-mesh/apps/master/ping/busybox-statefulset.yaml

  4. Create a hello-busybox.yaml file with the following content:

        apiVersion: chaos-mesh.org/v1alpha1
        kind: HelloWorldChaos
        metadata:
          name: hello-busybox
          namespace: chaos-mesh
        spec:
          selector:
            namespaces:
              - busybox
          mode: all
          duration: 1h

  5. Run:

        kubectl apply -f hello-busybox.yaml
        # helloworldchaos.chaos-mesh.org/hello-busybox created

    • Check whether chaos-controller-manager has Apply helloworld chaos in its logs:

          kubectl logs -n chaos-mesh chaos-controller-manager-xxx

      Example output:

          2023-07-16T08:20:46.823Z INFO records records/controller.go:149 apply chaos {"id": "busybox/busybox-0/busybox"}
          2023-07-16T08:20:46.823Z INFO helloworldchaos helloworldchaos/types.go:27 Apply helloworld chaos

    • Check the logs of Chaos Daemon:

          kubectl logs -n chaos-mesh chaos-daemon-xxx

      Example output:

          2023-07-16T08:20:46.833Z INFO chaos-daemon.daemon-server chaosdaemon/server.go:187 ExecHelloWorldChaos {"namespacedName": "chaos-mesh/hello-busybox", "request": "container_id:\"docker://5e01e76efdec6aa0934afc15bb80e121d58b43c529a6696a01a242f7ac68f201\""}
          2023-07-16T08:20:46.834Z INFO chaos-daemon.daemon-server.background-process-manager.process-builder pb/chaosdaemon.pb.go:4568 build command {"namespacedName": "chaos-mesh/hello-busybox", "command": "/usr/local/bin/nsexec -m /proc/104710/ns/mnt -- sh -c ps aux"}
          2023-07-16T08:20:46.841Z INFO chaos-daemon.daemon-server chaosdaemon/server.go:187 cmd output {"namespacedName": "chaos-mesh/hello-busybox", "output": "PID USER TIME COMMAND\n 1 root 0:00 sh -c echo Container is Running ; sleep 3600\n"}
          2023-07-16T08:20:46.856Z INFO chaos-daemon.daemon-server chaosdaemon/server.go:187 ExecHelloWorldChaos {"namespacedName": "chaos-mesh/hello-busybox", "request": "container_id:\"docker://bab4f632a0358529f7d72d35e014b8c2ce57438102d99d6174dd9df52d093e99\""}
          2023-07-16T08:20:46.864Z INFO chaos-daemon.daemon-server.background-process-manager.process-builder pb/chaosdaemon.pb.go:4568 build command {"namespacedName": "chaos-mesh/hello-busybox", "command": "/usr/local/bin/nsexec -m /proc/104841/ns/mnt -- sh -c ps aux"}
          2023-07-16T08:20:46.867Z INFO chaos-daemon.daemon-server chaosdaemon/server.go:187 cmd output {"namespacedName": "chaos-mesh/hello-busybox", "output": "PID USER TIME COMMAND\n 1 root 0:00 sh -c echo Container is Running ; sleep 3600\n"}

    You will see two separate ps aux outputs, which correspond to two different Pods.
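
In the Chaos Daemon logs above, each container_id value has the form runtime://id, for example docker://5e01e76e.... When you debug a new experiment type, it can help to split that value into its two parts. The helper below is a hypothetical sketch written for this page, not a function from the Chaos Mesh code base.

```go
package main

import (
	"fmt"
	"strings"
)

// splitContainerID splits a value of the form "<runtime>://<id>"
// into the runtime name and the container ID.
func splitContainerID(containerID string) (string, string, error) {
	parts := strings.SplitN(containerID, "://", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", fmt.Errorf("malformed container ID %q", containerID)
	}
	return parts[0], parts[1], nil
}

func main() {
	// Shortened ID used for illustration only.
	runtime, id, err := splitContainerID("docker://5e01e76efdec")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("runtime:", runtime)
	fmt.Println("id:", id)
}
```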

Next steps

If you encounter any problems during the process, create an issue in the Chaos Mesh repository.

If you are curious about how all this works, read controllers/README.md and the code for the different controllers next.

You are now ready to become a Chaos Mesh developer! Feel free to visit the Chaos Mesh issues to find a good first issue and get started!