This guide demonstrates how to configure circuit breaking for destinations that are part of an OSM-managed service mesh.

Prerequisites

  • Kubernetes cluster running Kubernetes v1.22.9 or greater.
  • Have OSM installed.
  • Have kubectl available to interact with the API server.
  • Have osm CLI available for managing the service mesh.
  • OSM version >= v1.1.0.
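
Optionally, run a quick sanity check on the tooling before starting the demo. The commands below assume kubectl and the osm CLI are available on your PATH:

    # Check the client and server Kubernetes versions
    kubectl version
    # Check the osm CLI version (and the mesh version, if the control plane is reachable)
    osm version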

Demo

The following demo shows the fortio load-testing client sending traffic to the httpbin service. We will see how applying circuit breakers for traffic to the httpbin service impacts the fortio client when the configured circuit-breaking limits trip.

  1. For simplicity, enable permissive traffic policy mode so that explicit SMI traffic access policies are not required for application connectivity within the mesh.

    export osm_namespace=osm-system # Replace osm-system with the namespace where OSM is installed
    kubectl patch meshconfig osm-mesh-config -n "$osm_namespace" -p '{"spec":{"traffic":{"enablePermissiveTrafficPolicyMode":true}}}' --type=merge
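
    To confirm the change took effect, you can optionally read the setting back from the MeshConfig; the expected output is true:

    kubectl get meshconfig osm-mesh-config -n "$osm_namespace" -o jsonpath='{.spec.traffic.enablePermissiveTrafficPolicyMode}'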
  2. Deploy the httpbin service into the httpbin namespace after enrolling its namespace to the mesh. The httpbin service runs on port 14001.

    # Create the httpbin namespace
    kubectl create namespace httpbin

    # Add the namespace to the mesh
    osm namespace add httpbin

    # Deploy httpbin service in the httpbin namespace
    kubectl apply -f https://raw.githubusercontent.com/openservicemesh/osm-docs/release-v1.2/manifests/samples/httpbin/httpbin.yaml -n httpbin

    Confirm the httpbin service and pods are up and running.

    $ kubectl get svc -n httpbin
    NAME      TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)     AGE
    httpbin   ClusterIP   10.96.198.23   <none>        14001/TCP   20s

    $ kubectl get pods -n httpbin
    NAME                     READY   STATUS    RESTARTS   AGE
    httpbin-5b8b94b9-lt2vs   2/2     Running   0          20s
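
    Optionally, verify that the httpbin namespace has been enrolled in the mesh; it should appear in the list of monitored namespaces:

    osm namespace list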
  3. Deploy the fortio load-testing client in the client namespace after enrolling its namespace to the mesh.

    # Create the client namespace
    kubectl create namespace client

    # Add the namespace to the mesh
    osm namespace add client

    # Deploy fortio client in the client namespace
    kubectl apply -f https://raw.githubusercontent.com/openservicemesh/osm-docs/release-v1.2/manifests/samples/fortio/fortio.yaml -n client

    Confirm the fortio client pod is up and running.

    $ kubectl get pods -n client
    NAME                      READY   STATUS    RESTARTS   AGE
    fortio-6477f8495f-bj4s9   2/2     Running   0          19s
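
    The 2/2 READY count shows the pod is running both the fortio container and the injected Envoy sidecar. To list the container names directly, you can optionally run the following, assuming the sample manifest labels the pod with app=fortio:

    kubectl get pods -n client -l app=fortio -o jsonpath='{.items[0].spec.containers[*].name}'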
  4. Confirm the fortio client is able to successfully make HTTP requests to the httpbin service on port 14001. We call the httpbin service with 3 concurrent connections (-c 3) and send 50 requests (-n 50).

    $ export fortio_pod="$(kubectl get pod -n client -l app=fortio -o jsonpath='{.items[0].metadata.name}')"
    $ kubectl exec "$fortio_pod" -c fortio -n client -- /usr/bin/fortio load -c 3 -qps 0 -n 50 -loglevel Warning http://httpbin.httpbin.svc.cluster.local:14001/get
    17:48:46 I logger.go:127> Log level is now 3 Warning (was 2 Info)
    Fortio 1.17.1 running at 0 queries per second, 8->8 procs, for 50 calls: http://httpbin.httpbin.svc.cluster.local:14001/get
    Starting at max qps with 3 thread(s) [gomax 8] for exactly 50 calls (16 per thread + 2)
    Ended after 438.1586ms : 50 calls. qps=114.11
    Aggregated Function Time : count 50 avg 0.026068422 +/- 0.05104 min 0.0029766 max 0.1927961 sum 1.3034211
    # range, mid point, percentile, count
    >= 0.0029766 <= 0.003 , 0.0029883 , 2.00, 1
    > 0.003 <= 0.004 , 0.0035 , 30.00, 14
    > 0.004 <= 0.005 , 0.0045 , 32.00, 1
    > 0.005 <= 0.006 , 0.0055 , 44.00, 6
    > 0.006 <= 0.007 , 0.0065 , 46.00, 1
    > 0.007 <= 0.008 , 0.0075 , 66.00, 10
    > 0.008 <= 0.009 , 0.0085 , 72.00, 3
    > 0.009 <= 0.01 , 0.0095 , 74.00, 1
    > 0.01 <= 0.011 , 0.0105 , 82.00, 4
    > 0.03 <= 0.035 , 0.0325 , 86.00, 2
    > 0.035 <= 0.04 , 0.0375 , 88.00, 1
    > 0.12 <= 0.14 , 0.13 , 94.00, 3
    > 0.18 <= 0.192796 , 0.186398 , 100.00, 3
    # target 50% 0.0072
    # target 75% 0.010125
    # target 90% 0.126667
    # target 99% 0.190663
    # target 99.9% 0.192583
    Sockets used: 3 (for perfect keepalive, would be 3)
    Jitter: false
    Code 200 : 50 (100.0 %)
    Response Header Sizes : count 50 avg 230.3 +/- 0.6708 min 230 max 232 sum 11515
    Response Body/Total Sizes : count 50 avg 582.3 +/- 0.6708 min 582 max 584 sum 29115
    All done 50 calls (plus 0 warmup) 26.068 ms avg, 114.1 qps

    As seen above, all the requests succeeded.

    Code 200 : 50 (100.0 %)
  5. Next, apply a circuit breaker configuration for traffic directed to the httpbin service using the UpstreamTrafficSetting resource, limiting the maximum number of concurrent TCP connections, pending HTTP requests, and requests per connection to 1.

    Note: The UpstreamTrafficSetting resource must be created in the same namespace as the upstream (destination) service, and its host field must be set to the FQDN of the Kubernetes service.

    kubectl apply -f - <<EOF
    apiVersion: policy.openservicemesh.io/v1alpha1
    kind: UpstreamTrafficSetting
    metadata:
      name: httpbin
      namespace: httpbin
    spec:
      host: httpbin.httpbin.svc.cluster.local
      connectionSettings:
        tcp:
          maxConnections: 1
        http:
          maxPendingRequests: 1
          maxRequestsPerConnection: 1
    EOF
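
    Optionally, confirm the resource was created in the httpbin namespace:

    kubectl get upstreamtrafficsettings -n httpbin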
  6. Confirm the fortio client is unable to make the same number of successful requests as before, due to the connection-level and request-level circuit breaking limits configured above.

    $ kubectl exec "$fortio_pod" -c fortio -n client -- /usr/bin/fortio load -c 3 -qps 0 -n 50 -loglevel Warning http://httpbin.httpbin.svc.cluster.local:14001/get
    17:59:19 I logger.go:127> Log level is now 3 Warning (was 2 Info)
    Fortio 1.17.1 running at 0 queries per second, 8->8 procs, for 50 calls: http://httpbin.httpbin.svc.cluster.local:14001/get
    Starting at max qps with 3 thread(s) [gomax 8] for exactly 50 calls (16 per thread + 2)
    17:59:19 W http_client.go:806> [0] Non ok http code 503 (HTTP/1.1 503)
    17:59:19 W http_client.go:806> [1] Non ok http code 503 (HTTP/1.1 503)
    17:59:19 W http_client.go:806> [1] Non ok http code 503 (HTTP/1.1 503)
    17:59:19 W http_client.go:806> [1] Non ok http code 503 (HTTP/1.1 503)
    17:59:19 W http_client.go:806> [0] Non ok http code 503 (HTTP/1.1 503)
    17:59:19 W http_client.go:806> [1] Non ok http code 503 (HTTP/1.1 503)
    17:59:19 W http_client.go:806> [0] Non ok http code 503 (HTTP/1.1 503)
    17:59:19 W http_client.go:806> [0] Non ok http code 503 (HTTP/1.1 503)
    17:59:19 W http_client.go:806> [0] Non ok http code 503 (HTTP/1.1 503)
    17:59:19 W http_client.go:806> [0] Non ok http code 503 (HTTP/1.1 503)
    17:59:19 W http_client.go:806> [0] Non ok http code 503 (HTTP/1.1 503)
    17:59:19 W http_client.go:806> [0] Non ok http code 503 (HTTP/1.1 503)
    17:59:19 W http_client.go:806> [0] Non ok http code 503 (HTTP/1.1 503)
    17:59:19 W http_client.go:806> [2] Non ok http code 503 (HTTP/1.1 503)
    17:59:19 W http_client.go:806> [0] Non ok http code 503 (HTTP/1.1 503)
    17:59:19 W http_client.go:806> [1] Non ok http code 503 (HTTP/1.1 503)
    17:59:19 W http_client.go:806> [1] Non ok http code 503 (HTTP/1.1 503)
    17:59:19 W http_client.go:806> [1] Non ok http code 503 (HTTP/1.1 503)
    17:59:19 W http_client.go:806> [1] Non ok http code 503 (HTTP/1.1 503)
    17:59:19 W http_client.go:806> [2] Non ok http code 503 (HTTP/1.1 503)
    17:59:19 W http_client.go:806> [0] Non ok http code 503 (HTTP/1.1 503)
    17:59:19 W http_client.go:806> [0] Non ok http code 503 (HTTP/1.1 503)
    17:59:19 W http_client.go:806> [0] Non ok http code 503 (HTTP/1.1 503)
    17:59:19 W http_client.go:806> [0] Non ok http code 503 (HTTP/1.1 503)
    17:59:19 W http_client.go:806> [0] Non ok http code 503 (HTTP/1.1 503)
    17:59:19 W http_client.go:806> [0] Non ok http code 503 (HTTP/1.1 503)
    17:59:19 W http_client.go:806> [2] Non ok http code 503 (HTTP/1.1 503)
    17:59:19 W http_client.go:806> [0] Non ok http code 503 (HTTP/1.1 503)
    17:59:19 W http_client.go:806> [1] Non ok http code 503 (HTTP/1.1 503)
    Ended after 122.6576ms : 50 calls. qps=407.64
    Aggregated Function Time : count 50 avg 0.006086436 +/- 0.00731 min 0.0005739 max 0.042604 sum 0.3043218
    # range, mid point, percentile, count
    >= 0.0005739 <= 0.001 , 0.00078695 , 14.00, 7
    > 0.001 <= 0.002 , 0.0015 , 32.00, 9
    > 0.002 <= 0.003 , 0.0025 , 40.00, 4
    > 0.003 <= 0.004 , 0.0035 , 52.00, 6
    > 0.004 <= 0.005 , 0.0045 , 64.00, 6
    > 0.005 <= 0.006 , 0.0055 , 66.00, 1
    > 0.006 <= 0.007 , 0.0065 , 72.00, 3
    > 0.007 <= 0.008 , 0.0075 , 74.00, 1
    > 0.008 <= 0.009 , 0.0085 , 76.00, 1
    > 0.009 <= 0.01 , 0.0095 , 80.00, 2
    > 0.01 <= 0.011 , 0.0105 , 82.00, 1
    > 0.011 <= 0.012 , 0.0115 , 88.00, 3
    > 0.012 <= 0.014 , 0.013 , 92.00, 2
    > 0.014 <= 0.016 , 0.015 , 96.00, 2
    > 0.025 <= 0.03 , 0.0275 , 98.00, 1
    > 0.04 <= 0.042604 , 0.041302 , 100.00, 1
    # target 50% 0.00383333
    # target 75% 0.0085
    # target 90% 0.013
    # target 99% 0.041302
    # target 99.9% 0.0424738
    Sockets used: 31 (for perfect keepalive, would be 3)
    Jitter: false
    Code 200 : 21 (42.0 %)
    Code 503 : 29 (58.0 %)
    Response Header Sizes : count 50 avg 96.68 +/- 113.6 min 0 max 231 sum 4834
    Response Body/Total Sizes : count 50 avg 399.42 +/- 186.2 min 241 max 619 sum 19971
    All done 50 calls (plus 0 warmup) 6.086 ms avg, 407.6 qps

    As seen above, only 42% of the requests succeeded, and the rest failed when the circuit breaker tripped.

    Code 200 : 21 (42.0 %)
    Code 503 : 29 (58.0 %)
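
    The 503 responses are returned by the fortio pod's Envoy sidecar once the configured limits are exceeded, rather than by the httpbin service itself, as the next step confirms by examining the client sidecar's statistics. To see how the thresholds were programmed onto that sidecar, you can optionally inspect its configuration; the grep pattern below is only a convenience and may need adjusting:

    osm proxy get config_dump "$fortio_pod" -n client | grep -A 10 'circuit_breakers'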
  7. Examine the Envoy sidecar stats to see statistics pertaining to the requests that tripped the circuit breaker.

    $ osm proxy get stats "$fortio_pod" -n client | grep 'httpbin.*pending'
    cluster.httpbin/httpbin|14001.circuit_breakers.default.remaining_pending: 1
    cluster.httpbin/httpbin|14001.circuit_breakers.default.rq_pending_open: 0
    cluster.httpbin/httpbin|14001.circuit_breakers.high.rq_pending_open: 0
    cluster.httpbin/httpbin|14001.upstream_rq_pending_active: 0
    cluster.httpbin/httpbin|14001.upstream_rq_pending_failure_eject: 0
    cluster.httpbin/httpbin|14001.upstream_rq_pending_overflow: 29
    cluster.httpbin/httpbin|14001.upstream_rq_pending_total: 25

    The cluster.httpbin/httpbin|14001.upstream_rq_pending_overflow: 29 stat indicates that 29 requests tripped the circuit breaker, which matches the number of failed requests seen in the previous step: Code 503 : 29 (58.0 %).
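
    To return the demo environment to its original state, you can optionally delete the UpstreamTrafficSetting resource and re-run the fortio load test; all 50 requests should succeed again:

    # Remove the circuit breaker configuration
    kubectl delete upstreamtrafficsettings httpbin -n httpbin

    # Re-run the load test
    kubectl exec "$fortio_pod" -c fortio -n client -- /usr/bin/fortio load -c 3 -qps 0 -n 50 -loglevel Warning http://httpbin.httpbin.svc.cluster.local:14001/get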