This guide demonstrates how to configure circuit breaking for destinations that are external to the OSM managed service mesh.

Prerequisites

  • A Kubernetes cluster running Kubernetes v1.20.0 or greater.
  • Have OSM installed.
  • Have kubectl available to interact with the API server.
  • Have osm CLI available for managing the service mesh.
  • OSM version >= v1.1.0.

Demo

The following demo shows a load-testing client fortio sending traffic to the httpbin service that is external to the service mesh. Traffic external to the mesh is treated as Egress traffic, and will be authorized using an Egress traffic policy. We will see how applying circuit breakers for traffic to the external httpbin service impacts the fortio client when the configured circuit breaking limits trip.

  1. Deploy the httpbin service into the httpbin namespace. The httpbin service runs on port 14001 and is not added to the mesh, so it is considered to be a destination external to the mesh.

    # Create the httpbin namespace
    kubectl create namespace httpbin

    # Deploy httpbin service in the httpbin namespace
    kubectl apply -f https://raw.githubusercontent.com/openservicemesh/osm-docs/release-v1.1/manifests/samples/httpbin/httpbin.yaml -n httpbin

    Confirm the httpbin service and pods are up and running.

    $ kubectl get svc -n httpbin
    NAME      TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)     AGE
    httpbin   ClusterIP   10.96.198.23   <none>        14001/TCP   20s

    $ kubectl get pods -n httpbin
    NAME                     READY   STATUS    RESTARTS   AGE
    httpbin-5b8b94b9-lt2vs   1/1     Running   0          20s
  2. Deploy the fortio load-testing client in the client namespace after enrolling its namespace to the mesh.

    # Create the client namespace
    kubectl create namespace client

    # Add the namespace to the mesh
    osm namespace add client

    # Deploy fortio client in the client namespace
    kubectl apply -f https://raw.githubusercontent.com/openservicemesh/osm-docs/release-v1.1/manifests/samples/fortio/fortio.yaml -n client

    Confirm the fortio client pod is up and running.

    $ kubectl get pods -n client
    NAME                      READY   STATUS    RESTARTS   AGE
    fortio-6477f8495f-bj4s9   2/2     Running   0          19s
  3. Configure an Egress policy that allows the fortio client in the client namespace to communicate with the external httpbin service. The HTTP requests will be directed to the host httpbin.httpbin.svc.cluster.local on port 14001.

    kubectl apply -f - <<EOF
    kind: Egress
    apiVersion: policy.openservicemesh.io/v1alpha1
    metadata:
      name: httpbin-external
      namespace: client
    spec:
      sources:
      - kind: ServiceAccount
        name: default
        namespace: client
      hosts:
      - httpbin.httpbin.svc.cluster.local
      ports:
      - number: 14001
        protocol: http
    EOF
  3. Confirm the fortio client is able to successfully make HTTP requests to the external host httpbin.httpbin.svc.cluster.local on port 14001. We call the external host with 5 concurrent connections (-c 5) and send a total of 50 requests (-n 50).

    $ export fortio_pod="$(kubectl get pod -n client -l app=fortio -o jsonpath='{.items[0].metadata.name}')"
    $ kubectl exec "$fortio_pod" -c fortio -n client -- /usr/bin/fortio load -c 5 -qps 0 -n 50 -loglevel Warning http://httpbin.httpbin.svc.cluster.local:14001/get
    19:56:34 I logger.go:127> Log level is now 3 Warning (was 2 Info)
    Fortio 1.17.1 running at 0 queries per second, 8->8 procs, for 50 calls: http://httpbin.httpbin.svc.cluster.local:14001/get
    Starting at max qps with 5 thread(s) [gomax 8] for exactly 50 calls (10 per thread + 0)
    Ended after 36.3659ms : 50 calls. qps=1374.9
    Aggregated Function Time : count 50 avg 0.003374618 +/- 0.0007546 min 0.0013124 max 0.0066215 sum 0.1687309
    # range, mid point, percentile, count
    >= 0.0013124 <= 0.002 , 0.0016562 , 4.00, 2
    > 0.002 <= 0.003 , 0.0025 , 10.00, 3
    > 0.003 <= 0.004 , 0.0035 , 86.00, 38
    > 0.004 <= 0.005 , 0.0045 , 98.00, 6
    > 0.006 <= 0.0066215 , 0.00631075 , 100.00, 1
    # target 50% 0.00352632
    # target 75% 0.00385526
    # target 90% 0.00433333
    # target 99% 0.00631075
    # target 99.9% 0.00659043
    Sockets used: 5 (for perfect keepalive, would be 5)
    Jitter: false
    Code 200 : 50 (100.0 %)
    Response Header Sizes : count 50 avg 230 +/- 0 min 230 max 230 sum 11500
    Response Body/Total Sizes : count 50 avg 460 +/- 0 min 460 max 460 sum 23000
    All done 50 calls (plus 0 warmup) 3.375 ms avg, 1374.9 qps

    As seen above, all the requests succeeded.

    Code 200 : 50 (100.0 %)
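    As a quick sanity check, fortio's reported qps is simply the call count divided by the elapsed time. A small shell sketch using the numbers from the run above:

    ```shell
    # Recompute fortio's reported qps: 50 calls completed in 36.3659 ms
    calls=50
    elapsed=0.0363659   # seconds, from the "Ended after 36.3659ms" line
    awk -v c="$calls" -v t="$elapsed" 'BEGIN { printf "%.1f qps\n", c / t }'
    # prints: 1374.9 qps
    ```

    which agrees with the qps=1374.9 reported on the "Ended after" line.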
  5. Next, apply a circuit breaker configuration using the UpstreamTrafficSetting resource for traffic directed to the external host httpbin.httpbin.svc.cluster.local to limit the maximum number of concurrent connections and requests to 1. When applying an UpstreamTrafficSetting configuration for external (egress) traffic, the UpstreamTrafficSetting resource must also be specified as a match in the Egress configuration and belong to the same namespace as the matching Egress resource. This is required to enforce circuit breaking limits for external traffic. Hence, we also update the previously applied Egress configuration to specify a matches field.

    kubectl apply -f - <<EOF
    apiVersion: policy.openservicemesh.io/v1alpha1
    kind: UpstreamTrafficSetting
    metadata:
      name: httpbin-external
      namespace: client
    spec:
      host: httpbin.httpbin.svc.cluster.local
      connectionSettings:
        tcp:
          maxConnections: 1
        http:
          maxPendingRequests: 1
          maxRequestsPerConnection: 1
    ---
    kind: Egress
    apiVersion: policy.openservicemesh.io/v1alpha1
    metadata:
      name: httpbin-external
      namespace: client
    spec:
      sources:
      - kind: ServiceAccount
        name: default
        namespace: client
      hosts:
      - httpbin.httpbin.svc.cluster.local
      ports:
      - number: 14001
        protocol: http
      matches:
      - apiGroup: policy.openservicemesh.io/v1alpha1
        kind: UpstreamTrafficSetting
        name: httpbin-external
    EOF
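    The connectionSettings stanza is the only part to change when tuning the limits later. As a rough guide: tcp.maxConnections caps concurrent TCP connections to the host, http.maxPendingRequests caps requests queued while waiting for a free connection, and http.maxRequestsPerConnection caps how many requests a single connection may serve. For example (the values below are illustrative and not part of this demo):

    ```yaml
    # Illustrative values only -- not used in this demo
    connectionSettings:
      tcp:
        maxConnections: 10             # up to 10 concurrent TCP connections
      http:
        maxPendingRequests: 10         # up to 10 requests queued for a connection
        maxRequestsPerConnection: 100  # recycle a connection after 100 requests
    ```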
  6. Confirm the fortio client is unable to make the same number of successful requests as before, due to the connection-level and request-level circuit breaking limits configured above.

    $ kubectl exec "$fortio_pod" -c fortio -n client -- /usr/bin/fortio load -c 5 -qps 0 -n 50 -loglevel Warning http://httpbin.httpbin.svc.cluster.local:14001/get
    19:58:48 I logger.go:127> Log level is now 3 Warning (was 2 Info)
    Fortio 1.17.1 running at 0 queries per second, 8->8 procs, for 50 calls: http://httpbin.httpbin.svc.cluster.local:14001/get
    Starting at max qps with 5 thread(s) [gomax 8] for exactly 50 calls (10 per thread + 0)
    19:58:48 W http_client.go:806> [0] Non ok http code 503 (HTTP/1.1 503)
    19:58:48 W http_client.go:806> [2] Non ok http code 503 (HTTP/1.1 503)
    19:58:48 W http_client.go:806> [2] Non ok http code 503 (HTTP/1.1 503)
    19:58:48 W http_client.go:806> [2] Non ok http code 503 (HTTP/1.1 503)
    19:58:48 W http_client.go:806> [2] Non ok http code 503 (HTTP/1.1 503)
    19:58:48 W http_client.go:806> [2] Non ok http code 503 (HTTP/1.1 503)
    19:58:48 W http_client.go:806> [1] Non ok http code 503 (HTTP/1.1 503)
    19:58:48 W http_client.go:806> [2] Non ok http code 503 (HTTP/1.1 503)
    19:58:48 W http_client.go:806> [2] Non ok http code 503 (HTTP/1.1 503)
    19:58:48 W http_client.go:806> [1] Non ok http code 503 (HTTP/1.1 503)
    19:58:48 W http_client.go:806> [4] Non ok http code 503 (HTTP/1.1 503)
    19:58:48 W http_client.go:806> [3] Non ok http code 503 (HTTP/1.1 503)
    19:58:48 W http_client.go:806> [3] Non ok http code 503 (HTTP/1.1 503)
    19:58:48 W http_client.go:806> [2] Non ok http code 503 (HTTP/1.1 503)
    19:58:48 W http_client.go:806> [3] Non ok http code 503 (HTTP/1.1 503)
    19:58:48 W http_client.go:806> [2] Non ok http code 503 (HTTP/1.1 503)
    19:58:48 W http_client.go:806> [3] Non ok http code 503 (HTTP/1.1 503)
    19:58:48 W http_client.go:806> [1] Non ok http code 503 (HTTP/1.1 503)
    19:58:48 W http_client.go:806> [3] Non ok http code 503 (HTTP/1.1 503)
    19:58:48 W http_client.go:806> [1] Non ok http code 503 (HTTP/1.1 503)
    19:58:48 W http_client.go:806> [4] Non ok http code 503 (HTTP/1.1 503)
    Ended after 33.1549ms : 50 calls. qps=1508.1
    Aggregated Function Time : count 50 avg 0.002467842 +/- 0.001827 min 0.0003724 max 0.0067697 sum 0.1233921
    # range, mid point, percentile, count
    >= 0.0003724 <= 0.001 , 0.0006862 , 34.00, 17
    > 0.001 <= 0.002 , 0.0015 , 50.00, 8
    > 0.002 <= 0.003 , 0.0025 , 60.00, 5
    > 0.003 <= 0.004 , 0.0035 , 84.00, 12
    > 0.004 <= 0.005 , 0.0045 , 88.00, 2
    > 0.005 <= 0.006 , 0.0055 , 92.00, 2
    > 0.006 <= 0.0067697 , 0.00638485 , 100.00, 4
    # target 50% 0.002
    # target 75% 0.003625
    # target 90% 0.0055
    # target 99% 0.00667349
    # target 99.9% 0.00676008
    Sockets used: 25 (for perfect keepalive, would be 5)
    Jitter: false
    Code 200 : 29 (58.0 %)
    Code 503 : 21 (42.0 %)
    Response Header Sizes : count 50 avg 133.4 +/- 113.5 min 0 max 230 sum 6670
    Response Body/Total Sizes : count 50 avg 368.02 +/- 108.1 min 241 max 460 sum 18401
    All done 50 calls (plus 0 warmup) 2.468 ms avg, 1508.1 qps

    As seen above, only 58% of the requests succeeded; the remaining 42% failed with HTTP 503 responses when the circuit breaker tripped.

    Code 200 : 29 (58.0 %)
    Code 503 : 21 (42.0 %)
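    The per-code counts can also be extracted from a saved run. A minimal sketch, assuming the fortio output above was captured to a file named fortio.log (filename illustrative):

    ```shell
    # Save the response-code summary lines from the run above (sample data)
    printf '%s\n' 'Code 200 : 29 (58.0 %)' 'Code 503 : 21 (42.0 %)' > fortio.log

    # Sum successes and totals from the "Code <status> : <count>" lines
    awk '/^Code/ { total += $4; if ($2 == 200) ok += $4 }
         END { printf "%d/%d requests succeeded (%.1f%%)\n", ok, total, 100 * ok / total }' fortio.log
    # prints: 29/50 requests succeeded (58.0%)
    ```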
  7. Examine the Envoy sidecar stats to see statistics pertaining to the requests that tripped the circuit breaker.

    $ osm proxy get stats $fortio_pod -n client | grep 'httpbin.*pending'
    cluster.httpbin_httpbin_svc_cluster_local_14001.circuit_breakers.default.remaining_pending: 1
    cluster.httpbin_httpbin_svc_cluster_local_14001.circuit_breakers.default.rq_pending_open: 0
    cluster.httpbin_httpbin_svc_cluster_local_14001.circuit_breakers.high.rq_pending_open: 0
    cluster.httpbin_httpbin_svc_cluster_local_14001.upstream_rq_pending_active: 0
    cluster.httpbin_httpbin_svc_cluster_local_14001.upstream_rq_pending_failure_eject: 0
    cluster.httpbin_httpbin_svc_cluster_local_14001.upstream_rq_pending_overflow: 21
    cluster.httpbin_httpbin_svc_cluster_local_14001.upstream_rq_pending_total: 29

    cluster.httpbin_httpbin_svc_cluster_local_14001.upstream_rq_pending_overflow: 21 indicates that 21 requests tripped the circuit breaker, which matches the number of failed requests seen in the previous step: Code 503 : 21 (42.0 %).
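    The overflow counter can likewise be pulled out of a saved stats dump. A minimal sketch, assuming the proxy stats above were captured to a file named stats.txt (filename illustrative):

    ```shell
    # Save the relevant stats lines from the output above (sample data)
    printf '%s\n' \
      'cluster.httpbin_httpbin_svc_cluster_local_14001.upstream_rq_pending_overflow: 21' \
      'cluster.httpbin_httpbin_svc_cluster_local_14001.upstream_rq_pending_total: 29' > stats.txt

    # The overflow counter is the number of requests the circuit breaker rejected
    awk -F': ' '/upstream_rq_pending_overflow/ { print $2 " requests tripped the circuit breaker" }' stats.txt
    # prints: 21 requests tripped the circuit breaker
    ```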