This guide demonstrates how to configure rate limiting for L4 TCP connections destined to a target host that is a part of an OSM managed service mesh.

Prerequisites

  • Kubernetes cluster running Kubernetes v1.22.9 or greater.
  • Have OSM installed.
  • Have kubectl available to interact with the API server.
  • Have osm CLI available for managing the service mesh.
  • OSM version >= v1.2.0.
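
To quickly confirm that the osm CLI is available and that the installed mesh meets the version requirement, you can run the CLI's version command (a simple sanity check; the exact output format varies by release):

  osm version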

Demo

The following demo shows a client fortio-client sending TCP traffic to the fortio TCP echo service. The fortio service echoes TCP messages back to the client. We will see the impact of applying local TCP rate limiting policies targeting the fortio service to control the throughput of traffic destined to the service backend.

  1. For simplicity, enable permissive traffic policy mode so that explicit SMI traffic access policies are not required for application connectivity within the mesh.

      export osm_namespace=osm-system # Replace osm-system with the namespace where OSM is installed
      kubectl patch meshconfig osm-mesh-config -n "$osm_namespace" -p '{"spec":{"traffic":{"enablePermissiveTrafficPolicyMode":true}}}' --type=merge
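
    To verify the setting took effect, you can read it back from the MeshConfig (a quick check, assuming OSM is installed in the namespace referenced by $osm_namespace; it should print true):

      kubectl get meshconfig osm-mesh-config -n "$osm_namespace" -o jsonpath='{.spec.traffic.enablePermissiveTrafficPolicyMode}'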
  2. Deploy the fortio TCP echo service in the demo namespace after enrolling its namespace to the mesh. The fortio TCP echo service runs on port 8078.

      # Create the demo namespace
      kubectl create namespace demo
      # Add the namespace to the mesh
      osm namespace add demo
      # Deploy fortio TCP echo in the demo namespace
      kubectl apply -f https://raw.githubusercontent.com/openservicemesh/osm-docs/release-v1.2/manifests/samples/fortio/fortio.yaml -n demo

    Confirm the fortio service pod is up and running.

      $ kubectl get pods -n demo
      NAME                     READY   STATUS    RESTARTS   AGE
      fortio-c4bd7857f-7mm6w   2/2     Running   0          22m
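
    If you also want to confirm the service and the port it exposes, you can list the services in the namespace (the TCP echo port 8078 should appear in the PORT(S) column):

      kubectl get svc -n demo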
  3. Deploy the fortio-client app in the demo namespace. We will use this client to send TCP traffic to the fortio TCP echo service deployed previously.

      kubectl apply -f https://raw.githubusercontent.com/openservicemesh/osm-docs/release-v1.2/manifests/samples/fortio/fortio-client.yaml -n demo

    Confirm the fortio-client pod is up and running.

      NAME                            READY   STATUS    RESTARTS   AGE
      fortio-client-b9b7bbfb8-prq7r   2/2     Running   0          7s
  4. Confirm the fortio-client app is able to successfully make TCP connections and send data to the fortio TCP echo service on port 8078. We call the fortio service with 3 concurrent connections (-c 3) and make 10 calls in total (-n 10).

      $ fortio_client="$(kubectl get pod -n demo -l app=fortio-client -o jsonpath='{.items[0].metadata.name}')"
      $ kubectl exec "$fortio_client" -n demo -c fortio-client -- fortio load -qps -1 -c 3 -n 10 tcp://fortio.demo.svc.cluster.local:8078
      Fortio 1.32.3 running at -1 queries per second, 8->8 procs, for 10 calls: tcp://fortio.demo.svc.cluster.local:8078
      20:41:47 I tcprunner.go:238> Starting tcp test for tcp://fortio.demo.svc.cluster.local:8078 with 3 threads at -1.0 qps
      Starting at max qps with 3 thread(s) [gomax 8] for exactly 10 calls (3 per thread + 1)
      20:41:47 I periodic.go:723> T001 ended after 34.0563ms : 3 calls. qps=88.0894283876992
      20:41:47 I periodic.go:723> T000 ended after 35.3117ms : 4 calls. qps=113.2769025563765
      20:41:47 I periodic.go:723> T002 ended after 44.0273ms : 3 calls. qps=68.13954069406937
      Ended after 44.2097ms : 10 calls. qps=226.19
      Aggregated Function Time : count 10 avg 0.01096615 +/- 0.01386 min 0.001588 max 0.0386716 sum 0.1096615
      # range, mid point, percentile, count
      >= 0.001588 <= 0.002 , 0.001794 , 40.00, 4
      > 0.002 <= 0.003 , 0.0025 , 60.00, 2
      > 0.003 <= 0.004 , 0.0035 , 70.00, 1
      > 0.025 <= 0.03 , 0.0275 , 90.00, 2
      > 0.035 <= 0.0386716 , 0.0368358 , 100.00, 1
      # target 50% 0.0025
      # target 75% 0.02625
      # target 90% 0.03
      # target 99% 0.0383044
      # target 99.9% 0.0386349
      Error cases : no data
      Sockets used: 3 (for perfect no error run, would be 3)
      Total Bytes sent: 240, received: 240
      tcp OK : 10 (100.0 %)
      All done 10 calls (plus 0 warmup) 10.966 ms avg, 226.2 qps

    As seen above, all the TCP connections from the fortio-client pod succeeded.

      Total Bytes sent: 240, received: 240
      tcp OK : 10 (100.0 %)
      All done 10 calls (plus 0 warmup) 10.966 ms avg, 226.2 qps
  5. Next, apply a local rate limiting policy to rate limit L4 TCP connections to the fortio.demo.svc.cluster.local service to 1 connection per minute.

      kubectl apply -f - <<EOF
      apiVersion: policy.openservicemesh.io/v1alpha1
      kind: UpstreamTrafficSetting
      metadata:
        name: tcp-rate-limit
        namespace: demo
      spec:
        host: fortio.demo.svc.cluster.local
        rateLimit:
          local:
            tcp:
              connections: 1
              unit: minute
      EOF

    Confirm no traffic has been rate limited yet by examining the stats on the fortio backend pod.

      $ fortio_server="$(kubectl get pod -n demo -l app=fortio -o jsonpath='{.items[0].metadata.name}')"
      $ osm proxy get stats "$fortio_server" -n demo | grep 'fortio.*8078.*rate_limit'
      local_rate_limit.inbound_demo/fortio_8078_tcp.rate_limited: 0
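
    You can also confirm the policy resource was created; UpstreamTrafficSetting is a namespaced custom resource, so a plain kubectl get works:

      kubectl get upstreamtrafficsetting tcp-rate-limit -n demo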
  6. Confirm TCP connections are rate limited.

      $ kubectl exec "$fortio_client" -n demo -c fortio-client -- fortio load -qps -1 -c 3 -n 10 tcp://fortio.demo.svc.cluster.local:8078
      Fortio 1.32.3 running at -1 queries per second, 8->8 procs, for 10 calls: tcp://fortio.demo.svc.cluster.local:8078
      20:49:38 I tcprunner.go:238> Starting tcp test for tcp://fortio.demo.svc.cluster.local:8078 with 3 threads at -1.0 qps
      Starting at max qps with 3 thread(s) [gomax 8] for exactly 10 calls (3 per thread + 1)
      20:49:38 E tcprunner.go:203> [2] Unable to read: read tcp 10.244.1.19:59244->10.96.83.254:8078: read: connection reset by peer
      20:49:38 E tcprunner.go:203> [0] Unable to read: read tcp 10.244.1.19:59246->10.96.83.254:8078: read: connection reset by peer
      20:49:38 E tcprunner.go:203> [2] Unable to read: read tcp 10.244.1.19:59258->10.96.83.254:8078: read: connection reset by peer
      20:49:38 E tcprunner.go:203> [0] Unable to read: read tcp 10.244.1.19:59260->10.96.83.254:8078: read: connection reset by peer
      20:49:38 E tcprunner.go:203> [2] Unable to read: read tcp 10.244.1.19:59266->10.96.83.254:8078: read: connection reset by peer
      20:49:38 I periodic.go:723> T002 ended after 9.643ms : 3 calls. qps=311.1065021258944
      20:49:38 E tcprunner.go:203> [0] Unable to read: read tcp 10.244.1.19:59268->10.96.83.254:8078: read: connection reset by peer
      20:49:38 E tcprunner.go:203> [0] Unable to read: read tcp 10.244.1.19:59274->10.96.83.254:8078: read: connection reset by peer
      20:49:38 I periodic.go:723> T000 ended after 14.8212ms : 4 calls. qps=269.8836801338623
      20:49:38 I periodic.go:723> T001 ended after 20.3458ms : 3 calls. qps=147.45057948077735
      Ended after 20.5468ms : 10 calls. qps=486.69
      Aggregated Function Time : count 10 avg 0.00438853 +/- 0.004332 min 0.0014184 max 0.0170216 sum 0.0438853
      # range, mid point, percentile, count
      >= 0.0014184 <= 0.002 , 0.0017092 , 20.00, 2
      > 0.002 <= 0.003 , 0.0025 , 50.00, 3
      > 0.003 <= 0.004 , 0.0035 , 70.00, 2
      > 0.004 <= 0.005 , 0.0045 , 90.00, 2
      > 0.016 <= 0.0170216 , 0.0165108 , 100.00, 1
      # target 50% 0.003
      # target 75% 0.00425
      # target 90% 0.005
      # target 99% 0.0169194
      # target 99.9% 0.0170114
      Error cases : count 7 avg 0.0034268714 +/- 0.0007688 min 0.0024396 max 0.0047932 sum 0.0239881
      # range, mid point, percentile, count
      >= 0.0024396 <= 0.003 , 0.0027198 , 42.86, 3
      > 0.003 <= 0.004 , 0.0035 , 71.43, 2
      > 0.004 <= 0.0047932 , 0.0043966 , 100.00, 2
      # target 50% 0.00325
      # target 75% 0.00409915
      # target 90% 0.00451558
      # target 99% 0.00476544
      # target 99.9% 0.00479042
      Sockets used: 8 (for perfect no error run, would be 3)
      Total Bytes sent: 240, received: 72
      tcp OK : 3 (30.0 %)
      tcp short read : 7 (70.0 %)
      All done 10 calls (plus 0 warmup) 4.389 ms avg, 486.7 qps

    As seen above, only 30% of the 10 calls succeeded, while the remaining 70% were rate limited. This is because we applied a rate limiting policy of 1 connection per minute at the fortio backend service, so the fortio-client was able to use only 1 connection, which carried 3 of the 10 calls, resulting in a 30% success rate.

    Examine the sidecar stats to further confirm this.

      $ osm proxy get stats "$fortio_server" -n demo | grep 'fortio.*8078.*rate_limit'
      local_rate_limit.inbound_demo/fortio_8078_tcp.rate_limited: 7
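
    If you want to see how the policy is realized on the sidecar, you can dump the Envoy configuration and look for the local rate limit network filter (an optional inspection step; the exact filter layout depends on the OSM and Envoy versions):

      osm proxy get config_dump "$fortio_server" -n demo | grep -i local_ratelimit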
  7. Next, let’s update our rate limiting policy to allow a burst of connections. Bursts allow a given number of connections over and above the baseline rate of 1 connection per minute defined by our rate limiting policy.

      kubectl apply -f - <<EOF
      apiVersion: policy.openservicemesh.io/v1alpha1
      kind: UpstreamTrafficSetting
      metadata:
        name: tcp-rate-limit
        namespace: demo
      spec:
        host: fortio.demo.svc.cluster.local
        rateLimit:
          local:
            tcp:
              connections: 1
              unit: minute
              burst: 10
      EOF
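
    You can verify the updated spec now carries the burst setting by reading it back from the same resource (a quick check before re-running the load test):

      kubectl get upstreamtrafficsetting tcp-rate-limit -n demo -o jsonpath='{.spec.rateLimit.local.tcp}'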
  8. Confirm the burst capability allows a burst of connections within a small window of time.

      $ kubectl exec "$fortio_client" -n demo -c fortio-client -- fortio load -qps -1 -c 3 -n 10 tcp://fortio.demo.svc.cluster.local:8078
      Fortio 1.32.3 running at -1 queries per second, 8->8 procs, for 10 calls: tcp://fortio.demo.svc.cluster.local:8078
      20:56:56 I tcprunner.go:238> Starting tcp test for tcp://fortio.demo.svc.cluster.local:8078 with 3 threads at -1.0 qps
      Starting at max qps with 3 thread(s) [gomax 8] for exactly 10 calls (3 per thread + 1)
      20:56:56 I periodic.go:723> T002 ended after 5.1568ms : 3 calls. qps=581.7561278312132
      20:56:56 I periodic.go:723> T001 ended after 5.2334ms : 3 calls. qps=573.2411052088509
      20:56:56 I periodic.go:723> T000 ended after 5.2464ms : 4 calls. qps=762.4275693809088
      Ended after 5.2711ms : 10 calls. qps=1897.1
      Aggregated Function Time : count 10 avg 0.00153124 +/- 0.001713 min 0.00033 max 0.0044054 sum 0.0153124
      # range, mid point, percentile, count
      >= 0.00033 <= 0.001 , 0.000665 , 70.00, 7
      > 0.003 <= 0.004 , 0.0035 , 80.00, 1
      > 0.004 <= 0.0044054 , 0.0042027 , 100.00, 2
      # target 50% 0.000776667
      # target 75% 0.0035
      # target 90% 0.0042027
      # target 99% 0.00438513
      # target 99.9% 0.00440337
      Error cases : no data
      Sockets used: 3 (for perfect no error run, would be 3)
      Total Bytes sent: 240, received: 240
      tcp OK : 10 (100.0 %)
      All done 10 calls (plus 0 warmup) 1.531 ms avg, 1897.1 qps

    As seen above, all the TCP connections from the fortio-client pod succeeded.

      Total Bytes sent: 240, received: 240
      tcp OK : 10 (100.0 %)
      All done 10 calls (plus 0 warmup) 1.531 ms avg, 1897.1 qps

    Further, examine the sidecar stats to confirm that the burst setting allowed the additional connections through: the number of rate-limited connections has not increased since the previous test, which was run before the burst setting was configured.

      $ osm proxy get stats "$fortio_server" -n demo | grep 'fortio.*8078.*rate_limit'
      local_rate_limit.inbound_demo/fortio_8078_tcp.rate_limited: 7
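
    When you are done with the demo, you can optionally remove the rate limiting policy and the demo resources:

      kubectl delete upstreamtrafficsetting tcp-rate-limit -n demo
      osm namespace remove demo
      kubectl delete namespace demo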