Routing Optimization
Scaling OKD HAProxy Router
Baseline Performance
The OKD router is the ingress point for all external traffic destined for OKD services.
When evaluating the performance of a single HAProxy router in terms of HTTP requests handled per second, the results vary depending on many factors, in particular:
HTTP keep-alive/close mode
TLS session resumption client support
number of concurrent connections per target route
number of target routes
backend server page size
underlying infrastructure (network/SDN solution, CPU, and so on)
While performance in your specific environment will vary, in our lab tests on a public cloud instance with 4 vCPUs and 16 GB of RAM, a single HAProxy router handling 100 routes terminated by backends serving 1 kB static pages was able to handle the following number of transactions per second.
In HTTP keep-alive mode scenarios:
Encryption | ROUTER_THREADS unset | ROUTER_THREADS=4 |
---|---|---|
none | 23681 | 24327 |
edge | 14981 | 22768 |
passthrough | 34358 | 34331 |
re-encrypt | 13288 | 24605 |
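To make the two columns of the keep-alive table easier to compare, the following minimal sketch computes the relative change for each route type. The figures are copied from the table above; the dictionary and function names are illustrative only.

```python
# Transactions per second from the keep-alive table above:
# (ROUTER_THREADS unset, ROUTER_THREADS=4)
KEEP_ALIVE_TPS = {
    "none": (23681, 24327),
    "edge": (14981, 22768),
    "passthrough": (34358, 34331),
    "re-encrypt": (13288, 24605),
}


def threading_gain(unset_tps: int, threaded_tps: int) -> float:
    """Return the relative change in percent from the unset to the threaded figure."""
    return (threaded_tps - unset_tps) / unset_tps * 100.0


if __name__ == "__main__":
    for encryption, (unset_tps, threaded_tps) in KEEP_ALIVE_TPS.items():
        gain = threading_gain(unset_tps, threaded_tps)
        print(f"{encryption:12s} {gain:+6.1f}%")
```

In these figures, the route types where the router terminates TLS (edge and re-encrypt) gain roughly 52% and 85% respectively, while passthrough is essentially unchanged.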
In HTTP close (no keep-alive) scenarios:
Encryption | ROUTER_THREADS unset | ROUTER_THREADS=4 |
---|---|---|
none | 3245 | 4527 |
edge | 1910 | 3043 |
passthrough | 3408 | 3922 |
re-encrypt | 1333 | 2239 |
TLS session resumption was used for encrypted routes. With HTTP keep-alive, a single HAProxy router is capable of saturating a 1 Gbit NIC at page sizes as small as 8 kB.
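As a rough plausibility check of the NIC saturation claim, the sketch below computes how many 8 kB responses per second fill a 1 Gbit/s link. It assumes 8 kB means 8000 bytes and ignores HTTP header and TCP/IP overhead, so the real figure would be somewhat lower. The names are illustrative.

```python
NIC_BITS_PER_SECOND = 1_000_000_000  # 1 Gbit/s link
PAGE_SIZE_BYTES = 8 * 1000           # assumption: 8 kB taken as 8000 bytes


def requests_to_saturate(link_bps: int, page_bytes: int) -> float:
    """Requests per second whose response bodies alone fill the link."""
    bytes_per_second = link_bps / 8
    return bytes_per_second / page_bytes


if __name__ == "__main__":
    needed = requests_to_saturate(NIC_BITS_PER_SECOND, PAGE_SIZE_BYTES)
    print(f"{needed:.0f} requests/s of 8 kB pages fill a 1 Gbit/s link")
```

The result, 15625 requests per second, is of the same order as the keep-alive rates in the tables above. Those rates were measured with 1 kB pages, so the comparison is only indicative.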
When running on bare metal with modern processors, you can expect roughly twice the performance of the public cloud instance above. The difference comes from the overhead introduced by the virtualization layer in place on public clouds, and it holds mostly true for private cloud-based virtualization as well. The following table is a guide to how many applications to use behind the router:
Number of applications | Application type |
---|---|
5-10 | static file/web server or caching proxy |
100-1000 | applications generating dynamic content |
In general, HAProxy can support routes for 5 to 1000 applications, depending on the technology in use. Router performance might be limited by the capabilities and performance of the applications behind it, such as language or static versus dynamic content.
Use router sharding to serve more routes to applications and to scale the routing tier horizontally.
Performance Optimizations
Setting the Maximum Number of Connections
One of the most important tunable parameters for HAProxy scalability is the maxconn parameter, which sets the maximum per-process number of concurrent connections to a given number. Adjust this parameter by editing the ROUTER_MAX_CONNECTIONS environment variable in the OKD HAProxy router’s deployment configuration file.
A connection includes the frontend and internal backend. This counts as two connections. Be sure to set |
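Because each proxied connection has a frontend side and an internal backend side, a sizing calculation can be sketched as follows. This is an illustration only: the function name and the example figure of 10000 client connections are hypothetical, and the doubling simply applies the two-connections rule stated above.

```python
def router_max_connections(client_connections: int) -> int:
    """Budget two connections (frontend and backend) per client connection."""
    if client_connections < 0:
        raise ValueError("client_connections must not be negative")
    return 2 * client_connections


if __name__ == "__main__":
    # Hypothetical target: 10000 concurrent client connections.
    value = router_max_connections(10000)
    print(f"ROUTER_MAX_CONNECTIONS={value}")
```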
CPU and Interrupt Affinity
In OKD, the HAProxy router runs as a single process. The OKD HAProxy router typically performs better on a system with fewer but higher-frequency cores than on a symmetric multiprocessing (SMP) system with a high number of lower-frequency cores.
Pinning the HAProxy process to one CPU core and the network interrupts to another CPU core tends to increase network performance. Having processes and interrupts on the same non-uniform memory access (NUMA) node helps avoid memory accesses by ensuring a shared L3 cache. However, this level of control is generally not possible in a public cloud environment. On bare metal hosts, irqbalance automatically handles peripheral component interconnect (PCI) locality and NUMA affinity for interrupt request lines (IRQs). In a cloud environment, this level of information is generally not provided to the operating system.
CPU pinning is performed either by taskset or by using HAProxy’s cpu-map parameter. This directive takes two arguments: the process ID and the CPU core ID. For example, to pin HAProxy process 1 onto CPU core 0, add the following line to the global section of HAProxy’s configuration file:
cpu-map 1 0
To modify the HAProxy configuration file, refer to Deploying a Customized HAProxy Router.
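If the pinning is generated by a script rather than written by hand, the cpu-map lines can be rendered as in the following sketch. The function name and the mapping format are hypothetical; the output follows the two-argument form shown above, and the check assumes that HAProxy process numbers start at 1, as in that example.

```python
from typing import Dict, List


def render_cpu_map(pinning: Dict[int, int]) -> List[str]:
    """Render one cpu-map line per (HAProxy process number, CPU core ID) pair."""
    lines = []
    for process, core in sorted(pinning.items()):
        if process < 1:
            raise ValueError("process numbers are assumed to start at 1")
        if core < 0:
            raise ValueError("CPU core IDs must not be negative")
        lines.append(f"cpu-map {process} {core}")
    return lines


if __name__ == "__main__":
    # Pin HAProxy process 1 onto CPU core 0, as in the example above.
    for line in render_cpu_map({1: 0}):
        print(line)
```

Because the OKD router runs as a single process, only the entry for process 1 is relevant in practice.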
Increasing the Number of Threads
The HAProxy router in OKD supports multithreading. On a system with multiple CPU cores, increasing the number of threads can improve performance, especially when terminating SSL on the router.
To specify the number of threads for the HAProxy router, refer to Enable HAProxy Threading and Router Environment Variables.
Impacts of Buffer Increases
The OKD HAProxy router request buffer configuration limits the size of headers in incoming requests and responses from applications. The HAProxy parameter tune.bufsize can be increased to allow processing of larger headers and to allow applications with very large cookies to work, such as those accepted by load balancers provided by many public cloud providers. However, this affects the total memory use, especially when large numbers of connections are open. With very large numbers of open connections, memory usage grows nearly in proportion to the increase of this tunable parameter.
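The proportional relationship can be illustrated with a back-of-the-envelope estimate. The sketch below assumes two buffers per connection (one for the request, one for the response) and uses 20000 open connections with buffer sizes of 16384 and 32768 bytes as example values; all of these are assumptions for illustration, not measured figures.

```python
def buffer_memory_bytes(connections: int, bufsize: int,
                        buffers_per_connection: int = 2) -> int:
    """Estimate memory held in buffers, assuming every connection uses all of them."""
    return connections * bufsize * buffers_per_connection


if __name__ == "__main__":
    connections = 20000  # hypothetical number of open connections
    for bufsize in (16384, 32768):
        mib = buffer_memory_bytes(connections, bufsize) / (1024 * 1024)
        print(f"tune.bufsize={bufsize}: about {mib:.0f} MiB of buffers")
```

Under these assumptions, doubling the buffer size doubles the estimate from about 625 MiB to about 1250 MiB.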
Optimizations for HAProxy Reloads
Long-lasting connections, such as WebSocket connections, combined with long client/server HAProxy timeouts and short HAProxy reload intervals, can cause many HAProxy processes to be instantiated. These processes must handle old connections that were started before the HAProxy configuration reload. A large number of these processes is undesirable, as it exerts unnecessary load on the system and can lead to issues such as out-of-memory conditions.
Router environment variables affecting this behavior are ROUTER_DEFAULT_TUNNEL_TIMEOUT, ROUTER_DEFAULT_CLIENT_TIMEOUT, ROUTER_DEFAULT_SERVER_TIMEOUT, and RELOAD_INTERVAL in particular.
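The interaction between timeouts and the reload interval can be sketched as a worst-case bound on the number of old processes. The estimate assumes that a reload actually happens at every interval and that each old process keeps at least one connection open for the full timeout. The function name and the example values (a 3600-second tunnel timeout with 5-second and 60-second reload intervals) are illustrative.

```python
import math


def max_lingering_processes(longest_connection_s: float,
                            reload_interval_s: float) -> int:
    """Upper bound on old processes still serving connections from before a reload."""
    if reload_interval_s <= 0:
        raise ValueError("reload_interval_s must be positive")
    return math.ceil(longest_connection_s / reload_interval_s)


if __name__ == "__main__":
    tunnel_timeout_s = 3600  # hypothetical long-lived WebSocket timeout
    for interval_s in (5, 60):
        bound = max_lingering_processes(tunnel_timeout_s, interval_s)
        print(f"reload every {interval_s}s: up to {bound} old processes")
```

Under these assumptions, lengthening the reload interval from 5 to 60 seconds lowers the bound from 720 to 60 processes; shortening the timeouts has a similar effect.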