2. Quick introduction to load balancing and load balancers

Load balancing consists in aggregating multiple components in order to achieve
a total processing capacity above each component's individual capacity, without
any intervention from the end user and in a scalable way. More operations can
thus be performed simultaneously in the time it takes a single component to
perform only one. A single operation, however, will still be performed on a
single component at a time and will not get faster than without load balancing.
Making use of all components, and thus fully benefiting from the load
balancing, requires at least as many concurrent operations as there are
components, along with an efficient distribution mechanism. A good example of
this is the number of lanes on a highway, which allows more cars to pass during
the same time frame without increasing their individual speed.

Examples of load balancing:

- Process scheduling in multi-processor systems
- Link load balancing (e.g. EtherChannel, Bonding)
- IP address load balancing (e.g. ECMP, DNS round-robin)
- Server load balancing (via load balancers)

The mechanism or component which performs the load balancing operation is
called a load balancer. In web environments these components are called
"network load balancers", or more commonly just "load balancers", given that
this activity is by far the best-known case of load balancing.

A load balancer may act:

- at the link level: this is called link load balancing, and it consists in
  choosing which network link to send a packet over;

- at the network level: this is called network load balancing, and it
  consists in choosing which route a series of packets will follow;

- at the server level: this is called server load balancing, and it consists
  in deciding which server will process a connection or request.

Two distinct technologies exist and address different needs, though with some
overlap. In each case it is important to keep in mind that load balancing
consists in diverting traffic from its natural flow, and that doing so always
requires a minimum of care to maintain the required level of consistency
between all routing decisions.

The first one acts at the packet level and processes packets more or less
individually. There is a 1-to-1 relation between input and output packets, so
it is possible to follow the traffic on both sides of the load balancer using
a regular network sniffer. This technology can be very cheap and extremely
fast. It is usually implemented in hardware (ASICs), allowing it to operate at
line rate, as in switches doing ECMP. Usually stateless, it can also be
stateful (considering the session a packet belongs to; this is called layer 4
load balancing, or L4), and it may support DSR (direct server return: the
response does not pass through the LB again) if the packets were not modified,
but it provides almost no content awareness. This technology is very well
suited to network-level load balancing, though it is sometimes used for very
basic server load balancing at high speed.

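The stateless behaviour described above can be modelled as a hash over a
packet's 5-tuple, so that every packet of a given flow deterministically picks
the same path without any per-flow state. The sketch below is purely
illustrative (the names and the use of Python's built-in hash() are
assumptions; an ASIC implements this very differently):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    # The 5-tuple identifying a flow (fields are illustrative)
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str = "tcp"

def pick_path(pkt: Packet, paths: list) -> str:
    """Stateless choice: the same 5-tuple always maps to the same path,
    so all packets of one flow follow the same route."""
    key = (pkt.src_ip, pkt.src_port, pkt.dst_ip, pkt.dst_port, pkt.proto)
    return paths[hash(key) % len(paths)]
```

Because no state is kept, two different flows from the same client may take
different paths, but a single flow is never split across paths.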
The second one acts on session contents. It requires that the input stream be
reassembled and processed as a whole. The contents may be modified, and the
output stream is segmented into new packets. For this reason it is generally
performed by proxies, which are often called layer 7 load balancers, or L7.
This implies that there are two distinct connections, one on each side, and
that there is no relation between input and output packet sizes or counts.
Clients and servers are not required to use the same protocol (for example
IPv4 vs IPv6, clear vs SSL). The operations are always stateful, and the
return traffic must pass through the load balancer. The extra processing comes
at a cost, so it is not always possible to achieve line rate, especially with
small packets. On the other hand, it offers wide possibilities and is
generally achieved by pure software, even when embedded in hardware
appliances. This technology is very well suited for server load balancing.

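A minimal sketch of this proxy behaviour, assuming a plain TCP relay (this is
hypothetical code, not how any particular product works): the proxy terminates
the client connection, then opens a second, fully independent connection to
the server, so packet boundaries and counts on the two sides are unrelated.

```python
import socket
import threading

def relay(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes one way until EOF; the stream is re-segmented into
    new packets on the output side."""
    while chunk := src.recv(4096):
        dst.sendall(chunk)
    dst.shutdown(socket.SHUT_WR)  # propagate end-of-stream

def handle(client: socket.socket, backend_addr) -> None:
    """Terminate the client connection and open a distinct one to the
    server: two connections, no packet-level relation between them."""
    server = socket.create_connection(backend_addr)  # second connection
    t = threading.Thread(target=relay, args=(client, server))
    t.start()
    relay(server, client)
    t.join()
    client.close()
    server.close()
```

A real L7 load balancer would additionally parse the protocol at this point to
choose a backend, rather than relaying to a fixed one.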
Packet-based load balancers are generally deployed in cut-through mode: they
are installed on the normal path of the traffic and divert it according to
their configuration. The return traffic doesn't necessarily pass through the
load balancer. Some modifications may be applied to the network destination
address in order to direct the traffic to the proper destination; in this
case, it is mandatory that the return traffic passes through the load
balancer. If the routing doesn't make this possible, the load balancer may
also replace the packets' source address with its own in order to force the
return traffic to pass through it.

Proxy-based load balancers are deployed as servers with their own IP addresses
and ports, without architecture changes. Sometimes this requires some
adaptations to the applications so that clients are properly directed to the
load balancer's IP address and not directly to the server's. Some load
balancers may have to adjust some of the servers' responses to make this
possible (e.g. the HTTP Location header field used in HTTP redirects). Some
proxy-based load balancers may intercept traffic for an address they don't
own, and spoof the client's address when connecting to the server. This allows
them to be deployed as if they were a regular router or firewall, in a
cut-through mode very similar to that of packet-based load balancers. This is
particularly appreciated for products which combine both packet mode and proxy
mode. In this case DSR is obviously still not possible, and the return traffic
still has to be routed back to the load balancer.

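The Location header adjustment mentioned above can be illustrated as follows.
This is a simplified sketch with made-up addresses; a real load balancer
operates on the parsed HTTP response rather than a plain dict:

```python
def rewrite_location(headers: dict, server_host: str, lb_host: str) -> dict:
    """Rewrite a redirect issued by the server so that it points to the
    load balancer's address instead of the server's own address."""
    out = dict(headers)
    prefix = f"http://{server_host}/"
    if out.get("Location", "").startswith(prefix):
        out["Location"] = f"http://{lb_host}/" + out["Location"][len(prefix):]
    return out
```

With the illustrative addresses above, a server redirect to its own internal
address would be rewritten to the load balancer's public name, so the client
keeps passing through the load balancer on the next request.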
A very scalable layered approach consists in having a front router which
receives traffic from multiple load-balanced links and uses ECMP to distribute
this traffic to a first layer of multiple stateful packet-based load balancers
(L4). These L4 load balancers in turn pass the traffic to an even larger
number of proxy-based load balancers (L7), which have to parse the contents to
decide which server will ultimately receive the traffic.

The number of components and possible paths for the traffic increases the risk
of failure; in very large environments, it is even normal to permanently have
a few faulty components being fixed or replaced. Load balancing done without
awareness of the whole stack's health significantly degrades availability. For
this reason, any sane load balancer will verify that the components it intends
to deliver traffic to are still alive and reachable, and it will stop
delivering traffic to faulty ones. This can be achieved using various methods.

The most common one consists in periodically sending probes to ensure the
component is still operational. These probes are called "health checks". They
must be representative of the type of failure to address. For example, a
ping-based check will not detect that a web server has crashed and no longer
listens on a port, while a connection to the port will verify this, and a more
advanced request may even validate that the server still works and that the
database it relies on is still accessible. Health checks often involve a few
retries to cover for occasional measuring errors. The period between checks
must be small enough to ensure the faulty component is not used for too long
after an error occurs.

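As an illustration, a TCP connect check with retries might look like the
sketch below. The retry count and timeout are arbitrary assumptions, and a
more representative check would go further by sending an application-level
request, as described above:

```python
import socket

def tcp_check(host: str, port: int,
              retries: int = 3, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within
    `retries` attempts; the retries cover occasional measuring errors."""
    for _ in range(retries):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            continue  # transient failure: try again before declaring down
    return False
```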
Other methods consist in sampling the production traffic sent to a destination
to observe whether it is processed correctly or not, and evicting the
components which return inappropriate responses. However, this requires
sacrificing a part of the production traffic, which is not always acceptable.
A combination of these two mechanisms provides the best of both worlds: both
of them are used to detect a fault, and only health checks are used to detect
the end of the fault. A last method involves centralized reporting: a central
monitoring agent periodically updates all load balancers about all components'
states. This gives a global view of the infrastructure to all components,
though sometimes with less accuracy or responsiveness. It is best suited for
environments with many load balancers and many servers.

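The combined approach can be sketched as a small per-server state machine
(the class, method names and threshold are illustrative assumptions):
production responses may mark the server as down, but only a successful
health check brings it back.

```python
class ServerHealth:
    """Passive observation detects the fault; active checks end it."""

    def __init__(self, fail_threshold: int = 3):
        self.fail_threshold = fail_threshold
        self.consecutive_errors = 0
        self.alive = True

    def observe_response(self, ok: bool) -> None:
        # Passive: sample production responses from this server
        self.consecutive_errors = 0 if ok else self.consecutive_errors + 1
        if self.consecutive_errors >= self.fail_threshold:
            self.alive = False

    def health_check_result(self, ok: bool) -> None:
        # Active: only a successful health check ends the fault
        if ok:
            self.alive = True
            self.consecutive_errors = 0
```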
Layer 7 load balancers also face another challenge, known as stickiness or
persistence. The principle is that they generally have to direct multiple
subsequent requests or connections from the same origin (such as an end user)
to the same target. The best-known example is the shopping cart on an online
store: if each click leads to a new connection, the user must always be sent
to the server which holds their shopping cart. Content awareness makes it
easier to spot some elements in the request which identify the server to
deliver it to, but that's not always enough. For example, if the source
address is used as a key to pick a server, it can be decided that a
hash-based algorithm will be used and that a given IP address will always be
sent to the same server, based on the remainder of dividing the address by
the number of available servers. But if one server fails, the result changes
and all users are suddenly sent to a different server, where they lose their
shopping cart. The solution to this issue consists in memorizing the chosen
target, so that each time the same visitor is seen, they are directed to the
same server regardless of the number of available servers. The information
may be stored in the load balancer's memory, in which case it may have to be
replicated to other load balancers if it's not alone, or it may be stored in
the client's memory using various methods, provided that the client is able
to present this information back with every request (cookie insertion,
redirection to a sub-domain, etc.). This mechanism provides the extra benefit
of not having to rely on unstable or unevenly distributed information (such
as the source IP address). This is in fact the strongest reason to adopt a
layer 7 load balancer instead of a layer 4 one.

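The failure mode and its fix can be sketched as follows (the names are
hypothetical): plain modulo hashing remaps visitors when the server count
changes, while a stickiness table memorizes the first choice so a pool change
does not move existing visitors.

```python
def modulo_pick(client_ip: str, servers: list) -> str:
    """Hash-based choice: the result changes for almost every client
    whenever len(servers) changes."""
    return servers[hash(client_ip) % len(servers)]

class StickyTable:
    """Memorize each visitor's server so later requests stick to it."""

    def __init__(self, servers: list):
        self.servers = list(servers)
        self.assignments: dict = {}   # client -> memorized server

    def pick(self, client_ip: str) -> str:
        srv = self.assignments.get(client_ip)
        if srv is None or srv not in self.servers:  # new visitor or dead server
            srv = modulo_pick(client_ip, self.servers)
            self.assignments[client_ip] = srv
        return srv
```

In practice the table may live in the load balancer's memory (and be
replicated to its peers) or be pushed to the client, for example as an
inserted cookie.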
In order to extract information such as a cookie, a Host header field, or a
URL, a load balancer may need to decrypt SSL/TLS traffic, and possibly even to
re-encrypt it when passing it to the server. This expensive task explains why
some high-traffic infrastructures deploy a large number of load balancers.

Since a layer 7 load balancer may perform a number of complex operations on
the traffic (decrypt, parse, modify, match cookies, decide which server to
send to, etc.), it can definitely cause some trouble, and it will very
commonly be accused of being responsible for a lot of trouble that it merely
revealed. Often it will be discovered that servers are unstable and
periodically go up and down, or, for web servers, that they deliver pages
containing hard-coded links which force the clients to connect directly to
one specific server without passing via the load balancer, or that they take
ages to respond under high load, causing timeouts. That is why logging is an
extremely important aspect of layer 7 load balancing. Once a problem is
reported, it is important to figure out whether the load balancer took a
wrong decision and, if so, why, so that it doesn't happen again.