2. Quick introduction to load balancing and load balancers
- Load balancing consists of aggregating multiple components in order to achieve
- a total processing capacity above each component's individual capacity, without
- any intervention from the end user and in a scalable way. This results in more
- operations being performed simultaneously in the time it takes a component to
- perform only one. A single operation, however, will still be performed on a
- single component at a time and will not get faster than without load balancing.
- Load balancing always requires at least as many operations as available
- components, and an efficient load balancing mechanism, to make use of all
- components and to fully benefit from it. A good example of this is the number
- of lanes on a highway, which allows as many cars as there are lanes to pass
- during the same time frame without increasing their individual speed.
-
- Examples of load balancing:
-
- - Process scheduling in multi-processor systems
- - Link load balancing (e.g. EtherChannel, Bonding)
- - IP address load balancing (e.g. ECMP, DNS round-robin)
- - Server load balancing (via load balancers)
-
- The mechanism or component which performs the load balancing operation is
- called a load balancer. In web environments these components are called
- "network load balancers", or more commonly just "load balancers", given that
- this activity is by far the best known case of load balancing.
-
- A load balancer may act:
-
- - at the link level: this is called link load balancing, and it consists of
- choosing what network link to send a packet to;
-
- - at the network level: this is called network load balancing, and it
- consists of choosing what route a series of packets will follow;
-
- - at the server level: this is called server load balancing, and it consists
- of deciding what server will process a connection or request.
-
- Two distinct technologies exist and address different needs, though with some
- overlap. In each case it is important to keep in mind that load balancing
- consists of diverting the traffic from its natural flow and that doing so always
- requires a minimum of care to maintain the required level of consistency between
- all routing decisions.
-
- The first one acts at the packet level and processes packets more or less
- individually. There is a 1-to-1 relation between input and output packets, so
- it is possible to follow the traffic on both sides of the load balancer using a
- regular network sniffer. This technology can be very cheap and extremely fast.
- It is usually implemented in hardware (ASICs), which allows it to reach line
- rate, as in switches doing ECMP. It is usually stateless, but it can also be
- stateful, in which case it considers the session a packet belongs to and is
- called layer 4 load balancing or L4. It may support DSR (direct server return,
- where responses do not pass through the LB again) if the packets were not
- modified, but it provides almost no content awareness. This technology is
- very well suited to network-level load balancing, though it is sometimes used
- for very basic server load balancing at high speed.
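-
- The stateless packet-level decision described above can be sketched as
- follows. This is only an illustration: the FlowKey type and the pick_next_hop
- function are hypothetical names, and SHA-256 merely stands in for whatever
- hash a real device computes in hardware. The point is that the choice depends
- on the packet's addressing fields alone, so no per-flow state is kept.
-
```python
import hashlib
from typing import NamedTuple, Sequence


class FlowKey(NamedTuple):
    """Hypothetical 5-tuple identifying the flow a packet belongs to."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


def pick_next_hop(flow: FlowKey, next_hops: Sequence[str]) -> str:
    """Pick a next hop from the flow's 5-tuple alone, keeping no state.

    Every packet of a given flow hashes to the same value, so the flow
    stays on one path as long as the list of next hops does not change.
    """
    if not next_hops:
        raise ValueError("no next hop available")
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()  # stand-in for a hardware hash
    return next_hops[int.from_bytes(digest[:8], "big") % len(next_hops)]


if __name__ == "__main__":
    hops = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
    flow = FlowKey("192.0.2.10", 51000, "198.51.100.5", 443, "tcp")
    # Two packets of the same flow always get the same answer.
    print(pick_next_hop(flow, hops) == pick_next_hop(flow, hops))
```
-
- Because the result depends on the number of next hops, adding or removing a
- path may move existing flows, which is one reason why stateful variants exist.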
-
- The second one acts on session contents. It requires that the input stream is
- reassembled and processed as a whole. The contents may be modified, and the
- output stream is segmented into new packets. For this reason it is generally
- performed by proxies, which are often called layer 7 load balancers or L7.
- This implies that there are two distinct connections, one on each side, and
- that input and output packets are unrelated in size and count. Clients
- and servers are not required to use the same protocol (for example IPv4 vs
- IPv6, clear vs SSL). The operations are always stateful, and the return traffic
- must pass through the load balancer. The extra processing comes with a cost so
- it's not always possible to achieve line rate, especially with small packets.
- On the other hand, it offers wide possibilities and is generally achieved by
- pure software, even if embedded into hardware appliances. This technology is
- very well suited for server load balancing.
-
- Packet-based load balancers are generally deployed in cut-through mode, so they
- are installed on the normal path of the traffic and divert it according to the
- configuration. The return traffic doesn't necessarily pass through the load
- balancer. Some modifications may be applied to the network destination address
- in order to direct the traffic to the proper destination. In this case, it is
- mandatory that the return traffic passes through the load balancer so that the
- modification can be reverted. If the routes don't make this possible, the load
- balancer may also replace the packets' source address with its own in order to
- force the return traffic to pass through it.
-
- Proxy-based load balancers are deployed as a server with their own IP addresses
- and ports, without architecture changes. Sometimes this requires adapting the
- applications so that clients are properly directed to the load balancer's IP
- address and not directly to the server's. Some load balancers
- may have to adjust some servers' responses to make this possible (e.g. the HTTP
- Location header field used in HTTP redirects). Some proxy-based load balancers
- may intercept traffic for an address they don't own, and spoof the client's
- address when connecting to the server. This allows them to be deployed as if
- they were a regular router or firewall, in a cut-through mode very similar to
- packet-based load balancers. This is particularly appreciated for products
- which combine both packet mode and proxy mode. In this case DSR is obviously
- still not possible and the return traffic still has to be routed back to the
- load balancer.
-
- A very scalable layered approach would consist in having a front router which
- receives traffic from multiple load balanced links, and uses ECMP to distribute
- this traffic to a first layer of multiple stateful packet-based load balancers
- (L4). These L4 load balancers in turn pass the traffic to an even larger number
- of proxy-based load balancers (L7), which have to parse the contents to decide
- what server will ultimately receive the traffic.
-
- The number of components and possible paths for the traffic increases the risk
- of failure; in very large environments, it is even normal to permanently have
- a few faulty components being fixed or replaced. Load balancing done without
- awareness of the whole stack's health significantly degrades availability. For
- this reason, any sane load balancer will verify that the components it intends
- to deliver the traffic to are still alive and reachable, and it will stop
- delivering traffic to faulty ones. This can be achieved using various methods.
-
- The most common one consists in periodically sending probes to ensure the
- component is still operational. These probes are called "health checks". They
- must be representative of the type of failure to address. For example a
- ping-based check will not detect that a web server has crashed and no longer
- listens on its port, while a connection to the port will verify this, and a
- more advanced request may even validate that the server still works and that
- the database it relies on is still accessible. Health checks often involve a
- few retries to cover for occasional measuring errors. The period between
- checks must be small enough to ensure the faulty component is not used for too
- long after an error occurs.
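-
- As a rough sketch of the above, the code below pairs a TCP connection probe
- with a small counter that only changes a component's state after several
- consecutive contradicting results. The names tcp_check and HealthTracker, as
- well as the fall and rise parameters and their default values, are
- assumptions made for this illustration only.
-
```python
import socket


def tcp_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class HealthTracker:
    """Turn individual probe results into an up/down state.

    `fall` consecutive failures mark the component down, and `rise`
    consecutive successes mark it up again. Both names and their
    default values are illustrative assumptions.
    """

    def __init__(self, fall: int = 3, rise: int = 2) -> None:
        self.fall = fall
        self.rise = rise
        self.up = True
        self._streak = 0  # consecutive results contradicting the state

    def record(self, success: bool) -> bool:
        """Record one probe result and return the resulting state."""
        if success == self.up:
            # The probe agrees with the current state: forget the streak.
            self._streak = 0
            return self.up
        self._streak += 1
        threshold = self.fall if self.up else self.rise
        if self._streak >= threshold:
            self.up = not self.up
            self._streak = 0
        return self.up
```
-
- A checker would typically call tracker.record(tcp_check(host, port)) once per
- check period and stop sending traffic to the component while the returned
- state is False.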
-
- Other methods consist in sampling the production traffic sent to a destination
- to observe if it is processed correctly or not, and to evict the components
- which return inappropriate responses. However this requires sacrificing part
- of the production traffic, and this is not always acceptable. A combination of
- these two mechanisms provides the best of both worlds, with both of them being
- used to detect a fault, and only health checks to detect the end of the fault.
- A last method involves centralized reporting: a central monitoring agent
- periodically updates all load balancers about all components' state. This gives
- a global view of the infrastructure to all components, though sometimes with
- less accuracy or responsiveness. It's best suited for environments with many
- load balancers and many servers.
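-
- The traffic sampling method, combined with health checks as described above,
- might look like the sketch below. It assumes HTTP traffic and treats any 5xx
- status as an inappropriate response; the PassiveObserver class and its window
- and max_errors parameters are hypothetical. Production traffic can only evict
- the server, while readmission is left to a successful health check.
-
```python
from collections import deque


class PassiveObserver:
    """Watch production responses and evict a server on too many errors.

    `window` is the number of recent responses considered and
    `max_errors` the error count that triggers eviction. Both are
    illustrative assumptions, as is treating HTTP 5xx as an error.
    """

    def __init__(self, window: int = 20, max_errors: int = 5) -> None:
        self.max_errors = max_errors
        self.results = deque(maxlen=window)  # True means "error"
        self.evicted = False

    def observe(self, status_code: int) -> None:
        """Record the status of one production response."""
        self.results.append(status_code >= 500)
        if sum(self.results) >= self.max_errors:
            self.evicted = True

    def health_check_passed(self) -> None:
        """Readmit the server; only a health check ends the fault."""
        self.results.clear()
        self.evicted = False
```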
-
- Layer 7 load balancers also face another challenge known as stickiness or
- persistence. The principle is that they generally have to direct multiple
- subsequent requests or connections from the same origin (such as an end user)
- to the same target. The best known example is the shopping cart on an online
- store. If each click leads to a new connection, the user must always be sent
- to the server which holds their shopping cart. Content-awareness makes it
- easier to spot some elements in the request to identify the server to deliver
- it to, but that's not always enough. For example if the source address is used
- as a key to pick a server, it can be decided that a hash-based algorithm will
- be used and that a given IP address will always be sent to the same server,
- based on the remainder of the division of the address by the number of
- available servers. But if one server fails, the result changes and most users
- are suddenly sent to a different server and lose their shopping cart. The
- solution to this issue consists of memorizing the chosen target so that each
- time the same visitor is seen, they are directed to the same server regardless
- of the number of available servers. The information may be stored in the load
- balancer's memory, in which case it may have to be replicated to other load
- balancers if there are several, or it may
- be stored in the client's memory using various methods provided that the client
- is able to present this information back with every request (cookie insertion,
- redirection to a sub-domain, etc). This mechanism provides the extra benefit of
- not having to rely on unstable or unevenly distributed information (such as the
- source IP address). This is in fact the strongest reason to adopt a layer 7
- load balancer instead of a layer 4 one.
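-
- The difference between the two approaches can be illustrated with a small
- sketch. The pick_by_modulo function and the StickTable class are hypothetical
- names, the table is keyed by source address only for simplicity, and cookie
- insertion is not shown. The first function recomputes the target from the
- address on every request, while the table memorizes the first choice and
- reuses it for as long as that server remains available.
-
```python
import ipaddress
from typing import Dict, List


def pick_by_modulo(client_ip: str, servers: List[str]) -> str:
    """Map a client to a server using address modulo the server count."""
    return servers[int(ipaddress.ip_address(client_ip)) % len(servers)]


class StickTable:
    """Remember the server chosen for each client (hypothetical helper)."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def pick(self, client_ip: str, servers: List[str]) -> str:
        """Reuse the remembered server while it is still available."""
        server = self._entries.get(client_ip)
        if server not in servers:
            # Unknown client, or its server is gone: choose and memorize.
            server = pick_by_modulo(client_ip, servers)
            self._entries[client_ip] = server
        return server


if __name__ == "__main__":
    servers = ["s1", "s2", "s3", "s4"]
    clients = [f"192.0.2.{i}" for i in range(1, 101)]
    table = StickTable()
    before_hash = {c: pick_by_modulo(c, servers) for c in clients}
    before_table = {c: table.pick(c, servers) for c in clients}

    remaining = ["s1", "s2", "s3"]  # s4 has failed
    # Count clients that were NOT on s4 and yet change server.
    moved_hash = sum(1 for c in clients
                     if before_hash[c] != "s4"
                     and pick_by_modulo(c, remaining) != before_hash[c])
    moved_table = sum(1 for c in clients
                      if before_table[c] != "s4"
                      and table.pick(c, remaining) != before_table[c])
    print("moved with modulo:", moved_hash)
    print("moved with table:", moved_table)
```
-
- With the table, only the clients of the failed server are reassigned; with
- the plain modulo, clients of healthy servers are moved as well.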
-
- In order to extract information such as a cookie, a host header field, a URL
- or any other element, a load balancer may need to decrypt SSL/TLS traffic and
- possibly even re-encrypt it when passing it to the server. This expensive task
- explains why some high-traffic infrastructures deploy a large number of load
- balancers.
-
- Since a layer 7 load balancer may perform a number of complex operations on the
- traffic (decrypt, parse, modify, match cookies, decide what server to send to,
- etc), it can definitely cause some trouble and will very commonly be accused of
- being responsible for a lot of trouble that it only revealed. Often it will be
- discovered that servers are unstable and periodically go up and down, or for
- web servers, that they deliver pages with some hard-coded links forcing the
- clients to connect directly to one specific server without passing via the load
- balancer, or that they take ages to respond under high load causing timeouts.
- That's why logging is an extremely important aspect of layer 7 load balancing.
- Once a problem is reported, it is important to figure out whether the load
- balancer made a wrong decision and if so, why, so that it doesn't happen
- again.