11. Well-known traps to avoid
11. Well-known traps to avoid

Once in a while, someone reports that after a system reboot, the haproxy
service wasn't started, and that once they start it by hand it works. Most
often, these people are running a clustered IP address mechanism such as
keepalived, to assign the service IP address to the master node only, and while
it used to work when they used to bind haproxy to address 0.0.0.0, it stopped
working after they bound it to the virtual IP address. What happens here is
that when the service starts, the virtual IP address is not yet owned by the
local node, so when HAProxy wants to bind to it, the system rejects this
because it is not a local IP address. The fix doesn't consist in delaying the
haproxy service startup (since it wouldn't stand a restart), but instead to
properly configure the system to allow binding to non-local addresses. This is
easily done on Linux by setting the net.ipv4.ip_nonlocal_bind sysctl to 1. This
is also needed in order to transparently intercept the IP traffic that passes
through HAProxy for a specific target address.
 
Multi-process configurations involving source port ranges may apparently seem
to work but they will cause some random failures under high loads because more
than one process may try to use the same source port to connect to the same
server, which is not possible. The system will report an error and a retry will
happen, picking another port. A high value in the "retries" parameter may hide
the effect to a certain extent but this also comes with increased CPU usage and
processing time. Logs will also report a certain number of retries. For this
reason, port ranges should be avoided in multi-process configurations.
 
Since HAProxy uses SO_REUSEPORT and supports having multiple independent
processes bound to the same IP:port, during troubleshooting it can happen that
an old process was not stopped before a new one was started. This provides
absurd test results which tend to indicate that any change to the configuration
is ignored. The reason is that in fact even the new process is restarted with a
new configuration, the old one also gets some incoming connections and
processes them, returning unexpected results. When in doubt, just stop the new
process and try again. If it still works, it very likely means that an old
process remains alive and has to be stopped. Linux's "netstat -lntp" is of good
help here.
 
When adding entries to an ACL from the command line (eg: when blacklisting a
source address), it is important to keep in mind that these entries are not
synchronized to the file and that if someone reloads the configuration, these
updates will be lost. While this is often the desired effect (for blacklisting)
it may not necessarily match expectations when the change was made as a fix for
a problem. See the "add acl" action of the CLI interface.