Using BIRD to run BGP

BIRD is an open-source implementation of several routing protocols (including BGP) for Unix-like operating systems. If you are not familiar with it, we recommend glancing through the User’s Guide first.

BIRD provides a way to advertise routes using traditional networking protocols to allow Cilium-managed endpoints to be accessible outside the cluster. This guide assumes that Cilium is already deployed in the cluster, and that the remaining piece is how to ensure that the pod CIDR ranges are externally routable.

BIRD currently maintains two release families, 1.x and 2.x, whose configuration formats differ significantly. Unless you have already deployed 1.x, we suggest starting with 2.x directly, as it will be supported longer. The following examples use the bird2 software and the configuration format that bird2 understands.
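To illustrate the difference: 1.x ran separate daemons for IPv4 and IPv6 (bird and bird6), while 2.x handles both address families in a single daemon through per-protocol channels. A minimal sketch of the 2.x channel syntax:

```
# bird 2.x: each protocol carries one or more address-family channels
protocol kernel {
        ipv4 {                  # in 1.x, options such as "export all;"
                export all;     # sat directly in the protocol body
        };
}
```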

This guide shows how to install and configure bird on CentOS 7.x to work with Cilium. Installation and configuration on other platforms should be very similar.

Install bird

  $ yum install -y bird2
  $ systemctl enable bird
  $ systemctl restart bird

Test the installation:

  $ birdc show route
  BIRD 2.0.6 ready.

  $ birdc            # interactive shell
  BIRD 2.0.6 ready.
  bird> show bfd sessions
  There is no BFD protocol running
  bird>
  bird> show protocols all
  Name       Proto      Table      State  Since         Info
  device1    Device     ---        up     10:53:40.147
  direct1    Direct     ---        down   10:53:40.147
    Channel ipv4
      State:          DOWN
      Input filter:   ACCEPT
      Output filter:  REJECT
  ...

Basic configuration

It’s hard to discuss bird configurations without considering specific BGP schemes. However, BGP scheme design is beyond the scope of this guide. If you are interested in this topic, refer to BGP in the Data Center (O’Reilly, 2017) for a quick start.

In the following, we will restrict our BGP scenario as follows:

[Figure: sample physical network topology (bird_sample_topo.png)]

  • physical network: simple 3-tier hierarchical architecture

  • nodes connect to physical network via layer 2 switches

  • announcing each node’s PodCIDR to physical network via bird

  • for each node, do not import route announcements from physical network

In this design, the BGP connections look like this:

[Figure: sample BGP peering (bird_sample_bgp.png)]

This scheme is simple in that:

  • core routers learn PodCIDRs from bird, which makes the Pod IP addresses routable within the entire network.

  • bird doesn’t learn routes from the core routers or other nodes, which keeps each node’s kernel routing table clean and small and avoids the performance issues a large table can bring.

In this scheme, each node simply sends pod egress traffic to its default gateway (the core routers) and lets the latter do the routing.

Below is a reference configuration that fulfills the above purposes:

  $ cat /etc/bird.conf
  log syslog all;

  router id {{ NODE_IP }};

  protocol device {
        scan time 10;           # Scan interfaces every 10 seconds
  }

  # Disable automatically generating direct routes to all network interfaces.
  protocol direct {
        disabled;               # Disable by default
  }

  # Forbid synchronizing BIRD routing tables with the OS kernel.
  protocol kernel {
        ipv4 {                  # Connect protocol to IPv4 table by channel
              import none;      # Import to table, default is import all
              export none;      # Export to protocol. default is export none
        };
  }

  # Static IPv4 routes.
  protocol static {
        ipv4;
        route {{ POD_CIDR }} via "cilium_host";
  }

  # BGP peers
  protocol bgp uplink0 {
        description "BGP uplink 0";
        local {{ NODE_IP }} as {{ NODE_ASN }};
        neighbor {{ NEIGHBOR_0_IP }} as {{ NEIGHBOR_0_ASN }};
        password {{ NEIGHBOR_PWD }};

        ipv4 {
              import filter {reject;};
              export filter {accept;};
        };
  }

  protocol bgp uplink1 {
        description "BGP uplink 1";
        local {{ NODE_IP }} as {{ NODE_ASN }};
        neighbor {{ NEIGHBOR_1_IP }} as {{ NEIGHBOR_1_ASN }};
        password {{ NEIGHBOR_PWD }};

        ipv4 {
              import filter {reject;};
              export filter {accept;};
        };
  }

Save the above file as /etc/bird.conf, and replace the placeholders with your own:

  sed -i 's/{{ NODE_IP }}/<your node ip>/g'                /etc/bird.conf
  # Use a delimiter other than '/' for the CIDR, since the CIDR itself contains one
  sed -i 's#{{ POD_CIDR }}#<your pod cidr>#g'              /etc/bird.conf
  sed -i 's/{{ NODE_ASN }}/<your node asn>/g'              /etc/bird.conf
  sed -i 's/{{ NEIGHBOR_0_IP }}/<your neighbor 0 ip>/g'    /etc/bird.conf
  sed -i 's/{{ NEIGHBOR_1_IP }}/<your neighbor 1 ip>/g'    /etc/bird.conf
  sed -i 's/{{ NEIGHBOR_0_ASN }}/<your neighbor 0 asn>/g'  /etc/bird.conf
  sed -i 's/{{ NEIGHBOR_1_ASN }}/<your neighbor 1 asn>/g'  /etc/bird.conf
  sed -i 's/{{ NEIGHBOR_PWD }}/<your neighbor password>/g' /etc/bird.conf
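As a concrete illustration, the snippet below performs the same kind of substitution on a small local stand-in for the template (the node IP 10.4.1.100 and PodCIDR 10.5.48.0/24 are made-up example values). Note the '#' delimiter for the CIDR substitution, since the CIDR itself contains a '/':

```shell
# Build a tiny stand-in for /etc/bird.conf (example values only)
cat > bird.conf.example <<'EOF'
router id {{ NODE_IP }};
route {{ POD_CIDR }} via "cilium_host";
EOF

sed -i 's/{{ NODE_IP }}/10.4.1.100/g' bird.conf.example
sed -i 's#{{ POD_CIDR }}#10.5.48.0/24#g' bird.conf.example  # '#' delimiter: the CIDR contains '/'

cat bird.conf.example
```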

Restart bird and check the logs:

  $ systemctl restart bird

  # check logs
  $ journalctl -u bird
  -- Logs begin at Sat 2020-02-22 16:11:44 CST, end at Mon 2020-02-24 18:58:35 CST. --
  Feb 24 18:58:24 node systemd[1]: Started BIRD Internet Routing Daemon.
  Feb 24 18:58:24 node systemd[1]: Starting BIRD Internet Routing Daemon...
  Feb 24 18:58:24 node bird[137410]: Started

Verify the changes; you should see something like this:

  $ birdc show route
  BIRD 2.0.6 ready.
  Table master4:
  10.5.48.0/24         unicast [static1 20:14:51.478] * (200)
        dev cilium_host

This indicates that the PodCIDR 10.5.48.0/24 on this node has been successfully imported into BIRD.

  $ birdc show protocols all uplink0 | grep -A 3 -e "Description" -e "stats"
    Description:    BGP uplink 0
    BGP state:          Established
      Neighbor address: 10.4.1.7
      Neighbor AS:      65418
  --
    Route change stats:     received   rejected   filtered    ignored   accepted
      Import updates:              0          0          0          0          0
      Import withdraws:           10          0        ---         10          0
      Export updates:              1          0          0        ---          1

Here we see that the uplink0 BGP session is established and our PodCIDR from above has been exported and accepted by the BGP peer.

Monitoring

bird_exporter collects bird daemon state and exports it as Prometheus-style metrics.
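To scrape it, a minimal Prometheus job could look like the following sketch (the target hostnames are placeholders, and 9324 is assumed to be bird_exporter's default listen port; check your deployment):

```yaml
scrape_configs:
  - job_name: 'bird'
    static_configs:
      # Replace with your node addresses; 9324 is assumed to be
      # bird_exporter's default port.
      - targets: ['node1:9324', 'node2:9324']
```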

It also provides a simple Grafana dashboard, but you can create your own as well; Trip.com’s, for example, looks like this:

[Figure: Trip.com’s bird Grafana dashboard (bird_dashboard.png)]

Advanced Configurations

You may need some advanced configuration to make your BGP scheme production-ready. This section lists some of those parameters but does not dive into the details; see the BIRD User’s Guide for those.

BFD

Bidirectional Forwarding Detection (BFD) is a detection protocol designed to accelerate path failure detection.

This feature also requires corresponding configuration on the peer side.

  protocol bfd {
        interface "{{ grains['node_mgnt_device'] }}" {
              min rx interval 100 ms;
              min tx interval 100 ms;
              idle tx interval 300 ms;
              multiplier 10;
              password {{ NEIGHBOR_PWD }};
        };

        neighbor {{ NEIGHBOR_0_IP }};
        neighbor {{ NEIGHBOR_1_IP }};
  }

  protocol bgp uplink0 {
        ...
        bfd on;
  }
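With the settings above, the time to declare a peer down is roughly the negotiated receive interval multiplied by the multiplier; a quick back-of-the-envelope check:

```shell
# detection time = multiplier x min rx interval
# (values taken from the BFD config above)
multiplier=10
rx_interval_ms=100
echo "$((multiplier * rx_interval_ms)) ms"   # about 1 second to detect a dead peer
```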

Verify it; you should see something like this:

  $ birdc show bfd sessions
  BIRD 2.0.6 ready.
  bfd1:
  IP address                Interface  State      Since         Interval  Timeout
  10.5.40.2                 bond0      Up         20:14:51.479    0.300    0.000
  10.5.40.3                 bond0      Up         20:14:51.479    0.300    0.000

ECMP

For some special purposes (e.g. L4LB), you may configure the same CIDR on multiple nodes. In that case, you need to configure Equal-Cost Multi-Path (ECMP) routing.

This feature also requires corresponding configuration on the peer side.

  protocol kernel {
        ipv4 {                  # Connect protocol to IPv4 table by channel
              import none;      # Import to table, default is import all
              export none;      # Export to protocol. default is export none
        };

        # Configure ECMP
        merge paths yes limit {{ N }};
  }

See the user manual for more detailed information.

You also need to verify ECMP correctness on the physical network (the core routers in the above scenario):

  CORE01# show ip route 10.5.2.0
  IP Route Table for VRF "default"
  '*' denotes best ucast next-hop
  '**' denotes best mcast next-hop
  '[x/y]' denotes [preference/metric]
  '%<string>' in via output denotes VRF <string>

  10.5.2.0/24, ubest/mbest: 2/0
      *via 10.4.1.7, [200/0], 13w6d, bgp-65418, internal, tag 65418
      *via 10.4.1.8, [200/0], 12w4d, bgp-65418, internal, tag 65418

Graceful restart

This feature also requires corresponding configuration on the peer side.

Add graceful restart to each bgp section:

  protocol bgp uplink0 {
        ...
        graceful restart;
  }
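You can optionally bound how long a peer keeps stale routes by announcing an explicit restart time; the 120-second value below is just an example (it matches BIRD's documented default):

```
protocol bgp uplink0 {
      ...
      graceful restart;
      graceful restart time 120;   # seconds advertised to the peer (example value)
}
```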