Dual-cluster upgrade

The dual-cluster upgrade strategy is a Kong Gateway upgrade option used primarily for traditional mode deployments and for control planes in hybrid mode.

This guide refers to the old version as cluster X and the new version as cluster Y.

With a dual-cluster upgrade, you deploy a new cluster of version Y alongside the current cluster of version X, so that both clusters serve requests concurrently during the upgrade. You then gradually adjust the traffic ratio between the two clusters, shifting traffic from the old cluster to the new one based on your business metrics.

    flowchart TD
        DBX[(Current <br>database)]
        DBY[(New <br>database)]
        CPX(Current <br>Kong Gateway X)
        Admin(No admin <br>write operations)
        Admin2(No admin <br>write operations)
        CPY(New <br>Kong Gateway Y)
        LB(Load balancer)
        API(API requests)

        API --> LB & LB & LB & LB
        Admin2 -."X".- CPX
        LB -.90%.-> CPX
        LB --10%--> CPY
        Admin -."X".- CPY
        CPX -.-> DBX
        CPY --pg_restore--> DBY

        style API stroke:none
        style DBX stroke-dasharray:3
        style CPX stroke-dasharray:3
        style Admin fill:none,stroke:none,color:#d44324
        style Admin2 fill:none,stroke:none,color:#d44324
        linkStyle 4,7 stroke:#d44324,color:#d44324
        linkStyle 3,6,9 stroke:#b6d7a8

Figure 1: The diagram shows a Kong Gateway upgrade using the dual-cluster strategy. The new Kong Gateway cluster Y is deployed alongside the current Kong Gateway cluster X. A new database serves the new deployment. Traffic is gradually switched over to the new deployment, until all API traffic is migrated.

This upgrade strategy is the safest of all available strategies and ensures that there is no planned business downtime during the upgrade process.

This method has limitations on automatically generated runtime metrics that rely on the database. During the upgrade, some runtime metrics (for example, the number of requests) are sent to two databases separately. Since the metrics between the databases are not synced, metrics will not be accurate for the duration of the upgrade.

For example, if the Rate Limiting Advanced (RLA) plugin is configured to store request counters in the database, the counters between database X and database Y are not synchronized. The impact scope depends on the window_size parameter of the plugin and the duration of the upgrade process.

The same limitation applies to Vitals if you have a large amount of buffered metrics in PostgreSQL or Cassandra.
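To make the rate-limiting case concrete, the following Python sketch models a single rate-limiting window in which cluster X and cluster Y each keep their own counter. It is a deliberately simplified model, not the plugin's implementation: the function name `allowed_requests` and its parameters are hypothetical, and it assumes one client, one window, and a fixed traffic split.

```python
def allowed_requests(total_requests: int, limit: int, share_to_y: float) -> int:
    """Requests admitted in one window when X and Y count independently.

    Simplified model (assumption): each cluster enforces the full `limit`
    against its own counter, and the load balancer sends `share_to_y` of
    the client's requests to cluster Y.
    """
    if not 0.0 <= share_to_y <= 1.0:
        raise ValueError("share_to_y must be between 0 and 1")
    to_y = round(total_requests * share_to_y)
    to_x = total_requests - to_y
    # Neither cluster sees the requests counted by the other one.
    return min(to_x, limit) + min(to_y, limit)


if __name__ == "__main__":
    # A client sends 200 requests in one window against a limit of 100.
    print(allowed_requests(200, 100, 0.0))  # 100: all traffic on X, limit holds
    print(allowed_requests(200, 100, 0.1))  # 120: 100 via X plus 20 via Y
    print(allowed_requests(200, 100, 0.5))  # 200: each cluster admits 100
```

Under this model the client can exceed the configured limit by as much as a factor of two while traffic is split, which is why the impact grows with both the window size and the length of the upgrade.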

Prerequisites

  • Review the general upgrade guide to prepare for the upgrade and review your options.
  • You have a traditional deployment or you need to upgrade the control planes (CPs) in a hybrid mode deployment.
  • You have enough resources to temporarily run an additional Kong Gateway cluster alongside your existing cluster.

Upgrade using the dual-cluster method

The following steps are intended as a guideline. The exact execution of these steps will vary depending on your environment.

  1. Stop any Kong Gateway configuration updates (for example, Admin API calls). This is critical to guarantee data consistency between cluster X and cluster Y.

    For the duration of the upgrade, you must not execute any write operations through the Admin API, Kong Manager, decK, or direct database updates.

  2. Back up data from the current cluster X by following the Backup guide.

  3. Evaluate factors that may impact the upgrade, as described in Upgrade considerations. You may have to consider customization of both kong.conf and Kong Gateway configuration data.

  4. Evaluate any changes that have happened between releases, including breaking changes.

  5. Deploy a new Kong Gateway cluster of version Y:

    1. Install a new Kong Gateway cluster running version Y as instructed in the Kong Gateway Installation Options.

      Provision the new cluster Y with the same-sized resource capacity as that of the current cluster X.

    2. Install a new database of the same version as the current database.

    3. Restore the backup data to the new database.

    4. Configure the new cluster Y to point to the new database.

    5. Start cluster Y.

    6. Perform staging tests against version Y to make sure it works for all use cases.

      For example, does the Key Authentication plugin authenticate requests properly?

      If the outcome is not as expected, look over the upgrade considerations and the breaking changes again to see if you missed anything.

  6. Divert traffic from old cluster X to new cluster Y.

    This is usually done gradually and incrementally, depending on the risk profile of the deployment. Any load balancer that supports traffic splitting will work here, such as DNS, Nginx, Kubernetes rollout mechanisms, and so on.

  7. Actively monitor all proxy metrics.

  8. If any issues arise, roll back by setting all traffic to cluster X, investigate the issues, and repeat the steps above.

  9. When there are no more issues, decommission the old cluster X to complete the upgrade.
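Steps 6 through 8 amount to a loop: raise the share of traffic sent to cluster Y, check the proxy metrics, and fall back to cluster X if something looks wrong. The following Python sketch illustrates that loop under stated assumptions. It is not a Kong Gateway API: `shift_traffic`, `set_weight_y`, and `is_healthy` are hypothetical names, and how you actually set weights and judge health depends on your load balancer and monitoring stack.

```python
from typing import Callable, Iterable, List, Tuple


def shift_traffic(
    schedule: Iterable[int],
    set_weight_y: Callable[[int], None],
    is_healthy: Callable[[], bool],
) -> Tuple[bool, List[int]]:
    """Apply each percentage in `schedule` to cluster Y, rolling back on failure.

    `set_weight_y` (hypothetical) tells the load balancer what percentage of
    traffic goes to cluster Y; the remainder goes to cluster X.
    `is_healthy` (hypothetical) reports whether proxy metrics look acceptable.
    Returns (success, list of weights that were applied, in order).
    """
    applied: List[int] = []
    for percent_y in schedule:
        if not 0 <= percent_y <= 100:
            raise ValueError(f"invalid percentage: {percent_y}")
        set_weight_y(percent_y)
        applied.append(percent_y)
        # In a real rollout you would wait and observe metrics before checking.
        if not is_healthy():
            set_weight_y(0)  # roll back: send all traffic to cluster X
            applied.append(0)
            return False, applied
    return True, applied


if __name__ == "__main__":
    weights: List[int] = []
    ok, applied = shift_traffic([10, 25, 50, 100], weights.append, lambda: True)
    print(ok, applied)
```

A cautious schedule starts small, as in the 10% split shown in Figure 1, and only reaches 100 after each earlier step has been observed for long enough to trust the metrics.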

Write updates to Kong Gateway can now be performed, though we suggest you keep monitoring metrics for a while.
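For PostgreSQL-backed deployments, the backup in step 2 and the restore in step 5 might be scripted roughly as follows. This is a sketch only, and the Backup guide remains the authoritative procedure. It builds `pg_dump` and `pg_restore` command lines without running them; the host names, user, database name, and backup path are placeholder assumptions, and the target database is assumed to exist already on the new server.

```python
import shlex
from typing import List


def pg_dump_command(host: str, user: str, dbname: str, backup_dir: str) -> List[str]:
    """Build a pg_dump command that writes a directory-format backup."""
    return ["pg_dump", "-h", host, "-U", user, "-d", dbname,
            "-F", "d", "-f", backup_dir]


def pg_restore_command(host: str, user: str, dbname: str, backup_dir: str) -> List[str]:
    """Build a pg_restore command that loads the backup into an existing database."""
    return ["pg_restore", "-h", host, "-U", user, "-d", dbname, backup_dir]


if __name__ == "__main__":
    # Placeholder values (assumptions): replace with your own hosts and paths.
    dump = pg_dump_command("db-x.example.internal", "kong", "kong", "/backups/kong_x")
    restore = pg_restore_command("db-y.example.internal", "kong", "kong", "/backups/kong_x")
    # Print the commands for review; pass the lists to subprocess.run(..., check=True)
    # once you are satisfied with them.
    print(shlex.join(dump))
    print(shlex.join(restore))
```

Building the argument lists separately from executing them makes it easy to review the exact commands before anything touches either database.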