This section describes how to create backups of your high-availability Rancher install.

In an RKE installation, the cluster data is replicated on each of three etcd nodes in the cluster, providing redundancy and data duplication in case one of the nodes fails.
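If you want to confirm that all three etcd members are healthy before relying on your backups, you can check the member list from any node running the etcd role. This is an optional check, and it assumes the RKE-managed etcd container is named etcd, which is the name RKE assigns by default:

  # Run on any node with the etcd role; assumes the etcd container
  # is named "etcd" (the RKE default)
  root@node:~# docker exec etcd etcdctl member list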

Figure: Cluster data within an RKE Kubernetes cluster running the Rancher management server

Requirements

RKE Version

The commands for taking etcd snapshots are only available in RKE v0.1.7 and later.
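If you're not sure which version you have, you can check it from the directory that contains the RKE binary:

  ./rke --version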

RKE Config File

You’ll need the RKE config file that you used to install Rancher, rancher-cluster.yml. You created this file during your initial install. Place this file in the same directory as the RKE binary.
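For example, when you're ready to run the backup commands, the working directory might look something like this. The file names are illustrative; the kubeconfig name shown is what RKE typically generates for rancher-cluster.yml, but yours may differ:

  $ ls
  rke  rancher-cluster.yml  kube_config_rancher-cluster.yml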

Backup Outline

Backing up your high-availability Rancher cluster is a process that involves completing multiple tasks.

  1. Take Snapshots of the etcd Database

    Take snapshots of your current etcd database using Rancher Kubernetes Engine (RKE).

  2. Store Snapshot(s) Externally

    After taking your snapshots, export them to a safe location that won’t be affected if your cluster encounters issues.

1. Take Snapshots of the etcd Database

Take snapshots of your etcd database. You can use these snapshots later to recover from a disaster scenario. There are two ways to take snapshots: on a recurring schedule, or as a one-off. Each option is better suited to a specific use case. Read the short description below each option to know when to use it.

  • Option A: Recurring Snapshots

    After you stand up a high-availability Rancher install, we recommend configuring RKE to automatically take recurring snapshots so that you always have a safe restore point available.

  • Option B: One-Time Snapshots

    We advise taking one-time snapshots before events like upgrades or restoring from another snapshot.

Option A: Recurring Snapshots

For all high-availability Rancher installs, we recommend taking recurring snapshots so that you always have a safe restore point available.

To take recurring snapshots, enable the etcd-snapshot service, which is included with RKE. This service runs in a container alongside the etcd container. You can enable it by adding some code to rancher-cluster.yml.
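Once the configuration below is applied, you can confirm that the service is running by looking for its container on a node with the etcd role. The container name used here, etcd-rolling-snapshots, is the name RKE typically gives the snapshot sidecar, but it may vary between RKE versions:

  # Run on a node with the etcd role; the snapshot sidecar container
  # is typically named "etcd-rolling-snapshots" (name may vary by RKE version)
  root@node:~# docker ps --filter name=etcd-rolling-snapshots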

To Enable Recurring Snapshots:

The steps to enable recurring snapshots differ based on the version of RKE. The first procedure below applies to RKE v0.2.0 and later, which supports the backup_config directive and an optional S3-compatible backend; the second applies to earlier versions (RKE v0.1.x).

For RKE v0.2.0 and later:

  1. Open rancher-cluster.yml with your favorite text editor.
  2. Edit the code for the etcd service to enable recurring snapshots. Snapshots can optionally be saved to an S3-compatible backend.

    services:
      etcd:
        backup_config:
          enabled: true     # enables recurring etcd snapshots
          interval_hours: 6 # time increment between snapshots
          retention: 60     # time in days before snapshot purge
          # Optional S3
          s3backupconfig:
            access_key: "myaccesskey"
            secret_key: "myaccesssecret"
            bucket_name: "my-backup-bucket"
            folder: "folder-name" # Available as of v2.3.0
            endpoint: "s3.eu-west-1.amazonaws.com"
            region: "eu-west-1"
            custom_ca: |-
              -----BEGIN CERTIFICATE-----
              $CERTIFICATE
              -----END CERTIFICATE-----
  3. Save and close rancher-cluster.yml.

  4. Open Terminal and change directory to the location of the RKE binary. Your rancher-cluster.yml file must reside in the same directory.

  5. Run the following command:

    rke up --config rancher-cluster.yml

Result: RKE is configured to take recurring snapshots of etcd on all nodes running the etcd role. Snapshots are saved locally to the following directory: /opt/rke/etcd-snapshots/. If configured, the snapshots are also uploaded to your S3-compatible backend.
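To verify that snapshots are actually being written, you can list the snapshot directory on one of the etcd nodes; the path below is the default local snapshot location mentioned above:

  root@node:~# ls -lh /opt/rke/etcd-snapshots/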

For RKE v0.1.x:

  1. Open rancher-cluster.yml with your favorite text editor.
  2. Edit the code for the etcd service to enable recurring snapshots.

    services:
      etcd:
        snapshot: true  # enables recurring etcd snapshots
        creation: 6h0s  # time increment between snapshots
        retention: 24h  # time increment before snapshot purge
  3. Save and close rancher-cluster.yml.

  4. Open Terminal and change directory to the location of the RKE binary. Your rancher-cluster.yml file must reside in the same directory.

  5. Run the following command:

    rke up --config rancher-cluster.yml

Result: RKE is configured to take recurring snapshots of etcd on all nodes running the etcd role. Snapshots are saved locally to the following directory: /opt/rke/etcd-snapshots/.

Option B: One-Time Snapshots

When you’re about to upgrade Rancher or restore it to a previous snapshot, you should take a one-time snapshot so that you have a backup of etcd in its last known state.

To Take a One-Time Local Snapshot:

  1. Open Terminal and change directory to the location of the RKE binary. Your rancher-cluster.yml file must reside in the same directory.

  2. Enter the following command. Replace <SNAPSHOT.db> with any name that you want to use for the snapshot (e.g. upgrade.db).

    rke etcd snapshot-save \
      --name <SNAPSHOT.db> \
      --config rancher-cluster.yml

Result: RKE takes a snapshot of etcd running on each etcd node. The file is saved to /opt/rke/etcd-snapshots.
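Because the snapshot is written locally on each etcd node, you can copy it off a node right away if you aren't uploading it to S3. A minimal example using scp, where <user> and <etcd-node-ip> are placeholders for your environment:

  # Copy the one-time snapshot from an etcd node to your workstation;
  # <user> and <etcd-node-ip> are placeholders
  scp <user>@<etcd-node-ip>:/opt/rke/etcd-snapshots/<SNAPSHOT.db> .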

To Take a One-Time S3 Snapshot:

Available as of RKE v0.2.0

  1. Open Terminal and change directory to the location of the RKE binary. Your rancher-cluster.yml file must reside in the same directory.

  2. Enter the following command. Replace <SNAPSHOT.db> with any name that you want to use for the snapshot (e.g. upgrade.db).

    rke etcd snapshot-save \
      --config rancher-cluster.yml \
      --name <SNAPSHOT.db> \
      --s3 \
      --access-key S3_ACCESS_KEY \
      --secret-key S3_SECRET_KEY \
      --bucket-name s3-bucket-name \
      --s3-endpoint s3.amazonaws.com \
      --folder folder-name # Available as of v2.3.0

Result: RKE takes a snapshot of etcd running on each etcd node. The file is saved to /opt/rke/etcd-snapshots. It is also uploaded to the S3-compatible backend.
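If you want to confirm that the upload succeeded, you can list the bucket with whichever S3 client you normally use; for example, with s3cmd (the same tool used in the next step):

  s3cmd ls s3://s3-bucket-name/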

2. Back up Local Snapshots to a Safe Location

Note: If you are using RKE v0.2.0 or later, you can enable saving the backups directly to an S3-compatible backend and skip this step.

After taking the etcd snapshots, save them to a safe location so that they’re unaffected if your cluster experiences a disaster scenario. This location should be persistent.

In this documentation, as an example, we’re using Amazon S3 as our safe location, and S3cmd as our tool to create the backups. The backup location and tool that you use are ultimately your decision.

Example:

  root@node:~# s3cmd mb s3://rke-etcd-snapshots
  root@node:~# s3cmd put /opt/rke/etcd-snapshots/snapshot.db s3://rke-etcd-snapshots/
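If your RKE version doesn't support uploading snapshots to S3 directly, one way to keep the offsite copy current is to sync the local snapshot directory on a schedule. The following is only a sketch, using cron and s3cmd on an etcd node with the bucket created above; adjust the schedule and paths to your environment:

  # /etc/cron.d/rke-etcd-snapshot-sync (example)
  # Sync local etcd snapshots to S3 every hour; assumes s3cmd is configured for root
  0 * * * * root s3cmd sync /opt/rke/etcd-snapshots/ s3://rke-etcd-snapshots/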