Overview

An etcd cluster needs periodic maintenance to remain reliable. Depending on an etcd application’s needs, this maintenance can usually be automated and performed without downtime or significantly degraded performance.

All etcd maintenance manages storage resources consumed by the etcd keyspace. Storage space quotas guard against failure to control the keyspace size: if an etcd member runs low on space, a quota triggers cluster-wide alarms that put the system into a limited-operation maintenance mode. To avoid running out of space for writes to the keyspace, the etcd keyspace history must be compacted. Storage space itself may be reclaimed by defragmenting etcd members. Finally, periodic snapshot backups of etcd member state make it possible to recover from any unintended logical data loss or corruption caused by operational error.

History compaction

Since etcd keeps an exact history of its keyspace, this history should be periodically compacted to avoid performance degradation and eventual storage space exhaustion. Compacting the keyspace history drops all information about keys superseded prior to a given keyspace revision. The space used by these keys then becomes available for additional writes to the keyspace.

The keyspace can be compacted automatically with etcd's time-windowed history retention policy, or manually with etcdctl. The etcdctl method provides fine-grained control over the compacting process, whereas automatic compacting fits applications that only need key history for some length of time.

etcd can be set to automatically compact the keyspace with the --auto-compaction-retention option, which takes a retention period in hours:

  # keep one hour of history
  $ etcd --auto-compaction-retention=1

An etcdctl-initiated compaction works as follows:

  # compact up to revision 3
  $ etcdctl compact 3

Revisions prior to the compaction revision become inaccessible:

  $ etcdctl get --rev=2 somekey
  Error: rpc error: code = 11 desc = etcdserver: mvcc: required revision has been compacted
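When compacting manually, it can help to derive the compaction revision from the current revision rather than hard-coding it. The following is a minimal sketch, not part of etcd: `compaction_target` is a hypothetical helper that takes the current revision and the number of recent revisions to keep, and prints the revision to pass to `etcdctl compact`.

```shell
# Hypothetical helper (not part of etcd or etcdctl).
# $1 = current keyspace revision, $2 = number of recent revisions to keep.
# Prints the revision to compact to, or nothing when the history is
# already shorter than the requested window.
compaction_target() {
  if [ "$1" -gt "$2" ]; then
    echo $(( $1 - $2 ))
  fi
}

# Possible usage, assuming the current revision is already known:
#   target=$(compaction_target 1516 1000)
#   [ -n "$target" ] && etcdctl compact "$target"
```

Revisions from the printed target onward stay readable; anything older becomes inaccessible, as in the error shown above.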

Defragmentation

After compacting the keyspace, the backend database may exhibit internal fragmentation. Any internal fragmentation is space that is free to use by the backend but still consumes storage space. The process of defragmentation releases this storage space back to the file system. Defragmentation is issued on a per-member basis so that cluster-wide latency spikes may be avoided.

Compacting old revisions internally fragments etcd by leaving gaps in the backend database. Fragmented space is available for use by etcd but unavailable to the host filesystem.

To defragment an etcd member, use the etcdctl defrag command:

  $ etcdctl defrag
  Finished defragmenting etcd member[127.0.0.1:2379]

Note that defragmenting a live member blocks that member from reading and writing data while it rebuilds its state.

Note that a defragmentation request is not replicated across the cluster; it is applied only to the member that receives it. To defragment every member, specify all of them in the --endpoints flag.
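Because a member is blocked while it defragments, one option is to defragment members one at a time and stop at the first failure. The sketch below assumes that approach; `defrag_each_member` and the `ETCDCTL` override variable are hypothetical names introduced here, and the endpoints in the usage comment are placeholders.

```shell
# Hypothetical helper (not part of etcdctl): defragment the given
# endpoints sequentially, stopping at the first failure.
# ETCDCTL may point at a different binary (or at `echo` for a dry run);
# it defaults to etcdctl.
defrag_each_member() {
  for endpoint in "$@"; do
    ETCDCTL_API=3 "${ETCDCTL:-etcdctl}" --endpoints="$endpoint" defrag || return 1
  done
}

# Possible usage with placeholder endpoints:
#   defrag_each_member 10.0.0.1:2379 10.0.0.2:2379 10.0.0.3:2379
```

Running the members in sequence keeps the blocking window to one member at a time, at the cost of a longer total run.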

Space quota

The space quota in etcd ensures the cluster operates in a reliable fashion. Without a space quota, etcd may suffer from poor performance if the keyspace grows excessively large, or it may simply run out of storage space, leading to unpredictable cluster behavior. If the keyspace's backend database for any member exceeds the space quota, etcd raises a cluster-wide alarm that puts the cluster into a maintenance mode which only accepts key reads and deletes. The cluster can resume normal operation only after enough space is freed in the keyspace, the backend database is defragmented, and the space quota alarm is cleared.

By default, etcd sets a conservative space quota suitable for most applications, but it may be configured on the command line, in bytes:

  # set a very small 16MB quota
  $ etcd --quota-backend-bytes=$((16*1024*1024))

The space quota can be triggered with a loop:

  # fill keyspace
  $ while [ 1 ]; do dd if=/dev/urandom bs=1024 count=1024 | ETCDCTL_API=3 etcdctl put key || break; done
  ...
  Error: rpc error: code = 8 desc = etcdserver: mvcc: database space exceeded
  # confirm quota space is exceeded
  $ ETCDCTL_API=3 etcdctl --write-out=table endpoint status
  +----------------+------------------+-----------+---------+-----------+-----------+------------+
  |    ENDPOINT    |        ID        |  VERSION  | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
  +----------------+------------------+-----------+---------+-----------+-----------+------------+
  | 127.0.0.1:2379 | bf9071f4639c75cc | 2.3.0+git | 18 MB   | true      |         2 |       3332 |
  +----------------+------------------+-----------+---------+-----------+-----------+------------+
  # confirm alarm is raised
  $ ETCDCTL_API=3 etcdctl alarm list
  memberID:13803658152347727308 alarm:NOSPACE
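A monitoring script might want to detect this condition automatically. As a minimal sketch, the hypothetical helper `has_nospace_alarm` below reads `etcdctl alarm list` output on standard input and succeeds when a NOSPACE alarm is present. It assumes the output format shown above.

```shell
# Hypothetical helper (not part of etcdctl): exit status 0 if the
# `etcdctl alarm list` output on stdin reports a NOSPACE alarm.
has_nospace_alarm() {
  grep -q 'alarm:NOSPACE'
}

# Possible usage:
#   if ETCDCTL_API=3 etcdctl alarm list | has_nospace_alarm; then
#     echo "space quota exceeded" >&2
#   fi
```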

Removing excessive keyspace data and defragmenting the backend database will put the cluster back within the quota limits:

  # get current revision
  $ rev=$(ETCDCTL_API=3 etcdctl --endpoints=:2379 endpoint status --write-out="json" | grep -E -o '"revision":[0-9]*' | grep -E -o '[0-9]*')
  # compact away all old revisions
  $ ETCDCTL_API=3 etcdctl compact "$rev"
  compacted revision 1516
  # defragment away excessive space
  $ ETCDCTL_API=3 etcdctl defrag
  Finished defragmenting etcd member[127.0.0.1:2379]
  # disarm alarm
  $ ETCDCTL_API=3 etcdctl alarm disarm
  memberID:13803658152347727308 alarm:NOSPACE
  # test puts are allowed again
  $ ETCDCTL_API=3 etcdctl put newkey 123
  OK
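The revision lookup in the first step can be wrapped in a small function for reuse in scripts. This is a sketch only: `extract_revision` is a hypothetical name, and it assumes the JSON from `endpoint status` is compact (no space after the colon), as the pattern in the transcript above also assumes. When several endpoints are queried it takes the first revision found.

```shell
# Hypothetical helper (not part of etcdctl): read the JSON produced by
# `etcdctl endpoint status --write-out=json` on stdin and print the
# first "revision" value it contains. Prints nothing if none is found.
extract_revision() {
  grep -E -o '"revision":[0-9]+' | head -n 1 | grep -E -o '[0-9]+'
}

# Possible usage:
#   rev=$(ETCDCTL_API=3 etcdctl endpoint status --write-out=json | extract_revision)
#   ETCDCTL_API=3 etcdctl compact "$rev"
```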

Snapshot backup

Snapshotting the etcd cluster on a regular basis serves as a durable backup for an etcd keyspace. By taking periodic snapshots of an etcd member’s backend database, an etcd cluster can be recovered to a point in time with a known good state.

A snapshot is taken with etcdctl:

  $ etcdctl snapshot save backup.db
  $ etcdctl --write-out=table snapshot status backup.db
  +----------+----------+------------+------------+
  |   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
  +----------+----------+------------+------------+
  | fe01cf57 |       10 |          7 | 2.1 MB     |
  +----------+----------+------------+------------+
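For periodic backups, each snapshot needs a distinct file name so that earlier known-good states are not overwritten. One way to do this, sketched below under stated assumptions, is to embed a UTC timestamp in the name. `save_timestamped_snapshot`, the `ETCDCTL` override variable, and the `etcd-<timestamp>.db` naming scheme are all hypothetical choices made for illustration.

```shell
# Hypothetical helper (not part of etcdctl): save a snapshot into the
# directory given as $1, using a UTC timestamp in the file name.
# ETCDCTL may point at a different binary; it defaults to etcdctl.
save_timestamped_snapshot() {
  ETCDCTL_API=3 "${ETCDCTL:-etcdctl}" snapshot save \
    "$1/etcd-$(date -u +%Y%m%dT%H%M%SZ).db"
}

# Possible usage, e.g. from a scheduled job (the directory is a placeholder):
#   save_timestamped_snapshot /var/backups/etcd
```

Timestamped names also sort chronologically, which makes it easy to pick the snapshot closest to a desired recovery point.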