The etcd v3 API is designed to give users a more efficient and cleaner abstraction compared to etcd v2. There are a number of semantic and protocol changes in this new API. For an overview see Xiang Li’s video.
To prove out the design of the v3 API the team has also built a number of example recipes, there is a video discussing these recipes too.
Design
Flatten binary key-value space
Keep the event history until compaction
- access to old version of keys
- user controlled history compaction
Support range query
- Pagination support with limit argument
- Support consistency guarantee across multiple range queries
Replace TTL key with Lease
- more efficient/ low cost keep alive
- a logical group of TTL keys
Replace CAS/CAD with multi-object Txn
- MUCH MORE powerful and flexible
Support efficient watching with multiple ranges
RPC API supports the completed set of APIs.
- more efficient than JSON/HTTP
- additional txn/lease support
HTTP API supports a subset of APIs.
- easy for people to try out etcd
- easy for people to write simple etcd application
Notes
Request Size Limitation
The max request size is around 1MB. Since etcd replicates requests in a streaming fashion, a very large request might block other requests for a long time. The use case for etcd is to store small configuration values, so we prevent user from submitting large requests. This also applies to Txn requests. We might loosen the size in the future a little bit or make it configurable.
Protobuf Defined API
Examples
Put a key (foo=bar)
// A put is always successful
Put( PutRequest { key = foo, value = bar } )
PutResponse {
cluster_id = 0x1000,
member_id = 0x1,
revision = 1,
raft_term = 0x1,
}
Get a key (assume we have foo=bar)
Get ( RangeRequest { key = foo } )
RangeResponse {
cluster_id = 0x1000,
member_id = 0x1,
revision = 1,
raft_term = 0x1,
kvs = {
{
key = foo,
value = bar,
create_revision = 1,
mod_revision = 1,
version = 1;
},
},
}
Range over a key space (assume we have foo0=bar0… foo100=bar100)
Range ( RangeRequest { key = foo, end_key = foo80, limit = 30 } )
RangeResponse {
cluster_id = 0x1000,
member_id = 0x1,
revision = 100,
raft_term = 0x1,
kvs = {
{
key = foo0,
value = bar0,
create_revision = 1,
mod_revision = 1,
version = 1;
},
...,
{
key = foo30,
value = bar30,
create_revision = 30,
mod_revision = 30,
version = 1;
},
},
}
Finish a txn (assume we have foo0=bar0, foo1=bar1)
Txn(TxnRequest {
// mod_revision of foo0 is equal to 1, mod_revision of foo1 is greater than 1
compare = {
{compareType = equal, key = foo0, mod_revision = 1},
{compareType = greater, key = foo1, mod_revision = 1}}
},
// if the comparison succeeds, put foo2 = bar2
success = {PutRequest { key = foo2, value = success }},
// if the comparison fails, put foo2=fail
failure = {PutRequest { key = foo2, value = failure }},
)
TxnResponse {
cluster_id = 0x1000,
member_id = 0x1,
revision = 3,
raft_term = 0x1,
succeeded = true,
responses = {
// response of PUT foo2=success
{
cluster_id = 0x1000,
member_id = 0x1,
revision = 3,
raft_term = 0x1,
}
}
}
Watch on a key/range
Watch( WatchRequest{
key = foo,
end_key = fop, // prefix foo
start_revision = 20,
end_revision = 10000,
// server decided notification frequency
progress_notification = true,
}
… // this can be a watch request stream
)
// put (foo0=bar0) event at 3
WatchResponse {
cluster_id = 0x1000,
member_id = 0x1,
revision = 3,
raft_term = 0x1,
event_type = put,
kv = {
key = foo0,
value = bar0,
create_revision = 1,
mod_revision = 1,
version = 1;
},
}
…
// a notification at 2000
WatchResponse {
cluster_id = 0x1000,
member_id = 0x1,
revision = 2000,
raft_term = 0x1,
// nil event as notification
}
…
// put (foo0=bar3000) event at 3000
WatchResponse {
cluster_id = 0x1000,
member_id = 0x1,
revision = 3000,
raft_term = 0x1,
event_type = put,
kv = {
key = foo0,
value = bar3000,
create_revision = 1,
mod_revision = 3000,
version = 2;
},
}
…