Topology Service
Also known as the TOPO or lock service
The Topology Service is a set of backend processes running on different servers. Those servers store topology data and provide a distributed locking service.
Vitess uses a plug-in system to support various backends for storing topology data, which are assumed to provide a distributed, consistent key-value store. By default, our local example uses the ZooKeeper plugin, and the Kubernetes example uses etcd.
The topology service exists for several reasons:
- It enables tablets to coordinate among themselves as a cluster.
- It enables Vitess to discover tablets, so it knows where to route queries.
- It stores Vitess configuration provided by the database administrator that is needed by many different servers in the cluster, and that must persist between server restarts.A Vitess cluster has one global topology service, and a local topology service in each cell. Since cluster is an overloaded term, and one Vitess cluster is distinguished from another by the fact that each has its own global topology service, we refer to each Vitess cluster as a toposphere.
Global Topology
The global topology service stores Vitess-wide data that does not change frequently. Specifically, it contains data about keyspaces and shards as well as the master tablet alias for each shard.
The global topology is used for some operations, including reparenting and resharding. By design, the global topology service is not used a lot.
In order to survive any single cell going down, the global topology service should have nodes in multiple cells, with enough to maintain quorum in the event of a cell failure.
Local Topology
Each local topology contains information related to its own cell. Specifically, it contains data about tablets in the cell, the keyspace graph for that cell, and the replication graph for that cell.
The local topology service must be available for Vitess to discover tablets and adjust routing as tablets come and go. However, no calls to the topology service are made in the critical path of serving a query at steady state. That means queries are still served during temporary unavailability of topology.