7. Distributed SQL over TiKV - 《TiKV deep dive》

Distributed SQL over TiKV

TiKV is the storage layer for TiDB, a distributed HTAP SQL database. So far,we have only explained how a distributed transactional Key-Value database isimplemented. However this is still far from serving a SQL database. We willexplore and cover the following things in this chapter:

Storage

In this section we will see how the TiDB relational structure (i.e. SQL tablerecords and indexes) are encoded into the Key-Value form in the latestversion. We will also explore a new Key-Value format that is going to beimplemented soon and some insights on even better Key-Value formats in future.

Distributed SQL (DistSQL)

Storing data in a distributed manner using TiKV only utilizes distributed I/Oresources, while the TiDB node that receives SQL query is still in charge ofprocessing all rows. We can go a step further by delegating some processingtasks into TiKV nodes. This way, we can utilize distributed CPU resources! Inthis section, we will take a look at these supported physical plan executorsso far in TiKV and see how they enable TiDB executing SQL queries in adistributed way.

TiKV Query Execution Engine

When talking about executors we cannot ignore discussing the execution engine.Although executors running on TiKV are highly simplified and limited, we stillneed to carefully design the execution engine. It is critical to theperformance of the system. This section will cover the traditional Volcanomodel execution engine used before TiKV 3.0, for example, how it works, prosand cons, and the architecture.

Vectorization

Vectorization is a technique that performs computing over a batch of values.By introducing vectorization into the execution engine, we will achieve higherperformance. This section will introduce its theory and the architecture ofthe vectorized execution engine introduced in TiKV 3.0.