YDB overview
YDB is a horizontally scalable, distributed, fault-tolerant DBMS. It is designed for high performance: a typical server can handle tens of thousands of queries per second. The system is designed to handle hundreds of petabytes of data. YDB can operate in both single-datacenter and geo-distributed (across several datacenters) modes on a cluster made up of thousands of servers.
YDB provides:
- Strict consistency, which can be relaxed to improve performance.
- Support for YQL queries (YQL is an SQL dialect for working with big data).
- Automatic data replication.
- High availability with automatic failover in case a server, rack, or availability zone goes offline.
- Automatic data partitioning as data or load grows.
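To make the idea of automatic partitioning more concrete, here is a minimal conceptual sketch in Python. It is not YDB's actual algorithm or API: the class `RangePartitionedTable`, its `max_rows_per_partition` threshold, and the split-at-the-median rule are all hypothetical and chosen only for illustration. The sketch shows a table whose key-range partitions split in two once they grow past a size limit.

```python
from bisect import bisect_right, insort


class RangePartitionedTable:
    """Toy model: key-range partitions that split when they grow too large.

    Hypothetical illustration only; not how YDB is implemented.
    """

    def __init__(self, max_rows_per_partition=4):
        self.max_rows = max_rows_per_partition
        # Each partition is a sorted list of keys.
        self.partitions = [[]]
        # boundaries[i] is the smallest key stored in partitions[i + 1].
        self.boundaries = []

    def _partition_index(self, key):
        # Route a key to the partition whose key range contains it.
        return bisect_right(self.boundaries, key)

    def insert(self, key):
        idx = self._partition_index(key)
        partition = self.partitions[idx]
        insort(partition, key)
        # Split as soon as the partition exceeds the size threshold.
        if len(partition) > self.max_rows:
            self._split(idx)

    def _split(self, idx):
        partition = self.partitions[idx]
        mid = len(partition) // 2
        left, right = partition[:mid], partition[mid:]
        # Replace the oversized partition with its two halves.
        self.partitions[idx:idx + 1] = [left, right]
        self.boundaries.insert(idx, right[0])


table = RangePartitionedTable(max_rows_per_partition=4)
for key in range(10):
    table.insert(key)

for number, partition in enumerate(table.partitions):
    print(f"partition {number}: {partition}")
```

In a real distributed system the halves would typically be placed on different servers, and splitting could also be triggered by load rather than size alone; the sketch leaves both aspects out.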
To interact with YDB, you can use the YDB CLI or the SDKs for C++, Java, Python, Node.js, PHP, and Go.
YDB supports a relational data model and manages tables with a predefined schema. To make tables easier to organize, you can create directories, as in a file system.
Database commands are mainly written in YQL, an SQL dialect. This gives the user a powerful and familiar way to interact with the database.
YDB supports high-performance distributed ACID transactions that may affect multiple records in different tables. It provides the serializable isolation level, which is the strictest transaction isolation level. You can also lower the isolation level to improve performance.
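The following minimal sketch illustrates what an ACID transaction spanning several tables guarantees. It uses `sqlite3` from the Python standard library as a local, single-node stand-in, not YDB or its SDK; the `accounts` and `transfers` tables and the `transfer` helper are hypothetical. The point is atomicity: either all writes in the transaction are applied, or none are.

```python
import sqlite3

# In-memory SQLite database used as a stand-in; this is not YDB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (
        id INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    );
    CREATE TABLE transfers (
        id INTEGER PRIMARY KEY,
        src INTEGER NOT NULL,
        dst INTEGER NOT NULL,
        amount INTEGER NOT NULL
    );
    INSERT INTO accounts (id, balance) VALUES (1, 100), (2, 50);
""")


def transfer(conn, src, dst, amount):
    """Move money between accounts and log it, all in one transaction."""
    try:
        # The connection context manager commits on success and
        # rolls back every statement in the block on an exception.
        with conn:
            conn.execute(
                "INSERT INTO transfers (src, dst, amount) VALUES (?, ?, ?)",
                (src, dst, amount),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # Fails the CHECK constraint if the source would go negative.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


def transfer_count(conn):
    return conn.execute("SELECT COUNT(*) FROM transfers").fetchone()[0]


accepted = transfer(conn, 1, 2, 30)    # fits within the balance
rejected = transfer(conn, 1, 2, 500)   # would overdraw account 1
print(accepted, rejected, balances(conn), transfer_count(conn))
```

The second transfer fails on its last statement, so the log row and the credit written earlier in the same transaction are rolled back as well. A distributed database has to provide the same guarantee when the affected rows live on different servers.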
YDB is designed to serve different workload types, such as OLTP and OLAP. The current version offers only limited support for analytical queries, so YDB is currently best described as an OLTP database.
YDB is used in Yandex services as a high-performance OLTP DBMS. In particular, Yandex.Cloud services such as Yandex Object Storage use YDB to store data and are based on its components.
Use cases
YDB can be used as an alternative solution in the following cases:
- When using NoSQL systems, if strong data consistency is required.
- When using NoSQL systems, if you need to make transactional updates to data stored in different rows of one or more tables.
- In systems that need to process and store large amounts of data and allow for virtually unlimited horizontal scalability (using industrial clusters of 5000+ nodes, processing millions of RPS, and storing petabytes of data).
- In low-load systems, when supporting a separate DB instance would be a waste of money (consider using YDB in serverless mode instead).
- In systems with unpredictable or seasonally fluctuating load (you can add or remove computing resources on request and/or use serverless mode).
- In high-load systems that shard load across relational DB instances.
- When developing a new product with no reliable load forecast or with an expected high load beyond the capabilities of conventional relational databases.