TiDB 8.0.0 Release Notes
Release date: March 29, 2024
TiDB version: 8.0.0
Quick access: Quick start
8.0.0 introduces the following key features and improvements:
Category | Feature/Enhancement | Description |
---|---|---|
Scalability and Performance | Disaggregation of PD to improve scalability (experimental) | Placement Driver (PD) contains multiple critical modules to ensure the normal operation of TiDB clusters. As the workload of a cluster increases, the resource consumption of each module in PD also increases, causing mutual interference between these modules and ultimately affecting the overall service quality of the cluster. Starting from v8.0.0, TiDB addresses this issue by splitting the TSO and scheduling modules in PD into independently deployable microservices. This can significantly reduce the mutual interference between modules as the cluster scales. With this architecture, much larger clusters with much larger workloads are now possible. |
Scalability and Performance | Bulk DML for much larger transactions (experimental) | Large batch DML jobs, such as extensive cleanup jobs, joins, or aggregations, can consume a significant amount of memory and have previously been limited at very large scales. Bulk DML (`tidb_dml_type = "bulk"`) is a new DML type for handling large batch DML tasks more efficiently while providing transaction guarantees and mitigating OOM issues. This feature differs from import, load, and restore operations when used for data loading. |
Scalability and Performance | Acceleration of cluster snapshot restore speed (GA) | With this feature, BR can fully leverage the scale advantage of a cluster, enabling all TiKV nodes in the cluster to participate in the preparation step of data restores. This feature can significantly improve the restore speed of large datasets in large-scale clusters. Real-world tests show that this feature can saturate the download bandwidth, with the download speed improving by 8 to 10 times, and the end-to-end restore speed improving by approximately 1.5 to 3 times. |
Scalability and Performance | Enhance the stability of caching the schema information when there is a massive number of tables (experimental) | SaaS companies using TiDB as the system of record for their multi-tenant applications often need to store a substantial number of tables. In previous versions, handling table counts in the order of a million or more was feasible, but it had the potential to degrade the overall user experience. TiDB v8.0.0 improves the situation with enhancements such as caching schema information with a Least Recently Used (LRU) algorithm to limit memory consumption. |
DB Operations and Observability | Support monitoring index usage statistics | Proper index design is a crucial prerequisite to maintaining database performance. TiDB v8.0.0 introduces the INFORMATION_SCHEMA.TIDB_INDEX_USAGE table and the sys.schema_unused_indexes view to provide usage statistics of indexes. This feature helps you assess the efficiency of indexes in the database and optimize the index design. |
Data Migration | TiCDC adds support for the Simple protocol | TiCDC introduces a new protocol, the Simple protocol. This protocol provides in-band schema tracking capabilities by embedding table schema information in DDL and BOOTSTRAP events. |
Data Migration | TiCDC adds support for the Debezium format protocol | TiCDC introduces a new protocol, the Debezium protocol. TiCDC can now publish data change events to a Kafka sink using a protocol that generates Debezium style messages. |
Feature details
Scalability
PD supports the microservice mode (experimental) #5766 @binshi-bing
Starting from v8.0.0, PD supports the microservice mode. This mode splits the timestamp allocation and cluster scheduling functions of PD into separate microservices that can be deployed independently, thereby enhancing performance scalability for PD and addressing performance bottlenecks of PD in large-scale clusters.
- `tso` microservice: provides monotonically increasing timestamp allocation for the entire cluster.
- `scheduling` microservice: provides scheduling functions for the entire cluster, including but not limited to load balancing, hot spot handling, replica repair, and replica placement.
Each microservice is deployed as an independent process. If you configure more than one replica for a microservice, the microservice automatically implements a primary-secondary fault-tolerant mode to ensure high availability and reliability of the service.
Currently, PD microservices can only be deployed using TiDB Operator. It is recommended to consider this mode when PD becomes a significant performance bottleneck that cannot be resolved by scaling up.
For more information, see documentation.
Enhance the usability of the Titan engine #16245 @Connor1996
- Enable the shared cache for Titan blob files and RocksDB block files by default (`shared-blob-cache` defaults to `true`), eliminating the need to configure `blob-cache-size` separately.
- Support dynamically modifying `min-blob-size`, `blob-file-compression`, and `discardable-ratio` to improve performance and flexibility when using the Titan engine.
For more information, see documentation.
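For reference, a minimal TiKV configuration sketch reflecting these defaults; the values shown here are illustrative, not recommendations:

```toml
# TiKV configuration sketch (illustrative values).
[rocksdb.defaultcf.titan]
# New in v8.0.0: share the cache between RocksDB block files and Titan
# blob files. Defaults to true, so blob-cache-size no longer needs to
# be configured separately.
shared-blob-cache = true
# These options can now be modified dynamically:
min-blob-size = "32KB"
blob-file-compression = "zstd"
discardable-ratio = 0.5
```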
Performance
BR improves snapshot restore speed (GA) #50701 @3pointer @Leavrth
Starting from TiDB v8.0.0, the acceleration of snapshot restore speed is now generally available (GA) and enabled by default. BR improves the snapshot restore speed significantly by implementing various optimizations such as adopting the coarse-grained Region scattering algorithm, creating databases and tables in batches, reducing the mutual impact between SST file downloads and ingest operations, and accelerating the restore of table statistics. According to test results from real-world cases, the data restore speed of a single TiKV node stabilizes at 1.2 GiB/s, and 100 TiB of data can be restored within one hour.
This means that even in high-load environments, BR can fully utilize the resources of each TiKV node, significantly reducing database restore time, enhancing the availability and reliability of databases, and reducing downtime and business losses caused by data loss or system failures. Note that the increase in restore speed is attributed to the parallel execution of a large number of goroutines, which can result in significant memory consumption, especially when there are many tables or Regions. It is recommended to use machines with higher memory capacity to run the BR client. If the memory capacity of the machine is limited, it is recommended to use a finer-grained Region scattering algorithm. In addition, because the coarse-grained Region scattering algorithm might consume a significant amount of external storage bandwidth, make sure that sufficient bandwidth is available so that other applications are not affected.
For more information, see documentation.
Support pushing down the following functions to TiFlash #50975 #50485 @yibin87 @windtalker
CAST(DECIMAL AS DOUBLE)
POWER()
For more information, see documentation.
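A brief example of queries that can now benefit from the pushdown; the `orders` table and `amount` column are hypothetical, and a TiFlash replica of the table is assumed:

```sql
-- Assuming `orders` has a TiFlash replica, these expressions can now
-- be evaluated in TiFlash instead of TiDB:
SELECT CAST(amount AS DOUBLE) FROM orders;   -- CAST(DECIMAL AS DOUBLE)
SELECT POWER(amount, 2) FROM orders;         -- POWER()
```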
The concurrent HashAgg algorithm of TiDB supports disk spill (experimental) #35637 @xzhangxian1008
In earlier versions of TiDB, the concurrency algorithm of the HashAgg operator does not support disk spill. If the execution plan of a SQL statement contains the concurrent HashAgg operator, all the data for that SQL statement can only be processed in memory. Consequently, TiDB has to process a large amount of data in memory. When the data size exceeds the memory limit, TiDB can only choose the non-concurrent algorithm, which does not leverage concurrency for performance improvement.
In v8.0.0, the concurrent HashAgg algorithm of TiDB supports disk spill. Under any concurrency conditions, the HashAgg operator can automatically trigger data spill based on memory usage, thus balancing performance and data throughput. Currently, as an experimental feature, TiDB introduces the `tidb_enable_concurrent_hashagg_spill` variable to control whether to enable the concurrent HashAgg algorithm that supports disk spill. When this variable is `ON`, the feature is enabled. This variable will be deprecated after the feature is generally available in a future release.
For more information, see documentation.
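A minimal sketch of enabling the experimental spill behavior; the aggregation query and table name are hypothetical:

```sql
-- Experimental: allow the concurrent HashAgg operator to spill to disk.
SET tidb_enable_concurrent_hashagg_spill = ON;
-- A large aggregation can now spill based on memory usage instead of
-- falling back to the non-concurrent algorithm:
SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id;
```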
Introduce the priority queue for automatic statistics collection #50132 @hi-rustin
Keeping optimizer statistics up to date is the key to stabilizing database performance. Most users rely on the automatic statistics collection provided by TiDB to collect the latest statistics. Automatic statistics collection checks the status of statistics for all objects and adds unhealthy objects to a queue for sequential collection. In previous versions, the order was random, which could result in excessive waits for more deserving candidates, causing potential performance regressions.
Starting from v8.0.0, automatic statistics collection dynamically sets priorities for objects based on a variety of conditions to ensure that more deserving candidates, such as newly created indexes and partitioned tables with definition changes, are processed first. Additionally, TiDB prioritizes tables with lower health scores, placing them at the top of the queue. This enhancement makes the collection order more reasonable and reduces performance problems caused by outdated statistics, thereby improving database stability.
For more information, see documentation.
Remove some limitations on execution plan cache #49161 @mjonss @qw4990
TiDB supports plan cache, which can effectively reduce the latency of OLTP systems and is important for performance. In v8.0.0, TiDB removes several limitations on plan cache. Execution plans with the following items can be cached now:
- Partitioned tables
- Generated columns, including objects that depend on generated columns (such as multi-valued indexes)
This enhancement extends the use cases of plan cache and improves the overall database performance in complex scenarios.
For more information, see documentation.
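A sketch of a now-cacheable plan over a partitioned table; the schema is hypothetical:

```sql
-- Plans over partitioned tables are now eligible for the plan cache.
CREATE TABLE logs (id BIGINT, created DATETIME)
    PARTITION BY HASH (id) PARTITIONS 4;
PREPARE stmt FROM 'SELECT * FROM logs WHERE id = ?';
SET @id = 42;
EXECUTE stmt USING @id;
EXECUTE stmt USING @id;
-- On repeated execution, @@last_plan_from_cache can be checked to
-- confirm the cached plan was reused.
```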
Optimizer enhances support for multi-valued indexes #47759 #46539 @Arenatlx @time-and-fate
TiDB v6.6.0 introduces multi-valued indexes to improve query performance for JSON data types. In v8.0.0, the optimizer enhances its support for multi-valued indexes and can correctly identify and utilize them to optimize queries in complex scenarios.
- The optimizer collects statistics on multi-valued indexes and uses them to decide execution plans. If several multi-valued indexes can serve a SQL statement, the optimizer can identify the one with the lowest cost.
- When using `OR` to connect multiple `MEMBER OF` conditions, the optimizer can match an effective index partial path for each DNF item (a `MEMBER OF` condition) and combine these paths using Union to form an `Index Merge`. This achieves more efficient condition filtering and data fetching.
For more information, see documentation.
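The behavior can be sketched as follows; the table, column, and index names are hypothetical:

```sql
-- A multi-valued index over a JSON array column.
CREATE TABLE docs (
    id BIGINT PRIMARY KEY,
    tags JSON,
    INDEX idx_tags ((CAST(tags AS UNSIGNED ARRAY)))
);
-- The optimizer can now combine the OR-connected MEMBER OF conditions
-- into an Index Merge over idx_tags:
SELECT id FROM docs
WHERE 1 MEMBER OF (tags) OR 2 MEMBER OF (tags);
```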
Support configuring the update interval for low-precision TSO #51081 @Tema
The low-precision TSO feature in TiDB uses a regularly updated TSO as the transaction timestamp. In scenarios where reading outdated data is acceptable, this feature reduces the overhead of obtaining TSO for small read-only transactions by sacrificing real-time freshness, thereby improving the capability for high-concurrency reads.
Before v8.0.0, the TSO update interval of the low-precision TSO feature is fixed and cannot be adjusted according to actual application requirements. In v8.0.0, TiDB introduces the system variable `tidb_low_resolution_tso_update_interval` to control the TSO update interval. This variable takes effect only when the low-precision TSO feature is enabled.
For more information, see documentation.
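A minimal usage sketch; the interval value is illustrative:

```sql
-- Enable low-precision TSO reads for the current session.
SET tidb_low_resolution_tso = ON;
-- Tune how often the cached TSO is refreshed (in milliseconds;
-- the value here is illustrative).
SET GLOBAL tidb_low_resolution_tso_update_interval = 1000;
```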
Reliability
Support caching required schema information according to the LRU algorithm to reduce memory consumption on the TiDB server (experimental) #50959 @gmhdbjd
Before v8.0.0, each TiDB node caches the schema information of all tables. In scenarios with hundreds of thousands of tables, just caching these table schemas could consume a significant amount of memory.
Starting from v8.0.0, TiDB introduces the system variable tidb_schema_cache_size, which enables you to set an upper limit for caching schema information, thereby preventing excessive memory usage. When you enable this feature, TiDB uses the Least Recently Used (LRU) algorithm to cache the required tables, effectively reducing the memory consumed by the schema information.
For more information, see documentation.
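A usage sketch, assuming the variable takes a size in bytes (the value shown is illustrative):

```sql
-- Experimental: cap the memory used for cached schema information.
-- A non-zero value enables LRU-based schema caching.
SET GLOBAL tidb_schema_cache_size = 536870912;  -- 512 MiB, illustrative
```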
Availability
The proxy component TiProxy becomes generally available (GA) #413 @djshow832 @xhebox
TiDB v7.6.0 introduces the proxy component TiProxy as an experimental feature. TiProxy is the official proxy component of TiDB, located between the client and TiDB server. It provides load balancing and connection persistence functions for TiDB, making the workload of the TiDB cluster more balanced and not affecting user access to the database during maintenance operations.
In v8.0.0, TiProxy becomes generally available and enhances the automatic generation of signature certificates and monitoring functions.
The usage scenarios of TiProxy are as follows:
- During maintenance operations such as rolling restarts, rolling upgrades, and scaling-in in a TiDB cluster, changes occur in the TiDB servers which result in interruptions in connections between clients and the TiDB servers. By using TiProxy, connections can be smoothly migrated to other TiDB servers during these maintenance operations so that clients are not affected.
- Client connections to a TiDB server cannot normally be dynamically migrated to other TiDB servers. When the workload of multiple TiDB servers is unbalanced, the overall cluster resources might be sufficient while certain TiDB servers experience resource exhaustion, leading to a significant increase in latency. To address this issue, TiProxy provides dynamic connection migration, which allows connections to be migrated from one TiDB server to another without any impact on the clients, thereby achieving load balancing for the TiDB cluster.
TiProxy has been integrated into TiUP, TiDB Operator, and TiDB Dashboard, making it easy to configure, deploy, and maintain.
For more information, see documentation.
SQL
Support a new DML type for handling a large amount of data (experimental) #50215 @ekexium
Before v8.0.0, TiDB stores all transaction data in memory before committing. When processing a large amount of data, the memory required for transactions becomes a bottleneck that limits the transaction size that TiDB can handle. Although TiDB introduces non-transactional DML to attempt to solve the transaction size limitation by splitting SQL statements, this feature has various limitations and does not provide an ideal experience in actual scenarios.
Starting from v8.0.0, TiDB supports a DML type for handling a large amount of data. This DML type writes data to TiKV in a timely manner during execution, avoiding the continuous storage of all transaction data in memory, and thus supports handling a large amount of data that exceeds the memory limit. This DML type ensures transaction integrity and uses the same syntax as standard DML.
`INSERT`, `UPDATE`, `REPLACE`, and `DELETE` statements can use this new DML type to execute large-scale DML operations. This DML type is implemented by the Pipelined DML feature and only takes effect on statements with auto-commit enabled. You can control whether to enable this DML type by setting the system variable tidb_dml_type.
For more information, see documentation.
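A sketch of a large auto-commit cleanup job using the new DML type; the table name is hypothetical:

```sql
-- Experimental Pipelined DML: takes effect on auto-commit statements.
SET tidb_dml_type = "bulk";
-- A large cleanup job now streams writes to TiKV during execution
-- instead of buffering all transaction data in memory:
DELETE FROM order_history WHERE created < '2023-01-01';
```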
Support using some expressions to set default column values when creating a table (experimental) #50936 @zimulala
Before v8.0.0, when you create a table, the default value of a column is limited to strings, numbers, and dates. Starting from v8.0.0, you can use some expressions as the default column values. For example, you can set the default value of a column to `UUID()`. This feature helps you meet more diverse requirements.
For more information, see documentation.
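The `UUID()` example above can be sketched as follows; the table schema is hypothetical:

```sql
-- Experimental: an expression as a column default.
CREATE TABLE users (
    id VARCHAR(36) DEFAULT (UUID()),
    name VARCHAR(100)
);
INSERT INTO users (name) VALUES ('Ada');  -- id is filled in by UUID()
```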
Support the `div_precision_increment` system variable #51501 @yibin87
MySQL 8.0 supports the variable `div_precision_increment`, which specifies the number of digits by which to increase the scale of the result of a division operation performed using the `/` operator. Before v8.0.0, TiDB does not support this variable, and division is performed to 4 decimal places. Starting from v8.0.0, TiDB supports this variable. You can specify the number of digits by which to increase the scale of division results as desired.
For more information, see documentation.
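The effect follows MySQL's documented semantics:

```sql
-- With the default of 4, division results keep 4 decimal places:
SELECT 1 / 7;              -- 0.1429
-- Increase the scale of division results:
SET div_precision_increment = 12;
SELECT 1 / 7;              -- 0.142857142857
```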
DB operations
PITR supports Amazon S3 Object Lock #51184 @RidRisR
Amazon S3 Object Lock can help prevent backup data from accidental or intentional deletion during a specified retention period, enhancing the security and integrity of data. Starting from v6.3.0, BR supports Amazon S3 Object Lock for snapshot backups, adding an additional layer of security for full backups. Starting from v8.0.0, PITR also supports Amazon S3 Object Lock. Whether for full backups or log data backups, the Object Lock feature ensures more reliable data protection, further strengthening the security of data backup and recovery and meeting regulatory requirements.
For more information, see documentation.
Support making invisible indexes visible at the session level #50653 @hawkingrei
By default, the optimizer does not select invisible indexes. This mechanism is usually used to evaluate whether to delete an index. If there is uncertainty about the potential performance impact of deleting an index, you have the option to set the index to invisible temporarily and promptly restore it to visible when needed.
Starting from v8.0.0, you can set the session-level system variable tidb_opt_use_invisible_indexes to `ON` to make the current session aware of invisible indexes. With this feature, you can create a new index as invisible and test its performance by enabling this variable only in the current session, without affecting other sessions. This improvement enhances the safety of SQL tuning and helps improve the stability of production databases.
For more information, see documentation.
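A testing workflow sketch; the table and index names are hypothetical:

```sql
-- Create the index invisible so other sessions' plans are unaffected.
CREATE INDEX idx_city ON customers (city) INVISIBLE;
-- Let only the current session consider invisible indexes:
SET SESSION tidb_opt_use_invisible_indexes = ON;
-- The optimizer in this session may now choose idx_city:
EXPLAIN SELECT * FROM customers WHERE city = 'Oslo';
```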
Support writing general logs to a separate file #51248 @Defined2014
The general log is a MySQL-compatible feature that logs all executed SQL statements to help diagnose issues. TiDB also supports this feature. You can enable it by setting the variable tidb_general_log. However, in previous versions, the content of general logs can only be written to the TiDB instance log along with other information, which is inconvenient for users who need to keep logs for a long time.
Starting from v8.0.0, you can write the general log to a specified file by setting the configuration item log.general-log-file to a valid filename. The general log follows the same rotation and retention policies as the instance log.
In addition, to reduce the disk space occupied by historical log files, TiDB v8.0.0 introduces a native log compression option. You can set the configuration item log.file.compression to `gzip` to automatically compress rotated logs using the gzip format.
For more information, see documentation.
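A configuration sketch combining both items; the file path is illustrative:

```toml
# TiDB configuration sketch (the path is illustrative).
[log]
# Write the general log to its own file instead of the instance log.
general-log-file = "/var/log/tidb/general.log"

[log.file]
# Compress rotated log files with gzip.
compression = "gzip"
```

Note that the general log itself still needs to be enabled, for example with `SET GLOBAL tidb_general_log = ON`.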
Observability
Support monitoring index usage statistics #49830 @YangKeao
Proper index design is a crucial prerequisite to maintaining database performance. TiDB v8.0.0 introduces the INFORMATION_SCHEMA.TIDB_INDEX_USAGE table, which records the statistics of all indexes on the current TiDB node, including the following information:
- The cumulative execution count of statements that scan the index
- The total number of rows scanned when accessing the index
- The selectivity distribution when scanning the index
- The time of the most recent access to the index
With this information, you can identify indexes that are not used by the optimizer and indexes with poor selectivity, thereby optimizing index design to improve database performance.
Additionally, TiDB v8.0.0 introduces the sys.schema_unused_indexes view, which is compatible with MySQL. This view shows indexes that have not been used since the last start of TiDB instances. For clusters upgraded from versions earlier than v8.0.0, the `sys` schema and its views are not created automatically. You can create them manually by referring to sys.schema_unused_indexes.
For more information, see documentation.
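The statistics can be queried directly; the schema and table names below are illustrative:

```sql
-- Inspect index usage recorded on the current TiDB node.
SELECT index_name, query_total, rows_access_total, last_access_time
FROM INFORMATION_SCHEMA.TIDB_INDEX_USAGE
WHERE table_schema = 'test' AND table_name = 'orders';

-- List indexes not used since the TiDB instances last started:
SELECT * FROM sys.schema_unused_indexes;
```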
Security
TiKV encryption at rest supports Google Key Management Service (Cloud KMS) (experimental) #8906 @glorv
TiKV ensures data security by encrypting stored data using the encryption at rest technique. The core of encryption at rest for security is key management. Starting from v8.0.0, you can manage the master key of TiKV using Google Cloud KMS to establish encryption-at-rest capabilities based on Cloud KMS, thereby enhancing the security of user data.
To enable encryption at rest based on Google Cloud KMS, you need to create a key on Google Cloud and then configure the `[security.encryption.master-key]` section in the TiKV configuration file.
For more information, see documentation.
Enhance TiDB log desensitization #51306 @xhebox
TiDB log desensitization is enhanced by marking SQL text in log files, which facilitates the safe display of sensitive data when you view the logs. You can control whether to desensitize log information, enabling secure use of TiDB logs in different scenarios and improving the security and flexibility of log desensitization. To use this feature, set the system variable `tidb_redact_log` to `MARKER`. This marks the SQL text in TiDB logs. When you view the logs, sensitive data is displayed securely based on the markers, thus protecting the log information.
For more information, see documentation.
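The new option can be enabled as follows:

```sql
-- Mark (rather than drop or keep) user data in TiDB logs and slow logs.
SET GLOBAL tidb_redact_log = 'MARKER';
-- Sensitive literals in logged SQL are then wrapped in markers, so a
-- logged statement may appear as: select * from t where a = <marked value>
```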
Data migration
TiCDC adds support for the Simple protocol #9898 @3AceShowHand
TiCDC introduces a new protocol, the Simple protocol. This protocol provides in-band schema tracking capabilities by embedding table schema information in DDL and BOOTSTRAP events.
For more information, see documentation.
TiCDC adds support for the Debezium format protocol #1799 @breezewish
TiCDC can now publish data change events to a Kafka sink using a protocol that generates event messages in a Debezium style format. This helps to simplify the migration from MySQL to TiDB for users who are currently using Debezium to pull data from MySQL for downstream processing.
For more information, see documentation.
DM supports using a user-provided secret key to encrypt and decrypt passwords of source and target databases #9492 @D3Hunter
In earlier versions, DM uses a built-in fixed secret key with relatively low security. Starting from v8.0.0, you can upload and specify a secret key file for encrypting and decrypting passwords of upstream and downstream databases. In addition, you can replace the secret key file as needed to enhance data security.
For more information, see documentation.
Support the `IMPORT INTO ... FROM SELECT` syntax to enhance the `IMPORT INTO` functionality (experimental) #49883 @D3Hunter
In earlier TiDB versions, importing query results into a target table could only be done using the `INSERT INTO ... SELECT` statement, which is relatively inefficient in some large dataset scenarios. Starting from v8.0.0, TiDB enables you to use `IMPORT INTO ... FROM SELECT` to import the results of a `SELECT` query into an empty TiDB target table, which achieves up to 8 times the performance of `INSERT INTO ... SELECT` and significantly reduces the import time.
In addition, you can use `IMPORT INTO ... FROM SELECT` to import historical data queried with AS OF TIMESTAMP.
For more information, see documentation.
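A sketch of both forms; the table names and timestamp are hypothetical, and the target tables are assumed to be empty:

```sql
-- Experimental: import a query result into an empty target table.
IMPORT INTO orders_archive
    FROM SELECT * FROM orders WHERE created < '2023-01-01';

-- Historical data can be imported via a stale read:
IMPORT INTO orders_snapshot
    FROM SELECT * FROM orders AS OF TIMESTAMP '2024-03-01 00:00:00';
```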
TiDB Lightning simplifies conflict resolution strategies and supports handling conflicting data using the `replace` strategy (experimental) #51036 @lyzx2001
In earlier versions, TiDB Lightning has one data conflict resolution strategy for the logical import mode and two for the physical import mode, which are not easy to understand and configure.
Starting from v8.0.0, TiDB Lightning deprecates the old conflict detection strategy for the physical import mode, enables you to control the conflict detection strategy for both logical and physical import modes via the conflict.strategy parameter, and simplifies the configuration of this parameter. In addition, in the physical import mode, the `replace` strategy now supports retaining the latest data and overwriting the old data when the import detects data with primary key or unique key conflicts.
For more information, see documentation.
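A minimal TiDB Lightning configuration fragment for the unified parameter:

```toml
# TiDB Lightning configuration sketch: one strategy setting now covers
# both logical and physical import modes.
[conflict]
# "replace" retains the latest row when primary key or unique key
# conflicts are detected during import.
strategy = "replace"
```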
Global Sort becomes generally available (GA), significantly improving the performance and stability of `IMPORT INTO` #45719 @lance6716
Before v7.4.0, when executing `IMPORT INTO` tasks using the Distributed eXecution Framework (DXF), TiDB only locally sorts part of the data before importing it into TiKV due to limited local storage space. This results in significant overlap of the imported data in TiKV, requiring TiKV to perform additional compaction operations during import and affecting TiKV performance and stability.
With the Global Sort experimental feature introduced in v7.4.0, TiDB can temporarily store the data to be imported in external storage (such as Amazon S3) for global sorting before importing it into TiKV, which eliminates the need for TiKV compaction operations during import. In v8.0.0, Global Sort becomes GA. This feature reduces the resource consumption of TiKV and significantly improves the performance and stability of `IMPORT INTO`. When Global Sort is enabled, each `IMPORT INTO` task supports importing up to 40 TiB of data.
For more information, see documentation.
Compatibility changes
Note
This section provides compatibility changes you need to know when you upgrade from v7.6.0 to the current version (v8.0.0). If you are upgrading from v7.5.0 or earlier versions to the current version, you might also need to check the compatibility changes introduced in intermediate versions.
Behavior changes
- Prohibit setting require_secure_transport to `ON` in Security Enhanced Mode (SEM) to prevent potential connectivity issues for users. #47665 @tiancaiamao
- DM removes the fixed secret key for encryption and decryption and enables you to customize a secret key. If encrypted passwords are used in data source configurations and migration task configurations before the upgrade, you need to refer to the upgrade steps in Customize a Secret Key for DM Encryption and Decryption for additional operations. #9492 @D3Hunter
- Before v8.0.0, after enabling the acceleration of `ADD INDEX` and `CREATE INDEX` (`tidb_ddl_enable_fast_reorg = ON`), the encoded index key ingests data to TiKV with a fixed concurrency of `16`, which cannot be dynamically adjusted according to the downstream TiKV capacity. Starting from v8.0.0, you can adjust the concurrency using the tidb_ddl_reorg_worker_cnt system variable. The default value is `4`. Compared with the previous default value of `16`, the new default value might reduce performance when ingesting indexed key-value pairs. You can adjust this system variable based on the workload of your cluster.
MySQL compatibility
- The `KEY` partition type supports statements with an empty list of partition fields, which is consistent with the behavior of MySQL.
System variables
Variable name | Change type | Description |
---|---|---|
tidb_disable_txn_auto_retry | Deprecated | Starting from v8.0.0, this system variable is deprecated, and TiDB no longer supports automatic retries of optimistic transactions. It is recommended to use the Pessimistic transaction mode. If you encounter optimistic transaction conflicts, you can capture the error and retry transactions in your application. |
tidb_ddl_version | Renamed | Controls whether to enable TiDB DDL V2. Starting from v8.0.0, this variable is renamed to tidb_enable_fast_create_table to better reflect its purpose. |
tidb_enable_collect_execution_info | Modified | Adds a control to whether to record the usage statistics of indexes. The default value is ON . |
tidb_redact_log | Modified | Controls how to handle user information in SQL text when logging TiDB logs and slow logs. The value options are OFF (indicating not processing user information in the log) and ON (indicating hiding user information in the log). To provide a richer way of processing user information in the log, the MARKER option is added in v8.0.0 to support marking log information. |
div_precision_increment | Newly added | Controls the number of digits by which to increase the scale of the result of a division operation performed using the / operator. This variable is the same as MySQL. |
tidb_dml_type | Newly added | Controls the execution mode of DML statements. The value options are `"standard"` and `"bulk"` . |
tidb_enable_auto_analyze_priority_queue | Newly added | Controls whether to enable the priority queue to schedule the tasks of automatically collecting statistics. When this variable is enabled, TiDB prioritizes collecting statistics for the tables that most need statistics. |
tidb_enable_concurrent_hashagg_spill | Newly added | Controls whether TiDB supports disk spill for the concurrent HashAgg algorithm. When it is ON , disk spill can be triggered for the concurrent HashAgg algorithm. This variable will be deprecated when this feature is generally available in a future release. |
tidb_enable_fast_create_table | Newly added | Controls whether to enable TiDB Accelerates Table Creation. Set the value to ON to enable it and OFF to disable it. The default value is ON . When this variable is enabled, TiDB accelerates table creation by using CREATE TABLE. |
tidb_load_binding_timeout | Newly added | Controls the timeout of loading bindings. If the execution time of loading bindings exceeds this value, the loading will stop. |
tidb_low_resolution_tso_update_interval | Newly added | Controls the interval for updating TiDB cache timestamp. |
tidb_opt_ordering_index_selectivity_ratio | Newly added | Controls the estimated number of rows for an index that matches the SQL statement ORDER BY when there are ORDER BY and LIMIT clauses in a SQL statement, but some filter conditions not covered by the index. The default value is -1 , which means to disable this system variable. |
tidb_opt_use_invisible_indexes | Newly added | Controls whether the optimizer can select invisible indexes for query optimization in the current session. When the variable is set to ON , the optimizer can select invisible indexes for query optimization in the session. |
tidb_schema_cache_size | Newly added | Controls the upper limit of memory that can be used for caching the schema information to avoid occupying too much memory. When this feature is enabled, the LRU algorithm is used to cache the required tables, effectively reducing the memory occupied by the schema information. |
Configuration file parameters
Configuration file | Configuration parameter | Change type | Description |
---|---|---|---|
TiDB | instance.tidb_enable_collect_execution_info | Modified | Adds a control to whether to record the usage statistics of indexes. The default value is true . |
TiDB | tls-version | Modified | This parameter no longer supports “TLSv1.0” and “TLSv1.1” . Now it only supports “TLSv1.2” and “TLSv1.3” . |
TiDB | log.file.compression | Newly added | Specifies the compression format of rotated logs. The default value is null, which means that rotated logs are not compressed. |
TiDB | log.general-log-file | Newly added | Specifies the file to save the general log to. The default value is null, which means that the general log is written to the instance log file. |
TiDB | tikv-client.enable-replica-selector-v2 | Newly added | Controls whether to use the new version of the Region replica selector when sending RPC requests to TiKV. The default value is true . |
TiKV | log-backup.initial-scan-rate-limit | Modified | Adds a limit of 1MiB as the minimum value. |
TiKV | raftstore.store-io-pool-size | Modified | Changes the default value from 0 to 1 to improve TiKV performance, meaning that the size of the StoreWriter thread pool now defaults to 1 . |
TiKV | rocksdb.defaultcf.titan.blob-cache-size | Modified | Starting from v8.0.0, TiKV introduces the shared-blob-cache configuration item and enables it by default, so there is no need to set blob-cache-size separately. The configuration of blob-cache-size only takes effect when shared-blob-cache is set to false . |
TiKV | security.encryption.master-key.vendor | Modified | Adds gcp as an available type for the service provider. |
TiKV | rocksdb.defaultcf.titan.shared-blob-cache | Newly added | Controls whether to enable the shared cache for Titan blob files and RocksDB block files. The default value is true . |
TiKV | security.encryption.master-key.gcp.credential-file-path | Newly added | Specifies the path to the Google Cloud authentication credentials file when security.encryption.master-key.vendor is gcp . |
TiDB Lightning | tikv-importer.duplicate-resolution | Deprecated | Controls whether to detect and resolve unique key conflicts in physical import mode. Starting from v8.0.0, it is replaced by conflict.strategy. |
TiDB Lightning | conflict.precheck-conflict-before-import | Newly added | Controls whether to enable preprocess conflict detection, which checks conflicts in data before importing it to TiDB. The default value of this parameter is false , which means that TiDB Lightning only checks conflicts after the data import. This parameter can be used only in the physical import mode (tikv-importer.backend = “local” ). |
TiDB Lightning | logical-import-batch-rows | Newly added | Controls the maximum number of rows inserted per transaction in the logical import mode. The default value is 65536 rows. |
TiDB Lightning | logical-import-batch-size | Newly added | Controls the maximum size of each SQL query executed on the downstream TiDB server in the logical import mode. The default value is “96KiB” . The unit can be KB, KiB, MB, or MiB. |
Data Migration | secret-key-path | Newly added | Specifies the file path of the secret key, which is used to encrypt and decrypt upstream and downstream passwords. The file must contain a 64-character hexadecimal AES-256 secret key. |
TiCDC | tls-certificate-file | Newly added | Specifies the path to the encrypted certificate file on the client, which is required when Pulsar enables TLS encrypted transmission. |
TiCDC | tls-key-file-path | Newly added | Specifies the path to the encrypted private key on the client, which is required when Pulsar enables TLS encrypted transmission. |
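Put together, the new TiDB Lightning parameters above might appear in a task configuration roughly as follows. This is a hedged sketch: the section placement and the `"replace"` strategy value are assumptions to be checked against the TiDB Lightning configuration reference.

```toml
[conflict]
# Replaces the deprecated tikv-importer.duplicate-resolution.
strategy = "replace"
# Preprocess conflict detection; physical import mode (backend = "local") only.
precheck-conflict-before-import = false

[tikv-importer]
backend = "tidb"                      # logical import mode
logical-import-batch-rows = 65536     # max rows inserted per transaction
logical-import-batch-size = "96KiB"   # max size of each SQL query
```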
System tables
- Add new system tables `INFORMATION_SCHEMA.TIDB_INDEX_USAGE` and `INFORMATION_SCHEMA.CLUSTER_TIDB_INDEX_USAGE` to record index usage statistics on TiDB nodes.
- Add a new system schema `sys` and a new view `sys.schema_unused_indexes`, which records indexes that have not been used since the last start of TiDB.
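For example, unused indexes can be inspected directly. The `sys` view can be queried as-is; the filter on `TIDB_INDEX_USAGE` assumes a `TABLE_SCHEMA` column, which should be confirmed against the table's documentation.

```sql
-- Indexes never used since the last TiDB restart.
SELECT * FROM sys.schema_unused_indexes;

-- Per-node index usage statistics for one schema.
SELECT * FROM INFORMATION_SCHEMA.TIDB_INDEX_USAGE WHERE TABLE_SCHEMA = 'test';
```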
Deprecated features
- Starting from v8.0.0, the `tidb_disable_txn_auto_retry` system variable is deprecated, and TiDB no longer supports automatic retries of optimistic transactions. As an alternative, when encountering optimistic transaction conflicts, you can capture the error and retry transactions in your application, or use the pessimistic transaction mode instead.
- Starting from v8.0.0, TiDB no longer supports the TLSv1.0 and TLSv1.1 protocols. You must upgrade TLS to TLSv1.2 or TLSv1.3.
- Starting from v8.0.0, TiDB Lightning deprecates the old version of the conflict detection strategy for physical import mode, and enables you to control the conflict detection strategy for both logical and physical import modes via the `conflict.strategy` parameter. The `duplicate-resolution` parameter for the old version of conflict detection will be removed in a future release.
- It is planned to redesign the auto-evolution of execution plan bindings in subsequent releases, and the related variables and behavior will change.
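As a quick illustration of the pessimistic-mode alternative mentioned above, a session can avoid optimistic conflicts entirely. This is a minimal sketch; `tidb_txn_mode` is a long-standing TiDB variable rather than something new in this release, and the table is hypothetical.

```sql
-- Switch the current session to pessimistic transactions so that conflicting
-- writes block on locks instead of failing at commit time.
SET SESSION tidb_txn_mode = 'pessimistic';
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- hypothetical table
COMMIT;
```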
Improvements
TiDB
- Improve the performance of executing the `CREATE TABLE` DDL statement by 10 times and support linear scalability #50052 @GMHDBJD
- Support submitting 16 `IMPORT INTO ... FROM FILE` tasks simultaneously, facilitating bulk data import into target tables and significantly improving the efficiency and performance of importing data files #49008 @D3Hunter
- Improve the performance of spilling data to disk for the `Sort` operator #47733 @xzhangxian1008
- Support canceling queries during spilling data to disk, which optimizes the exit mechanism of the data spill feature #50511 @wshwsh12
- Support using an index that matches partial conditions to construct Index Join when processing table join queries with multiple equal conditions #47233 @winoros
- Enhance the capability of Index Merge to identify sorting requirements in queries and select indexes that meet the sorting requirements #48359 @AilinKid
- When the `Apply` operator is not executed concurrently, TiDB enables you to view the name of the operator that blocks the concurrency by executing `SHOW WARNINGS` #50256 @hawkingrei
- Optimize the index selection for point get queries by selecting the most optimal index for queries when all indexes support point get queries #50184 @elsa0520
- Temporarily adjust the priority of statistics synchronously loading tasks to high to avoid widespread timeouts during TiKV high loads, as these timeouts might result in statistics not being loaded #50332 @winoros
- When the `PREPARE` statement fails to hit the execution plan cache, TiDB enables you to view the reason by executing `SHOW WARNINGS` #50407 @hawkingrei
- Improve the accuracy of query estimation information when the same row of data is updated multiple times #47523 @terry1purcell
- Index Merge supports embedding multi-value indexes and `OR` operators in `AND` predicates #51778 @time-and-fate
- When `force-init-stats` is set to `true`, TiDB waits for statistics initialization to finish before providing services during TiDB startup. This setting no longer blocks the startup of HTTP servers, which enables users to continue monitoring #50854 @hawkingrei
- MemoryTracker can track the memory usage of the `IndexLookup` operator #45901 @solotzg
- MemoryTracker can track the memory usage of the `MemTableReaderExec` operator #51456 @wshwsh12
- Support loading Regions in batch from PD to speed up the conversion process from the KV range to Regions when querying large tables #51326 @SeaRise
- Optimize the query performance of the system tables `INFORMATION_SCHEMA.TABLES`, `INFORMATION_SCHEMA.STATISTICS`, `INFORMATION_SCHEMA.KEY_COLUMN_USAGE`, and `INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS`. Compared with earlier versions, the performance has been improved by up to 100 times #50305 @ywqzzy
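As one concrete illustration of the `IMPORT INTO` item above, each statement is an independent import task, and up to 16 can be submitted at once. This is a minimal sketch; the table name and file path are hypothetical.

```sql
-- Bulk-import CSV files into the target table from the TiDB server's file system.
IMPORT INTO orders FROM '/data/orders.*.csv';
```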
TiKV
- Enhance TSO verification and detection to improve the robustness of the cluster TSO when the configuration or operation is improper #16545 @cfzjywxk
- Optimize the logic of cleaning up pessimistic locks to improve the processing performance of uncommitted transactions #16158 @cfzjywxk
- Introduce unified health control for TiKV to reduce the impact of an abnormal single TiKV node on cluster access performance. You can disable this optimization by setting `tikv-client.enable-replica-selector-v2` to `false` #16297 #1104 #1167 @MyonKeminta @zyguan @crazycs520
- The PD client uses the metadata storage interface to replace the previous global configuration interface #14484 @HuSharp
- Enhance the scanning performance by determining the data loading behavior through write cf stats #16245 @Connor1996
- Check the latest heartbeat for nodes being deleted and voters being demoted during the Raft conf change process to ensure that this behavior does not make the Region inaccessible #15799 @tonyxuqqi
- Add Flush and BufferBatchGet interfaces for Pipelined DML #16291 @ekexium
- Add monitoring and alerting for cgroup CPU and memory limits #16392 @pingandb
- Add CPU monitoring for Region workers and snapshot generation workers #16562 @Connor1996
- Add slow logs for peer and store messages #16600 @Connor1996
PD
- Enhance the service discovery capability of the PD client to improve its high availability and load balancing #7576 @CabinfeverB
- Enhance the retry mechanism of the PD client #7673 @JmPotato
- Add monitoring and alerting for cgroup CPU and memory limits #7716 #7918 @pingandb @rleungx
- Improve the performance and high availability when using etcd watch #7738 #7724 #7689 @lhy1024
- Add more monitoring metrics for heartbeat to better analyze performance bottlenecks #7868 @nolouch
- Reduce the impact of the etcd leader on the PD leader #7499 @JmPotato @HuSharp
- Enhance the detection mechanism for unhealthy etcd nodes #7730 @JmPotato @HuSharp
- Optimize the output of GC safepoint in pd-ctl #7767 @nolouch
- Support dynamic modification of the historical window configuration in the hotspot scheduler #7877 @lhy1024
- Reduce the lock contention issue in creating operators #7837 @Leavrth
- Adjust gRPC configurations to improve availability #7821 @rleungx
TiFlash
Tools
Backup & Restore (BR)
- Introduce a new restore parameter `--load-stats` for the `br` command-line tool, which controls whether to restore statistics #50568 @Leavrth
- Introduce a new restore parameter `--tikv-max-restore-concurrency` for the `br` command-line tool, which controls the maximum number of download and ingest files for each TiKV node. This parameter also controls the memory consumption of a BR node by controlling the maximum length of the job queue #51621 @3pointer
- Enhance restore performance by enabling the coarse-grained Region scatter algorithm to adaptively obtain concurrent parameters #50701 @3pointer
- Display the `log` command in the command-line help information of `br` #50927 @RidRisR
- Support pre-allocating Table ID during the restore process to maximize the reuse of Table ID and improve restore performance #51736 @Leavrth
- Disable the GC memory limit tuner feature within TiDB when using BR to avoid OOM issues #51078 @Leavrth
- Improve the speed of merging SST files during data restore by using a more efficient algorithm #50613 @Leavrth
- Support creating databases in batch during data restore #50767 @Leavrth
- Print the information of the slowest Region that affects global checkpoint advancement in logs and metrics during log backups #51046 @YuJuncen
- Improve the table creation performance of the `RESTORE` statement in scenarios with large datasets #48301 @Leavrth
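The new restore flags above might be combined as follows. This is a hedged sketch: the PD address and storage URL are placeholders, and the exact flag syntax should be verified with `br restore full --help`.

```shell
tiup br restore full \
  --pd "127.0.0.1:2379" \
  --storage "s3://backup-bucket/2024-03-29/" \
  --load-stats=true \
  --tikv-max-restore-concurrency 128
```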
TiCDC
- Optimize the memory consumption of `RowChangedEvent` to reduce memory consumption when TiCDC replicates data #10386 @lidezhu
- Verify that the `start-ts` parameter is valid when creating and resuming a changefeed task #10499 @3AceShowHand
TiDB Data Migration (DM)
- In a MariaDB primary-secondary replication scenario, where the migration path is MariaDB primary instance -> MariaDB secondary instance -> DM -> TiDB, when `gtid_strict_mode = off` and the GTID of the MariaDB secondary instance is not strictly incrementing (for example, there is data writing to the MariaDB secondary instance), the DM task reports the error `less than global checkpoint position`. Starting from v8.0.0, TiDB is compatible with this scenario and data can be migrated downstream normally #10741 @okJiang
TiDB Lightning
- Support configuring the maximum number of rows in a batch in logical import mode using `logical-import-batch-rows` #46607 @kennytm
- TiDB Lightning reports an error when the space of TiFlash is insufficient #50324 @okJiang
Bug fixes
TiDB
- Fix the issue that `auto analyze` is triggered multiple times when there is no data change #51775 @hi-rustin
- Fix the issue that the `auto analyze` concurrency is set incorrectly #51749 @hawkingrei
- Fix the issue of index inconsistency caused by adding multiple indexes using a single SQL statement #51746 @tangenta
- Fix the `Column ... in from clause is ambiguous` error that might occur when a query uses `NATURAL JOIN` #32044 @AilinKid
- Fix the issue of wrong query results due to TiDB incorrectly eliminating constant values in `group by` #38756 @hi-rustin
- Fix the issue that the `LEADING` hint does not take effect in `UNION ALL` statements #50067 @hawkingrei
- Fix the issue that `BIT` type columns might cause query errors due to decode failures when they are involved in calculations of some functions #49566 #50850 #50855 @jiyfhust
- Fix the issue that TiDB might panic when performing a rolling upgrade using `tiup cluster upgrade/start` due to an interaction issue with PD #50152 @zimulala
- Fix the issue that executing `UNIQUE` index lookup with an `ORDER BY` clause might cause an error #49920 @jackysp
- Fix the issue that TiDB returns wrong query results when processing `ENUM` or `SET` types by constant propagation #49440 @winoros
- Fix the issue that TiDB might panic when a query contains the Apply operator and the `fatal error: concurrent map writes` error occurs #50347 @SeaRise
- Fix the issue that the control of `SET_VAR` for variables of the string type might become invalid #50507 @qw4990
- Fix the issue that the `SYSDATE()` function incorrectly uses the time in the plan cache when `tidb_sysdate_is_now` is set to `1` #49299 @hawkingrei
- Fix the issue that when executing the `CREATE GLOBAL BINDING` statement, if the schema name is in uppercase, the binding does not take effect #50646 @qw4990
- Fix the issue that `Index Path` selects duplicated indexes #50496 @AilinKid
- Fix the issue that `PLAN REPLAYER` fails to load bindings when the `CREATE GLOBAL BINDING` statement contains `IN()` #43192 @King-Dylan
- Fix the issue that when multiple `analyze` tasks fail, the failure reasons are not recorded correctly #50481 @hi-rustin
- Fix the issue that `tidb_stats_load_sync_wait` does not take effect #50872 @jiyfhust
- Fix the issue that `max_execute_time` settings at multiple levels interfere with each other #50914 @jiyfhust
- Fix the issue of thread safety caused by concurrent updating of statistics #50835 @hi-rustin
- Fix the issue that executing `auto analyze` on a partition table might cause TiDB to panic #51187 @hi-rustin
- Fix the issue that SQL bindings might not work when `IN()` in a SQL statement contains a different number of values #51222 @hawkingrei
- Fix the issue that TiDB cannot correctly convert the type of a system variable in an expression #43527 @hi-rustin
- Fix the issue that TiDB does not listen to the corresponding port when `force-init-stats` is configured #51473 @hawkingrei
- Fix the issue that in `determinate` mode (`tidb_opt_objective='determinate'`), if a query does not contain predicates, statistics might not be loaded #48257 @time-and-fate
- Fix the issue that the `init-stats` process might cause TiDB to panic and the `load stats` process to quit #51581 @hawkingrei
- Fix the issue that the query result is incorrect when the `IN()` predicate contains `NULL` #51560 @winoros
- Fix the issue that blocked DDL statements are not displayed in the MDL View when a DDL task involves multiple tables #47743 @wjhuang2016
- Fix the issue that the `processed_rows` of the `ANALYZE` task on a table might exceed the total number of rows in that table #50632 @hawkingrei
- Fix the goroutine leak issue that might occur when the `HashJoin` operator fails to spill to disk #50841 @wshwsh12
- Fix the goroutine leak issue that occurs when the memory usage of CTE queries exceeds limits #50337 @guo-shaoge
- Fix the `Can't find column ...` error that might occur when aggregate functions are used for group calculations #50926 @qw4990
- Fix the issue that DDL operations such as renaming tables are stuck when the `CREATE TABLE` statement contains specific partitions or constraints #50972 @lcwangchao
- Fix the issue that the monitoring metric `tidb_statistics_auto_analyze_total` on Grafana is not displayed as an integer #51051 @hawkingrei
- Fix the issue that the `tidb_gogc_tuner_threshold` system variable is not adjusted accordingly after the `tidb_server_memory_limit` variable is modified #48180 @hawkingrei
- Fix the issue that the `index out of range` error might occur when a query involves JOIN operations #42588 @AilinKid
- Fix the issue that getting the default value of a column returns an error if the column default value is dropped #50043 #51324 @crazycs520
- Fix the issue that wrong results might be returned when TiFlash late materialization processes associated columns #49241 #51204 @Lloyd-Pottiger
- Fix the issue that the `LIKE()` function might return wrong results when processing binary collation inputs #50393 @yibin87
- Fix the issue that the `JSON_LENGTH()` function returns wrong results when the second parameter is `NULL` #50931 @SeaRise
- Fix the issue that `CAST(AS DATETIME)` might lose time precision under certain circumstances #49555 @SeaRise
- Fix the issue that parallel `Apply` might generate incorrect results when the table has a clustered index #51372 @guo-shaoge
- Fix the issue that `ALTER TABLE ... COMPACT TIFLASH REPLICA` might incorrectly end when the primary key type is `VARCHAR` #51810 @breezewish
- Fix the issue that the check on the `NULL` value of the `DEFAULT NULL` attribute is incorrect when exchanging partitioned tables using the `EXCHANGE PARTITION` statement #47167 @jiyfhust
- Fix the issue that the partition table definition might cause wrong behavior when using a non-UTF8 character set #49251 @YangKeao
- Fix the issue that incorrect default values are displayed in the `INFORMATION_SCHEMA.VARIABLES_INFO` table for some system variables #49461 @jiyfhust
- Fix the issue that no error is reported when empty strings are used as database names in some cases #45873 @yoshikipom
- Fix the issue that the `SPLIT TABLE ... INDEX` statement might cause TiDB to panic #50177 @Defined2014
- Fix the issue that querying a partitioned table of `KeyPartition` type might cause an error #50206 #51313 #51196 @time-and-fate @jiyfhust @mjonss
- Fix the issue that querying a Hash partitioned table might produce incorrect results #50427 @Defined2014
- Fix the issue that opentracing does not work correctly #50508 @Defined2014
- Fix the issue that the error message is not complete when `ALTER INSTANCE RELOAD TLS` reports an error #50699 @dveeden
- Fix the issue that the `AUTO_INCREMENT` attribute causes non-consecutive IDs due to unnecessary transaction conflicts when assigning auto-increment IDs #50819 @tiancaiamao
- Fix the issue of incomplete stack information in TiDB logs for some errors #50849 @tiancaiamao
- Fix the issue of excessive memory usage in some queries when the number in the `LIMIT` clause is too large #51188 @Defined2014
- Fix the issue that the TTL feature causes data hotspots due to incorrect data range splitting in some cases #51527 @lcwangchao
- Fix the issue that the `SET` statement does not take effect when it appears on the first line of an explicit transaction #51387 @YangKeao
- Fix the issue that querying JSON of `BINARY` type might cause an error in some cases #51547 @YangKeao
- Fix the issue that TTL does not handle the transition for daylight saving time adjustments correctly when calculating expiration times #51675 @lcwangchao
- Fix the issue that the `SURVIVAL_PREFERENCES` attribute might not appear in the output of the `SHOW CREATE PLACEMENT POLICY` statement under certain conditions #51699 @lcwangchao
- Fix the issue that the configuration file does not take effect when it contains an invalid configuration item #51399 @Defined2014
TiKV
- Fix the issue that enabling `tidb_enable_row_level_checksum` might cause TiKV to panic #16371 @cfzjywxk
- Fix the issue that hibernated Regions are not promptly awakened in exceptional circumstances #16368 @LykxSassinator
- Fix the issue that the entire Region becomes unavailable when one replica is offline, by checking the last heartbeat time of all replicas of the Region before taking a node offline #16465 @tonyxuqqi
- Fix the issue that JSON integers greater than the maximum `INT64` value but less than the maximum `UINT64` value are parsed as `FLOAT64` by TiKV, resulting in inconsistency with TiDB #16512 @YangKeao
- Fix the issue that the monitoring metric `tikv_unified_read_pool_thread_count` has no data in some cases #16629 @YuJuncen
PD
- Fix the issue that data race occurs when the `MergeLabels` function is called #7535 @lhy1024
- Fix the issue that there is no output when the `evict-leader-scheduler` interface is called #7672 @CabinfeverB
- Fix the issue that the PD monitoring item `learner-peer-count` does not synchronize the old value after a leader switch #7728 @CabinfeverB
- Fix the memory leak issue that occurs when `watch etcd` is not turned off correctly #7807 @rleungx
- Fix the issue that some TSO logs do not print the error cause #7496 @CabinfeverB
- Fix the issue that there are unexpected negative monitoring metrics after restart #4489 @lhy1024
- Fix the issue that the Leader lease expires later than the log time #7700 @CabinfeverB
- Fix the issue that TiDB panics when TLS switches between TiDB (the PD client) and PD are inconsistent #7900 #7902 #7916 @CabinfeverB
- Fix the issue that Goroutine leaks when it is not closed properly #7782 @HuSharp
- Fix the issue that pd-ctl cannot remove a scheduler that contains special characters #7798 @JmPotato
- Fix the issue that the PD client might be blocked when obtaining TSO #7864 @CabinfeverB
TiFlash
- Fix the issue that TiFlash might panic due to unstable network connections with PD during replica migration #8323 @JaySon-Huang
- Fix the issue that the memory usage increases significantly due to slow queries #8564 @JinheLin
- Fix the issue that removing and then re-adding TiFlash replicas might lead to data corruption in TiFlash #8695 @JaySon-Huang
- Fix the issue that TiFlash replica data might be accidentally deleted after performing point-in-time recovery (PITR) or executing `FLASHBACK CLUSTER TO`, which might result in data anomalies #8777 @JaySon-Huang
- Fix the issue that TiFlash panics after executing `ALTER TABLE ... MODIFY COLUMN ... NOT NULL`, which changes nullable columns to non-nullable #8419 @JaySon-Huang
- Fix the issue that in the disaggregated storage and compute architecture, queries might be permanently blocked after network isolation #8806 @JinheLin
- Fix the issue that in the disaggregated storage and compute architecture, TiFlash might panic during shutdown #8837 @JaySon-Huang
- Fix the issue that TiFlash might crash due to data race in case of remote reads #8685 @solotzg
- Fix the issue that the `CAST(AS JSON)` function does not de-duplicate the JSON object key #8712 @SeaRise
- Fix the issue that the `ENUM` column might cause TiFlash to crash during chunk encoding #8674 @yibin87
Tools
Backup & Restore (BR)
- Fix the issue that the log backup checkpoint gets stuck when a Region is split or merged immediately after it becomes a leader #16469 @YuJuncen
- Fix the issue that TiKV panics when a full backup fails to find a peer in some extreme cases #16394 @Leavrth
- Fix the issue that log backup gets stuck after changing the TiKV IP address on the same node #50445 @3pointer
- Fix the issue that BR cannot retry when encountering an error while reading file content from S3 #49942 @Leavrth
- Fix the issue that when resuming from a checkpoint after data restore fails, an error `the target cluster is not fresh` occurs #50232 @Leavrth
- Fix the issue that stopping a log backup task causes TiDB to crash #50839 @YuJuncen
- Fix the issue that data restore is slowed down due to absence of a leader on a TiKV node #50566 @Leavrth
- Fix the issue that full restore still requires the target cluster to be empty after the `--filter` option is specified #51009 @3pointer
TiCDC
- Fix the issue that the file sequence number generated by the storage service might not increment correctly when using the storage sink #10352 @CharlesCheung96
- Fix the issue that TiCDC returns the `ErrChangeFeedAlreadyExists` error when concurrently creating multiple changefeeds #10430 @CharlesCheung96
- Fix the issue that after filtering out `add table partition` events is configured in `ignore-event`, TiCDC does not replicate other types of DML changes for related partitions to the downstream #10524 @CharlesCheung96
- Fix the issue that the changefeed reports an error after `TRUNCATE PARTITION` is executed on the upstream table #10522 @sdojjy
- Fix the issue that `snapshot lost caused by GC` is not reported in time when resuming a changefeed and the `checkpoint-ts` of the changefeed is smaller than the GC safepoint of TiDB #10463 @sdojjy
- Fix the issue that TiCDC fails to validate `TIMESTAMP` type checksum due to time zone mismatch after data integrity validation for single-row data is enabled #10573 @3AceShowHand
- Fix the issue that the Syncpoint table might be incorrectly replicated #10576 @asddongmen
- Fix the issue that OAuth2.0, TLS, and mTLS cannot be enabled properly when using Apache Pulsar as the downstream #10602 @asddongmen
- Fix the issue that a changefeed might get stuck when TiKV upgrades, restarts, or evicts a leader #10584 @asddongmen
- Fix the issue that data is written to a wrong CSV file due to wrong BarrierTS in scenarios where DDL statements are executed frequently #10668 @lidezhu
- Fix the issue that data race in the KV client causes TiCDC to panic #10718 @asddongmen
- Fix the issue that TiCDC panics when scheduling table replication tasks #10613 @CharlesCheung96
TiDB Data Migration (DM)
TiDB Lightning
- Fix the performance regression issue caused by checking TiKV space #43636 @lance6716
- Fix the issue that TiDB Lightning reports an error when encountering invalid symbolic link files during file scanning #49423 @lance6716
- Fix the issue that TiDB Lightning fails to correctly parse date values containing `0` when `NO_ZERO_IN_DATE` is not included in `sql_mode` #50757 @GMHDBJD
Contributors
We would like to thank the following contributors from the TiDB community: