TiDB 7.2.0 Release Notes
Release date: June 29, 2023
TiDB version: 7.2.0
Quick access: Quick start
7.2.0 introduces the following key features and improvements:
Category | Feature | Description |
---|---|---|
Scalability and Performance | Resource groups support managing runaway queries (experimental) | You can now manage query timeout with more granularity, allowing for different behaviors based on query classifications. Queries meeting your specified threshold can be deprioritized or terminated. |
Scalability and Performance | TiFlash supports the pipeline execution model (experimental) | TiFlash supports a pipeline execution model to optimize thread resource control. |
SQL | Support a new SQL statement, IMPORT INTO, for data import (experimental) | To simplify the deployment and maintenance of TiDB Lightning, TiDB introduces a new SQL statement IMPORT INTO, which integrates physical import mode of TiDB Lightning, including remote import from Amazon S3 or Google Cloud Storage (GCS) directly into TiDB. |
DB Operations and Observability | DDL supports pause and resume operations (experimental) | This new capability lets you temporarily suspend resource-intensive DDL operations, such as index creation, to conserve resources and minimize the impact on online traffic. You can seamlessly resume these operations when ready, without the need to cancel and restart. This feature enhances resource utilization, improves user experience, and streamlines schema changes. |
Feature details
Performance
Support pushing down the following two window functions to TiFlash #7427 @xzhangxian1008
- FIRST_VALUE
- LAST_VALUE
TiFlash supports the pipeline execution model (experimental) #6518 @SeaRise
Prior to v7.2.0, each task in the TiFlash engine had to request thread resources individually during execution. TiFlash limited the number of tasks to cap thread resource usage and prevent overuse, but this approach could not completely eliminate thread resource overuse. To address this problem, starting from v7.2.0, TiFlash introduces a pipeline execution model. This model centrally manages all thread resources and schedules task execution uniformly, maximizing the utilization of thread resources while avoiding resource overuse. To enable or disable the pipeline execution model, modify the tidb_enable_tiflash_pipeline_model system variable.
For more information, see documentation.
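As a minimal sketch of toggling the pipeline execution model, the statements below set the tidb_enable_tiflash_pipeline_model variable named above. Setting it at GLOBAL scope is an assumption here; check the variable's documented scope for your version before relying on it.

```sql
-- Enable the experimental TiFlash pipeline execution model.
-- Assumption: the variable is settable at GLOBAL scope.
SET GLOBAL tidb_enable_tiflash_pipeline_model = ON;

-- Confirm the current value.
SELECT @@global.tidb_enable_tiflash_pipeline_model;

-- Switch back to the previous execution model (the v7.2.0 default).
SET GLOBAL tidb_enable_tiflash_pipeline_model = OFF;
```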
TiFlash reduces the latency of schema replication #7630 @hongyunyan
When the schema of a table changes, TiFlash needs to replicate the latest schema from TiKV in a timely manner. Before v7.2.0, when TiFlash accesses table data and detects a table schema change within a database, TiFlash needs to replicate the schemas of all tables in this database again, including those tables without TiFlash replicas. As a result, in a database with a large number of tables, even if you only need to read data from a single table using TiFlash, you might experience significant latency to wait for TiFlash to complete the schema replication of all tables.
In v7.2.0, TiFlash optimizes the schema replication mechanism so that it replicates only the schemas of tables with TiFlash replicas. When a schema change is detected for a table with TiFlash replicas, TiFlash replicates only the schema of that table, which reduces the latency of schema replication and minimizes the impact of DDL operations on TiFlash data replication. This optimization is applied automatically and does not require any manual configuration.
Improve the performance of statistics collection #44725 @xuyifangreeneyes
TiDB v7.2.0 optimizes the statistics collection strategy, skipping some of the duplicate information and information that is of little value to the optimizer. The overall speed of statistics collection has been improved by 30%. This improvement allows TiDB to update the statistics of the database in a more timely manner, making the generated execution plans more accurate, thus improving the overall database performance.
By default, statistics collection skips the columns of the JSON, BLOB, MEDIUMBLOB, and LONGBLOB types. You can modify the default behavior by setting the tidb_analyze_skip_column_types system variable. TiDB supports skipping the JSON, BLOB, and TEXT types and their subtypes.
For more information, see documentation.
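The following sketch shows one way to extend the skipped types to include TEXT columns. The comma-separated lowercase value format and the table name t are assumptions for illustration. As the variable description in these notes says, the setting applies only when tidb_analyze_version = 2.

```sql
-- Inspect the current list of skipped column types.
SELECT @@global.tidb_analyze_skip_column_types;

-- Assumption: the value is a comma-separated list of type names.
-- Here TEXT is added to the default JSON and BLOB family.
SET GLOBAL tidb_analyze_skip_column_types = 'json,blob,mediumblob,longblob,text';

-- Later ANALYZE runs are expected to skip columns of the listed types.
-- "t" is a hypothetical table name.
ANALYZE TABLE t;
```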
Improve the performance of checking data and index consistency #43693 @wjhuang2016
The ADMIN CHECK [TABLE|INDEX] statement is used to check the consistency between data in a table and its corresponding indexes. In v7.2.0, TiDB optimizes the method for checking data consistency and greatly improves the execution efficiency of ADMIN CHECK [TABLE|INDEX]. In scenarios with large amounts of data, this optimization can make the check hundreds of times faster.
The optimization is enabled by default (tidb_enable_fast_table_check is ON by default) to greatly reduce the time required for data consistency checks in large-scale tables and enhance operational efficiency.
For more information, see documentation.
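A short sketch of running the consistency checks described above. The table name orders and index name idx_customer are hypothetical.

```sql
-- Check all indexes of a table against its row data.
-- "orders" is a hypothetical table name.
ADMIN CHECK TABLE orders;

-- Check a single index. "idx_customer" is a hypothetical index name.
ADMIN CHECK INDEX orders idx_customer;

-- The fast check is ON by default in v7.2.0. To compare against the
-- previous behavior, it can be turned off for the current session.
SET SESSION tidb_enable_fast_table_check = OFF;
ADMIN CHECK TABLE orders;
SET SESSION tidb_enable_fast_table_check = ON;
```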
Reliability
Automatically manage queries that consume more resources than expected (experimental) #43691 @Connor1996 @CabinfeverB @glorv @HuSharp @nolouch
The most common challenge to database stability is the degradation of overall database performance caused by abrupt SQL performance problems. There are many causes for SQL performance issues, such as new SQL statements that have not been fully tested, drastic changes in data volume, and abrupt changes in execution plans. These issues are difficult to completely avoid at the root. TiDB v7.2.0 provides the ability to manage queries that consume more resources than expected. This feature can quickly reduce the scope of impact when a performance issue occurs.
To manage these queries, you can set the maximum execution time of queries for a resource group. When the execution time of a query exceeds this limit, the query is automatically deprioritized or cancelled. You can also set a period of time to immediately match identified queries by text or execution plan. This helps prevent high concurrency of the problematic queries during the identification phase that could consume more resources than expected.
Automatic management of queries that consume more resources than expected provides you with an effective means to quickly respond to unexpected query performance problems. This feature can reduce the impact of the problem on overall database performance, thereby improving database stability.
For more information, see documentation.
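As a rough sketch of the behavior described above, the statements below define a resource group with a query limit. The group name rg1, the RU quota, and the thresholds are illustrative values, and the QUERY_LIMIT option names (EXEC_ELAPSED, ACTION, WATCH, DURATION) should be checked against the resource control documentation for your version, because the feature is experimental.

```sql
-- Queries in rg1 that run longer than 60 seconds are terminated.
-- Queries with the same text are then matched immediately for 10 minutes.
-- "rg1" and all numeric values are illustrative.
CREATE RESOURCE GROUP IF NOT EXISTS rg1
  RU_PER_SEC = 500
  QUERY_LIMIT = (EXEC_ELAPSED = '60s', ACTION = KILL, WATCH = EXACT DURATION = '10m');

-- Alternatively, deprioritize instead of terminating.
ALTER RESOURCE GROUP rg1
  QUERY_LIMIT = (EXEC_ELAPSED = '60s', ACTION = COOLDOWN);

-- Run the current session under this resource group.
SET RESOURCE GROUP rg1;
```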
Enhance the capability of creating a binding according to a historical execution plan #39199 @qw4990
TiDB v7.2.0 enhances the capability of creating a binding according to a historical execution plan. This feature improves the parsing and binding process for complex statements, making the bindings more stable, and adds support for new hints.
For more information, see documentation.
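For orientation, a binding from a historical plan can be created roughly as follows. The LIKE filter and the digest placeholder are hypothetical. Substitute the plan digest of the plan you want to pin.

```sql
-- Look up the plan digest of a previously executed statement.
-- The LIKE pattern is a hypothetical filter.
SELECT query_sample_text, plan_digest
FROM information_schema.statements_summary
WHERE query_sample_text LIKE '%FROM orders%';

-- Bind the statement to that historical plan.
-- Replace the placeholder with a real digest from the query above.
CREATE GLOBAL BINDING FROM HISTORY USING PLAN DIGEST '<plan_digest>';

-- Review existing bindings.
SHOW GLOBAL BINDINGS;
```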
Introduce the Optimizer Fix Controls mechanism to provide fine-grained control over optimizer behaviors #43169 @time-and-fate
To generate more reasonable execution plans, the behavior of the TiDB optimizer evolves over product iterations. However, in some particular scenarios, the changes might lead to performance regression. TiDB v7.2.0 introduces Optimizer Fix Controls to let you control some of the fine-grained behaviors of the optimizer. This enables you to roll back or control some new changes.
Each controllable behavior is described by a GitHub issue corresponding to the fix number. All controllable behaviors are listed in Optimizer Fix Controls. You can set a target value for one or more behaviors by setting the tidb_opt_fix_control system variable to achieve behavior control.
The Optimizer Fix Controls mechanism helps you control the TiDB optimizer at a granular level. It provides a new means of fixing performance issues caused by the upgrade process and improves the stability of TiDB.
For more information, see documentation.
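A minimal sketch of setting a fix control. The key 44389 is taken from an issue number cited elsewhere in these notes and is used only as an illustration. Confirm in the Optimizer Fix Controls list that a key is supported, and check which values it accepts, before using it.

```sql
-- Assumption: the value is a comma-separated list of "issue_number:value"
-- pairs. 44389 is illustrative and must be verified against the
-- Optimizer Fix Controls list.
SET SESSION tidb_opt_fix_control = '44389:ON';

-- Inspect the current setting.
SELECT @@session.tidb_opt_fix_control;

-- Clear all session-level overrides.
SET SESSION tidb_opt_fix_control = '';
```

Trying a control at SESSION scope first limits the effect to one connection while you compare execution plans.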
Lightweight statistics initialization becomes generally available (GA) #42160 @xuyifangreeneyes
Starting from v7.2.0, the lightweight statistics initialization feature becomes GA. Lightweight statistics initialization can significantly reduce the number of statistics that must be loaded during startup, thus improving the speed of loading statistics. This feature increases the stability of TiDB in complex runtime environments and reduces the impact on the overall service when TiDB nodes restart.
For newly created clusters of v7.2.0 or later versions, TiDB loads lightweight statistics by default during TiDB startup and will wait for the loading to finish before providing services. For clusters upgraded from earlier versions, you can set the TiDB configuration items lite-init-stats and force-init-stats to true to enable this feature.
For more information, see documentation.
SQL
Support the CHECK constraints #41711 @fzzf678
Starting from v7.2.0, you can use CHECK constraints to restrict the values of one or more columns in a table to meet your specified conditions. When a CHECK constraint is added to a table, TiDB checks whether the constraint is satisfied before inserting or updating data in the table. Only the data that satisfies the constraint can be written.
This feature is disabled by default. You can set the tidb_enable_check_constraint system variable to ON to enable it.
For more information, see documentation.
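The sketch below illustrates the behavior described above with a hypothetical table. The table, column, and constraint names are made up for the example.

```sql
-- CHECK constraints are disabled by default, so enable them first.
SET GLOBAL tidb_enable_check_constraint = ON;

-- Hypothetical table with a column-level and a table-level constraint.
CREATE TABLE products (
  id INT PRIMARY KEY,
  price DECIMAL(10, 2) CHECK (price > 0),
  discount DECIMAL(10, 2),
  CONSTRAINT chk_discount CHECK (discount >= 0 AND discount <= price)
);

-- Satisfies both constraints, so the row is written.
INSERT INTO products VALUES (1, 20.00, 5.00);

-- Violates chk_discount (discount exceeds price), so TiDB is expected
-- to reject the row with a constraint violation error.
INSERT INTO products VALUES (2, 10.00, 15.00);
```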
DB operations
DDL jobs support pause and resume operations (experimental) #18015 @godouxm
Before TiDB v7.2.0, when a DDL job encounters a business peak during execution, you can only manually cancel the DDL job to reduce its impact on the business. In v7.2.0, TiDB introduces pause and resume operations for DDL jobs. These operations let you pause DDL jobs during a peak and resume them after the peak ends, thus avoiding impact on your application workloads.
For example, you can pause and resume multiple DDL jobs using ADMIN PAUSE DDL JOBS or ADMIN RESUME DDL JOBS:
ADMIN PAUSE DDL JOBS 1,2;
ADMIN RESUME DDL JOBS 1,2;
For more information, see documentation.
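In practice you first need the job IDs. One possible workflow is sketched below. The job ID 101 is hypothetical and should be replaced with the JOB_ID shown for your running DDL job.

```sql
-- List DDL jobs to find the JOB_ID of the running job.
ADMIN SHOW DDL JOBS;

-- Pause the job during the traffic peak. 101 is a hypothetical job ID.
ADMIN PAUSE DDL JOBS 101;

-- Check the job state again before and after resuming.
ADMIN SHOW DDL JOBS;

-- Resume the job once the peak is over.
ADMIN RESUME DDL JOBS 101;
```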
Data migration
Introduce a new SQL statement IMPORT INTO to improve data import efficiency greatly (experimental) #42930 @D3Hunter
The IMPORT INTO statement integrates the Physical Import Mode capability of TiDB Lightning. With this statement, you can quickly import data in formats such as CSV, SQL, and PARQUET into an empty table in TiDB. This import method eliminates the need for a separate deployment and management of TiDB Lightning, thereby reducing the complexity of data import and greatly improving import efficiency.
For data files stored in Amazon S3 or GCS, when the TiDB Distributed eXecution Framework (DXF) is enabled, IMPORT INTO also supports splitting a data import job into multiple sub-jobs and scheduling them to multiple TiDB nodes for parallel import, which further enhances import performance.
For more information, see documentation.
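A hedged sketch of an import from Amazon S3 follows. The table definition, bucket path, and the skip_rows option value are illustrative. Credentials and other URI parameters are omitted, and the use of tidb_enable_dist_task to enable distributed execution is an assumption to verify against the documentation for your version.

```sql
-- The target table must exist and be empty. The schema is hypothetical.
CREATE TABLE trips (
  id BIGINT PRIMARY KEY,
  city VARCHAR(64),
  fare DECIMAL(10, 2)
);

-- Assumption: this variable enables the distributed execution framework,
-- so that the job can be split across TiDB nodes.
SET GLOBAL tidb_enable_dist_task = ON;

-- Import a CSV file from S3, skipping the header row.
-- The bucket and path are placeholders.
IMPORT INTO trips
FROM 's3://my-bucket/exports/trips.csv'
WITH skip_rows = 1;
```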
TiDB Lightning supports importing source files with the Latin-1 character set into TiDB #44434 @lance6716
With this feature, you can directly import source files with the Latin-1 character set into TiDB using TiDB Lightning. Before v7.2.0, importing such files required additional preprocessing or conversion. Starting from v7.2.0, you only need to specify character-set = "latin1" when configuring the TiDB Lightning import task. Then, TiDB Lightning automatically handles the character set conversion during the import process to ensure data integrity and accuracy.
For more information, see documentation.
Compatibility changes
Note
This section provides compatibility changes you need to know when you upgrade from v7.1.0 to the current version (v7.2.0). If you are upgrading from v7.0.0 or earlier versions to the current version, you might also need to check the compatibility changes introduced in intermediate versions.
Behavior changes
- When processing an update event, TiCDC splits the event into a delete event and an insert event if the primary key or non-null unique index value is modified in the event. For more information, see documentation.
System variables
Variable name | Change type | Description |
---|---|---|
last_insert_id | Modified | Changes the maximum value from 9223372036854775807 to 18446744073709551615 to be consistent with that of MySQL. |
tidb_enable_non_prepared_plan_cache | Modified | Changes the default value from OFF to ON after further tests, meaning that non-prepared execution plan cache is enabled. |
tidb_remove_orderby_in_subquery | Modified | Changes the default value from OFF to ON after further tests, meaning that the optimizer removes the ORDER BY clause in a subquery. |
tidb_analyze_skip_column_types | Newly added | Controls which types of columns are skipped for statistics collection when executing the ANALYZE command to collect statistics. The variable is only applicable for tidb_analyze_version = 2. When using the syntax of ANALYZE TABLE t COLUMNS c1, …, cn, if the type of a specified column is included in tidb_analyze_skip_column_types, the statistics of this column will not be collected. |
tidb_enable_check_constraint | Newly added | Controls whether to enable CHECK constraints. The default value is OFF, which means this feature is disabled. |
tidb_enable_fast_table_check | Newly added | Controls whether to use a checksum-based approach to quickly check the consistency of data and indexes in a table. The default value is ON, which means this feature is enabled. |
tidb_enable_tiflash_pipeline_model | Newly added | Controls whether to enable the new execution model of TiFlash, the pipeline model. The default value is OFF, which means the pipeline model is disabled. |
tidb_expensive_txn_time_threshold | Newly added | Controls the threshold for logging expensive transactions, which is 600 seconds by default. When the duration of a transaction exceeds the threshold, and the transaction is neither committed nor rolled back, it is considered an expensive transaction and will be logged. |
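After an upgrade, it can be useful to confirm the effective values of the variables listed above. The sketch below reads two of the modified variables and adjusts the new transaction threshold. The value 300 is only an example, and setting the threshold at GLOBAL scope is an assumption.

```sql
-- Check the effective values of variables whose defaults changed.
SELECT @@global.tidb_enable_non_prepared_plan_cache,
       @@global.tidb_remove_orderby_in_subquery;

-- Log transactions that stay open for more than 300 seconds instead of
-- the default 600. The value is illustrative.
SET GLOBAL tidb_expensive_txn_time_threshold = 300;
SELECT @@global.tidb_expensive_txn_time_threshold;
```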
Configuration file parameters
Configuration file | Configuration parameter | Change type | Description |
---|---|---|---|
TiDB | lite-init-stats | Modified | Changes the default value from false to true after further tests, meaning that TiDB uses lightweight statistics initialization by default during TiDB startup to improve the initialization efficiency. |
TiDB | force-init-stats | Modified | Changes the default value from false to true to align with lite-init-stats, meaning that TiDB waits for statistics initialization to finish before providing services during TiDB startup. |
TiKV | rocksdb.[defaultcf|writecf|lockcf].compaction-guard-min-output-file-size | Modified | Changes the default value from “8MB” to “1MB” to reduce the data volume of compaction tasks in RocksDB. |
TiKV | rocksdb.[defaultcf|writecf|lockcf].optimize-filters-for-memory | Newly added | Controls whether to generate Bloom/Ribbon filters that minimize memory internal fragmentation. |
TiKV | rocksdb.[defaultcf|writecf|lockcf].periodic-compaction-seconds | Newly added | Controls the time interval for periodic compaction. SST files with updates older than this value will be selected for compaction and rewritten to the same level where these SST files originally reside. |
TiKV | rocksdb.[defaultcf|writecf|lockcf].ribbon-filter-above-level | Newly added | Controls whether to use Ribbon filters for levels greater than or equal to this value and use non-block-based bloom filters for levels less than this value. |
TiKV | rocksdb.[defaultcf|writecf|lockcf].ttl | Newly added | SST files with updates older than the TTL will be automatically selected for compaction. |
TiDB Lightning | send-kv-pairs | Deprecated | Starting from v7.2.0, the parameter send-kv-pairs is deprecated. You can use send-kv-size to control the maximum size of one request when sending data to TiKV in physical import mode. |
TiDB Lightning | character-set | Modified | Introduces a new value option latin1 for the supported character sets of data import. You can use this option to import source files with the Latin-1 character set. |
TiDB Lightning | send-kv-size | Newly added | Specifies the maximum size of one request when sending data to TiKV in physical import mode. When the size of key-value pairs reaches the specified threshold, TiDB Lightning immediately sends them to TiKV. This avoids the OOM problems caused by TiDB Lightning nodes accumulating too many key-value pairs in memory when importing large wide tables. By adjusting this parameter, you can find a balance between memory usage and import speed, improving the stability and efficiency of the import process. |
Data Migration | strict-optimistic-shard-mode | Newly added | This configuration item is used to be compatible with the DDL shard merge behavior in TiDB Data Migration v2.0. You can enable this configuration item in optimistic mode. After this is enabled, the replication task will be interrupted when it encounters a Type 2 DDL statement. In scenarios where there are dependencies between DDL changes in multiple tables, a timely interruption can be made. You need to manually process the DDL statements of each table before resuming the replication task to ensure data consistency between the upstream and the downstream. |
TiCDC | sink.protocol | Modified | Introduces a new value option “open-protocol” when the downstream is Kafka. Specifies the protocol format used for encoding messages. |
TiCDC | sink.delete-only-output-handle-key-columns | Newly added | Specifies the output of DELETE events. This parameter is valid only for “canal-json” and “open-protocol” protocols. The default value is false, which means outputting all columns. When you set it to true, only primary key columns or unique index columns are output. |
Improvements
TiDB
- Optimize the logic of constructing index scan range so that it supports converting complex conditions into index scan range #41572 #44389 @xuyifangreeneyes
- Add new monitoring metrics Stale Read OPS and Stale Read Traffic #43325 @you06
- When the retry leader of stale read encounters a lock, TiDB forcibly retries with the leader after resolving the lock, which avoids unnecessary overhead #43659 @you06
- Use estimated time to calculate stale read ts and reduce the overhead of stale read #44215 @you06
- Add logs and system variables for long-running transactions #41471 @crazycs520
- Support connecting to TiDB through the compressed MySQL protocol, which improves the performance of data-intensive queries under low bandwidth networks and saves bandwidth costs. This supports both zlib and zstd based compression. #22605 @dveeden
- Recognize both utf8 and utf8mb3 as the legacy three-byte UTF-8 character set encodings, which facilitates the migration of tables with legacy UTF-8 encodings from MySQL 8.0 to TiDB #26226 @dveeden
- Support using := for assignment in UPDATE statements #44751 @CbcWestwolf
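As a small illustration of the := assignment support listed above, the following uses a hypothetical counters table.

```sql
-- Hypothetical table for the example.
CREATE TABLE counters (id INT PRIMARY KEY, hits INT);
INSERT INTO counters VALUES (1, 0);

-- := is accepted as the assignment operator in the SET clause,
-- equivalent to writing hits = hits + 1.
UPDATE counters SET hits := hits + 1 WHERE id = 1;

SELECT hits FROM counters WHERE id = 1;
```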
TiKV
- Support configuring the retry interval of PD connections in scenarios such as connection request failures using pd.retry-interval #14964 @rleungx
- Optimize the resource control scheduling algorithm by incorporating the global resource usage #14604 @Connor1996
- Use gzip compression for check_leader requests to reduce traffic #14553 @you06
- Add related metrics for check_leader requests #14658 @you06
- Provide detailed time information when TiKV handles write commands #12362 @cfzjywxk
PD
- Use a separate gRPC connection for PD leader election to prevent the impact of other requests #6403 @rleungx
- Enable the bucket splitting by default to mitigate hotspot issues in multi-Region scenarios #6433 @bufferflies
Tools
Backup & Restore (BR)
TiCDC
- Optimize the structure of the directory where data files are stored when a DDL operation occurs in the scenario of replication to an object storage service #8891 @CharlesCheung96
- Support the OAUTHBEARER authentication in the scenario of replication to Kafka #8865 @hi-rustin
- Add the option of outputting only the handle keys for the DELETE operation in the scenario of replication to Kafka #9143 @3AceShowHand
TiDB Data Migration (DM)
TiDB Lightning
Bug fixes
TiDB
- Fix the issue that the query with CTE causes TiDB to hang #43749 #36896 @guo-shaoge
- Fix the issue that the min, max query result is incorrect #43805 @wshwsh12
- Fix the issue that the SHOW PROCESSLIST statement cannot display the TxnStart of the transaction of the statement with a long subquery time #40851 @crazycs520
- Fix the issue that the stale read global optimization does not take effect due to the lack of TxnScope in Coprocessor tasks #43365 @you06
- Fix the issue that follower read does not handle flashback errors before retrying, which causes query errors #43673 @you06
- Fix the issue that data and indexes are inconsistent when the ON UPDATE statement does not correctly update the primary key #44565 @zyguan
- Modify the upper limit of the UNIX_TIMESTAMP() function to 3001-01-19 03:14:07.999999 UTC to be consistent with that of MySQL 8.0.28 or later versions #43987 @YangKeao
- Fix the issue that adding an index fails in the ingest mode #44137 @tangenta
- Fix the issue that canceling a DDL task in the rollback state causes errors in related metadata #44143 @wjhuang2016
- Fix the issue that using memTracker with cursor fetch causes memory leaks #44254 @YangKeao
- Fix the issue that dropping a database causes slow GC progress #33069 @tiancaiamao
- Fix the issue that TiDB returns an error when the corresponding rows in partitioned tables cannot be found in the probe phase of index join #43686 @AilinKid @mjonss
- Fix the issue that there is no warning when using SUBPARTITION to create partitioned tables #41198 #41200 @mjonss
- Fix the issue that when a query is killed because it exceeds MAX_EXECUTION_TIME, the returned error message is inconsistent with that of MySQL #43031 @dveeden
- Fix the issue that the LEADING hint does not support query block aliases #44645 @qw4990
- Modify the return type of the LAST_INSERT_ID() function from VARCHAR to LONGLONG to be consistent with that of MySQL #44574 @Defined2014
- Fix the issue that incorrect results might be returned when using a common table expression (CTE) in statements with non-correlated subqueries #44051 @winoros
- Fix the issue that Join Reorder might cause incorrect outer join results #44314 @AilinKid
- Fix the issue that PREPARE stmt FROM "ANALYZE TABLE xxx" might be killed by tidb_mem_quota_query #44320 @chrysan
TiKV
- Fix the issue that the transaction returns an incorrect value when TiKV handles stale pessimistic lock conflicts #13298 @cfzjywxk
- Fix the issue that in-memory pessimistic lock might cause flashback failures and data inconsistency #13303 @JmPotato
- Fix the issue that the fair lock might be incorrect when TiKV handles stale requests #13298 @cfzjywxk
- Fix the issue that autocommit and point get replica read might break linearizability #14715 @cfzjywxk
PD
TiFlash
Tools
Backup & Restore (BR)
TiCDC
- Fix the issue that Resolved TS does not advance properly in some cases #8963 @CharlesCheung96
- Fix the issue that the UPDATE operation cannot output old values when the Avro or CSV protocol is used #9086 @3AceShowHand
- Fix the issue of excessive downstream pressure caused by reading downstream metadata too frequently when replicating data to Kafka #8959 @hi-rustin
- Fix the issue of too many downstream logs caused by frequently setting the downstream bidirectional replication-related variables when replicating data to TiDB or MySQL #9180 @asddongmen
- Fix the issue that the PD node crashing causes the TiCDC node to restart #8868 @asddongmen
- Fix the issue that TiCDC cannot create a changefeed with a downstream Kafka-on-Pulsar #8892 @hi-rustin
TiDB Lightning
Contributors
We would like to thank the following contributors from the TiDB community: