TiDB 7.5.0 Release Notes
Release date: December 1, 2023
TiDB version: 7.5.0
Quick access: Quick start | Production deployment
TiDB 7.5.0 is a Long-Term Support Release (LTS).
Compared with the previous LTS 7.1.0, 7.5.0 includes new features, improvements, and bug fixes released in 7.2.0-DMR, 7.3.0-DMR, and 7.4.0-DMR. When you upgrade from 7.1.x to 7.5.0, you can download the TiDB Release Notes PDF to view all release notes between the two LTS versions. The following table lists some highlights from 7.2.0 to 7.5.0:
Category | Feature | Description |
---|---|---|
Scalability and Performance | Support running multiple ADD INDEX statements in parallel | This feature allows concurrent jobs to add multiple indexes for a single table. Previously, executing two ADD INDEX statements (X and Y) at the same time took the time of X plus the time of Y. With this feature, adding the two indexes X and Y in one SQL statement executes them concurrently, which significantly reduces the total DDL execution time. Especially in scenarios with wide tables, internal test data shows that performance can be improved by up to 94%. |
Reliability and Availability | Optimize Global Sort (experimental, introduced in v7.4.0) | TiDB v7.1.0 introduced the Distributed eXecution Framework (DXF). For tasks that take advantage of this framework, v7.4.0 introduces global sorting to eliminate the unnecessary I/O, CPU, and memory spikes caused by temporarily out-of-order data during data re-organization tasks. Global sorting takes advantage of external shared object storage (Amazon S3 in this first iteration) to store intermediary files during the job, adding flexibility and cost savings. Operations like ADD INDEX and IMPORT INTO will be faster, more resilient, more stable, more flexible, and cost less to run. |
| Resource control for background tasks (experimental, introduced in v7.4.0) | In v7.1.0, the Resource Control feature was introduced to mitigate resource and storage access interference between workloads. TiDB v7.4.0 applied this control to the priority of background tasks as well. In v7.4.0, Resource Control identifies and manages the priority of background task execution, such as auto-analyze, Backup & Restore, bulk load with TiDB Lightning, and online DDL. In future releases, this control will eventually apply to all background tasks. |
| Resource control for managing runaway queries (experimental, introduced in v7.2.0) | Resource Control is a framework for isolating workloads by Resource Groups, but it does not control how individual queries affect work inside each group. TiDB v7.2.0 introduces "runaway queries control" to let you control how TiDB identifies and treats these queries per Resource Group. Depending on your needs, long-running queries can be terminated or throttled, and the queries can be identified by exact SQL text, SQL digest, or plan digest, for better generalization. In v7.3.0, TiDB enables you to proactively watch for known bad queries, similar to a SQL blocklist at the database level. |
SQL | MySQL 8.0 compatibility (introduced in v7.4.0) | In MySQL 8.0, the default character set is utf8mb4, and the default collation of utf8mb4 is utf8mb4_0900_ai_ci. TiDB v7.4.0 adds support for this collation, enhancing compatibility with MySQL 8.0 so that migrations and replications from MySQL 8.0 databases with the default collation are much smoother (see the example after this table). |
DB Operations and Observability | TiDB Lightning's physical import mode integrated into TiDB with IMPORT INTO (GA) | Before v7.2.0, to import data based on the file system, you needed to install TiDB Lightning and use its physical import mode. Now, the same capability is integrated into the IMPORT INTO statement, so you can use this statement to quickly import data without installing any additional tool. This statement also supports the Distributed eXecution Framework (DXF) for parallel import, which improves import efficiency during large-scale imports. |
| Specify the respective TiDB nodes to execute the ADD INDEX and IMPORT INTO SQL statements (GA) | You have the flexibility to execute ADD INDEX or IMPORT INTO SQL statements on some of the existing TiDB nodes or on newly added TiDB nodes. This approach enables resource isolation from the rest of the TiDB nodes, preventing any impact on business operations while ensuring optimal performance for executing the preceding SQL statements. In v7.5.0, this feature becomes generally available (GA). |
| DDL supports pause and resume operations (GA) | Adding indexes can consume significant resources and can affect online traffic. Even when throttled in a Resource Group or isolated to labeled nodes, there may still be a need to suspend these jobs in emergencies. As of v7.2.0, TiDB natively supports suspending any number of these background jobs at once, freeing up needed resources while avoiding having to cancel and restart the jobs. |
| TiDB Dashboard supports heap profiling for TiKV | Previously, addressing TiKV OOM or high memory usage issues typically required manually running jeprof to generate a heap profile in the instance environment. Starting from v7.5.0, TiKV enables remote processing of heap profiles. You can now directly access the flame graph and call graph of the heap profile. This feature provides the same simple and easy-to-use experience as Go heap profiling. |
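As referenced in the MySQL 8.0 compatibility row above, here is a minimal sketch of using the MySQL 8.0 default collation in TiDB v7.4.0 and later; the table and column names are hypothetical, and the example assumes the new collation framework is enabled (the default for new clusters):

```sql
-- Hypothetical table using the MySQL 8.0 default collation.
CREATE TABLE t (
    name VARCHAR(64)
) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

-- utf8mb4_0900_ai_ci is accent-insensitive and case-insensitive,
-- so this comparison returns 1.
SELECT 'Café' = 'CAFE' COLLATE utf8mb4_0900_ai_ci;
```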
Feature details
Scalability
Support designating and isolating TiDB nodes to distributedly execute `ADD INDEX` or `IMPORT INTO` tasks when the Distributed eXecution Framework (DXF) is enabled #46258 @ywqzzy

Executing `ADD INDEX` or `IMPORT INTO` tasks in parallel in a resource-intensive cluster can consume a large amount of TiDB node resources, which can lead to cluster performance degradation. To avoid performance impact on existing services, v7.4.0 introduces the system variable `tidb_service_scope` as an experimental feature to control the service scope of each TiDB node under the TiDB Distributed eXecution Framework (DXF). You can select several existing TiDB nodes or set the TiDB service scope for new TiDB nodes, and all distributedly executed `ADD INDEX` and `IMPORT INTO` tasks only run on these nodes. In v7.5.0, this feature becomes generally available (GA).

For more information, see documentation.
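A minimal sketch of designating a node for DXF tasks, assuming the variable is set on the TiDB node you want to dedicate and that it accepts the documented 'background' scope value:

```sql
-- Run on the TiDB node that should take over distributed ADD INDEX /
-- IMPORT INTO subtasks; other nodes keep the default (empty) scope.
SET GLOBAL tidb_service_scope = 'background';
```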
Performance
The TiDB Distributed eXecution Framework (DXF) becomes generally available (GA), improving the performance and stability of `ADD INDEX` and `IMPORT INTO` tasks in parallel execution #45719 @wjhuang2016

The DXF introduced in v7.1.0 has become GA. In versions before TiDB v7.1.0, only one TiDB node can execute DDL tasks at the same time. Starting from v7.1.0, multiple TiDB nodes can execute the same DDL task in parallel under the DXF. Starting from v7.2.0, the DXF supports multiple TiDB nodes executing the same `IMPORT INTO` task in parallel, thereby better utilizing the resources of the TiDB cluster and significantly improving the performance of DDL and `IMPORT INTO` tasks. In addition, you can also add TiDB nodes to linearly improve the performance of these tasks. To use the DXF, set the value of `tidb_enable_dist_task` to `ON`:

SET GLOBAL tidb_enable_dist_task = ON;

For more information, see documentation.
Improve the performance of adding multiple indexes in a single SQL statement #41602 @tangenta
Before v7.5.0, adding multiple indexes (`ADD INDEX`) in a single SQL statement performed similarly to adding them with separate SQL statements. Starting from v7.5.0, the performance of adding multiple indexes in a single SQL statement is significantly improved. Especially in scenarios with wide tables, internal test data shows that performance can be improved by up to 94%.
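For illustration, a minimal sketch of the single-statement form that benefits from this improvement; the table and index names are hypothetical:

```sql
-- Hypothetical wide table; both indexes are added in one DDL statement,
-- so v7.5.0 can build them concurrently instead of one after another.
ALTER TABLE orders
    ADD INDEX idx_customer (customer_id),
    ADD INDEX idx_created_at (created_at);
```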
DB operations
DDL jobs support pause and resume operations (GA) #18015 @godouxm
The pause and resume operations for DDL jobs introduced in v7.2.0 become generally available (GA). These operations let you pause resource-intensive DDL jobs (such as creating indexes) to save resources and minimize the impact on online traffic. When resources permit, you can seamlessly resume DDL jobs without canceling and restarting them. This feature improves resource utilization, enhances user experience, and simplifies the schema change process.
You can pause and resume multiple DDL jobs using `ADMIN PAUSE DDL JOBS` or `ADMIN RESUME DDL JOBS`:

ADMIN PAUSE DDL JOBS 1,2;
ADMIN RESUME DDL JOBS 1,2;
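To find the job IDs to pass to these statements, you can first list the current DDL jobs; a minimal sketch using the standard `ADMIN SHOW DDL JOBS` statement:

```sql
-- List recent DDL jobs with their IDs and states, then pause or resume
-- the ones you need by ID.
ADMIN SHOW DDL JOBS;
```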
For more information, see documentation.
BR supports backing up and restoring statistics #48008 @Leavrth
Starting from TiDB v7.5.0, the br command-line tool introduces the `--ignore-stats` parameter to back up and restore database statistics. When you set this parameter to `false`, the br command-line tool supports backing up and restoring statistics of columns, indexes, and tables. In this case, you do not need to manually run the statistics collection task for the TiDB database restored from the backup, or wait for the completion of automatic collection tasks. This feature simplifies database maintenance work and improves query performance.

For more information, see documentation.
Observability
TiDB Dashboard supports heap profiling for TiKV #15927 @Connor1996
Previously, addressing TiKV OOM or high memory usage issues typically required manual execution of `jeprof` to generate a heap profile in the instance environment. Starting from v7.5.0, TiKV enables remote processing of heap profiles. You can now directly access the flame graph and call graph of the heap profile. This feature provides the same simple and easy-to-use experience as Go heap profiling.

For more information, see documentation.
Data migration
Support the `IMPORT INTO` SQL statement (GA) #46704 @D3Hunter

In v7.5.0, the `IMPORT INTO` SQL statement becomes generally available (GA). This statement integrates the Physical Import Mode capability of TiDB Lightning and allows you to quickly import data in formats such as CSV, SQL, and PARQUET into an empty table in TiDB. This import method eliminates the need for a separate deployment and management of TiDB Lightning, thereby reducing the complexity of data import and greatly improving import efficiency.

For more information, see documentation.
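As a rough illustration, a minimal sketch of a CSV import; the table name, file path, and the chosen WITH options are hypothetical examples rather than a complete reference:

```sql
-- Import CSV files matching a wildcard into an empty table,
-- skipping one header row and using 8 threads.
IMPORT INTO orders (id, customer_id, created_at)
FROM '/data/orders.*.csv'
WITH SKIP_ROWS=1, THREAD=8;
```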
Data Migration (DM) supports blocking incompatible DDL changes that corrupt data consistency #9692 @GMHDBJD

Before v7.5.0, the DM Binlog Filter feature could only migrate or filter specified events, and the granularity was relatively coarse. For example, it could only filter coarse-grained DDL events such as `ALTER`. This method is limited in some scenarios: for example, if an application allows `ADD COLUMN` but not `DROP COLUMN`, both are filtered as `ALTER` events in earlier DM versions.

To address such issues, v7.5.0 refines the granularity of the supported DDL events, such as supporting filtering of `MODIFY COLUMN` (modifying the column data type), `DROP COLUMN`, and other fine-grained DDL events that lead to data loss, data truncation, or loss of precision. You can configure it as needed. This feature also supports blocking incompatible DDL changes and reporting errors for such changes, so that you can intervene manually in time to avoid impacting downstream application data.

For more information, see documentation.
Support real-time checkpoint updates for continuous data validation #8463 @lichunzhu
Before v7.5.0, the continuous data validation feature ensures data consistency during replication from DM to the downstream, which serves as the basis for cutting over business traffic from the upstream database to TiDB. However, due to various factors such as replication delay and waiting for re-validation of inconsistent data, the continuous validation checkpoint could only be refreshed every few minutes. This is unacceptable for business scenarios where the cutover time is limited to tens of seconds.
With the introduction of real-time checkpoint updates for continuous data validation, you can now provide a binlog position from the upstream database. Once the continuous validation program detects this binlog position in memory, it immediately refreshes the checkpoint instead of refreshing it every few minutes. Therefore, you can quickly perform the cutover based on this immediately updated checkpoint.
For more information, see documentation.
Compatibility changes
Note
This section provides compatibility changes you need to know when you upgrade from v7.4.0 to the current version (v7.5.0). If you are upgrading from v7.3.0 or earlier versions to the current version, you might also need to check the compatibility changes introduced in intermediate versions.
System variables
Variable name | Change type | Description |
---|---|---|
tidb_enable_fast_analyze | Deprecated | Controls whether to enable the statistics Fast Analyze feature. This feature is deprecated in v7.5.0. |
tidb_analyze_partition_concurrency | Modified | Changes the default value from 1 to 2 after further tests. |
tidb_build_stats_concurrency | Modified | Changes the default value from 4 to 2 after further tests. |
tidb_merge_partition_stats_concurrency | Modified | This system variable takes effect starting from v7.5.0. It specifies the concurrency of merging statistics for a partitioned table when TiDB analyzes the partitioned table. |
tidb_build_sampling_stats_concurrency | Newly added | Controls the sampling concurrency of the ANALYZE process. |
tidb_enable_async_merge_global_stats | Newly added | Controls whether TiDB merges statistics of partitioned tables asynchronously to avoid OOM issues (see the example after this table). |
tidb_gogc_tuner_max_value | Newly added | Controls the maximum value of GOGC that the GOGC Tuner can adjust. |
tidb_gogc_tuner_min_value | Newly added | Controls the minimum value of GOGC that the GOGC Tuner can adjust. |
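A minimal sketch of trying the newly added statistics-related variables, assuming both can be set at the GLOBAL scope as shown:

```sql
-- Merge partition-level statistics into global statistics asynchronously.
SET GLOBAL tidb_enable_async_merge_global_stats = ON;
-- Limit the sampling concurrency used by ANALYZE.
SET GLOBAL tidb_build_sampling_stats_concurrency = 2;
```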
Configuration file parameters
Configuration file | Configuration parameter | Change type | Description |
---|---|---|---|
TiDB | tikv-client.copr-req-timeout | Newly added | Sets the timeout of a single Coprocessor request. |
TiKV | raftstore.inspect-interval | Modified | Changes the default value from 500ms to 100ms after optimizing the algorithm to improve the sensitivity of slow node detection. |
TiKV | raftstore.region-compact-min-redundant-rows | Modified | Sets the number of redundant MVCC rows required to trigger RocksDB compaction. Starting from v7.5.0, this configuration item takes effect for the “raft-kv” storage engine. |
TiKV | raftstore.region-compact-redundant-rows-percent | Modified | Sets the percentage of redundant MVCC rows required to trigger RocksDB compaction. Starting from v7.5.0, this configuration item takes effect for the “raft-kv” storage engine. |
TiKV | raftstore.evict-cache-on-memory-ratio | Newly added | When the memory usage of TiKV exceeds 90% of the system available memory, and the memory occupied by Raft entry cache exceeds the evict-cache-on-memory-ratio of used memory, TiKV evicts the Raft entry cache. |
TiKV | memory.enable-heap-profiling | Newly added | Controls whether to enable Heap Profiling to track the memory usage of TiKV. |
TiKV | memory.profiling-sample-per-bytes | Newly added | Specifies the amount of data sampled by Heap Profiling each time, rounding up to the nearest power of 2. |
BR | --ignore-stats | Newly added | Controls whether to back up and restore database statistics. When you set this parameter to false, the br command-line tool supports backing up and restoring statistics of columns, indexes, and tables. |
TiCDC | case-sensitive | Modified | Changes the default value from true to false after further tests, which means that the table names and database names in the TiCDC configuration file are case-insensitive by default. |
TiCDC | sink.dispatchers.partition | Modified | Controls how TiCDC dispatches incremental data to Kafka partitions. v7.5.0 introduces a new value option columns , which uses the explicitly specified column values to calculate the partition number. |
TiCDC | changefeed-error-stuck-duration | Newly added | Controls the duration for which the changefeed is allowed to automatically retry when internal errors or exceptions occur. |
TiCDC | encoding-worker-num | Newly added | Controls the number of encoding and decoding workers in the redo module. |
TiCDC | flush-worker-num | Newly added | Controls the number of flushing workers in the redo module. |
TiCDC | sink.column-selectors | Newly added | Controls the specified columns of data change events that TiCDC sends to Kafka when dispatching incremental data. |
TiCDC | sql-mode | Newly added | Specifies the SQL mode used by TiCDC when parsing DDL statements. The default value is the same as the default SQL mode of TiDB. |
TiDB Lightning | --importer | Deleted | Specifies the address of TiKV-importer, which is deprecated in v7.5.0. |
Offline package changes
Starting from v7.5.0, the following contents are removed from the TiDB-community-toolkit binary package:
- tikv-importer-{version}-linux-{arch}.tar.gz
- mydumper
- spark-{version}-any-any.tar.gz
- tispark-{version}-any-any.tar.gz
Deprecated features
Mydumper is deprecated in v7.5.0 and most of its features have been replaced by Dumpling. It is strongly recommended that you use Dumpling instead of Mydumper.
TiKV-importer is deprecated in v7.5.0. It is strongly recommended that you use the Physical Import Mode of TiDB Lightning as an alternative.
Starting from TiDB v7.5.0, technical support for the data replication feature of TiDB Binlog is no longer provided. It is strongly recommended to use TiCDC as an alternative solution for data replication. Although TiDB Binlog v7.5.0 still supports the Point-in-Time Recovery (PITR) scenario, this component will be completely deprecated in future versions. It is recommended to use PITR as an alternative solution for data recovery.
The Fast Analyze feature (experimental) for statistics is deprecated in v7.5.0.
The incremental collection feature (experimental) for statistics is deprecated in v7.5.0.
Improvements
TiDB
- Optimize the concurrency model of merging GlobalStats: introduce tidb_enable_async_merge_global_stats to enable simultaneous loading and merging of statistics, which speeds up the generation of GlobalStats on partitioned tables. Optimize the memory usage of merging GlobalStats to avoid OOM and reduce memory allocations. #47219 @hawkingrei
- Optimize the `ANALYZE` process: introduce tidb_build_sampling_stats_concurrency to better control the `ANALYZE` concurrency to reduce resource consumption. Optimize the memory usage of `ANALYZE` to reduce memory allocation and avoid frequent GC by reusing some intermediate results. #47275 @hawkingrei
- Optimize the use of placement policies: support configuring the range of a policy to global and improve the syntax support for common scenarios (see the sketch after this list). #45384 @nolouch
- Improve the performance of adding indexes with `tidb_ddl_enable_fast_reorg` enabled. In internal tests, v7.5.0 improves the performance by up to 62.5% compared with v6.5.0. #47757 @tangenta
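As referenced in the placement policy item above, a hedged sketch of applying a policy to the cluster-wide range; the policy name and region labels are hypothetical, and the `ALTER RANGE` form is assumed to be available in your deployment:

```sql
-- Hypothetical policy; adjust the labels to your topology.
CREATE PLACEMENT POLICY multi_region
    PRIMARY_REGION = "us-east-1"
    REGIONS = "us-east-1,us-west-1";
-- Assumed syntax: attach the policy to the cluster-wide "global" range.
ALTER RANGE global PLACEMENT POLICY = multi_region;
```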
TiKV
- Avoid holding mutex when writing Titan manifest files to prevent affecting other threads #15351 @Connor1996
PD
- Improve the stability and usability of the `evict-slow-trend` scheduler #7156 @LykxSassinato
Tools
Backup & Restore (BR)
- Add a new inter-table backup parameter `table-concurrency` for snapshot backups. This parameter is used to control the inter-table concurrency of meta information such as statistics backup and data validation #48571 @3pointer
- During restoring a snapshot backup, BR retries when it encounters certain network errors #48528 @Leavrth
Bug fixes
TiDB
- Prohibit split table operations on non-integer clustered indexes #47350 @tangenta
- Fix the issue of encoding time fields with incorrect timezone information #46033 @tangenta
- Fix the issue that the Sort operator might cause TiDB to crash during the spill process #47538 @windtalker
- Fix the issue that TiDB returns `Can't find column` for queries with `GROUP_CONCAT` #41957 @AilinKid
- Fix the panic issue of `batch-client` in `client-go` #47691 @crazycs520
- Fix the issue of incorrect memory usage estimation in `INDEX_LOOKUP_HASH_JOIN` #47788 @SeaRise
- Fix the issue of uneven workload caused by the rejoining of a TiFlash node that has been offline for a long time #35418 @windtalker
- Fix the issue that the chunk cannot be reused when the HashJoin operator performs probe #48082 @wshwsh12
- Fix the issue that the `COALESCE()` function returns an incorrect result type for `DATE` type parameters #46475 @xzhangxian1008
- Fix the issue that `UPDATE` statements with subqueries are incorrectly converted to PointGet #48171 @hi-rustin
- Fix the issue that incorrect results are returned when the cached execution plans contain the comparison between date types and `unix_timestamp` #48165 @qw4990
- Fix the issue that an error is reported when default inline common table expressions (CTEs) with aggregate functions or window functions are referenced by recursive CTEs #47881 @elsa0520
- Fix the issue that the optimizer mistakenly selects IndexFullScan to reduce sort introduced by window functions #46177 @qw4990
- Fix the issue that multiple references to CTEs result in incorrect results due to condition pushdown of CTEs #47881 @winoros
- Fix the issue that the MySQL compression protocol cannot handle large loads of data (>=16M) #47152 #47157 #47161 @dveeden
- Fix the issue that TiDB does not read `cgroup` resource limits when it is started with `systemd` #47442 @hawkingrei
TiKV
- Fix the issue that retrying prewrite requests in the pessimistic transaction mode might cause the risk of data inconsistency in rare cases #11187 @MyonKeminta
PD
- Fix the issue that `evict-leader-scheduler` might lose configuration #6897 @HuSharp
- Fix the issue that after a store goes offline, the monitoring metric of its statistics is not deleted #7180 @rleungx
- Fix the issue that `canSync` and `hasMajority` might be calculated incorrectly for clusters adopting the Data Replication Auto Synchronous (DR Auto-Sync) mode when the configuration of Placement Rules is complex #7201 @disksing
- Fix the issue that the rule checker does not add Learners according to the configuration of Placement Rules #7185 @nolouch
- Fix the issue that TiDB Dashboard cannot read PD `trace` data correctly #7253 @nolouch
- Fix the issue that PD might panic due to empty Regions obtained internally #7261 @lhy1024
- Fix the issue that `available_stores` is calculated incorrectly for clusters adopting the Data Replication Auto Synchronous (DR Auto-Sync) mode #7221 @disksing
- Fix the issue that PD might delete normal Peers when TiKV nodes are unavailable #7249 @lhy1024
- Fix the issue that adding multiple TiKV nodes to a large cluster might cause TiKV heartbeat reporting to become slow or stuck #7248 @rleungx
TiFlash
- Fix the issue that the `UPPER()` and `LOWER()` functions return inconsistent results between TiDB and TiFlash #7695 @windtalker
- Fix the issue that executing queries on empty partitions causes query failure #8220 @JaySon-Huang
- Fix the panic issue caused by table creation failure when replicating TiFlash replicas #8217 @hongyunyan
Tools
Backup & Restore (BR)
TiCDC
- Fix the performance issue caused by accessing NFS directories when replicating data to an object store sink #10041 @CharlesCheung96
- Fix the issue that the storage path is misspelled when `claim-check` is enabled #10036 @3AceShowHand
- Fix the issue that TiCDC scheduling is not balanced in some cases #9845 @3AceShowHand
- Fix the issue that TiCDC might get stuck when replicating data to Kafka #9855 @hicqu
- Fix the issue that the TiCDC processor might panic in some cases #9849 #9915 @hicqu @3AceShowHand
- Fix the issue that enabling `kv-client.enable-multiplexing` causes replication tasks to get stuck #9673 @fubinzh
- Fix the issue that an owner node gets stuck due to NFS failure when the redo log is enabled #9886 @3AceShowHand
Performance test
To learn about the performance of TiDB v7.5.0, you can refer to the TPC-C performance test report and Sysbench performance test report of the TiDB Dedicated cluster.
Contributors
We would like to thank the following contributors from the TiDB community: