TiDB 7.3.0 Release Notes
Release date: August 14, 2023
TiDB version: 7.3.0
Quick access: Quick start
7.3.0 introduces the following major features. In addition, v7.3.0 includes a series of enhancements (described in the Feature details section) to query stability in the TiDB server and TiFlash. These enhancements are more miscellaneous in nature and not user-facing, so they are not included in the following table.
Category | Feature | Description |
---|---|---|
Scalability and Performance | TiDB Lightning supports Partitioned Raft KV (experimental) | TiDB Lightning now supports the new Partitioned Raft KV architecture, as part of the near-term GA of the architecture. |
Reliability and Availability | Add automatic conflict detection and resolution on data imports | The TiDB Lightning Physical Import Mode supports a new version of conflict detection, which implements the semantics of replacing (`replace`) or ignoring (`ignore`) conflicting data when encountering conflicts. It automatically handles conflicting data for you while improving the performance of conflict resolution. |
Reliability and Availability | Manual management of runaway queries (experimental) | Queries might take longer than you expect. With the new watch list of resource groups, you can now manage queries more effectively and either deprioritize or kill them. By allowing operators to mark target queries by exact SQL text, SQL digest, or plan digest and deal with the queries at the resource group level, this feature gives you much more control over the potential impact of unexpected large queries on a cluster. |
SQL | Enhance operator control over query stability by adding more optimizer hints to the query planner | Added hints: `NO_INDEX_JOIN()`, `NO_MERGE_JOIN()`, `NO_INDEX_MERGE_JOIN()`, `NO_HASH_JOIN()`, `NO_INDEX_HASH_JOIN()` |
DB Operations and Observability | Show the progress of statistics collection tasks | Support viewing the progress of ANALYZE tasks using the SHOW ANALYZE STATUS statement or through the mysql.analyze_jobs system table. |
Feature details
Performance
TiFlash supports the replica selection strategy #44106 @XuHuaiyu
Before v7.3.0, TiFlash uses replicas from all its nodes for data scanning and MPP calculations to maximize performance. Starting from v7.3.0, TiFlash introduces the replica selection strategy and lets you configure it using the tiflash_replica_read system variable. This strategy supports selecting specific replicas based on the zone attributes of nodes and scheduling specific nodes for data scanning and MPP calculations.
For a cluster that is deployed in multiple data centers and each data center has complete TiFlash data replicas, you can configure this strategy to only select TiFlash replicas from the current data center. This means data scanning and MPP calculations are performed only on TiFlash nodes in the current data center, which avoids excessive network data transmission across data centers.
For more information, see documentation.
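For example, a minimal sketch of pinning reads and MPP tasks to the local zone (the value names follow the documentation for this variable; `GLOBAL` scope is used for illustration):

```sql
-- Use only TiFlash replicas whose zone label matches the zone of the
-- TiDB node that executes the query.
SET GLOBAL tiflash_replica_read = 'closest_replicas';

-- Revert to the default behavior of using all replicas.
SET GLOBAL tiflash_replica_read = 'all_replicas';
```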
TiFlash supports Runtime Filter within nodes #40220 @elsa0520
Runtime Filter is a dynamic predicate generated during the query planning phase. In the process of table joining, these dynamic predicates can effectively filter out rows that do not meet the join conditions, reducing scan time and network overhead, and improving the efficiency of table joining. Starting from v7.3.0, TiFlash supports Runtime Filter within nodes, improving the overall performance of analytical queries. In some TPC-DS workloads, the performance can be improved by 10% to 50%.
This feature is disabled by default in v7.3.0. To enable this feature, set the system variable `tidb_runtime_filter_mode` to `LOCAL`.

For more information, see documentation.
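For example, a minimal sketch of enabling it for the current session:

```sql
-- Enable Runtime Filter within nodes (disabled by default in v7.3.0).
SET SESSION tidb_runtime_filter_mode = 'LOCAL';

-- Revert to the default, which disables Runtime Filter.
SET SESSION tidb_runtime_filter_mode = 'OFF';
```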
TiFlash supports executing common table expressions (CTEs) (experimental) #43333 @winoros
Before v7.3.0, the MPP engine of TiFlash cannot execute queries that contain CTEs by default. To achieve the best execution performance within the MPP framework, you need to use the system variable tidb_opt_force_inline_cte to force CTEs to be inlined.

Starting from v7.3.0, TiFlash's MPP engine supports executing queries with CTEs without inlining them, allowing for optimal query execution within the MPP framework. In TPC-DS benchmark tests, compared with inlining CTEs, this feature has shown a 20% improvement in overall query execution speed for queries containing CTEs.
This feature is experimental and is disabled by default. It is controlled by the system variable tidb_opt_enable_mpp_shared_cte_execution.
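A minimal sketch of turning the feature on for the current session:

```sql
-- Allow the TiFlash MPP engine to execute non-recursive CTEs without
-- inlining them (experimental; disabled by default).
SET SESSION tidb_opt_enable_mpp_shared_cte_execution = ON;
```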
Reliability
Add new optimizer hints #45520 @qw4990
In v7.3.0, TiDB introduces several new optimizer hints to control the join methods between tables (a usage sketch follows this list), including:
- `NO_MERGE_JOIN()` selects join methods other than merge join.
- `NO_INDEX_JOIN()` selects join methods other than index nested loop join.
- `NO_INDEX_MERGE_JOIN()` selects join methods other than index nested loop merge join.
- `NO_HASH_JOIN()` selects join methods other than hash join.
- `NO_INDEX_HASH_JOIN()` selects join methods other than index nested loop hash join.
For more information, see documentation.
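As a usage sketch (the tables `t1` and `t2` are placeholders), a hint is placed in a comment right after the `SELECT` keyword:

```sql
-- Let the optimizer choose any join method except hash join for t1 and t2.
SELECT /*+ NO_HASH_JOIN(t1, t2) */ *
FROM t1
JOIN t2 ON t1.id = t2.id;
```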
Manually mark queries that use resources more than expected (experimental) #43691 @Connor1996 @CabinfeverB
In v7.2.0, TiDB automatically manages queries that use more resources than expected (runaway queries) by automatically downgrading or canceling them. In actual practice, rules alone cannot cover all cases. Therefore, TiDB v7.3.0 introduces the ability to manually mark runaway queries. With the new QUERY WATCH command, you can mark runaway queries based on SQL text, SQL digest, or execution plan, and the marked runaway queries can be downgraded or canceled.
This feature provides an effective intervention method for sudden performance issues in the database. For performance issues caused by queries, this feature can quickly alleviate their impact on overall performance before the root cause is identified, thereby improving system service quality.
For more information, see documentation.
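For example, a sketch of the statement shape (the resource group `rg1` and the SQL text are placeholders; see the linked documentation for the full grammar):

```sql
-- Mark queries in resource group rg1 that exactly match this SQL text
-- as runaway queries and kill them.
QUERY WATCH ADD RESOURCE GROUP rg1 ACTION KILL SQL TEXT EXACT TO 'select * from t';

-- Inspect and remove watch items.
SELECT * FROM INFORMATION_SCHEMA.RUNAWAY_WATCHES;
QUERY WATCH REMOVE 1;
```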
SQL
List and List COLUMNS partitioned tables support default partitions #20679 @mjonss @bb7133
Before v7.3.0, when you use the `INSERT` statement to insert data into a List or List COLUMNS partitioned table, the data needs to meet the specified partitioning conditions of the table. If the data to be inserted does not meet any of these conditions, either the execution of the statement will fail or the non-compliant data will be ignored.

Starting from v7.3.0, List and List COLUMNS partitioned tables support default partitions. After a default partition is created, if the data to be inserted does not meet any partitioning condition, it will be written to the default partition. This feature improves the usability of List and List COLUMNS partitioning, avoiding the execution failure of the `INSERT` statement or data being ignored due to data that does not meet partitioning conditions.

Note that this feature is a TiDB extension to MySQL syntax. For a partitioned table with a default partition, the data in the table cannot be directly replicated to MySQL.

For more information, see documentation.
Observability
Show the progress of collecting statistics #44033 @hawkingrei
Collecting statistics for large tables often takes a long time. In previous versions, you cannot see the progress of collecting statistics, and therefore cannot predict the completion time. TiDB v7.3.0 introduces a feature to show the progress of collecting statistics. You can view the overall workload, current progress, and estimated completion time for each subtask using the system table `mysql.analyze_jobs` or `SHOW ANALYZE STATUS`. In scenarios such as large-scale data import and SQL performance optimization, this feature helps you understand the overall task progress and improves the user experience.

For more information, see documentation.
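For example, a minimal sketch of checking progress while an `ANALYZE` task runs (`t` is a placeholder table):

```sql
ANALYZE TABLE t;

-- From another session, either statement shows the progress of ANALYZE tasks.
SHOW ANALYZE STATUS;
SELECT * FROM mysql.analyze_jobs;
```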
Plan Replayer supports exporting historical statistics #45038 @time-and-fate
Starting from v7.3.0, with the newly added dump with stats as of timestamp clause, you can use Plan Replayer to export the statistics of specified SQL-related objects at a specific point in time. During the diagnosis of execution plan issues, accurately capturing historical statistics can help analyze more precisely how the execution plan was generated at the time when the issue occurred. This helps identify the root cause of the issue and greatly improves efficiency in diagnosing execution plan issues.
For more information, see documentation.
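For example, a sketch of the new clause (the timestamp and the query are placeholders):

```sql
-- Export plan-diagnosis data together with the statistics as they were
-- at the specified point in time.
PLAN REPLAYER DUMP WITH STATS AS OF TIMESTAMP '2023-08-01 10:00:00'
EXPLAIN SELECT * FROM t WHERE a = 1;
```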
Data migration
TiDB Lightning introduces a new version of conflict data detection and handling strategy #41629 @lance6716
In previous versions, TiDB Lightning uses different conflict detection and handling methods for Logical Import Mode and Physical Import Mode, which are complex to configure and not easy to understand. In addition, Physical Import Mode cannot handle conflicts using the `replace` or `ignore` strategy. Starting from v7.3.0, TiDB Lightning introduces a unified conflict detection and handling strategy for both Logical Import Mode and Physical Import Mode. You can choose to report an error (`error`), replace (`replace`), or ignore (`ignore`) conflicting data when encountering conflicts. You can limit the number of conflict records, for example, so that the task is interrupted and terminated after processing a specified number of conflict records. Furthermore, the system can record conflicting data for troubleshooting.

For import data with many conflicts, it is recommended to use the new version of the conflict detection and handling strategy for better performance. In a lab environment, the new strategy can make conflict detection and handling up to three times faster than the old version. This performance value is for reference only; actual performance might vary depending on your configuration, table structure, and the percentage of conflicting data. Note that the new and old versions of the conflict strategy cannot be used at the same time. The old conflict detection and handling strategy will be deprecated in the future.
For more information, see documentation.
TiDB Lightning supports Partitioned Raft KV (experimental) #14916 @GMHDBJD
TiDB Lightning now supports Partitioned Raft KV. This feature helps improve the data import performance of TiDB Lightning.
TiDB Lightning introduces a new parameter `enable-diagnose-logs` to enhance troubleshooting by printing more diagnostic logs #45497 @D3Hunter

By default, this feature is disabled and TiDB Lightning only prints logs containing `lightning/main`. When this feature is enabled, TiDB Lightning prints logs for all packages (including `client-go` and `tidb`) to help diagnose issues related to `client-go` and `tidb`.

For more information, see documentation.
Compatibility changes
Note
This section provides compatibility changes you need to know when you upgrade from v7.2.0 to the current version (v7.3.0). If you are upgrading from v7.1.0 or earlier versions to the current version, you might also need to check the compatibility changes introduced in intermediate versions.
Behavior changes
TiDB
- MPP is a distributed computing framework provided by the TiFlash engine, which allows data exchange between nodes and provides high-performance, high-throughput SQL algorithms. Compared with other protocols, the MPP protocol is more mature and can provide better task and resource management. Starting from v7.3.0, when TiDB pushes computation tasks to TiFlash, the optimizer only generates execution plans using the MPP protocol by default. If `tidb_allow_mpp` is set to `OFF`, queries might return errors after you upgrade TiDB. It is recommended that you check the value of `tidb_allow_mpp` and set it to `ON` before the upgrade. If you still need the optimizer to choose one of the Cop, BatchCop, and MPP protocols for generating execution plans based on cost estimates, you can set the `tidb_allow_tiflash_cop` variable to `ON`.
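A minimal pre-upgrade check based on the recommendation above:

```sql
-- Check the current value before upgrading to v7.3.0.
SELECT @@global.tidb_allow_mpp;   -- recommended: ON

SET GLOBAL tidb_allow_mpp = ON;

-- Optional: keep cost-based choice among the Cop, BatchCop, and MPP
-- protocols after the upgrade.
SET GLOBAL tidb_allow_tiflash_cop = ON;
```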
Backup & Restore (BR)
- BR adds an empty cluster check before performing a full data restoration. By default, restoring data to a non-empty cluster is not allowed. If you want to force the restoration, you can use the `--filter` option to specify the corresponding table name to restore data to.
TiDB Lightning
- `tikv-importer.on-duplicate` is deprecated and replaced by `conflict.strategy`.
- The `max-error` parameter, which controls the maximum number of non-fatal errors that TiDB Lightning can tolerate before stopping the migration task, no longer limits import data conflicts. The `conflict.threshold` parameter now controls the maximum number of conflicting records that can be tolerated.
TiCDC
- When Kafka sink uses the Avro protocol, if the `force-replicate` parameter is set to `true`, TiCDC reports an error when creating a changefeed.
- Due to incompatibility between the `delete-only-output-handle-key-columns` and `force-replicate` parameters, when both parameters are enabled, TiCDC reports an error when creating a changefeed.
- When the output protocol is Open Protocol, `UPDATE` events only output the changed columns.
System variables
Variable name | Change type | Description |
---|---|---|
tidb_opt_enable_mpp_shared_cte_execution | Modified | This system variable takes effect starting from v7.3.0. It controls whether non-recursive Common Table Expressions (CTEs) can be executed in TiFlash MPP. |
tidb_allow_tiflash_cop | Newly added | This system variable is used to select the protocol for generating execution plans when TiDB pushes computation tasks down to TiFlash. |
tidb_lock_unchanged_keys | Newly added | This variable is used to control whether, in certain scenarios, to lock the keys that are involved but not modified in a transaction. |
tidb_opt_enable_non_eval_scalar_subquery | Newly added | Controls whether the EXPLAIN statement disables the execution of constant subqueries that can be expanded at the optimization stage. |
tidb_skip_missing_partition_stats | Newly added | This variable controls the generation of GlobalStats when partition statistics are missing. |
tiflash_replica_read | Newly added | Controls the strategy for selecting TiFlash replicas when a query requires the TiFlash engine. |
Configuration file parameters
Configuration file | Configuration parameter | Change type | Description |
---|---|---|---|
TiDB | enable-32bits-connection-id | Newly added | Controls whether to enable the 32-bit connection ID feature. |
TiDB | in-mem-slow-query-recent-num | Newly added | Controls the number of recently used slow queries that are cached in memory. |
TiDB | in-mem-slow-query-topn-num | Newly added | Controls the number of slowest queries that are cached in memory. |
TiKV | coprocessor.region-bucket-size | Modified | Changes the default value from 96MiB to 50MiB. |
TiKV | raft-engine.format-version | Modified | When using Partitioned Raft KV (storage.engine="partitioned-raft-kv"), Ribbon filter is used. Therefore, TiKV changes the default value from 2 to 5. |
TiKV | raftdb.max-total-wal-size | Modified | When using Partitioned Raft KV (storage.engine="partitioned-raft-kv"), TiKV skips writing WAL. Therefore, TiKV changes the default value from "4GB" to 1, meaning that WAL is disabled. |
TiKV | rocksdb.[defaultcf|writecf|lockcf].compaction-guard-min-output-file-size | Modified | Changes the default value from "1MB" to "8MB" to resolve the issue that compaction speed cannot keep up with the write speed during large data writes. |
TiKV | rocksdb.[defaultcf|writecf|lockcf].format-version | Modified | When using Partitioned Raft KV (storage.engine="partitioned-raft-kv"), Ribbon filter is used. Therefore, TiKV changes the default value from 2 to 5. |
TiKV | rocksdb.lockcf.write-buffer-size | Modified | When using Partitioned Raft KV (storage.engine="partitioned-raft-kv"), to speed up compaction on lockcf, TiKV changes the default value from "32MB" to "4MB". |
TiKV | rocksdb.max-total-wal-size | Modified | When using Partitioned Raft KV (storage.engine="partitioned-raft-kv"), TiKV skips writing WAL. Therefore, TiKV changes the default value from "4GB" to 1, meaning that WAL is disabled. |
TiKV | rocksdb.stats-dump-period | Modified | When using Partitioned Raft KV (storage.engine="partitioned-raft-kv"), to disable redundant log printing, TiKV changes the default value from "10m" to "0". |
TiKV | rocksdb.write-buffer-limit | Modified | To reduce the memory overhead of memtables, when storage.engine="raft-kv", TiKV changes the default value from 25% of the memory of the machine to 0, which means no limit. When using Partitioned Raft KV (storage.engine="partitioned-raft-kv"), TiKV changes the default value from 25% to 20% of the memory of the machine. |
TiKV | storage.block-cache.capacity | Modified | When using Partitioned Raft KV (storage.engine="partitioned-raft-kv"), to compensate for the memory overhead of memtables, TiKV changes the default value from 45% to 30% of the size of total system memory. |
TiFlash | storage.format_version | Modified | Introduces a new DTFile format format_version = 5 to reduce the number of physical files by merging smaller files. Note that this format is experimental and not enabled by default. |
TiDB Lightning | tikv-importer.incremental-import | Deleted | TiDB Lightning parallel import parameter. Because it could easily be mistaken for an incremental import parameter, this parameter is now renamed to tikv-importer.parallel-import. If a user passes in the old parameter name, it will be automatically converted to the new one. |
TiDB Lightning | tikv-importer.on-duplicate | Deprecated | Controls the action to take when a conflicting record is inserted in Logical Import Mode. Starting from v7.3.0, this parameter is replaced by conflict.strategy. |
TiDB Lightning | conflict.max-record-rows | Newly added | Part of the new strategy for handling conflicting data. It controls the maximum number of rows in the conflict_records table. The default value is 100. |
TiDB Lightning | conflict.strategy | Newly added | The new strategy for handling conflicting data. It includes the following options: "" (TiDB Lightning does not detect and handle conflicting data), error (terminate the import and report an error if a primary or unique key conflict is detected in the imported data), replace (when encountering data with conflicting primary or unique keys, the new data is retained and the old data is overwritten), and ignore (when encountering data with conflicting primary or unique keys, the old data is retained and the new data is ignored). The default value is "", that is, TiDB Lightning does not detect and handle conflicting data. |
TiDB Lightning | conflict.threshold | Newly added | Controls the upper limit of conflicting data. When conflict.strategy="error", the default value is 0. When conflict.strategy="replace" or conflict.strategy="ignore", you can set it to maxint. |
TiDB Lightning | enable-diagnose-logs | Newly added | Controls whether to enable the diagnostic logs. The default value is false, that is, only the logs related to the import are output, and the logs of other dependent components are not output. When you set it to true, logs from both the import process and other dependent components are output, and gRPC debugging is enabled, which can be used for diagnosis. |
TiDB Lightning | tikv-importer.parallel-import | Newly added | TiDB Lightning parallel import parameter. It replaces the existing tikv-importer.incremental-import parameter, which could be mistaken for an incremental import parameter and misused. |
BR | azblob.encryption-scope | Newly added | BR provides encryption scope support for Azure Blob Storage. |
BR | azblob.encryption-key | Newly added | BR provides encryption key support for Azure Blob Storage. |
TiCDC | large-message-handle-option | Newly added | Empty by default, which means that when the message size exceeds the limit of the Kafka topic, the changefeed fails. When this configuration is set to "handle-key-only", if the message exceeds the size limit, only the handle key is sent to reduce the message size; if the reduced message still exceeds the limit, the changefeed fails. |
TiCDC | sink.csv.binary-encoding-method | Newly added | The encoding method of binary data, which can be 'base64' or 'hex'. The default value is 'base64'. |
System tables
- Add a new system table `mysql.tidb_timers` to store the metadata of internal timers.
Deprecated features
TiDB
- The Fast Analyze feature (experimental) for statistics will be deprecated in v7.5.0.
- The incremental collection feature for statistics will be deprecated in v7.5.0.
Improvements
TiDB
- Introduce a new system variable tidb_opt_enable_non_eval_scalar_subquery to control whether the `EXPLAIN` statement executes subqueries in advance during the optimization phase #22076 @winoros
- When Global Kill is enabled, you can terminate the current session by pressing Control+C #8854 @pingyu
- Support the `IS_FREE_LOCK()` and `IS_USED_LOCK()` locking functions (see the example after this list) #44493 @dveeden
- Optimize the performance of reading the dumped chunks from disk #45125 @YangKeao
- Optimize the overestimation issue of the inner table of Index Join by using Optimizer Fix Controls #44855 @time-and-fate
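A minimal sketch of the locking functions (the lock name is a placeholder; `GET_LOCK()` and `RELEASE_LOCK()` are the pre-existing counterparts):

```sql
SELECT GET_LOCK('my_lock', 10);   -- acquire a user-level lock (10s timeout)
SELECT IS_USED_LOCK('my_lock');   -- connection ID of the holder, or NULL
SELECT IS_FREE_LOCK('my_lock');   -- 0 while the lock is held, 1 when free
SELECT RELEASE_LOCK('my_lock');
```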
TiKV
PD
TiFlash
- Support a new DTFile format version storage.format_version = 5 to reduce the number of physical files (experimental) #7595 @hongyunyan
Tools
Backup & Restore (BR)
TiCDC
- Optimize the message size of the Open Protocol output to make it include only the updated column values when sending `UPDATE` events #9336 @3AceShowHand
- Storage Sink now supports hexadecimal encoding for HEX formatted data, making it compatible with AWS DMS format specifications #9373 @CharlesCheung96
- Kafka Sink supports sending only handle key data when the message is too large, reducing the size of the message #9382 @3AceShowHand
Bug fixes
TiDB
- Fix the issue that when the MySQL Cursor Fetch protocol is used, the memory consumption of result sets might exceed the `tidb_mem_quota_query` limit and cause TiDB OOM. After the fix, TiDB will automatically write result sets to the disk to release memory #43233 @YangKeao
- Fix the TiDB panic issue caused by data race #45561 @genliqi
- Fix the hang-up issue that occurs when queries with `indexMerge` are killed #45279 @xzhangxian1008
- Fix the issue that query results in MPP mode are incorrect when `tidb_enable_parallel_apply` is enabled #45299 @windtalker
- Fix the issue that `resolve lock` might hang when there is a sudden change in PD time #44822 @zyguan
- Fix the issue that the GC Resolve Locks step might miss some pessimistic locks #45134 @MyonKeminta
- Fix the issue that a query with `ORDER BY` returns incorrect results in dynamic pruning mode #45007 @Defined2014
- Fix the issue that `AUTO_INCREMENT` can be specified on the same column as the `DEFAULT` column value #45136 @Defined2014
- Fix the issue that querying the system table `INFORMATION_SCHEMA.TIKV_REGION_STATUS` returns incorrect results in some cases #45531 @Defined2014
- Fix the issue of incorrect partition table pruning in some cases #42273 @jiyfhust
- Fix the issue that global indexes are not cleared when truncating a partition of a partitioned table #42435 @L-maple
- Fix the issue that other TiDB nodes do not take over TTL tasks after failures in one TiDB node #45022 @lcwangchao
- Fix the memory leak issue when TTL is running #45510 @lcwangchao
- Fix the issue of inaccurate error messages when inserting data into partitioned tables #44966 @lilinghai
- Fix the read permission issue on the `INFORMATION_SCHEMA.TIFLASH_REPLICA` table #7795 @Lloyd-Pottiger
- Fix the issue that an error occurs when using a wrong partition table name #44967 @River2000i
- Fix the issue that creating indexes gets stuck when `tidb_enable_dist_task` is enabled in some cases #44440 @tangenta
- Fix the `duplicate entry` error that occurs when restoring a table with `AUTO_ID_CACHE=1` using BR #44716 @tiancaiamao
- Fix the issue that the time consumed for executing `TRUNCATE TABLE` is inconsistent with the task execution time shown in `ADMIN SHOW DDL JOBS` #44785 @tangenta
- Fix the issue that upgrading TiDB gets stuck when reading metadata takes longer than one DDL lease #45176 @zimulala
- Fix the issue that the query result of the `SELECT CAST(n AS CHAR)` statement is incorrect when `n` in the statement is a negative number #44786 @xhebox
- Fix the issue that queries might return incorrect results when `tidb_opt_agg_push_down` is enabled #44795 @AilinKid
- Fix the issue of wrong results that occurs when a query with `current_date()` uses plan cache #45086 @qw4990
TiKV
- Fix the issue that reading data during GC might cause TiKV panic in some rare cases #15109 @MyonKeminta
PD
- Fix the issue that restarting PD might cause the `default` resource group to be reinitialized #6787 @glorv
- Fix the issue that when etcd is already started but the client has not yet connected to it, calling the client might cause PD to panic #6860 @HuSharp
- Fix the issue that the `health-check` output of a Region is inconsistent with the Region information returned by querying the Region ID #6560 @JmPotato
- Fix the issue that failed learner peers in `unsafe recovery` are ignored in `auto-detect` mode #6690 @v01dstar
- Fix the issue that Placement Rules select TiFlash learners that do not meet the rules #6662 @rleungx
- Fix the issue that unhealthy peers cannot be removed when rule checker selects peers #6559 @nolouch
TiFlash
- Fix the issue that TiFlash cannot replicate partitioned tables successfully due to deadlocks #7758 @hongyunyan
- Fix the issue that the `INFORMATION_SCHEMA.TIFLASH_REPLICA` system table contains tables that users do not have privileges to access #7795 @Lloyd-Pottiger
- Fix the issue that when there are multiple HashAgg operators within the same MPP task, the compilation of the MPP task might take an excessively long time, severely affecting query performance #7810 @SeaRise
Tools
TiCDC
- Fix the issue that changefeeds would fail due to the temporary unavailability of PD #9294 @asddongmen
- Fix the data inconsistency issue that might occur when some TiCDC nodes are isolated from the network #9344 @CharlesCheung96
- Fix the issue that when Kafka Sink encounters errors it might indefinitely block changefeed progress #9309 @hicqu
- Fix the panic issue that might occur when the TiCDC node status changes #9354 @sdojjy
- Fix the encoding error for the default `ENUM` values #9259 @3AceShowHand
TiDB Lightning
Contributors
We would like to thank the following contributors from the TiDB community: