The SQL statement will hit the cache if it is completely consistent.
Demand scenarios & solutions
Design principles
SQLCache stores and obtains the cache based on the SQL signature, the partition ID of the queried table, and the latest version of the partition. The combination of the three determines a cached data set. If any one of them changes, such as the SQL changes, the query fields or conditions are different, or the version changes after the data is updated, the cache will not be hit.
If multiple tables are joined, the most recently updated partition ID and latest version number are used. If one of the tables is updated, the partition ID or version number will be different, and the cache will not be hit.
SQLCache is more suitable for T+1 update scenarios. Data is updated in the early morning. The first query obtains the results from BE and puts them into the cache. Subsequent queries of the same nature obtain the results from the cache. Real-time update data can also be used, but there may be a problem of low hit rate.
Currently supports OlapTable internal table and Hive external table.
Usage
Make sure cache_enable_sql_mode=true in fe.conf (default is true)
vim fe/conf/fe.conf
cache_enable_sql_mode=true
Set variables in MySQL command line
MySQL [(none)]> set [global] enable_sql_cache=true;
Note: global is a global variable and does not refer to the current session variable.
In versions 2.1.3 and above, the Nereids optimizer saves cached key information, such as non-deterministic functions and their evaluation values, in the memory of FE. It can skip SQL parsing when the key information remains unchanged, optimizing the query speed of SQL cache.
The number of key information and elimination time can be controlled to reduce memory consumption on FE through the frontend configuration items sql_cache_manageNum and expires_sql_cache_in_fe_second.
MySQL [(none)]> ADMIN SET FRONTEND CONFIG ('sql_cache_manage_num' = '100');
MySQL [(none)]> ADMIN SET FRONTEND CONFIG ('expire_sql_cache_in_fe_second' = '300');
Cache conditions
After the first query, if the following three conditions are met, the query results will be cached.
(Current time - the last update time of the queried partition) is greater than cache_last_version_interval_second in fe.conf.
The number of query result rows is less than cache_result_max_row_count in fe.conf.
The query result bytes is less than cache_result_max_data_size in fe.conf.
For detailed parameter introduction and unfinished matters, see query-cache.md.
Unfinished business
SQL contains functions that generate random values, such as random(). Using QueryCache will cause the query results to lose their randomness, and the same results will be obtained every time they are executed.
Similar SQL, 2 indicators were queried before, and now 3 indicators are queried. Can the cache of 2 indicators be used? Not currently supported