Thanks to our community users and developers, about 1000 improvements and bug fixes have been made in Doris 2.0.3, covering optimizer statistics, inverted index, complex datatypes, data lake, and replica management.
1 Behavior change
- The output format of the complex data types array/map/struct has been changed to be consistent with the input format and the JSON specification. The main changes from the previous version are that DATE/DATETIME and STRING/VARCHAR values are enclosed in double quotes and null values inside ARRAY/MAP are displayed as null instead of NULL.
- SHOW_VIEW permission is supported. Users with SELECT or LOAD permission will no longer be able to execute the ‘SHOW CREATE VIEW’ statement and must be granted the SHOW_VIEW permission separately.
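The JSON conventions referred to in the first item (double-quoted strings, lowercase null) can be illustrated outside Doris. The following is a minimal sketch using Python's standard json module; the helper name render_complex is hypothetical, and the output is what a JSON encoder produces, not a capture of Doris output.

```python
import json

def render_complex(value):
    """Serialize a nested value with a standard JSON encoder (hypothetical helper)."""
    return json.dumps(value)

# Strings, including date-like strings, are double-quoted; None becomes null.
print(render_complex(["2023-11-01", "doris", None]))
print(render_complex({"k": None}))
```

Date values are represented here as plain strings, since JSON itself has no date type.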
2 New features
2.1 Support collecting statistics for optimizer automatically
Collecting statistics helps the optimizer understand the data distribution characteristics and choose a better plan to greatly improve query performance. It is officially supported starting from version 2.0.3 and is enabled all day by default.
2.2 Support complex datatypes for more data lake sources
- Support complex datatypes for JAVA UDF, JDBC and Hudi MOR
- Support complex datatypes for Paimon
- Support Paimon version 0.5
2.3 Add more builtin functions
- Support the BitmapAgg function in new optimizer
- Supports SHA series digest functions
- Support the BITMAP datatype in the aggregate functions min_by and max_by
- Add milliseconds/microseconds_add/sub/diff functions
- Add some json functions: json_insert, json_replace, json_set
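The three JSON functions differ mainly in how they treat a key that already exists. The sketch below models that difference in Python on top-level dictionary keys only, assuming the MySQL-style semantics these function names usually carry (insert adds only missing keys, replace changes only existing keys, set does both). The Python helpers are hypothetical stand-ins, not the Doris implementation, and they do not parse JSON paths.

```python
import copy

def json_insert(doc, key, value):
    """Add the key only when it is absent; existing values are kept."""
    out = copy.deepcopy(doc)
    if key not in out:
        out[key] = value
    return out

def json_replace(doc, key, value):
    """Change the value only when the key already exists."""
    out = copy.deepcopy(doc)
    if key in out:
        out[key] = value
    return out

def json_set(doc, key, value):
    """Insert the key or overwrite its value, whichever applies."""
    out = copy.deepcopy(doc)
    out[key] = value
    return out

doc = {"a": 1}
print(json_insert(doc, "a", 9), json_insert(doc, "b", 2))
print(json_replace(doc, "a", 9), json_replace(doc, "b", 2))
print(json_set(doc, "a", 9), json_set(doc, "b", 2))
```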
3 Improvement and optimizations
3.1 Performance optimizations
- When the inverted index MATCH WHERE condition with a high filter rate is combined with the common WHERE condition with a low filter rate, the I/O of the index column is greatly reduced.
- Optimize the efficiency of random data access after the where filter.
- Optimizes the performance of the old get_json_xx function on JSON data types by 2~4x.
- Supports the configuration to reduce the priority of the data read thread, ensuring the CPU resources for real-time writing.
- Adds the uuid_numeric function that returns largeint, which is 20 times faster than the uuid function that returns string.
- Optimized the performance of case when by 3x.
- Cut out unnecessary predicate calculations in storage engine execution.
- Accelerate count performance by pushing down count operator to storage tier.
- Optimizes the computation performance of the nullable type in and or expressions.
- Supports rewriting the limit operator before join in more scenarios to improve query performance.
- Eliminate useless order by operators from inline view to improve query performance.
- Optimizes the accuracy of cardinality estimates and cost models in some cases.
- Optimized jdbc catalog predicate pushdown logic.
- Optimized the read efficiency of the file cache when it is enabled for the first time.
- Optimizes the hive table sql cache policy and uses the partition update time stored in HMS to improve the cache hit ratio.
- Optimize mow compaction efficiency.
- Optimized thread allocation logic for external table query to reduce memory usage
- Optimize memory usage for column reader.
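The numeric UUID function in the list above returns the same kind of identifier as the string variant, only as a 128-bit integer rather than formatted text. The sketch below shows the two representations with Python's standard uuid module; the helper name uuid_forms is hypothetical, and the code says nothing about how Doris generates or stores the value.

```python
import uuid

def uuid_forms(u):
    """Return the canonical 36-character text form and the 128-bit integer form."""
    return str(u), u.int

text, number = uuid_forms(uuid.uuid4())
print(len(text))            # length of the hyphenated hex string
print(number.bit_length())  # at most 128 bits
```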
3.2 Distributed replica management improvements
Distributed replica management improvements cover skipped partition deletion, colocate group deletion, balance failures caused by continuous writes, and balancing of hot and cold separation tables.
3.3 Security enhancement
- The audit log plug-in uses a token instead of a plaintext password to enhance security
- Security enhancement for the log4j configuration
- Sensitive user information is not displayed in logs
4 Bugfix and stability
4.1 Complex datatypes
- Fix issues that fixed-length CHAR(n) was not truncated correctly in map/struct.
- Fix write failure for struct datatype nested in map/array
- Fix the issue that count distinct did not support array/map/struct
- Fix be crash when upgrading to 2.0.3 after a delete on a complex type appeared in a query
- Fix be crash when JSON datatype is in WHERE clause.
- Fix be crash when ARRAY datatype is in OUTER JOIN clause.
- Fix reading incorrect result for DECIMAL datatype in ORC format.
4.2 Inverted index
- Fix incorrect results for OR NOT combinations in the WHERE clause when inverted index query is disabled.
- Fix be crash on an empty write with inverted index
- Fix be crash in index compaction when the output of compaction is empty.
- Fix be crash when adding an inverted index to a newly added column that has no data written.
- Fix be crash when BUILD INDEX after ADD COLUMN without new data written.
- Fix missing and leak problem of hardlink for inverted index file.
- Fix index file corruption when the disk is temporarily full
- Fix incorrect result due to optimization for skip reading index column
4.3 Materialized View
- Fix the problem of BE crash caused by repeated expressions in the group by statement
- Fix be crash when there are duplicate expressions in group by statements.
- Disables the float/double type in the group by clause when a view is created.
- Improve matching of select queries to materialized views
- Fix an issue that materialized views could not be matched when a table alias was used
- Fix the problem using percentile_approx when creating materialized views
4.4 Table sample
- Fix the problem that table sample queries do not work on tables with partitions.
- Fix the problem that table sample queries do not work when a tablet is specified.
4.5 Unique with merge on write
- Fix null pointer exception in conditional update based on primary key
- Fix field name capitalization issues in partial update
- Fix duplicate keys occurring in mow during schema change repair.
4.6 Load and compaction
- Fix unknown slot descriptor error in routineload for running multiple tables
- Fix be crash due to concurrent memory access when calculating memory
- Fix be crash on duplicate cancel for load.
- Fix broker connection error during broker load
- Fix incorrect results from delete predicates when compaction and scan run concurrently.
- Fix the problem that compaction tasks would print too many stacktrace logs
4.7 Data Lake compatibility
- Fix query failure when an iceberg table contains special characters
- Fix compatibility issues of different hive metastore versions
- Fix an error reading max compute partition table
- Fix the issue that backup to object storage failed
4.8 JDBC external table compatibility
- Fix Oracle date type format error in jdbc catalog
- Fix MySQL 0000-00-00 date exception in jdbc catalog
- Fix an exception in reading data from Mariadb where the default value of the time type is current_timestamp
- Fix be crash when processing BITMAP datatype in jdbc catalog
4.9 SQL Planner and Optimizer
- Fix partition prune error in some scenarios
- Fix incorrect sub-query processing in some scenarios
- Fix some semantic parsing errors
- Fix data loss during right outer/anti join
- Fix incorrect pushing down of predicates past aggregation operators.
- Fix incorrect result header in some cases
- Fix incorrect plan when the nullsafeEquals expression (<=>) is used as the join condition
- Fix incorrect column pruning in set operation operators.
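For readers unfamiliar with the null-safe equality operator (<=>) mentioned above: unlike ordinary =, it never yields NULL and treats two NULLs as equal, so a join on <=> can match rows whose keys are both NULL. The sketch below models the two operators in Python, with None standing in for SQL NULL; the function names are hypothetical and the code is not taken from the Doris planner.

```python
def sql_equals(a, b):
    """Ordinary SQL '=': any NULL operand makes the result NULL (None)."""
    if a is None or b is None:
        return None
    return a == b

def null_safe_equals(a, b):
    """SQL '<=>': always True or False; two NULLs compare as equal."""
    if a is None or b is None:
        return a is None and b is None
    return a == b

for pair in [(1, 1), (1, 2), (1, None), (None, None)]:
    print(pair, sql_equals(*pair), null_safe_equals(*pair))
```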
4.10 Others
- Fix BE crash when the order of columns in a table is changed and then upgraded to 2.0.3.
See the complete list of improvements and bug fixes on GitHub: dev/2.0.3-merged.