Apache Doris version 2.1.5 was officially released on July 24, 2024. In this update, we have optimized various functional experiences for data lakehouse and high concurrency scenarios, functionalities of asynchronous materialized views. Additionaly, we have implemented several improvemnents and bug fixes to enhance the stability.

Quick Download: https://doris.apache.org/download/

GitHub Release: https://github.com/apache/doris/releases

Behavior changes

  • The default connection pool size for the JDBC Catalog has been increased from 10 to 30 to prevent connection exhaustion in high-concurrency scenarios. #37023.

  • The system’s reserved memory (low water mark) has been adjusted to min(6.4GB, MemTotal * 5%) to mitigate BE OOM issues.

  • When processing multiple statements in a single request, only the last statement’s result is returned if the CLIENT_MULTI_STATEMENTS flag is not set.

  • Direct modifications to data in asynchronous materialized views are no longer permitted.#37129

  • A session variable use_max_length_of_varchar_in_ctas has been added to control the behavior of varchar and char type length generation during CTAS (Create Table As Select). The default value is true. When set to false, the derived varchar length is used instead of the maximum length. #37284

  • Statistics collection now defaults to enabling the functionality of estimating the number of rows in Hive tables based on file size. #37694

  • Transparent rewrite for asynchronous materialized views is now enabled by default. #35897

  • Transparent rewrite utilizes partitioned materialized views. If partitions fail, the base tables are unioned with the materialized view to ensure data correctness. #35897

New features

Lakehouse

  • The session variable read_csv_empty_line_as_null can be used to control whether empty lines are ignored when reading CSV format files. #37153

    By default, empty lines are ignored. When set to true, empty lines will be read as rows where all columns are null.

  • Compatibility with Presto’s complex type output format can be enabled by setting serde_dialect="presto". #37253

Multi-Table Materialized View

  • Supports non-deterministic functions in materialized view building. #37651

  • Atomically replaces definitions of asynchronous materialized views. #37147

  • Views creation statements can be viewed via SHOW CREATE MATERIALIZED VIEW. #37125

  • Transparent rewrites for multi-dimensional aggregation and non-aggregate queries. #37436 #37497

  • Supports DISTINCT aggregations with key columns and partitioning for roll-ups. #37651

  • Support for partitioning materialized views to roll up partitions using date_trunc #31812 #35562

  • Partitioned table-valued functions (TVFs) are supported. #36479

Semi-Structured Data Management

  • Tables using the VARIANT type now support partial column updates. #34925

  • PreparedStatement support is now enabled by default. #36581

  • The VARIANT type can be exported to CSV format. #37857

  • explode_json_object function transposes JSON Object rows into columns. #36887

  • The ES Catalog now maps ES NESTED or OBJECT types to the Doris JSON type.#37101

  • By default, support_phrase is enabled for inverted indexes with specified analyzers to improve the performance of match_phrase series queries. #37949

Query Optimizer

  • Support for explaining DELETE FROM statements. #37100

  • Support for hint form of constant expression parameters #37988

Memory Management

  • Added an HTTP API to clear the cache. #36599

Permissions

  • Support for authorization of resources within Table-Valued Functions (TVFs) #37132

Improvements

Lakehouse

  • Upgraded Paimon to version 0.8.1

  • Fixes ClassNotFoundException for org.apache.commons.lang.StringUtils when querying Paimon tables. #37512

  • Added support for Tencent Cloud LakeFS. #36891

  • Optimized the timeout duration when fetching file lists for external table queries. #36842

  • Configurable via the session variable fetch_splits_max_wait_time_ms.

  • Improved default connection logic for SQLServer JDBC Catalog. #36971

    By default, the connection encryption settings are not intervened. Only when force_sqlserver_jdbc_encrypt_false is set to true, encrypt=false is forcibly added to the JDBC URL to reduce authentication errors. This allows for more flexible control over encryption behavior, enabling it to be turned on or off as needed.

  • Added serde properties to the show create table statements for Hive tables. #37096

  • Changed the default cache time for Hive table lists on the FE from 1 day to 4 hours

  • Data export (Export/Outfile) now supports specifying compression formats for Parquet and ORC

    For more information, please refer to docs.

  • When creating a table using CTAS+TVF, partition columns in the TVF are automatically mapped to Varchar(65533) instead of String, allowing them to be used as partition columns for internal tables #37161

  • Optimized the number of metadata accesses for Hive write operations #37127

  • ES Catalog now supports mapping nested/object types to Doris’s Json type. #37182

  • Improved error messages when connecting to Oracle using older versions of the ojdbc driver #37634

  • When Hudi tables return an empty set during Incremental Read, Doris now also returns an empty set instead of error #37636

  • Fixed an issue where inner-outer table join queries could lead to FE timeouts in some cases #37757

  • Fixed an issue with FE metadata replay errors during upgrades from older versions to newer versions when the Hive metastore event listener is enabled. #37757

Multi-Table Materialized View

  • Automate key column selection for asynchronous materialized views. #36601

  • Support date_trunc in materialized view partition definitions.. #35562

  • Enable transparent rewrites across nested materialized view aggregations. #37651

  • Asynchronous materialized views remain available when schema changes do not affect the correctness of their data. #37122

  • Improve planning speed for transparent rewrites. #37935

  • When calculating the availability of asynchronous materialized views, the current refresh status is no longer taken into account. #36617

Semi-Structured Data Management

  • Optimize DESC performance for viewing VARIANT sub-columns through sampling. #37217

  • Support for special JSON data with empty keys in the JSON type. #36762

Inverted Index

  • Reduce latency by minimizing the invocation of inverted index exists to avoid delays in accessing object storage. #36945

  • Optimize the overhead of the inverted index query process. #35357

  • Prevent inverted indices in materialized views. #36869

Query Optimizer

  • When both sides of a comparison expression are literals, the string literal will attempt to convert to the type of the other side. #36921

  • Refactored the sub-path pushdown functionality for the variant type, now better supporting complex pushdown scenarios. #36923

  • Optimized the logic for calculating the cost of materialized views, enabling more accurate selection of lower-cost materialized views. #37098

  • Improved the SQL cache planning speed when using user variables in SQL. #37119

  • Optimized the row estimation logic for NOT NULL expressions, resulting in better performance when NOT NULL is present in queries. #37498

  • Optimized the null rejection derivation logic for LIKE expressions. #37864

  • Improved error messages when querying a specific partition fails, making it clearer which table is causing the issue. #37280

Query Execution

  • Improved the performance of the bitmap_union operator up to 3 times in certain scenarios.

  • Enhanced the reading performance of Arrow Flight in ARM environments.

  • Optimized the execution performance of the explode, explode_map, and explode_json functions.

Data Loading

  • Support setting max_filter_ratio for INSERT INTO ... FROM TABLE VALUE FUNCTION

Bug fixes

Lakehouse

  • Fixed an issue that caused BE crashes in some cases when querying Parquet format #37086

  • Fixed an issue where BE printed excessive logs when querying Parquet format. #37012

  • Fixed an issue where the FE side created a large number of duplicate FileSystem objects in some cases. #37142

  • Fixed an issue where transaction information was not cleaned up after writing to Hive in some cases. #37172

  • Fixed a thread leak issue caused by Hive table write operations in some cases. #37247

  • Fixed an issue where Hive Text format row and column delimiters could not be correctly obtained in some cases. #37188

  • Fixed a concurrency issue when reading lz4 compressed blocks in some cases. #37187

  • Fixed an issue where count(*) on Iceberg tables returned incorrect results in some cases. #37810

  • Fixed an issue where creating a Paimon catalog based on MinIO caused FE metadata replay errors in some cases. #37249

  • Fixed an issue where using Ranger to create a catalog caused the client to hang in some cases. #37551

Multi-Table Materialized View

  • Fixed an issue where adding new partitions to the base table could lead to incorrect results after partition aggregation roll-up rewrites. #37651

  • Fixed an issue where the materialized view partition status was not set to out-of-sync after deleting associated base table partitions. #36602

  • Fixed an occasional deadlock issue during asynchronous materialized view builds. #37133

  • Fixed an occasional “nereids cost too much time” error when refreshing a large number of partitions in a single asynchronous materialized view refresh. #37589

  • Fixed an issue where an asynchronous materialized view could not be created if the final select list contained a null literal. #37281

  • Fixed an issue with single-table materialized views where, even though the aggregation materialized view was successfully rewritten, the CBO did not select it. #35721 #36058

  • Fixed an issue where partition derivation failed when building a partitioned materialized view with both join inputs being aggregations. #34781

Semi-Structured Data Management

  • Fixed issues with VARIANT in special cases such as concurrency and abnormal data.#37976 #37839 #37794 #37674 #36997

  • Fixed coredump issues when using VARIANT in unsupported SQL. #37640

  • Fixed coredump issues related to MAP data type when upgrading from 1.x to 2.x or higher versions. #36937

  • Improved ES Catalog support for Array types. #36936

Inverted Index

  • Fixed an issue where DROP INDEX for Inverted Index v2 did not delete metadata. #37646

  • Fixed query accuracy issues when string length exceeded the “ignore above” threshold. #37679

  • Fixed issues with index size statistics. #37232 #37564

Query Optimizer

  • Fixed an issue that prevented import operations from executing due to the use of reserved keywords. #35938

  • Fixed a type error where char(255) was incorrectly recorded as char(1) when creating a table. #37671

  • Fixed incorrect results when the join expression in a correlated subquery was a complex expression. #37683

  • Fixed a potential issue with incorrect bucket pruning for decimal types. #38013

  • Fixed incorrect aggregation operator results when pipeline local shuffle was enabled in certain scenarios. #38016

  • Fixed planning errors that could occur when equal expressions existed in aggregation operators. #36622

  • Fixed planning errors that could occur when lambda expressions were present in aggregation operators. #37285

  • Fixed an issue where a literal generated from a window function being optimized to a literal had the wrong type, preventing execution. #37283

  • Fixed an issue with the null attribute being incorrectly output by the aggregate function foreach combinator. #37980

  • Fixed an issue where the acos function could not be planned when its parameter was a literal out of range. #37996

  • Fixed planning errors when specifying partitions for a query on a synchronized materialized view. #36982

  • Fixed occasional Null Pointer Exceptions (NPEs) during planning. #38024

Query Execution

  • Fixed an error in delete where statements when using decimal data types as conditions. #37801

  • Fixed an issue where BE memory was not released after query execution ended. #37792 #37297

  • Fixed a problem where audit logs occupied too much FE memory under high QPS scenarios. #37786

  • Fixed BE core dumps when the sleep function received illegal input values. #37681

  • Fixed an error encountered during sync filter size execution. #37103

  • Fixed incorrect results when using time zones during execution. #37062

  • Fixed incorrect results when casting strings to integers. #36788

  • Fixed query errors when using the Arrow Flight protocol with pipelinex enabled. #35804

  • Fixed errors when casting strings to dates/datetimes. #35637

  • Fixed BE core dumps during large table join queries using <=>. #36263

Storage Management

  • Fixed the issue of invisible DELETE SIGN data encountered during column update and write operations. #36755

  • Optimized FE’s memory usage during schema changes. #36756

  • Fixed the issue where BE would hang during restart due to transactions not being aborted #36437

  • Fixed occasional errors when changing from NOT NULL to NULL data types. #36389

  • Optimized replica repair scheduling when BE goes down. #36897

  • Supported round-robin disk selection for tablet creation on a single BE. #36900

  • Fixed query error -230 caused by slow publishing. #36222

  • Improved the speed of partition balancing. #36976

  • Controlled segment cache using the number of file descriptors (FDs) and memory to avoid FD exhaustion. #37035

  • Fixed potential replica loss caused by concurrent clone and alter operations #36858

  • Fixed the issue of not being able to adjust column order.#37226

  • Prohibited certain schema change operations on auto-increment columns. #37331

  • Fixed inaccurate error reporting for DELETE operations. #37374

  • Adjusted the trash expiration time on BE side to one day. #37409

  • Optimized compaction memory usage and scheduling. #37491

  • Checked for potential oversized backups causing FE restarts. #37466

  • Restored dynamic partition deletion policies and cross-partition behaviors to 2.1.3. #37570 #37506

  • Fixed errors related to decimal types in DELETE predicates. #37710

Data Loading

  • Fixed data invisibility issues caused by race conditions in error handling during imports #36744

  • Added support for hhl_from_base64 in streamload imports. #36819

  • Fixed potential FE OOM issues when importing very large numbers of tablets for a single table. #36944

  • Fixed possible auto-increment column duplication during FE master-slave switchovers. #36961

  • Fixed errors when inserting into select with auto-increment columns. #37029

  • Reduced the number of data flush threads to optimize memory usage. #37092

  • Improved automatic recovery and error messaging for routine load tasks. #37371

  • Increased the default batch size for routine load. #37388

  • Fixed routine load task stoppage due to Kafka EOF expiration. #37983

  • Fixed coredump issues in multi-table streaming. #37370

  • Fixed premature backpressure caused by inaccurate memory estimation in groupcommit. #37379

  • Optimized BE-side thread usage in groupcommit. #37380

  • Fixed the issue of no error URL when data was not partitioned. #37401

  • Fixed potential memory misoperations during imports. #38021

Merge on Write Unique Key

  • Reduced memory usage during compaction for primary key tables. #36968

  • Fixed potential duplicate data issues when primary key replica cloning fails. #37229

Permissions

  • Fixed the issue of missing authorization when a table-valued function references a resource. #37132

  • Fixed the issue where the SHOW ROLE statement did not include workload group permissions. #36032

  • Fixed the issue where executing two statements simultaneously when creating a row policy could cause FE to fail to restart. #37342

  • Fixed the issue where, in some cases, upgrading from an older version could result in FE metadata replay failures due to row policies. #37342

Others

  • Fixed the issue of compute nodes participating in internal table creation. #37961

  • Fixed the read lag issue when enable_strong_read_consistency is set to true. #37641