Feature

Highlight

  1. Full vectorized engine support, greatly improved performance

    In the standard SSB-100-flat benchmark, 1.2 is 2 times faster than 1.1; in the complex TPC-H 100 benchmark, 1.2 is 3 times faster than 1.1.

  2. Merge-on-Write Unique Key

    Merge-on-Write is now supported on the Unique Key model. This mode marks the data to be deleted or updated at write time, which avoids the Merge-on-Read overhead at query time and greatly improves read efficiency on the updatable data model.
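
    A minimal sketch of enabling this mode at table creation, assuming the table property is named enable_unique_key_merge_on_write as in the Unique Key model documentation. The table name, columns, bucket count, and replication setting are hypothetical.

    ```sql
    -- Hypothetical table; the merge-on-write property is the only point of interest.
    CREATE TABLE user_profile (
        user_id BIGINT NOT NULL,
        city VARCHAR(64),
        last_login DATETIME
    )
    UNIQUE KEY(user_id)
    DISTRIBUTED BY HASH(user_id) BUCKETS 8
    PROPERTIES (
        "replication_num" = "1",
        -- Assumed property name; check the documentation for your version.
        "enable_unique_key_merge_on_write" = "true"
    );
    ```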

  3. Multi Catalog

    The multi-catalog feature gives Doris the ability to quickly connect to external data sources. Users connect to an external data source with the CREATE CATALOG command, and Doris automatically maps its database and table information. After that, users can access the data in these external data sources just like ordinary tables, without having to manually create an external mapping for each table.

    Currently this feature supports the following data sources:

    1. Hive Metastore: You can access data tables including Hive, Iceberg, and Hudi. It can also be connected to data sources compatible with Hive Metastore, such as Alibaba Cloud’s DataLake Formation. Supports data access on both HDFS and object storage.
    2. Elasticsearch: Access ES data sources.
    3. JDBC: Access MySQL through the JDBC protocol.

    Documentation: https://doris.apache.org//docs/dev/lakehouse/multi-catalog

    Note: The corresponding permission levels are also changed automatically; see the “Upgrade Notice” section for details.
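
    As an illustration, connecting a Hive Metastore might look like the sketch below. The catalog name, metastore address, database, and table are hypothetical, and the property keys are assumed to match the multi-catalog documentation linked above.

    ```sql
    -- Register a Hive Metastore as a catalog (hypothetical name and address).
    CREATE CATALOG hive_prod PROPERTIES (
        "type" = "hms",
        "hive.metastore.uris" = "thrift://127.0.0.1:9083"
    );

    -- Databases and tables of the external source are mapped automatically.
    SWITCH hive_prod;
    SHOW DATABASES;

    -- Query an external table by its catalog.database.table name.
    SELECT * FROM hive_prod.sales_db.orders LIMIT 10;
    ```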

  4. Light Schema Change

    In the new version, adding or dropping columns no longer requires synchronously modifying the data files; only the metadata in FE needs to be updated, which makes millisecond-level Schema Change operations possible. This also enables DDL synchronization for upstream CDC data. For example, users can use Flink CDC to synchronize both DML and DDL from an upstream database to Doris.

    Documentation: https://doris.apache.org//docs/dev/sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE

    When creating a table, set "light_schema_change"="true" in properties.
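
    A short sketch of how the property might be used. The table and column names are hypothetical; only the light_schema_change property comes from the text above.

    ```sql
    -- Hypothetical table created with light schema change enabled.
    CREATE TABLE events (
        event_id BIGINT,
        event_type VARCHAR(32)
    )
    DUPLICATE KEY(event_id)
    DISTRIBUTED BY HASH(event_id) BUCKETS 8
    PROPERTIES (
        "replication_num" = "1",
        "light_schema_change" = "true"
    );

    -- Adding or dropping a column should then only update FE metadata.
    ALTER TABLE events ADD COLUMN source VARCHAR(64) DEFAULT "unknown";
    ALTER TABLE events DROP COLUMN source;
    ```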

  5. JDBC external table

    Users can connect to external data sources through JDBC. Currently supported:

    • MySQL
    • PostgreSQL
    • Oracle
    • SQL Server
    • ClickHouse

    Documentation: https://doris.apache.org/en/docs/dev/lakehouse/multi-catalog/jdbc

    Note: The ODBC feature will be removed in a later version. Please switch to JDBC.

  6. Java UDF

    Supports writing UDFs and UDAFs in Java, making it convenient to use custom functions from the Java ecosystem. Techniques such as off-heap memory and zero copy also greatly improve the efficiency of cross-language data access.

    Documentation: https://doris.apache.org//docs/dev/ecosystem/udf/java-user-defined-function

    Example: https://github.com/apache/doris/tree/master/samples/doris-demo
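
    Registering a Java UDF could look roughly like the following sketch. The function name, jar path, and class name are hypothetical, and the property keys are an assumption based on the Java UDF documentation linked above; verify them against your version.

    ```sql
    -- Hypothetical function backed by a class packaged in a local jar.
    CREATE FUNCTION java_udf_add_one(int) RETURNS int PROPERTIES (
        "file" = "file:///path/to/java-udf-demo-jar-with-dependencies.jar",
        "symbol" = "org.apache.doris.udf.AddOne",
        "always_nullable" = "true",
        "type" = "JAVA_UDF"
    );

    -- Call it like any built-in function.
    SELECT java_udf_add_one(1);
    ```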

  7. Remote UDF

    Supports calling remote user-defined function services through RPC, which removes language restrictions on writing UDFs. Users can implement custom functions in any programming language to carry out complex data analysis.

    Documentation: https://doris.apache.org//docs/ecosystem/udf/remote-user-defined-function

    Example: https://github.com/apache/doris/tree/master/samples/doris-demo

  8. Support for more data types

    • Array type

      Array types are supported, including nested arrays. In scenarios such as user profiles and tags, the Array type fits the business data better. The new version also implements a large number of array functions to better support the type in real-world use.

      Documentation: https://doris.apache.org//docs/dev/sql-manual/sql-reference/Data-Types/ARRAY

      Related functions: https://doris.apache.org//docs/dev/sql-manual/sql-functions/array-functions/array_max
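
      A small sketch of a tag-style table using the Array type. The table and data are hypothetical; size and array_contains are assumed to be among the array functions documented at the link above.

      ```sql
      -- Hypothetical table storing a list of tags per user.
      CREATE TABLE user_tags (
          user_id BIGINT,
          tags ARRAY<VARCHAR(32)>
      )
      DUPLICATE KEY(user_id)
      DISTRIBUTED BY HASH(user_id) BUCKETS 1
      PROPERTIES ("replication_num" = "1");

      INSERT INTO user_tags VALUES (1, ['sports', 'music']), (2, ['travel']);

      -- Count the tags per user and check for a specific tag.
      SELECT user_id, size(tags), array_contains(tags, 'music') FROM user_tags;
      ```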

    • Jsonb type

      The binary JSON data type Jsonb is supported. It provides a more compact JSON encoding and allows data to be accessed directly in the encoded format. Compared with JSON data stored as strings, it delivers a performance improvement of several times.

      Documentation: https://doris.apache.org//docs/dev/sql-manual/sql-reference/Data-Types/JSONB

      Related functions: https://doris.apache.org//docs/dev/sql-manual/sql-functions/json-functions/jsonb_parse

    • Date V2

      Scope of impact:

      1. Users must specify datev2 and datetimev2 explicitly when creating a table; date and datetime columns in existing tables are not affected.
      2. When datev2 and datetimev2 are computed together with the original date and datetime (for example, in an equi-join), the original types are cast to the new types for the computation.
      3. See the documentation for examples.

      Documentation: https://doris.apache.org/docs/dev/sql-manual/sql-reference/Data-Types/DATEV2
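
      As a sketch of point 1, the new types are declared explicitly in the column definitions. The table and values are hypothetical, and the precision argument on DATETIMEV2 is assumed to follow the linked documentation.

      ```sql
      -- Hypothetical table that opts in to the new date types.
      CREATE TABLE orders_v2 (
          order_id BIGINT,
          order_date DATEV2,
          paid_at DATETIMEV2(3)  -- assumed: 3 digits of sub-second precision
      )
      DUPLICATE KEY(order_id)
      DISTRIBUTED BY HASH(order_id) BUCKETS 1
      PROPERTIES ("replication_num" = "1");

      INSERT INTO orders_v2 VALUES (1, '2022-12-01', '2022-12-01 10:00:00.123');
      ```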

More

  1. A new memory management framework

    Documentation: https://doris.apache.org//docs/dev/admin-manual/maint-monitor/memory-management/memory-tracker

  2. Table Valued Function

    Doris implements a set of Table Valued Functions (TVFs). A TVF can be treated as an ordinary table and can appear anywhere a table can appear in SQL.

    For example, we can use the S3 TVF to import data from object storage:

        insert into tbl select * from s3("s3://bucket/file.*", "ak" = "xx", "sk" = "xxx") where c1 > 2;

    Or read data files on HDFS directly:

        insert into tbl select * from hdfs("hdfs://bucket/file.*") where c1 > 2;

    TVFs help users make full use of the expressiveness of SQL and process all kinds of data flexibly.

    Documentation:

    https://doris.apache.org//docs/dev/sql-manual/sql-functions/table-functions/s3

    https://doris.apache.org//docs/dev/sql-manual/sql-functions/table-functions/hdfs

  3. A more convenient way to create partitions

    Supports creating multiple partitions within a time range via the FROM TO syntax.
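
    A sketch of what this could look like in a table definition, assuming the FROM ... TO ... INTERVAL form described in the CREATE TABLE documentation. The table, columns, and date range are hypothetical.

    ```sql
    -- Hypothetical table; the range below is intended to create one partition per day.
    CREATE TABLE daily_metrics (
        dt DATE,
        metric_id INT,
        v BIGINT
    )
    DUPLICATE KEY(dt, metric_id)
    PARTITION BY RANGE(dt) (
        FROM ("2022-01-01") TO ("2022-01-08") INTERVAL 1 DAY
    )
    DISTRIBUTED BY HASH(metric_id) BUCKETS 4
    PROPERTIES ("replication_num" = "1");
    ```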

  4. Column renaming

    For tables with Light Schema Change enabled, column renaming is supported.

    Documentation: https://doris.apache.org//docs/dev/sql-manual/sql-reference/Data-Definition-Statements/Alter/ALTER-TABLE-RENAME
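
    For instance, on a hypothetical table events that was created with "light_schema_change"="true", a rename might be written as:

    ```sql
    -- Rename column event_type to event_kind (hypothetical names).
    ALTER TABLE events RENAME COLUMN event_type event_kind;
    ```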

  5. Richer permission management

  6. Import

  7. Support viewing the contents of the catalog recycle bin through the SHOW CATALOG RECYCLE BIN statement.

    Documentation: https://doris.apache.org//docs/dev/sql-manual/sql-reference/Show-Statements/SHOW-CATALOG-RECYCLE-BIN

  8. Support SELECT * EXCEPT syntax.

    Documentation: https://doris.apache.org//docs/dev/data-table/basic-usage
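
    A one-line sketch with a hypothetical table and column, returning every column except the one listed:

    ```sql
    -- Select all columns of users except password_hash (hypothetical names).
    SELECT * EXCEPT(password_hash) FROM users;
    ```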

  9. OUTFILE supports exporting in ORC format and supports multi-byte delimiters.

    Documentation: https://doris.apache.org//docs/dev/sql-manual/sql-reference/Data-Manipulation-Statements/OUTFILE

  10. Support configuring the number of Query Profiles that can be saved.

    Documentation: search for the FE configuration item max_query_profile_num.

  11. The DELETE statement supports IN predicates and partition pruning.

    Documentation: https://doris.apache.org//docs/dev/sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/DELETE
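
    A brief sketch with hypothetical table, partition, and key values:

    ```sql
    -- Delete several rows by key from one partition using an IN predicate.
    DELETE FROM orders PARTITION p202212 WHERE order_id IN (1001, 1002, 1003);
    ```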

  12. The default value of the time column supports using CURRENT_TIMESTAMP

    Search for “CURRENT_TIMESTAMP” in the documentation: https://doris.apache.org//docs/dev/sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE
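
    As a sketch, a column default could be declared as follows. The table and columns are hypothetical.

    ```sql
    -- created_at is filled with the current time when the insert omits it.
    CREATE TABLE audit_log (
        id BIGINT,
        message VARCHAR(256),
        created_at DATETIME DEFAULT CURRENT_TIMESTAMP
    )
    DUPLICATE KEY(id)
    DISTRIBUTED BY HASH(id) BUCKETS 1
    PROPERTIES ("replication_num" = "1");

    INSERT INTO audit_log (id, message) VALUES (1, 'started');
    ```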

  13. Add two system tables: backends, rowsets

    Documentation:

    https://doris.apache.org//docs/dev/admin-manual/system-table/backends

    https://doris.apache.org//docs/dev/admin-manual/system-table/rowsets

  14. Backup and restore

    • The Restore job supports the reserve_replica parameter, so that the number of replicas of the restored table is the same as that of the backup.

    • The Restore job supports reserve_dynamic_partition_enable parameter, so that the restored table keeps the dynamic partition enabled.

    Documentation: https://doris.apache.org//docs/dev/sql-manual/sql-reference/Data-Definition-Statements/Backup-and-Restore/RESTORE

    • Support backup and restore through the built-in libhdfs, without relying on Broker.

    Documentation: https://doris.apache.org//docs/dev/sql-manual/sql-reference/Data-Definition-Statements/Backup-and-Restore/CREATE-REPOSITORY

  15. Support data balance between multiple disks on the same machine

    Documentation:

    https://doris.apache.org//docs/dev/sql-manual/sql-reference/Database-Administration-Statements/ADMIN-REBALANCE-DISK

    https://doris.apache.org//docs/dev/sql-manual/sql-reference/Database-Administration-Statements/ADMIN-CANCEL-REBALANCE-DISK
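
    A sketch of the two statements, assuming the forms given in the linked documentation. The backend addresses are hypothetical, written as host:heartbeat_port.

    ```sql
    -- Rebalance disks on all backends.
    ADMIN REBALANCE DISK;

    -- Rebalance disks on selected backends only (hypothetical addresses).
    ADMIN REBALANCE DISK ON ("192.168.1.1:1234", "192.168.1.2:1234");

    -- Cancel a rebalance that was requested earlier.
    ADMIN CANCEL REBALANCE DISK;
    ```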

  16. Routine Load supports subscribing to Kerberos-authenticated Kafka services.

    Search for kerberos in the documentation: https://doris.apache.org//docs/dev/data-operate/import/import-way/routine-load-manual

  17. New built-in functions

    Added the following built-in functions:

    • cbrt
    • sequence_match/sequence_count
    • mask/mask_first_n/mask_last_n
    • elt
    • any/any_value
    • group_bitmap_xor
    • ntile
    • nvl
    • uuid
    • initcap
    • regexp_replace_one/regexp_extract_all
    • multi_search_all_positions/multi_match_any
    • domain/domain_without_www/protocol
    • running_difference
    • bitmap_hash64
    • murmur_hash3_64
    • to_monday
    • not_null_or_empty
    • window_funnel
    • group_bit_and/group_bit_or/group_bit_xor
    • outer combine
    • and all array functions

Upgrade Notice

Known Issues

  • Using JDK 11 will cause BE to crash; please use JDK 8 instead.

Behavior Changed

  • Permission level changes

    Because the catalog level is introduced, the corresponding user permission level will also be changed automatically. The rules are as follows:

    • GlobalPrivs and ResourcePrivs remain unchanged.
    • A CatalogPrivs level is added.
    • The internal prefix is added to the original DatabasePrivs level (indicating the db in the internal catalog).
    • The internal prefix is added to the original TablePrivs level (indicating the tbl in the internal catalog).
  • In GROUP BY and HAVING clauses, column names are matched in preference to aliases. (#14408)

  • Creating columns starting with mv_ is no longer supported. mv_ is a reserved keyword in materialized views (#14361)

  • Removed the default limit of 65535 rows that was applied to ORDER BY statements, and added the session variable default_order_by_limit to configure this limit. (#12478)

  • In the table generated by “Create Table As Select”, all string columns use the string type uniformly, and no longer distinguish varchar/char/string (#14382)

  • In the audit log, remove the word default_cluster before the db and user names. (#13499) (#11408)

  • Add sql digest field in audit log (#8919)

  • The ORDER BY logic of the UNION clause has changed. In the new version, the ORDER BY clause is executed after the UNION is executed, unless it is explicitly bound by parentheses. (#9745)

  • During the decommission operation, tablets in the recycle bin are ignored to ensure that the decommission can be completed. (#14028)

  • The returned result of Decimal will be displayed according to the precision declared in the original column, or according to the precision specified in the cast function. (#13437)

  • Changed column name length limit from 64 to 256 (#14671)

  • Changes to FE configuration items

    • The enable_vectorized_load parameter is enabled by default. (#11833)

    • Increased create_table_timeout value. The default timeout for table creation operations will be increased. (#13520)

    • Modify stream_load_default_timeout_second default value to 3 days.

    • Modify the default value of alter_table_timeout_second to one month.

    • Add the parameter max_replica_count_when_schema_change to limit the number of replicas involved in an alter job; the default is 100000. (#12850)

    • Add disable_iceberg_hudi_table. Iceberg and Hudi external tables are disabled by default; the multi-catalog feature is recommended instead. (#13932)

  • Changes to BE configuration items

    • Removed the disable_stream_load_2pc parameter. Stream Load with 2PC can be used directly. (#13520)

    • Modify tablet_rowset_stale_sweep_time_sec from 1800 seconds to 300 seconds.

    • Redesigned the names of compaction-related configuration items (#13495)

    • Revisited the memory optimization parameters (#13781)

  • Session variable changes

    • The variable enable_insert_strict now defaults to true. As a result, some insert operations that previously succeeded while inserting illegal values will no longer be executed. (#11866)

    • The variable enable_local_exchange now defaults to true (#13292)

    • Data is transmitted with lz4 compression by default, controlled by the variable fragment_transmission_compression_codec (#11955)

    • Add skip_storage_engine_merge variable for debugging unique or agg model data (#11952)

      Documentation: https://doris.apache.org//docs/dev/advanced/variables

  • The BE startup script checks via /proc/sys/vm/max_map_count that the value is greater than 2,000,000; otherwise, startup fails. (#11052)

  • Removed mini load interface (#10520)

  • FE Metadata Version

    FE Meta Version changed from 107 to 114, and cannot be rolled back after upgrading.

During Upgrade

  1. Upgrade preparation

    • The lib and bin directories need to be replaced (the start/stop scripts have been modified).

    • BE also needs JAVA_HOME to be configured in order to support JDBC Table and Java UDF.

    • The default JVM Xmx parameter in fe.conf is changed to 8GB.

  2. Possible errors during the upgrade process

    • The repeat function cannot be used and reports the error "vectorized repeat function cannot be executed". You can turn off the vectorized execution engine before upgrading. (#13868)

    • Schema change fails with the error "desc_tbl is not set. Maybe the FE version is not equal to the BE". (#13822)

    • Vectorized hash join cannot be used and reports the error "vectorized hash join cannot be executed". You can turn off the vectorized execution engine before upgrading. (#13753)

    These errors disappear once the upgrade is fully completed.
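
    Turning the vectorized engine off before the upgrade might look like the sketch below. It assumes the switch is the variable enable_vectorized_engine and that setting it globally is appropriate for your cluster; neither is stated in these notes.

    ```sql
    -- Before upgrading: disable the vectorized execution engine (assumed variable name).
    SET GLOBAL enable_vectorized_engine = false;

    -- After all FE and BE nodes are upgraded: enable it again.
    SET GLOBAL enable_vectorized_engine = true;
    ```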

Performance Impact

  • By default, JeMalloc is used as the memory allocator of the new version BE, replacing TcMalloc (#13367)

  • The batch size in the tablet sink is modified to be at least 8K. (#13912)

  • Disable chunk allocator by default (#13285)

API Change

Big Thanks

Thanks to ALL who contributed to this release! (alphabetically)

  1. @924060929
  2. @a19920714liou
  3. @adonis0147
  4. @Aiden-Dong
  5. @aiwenmo
  6. @AshinGau
  7. @b19mud
  8. @BePPPower
  9. @BiteTheDDDDt
  10. @bridgeDream
  11. @ByteYue
  12. @caiconghui
  13. @CalvinKirs
  14. @cambyzju
  15. @caoliang-web
  16. @carlvinhust2012
  17. @catpineapple
  18. @ccoffline
  19. @chenlinzhong
  20. @chovy-3012
  21. @coderjiang
  22. @cxzl25
  23. @dataalive
  24. @dataroaring
  25. @dependabot[bot]
  26. @dinggege1024
  27. @DongLiang-0
  28. @Doris-Extras
  29. @eldenmoon
  30. @EmmyMiao87
  31. @englefly
  32. @FreeOnePlus
  33. @Gabriel39
  34. @gaodayue
  35. @geniusjoe
  36. @gj-zhang
  37. @gnehil
  38. @GoGoWen
  39. @HappenLee
  40. @hello-stephen
  41. @Henry2SS
  42. @hf200012
  43. @huyuanfeng2018
  44. @jacktengg
  45. @jackwener
  46. @jeffreys-cat
  47. @Jibing-Li
  48. @JNSimba
  49. @Kikyou1997
  50. @Lchangliang
  51. @LemonLiTree
  52. @lexoning
  53. @liaoxin01
  54. @lide-reed
  55. @link3280
  56. @liutang123
  57. @liuyaolin
  58. @LOVEGISER
  59. @lsy3993
  60. @luozenglin
  61. @luzhijing
  62. @madongz
  63. @morningman
  64. @morningman-cmy
  65. @morrySnow
  66. @mrhhsg
  67. @Myasuka
  68. @myfjdthink
  69. @nextdreamblue
  70. @pan3793
  71. @pangzhili
  72. @pengxiangyu
  73. @platoneko
  74. @qidaye
  75. @qzsee
  76. @SaintBacchus
  77. @SeekingYang
  78. @smallhibiscus
  79. @sohardforaname
  80. @song7788q
  81. @spaces-X
  82. @ssusieee
  83. @stalary
  84. @starocean999
  85. @SWJTU-ZhangLei
  86. @TaoZex
  87. @timelxy
  88. @Wahno
  89. @wangbo
  90. @wangshuo128
  91. @wangyf0555
  92. @weizhengte
  93. @weizuo93
  94. @wsjz
  95. @wunan1210
  96. @xhmz
  97. @xiaokang
  98. @xiaokangguo
  99. @xinyiZzz
  100. @xy720
  101. @yangzhg
  102. @Yankee24
  103. @yeyudefeng
  104. @yiguolei
  105. @yinzhijian
  106. @yixiutt
  107. @yuanyuan8983
  108. @zbtzbtzbt
  109. @zenoyang
  110. @zhangboya1
  111. @zhangstar333
  112. @zhannngchen
  113. @ZHbamboo
  114. @zhengshiJ
  115. @zhenhb
  116. @zhqu1148980644
  117. @zuochunwei
  118. @zy-kkk