事务是指一个操作,包含一个或多个SQL语句,这些语句的执行要么完全成功,要么完全失败,是一个不可分割的工作单位。

显式事务和隐式事务

显式事务

显式事务需要用户主动的开启,提交或回滚事务。 在 Doris 中,提供了 2 种显式事务:

  1. 本文中介绍的事务写方式,即:

    1. BEGIN;
    2. [INSERT, UPDATE, DELETE statement]
    3. COMMIT; / ROLLBACK;
  2. Stream Load 2PC

隐式事务

隐式事务是指用户在所执行的一条或多条SQL语句的前后,没有显式添加开启事务和提交事务的语句。

在 Doris 中,除Group Commit外,每个导入语句在开始执行时都会开启一个事务,并且在该语句执行完成之后,自动提交该事务;或执行失败后,自动回滚该事务。更多详细信息请参考: 导入事务与原子性

事务操作

开启事务

  1. BEGIN;
  2. BEGIN WITH LABEL {user_label};

如果执行该语句时,当前 Session 正处于一个事务的中间过程,那么 Doris 会忽略该语句,也可以理解为事务是不能嵌套的。

提交事务

  1. COMMIT;

用于提交在当前事务中进行的所有修改。

回滚事务

  1. ROLLBACK;

用于撤销当前事务的所有修改。

事务是 Session 级别的,如果 Session 中止或关闭,也会自动回滚该事务。

事务写入

目前 Doris 中支持 2 种方式的事务写入。

单表多次INSERT INTO VALUES写入

假如表的结构为:

  1. CREATE TABLE `dt` (
  2. `id` INT(11) NOT NULL,
  3. `name` VARCHAR(50) NULL,
  4. `score` INT(11) NULL
  5. ) ENGINE=OLAP
  6. UNIQUE KEY(`id`)
  7. DISTRIBUTED BY HASH(`id`) BUCKETS 1
  8. PROPERTIES (
  9. "replication_num" = "1"
  10. );

写入:

  1. mysql> BEGIN;
  2. Query OK, 0 rows affected (0.01 sec)
  3. {'label':'txn_insert_b55db21aad7451b-b5b6c339704920c5', 'status':'PREPARE', 'txnId':''}
  4. mysql> INSERT INTO dt (id, name, score) VALUES (1, "Emily", 25), (2, "Benjamin", 35), (3, "Olivia", 28), (4, "Alexander", 60), (5, "Ava", 17);
  5. Query OK, 5 rows affected (0.08 sec)
  6. {'label':'txn_insert_b55db21aad7451b-b5b6c339704920c5', 'status':'PREPARE', 'txnId':'10013'}
  7. mysql> INSERT INTO dt VALUES (6, "William", 69), (7, "Sophia", 32), (8, "James", 64), (9, "Emma", 37), (10, "Liam", 64);
  8. Query OK, 5 rows affected (0.00 sec)
  9. {'label':'txn_insert_b55db21aad7451b-b5b6c339704920c5', 'status':'PREPARE', 'txnId':'10013'}
  10. mysql> COMMIT;
  11. Query OK, 0 rows affected (1.02 sec)
  12. {'label':'txn_insert_b55db21aad7451b-b5b6c339704920c5', 'status':'VISIBLE', 'txnId':'10013'}

这种写入方式不仅可以实现写入的原子性,而且在 Doris 中,能提升 INSERT INTO VALUES 的写入性能。

如果用户同时开启了 Group Commit 和事务写,事务写生效。

也可以参考 Insert Into获取更多信息。

多表多次INSERT INTO SELECT, UPDATE, DELETE写入

假设有dt1, dt2, dt3 3 张表,表结构同上,表中数据为:

  1. mysql> SELECT * FROM dt1;
  2. +------+-----------+-------+
  3. | id | name | score |
  4. +------+-----------+-------+
  5. | 1 | Emily | 25 |
  6. | 2 | Benjamin | 35 |
  7. | 3 | Olivia | 28 |
  8. | 4 | Alexander | 60 |
  9. | 5 | Ava | 17 |
  10. +------+-----------+-------+
  11. 5 rows in set (0.04 sec)
  12. mysql> SELECT * FROM dt2;
  13. +------+---------+-------+
  14. | id | name | score |
  15. +------+---------+-------+
  16. | 6 | William | 69 |
  17. | 7 | Sophia | 32 |
  18. | 8 | James | 64 |
  19. | 9 | Emma | 37 |
  20. | 10 | Liam | 64 |
  21. +------+---------+-------+
  22. 5 rows in set (0.03 sec)
  23. mysql> SELECT * FROM dt3;
  24. Empty set (0.03 sec)

做事务写入,把dt1dt2的数据写入到dt3中,同时,对dt1表中的分数进行更新,dt2表中的数据进行删除:

  1. mysql> BEGIN;
  2. Query OK, 0 rows affected (0.00 sec)
  3. {'label':'txn_insert_442a6311f6c541ae-b57d7f00fa5db028', 'status':'PREPARE', 'txnId':''}
  4. # 导入任务的状态是 PREPARE
  5. mysql> INSERT INTO dt3 SELECT * FROM dt1;
  6. Query OK, 5 rows affected (0.07 sec)
  7. {'label':'txn_insert_442a6311f6c541ae-b57d7f00fa5db028', 'status':'PREPARE', 'txnId':'11024'}
  8. mysql> INSERT INTO dt3 SELECT * FROM dt2;
  9. Query OK, 5 rows affected (0.08 sec)
  10. {'label':'txn_insert_442a6311f6c541ae-b57d7f00fa5db028', 'status':'PREPARE', 'txnId':'11025'}
  11. mysql> UPDATE dt1 SET score = score + 10 WHERE id >= 4;
  12. Query OK, 2 rows affected (0.07 sec)
  13. {'label':'txn_insert_442a6311f6c541ae-b57d7f00fa5db028', 'status':'PREPARE', 'txnId':'11026'}
  14. mysql> DELETE FROM dt2 WHERE id >= 9;
  15. Query OK, 0 rows affected (0.01 sec)
  16. {'label':'txn_insert_442a6311f6c541ae-b57d7f00fa5db028', 'status':'PREPARE', 'txnId':'11027'}
  17. mysql> COMMIT;
  18. Query OK, 0 rows affected (0.03 sec)
  19. {'label':'txn_insert_442a6311f6c541ae-b57d7f00fa5db028', 'status':'VISIBLE', 'txnId':'11024'}

查询数据:

  1. # id >= 4 的分数加 10
  2. mysql> SELECT * FROM dt1;
  3. +------+-----------+-------+
  4. | id | name | score |
  5. +------+-----------+-------+
  6. | 1 | Emily | 25 |
  7. | 2 | Benjamin | 35 |
  8. | 3 | Olivia | 28 |
  9. | 4 | Alexander | 70 |
  10. | 5 | Ava | 27 |
  11. +------+-----------+-------+
  12. 5 rows in set (0.01 sec)
  13. # id >= 9 的数据被删除
  14. mysql> SELECT * FROM dt2;
  15. +------+---------+-------+
  16. | id | name | score |
  17. +------+---------+-------+
  18. | 6 | William | 69 |
  19. | 7 | Sophia | 32 |
  20. | 8 | James | 64 |
  21. +------+---------+-------+
  22. 3 rows in set (0.02 sec)
  23. # dt1 和 dt2 中已提交的数据被写入到 dt3 中
  24. mysql> SELECT * FROM dt3;
  25. +------+-----------+-------+
  26. | id | name | score |
  27. +------+-----------+-------+
  28. | 1 | Emily | 25 |
  29. | 2 | Benjamin | 35 |
  30. | 3 | Olivia | 28 |
  31. | 4 | Alexander | 60 |
  32. | 5 | Ava | 17 |
  33. | 6 | William | 69 |
  34. | 7 | Sophia | 32 |
  35. | 8 | James | 64 |
  36. | 9 | Emma | 37 |
  37. | 10 | Liam | 64 |
  38. +------+-----------+-------+
  39. 10 rows in set (0.01 sec)

隔离级别

目前 Doris 事务写提供的隔离级别为 READ COMMITTED。需要注意以下两点:

  • 事务中的多个语句,每个语句会读取到本语句开始执行时已提交的数据,如:

    1. timestamp | ------------ Session 1 ------------ | ------------ Session 2 ------------
    2. t1 | BEGIN; |
    3. t2 | # read n rows from dt1 table |
    4. | INSERT INTO dt3 SELECT * FROM dt1; |
    5. t3 | | # write 2 rows to dt1 table
    6. | | INSERT INTO dt1 VALUES(...), (...);
    7. t4 | # read n + 2 rows from dt1 table |
    8. | INSERT INTO dt3 SELECT * FROM dt1; |
    9. t5 | COMMIT; |
  • 事务中的多个语句,每个语句不能读到本事务内其它语句做出的修改,如:

    假如事务开启前,表 dt1 有 5 行,表 dt2 有 5 行,表 dt3 为空,执行以下语句:

    1. BEGIN;
    2. # dt2 中写入 5 行,事务提交后共 10 行
    3. INSERT INTO dt2 SELECT * FROM dt1;
    4. # dt3 中写入 5 行,不能读出上一步中 dt2 中新写入的数据
    5. INSERT INTO dt3 SELECT * FROM dt2;
    6. COMMIT;

    具体的例子为:

    1. # 建表并写入数据
    2. CREATE TABLE `dt1` (
    3. `id` INT(11) NOT NULL,
    4. `name` VARCHAR(50) NULL,
    5. `score` INT(11) NULL
    6. ) ENGINE=OLAP
    7. DUPLICATE KEY(`id`)
    8. DISTRIBUTED BY HASH(`id`) BUCKETS 1
    9. PROPERTIES (
    10. "replication_num" = "1"
    11. );
    12. CREATE TABLE dt2 LIKE dt1;
    13. CREATE TABLE dt3 LIKE dt1;
    14. INSERT INTO dt1 VALUES (1, "Emily", 25), (2, "Benjamin", 35), (3, "Olivia", 28), (4, "Alexander", 60), (5, "Ava", 17);
    15. INSERT INTO dt2 VALUES (6, "William", 69), (7, "Sophia", 32), (8, "James", 64), (9, "Emma", 37), (10, "Liam", 64);
    16. # 事务写
    17. BEGIN;
    18. INSERT INTO dt2 SELECT * FROM dt1;
    19. INSERT INTO dt3 SELECT * FROM dt2;
    20. COMMIT;
    21. # 查询
    22. mysql> SELECT * FROM dt2;
    23. +------+-----------+-------+
    24. | id | name | score |
    25. +------+-----------+-------+
    26. | 6 | William | 69 |
    27. | 7 | Sophia | 32 |
    28. | 8 | James | 64 |
    29. | 9 | Emma | 37 |
    30. | 10 | Liam | 64 |
    31. | 1 | Emily | 25 |
    32. | 2 | Benjamin | 35 |
    33. | 3 | Olivia | 28 |
    34. | 4 | Alexander | 60 |
    35. | 5 | Ava | 17 |
    36. +------+-----------+-------+
    37. 10 rows in set (0.01 sec)
    38. mysql> SELECT * FROM dt3;
    39. +------+---------+-------+
    40. | id | name | score |
    41. +------+---------+-------+
    42. | 6 | William | 69 |
    43. | 7 | Sophia | 32 |
    44. | 8 | James | 64 |
    45. | 9 | Emma | 37 |
    46. | 10 | Liam | 64 |
    47. +------+---------+-------+
    48. 5 rows in set (0.01 sec)

事务中执行失败的语句

当事务中的某个语句执行失败时,这个操作已经自动回滚。然而,事务中其它执行成功的语句,仍然是可提交或回滚的。当事务被成功提交后,事务中执行成功的语句的修改被应用。

比如:

  1. mysql> BEGIN;
  2. Query OK, 0 rows affected (0.00 sec)
  3. {'label':'txn_insert_c5940d31bf364f57-a48b628886415442', 'status':'PREPARE', 'txnId':''}
  4. mysql> INSERT INTO dt3 SELECT * FROM dt1;
  5. Query OK, 5 rows affected (0.07 sec)
  6. {'label':'txn_insert_c5940d31bf364f57-a48b628886415442', 'status':'PREPARE', 'txnId':'11058'}
  7. # 失败的写入自动回滚
  8. mysql> INSERT INTO dt3 SELECT * FROM dt2;
  9. ERROR 5025 (HY000): Insert has filtered data in strict mode, tracking_url=http://172.21.16.12:9082/api/_load_error_log?file=__shard_3/error_log_insert_stmt_3d1fed266ce443f2-b54d2609c2ea6b11_3d1fed266ce443f2_b54d2609c2ea6b11
  10. mysql> INSERT INTO dt3 SELECT * FROM dt2 WHERE id = 7;
  11. Query OK, 0 rows affected (0.07 sec)
  12. mysql> COMMIT;
  13. Query OK, 0 rows affected (0.02 sec)
  14. {'label':'txn_insert_c5940d31bf364f57-a48b628886415442', 'status':'VISIBLE', 'txnId':'11058'}

查询:

  1. # dt1 的数据被写入到 dt3 中,dt2 中 id = 7的数据写入成功,其它写入失败
  2. mysql> SELECT * FROM dt3;
  3. +------+----------+-------+
  4. | id | name | score |
  5. +------+----------+-------+
  6. | 1 | Emily | 25 |
  7. | 2 | Benjamin | 35 |
  8. | 3 | Olivia | 28 |
  9. | 4 | Alexande | 60 |
  10. | 5 | Ava | 17 |
  11. | 7 | Sophia | 32 |
  12. +------+----------+-------+
  13. 6 rows in set (0.01 sec)

常见问题

  • 写入的多表必须属于同一个 Database,否则会遇到错误 Transaction insert must be in the same database

  • 两种事务写入INSERT INTO SELECT, UPDATE, DELETEINSET INTO VALUES 不能混用,否则会遇到错误 Transaction insert can not insert into values and insert into select at the same time

  • Delete 操作提供了通过谓词和 Using 子句两种方式,为了保证隔离级别,在一个事务中,对相同表的删除必须在写入前,否则会遇到报错 Can not delete because there is a insert operation for the same table

  • 当从 BEGIN 开始的导入耗时超出 Doris 配置的 timeout 时,会导致事务回滚,导入失败。目前 timeout 使用的是 Session 变量 insert_timeoutquery_timeout 的最大值

  • 当使用 JDBC 连接 Doris 进行事务操作时,请在 JDBC URL 中添加 useLocalSessionState=true,否则可能会遇到错误 This is in a transaction, only insert, update, delete, commit, rollback is acceptable.

  • 存算分离模式下,事务写不支持 Merge-on-Write 表,否则会遇到报错 Transaction load is not supported for merge on write unique keys table in cloud mode