TPC-H Benchmark

TPC-H 是一个决策支持基准(Decision Support Benchmark),它由一套面向业务的特别查询和并发数据修改组成。查询和填充数据库的数据具有广泛的行业相关性。这个基准测试演示了检查大量数据、执行高度复杂的查询并回答关键业务问题的决策支持系统。TPC-H 报告的性能指标称为 TPC-H 每小时复合查询性能指标 (QphH@Size),反映了系统处理查询能力的多个方面。这些方面包括执行查询时所选择的数据库大小,由单个流提交查询时的查询处理能力,以及由多个并发用户提交查询时的查询吞吐量。

本文档主要介绍 Doris 在 TPC-H 1000G 测试集上的性能表现。

在 TPC-H 标准测试数据集上的 22 个查询上,我们基于 Apache Doris 2.1.1-rc03 和 Apache Doris 2.0.6 版本进行了对比测试。

TPCDS_1000G

1. 硬件环境

硬件配置说明
机器数量4 台腾讯云主机(1 个 FE,3 个 BE)
CPUAMD EPYC™ Milan(2.55GHz/3.5GHz) 48 核
内存192G
网络带宽21Gbps
磁盘高性能云硬盘

2. 软件环境

  • Doris 部署 3BE 1FE
  • 内核版本:Linux version 5.4.0-96-generic (buildd@lgw01-amd64-051)
  • 操作系统版本:Ubuntu 20.04 LTS (Focal Fossa)
  • Doris 软件版本:Apache Doris 2.1.1-rc03、Apache Doris 2.0.6
  • JDK:openjdk version “1.8.0_131”

3. 测试数据量

整个测试模拟生成 TPCH 1000G 的数据分别导入到 Apache Doris 2.1.1-rc03 和 Apache Doris 2.0.6 版本进行测试,下面是表的相关说明及数据量。

TPC-H 表名行数备注
REGION5区域表
NATION25国家表
SUPPLIER1000 万供应商表
PART2 亿零部件表
PARTSUPP8 亿零部件供应表
CUSTOMER1.5 亿客户表
ORDERS15 亿订单表
LINEITEM60 亿订单明细表

4. 测试 SQL

TPCH 22 个测试查询语句: TPCH-Query-SQL

5. 测试结果

这里我们使用 Apache Doris 2.1.1-rc03 和 Apache Doris 2.0.6 版本进行对比测试,测试结果如下:

QueryApache Doris 2.1.1-rc03 (ms)Apache Doris 2.0.6 (ms)
Q172407638
Q2249377
Q335284389
Q415341903
Q554576425
Q6159150
Q715983141
Q829582792
Q91480324732
Q1067437315
Q11414395
Q12371417
Q1362038095
Q14485681
Q1522461459
Q1612521382
Q1714611204
Q181042811386
Q1923292172
Q205691455
Q2155746570
Q2210421517
合计7664395595

6. 环境准备

请先参照 官方文档 进行 Doris 的安装部署,以获得一个正常运行中的 Doris 集群(至少包含 1 FE 1 BE,推荐 1 FE 3 BE)。

7. 数据准备

7.1 下载安装 TPC-H 数据生成工具

执行以下脚本下载并编译 tpch-tools 工具。

  1. sh bin/build-tpch-dbgen.sh

安装成功后,将在 TPC-H_Tools_v3.0.0/ 目录下生成 dbgen 二进制文件。

7.2 生成 TPC-H 测试集

执行以下脚本生成 TPC-H 数据集:

  1. sh bin/gen-tpch-data.sh -s 1000

注 1:通过 sh gen-tpch-data.sh -h 查看脚本帮助。

注 2:数据会以 .tbl 为后缀生成在 tpch-data/ 目录下。文件总大小约 1000GB。生成时间可能在数分钟到 1 小时不等。

注 3:默认生成 100G 的标准测试数据集

7.3 建表

7.3.1 准备 doris-cluster.conf 文件

在调用导入脚本前,需要将 FE 的 ip 端口等信息写在 doris-cluster.conf 文件中。

文件位置在 ${DORIS_HOME}/tools/tpch-tools/conf/ 目录下。

文件内容包括 FE 的 ip,HTTP 端口,用户名,密码以及待导入数据的 DB 名称:

  1. # Any of FE host
  2. export FE_HOST='127.0.0.1'
  3. # http_port in fe.conf
  4. export FE_HTTP_PORT=8030
  5. # query_port in fe.conf
  6. export FE_QUERY_PORT=9030
  7. # Doris username
  8. export USER='root'
  9. # Doris password
  10. export PASSWORD=''
  11. # The database where TPC-H tables located
  12. export DB='tpch'

7.3.2 执行以下脚本生成创建 TPC-H 表

  1. sh bin/create-tpch-tables.sh -s 1000

或者复制 create-tpch-tables.sql 中的建表语句,在 Doris 中执行。

7.4 导入数据

通过下面的命令执行数据导入:

  1. sh bin/load-tpch-data.sh

7.5 检查导入数据

执行下面的 SQL 语句检查导入的数据与上面的数据量一致。

  1. select count(*) from lineitem;
  2. select count(*) from orders;
  3. select count(*) from partsupp;
  4. select count(*) from part;
  5. select count(*) from customer;
  6. select count(*) from supplier;
  7. select count(*) from nation;
  8. select count(*) from region;
  9. select count(*) from revenue0;

7.6 查询测试

7.6.1 执行查询脚本

执行上面的测试 SQL 或者 执行下面的命令

  1. sh bin/run-tpch-queries.sh -s 1000

7.6.2 单个 SQL 执行

下面是测试时使用的 SQL 语句,你也可以从代码库里获取最新的 SQL。最新测试查询语句地址:TPC-H 测试查询语句

  1. --Q1
  2. select
  3. l_returnflag,
  4. l_linestatus,
  5. sum(l_quantity) as sum_qty,
  6. sum(l_extendedprice) as sum_base_price,
  7. sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
  8. sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
  9. avg(l_quantity) as avg_qty,
  10. avg(l_extendedprice) as avg_price,
  11. avg(l_discount) as avg_disc,
  12. count(*) as count_order
  13. from
  14. lineitem
  15. where
  16. l_shipdate <= date '1998-12-01' - interval '90' day
  17. group by
  18. l_returnflag,
  19. l_linestatus
  20. order by
  21. l_returnflag,
  22. l_linestatus;
  23. --Q2
  24. select
  25. s_acctbal,
  26. s_name,
  27. n_name,
  28. p_partkey,
  29. p_mfgr,
  30. s_address,
  31. s_phone,
  32. s_comment
  33. from
  34. part,
  35. supplier,
  36. partsupp,
  37. nation,
  38. region
  39. where
  40. p_partkey = ps_partkey
  41. and s_suppkey = ps_suppkey
  42. and p_size = 15
  43. and p_type like '%BRASS'
  44. and s_nationkey = n_nationkey
  45. and n_regionkey = r_regionkey
  46. and r_name = 'EUROPE'
  47. and ps_supplycost = (
  48. select
  49. min(ps_supplycost)
  50. from
  51. partsupp,
  52. supplier,
  53. nation,
  54. region
  55. where
  56. p_partkey = ps_partkey
  57. and s_suppkey = ps_suppkey
  58. and s_nationkey = n_nationkey
  59. and n_regionkey = r_regionkey
  60. and r_name = 'EUROPE'
  61. )
  62. order by
  63. s_acctbal desc,
  64. n_name,
  65. s_name,
  66. p_partkey
  67. limit 100;
  68. --Q3
  69. select
  70. l_orderkey,
  71. sum(l_extendedprice * (1 - l_discount)) as revenue,
  72. o_orderdate,
  73. o_shippriority
  74. from
  75. customer,
  76. orders,
  77. lineitem
  78. where
  79. c_mktsegment = 'BUILDING'
  80. and c_custkey = o_custkey
  81. and l_orderkey = o_orderkey
  82. and o_orderdate < date '1995-03-15'
  83. and l_shipdate > date '1995-03-15'
  84. group by
  85. l_orderkey,
  86. o_orderdate,
  87. o_shippriority
  88. order by
  89. revenue desc,
  90. o_orderdate
  91. limit 10;
  92. --Q4
  93. select
  94. o_orderpriority,
  95. count(*) as order_count
  96. from
  97. orders
  98. where
  99. o_orderdate >= date '1993-07-01'
  100. and o_orderdate < date '1993-07-01' + interval '3' month
  101. and exists (
  102. select
  103. *
  104. from
  105. lineitem
  106. where
  107. l_orderkey = o_orderkey
  108. and l_commitdate < l_receiptdate
  109. )
  110. group by
  111. o_orderpriority
  112. order by
  113. o_orderpriority;
  114. --Q5
  115. select
  116. n_name,
  117. sum(l_extendedprice * (1 - l_discount)) as revenue
  118. from
  119. customer,
  120. orders,
  121. lineitem,
  122. supplier,
  123. nation,
  124. region
  125. where
  126. c_custkey = o_custkey
  127. and l_orderkey = o_orderkey
  128. and l_suppkey = s_suppkey
  129. and c_nationkey = s_nationkey
  130. and s_nationkey = n_nationkey
  131. and n_regionkey = r_regionkey
  132. and r_name = 'ASIA'
  133. and o_orderdate >= date '1994-01-01'
  134. and o_orderdate < date '1994-01-01' + interval '1' year
  135. group by
  136. n_name
  137. order by
  138. revenue desc;
  139. --Q6
  140. select
  141. sum(l_extendedprice * l_discount) as revenue
  142. from
  143. lineitem
  144. where
  145. l_shipdate >= date '1994-01-01'
  146. and l_shipdate < date '1994-01-01' + interval '1' year
  147. and l_discount between .06 - 0.01 and .06 + 0.01
  148. and l_quantity < 24;
  149. --Q7
  150. select
  151. supp_nation,
  152. cust_nation,
  153. l_year,
  154. sum(volume) as revenue
  155. from
  156. (
  157. select
  158. n1.n_name as supp_nation,
  159. n2.n_name as cust_nation,
  160. extract(year from l_shipdate) as l_year,
  161. l_extendedprice * (1 - l_discount) as volume
  162. from
  163. supplier,
  164. lineitem,
  165. orders,
  166. customer,
  167. nation n1,
  168. nation n2
  169. where
  170. s_suppkey = l_suppkey
  171. and o_orderkey = l_orderkey
  172. and c_custkey = o_custkey
  173. and s_nationkey = n1.n_nationkey
  174. and c_nationkey = n2.n_nationkey
  175. and (
  176. (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY')
  177. or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')
  178. )
  179. and l_shipdate between date '1995-01-01' and date '1996-12-31'
  180. ) as shipping
  181. group by
  182. supp_nation,
  183. cust_nation,
  184. l_year
  185. order by
  186. supp_nation,
  187. cust_nation,
  188. l_year;
  189. --Q8
  190. select
  191. o_year,
  192. sum(case
  193. when nation = 'BRAZIL' then volume
  194. else 0
  195. end) / sum(volume) as mkt_share
  196. from
  197. (
  198. select
  199. extract(year from o_orderdate) as o_year,
  200. l_extendedprice * (1 - l_discount) as volume,
  201. n2.n_name as nation
  202. from
  203. part,
  204. supplier,
  205. lineitem,
  206. orders,
  207. customer,
  208. nation n1,
  209. nation n2,
  210. region
  211. where
  212. p_partkey = l_partkey
  213. and s_suppkey = l_suppkey
  214. and l_orderkey = o_orderkey
  215. and o_custkey = c_custkey
  216. and c_nationkey = n1.n_nationkey
  217. and n1.n_regionkey = r_regionkey
  218. and r_name = 'AMERICA'
  219. and s_nationkey = n2.n_nationkey
  220. and o_orderdate between date '1995-01-01' and date '1996-12-31'
  221. and p_type = 'ECONOMY ANODIZED STEEL'
  222. ) as all_nations
  223. group by
  224. o_year
  225. order by
  226. o_year;
  227. --Q9
  228. select
  229. nation,
  230. o_year,
  231. sum(amount) as sum_profit
  232. from
  233. (
  234. select
  235. n_name as nation,
  236. extract(year from o_orderdate) as o_year,
  237. l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
  238. from
  239. part,
  240. supplier,
  241. lineitem,
  242. partsupp,
  243. orders,
  244. nation
  245. where
  246. s_suppkey = l_suppkey
  247. and ps_suppkey = l_suppkey
  248. and ps_partkey = l_partkey
  249. and p_partkey = l_partkey
  250. and o_orderkey = l_orderkey
  251. and s_nationkey = n_nationkey
  252. and p_name like '%green%'
  253. ) as profit
  254. group by
  255. nation,
  256. o_year
  257. order by
  258. nation,
  259. o_year desc;
  260. --Q10
  261. select
  262. c_custkey,
  263. c_name,
  264. sum(l_extendedprice * (1 - l_discount)) as revenue,
  265. c_acctbal,
  266. n_name,
  267. c_address,
  268. c_phone,
  269. c_comment
  270. from
  271. customer,
  272. orders,
  273. lineitem,
  274. nation
  275. where
  276. c_custkey = o_custkey
  277. and l_orderkey = o_orderkey
  278. and o_orderdate >= date '1993-10-01'
  279. and o_orderdate < date '1993-10-01' + interval '3' month
  280. and l_returnflag = 'R'
  281. and c_nationkey = n_nationkey
  282. group by
  283. c_custkey,
  284. c_name,
  285. c_acctbal,
  286. c_phone,
  287. n_name,
  288. c_address,
  289. c_comment
  290. order by
  291. revenue desc
  292. limit 20;
  293. --Q11
  294. select
  295. ps_partkey,
  296. sum(ps_supplycost * ps_availqty) as value
  297. from
  298. partsupp,
  299. supplier,
  300. nation
  301. where
  302. ps_suppkey = s_suppkey
  303. and s_nationkey = n_nationkey
  304. and n_name = 'GERMANY'
  305. group by
  306. ps_partkey having
  307. sum(ps_supplycost * ps_availqty) > (
  308. select
  309. sum(ps_supplycost * ps_availqty) * 0.000002
  310. from
  311. partsupp,
  312. supplier,
  313. nation
  314. where
  315. ps_suppkey = s_suppkey
  316. and s_nationkey = n_nationkey
  317. and n_name = 'GERMANY'
  318. )
  319. order by
  320. value desc;
  321. --Q12
  322. select
  323. l_shipmode,
  324. sum(case
  325. when o_orderpriority = '1-URGENT'
  326. or o_orderpriority = '2-HIGH'
  327. then 1
  328. else 0
  329. end) as high_line_count,
  330. sum(case
  331. when o_orderpriority <> '1-URGENT'
  332. and o_orderpriority <> '2-HIGH'
  333. then 1
  334. else 0
  335. end) as low_line_count
  336. from
  337. orders,
  338. lineitem
  339. where
  340. o_orderkey = l_orderkey
  341. and l_shipmode in ('MAIL', 'SHIP')
  342. and l_commitdate < l_receiptdate
  343. and l_shipdate < l_commitdate
  344. and l_receiptdate >= date '1994-01-01'
  345. and l_receiptdate < date '1994-01-01' + interval '1' year
  346. group by
  347. l_shipmode
  348. order by
  349. l_shipmode;
  350. --Q13
  351. select
  352. c_count,
  353. count(*) as custdist
  354. from
  355. (
  356. select
  357. c_custkey,
  358. count(o_orderkey) as c_count
  359. from
  360. customer left outer join orders on
  361. c_custkey = o_custkey
  362. and o_comment not like '%special%requests%'
  363. group by
  364. c_custkey
  365. ) as c_orders
  366. group by
  367. c_count
  368. order by
  369. custdist desc,
  370. c_count desc;
  371. --Q14
  372. select
  373. 100.00 * sum(case
  374. when p_type like 'PROMO%'
  375. then l_extendedprice * (1 - l_discount)
  376. else 0
  377. end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
  378. from
  379. lineitem,
  380. part
  381. where
  382. l_partkey = p_partkey
  383. and l_shipdate >= date '1995-09-01'
  384. and l_shipdate < date '1995-09-01' + interval '1' month;
  385. --Q15
  386. select
  387. s_suppkey,
  388. s_name,
  389. s_address,
  390. s_phone,
  391. total_revenue
  392. from
  393. supplier,
  394. revenue0
  395. where
  396. s_suppkey = supplier_no
  397. and total_revenue = (
  398. select
  399. max(total_revenue)
  400. from
  401. revenue0
  402. )
  403. order by
  404. s_suppkey;
  405. --Q16
  406. select
  407. p_brand,
  408. p_type,
  409. p_size,
  410. count(distinct ps_suppkey) as supplier_cnt
  411. from
  412. partsupp,
  413. part
  414. where
  415. p_partkey = ps_partkey
  416. and p_brand <> 'Brand#45'
  417. and p_type not like 'MEDIUM POLISHED%'
  418. and p_size in (49, 14, 23, 45, 19, 3, 36, 9)
  419. and ps_suppkey not in (
  420. select
  421. s_suppkey
  422. from
  423. supplier
  424. where
  425. s_comment like '%Customer%Complaints%'
  426. )
  427. group by
  428. p_brand,
  429. p_type,
  430. p_size
  431. order by
  432. supplier_cnt desc,
  433. p_brand,
  434. p_type,
  435. p_size;
  436. --Q17
  437. select
  438. sum(l_extendedprice) / 7.0 as avg_yearly
  439. from
  440. lineitem,
  441. part
  442. where
  443. p_partkey = l_partkey
  444. and p_brand = 'Brand#23'
  445. and p_container = 'MED BOX'
  446. and l_quantity < (
  447. select
  448. 0.2 * avg(l_quantity)
  449. from
  450. lineitem
  451. where
  452. l_partkey = p_partkey
  453. );
  454. --Q18
  455. select
  456. c_name,
  457. c_custkey,
  458. o_orderkey,
  459. o_orderdate,
  460. o_totalprice,
  461. sum(l_quantity)
  462. from
  463. customer,
  464. orders,
  465. lineitem
  466. where
  467. o_orderkey in (
  468. select
  469. l_orderkey
  470. from
  471. lineitem
  472. group by
  473. l_orderkey having
  474. sum(l_quantity) > 300
  475. )
  476. and c_custkey = o_custkey
  477. and o_orderkey = l_orderkey
  478. group by
  479. c_name,
  480. c_custkey,
  481. o_orderkey,
  482. o_orderdate,
  483. o_totalprice
  484. order by
  485. o_totalprice desc,
  486. o_orderdate
  487. limit 100;
  488. --Q19
  489. select
  490. sum(l_extendedprice* (1 - l_discount)) as revenue
  491. from
  492. lineitem,
  493. part
  494. where
  495. (
  496. p_partkey = l_partkey
  497. and p_brand = 'Brand#12'
  498. and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
  499. and l_quantity >= 1 and l_quantity <= 1 + 10
  500. and p_size between 1 and 5
  501. and l_shipmode in ('AIR', 'AIR REG')
  502. and l_shipinstruct = 'DELIVER IN PERSON'
  503. )
  504. or
  505. (
  506. p_partkey = l_partkey
  507. and p_brand = 'Brand#23'
  508. and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
  509. and l_quantity >= 10 and l_quantity <= 10 + 10
  510. and p_size between 1 and 10
  511. and l_shipmode in ('AIR', 'AIR REG')
  512. and l_shipinstruct = 'DELIVER IN PERSON'
  513. )
  514. or
  515. (
  516. p_partkey = l_partkey
  517. and p_brand = 'Brand#34'
  518. and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
  519. and l_quantity >= 20 and l_quantity <= 20 + 10
  520. and p_size between 1 and 15
  521. and l_shipmode in ('AIR', 'AIR REG')
  522. and l_shipinstruct = 'DELIVER IN PERSON'
  523. );
  524. --Q20
  525. select
  526. s_name,
  527. s_address
  528. from
  529. supplier,
  530. nation
  531. where
  532. s_suppkey in (
  533. select
  534. ps_suppkey
  535. from
  536. partsupp
  537. where
  538. ps_partkey in (
  539. select
  540. p_partkey
  541. from
  542. part
  543. where
  544. p_name like 'forest%'
  545. )
  546. and ps_availqty > (
  547. select
  548. 0.5 * sum(l_quantity)
  549. from
  550. lineitem
  551. where
  552. l_partkey = ps_partkey
  553. and l_suppkey = ps_suppkey
  554. and l_shipdate >= date '1994-01-01'
  555. and l_shipdate < date '1994-01-01' + interval '1' year
  556. )
  557. )
  558. and s_nationkey = n_nationkey
  559. and n_name = 'CANADA'
  560. order by
  561. s_name;
  562. --Q21
  563. select
  564. s_name,
  565. count(*) as numwait
  566. from
  567. supplier,
  568. lineitem l1,
  569. orders,
  570. nation
  571. where
  572. s_suppkey = l1.l_suppkey
  573. and o_orderkey = l1.l_orderkey
  574. and o_orderstatus = 'F'
  575. and l1.l_receiptdate > l1.l_commitdate
  576. and exists (
  577. select
  578. *
  579. from
  580. lineitem l2
  581. where
  582. l2.l_orderkey = l1.l_orderkey
  583. and l2.l_suppkey <> l1.l_suppkey
  584. )
  585. and not exists (
  586. select
  587. *
  588. from
  589. lineitem l3
  590. where
  591. l3.l_orderkey = l1.l_orderkey
  592. and l3.l_suppkey <> l1.l_suppkey
  593. and l3.l_receiptdate > l3.l_commitdate
  594. )
  595. and s_nationkey = n_nationkey
  596. and n_name = 'SAUDI ARABIA'
  597. group by
  598. s_name
  599. order by
  600. numwait desc,
  601. s_name
  602. limit 100;
  603. --Q22
  604. select
  605. cntrycode,
  606. count(*) as numcust,
  607. sum(c_acctbal) as totacctbal
  608. from
  609. (
  610. select
  611. substring(c_phone, 1, 2) as cntrycode,
  612. c_acctbal
  613. from
  614. customer
  615. where
  616. substring(c_phone, 1, 2) in
  617. ('13', '31', '23', '29', '30', '18', '17')
  618. and c_acctbal > (
  619. select
  620. avg(c_acctbal)
  621. from
  622. customer
  623. where
  624. c_acctbal > 0.00
  625. and substring(c_phone, 1, 2) in
  626. ('13', '31', '23', '29', '30', '18', '17')
  627. )
  628. and not exists (
  629. select
  630. *
  631. from
  632. orders
  633. where
  634. o_custkey = c_custkey
  635. )
  636. ) as custsale
  637. group by
  638. cntrycode
  639. order by
  640. cntrycode;