TPC-H Benchmark

TPC-H is a decision support benchmark (Decision Support Benchmark), which consists of a set of business-oriented special query and concurrent data modification. The data that is queried and populates the database has broad industry relevance. This benchmark demonstrates a decision support system that examines large amounts of data, executes highly complex queries, and answers key business questions. The performance index reported by TPC-H is called TPC-H composite query performance index per hour (QphH@Size), which reflects multiple aspects of the system’s ability to process queries. These aspects include the database size chosen when executing the query, the query processing capability when the query is submitted by a single stream, and the query throughput when the query is submitted by many concurrent users.

This document mainly introduces the performance of Doris on the TPC-H 1000G test set.

On 22 queries on the TPC-H standard test data set, we conducted a comparison test based on Apache Doris 2.1.1-rc03 and Apache Doris 2.0.6 versions.

image-20220614114351241

1. Hardware Environment

HardwareConfiguration Instructions
Number of mMachines4 Tencent Cloud Virtual Machine(1FE,3BEs)
CPUAMD EPYC™ Milan(2.55GHz/3.5GHz) 48C
Memory192G
Network21Gbps
DiskESSD Cloud Hard Disk

2. Software Environment

  • Doris Deployed 3BEs and 1FE
  • Kernel Version: Linux version 5.4.0-96-generic (buildd@lgw01-amd64-051)
  • OS version: Ubuntu 20.04 LTS (Focal Fossa)
  • Doris software version: Apache Doris 2.1.1-rc03, Apache Doris 2.0.6.
  • JDK: openjdk version “1.8.0_131”

3. Test Data Volume

The TPCH 1000G data generated by the simulation of the entire test are respectively imported into Apache Doris 2.1.1-rc03 and Apache Doris 2.0.6 for testing. The following is the relevant description and data volume of the table.

TPC-H Table NameRowsAnnotation
REGION5Region
NATION25Nation
SUPPLIER10,000,000Supplier
PART200,000,000Parts
PARTSUPP800,000,000Parts Supply
CUSTOMER150,000,000Customer
ORDERS1,500,000,000Orders
LINEITEM5,999,989,709Order Details

4. Test SQL

TPCH 22 test query statements : TPCH-Query-SQL

5. Test Results

Here we use Apache Doris 2.1.1-rc03 and Apache Doris 2.0.6 for comparative testing. In the test, we use Query Time(ms) as the main performance indicator. The test results are as follows:

QueryApache Doris 2.1.1-rc03 (ms)Apache Doris 2.0.6 (ms)
Q172407638
Q2249377
Q335284389
Q415341903
Q554576425
Q6159150
Q715983141
Q829582792
Q91480324732
Q1067437315
Q11414395
Q12371417
Q1362038095
Q14485681
Q1522461459
Q1612521382
Q1714611204
Q181042811386
Q1923292172
Q205691455
Q2155746570
Q2210421517
Total7664395595

6. Environmental Preparation

Please refer to the official document to install and deploy Doris to obtain a normal running Doris cluster (at least 1 FE 1 BE, 1 FE 3 BE is recommended).

7. Data Preparation

7.1 Download and Install TPC-H Data Generation Tool

Execute the following script to download and compile the tpch-tools tool.

  1. sh bin/build-tpch-dbgen.sh

After successful installation, the dbgen binary will be generated under the TPC-H_Tools_v3.0.0/ directory.

7.2 Generating the TPC-H Test Set

Execute the following script to generate the TPC-H dataset:

  1. sh bin/gen-tpch-data.sh -s 1000

Note 1: Check the script help via sh gen-tpch-data.sh -h.

Note 2: The data will be generated under the tpch-data/ directory with the suffix .tbl. The total file size is about 1000GB and may need a few minutes to an hour to generate.

Note 3: A standard test data set of 100G is generated by default.

7.3 Create Table

7.3.1 Prepare the doris-cluster.conf File

Before import the script, you need to write the FE’s ip port and other information in the doris-cluster.conf file.

The file is located under ${DORIS_HOME}/tools/tpch-tools/conf/ .

The content of the file includes FE’s ip, HTTP port, user name, password and the DB name of the data to be imported:

  1. # Any of FE host
  2. export FE_HOST='127.0.0.1'
  3. # http_port in fe.conf
  4. export FE_HTTP_PORT=8030
  5. # query_port in fe.conf
  6. export FE_QUERY_PORT=9030
  7. # Doris username
  8. export USER='root'
  9. # Doris password
  10. export PASSWORD=''
  11. # The database where TPC-H tables located
  12. export DB='tpch'

Execute the Following Script to Generate and Create TPC-H Table

  1. sh bin/create-tpch-tables.sh -s 1000

Or copy the table creation statement in create-tpch-tables.sql and excute it in Doris.

7.4 Import Data

Please perform data import with the following command:

  1. sh bin/load-tpch-data.sh

7.5 Check Imported Data

Execute the following SQL statement to check that the imported data is consistent with the above data.

  1. select count(*) from lineitem;
  2. select count(*) from orders;
  3. select count(*) from partsupp;
  4. select count(*) from part;
  5. select count(*) from customer;
  6. select count(*) from supplier;
  7. select count(*) from nation;
  8. select count(*) from region;
  9. select count(*) from revenue0;

7.6 Query Test

7.6.1 Executing Query Scripts

Execute the above test SQL or execute the following command

  1. sh bin/run-tpch-queries.sh -s 1000

7.6.2 Single SQL Execution

The following is the SQL statement used in the test, you can also get the latest SQL from the code base.

  1. --Q1
  2. select
  3. l_returnflag,
  4. l_linestatus,
  5. sum(l_quantity) as sum_qty,
  6. sum(l_extendedprice) as sum_base_price,
  7. sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
  8. sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
  9. avg(l_quantity) as avg_qty,
  10. avg(l_extendedprice) as avg_price,
  11. avg(l_discount) as avg_disc,
  12. count(*) as count_order
  13. from
  14. lineitem
  15. where
  16. l_shipdate <= date '1998-12-01' - interval '90' day
  17. group by
  18. l_returnflag,
  19. l_linestatus
  20. order by
  21. l_returnflag,
  22. l_linestatus;
  23. --Q2
  24. select
  25. s_acctbal,
  26. s_name,
  27. n_name,
  28. p_partkey,
  29. p_mfgr,
  30. s_address,
  31. s_phone,
  32. s_comment
  33. from
  34. part,
  35. supplier,
  36. partsupp,
  37. nation,
  38. region
  39. where
  40. p_partkey = ps_partkey
  41. and s_suppkey = ps_suppkey
  42. and p_size = 15
  43. and p_type like '%BRASS'
  44. and s_nationkey = n_nationkey
  45. and n_regionkey = r_regionkey
  46. and r_name = 'EUROPE'
  47. and ps_supplycost = (
  48. select
  49. min(ps_supplycost)
  50. from
  51. partsupp,
  52. supplier,
  53. nation,
  54. region
  55. where
  56. p_partkey = ps_partkey
  57. and s_suppkey = ps_suppkey
  58. and s_nationkey = n_nationkey
  59. and n_regionkey = r_regionkey
  60. and r_name = 'EUROPE'
  61. )
  62. order by
  63. s_acctbal desc,
  64. n_name,
  65. s_name,
  66. p_partkey
  67. limit 100;
  68. --Q3
  69. select
  70. l_orderkey,
  71. sum(l_extendedprice * (1 - l_discount)) as revenue,
  72. o_orderdate,
  73. o_shippriority
  74. from
  75. customer,
  76. orders,
  77. lineitem
  78. where
  79. c_mktsegment = 'BUILDING'
  80. and c_custkey = o_custkey
  81. and l_orderkey = o_orderkey
  82. and o_orderdate < date '1995-03-15'
  83. and l_shipdate > date '1995-03-15'
  84. group by
  85. l_orderkey,
  86. o_orderdate,
  87. o_shippriority
  88. order by
  89. revenue desc,
  90. o_orderdate
  91. limit 10;
  92. --Q4
  93. select
  94. o_orderpriority,
  95. count(*) as order_count
  96. from
  97. orders
  98. where
  99. o_orderdate >= date '1993-07-01'
  100. and o_orderdate < date '1993-07-01' + interval '3' month
  101. and exists (
  102. select
  103. *
  104. from
  105. lineitem
  106. where
  107. l_orderkey = o_orderkey
  108. and l_commitdate < l_receiptdate
  109. )
  110. group by
  111. o_orderpriority
  112. order by
  113. o_orderpriority;
  114. --Q5
  115. select
  116. n_name,
  117. sum(l_extendedprice * (1 - l_discount)) as revenue
  118. from
  119. customer,
  120. orders,
  121. lineitem,
  122. supplier,
  123. nation,
  124. region
  125. where
  126. c_custkey = o_custkey
  127. and l_orderkey = o_orderkey
  128. and l_suppkey = s_suppkey
  129. and c_nationkey = s_nationkey
  130. and s_nationkey = n_nationkey
  131. and n_regionkey = r_regionkey
  132. and r_name = 'ASIA'
  133. and o_orderdate >= date '1994-01-01'
  134. and o_orderdate < date '1994-01-01' + interval '1' year
  135. group by
  136. n_name
  137. order by
  138. revenue desc;
  139. --Q6
  140. select
  141. sum(l_extendedprice * l_discount) as revenue
  142. from
  143. lineitem
  144. where
  145. l_shipdate >= date '1994-01-01'
  146. and l_shipdate < date '1994-01-01' + interval '1' year
  147. and l_discount between .06 - 0.01 and .06 + 0.01
  148. and l_quantity < 24;
  149. --Q7
  150. select
  151. supp_nation,
  152. cust_nation,
  153. l_year,
  154. sum(volume) as revenue
  155. from
  156. (
  157. select
  158. n1.n_name as supp_nation,
  159. n2.n_name as cust_nation,
  160. extract(year from l_shipdate) as l_year,
  161. l_extendedprice * (1 - l_discount) as volume
  162. from
  163. supplier,
  164. lineitem,
  165. orders,
  166. customer,
  167. nation n1,
  168. nation n2
  169. where
  170. s_suppkey = l_suppkey
  171. and o_orderkey = l_orderkey
  172. and c_custkey = o_custkey
  173. and s_nationkey = n1.n_nationkey
  174. and c_nationkey = n2.n_nationkey
  175. and (
  176. (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY')
  177. or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')
  178. )
  179. and l_shipdate between date '1995-01-01' and date '1996-12-31'
  180. ) as shipping
  181. group by
  182. supp_nation,
  183. cust_nation,
  184. l_year
  185. order by
  186. supp_nation,
  187. cust_nation,
  188. l_year;
  189. --Q8
  190. select
  191. o_year,
  192. sum(case
  193. when nation = 'BRAZIL' then volume
  194. else 0
  195. end) / sum(volume) as mkt_share
  196. from
  197. (
  198. select
  199. extract(year from o_orderdate) as o_year,
  200. l_extendedprice * (1 - l_discount) as volume,
  201. n2.n_name as nation
  202. from
  203. part,
  204. supplier,
  205. lineitem,
  206. orders,
  207. customer,
  208. nation n1,
  209. nation n2,
  210. region
  211. where
  212. p_partkey = l_partkey
  213. and s_suppkey = l_suppkey
  214. and l_orderkey = o_orderkey
  215. and o_custkey = c_custkey
  216. and c_nationkey = n1.n_nationkey
  217. and n1.n_regionkey = r_regionkey
  218. and r_name = 'AMERICA'
  219. and s_nationkey = n2.n_nationkey
  220. and o_orderdate between date '1995-01-01' and date '1996-12-31'
  221. and p_type = 'ECONOMY ANODIZED STEEL'
  222. ) as all_nations
  223. group by
  224. o_year
  225. order by
  226. o_year;
  227. --Q9
  228. select
  229. nation,
  230. o_year,
  231. sum(amount) as sum_profit
  232. from
  233. (
  234. select
  235. n_name as nation,
  236. extract(year from o_orderdate) as o_year,
  237. l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
  238. from
  239. part,
  240. supplier,
  241. lineitem,
  242. partsupp,
  243. orders,
  244. nation
  245. where
  246. s_suppkey = l_suppkey
  247. and ps_suppkey = l_suppkey
  248. and ps_partkey = l_partkey
  249. and p_partkey = l_partkey
  250. and o_orderkey = l_orderkey
  251. and s_nationkey = n_nationkey
  252. and p_name like '%green%'
  253. ) as profit
  254. group by
  255. nation,
  256. o_year
  257. order by
  258. nation,
  259. o_year desc;
  260. --Q10
  261. select
  262. c_custkey,
  263. c_name,
  264. sum(l_extendedprice * (1 - l_discount)) as revenue,
  265. c_acctbal,
  266. n_name,
  267. c_address,
  268. c_phone,
  269. c_comment
  270. from
  271. customer,
  272. orders,
  273. lineitem,
  274. nation
  275. where
  276. c_custkey = o_custkey
  277. and l_orderkey = o_orderkey
  278. and o_orderdate >= date '1993-10-01'
  279. and o_orderdate < date '1993-10-01' + interval '3' month
  280. and l_returnflag = 'R'
  281. and c_nationkey = n_nationkey
  282. group by
  283. c_custkey,
  284. c_name,
  285. c_acctbal,
  286. c_phone,
  287. n_name,
  288. c_address,
  289. c_comment
  290. order by
  291. revenue desc
  292. limit 20;
  293. --Q11
  294. select
  295. ps_partkey,
  296. sum(ps_supplycost * ps_availqty) as value
  297. from
  298. partsupp,
  299. supplier,
  300. nation
  301. where
  302. ps_suppkey = s_suppkey
  303. and s_nationkey = n_nationkey
  304. and n_name = 'GERMANY'
  305. group by
  306. ps_partkey having
  307. sum(ps_supplycost * ps_availqty) > (
  308. select
  309. sum(ps_supplycost * ps_availqty) * 0.000002
  310. from
  311. partsupp,
  312. supplier,
  313. nation
  314. where
  315. ps_suppkey = s_suppkey
  316. and s_nationkey = n_nationkey
  317. and n_name = 'GERMANY'
  318. )
  319. order by
  320. value desc;
  321. --Q12
  322. select
  323. l_shipmode,
  324. sum(case
  325. when o_orderpriority = '1-URGENT'
  326. or o_orderpriority = '2-HIGH'
  327. then 1
  328. else 0
  329. end) as high_line_count,
  330. sum(case
  331. when o_orderpriority <> '1-URGENT'
  332. and o_orderpriority <> '2-HIGH'
  333. then 1
  334. else 0
  335. end) as low_line_count
  336. from
  337. orders,
  338. lineitem
  339. where
  340. o_orderkey = l_orderkey
  341. and l_shipmode in ('MAIL', 'SHIP')
  342. and l_commitdate < l_receiptdate
  343. and l_shipdate < l_commitdate
  344. and l_receiptdate >= date '1994-01-01'
  345. and l_receiptdate < date '1994-01-01' + interval '1' year
  346. group by
  347. l_shipmode
  348. order by
  349. l_shipmode;
  350. --Q13
  351. select
  352. c_count,
  353. count(*) as custdist
  354. from
  355. (
  356. select
  357. c_custkey,
  358. count(o_orderkey) as c_count
  359. from
  360. customer left outer join orders on
  361. c_custkey = o_custkey
  362. and o_comment not like '%special%requests%'
  363. group by
  364. c_custkey
  365. ) as c_orders
  366. group by
  367. c_count
  368. order by
  369. custdist desc,
  370. c_count desc;
  371. --Q14
  372. select
  373. 100.00 * sum(case
  374. when p_type like 'PROMO%'
  375. then l_extendedprice * (1 - l_discount)
  376. else 0
  377. end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
  378. from
  379. lineitem,
  380. part
  381. where
  382. l_partkey = p_partkey
  383. and l_shipdate >= date '1995-09-01'
  384. and l_shipdate < date '1995-09-01' + interval '1' month;
  385. --Q15
  386. select
  387. s_suppkey,
  388. s_name,
  389. s_address,
  390. s_phone,
  391. total_revenue
  392. from
  393. supplier,
  394. revenue0
  395. where
  396. s_suppkey = supplier_no
  397. and total_revenue = (
  398. select
  399. max(total_revenue)
  400. from
  401. revenue0
  402. )
  403. order by
  404. s_suppkey;
  405. --Q16
  406. select
  407. p_brand,
  408. p_type,
  409. p_size,
  410. count(distinct ps_suppkey) as supplier_cnt
  411. from
  412. partsupp,
  413. part
  414. where
  415. p_partkey = ps_partkey
  416. and p_brand <> 'Brand#45'
  417. and p_type not like 'MEDIUM POLISHED%'
  418. and p_size in (49, 14, 23, 45, 19, 3, 36, 9)
  419. and ps_suppkey not in (
  420. select
  421. s_suppkey
  422. from
  423. supplier
  424. where
  425. s_comment like '%Customer%Complaints%'
  426. )
  427. group by
  428. p_brand,
  429. p_type,
  430. p_size
  431. order by
  432. supplier_cnt desc,
  433. p_brand,
  434. p_type,
  435. p_size;
  436. --Q17
  437. select
  438. sum(l_extendedprice) / 7.0 as avg_yearly
  439. from
  440. lineitem,
  441. part
  442. where
  443. p_partkey = l_partkey
  444. and p_brand = 'Brand#23'
  445. and p_container = 'MED BOX'
  446. and l_quantity < (
  447. select
  448. 0.2 * avg(l_quantity)
  449. from
  450. lineitem
  451. where
  452. l_partkey = p_partkey
  453. );
  454. --Q18
  455. select
  456. c_name,
  457. c_custkey,
  458. o_orderkey,
  459. o_orderdate,
  460. o_totalprice,
  461. sum(l_quantity)
  462. from
  463. customer,
  464. orders,
  465. lineitem
  466. where
  467. o_orderkey in (
  468. select
  469. l_orderkey
  470. from
  471. lineitem
  472. group by
  473. l_orderkey having
  474. sum(l_quantity) > 300
  475. )
  476. and c_custkey = o_custkey
  477. and o_orderkey = l_orderkey
  478. group by
  479. c_name,
  480. c_custkey,
  481. o_orderkey,
  482. o_orderdate,
  483. o_totalprice
  484. order by
  485. o_totalprice desc,
  486. o_orderdate
  487. limit 100;
  488. --Q19
  489. select
  490. sum(l_extendedprice* (1 - l_discount)) as revenue
  491. from
  492. lineitem,
  493. part
  494. where
  495. (
  496. p_partkey = l_partkey
  497. and p_brand = 'Brand#12'
  498. and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
  499. and l_quantity >= 1 and l_quantity <= 1 + 10
  500. and p_size between 1 and 5
  501. and l_shipmode in ('AIR', 'AIR REG')
  502. and l_shipinstruct = 'DELIVER IN PERSON'
  503. )
  504. or
  505. (
  506. p_partkey = l_partkey
  507. and p_brand = 'Brand#23'
  508. and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
  509. and l_quantity >= 10 and l_quantity <= 10 + 10
  510. and p_size between 1 and 10
  511. and l_shipmode in ('AIR', 'AIR REG')
  512. and l_shipinstruct = 'DELIVER IN PERSON'
  513. )
  514. or
  515. (
  516. p_partkey = l_partkey
  517. and p_brand = 'Brand#34'
  518. and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
  519. and l_quantity >= 20 and l_quantity <= 20 + 10
  520. and p_size between 1 and 15
  521. and l_shipmode in ('AIR', 'AIR REG')
  522. and l_shipinstruct = 'DELIVER IN PERSON'
  523. );
  524. --Q20
  525. select
  526. s_name,
  527. s_address
  528. from
  529. supplier,
  530. nation
  531. where
  532. s_suppkey in (
  533. select
  534. ps_suppkey
  535. from
  536. partsupp
  537. where
  538. ps_partkey in (
  539. select
  540. p_partkey
  541. from
  542. part
  543. where
  544. p_name like 'forest%'
  545. )
  546. and ps_availqty > (
  547. select
  548. 0.5 * sum(l_quantity)
  549. from
  550. lineitem
  551. where
  552. l_partkey = ps_partkey
  553. and l_suppkey = ps_suppkey
  554. and l_shipdate >= date '1994-01-01'
  555. and l_shipdate < date '1994-01-01' + interval '1' year
  556. )
  557. )
  558. and s_nationkey = n_nationkey
  559. and n_name = 'CANADA'
  560. order by
  561. s_name;
  562. --Q21
  563. select
  564. s_name,
  565. count(*) as numwait
  566. from
  567. supplier,
  568. lineitem l1,
  569. orders,
  570. nation
  571. where
  572. s_suppkey = l1.l_suppkey
  573. and o_orderkey = l1.l_orderkey
  574. and o_orderstatus = 'F'
  575. and l1.l_receiptdate > l1.l_commitdate
  576. and exists (
  577. select
  578. *
  579. from
  580. lineitem l2
  581. where
  582. l2.l_orderkey = l1.l_orderkey
  583. and l2.l_suppkey <> l1.l_suppkey
  584. )
  585. and not exists (
  586. select
  587. *
  588. from
  589. lineitem l3
  590. where
  591. l3.l_orderkey = l1.l_orderkey
  592. and l3.l_suppkey <> l1.l_suppkey
  593. and l3.l_receiptdate > l3.l_commitdate
  594. )
  595. and s_nationkey = n_nationkey
  596. and n_name = 'SAUDI ARABIA'
  597. group by
  598. s_name
  599. order by
  600. numwait desc,
  601. s_name
  602. limit 100;
  603. --Q22
  604. select
  605. cntrycode,
  606. count(*) as numcust,
  607. sum(c_acctbal) as totacctbal
  608. from
  609. (
  610. select
  611. substring(c_phone, 1, 2) as cntrycode,
  612. c_acctbal
  613. from
  614. customer
  615. where
  616. substring(c_phone, 1, 2) in
  617. ('13', '31', '23', '29', '30', '18', '17')
  618. and c_acctbal > (
  619. select
  620. avg(c_acctbal)
  621. from
  622. customer
  623. where
  624. c_acctbal > 0.00
  625. and substring(c_phone, 1, 2) in
  626. ('13', '31', '23', '29', '30', '18', '17')
  627. )
  628. and not exists (
  629. select
  630. *
  631. from
  632. orders
  633. where
  634. o_custkey = c_custkey
  635. )
  636. ) as custsale
  637. group by
  638. cntrycode
  639. order by
  640. cntrycode;