Star Schema Benchmark

Star Schema Benchmark(SSB) 是一个轻量级的数仓场景下的性能测试集。SSB 基于 TPC-H 提供了一个简化版的星型模型数据集,主要用于测试在星型模型下,多表关联查询的性能表现。另外,业界内通常也会将 SSB 打平为宽表模型(以下简称:SSB flat),来测试查询引擎的性能,参考Clickhouse

本文档主要介绍 Apache Doris 在 SSB 1000G 测试集上的性能表现。

在 SSB 标准测试数据集上的 13 个查询上,我们基于 Apache Doris 2.0.6 版本进行了测试。

1. 硬件环境

硬件配置说明
机器数量4 台腾讯云主机(1 个 FE,3 个 BE)
CPUAMD EPYC™ Milan(2.55GHz/3.5GHz) 48 核
内存192G
网络带宽21Gbps
磁盘高性能云硬盘

2. 软件环境

  • Doris 部署 3BE 1FE
  • 内核版本:Linux version 5.4.0-96-generic (buildd@lgw01-amd64-051)
  • 操作系统版本:Ubuntu 20.04 LTS (Focal Fossa)
  • Doris 软件版本:Apache Doris 2.0.6
  • JDK:openjdk version “1.8.0_131”

3. 测试数据量

SSB 表名行数备注
lineorder5,999,989,709商品订单明细表表
customer30,000,000客户信息表
part2,000,000零件信息表
supplier2,000,000供应商信息表
dates2,556日期表
lineorder_flat5,999,989,709数据展平后的宽表

4. SSB 宽表测试结果

使用 Apache Doris 2.0.6 版本进行测试结果如下:

QueryDoris 2.0.6 (ms)
q1.186
q1.231
q1.387
q2.11046
q2.2569
q2.3480
q3.11339
q3.2957
q3.3215
q3.434
q4.11569
q4.2174
q4.3109
Total6696

5. 标准 SSB 测试结果

使用 Apache Doris 2.0.6 版本进行测试结果如下:

QueryDoris 2.0.6 (ms)
q1.1332
q1.286
q1.380
q2.1985
q2.2844
q2.3768
q3.12924
q3.2944
q3.3766
q3.4146
q4.13451
q4.2829
q4.3325
Total12480

6. 环境准备

请先参照 官方文档 进行 Apache Doris 的安装部署,以获得一个正常运行中的 Doris 集群(至少包含 1 FE 1 BE,推荐 1 FE 3 BE)。

7. 数据准备

7.1 下载安装 SSB 数据生成工具。

执行以下脚本下载并编译 ssb-tools 工具。

  1. sh bin/build-ssb-dbgen.sh

安装成功后,将在 ssb-dbgen/ 目录下生成 dbgen 二进制文件。

7.2 生成 SSB 测试集

执行以下脚本生成 SSB 数据集:

  1. sh bin/gen-ssb-data.sh -s 1000

注 1:通过 sh gen-ssb-data.sh -h 查看脚本帮助。

注 2:数据会以 .tbl 为后缀生成在 ssb-data/ 目录下。文件总大小约 600GB。生成时间可能在数分钟到 1 小时不等。

注 3:默认生成 100G 的标准测试数据集

7.3 建表

7.3.1 准备 doris-cluster.conf 文件

在调用导入脚本前,需要将 FE 的 ip 端口等信息写在 doris-cluster.conf 文件中。

文件位置在 ${DORIS_HOME}/tools/ssb-tools/conf/ 目录下。

文件内容包括 FE 的 ip,HTTP 端口,用户名,密码以及待导入数据的 DB 名称:

  1. # Any of FE host
  2. export FE_HOST='127.0.0.1'
  3. # http_port in fe.conf
  4. export FE_HTTP_PORT=8030
  5. # query_port in fe.conf
  6. export FE_QUERY_PORT=9030
  7. # Doris username
  8. export USER='root'
  9. # Doris password
  10. export PASSWORD=''
  11. # The database where SSB tables located
  12. export DB='ssb'

7.3.2 执行以下脚本生成创建 SSB 表

  1. sh bin/create-ssb-tables.sh -s 1000

或者复制 create-ssb-tables.sqlcreate-ssb-flat-table.sql 中的建表语句,在 MySQL 客户端中执行。

7.4 导入数据

我们使用以下命令完成 SSB 测试集所有数据导入及 SSB FLAT 宽表数据合成并导入到表里。

  1. sh bin/load-ssb-data.sh

7.5 检查导入数据

  1. select count(*) from part;
  2. select count(*) from customer;
  3. select count(*) from supplier;
  4. select count(*) from dates;
  5. select count(*) from lineorder;
  6. select count(*) from lineorder_flat;

7.6 查询测试

SSB-FlAT 查询语句:ssb-flat-queries

标准 SSB 查询语句:ssb-queries

7.6.1 SSB FLAT 测试 SQL

  1. --Q1.1
  2. SELECT SUM(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
  3. FROM lineorder_flat
  4. WHERE
  5. LO_ORDERDATE >= 19930101
  6. AND LO_ORDERDATE <= 19931231
  7. AND LO_DISCOUNT BETWEEN 1 AND 3
  8. AND LO_QUANTITY < 25;
  9. --Q1.2
  10. SELECT SUM(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
  11. FROM lineorder_flat
  12. WHERE
  13. LO_ORDERDATE >= 19940101
  14. AND LO_ORDERDATE <= 19940131
  15. AND LO_DISCOUNT BETWEEN 4 AND 6
  16. AND LO_QUANTITY BETWEEN 26 AND 35;
  17. --Q1.3
  18. SELECT SUM(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
  19. FROM lineorder_flat
  20. WHERE
  21. weekofyear(LO_ORDERDATE) = 6
  22. AND LO_ORDERDATE >= 19940101
  23. AND LO_ORDERDATE <= 19941231
  24. AND LO_DISCOUNT BETWEEN 5 AND 7
  25. AND LO_QUANTITY BETWEEN 26 AND 35;
  26. --Q2.1
  27. SELECT
  28. SUM(LO_REVENUE), (LO_ORDERDATE DIV 10000) AS YEAR,
  29. P_BRAND
  30. FROM lineorder_flat
  31. WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA'
  32. GROUP BY YEAR, P_BRAND
  33. ORDER BY YEAR, P_BRAND;
  34. --Q2.2
  35. SELECT
  36. SUM(LO_REVENUE), (LO_ORDERDATE DIV 10000) AS YEAR,
  37. P_BRAND
  38. FROM lineorder_flat
  39. WHERE
  40. P_BRAND >= 'MFGR#2221'
  41. AND P_BRAND <= 'MFGR#2228'
  42. AND S_REGION = 'ASIA'
  43. GROUP BY YEAR, P_BRAND
  44. ORDER BY YEAR, P_BRAND;
  45. --Q2.3
  46. SELECT
  47. SUM(LO_REVENUE), (LO_ORDERDATE DIV 10000) AS YEAR,
  48. P_BRAND
  49. FROM lineorder_flat
  50. WHERE
  51. P_BRAND = 'MFGR#2239'
  52. AND S_REGION = 'EUROPE'
  53. GROUP BY YEAR, P_BRAND
  54. ORDER BY YEAR, P_BRAND;
  55. --Q3.1
  56. SELECT
  57. C_NATION,
  58. S_NATION, (LO_ORDERDATE DIV 10000) AS YEAR,
  59. SUM(LO_REVENUE) AS revenue
  60. FROM lineorder_flat
  61. WHERE
  62. C_REGION = 'ASIA'
  63. AND S_REGION = 'ASIA'
  64. AND LO_ORDERDATE >= 19920101
  65. AND LO_ORDERDATE <= 19971231
  66. GROUP BY C_NATION, S_NATION, YEAR
  67. ORDER BY YEAR ASC, revenue DESC;
  68. --Q3.2
  69. SELECT
  70. C_CITY,
  71. S_CITY, (LO_ORDERDATE DIV 10000) AS YEAR,
  72. SUM(LO_REVENUE) AS revenue
  73. FROM lineorder_flat
  74. WHERE
  75. C_NATION = 'UNITED STATES'
  76. AND S_NATION = 'UNITED STATES'
  77. AND LO_ORDERDATE >= 19920101
  78. AND LO_ORDERDATE <= 19971231
  79. GROUP BY C_CITY, S_CITY, YEAR
  80. ORDER BY YEAR ASC, revenue DESC;
  81. --Q3.3
  82. SELECT
  83. C_CITY,
  84. S_CITY, (LO_ORDERDATE DIV 10000) AS YEAR,
  85. SUM(LO_REVENUE) AS revenue
  86. FROM lineorder_flat
  87. WHERE
  88. C_CITY IN ('UNITED KI1', 'UNITED KI5')
  89. AND S_CITY IN ('UNITED KI1', 'UNITED KI5')
  90. AND LO_ORDERDATE >= 19920101
  91. AND LO_ORDERDATE <= 19971231
  92. GROUP BY C_CITY, S_CITY, YEAR
  93. ORDER BY YEAR ASC, revenue DESC;
  94. --Q3.4
  95. SELECT
  96. C_CITY,
  97. S_CITY, (LO_ORDERDATE DIV 10000) AS YEAR,
  98. SUM(LO_REVENUE) AS revenue
  99. FROM lineorder_flat
  100. WHERE
  101. C_CITY IN ('UNITED KI1', 'UNITED KI5')
  102. AND S_CITY IN ('UNITED KI1', 'UNITED KI5')
  103. AND LO_ORDERDATE >= 19971201
  104. AND LO_ORDERDATE <= 19971231
  105. GROUP BY C_CITY, S_CITY, YEAR
  106. ORDER BY YEAR ASC, revenue DESC;
  107. --Q4.1
  108. SELECT (LO_ORDERDATE DIV 10000) AS YEAR,
  109. C_NATION,
  110. SUM(LO_REVENUE - LO_SUPPLYCOST) AS profit
  111. FROM lineorder_flat
  112. WHERE
  113. C_REGION = 'AMERICA'
  114. AND S_REGION = 'AMERICA'
  115. AND P_MFGR IN ('MFGR#1', 'MFGR#2')
  116. GROUP BY YEAR, C_NATION
  117. ORDER BY YEAR ASC, C_NATION ASC;
  118. --Q4.2
  119. SELECT (LO_ORDERDATE DIV 10000) AS YEAR,
  120. S_NATION,
  121. P_CATEGORY,
  122. SUM(LO_REVENUE - LO_SUPPLYCOST) AS profit
  123. FROM lineorder_flat
  124. WHERE
  125. C_REGION = 'AMERICA'
  126. AND S_REGION = 'AMERICA'
  127. AND LO_ORDERDATE >= 19970101
  128. AND LO_ORDERDATE <= 19981231
  129. AND P_MFGR IN ('MFGR#1', 'MFGR#2')
  130. GROUP BY YEAR, S_NATION, P_CATEGORY
  131. ORDER BY
  132. YEAR ASC,
  133. S_NATION ASC,
  134. P_CATEGORY ASC;
  135. --Q4.3
  136. SELECT (LO_ORDERDATE DIV 10000) AS YEAR,
  137. S_CITY,
  138. P_BRAND,
  139. SUM(LO_REVENUE - LO_SUPPLYCOST) AS profit
  140. FROM lineorder_flat
  141. WHERE
  142. S_NATION = 'UNITED STATES'
  143. AND LO_ORDERDATE >= 19970101
  144. AND LO_ORDERDATE <= 19981231
  145. AND P_CATEGORY = 'MFGR#14'
  146. GROUP BY YEAR, S_CITY, P_BRAND
  147. ORDER BY YEAR ASC, S_CITY ASC, P_BRAND ASC;

7.6.2 SSB 标准测试 SQL

  1. --Q1.1
  2. SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE
  3. FROM lineorder, dates
  4. WHERE
  5. lo_orderdate = d_datekey
  6. AND d_year = 1993
  7. AND lo_discount BETWEEN 1 AND 3
  8. AND lo_quantity < 25;
  9. --Q1.2
  10. SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE
  11. FROM lineorder, dates
  12. WHERE
  13. lo_orderdate = d_datekey
  14. AND d_yearmonth = 'Jan1994'
  15. AND lo_discount BETWEEN 4 AND 6
  16. AND lo_quantity BETWEEN 26 AND 35;
  17. --Q1.3
  18. SELECT
  19. SUM(lo_extendedprice * lo_discount) AS REVENUE
  20. FROM lineorder, dates
  21. WHERE
  22. lo_orderdate = d_datekey
  23. AND d_weeknuminyear = 6
  24. AND d_year = 1994
  25. AND lo_discount BETWEEN 5 AND 7
  26. AND lo_quantity BETWEEN 26 AND 35;
  27. --Q2.1
  28. SELECT SUM(lo_revenue), d_year, p_brand
  29. FROM lineorder, dates, part, supplier
  30. WHERE
  31. lo_orderdate = d_datekey
  32. AND lo_partkey = p_partkey
  33. AND lo_suppkey = s_suppkey
  34. AND p_category = 'MFGR#12'
  35. AND s_region = 'AMERICA'
  36. GROUP BY d_year, p_brand
  37. ORDER BY p_brand;
  38. --Q2.2
  39. SELECT SUM(lo_revenue), d_year, p_brand
  40. FROM lineorder, dates, part, supplier
  41. WHERE
  42. lo_orderdate = d_datekey
  43. AND lo_partkey = p_partkey
  44. AND lo_suppkey = s_suppkey
  45. AND p_brand BETWEEN 'MFGR#2221' AND 'MFGR#2228'
  46. AND s_region = 'ASIA'
  47. GROUP BY d_year, p_brand
  48. ORDER BY d_year, p_brand;
  49. --Q2.3
  50. SELECT SUM(lo_revenue), d_year, p_brand
  51. FROM lineorder, dates, part, supplier
  52. WHERE
  53. lo_orderdate = d_datekey
  54. AND lo_partkey = p_partkey
  55. AND lo_suppkey = s_suppkey
  56. AND p_brand = 'MFGR#2239'
  57. AND s_region = 'EUROPE'
  58. GROUP BY d_year, p_brand
  59. ORDER BY d_year, p_brand;
  60. --Q3.1
  61. SELECT
  62. c_nation,
  63. s_nation,
  64. d_year,
  65. SUM(lo_revenue) AS REVENUE
  66. FROM customer, lineorder, supplier, dates
  67. WHERE
  68. lo_custkey = c_custkey
  69. AND lo_suppkey = s_suppkey
  70. AND lo_orderdate = d_datekey
  71. AND c_region = 'ASIA'
  72. AND s_region = 'ASIA'
  73. AND d_year >= 1992
  74. AND d_year <= 1997
  75. GROUP BY c_nation, s_nation, d_year
  76. ORDER BY d_year ASC, REVENUE DESC;
  77. --Q3.2
  78. SELECT
  79. c_city,
  80. s_city,
  81. d_year,
  82. SUM(lo_revenue) AS REVENUE
  83. FROM customer, lineorder, supplier, dates
  84. WHERE
  85. lo_custkey = c_custkey
  86. AND lo_suppkey = s_suppkey
  87. AND lo_orderdate = d_datekey
  88. AND c_nation = 'UNITED STATES'
  89. AND s_nation = 'UNITED STATES'
  90. AND d_year >= 1992
  91. AND d_year <= 1997
  92. GROUP BY c_city, s_city, d_year
  93. ORDER BY d_year ASC, REVENUE DESC;
  94. --Q3.3
  95. SELECT
  96. c_city,
  97. s_city,
  98. d_year,
  99. SUM(lo_revenue) AS REVENUE
  100. FROM customer, lineorder, supplier, dates
  101. WHERE
  102. lo_custkey = c_custkey
  103. AND lo_suppkey = s_suppkey
  104. AND lo_orderdate = d_datekey
  105. AND (
  106. c_city = 'UNITED KI1'
  107. OR c_city = 'UNITED KI5'
  108. )
  109. AND (
  110. s_city = 'UNITED KI1'
  111. OR s_city = 'UNITED KI5'
  112. )
  113. AND d_year >= 1992
  114. AND d_year <= 1997
  115. GROUP BY c_city, s_city, d_year
  116. ORDER BY d_year ASC, REVENUE DESC;
  117. --Q3.4
  118. SELECT
  119. c_city,
  120. s_city,
  121. d_year,
  122. SUM(lo_revenue) AS REVENUE
  123. FROM customer, lineorder, supplier, dates
  124. WHERE
  125. lo_custkey = c_custkey
  126. AND lo_suppkey = s_suppkey
  127. AND lo_orderdate = d_datekey
  128. AND (
  129. c_city = 'UNITED KI1'
  130. OR c_city = 'UNITED KI5'
  131. )
  132. AND (
  133. s_city = 'UNITED KI1'
  134. OR s_city = 'UNITED KI5'
  135. )
  136. AND d_yearmonth = 'Dec1997'
  137. GROUP BY c_city, s_city, d_year
  138. ORDER BY d_year ASC, REVENUE DESC;
  139. --Q4.1
  140. SELECT
  141. d_year,
  142. c_nation,
  143. SUM(lo_revenue - lo_supplycost) AS PROFIT
  144. FROM dates, customer, supplier, part, lineorder
  145. WHERE
  146. lo_custkey = c_custkey
  147. AND lo_suppkey = s_suppkey
  148. AND lo_partkey = p_partkey
  149. AND lo_orderdate = d_datekey
  150. AND c_region = 'AMERICA'
  151. AND s_region = 'AMERICA'
  152. AND (
  153. p_mfgr = 'MFGR#1'
  154. OR p_mfgr = 'MFGR#2'
  155. )
  156. GROUP BY d_year, c_nation
  157. ORDER BY d_year, c_nation;
  158. --Q4.2
  159. SELECT
  160. d_year,
  161. s_nation,
  162. p_category,
  163. SUM(lo_revenue - lo_supplycost) AS PROFIT
  164. FROM dates, customer, supplier, part, lineorder
  165. WHERE
  166. lo_custkey = c_custkey
  167. AND lo_suppkey = s_suppkey
  168. AND lo_partkey = p_partkey
  169. AND lo_orderdate = d_datekey
  170. AND c_region = 'AMERICA'
  171. AND s_region = 'AMERICA'
  172. AND (
  173. d_year = 1997
  174. OR d_year = 1998
  175. )
  176. AND (
  177. p_mfgr = 'MFGR#1'
  178. OR p_mfgr = 'MFGR#2'
  179. )
  180. GROUP BY d_year, s_nation, p_category
  181. ORDER BY d_year, s_nation, p_category;
  182. --Q4.3
  183. SELECT
  184. d_year,
  185. s_city,
  186. p_brand,
  187. SUM(lo_revenue - lo_supplycost) AS PROFIT
  188. FROM dates, customer, supplier, part, lineorder
  189. WHERE
  190. lo_custkey = c_custkey
  191. AND lo_suppkey = s_suppkey
  192. AND lo_partkey = p_partkey
  193. AND lo_orderdate = d_datekey
  194. AND s_nation = 'UNITED STATES'
  195. AND (
  196. d_year = 1997
  197. OR d_year = 1998
  198. )
  199. AND p_category = 'MFGR#14'
  200. GROUP BY d_year, s_city, p_brand
  201. ORDER BY d_year, s_city, p_brand;