Star Schema Benchmark

Star Schema Benchmark(SSB) is a lightweight performance test set in the data warehouse scenario. SSB provides a simplified star schema data based on TPC-H, which is mainly used to test the performance of multi-table JOIN query under star schema. In addition, the industry usually flattens SSB into a wide table model (Referred as: SSB flat) to test the performance of the query engine, refer to Clickhouse.

This document mainly introduces the performance of Doris on the SSB 1000G test set.

We tested 13 queries on the SSB standard test dataset based on Apache Doris version 2.0.6.

1. Hardware Environment

HardwareConfiguration Instructions
Number of mMachines4 Tencent Cloud Virtual Machine(1FE,3BEs)
CPUAMD EPYC™ Milan(2.55GHz/3.5GHz) 48C
Memory192G
Network21Gbps
DiskESSD Cloud Hard Disk

2. Software Environment

  • Doris Deployed 3BEs and 1FE
  • Kernel Version: Linux version 5.4.0-96-generic (buildd@lgw01-amd64-051)
  • OS version: Ubuntu 20.04 LTS (Focal Fossa)
  • Doris software version: Apache Doris 2.0.6.
  • JDK: openjdk version “1.8.0_131”

3. Test Data Volume

SSB Table NameRowsAnnotation
lineorder5,999,989,709Commodity Order Details
customer30,000,000Customer Information
part2,000,000Parts Information
supplier2,000,000Supplier Information
dates2,556Date
lineorder_flat5,999,989,709Wide Table after Data Flattening

4. SSB Flat Test Results

Here we use Apache Doris 2.0.6 for comparative testing. In the test, we use Query Time(ms) as the main performance indicator. The test results are as follows:

QueryDoris 2.0.6 (ms)
q1.186
q1.231
q1.387
q2.11046
q2.2569
q2.3480
q3.11339
q3.2957
q3.3215
q3.434
q4.11569
q4.2174
q4.3109
Total6696

5. Standard SSB Test Results

Here we use Apache Doris 2.0.6 for comparative testing. In the test, we use Query Time(ms) as the main performance indicator. The test results are as follows:

QueryDoris 2.0.6 (ms)
q1.1332
q1.286
q1.380
q2.1985
q2.2844
q2.3768
q3.12924
q3.2944
q3.3766
q3.4146
q4.13451
q4.2829
q4.3325
Total12480

6. Environment Preparation

Please first refer to the [official documentation](. /install/install-deploy.md) to install and deploy Apache Doris first to obtain a Doris cluster which is working well(including at least 1 FE 1 BE, 1 FE 3 BEs is recommended).

7. Data Preparation

7.1 Download and Install the SSB Data Generation Tool.

Execute the following script to download and compile the ssb-tools tool.

  1. sh bin/build-ssb-dbgen.sh

After successful installation, the dbgen binary will be generated under the ssb-dbgen/ directory.

7.2 Generate SSB Test Set

Execute the following script to generate the SSB dataset:

  1. sh bin/gen-ssb-data.sh -s 1000

Note 1: Check the script help via sh gen-ssb-data.sh -h.

Note 2: The data will be generated under the ssb-data/ directory with the suffix .tbl. The total file size is about 600GB and may need a few minutes to an hour to generate.

Note 3: A standard test data set of 100G is generated by default.

7.3 Create Table

7.3.1 Prepare the doris-cluster.conf File.

Before import the script, you need to write the FE’s ip port and other information in the doris-cluster.conf file.

The file is located under ${DORIS_HOME}/tools/ssb-tools/conf/.

The content of the file includes FE’s ip, HTTP port, user name, password and the DB name of the data to be imported:

  1. # Any of FE host
  2. export FE_HOST='127.0.0.1'
  3. # http_port in fe.conf
  4. export FE_HTTP_PORT=8030
  5. # query_port in fe.conf
  6. export FE_QUERY_PORT=9030
  7. # Doris username
  8. export USER='root'
  9. # Doris password
  10. export PASSWORD=''
  11. # The database where SSB tables located
  12. export DB='ssb'

7.3.2 Execute the Following Script to Generate and Create the SSB Table:

  1. sh bin/create-ssb-tables.sh -s 1000

Or copy the table creation statements in create-ssb-tables.sql and create-ssb-flat-table.sql and then execute them in the MySQL client.

7.4 Import data

We use the following command to complete all data import of SSB test set and SSB FLAT wide table data synthesis and then import into the table.

  1. sh bin/load-ssb-data.sh

7.5 Checking Imported data

  1. select count(*) from part;
  2. select count(*) from customer;
  3. select count(*) from supplier;
  4. select count(*) from dates;
  5. select count(*) from lineorder;
  6. select count(*) from lineorder_flat;

7.6 Query Test

7.6.1 SSB FLAT Test for SQL

  1. --Q1.1
  2. SELECT SUM(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
  3. FROM lineorder_flat
  4. WHERE
  5. LO_ORDERDATE >= 19930101
  6. AND LO_ORDERDATE <= 19931231
  7. AND LO_DISCOUNT BETWEEN 1 AND 3
  8. AND LO_QUANTITY < 25;
  9. --Q1.2
  10. SELECT SUM(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
  11. FROM lineorder_flat
  12. WHERE
  13. LO_ORDERDATE >= 19940101
  14. AND LO_ORDERDATE <= 19940131
  15. AND LO_DISCOUNT BETWEEN 4 AND 6
  16. AND LO_QUANTITY BETWEEN 26 AND 35;
  17. --Q1.3
  18. SELECT SUM(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
  19. FROM lineorder_flat
  20. WHERE
  21. weekofyear(LO_ORDERDATE) = 6
  22. AND LO_ORDERDATE >= 19940101
  23. AND LO_ORDERDATE <= 19941231
  24. AND LO_DISCOUNT BETWEEN 5 AND 7
  25. AND LO_QUANTITY BETWEEN 26 AND 35;
  26. --Q2.1
  27. SELECT
  28. SUM(LO_REVENUE), (LO_ORDERDATE DIV 10000) AS YEAR,
  29. P_BRAND
  30. FROM lineorder_flat
  31. WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA'
  32. GROUP BY YEAR, P_BRAND
  33. ORDER BY YEAR, P_BRAND;
  34. --Q2.2
  35. SELECT
  36. SUM(LO_REVENUE), (LO_ORDERDATE DIV 10000) AS YEAR,
  37. P_BRAND
  38. FROM lineorder_flat
  39. WHERE
  40. P_BRAND >= 'MFGR#2221'
  41. AND P_BRAND <= 'MFGR#2228'
  42. AND S_REGION = 'ASIA'
  43. GROUP BY YEAR, P_BRAND
  44. ORDER BY YEAR, P_BRAND;
  45. --Q2.3
  46. SELECT
  47. SUM(LO_REVENUE), (LO_ORDERDATE DIV 10000) AS YEAR,
  48. P_BRAND
  49. FROM lineorder_flat
  50. WHERE
  51. P_BRAND = 'MFGR#2239'
  52. AND S_REGION = 'EUROPE'
  53. GROUP BY YEAR, P_BRAND
  54. ORDER BY YEAR, P_BRAND;
  55. --Q3.1
  56. SELECT
  57. C_NATION,
  58. S_NATION, (LO_ORDERDATE DIV 10000) AS YEAR,
  59. SUM(LO_REVENUE) AS revenue
  60. FROM lineorder_flat
  61. WHERE
  62. C_REGION = 'ASIA'
  63. AND S_REGION = 'ASIA'
  64. AND LO_ORDERDATE >= 19920101
  65. AND LO_ORDERDATE <= 19971231
  66. GROUP BY C_NATION, S_NATION, YEAR
  67. ORDER BY YEAR ASC, revenue DESC;
  68. --Q3.2
  69. SELECT
  70. C_CITY,
  71. S_CITY, (LO_ORDERDATE DIV 10000) AS YEAR,
  72. SUM(LO_REVENUE) AS revenue
  73. FROM lineorder_flat
  74. WHERE
  75. C_NATION = 'UNITED STATES'
  76. AND S_NATION = 'UNITED STATES'
  77. AND LO_ORDERDATE >= 19920101
  78. AND LO_ORDERDATE <= 19971231
  79. GROUP BY C_CITY, S_CITY, YEAR
  80. ORDER BY YEAR ASC, revenue DESC;
  81. --Q3.3
  82. SELECT
  83. C_CITY,
  84. S_CITY, (LO_ORDERDATE DIV 10000) AS YEAR,
  85. SUM(LO_REVENUE) AS revenue
  86. FROM lineorder_flat
  87. WHERE
  88. C_CITY IN ('UNITED KI1', 'UNITED KI5')
  89. AND S_CITY IN ('UNITED KI1', 'UNITED KI5')
  90. AND LO_ORDERDATE >= 19920101
  91. AND LO_ORDERDATE <= 19971231
  92. GROUP BY C_CITY, S_CITY, YEAR
  93. ORDER BY YEAR ASC, revenue DESC;
  94. --Q3.4
  95. SELECT
  96. C_CITY,
  97. S_CITY, (LO_ORDERDATE DIV 10000) AS YEAR,
  98. SUM(LO_REVENUE) AS revenue
  99. FROM lineorder_flat
  100. WHERE
  101. C_CITY IN ('UNITED KI1', 'UNITED KI5')
  102. AND S_CITY IN ('UNITED KI1', 'UNITED KI5')
  103. AND LO_ORDERDATE >= 19971201
  104. AND LO_ORDERDATE <= 19971231
  105. GROUP BY C_CITY, S_CITY, YEAR
  106. ORDER BY YEAR ASC, revenue DESC;
  107. --Q4.1
  108. SELECT (LO_ORDERDATE DIV 10000) AS YEAR,
  109. C_NATION,
  110. SUM(LO_REVENUE - LO_SUPPLYCOST) AS profit
  111. FROM lineorder_flat
  112. WHERE
  113. C_REGION = 'AMERICA'
  114. AND S_REGION = 'AMERICA'
  115. AND P_MFGR IN ('MFGR#1', 'MFGR#2')
  116. GROUP BY YEAR, C_NATION
  117. ORDER BY YEAR ASC, C_NATION ASC;
  118. --Q4.2
  119. SELECT (LO_ORDERDATE DIV 10000) AS YEAR,
  120. S_NATION,
  121. P_CATEGORY,
  122. SUM(LO_REVENUE - LO_SUPPLYCOST) AS profit
  123. FROM lineorder_flat
  124. WHERE
  125. C_REGION = 'AMERICA'
  126. AND S_REGION = 'AMERICA'
  127. AND LO_ORDERDATE >= 19970101
  128. AND LO_ORDERDATE <= 19981231
  129. AND P_MFGR IN ('MFGR#1', 'MFGR#2')
  130. GROUP BY YEAR, S_NATION, P_CATEGORY
  131. ORDER BY
  132. YEAR ASC,
  133. S_NATION ASC,
  134. P_CATEGORY ASC;
  135. --Q4.3
  136. SELECT (LO_ORDERDATE DIV 10000) AS YEAR,
  137. S_CITY,
  138. P_BRAND,
  139. SUM(LO_REVENUE - LO_SUPPLYCOST) AS profit
  140. FROM lineorder_flat
  141. WHERE
  142. S_NATION = 'UNITED STATES'
  143. AND LO_ORDERDATE >= 19970101
  144. AND LO_ORDERDATE <= 19981231
  145. AND P_CATEGORY = 'MFGR#14'
  146. GROUP BY YEAR, S_CITY, P_BRAND
  147. ORDER BY YEAR ASC, S_CITY ASC, P_BRAND ASC;

7.6.2 SSB Standard Test for SQL

  1. --Q1.1
  2. SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE
  3. FROM lineorder, dates
  4. WHERE
  5. lo_orderdate = d_datekey
  6. AND d_year = 1993
  7. AND lo_discount BETWEEN 1 AND 3
  8. AND lo_quantity < 25;
  9. --Q1.2
  10. SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE
  11. FROM lineorder, dates
  12. WHERE
  13. lo_orderdate = d_datekey
  14. AND d_yearmonth = 'Jan1994'
  15. AND lo_discount BETWEEN 4 AND 6
  16. AND lo_quantity BETWEEN 26 AND 35;
  17. --Q1.3
  18. SELECT
  19. SUM(lo_extendedprice * lo_discount) AS REVENUE
  20. FROM lineorder, dates
  21. WHERE
  22. lo_orderdate = d_datekey
  23. AND d_weeknuminyear = 6
  24. AND d_year = 1994
  25. AND lo_discount BETWEEN 5 AND 7
  26. AND lo_quantity BETWEEN 26 AND 35;
  27. --Q2.1
  28. SELECT SUM(lo_revenue), d_year, p_brand
  29. FROM lineorder, dates, part, supplier
  30. WHERE
  31. lo_orderdate = d_datekey
  32. AND lo_partkey = p_partkey
  33. AND lo_suppkey = s_suppkey
  34. AND p_category = 'MFGR#12'
  35. AND s_region = 'AMERICA'
  36. GROUP BY d_year, p_brand
  37. ORDER BY p_brand;
  38. --Q2.2
  39. SELECT SUM(lo_revenue), d_year, p_brand
  40. FROM lineorder, dates, part, supplier
  41. WHERE
  42. lo_orderdate = d_datekey
  43. AND lo_partkey = p_partkey
  44. AND lo_suppkey = s_suppkey
  45. AND p_brand BETWEEN 'MFGR#2221' AND 'MFGR#2228'
  46. AND s_region = 'ASIA'
  47. GROUP BY d_year, p_brand
  48. ORDER BY d_year, p_brand;
  49. --Q2.3
  50. SELECT SUM(lo_revenue), d_year, p_brand
  51. FROM lineorder, dates, part, supplier
  52. WHERE
  53. lo_orderdate = d_datekey
  54. AND lo_partkey = p_partkey
  55. AND lo_suppkey = s_suppkey
  56. AND p_brand = 'MFGR#2239'
  57. AND s_region = 'EUROPE'
  58. GROUP BY d_year, p_brand
  59. ORDER BY d_year, p_brand;
  60. --Q3.1
  61. SELECT
  62. c_nation,
  63. s_nation,
  64. d_year,
  65. SUM(lo_revenue) AS REVENUE
  66. FROM customer, lineorder, supplier, dates
  67. WHERE
  68. lo_custkey = c_custkey
  69. AND lo_suppkey = s_suppkey
  70. AND lo_orderdate = d_datekey
  71. AND c_region = 'ASIA'
  72. AND s_region = 'ASIA'
  73. AND d_year >= 1992
  74. AND d_year <= 1997
  75. GROUP BY c_nation, s_nation, d_year
  76. ORDER BY d_year ASC, REVENUE DESC;
  77. --Q3.2
  78. SELECT
  79. c_city,
  80. s_city,
  81. d_year,
  82. SUM(lo_revenue) AS REVENUE
  83. FROM customer, lineorder, supplier, dates
  84. WHERE
  85. lo_custkey = c_custkey
  86. AND lo_suppkey = s_suppkey
  87. AND lo_orderdate = d_datekey
  88. AND c_nation = 'UNITED STATES'
  89. AND s_nation = 'UNITED STATES'
  90. AND d_year >= 1992
  91. AND d_year <= 1997
  92. GROUP BY c_city, s_city, d_year
  93. ORDER BY d_year ASC, REVENUE DESC;
  94. --Q3.3
  95. SELECT
  96. c_city,
  97. s_city,
  98. d_year,
  99. SUM(lo_revenue) AS REVENUE
  100. FROM customer, lineorder, supplier, dates
  101. WHERE
  102. lo_custkey = c_custkey
  103. AND lo_suppkey = s_suppkey
  104. AND lo_orderdate = d_datekey
  105. AND (
  106. c_city = 'UNITED KI1'
  107. OR c_city = 'UNITED KI5'
  108. )
  109. AND (
  110. s_city = 'UNITED KI1'
  111. OR s_city = 'UNITED KI5'
  112. )
  113. AND d_year >= 1992
  114. AND d_year <= 1997
  115. GROUP BY c_city, s_city, d_year
  116. ORDER BY d_year ASC, REVENUE DESC;
  117. --Q3.4
  118. SELECT
  119. c_city,
  120. s_city,
  121. d_year,
  122. SUM(lo_revenue) AS REVENUE
  123. FROM customer, lineorder, supplier, dates
  124. WHERE
  125. lo_custkey = c_custkey
  126. AND lo_suppkey = s_suppkey
  127. AND lo_orderdate = d_datekey
  128. AND (
  129. c_city = 'UNITED KI1'
  130. OR c_city = 'UNITED KI5'
  131. )
  132. AND (
  133. s_city = 'UNITED KI1'
  134. OR s_city = 'UNITED KI5'
  135. )
  136. AND d_yearmonth = 'Dec1997'
  137. GROUP BY c_city, s_city, d_year
  138. ORDER BY d_year ASC, REVENUE DESC;
  139. --Q4.1
  140. SELECT
  141. d_year,
  142. c_nation,
  143. SUM(lo_revenue - lo_supplycost) AS PROFIT
  144. FROM dates, customer, supplier, part, lineorder
  145. WHERE
  146. lo_custkey = c_custkey
  147. AND lo_suppkey = s_suppkey
  148. AND lo_partkey = p_partkey
  149. AND lo_orderdate = d_datekey
  150. AND c_region = 'AMERICA'
  151. AND s_region = 'AMERICA'
  152. AND (
  153. p_mfgr = 'MFGR#1'
  154. OR p_mfgr = 'MFGR#2'
  155. )
  156. GROUP BY d_year, c_nation
  157. ORDER BY d_year, c_nation;
  158. --Q4.2
  159. SELECT
  160. d_year,
  161. s_nation,
  162. p_category,
  163. SUM(lo_revenue - lo_supplycost) AS PROFIT
  164. FROM dates, customer, supplier, part, lineorder
  165. WHERE
  166. lo_custkey = c_custkey
  167. AND lo_suppkey = s_suppkey
  168. AND lo_partkey = p_partkey
  169. AND lo_orderdate = d_datekey
  170. AND c_region = 'AMERICA'
  171. AND s_region = 'AMERICA'
  172. AND (
  173. d_year = 1997
  174. OR d_year = 1998
  175. )
  176. AND (
  177. p_mfgr = 'MFGR#1'
  178. OR p_mfgr = 'MFGR#2'
  179. )
  180. GROUP BY d_year, s_nation, p_category
  181. ORDER BY d_year, s_nation, p_category;
  182. --Q4.3
  183. SELECT
  184. d_year,
  185. s_city,
  186. p_brand,
  187. SUM(lo_revenue - lo_supplycost) AS PROFIT
  188. FROM dates, customer, supplier, part, lineorder
  189. WHERE
  190. lo_custkey = c_custkey
  191. AND lo_suppkey = s_suppkey
  192. AND lo_partkey = p_partkey
  193. AND lo_orderdate = d_datekey
  194. AND s_nation = 'UNITED STATES'
  195. AND (
  196. d_year = 1997
  197. OR d_year = 1998
  198. )
  199. AND p_category = 'MFGR#14'
  200. GROUP BY d_year, s_city, p_brand
  201. ORDER BY d_year, s_city, p_brand;