TPC-DS Benchmark

TPC-DS (Transaction Processing Performance Council Decision Support Benchmark) is a decision support benchmark that evaluates the performance of data warehousing and analytics systems. It was developed by the Transaction Processing Performance Council (TPC) to compare how well different systems handle complex queries and large-scale data analysis.

TPC-DS is designed to simulate complex, real-world decision support workloads. It tests system performance through a series of complex queries and data operations, including joins, aggregations, sorting, filtering, and subqueries. These query patterns cover scenarios ranging from simple to complex, such as report generation, data mining, and OLAP (Online Analytical Processing).

This document describes the performance of Doris on the TPC-DS 1000G test set.

We ran the 99 queries of the TPC-DS standard test data set against Apache Doris 2.1.1-rc03 and Apache Doris 2.0.6 and compared the results.

[Figure: TPCDS_1000G]

1. Hardware Environment

Hardware | Configuration Instructions
Number of Machines | 4 Tencent Cloud Virtual Machines (1 FE, 3 BEs)
CPU | AMD EPYC™ Milan (2.55GHz/3.5GHz) 48C
Memory | 192G
Network | 21Gbps
Disk | ESSD Cloud Hard Disk

2. Software Environment

  • Doris deployed with 3 BEs and 1 FE
  • Kernel Version: Linux version 5.4.0-96-generic (buildd@lgw01-amd64-051)
  • OS version: Ubuntu 20.04 LTS (Focal Fossa)
  • Doris software version: Apache Doris 2.1.1-rc03, Apache Doris 2.0.6.
  • JDK: openjdk version “1.8.0_131”

3. Test Data Volume

The TPC-DS 1000G data generated for the test was imported into both Apache Doris 2.1.1-rc03 and Apache Doris 2.0.6. The tables and their row counts are listed below.

TPC-DS Table Name | Rows
customer_demographics | 1,920,800
reason | 65
warehouse | 20
date_dim | 73,049
catalog_sales | 1,439,980,416
call_center | 42
inventory | 783,000,000
catalog_returns | 143,996,756
household_demographics | 7,200
customer_address | 6,000,000
income_band | 20
catalog_page | 30,000
item | 300,000
web_returns | 71,997,522
web_site | 54
promotion | 1,500
web_sales | 720,000,376
store | 1,002
web_page | 3,000
time_dim | 86,400
store_returns | 287,999,764
store_sales | 2,879,987,999
ship_mode | 20
customer | 12,000,000

4. Test SQL

The 99 TPC-DS test query statements: TPC-DS-Query-SQL

5. Test Results

We compared Apache Doris 2.1.1-rc03 with Apache Doris 2.0.6, using query time (ms) as the main performance indicator. The test results are as follows:

Query | Apache Doris 2.1.1-rc03 (ms) | Apache Doris 2.0.6 (ms)
query1 | 729 | 914
query2 | 5120 | 4669
query3 | 286 | 285
query4 | 11633 | 35148
query5 | 641 | 22979
query6 | 267 | 1351
query7 | 468 | 517
query8 | 263 | 591
query9 | 4444 | 5430
query10 | 418 | 3341
query11 | 7246 | 23300
query12 | 115 | 105
query13 | 661 | 1719
query14 | 13955 | 33254
query15 | 474 | 1414
query16 | 366 | 402
query17 | 1097 | 2371
query18 | 581 | 760
query19 | 283 | 308
query20 | 137 | 117
query21 | 110 | 94
query22 | 1996 | 2481
query23 | 44826 | 77381
query24 | 9873 | 23910
query25 | 666 | 1021
query26 | 221 | 213
query27 | 490 | 544
query28 | 4089 | 4593
query29 | 768 | 1024
query30 | 313 | 682
query31 | 1847 | 2252
query32 | 71 | 68
query33 | 460 | 539
query34 | 629 | 638
query35 | 1660 | 10505
query36 | 412 | 441
query37 | 94 | 86
query38 | 8804 | 8379
query39 | 606 | 898
query40 | 164 | 190
query41 | 55 | 30
query42 | 115 | 113
query43 | 804 | 1332
query44 | 1509 | 1520
query45 | 1678 | 1306
query46 | 1196 | 2167
query47 | 2812 | 3859
query48 | 559 | 1419
query49 | 646 | 725
query50 | 757 | 1299
query51 | 6380 | 4954
query52 | 128 | 123
query53 | 396 | 391
query54 | 388 | 8212
query55 | 124 | 124
query56 | 360 | 434
query57 | 1811 | 2494
query58 | 304 | 666
query59 | 5758 | 7432
query60 | 474 | 481
query61 | 486 | 536
query62 | 647 | 1082
query63 | 358 | 303
query64 | 3250 | 4968
query65 | 5410 | 5971
query66 | 484 | 603
query67 | 26347 | 34052
query68 | 1422 | 1428
query69 | 654 | 808
query70 | 2285 | 4462
query71 | 650 | 1006
query72 | 4324 | 4717
query73 | 500 | 558
query74 | 6678 | 14127
query75 | 3734 | 6312
query76 | 1835 | 1870
query77 | 382 | 496
query78 | 19923 | 23091
query79 | 3061 | 4090
query80 | 851 | 1559
query81 | 565 | 960
query82 | 242 | 221
query83 | 254 | 415
query84 | 203 | 131
query85 | 364 | 444
query86 | 651 | 931
query87 | 8972 | 8554
query88 | 4095 | 5202
query89 | 508 | 480
query90 | 233 | 322
query91 | 174 | 159
query92 | 62 | 59
query93 | 1601 | 1618
query94 | 297 | 297
query95 | 1240 | 27354
query96 | 508 | 847
query97 | 5449 | 11528
query98 | 382 | 287
query99 | 1410 | 2147
Total | 264028 | 487990
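
The two totals can be condensed into a single overall ratio. The snippet below is a minimal sketch of that arithmetic. The variable names are illustrative only, and the values are copied from the Total row.

```shell
# Totals copied from the "Total" row of the results table (milliseconds).
total_new=264028   # Apache Doris 2.1.1-rc03
total_old=487990   # Apache Doris 2.0.6

# awk performs the floating-point division; shell arithmetic is integer-only.
speedup=$(awk -v old="$total_old" -v new="$total_new" 'BEGIN { printf "%.2f", old / new }')
echo "Overall ratio (2.0.6 / 2.1.1-rc03): ${speedup}"
```

With these totals the ratio comes out at about 1.85, meaning the full 99-query run took roughly 1.85 times as long on 2.0.6 as on 2.1.1-rc03. Individual queries vary widely around that figure, and a few are slower on the newer version.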

6. Environmental Preparation

Refer to the official documentation to install and deploy Doris so that you have a working cluster (at least 1 FE and 1 BE; 1 FE and 3 BEs is recommended).

7. Data Preparation

7.1 Download and Install TPC-DS Data Generation Tool

Execute the following script to download and compile the tpcds-tools tool.

  sh bin/build-tpcds-dbgen.sh

7.2 Generating the TPC-DS Test Set

Execute the following script to generate the TPC-DS dataset:

  sh bin/gen-tpcds-data.sh -s 1000

Note 1: Check the script help via sh gen-tpcds-data.sh -h.

Note 2: The data will be generated under the tpcds-data/ directory with the suffix .dat. The total file size is about 1000GB and may need a few minutes to an hour to generate.

Note 3: A standard test data set of 100G is generated by default.

7.3 Create Table

7.3.1 Prepare the doris-cluster.conf File

Before running the import script, write the FE's IP, port, and other information into the doris-cluster.conf file.

The file is located under ${DORIS_HOME}/tools/tpcds-tools/conf/.

The file contains the FE's IP, HTTP port, user name, password, and the name of the database that the data will be imported into:

  # Host of any FE
  export FE_HOST='127.0.0.1'
  # http_port in fe.conf
  export FE_HTTP_PORT=8030
  # query_port in fe.conf
  export FE_QUERY_PORT=9030
  # Doris username
  export USER='root'
  # Doris password
  export PASSWORD=''
  # The database where the TPC-DS tables are located
  export DB='tpcds'
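
Before running the later scripts, it can help to confirm that the configuration values are actually set. The following is a minimal sketch of such a check. `check_conf` is a hypothetical helper, not part of tpcds-tools, and it only tests that the variables are non-empty, not that the cluster is reachable.

```shell
# check_conf is a hypothetical helper, not part of tpcds-tools.
# It prints the name of every required variable that is unset or empty
# and returns non-zero if any are missing.
# PASSWORD is skipped because an empty password is a valid setting.
check_conf() {
    missing=0
    for name in FE_HOST FE_HTTP_PORT FE_QUERY_PORT USER DB; do
        # Indirect lookup of the variable named by $name (POSIX-compatible).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

A possible use is to source conf/doris-cluster.conf in the current shell and then call `check_conf` before creating tables or loading data.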

7.3.2 Execute the Following Script to Generate and Create the TPC-DS Tables

  sh bin/create-tpcds-tables.sh -s 1000

Alternatively, copy the table creation statements from create-tpcds-tables.sql and execute them in Doris.

7.4 Import Data

Please perform data import with the following command:

  sh bin/load-tpcds-data.sh
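
After the import finishes, one way to check that the load is complete is to compare row counts against the table in section 3. The sketch below only prints the COUNT statements for the four largest tables. `count_sql` is a hypothetical helper, and how the statements are sent to Doris depends on your setup.

```shell
# The four largest tables from the row-count table in section 3 (a subset, for brevity).
tables="store_sales catalog_sales web_sales inventory"

# count_sql is a hypothetical helper: it prints one COUNT(*) statement per table.
count_sql() {
    for t in $tables; do
        echo "SELECT '$t' AS table_name, COUNT(*) AS row_count FROM $t;"
    done
}

count_sql
```

Assuming a MySQL-compatible client is installed and the variables from doris-cluster.conf are loaded, the output could be piped to the FE query port, for example `count_sql | mysql -h"$FE_HOST" -P"$FE_QUERY_PORT" -u"$USER" "$DB"`.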

7.5 Query Test

7.5.1 Executing Query Scripts

Execute the test SQL above, or run the following command:

  sh bin/run-tpcds-queries.sh -s 1000

7.5.2 Single SQL Execution

You can also retrieve the latest SQL from the code repository, which hosts the latest TPC-DS test query statements.
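
When running a single statement by hand, a small wrapper can report its wall-clock time in milliseconds. This is a sketch under stated assumptions: `elapsed_ms` is a hypothetical helper, and it relies on GNU date for the `%N` nanosecond format. It measures client-side time, including client startup and connection, so its numbers may not match how the figures in the results table were collected.

```shell
# elapsed_ms is a hypothetical helper: it runs the given command, discards
# the command's standard output, and prints the elapsed wall-clock time in ms.
# Assumes GNU date, which supports %N (nanoseconds).
elapsed_ms() {
    start=$(date +%s%N)
    "$@" > /dev/null
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}
```

For example, assuming a MySQL-compatible client and the variables from doris-cluster.conf: `elapsed_ms mysql -h"$FE_HOST" -P"$FE_QUERY_PORT" -u"$USER" "$DB" -e "SELECT COUNT(*) FROM store_sales"`.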