The diagram below shows the deployment architecture of Doris in the compute-storage decoupled mode. It involves three modules:

  • FE: Responsible for receiving user requests and storing the metadata of databases and tables. It is currently stateful, but will evolve to be stateless like BE.
  • BE: Stateless BE nodes, responsible for computation. The BE will cache a portion of the Tablet metadata and data to improve query performance.
  • Meta Service: A new module added in the compute-storage decoupled mode. Its program name is doris_cloud, and it runs in one of the following two roles depending on the parameters it is started with:
    • Meta Service: Responsible for metadata management. It provides services for metadata operations, such as creating Tablets, adding Rowsets, and querying metadata of Tablets and Rowsets.
    • Recycler: Responsible for data recycling. It performs periodic, asynchronous forward recycling by regularly scanning the metadata of data marked for deletion (the data files themselves are stored on S3 or HDFS), so it does not need to list the data objects and compare them against the metadata.

[Figure: Apache Doris in the compute-storage decoupled mode]

The Meta Service is a stateless service that relies on FoundationDB, a high-performance distributed transactional KV store, to store metadata. This greatly simplifies the metadata management process and provides high horizontal scalability.

[Figure: Deployment of the compute-storage decoupled mode]

Deploying Doris in the compute-storage decoupled mode relies on two open-source projects. Please install the following dependencies before proceeding:

  • FoundationDB (FDB)
  • OpenJDK 17: Must be installed on all nodes where the Meta Service is deployed.

Deployment steps

Given the modules and their functions, it is recommended to deploy Doris in the compute-storage decoupled mode from the bottom up:

  1. Machine planning: Follow the instructions on this page.
  2. Deployment of FoundationDB and the required runtime dependencies: This step can be completed without the need for any Doris compilation outputs. Follow the instructions on this page.
  3. Deploy Meta Service and Recycler
  4. Deploy FE and BE


Note: A single FoundationDB + Meta Service + Recycler infrastructure can support multiple Doris instances (i.e., multiple FE + BE setups) running in the compute-storage decoupled mode.

Deployment planning

To avoid inter-module interference as much as possible, it is recommended to deploy the modules on separate sets of machines.

  • The Meta Service, Recycler, and FoundationDB modules use the same set of machines, with a minimum requirement of 3 machines.
    • To enable the compute-storage decoupled mode, at least one Meta Service process and one Recycler process must be deployed. These stateless processes can be scaled as needed, typically with 3 instances for each.
    • To ensure the performance, reliability, and scalability of FoundationDB, a multi-replica deployment is required.
  • FE is deployed independently, with a minimum of 1 machine, and can be scaled out based on the actual query demands.
  • BE is deployed independently, with a minimum of 1 machine, and can be scaled out based on the actual query demands.
             Host1                  Host2
      .------------------.   .------------------.
      |                  |   |                  |
      |        FE        |   |        BE        |
      |                  |   |                  |
      '------------------'   '------------------'

             Host3                  Host4                  Host5
      .------------------.   .------------------.   .------------------.
      |     Recycler     |   |     Recycler     |   |     Recycler     |
      |   Meta Service   |   |   Meta Service   |   |   Meta Service   |
      |   FoundationDB   |   |   FoundationDB   |   |   FoundationDB   |
      '------------------'   '------------------'   '------------------'

If machine resources are limited, a hybrid deployment approach can be used, where all the modules are deployed on the same set of machines. This approach requires a minimum of 3 machines.

One feasible plan is as follows:

             Host1                  Host2                  Host3
      .------------------.   .------------------.   .------------------.
      |                  |   |                  |   |                  |
      |        FE        |   |                  |   |                  |
      |                  |   |        BE        |   |        BE        |
      |     Recycler     |   |                  |   |                  |
      |   Meta Service   |   |                  |   |                  |
      |   FoundationDB   |   |   FoundationDB   |   |   FoundationDB   |
      |                  |   |                  |   |                  |
      '------------------'   '------------------'   '------------------'

Install FoundationDB

Machine requirements

Typically, at least 3 machines are required to form a FoundationDB cluster that keeps two replicas of the data and tolerates the failure of a single machine.

Tip

If this is only for development/testing purposes, a single machine will be enough.

Each machine needs to have the FoundationDB service installed first. You can download the FoundationDB installation package from here. Currently, version 7.1.38 is generally recommended.

For CentOS (Red Hat) and Ubuntu users, the download links are as follows:

If you need faster downloads, you can also use the following image links:

Use the following command to install FoundationDB:

    # Adjust the file names to match the version you downloaded.
    # Ubuntu
    user@host$ sudo dpkg -i foundationdb-clients_7.1.23-1_amd64.deb \
        foundationdb-server_7.1.23-1_amd64.deb
    # CentOS
    user@host$ sudo rpm -Uvh foundationdb-clients-7.1.23-1.el7.x86_64.rpm \
        foundationdb-server-7.1.23-1.el7.x86_64.rpm

Enter fdbcli in the command line to check if the installation was successful. If the output shows the word available, it indicates a successful installation:

    user@host$ fdbcli
    Using cluster file `/etc/foundationdb/fdb.cluster'.
    The database is available.
    Welcome to the fdbcli. For help, type `help'.
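The same check can be scripted, for example in a provisioning script. The following is a minimal sketch, not part of the official procedure: the helper name fdb_is_available is hypothetical, and it only looks for the availability line shown above in whatever text it receives on standard input.

```shell
# Hypothetical helper: succeeds (exit code 0) when the text on stdin
# contains the availability line that fdbcli prints for a healthy database.
fdb_is_available() {
    grep -q "The database is available"
}

# Usage on a machine where FoundationDB is installed:
#   fdbcli --exec "status minimal" | fdb_is_available && echo "FoundationDB is up"
```

Reading from standard input keeps the helper independent of how fdbcli is invoked, so it can also be pointed at saved output.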

Info

After a successful installation:

  • By default, a FoundationDB service will be started.
  • By default, the cluster information file fdb.cluster will be stored at /etc/foundationdb/fdb.cluster, and the default cluster configuration file foundationdb.conf will be stored at /etc/foundationdb/foundationdb.conf.
  • By default, the data and logs will be saved in /var/lib/foundationdb/data/ and /var/log/foundationdb.
  • By default, a foundationdb user and group will be created, and they are already granted access to the default data and log paths.

Primary machine configuration

Select one of the three machines to be the primary machine. Configure the primary machine first, and then the other machines.

Modify FoundationDB configuration

Adjust the FoundationDB configurations based on different hardware specifications. You may follow the FoundationDB System Requirements guidelines.

This is an example foundationdb.conf configuration file for a machine with 8 CPU cores, 32 GB of memory, and a 500 GB SSD data disk. Ensure that the datadir and logdir paths are set correctly. The data disk is typically mounted at /mnt:

    # foundationdb.conf
    ##
    ## Configuration file for FoundationDB server processes
    ## Full documentation is available at
    ## https://apple.github.io/foundationdb/configuration.html#the-configuration-file
    [fdbmonitor]
    user = foundationdb
    group = foundationdb
    [general]
    restart-delay = 60
    ## By default, restart-backoff = restart-delay-reset-interval = restart-delay
    # initial-restart-delay = 0
    # restart-backoff = 60
    # restart-delay-reset-interval = 60
    cluster-file = /etc/foundationdb/fdb.cluster
    # delete-envvars =
    # kill-on-configuration-change = true
    ## Default parameters for individual fdbserver processes
    [fdbserver]
    command = /usr/sbin/fdbserver
    public-address = auto:$ID
    listen-address = public
    logdir = /mnt/foundationdb/log
    datadir = /mnt/foundationdb/data/$ID
    # logsize = 10MiB
    # maxlogssize = 100MiB
    # machine-id =
    # datacenter-id =
    # class =
    # memory = 8GiB
    # storage-memory = 1GiB
    # cache-memory = 2GiB
    # metrics-cluster =
    # metrics-prefix =
    ## An individual fdbserver process with id 4500
    ## Parameters set here override defaults from the [fdbserver] section
    [fdbserver.4500]
    class = stateless
    [fdbserver.4501]
    class = stateless
    [fdbserver.4502]
    class = storage
    [fdbserver.4503]
    class = storage
    [fdbserver.4504]
    class = log
    [backup_agent]
    command = /usr/lib/foundationdb/backup_agent/backup_agent
    logdir = /mnt/foundationdb/log
    [backup_agent.1]
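Before restarting the service, it can help to confirm which fdbserver processes and classes the file actually defines. The sketch below is an illustration under stated assumptions: the helper name list_fdb_processes is hypothetical, and it assumes each per-process section is written as [fdbserver.<id>] followed by an uncommented class line, as in the example above.

```shell
# Hypothetical helper: reads a foundationdb.conf on stdin and prints
# "<id> <class>" for every [fdbserver.<id>] section that sets a class.
list_fdb_processes() {
    awk '
        # Entering a per-process section: remember its numeric id.
        /^\[fdbserver\.[0-9]+\]/ { id = $0; gsub(/[^0-9]/, "", id); next }
        # Any other section header ends the current per-process section.
        /^\[/ { id = "" }
        # An uncommented class line inside a per-process section.
        id != "" && /^class[ \t]*=/ { sub(/^class[ \t]*=[ \t]*/, ""); print id, $0 }
    '
}

# Usage:
#   list_fdb_processes < /etc/foundationdb/foundationdb.conf
```

For the example configuration this would list five processes (two stateless, two storage, one log) per machine.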

First, on the primary machine, create the directories corresponding to the configured datadir and logdir paths, and grant the foundationdb user and group access to them.

    mkdir -p /mnt/foundationdb/data /mnt/foundationdb/log
    chown -R foundationdb:foundationdb /mnt/foundationdb/data/ /mnt/foundationdb/log

Then, replace the relevant contents of the /etc/foundationdb/foundationdb.conf file with the configuration shown above.

Configure access privilege

Set the access privileges for the /etc/foundationdb directory:

    chmod -R 777 /etc/foundationdb

On the primary machine, update the IP address in the /etc/foundationdb/fdb.cluster file. By default it is set to the local loopback address, and it should be changed to the machine's internal network address. For example:

    3OrXp9ei:diDqAjYV@127.0.0.1:4500 -> 3OrXp9ei:diDqAjYV@172.21.16.37:4500
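This edit can also be made non-interactively. The following is a minimal sketch, assuming the file still contains the default 127.0.0.1 address; the helper name set_cluster_ip is hypothetical. It leaves the part before the @ and the port untouched.

```shell
# Hypothetical helper: replaces 127.0.0.1 with the given IP in a cluster file.
# The function body runs in a subshell so its variables do not leak.
set_cluster_ip() (
    cluster_file="$1"
    new_ip="$2"
    tmp="$(mktemp)" || exit 1
    # Write the result back with cat so the file keeps its owner and mode.
    sed "s/@127\.0\.0\.1:/@${new_ip}:/" "$cluster_file" > "$tmp" && cat "$tmp" > "$cluster_file"
    status=$?
    rm -f "$tmp"
    exit "$status"
)

# Usage (run with sufficient privileges):
#   set_cluster_ip /etc/foundationdb/fdb.cluster 172.21.16.37
```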

Then, restart the FoundationDB service to apply the changes:

    # for service
    user@host$ sudo service foundationdb restart
    # for systemd
    user@host$ sudo systemctl restart foundationdb.service

Configure a new database

Because the storage paths for data and logs have changed, a new database needs to be created on the primary machine. This can be done in fdbcli by creating a new database with ssd as the storage engine.

    user@host$ fdbcli
    fdb> configure new single ssd
    Database created

Finally, run fdbcli to check that the database starts up normally.

    user@host$ fdbcli
    Using cluster file `/etc/foundationdb/fdb.cluster'.
    The database is available.
    Welcome to the fdbcli. For help, type `help'.

At this point, the configuration of the primary machine is completed.

Build FoundationDB cluster

Tip

If you are only deploying a single machine for development or testing, you can skip this step.

For machines other than the primary machine, follow the same steps as for the primary machine to create the data and log directories. Then, set access privileges for the /etc/foundationdb directory:

    chmod -R 777 /etc/foundationdb

Replace /etc/foundationdb/foundationdb.conf and /etc/foundationdb/fdb.cluster on the local machine with the copies from the primary machine.

Then, restart FoundationDB service on the local machine.

    # for service
    user@host$ sudo service foundationdb restart
    # for systemd
    user@host$ sudo systemctl restart foundationdb.service

After these steps on all machines, the machines will be connected to the same cluster (i.e., the same fdb.cluster). Log in to the primary machine and configure double replicas.

    user@host$ fdbcli
    Using cluster file `/etc/foundationdb/fdb.cluster'.
    The database is available.
    Welcome to the fdbcli. For help, type `help'.
    fdb> configure double
    Configuration changed.

Then, on the primary machine, set the coordinators recorded in the fdb.cluster file to the accessible machines and ports for disaster recovery purposes.

    user@host$ fdbcli
    Using cluster file `/etc/foundationdb/fdb.cluster'.
    The database is available.
    Welcome to the fdbcli. For help, type `help'.
    # Fill in all machines
    fdb> coordinators ${primary machine ip}:4500 ${secondary machine 1 ip}:4500 ${secondary machine 2 ip}:4500
    Coordinators changed
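When the machine list comes from an inventory, the coordinators command can be assembled rather than typed by hand. This is a minimal sketch; the helper name coordinators_cmd and the IP addresses in the usage line are hypothetical, and it assumes every coordinator listens on the same port.

```shell
# Hypothetical helper: prints the fdbcli "coordinators" command for the
# given port followed by one or more IP addresses.
coordinators_cmd() (
    port="$1"
    shift
    cmd="coordinators"
    for ip in "$@"; do
        cmd="$cmd $ip:$port"
    done
    printf '%s\n' "$cmd"
)

# Usage on the primary machine:
#   fdbcli --exec "$(coordinators_cmd 4500 172.21.16.37 172.21.16.38 172.21.16.39)"
```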

Finally, check if the configuration is successful using the status command in fdbcli:

    [root@ip-10-100-3-91 recycler]# fdbcli
    Using cluster file `/etc/foundationdb/fdb.cluster'.
    The database is available.
    Welcome to the fdbcli. For help, type `help'.
    fdb> status
    Using cluster file `/etc/foundationdb/fdb.cluster'.
    Configuration:
      Redundancy mode - double
      Storage engine - ssd-2
      Coordinators - 3
      Usable Regions - 1
    Cluster:
      FoundationDB processes - 15
      Zones - 3
      Machines - 3
      Memory availability - 6.1 GB per process on machine with least available
      Fault Tolerance - 1 machines
      Server time - 11/11/22 04:47:30
    Data:
      Replication health - Healthy
      Moving data - 0.000 GB
      Sum of key-value sizes - 0 MB
      Disk space used - 944 MB
    Operating space:
      Storage server - 473.9 GB free on most full server
      Log server - 473.9 GB free on most full server
    Workload:
      Read rate - 19 Hz
      Write rate - 0 Hz
      Transactions started - 5 Hz
      Transactions committed - 0 Hz
      Conflict rate - 0 Hz
    Backup and DR:
      Running backups - 0
      Running DRs - 0
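For a three-machine cluster, the fields worth checking are Redundancy mode (double), Coordinators (3), and Fault Tolerance (1 machines). A scripted check could extract them as sketched below; the helper name status_field is hypothetical, and it assumes the "name - value" line layout shown above.

```shell
# Hypothetical helper: reads fdbcli status output on stdin and prints the
# value of the first line whose field name equals the argument.
status_field() {
    awk -v key="$1" -F' - ' '
        {
            name = $1
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim surrounding whitespace
            if (name == key && !found) { print $2; found = 1 }
        }
    '
}

# Usage:
#   fdbcli --exec status | status_field "Redundancy mode"
```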

Install OpenJDK 17

All nodes must have OpenJDK 17 installed. You can download the installation package from the following link: OpenJDK 17

Then, simply extract the downloaded OpenJDK package directly to the installation path:

    tar xf openjdk-17.0.1_linux-x64_bin.tar.gz -C /opt/
    # Before starting Meta Service or Recycler
    export JAVA_HOME=/opt/jdk-17.0.1
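To confirm that JAVA_HOME points at a Java 17 runtime before starting the services, the version banner can be inspected. This is a minimal sketch; the helper name java_major_version is hypothetical, and it assumes the usual banner format in which the first line contains version "17.0.1" or similar.

```shell
# Hypothetical helper: reads `java -version` output on stdin and prints the
# leading number of the quoted version string (for example 17 for "17.0.1").
java_major_version() {
    sed -n 's/.* version "\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage (java prints its version banner to stderr, hence 2>&1):
#   "$JAVA_HOME/bin/java" -version 2>&1 | java_major_version
```

Note that legacy runtimes using the 1.x numbering scheme (such as "1.8.0") would report 1 with this sketch.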

Note

Machines that run FoundationDB can also host the Meta Service and Recycler. This is the recommended deployment method, because it saves machine resources.