Merge Container RocksDB in DN

In Ozone, user data is split into blocks and stored in HDDS Containers. Containers are the fundamental replication unit of Ozone/HDDS. Each Container has its own metadata and data: the data is saved as files on disk, and the metadata is saved in RocksDB.

Currently, each Container on a datanode has its own RocksDB instance. As user data continuously grows, a single datanode can end up with hundreds of thousands of RocksDB instances. Managing that many RocksDB instances in one JVM is a big challenge.

Unlike the current approach, the “Merge Container RocksDB in DN” feature uses only one RocksDB per data volume, which holds the metadata of all Containers on that volume.

Configuration

This is mainly a datanode (DN) feature and doesn’t require much configuration.

The following property disables this feature, in case the current mode of one RocksDB per Container is preferred. Note that once the feature has been enabled, it is strongly recommended not to disable it later.

  <property>
    <name>hdds.datanode.container.schema.v3.enabled</name>
    <value>false</value>
    <description>Disable or enable this feature.</description>
  </property>

Without any specific configuration, the single RocksDB will be created under the data volume configured in “hdds.datanode.dir”.
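
As an illustrative sketch, assume a datanode with two data volumes mounted at the hypothetical paths /data/disk1 and /data/disk2. With only “hdds.datanode.dir” set as below and no separate db directory configured, each of the two volumes would hold its own single RocksDB for the Containers stored on it.

  <property>
    <name>hdds.datanode.dir</name>
    <!-- hypothetical mount points, for illustration only -->
    <value>/data/disk1,/data/disk2</value>
  </property>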

Cluster admins with high performance requirements can use fast storage to hold the RocksDB instances. In this case, configure these two properties.

  <property>
    <name>hdds.datanode.container.db.dir</name>
    <value/>
    <description>This setting is optional. Specify where the per-disk rocksdb instances will be stored.</description>
  </property>
  <property>
    <name>hdds.datanode.failed.db.volumes.tolerated</name>
    <value>-1</value>
    <description>The number of db volumes that are allowed to fail before a datanode stops offering service.
    Default -1 means unlimited, but we should have at least one good volume left.</description>
  </property>
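
For illustration only, the two properties might be filled in as follows. The path /mnt/ssd1/ozone/db is a hypothetical SSD mount point, and the value 0 is an arbitrary example chosen to contrast with the default of -1. Read literally from the description above, 0 would mean that no db volume failure is tolerated. Neither value is a recommendation.

  <property>
    <name>hdds.datanode.container.db.dir</name>
    <!-- hypothetical fast-storage directory -->
    <value>/mnt/ssd1/ozone/db</value>
  </property>
  <property>
    <name>hdds.datanode.failed.db.volumes.tolerated</name>
    <!-- example value only; the default is -1 -->
    <value>0</value>
  </property>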

Backward compatibility

Existing Containers that each have their own RocksDB will still be accessible after this feature is enabled. Containers in both formats will co-exist in an existing Ozone cluster.

References