Topology awareness

Ozone can use topology-related information (for example, rack placement) to optimize read and write pipelines. A fully rack-aware cluster requires three separate pieces of configuration:

1. The topology information must be configured for Ozone.
2. Topology-related information must be used when Ozone chooses 3 different datanodes for a specific pipeline/container (write path).
3. When Ozone reads a key, it should prefer to read from the closest node (read path).

Ozone uses RAFT replication for open containers (write) and asynchronous replication for closed, immutable containers (cold data). Because RAFT requires a low-latency network, topology-aware placement is available only for closed containers. See the page about Containers for more information on open vs. closed containers.

Topology hierarchy

The topology hierarchy can be configured with the net.topology.node.switch.mapping.impl configuration key. Its value should name an implementation of org.apache.hadoop.net.CachedDNSToSwitchMapping. Because this is a Hadoop class, the configuration is exactly the same as the Hadoop configuration (see References).

Static list

A static list can be configured with TableMapping:

<property>
   <name>net.topology.node.switch.mapping.impl</name>
   <value>org.apache.hadoop.net.TableMapping</value>
</property>
<property>
   <name>net.topology.table.file.name</name>
   <value>/opt/hadoop/compose/ozone-topology/network-config</value>
</property>

The second configuration option should point to a text file. The file has two columns separated by whitespace: the first column is a DNS name or IP address, and the second column is the rack to which that address maps. If no entry is found for a host in the cluster, /default-rack is assumed.
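The file format and the fallback rule can be illustrated with a small Python sketch. It only mimics the lookup described above and is not Hadoop's TableMapping code; the addresses, host name, rack names, and the helper names parse_table and resolve are hypothetical.

```python
# Illustrative sketch only: mimics the two-column lookup, not Hadoop's TableMapping.
DEFAULT_RACK = "/default-rack"

# Hypothetical contents of the file named by net.topology.table.file.name.
SAMPLE_TABLE = """\
10.5.0.4         /rack1
10.5.0.5         /rack1
dn3.example.com  /rack2
"""

def parse_table(text):
    """Parse two whitespace-separated columns: address, then rack."""
    mapping = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # this sketch simply skips blank or malformed lines
        address, rack = fields
        mapping[address] = rack
    return mapping

def resolve(mapping, address):
    """Return the rack for an address, or the default rack if it is not listed."""
    return mapping.get(address, DEFAULT_RACK)

table = parse_table(SAMPLE_TABLE)
print(resolve(table, "10.5.0.4"))   # listed address
print(resolve(table, "10.9.9.9"))   # unlisted address falls back to the default rack
```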

Dynamic list

Rack information can be identified with the help of an external script:

<property>
   <name>net.topology.node.switch.mapping.impl</name>
   <value>org.apache.hadoop.net.ScriptBasedMapping</value>
</property>
<property>
   <name>net.topology.script.file.name</name>
   <value>/usr/local/bin/rack.sh</value>
</property>

The external script is specified with the net.topology.script.file.name parameter in the configuration files. Unlike the Java class, the external topology script is not included with the Ozone distribution; the administrator must provide it. Ozone passes multiple IP addresses as command-line arguments (ARGV) when it forks the topology script. The number of IP addresses passed per invocation is controlled by net.topology.script.number.args and defaults to 100. If net.topology.script.number.args is set to 1, the topology script is forked once for each IP address submitted.
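A minimal sketch of such a script in Python is shown here. It assumes the contract described in the Hadoop rack awareness documentation: the script receives addresses as arguments and prints one rack path per argument, in the same order, to standard output. The hard-coded RACKS table and the helper name racks_for are hypothetical; a real script would typically consult an inventory system or derive the rack from the subnet.

```python
#!/usr/bin/env python3
# Illustrative topology script; the mapping below is a hypothetical example.
import sys

DEFAULT_RACK = "/default-rack"

# Hypothetical address-to-rack table maintained by the administrator.
RACKS = {
    "10.5.0.4": "/rack1",
    "10.5.0.5": "/rack1",
    "10.5.0.6": "/rack2",
}

def racks_for(addresses):
    """Return one rack path per address, preserving the input order."""
    return [RACKS.get(address, DEFAULT_RACK) for address in addresses]

if __name__ == "__main__":
    # Addresses arrive as command-line arguments; print one rack per argument.
    for rack in racks_for(sys.argv[1:]):
        print(rack)
```

To be usable as the value of net.topology.script.file.name, the file would need to be executable by the user that runs the Ozone service.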

Write path

Placement of closed containers can be configured with the ozone.scm.container.placement.impl configuration key. The available container placement policies can be found in the org.apache.hadoop.hdds.scm.container.placement.algorithms package.

By default, SCMContainerPlacementRandom is used. For topology awareness, SCMContainerPlacementRackAware can be used instead:

<property>
   <name>ozone.scm.container.placement.impl</name>
   <value>org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware</value>
</property>

This placement policy follows the algorithm used in HDFS. With the default of 3 replicas, two replicas are placed on the same rack and the third one on a different rack.

This implementation applies to network topologies of the form “/rack/node”. It is not recommended if the network topology has more layers.
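The two-plus-one replica layout can be sketched in Python as follows. This is only an illustration of the rule stated above under simplifying assumptions (no capacity or health checks); it is not the SCMContainerPlacementRackAware implementation, and the function name choose_three and the sample datanode names are hypothetical.

```python
import random

def choose_three(nodes_by_rack, rng=random):
    """Return 3 datanodes: two from one rack, the third from another rack."""
    non_empty = [rack for rack, nodes in nodes_by_rack.items() if nodes]
    pair_racks = [rack for rack in non_empty if len(nodes_by_rack[rack]) >= 2]
    if not pair_racks or len(non_empty) < 2:
        raise ValueError("need a rack with two nodes plus a second non-empty rack")
    # Two replicas share a rack ...
    first_rack = rng.choice(pair_racks)
    pair = rng.sample(nodes_by_rack[first_rack], 2)
    # ... and the third replica goes to a different rack.
    other_rack = rng.choice([rack for rack in non_empty if rack != first_rack])
    third = rng.choice(nodes_by_rack[other_rack])
    return pair + [third]

# Hypothetical two-level "/rack/node" topology.
cluster = {
    "/rack1": ["dn1", "dn2", "dn3"],
    "/rack2": ["dn4", "dn5"],
}
print(choose_three(cluster))
```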

Read path

Finally, the read path should also be configured to read data from the closest pipeline:

<property>
   <name>ozone.network.topology.aware.read</name>
   <value>true</value>
</property>
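What "closest" means can be illustrated with a short Python sketch that orders replicas by a simple hop-count distance in a "/rack/node" topology: 0 for the same node, 2 for the same rack, 4 for a different rack. This metric is an assumption modeled on the distance used by Hadoop's network topology; the cost computation that Ozone actually applies may differ, and the function names and locations are hypothetical.

```python
def distance(a, b):
    """Hops from each location up to their closest common ancestor, summed."""
    path_a = a.strip("/").split("/")
    path_b = b.strip("/").split("/")
    common = 0
    for part_a, part_b in zip(path_a, path_b):
        if part_a != part_b:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)

def sort_by_distance(client, replicas):
    """Order replica locations from closest to farthest relative to the client."""
    return sorted(replicas, key=lambda replica: distance(client, replica))

# Hypothetical locations: the client runs on /rack1/dn1.
replicas = ["/rack2/dn4", "/rack1/dn2", "/rack1/dn1"]
print(sort_by_distance("/rack1/dn1", replicas))
```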

References

Hadoop documentation about net.topology.node.switch.mapping.impl: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/RackAwareness.html
Design doc