Topology awareness
Ozone can use topology-related information (for example, rack placement) to optimize read and write pipelines. A fully rack-aware cluster requires three separate pieces of configuration:
- The topology information must be configured for Ozone.
- Topology-related information should be used when Ozone chooses three different datanodes for a specific pipeline/container (write path).
- When Ozone reads a key, it should prefer to read from the closest node (read path).
Ozone uses RAFT replication for open containers (write) and asynchronous replication for closed, immutable containers (cold data). Because RAFT requires a low-latency network, topology-aware placement is available only for closed containers. See the Containers page for more information about open and closed containers.
Topology hierarchy
The topology hierarchy can be configured with the net.topology.node.switch.mapping.impl configuration key. Its value should name an implementation of org.apache.hadoop.net.CachedDNSToSwitchMapping. Because this is a Hadoop class, the configuration is exactly the same as the Hadoop configuration.
Static list
A static list can be configured with the help of TableMapping:
<property>
<name>net.topology.node.switch.mapping.impl</name>
<value>org.apache.hadoop.net.TableMapping</value>
</property>
<property>
<name>net.topology.table.file.name</name>
<value>/opt/hadoop/compose/ozone-topology/network-config</value>
</property>
The second configuration option should point to a text file. The file has two columns separated by whitespace: the first column is a DNS name or IP address, and the second column specifies the rack that the address maps to. If no entry is found for a host in the cluster, /default-rack is assumed.
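The file format and the fallback rule can be illustrated with a small Python sketch. It is not the TableMapping implementation; the addresses, host name, and rack names are hypothetical, and the choice to skip lines that do not have exactly two columns is an assumption of this sketch.

```python
DEFAULT_RACK = "/default-rack"

# Hypothetical contents of the file named by net.topology.table.file.name.
EXAMPLE_TABLE = """\
10.5.0.4 /rack1
10.5.0.5 /rack1
10.5.0.6 /rack2
datanode-7.example.com /rack2
"""


def parse_topology_table(text):
    """Parse whitespace-separated 'address rack' lines into a dict."""
    mapping = {}
    for line in text.splitlines():
        columns = line.split()
        if len(columns) != 2:
            # Assumption of this sketch: ignore blank or malformed lines.
            continue
        address, rack = columns
        mapping[address] = rack
    return mapping


def resolve_rack(mapping, address):
    """Return the configured rack, or /default-rack when there is no entry."""
    return mapping.get(address, DEFAULT_RACK)


table = parse_topology_table(EXAMPLE_TABLE)
print(resolve_rack(table, "10.5.0.4"))   # prints /rack1
print(resolve_rack(table, "10.5.0.99"))  # prints /default-rack
```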
Dynamic list
Rack information can be identified with the help of an external script:
<property>
<name>net.topology.node.switch.mapping.impl</name>
<value>org.apache.hadoop.net.ScriptBasedMapping</value>
</property>
<property>
<name>net.topology.script.file.name</name>
<value>/usr/local/bin/rack.sh</value>
</property>
An external script is specified with the net.topology.script.file.name parameter in the configuration files. Unlike the Java class, the external topology script is not included with the Ozone distribution and must be provided by the administrator. Ozone passes multiple IP addresses as command-line arguments (ARGV) when forking the topology script. The number of IP addresses passed per invocation is controlled by net.topology.script.number.args, which defaults to 100. If net.topology.script.number.args is set to 1, the topology script is forked once for each IP address submitted.
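A topology script of this kind might look like the following Python sketch. It assumes, following the Hadoop rack awareness documentation, that the script writes one rack path to standard output for each address it receives, in the same order. The subnet-to-rack table and the rule of keying on the third octet are hypothetical; a real script would consult whatever inventory the administrator maintains.

```python
#!/usr/bin/env python3
import sys

DEFAULT_RACK = "/default-rack"

# Hypothetical rule: the third octet of the IPv4 address identifies the rack.
RACK_BY_THIRD_OCTET = {"0": "/rack1", "1": "/rack2"}


def rack_for(address):
    """Map one address to a rack path, falling back to the default rack."""
    octets = address.split(".")
    if len(octets) != 4:
        # Not a dotted IPv4 address (for example a host name).
        return DEFAULT_RACK
    return RACK_BY_THIRD_OCTET.get(octets[2], DEFAULT_RACK)


def main(addresses):
    # One output line per input argument, in the same order.
    for address in addresses:
        print(rack_for(address))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved at the path configured in net.topology.script.file.name and made executable, the script would be called with up to net.topology.script.number.args addresses per invocation.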
Write path
Placement of closed containers can be configured with the ozone.scm.container.placement.impl configuration key. The available container placement policies can be found in the org.apache.hadoop.hdds.scm.container.placement.algorithms package.
By default, SCMContainerPlacementRandom is used. For topology awareness, SCMContainerPlacementRackAware can be used:
<property>
<name>ozone.scm.container.placement.impl</name>
<value>org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware</value>
</property>
This placement policy follows the algorithm used in HDFS. With the default of three replicas, two replicas are placed on the same rack and the third on a different rack.
This implementation applies to network topologies of the form “/rack/node”. It is not recommended if the network topology has more layers.
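The resulting replica layout can be illustrated with a simplified Python sketch. It is not the SCMContainerPlacementRackAware code and does not reproduce the order in which the HDFS algorithm selects nodes; it only shows the outcome of two replicas sharing one rack and the third landing on another. The function name, cluster layout, and datanode names are hypothetical.

```python
import random


def choose_three_replicas(nodes_by_rack, rng):
    """Pick two nodes from one rack and a third node from a different rack."""
    racks_with_two = [r for r, nodes in nodes_by_rack.items() if len(nodes) >= 2]
    if not racks_with_two:
        raise ValueError("need a rack with at least two datanodes")
    shared_rack = rng.choice(racks_with_two)

    other_racks = [
        r for r, nodes in nodes_by_rack.items() if r != shared_rack and nodes
    ]
    if not other_racks:
        raise ValueError("need a second rack with at least one datanode")
    remote_rack = rng.choice(other_racks)

    # Two distinct nodes on the shared rack, one node on the remote rack.
    first, second = rng.sample(nodes_by_rack[shared_rack], 2)
    third = rng.choice(nodes_by_rack[remote_rack])
    return [(shared_rack, first), (shared_rack, second), (remote_rack, third)]


# Hypothetical "/rack/node" cluster layout.
cluster = {
    "/rack1": ["dn1", "dn2", "dn3"],
    "/rack2": ["dn4", "dn5"],
    "/rack3": ["dn6"],
}
for rack, node in choose_three_replicas(cluster, random.Random()):
    print(rack + "/" + node)
```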
Read path
Finally, the read path should also be configured to read the data from the closest pipeline:
<property>
<name>ozone.network.topology.aware.read</name>
<value>true</value>
</property>
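What "closest" means in a “/rack/node” topology can be sketched in Python as follows. This is an illustration rather than Ozone's implementation: the hop-counting distance (0 for the same node, 2 for the same rack, 4 for a different rack) is an assumed convention, and the function names and locations are hypothetical.

```python
def distance(location_a, location_b):
    """Count hops between two locations through their closest common ancestor."""
    path_a = location_a.strip("/").split("/")
    path_b = location_b.strip("/").split("/")
    common = 0
    for part_a, part_b in zip(path_a, path_b):
        if part_a != part_b:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)


def sort_by_distance(reader, replicas):
    """Order replica locations from closest to farthest relative to the reader."""
    return sorted(replicas, key=lambda replica: distance(reader, replica))


replicas = ["/rack2/dn4", "/rack1/dn2", "/rack1/dn1"]
print(sort_by_distance("/rack1/dn1", replicas))
# prints ['/rack1/dn1', '/rack1/dn2', '/rack2/dn4']
```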
References
- Hadoop documentation about net.topology.node.switch.mapping.impl: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/RackAwareness.html
- Design doc