O3fs (Hadoop compatible)

The Hadoop-compatible file system interface allows storage backends such as Ozone to be integrated easily into the Hadoop ecosystem. The Ozone file system is a Hadoop-compatible file system.

Currently, Ozone supports two schemes: o3fs:// and ofs://. The biggest difference between them is that o3fs supports operations only within a single bucket, while ofs supports operations across all volumes and buckets and provides a full view of all volumes and buckets.

Setting up the o3fs

To create an Ozone file system, we have to choose a bucket where the file system will live. This bucket is used as the backend store for OzoneFileSystem. All files and directories are stored as keys in this bucket.

Please run the following commands to create a volume and bucket, if you don’t have them already.

    ozone sh volume create /volume
    ozone sh bucket create /volume/bucket

Once they are created, please verify that the bucket exists by using the list volume or list bucket commands.

Please add the following entries to core-site.xml.

    <property>
      <name>fs.AbstractFileSystem.o3fs.impl</name>
      <value>org.apache.hadoop.fs.ozone.OzFs</value>
    </property>
    <property>
      <name>fs.defaultFS</name>
      <value>o3fs://bucket.volume</value>
    </property>

Tip: For the OM HA cluster, you need to specify the ozone service id. For example, if ozone.om.service.ids = ozone1, then the URL is o3fs://bucket.volume.ozone1. For non-HA, it can be o3fs://bucket.volume.

This configuration makes the bucket the default Hadoop-compatible file system and registers the o3fs file system type.
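The two URI forms mentioned in the tip above (with and without an OM service id) can be assembled mechanically. As a minimal sketch, the shell function below builds the value for fs.defaultFS from a bucket, a volume, and an optional service id. The name o3fs_uri is a hypothetical helper introduced for illustration; it is not part of Ozone or Hadoop.

```shell
# Hypothetical helper (not part of Ozone): build an o3fs URI from a bucket,
# a volume, and an optional OM service id (needed only for OM HA clusters).
o3fs_uri() {
  bucket=$1
  volume=$2
  service_id=${3:-}
  if [ -n "$service_id" ]; then
    printf 'o3fs://%s.%s.%s\n' "$bucket" "$volume" "$service_id"
  else
    printf 'o3fs://%s.%s\n' "$bucket" "$volume"
  fi
}

o3fs_uri bucket volume          # non-HA: prints o3fs://bucket.volume
o3fs_uri bucket volume ozone1   # HA:     prints o3fs://bucket.volume.ozone1
```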

You also need to add the ozone-filesystem-hadoop3.jar file to the classpath:

    export HADOOP_CLASSPATH=/opt/ozone/share/ozone/lib/ozone-filesystem-hadoop3-*.jar:$HADOOP_CLASSPATH

(Note: with Hadoop 2.x, use the ozone-filesystem-hadoop2-*.jar)

Once the default file system has been set up, users can run commands such as ls, put, and mkdir. For example,

    hdfs dfs -ls /

or

    hdfs dfs -mkdir /users

The put command and others work the same way. In other words, all programs such as Hive, Spark, and DistCp will work against this file system. Please note that keys created or deleted in the bucket by means other than OzoneFileSystem also show up as directories and files in the Ozone file system.

Note: Bucket and volume names are not allowed to have a period in them.

Moreover, the file system URI can take a fully qualified form, with the OM host and an optional port appended after the volume name. For example, you can specify both host and port:

    hdfs dfs -ls o3fs://bucket.volume.om-host.example.com:5678/key
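To illustrate how such a URI breaks down, and why bucket and volume names may not contain a period, here is a minimal sketch that splits the part before the first slash on its first two periods. The function o3fs_parts is a hypothetical helper written for this example; it only mimics the layout described above and is not Ozone's actual parsing code.

```shell
# Hypothetical helper (not part of Ozone): split an o3fs URI into
# bucket, volume, and the remaining OM part (host[:port] or service id).
# Assumes the form o3fs://bucket.volume[.om][:port]/path
o3fs_parts() {
  rest=${1#o3fs://}              # drop the scheme
  authority=${rest%%/*}          # keep everything before the first slash
  bucket=${authority%%.*}        # text before the first period
  after_bucket=${authority#*.}   # text after the first period
  volume=${after_bucket%%.*}     # text before the next period
  case "$after_bucket" in
    *.*) om=${after_bucket#*.} ;;  # whatever follows the volume name
    *)   om= ;;                    # no OM host or service id given
  esac
  printf 'bucket=%s volume=%s om=%s\n' "$bucket" "$volume" "$om"
}

o3fs_parts o3fs://bucket.volume.om-host.example.com:5678/key
# prints: bucket=bucket volume=volume om=om-host.example.com:5678

o3fs_parts o3fs://my.bucket.volume/key
# prints: bucket=my volume=bucket om=volume
```

The second call shows the ambiguity: a period inside a bucket name shifts every later field, so the intended bucket "my.bucket" cannot be recovered.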

When the port number is not specified, it is taken from the config key ozone.om.address if that key is defined; otherwise it falls back to the default port 9862. For example, suppose ozone.om.address is configured as follows in ozone-site.xml:

    <property>
      <name>ozone.om.address</name>
      <value>0.0.0.0:6789</value>
    </property>

Now suppose we run the command:

    hdfs dfs -ls o3fs://bucket.volume.om-host.example.com/key

The above command is essentially equivalent to:

    hdfs dfs -ls o3fs://bucket.volume.om-host.example.com:6789/key

Note: Only the port number from the config is used in this case; the host name in ozone.om.address is ignored.
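The port resolution rules above can be summarized in a short sketch. The function om_endpoint below is a hypothetical helper written for illustration, not Ozone's implementation: it takes the OM part of the URI and the value of ozone.om.address, and applies the order described here (port in the URI first, then the port from the config, then the default 9862). Treating a config value without a port as "no port defined" is an assumption of this sketch.

```shell
# Hypothetical helper (not part of Ozone): choose the OM endpoint.
# $1 = OM part of the URI, e.g. om-host.example.com or om-host.example.com:5678
# $2 = value of ozone.om.address (may be empty or omitted)
om_endpoint() {
  om=$1
  om_address=${2:-}
  case "$om" in
    *:*) printf '%s\n' "$om"; return ;;  # a port given in the URI wins
  esac
  case "$om_address" in
    *:*) port=${om_address##*:} ;;       # only the port is taken from the config
    *)   port=9862 ;;                    # default OM port
  esac
  printf '%s:%s\n' "$om" "$port"
}

om_endpoint om-host.example.com 0.0.0.0:6789        # prints om-host.example.com:6789
om_endpoint om-host.example.com                     # prints om-host.example.com:9862
om_endpoint om-host.example.com:5678 0.0.0.0:6789   # prints om-host.example.com:5678
```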