Oracle Cloud Infrastructure

The Oracle Object Storage system provides strongly consistent operations on all buckets in all regions. OCI Object Storage provides an HDFS Connector that your application will need in order to access data.

OCI Configs

To use Hudi on OCI Object Storage you must:

  • Configure the HDFS Connector using an API key
  • Include the HDFS Connector and dependencies in your application
  • Construct an OCI HDFS URI

Configuring the HDFS Connector

The OCI HDFS Connector requires configurations from an API key to authenticate and select the correct region. Start by generating an API key.

If you are using Hadoop, include these in your core-site.xml:

<property>
  <name>fs.oci.client.auth.tenantId</name>
  <value>ocid1.tenancy.oc1..[tenant]</value>
  <description>The OCID of your OCI tenancy</description>
</property>
<property>
  <name>fs.oci.client.auth.userId</name>
  <value>ocid1.user.oc1..[user]</value>
  <description>The OCID of your OCI user</description>
</property>
<property>
  <name>fs.oci.client.auth.fingerprint</name>
  <value>XX::XX</value>
  <description>Your 32-digit hexadecimal public key fingerprint</description>
</property>
<property>
  <name>fs.oci.client.auth.pemfilepath</name>
  <value>/path/to/file</value>
  <description>Local path to your private key file</description>
</property>
<property>
  <name>fs.oci.client.hostname</name>
  <value>https://objectstorage.[region].oraclecloud.com</value>
  <description>HTTPS endpoint of your regional object store</description>
</property>

If you are using Spark outside of Hadoop, set these configurations in your Spark Session:

  • spark.hadoop.fs.oci.client.auth.tenantId: The OCID of your OCI tenancy
  • spark.hadoop.fs.oci.client.auth.userId: The OCID of your OCI user
  • spark.hadoop.fs.oci.client.auth.fingerprint: Your 32-digit hexadecimal public key fingerprint
  • spark.hadoop.fs.oci.client.auth.pemfilepath: Local path to your private key file
  • spark.hadoop.fs.oci.client.hostname: HTTPS endpoint of your regional object store
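As a sketch, the settings above could be collected and applied when building a Spark session. All values below are placeholders copied from the table, not real credentials; the pyspark portion is shown in comments since it requires a Spark installation:

```python
# Sketch: OCI HDFS Connector settings for a Spark session.
# Every value is a placeholder; substitute your own tenancy details.
oci_confs = {
    "spark.hadoop.fs.oci.client.auth.tenantId": "ocid1.tenancy.oc1..[tenant]",
    "spark.hadoop.fs.oci.client.auth.userId": "ocid1.user.oc1..[user]",
    "spark.hadoop.fs.oci.client.auth.fingerprint": "XX::XX",
    "spark.hadoop.fs.oci.client.auth.pemfilepath": "/path/to/file",
    "spark.hadoop.fs.oci.client.hostname": "https://objectstorage.[region].oraclecloud.com",
}

# With pyspark available, the settings would be applied like this:
# from pyspark.sql import SparkSession
# builder = SparkSession.builder.appName("hudi-on-oci")
# for key, value in oci_confs.items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
```

Because each `spark.hadoop.*` key is simply the Hadoop property prefixed with `spark.hadoop.`, the same values can also be passed on the command line with `--conf`.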

If you are running Spark in OCI Data Flow, you do not need to configure these settings; object storage access is configured for you.

Libraries

These libraries need to be added to your application. The versions below are a reference only; the libraries are continuously updated, so check Maven Central for later releases.

  • com.oracle.oci.sdk:oci-java-sdk-core:2.18.0
  • com.oracle.oci.sdk:oci-hdfs-connector:3.3.0.5
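If you build with Maven, the coordinates above translate to dependency entries like the following (versions shown match the reference versions listed above; check Maven Central for newer releases):

```xml
<dependency>
  <groupId>com.oracle.oci.sdk</groupId>
  <artifactId>oci-java-sdk-core</artifactId>
  <version>2.18.0</version>
</dependency>
<dependency>
  <groupId>com.oracle.oci.sdk</groupId>
  <artifactId>oci-hdfs-connector</artifactId>
  <version>3.3.0.5</version>
</dependency>
```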

Construct an OCI HDFS URI

OCI HDFS URIs have the form:

oci://<bucket>@<namespace>/<path>

The HDFS Connector allows you to treat these locations similarly to HDFS locations on Hadoop. Your tenancy has a unique Object Storage namespace. If you're not sure what your namespace is, you can find it by installing the OCI CLI and running oci os ns get.
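To illustrate how the pieces fit together, a small helper (a hypothetical `oci_uri` function, not part of the connector) can assemble the URI from its parts:

```python
def oci_uri(bucket: str, namespace: str, path: str) -> str:
    """Build an OCI HDFS URI of the form oci://<bucket>@<namespace>/<path>."""
    return f"oci://{bucket}@{namespace}/{path.lstrip('/')}"

# Example with placeholder bucket, namespace, and table path:
print(oci_uri("my-bucket", "my-namespace", "tables/trips"))
# oci://my-bucket@my-namespace/tables/trips
```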