Alibaba Cloud

In this page, we explain how to get your Hudi Spark job to store data into Aliyun OSS.

Aliyun OSS configs

There are two configurations required for Hudi-OSS compatibility:

  • Adding Aliyun OSS Credentials for Hudi
  • Adding required Jars to classpath

Aliyun OSS Credentials

Add the required configs to your core-site.xml so that Hudi can fetch them. Set fs.defaultFS to your OSS bucket name, fs.oss.endpoint to your OSS endpoint, fs.oss.accessKeyId to your OSS access key, and fs.oss.accessKeySecret to your OSS secret. Hudi should then be able to read from and write to the bucket.

  <property>
    <name>fs.defaultFS</name>
    <value>oss://bucketname/</value>
  </property>
  <property>
    <name>fs.oss.endpoint</name>
    <value>oss-endpoint-address</value>
    <description>Aliyun OSS endpoint to connect to.</description>
  </property>
  <property>
    <name>fs.oss.accessKeyId</name>
    <value>oss_key</value>
    <description>Aliyun access key ID</description>
  </property>
  <property>
    <name>fs.oss.accessKeySecret</name>
    <value>oss-secret</value>
    <description>Aliyun access key secret</description>
  </property>
  <property>
    <name>fs.oss.impl</name>
    <value>org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem</value>
  </property>
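
If you prefer not to edit core-site.xml, Spark forwards any property prefixed with spark.hadoop. into the Hadoop configuration, so the same settings can be supplied when building the SparkSession. Below is a minimal sketch; the bucket name, endpoint, and credentials are placeholders you would replace with your own values.

  import org.apache.spark.sql.SparkSession

  // Minimal sketch: pass the OSS configs via the spark.hadoop.* prefix
  // instead of core-site.xml. All values below are placeholders.
  val spark = SparkSession.builder()
    .appName("hudi-on-oss")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.hadoop.fs.defaultFS", "oss://bucketname/")
    .config("spark.hadoop.fs.oss.endpoint", "oss-endpoint-address")
    .config("spark.hadoop.fs.oss.accessKeyId", "oss_key")
    .config("spark.hadoop.fs.oss.accessKeySecret", "oss-secret")
    .config("spark.hadoop.fs.oss.impl", "org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem")
    .getOrCreate()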

Aliyun OSS Libs

Add the Aliyun Hadoop library jars to your pom.xml. Since hadoop-aliyun requires Hadoop 2.9.1+, you need to use Hadoop version 2.9.1 or later.

  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-aliyun</artifactId>
    <version>3.2.1</version>
  </dependency>
  <dependency>
    <groupId>com.aliyun.oss</groupId>
    <artifactId>aliyun-sdk-oss</artifactId>
    <version>3.8.1</version>
  </dependency>
  <dependency>
    <groupId>org.jdom</groupId>
    <artifactId>jdom</artifactId>
    <version>1.1</version>
  </dependency>
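
With the credentials configured and the jars on the classpath, writing a Hudi table to OSS looks the same as writing to any other Hadoop-compatible storage; only the base path changes to an oss:// URI. The sketch below is illustrative: the table name, columns, record key, precombine field, and bucket path are placeholders, not values prescribed by Hudi.

  import org.apache.spark.sql.SaveMode

  // Minimal sketch: write a small DataFrame as a Hudi table to an oss:// path.
  // Table name, fields, and path below are placeholders for your own dataset.
  val df = spark.createDataFrame(Seq(
    ("id1", "2020-01-01", 10.0),
    ("id2", "2020-01-02", 20.0)
  )).toDF("uuid", "ts", "amount")

  df.write.format("hudi")
    .option("hoodie.table.name", "oss_demo_table")
    .option("hoodie.datasource.write.recordkey.field", "uuid")
    .option("hoodie.datasource.write.precombine.field", "ts")
    .mode(SaveMode.Overwrite)
    .save("oss://bucketname/path/to/oss_demo_table")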