Installation for Spark
Follow the instructions in the Spark documentation:
- https://spark.apache.org/docs/latest/configuration.html#inheriting-hadoop-cluster-configuration
- https://spark.apache.org/docs/latest/configuration.html#custom-hadoophive-configuration
Installation inheriting from Hadoop cluster configuration
Inheriting the Hadoop cluster configuration is usually the easiest approach.
To make these files visible to Spark, set HADOOP_CONF_DIR in $SPARK_HOME/conf/spark-env.sh to a location containing the Hadoop configuration file core-site.xml, usually /etc/hadoop/conf.
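For example, spark-env.sh could contain the following line. This is a minimal sketch; /etc/hadoop/conf is an assumption and should match wherever your cluster actually keeps core-site.xml:

# $SPARK_HOME/conf/spark-env.sh
# Point Spark at the directory holding the Hadoop configuration files (core-site.xml, hdfs-site.xml)
export HADOOP_CONF_DIR=/etc/hadoop/conf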
Installation not inheriting from Hadoop cluster configuration
Copy the seaweedfs-hadoop2-client-x.x.x.jar to all executor machines.
Add the following to $SPARK_HOME/conf/spark-defaults.conf on every node running Spark:
spark.driver.extraClassPath /path/to/seaweedfs-hadoop2-client-x.x.x.jar
spark.executor.extraClassPath /path/to/seaweedfs-hadoop2-client-x.x.x.jar
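Optionally, the SeaweedFS file system settings can also be made the default in the same spark-defaults.conf via Spark's spark.hadoop.* property prefix, so they do not need to be repeated on every submit. A sketch, assuming the SeaweedFS filer runs at localhost:8888:

# spark-defaults.conf (in addition to the classpath entries above)
spark.hadoop.fs.seaweedfs.impl  seaweed.hdfs.SeaweedFileSystem
spark.hadoop.fs.defaultFS       seaweedfs://localhost:8888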
Then pass the SeaweedFS configuration at runtime, for example:
./bin/spark-submit \
--name "My app" \
--master local[4] \
--conf spark.eventLog.enabled=false \
--conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
--conf spark.hadoop.fs.seaweedfs.impl=seaweed.hdfs.SeaweedFileSystem \
--conf spark.hadoop.fs.defaultFS=seaweedfs://localhost:8888 \
myApp.jar
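To verify the setup, a short spark-shell session can write a small dataset to SeaweedFS and read it back. This is only a sketch; the filer address localhost:8888 and the output path /tmp/spark-test are assumptions:

./bin/spark-shell \
--conf spark.hadoop.fs.seaweedfs.impl=seaweed.hdfs.SeaweedFileSystem \
--conf spark.hadoop.fs.defaultFS=seaweedfs://localhost:8888

scala> spark.range(10).write.parquet("seaweedfs://localhost:8888/tmp/spark-test")
scala> spark.read.parquet("seaweedfs://localhost:8888/tmp/spark-test").count()  // should return 10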