Installation for Spark
Follow the instructions in the Spark documentation:
- https://spark.apache.org/docs/latest/configuration.html#inheriting-hadoop-cluster-configuration
- https://spark.apache.org/docs/latest/configuration.html#custom-hadoophive-configuration
Installation inheriting from Hadoop cluster configuration
Inheriting the Hadoop cluster configuration is usually the easiest approach.
To make these files visible to Spark, set HADOOP_CONF_DIR in $SPARK_HOME/conf/spark-env.sh to a location containing the Hadoop configuration file core-site.xml, usually /etc/hadoop/conf.
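For example, spark-env.sh could contain the following line. This is a minimal sketch; /etc/hadoop/conf is an assumption and should match wherever your cluster actually keeps core-site.xml:

# $SPARK_HOME/conf/spark-env.sh
# Point Spark at the directory holding the Hadoop configuration files (core-site.xml, hdfs-site.xml)
export HADOOP_CONF_DIR=/etc/hadoop/conf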
Installation not inheriting from Hadoop cluster configuration
Copy the seaweedfs-hadoop2-client-x.x.x.jar to all executor machines.
Add the following to $SPARK_HOME/conf/spark-defaults.conf on every node running Spark:
spark.driver.extraClassPath /path/to/seaweedfs-hadoop2-client-x.x.x.jar
spark.executor.extraClassPath /path/to/seaweedfs-hadoop2-client-x.x.x.jar
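Optionally, the SeaweedFS file system settings can also be made the default in the same spark-defaults.conf via Spark's spark.hadoop.* property prefix, so they do not need to be repeated on every submit. A sketch, assuming the SeaweedFS filer runs at localhost:8888:

# spark-defaults.conf (in addition to the classpath entries above)
spark.hadoop.fs.seaweedfs.impl  seaweed.hdfs.SeaweedFileSystem
spark.hadoop.fs.defaultFS       seaweedfs://localhost:8888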
Then pass the SeaweedFS configuration at runtime, for example:
./bin/spark-submit \
--name "My app" \
--master local[4] \
--conf spark.eventLog.enabled=false \
--conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
--conf spark.hadoop.fs.seaweedfs.impl=seaweed.hdfs.SeaweedFileSystem \
--conf spark.hadoop.fs.defaultFS=seaweedfs://localhost:8888 \
myApp.jar
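To verify the setup, a short spark-shell session can write a small dataset to SeaweedFS and read it back. This is only a sketch; the filer address localhost:8888 and the output path /tmp/spark-test are assumptions:

./bin/spark-shell \
--conf spark.hadoop.fs.seaweedfs.impl=seaweed.hdfs.SeaweedFileSystem \
--conf spark.hadoop.fs.defaultFS=seaweedfs://localhost:8888

scala> spark.range(10).write.parquet("seaweedfs://localhost:8888/tmp/spark-test")
scala> spark.read.parquet("seaweedfs://localhost:8888/tmp/spark-test").count()  // should return 10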