Spark in Kubernetes with OzoneFS
This recipe shows how Ozone object store can be used from Spark using:
- OzoneFS (Hadoop compatible file system)
- Hadoop 2.7 (included in the Spark distribution)
- Kubernetes Spark scheduler
- Local spark client
Requirements
Download latest Spark and Ozone distribution and extract them. This method is tested with the spark-2.4.6-bin-hadoop2.7
distribution.
You also need the following:
- A container repository to push and pull the spark+ozone images. (In this recipe we will use the dockerhub)
- A repo/name for the custom containers (in this recipe myrepo/ozone-spark)
- A dedicated namespace in kubernetes (we use yournamespace in this recipe)
Create the docker image for drivers
Create the base Spark driver/executor image
First of all create a docker image with the Spark image creator. Execute the following from the Spark distribution
./bin/docker-image-tool.sh -r myrepo -t 2.4.6 build
Note: if you use Minikube add the -m
flag to use the docker daemon of the Minikube image:
./bin/docker-image-tool.sh -m -r myrepo -t 2.4.6 build
./bin/docker-image-tool.sh
is an official Spark tool to create container images and this step will create multiple Spark container images with the name myrepo/spark. The first container will be used as a base container in the following steps.
Customize the docker image
Create a new directory for customizing the created docker image.
Copy the ozone-site.xml
from the cluster:
kubectl cp om-0:/opt/hadoop/etc/hadoop/ozone-site.xml .
And create a custom core-site.xml
.
<configuration>
<property>
<name>fs.AbstractFileSystem.o3fs.impl</name>
<value>org.apache.hadoop.fs.ozone.OzFs</value>
</property>
</configuration>
Copy the ozonefs.jar
file from an ozone distribution (use the hadoop2 version!)
kubectl cp om-0:/opt/hadoop/share/ozone/lib/ozone-filesystem-hadoop2-VERSION.jar ozone-filesystem-hadoop2.jar
Create a new Dockerfile and build the image:
FROM myrepo/spark:2.4.6
ADD core-site.xml /opt/hadoop/conf/core-site.xml
ADD ozone-site.xml /opt/hadoop/conf/ozone-site.xml
ENV HADOOP_CONF_DIR=/opt/hadoop/conf
ENV SPARK_EXTRA_CLASSPATH=/opt/hadoop/conf
ADD ozone-filesystem-hadoop2.jar /opt/ozone-filesystem-hadoop2.jar
docker build -t myrepo/spark-ozone
For remote Kubernetes cluster you may need to push it:
docker push myrepo/spark-ozone
Create a bucket
Download any text file and put it to the /tmp/alice.txt
first.
kubectl port-forward s3g-0 9878:9878
aws s3api --endpoint http://localhost:9878 create-bucket --bucket=test
aws s3api --endpoint http://localhost:9878 put-object --bucket test --key alice.txt --body /tmp/alice.txt
Create service account to use
kubectl create serviceaccount spark -n yournamespace
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=yournamespace:spark --namespace=yournamespace
Execute the job
Execute the following spark-submit command, but change at least the following values:
- the Kubernetes master url (you can check your ~/.kube/config to find the actual value)
- the Kubernetes namespace (yournamespace in this example)
- serviceAccountName (you can use the spark value if you followed the previous steps)
- container.image (in this example this is myrepo/spark-ozone. This is pushed to the registry in the previous steps)
bin/spark-submit \
--master k8s://https://kubernetes:6443 \
--deploy-mode cluster \
--name spark-word-count \
--class org.apache.spark.examples.JavaWordCount \
--conf spark.executor.instances=1 \
--conf spark.kubernetes.namespace=yournamespace \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image=myrepo/spark-ozone \
--conf spark.kubernetes.container.image.pullPolicy=Always \
--jars /opt/ozone-filesystem-hadoop2.jar \
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar \
o3fs://test.s3v.ozone-om-0.ozone-om:9862/alice.txt
Check the available spark-word-count-...
pods with kubectl get pod
Check the output of the calculation withkubectl logs spark-word-count-1549973913699-driver
You should see the output of the wordcount job. For example:
...
name: 8
William: 3
this,': 1
SOUP!': 1
`Silence: 1
`Mine: 1
ordered.: 1
considering: 3
muttering: 3
candle: 2
...