Running Alluxio on Alibaba Cloud Container Service for Kubernetes (ACK)


This guide describes how to install and configure Alluxio on Alibaba Cloud Container Service for Kubernetes (ACK).

Prerequisites

  • ACK version >= 1.12.6

Install Alluxio in ACK

This section introduces how to install Alluxio on Alibaba Cloud Container Service for Kubernetes (ACK) in a few steps.

Specify Which Nodes to Install Alluxio

Before installing Alluxio components, label the target Kubernetes nodes with “alluxio=true” as follows:

Select cluster

Log in to the Container Service - Kubernetes Console. Under the Kubernetes menu, click “Clusters” > “Nodes” in the left navigation bar to open the node list page. Select the target cluster and click “Manage Labels” in the upper right corner of the page.

Select nodes

In the node list, select the nodes that should run Alluxio (you can select several at once), and then click “Add Label”.

Add label

Enter “alluxio” as the label name and “true” as the value, then click “OK”.
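If you prefer the command line, kubectl label applies the same label. Below is a minimal sketch that assumes kubectl is already configured for the target ACK cluster. The function name label_alluxio_nodes is hypothetical.

```bash
# Hypothetical helper: add the alluxio=true label to every node passed as an
# argument, then list the nodes that carry the label.
# Assumes kubectl is already configured for the target ACK cluster.
label_alluxio_nodes() {
  if [ "$#" -eq 0 ]; then
    echo "usage: label_alluxio_nodes NODE [NODE...]" >&2
    return 2
  fi
  local node
  for node in "$@"; do
    # --overwrite makes re-running the helper on labeled nodes harmless.
    kubectl label node "$node" alluxio=true --overwrite || return 1
  done
  # Show the nodes on which Alluxio components may now be scheduled.
  kubectl get nodes -l alluxio=true
}
```

For example, label_alluxio_nodes cn-beijing.192.168.8.12 cn-beijing.192.168.8.13 labels two nodes. The node names here are taken from the node list shown later in this guide.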

Install Alluxio Using App Catalog

Log in to the Container Service - Kubernetes Console. Select “Marketplace” > “App Catalog” in the left navigation bar, and select Alluxio on the right. On the “App Catalog” > “Alluxio” page, use the creation panel on the right to select the target cluster and namespace (this guide uses the namespace “alluxio”), and click “Create”.

Verify Installation

Use kubectl to check whether the Alluxio pods are running:

    $ kubectl get po -n alluxio
    NAME                   READY   STATUS    RESTARTS   AGE
    alluxio-fuse-pjw5x     1/1     Running   0          83m
    alluxio-fuse-pqgz4     1/1     Running   0          83m
    alluxio-master-0       2/2     Running   0          83m
    alluxio-worker-8lcpb   2/2     Running   0          83m
    alluxio-worker-hqv8l   2/2     Running   0          83m
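When scripting the installation, you can check this output automatically instead of reading it. The sketch below parses the kubectl get po output on stdin. It flags any pod that is not Running or whose READY column is incomplete (for example 1/2). The function name is hypothetical. A blocking alternative is kubectl wait --for=condition=Ready pod --all -n alluxio --timeout=300s.

```bash
# Hypothetical helper: read `kubectl get po` output on stdin and report every
# pod whose STATUS is not Running or whose READY count is incomplete.
# Returns 0 only if all pods are fully ready.
check_pods_ready() {
  awk '
    NR == 1 && $1 == "NAME" { next }     # skip the header line if present
    {
      split($2, r, "/")                  # READY column, e.g. "2/2"
      if ($3 != "Running" || r[1] != r[2]) {
        print "not ready: " $1 " (" $2 ", " $3 ")"
        bad = 1
      }
    }
    END { exit bad }
  '
}
```

Usage: kubectl get po -n alluxio | check_pods_ready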

Use kubectl to log in to the Alluxio master pod and check the health of this Alluxio cluster:

    $ kubectl exec -ti alluxio-master-0 -n alluxio -- bash
    bash-4.4# alluxio fsadmin report capacity
    Capacity information for all workers:
        Total Capacity: 2048.00MB
            Tier: MEM  Size: 2048.00MB
        Used Capacity: 0B
            Tier: MEM  Size: 0B
        Used Percentage: 0%
        Free Percentage: 100%
    Worker Name      Last Heartbeat   Storage       MEM
    192.168.5.202    0                capacity      1024.00MB
                                      used          0B (0%)
    192.168.5.201    0                capacity      1024.00MB
                                      used          0B (0%)
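This report should list one worker per labeled node. Here that is two workers of 1024.00MB each, which matches the 2048.00MB total. The sketch below extracts the per-worker rows so you can compare their count with the number of labeled nodes. It assumes that worker rows are the lines whose third whitespace-separated field is the word capacity, as in the output above. The function name is hypothetical.

```bash
# Hypothetical helper: read `alluxio fsadmin report capacity` output on stdin
# and print "<worker> <capacity>" for each worker row. Worker rows are assumed
# to be the lines whose third field is the literal word "capacity".
worker_capacities() {
  awk '$3 == "capacity" { print $1, $4 }'
}
```

Usage without an interactive shell: kubectl exec alluxio-master-0 -n alluxio -- alluxio fsadmin report capacity | worker_capacities | wc -l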

Example: Running Spark Jobs

Install spark-operator

Go to the Container Service App Catalog and search for “ack-spark-operator” in the search box in the upper right corner.

Select the target cluster for “ack-spark-operator” (this guide uses the cluster “ack-create-by-openapi-1”), and then click “Create”.

Build Spark docker image

Download the required Spark version from the Spark download page. This example uses Spark 2.4.6. Run the following commands to download it:

    $ cd /root
    $ wget https://mirror.bit.edu.cn/apache/spark/spark-2.4.6/spark-2.4.6-bin-hadoop2.7.tgz

After the download completes, extract the package and set the SPARK_HOME environment variable:

    $ tar -xf spark-2.4.6-bin-hadoop2.7.tgz
    $ export SPARK_HOME=$(pwd)/spark-2.4.6-bin-hadoop2.7

The Spark Docker image is the image used when submitting the Spark job, and it must include the Alluxio client jar. You can copy the jar out of the Alluxio Docker image as follows:

    $ id=$(docker create alluxio/alluxio:2.5.0)
    $ docker cp $id:/opt/alluxio/client/alluxio-2.5.0-client.jar \
        $SPARK_HOME/jars/alluxio-2.5.0-client.jar
    $ docker rm -v $id 1>/dev/null
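The three commands above follow a common pattern: create a stopped container, copy a file out of it, and remove it. The sketch below wraps that pattern in a reusable function that always removes the temporary container, even when the copy fails. The function name extract_from_image is hypothetical.

```bash
# Hypothetical helper generalizing the docker create / docker cp / docker rm
# steps above: copy one file out of an image without running a container.
# Usage: extract_from_image IMAGE PATH_IN_IMAGE DEST
extract_from_image() {
  local image=$1 src=$2 dest=$3 id rc=0
  # "docker create" only creates a stopped container and prints its ID.
  id=$(docker create "$image") || return 1
  docker cp "$id:$src" "$dest" || rc=$?
  # Remove the temporary container even if the copy failed.
  docker rm -v "$id" >/dev/null
  return "$rc"
}
```

Usage: extract_from_image alluxio/alluxio:2.5.0 /opt/alluxio/client/alluxio-2.5.0-client.jar "$SPARK_HOME/jars/"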

After the Alluxio client jar package is ready, start building the image:

    $ docker build -t spark-alluxio:2.4.6 \
        -f $SPARK_HOME/kubernetes/dockerfiles/spark/Dockerfile $SPARK_HOME
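Before distributing the image, you can confirm that the Alluxio client jar actually made it in. Spark's stock Dockerfile copies the jars directory of SPARK_HOME to /opt/spark/jars. The sketch below lists that directory inside the image, overriding the image's entrypoint so that ls runs instead of Spark. The function name and its default image tag are assumptions based on this guide.

```bash
# Hypothetical check: list /opt/spark/jars inside the image and look for an
# Alluxio client jar. --entrypoint bypasses the Spark entrypoint script so
# that `ls` runs directly. Returns 0 if a matching jar is found.
image_has_alluxio_client() {
  local image=${1:-spark-alluxio:2.4.6}
  docker run --rm --entrypoint ls "$image" /opt/spark/jars |
    grep '^alluxio-.*-client\.jar$' >/dev/null
}
```

Usage: image_has_alluxio_client spark-alluxio:2.4.6 && echo "client jar present"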

For more details, see the Alluxio documentation on running Spark with Alluxio on Kubernetes.

After the image is built, distribute it in one of two ways:

  • If you have a private image registry, push the image to it and make sure every node in the K8s cluster can pull from it.
  • If you do not have a private image registry, export the image with docker save, copy the archive to each node of the K8s cluster with scp, and load it on each node with docker load.
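Both options can be scripted. The sketch below assumes two things: that you have SSH access to every node as a user allowed to run docker, and that the registry prefix you pass (for example registry.example.com/my-namespace) is a placeholder for your own. For the second option, it streams docker save directly into docker load over SSH rather than copying a tar file with scp. Both function names are hypothetical.

```bash
# Option 1 (private registry): re-tag the image under the registry prefix and
# push it. The registry argument is a placeholder for your own registry path.
push_to_registry() {
  local image=$1 registry=$2
  docker tag "$image" "$registry/$image" && docker push "$registry/$image"
}

# Option 2 (no registry): stream "docker save" over SSH straight into
# "docker load" on each node, so no intermediate tar file is written.
# Assumes SSH access to every node as a user allowed to run docker.
load_on_nodes() {
  local image=$1 node
  shift
  for node in "$@"; do
    echo "loading $image on $node" >&2
    docker save "$image" | ssh "$node" docker load || return 1
  done
}
```

If you push to a registry, the image reference used when submitting the Spark job must include the registry prefix.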

Upload files to Alluxio

In this example, we submit a Spark job to K8s that counts the words in a file. Before starting the job, the file must be uploaded to Alluxio storage. For convenience, we upload the file /opt/alluxio-2.5.0/LICENSE from the Alluxio master pod into the Alluxio namespace (the path may differ slightly depending on the Alluxio version).

Use kubectl exec to enter the Alluxio master pod, and upload the LICENSE file from the current directory to the root directory in Alluxio:

    $ kubectl exec -ti alluxio-master-0 -n alluxio -- bash
    # The following steps are executed in the alluxio-master-0 pod
    bash-4.4# alluxio fs copyFromLocal LICENSE /
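To run Alluxio commands from a script without an interactive shell, kubectl exec can invoke the alluxio CLI directly. The wrapper below is a minimal sketch. Its name and the ALLUXIO_POD and ALLUXIO_NS variables are hypothetical, and their defaults match the pod and namespace used in this guide.

```bash
# Hypothetical wrapper: run an `alluxio` CLI command inside the master pod.
# ALLUXIO_POD and ALLUXIO_NS default to the names used in this guide.
alluxio_cli() {
  kubectl exec "${ALLUXIO_POD:-alluxio-master-0}" -n "${ALLUXIO_NS:-alluxio}" \
    -- alluxio "$@"
}
```

For example, alluxio_cli fs copyFromLocal LICENSE / uploads the file, and alluxio_cli fs ls / lists the Alluxio root. The relative path LICENSE is resolved inside the pod, against the container's working directory.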

Then check which workers store the blocks of the LICENSE file:

    $ kubectl exec -ti alluxio-master-0 -n alluxio -- bash
    # The following steps are executed in the alluxio-master-0 pod
    bash-4.4# alluxio fs stat /LICENSE
    /LICENSE is a file path.
    FileInfo{fileId=33554431, fileIdentifier=null, name=LICENSE, path=/LICENSE, ufsPath=/opt/alluxio/underFSStorage/LICENSE, length=27040, blockSizeBytes=67108864, creationTimeMs=1592381889733, completed=true, folder=false, pinned=false, pinnedlocation=[], cacheable=true, persisted=false, blockIds=[16777216], inMemoryPercentage=100, lastModificationTimesMs=1592381890390, ttl=-1, lastAccessTimesMs=1592381890390, ttlAction=DELETE, owner=root, group=root, mode=420, persistenceState=TO_BE_PERSISTED, mountPoint=false, replicationMax=-1, replicationMin=0, fileBlockInfos=[FileBlockInfo{blockInfo=BlockInfo{id=16777216, length=27040, locations=[BlockLocation{workerId=8217561227881498090, address=WorkerNetAddress{host=192.168.8.17, containerHost=, rpcPort=29999, dataPort=29999, webPort=30000, domainSocketPath=, tieredIdentity=TieredIdentity(node=192.168.8.17, rack=null)}, tierAlias=MEM, mediumType=MEM}]}, offset=0, ufsLocations=[]}], mountId=1, inAlluxioPercentage=100, ufsFingerprint=, acl=user::rw-,group::r--,other::r--, defaultAcl=}
    Containing the following blocks:
    BlockInfo{id=16777216, length=27040, locations=[BlockLocation{workerId=8217561227881498090, address=WorkerNetAddress{host=192.168.8.17, containerHost=, rpcPort=29999, dataPort=29999, webPort=30000, domainSocketPath=, tieredIdentity=TieredIdentity(node=192.168.8.17, rack=null)}, tierAlias=MEM, mediumType=MEM}]}

The output shows that the LICENSE file has a single block (id 16777216), stored on the K8s node 192.168.8.17.
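For larger files, it is more convenient to extract the block locations mechanically than to read them from this output. The sketch below pulls every distinct host out of the WorkerNetAddress{host=...} fields printed by alluxio fs stat. It assumes the output format shown above. The function name is hypothetical.

```bash
# Hypothetical helper: read `alluxio fs stat` output on stdin and print each
# distinct worker host holding a block of the file, one per line.
# Relies on the WorkerNetAddress{host=...} fields in the stat output.
block_hosts() {
  grep -o 'WorkerNetAddress{host=[^,]*' | sed 's/.*host=//' | sort -u
}
```

Usage: kubectl exec alluxio-master-0 -n alluxio -- alluxio fs stat /LICENSE | block_hosts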

Use kubectl to look up the name of that node, which is cn-beijing.192.168.8.17:

    $ kubectl get nodes -o wide | awk '{print $1,$6}'
    NAME INTERNAL-IP
    cn-beijing.192.168.8.12 192.168.8.12
    cn-beijing.192.168.8.13 192.168.8.13
    cn-beijing.192.168.8.14 192.168.8.14
    cn-beijing.192.168.8.15 192.168.8.15
    cn-beijing.192.168.8.16 192.168.8.16
    cn-beijing.192.168.8.17 192.168.8.17
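The lookup can also be done programmatically. For example, you can feed a block host found with alluxio fs stat straight into the node selector used in the next step. The sketch below reads the "NAME INTERNAL-IP" pairs produced by the command above and prints the node whose internal IP matches. The function name is hypothetical.

```bash
# Hypothetical helper: print the name of the node whose INTERNAL-IP equals $1.
# Reads "NAME INTERNAL-IP" pairs on stdin, i.e. the output of
# `kubectl get nodes -o wide | awk '{print $1,$6}'`. Fails if no node matches.
node_for_ip() {
  awk -v ip="$1" '$2 == ip { print $1; found = 1 } END { exit !found }'
}
```

Usage: TARGET_NODE=$(kubectl get nodes -o wide | awk '{print $1,$6}' | node_for_ip 192.168.8.17)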

Submit Spark job

The following steps submit a Spark job to the K8s cluster. The job counts the occurrences of each word in the file /LICENSE in Alluxio.

In the previous step, we saw that all blocks of the LICENSE file are on the node cn-beijing.192.168.8.17. In this example, we use a node selector so that the Spark driver and executor run on the node cn-beijing.192.168.8.17. We then verify that, with Alluxio short-circuit reads enabled, the Spark executor communicates with the Alluxio worker through a domain socket.

  • Note: If Alluxio short-circuit is enabled and the Spark executor runs on the same K8s node as the blocks of the file it reads (/LICENSE in this example), the Alluxio client in the executor communicates with the Alluxio worker on that node through a domain socket.
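To check this precondition on the node itself, you can look for socket files in the host directory that the Spark pods mount, /tmp/alluxio-domain. This sketch assumes that the Alluxio worker on that node creates its domain socket in this host path, as the alluxio-domain volume in the job YAML implies. It must run on the node itself (for example over SSH). The function name is hypothetical.

```bash
# Hypothetical check, run on the K8s node: list UNIX domain sockets in the
# host directory assumed to be shared by the Alluxio worker and Spark pods.
# Returns non-zero if the directory is missing or contains no socket.
list_domain_sockets() {
  local dir=${1:-/tmp/alluxio-domain} found
  [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
  found=$(find "$dir" -maxdepth 1 -type s)
  [ -n "$found" ] || { echo "no domain socket in $dir" >&2; return 1; }
  printf '%s\n' "$found"
}
```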

First, generate a YAML file for submitting the Spark job. Update the variables in the example below to match your environment.

    $ export SPARK_ALLUXIO_IMAGE="spark-alluxio:2.4.6"
    $ export ALLUXIO_MASTER="alluxio-master-0"
    $ export TARGET_NODE="cn-beijing.192.168.8.17"
    $ cat > /tmp/spark-example.yaml <<- EOF
    apiVersion: "sparkoperator.k8s.io/v1beta2"
    kind: SparkApplication
    metadata:
      name: spark-count-words
      namespace: default
    spec:
      type: Scala
      mode: cluster
      image: "$SPARK_ALLUXIO_IMAGE"
      imagePullPolicy: Always
      mainClass: org.apache.spark.examples.JavaWordCount
      mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.6.jar"
      arguments:
        - alluxio://${ALLUXIO_MASTER}.alluxio:19998/LICENSE
      sparkVersion: "2.4.6"
      restartPolicy:
        type: Never
      volumes:
        - name: "test-volume"
          hostPath:
            path: "/tmp"
            type: Directory
        - name: "alluxio-domain"
          hostPath:
            path: "/tmp/alluxio-domain"
            type: Directory
      driver:
        cores: 1
        coreLimit: "1200m"
        memory: "512m"
        labels:
          version: 2.4.6
        serviceAccount: spark
        volumeMounts:
          - name: "test-volume"
            mountPath: "/tmp"
          - name: "alluxio-domain"
            mountPath: "/opt/domain"
        nodeSelector:
          kubernetes.io/hostname: "$TARGET_NODE"
      executor:
        cores: 1
        instances: 1
        memory: "512m"
        labels:
          version: 2.4.6
        nodeSelector:
          kubernetes.io/hostname: "$TARGET_NODE"
        volumeMounts:
          - name: "test-volume"
            mountPath: "/tmp"
          - name: "alluxio-domain"
            mountPath: "/opt/domain"
    EOF
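The heredoc substitutes the variables when the file is written. If one of them is unset, it silently becomes an empty string, for example image: "" or an empty hostname in the node selector. The guard below is a minimal sketch, meant to run before the cat command above. It uses bash indirect expansion, and the function name is hypothetical.

```bash
# Hypothetical guard: fail if any of the named variables is unset or empty.
# Uses bash indirect expansion (${!name}) to read each variable by name.
require_vars() {
  local name missing=0
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "required variable $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: require_vars SPARK_ALLUXIO_IMAGE ALLUXIO_MASTER TARGET_NODE, then generate the YAML only if it succeeds.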

Then, use sparkctl to submit the Spark job:

    $ sparkctl create /tmp/spark-example.yaml

  • Note: if sparkctl is not installed, refer to the sparkctl documentation to install it.
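Instead of checking the driver pod by hand, a script can wait on the SparkApplication status. The sketch below assumes the spark-operator v1beta2 resource, whose .status.applicationState.state field reports values such as COMPLETED, FAILED and SUBMISSION_FAILED. The function name, the 10-second poll interval and the 600-second default timeout are arbitrary choices. The same YAML can also be submitted with kubectl apply -f /tmp/spark-example.yaml, which spark-operator accepts as well.

```bash
# Hypothetical helper: poll a SparkApplication until it reaches a terminal
# state. Returns 0 on COMPLETED, 1 on FAILED or SUBMISSION_FAILED, 2 on timeout.
wait_for_spark_app() {
  local name=${1:-spark-count-words} ns=${2:-default} timeout=${3:-600}
  local interval=${POLL_INTERVAL:-10} waited=0 state=""
  while [ "$waited" -lt "$timeout" ]; do
    state=$(kubectl get sparkapplication "$name" -n "$ns" \
      -o jsonpath='{.status.applicationState.state}' 2>/dev/null) || state=""
    case "$state" in
      COMPLETED) echo "$name: $state"; return 0 ;;
      FAILED|SUBMISSION_FAILED) echo "$name: $state" >&2; return 1 ;;
    esac
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "$name: timed out after ${timeout}s (last state: ${state:-unknown})" >&2
  return 2
}
```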

Check Results

After submitting the job, use kubectl to check the status of the Spark driver:

    $ kubectl get po -l spark-role=driver
    NAME                                 READY   STATUS      RESTARTS   AGE
    spark-alluxio-1592296972094-driver   0/1     Completed   0          4h33m

A status of Completed means the job has finished. Read the Spark driver log to see the result (use the driver pod name reported in your cluster):

    $ kubectl logs spark-alluxio-1592296972094-driver --tail 20
    USE,: 3
    Patents: 2
    d): 1
    comment: 1
    executed: 1
    replaced: 1
    mechanical: 1
    20/06/16 13:14:28 INFO SparkUI: Stopped Spark web UI at http://spark-alluxio-1592313250782-driver-svc.default.svc:4040
    20/06/16 13:14:28 INFO KubernetesClusterSchedulerBackend: Shutting down all executors
    20/06/16 13:14:28 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each executor to shut down
    20/06/16 13:14:28 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
    20/06/16 13:14:28 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
    20/06/16 13:14:28 INFO MemoryStore: MemoryStore cleared
    20/06/16 13:14:28 INFO BlockManager: BlockManager stopped
    20/06/16 13:14:28 INFO BlockManagerMaster: BlockManagerMaster stopped
    20/06/16 13:14:28 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
    20/06/16 13:14:28 INFO SparkContext: Successfully stopped SparkContext
    20/06/16 13:14:28 INFO ShutdownHookManager: Shutdown hook called
    20/06/16 13:14:28 INFO ShutdownHookManager: Deleting directory /var/data/spark-2f619243-59b2-4258-ba5e-69b8491123a6/spark-3d70294a-291a-423a-b034-8fc779244f40
    20/06/16 13:14:28 INFO ShutdownHookManager: Deleting directory /tmp/spark-054883b4-15d3-43ee-94c3-5810a8a6cdc7
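JavaWordCount prints each result as "word: count", mixed in with Spark's own log lines. The sketch below keeps only the result lines and sorts them by count, highest first. It assumes the output format in the excerpt above. Spark log lines contain a space before their first colon, so the filter skips them. The function name is hypothetical.

```bash
# Hypothetical helper: keep only "word: count" lines from a Spark driver log
# on stdin and print them as "count word", highest counts first.
# Spark log lines have a space before the first colon, so they do not match.
word_counts() {
  awk '/^[^ ]+: [0-9]+$/ { w = $1; sub(/:$/, "", w); print $2, w }' |
    sort -k1,1nr -k2,2
}
```

Usage: kubectl logs <driver-pod> | word_counts | head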

Finally, log in to the Alluxio master pod and check the relevant metrics:

    $ kubectl exec -ti alluxio-master-0 -n alluxio -- bash
    bash-4.4# alluxio fsadmin report metrics
    Cluster.BytesReadRemote (Type: COUNTER, Value: 0B)
    Cluster.BytesReadRemoteThroughput (Type: GAUGE, Value: 0B/MIN)
    Cluster.BytesReadDomain (Type: COUNTER, Value: 237.66KB)
    Cluster.BytesReadDomainThroughput (Type: GAUGE, Value: 47.53KB/MIN)
    ......

In these metrics, BytesReadRemote and BytesReadRemoteThroughput measure data transferred over the network stack, while BytesReadDomain and BytesReadDomainThroughput measure data transferred over the domain socket. Because BytesReadRemote is 0B and BytesReadDomain is 237.66KB, all data was transferred through the domain socket.
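The same conclusion can be checked in a script. The sketch below assumes the "Name (Type: ..., Value: ...)" line format shown above. It treats the run as short-circuit only when BytesReadRemote is exactly 0B and BytesReadDomain is non-zero. Both counters are cluster-wide and cumulative, so reads by other clients also affect the result. The function names are hypothetical.

```bash
# Hypothetical helper: print the value of one metric from
# `alluxio fsadmin report metrics` output on stdin, e.g.
# "Cluster.BytesReadDomain (Type: COUNTER, Value: 237.66KB)" -> 237.66KB
metric_value() {
  awk -v m="$1" '$1 == m { v = $NF; sub(/\)$/, "", v); print v }'
}

# Succeeds if no bytes were read over the network and some bytes were read
# over the domain socket, i.e. only the short-circuit path was used.
short_circuit_used() {
  local report remote domain
  report=$(cat)
  remote=$(printf '%s\n' "$report" | metric_value Cluster.BytesReadRemote)
  domain=$(printf '%s\n' "$report" | metric_value Cluster.BytesReadDomain)
  [ "$remote" = "0B" ] && [ -n "$domain" ] && [ "$domain" != "0B" ]
}
```

Usage: kubectl exec alluxio-master-0 -n alluxio -- alluxio fsadmin report metrics | short_circuit_used && echo "short-circuit reads only"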