Customizing Kubeflow on AWS

Tailoring an AWS deployment of Kubeflow

This guide describes how to customize your deployment of Kubeflow on Amazon EKS. Some customizations must be made before you run the apply platform command, and others before you run the apply k8s command. The following sections explain which step each customization belongs to. If you are not familiar with the deployment process, please see deploy for details.

Customizing Kubeflow

The following configuration parameters are available for kfctl on the AWS platform. The Required column shows which of them you must set.

Option                 Description                                       Required
awsClusterName         Name of your new or existing Amazon EKS cluster   Yes
awsRegion              The AWS Region to launch in                       Yes
awsNodegroupRoleNames  The IAM role names for your worker nodes          Yes for existing clusters; no for new clusters

Customize your Amazon EKS cluster

Before you run ${KUBEFLOW_SRC}/scripts/kfctl.sh apply platform, you can edit the cluster configuration file to change the specification of the cluster that the command creates.

Cluster configuration is stored in ${KUBEFLOW_SRC}/${KFAPP}/aws_config/cluster_config.yaml. Please see eksctl for configuration details.

For example, the following cluster manifest defines one node group with two p2.xlarge instances. It enables SSH access with a public key and places all worker nodes in a single Availability Zone.

    apiVersion: eksctl.io/v1alpha4
    kind: ClusterConfig
    metadata:
      # AWS_CLUSTER_NAME and AWS_REGION will override `name` and `region` here.
      name: kubeflow-example
      region: us-west-2
      version: '1.12'
    # If your region has multiple availability zones, you can specify 3 of them.
    #availabilityZones: ["us-west-2b", "us-west-2c", "us-west-2d"]

    # NodeGroup holds all configuration attributes that are specific to a nodegroup
    # You can have several node groups in your cluster.
    nodeGroups:
      - name: eks-gpu
        instanceType: p2.xlarge
        availabilityZones: ["us-west-2b"]
        desiredCapacity: 2
        minSize: 0
        maxSize: 2
        volumeSize: 30
        allowSSH: true
        sshPublicKeyPath: '~/.ssh/id_rsa.pub'

      # Example of GPU node group
      # - name: Tesla-V100
      # Choose your Instance type for the node group.
      #   instanceType: p3.2xlarge
      # GPU cluster can use single availability zone to improve network performance
      #   availabilityZones: ["us-west-2b"]
      # Autoscaling Groups settings
      #   desiredCapacity: 0
      #   minSize: 0
      #   maxSize: 4
      # Node Root Disk
      #   volumeSize: 50
      # Enable SSH outside your VPC.
      #   allowSSH: true
      #   sshPublicKeyPath: '~/.ssh/id_rsa.pub'
      # Customize Labels
      #   labels:
      #     'k8s.amazonaws.com/accelerator': 'nvidia-tesla-k80'
      # Setup pre-defined iam roles to node group.
      #   iam:
      #     withAddonPolicies:
      #       autoScaler: true
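
The manifest above notes that a cluster can have several node groups. As a minimal sketch, assuming you want a general-purpose CPU group alongside a GPU group that starts with no instances, the configuration might look like the following. The group names (`cpu-workers`, `gpu-v100`), the `m5.xlarge` instance type, the `workload` label, and the capacities are illustrative assumptions, not Kubeflow defaults. Every field used here already appears in the manifest above, but you should still check it against the eksctl schema for your version.

```yaml
apiVersion: eksctl.io/v1alpha4
kind: ClusterConfig
metadata:
  # AWS_CLUSTER_NAME and AWS_REGION will override `name` and `region` here.
  name: kubeflow-example
  region: us-west-2
  version: '1.12'

nodeGroups:
  # Hypothetical general-purpose group (name, type, and sizes are assumptions).
  - name: cpu-workers
    instanceType: m5.xlarge
    desiredCapacity: 2
    minSize: 1
    maxSize: 4
    volumeSize: 30

  # Hypothetical GPU group that starts empty and can grow to 4 instances.
  - name: gpu-v100
    instanceType: p3.2xlarge
    # A single availability zone, as suggested for GPU groups above.
    availabilityZones: ["us-west-2b"]
    desiredCapacity: 0
    minSize: 0
    maxSize: 4
    volumeSize: 50
    # Illustrative label for steering GPU workloads to this group.
    labels:
      'workload': 'gpu-training'
    # Attaches the IAM permissions that a cluster autoscaler needs.
    iam:
      withAddonPolicies:
        autoScaler: true
```

Note that the `autoScaler` add-on policy only grants IAM permissions to the nodes. Growing the GPU group from zero still depends on an autoscaler running in the cluster, which this sketch does not install.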

Customize Private Access

Please see this section

Customize Logging

Please see this section

Customize Authentication

Please see this section