Katib Configuration Overview
How to make changes in Katib configuration
This guide describes the Katib config, the Kubernetes Config Map that contains information about:
- Current metrics collectors (key = metrics-collector-sidecar).
- Current algorithms (suggestions) (key = suggestion).
- Current early stopping algorithms (key = early-stopping).
The Katib Config Map must be deployed in the KATIB_CORE_NAMESPACE namespace with the katib-config name. The Katib controller parses the Katib config when you submit your experiment.
You can edit this Config Map even after deploying Katib.
If you are deploying Katib in the Kubeflow namespace, run this command to edit your Katib config:
kubectl edit configMap katib-config -n kubeflow
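As a rough illustration of the layout described above, the following Python sketch parses the three JSON-valued keys from the Config Map data. The parse_katib_config helper and the sample data dictionary are assumptions made for this example; they are not part of Katib, and the Katib controller’s own parsing may differ.

```python
import json

# The three Config Map keys described in this guide.
KATIB_CONFIG_KEYS = ("metrics-collector-sidecar", "suggestion", "early-stopping")


def parse_katib_config(data):
    """Parse the JSON value stored under each known key (hypothetical helper)."""
    parsed = {}
    for key in KATIB_CONFIG_KEYS:
        raw = data.get(key)
        # A key may be absent from the Config Map; treat it as an empty section.
        parsed[key] = json.loads(raw) if raw else {}
    return parsed


# Sample data shaped like the `data` field of the katib-config Config Map.
data = {
    "metrics-collector-sidecar": '{"File": {"image": "docker.io/kubeflowkatib/file-metrics-collector"}}',
    "suggestion": '{"random": {"image": "docker.io/kubeflowkatib/suggestion-hyperopt"}}',
    "early-stopping": '{"medianstop": {"image": "docker.io/kubeflowkatib/earlystopping-medianstop"}}',
}

config = parse_katib_config(data)
print(sorted(config["suggestion"]))
```

Each value is a JSON document stored as a string, which is why every key is decoded separately.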
Metrics Collector Sidecar settings
These settings are related to Katib metrics collectors, where:
- key: metrics-collector-sidecar
- value: corresponding JSON settings for each metrics collector kind
Example for the File metrics collector with all settings:
metrics-collector-sidecar: |-
  {
    "File": {
      "image": "docker.io/kubeflowkatib/file-metrics-collector",
      "imagePullPolicy": "Always",
      "resources": {
        "requests": {
          "memory": "200Mi",
          "cpu": "250m",
          "ephemeral-storage": "200Mi"
        },
        "limits": {
          "memory": "1Gi",
          "cpu": "500m",
          "ephemeral-storage": "2Gi"
        }
      },
      "waitAllProcesses": false
    },
    ...
  }
All of these settings except image can be omitted. If you don’t specify any other settings, a default value is set automatically.
- image - a Docker image for the File metrics collector’s container (must be specified).
- imagePullPolicy - an image pull policy for the File metrics collector’s container.
  The default value is IfNotPresent.
- resources - resources for the File metrics collector’s container. In the above example you can check how to specify limits and requests. Currently, you can specify only memory, cpu, and ephemeral-storage resources.
  The default values for the requests are:
  - memory = 10Mi
  - cpu = 50m
  - ephemeral-storage = 500Mi
  The default values for the limits are:
  - memory = 100Mi
  - cpu = 500m
  - ephemeral-storage = 5Gi
  You can run your metrics collector’s container without requesting the cpu, memory, or ephemeral-storage resource from the Kubernetes cluster. For instance, you have to remove ephemeral-storage from the container resources to use the Google Kubernetes Engine cluster autoscaler.
  To remove specific resources from the metrics collector’s container, set negative values in requests and limits in your Katib config as follows:
    "requests": {
      "cpu": "-1",
      "memory": "-1",
      "ephemeral-storage": "-1"
    },
    "limits": {
      "cpu": "-1",
      "memory": "-1",
      "ephemeral-storage": "-1"
    }
- waitAllProcesses - a flag that defines whether the metrics collector waits until all processes in the training container have finished before it starts collecting metrics.
  The default value is true.
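The interaction between the default resource values and the negative values can be sketched in Python as follows. This is only an illustration under stated assumptions: the resolve_resources helper is hypothetical, it assumes defaults are filled in per resource, and it treats any value starting with "-" as negative. Katib’s actual implementation may behave differently.

```python
# Defaults listed in this guide for the container's requests and limits.
DEFAULT_RESOURCES = {
    "requests": {"memory": "10Mi", "cpu": "50m", "ephemeral-storage": "500Mi"},
    "limits": {"memory": "100Mi", "cpu": "500m", "ephemeral-storage": "5Gi"},
}


def resolve_resources(configured):
    """Merge configured resources over the defaults (hypothetical helper).

    Assumption for this sketch: a value that starts with "-" counts as
    negative and removes that resource from the container.
    """
    resolved = {}
    for section, defaults in DEFAULT_RESOURCES.items():
        # Configured values take precedence over the documented defaults.
        merged = {**defaults, **configured.get(section, {})}
        resolved[section] = {
            name: value for name, value in merged.items()
            if not value.startswith("-")
        }
    return resolved


# Drop ephemeral-storage, override the memory limit, keep the other defaults.
resources = resolve_resources({
    "requests": {"ephemeral-storage": "-1"},
    "limits": {"memory": "1Gi", "ephemeral-storage": "-1"},
})
print(resources)
```

In this sketch ephemeral-storage disappears from both requests and limits, which corresponds to the Google Kubernetes Engine cluster autoscaler case mentioned above.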
Suggestion settings
These settings are related to Katib suggestions, where:
- key: suggestion
- value: corresponding JSON settings for each algorithm name
If you want to use a new algorithm, you need to update the Katib config. For example, using a random algorithm with all settings looks as follows:
suggestion: |-
  {
    "random": {
      "image": "docker.io/kubeflowkatib/suggestion-hyperopt",
      "imagePullPolicy": "Always",
      "resources": {
        "requests": {
          "memory": "100Mi",
          "cpu": "100m",
          "ephemeral-storage": "100Mi"
        },
        "limits": {
          "memory": "500Mi",
          "cpu": "500m",
          "ephemeral-storage": "3Gi"
        }
      },
      "serviceAccountName": "random-sa"
    },
    ...
  }
All of these settings except image can be omitted. If you don’t specify any other settings, a default value is set automatically.
- image - a Docker image for the suggestion’s container with a random algorithm (must be specified).
  Image example: docker.io/kubeflowkatib/<suggestion-name>
  For each algorithm (suggestion) you can specify one of the following suggestion names in the Docker image:

  | Suggestion name | List of supported algorithms | Description |
  | --- | --- | --- |
  | suggestion-hyperopt | random, tpe | Hyperopt optimization framework |
  | suggestion-chocolate | grid, random, quasirandom, bayesianoptimization, mocmaes | Chocolate optimization framework |
  | suggestion-skopt | bayesianoptimization | Scikit-optimize optimization framework |
  | suggestion-goptuna | cmaes, random, tpe | Goptuna optimization framework |
  | suggestion-hyperband | hyperband | Katib Hyperband implementation |
  | suggestion-enas | enas | Katib ENAS implementation |
  | suggestion-darts | darts | Katib DARTS implementation |

- imagePullPolicy - an image pull policy for the suggestion’s container with a random algorithm.
  The default value is IfNotPresent.
- resources - resources for the suggestion’s container with a random algorithm. In the above example you can check how to specify limits and requests. Currently, you can specify only memory, cpu, and ephemeral-storage resources.
  The default values for the requests are:
  - memory = 10Mi
  - cpu = 50m
  - ephemeral-storage = 500Mi
  The default values for the limits are:
  - memory = 100Mi
  - cpu = 500m
  - ephemeral-storage = 5Gi
  You can run your suggestion’s container without requesting the cpu, memory, or ephemeral-storage resource from the Kubernetes cluster. For instance, you have to remove ephemeral-storage from the container resources to use the Google Kubernetes Engine cluster autoscaler.
  To remove specific resources from the suggestion’s container, set negative values in requests and limits in your Katib config as follows:
    "requests": {
      "cpu": "-1",
      "memory": "-1",
      "ephemeral-storage": "-1"
    },
    "limits": {
      "cpu": "-1",
      "memory": "-1",
      "ephemeral-storage": "-1"
    }
- serviceAccountName - a service account for the suggestion’s container with a random algorithm.
  In the above example, the random-sa service account is attached for each experiment’s suggestion with a random algorithm until you change or delete this service account from the Katib config.
  By default, the suggestion pod doesn’t have any specific service account, in which case, the pod uses the default service account.
Note: If you want to run your experiments with early stopping, the suggestion’s deployment must have permission to update the experiment’s trial status. If you don’t specify a service account in the Katib config, the Katib controller creates the required Kubernetes role-based access control (RBAC) resources for the suggestion.
If you need your own service account for the experiment’s suggestion with early stopping, you have to follow these rules:
- The service account name can’t be equal to <experiment-name>-<experiment-algorithm>.
- The service account must have sufficient permissions to update the experiment’s trial status.
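The naming rule can be checked before you edit the Katib config. The following Python sketch uses a hypothetical is_allowed_service_account helper and made-up experiment names; it covers only the naming rule and does not verify the service account’s permissions.

```python
def is_allowed_service_account(name, experiment_name, algorithm_name):
    """Check the naming rule for a custom suggestion service account (hypothetical helper)."""
    # The name <experiment-name>-<experiment-algorithm> must not be used.
    reserved = f"{experiment_name}-{algorithm_name}"
    return name != reserved


# Example values are illustrative only.
print(is_allowed_service_account("random-sa", "my-experiment", "random"))
print(is_allowed_service_account("my-experiment-random", "my-experiment", "random"))
```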
Suggestion volume settings
When you create an experiment with FromVolume resume policy, you are able to specify PersistentVolume (PV) and PersistentVolumeClaim (PVC) settings for the experiment’s suggestion. Learn more about Katib concepts in the overview guide.
If PV settings are empty, Katib controller creates only PVC. If you want to use the default volume specification, you can omit these settings.
Follow the example for the random algorithm:
suggestion: |-
  {
    "random": {
      "image": "docker.io/kubeflowkatib/suggestion-hyperopt",
      "volumeMountPath": "/opt/suggestion/data",
      "persistentVolumeClaimSpec": {
        "accessModes": [
          "ReadWriteMany"
        ],
        "resources": {
          "requests": {
            "storage": "3Gi"
          }
        },
        "storageClassName": "katib-suggestion"
      },
      "persistentVolumeSpec": {
        "accessModes": [
          "ReadWriteMany"
        ],
        "capacity": {
          "storage": "3Gi"
        },
        "hostPath": {
          "path": "/tmp/suggestion/unique/path"
        },
        "storageClassName": "katib-suggestion"
      },
      "persistentVolumeLabels": {
        "type": "local"
      }
    },
    ...
  }
- volumeMountPath - a mount path for the suggestion’s container with a random algorithm.
  The default value is /opt/katib/data.
- persistentVolumeClaimSpec - a PVC specification for the suggestion’s PVC.
  Default values are set if you don’t specify any of these settings:
  - persistentVolumeClaimSpec.accessModes[0] - the default value is ReadWriteOnce
  - persistentVolumeClaimSpec.resources.requests.storage - the default value is 1Gi
- persistentVolumeSpec - a PV specification for the suggestion’s PV.
  The PV persistentVolumeReclaimPolicy is always equal to Delete so that all resources are properly removed once the Katib experiment is deleted. To learn more about PV reclaim policies, check the Kubernetes documentation.
- persistentVolumeLabels - PV labels for the suggestion’s PV.
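The volume defaults listed above can be sketched in Python as follows. The with_volume_defaults helper is hypothetical, and filling in each missing field independently is an assumption of this sketch rather than a description of how the Katib controller applies its defaults.

```python
import copy

# Default values documented in this guide.
DEFAULT_VOLUME_MOUNT_PATH = "/opt/katib/data"
DEFAULT_ACCESS_MODE = "ReadWriteOnce"
DEFAULT_STORAGE = "1Gi"


def with_volume_defaults(suggestion_config):
    """Fill in the documented volume defaults (hypothetical helper)."""
    # Work on a copy so the caller's configuration is left untouched.
    config = copy.deepcopy(suggestion_config)
    config.setdefault("volumeMountPath", DEFAULT_VOLUME_MOUNT_PATH)
    pvc = config.setdefault("persistentVolumeClaimSpec", {})
    if not pvc.get("accessModes"):
        pvc["accessModes"] = [DEFAULT_ACCESS_MODE]
    requests = pvc.setdefault("resources", {}).setdefault("requests", {})
    requests.setdefault("storage", DEFAULT_STORAGE)
    return config


config = with_volume_defaults({"image": "docker.io/kubeflowkatib/suggestion-hyperopt"})
print(config["volumeMountPath"])
print(config["persistentVolumeClaimSpec"])
```

The sketch leaves persistentVolumeSpec alone, matching the statement that only a PVC is created when the PV settings are empty.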
Early stopping settings
These settings are related to Katib early stopping, where:
- key: early-stopping
- value: corresponding JSON settings for each early stopping algorithm name
If you want to use a new early stopping algorithm, you need to update the Katib config. For example, using a medianstop early stopping algorithm with all settings looks as follows:
early-stopping: |-
  {
    "medianstop": {
      "image": "docker.io/kubeflowkatib/earlystopping-medianstop",
      "imagePullPolicy": "Always"
    },
    ...
  }
All of these settings except image can be omitted. If you don’t specify any other settings, a default value is set automatically.
- image - a Docker image for the early stopping’s container with a medianstop algorithm (must be specified).
  Image example: docker.io/kubeflowkatib/<early-stopping-name>
  For each early stopping algorithm you can specify one of the following early stopping names in the Docker image:

  | Early stopping name | Early stopping algorithm | Description |
  | --- | --- | --- |
  | earlystopping-medianstop | medianstop | Katib Median Stopping implementation |

- imagePullPolicy - an image pull policy for the early stopping’s container with a medianstop algorithm.
  The default value is IfNotPresent.
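Because image is the only required setting, a quick check before editing the Config Map can catch entries that omit it. The Python sketch below uses a hypothetical find_missing_images helper; it is not a Katib tool, and the same check could be run against any of the three sections of the Katib config.

```python
import json


def find_missing_images(section_json):
    """Return the entry names that lack an image (hypothetical helper)."""
    entries = json.loads(section_json)
    return [name for name, settings in entries.items() if not settings.get("image")]


# The early-stopping value from the example above, without the elided entries.
early_stopping = """
{
  "medianstop": {
    "image": "docker.io/kubeflowkatib/earlystopping-medianstop",
    "imagePullPolicy": "Always"
  }
}
"""
print(find_missing_images(early_stopping))
```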
Next steps
- Learn how to configure and run your Katib experiments.
- Learn how to set up environment variables for each Katib component.
Last modified 30.06.2021: Update empty resources in Katib config (#2799) (97626a60)