Katib Configuration Overview
How to make changes in Katib configuration
This guide describes the Katib config, the Kubernetes Config Map that contains information about:
- Current metrics collectors (key = metrics-collector-sidecar).
- Current algorithms (suggestions) (key = suggestion).
- Current early stopping algorithms (key = early-stopping).
The Katib Config Map must be named katib-config and deployed in the KATIB_CORE_NAMESPACE namespace. The Katib controller parses the Katib config when you submit your experiment.
You can edit this Config Map even after deploying Katib.
If you are deploying Katib in the Kubeflow namespace, run this command to edit your Katib config:
kubectl edit configMap katib-config -n kubeflow
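Each of the three keys holds a JSON document stored as a string. As a rough illustration, the Python sketch below parses such data and lists the names configured under each key. The `data` dictionary is a hand-written stand-in for the Config Map's `data` field, and `list_configured_names` is a hypothetical helper, not part of Katib.

```python
import json

# Hand-written stand-in for the "data" field of the katib-config Config Map.
# Each value is a JSON document stored as a string.
data = {
    "metrics-collector-sidecar": '{"File": {"image": "docker.io/kubeflowkatib/file-metrics-collector"}}',
    "suggestion": '{"random": {"image": "docker.io/kubeflowkatib/suggestion-hyperopt"}}',
    "early-stopping": '{"medianstop": {"image": "docker.io/kubeflowkatib/earlystopping-medianstop"}}',
}


def list_configured_names(config_data):
    """Return the names configured under each Katib config key (hypothetical helper)."""
    keys = ("metrics-collector-sidecar", "suggestion", "early-stopping")
    # A missing key is treated as an empty JSON object.
    return {key: sorted(json.loads(config_data.get(key, "{}"))) for key in keys}


configured = list_configured_names(data)
print(configured)
```

A check like this can catch malformed JSON before you save an edited Config Map, because `json.loads` raises an error on invalid input.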
Metrics Collector Sidecar settings
These settings are related to Katib metrics collectors, where:
- key: metrics-collector-sidecar
- value: corresponding JSON settings for each metrics collector kind
Example for the File metrics collector with all settings:
metrics-collector-sidecar: |-
  {
    "File": {
      "image": "docker.io/kubeflowkatib/file-metrics-collector",
      "imagePullPolicy": "Always",
      "resources": {
        "requests": {
          "memory": "200Mi",
          "cpu": "250m",
          "ephemeral-storage": "200Mi"
        },
        "limits": {
          "memory": "1Gi",
          "cpu": "500m",
          "ephemeral-storage": "2Gi"
        }
      },
      "waitAllProcesses": false
    },
    ...
  }
All of these settings except image can be omitted. If you don’t specify any other settings, a default value is set automatically.
- image: a Docker image for the File metrics collector’s container (must be specified).
- imagePullPolicy: an image pull policy for the File metrics collector’s container. The default value is IfNotPresent.
- resources: resources for the File metrics collector’s container. The example above shows how to specify limits and requests. Currently, you can specify only memory, cpu and ephemeral-storage resources.
  The default values for the requests are:
  - memory = 10Mi
  - cpu = 50m
  - ephemeral-storage = 500Mi
  The default values for the limits are:
  - memory = 100Mi
  - cpu = 500m
  - ephemeral-storage = 5Gi
  You can run your metrics collector’s container without requesting the ephemeral-storage resource from the Kubernetes cluster. For instance, when using the Google Kubernetes Engine cluster autoscaler for your Katib experiments, you can remove the ephemeral-storage resource from the metrics collector’s container by setting negative values for the ephemeral-storage requests and limits in your Katib config as follows:
  "requests": {
    "ephemeral-storage": "-1"
  },
  "limits": {
    "ephemeral-storage": "-1"
  }
- waitAllProcesses: a flag that defines whether the metrics collector waits until all processes in the training container have finished before it starts to collect metrics. The default value is true.
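The defaulting rules described above can be sketched in Python. This is an illustration of the documented behavior, not Katib's controller code: `resolve_collector_settings` is a hypothetical helper, and two details are assumptions, namely that defaults are filled in per resource and that a negative ephemeral-storage value drops that resource from its section.

```python
import copy

# Defaults as listed in this guide.
DEFAULT_REQUESTS = {"memory": "10Mi", "cpu": "50m", "ephemeral-storage": "500Mi"}
DEFAULT_LIMITS = {"memory": "100Mi", "cpu": "500m", "ephemeral-storage": "5Gi"}


def resolve_collector_settings(settings):
    """Fill in documented defaults for one metrics collector entry (illustrative only)."""
    if not settings.get("image"):
        raise ValueError("image must be specified")

    resolved = copy.deepcopy(settings)
    resolved.setdefault("imagePullPolicy", "IfNotPresent")
    resolved.setdefault("waitAllProcesses", True)

    resources = resolved.setdefault("resources", {})
    for section, defaults in (("requests", DEFAULT_REQUESTS), ("limits", DEFAULT_LIMITS)):
        # Assumption: user values override defaults one resource at a time.
        values = {**defaults, **resources.get(section, {})}
        # Assumption: a negative ephemeral-storage value means "do not set this resource".
        if str(values["ephemeral-storage"]).startswith("-"):
            del values["ephemeral-storage"]
        resources[section] = values
    return resolved


minimal = resolve_collector_settings(
    {"image": "docker.io/kubeflowkatib/file-metrics-collector"}
)
no_storage = resolve_collector_settings(
    {
        "image": "docker.io/kubeflowkatib/file-metrics-collector",
        "resources": {
            "requests": {"ephemeral-storage": "-1"},
            "limits": {"ephemeral-storage": "-1"},
        },
    }
)
print(minimal)
print(no_storage)
```

Here `minimal` ends up with every documented default, while `no_storage` keeps the memory and cpu defaults but carries no ephemeral-storage entry.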
Suggestion settings
These settings are related to Katib suggestions, where:
- key: suggestion
- value: corresponding JSON settings for each algorithm name
If you want to use a new algorithm, you need to update the Katib config. For example, a random algorithm with all settings looks as follows:
suggestion: |-
  {
    "random": {
      "image": "docker.io/kubeflowkatib/suggestion-hyperopt",
      "imagePullPolicy": "Always",
      "resources": {
        "requests": {
          "memory": "100Mi",
          "cpu": "100m",
          "ephemeral-storage": "100Mi"
        },
        "limits": {
          "memory": "500Mi",
          "cpu": "500m",
          "ephemeral-storage": "3Gi"
        }
      },
      "serviceAccountName": "random-sa"
    },
    ...
  }
All of these settings except image can be omitted. If you don’t specify any other settings, a default value is set automatically.
- image: a Docker image for the suggestion’s container with a random algorithm (must be specified).
  Image example: docker.io/kubeflowkatib/<suggestion-name>
  For each algorithm (suggestion) you can specify one of the following suggestion names in the Docker image:
  | Suggestion name | List of supported algorithms | Description |
  | --- | --- | --- |
  | suggestion-hyperopt | random, tpe | Hyperopt optimization framework |
  | suggestion-chocolate | grid, random, quasirandom, bayesianoptimization, mocmaes | Chocolate optimization framework |
  | suggestion-skopt | bayesianoptimization | Scikit-optimize optimization framework |
  | suggestion-goptuna | cmaes, random, tpe | Goptuna optimization framework |
  | suggestion-hyperband | hyperband | Katib Hyperband implementation |
  | suggestion-enas | enas | Katib ENAS implementation |
  | suggestion-darts | darts | Katib DARTS implementation |
- imagePullPolicy: an image pull policy for the suggestion’s container with a random algorithm. The default value is IfNotPresent.
- resources: resources for the suggestion’s container with a random algorithm. The example above shows how to specify limits and requests. Currently, you can specify only memory, cpu and ephemeral-storage resources.
  The default values for the requests are:
  - memory = 10Mi
  - cpu = 50m
  - ephemeral-storage = 500Mi
  The default values for the limits are:
  - memory = 100Mi
  - cpu = 500m
  - ephemeral-storage = 5Gi
  You can run your suggestion’s container without requesting the ephemeral-storage resource from the Kubernetes cluster. For instance, when using the Google Kubernetes Engine cluster autoscaler for your Katib experiments, you can remove the ephemeral-storage resource from the suggestion’s container by setting negative values for the ephemeral-storage requests and limits in your Katib config as follows:
  "requests": {
    "ephemeral-storage": "-1"
  },
  "limits": {
    "ephemeral-storage": "-1"
  }
- serviceAccountName: a service account for the suggestion’s container with a random algorithm.
  In the above example, the random-sa service account is attached to each experiment’s suggestion with a random algorithm until you change or delete this service account from the Katib config. By default, the suggestion pod doesn’t have any specific service account, in which case the pod uses the default service account.
  Note: If you want to run your experiments with early stopping, the suggestion’s deployment must have permission to update the experiment’s trial status. If you don’t specify a service account in the Katib config, the Katib controller creates the required Kubernetes role-based access control (RBAC) resources for the suggestion.
  If you need your own service account for the experiment’s suggestion with early stopping, follow these rules:
  - The service account name can’t be equal to <experiment-name>-<experiment-algorithm>.
  - The service account must have sufficient permissions to update the experiment’s trial status.
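Two of the rules in this section can be checked before you edit the Config Map: whether the chosen image lists the algorithm in the table of suggestion names, and whether the service account name collides with <experiment-name>-<experiment-algorithm>. The Python sketch below is a hypothetical pre-flight check, not a Katib tool. `check_suggestion_entry` and the experiment name `my-experiment` are made up for illustration, and the permission rule is not covered because it can only be verified against the cluster.

```python
# Supported algorithms per suggestion image name, copied from the table in this guide.
SUPPORTED_ALGORITHMS = {
    "suggestion-hyperopt": {"random", "tpe"},
    "suggestion-chocolate": {"grid", "random", "quasirandom", "bayesianoptimization", "mocmaes"},
    "suggestion-skopt": {"bayesianoptimization"},
    "suggestion-goptuna": {"cmaes", "random", "tpe"},
    "suggestion-hyperband": {"hyperband"},
    "suggestion-enas": {"enas"},
    "suggestion-darts": {"darts"},
}


def check_suggestion_entry(algorithm, settings, experiment_name=None):
    """Return a list of problems found in one suggestion entry (hypothetical helper)."""
    problems = []
    image = settings.get("image", "")
    if not image:
        problems.append("image must be specified")
    else:
        # Take the last path segment and drop an optional ":tag" suffix.
        suggestion_name = image.rsplit("/", 1)[-1].split(":", 1)[0]
        supported = SUPPORTED_ALGORITHMS.get(suggestion_name)
        # Images that are not in the table (for example, custom ones) are not checked.
        if supported is not None and algorithm not in supported:
            problems.append(f"{suggestion_name} does not list algorithm {algorithm!r}")

    service_account = settings.get("serviceAccountName")
    if experiment_name and service_account == f"{experiment_name}-{algorithm}":
        problems.append(
            "serviceAccountName must differ from <experiment-name>-<experiment-algorithm>"
        )
    return problems


ok = check_suggestion_entry(
    "random",
    {"image": "docker.io/kubeflowkatib/suggestion-hyperopt", "serviceAccountName": "random-sa"},
    experiment_name="my-experiment",
)
bad = check_suggestion_entry(
    "grid",
    {"image": "docker.io/kubeflowkatib/suggestion-hyperopt", "serviceAccountName": "my-experiment-grid"},
    experiment_name="my-experiment",
)
print(ok)
print(bad)
```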
Suggestion volume settings
When you create an experiment with the FromVolume resume policy, you can specify PersistentVolume (PV) and PersistentVolumeClaim (PVC) settings for the experiment’s suggestion. Learn more about Katib concepts in the overview guide. If you want to use the default volume specification, you can omit these settings.
Follow the example for the random algorithm:
suggestion: |-
  {
    "random": {
      "image": "docker.io/kubeflowkatib/suggestion-hyperopt",
      "volumeMountPath": "/opt/suggestion/data",
      "persistentVolumeClaimSpec": {
        "accessModes": [
          "ReadWriteMany"
        ],
        "resources": {
          "requests": {
            "storage": "3Gi"
          }
        },
        "storageClassName": "katib-suggestion"
      },
      "persistentVolumeSpec": {
        "accessModes": [
          "ReadWriteMany"
        ],
        "capacity": {
          "storage": "3Gi"
        },
        "hostPath": {
          "path": "/tmp/suggestion/unique/path"
        }
      }
    },
    ...
  }
- volumeMountPath: a mount path for the suggestion’s container with a random algorithm. The default value is /opt/katib/data.
- persistentVolumeClaimSpec: a PVC specification for the suggestion’s PVC. A default value is set for each of these settings that you don’t specify:
  - persistentVolumeClaimSpec.accessModes[0]: the default value is ReadWriteOnce
  - persistentVolumeClaimSpec.resources.requests.storage: the default value is 1Gi
  - persistentVolumeClaimSpec.storageClassName: the default value is katib-suggestion
  Note: If your Kubernetes cluster doesn’t have dynamic volume provisioning to automatically provision storage for the PVC, storageClassName must be equal to katib-suggestion. Then, Katib creates PV and PVC for the suggestion. Otherwise, Katib creates only PVC.
- persistentVolumeSpec: a PV specification for the suggestion’s PV. A default value is set for each of these parameters that you don’t specify:
  - persistentVolumeSpec.accessModes[0]: the default value is ReadWriteOnce
  - persistentVolumeSpec.capacity.storage: the default value is 1Gi
  - persistentVolumeSpec.hostPath.path: the default value is /tmp/katib/suggestions/<suggestion-name>-<suggestion-algorithm>-<suggestion-namespace>
  For the default PV source, Katib uses hostPath. If .hostPath.path in the config settings is equal to /tmp/katib/suggestions/, the Katib controller adds <suggestion-name>-<suggestion-algorithm>-<suggestion-namespace> to the path. That makes host paths unique across suggestions.
  Note: The PV storageClassName is always equal to katib-suggestion. The PV persistentVolumeReclaimPolicy is always equal to Delete to properly remove all resources once the Katib experiment is deleted. To learn more about PV reclaim policies, check the Kubernetes documentation.
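The volume defaults and the host path rule described in this section can be sketched in Python. This is an illustration under stated assumptions, not Katib's implementation: `resolve_pvc_spec` and `resolve_host_path` are hypothetical helpers, and the suggestion name `my-experiment` and namespace `kubeflow` are example values.

```python
import copy

DEFAULT_HOST_PATH = "/tmp/katib/suggestions/"


def resolve_pvc_spec(spec=None):
    """Fill in the documented PVC defaults for settings that are not specified."""
    spec = copy.deepcopy(spec or {})
    if not spec.get("accessModes"):
        spec["accessModes"] = ["ReadWriteOnce"]
    requests = spec.setdefault("resources", {}).setdefault("requests", {})
    requests.setdefault("storage", "1Gi")
    spec.setdefault("storageClassName", "katib-suggestion")
    return spec


def resolve_host_path(path, suggestion_name, algorithm, namespace):
    """Return the PV host path, made unique when the default base path is used."""
    if path is None:
        path = DEFAULT_HOST_PATH
    if path == DEFAULT_HOST_PATH:
        # The suffix keeps host paths unique across suggestions.
        return f"{path}{suggestion_name}-{algorithm}-{namespace}"
    # A custom path is used unchanged.
    return path


default_pvc = resolve_pvc_spec()
default_path = resolve_host_path(None, "my-experiment", "random", "kubeflow")
custom_path = resolve_host_path("/tmp/suggestion/unique/path", "my-experiment", "random", "kubeflow")
print(default_pvc)
print(default_path)
print(custom_path)
```

Because a custom hostPath.path is passed through unchanged, keeping such paths unique across suggestions is up to you.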
Early stopping settings
These settings are related to Katib early stopping, where:
- key: early-stopping
- value: corresponding JSON settings for each early stopping algorithm name
If you want to use a new early stopping algorithm, you need to update the Katib config. For example, a medianstop early stopping algorithm with all settings looks as follows:
early-stopping: |-
  {
    "medianstop": {
      "image": "docker.io/kubeflowkatib/earlystopping-medianstop",
      "imagePullPolicy": "Always"
    },
    ...
  }
All of these settings except image can be omitted. If you don’t specify any other settings, a default value is set automatically.
- image: a Docker image for the early stopping’s container with a medianstop algorithm (must be specified).
  Image example: docker.io/kubeflowkatib/<early-stopping-name>
  For each early stopping algorithm you can specify one of the following early stopping names in the Docker image:
  | Early stopping name | Early stopping algorithm | Description |
  | --- | --- | --- |
  | earlystopping-medianstop | medianstop | Katib Median Stopping implementation |
- imagePullPolicy: an image pull policy for the early stopping’s container with a medianstop algorithm. The default value is IfNotPresent.
Next steps
Learn how to configure and run your Katib experiments.
Learn how to set up environment variables for each Katib component.
Last modified 15.03.2021: Add Katib config PV reclaim policy (#2533) (a0cea7c5)