ImagePullJob

NodeImage and ImagePullJob are CRDs available since Kruise v0.8.0.

Kruise creates a NodeImage for each Node; it lists the images that should be downloaded on that Node.

Users can create an ImagePullJob to declare which Nodes an image should be downloaded on.

Image Pulling

Note that NodeImage is quite a low-level API. Use it only when you need to pull an image on one specific Node. Otherwise, use ImagePullJob to pull an image on a batch of Nodes.

Feature-gate

Since Kruise v1.5.0, the ImagePullJob/ImageListPullJob feature is turned off by default to reduce the privileges of the default installation. You can turn it on by setting the feature-gate ImagePullJobGate.

    $ helm install/upgrade kruise https://... --set featureGates="ImagePullJobGate=true"

ImagePullJob (high-level)

ImagePullJob is a namespace-scoped resource.

API definition: https://github.com/openkruise/kruise/blob/master/apis/apps/v1alpha1/imagepulljob_types.go

    apiVersion: apps.kruise.io/v1alpha1
    kind: ImagePullJob
    metadata:
      name: job-with-always
    spec:
      image: nginx:1.9.1   # [required] image to pull
      parallelism: 10      # [optional] the maximal number of Nodes that pull this image at the same time, defaults to 1
      selector:            # [optional] the names or label selector to assign Nodes (only one of them can be set)
        names:
        - node-1
        - node-2
        # matchLabels:     # alternative to names; set only one of the two
        #   node-type: xxx
      # podSelector:       # [optional] label selector over pods that should pull image on nodes of these pods. Mutually exclusive with selector.
      #   matchLabels:
      #     pod-label: xxx
      #   matchExpressions:
      #   - key: pod-label
      #     operator: In
      #     values:
      #     - xxx
      completionPolicy:
        type: Always                  # [optional] defaults to Always
        activeDeadlineSeconds: 1200   # [optional] no default, only works for Always type
        ttlSecondsAfterFinished: 300  # [optional] no default, only works for Always type
      pullPolicy:                     # [optional] defaults to backoffLimit=3, timeoutSeconds=600
        backoffLimit: 3
        timeoutSeconds: 300

You can write the names or label selector in the selector field to assign Nodes (only one of them can be set). If no selector is set, the image will be pulled on all Nodes in the cluster.

Alternatively, set podSelector to pull the image on the Nodes where the matching Pods run. podSelector is mutually exclusive with selector.
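As a minimal sketch of the two node-selection modes just described, the first manifest below sets neither selector nor podSelector (so, per the text above, the image is pulled on all Nodes), and the second uses only podSelector. The job names and the pod-label value are placeholders chosen for illustration; the fields are the ones shown in the ImagePullJob example above.

```yaml
# Sketch 1: no selector and no podSelector, so every Node in the cluster is targeted.
apiVersion: apps.kruise.io/v1alpha1
kind: ImagePullJob
metadata:
  name: job-all-nodes        # hypothetical name
spec:
  image: nginx:1.9.1
  parallelism: 10
  completionPolicy:
    type: Always
---
# Sketch 2: podSelector only; the image is pulled on the Nodes that run matching Pods.
# selector must not be set here, because the two fields are mutually exclusive.
apiVersion: apps.kruise.io/v1alpha1
kind: ImagePullJob
metadata:
  name: job-by-pods          # hypothetical name
spec:
  image: nginx:1.9.1
  podSelector:
    matchLabels:
      pod-label: xxx         # placeholder label
  completionPolicy:
    type: Always
```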

Also, ImagePullJob has two completionPolicy types:

  • Always means this job will eventually complete, either failed or succeeded.
    • activeDeadlineSeconds: timeout duration for this job
    • ttlSecondsAfterFinished: how long after the job finishes (successfully or not) before it is removed
  • Never means this job will never complete; it will keep pulling the image on the desired Nodes every day.
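For comparison with the Always example earlier, a job using the Never policy might look like the sketch below. The name job-with-never and the label value are placeholders. activeDeadlineSeconds and ttlSecondsAfterFinished are left out because, as noted above, they only apply to the Always type.

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: ImagePullJob
metadata:
  name: job-with-never       # hypothetical name
spec:
  image: nginx:1.9.1
  parallelism: 10
  selector:
    matchLabels:
      node-type: xxx         # placeholder label
  completionPolicy:
    type: Never              # the job never completes and keeps the image pulled on matching Nodes
  pullPolicy:
    backoffLimit: 3
    timeoutSeconds: 300
```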

Configure secrets

If the image is in a private registry, you may want to configure the pull secrets for the image:

    # ...
    spec:
      pullSecrets:
      - secret-name1
      - secret-name2

Because ImagePullJob is a namespace-scoped resource, the secrets must be in the same namespace as the ImagePullJob, and you should put only the secret names into the pullSecrets field.
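A fuller sketch of a job that pulls from a private registry is shown below. The namespace, job name, and image reference are hypothetical; the point is only that metadata.namespace must be the namespace that already holds secret-name1 and secret-name2.

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: ImagePullJob
metadata:
  name: job-with-secrets                      # hypothetical name
  namespace: default                          # assumed: the pull secrets below live in this namespace
spec:
  image: registry.example.com/team/app:v1     # hypothetical private image
  pullSecrets:                                # secret names only, no namespace prefix
  - secret-name1
  - secret-name2
  completionPolicy:
    type: Always
```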

You can also configure an image credential provider for a private registry, as described in the next section.

Configure image credential provider

FEATURE STATE: Kruise v1.7.0

Starting from Kubernetes v1.20, the kubelet can dynamically retrieve credentials for a container image registry using exec plugins. Refer to the Kubernetes community documentation for details.

OpenKruise supports the same mechanism for pre-downloading images, with the following steps:

a. Configure image credential provider on AWS:

  1. Install AWS's credential provider plugin on the k8s nodes.

  2. Create the credential-provider-config ConfigMap in the k8s cluster:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: credential-provider-config
        namespace: kruise-system
      data:
        CredentialProviderPlugin.yaml: |
          apiVersion: kubelet.config.k8s.io/v1
          kind: CredentialProviderConfig
          providers:
            # name is the required name of the credential provider.
            - name: ecr-credential-provider
              matchImages:
                - "*.dkr.ecr.*.amazonaws.com"
                - "*.dkr.ecr.*.amazonaws.com.cn"
                - "*.dkr.ecr-fips.*.amazonaws.com"
                - "*.dkr.ecr.us-iso-east-1.c2s.ic.gov"
                - "*.dkr.ecr.us-isob-east-1.sc2s.sgov.gov"
              defaultCacheDuration: "12h"
              apiVersion: credentialprovider.kubelet.k8s.io/v1
              env:
                - name: AWS_PROFILE
                  value: temp

  3. Install Kruise with the AWS shared credentials file.

     If you have a shared credentials file ($HOME/.aws/credentials) on every machine, you can mount the directory into kruise-daemon for authentication, as follows:

      helm install kruise https://... \
        --set installation.createNamespace=false \
        --set daemon.credentialProvider.enable=true \
        --set daemon.credentialProvider.hostPath=/etc/eks/image-credential-provider \
        --set daemon.credentialProvider.configmap=credential-provider-config \
        --set daemon.credentialProvider.awsCredentialsDir=/root/.aws

  4. Create an ImagePullJob. The image repository is authenticated via the above plugin, and the image is pre-downloaded.
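The last step above could look roughly like the following sketch. The account ID, region, repository, and job name are placeholders and do not refer to a real registry. The image host is chosen only so that it matches the "*.dkr.ecr.*.amazonaws.com" pattern in matchImages. No pullSecrets are listed, on the assumption that the credential provider plugin supplies the credentials.

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: ImagePullJob
metadata:
  name: job-with-ecr         # hypothetical name
spec:
  # Placeholder ECR image reference (account ID, region, and repository are made up).
  image: 123456789012.dkr.ecr.us-west-2.amazonaws.com/my-app:v1
  selector:
    names:
    - node-1
  completionPolicy:
    type: Always
    activeDeadlineSeconds: 1200
    ttlSecondsAfterFinished: 300
```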

Note: If other cloud vendors (e.g., Tencent Cloud) have a similar mechanism, it should work. If you have similar needs, please contact us.

Attach metadata into CRI interface

FEATURE STATE: Kruise v1.4.0

When the kubelet creates Pods, it attaches Pod metadata as podSandboxConfig parameters in the PullImage CRI call. OpenKruise ImagePullJob supports a similar capability, as follows:

    apiVersion: apps.kruise.io/v1alpha1
    kind: ImagePullJob
    spec:
      # ...
      image: nginx:1.9.1
      sandboxConfig:
        annotations:
          io.kubernetes.image.metrics.tags: "cluster=cn-shanghai"
        labels:
          io.kubernetes.image.app: "foo"

Image Pull Policy supports ‘Always’

FEATURE STATE: Kruise v1.6.0

  • spec.imagePullPolicy=Always means that Kruise always attempts to pull the latest image, even if an image with the same name is already present on the Node.
  • spec.imagePullPolicy=IfNotPresent means that Kruise pulls the image only if it isn’t present on the Node.
  • The default is IfNotPresent.

    apiVersion: apps.kruise.io/v1alpha1
    kind: ImagePullJob
    spec:
      # ...
      image: nginx:1.9.1
      imagePullPolicy: Always   # or IfNotPresent (the default)

ImageListPullJob

FEATURE STATE: Kruise v1.5.0

An ImagePullJob can pre-download only a single image. To download multiple images, you can either create multiple ImagePullJobs or use an ImageListPullJob to pre-download them in a single job, as follows:

    apiVersion: apps.kruise.io/v1alpha1
    kind: ImageListPullJob
    metadata:
      name: job-with-always
    spec:
      images:              # [required] images to pull
      - nginx:1.9.1
      - busybox:1.29.2
      parallelism: 10      # [optional] the maximal number of Nodes that pull these images at the same time, defaults to 1
      selector:            # [optional] the names or label selector to assign Nodes (only one of them can be set)
        names:
        - node-1
        - node-2
        # matchLabels:     # alternative to names; set only one of the two
        #   node-type: xxx
      completionPolicy:
        type: Always                  # [optional] defaults to Always
        activeDeadlineSeconds: 1200   # [optional] no default, only works for Always type
        ttlSecondsAfterFinished: 300  # [optional] no default, only works for Always type
      pullPolicy:                     # [optional] defaults to backoffLimit=3, timeoutSeconds=600
        backoffLimit: 3
        timeoutSeconds: 300

NodeImage (low-level)

NodeImage is a cluster-scoped resource.

API definition: https://github.com/openkruise/kruise/blob/master/apis/apps/v1alpha1/nodeimage_types.go

Once Kruise is installed, nodeimage-controller immediately creates a NodeImage for each Node, with the same name as the Node. When a Node is added or removed, nodeimage-controller also creates or deletes the NodeImage for that Node.

In addition, nodeimage-controller synchronizes labels from the Node to its NodeImage, so a NodeImage and its Node always have the same name and labels. You can get a NodeImage by the Node name, or list NodeImages using the Node labels as a selector.

Typically, an empty NodeImage looks like this:

    apiVersion: apps.kruise.io/v1alpha1
    kind: NodeImage
    metadata:
      labels:
        kubernetes.io/arch: amd64
        kubernetes.io/os: linux
        # ...
      name: node-xxx
      # ...
    spec: {}
    status:
      desired: 0
      failed: 0
      pulling: 0
      succeeded: 0

If you want to pull an image such as ubuntu:latest on this Node, you can:

  1. kubectl edit nodeimage node-xxx and write below into it (ignore the comments):

      # ...
      spec:
        images:
          ubuntu:  # image name
            tags:
            - tag: latest  # image tag
              pullPolicy:
                ttlSecondsAfterFinished: 300  # [required] after this image pulling finished (including success or failure) over 300s, this task will be removed
                timeoutSeconds: 600           # [optional] timeout duration for once pulling, defaults to 600
                backoffLimit: 3               # [optional] retry times for pulling, defaults to 3
                activeDeadlineSeconds: 1200   # [optional] timeout duration for this task, no default

  2. kubectl patch nodeimage node-xxx --type=merge -p '{"spec":{"images":{"ubuntu":{"tags":[{"tag":"latest","pullPolicy":{"ttlSecondsAfterFinished":300}}]}}}}'

You can read the NodeImage status using kubectl get nodeimage node-xxx -o yaml. The task is removed from spec and status 300s after it has finished.

FAQ

  1. If an ImagePullJob has failed, as follows:

      % kubectl get imagepulljob
      NAME              TOTAL   ACTIVE   SUCCEED   FAILED   AGE     MESSAGE
      job-with-always   4       0        0         4        9m49s   job has completed

  2. You can find the names of the failed nodes in imagePullJob.status, as follows:

      % kubectl get imagepulljob job-with-always -oyaml
      apiVersion: apps.kruise.io/v1alpha1
      kind: ImagePullJob
      status:
        active: 0
        completionTime: "2024-08-09T10:06:26Z"
        desired: 4
        failed: 4
        failedNodes:
        - cn-hangzhou.x.125
        - cn-hangzhou.x.126
        - cn-hangzhou.x.127
        - cn-hangzhou.x.128
        message: job has completed
        startTime: "2024-08-09T10:03:52Z"
        succeeded: 0

  3. You can see the exact cause of failure via the NodeImage, as follows:

      % kubectl get nodeimage cn-hangzhou.x.125 -oyaml
      apiVersion: apps.kruise.io/v1alpha1
      kind: NodeImage
      status:
        desired: 1
        failed: 1
        imageStatuses:
          nginx:
            tags:
            - completionTime: "2024-08-09T10:06:22Z"
              message: 'Failed to pull image reference "nginx:1.9.1": rpc error: code =
                DeadlineExceeded desc = failed to pull and unpack image "docker.io/library/nginx:1.9.1":
                failed to copy: httpReadSeeker: failed open: failed to do request: Get "https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/c5/c5dd085dcc7c78a296c80b87916831fd19a3f447d94b99580ccd19a052720211/data?verify=1723200943-x6RCoD1a2P3aEdh1%!B(MISSING)XcQSFe2h%!B(MISSING)U%!D(MISSING)":
                dial tcp 10.1.1.1:443: i/o timeout'
              phase: Failed
              startTime: "2024-08-09T10:03:52Z"
              tag: 1.9.1
        pulling: 0
        succeeded: 0