Windows containers in Kubernetes

Windows applications constitute a large portion of the services and applications that run in many organizations. Windows containers provide a way to encapsulate processes and package dependencies, making it easier to use DevOps practices and follow cloud native patterns for Windows applications.

Organizations with investments in Windows-based applications and Linux-based applications don’t have to look for separate orchestrators to manage their workloads, leading to increased operational efficiencies across their deployments, regardless of operating system.

Windows nodes in Kubernetes

To enable the orchestration of Windows containers in Kubernetes, include Windows nodes in your existing Linux cluster. Scheduling Windows containers in Pods on Kubernetes is similar to scheduling Linux-based containers.

In order to run Windows containers, your Kubernetes cluster must include multiple operating systems. While you can only run the control plane on Linux, you can deploy worker nodes running either Windows or Linux.
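As a minimal sketch of how a workload can be directed to a Windows worker node in a mixed cluster, the following Python snippet builds a Pod manifest as a plain dictionary and prints it as JSON, which kubectl accepts alongside YAML. The Pod name and the image reference are illustrative assumptions; kubernetes.io/os is the well-known node label that identifies a node's operating system.

```python
import json


def windows_pod_manifest(name: str, image: str) -> dict:
    """Build a minimal Pod manifest intended for a Windows worker node."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Declares that the containers in this Pod are Windows containers.
            "os": {"name": "windows"},
            # Restricts scheduling to nodes labelled as running Windows.
            "nodeSelector": {"kubernetes.io/os": "windows"},
            "containers": [{"name": name, "image": image}],
        },
    }


# Hypothetical name and image, chosen for illustration only.
pod = windows_pod_manifest(
    "iis-example",
    "mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019",
)
print(json.dumps(pod, indent=2))
```

Saving the output to a file and passing it to kubectl apply -f would be one way to use it; the node selector keeps the Pod off Linux nodes.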

Windows nodes are supported provided that the operating system is Windows Server 2019.

This document uses the term Windows containers to mean Windows containers with process isolation. Kubernetes does not support running Windows containers with Hyper-V isolation.

Compatibility and limitations

Some node features are only available if you use a specific container runtime; others are not available on Windows nodes, including:

- HugePages: not supported for Windows containers
- Privileged containers: not supported for Windows containers. HostProcess Containers offer similar functionality.
- TerminationGracePeriod: requires containerD

Not all features of shared namespaces are supported. See API compatibility for more details.

See Windows OS version compatibility for details on the Windows versions that Kubernetes is tested against.

From an API and kubectl perspective, Windows containers behave in much the same way as Linux-based containers. However, there are some notable differences in key functionality which are outlined in this section.

Comparison with Linux

Key Kubernetes elements work the same way in Windows as they do in Linux. This section refers to several key workload abstractions and how they map to Windows.

Pods

A Pod is the basic building block of Kubernetes: the smallest and simplest unit in the Kubernetes object model that you create or deploy. You cannot deploy Windows and Linux containers in the same Pod. All containers in a Pod are scheduled onto a single Node, where each Node represents a specific platform and architecture. The following Pod capabilities, properties, and events are supported with Windows containers:
- Single or multiple containers per Pod with process isolation and volume sharing
- Pod status fields
- Readiness, liveness, and startup probes
- postStart & preStop container lifecycle hooks
- ConfigMap, Secrets: as environment variables or volumes
- emptyDir volumes
- Named pipe host mounts
- Resource limits
- OS field:
  
  The .spec.os.name field should be set to windows to indicate that the current Pod uses Windows containers.
  
  Note: Starting from Kubernetes v1.25, the IdentifyPodOS feature gate is GA and is enabled by default.
  
  If you set the .spec.os.name field to windows, you must not set the following fields in the .spec of that Pod:
  - spec.hostPID
  - spec.hostIPC
  - spec.securityContext.seLinuxOptions
  - spec.securityContext.seccompProfile
  - spec.securityContext.fsGroup
  - spec.securityContext.fsGroupChangePolicy
  - spec.securityContext.sysctls
  - spec.shareProcessNamespace
  - spec.securityContext.runAsUser
  - spec.securityContext.runAsGroup
  - spec.securityContext.supplementalGroups
  - spec.containers[*].securityContext.seLinuxOptions
  - spec.containers[*].securityContext.seccompProfile
  - spec.containers[*].securityContext.capabilities
  - spec.containers[*].securityContext.readOnlyRootFilesystem
  - spec.containers[*].securityContext.privileged
  - spec.containers[*].securityContext.allowPrivilegeEscalation
  - spec.containers[*].securityContext.procMount
  - spec.containers[*].securityContext.runAsUser
  - spec.containers[*].securityContext.runAsGroup
  In the above list, wildcards (*) indicate all elements in a list. For example, spec.containers[*].securityContext refers to the SecurityContext object for all containers. If any of these fields is specified, the Pod will not be admitted by the API server.
Workload resources including:
- ReplicaSet
- Deployment
- StatefulSet
- DaemonSet
- Job
- CronJob
- ReplicationController
Services. See Load balancing and Services for more details.
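The restriction on the OS field listed earlier (fields that must not be set when .spec.os.name is windows) can be checked before a manifest is submitted. The following is only an illustrative sketch, not the API server's actual validation code: the function name is hypothetical, and it covers exactly the fields in that list, including spec.containers[*] but nothing beyond it.

```python
# Fields that must not be set when .spec.os.name is "windows",
# taken from the list in this document.
POD_LEVEL = ["hostPID", "hostIPC", "shareProcessNamespace"]
POD_SECURITY_CONTEXT = [
    "seLinuxOptions", "seccompProfile", "fsGroup", "fsGroupChangePolicy",
    "sysctls", "runAsUser", "runAsGroup", "supplementalGroups",
]
CONTAINER_SECURITY_CONTEXT = [
    "seLinuxOptions", "seccompProfile", "capabilities",
    "readOnlyRootFilesystem", "privileged", "allowPrivilegeEscalation",
    "procMount", "runAsUser", "runAsGroup",
]


def forbidden_windows_fields(pod: dict) -> list:
    """Return the paths of fields that conflict with .spec.os.name == "windows"."""
    spec = pod.get("spec", {})
    if spec.get("os", {}).get("name") != "windows":
        return []  # The restriction only applies to Pods marked as Windows.

    found = [f"spec.{name}" for name in POD_LEVEL if name in spec]

    pod_sc = spec.get("securityContext", {})
    found += [
        f"spec.securityContext.{name}"
        for name in POD_SECURITY_CONTEXT if name in pod_sc
    ]

    for index, container in enumerate(spec.get("containers", [])):
        container_sc = container.get("securityContext", {})
        found += [
            f"spec.containers[{index}].securityContext.{name}"
            for name in CONTAINER_SECURITY_CONTEXT if name in container_sc
        ]
    return found
```

An empty result means that none of the listed fields is present; any returned path names a field that would cause the Pod to be rejected.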

Pods, workload resources, and Services are critical elements to managing Windows workloads on Kubernetes. However, on their own they are not enough to enable the proper lifecycle management of Windows workloads in a dynamic cloud native environment. The following are also supported:

- kubectl exec
- Pod and container metrics
- Horizontal pod autoscaling
- Resource quotas
- Scheduler preemption

Command line options for the kubelet

Some kubelet command line options behave differently on Windows, as described below:

- The --windows-priorityclass flag lets you set the scheduling priority of the kubelet process (see CPU resource management)
- The --kube-reserved, --system-reserved, and --eviction-hard flags update NodeAllocatable
- Eviction by using --enforce-node-allocatable is not implemented
- Eviction by using --eviction-hard and --eviction-soft is not implemented
- When running on a Windows node, the kubelet does not have memory or CPU restrictions. --kube-reserved and --system-reserved only subtract from NodeAllocatable and do not guarantee resources for workloads. See Resource Management for Windows nodes for more information.
- The MemoryPressure Condition is not implemented
- The kubelet does not take OOM eviction actions
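The effect of --kube-reserved, --system-reserved, and --eviction-hard on NodeAllocatable is pure accounting on Windows: the values are subtracted from the node's capacity and nothing is enforced. The arithmetic can be sketched as follows, assuming the usual Node Allocatable calculation (capacity minus each reservation). The function name and the figures are hypothetical.

```python
def node_allocatable(capacity: int, kube_reserved: int,
                     system_reserved: int, eviction_hard: int) -> int:
    """Return the amount reported as allocatable, in the same unit as the inputs."""
    allocatable = capacity - kube_reserved - system_reserved - eviction_hard
    # Clamp at zero so that oversized reservations never yield a negative value.
    return max(allocatable, 0)


# Hypothetical memory figures in MiB, for illustration only.
print(node_allocatable(16384, 1024, 2048, 512))  # prints 12800
```

The result only limits what the scheduler will place on the node; as noted above, it does not reserve or guarantee those resources for workloads.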

API compatibility

There are subtle differences in the way the Kubernetes APIs work for Windows due to the OS and container runtime. Some workload properties were designed for Linux, and fail to run on Windows.

At a high level, these OS concepts are different:

- Identity - Linux uses userID (UID) and groupID (GID), which are represented as integer types. User and group names are not canonical; they are just aliases in /etc/group or /etc/passwd that map back to UID+GID. Windows uses a larger binary security identifier (SID), which is stored in the Windows Security Account Manager (SAM) database. This database is not shared between the host and containers, or between containers.
- File permissions - Windows uses an access control list based on SIDs, whereas POSIX systems such as Linux use a bitmask based on object permissions and UID+GID, plus optional access control lists.
- File paths - the convention on Windows is to use \ instead of /. The Go IO libraries typically accept both and just make it work, but when you’re setting a path or command line that’s interpreted inside a container, \ may be needed.
- Signals - Windows interactive apps handle termination differently, and can implement one or more of these:
  - A UI thread handles well-defined messages including WM_CLOSE.
  - Console apps handle Ctrl-C or Ctrl-break using a Control Handler.
  - Services register a Service Control Handler function that can accept SERVICE_CONTROL_STOP control codes.
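To illustrate the file path convention described above, the sketch below uses Python's standard pathlib module to render a path with backslash separators before it is placed in a command line that a Windows container will interpret. The helper name and the example path are hypothetical.

```python
from pathlib import PureWindowsPath


def to_windows_path(path: str) -> str:
    """Render a path with backslash separators, as a Windows process expects."""
    # PureWindowsPath accepts both / and \ as separators on any host OS
    # and always renders the result with backslashes.
    return str(PureWindowsPath(path))


# Hypothetical path, for illustration only.
print(to_windows_path("C:/app/config/settings.json"))  # prints C:\app\config\settings.json
```

Because PureWindowsPath does no file system access, the conversion can run on a Linux workstation or CI runner that generates manifests for Windows nodes.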

Container exit codes follow the same convention where 0 is success, and nonzero is failure. The specific error codes may differ across Windows and Linux. However, exit codes passed from the Kubernetes components (kubelet, kube-proxy) are unchanged.

Field compatibility for container specifications

The following list documents differences in how Pod container specifications work on Windows and Linux:

- Huge pages are not implemented in the Windows container runtime, and are not available. They require asserting a user privilege that’s not configurable for containers.
- requests.cpu and requests.memory - requests are subtracted from node available resources, so they can be used to avoid overprovisioning a node. However, they cannot be used to guarantee resources in an overprovisioned node. They should be applied to all containers as a best practice if the operator wants to avoid overprovisioning entirely.
- securityContext.allowPrivilegeEscalation - not possible on Windows; none of the capabilities are hooked up
- securityContext.capabilities - POSIX capabilities are not implemented on Windows
- securityContext.privileged - Windows doesn’t support privileged containers; use HostProcess Containers instead
- securityContext.procMount - Windows doesn’t have a /proc filesystem
- securityContext.readOnlyRootFilesystem - not possible on Windows; write access is required for registry and system processes to run inside the container
- securityContext.runAsGroup - not possible on Windows as there is no GID support
- securityContext.runAsNonRoot - this setting will prevent containers from running as ContainerAdministrator, which is the closest equivalent to a root user on Windows
- securityContext.runAsUser - use runAsUserName instead
- securityContext.seLinuxOptions - not possible on Windows as SELinux is Linux-specific
- terminationMessagePath - this has some limitations in that Windows doesn’t support mapping single files. The default value is /dev/termination-log, which does work because it does not exist on Windows by default.
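The accounting role of requests.cpu and requests.memory described above can be sketched as a simple subtraction from the node's available resources. This is an illustration of the bookkeeping only, not the scheduler's implementation; the function names, the resource keys, and the units (CPU in millicores, memory in MiB) are assumptions made for the example.

```python
def remaining_after_requests(available: dict, container_requests: list) -> dict:
    """Subtract every container's requests from the node's available resources."""
    remaining = dict(available)
    for requests in container_requests:
        for resource, amount in requests.items():
            remaining[resource] = remaining.get(resource, 0) - amount
    return remaining


def fits(available: dict, container_requests: list) -> bool:
    """True when the summed requests do not exceed what the node has available."""
    remaining = remaining_after_requests(available, container_requests)
    return all(amount >= 0 for amount in remaining.values())


# Hypothetical node and containers, for illustration only.
node = {"cpu_m": 4000, "memory_mib": 12800}
containers = [
    {"cpu_m": 500, "memory_mib": 1024},
    {"cpu_m": 1000, "memory_mib": 4096},
]
print(remaining_after_requests(node, containers))  # {'cpu_m': 2500, 'memory_mib': 7680}
print(fits(node, containers))  # True
```

A container that declares no requests contributes nothing to the subtraction, which is why applying requests to every container is what keeps a node from being overprovisioned.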

Field compatibility for Pod specifications

The following list documents differences in how Pod specifications work on Windows and Linux:

- hostIPC and hostPID - host namespace sharing is not possible on Windows
- hostNetwork - see below
- dnsPolicy - setting the Pod dnsPolicy to ClusterFirstWithHostNet is not supported on Windows because host networking is not provided. Pods always run with a container network.
- podSecurityContext - see below
- shareProcessNamespace - this is a beta feature, and depends on Linux namespaces which are not implemented on Windows. Windows cannot share process namespaces or the container’s root filesystem. Only the network can be shared.
- terminationGracePeriodSeconds - this is not fully implemented in Docker on Windows; see the GitHub issue. The behavior today is that the ENTRYPOINT process is sent CTRL_SHUTDOWN_EVENT, then Windows waits 5 seconds by default, and finally shuts down all processes using the normal Windows shutdown behavior. The 5 second default is actually in the Windows registry inside the container, so it can be overridden when the container is built.
- volumeDevices - this is a beta feature, and is not implemented on Windows. Windows cannot attach raw block devices to pods.
- volumes
  - If you define an emptyDir volume, you cannot set its volume source to memory.
- You cannot enable mountPropagation for volume mounts as this is not supported on Windows.

Field compatibility for hostNetwork

FEATURE STATE: Kubernetes v1.26 [alpha]

The kubelet can now request that pods running on Windows nodes use the host’s network namespace instead of creating a new pod network namespace. To enable this functionality, pass --feature-gates=WindowsHostNetwork=true to the kubelet.

Note: This functionality requires a container runtime that supports it.

Field compatibility for Pod security context

None of the Pod securityContext fields work on Windows.

Node problem detector

The node problem detector (see Monitor Node Health) has preliminary support for Windows. For more information, visit the project’s GitHub page.

Pause container

In a Kubernetes Pod, an infrastructure or “pause” container is first created to host the container. In Linux, the cgroups and namespaces that make up a pod need a process to maintain their continued existence; the pause process provides this. Containers that belong to the same pod, including infrastructure and worker containers, share a common network endpoint (same IPv4 and / or IPv6 address, same network port spaces). Kubernetes uses pause containers to allow for worker containers crashing or restarting without losing any of the networking configuration.

Kubernetes maintains a multi-architecture image that includes support for Windows. For Kubernetes v1.26 the recommended pause image is registry.k8s.io/pause:3.6. The source code is available on GitHub.

Microsoft maintains a different multi-architecture image, with Linux and Windows amd64 support, that you can find as mcr.microsoft.com/oss/kubernetes/pause:3.6. This image is built from the same source as the Kubernetes-maintained image, but all of the Windows binaries are Authenticode-signed by Microsoft. The Kubernetes project recommends using the Microsoft-maintained image if you are deploying to a production or production-like environment that requires signed binaries.

Container runtimes

You need to install a container runtime into each node in the cluster so that Pods can run there.

The following container runtimes work with Windows:

Note: This section links to third party projects that provide functionality required by Kubernetes. The Kubernetes project authors aren’t responsible for these projects, which are listed alphabetically. To add a project to this list, read the content guide before submitting a change.

ContainerD

FEATURE STATE: Kubernetes v1.20 [stable]

You can use ContainerD 1.4.0+ as the container runtime for Kubernetes nodes that run Windows.

Learn how to install ContainerD on a Windows node.

Note: There is a known limitation when using GMSA with containerd to access Windows network shares, which requires a kernel patch.

Mirantis Container Runtime

Mirantis Container Runtime (MCR) is available as a container runtime for all Windows Server 2019 and later versions.

See Install MCR on Windows Servers for more information.

Windows OS version compatibility

On Windows nodes, strict compatibility rules apply where the host OS version must match the container base image OS version. Only Windows containers with a container operating system of Windows Server 2019 are fully supported.

For Kubernetes v1.26, operating system compatibility for Windows nodes (and Pods) is as follows:

- Windows Server LTSC release
  - Windows Server 2019
  - Windows Server 2022
- Windows Server SAC release
  - Windows Server version 20H2

The Kubernetes version-skew policy also applies.
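The rule that the host OS version must match the container base image OS version can be sketched as a comparison of version strings. This is a minimal illustration that assumes "match" means equal major, minor, and build components, with the trailing revision allowed to differ. The function name and the revision numbers are hypothetical; build 17763 corresponds to Windows Server 2019 and build 20348 to Windows Server 2022.

```python
def os_versions_match(host_version: str, image_version: str) -> bool:
    """Compare the major.minor.build components of two Windows version strings."""
    # Only the first three dot-separated components are compared; the
    # revision (fourth component) is ignored under the stated assumption.
    return host_version.split(".")[:3] == image_version.split(".")[:3]


# Hypothetical revision numbers, for illustration only.
print(os_versions_match("10.0.17763.3770", "10.0.17763.1"))  # prints True
print(os_versions_match("10.0.17763.3770", "10.0.20348.1"))  # prints False
```

A check like this could run in a deployment pipeline to flag an image built for one Windows Server release before it is scheduled onto a node running another.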

Getting help and troubleshooting

Start with the Troubleshooting page, which is your main source of help for troubleshooting your Kubernetes cluster.

Some additional, Windows-specific troubleshooting help is included in this section. Logs are an important element of troubleshooting issues in Kubernetes. Make sure to include them any time you seek troubleshooting assistance from other contributors. Follow the instructions in the SIG Windows contributing guide on gathering logs.

Reporting issues and feature requests

If you have what looks like a bug, or you would like to make a feature request, please follow the SIG Windows contributing guide to create a new issue. First search the list of issues in case the problem was reported previously; if it was, comment on the existing issue with your experience and add any additional logs. The SIG Windows channel on the Kubernetes Slack is also a good place to get initial support and troubleshooting ideas before creating a ticket.

Deployment tools

The kubeadm tool helps you to deploy a Kubernetes cluster, providing the control plane to manage the cluster, and nodes to run your workloads. Adding Windows nodes explains how to deploy Windows nodes to your cluster using kubeadm.

The Kubernetes Cluster API project also provides a means to automate deployment of Windows nodes.

Windows distribution channels

For a detailed explanation of Windows distribution channels, see the Microsoft documentation.

Information on the different Windows Server servicing channels including their support models can be found at Windows Server servicing channels.