- Postinstallation machine configuration tasks
- Understanding the Machine Config Operator
- Using MachineConfig objects to configure nodes
- Configuring chrony time service
- Disabling the chrony time service
- Adding kernel arguments to nodes
- Enabling multipathing with kernel arguments on FCOS
- Adding a real-time kernel to nodes
- Configuring journald settings
- Adding extensions to FCOS
- Loading custom firmware blobs in the machine config manifest
- Changing the core user password for node access
- Configuring MCO-related custom resources
Postinstallation machine configuration tasks
There are times when you need to make changes to the operating systems running on OKD nodes. This can include changing settings for network time service, adding kernel arguments, or configuring journaling in a specific way.
Aside from a few specialized features, most changes to operating systems on OKD nodes can be done by creating what are referred to as MachineConfig
objects that are managed by the Machine Config Operator.
Tasks in this section describe how to use features of the Machine Config Operator to configure operating system features on OKD nodes.
NetworkManager stores new network configurations to Previously, NetworkManager stored new network configurations to |
Understanding the Machine Config Operator
Machine Config Operator
Purpose
The Machine Config Operator manages and applies configuration and updates of the base operating system and container runtime, including everything between the kernel and kubelet.
There are four components:
machine-config-server
: Provides Ignition configuration to new machines joining the cluster.machine-config-controller
: Coordinates the upgrade of machines to the desired configurations defined by aMachineConfig
object. Options are provided to control the upgrade for sets of machines individually.machine-config-daemon
: Applies new machine configuration during update. Validates and verifies the state of the machine to the requested machine configuration.machine-config
: Provides a complete source of machine configuration at installation, first start up, and updates for a machine.
Currently, there is no supported way to block or restrict the machine config server endpoint. The machine config server must be exposed to the network so that newly-provisioned machines, which have no existing configuration or state, are able to fetch their configuration. In this model, the root of trust is the certificate signing requests (CSR) endpoint, which is where the kubelet sends its certificate signing request for approval to join the cluster. Because of this, machine configs should not be used to distribute sensitive information, such as secrets and certificates. To ensure that the machine config server endpoints, ports 22623 and 22624, are secured in bare metal scenarios, customers must configure proper network policies. |
Additional resources
Project
openshift-machine-config-operator
Machine config overview
The Machine Config Operator (MCO) manages updates to systemd, CRI-O and Kubelet, the kernel, Network Manager and other system features. It also offers a MachineConfig
CRD that can write configuration files onto the host (see machine-config-operator). Understanding what MCO does and how it interacts with other components is critical to making advanced, system-level changes to an OKD cluster. Here are some things you should know about MCO, machine configs, and how they are used:
Machine configs are processed alphabetically, in lexicographically increasing order, of their name. The render controller uses the first machine config in the list as the base and appends the rest to the base machine config.
A machine config can make a specific change to a file or service on the operating system of each system representing a pool of OKD nodes.
MCO applies changes to operating systems in pools of machines. All OKD clusters start with worker and control plane node pools. By adding more role labels, you can configure custom pools of nodes. For example, you can set up a custom pool of worker nodes that includes particular hardware features needed by an application. However, examples in this section focus on changes to the default pool types.
A node can have multiple labels applied that indicate its type, such as
master
orworker
, however it can be a member of only a single machine config pool.After a machine config change, the MCO updates the affected nodes alphabetically by zone, based on the
topology.kubernetes.io/zone
label. If a zone has more than one node, the oldest nodes are updated first. For nodes that do not use zones, such as in bare metal deployments, the nodes are upgraded by age, with the oldest nodes updated first. The MCO updates the number of nodes as specified by themaxUnavailable
field on the machine configuration pool at a time.Some machine configuration must be in place before OKD is installed to disk. In most cases, this can be accomplished by creating a machine config that is injected directly into the OKD installer process, instead of running as a postinstallation machine config. In other cases, you might need to do bare metal installation where you pass kernel arguments at OKD installer startup, to do such things as setting per-node individual IP addresses or advanced disk partitioning.
MCO manages items that are set in machine configs. Manual changes you do to your systems will not be overwritten by MCO, unless MCO is explicitly told to manage a conflicting file. In other words, MCO only makes specific updates you request, it does not claim control over the whole node.
Manual changes to nodes are strongly discouraged. If you need to decommission a node and start a new one, those direct changes would be lost.
MCO is only supported for writing to files in
/etc
and/var
directories, although there are symbolic links to some directories that can be writeable by being symbolically linked to one of those areas. The/opt
and/usr/local
directories are examples.Ignition is the configuration format used in MachineConfigs. See the Ignition Configuration Specification v3.2.0 for details.
Although Ignition config settings can be delivered directly at OKD installation time, and are formatted in the same way that MCO delivers Ignition configs, MCO has no way of seeing what those original Ignition configs are. Therefore, you should wrap Ignition config settings into a machine config before deploying them.
When a file managed by MCO changes outside of MCO, the Machine Config Daemon (MCD) sets the node as
degraded
. It will not overwrite the offending file, however, and should continue to operate in adegraded
state.A key reason for using a machine config is that it will be applied when you spin up new nodes for a pool in your OKD cluster. The
machine-api-operator
provisions a new machine and MCO configures it.
MCO uses Ignition as the configuration format. OKD 4.6 moved from Ignition config specification version 2 to version 3.
What can you change with machine configs?
The kinds of components that MCO can change include:
config: Create Ignition config objects (see the Ignition configuration specification) to do things like modify files, systemd services, and other features on OKD machines, including:
Configuration files: Create or overwrite files in the
/var
or/etc
directory.systemd units: Create and set the status of a systemd service or add to an existing systemd service by dropping in additional settings.
users and groups: Change SSH keys in the passwd section postinstallation.
Changing SSH keys by using a machine config is supported only for the
core
user.Adding new users by using a machine config is not supported.
kernelArguments: Add arguments to the kernel command line when OKD nodes boot.
kernelType: Optionally identify a non-standard kernel to use instead of the standard kernel. Use
realtime
to use the RT kernel (for RAN). This is only supported on select platforms. Use the64k-pages
parameter to enable the 64k page size kernel. This setting is exclusive to machines with 64-bit ARM architectures.extensions: Extend FCOS features by adding selected pre-packaged software. For this feature, available extensions include usbguard and kernel modules.
Custom resources (for
ContainerRuntime
andKubelet
): Outside of machine configs, MCO manages two special custom resources for modifying CRI-O container runtime settings (ContainerRuntime
CR) and the Kubelet service (Kubelet
CR).
The MCO is not the only Operator that can change operating system components on OKD nodes. Other Operators can modify operating system-level features as well. One example is the Node Tuning Operator, which allows you to do node-level tuning through Tuned daemon profiles.
Tasks for the MCO configuration that can be done postinstallation are included in the following procedures. See descriptions of FCOS bare metal installation for system configuration tasks that must be done during or before OKD installation.
There might be situations where the configuration on a node does not fully match what the currently-applied machine config specifies. This state is called configuration drift. The Machine Config Daemon (MCD) regularly checks the nodes for configuration drift. If the MCD detects configuration drift, the MCO marks the node degraded
until an administrator corrects the node configuration. A degraded node is online and operational, but, it cannot be updated. For more information on configuration drift, see Understanding configuration drift detection.
Project
See the openshift-machine-config-operator GitHub site for details.
Understanding configuration drift detection
There might be situations when the on-disk state of a node differs from what is configured in the machine config. This is known as configuration drift. For example, a cluster admin might manually modify a file, a systemd unit file, or a file permission that was configured through a machine config. This causes configuration drift. Configuration drift can cause problems between nodes in a Machine Config Pool or when the machine configs are updated.
The Machine Config Operator (MCO) uses the Machine Config Daemon (MCD) to check nodes for configuration drift on a regular basis. If detected, the MCO sets the node and the machine config pool (MCP) to Degraded
and reports the error. A degraded node is online and operational, but, it cannot be updated.
The MCD performs configuration drift detection upon each of the following conditions:
When a node boots.
After any of the files (Ignition files and systemd drop-in units) specified in the machine config are modified outside of the machine config.
Before a new machine config is applied.
If you apply a new machine config to the nodes, the MCD temporarily shuts down configuration drift detection. This shutdown is needed because the new machine config necessarily differs from the machine config on the nodes. After the new machine config is applied, the MCD restarts detecting configuration drift using the new machine config.
When performing configuration drift detection, the MCD validates that the file contents and permissions fully match what the currently-applied machine config specifies. Typically, the MCD detects configuration drift in less than a second after the detection is triggered.
If the MCD detects configuration drift, the MCD performs the following tasks:
Emits an error to the console logs
Emits a Kubernetes event
Stops further detection on the node
Sets the node and MCP to
degraded
You can check if you have a degraded node by listing the MCPs:
$ oc get mcp worker
If you have a degraded MCP, the DEGRADEDMACHINECOUNT
field is non-zero, similar to the following output:
Example output
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
worker rendered-worker-404caf3180818d8ac1f50c32f14b57c3 False True True 2 1 1 1 5h51m
You can determine if the problem is caused by configuration drift by examining the machine config pool:
$ oc describe mcp worker
Example output
...
Last Transition Time: 2021-12-20T18:54:00Z
Message: Node ci-ln-j4h8nkb-72292-pxqxz-worker-a-fjks4 is reporting: "content mismatch for file \"/etc/mco-test-file\"" (1)
Reason: 1 nodes are reporting degraded status on sync
Status: True
Type: NodeDegraded (2)
...
1 | This message shows that a node’s /etc/mco-test-file file, which was added by the machine config, has changed outside of the machine config. |
2 | The state of the node is NodeDegraded . |
Or, if you know which node is degraded, examine that node:
$ oc describe node/ci-ln-j4h8nkb-72292-pxqxz-worker-a-fjks4
Example output
...
Annotations: cloud.network.openshift.io/egress-ipconfig: [{"interface":"nic0","ifaddr":{"ipv4":"10.0.128.0/17"},"capacity":{"ip":10}}]
csi.volume.kubernetes.io/nodeid:
{"pd.csi.storage.gke.io":"projects/openshift-gce-devel-ci/zones/us-central1-a/instances/ci-ln-j4h8nkb-72292-pxqxz-worker-a-fjks4"}
machine.openshift.io/machine: openshift-machine-api/ci-ln-j4h8nkb-72292-pxqxz-worker-a-fjks4
machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
machineconfiguration.openshift.io/currentConfig: rendered-worker-67bd55d0b02b0f659aef33680693a9f9
machineconfiguration.openshift.io/desiredConfig: rendered-worker-67bd55d0b02b0f659aef33680693a9f9
machineconfiguration.openshift.io/reason: content mismatch for file "/etc/mco-test-file" (1)
machineconfiguration.openshift.io/state: Degraded (2)
...
1 | The error message indicating that configuration drift was detected between the node and the listed machine config. Here the error message indicates that the contents of the /etc/mco-test-file , which was added by the machine config, has changed outside of the machine config. |
2 | The state of the node is Degraded . |
You can correct configuration drift and return the node to the Ready
state by performing one of the following remediations:
Ensure that the contents and file permissions of the files on the node match what is configured in the machine config. You can manually rewrite the file contents or change the file permissions.
Generate a force file on the degraded node. The force file causes the MCD to bypass the usual configuration drift detection and reapplies the current machine config.
Generating a force file on a node causes that node to reboot.
Checking machine config pool status
To see the status of the Machine Config Operator (MCO), its sub-components, and the resources it manages, use the following oc
commands:
Procedure
To see the number of MCO-managed nodes available on your cluster for each machine config pool (MCP), run the following command:
$ oc get machineconfigpool
Example output
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-06c9c4… True False False 3 3 3 0 4h42m
worker rendered-worker-f4b64… False True False 3 2 2 0 4h42m
where:
UPDATED
The
True
status indicates that the MCO has applied the current machine config to the nodes in that MCP. The current machine config is specified in theSTATUS
field in theoc get mcp
output. TheFalse
status indicates a node in the MCP is updating.UPDATING
The
True
status indicates that the MCO is applying the desired machine config, as specified in theMachineConfigPool
custom resource, to at least one of the nodes in that MCP. The desired machine config is the new, edited machine config. Nodes that are updating might not be available for scheduling. TheFalse
status indicates that all nodes in the MCP are updated.DEGRADED
A
True
status indicates the MCO is blocked from applying the current or desired machine config to at least one of the nodes in that MCP, or the configuration is failing. Nodes that are degraded might not be available for scheduling. AFalse
status indicates that all nodes in the MCP are ready.MACHINECOUNT
Indicates the total number of machines in that MCP.
READYMACHINECOUNT
Indicates the total number of machines in that MCP that are ready for scheduling.
UPDATEDMACHINECOUNT
Indicates the total number of machines in that MCP that have the current machine config.
DEGRADEDMACHINECOUNT
Indicates the total number of machines in that MCP that are marked as degraded or unreconcilable.
In the previous output, there are three control plane (master) nodes and three worker nodes. The control plane MCP and the associated nodes are updated to the current machine config. The nodes in the worker MCP are being updated to the desired machine config. Two of the nodes in the worker MCP are updated and one is still updating, as indicated by the
UPDATEDMACHINECOUNT
being2
. There are no issues, as indicated by theDEGRADEDMACHINECOUNT
being0
andDEGRADED
beingFalse
.While the nodes in the MCP are updating, the machine config listed under
CONFIG
is the current machine config, which the MCP is being updated from. When the update is complete, the listed machine config is the desired machine config, which the MCP was updated to.If a node is being cordoned, that node is not included in the
READYMACHINECOUNT
, but is included in theMACHINECOUNT
. Also, the MCP status is set toUPDATING
. Because the node has the current machine config, it is counted in theUPDATEDMACHINECOUNT
total:Example outputNAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-06c9c4… True False False 3 3 3 0 4h42m
worker rendered-worker-c1b41a… False True False 3 2 3 0 4h42m
To check the status of the nodes in an MCP by examining the
MachineConfigPool
custom resource, run the following command: :$ oc describe mcp worker
Example output
...
Degraded Machine Count: 0
Machine Count: 3
Observed Generation: 2
Ready Machine Count: 3
Unavailable Machine Count: 0
Updated Machine Count: 3
Events: <none>
If a node is being cordoned, the node is not included in the
Ready Machine Count
. It is included in theUnavailable Machine Count
:Example output…
Degraded Machine Count: 0
Machine Count: 3
Observed Generation: 2
Ready Machine Count: 2
Unavailable Machine Count: 1
Updated Machine Count: 3
To see each existing
MachineConfig
object, run the following command:$ oc get machineconfigs
Example output
NAME GENERATEDBYCONTROLLER IGNITIONVERSION AGE
00-master 2c9371fbb673b97a6fe8b1c52... 3.2.0 5h18m
00-worker 2c9371fbb673b97a6fe8b1c52... 3.2.0 5h18m
01-master-container-runtime 2c9371fbb673b97a6fe8b1c52... 3.2.0 5h18m
01-master-kubelet 2c9371fbb673b97a6fe8b1c52… 3.2.0 5h18m
...
rendered-master-dde... 2c9371fbb673b97a6fe8b1c52... 3.2.0 5h18m
rendered-worker-fde... 2c9371fbb673b97a6fe8b1c52... 3.2.0 5h18m
Note that the
MachineConfig
objects listed asrendered
are not meant to be changed or deleted.To view the contents of a particular machine config (in this case,
01-master-kubelet
), run the following command:$ oc describe machineconfigs 01-master-kubelet
The output from the command shows that this
MachineConfig
object contains both configuration files (cloud.conf
andkubelet.conf
) and a systemd service (Kubernetes Kubelet):Example output
Name: 01-master-kubelet
...
Spec:
Config:
Ignition:
Version: 3.2.0
Storage:
Files:
Contents:
Source: data:,
Mode: 420
Overwrite: true
Path: /etc/kubernetes/cloud.conf
Contents:
Source: data:,kind%3A%20KubeletConfiguration%0AapiVersion%3A%20kubelet.config.k8s.io%2Fv1beta1%0Aauthentication%3A%0A%20%20x509%3A%0A%20%20%20%20clientCAFile%3A%20%2Fetc%2Fkubernetes%2Fkubelet-ca.crt%0A%20%20anonymous...
Mode: 420
Overwrite: true
Path: /etc/kubernetes/kubelet.conf
Systemd:
Units:
Contents: [Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service network-online.target crio.service
After=network-online.target crio.service
ExecStart=/usr/bin/hyperkube \
kubelet \
--config=/etc/kubernetes/kubelet.conf \ ...
If something goes wrong with a machine config that you apply, you can always back out that change. For example, if you had run oc create -f ./myconfig.yaml
to apply a machine config, you could remove that machine config by running the following command:
$ oc delete -f ./myconfig.yaml
If that was the only problem, the nodes in the affected pool should return to a non-degraded state. This actually causes the rendered configuration to roll back to its previously rendered state.
If you add your own machine configs to your cluster, you can use the commands shown in the previous example to check their status and the related status of the pool to which they are applied.
Viewing and interacting with certificates
The following certificates are handled in the cluster by the Machine Config Controller (MCC) and can be found in the ControllerConfig
resource:
/etc/kubernetes/kubelet-ca.crt
/etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem
/etc/pki/ca-trust/source/anchors/openshift-config-user-ca-bundle.crt
The MCC also handles the image registry certificates and its associated user bundle certificate.
You can get information about the listed certificates, including the underyling bundle the certificate comes from, and the signing and subject data.
Procedure
Get detailed certificate information by running the following command:
$ oc get controllerconfig/machine-config-controller -o yaml | yq -y '.status.controllerCertificates'
Example output
"controllerCertificates": [
{
"bundleFile": "KubeAPIServerServingCAData",
"signer": "<signer_data1>",
"subject": "CN=openshift-kube-apiserver-operator_node-system-admin-signer@168909215"
},
{
"bundleFile": "RootCAData",
"signer": "<signer_data2>",
"subject": "CN=root-ca,OU=openshift"
}
]
Get a simpler version of the information found in the ControllerConfig by checking the machine config pool status using the following command:
$ oc get mcp master -o yaml | yq -y '.status.certExpirys'
Example output
status:
certExpirys:
- bundle: KubeAPIServerServingCAData
subject: CN=admin-kubeconfig-signer,OU=openshift
- bundle: KubeAPIServerServingCAData
subject: CN=kube-csr-signer_@1689585558
- bundle: KubeAPIServerServingCAData
subject: CN=kubelet-signer,OU=openshift
- bundle: KubeAPIServerServingCAData
subject: CN=kube-apiserver-to-kubelet-signer,OU=openshift
- bundle: KubeAPIServerServingCAData
subject: CN=kube-control-plane-signer,OU=openshift
This method is meant for OKD applications that already consume machine config pool information.
Check which image registry certificates are on the nodes by looking at the contents of the
/etc/docker/cert.d
directory:# ls /etc/docker/certs.d
Example output
image-registry.openshift-image-registry.svc.cluster.local:5000 image-registry.openshift-image-registry.svc:5000
Using MachineConfig objects to configure nodes
You can use the tasks in this section to create MachineConfig
objects that modify files, systemd unit files, and other operating system features running on OKD nodes. For more ideas on working with machine configs, see content related to updating SSH authorized keys, verifying image signatures, enabling SCTP, and configuring iSCSI initiatornames for OKD.
OKD supports Ignition specification version 3.2. All new machine configs you create going forward should be based on Ignition specification version 3.2. If you are upgrading your OKD cluster, any existing Ignition specification version 2.x machine configs will be translated automatically to specification version 3.2.
There might be situations where the configuration on a node does not fully match what the currently-applied machine config specifies. This state is called configuration drift. The Machine Config Daemon (MCD) regularly checks the nodes for configuration drift. If the MCD detects configuration drift, the MCO marks the node degraded
until an administrator corrects the node configuration. A degraded node is online and operational, but, it cannot be updated. For more information on configuration drift, see Understanding configuration drift detection.
Use the following “Configuring chrony time service” procedure as a model for how to go about adding other configuration files to OKD nodes. |
Configuring chrony time service
You can set the time server and related settings used by the chrony time service (chronyd
) by modifying the contents of the chrony.conf
file and passing those contents to your nodes as a machine config.
Procedure
Create a Butane config including the contents of the
chrony.conf
file. For example, to configure chrony on worker nodes, create a99-worker-chrony.bu
file.See “Creating machine configs with Butane” for information about Butane.
variant: openshift
version: 4.0
metadata:
name: 99-worker-chrony (1)
labels:
machineconfiguration.openshift.io/role: worker (1)
storage:
files:
- path: /etc/chrony.conf
mode: 0644 (2)
overwrite: true
contents:
inline: |
pool 0.rhel.pool.ntp.org iburst (3)
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony
1 On control plane nodes, substitute master
forworker
in both of these locations.2 Specify an octal value mode for the mode
field in the machine config file. After creating the file and applying the changes, themode
is converted to a decimal value. You can check the YAML file with the commandoc get mc <mc-name> -o yaml
.3 Specify any valid, reachable time source, such as the one provided by your DHCP server. Alternately, you can specify any of the following NTP servers: 1.rhel.pool.ntp.org
,2.rhel.pool.ntp.org
, or3.rhel.pool.ntp.org
.Use Butane to generate a
MachineConfig
object file,99-worker-chrony.yaml
, containing the configuration to be delivered to the nodes:$ butane 99-worker-chrony.bu -o 99-worker-chrony.yaml
Apply the configurations in one of two ways:
If the cluster is not running yet, after you generate manifest files, add the
MachineConfig
object file to the<installation_directory>/openshift
directory, and then continue to create the cluster.If the cluster is already running, apply the file:
$ oc apply -f ./99-worker-chrony.yaml
Additional resources
Disabling the chrony time service
You can disable the chrony time service (chronyd
) for nodes with a specific role by using a MachineConfig
custom resource (CR).
Prerequisites
Install the OpenShift CLI (
oc
).Log in as a user with
cluster-admin
privileges.
Procedure
Create the
MachineConfig
CR that disableschronyd
for the specified node role.Save the following YAML in the
disable-chronyd.yaml
file:apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: <node_role> (1)
name: disable-chronyd
spec:
config:
ignition:
version: 3.2.0
systemd:
units:
- contents: |
[Unit]
Description=NTP client/server
Documentation=man:chronyd(8) man:chrony.conf(5)
After=ntpdate.service sntp.service ntpd.service
Conflicts=ntpd.service systemd-timesyncd.service
ConditionCapability=CAP_SYS_TIME
[Service]
Type=forking
PIDFile=/run/chrony/chronyd.pid
EnvironmentFile=-/etc/sysconfig/chronyd
ExecStart=/usr/sbin/chronyd $OPTIONS
ExecStartPost=/usr/libexec/chrony-helper update-daemon
PrivateTmp=yes
ProtectHome=yes
ProtectSystem=full
[Install]
WantedBy=multi-user.target
enabled: false
name: "chronyd.service"
1 Node role where you want to disable chronyd
, for example,master
.Create the
MachineConfig
CR by running the following command:$ oc create -f disable-chronyd.yaml
Adding kernel arguments to nodes
In some special cases, you might want to add kernel arguments to a set of nodes in your cluster. This should only be done with caution and clear understanding of the implications of the arguments you set.
Improper use of kernel arguments can result in your systems becoming unbootable. |
Examples of kernel arguments you could set include:
enforcing=0: Configures Security Enhanced Linux (SELinux) to run in permissive mode. In permissive mode, the system acts as if SELinux is enforcing the loaded security policy, including labeling objects and emitting access denial entries in the logs, but it does not actually deny any operations. While not supported for production systems, permissive mode can be helpful for debugging.
nosmt: Disables symmetric multithreading (SMT) in the kernel. Multithreading allows multiple logical threads for each CPU. You could consider
nosmt
in multi-tenant environments to reduce risks from potential cross-thread attacks. By disabling SMT, you essentially choose security over performance.systemd.unified_cgroup_hierarchy: Configures the version of Linux control group that is installed on your nodes: cgroup v1 or cgroup v2. cgroup v2 is the next version of the kernel control group and offers multiple improvements. However, it can have some unwanted effects on your nodes.
cgroup v2 is enabled by default. To disable cgroup v2, use the
systemd.unified_cgroup_hierarchy=0
kernel argument, as shown in the following procedure.
See Kernel.org kernel parameters for a list and descriptions of kernel arguments.
In the following procedure, you create a MachineConfig
object that identifies:
A set of machines to which you want to add the kernel argument. In this case, machines with a worker role.
Kernel arguments that are appended to the end of the existing kernel arguments.
A label that indicates where in the list of machine configs the change is applied.
Prerequisites
- Have administrative privilege to a working OKD cluster.
Procedure
List existing
MachineConfig
objects for your OKD cluster to determine how to label your machine config:$ oc get MachineConfig
Example output
NAME GENERATEDBYCONTROLLER IGNITIONVERSION AGE
00-master 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m
00-worker 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m
01-master-container-runtime 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m
01-master-kubelet 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m
01-worker-container-runtime 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m
01-worker-kubelet 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m
99-master-generated-registries 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m
99-master-ssh 3.2.0 40m
99-worker-generated-registries 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m
99-worker-ssh 3.2.0 40m
rendered-master-23e785de7587df95a4b517e0647e5ab7 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m
rendered-worker-5d596d9293ca3ea80c896a1191735bb1 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m
Create a
MachineConfig
object file that identifies the kernel argument (for example,05-worker-kernelarg-selinuxpermissive.yaml
)apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: worker (1)
name: 05-worker-kernelarg-selinuxpermissive (2)
spec:
config:
ignition:
version: 3.2.0
kernelArguments:
- enforcing=0 (3)
systemd.unified_cgroup_hierarchy=0 (4)
#...
1 Applies the new kernel argument only to worker nodes. 2 Named to identify where it fits among the machine configs (05) and what it does (adds a kernel argument to configure SELinux permissive mode). 3 Identifies the exact kernel argument as enforcing=0
.4 Configures cgroup v1 on the associated nodes. cgroup v2 is the default. Create the new machine config:
$ oc create -f 05-worker-kernelarg-selinuxpermissive.yaml
Check the machine configs to see that the new one was added:
$ oc get MachineConfig
Example output
NAME GENERATEDBYCONTROLLER IGNITIONVERSION AGE
00-master 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m
00-worker 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m
01-master-container-runtime 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m
01-master-kubelet 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m
01-worker-container-runtime 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m
01-worker-kubelet 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m
05-worker-kernelarg-selinuxpermissive 3.2.0 105s
99-master-generated-registries 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m
99-master-ssh 3.2.0 40m
99-worker-generated-registries 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m
99-worker-ssh 3.2.0 40m
rendered-master-23e785de7587df95a4b517e0647e5ab7 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m
rendered-worker-5d596d9293ca3ea80c896a1191735bb1 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m
Check the nodes:
$ oc get nodes
Example output
NAME STATUS ROLES AGE VERSION
ip-10-0-136-161.ec2.internal Ready worker 28m v1.28.5
ip-10-0-136-243.ec2.internal Ready master 34m v1.28.5
ip-10-0-141-105.ec2.internal Ready,SchedulingDisabled worker 28m v1.28.5
ip-10-0-142-249.ec2.internal Ready master 34m v1.28.5
ip-10-0-153-11.ec2.internal Ready worker 28m v1.28.5
ip-10-0-153-150.ec2.internal Ready master 34m v1.28.5
You can see that scheduling on each worker node is disabled as the change is being applied.
Check that the kernel argument worked by going to one of the worker nodes and listing the kernel command line arguments (in
/proc/cmdline
on the host):$ oc debug node/ip-10-0-141-105.ec2.internal
Example output
Starting pod/ip-10-0-141-105ec2internal-debug ...
To use host binaries, run `chroot /host`
sh-4.2# cat /host/proc/cmdline
BOOT_IMAGE=/ostree/rhcos-... console=tty0 console=ttyS0,115200n8
rootflags=defaults,prjquota rw root=UUID=fd0... ostree=/ostree/boot.0/rhcos/16...
coreos.oem.id=qemu coreos.oem.id=ec2 ignition.platform.id=ec2 enforcing=0
sh-4.2# exit
You should see the
enforcing=0
argument added to the other kernel arguments.
Enabling multipathing with kernel arguments on FCOS
Fedora CoreOS (FCOS) supports multipathing on the primary disk, allowing stronger resilience to hardware failure to achieve higher host availability. Postinstallation support is available by activating multipathing via the machine config.
Enabling multipathing during installation is supported and recommended for nodes provisioned in OKD 4.8 or higher. In setups where any I/O to non-optimized paths results in I/O system errors, you must enable multipathing at installation time. For more information about enabling multipathing during installation time, see “Enabling multipathing with kernel arguments on RHCOS” in the Installing on bare metal documentation. |
On IBM Z® and IBM® LinuxONE, you can enable multipathing only if you configured your cluster for it during installation. For more information, see “Installing FCOS and starting the OKD bootstrap process” in Installing a cluster with z/VM on IBM Z® and IBM® LinuxONE. |
Prerequisites
You have a running OKD cluster that uses version 4.7 or later.
You are logged in to the cluster as a user with administrative privileges.
You have confirmed that the disk is enabled for multipathing. Multipathing is only supported on hosts that are connected to a SAN via an HBA adapter.
Procedure
To enable multipathing postinstallation on control plane nodes:
Create a machine config file, such as
99-master-kargs-mpath.yaml
, that instructs the cluster to add themaster
label and that identifies the multipath kernel argument, for example:apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: "master"
name: 99-master-kargs-mpath
spec:
kernelArguments:
- 'rd.multipath=default'
- 'root=/dev/disk/by-label/dm-mpath-root'
To enable multipathing postinstallation on worker nodes:
Create a machine config file, such as
99-worker-kargs-mpath.yaml
, that instructs the cluster to add theworker
label and that identifies the multipath kernel argument, for example:apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: "worker"
name: 99-worker-kargs-mpath
spec:
kernelArguments:
- 'rd.multipath=default'
- 'root=/dev/disk/by-label/dm-mpath-root'
Create the new machine config by using either the master or worker YAML file you previously created:
$ oc create -f ./99-worker-kargs-mpath.yaml
Check the machine configs to see that the new one was added:
$ oc get MachineConfig
Example output
NAME GENERATEDBYCONTROLLER IGNITIONVERSION AGE
00-master 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m
00-worker 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m
01-master-container-runtime 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m
01-master-kubelet 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m
01-worker-container-runtime 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m
01-worker-kubelet 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m
99-master-generated-registries 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m
99-master-ssh 3.2.0 40m
99-worker-generated-registries 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m
99-worker-kargs-mpath 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 105s
99-worker-ssh 3.2.0 40m
rendered-master-23e785de7587df95a4b517e0647e5ab7 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m
rendered-worker-5d596d9293ca3ea80c896a1191735bb1 52dd3ba6a9a527fc3ab42afac8d12b693534c8c9 3.2.0 33m
Check the nodes:
$ oc get nodes
Example output
NAME STATUS ROLES AGE VERSION
ip-10-0-136-161.ec2.internal Ready worker 28m v1.28.5
ip-10-0-136-243.ec2.internal Ready master 34m v1.28.5
ip-10-0-141-105.ec2.internal Ready,SchedulingDisabled worker 28m v1.28.5
ip-10-0-142-249.ec2.internal Ready master 34m v1.28.5
ip-10-0-153-11.ec2.internal Ready worker 28m v1.28.5
ip-10-0-153-150.ec2.internal Ready master 34m v1.28.5
You can see that scheduling on each worker node is disabled as the change is being applied.
Check that the kernel argument worked by going to one of the worker nodes and listing the kernel command line arguments (in
/proc/cmdline
on the host):$ oc debug node/ip-10-0-141-105.ec2.internal
Example output
Starting pod/ip-10-0-141-105ec2internal-debug ...
To use host binaries, run `chroot /host`
sh-4.2# cat /host/proc/cmdline
...
rd.multipath=default root=/dev/disk/by-label/dm-mpath-root
...
sh-4.2# exit
You should see the added kernel arguments.
Additional resources
- See Enabling multipathing with kernel arguments on RHCOS for more information about enabling multipathing during installation time.
Adding a real-time kernel to nodes
Some OKD workloads require a high degree of determinism.While Linux is not a real-time operating system, the Linux real-time kernel includes a preemptive scheduler that provides the operating system with real-time characteristics.
If your OKD workloads require these real-time characteristics, you can switch your machines to the Linux real-time kernel. For OKD, 4 you can make this switch using a MachineConfig
object. Although making the change is as simple as changing a machine config kernelType
setting to realtime
, there are a few other considerations before making the change:
Currently, real-time kernel is supported only on worker nodes, and only for radio access network (RAN) use.
The following procedure is fully supported with bare metal installations that use systems that are certified for Red Hat Enterprise Linux for Real Time 8.
Real-time support in OKD is limited to specific subscriptions.
The following procedure is also supported for use with Google Cloud Platform.
Prerequisites
Have a running OKD cluster (version 4.4 or later).
Log in to the cluster as a user with administrative privileges.
Procedure
Create a machine config for the real-time kernel: Create a YAML file (for example,
99-worker-realtime.yaml
) that contains aMachineConfig
object for therealtime
kernel type. This example tells the cluster to use a real-time kernel for all worker nodes:$ cat << EOF > 99-worker-realtime.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: "worker"
name: 99-worker-realtime
spec:
kernelType: realtime
EOF
Add the machine config to the cluster. Type the following to add the machine config to the cluster:
$ oc create -f 99-worker-realtime.yaml
Check the real-time kernel: Once each impacted node reboots, log in to the cluster and run the following commands to make sure that the real-time kernel has replaced the regular kernel for the set of nodes you configured:
$ oc get nodes
Example output
NAME STATUS ROLES AGE VERSION
ip-10-0-143-147.us-east-2.compute.internal Ready worker 103m v1.28.5
ip-10-0-146-92.us-east-2.compute.internal Ready worker 101m v1.28.5
ip-10-0-169-2.us-east-2.compute.internal Ready worker 102m v1.28.5
$ oc debug node/ip-10-0-143-147.us-east-2.compute.internal
Example output
Starting pod/ip-10-0-143-147us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
sh-4.4# uname -a
Linux <worker_node> 4.18.0-147.3.1.rt24.96.el8_1.x86_64 #1 SMP PREEMPT RT
Wed Nov 27 18:29:55 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
The kernel name contains
rt
and text “PREEMPT RT” indicates that this is a real-time kernel.To go back to the regular kernel, delete the
MachineConfig
object:$ oc delete -f 99-worker-realtime.yaml
Configuring journald settings
If you need to configure settings for the journald
service on OKD nodes, you can do that by modifying the appropriate configuration file and passing the file to the appropriate pool of nodes as a machine config.
This procedure describes how to modify journald
rate limiting settings in the /etc/systemd/journald.conf
file and apply them to worker nodes. See the journald.conf
man page for information on how to use that file.
Prerequisites
Have a running OKD cluster.
Log in to the cluster as a user with administrative privileges.
Procedure
Create a Butane config file,
40-worker-custom-journald.bu
, that includes an/etc/systemd/journald.conf
file with the required settings.See “Creating machine configs with Butane” for information about Butane.
variant: openshift
version: 4.0
metadata:
name: 40-worker-custom-journald
labels:
machineconfiguration.openshift.io/role: worker
storage:
files:
- path: /etc/systemd/journald.conf
mode: 0644
overwrite: true
contents:
inline: |
# Disable rate limiting
RateLimitInterval=1s
RateLimitBurst=10000
Storage=volatile
Compress=no
MaxRetentionSec=30s
Use Butane to generate a
MachineConfig
object file,40-worker-custom-journald.yaml
, containing the configuration to be delivered to the worker nodes:$ butane 40-worker-custom-journald.bu -o 40-worker-custom-journald.yaml
Apply the machine config to the pool:
$ oc apply -f 40-worker-custom-journald.yaml
Check that the new machine config is applied and that the nodes are not in a degraded state. It might take a few minutes. The worker pool will show the updates in progress, as each node successfully has the new machine config applied:
$ oc get machineconfigpool
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-35 True False False 3 3 3 0 34m
worker rendered-worker-d8 False True False 3 1 1 0 34m
To check that the change was applied, you can log in to a worker node:
$ oc get node | grep worker
ip-10-0-0-1.us-east-2.compute.internal Ready worker 39m v0.0.0-master+$Format:%h$
$ oc debug node/ip-10-0-0-1.us-east-2.compute.internal
Starting pod/ip-10-0-141-142us-east-2computeinternal-debug ...
...
sh-4.2# chroot /host
sh-4.4# cat /etc/systemd/journald.conf
# Disable rate limiting
RateLimitInterval=1s
RateLimitBurst=10000
Storage=volatile
Compress=no
MaxRetentionSec=30s
sh-4.4# exit
Additional resources
Adding extensions to FCOS
FCOS is a minimal container-oriented RHEL operating system, designed to provide a common set of capabilities to OKD clusters across all platforms. While adding software packages to FCOS systems is generally discouraged, the MCO provides an extensions
feature you can use to add a minimal set of features to FCOS nodes.
Currently, the following extensions are available:
usbguard: Adding the
usbguard
extension protects FCOS systems from attacks from intrusive USB devices. See USBGuard for details.kerberos: Adding the
kerberos
extension provides a mechanism that allows both users and machines to identify themselves to the network to receive defined, limited access to the areas and services that an administrator has configured. See Using Kerberos for details, including how to set up a Kerberos client and mount a Kerberized NFS share.
The following procedure describes how to use a machine config to add one or more extensions to your FCOS nodes.
Prerequisites
Have a running OKD cluster (version 4.6 or later).
Log in to the cluster as a user with administrative privileges.
Procedure
Create a machine config for extensions: Create a YAML file (for example,
80-extensions.yaml
) that contains aMachineConfig
extensions
object. This example tells the cluster to add theusbguard
extension.$ cat << EOF > 80-extensions.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: worker
name: 80-worker-extensions
spec:
config:
ignition:
version: 3.2.0
extensions:
- usbguard
EOF
Add the machine config to the cluster. Type the following to add the machine config to the cluster:
$ oc create -f 80-extensions.yaml
This sets all worker nodes to have rpm packages for
usbguard
installed.Check that the extensions were applied:
$ oc get machineconfig 80-worker-extensions
Example output
NAME GENERATEDBYCONTROLLER IGNITIONVERSION AGE
80-worker-extensions 3.2.0 57s
Check that the new machine config is now applied and that the nodes are not in a degraded state. It may take a few minutes. The worker pool will show the updates in progress, as each machine successfully has the new machine config applied:
$ oc get machineconfigpool
Example output
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-35 True False False 3 3 3 0 34m
worker rendered-worker-d8 False True False 3 1 1 0 34m
Check the extensions. To check that the extension was applied, run:
$ oc get node | grep worker
Example output
NAME STATUS ROLES AGE VERSION
ip-10-0-169-2.us-east-2.compute.internal Ready worker 102m v1.28.5
$ oc debug node/ip-10-0-169-2.us-east-2.compute.internal
Example output
...
To use host binaries, run `chroot /host`
sh-4.4# chroot /host
sh-4.4# rpm -q usbguard
usbguard-0.7.4-4.el8.x86_64.rpm
Loading custom firmware blobs in the machine config manifest
Because the default location for firmware blobs in /usr/lib
is read-only, you can locate a custom firmware blob by updating the search path. This enables you to load local firmware blobs in the machine config manifest when the blobs are not managed by FCOS.
Procedure
Create a Butane config file,
98-worker-firmware-blob.bu
, that updates the search path so that it is root-owned and writable to local storage. The following example places the custom blob file from your local workstation onto nodes under/var/lib/firmware
.See “Creating machine configs with Butane” for information about Butane.
Butane config file for custom firmware blob
variant: openshift
version: 4.0
metadata:
labels:
machineconfiguration.openshift.io/role: worker
name: 98-worker-firmware-blob
storage:
files:
- path: /var/lib/firmware/<package_name> (1)
contents:
local: <package_name> (2)
mode: 0644 (3)
openshift:
kernel_arguments:
- 'firmware_class.path=/var/lib/firmware' (4)
1 Sets the path on the node where the firmware package is copied to. 2 Specifies a file with contents that are read from a local file directory on the system running Butane. The path of the local file is relative to a files-dir
directory, which must be specified by using the—files-dir
option with Butane in the following step.3 Sets the permissions for the file on the FCOS node. It is recommended to set 0644
permissions.4 The firmware_class.path
parameter customizes the kernel search path of where to look for the custom firmware blob that was copied from your local workstation onto the root file system of the node. This example uses/var/lib/firmware
as the customized path.Run Butane to generate a
MachineConfig
object file that uses a copy of the firmware blob on your local workstation named98-worker-firmware-blob.yaml
. The firmware blob contains the configuration to be delivered to the nodes. The following example uses the--files-dir
option to specify the directory on your workstation where the local file or files are located:$ butane 98-worker-firmware-blob.bu -o 98-worker-firmware-blob.yaml --files-dir <directory_including_package_name>
Apply the configurations to the nodes in one of two ways:
If the cluster is not running yet, after you generate manifest files, add the
MachineConfig
object file to the<installation_directory>/openshift
directory, and then continue to create the cluster.If the cluster is already running, apply the file:
$ oc apply -f 98-worker-firmware-blob.yaml
A
MachineConfig
object YAML file is created for you to finish configuring your machines.
Save the Butane config in case you need to update the
MachineConfig
object in the future.
Additional resources
Changing the core user password for node access
By default, Fedora CoreOS (FCOS) creates a user named core
on the nodes in your cluster. You can use the core
user to access the node through a cloud provider serial console or a bare metal baseboard controller manager (BMC). This can be helpful, for example, if a node is down and you cannot access that node by using SSH or the oc debug node
command. However, by default, there is no password for this user, so you cannot log in without creating one.
You can create a password for the core
user by using a machine config. The Machine Config Operator (MCO) assigns the password and injects the password into the /etc/shadow
file, allowing you to log in with the core
user. The MCO does not examine the password hash. As such, the MCO cannot report if there is a problem with the password.
|
You can change the password, if needed, by editing the machine config you used to create the password. Also, you can remove the password by deleting the machine config. Deleting the machine config does not remove the user account.
Prerequisites
- Create a hashed password by using a tool that is supported by your operating system.
Procedure
Create a machine config file that contains the
core
username and the hashed password:apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: worker
name: set-core-user-password
spec:
config:
ignition:
version: 3.2.0
passwd:
users:
- name: core (1)
passwordHash: <password> (2)
1 This must be core
.2 The hashed password to use with the core
account.Create the machine config by running the following command:
$ oc create -f <file-name>.yaml
The nodes do not reboot and should become available in a few moments. You can use the
oc get mcp
to watch for the machine config pools to be updated, as shown in the following example:NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-d686a3ffc8fdec47280afec446fce8dd True False False 3 3 3 0 64m
worker rendered-worker-4605605a5b1f9de1d061e9d350f251e5 False True False 3 0 0 0 64m
Verification
After the nodes return to the
UPDATED=True
state, start a debug session for a node by running the following command:$ oc debug node/<node_name>
Set
/host
as the root directory within the debug shell by running the following command:sh-4.4# chroot /host
Check the contents of the
/etc/shadow
file:Example output
...
core:$6$2sE/010goDuRSxxv$o18K52wor.wIwZp:19418:0:99999:7:::
...
The hashed password is assigned to the
core
user.
Configuring MCO-related custom resources
Besides managing MachineConfig
objects, the MCO manages two custom resources (CRs): KubeletConfig
and ContainerRuntimeConfig
. Those CRs let you change node-level settings impacting how the Kubelet and CRI-O container runtime services behave.
Creating a KubeletConfig CRD to edit kubelet parameters
The kubelet configuration is currently serialized as an Ignition configuration, so it can be directly edited. However, there is also a new kubelet-config-controller
added to the Machine Config Controller (MCC). This lets you use a KubeletConfig
custom resource (CR) to edit the kubelet parameters.
As the fields in the |
Consider the following guidance:
Create one
KubeletConfig
CR for each machine config pool with all the config changes you want for that pool. If you are applying the same content to all of the pools, you need only oneKubeletConfig
CR for all of the pools.Edit an existing
KubeletConfig
CR to modify existing settings or add new settings, instead of creating a CR for each change. It is recommended that you create a CR only to modify a different machine config pool, or for changes that are intended to be temporary, so that you can revert the changes.As needed, create multiple
KubeletConfig
CRs with a limit of 10 per cluster. For the firstKubeletConfig
CR, the Machine Config Operator (MCO) creates a machine config appended withkubelet
. With each subsequent CR, the controller creates anotherkubelet
machine config with a numeric suffix. For example, if you have akubelet
machine config with a-2
suffix, the nextkubelet
machine config is appended with-3
.
If you want to delete the machine configs, delete them in reverse order to avoid exceeding the limit. For example, you delete the kubelet-3
machine config before deleting the kubelet-2
machine config.
If you have a machine config with a |
Example KubeletConfig
CR
$ oc get kubeletconfig
NAME AGE
set-max-pods 15m
Example showing a KubeletConfig
machine config
$ oc get mc | grep kubelet
...
99-worker-generated-kubelet-1 b5c5119de007945b6fe6fb215db3b8e2ceb12511 3.2.0 26m
...
The following procedure is an example to show how to configure the maximum number of pods per node on the worker nodes.
Prerequisites
Obtain the label associated with the static
MachineConfigPool
CR for the type of node you want to configure. Perform one of the following steps:View the machine config pool:
$ oc describe machineconfigpool <name>
For example:
$ oc describe machineconfigpool worker
Example output
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
creationTimestamp: 2019-02-08T14:52:39Z
generation: 1
labels:
custom-kubelet: set-max-pods (1)
1 If a label has been added it appears under labels
.If the label is not present, add a key/value pair:
$ oc label machineconfigpool worker custom-kubelet=set-max-pods
Procedure
View the available machine configuration objects that you can select:
$ oc get machineconfig
By default, the two kubelet-related configs are
01-master-kubelet
and01-worker-kubelet
.Check the current value for the maximum pods per node:
$ oc describe node <node_name>
For example:
$ oc describe node ci-ln-5grqprb-f76d1-ncnqq-worker-a-mdv94
Look for
value: pods: <value>
in theAllocatable
stanza:Example output
Allocatable:
attachable-volumes-aws-ebs: 25
cpu: 3500m
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 15341844Ki
pods: 250
Set the maximum pods per node on the worker nodes by creating a custom resource file that contains the kubelet configuration:
Kubelet configurations that target a specific machine config pool also affect any dependent pools. For example, creating a kubelet configuration for the pool containing worker nodes will also apply to any subset pools, including the pool containing infrastructure nodes. To avoid this, you must create a new machine config pool with a selection expression that only includes worker nodes, and have your kubelet configuration target this new pool.
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
name: set-max-pods
spec:
machineConfigPoolSelector:
matchLabels:
custom-kubelet: set-max-pods (1)
kubeletConfig:
maxPods: 500 (2)
1 Enter the label from the machine config pool. 2 Add the kubelet configuration. In this example, use maxPods
to set the maximum pods per node.The rate at which the kubelet talks to the API server depends on queries per second (QPS) and burst values. The default values,
50
forkubeAPIQPS
and100
forkubeAPIBurst
, are sufficient if there are limited pods running on each node. It is recommended to update the kubelet QPS and burst rates if there are enough CPU and memory resources on the node.apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
name: set-max-pods
spec:
machineConfigPoolSelector:
matchLabels:
custom-kubelet: set-max-pods
kubeletConfig:
maxPods: <pod_count>
kubeAPIBurst: <burst_rate>
kubeAPIQPS: <QPS>
Update the machine config pool for workers with the label:
$ oc label machineconfigpool worker custom-kubelet=set-max-pods
Create the
KubeletConfig
object:$ oc create -f change-maxPods-cr.yaml
Verify that the
KubeletConfig
object is created:$ oc get kubeletconfig
Example output
NAME AGE
set-max-pods 15m
Depending on the number of worker nodes in the cluster, wait for the worker nodes to be rebooted one by one. For a cluster with 3 worker nodes, this could take about 10 to 15 minutes.
Verify that the changes are applied to the node:
Check on a worker node that the
maxPods
value changed:$ oc describe node <node_name>
Locate the
Allocatable
stanza:...
Allocatable:
attachable-volumes-gce-pd: 127
cpu: 3500m
ephemeral-storage: 123201474766
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 14225400Ki
pods: 500 (1)
...
1 In this example, the pods
parameter should report the value you set in theKubeletConfig
object.
Verify the change in the
KubeletConfig
object:$ oc get kubeletconfigs set-max-pods -o yaml
This should show a status of
True
andtype:Success
, as shown in the following example:spec:
kubeletConfig:
maxPods: 500
machineConfigPoolSelector:
matchLabels:
custom-kubelet: set-max-pods
status:
conditions:
- lastTransitionTime: "2021-06-30T17:04:07Z"
message: Success
status: "True"
type: Success
Creating a ContainerRuntimeConfig CR to edit CRI-O parameters
You can change some of the settings associated with the OKD CRI-O runtime for the nodes associated with a specific machine config pool (MCP). Using a ContainerRuntimeConfig
custom resource (CR), you set the configuration values and add a label to match the MCP. The MCO then rebuilds the crio.conf
and storage.conf
configuration files on the associated nodes with the updated values.
To revert the changes implemented by using a |
You can modify the following settings by using a ContainerRuntimeConfig
CR:
PIDs limit: Setting the PIDs limit in the
ContainerRuntimeConfig
is expected to be deprecated. If PIDs limits are required, it is recommended to use thepodPidsLimit
field in theKubeletConfig
CR instead. The default value of thepodPidsLimit
field is4096
.The CRI-O flag is applied on the cgroup of the container, while the Kubelet flag is set on the cgroup of the pod. Please adjust the PIDs limit accordingly.
Log level: The
logLevel
parameter sets the CRI-Olog_level
parameter, which is the level of verbosity for log messages. The default isinfo
(log_level = info
). Other options includefatal
,panic
,error
,warn
,debug
, andtrace
.Overlay size: The
overlaySize
parameter sets the CRI-O Overlay storage driversize
parameter, which is the maximum size of a container image.Maximum log size: Setting the maximum log size in the
ContainerRuntimeConfig
is expected to be deprecated. If a maximum log size is required, it is recommended to use thecontainerLogMaxSize
field in theKubeletConfig
CR instead.Container runtime: The
defaultRuntime
parameter sets the container runtime to eitherrunc
orcrun
. The default isrunc
.
You should have one ContainerRuntimeConfig
CR for each machine config pool with all the config changes you want for that pool. If you are applying the same content to all the pools, you only need one ContainerRuntimeConfig
CR for all the pools.
You should edit an existing ContainerRuntimeConfig
CR to modify existing settings or add new settings instead of creating a new CR for each change. It is recommended to create a new ContainerRuntimeConfig
CR only to modify a different machine config pool, or for changes that are intended to be temporary so that you can revert the changes.
You can create multiple ContainerRuntimeConfig
CRs, as needed, with a limit of 10 per cluster. For the first ContainerRuntimeConfig
CR, the MCO creates a machine config appended with containerruntime
. With each subsequent CR, the controller creates a new containerruntime
machine config with a numeric suffix. For example, if you have a containerruntime
machine config with a -2
suffix, the next containerruntime
machine config is appended with -3
.
If you want to delete the machine configs, you should delete them in reverse order to avoid exceeding the limit. For example, you should delete the containerruntime-3
machine config before deleting the containerruntime-2
machine config.
If you have a machine config with a |
Example showing multiple ContainerRuntimeConfig
CRs
$ oc get ctrcfg
Example output
NAME AGE
ctr-overlay 15m
ctr-level 5m45s
Example showing multiple containerruntime
machine configs
$ oc get mc | grep container
Example output
...
01-master-container-runtime b5c5119de007945b6fe6fb215db3b8e2ceb12511 3.2.0 57m
...
01-worker-container-runtime b5c5119de007945b6fe6fb215db3b8e2ceb12511 3.2.0 57m
...
99-worker-generated-containerruntime b5c5119de007945b6fe6fb215db3b8e2ceb12511 3.2.0 26m
99-worker-generated-containerruntime-1 b5c5119de007945b6fe6fb215db3b8e2ceb12511 3.2.0 17m
99-worker-generated-containerruntime-2 b5c5119de007945b6fe6fb215db3b8e2ceb12511 3.2.0 7m26s
...
The following example sets the log_level
field to debug
and sets the overlay size to 8 GB:
Example ContainerRuntimeConfig
CR
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
name: overlay-size
spec:
machineConfigPoolSelector:
matchLabels:
pools.operator.machineconfiguration.openshift.io/worker: '' (1)
containerRuntimeConfig:
logLevel: debug (2)
overlaySize: 8G (3)
defaultRuntime: "crun" (4)
1 | Specifies the machine config pool label. |
2 | Optional: Specifies the level of verbosity for log messages. |
3 | Optional: Specifies the maximum size of a container image. |
4 | Optional: Specifies the container runtime to deploy to new containers. The default value is runc . |
Procedure
To change CRI-O settings using the ContainerRuntimeConfig
CR:
Create a YAML file for the
ContainerRuntimeConfig
CR:apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
name: overlay-size
spec:
machineConfigPoolSelector:
matchLabels:
pools.operator.machineconfiguration.openshift.io/worker: '' (1)
containerRuntimeConfig: (2)
logLevel: debug
overlaySize: 8G
1 Specify a label for the machine config pool that you want you want to modify. 2 Set the parameters as needed. Create the
ContainerRuntimeConfig
CR:$ oc create -f <file_name>.yaml
Verify that the CR is created:
$ oc get ContainerRuntimeConfig
Example output
NAME AGE
overlay-size 3m19s
Check that a new
containerruntime
machine config is created:$ oc get machineconfigs | grep containerrun
Example output
99-worker-generated-containerruntime 2c9371fbb673b97a6fe8b1c52691999ed3a1bfc2 3.2.0 31s
Monitor the machine config pool until all are shown as ready:
$ oc get mcp worker
Example output
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
worker rendered-worker-169 False True False 3 1 1 0 9h
Verify that the settings were applied in CRI-O:
Open an
oc debug
session to a node in the machine config pool and runchroot /host
.$ oc debug node/<node_name>
sh-4.4# chroot /host
Verify the changes in the
crio.conf
file:sh-4.4# crio config | grep 'log_level'
Example output
log_level = "debug"
Verify the changes in the `storage.conf`file:
sh-4.4# head -n 7 /etc/containers/storage.conf
Example output
[storage]
driver = "overlay"
runroot = "/var/run/containers/storage"
graphroot = "/var/lib/containers/storage"
[storage.options]
additionalimagestores = []
size = "8G"
Setting the default maximum container root partition size for Overlay with CRI-O
The root partition of each container shows all of the available disk space of the underlying host. Follow this guidance to set a maximum partition size for the root disk of all containers.
To configure the maximum Overlay size, as well as other CRI-O options like the log level, you can create the following ContainerRuntimeConfig
custom resource definition (CRD):
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
name: overlay-size
spec:
machineConfigPoolSelector:
matchLabels:
custom-crio: overlay-size
containerRuntimeConfig:
logLevel: debug
overlaySize: 8G
Procedure
Create the configuration object:
$ oc apply -f overlaysize.yml
To apply the new CRI-O configuration to your worker nodes, edit the worker machine config pool:
$ oc edit machineconfigpool worker
Add the
custom-crio
label based on thematchLabels
name you set in theContainerRuntimeConfig
CRD:apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
creationTimestamp: "2020-07-09T15:46:34Z"
generation: 3
labels:
custom-crio: overlay-size
machineconfiguration.openshift.io/mco-built-in: ""
Save the changes, then view the machine configs:
$ oc get machineconfigs
New
99-worker-generated-containerruntime
andrendered-worker-xyz
objects are created:Example output
99-worker-generated-containerruntime 4173030d89fbf4a7a0976d1665491a4d9a6e54f1 3.2.0 7m42s
rendered-worker-xyz 4173030d89fbf4a7a0976d1665491a4d9a6e54f1 3.2.0 7m36s
After those objects are created, monitor the machine config pool for the changes to be applied:
$ oc get mcp worker
The worker nodes show
UPDATING
asTrue
, as well as the number of machines, the number updated, and other details:Example output
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
worker rendered-worker-xyz False True False 3 2 2 0 20h
When complete, the worker nodes transition back to
UPDATING
asFalse
, and theUPDATEDMACHINECOUNT
number matches theMACHINECOUNT
:Example output
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
worker rendered-worker-xyz True False False 3 3 3 0 20h
Looking at a worker machine, you see that the new 8 GB max size configuration is applied to all of the workers:
Example output
head -n 7 /etc/containers/storage.conf
[storage]
driver = "overlay"
runroot = "/var/run/containers/storage"
graphroot = "/var/lib/containers/storage"
[storage.options]
additionalimagestores = []
size = "8G"
Looking inside a container, you see that the root partition is now 8 GB:
Example output
~ $ df -h
Filesystem Size Used Available Use% Mounted on
overlay 8.0G 8.0K 8.0G 0% /