Understanding the File Integrity Operator
The File Integrity Operator is an OKD Operator that continually runs file integrity checks on the cluster nodes. It deploys a daemon set that initializes and runs privileged Advanced Intrusion Detection Environment (AIDE) containers on each node, providing a status object with a log of files that are modified during the initial run of the daemon set pods.
Currently, only Fedora CoreOS (FCOS) nodes are supported.
Creating the FileIntegrity custom resource
An instance of a FileIntegrity custom resource (CR) represents a set of continuous file integrity scans for one or more nodes.
Each FileIntegrity CR is backed by a daemon set running AIDE on the nodes matching the FileIntegrity CR specification.
Procedure
Create the following example FileIntegrity CR named worker-fileintegrity.yaml to enable scans on worker nodes:
Example FileIntegrity CR
apiVersion: fileintegrity.openshift.io/v1alpha1
kind: FileIntegrity
metadata:
  name: worker-fileintegrity
  namespace: openshift-file-integrity
spec:
  nodeSelector: (1)
    node-role.kubernetes.io/worker: ""
  tolerations: (2)
  - key: "myNode"
    operator: "Exists"
    effect: "NoSchedule"
  config: (3)
    name: "myconfig"
    namespace: "openshift-file-integrity"
    key: "config"
    gracePeriod: 20 (4)
    maxBackups: 5 (5)
    initialDelay: 60 (6)
  debug: false
status:
  phase: Active (7)
1 Defines the selector for scheduling node scans.
2 Specifies tolerations to schedule on nodes with custom taints. When not specified, a default toleration that allows running on control plane and infra nodes is applied.
3 Defines a ConfigMap containing an AIDE configuration to use.
4 The number of seconds to pause between AIDE integrity checks. Frequent AIDE checks on a node might be resource intensive, so it can be useful to specify a longer interval. Default is 900 seconds (15 minutes).
5 The maximum number of AIDE database and log backups (left over from the re-init process) to keep on a node. Older backups beyond this number are automatically pruned by the daemon. Default is 5.
6 The number of seconds to wait before starting the first AIDE integrity check. Default is 0.
7 The running status of the FileIntegrity instance. Statuses are Initializing, Pending, or Active.
Initializing
- The FileIntegrity object is currently initializing or re-initializing the AIDE database.
Pending
- The FileIntegrity deployment is still being created.
Active
- The scans are active and ongoing.
Apply the YAML file to the openshift-file-integrity namespace:
$ oc apply -f worker-fileintegrity.yaml -n openshift-file-integrity
Verification
Confirm the FileIntegrity object was created successfully by running the following command:
$ oc get fileintegrities -n openshift-file-integrity
Example output
NAME AGE
worker-fileintegrity 14s
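The example CR above can also be generated programmatically. The following is a minimal Python sketch, not part of the Operator, that builds the same manifest as a dictionary and fills in the documented defaults for gracePeriod (900), maxBackups (5), and initialDelay (0). The function name build_file_integrity and its parameters are hypothetical, and the config map reference is reused from the example.

```python
import json


def build_file_integrity(name, node_selector, grace_period=900,
                         max_backups=5, initial_delay=0, debug=False):
    """Build a FileIntegrity manifest as a plain dictionary.

    Hypothetical helper: the keyword defaults mirror the defaults
    documented for gracePeriod, maxBackups, and initialDelay.
    """
    return {
        "apiVersion": "fileintegrity.openshift.io/v1alpha1",
        "kind": "FileIntegrity",
        "metadata": {
            "name": name,
            "namespace": "openshift-file-integrity",
        },
        "spec": {
            "nodeSelector": node_selector,
            "config": {
                # Config map reference taken from the example CR.
                "name": "myconfig",
                "namespace": "openshift-file-integrity",
                "key": "config",
                "gracePeriod": grace_period,
                "maxBackups": max_backups,
                "initialDelay": initial_delay,
            },
            "debug": debug,
        },
    }


if __name__ == "__main__":
    cr = build_file_integrity(
        "worker-fileintegrity",
        {"node-role.kubernetes.io/worker": ""},
        grace_period=20,
        initial_delay=60,
    )
    # oc apply accepts JSON as well as YAML.
    print(json.dumps(cr, indent=2))
```

If the script is saved under a name such as build_cr.py (a hypothetical file name), its output could be piped to oc apply -f - instead of writing the YAML file by hand.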
Checking the FileIntegrity custom resource status
The FileIntegrity custom resource (CR) reports its status through the .status.phase field.
Procedure
To query the FileIntegrity CR status, run:
$ oc get fileintegrities/worker-fileintegrity -o jsonpath="{ .status.phase }"
Example output
Active
FileIntegrity custom resource phases
Pending
- The phase after the custom resource (CR) is created.
Active
- The phase when the backing daemon set is up and running.
Initializing
- The phase when the AIDE database is being reinitialized.
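Because a CR moves through these phases over time, automation usually has to poll until the phase is Active. The following Python sketch shows one way to do that. The function wait_for_phase and its parameters are hypothetical, and the caller supplies get_phase, which could, for example, wrap the oc get command shown in the procedure above.

```python
import time
from typing import Callable


def wait_for_phase(get_phase: Callable[[], str],
                   target: str = "Active",
                   timeout: float = 300.0,
                   interval: float = 5.0,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll get_phase() until it returns target or the timeout expires.

    Returns True if the target phase was observed, False on timeout.
    The sleep function is injectable so the loop can be tested quickly.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_phase() == target:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Simulated phase sequence; a real caller would query the cluster.
    phases = iter(["Pending", "Initializing", "Active"])
    print(wait_for_phase(lambda: next(phases), interval=0))
```

Passing the phase lookup as a callable keeps the polling logic independent of how the cluster is queried.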
Understanding the FileIntegrityNodeStatuses object
The scan results of the FileIntegrity CR are reported in another object called FileIntegrityNodeStatuses.
$ oc get fileintegritynodestatuses
Example output
NAME AGE
worker-fileintegrity-ip-10-0-130-192.ec2.internal 101s
worker-fileintegrity-ip-10-0-147-133.ec2.internal 109s
worker-fileintegrity-ip-10-0-165-160.ec2.internal 102s
It might take some time for the FileIntegrityNodeStatus object results to be available.
There is one result object per node. The nodeName attribute of each FileIntegrityNodeStatus object corresponds to the node being scanned. The status of the file integrity scan is represented in the results array, which holds scan conditions.
$ oc get fileintegritynodestatuses.fileintegrity.openshift.io -ojsonpath='{.items[*].results}' | jq
The FileIntegrityNodeStatus object reports the latest status of an AIDE run and exposes the status as Failed, Succeeded, or Errored in a status field.
$ oc get fileintegritynodestatuses -w
Example output
NAME NODE STATUS
example-fileintegrity-ip-10-0-134-186.us-east-2.compute.internal ip-10-0-134-186.us-east-2.compute.internal Succeeded
example-fileintegrity-ip-10-0-150-230.us-east-2.compute.internal ip-10-0-150-230.us-east-2.compute.internal Succeeded
example-fileintegrity-ip-10-0-169-137.us-east-2.compute.internal ip-10-0-169-137.us-east-2.compute.internal Succeeded
example-fileintegrity-ip-10-0-180-200.us-east-2.compute.internal ip-10-0-180-200.us-east-2.compute.internal Succeeded
example-fileintegrity-ip-10-0-194-66.us-east-2.compute.internal ip-10-0-194-66.us-east-2.compute.internal Failed
example-fileintegrity-ip-10-0-222-188.us-east-2.compute.internal ip-10-0-222-188.us-east-2.compute.internal Succeeded
example-fileintegrity-ip-10-0-134-186.us-east-2.compute.internal ip-10-0-134-186.us-east-2.compute.internal Succeeded
example-fileintegrity-ip-10-0-222-188.us-east-2.compute.internal ip-10-0-222-188.us-east-2.compute.internal Succeeded
example-fileintegrity-ip-10-0-194-66.us-east-2.compute.internal ip-10-0-194-66.us-east-2.compute.internal Failed
example-fileintegrity-ip-10-0-150-230.us-east-2.compute.internal ip-10-0-150-230.us-east-2.compute.internal Succeeded
example-fileintegrity-ip-10-0-180-200.us-east-2.compute.internal ip-10-0-180-200.us-east-2.compute.internal Succeeded
FileIntegrityNodeStatus CR status types
These conditions are reported in the results array of the corresponding FileIntegrityNodeStatus CR status:
Succeeded
- The integrity check passed; the files and directories covered by the AIDE check have not been modified since the database was last initialized.
Failed
- The integrity check failed; some files or directories covered by the AIDE check have been modified since the database was last initialized.
Errored
- The AIDE scanner encountered an internal error.
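To act on these conditions in a script, the results array can be reduced to its most recent entry. The following Python sketch assumes each entry carries the condition and lastProbeTime fields shown in this document's examples. The helper names latest_condition and needs_attention are hypothetical.

```python
from datetime import datetime


def latest_condition(results):
    """Return the entry in a results array with the newest lastProbeTime.

    Raises ValueError if the array is empty.
    """
    def probe_time(entry):
        return datetime.strptime(entry["lastProbeTime"], "%Y-%m-%dT%H:%M:%SZ")

    return max(results, key=probe_time)


def needs_attention(results):
    """Return True if the most recent condition is Failed or Errored."""
    return latest_condition(results)["condition"] in ("Failed", "Errored")


if __name__ == "__main__":
    sample = [
        {"condition": "Succeeded", "lastProbeTime": "2020-09-15T12:45:57Z"},
    ]
    print(latest_condition(sample)["condition"], needs_attention(sample))
```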
FileIntegrityNodeStatus CR success example
Example output of a condition with a success status
[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:45:57Z"
  }
]
[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:46:03Z"
  }
]
[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:45:48Z"
  }
]
In this case, all three scans succeeded and so far there are no other conditions.
FileIntegrityNodeStatus CR failure status example
To simulate a failure condition, modify one of the files AIDE tracks. For example, modify /etc/resolv.conf on one of the worker nodes:
$ oc debug node/ip-10-0-130-192.ec2.internal
Example output
Creating debug namespace/openshift-debug-node-ldfbj ...
Starting pod/ip-10-0-130-192ec2internal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.130.192
If you don't see a command prompt, try pressing enter.
sh-4.2# echo "# integrity test" >> /host/etc/resolv.conf
sh-4.2# exit
Removing debug pod ...
Removing debug namespace/openshift-debug-node-ldfbj ...
After some time, the Failed condition is reported in the results array of the corresponding FileIntegrityNodeStatus object. The previous Succeeded condition is retained, which allows you to pinpoint the time the check failed.
$ oc get fileintegritynodestatuses.fileintegrity.openshift.io/worker-fileintegrity-ip-10-0-130-192.ec2.internal -ojsonpath='{.results}' | jq -r
Alternatively, to query the results for all nodes without specifying an object name, run:
$ oc get fileintegritynodestatuses.fileintegrity.openshift.io -ojsonpath='{.items[*].results}' | jq
Example output
[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:54:14Z"
  },
  {
    "condition": "Failed",
    "filesChanged": 1,
    "lastProbeTime": "2020-09-15T12:57:20Z",
    "resultConfigMapName": "aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed",
    "resultConfigMapNamespace": "openshift-file-integrity"
  }
]
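Because the earlier Succeeded condition is retained, the interval in which the change happened can be derived from the array. The following Python sketch assumes the entries are listed in chronological order, as in the example output. The function name failure_details is hypothetical.

```python
def failure_details(results):
    """Locate the first Failed entry and the Succeeded entry before it.

    Assumes results is ordered oldest to newest. Returns None when the
    array contains no Failed condition.
    """
    last_succeeded = None
    for entry in results:
        if entry["condition"] == "Succeeded":
            last_succeeded = entry["lastProbeTime"]
        elif entry["condition"] == "Failed":
            return {
                "last_succeeded": last_succeeded,
                "failed_at": entry["lastProbeTime"],
                "files_changed": entry.get("filesChanged"),
                "config_map": (
                    entry.get("resultConfigMapNamespace"),
                    entry.get("resultConfigMapName"),
                ),
            }
    return None


if __name__ == "__main__":
    results = [
        {"condition": "Succeeded", "lastProbeTime": "2020-09-15T12:54:14Z"},
        {
            "condition": "Failed",
            "filesChanged": 1,
            "lastProbeTime": "2020-09-15T12:57:20Z",
            "resultConfigMapName": "aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed",
            "resultConfigMapNamespace": "openshift-file-integrity",
        },
    ]
    print(failure_details(results))
```

For the example data, the modification falls between the last_succeeded and failed_at timestamps, and config_map identifies the object that holds the AIDE log.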
The Failed condition points to a config map that gives more details about what exactly failed and why:
$ oc describe cm aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed
Example output
Name: aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed
Namespace: openshift-file-integrity
Labels: file-integrity.openshift.io/node=ip-10-0-130-192.ec2.internal
file-integrity.openshift.io/owner=worker-fileintegrity
file-integrity.openshift.io/result-log=
Annotations: file-integrity.openshift.io/files-added: 0
file-integrity.openshift.io/files-changed: 1
file-integrity.openshift.io/files-removed: 0
Data
integritylog:
------
AIDE 0.15.1 found differences between database and filesystem!!
Start timestamp: 2020-09-15 12:58:15
Summary:
Total number of files: 31553
Added files: 0
Removed files: 0
Changed files: 1
---------------------------------------------------
Changed files:
---------------------------------------------------
changed: /hostroot/etc/resolv.conf
---------------------------------------------------
Detailed information about changes:
---------------------------------------------------
File: /hostroot/etc/resolv.conf
SHA512 : sTQYpB/AL7FeoGtu/1g7opv6C+KT1CBJ , qAeM+a8yTgHPnIHMaRlS+so61EN8VOpg
Events: <none>
Due to the config map data size limit, AIDE logs over 1 MB are added to the failure config map as a base64-encoded gzip archive. In this case, pipe the log data from the config map to base64 --decode | gunzip to read it. Compressed logs are indicated by the presence of a file-integrity.openshift.io/compressed annotation key in the config map.
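The same decoding can be done in a script. The following Python sketch takes a config map parsed from JSON (for example, the output of oc get cm with -o json) and returns the readable log. It assumes the compressed archive is stored under the same integritylog data key shown in the example output. The function name read_integrity_log is hypothetical.

```python
import base64
import gzip

COMPRESSED_ANNOTATION = "file-integrity.openshift.io/compressed"


def read_integrity_log(config_map: dict) -> str:
    """Return the AIDE log text from a failure config map.

    Assumption: the log is stored under data["integritylog"], either as
    plain text or, when the compressed annotation key is present, as a
    base64-encoded gzip archive.
    """
    annotations = config_map.get("metadata", {}).get("annotations") or {}
    raw = config_map["data"]["integritylog"]
    if COMPRESSED_ANNOTATION in annotations:
        # Equivalent of: base64 --decode | gunzip
        return gzip.decompress(base64.b64decode(raw)).decode("utf-8")
    return raw


if __name__ == "__main__":
    plain = {"metadata": {}, "data": {"integritylog": "Changed files: 1"}}
    print(read_integrity_log(plain))
```

Checking for the annotation key, rather than guessing from the data, follows the indicator described above.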
Understanding events
Transitions in the status of the FileIntegrity and FileIntegrityNodeStatus objects are logged by events. The creation time of the event reflects the latest transition, such as Initializing to Active, and not necessarily the latest scan result. However, the newest event always reflects the most recent status.
$ oc get events --field-selector reason=FileIntegrityStatus
Example output
LAST SEEN TYPE REASON OBJECT MESSAGE
97s Normal FileIntegrityStatus fileintegrity/example-fileintegrity Pending
67s Normal FileIntegrityStatus fileintegrity/example-fileintegrity Initializing
37s Normal FileIntegrityStatus fileintegrity/example-fileintegrity Active
When a node scan fails, an event is created with the counts of added, changed, and removed files and the config map information.
$ oc get events --field-selector reason=NodeIntegrityStatus
Example output
LAST SEEN TYPE REASON OBJECT MESSAGE
114m Normal NodeIntegrityStatus fileintegrity/example-fileintegrity no changes to node ip-10-0-134-173.ec2.internal
114m Normal NodeIntegrityStatus fileintegrity/example-fileintegrity no changes to node ip-10-0-168-238.ec2.internal
114m Normal NodeIntegrityStatus fileintegrity/example-fileintegrity no changes to node ip-10-0-169-175.ec2.internal
114m Normal NodeIntegrityStatus fileintegrity/example-fileintegrity no changes to node ip-10-0-152-92.ec2.internal
114m Normal NodeIntegrityStatus fileintegrity/example-fileintegrity no changes to node ip-10-0-158-144.ec2.internal
114m Normal NodeIntegrityStatus fileintegrity/example-fileintegrity no changes to node ip-10-0-131-30.ec2.internal
87m Warning NodeIntegrityStatus fileintegrity/example-fileintegrity node ip-10-0-152-92.ec2.internal has changed! a:1,c:1,r:0 \ log:openshift-file-integrity/aide-ds-example-fileintegrity-ip-10-0-152-92.ec2.internal-failed
Changes to the number of added, changed, or removed files result in a new event, even if the status of the node has not transitioned.
$ oc get events --field-selector reason=NodeIntegrityStatus
Example output
LAST SEEN TYPE REASON OBJECT MESSAGE
114m Normal NodeIntegrityStatus fileintegrity/example-fileintegrity no changes to node ip-10-0-134-173.ec2.internal
114m Normal NodeIntegrityStatus fileintegrity/example-fileintegrity no changes to node ip-10-0-168-238.ec2.internal
114m Normal NodeIntegrityStatus fileintegrity/example-fileintegrity no changes to node ip-10-0-169-175.ec2.internal
114m Normal NodeIntegrityStatus fileintegrity/example-fileintegrity no changes to node ip-10-0-152-92.ec2.internal
114m Normal NodeIntegrityStatus fileintegrity/example-fileintegrity no changes to node ip-10-0-158-144.ec2.internal
114m Normal NodeIntegrityStatus fileintegrity/example-fileintegrity no changes to node ip-10-0-131-30.ec2.internal
87m Warning NodeIntegrityStatus fileintegrity/example-fileintegrity node ip-10-0-152-92.ec2.internal has changed! a:1,c:1,r:0 \ log:openshift-file-integrity/aide-ds-example-fileintegrity-ip-10-0-152-92.ec2.internal-failed
40m Warning NodeIntegrityStatus fileintegrity/example-fileintegrity node ip-10-0-152-92.ec2.internal has changed! a:3,c:1,r:0 \ log:openshift-file-integrity/aide-ds-example-fileintegrity-ip-10-0-152-92.ec2.internal-failed
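The a:, c:, and r: counters in the Warning messages can be extracted for alerting or reporting. The following Python sketch parses the MESSAGE column as it appears in the example output. The message layout is inferred from these examples only, and the function name parse_change_event is hypothetical.

```python
import re

# Pattern inferred from the example Warning messages in this document.
CHANGE_RE = re.compile(
    r"node (?P<node>\S+) has changed! "
    r"a:(?P<added>\d+),c:(?P<changed>\d+),r:(?P<removed>\d+)"
)
LOG_RE = re.compile(r"log:(?P<log>\S+)")


def parse_change_event(message):
    """Parse a NodeIntegrityStatus Warning message.

    Returns a dict with the node name, the added/changed/removed counts,
    and the namespace/name of the log config map (None if absent).
    Returns None for messages that do not report a change.
    """
    match = CHANGE_RE.search(message)
    if match is None:
        return None
    log_match = LOG_RE.search(message)
    return {
        "node": match.group("node"),
        "added": int(match.group("added")),
        "changed": int(match.group("changed")),
        "removed": int(match.group("removed")),
        "log": log_match.group("log") if log_match else None,
    }


if __name__ == "__main__":
    msg = ("node ip-10-0-152-92.ec2.internal has changed! a:3,c:1,r:0 "
           "log:openshift-file-integrity/"
           "aide-ds-example-fileintegrity-ip-10-0-152-92.ec2.internal-failed")
    print(parse_change_event(msg))
```

Comparing the parsed counts of two consecutive events for the same node shows what changed between them, such as the added count rising from 1 to 3 in the example output.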