Metrics Reference

Understanding metrics collected by DC/OS

Mesosphere DC/OS automatically collects basic system metrics, such as CPU and memory, for nodes and containers. Mesosphere DC/OS also collects metadata about the different categories of metrics. For more information about the metadata metrics, see Dimensions.

Automatically collected metrics are available only for containers that provide endpoint statistics. For example, Docker containers do not provide networking data for DC/OS to consume, so the networking metrics that are available for UCR containers are not available for Docker containers.

Node

CPU and memory metrics

Metric | Description
cpu.idle | Percentage of CPUs idle.
cpu.system | Percentage of CPU used by the system.
cpu.total | Percentage of CPUs used.
cpu.user | Percentage of CPU used by the user.
cpu.wait | Percentage of CPU idle while waiting for an operation to complete.
load.1min | Load average for the past minute.
load.5min | Load average for the past 5 minutes.
load.15min | Load average for the past 15 minutes.
memory.buffers | Number of memory buffers.
memory.cached | Amount of cached memory.
memory.free | Amount of free memory in bytes.
memory.total | Total memory in bytes.
process.count | Number of processes that are running.
swap.free | Amount of free swap space.
swap.total | Total swap space.
swap.used | Amount of swap space used.
system.uptime | The system uptime.

File system metrics

Metric | Description
filesystem.capacity.free | Amount of available capacity in bytes.
filesystem.capacity.total | Total capacity in bytes.
filesystem.capacity.used | Capacity used in bytes.
filesystem.inode.free | Number of available inodes.
filesystem.inode.total | Total number of inodes.
filesystem.inode.used | Number of inodes used.

NOTE: The path tag is automatically populated based on the mount path of the local filesystem (for example, /, /boot, etc.).
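As an illustration of how the capacity and inode metrics combine with the path tag, the following is a minimal sketch that computes usage percentages per mount point. The sample values and the flat dictionary layout (including the "path" key standing in for the tag) are hypothetical and do not represent the output format of any DC/OS API.

```python
# Hypothetical samples: one dict per mount point. The keys reuse the metric
# names from the table above; "path" stands in for the path tag.
samples = [
    {
        "path": "/",
        "filesystem.capacity.total": 100_000_000_000,
        "filesystem.capacity.used": 42_000_000_000,
        "filesystem.inode.total": 6_000_000,
        "filesystem.inode.used": 300_000,
    },
    {
        "path": "/boot",
        "filesystem.capacity.total": 1_000_000_000,
        "filesystem.capacity.used": 250_000_000,
        "filesystem.inode.total": 65_536,
        "filesystem.inode.used": 512,
    },
]


def usage_percent(used, total):
    """Return used/total as a percentage, or None when total is not positive."""
    if total <= 0:
        return None
    return 100.0 * used / total


def filesystem_usage(samples):
    """Map each mount path to its capacity and inode usage percentages."""
    report = {}
    for sample in samples:
        report[sample["path"]] = {
            "capacity_pct": usage_percent(
                sample["filesystem.capacity.used"],
                sample["filesystem.capacity.total"],
            ),
            "inode_pct": usage_percent(
                sample["filesystem.inode.used"],
                sample["filesystem.inode.total"],
            ),
        }
    return report


for path, usage in filesystem_usage(samples).items():
    print(path, usage)
```

Tracking inode usage alongside capacity is useful because a filesystem can run out of inodes while free bytes remain.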

Network interface metrics

Metric | Description
network.in | Number of bytes downloaded.
network.in.dropped | Number of inbound packets dropped.
network.in.errors | Number of inbound packets in error.
network.in.packets | Number of packets downloaded.
network.out | Number of bytes uploaded.
network.out.dropped | Number of outbound packets dropped.
network.out.errors | Number of outbound packets in error.
network.out.packets | Number of packets uploaded.

NOTE: The interface tag is automatically populated based on the type of the network interface (for example, spartan, d-dcos, minuteman, etc.).
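The byte and packet metrics above are most useful as rates. The sketch below derives a per-second rate from two readings of the same interface, assuming the values are cumulative counters that only increase between resets; that assumption, the sample values, the dictionary layout, and the interface name "eth0" are all illustrative rather than taken from DC/OS itself.

```python
def rate_per_second(previous, current, elapsed_seconds):
    """Per-second rate between two readings of a cumulative counter.

    Returns None when the interval is not positive or when the counter
    went backwards, which would suggest a reset between the readings.
    """
    if elapsed_seconds <= 0 or current < previous:
        return None
    return (current - previous) / elapsed_seconds


# Hypothetical readings of one interface taken 60 seconds apart.
earlier = {"interface": "eth0", "network.in": 1_000_000, "network.out": 400_000}
later = {"interface": "eth0", "network.in": 1_600_000, "network.out": 700_000}
elapsed = 60

in_rate = rate_per_second(earlier["network.in"], later["network.in"], elapsed)
out_rate = rate_per_second(earlier["network.out"], later["network.out"], elapsed)
print(f"{later['interface']}: in={in_rate} B/s, out={out_rate} B/s")
```

Returning None on a backwards counter avoids reporting a large negative rate after an interface or agent restart.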

Process

The following per-process resource utilization metrics are collected.

Metric | Description
procstat.cpu_time_guest | The amount of time that the CPU is running a virtual CPU for a guest operating system.
procstat.cpu_time_guest_nice | The amount of time that the CPU is running a virtual CPU for a guest operating system, which is low-priority and can be interrupted by other processes.
procstat.cpu_time_idle | The amount of time that the CPU is idle.
procstat.cpu_time_iowait | The amount of time that the CPU is waiting for I/O operations to complete.
procstat.cpu_time_irq | The amount of time that the CPU is servicing interrupts.
procstat.cpu_time_nice | The amount of time that the CPU is in user mode with low-priority processes, which can easily be interrupted by higher-priority processes.
procstat.cpu_time_soft_irq | The amount of time that the CPU is servicing software interrupts.
procstat.cpu_time_steal | The amount of time that the CPU is in stolen time, which is time spent in other operating systems in a virtualized environment.
procstat.cpu_time_system | The amount of time that the CPU is in system mode.
procstat.cpu_time_user | The amount of time that the CPU is in user mode.
procstat.cpu_usage | The percentage of time that the process is active in any capacity.
procstat.involuntary_context_switches | The number of times the process was involuntarily context-switched.
procstat.memory_data | The amount of memory the process uses for data.
procstat.memory_locked | The amount of memory the process has locked.
procstat.memory_rss | The amount of real memory (resident set) that the process is using.
procstat.memory_stack | The amount of stack memory the process is using.
procstat.memory_swap | The amount of swap memory the process is using.
procstat.memory_vms | The amount of virtual memory the process is using.
procstat.nice_priority | The current nice priority of the process.
procstat.num_threads | The number of threads in the process.
procstat.pid | Process identifier (ID).
procstat.realtime_priority | The current realtime priority of the process.
procstat.rlimit_cpu_time_hard | The hard resource limit on CPU time for the process.
procstat.rlimit_cpu_time_soft | The soft resource limit on CPU time for the process.
procstat.rlimit_file_locks_hard | The hard file locks resource limit for the process.
procstat.rlimit_file_locks_soft | The soft file locks resource limit for the process.
procstat.rlimit_memory_data_hard | The hard resource limit on the process for memory used for data.
procstat.rlimit_memory_data_soft | The soft resource limit on the process for memory used for data.
procstat.rlimit_memory_locked_hard | The hard resource limit on the process for locked memory.
procstat.rlimit_memory_locked_soft | The soft resource limit on the process for locked memory.
procstat.rlimit_memory_rss_hard | The hard resource limit on the process for physical memory.
procstat.rlimit_memory_rss_soft | The soft resource limit on the process for physical memory.
procstat.rlimit_memory_stack_hard | The hard resource limit on the process stack.
procstat.rlimit_memory_stack_soft | The soft resource limit on the process stack.
procstat.rlimit_memory_vms_hard | The hard resource limit on the process for virtual memory.
procstat.rlimit_memory_vms_soft | The soft resource limit on the process for virtual memory.
procstat.rlimit_nice_priority_hard | The hard resource limit on the ceiling for the process’s nice priority value.
procstat.rlimit_nice_priority_soft | The soft resource limit on the ceiling for the process’s nice priority value.
procstat.rlimit_num_fds_hard | The hard resource limit on the file descriptors for the process.
procstat.rlimit_num_fds_soft | The soft resource limit on the file descriptors for the process.
procstat.rlimit_realtime_priority_hard | The hard resource limit on the ceiling for the process’s real-time priority value.
procstat.rlimit_realtime_priority_soft | The soft resource limit on the ceiling for the process’s real-time priority value.
procstat.rlimit_signals_pending_hard | The hard resource limit on the number of signals that are pending for delivery to the process.
procstat.rlimit_signals_pending_soft | The soft resource limit on the number of signals that are pending for delivery to the process.
procstat.signals_pending | The number of signals pending to be handled by the process.
procstat.voluntary_context_switches | The number of times the process was context-switched voluntarily.

Source: AWS DOCS - Collect Process Metrics with the procstat Plugin
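Several of the usage metrics above have a matching rlimit metric, so one possible use is to flag processes that are approaching a soft limit. The following is a rough sketch under two assumptions that the tables do not confirm: that each usage value and its limit are reported in the same unit, and that a missing or non-positive limit means "no usable limit". The sample values and the 0.8 threshold are hypothetical.

```python
# Pairs of (usage metric, soft limit metric) taken from the table above.
LIMIT_PAIRS = [
    ("procstat.memory_rss", "procstat.rlimit_memory_rss_soft"),
    ("procstat.memory_locked", "procstat.rlimit_memory_locked_soft"),
    ("procstat.memory_stack", "procstat.rlimit_memory_stack_soft"),
    ("procstat.signals_pending", "procstat.rlimit_signals_pending_soft"),
]


def limit_usage(sample, value_key, limit_key):
    """Fraction of a limit in use, or None when either value is unusable."""
    value = sample.get(value_key)
    limit = sample.get(limit_key)
    if value is None or limit is None or limit <= 0:
        return None
    return value / limit


def near_limits(sample, threshold=0.8):
    """Return the usage metrics whose fraction of the soft limit is at or above threshold."""
    flagged = {}
    for value_key, limit_key in LIMIT_PAIRS:
        fraction = limit_usage(sample, value_key, limit_key)
        if fraction is not None and fraction >= threshold:
            flagged[value_key] = fraction
    return flagged


# Hypothetical reading for a single process; stack metrics are absent on purpose.
process_sample = {
    "procstat.pid": 4242,
    "procstat.memory_rss": 900,
    "procstat.rlimit_memory_rss_soft": 1000,
    "procstat.memory_locked": 0,
    "procstat.rlimit_memory_locked_soft": 65536,
    "procstat.signals_pending": 2,
    "procstat.rlimit_signals_pending_soft": 1000,
}

print(near_limits(process_sample))
```

In practice an unlimited resource is often reported as a very large number, in which case the fraction simply stays close to zero and the metric is never flagged.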

Container

The following per-container resource utilization metrics are collected.

CPU usage metrics

Metric | Description
cpus.limit | The number of CPU shares allocated.
cpus.system_time_secs | Total CPU time spent in kernel mode in seconds.
cpus.throttled_time_secs | Total time, in seconds, that CPU was throttled.
cpus.user_time_secs | Total CPU time spent in user mode in seconds.
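Because the time-based CPU metrics are totals, utilization over an interval comes from the difference between two readings. The sketch below shows one way to do that, assuming the three *_secs metrics are cumulative and that cpus.limit can be read as a number of CPUs; both are assumptions for illustration, and the sample readings are hypothetical.

```python
def container_cpu_usage(earlier, later, elapsed_seconds):
    """Summarise CPU use between two readings of the container CPU metrics."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")

    def delta(name):
        # Growth of a cumulative metric between the two readings.
        return later[name] - earlier[name]

    busy_secs = delta("cpus.system_time_secs") + delta("cpus.user_time_secs")
    cpus_used = busy_secs / elapsed_seconds
    limit = later["cpus.limit"]
    return {
        # Average number of CPUs kept busy over the interval.
        "cpus_used": cpus_used,
        # Share of the allocation that was used (assumes the limit is in CPUs).
        "fraction_of_limit": cpus_used / limit if limit > 0 else None,
        # Share of wall-clock time during which the container was throttled.
        "throttled_fraction": delta("cpus.throttled_time_secs") / elapsed_seconds,
    }


# Hypothetical readings taken 30 seconds apart.
earlier = {
    "cpus.limit": 2.0,
    "cpus.system_time_secs": 10.0,
    "cpus.user_time_secs": 30.0,
    "cpus.throttled_time_secs": 1.0,
}
later = {
    "cpus.limit": 2.0,
    "cpus.system_time_secs": 13.0,
    "cpus.user_time_secs": 45.0,
    "cpus.throttled_time_secs": 4.0,
}

usage = container_cpu_usage(earlier, later, elapsed_seconds=30)
print(usage)
```

A non-zero throttled fraction alongside a fraction of limit well below 1 can indicate bursty workloads that hit the limit only briefly.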

Disk metrics

Metric | Description
disk.limit_bytes | Hard capacity limit for disk in bytes.
disk.used_bytes | Disk capacity used in bytes.

Memory metrics

Metric | Description
mem.limit_bytes | Hard memory limit for a container in bytes.
mem.total_bytes | Total memory of a process in RAM (as opposed to in swap) in bytes.

Network metrics

Metric | Description
net.rx.bytes | Bytes received.
net.rx.dropped | Packets dropped on receive.
net.rx.errors | Errors reported on receive.
net.rx.packets | Packets received.
net.tx.bytes | Bytes sent.
net.tx.dropped | Packets dropped on send.
net.tx.errors | Errors reported on send.
net.tx.packets | Packets sent.

Dimensions

Dimensions are metadata about the metrics. The following table lists the available dimensions and the entities where they appear.

Dimension | Description | Entity
mesos_id | The Mesos ID of the node. | node, container
cluster_id | The ID of the Mesos cluster. | node, container
container_id | The ID of the container. | metric, container
executor_name | The name of the task executor. | metric
framework_name | The name of the framework. | container
hostname | The IP address of the node. | container, node
labels | Key-value pairs describing the metric. | container
task_name | The task name. | container
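Since dimensions are metadata attached to metrics, a common use is to group or filter readings by a dimension value. The following minimal sketch groups readings by any dimension from the table. The record layout (a "name", a "value", and a "dimensions" dictionary) and all sample values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def group_by_dimension(datapoints, dimension):
    """Group datapoints by the value of one dimension, skipping points that lack it."""
    groups = defaultdict(list)
    for point in datapoints:
        key = point["dimensions"].get(dimension)
        if key is not None:
            groups[key].append(point)
    return dict(groups)


# Hypothetical readings: two container metrics and one node metric.
datapoints = [
    {
        "name": "mem.total_bytes",
        "value": 268_435_456,
        "dimensions": {"task_name": "web", "framework_name": "marathon"},
    },
    {
        "name": "cpus.limit",
        "value": 0.5,
        "dimensions": {"task_name": "web", "framework_name": "marathon"},
    },
    {
        "name": "memory.free",
        "value": 1_073_741_824,
        "dimensions": {"hostname": "10.0.0.5"},
    },
]

by_task = group_by_dimension(datapoints, "task_name")
for task, points in by_task.items():
    print(task, [p["name"] for p in points])
```

Node-level readings carry no task_name in this sample, so they are left out of the grouping rather than collected under an empty key.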

For more information on metrics, see the following resource:

  1. Additional Mesos volume and network metrics documentation.