List of Metrics
Alluxio has two types of metrics: cluster-wide aggregated metrics and per-process detailed metrics.
Cluster metrics are collected and calculated by the leading master and displayed in the metrics tab of the web UI. These metrics are designed to provide a snapshot of the cluster state and the overall amount of data and metadata served by Alluxio.
Process metrics are collected by each Alluxio process and exposed in a machine-readable format through any configured sinks. Process metrics are highly detailed and are intended to be consumed by third-party monitoring tools. Users can then view fine-grained dashboards with time-series graphs of each metric, such as data transferred or the number of RPC invocations.
Master node metrics have the following format:
Master.[metricName].[tag1].[tag2]...
Non-master node metrics have the following format:
[processType].[metricName].[tag1].[tag2]...[hostName]
There is generally an Alluxio metric for every RPC invocation, to Alluxio or to the under store.
Tags are additional pieces of metadata for the metric such as user name or under storage location. Tags can be used to further filter or aggregate on various characteristics.
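The two naming formats above can be illustrated with a small sketch. The helper `build_metric_name`, the tag value `User:alice`, and the host name `worker-1` are hypothetical and chosen only for illustration; they are not part of Alluxio.

```python
from typing import Optional, Sequence


def build_metric_name(
    process_type: str,
    metric_name: str,
    tags: Sequence[str] = (),
    host_name: Optional[str] = None,
) -> str:
    """Join the components of a metric name with dots.

    Master metrics carry no host name; other process types append one.
    """
    parts = [process_type, metric_name, *tags]
    if process_type != "Master":
        if host_name is None:
            raise ValueError("non-master metrics need a host name")
        parts.append(host_name)
    return ".".join(parts)


# Hypothetical tag and host values, for illustration only.
master_name = build_metric_name("Master", "CreateFileOps", tags=["User:alice"])
worker_name = build_metric_name("Worker", "BytesReadAlluxio", host_name="worker-1")
print(master_name)  # Master.CreateFileOps.User:alice
print(worker_name)  # Worker.BytesReadAlluxio.worker-1
```

Building names is unambiguous, while splitting a full name back into tags and host name requires knowing the process type first, which is why the sketch only goes in one direction.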
Cluster Metrics
Workers and clients send metrics data to the Alluxio master through heartbeats. The intervals are defined by the properties alluxio.master.worker.heartbeat.interval and alluxio.user.metrics.heartbeat.interval, respectively.
Bytes metrics are values aggregated from workers or clients. Bytes throughput metrics are calculated on the leading master: each throughput value equals the corresponding bytes counter value divided by the metrics record time, and is shown in bytes per minute.
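As a rough sketch of the throughput calculation described above, the function below divides a bytes counter by the elapsed record time and expresses the result in bytes per minute. The function name and the assumption that the record time is measured in milliseconds are illustrative, not taken from Alluxio.

```python
def bytes_per_minute(bytes_counter: int, record_time_ms: int) -> float:
    """Return the counter value divided by the record time, in bytes per minute.

    Assumes (for illustration) that the record time is given in milliseconds.
    """
    if record_time_ms <= 0:
        return 0.0
    minutes = record_time_ms / 60_000
    return bytes_counter / minutes


# Example: 6 GiB counted over a 30-minute record time.
throughput = bytes_per_minute(6 * 1024**3, 30 * 60_000)
print(throughput)
```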
Name | Type | Description |
---|---|---|
Cluster.BytesReadAlluxio | COUNTER | Total number of bytes read from Alluxio storage reported by all workers. This does not include UFS reads. |
Cluster.BytesReadAlluxioThroughput | GAUGE | Bytes read throughput from Alluxio storage by all workers |
Cluster.BytesReadDomain | COUNTER | Total number of bytes read from Alluxio storage via domain socket reported by all workers |
Cluster.BytesReadDomainThroughput | GAUGE | Bytes read throughput from Alluxio storage via domain socket by all workers |
Cluster.BytesReadLocal | COUNTER | Total number of bytes short-circuit read from local storage by all clients |
Cluster.BytesReadLocalThroughput | GAUGE | Bytes throughput short-circuit read from local storage by all clients |
Cluster.BytesReadPerUfs | COUNTER | Total number of bytes read from a specific UFS by all workers |
Cluster.BytesReadUfsAll | COUNTER | Total number of bytes read from all Alluxio UFSes by all workers |
Cluster.BytesReadUfsThroughput | GAUGE | Bytes read throughput from all Alluxio UFSes by all workers |
Cluster.BytesWrittenAlluxio | COUNTER | Total number of bytes written to Alluxio storage in all workers. This does not include UFS writes |
Cluster.BytesWrittenAlluxioThroughput | GAUGE | Bytes write throughput to Alluxio storage in all workers |
Cluster.BytesWrittenDomain | COUNTER | Total number of bytes written to Alluxio storage via domain socket by all workers |
Cluster.BytesWrittenDomainThroughput | GAUGE | Throughput of bytes written to Alluxio storage via domain socket by all workers |
Cluster.BytesWrittenLocal | COUNTER | Total number of bytes short-circuit written to local storage by all clients |
Cluster.BytesWrittenLocalThroughput | GAUGE | Bytes throughput written to local storage by all clients |
Cluster.BytesWrittenPerUfs | COUNTER | Total number of bytes written to a specific Alluxio UFS by all workers |
Cluster.BytesWrittenUfsAll | COUNTER | Total number of bytes written to all Alluxio UFSes by all workers |
Cluster.BytesWrittenUfsThroughput | GAUGE | Bytes write throughput to all Alluxio UFSes by all workers |
Cluster.CapacityFree | GAUGE | Total free bytes on all tiers, on all workers of Alluxio |
Cluster.CapacityTotal | GAUGE | Total capacity (in bytes) on all tiers, on all workers of Alluxio |
Cluster.CapacityUsed | GAUGE | Total used bytes on all tiers, on all workers of Alluxio |
Cluster.RootUfsCapacityFree | GAUGE | Free capacity of the Alluxio root UFS in bytes |
Cluster.RootUfsCapacityTotal | GAUGE | Total capacity of the Alluxio root UFS in bytes |
Cluster.RootUfsCapacityUsed | GAUGE | Used capacity of the Alluxio root UFS in bytes |
Cluster.Workers | GAUGE | Total number of active workers inside the cluster |
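The capacity gauges above can be combined into a single utilization figure. The following is a minimal sketch that assumes the metrics have already been fetched into a plain dictionary keyed by metric name. The snapshot values are made up, and how the snapshot is obtained is outside the scope of the example.

```python
def capacity_utilization(snapshot: dict) -> float:
    """Return the fraction of cluster capacity in use (0.0 to 1.0)."""
    total = snapshot["Cluster.CapacityTotal"]
    used = snapshot["Cluster.CapacityUsed"]
    return used / total if total > 0 else 0.0


# Hypothetical snapshot values in bytes.
snapshot = {
    "Cluster.CapacityTotal": 1000,
    "Cluster.CapacityUsed": 250,
    "Cluster.CapacityFree": 750,
}
utilization = capacity_utilization(snapshot)
print(f"{utilization:.0%}")  # 25%
```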
Master Metrics
Default master metrics:
Name | Type | Description |
---|---|---|
Master.CompleteFileOps | COUNTER | Total number of the CompleteFile operations |
Master.CreateDirectoryOps | COUNTER | Total number of the CreateDirectory operations |
Master.CreateFileOps | COUNTER | Total number of the CreateFile operations |
Master.DeletePathOps | COUNTER | Total number of the Delete operations |
Master.DirectoriesCreated | COUNTER | Total number of successful CreateDirectory operations |
Master.EdgeCacheSize | GAUGE | Total number of edges (inode metadata) cached. The edge cache is responsible for managing the mapping from (parentId, childName) to childId. |
Master.FileBlockInfosGot | COUNTER | Total number of successful GetFileBlockInfo operations |
Master.FileInfosGot | COUNTER | Total number of successful GetFileInfo operations |
Master.FilesCompleted | COUNTER | Total number of successful CompleteFile operations |
Master.FilesCreated | COUNTER | Total number of successful CreateFile operations |
Master.FilesFreed | COUNTER | Total number of successful FreeFile operations |
Master.FilesPersisted | COUNTER | Total number of successfully persisted files |
Master.FilesPinned | GAUGE | Total number of currently pinned files |
Master.FreeFileOps | COUNTER | Total number of FreeFile operations |
Master.GetFileBlockInfoOps | COUNTER | Total number of GetFileBlockInfo operations |
Master.GetFileInfoOps | COUNTER | Total number of the GetFileInfo operations |
Master.GetNewBlockOps | COUNTER | Total number of the GetNewBlock operations |
Master.InodeCacheSize | GAUGE | Total number of inodes (inode metadata) cached |
Master.JournalFlushFailure | COUNTER | Total number of failed journal flushes |
Master.JournalFlushTimer | TIMER | The timer statistics of journal flush |
Master.JournalGainPrimacyTimer | TIMER | The timer statistics of journal gain primacy |
Master.LastBackupEntriesCount | GAUGE | The total number of entries written in the last leading master metadata backup |
Master.LastBackupRestoreCount | GAUGE | The total number of entries restored from backup when a leading master initializes its metadata |
Master.LastBackupRestoreTimeMs | GAUGE | The process time of the last restore from backup |
Master.LastBackupTimeMs | GAUGE | The process time of the last backup |
Master.ListingCacheSize | GAUGE | The size of master listing cache |
Master.MountOps | COUNTER | Total number of Mount operations |
Master.NewBlocksGot | COUNTER | Total number of successful GetNewBlock operations |
Master.PathsDeleted | COUNTER | Total number of successful Delete operations |
Master.PathsMounted | COUNTER | Total number of successful Mount operations |
Master.PathsRenamed | COUNTER | Total number of successful Rename operations |
Master.PathsUnmounted | COUNTER | Total number of successful Unmount operations |
Master.RenamePathOps | COUNTER | Total number of Rename operations |
Master.SetAclOps | COUNTER | Total number of SetAcl operations |
Master.SetAttributeOps | COUNTER | Total number of SetAttribute operations |
Master.TotalPaths | GAUGE | Total number of files and directories in the Alluxio namespace |
Master.UfsJournalFailureRecoverTime | TIMER | The timer statistics of UFS journal failure recovery |
Master.UnmountOps | COUNTER | Total number of Unmount operations |
Dynamically generated master metrics:
Metric Name | Description |
---|---|
Master.CapacityTotalTier | Total capacity in bytes of a given tier of the Alluxio file system |
Master.CapacityUsedTier | Used capacity in bytes of a given tier of the Alluxio file system |
Master.CapacityFreeTier | Free capacity in bytes of a given tier of the Alluxio file system |
Master.UfsSessionCount-Ufs: | The total number of currently open UFS sessions connected to the given UFS |
Master..UFS:.UFS_TYPE:.User: | Details of the UFS RPC operations performed by the current master |
Master.PerUfsOp.UFS: | The aggregated number of UFS operations run on the given UFS by the leading master |
Master. | The duration statistics of RPC calls exposed on the leading master |
Worker Metrics
Default worker metrics:
Name | Type | Description |
---|---|---|
Worker.AsyncCacheDuplicateRequests | COUNTER | Total number of duplicate async cache requests received by this worker |
Worker.AsyncCacheFailedBlocks | COUNTER | Total number of blocks that failed to be async cached in this worker |
Worker.AsyncCacheRemoteBlocks | COUNTER | Total number of blocks that need to be async cached from remote source |
Worker.AsyncCacheRequests | COUNTER | Total number of async cache requests received by this worker |
Worker.AsyncCacheSucceededBlocks | COUNTER | Total number of blocks successfully async cached in this worker |
Worker.AsyncCacheUfsBlocks | COUNTER | Total number of blocks that need to be async cached from local source |
Worker.BlocksAccessed | COUNTER | Total number of times any one of the blocks in this worker is accessed. |
Worker.BlocksCached | GAUGE | Total number of blocks used for caching data in an Alluxio worker |
Worker.BlocksCancelled | COUNTER | Total number of aborted temporary blocks in this worker. |
Worker.BlocksDeleted | COUNTER | Total number of deleted blocks in this worker by external requests. |
Worker.BlocksEvicted | COUNTER | Total number of evicted blocks in this worker. |
Worker.BlocksLost | COUNTER | Total number of lost blocks in this worker. |
Worker.BlocksPromoted | COUNTER | Total number of times any one of the blocks in this worker moved to a new tier. |
Worker.BytesReadAlluxio | COUNTER | Total number of bytes read from Alluxio storage managed by this worker. This does not include UFS reads. |
Worker.BytesReadAlluxioThroughput | METER | Bytes read throughput from Alluxio storage by this worker |
Worker.BytesReadDomain | COUNTER | Total number of bytes read from Alluxio storage via domain socket by this worker |
Worker.BytesReadDomainThroughput | METER | Bytes read throughput from Alluxio storage via domain socket by this worker |
Worker.BytesReadPerUfs | COUNTER | Total number of bytes read from a specific Alluxio UFS by this worker |
Worker.BytesReadUfsThroughput | METER | Bytes read throughput from all Alluxio UFSes by this worker |
Worker.BytesWrittenAlluxio | COUNTER | Total number of bytes written to Alluxio storage by this worker. This does not include UFS writes |
Worker.BytesWrittenAlluxioThroughput | METER | Bytes write throughput to Alluxio storage by this worker |
Worker.BytesWrittenDomain | COUNTER | Total number of bytes written to Alluxio storage via domain socket by this worker |
Worker.BytesWrittenDomainThroughput | METER | Throughput of bytes written to Alluxio storage via domain socket by this worker |
Worker.BytesWrittenPerUfs | COUNTER | Total number of bytes written to a specific Alluxio UFS by this worker |
Worker.BytesWrittenUfsThroughput | METER | Bytes write throughput to all Alluxio UFSes by this worker |
Worker.CapacityFree | GAUGE | Total free bytes on all tiers of a specific Alluxio worker |
Worker.CapacityTotal | GAUGE | Total capacity (in bytes) on all tiers of a specific Alluxio worker |
Worker.CapacityUsed | GAUGE | Total used bytes on all tiers of a specific Alluxio worker |
Dynamically generated worker metrics:
Metric Name | Description |
---|---|
Worker.UfsSessionCount-Ufs: | The total number of currently open UFS sessions connected to the given UFS |
Worker. | The duration statistics of RPC calls exposed on workers |
Client Metrics
Each client metric is recorded with the client's local hostname, or with alluxio.user.app.id if that property is configured. If alluxio.user.app.id is configured, multiple clients can be combined into a logical application.
Name | Type | Description |
---|---|---|
Client.BytesReadLocal | COUNTER | Total number of bytes short-circuit read from local storage by this client |
Client.BytesReadLocalThroughput | METER | Bytes throughput short-circuit read from local storage by this client |
Client.BytesWrittenLocal | COUNTER | Total number of bytes short-circuit written to local storage by this client |
Client.BytesWrittenLocalThroughput | METER | Bytes throughput short-circuit written to local storage by this client |
Client.BytesWrittenUfs | COUNTER | Total number of bytes written to Alluxio UFS by this client |
Client.CacheBytesEvicted | METER | Total number of bytes evicted from the client cache. |
Client.CacheBytesReadCache | METER | Total number of bytes read from the client cache. |
Client.CacheBytesReadExternal | METER | Total number of bytes read from external storage due to a cache miss on the client cache. |
Client.CacheBytesRequestedExternal | METER | Total number of bytes the user requested to read which resulted in a cache miss. This number may be smaller than Client.CacheBytesReadExternal due to chunk reads. |
Client.CacheBytesWrittenCache | METER | Total number of bytes written to the client cache. |
Client.CacheCleanupGetErrors | COUNTER | Number of failures when cleaning up a failed cache read. |
Client.CacheCleanupPutErrors | COUNTER | Number of failures when cleaning up a failed cache write. |
Client.CacheCreateErrors | COUNTER | Number of failures when creating a cache in the client cache. |
Client.CacheDeleteErrors | COUNTER | Number of failures when deleting cached data in the client cache. |
Client.CacheDeleteNonExistingPageErrors | COUNTER | Number of failures when deleting pages due to absence. |
Client.CacheDeleteNotReadyErrors | COUNTER | Number of failures when cache is not ready to delete pages. |
Client.CacheDeleteStoreDeleteErrors | COUNTER | Number of failures when deleting pages due to failed delete in page stores. |
Client.CacheGetErrors | COUNTER | Number of failures when getting cached data in the client cache. |
Client.CacheGetNotReadyErrors | COUNTER | Number of failures when cache is not ready to get pages. |
Client.CacheGetStoreReadErrors | COUNTER | Number of failures when getting cached data in the client cache due to failed read from page stores. |
Client.CacheHitRate | GAUGE | Cache hit rate: (# bytes read from cache) / (# bytes requested). |
Client.CachePages | COUNTER | Total number of pages in the client cache. |
Client.CachePagesEvicted | METER | Total number of pages evicted from the client cache. |
Client.CachePutAsyncRejectionErrors | COUNTER | Number of failures when putting cached data in the client cache due to failed injection to async write queue. |
Client.CachePutBenignRacingErrors | COUNTER | Number of failures when adding pages due to racing eviction. This error is benign. |
Client.CachePutErrors | COUNTER | Number of failures when putting cached data in the client cache. |
Client.CachePutEvictionErrors | COUNTER | Number of failures when putting cached data in the client cache due to failed eviction. |
Client.CachePutNotReadyErrors | COUNTER | Number of failures when cache is not ready to add pages. |
Client.CachePutStoreDeleteErrors | COUNTER | Number of failures when putting cached data in the client cache due to failed deletes in page store. |
Client.CachePutStoreWriteErrors | COUNTER | Number of failures when putting cached data in the client cache due to failed writes to page store. |
Client.CacheSpaceAvailable | GAUGE | Amount of bytes available in the client cache. |
Client.CacheSpaceUsed | GAUGE | Amount of bytes used by the client cache. |
Client.CacheSpaceUsedCount | COUNTER | Amount of bytes used by the client cache as a counter. |
Client.CacheState | COUNTER | State of the cache: 0 (NOT_IN_USE), 1 (READ_ONLY) and 2 (READ_WRITE) |
Client.CacheUnremovableFiles | COUNTER | Number of unusable bytes managed by the client cache. |
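The hit-rate formula given for Client.CacheHitRate can be sketched as follows. This example assumes that the number of bytes requested is the sum of Client.CacheBytesReadCache and Client.CacheBytesRequestedExternal. That is one plausible reading of the table, not a statement of how Alluxio computes the gauge internally.

```python
def cache_hit_rate(bytes_read_cache: int, bytes_requested_external: int) -> float:
    """Return (# bytes read from cache) / (# bytes requested).

    Assumes requested bytes = cache hits + bytes requested on a cache miss.
    """
    requested = bytes_read_cache + bytes_requested_external
    if requested == 0:
        return 0.0  # nothing requested yet, so report no hits
    return bytes_read_cache / requested


# Hypothetical counter values: 750 bytes served from cache, 250 bytes missed.
rate = cache_hit_rate(750, 250)
print(rate)  # 0.75
```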
Process Common Metrics
The following metrics are collected on each instance (Master, Worker or Client).
JVM Attributes
Metric Name | Description |
---|---|
name | The name of the JVM |
uptime | The uptime of the JVM |
vendor | The current JVM vendor |
Garbage Collector Statistics
Metric Name | Description |
---|---|
PS-MarkSweep.count | Total number of mark-and-sweep collections |
PS-MarkSweep.time | The time spent on mark-and-sweep collections |
PS-Scavenge.count | Total number of scavenge collections |
PS-Scavenge.time | The time spent on scavenge collections |
Memory Usage
Alluxio provides overall and detailed memory usage information. Detailed memory usage information of code cache, compressed class space, metaspace, PS Eden space, PS old gen, and PS survivor space is collected in each process.
A subset of the memory usage metrics is listed below:
Metric Name | Description |
---|---|
total.committed | The amount of memory in bytes that is guaranteed to be available for use by the JVM |
total.init | The amount of memory in bytes that is available for use by the JVM at initialization |
total.max | The maximum amount of memory in bytes that is available for use by the JVM |
total.used | The amount of memory currently used in bytes |
heap.committed | The amount of memory from heap area guaranteed to be available |
heap.init | The amount of memory from heap area available at initialization |
heap.max | The maximum amount of memory from heap area that is available |
heap.usage | The amount of memory from heap area currently used in GB |
heap.used | The amount of memory from heap area that has been used |
pools.Code-Cache.used | Used memory of collection usage from the pool from which memory is used for compilation and storage of native code |
pools.Compressed-Class-Space.used | Used memory of collection usage from the pool from which memory is used for class metadata |
pools.PS-Eden-Space.used | Used memory of collection usage from the pool from which memory is initially allocated for most objects |
pools.PS-Survivor-Space.used | Used memory of collection usage from the pool containing objects that have survived the garbage collection of the Eden space |