Prometheus Monitoring
Prometheus is a widely used tool for monitoring and alerting across a wide variety of systems. Dask.distributed exposes scheduler and worker metrics in the Prometheus text-based format. Metrics are available at http://scheduler-address:8787/metrics.
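As a minimal sketch of reading that endpoint directly, the snippet below fetches the raw metrics text using only the Python standard library. The helper names `metrics_url` and `fetch_metrics` are hypothetical, and the host `scheduler-address` and port 8787 come from the URL above; substitute the address of your own scheduler.

```python
from urllib.request import urlopen


def metrics_url(host: str, port: int = 8787) -> str:
    """Build the metrics URL for a given host (hypothetical helper)."""
    return f"http://{host}:{port}/metrics"


def fetch_metrics(host: str, port: int = 8787, timeout: float = 5.0) -> str:
    """Return the raw Prometheus text served at the metrics endpoint.

    Assumes the endpoint is reachable from where this code runs.
    """
    with urlopen(metrics_url(host, port), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # "scheduler-address" is the placeholder host used in this document.
    print(fetch_metrics("scheduler-address"))
```

In a typical deployment a Prometheus server scrapes this URL on a schedule; fetching it by hand like this is mainly useful for checking that the endpoint responds.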
The following metrics are available:
Metric name | Description | Scheduler | Worker |
---|---|---|---|
python_gc_objects_collected_total | Objects collected during GC. | Yes | Yes |
python_gc_objects_uncollectable_total | Uncollectable objects found during GC. | Yes | Yes |
python_gc_collections_total | Number of times this generation was collected. | Yes | Yes |
python_info | Python platform information. | Yes | Yes |
dask_scheduler_workers | Number of workers connected. | Yes | |
dask_scheduler_clients | Number of clients connected. | Yes | |
dask_scheduler_tasks | Number of tasks at scheduler. | Yes | |
dask_worker_tasks | Number of tasks at worker. | | Yes |
dask_worker_connections | Number of task connections to other workers. | | Yes |
dask_worker_threads | Number of worker threads. | | Yes |
dask_worker_latency_seconds | Latency of worker connection. | | Yes |
dask_worker_tick_duration_median_seconds | Median tick duration at worker. | | Yes |
dask_worker_task_duration_median_seconds | Median task runtime at worker. | | Yes |
dask_worker_transfer_bandwidth_median_bytes | Bandwidth for transfer at worker in bytes. | | Yes |
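
To illustrate how the metrics in this table look on the wire, here is a rough sketch of a parser for the Prometheus text-based format. It is a simplified reading of the format and not a replacement for a real Prometheus client. The function `parse_metrics` is hypothetical, and `SAMPLE` is made-up illustrative text with invented values, not output captured from a running cluster.

```python
def parse_metrics(text: str) -> list[tuple[str, str, float]]:
    """Parse Prometheus text format into (name, labels, value) tuples.

    Simplified sketch: labels are kept as a raw string, and comment
    lines (# HELP, # TYPE) are skipped.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # Form: name{label="value",...} value
            name, rest = line.split("{", 1)
            labels, rest = rest.rsplit("}", 1)
        else:
            # Form: name value
            name, rest = line.split(None, 1)
            labels = ""
        # An optional timestamp may follow the value; ignore it.
        value = float(rest.split()[0])
        samples.append((name, labels, value))
    return samples


# Illustrative input only; the values are invented.
SAMPLE = """\
# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 128.0
dask_scheduler_clients 1.0
"""

if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels, value)
```

Keeping the labels as a raw string is a deliberate shortcut here. A fuller parser would split them into key-value pairs and handle escaped characters inside label values.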