Web Interface
Information about the current state of the network helps to track progress, identify performance issues, and debug failures.
Dask.distributed includes a web interface to help deliver this information over a normal web page in real time. This web interface is launched by default wherever the scheduler is launched if the scheduler machine has Bokeh installed (conda install bokeh -c bokeh).
These diagnostic pages are:
- Main Scheduler pages at http://scheduler-address:8787. These pages, particularly the /status page, are the main pages that most people associate with Dask. These pages are served from a separate standalone Bokeh server application running in a separate process.

The available pages are http://scheduler-address:8787/<page>/ where <page> is one of:

- status: a stream of recently run tasks, progress bars, resource use
- tasks: a larger stream of the last 100k tasks
- workers: basic information about workers and their current load
- health: basic health check, returns ok if service is running
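As a quick sketch of how these endpoints might be checked programmatically (the helper names below are hypothetical, and the scheduler address must be replaced with a real one):

```python
from urllib.request import urlopen

def dashboard_url(scheduler_address, page="status", port=8787):
    # Hypothetical helper: build the URL for one of the diagnostic pages
    return "http://{}:{}/{}".format(scheduler_address, port, page)

def is_healthy(scheduler_address, port=8787):
    # The /health page returns "ok" when the service is running
    with urlopen(dashboard_url(scheduler_address, "health", port)) as resp:
        return resp.read().decode().strip() == "ok"
```

For example, is_healthy('127.0.0.1') returns True when a scheduler with its diagnostic server is running locally on the default port.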
Plots
Example Computation
The following plots show a trace of the following computation:
```python
from distributed import Client
from time import sleep
import random

def inc(x):
    sleep(random.random() / 10)
    return x + 1

def dec(x):
    sleep(random.random() / 10)
    return x - 1

def add(x, y):
    sleep(random.random() / 10)
    return x + y

client = Client('127.0.0.1:8786')

incs = client.map(inc, range(100))
decs = client.map(dec, range(100))
adds = client.map(add, incs, decs)
total = client.submit(sum, adds)

del incs, decs, adds
total.result()
```
Progress
The interface shows the progress of the various computations as well as the exact number completed. Each bar is assigned a color according to the function being run. Each bar has a few components. On the left the lighter shade is the number of tasks that have both completed and have been released from memory. The darker shade to the right corresponds to the tasks that are completed and whose data still reside in memory. If errors occur then they appear as a black block to the right.
Typical computations may involve dozens of kinds of functions. We handle this visually with the following approaches:
- Functions are ordered by the number of total tasks
- The colors are assigned in a round-robin fashion from a standard palette
- The progress bars shrink horizontally to make space for more functions
- Only the largest functions (in terms of number of tasks) are displayed

Counts of tasks processing, waiting for dependencies, etc. are displayed in the title bar.
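The ordering and coloring rules above can be sketched in a few lines (the palette and helper below are illustrative, not the interface's actual implementation):

```python
from itertools import cycle

# Illustrative palette; the real interface draws from its own standard palette
PALETTE = ["#440154", "#21918c", "#fde725", "#5ec962"]

def assign_colors(task_counts):
    # Order functions by total task count (largest first), then hand out
    # colors round-robin from the palette
    ordered = sorted(task_counts, key=task_counts.get, reverse=True)
    return dict(zip(ordered, cycle(PALETTE)))
```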
Memory Use
The interface shows the relative memory use of each function with a horizontal bar sorted by function name. The title shows the total number of bytes in use. Hovering over any bar tells you the specific function and how many bytes its results are actively taking up in memory. This does not count data that has been released.
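One way to picture this aggregation, as a rough sketch (the data layout and the sys.getsizeof measurement are simplifying assumptions, not how the scheduler actually tracks bytes):

```python
import sys
from collections import defaultdict

def memory_by_function(results):
    # results maps (function_name, task_key) -> in-memory value
    # (an illustrative layout, not the scheduler's internal one)
    totals = defaultdict(int)
    for (func_name, _key), value in results.items():
        totals[func_name] += sys.getsizeof(value)
    return dict(totals)
```

Released data would simply no longer appear in `results`, so it drops out of the totals, matching the behavior described above.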
Task Stream
The task stream plot shows when tasks complete on which workers. Worker cores are on the y-axis and time is on the x-axis. As a worker completes a task its start and end times are recorded and a rectangle is added to this plot accordingly. The colors signify the following:

- Serialization (gray)
- Communication between workers (red)
- Disk I/O (orange)
- Error (black)
- Execution times (colored by task: purple, green, yellow, etc.)

If data transfer occurs between workers, a red bar appears preceding the task bar showing the duration of the transfer. If an error occurs then a black bar replaces the normal color. This plot shows the last 1000 tasks. It resets if there is a delay greater than 10 seconds.
For a full history of the last 100,000 tasks see the tasks/ page.
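As a sketch of the reset behavior described above (a simplified model with hypothetical record tuples, not the interface's real data structure):

```python
def task_stream_rects(records, reset_gap=10.0):
    # records: iterable of (worker, start, stop) tuples (illustrative layout)
    # Build one rectangle per task; wipe the plot when the idle gap between
    # consecutive tasks exceeds reset_gap seconds
    rects = []
    last_stop = None
    for worker, start, stop in sorted(records, key=lambda r: r[1]):
        if last_stop is not None and start - last_stop > reset_gap:
            rects.clear()  # plot resets after a long delay
        rects.append({"y": worker, "left": start, "right": stop})
        last_stop = stop
    return rects
```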
Resources
The resources plot shows the average CPU and memory use over time as well as average network traffic. More detailed information on a per-worker basis is available on the workers/ page.
Per-worker resources
The workers/ page shows per-worker resources, the main ones being CPU and memory use. Custom metrics can be registered and displayed on this page. Here is an example showing how to display GPU utilization and GPU memory use:
```python
import subprocess

def nvidia_data(name):
    def dask_function(dask_worker):
        cmd = 'nvidia-smi --query-gpu={} --format=csv,noheader'.format(name)
        result = subprocess.check_output(cmd.split())
        return result.strip().decode()
    return dask_function

def register_metrics(dask_worker):
    for name in ['utilization.gpu', 'utilization.memory']:
        dask_worker.metrics[name] = nvidia_data(name)

client.run(register_metrics)
```
Connecting to Web Interface
Default
By default, dask-scheduler prints out the address of the web interface:
```
INFO - Bokeh UI at: http://10.129.39.91:8787/status
...
INFO - Starting Bokeh server on port 8787 with applications at paths ['/status', '/tasks']
```
The machine hosting the scheduler runs an HTTP server serving at that address.
Troubleshooting
Some clusters restrict the ports that are visible to the outside world. These ports may include the default port for the web interface, 8787. There are a few ways to handle this:
- Open port 8787 to the outside world. Often this involves asking your cluster administrator.
- Use a different port that is publicly accessible using the --dashboard-address :8787 option on the dask-scheduler command.
- Use fancier techniques, like port forwarding.

Running distributed on a remote machine can cause issues with viewing the web UI; this depends on the remote machine's network configuration.
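As a sketch of the second option, the alternate port is passed when starting the scheduler (the port number here is an arbitrary example):

```shell
# Serve the dashboard on a publicly accessible port of your choosing
dask-scheduler --dashboard-address :8899
```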
Port Forwarding
If you have SSH access then one way to gain access to a blocked port is throughSSH port forwarding. A typical use case looks like the following:
```
local$ ssh -L 8000:localhost:8787 [email protected]
remote$ dask-scheduler  # now, the web UI is visible at localhost:8000
remote$ # continue to set up dask if needed -- add workers, etc.
```
It is then possible to go to localhost:8000 and see the Dask Web UI. This same approach is not specific to dask.distributed, but can be used by any service that operates over a network, such as Jupyter notebooks. For example, if we chose to do this we could forward port 8888 (the default Jupyter port) to port 8001 with ssh -L 8001:localhost:8888 user@remote.