Overview

This document gives examples and pointers on how to experiment with and extend
Dopamine.

You can find the documentation for each module in our codebase in our
API documentation.

File organization

Dopamine is organized as follows:

  • agents
    contains agent implementations.
  • atari
    contains Atari-specific code, including code to run experiments and
    preprocessing code.
  • common
    contains additional helper functionality, including logging and
    checkpointing.
  • replay_memory
    contains the replay memory schemes used in Dopamine.
  • colab
    contains code used to inspect the results of experiments, as well as example
    colab notebooks.
  • tests
    contains all our test files.

Configuring agents

The whole of Dopamine is easily configured using the
gin configuration framework.

We provide a number of configuration files for each of the agents. The main
configuration file for each agent corresponds to an “apples to apples”
comparison, where hyperparameters have been selected to give a standardized
performance comparison between agents. These are

More details on the exact choices behind these parameters are given in our
baselines page.

We also provide configuration files corresponding to settings previously used in
the literature. These are

All of these use the deterministic version of the Arcade Learning Environment
(ALE), and slightly different hyperparameters.

Checkpointing and logging

Dopamine provides basic functionality for performing experiments. This
functionality can be broken down into two main components: checkpointing and
logging. Both components depend on the command-line parameter base_dir,
which informs Dopamine of where it should store experimental data.

Checkpointing

By default, Dopamine will save an experiment checkpoint every iteration: one
training and one evaluation phase, following a standard set by Mnih et al.
Checkpoints are saved in the checkpoints subdirectory under base_dir. At a
high level, the following are checkpointed:

If you’re curious, the checkpointing code itself is in
dopamine/common/checkpointer.py.

Logging

At the end of each iteration, Dopamine also records the agent’s performance,
both during training and (if enabled) during an optional evaluation phase. The
log files are written from
dopamine/atari/run_experiment.py,
and more specifically by
dopamine/common/logger.py.
They are pickle files containing a dictionary that maps iteration keys
(e.g., "iteration_47") to dictionaries containing data.
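
As a rough illustration of the log structure described above, the following
sketch writes a small synthetic pickle file and reads it back. The file name
example_log.pkl, the statistic name train_episode_returns, and both helper
functions are assumptions made for this example only; they are not taken from
the Dopamine code, and the data is made up.

```python
import os
import pickle
import tempfile

# Hypothetical statistic name, used only for this sketch.
STAT_NAME = "train_episode_returns"


def write_fake_log(path, num_iterations):
    """Write a pickle file mapping "iteration_<i>" keys to dictionaries."""
    data = {
        "iteration_{}".format(i): {STAT_NAME: [float(i), float(i) + 1.0]}
        for i in range(num_iterations)
    }
    with open(path, "wb") as f:
        pickle.dump(data, f)


def load_mean_returns(path, stat_name=STAT_NAME):
    """Return (iteration_number, mean statistic) pairs sorted by iteration."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    results = []
    for key, stats in data.items():
        iteration = int(key.split("_")[-1])  # "iteration_47" -> 47
        values = stats[stat_name]
        results.append((iteration, sum(values) / len(values)))
    return sorted(results)


with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, "example_log.pkl")  # hypothetical name
    write_fake_log(log_path, num_iterations=3)
    mean_returns = load_mean_returns(log_path)

for iteration, mean_return in mean_returns:
    print(iteration, mean_return)
```

Only unpickle log files that you trust, since loading a pickle can execute
arbitrary code.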

A simple way to read log data from multiple experiments is to use the provided
read_experiment
method in
colab/utils.py.

We provide a
colab
to illustrate how you can load the statistics from an experiment and plot them
against our provided baseline runs.

Modifying and extending agents

Dopamine is designed to make algorithmic research simple. With this in mind, we
decided to keep a relatively flat class hierarchy, with no abstract base class;
we’ve found this sufficient for our research purposes, with the added benefits
of simplicity and ease of use. To begin, we recommend modifying the agent code
directly to suit your research purposes.

We provide a
colab
where we illustrate how one can extend the DQN agent, or create a new agent from
scratch, and then plot the experimental results against our provided baselines.

DQN

The DQN agent is contained in two files:

The agent class defines the DQN network, the update rule, and also the basic
operations of an RL agent (epsilon-greedy action selection, storing transitions,
episode bookkeeping, etc.). For example, the Q-Learning update rule used in DQN
is defined in two methods, _build_target_q_op and _build_train_op.
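
For readers unfamiliar with the update rule, here is a plain-Python sketch of
the standard one-step Q-learning target. It is not Dopamine's TensorFlow
implementation; the function name q_learning_targets and its arguments are
hypothetical and chosen only for illustration.

```python
def q_learning_targets(rewards, next_q_values, terminals, gamma=0.99):
    """Compute one-step Q-learning targets for a batch of transitions.

    rewards:       list of immediate rewards, one per transition.
    next_q_values: list of per-action Q-value lists for the next states,
                   as estimated by a target network.
    terminals:     list of booleans, True if the transition ended the episode.
    gamma:         discount factor.
    """
    targets = []
    for reward, q_next, done in zip(rewards, next_q_values, terminals):
        # No bootstrapping from the next state once the episode has ended.
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(reward + bootstrap)
    return targets


targets = q_learning_targets(
    rewards=[1.0, 0.0],
    next_q_values=[[0.5, 2.0], [3.0, 1.0]],
    terminals=[False, True],
    gamma=0.5,
)
print(targets)
```

The training step then moves the predicted Q-value of the action that was
taken toward this target by minimizing a loss between the two.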

Rainbow and C51

The Rainbow agent is contained in two files:

The C51 agent is a specific parametrization of the Rainbow agent, where
update_horizon (the n in n-step update) is set to 1 and a uniform replay
scheme is used.
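
To make the role of update_horizon concrete, the sketch below computes a
generic n-step target in plain Python. The function n_step_target and its
bootstrap_value argument are hypothetical names for this example; the code
shows the textbook n-step return, not the agent's distributional update.

```python
def n_step_target(rewards, bootstrap_value, gamma, update_horizon):
    """Discounted sum of the first n rewards plus a discounted bootstrap.

    rewards:         rewards observed after the current state, in order.
    bootstrap_value: value estimate for the state reached after n steps.
    gamma:           discount factor.
    update_horizon:  the n in the n-step update.
    """
    n = min(update_horizon, len(rewards))
    discounted_rewards = sum((gamma ** k) * rewards[k] for k in range(n))
    return discounted_rewards + (gamma ** n) * bootstrap_value


rewards = [1.0, 2.0, 4.0]

# update_horizon = 1 gives the usual one-step target.
one_step = n_step_target(rewards, bootstrap_value=8.0, gamma=0.5,
                         update_horizon=1)

# update_horizon = 3 accumulates three rewards before bootstrapping.
three_step = n_step_target(rewards, bootstrap_value=8.0, gamma=0.5,
                           update_horizon=3)

print(one_step, three_step)
```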

Implicit quantile networks (IQN)

The IQN agent is defined by one additional file:

Downloads

We provide a series of files for all 4 agents on all 60 games. These are all
*.tar.gz files which you will need to uncompress:

  • The raw logs are available
    here
    • You can view this
      colab
      for instructions on how to load and visualize them.
  • The compiled pickle files are available
    here
    • We make use of these compiled pickle files in both
      agents
      and the
      statistics
      colabs.
  • The Tensorboard event files are available
    here
    • We provide a
      colab
      where you can start Tensorboard directly from the colab using ngrok.
      In the provided example your Tensorboard will look something like this:

[Figure 1: Tensorboard screenshot]


    • You can also view these with Tensorboard on your machine. For instance,
      after uncompressing the files you can run:
      ```
      tensorboard --logdir c51/Asterix/
      ```
      to display the training runs for C51 on Asterix:

[Figure 2: Tensorboard showing the training runs for C51 on Asterix]


  • The TensorFlow checkpoint files for 5 independent runs of the 4 agents on
    all 60 games are available. The format for each of the files is:
    https://storage.cloud.google.com/download-dopamine-rl/lucid/${AGENT}/${GAME}/${RUN}/tf_ckpt-199.${SUFFIX},
    where:
    • AGENT can be “dqn”, “c51”, “rainbow”, or “iqn”.
    • GAME can be any of the 60 games.
    • RUN can be 1, 2, 3, 4, or 5.
    • SUFFIX can be one of data-00000-of-00001, index, or meta.
  • You can also download all of these as a single .tar.gz file. Note: these
    files are quite large, over 15 GB each.
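
The URL format above can be expanded programmatically. The following is a
minimal sketch using only the values listed in this section; the helper
checkpoint_urls is a hypothetical name, and the game name is not validated
because the full list of 60 games is not given here.

```python
URL_TEMPLATE = (
    "https://storage.cloud.google.com/download-dopamine-rl/lucid/"
    "{agent}/{game}/{run}/tf_ckpt-199.{suffix}"
)
AGENTS = ("dqn", "c51", "rainbow", "iqn")
RUNS = (1, 2, 3, 4, 5)
SUFFIXES = ("data-00000-of-00001", "index", "meta")


def checkpoint_urls(agent, game, run):
    """Return the three checkpoint file URLs for one agent, game, and run."""
    if agent not in AGENTS:
        raise ValueError("Unknown agent: {!r}".format(agent))
    if run not in RUNS:
        raise ValueError("Run must be one of {}".format(RUNS))
    return [
        URL_TEMPLATE.format(agent=agent, game=game, run=run, suffix=suffix)
        for suffix in SUFFIXES
    ]


urls = checkpoint_urls("c51", "Asterix", 1)
for url in urls:
    print(url)
```

All three files for a given run are needed to restore a checkpoint, so it is
simplest to download them into the same directory.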