rainbow_agent.RainbowAgent
Class RainbowAgent
Inherits From: DQNAgent
A compact implementation of a simplified Rainbow agent.
Methods
__init__
__init__(
    *args,
    **kwargs
)
Initializes the agent and constructs the components of its graph.
Args:
  sess: tf.Session, for executing ops.
  num_actions: int, number of actions the agent can take at any state.
  num_atoms: int, the number of buckets of the value function distribution.
  vmax: float, the value distribution support is [-vmax, vmax].
  gamma: float, discount factor with the usual RL meaning.
  update_horizon: int, horizon at which updates are performed, the ‘n’ in n-step update.
  min_replay_history: int, number of transitions that should be experienced before the agent begins training its value function.
  update_period: int, period between DQN updates.
  target_update_period: int, update period for the target network.
  epsilon_fn: function expecting 4 parameters: (decay_period, step, warmup_steps, epsilon). This function should return the epsilon value used for exploration during training.
  epsilon_train: float, the value to which the agent’s epsilon is eventually decayed during training.
  epsilon_eval: float, epsilon used when evaluating the agent.
  epsilon_decay_period: int, length of the epsilon decay schedule.
  replay_scheme: str, ‘prioritized’ or ‘uniform’, the sampling scheme of the replay memory.
  tf_device: str, TensorFlow device on which the agent’s graph is executed.
  use_staging: bool, when True use a staging area to prefetch the next training batch, speeding training up by about 30%.
  optimizer: tf.train.Optimizer, for training the value function.
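For orientation, a rough construction sketch in TF1 style follows. The import path and keyword values are assumptions based on the parameters listed above; in a typical Dopamine setup most of these arguments are supplied through gin configuration rather than passed explicitly.

    import tensorflow as tf
    from dopamine.agents.rainbow import rainbow_agent  # assumed module path

    sess = tf.Session()
    agent = rainbow_agent.RainbowAgent(
        sess=sess,
        num_actions=4,                # e.g. an environment with 4 discrete actions
        replay_scheme='prioritized',  # or 'uniform'
        tf_device='/cpu:*')           # build the agent's graph on CPU
    sess.run(tf.global_variables_initializer())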
begin_episode
begin_episode(observation)
Returns the agent’s first action for this episode.
Args:
  observation: numpy array, the environment’s initial observation.
Returns:
int, the selected action.
bundle_and_checkpoint
bundle_and_checkpoint(
    checkpoint_dir,
    iteration_number
)
Returns a self-contained bundle of the agent’s state.
This is used for checkpointing. It will return a dictionary containing all
non-TensorFlow objects (to be saved into a file by the caller), and it saves all
TensorFlow objects into a checkpoint file.
Args:
  checkpoint_dir: str, directory where TensorFlow objects will be saved.
  iteration_number: int, iteration number to use for naming the checkpoint file.
Returns:
A dict containing additional Python objects to be checkpointed by the
experiment. If the checkpoint directory does not exist, returns None.
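A hypothetical saving snippet is sketched below; the directory name, iteration number, and the pickle-based handling of the returned dictionary are illustrative assumptions, not Dopamine’s actual checkpointer.

    import os
    import pickle

    checkpoint_dir = '/tmp/rainbow_ckpt'   # assumed location
    iteration_number = 100                 # assumed iteration
    bundle = agent.bundle_and_checkpoint(checkpoint_dir, iteration_number)
    if bundle is not None:
        # The caller persists the returned Python objects itself.
        path = os.path.join(checkpoint_dir, 'bundle_%d.pkl' % iteration_number)
        with open(path, 'wb') as f:
            pickle.dump(bundle, f)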
end_episode
end_episode(reward)
Signals the end of the episode to the agent.
We store the observation of the current time step, which is the last observation
of the episode.
Args:
  reward: float, the last reward from the environment.
step
step(
    reward,
    observation
)
Records the most recent transition and returns the agent’s next action.
We store the observation of the last time step since we want to store it with
the reward.
Args:
  reward: float, the reward received from the agent’s most recent action.
  observation: numpy array, the most recent observation.
Returns:
int, the selected action.
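Taken together with begin_episode and end_episode above, a single episode of interaction might look like the following sketch, assuming a Gym-style env and the agent constructed earlier; the reward and termination handling is illustrative.

    observation = env.reset()
    action = agent.begin_episode(observation)
    done = False
    while not done:
        observation, reward, done, _ = env.step(action)
        if done:
            agent.end_episode(reward)   # store the terminal transition
        else:
            action = agent.step(reward, observation)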
unbundle
unbundle(
    checkpoint_dir,
    iteration_number,
    bundle_dictionary
)
Restores the agent from a checkpoint.
Restores the agent’s Python objects to those specified in bundle_dictionary, and
restores the TensorFlow objects to those specified in checkpoint_dir. If
checkpoint_dir does not exist, the agent’s state is left unchanged.
Args:
  checkpoint_dir: str, path to the checkpoint saved by tf.train.Saver.
  iteration_number: int, checkpoint version, used when restoring the replay buffer.
  bundle_dictionary: dict, containing additional Python objects owned by the agent.
Returns:
bool, True if unbundling was successful.
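Mirroring the hypothetical saving snippet above, a restore might look like this sketch; the file naming and pickle usage are the same assumptions.

    path = os.path.join(checkpoint_dir, 'bundle_%d.pkl' % iteration_number)
    with open(path, 'rb') as f:
        bundle = pickle.load(f)
    if not agent.unbundle(checkpoint_dir, iteration_number, bundle):
        print('Checkpoint not found; keeping the freshly initialized agent.')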