implicit_quantile_agent.ImplicitQuantileAgent
Class ImplicitQuantileAgent
Inherits From: RainbowAgent
An extension of Rainbow to perform implicit quantile regression.
Methods
__init__
__init__(
*args,
**kwargs
)
Initializes the agent and constructs the Graph.
Most of this constructor’s parameters are IQN-specific hyperparameters whose
values are taken from Dabney et al. (2018).
Args:
sess
: tf.Session object for running associated ops.
num_actions
: int, number of actions the agent can take at any state.
kappa
: float, Huber loss cutoff.
num_tau_samples
: int, number of online quantile samples for loss estimation.
num_tau_prime_samples
: int, number of target quantile samples for loss estimation.
num_quantile_samples
: int, number of quantile samples for computing Q-values.
quantile_embedding_dim
: int, embedding dimension for the quantile input.
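The quantile_embedding_dim parameter controls the cosine embedding that IQN applies to each sampled quantile fraction τ before mixing it into the network. As a rough illustration (a NumPy sketch of the embedding from Dabney et al. (2018), not the agent's actual TensorFlow implementation):

```python
import numpy as np

def quantile_embedding(tau, embedding_dim=64):
    """Cosine embedding of quantile samples, per Dabney et al. (2018).

    Each scalar tau in [0, 1] maps to the vector
    [cos(0 * pi * tau), cos(1 * pi * tau), ..., cos((d - 1) * pi * tau)].
    """
    tau = np.asarray(tau, dtype=np.float64).reshape(-1, 1)  # (num_taus, 1)
    i = np.arange(embedding_dim, dtype=np.float64)          # (d,)
    return np.cos(np.pi * i * tau)                          # (num_taus, d)

# num_tau_samples quantile fractions drawn uniformly, as during training.
taus = np.random.uniform(size=8)
emb = quantile_embedding(taus)  # shape (8, 64)
```

In the full agent this embedding is passed through a learned linear layer with a ReLU, then combined multiplicatively with the convolutional state features.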
begin_episode
begin_episode(observation)
Returns the agent’s first action for this episode.
Args:
observation
: numpy array, the environment’s initial observation.
Returns:
int, the selected action.
bundle_and_checkpoint
bundle_and_checkpoint(
checkpoint_dir,
iteration_number
)
Returns a self-contained bundle of the agent’s state.
This is used for checkpointing. It will return a dictionary containing all
non-TensorFlow objects (to be saved into a file by the caller), and it saves all
TensorFlow objects into a checkpoint file.
Args:
checkpoint_dir
: str, directory where TensorFlow objects will be saved.
iteration_number
: int, iteration number to use for naming the checkpoint file.
Returns:
A dict containing additional Python objects to be checkpointed by the
experiment. If the checkpoint directory does not exist, returns None.
end_episode
end_episode(reward)
Signals the end of the episode to the agent.
We store the observation of the current time step, which is the last observation
of the episode.
Args:
reward
: float, the last reward from the environment.
step
step(
reward,
observation
)
Records the most recent transition and returns the agent’s next action.
We store the observation of the last time step since we want to store it with
the reward.
Args:
reward
: float, the reward received from the agent’s most recent action.
observation
: numpy array, the most recent observation.
Returns:
int, the selected action.
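The begin_episode/step/end_episode methods together define the agent's interaction loop with the environment. A minimal sketch of that contract, using a hypothetical stand-in class rather than the real ImplicitQuantileAgent (which also requires a TensorFlow session and replay buffer):

```python
import numpy as np

class StubAgent:
    """Hypothetical stand-in obeying the begin_episode/step/end_episode
    contract described above; not the real ImplicitQuantileAgent."""

    def __init__(self, num_actions):
        self.num_actions = num_actions
        self.transitions = 0  # counts (observation, reward) pairs recorded

    def begin_episode(self, observation):
        # Real agent: record the initial observation, return first action.
        return 0

    def step(self, reward, observation):
        # Real agent: store the previous transition, select the next action.
        self.transitions += 1
        return self.transitions % self.num_actions

    def end_episode(self, reward):
        # Real agent: store the terminal transition.
        pass

agent = StubAgent(num_actions=4)
obs = np.zeros((84, 84, 1))  # Atari-style frame shape, for illustration
action = agent.begin_episode(obs)
for _ in range(10):
    action = agent.step(reward=0.0, observation=obs)
agent.end_episode(reward=1.0)
```

The caller (typically the experiment runner) owns this loop; the agent only sees one observation/reward pair per call.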
unbundle
unbundle(
checkpoint_dir,
iteration_number,
bundle_dictionary
)
Restores the agent from a checkpoint.
Restores the agent’s Python objects to those specified in bundle_dictionary, and
restores the TensorFlow objects to those specified in the checkpoint_dir. If the
checkpoint_dir does not exist, will not reset the agent’s state.
Args:
checkpoint_dir
: str, path to the checkpoint saved by a tf.train.Saver.
iteration_number
: int, checkpoint version, used when restoring the replay buffer.
bundle_dictionary
: dict, containing additional Python objects owned by the agent.
Returns:
bool, True if unbundling was successful.
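Together, bundle_and_checkpoint and unbundle give a save/restore round trip: the caller persists the returned dictionary, then passes it back later. A sketch of that contract with a hypothetical toy class (the real agent additionally writes its TensorFlow variables and replay buffer to checkpoint_dir):

```python
import os
import tempfile

class ToyAgent:
    """Hypothetical stand-in for the bundle/unbundle contract."""

    def __init__(self):
        self.training_steps = 0

    def bundle_and_checkpoint(self, checkpoint_dir, iteration_number):
        if not os.path.exists(checkpoint_dir):
            return None
        # Real agent: also save TF objects into a checkpoint file here.
        return {'training_steps': self.training_steps}

    def unbundle(self, checkpoint_dir, iteration_number, bundle_dictionary):
        if not os.path.exists(checkpoint_dir):
            # Checkpoint directory missing: leave agent state untouched.
            return False
        for key in self.__dict__:
            if key in bundle_dictionary:
                self.__dict__[key] = bundle_dictionary[key]
        return True

with tempfile.TemporaryDirectory() as ckpt_dir:
    saver = ToyAgent()
    saver.training_steps = 123
    bundle = saver.bundle_and_checkpoint(ckpt_dir, iteration_number=0)

    restored = ToyAgent()
    ok = restored.unbundle(ckpt_dir, 0, bundle)
```

In a real experiment the runner serializes the bundle (e.g. with pickle) alongside the TensorFlow checkpoint and feeds both back on restart.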