Module: circular_replay_buffer
The standard DQN replay memory.
This implementation is an out-of-graph replay memory + in-graph wrapper. It
supports vanilla n-step updates of the form typically found in the literature,
i.e. where rewards are accumulated for n steps and the intermediate trajectory
is not exposed to the agent. This does not allow, for example, performing
off-policy corrections.
Classes
class OutOfGraphReplayBuffer
:
A simple out-of-graph Replay Buffer.
class WrappedReplayBuffer
:
Wrapper of OutOfGraphReplayBuffer with an in graph sampling mechanism.