Experience Replay

class keras_gym.caching.ExperienceReplayBuffer(env, capacity, batch_size=32, bootstrap_n=1, gamma=0.99, random_seed=None)[source]

A simple numpy implementation of an experience replay buffer. This is written primarily with computer game environments (Atari) in mind.

It implements a generic experience replay buffer for environments in which individual observations (frames) are stacked to represent the state.

Parameters:
env : gym environment

The main gym environment. This is needed to infer the number of stacked frames num_frames as well as the number of actions num_actions.

capacity : positive int

The capacity of the experience replay buffer. DQN typically uses capacity=1000000.

batch_size : positive int, optional

The desired batch size of the samples returned by sample().

bootstrap_n : positive int, optional

The number of steps over which to delay bootstrapping, i.e. n-step bootstrapping.

gamma : float between 0 and 1, optional

Reward discount factor.

random_seed : int or None, optional

To get reproducible results.
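
A minimal construction sketch, assuming an Atari-style gym environment; the environment id and the chosen settings are illustrative only, and any frame-stacking preprocessing (from which num_frames would be inferred) is omitted here:

    import gym
    from keras_gym.caching import ExperienceReplayBuffer

    # an Atari-style environment; the specific env id is just an example
    env = gym.make('PongDeterministic-v4')

    # replay buffer sized as in the original DQN setup
    buffer = ExperienceReplayBuffer(
        env, capacity=1000000, batch_size=32, bootstrap_n=1,
        gamma=0.99, random_seed=13)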

add(s, a, r, done, episode_id)[source]

Add a transition to the experience replay buffer.

Parameters:
s : state

A single state observation.

a : action

A single action.

r : float

The observed reward associated with this transition.

done : bool

Whether the episode has finished.

episode_id : int

The episode in which the transition took place. This is needed for generating consistent samples.
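
Continuing the construction sketch above, a hedged example of how add is typically called inside a rollout loop; the uniformly random behaviour policy and the loop bounds are placeholders, not part of this API:

    num_episodes, max_steps = 10, 10000  # illustrative bounds

    for episode_id in range(num_episodes):
        s = env.reset()
        for t in range(max_steps):
            a = env.action_space.sample()          # placeholder behaviour policy
            s_next, r, done, info = env.step(a)    # old-style 4-tuple gym API
            buffer.add(s, a, r, done, episode_id)  # record the transition
            if done:
                break
            s = s_next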

clear()[source]

Clear the experience replay buffer.

classmethod from_value_function(value_function, capacity, batch_size=32)[source]

Create a new instance by extracting some settings from a value-function object.

The settings that are extracted from the value function are: gamma, bootstrap_n and num_frames. The latter is taken from the value function’s env attribute.

Parameters:
value_function : value-function object

A state value function or a state-action value function.

capacity : positive int

The capacity of the experience replay buffer. DQN typically uses capacity=1000000.

batch_size : positive int, optional

The desired batch size of the sample.

Returns:
experience_replay_buffer

A new instance.
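
A hedged sketch of this constructor in use; here q stands for an already constructed keras-gym state-action value function, whose construction is omitted:

    from keras_gym.caching import ExperienceReplayBuffer

    # q is assumed to exist and to carry gamma, bootstrap_n and an env attribute
    buffer = ExperienceReplayBuffer.from_value_function(
        q, capacity=1000000, batch_size=32)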

sample()[source]

Get a batch of transitions to be used for bootstrapped updates.

Returns:
S, A, Rn, In, S_next, A_next : tuple of arrays

The returned tuple represents a batch of preprocessed transitions:

(S, A, Rn, In, S_next, A_next)

These are typically used for bootstrapped updates, e.g. minimizing the bootstrapped MSE:

\[\left( R^{(n)}_t + I^{(n)}_t\,\sum_aP(a|S_{t+n})\,Q(S_{t+n},a) - \sum_aP(a|S_t)\,Q(S_t,a) \right)^2\]
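
To show how the returned arrays map onto this objective, here is a hedged numpy sketch continuing from the examples above; the zero-initialized Q-values and the uniform policy standing in for \(P(a|S)\) are placeholders, not part of the keras-gym API:

    import numpy as np

    # draw a batch of preprocessed transitions from the buffer
    S, A, Rn, In, S_next, A_next = buffer.sample()

    # placeholder Q-values for the successor states: q_next[i, a] ~ Q(S_next[i], a)
    q_next = np.zeros((len(S), env.action_space.n))

    # stand-in for the policy probabilities P(a|S_next): a uniform policy
    pi_next = np.full_like(q_next, 1.0 / q_next.shape[1])

    # n-step bootstrapped target: Rn + In * sum_a P(a|S_next) Q(S_next, a),
    # where In is gamma**bootstrap_n, or 0 if the episode ended within n steps
    G = Rn + In * np.sum(pi_next * q_next, axis=1)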