Experience Replay¶
keras_gym.caching.ExperienceReplayBuffer: A simple numpy implementation of an experience replay buffer.
class keras_gym.caching.ExperienceReplayBuffer(env, capacity, batch_size=32, bootstrap_n=1, gamma=0.99, random_seed=None)[source]¶

A simple numpy implementation of an experience replay buffer, written primarily with computer game environments (Atari) in mind. It implements a generic experience replay buffer for environments in which individual observations (frames) are stacked to represent the state.
Parameters:
- env : gym environment
  The main gym environment. This is needed to infer the number of stacked frames num_frames as well as the number of actions num_actions.
- capacity : positive int
  The capacity of the experience replay buffer. DQN typically uses capacity=1000000.
- batch_size : positive int, optional
  The desired batch size of the sample.
- bootstrap_n : positive int
  The number of steps over which to delay bootstrapping, i.e. n-step bootstrapping.
- gamma : float between 0 and 1
  Reward discount factor.
- random_seed : int or None
  To get reproducible results.
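To make the capacity and sampling semantics concrete, here is a minimal, self-contained sketch of a fixed-capacity replay buffer. This is an illustration of the general technique only, not keras-gym's actual implementation: once the buffer is full, the oldest transitions are overwritten in ring-buffer fashion, and sampling draws a uniform random batch.

```python
import numpy as np

class SimpleReplayBuffer:
    """Illustrative fixed-capacity ring buffer (not keras-gym's code)."""

    def __init__(self, capacity, batch_size=32, random_seed=None):
        self.capacity = capacity
        self.batch_size = batch_size
        self.rnd = np.random.RandomState(random_seed)
        self._storage = []   # list of (s, a, r, done) tuples
        self._index = 0      # next slot to overwrite once full

    def add(self, s, a, r, done):
        transition = (s, a, r, done)
        if len(self._storage) < self.capacity:
            self._storage.append(transition)
        else:
            self._storage[self._index] = transition  # overwrite oldest
        self._index = (self._index + 1) % self.capacity

    def __len__(self):
        return len(self._storage)

    def sample(self):
        # uniform random batch (with replacement) over stored transitions
        idx = self.rnd.randint(len(self._storage), size=self.batch_size)
        return [self._storage[i] for i in idx]
```

With capacity=1000000 (the typical DQN setting above), old transitions only start being discarded after a million steps.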
add(s, a, r, done, episode_id)[source]¶

Add a transition to the experience replay buffer.
Parameters:
- s : state
  A single state observation.
- a : action
  A single action.
- r : float
  The observed reward associated with this transition.
- done : bool
  Whether the episode has finished.
- episode_id : int
  The episode in which the transition took place. This is needed for generating consistent samples.
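The episode_id matters because states are represented as stacks of consecutive frames: a stack must never mix frames from two different episodes. The following standalone sketch (an assumption about the mechanism, not keras-gym's actual code) shows one way such a check could work, zero-padding any slots that would cross an episode boundary.

```python
import numpy as np

def stacked_state(frames, episode_ids, t, num_frames=4):
    """Illustrative helper (not part of keras-gym): stack the last
    `num_frames` frames ending at index t, using only frames that
    belong to the same episode as frames[t]; earlier slots that would
    cross an episode boundary are left zero-padded."""
    frames = np.asarray(frames)
    stack = np.zeros((num_frames,) + frames.shape[1:])
    for k in range(num_frames):
        i = t - k
        if i < 0 or episode_ids[i] != episode_ids[t]:
            break  # crossed an episode boundary; keep remaining slots zero
        stack[num_frames - 1 - k] = frames[i]
    return stack
```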
classmethod from_value_function(value_function, capacity, batch_size=32)[source]¶

Create a new instance by extracting some settings from a value function. The settings that are extracted from the value function are gamma, bootstrap_n and num_frames. The latter is taken from the value function's env attribute.

Parameters:
- value_function : value-function object
  A state value function or a state-action value function.
- capacity : positive int
  The capacity of the experience replay buffer. DQN typically uses capacity=1000000.
- batch_size : positive int, optional
  The desired batch size of the sample.

Returns:
- experience_replay_buffer
  A new instance.
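The factory-classmethod pattern described above can be sketched as follows. The attribute names read off the value function (env, gamma, bootstrap_n) are assumptions for illustration; this is not keras-gym's internal code.

```python
class ReplayBufferSketch:
    """Illustrative sketch of the from_value_function factory pattern.
    Attribute names (env, gamma, bootstrap_n) are assumptions, not
    keras-gym's actual internals."""

    def __init__(self, env, capacity, batch_size=32, bootstrap_n=1, gamma=0.99):
        self.env = env
        self.capacity = capacity
        self.batch_size = batch_size
        self.bootstrap_n = bootstrap_n
        self.gamma = gamma

    @classmethod
    def from_value_function(cls, value_function, capacity, batch_size=32):
        # Reuse the discount factor, bootstrap horizon and environment
        # already configured on the value function, so the buffer and
        # the function approximator stay consistent.
        return cls(
            env=value_function.env,
            capacity=capacity,
            batch_size=batch_size,
            bootstrap_n=value_function.bootstrap_n,
            gamma=value_function.gamma,
        )
```

The point of the factory is consistency: gamma and bootstrap_n must match between the buffer and the value function for the bootstrapped targets to be correct.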
sample()[source]¶

Get a batch of transitions to be used for bootstrapped updates.

Returns:
- S, A, Rn, In, S_next, A_next : tuple of arrays

  The returned tuple represents a batch of preprocessed transitions, which are typically used for bootstrapped updates, e.g. minimizing the bootstrapped MSE:

  \[\left( R^{(n)}_t + I^{(n)}_t\,\sum_a P(a|S_{t+n})\,Q(S_{t+n},a) - \sum_a P(a|S_t)\,Q(S_t,a) \right)^2\]
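The quantities \(R^{(n)}_t\) and \(I^{(n)}_t\) in the objective above are the n-step discounted return and the bootstrapping factor. A minimal sketch of how they could be computed, assuming the common convention that \(R^{(n)}_t = \sum_{k=0}^{n-1}\gamma^k\,r_{t+k}\) and \(I^{(n)}_t = \gamma^n\), set to zero when the episode terminates inside the n-step window (no bootstrapping past a terminal state); this is illustrative, not keras-gym's code:

```python
def n_step_return(rewards, dones, t, n, gamma):
    """Illustrative n-step return computation (not keras-gym's code):
    Rn = sum_{k=0}^{n-1} gamma^k * r_{t+k}, and In = gamma^n, where In
    becomes 0 if a terminal state is reached within the n-step window."""
    Rn, In = 0.0, gamma ** n
    for k in range(n):
        Rn += (gamma ** k) * rewards[t + k]
        if dones[t + k]:
            In = 0.0  # terminal state reached: drop the bootstrap term
            break
    return Rn, In
```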