Experience Replay¶
keras_gym.caching.ExperienceReplayBuffer: A simple numpy implementation of an experience replay buffer.
class keras_gym.caching.ExperienceReplayBuffer(env, capacity, batch_size=32, bootstrap_n=1, gamma=0.99, random_seed=None)[source]¶

A simple numpy implementation of an experience replay buffer, written primarily with computer game environments (Atari) in mind. It implements a generic experience replay buffer for environments in which individual observations (frames) are stacked to represent the state.
Parameters:
- env : gym environment
  The main gym environment. This is needed to infer the number of stacked frames num_frames as well as the number of actions num_actions.
- capacity : positive int
  The capacity of the experience replay buffer. DQN typically uses capacity=1000000.
- batch_size : positive int, optional
  The desired batch size of the sample.
- bootstrap_n : positive int
  The number of steps over which to delay bootstrapping, i.e. n-step bootstrapping.
- gamma : float between 0 and 1
  Reward discount factor.
- random_seed : int or None
  To get reproducible results.
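To make the capacity and sampling semantics concrete, here is a minimal, self-contained sketch of a fixed-capacity replay buffer. This is an illustration of the general technique only, not keras-gym's actual implementation: once the buffer is full, the oldest transitions are overwritten in ring-buffer fashion, and sampling draws a uniform random batch.

```python
import numpy as np

class SimpleReplayBuffer:
    """Illustrative fixed-capacity ring buffer (not keras-gym's code)."""

    def __init__(self, capacity, batch_size=32, random_seed=None):
        self.capacity = capacity
        self.batch_size = batch_size
        self.rnd = np.random.RandomState(random_seed)
        self._storage = []   # list of (s, a, r, done) tuples
        self._index = 0      # next slot to overwrite once full

    def add(self, s, a, r, done):
        transition = (s, a, r, done)
        if len(self._storage) < self.capacity:
            self._storage.append(transition)
        else:
            self._storage[self._index] = transition  # overwrite oldest
        self._index = (self._index + 1) % self.capacity

    def __len__(self):
        return len(self._storage)

    def sample(self):
        # uniform random batch (with replacement) over stored transitions
        idx = self.rnd.randint(len(self._storage), size=self.batch_size)
        return [self._storage[i] for i in idx]
```

With capacity=1000000 (the typical DQN setting above), old transitions only start being discarded after a million steps.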
add(s, a, r, done, episode_id)[source]¶

Add a transition to the experience replay buffer.
Parameters:
- s : state
  A single state observation.
- a : action
  A single action.
- r : float
  The observed reward associated with this transition.
- done : bool
  Whether the episode has finished.
- episode_id : int
  The episode in which the transition took place. This is needed for generating consistent samples.
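The episode_id matters because states are represented as stacks of consecutive frames: a stack must never mix frames from two different episodes. The following standalone sketch (an assumption about the mechanism, not keras-gym's actual code) shows one way such a check could work, zero-padding any slots that would cross an episode boundary.

```python
import numpy as np

def stacked_state(frames, episode_ids, t, num_frames=4):
    """Illustrative helper (not part of keras-gym): stack the last
    `num_frames` frames ending at index t, using only frames that
    belong to the same episode as frames[t]; earlier slots that would
    cross an episode boundary are left zero-padded."""
    frames = np.asarray(frames)
    stack = np.zeros((num_frames,) + frames.shape[1:])
    for k in range(num_frames):
        i = t - k
        if i < 0 or episode_ids[i] != episode_ids[t]:
            break  # crossed an episode boundary; keep remaining slots zero
        stack[num_frames - 1 - k] = frames[i]
    return stack
```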
classmethod from_value_function(value_function, capacity, batch_size=32)[source]¶

Create a new instance by extracting some settings from a value function. The settings that are extracted from the value function are gamma, bootstrap_n and num_frames. The latter is taken from the value function's env attribute.

Parameters:
- value_function : value-function object
  A state value function or a state-action value function.
- capacity : positive int
  The capacity of the experience replay buffer. DQN typically uses capacity=1000000.
- batch_size : positive int, optional
  The desired batch size of the sample.

Returns:
- experience_replay_buffer
  A new instance.
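The factory-classmethod pattern described above can be sketched as follows. The attribute names read off the value function (env, gamma, bootstrap_n) are assumptions for illustration; this is not keras-gym's internal code.

```python
class ReplayBufferSketch:
    """Illustrative sketch of the from_value_function factory pattern.
    Attribute names (env, gamma, bootstrap_n) are assumptions, not
    keras-gym's actual internals."""

    def __init__(self, env, capacity, batch_size=32, bootstrap_n=1, gamma=0.99):
        self.env = env
        self.capacity = capacity
        self.batch_size = batch_size
        self.bootstrap_n = bootstrap_n
        self.gamma = gamma

    @classmethod
    def from_value_function(cls, value_function, capacity, batch_size=32):
        # Reuse the discount factor, bootstrap horizon and environment
        # already configured on the value function, so the buffer and
        # the function approximator stay consistent.
        return cls(
            env=value_function.env,
            capacity=capacity,
            batch_size=batch_size,
            bootstrap_n=value_function.bootstrap_n,
            gamma=value_function.gamma,
        )
```

The point of the factory is consistency: gamma and bootstrap_n must match between the buffer and the value function for the bootstrapped targets to be correct.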
sample()[source]¶

Get a batch of transitions to be used for bootstrapped updates.

Returns:
- S, A, Rn, In, S_next, A_next : tuple of arrays

  The returned tuple represents a batch of preprocessed transitions, which are typically used for bootstrapped updates, e.g. minimizing the bootstrapped MSE:

  \[\left( R^{(n)}_t + I^{(n)}_t\,\sum_a P(a|S_{t+n})\,Q(S_{t+n},a) - \sum_a P(a|S_t)\,Q(S_t,a) \right)^2\]
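The quantities \(R^{(n)}_t\) and \(I^{(n)}_t\) in the objective above are the n-step discounted return and the bootstrapping factor. A minimal sketch of how they could be computed, assuming the common convention that \(R^{(n)}_t = \sum_{k=0}^{n-1}\gamma^k\,r_{t+k}\) and \(I^{(n)}_t = \gamma^n\), set to zero when the episode terminates inside the n-step window (no bootstrapping past a terminal state); this is illustrative, not keras-gym's code:

```python
def n_step_return(rewards, dones, t, n, gamma):
    """Illustrative n-step return computation (not keras-gym's code):
    Rn = sum_{k=0}^{n-1} gamma^k * r_{t+k}, and In = gamma^n, where In
    becomes 0 if a terminal state is reached within the n-step window."""
    Rn, In = 0.0, gamma ** n
    for k in range(n):
        Rn += (gamma ** k) * rewards[t + k]
        if dones[t + k]:
            In = 0.0  # terminal state reached: drop the bootstrap term
            break
    return Rn, In
```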