Short-term Caching

keras_gym.caching.MonteCarloCache
keras_gym.caching.NStepCache A convenient helper class for n-step bootstrapping.
class keras_gym.caching.MonteCarloCache(env, gamma)[source]
add(self, s, a, r, done)[source]

Add a transition to the experience cache.

Parameters:
s : state observation

A single state observation.

a : action

A single action.

r : float

A single observed reward.

done : bool

Whether the episode has finished.

flush(self)[source]

Flush all transitions from the cache.

Returns:
S, A, G : tuple of arrays

The returned tuple represents a batch of preprocessed transitions:

(S, A, G)

pop(self)[source]

Pop a single transition from the cache.

Returns:
S, A, G : tuple of arrays, batch_size=1

The returned tuple represents a batch of preprocessed transitions:

(S, A, G)

reset(self)[source]

Reset the cache to the initial state.

class keras_gym.caching.NStepCache(env, n, gamma)[source]

A convenient helper class for n-step bootstrapping.

Parameters:
env : gym environment

The main gym environment. This is needed to determine num_actions.

n : positive int

The number of steps over which to bootstrap.

gamma : float between 0 and 1

The amount by which to discount future rewards.

add(self, s, a, r, done)[source]

Add a transition to the experience cache.

Parameters:
s : state observation

A single state observation.

a : action

A single action.

r : float

A single observed reward.

done : bool

Whether the episode has finished.

flush(self)[source]

Flush all transitions from the cache.

Returns:
S, A, Rn, In, S_next, A_next : tuple of arrays

The returned tuple represents a batch of preprocessed transitions:

(S, A, Rn, In, S_next, A_next)

These are typically used for bootstrapped updates, e.g. minimizing the bootstrapped MSE:

\[\left( R^{(n)}_t + I^{(n)}_t\,Q(S_{t+n},A_{t+n}) - Q(S_t,A_t) \right)^2\]
pop(self)[source]

Pop a single transition from the cache.

Returns:
S, A, Rn, In, S_next, A_next : tuple of arrays, batch_size=1

The returned tuple represents a batch of preprocessed transitions:

(S, A, Rn, In, S_next, A_next)

These are typically used for bootstrapped updates, e.g. minimizing the bootstrapped MSE:

\[\left( R^{(n)}_t + I^{(n)}_t\,Q(S_{t+n},A_{t+n}) - Q(S_t,A_t) \right)^2\]
reset(self)[source]

Reset the cache to the initial state.