In RL we often make use of data caching. This might be short-term caching over the course of a single episode, or long-term caching as in experience replay.

Short-term Caching

Our short-term caching objects allow us to cache experience within an episode. For instance, MonteCarloCache caches all transitions collected over an entire episode and then gives us back the \(\gamma\)-discounted returns when the episode finishes.
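To make this concrete, here is a minimal sketch of the idea behind such a cache. The class and method names (SimpleMonteCarloCache, add, flush) are illustrative assumptions, not the library's actual API:

```python
import numpy as np

class SimpleMonteCarloCache:
    """Illustrative sketch, not the actual MonteCarloCache."""

    def __init__(self, gamma=0.99):
        self.gamma = gamma
        self.states, self.actions, self.rewards = [], [], []

    def add(self, s, a, r):
        # Store one transition; the returns are only known once the
        # episode finishes, so we just accumulate until then.
        self.states.append(s)
        self.actions.append(a)
        self.rewards.append(r)

    def flush(self):
        # Compute the discounted returns G_t = r_t + gamma * G_{t+1}
        # by sweeping backwards over the episode.
        G, returns = 0.0, []
        for r in reversed(self.rewards):
            G = r + self.gamma * G
            returns.append(G)
        returns.reverse()
        out = (self.states, self.actions, np.array(returns))
        self.states, self.actions, self.rewards = [], [], []
        return out
```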

Another short-term caching object is NStepCache, which keeps an \(n\)-sized sliding window of transitions that allows us to do \(n\)-step bootstrapping.
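A similarly hedged sketch of the sliding-window idea is shown below; the names, the return signature, and the episode-end handling are simplified assumptions rather than the real NStepCache implementation:

```python
from collections import deque

class SimpleNStepCache:
    """Illustrative sketch, not the actual NStepCache."""

    def __init__(self, n=5, gamma=0.99):
        self.n, self.gamma = n, gamma
        self.window = deque(maxlen=n)  # the n most recent transitions

    def add(self, s, a, r, s_next, done):
        self.window.append((s, a, r, s_next, done))
        if len(self.window) < self.n and not done:
            return None  # window not full yet; nothing to emit

        # Partial n-step return for the oldest state in the window:
        #   R = r_t + gamma * r_{t+1} + ... + gamma^{k-1} * r_{t+k-1}
        R = sum(self.gamma ** i * tr[2] for i, tr in enumerate(self.window))
        s0, a0 = self.window[0][0], self.window[0][1]
        s_boot, done_boot = self.window[-1][3], self.window[-1][4]
        discount = self.gamma ** len(self.window)  # factor on the bootstrap term

        if done:
            # Simplification: a real implementation would also flush the
            # shorter partial windows that remain at episode end.
            self.window.clear()

        # The caller adds discount * V(s_boot) unless done_boot is True.
        return s0, a0, R, s_boot, discount, done_boot
```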

Experience Replay Buffer

At the moment, we only have one long-term caching object, which is the ExperienceReplayBuffer. This object can hold an arbitrary number of transitions; the only constraint is the amount of available memory on your machine.
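A minimal sketch of what such a buffer boils down to might look like this; the class name and methods are illustrative assumptions:

```python
import random

class SimpleReplayBuffer:
    """Illustrative sketch, not the actual ExperienceReplayBuffer."""

    def __init__(self):
        self.transitions = []  # grows without bound; memory is the only limit

    def add(self, transition):
        self.transitions.append(transition)

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation
        # between consecutive transitions.
        return random.sample(self.transitions, batch_size)

    def __len__(self):
        return len(self.transitions)
```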

The way we learn from the experience stored in the ExperienceReplayBuffer is by sampling a batch of transitions from it and then feeding that batch to our function approximator.
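Putting the pieces together, a sketch of that workflow could look like the following, reusing the SimpleReplayBuffer sketch above. The q_func.update call at the end is a hypothetical approximator API, shown only to indicate where the sampled batch goes:

```python
import numpy as np

buffer = SimpleReplayBuffer()  # from the sketch above

# Fill the buffer with dummy (s, a, r, s_next, done) transitions;
# a real agent would collect these by interacting with the environment.
for _ in range(100):
    buffer.add((np.random.randn(4), 0, 1.0, np.random.randn(4), False))

batch_size = 32
batch = buffer.sample(batch_size)
s, a, r, s_next, done = map(np.array, zip(*batch))

# A real agent would now take one update step on the batch, e.g.:
# q_func.update(s, a, r, s_next, done)  # hypothetical approximator API
```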