Release Notes

v0.2.17

  • Made keras-gym compatible with tensorflow v2.0 (unfortunately had to disable eager mode)
  • Added SoftActorCritic class
  • Added frozen_lake/sac script and notebook
  • Added atari/sac script, which is still WIP

v0.2.16

Major update: support Box action spaces.

  • introduced the keras_gym.proba_dists sub-module, which implements differentiable probability distributions (incl. differentiable sample() methods)
  • removed policy-based losses in favor of BaseUpdateablePolicy.policy_loss_with_metrics(), which now uses the differentiable ProbaDist objects
  • removed ConjointActorCritic (was redundant)
  • changed how we implement target models: we no longer rely on global namespaces; instead we use keras.models.clone_model()
  • changed BaseFunctionApproximator.sync_target_model() to use model.{get,set}_weights() (see the sketch after this list)
  • added script and notebook for Pendulum-v0 with PPO
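
The target-model changes above follow the standard Keras pattern: create a structural copy with keras.models.clone_model() and move weights across with get_weights()/set_weights(). A minimal sketch of that pattern, assuming a stand-in model and an illustrative soft-update factor tau (neither is taken from keras-gym itself):

    from tensorflow import keras

    # stand-in for whatever model the function approximator builds
    model = keras.Sequential([
        keras.layers.Dense(16, activation='relu', input_shape=(4,)),
        keras.layers.Dense(2),
    ])

    # structural copy; cloned weights are freshly initialized, so sync explicitly
    target_model = keras.models.clone_model(model)
    target_model.set_weights(model.get_weights())

    def sync_target_model(model, target_model, tau=1.0):
        """Copy (tau=1.0) or soft-update (0 < tau < 1) weights into the target model."""
        new_weights = [
            tau * w + (1.0 - tau) * w_target
            for w, w_target in zip(model.get_weights(), target_model.get_weights())
        ]
        target_model.set_weights(new_weights)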

v0.2.15

This is a relatively minor update. Just a couple of small bug fixes.

  • fixed logging, which was broken by abseil (a dependency of tensorflow>=1.14)
  • added enable_logging helper
  • updated some docs

v0.2.13

This version is another major overhaul. In particular, it introduces the FunctionApproximator class, which offers a unified interface for all function approximator types, i.e. state(-action) value functions and updateable policies. This makes it a lot easier to create your own custom function approximator: you only have to define your own forward pass by creating a subclass of FunctionApproximator and providing a body method. Further flexibility is provided by allowing the head method(s) to be overridden.
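
To make the pattern concrete, here is a minimal sketch of a custom forward pass. It follows the subclass-plus-body recipe described above; the layer sizes are illustrative and the exact constructor details may differ from this release:

    import keras_gym as km
    from tensorflow import keras

    class MLP(km.FunctionApproximator):
        """Custom function approximator: only the forward pass (body) is defined."""
        def body(self, S):
            X = keras.layers.Flatten()(S)
            X = keras.layers.Dense(units=16, activation='relu')(X)
            return X

The resulting object can then be wrapped by the value functions and updateable policies mentioned below.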

  • added FunctionApproximator class
  • refactored value functions and policies to just be a wrapper around a FunctionApproximator object
  • MILESTONE: got AlphaZero to work on ConnectFour (although this game is likely too simple to see the real power of AlphaZero - MCTS on its own works fine)

v0.2.12

  • MILESTONE: got PPO working on Atari Pong
  • added PolicyKLDivergence and PolicyEntropy
  • added entropy_beta and ppo_clip_eps kwargs to updateable policies

v0.2.11

  • optimized ActorCritic so that S is fed in only once instead of three times
  • removed all mention of bootstrap_model
  • implemented PPO with ClippedSurrogateLoss (see the sketch below)
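
ClippedSurrogateLoss is based on PPO's standard clipped surrogate objective. A self-contained sketch of that objective (plain NumPy; the function name and the eps default are illustrative, not keras-gym's implementation):

    import numpy as np

    def clipped_surrogate_objective(ratio, advantage, eps=0.2):
        """
        PPO's clipped surrogate objective (to be maximized):

            L = E[ min(r * A, clip(r, 1 - eps, 1 + eps) * A) ]

        where r = pi(a|s) / pi_old(a|s) and A is the estimated advantage.
        """
        unclipped = ratio * advantage
        clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
        return np.mean(np.minimum(unclipped, clipped))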

v0.2.10

This is the second overhaul, a complete rewrite in fact. There was just too much of the old scikit-gym structure that was standing in the way of progress.

The main thing that changed in this version is that I ditched the notion of an algorithm. Instead, function approximators carry their own “update strategy”. In the case of Q-functions, this is ‘sarsa’, ‘q_learning’, etc., while policies have the options ‘vanilla’, ‘ppo’, etc.

Value functions carry another property that was previously attributed to algorithm objects. This is the bootstrap-n, i.e. the number of steps over which to delay bootstrapping.

This new structure accommodates modularity much better than the old one.
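
A rough sketch of what these settings look like in use. The class names and signatures below are placeholders in the spirit of the later keras-gym API and are not guaranteed to match this version; ‘func’ stands for whatever function approximator you have built:

    import keras_gym as km

    # Q-function that carries its own update strategy and bootstrap horizon
    # (class name and signature are assumptions, not this version's exact API)
    q = km.Q(func, gamma=0.99,
             bootstrap_n=1,                 # n-step bootstrapping, formerly an algorithm-level setting
             update_strategy='q_learning')  # or 'sarsa', ...

    # updateable policy with its own update strategy
    pi = km.SoftmaxPolicy(func, update_strategy='ppo')  # or 'vanilla', ...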

  • removed algorithms, replaced by ‘bootstrap_n’ and ‘update_strategy’ settings on function approximators
  • implemented ExperienceReplayBuffer
  • MILESTONE: added DQN implementation for Atari 2600 envs.
  • other than that, too much to mention: it really was a complete rewrite

v0.2.9

  • changed definitions of Q-functions to GenericQ and GenericQTypeII
  • added option for efficient bootstrapped updating (bootstrap_model argument in value functions, see example usage: NStepBootstrapV)
  • renamed ValuePolicy to ValueBasedPolicy

v0.2.8

  • implemented base class for updateable policy objects
  • implemented first example of updateable policy: GenericSoftmaxPolicy
  • implemented predefined softmax policy: LinearSoftmaxPolicy
  • added first policy gradient algorithm: Reinforce
  • added REINFORCE example notebook
  • updated documentation

v0.2.7

This was a MAJOR overhaul in which I ported everything from scikit-learn to Keras. The reason for this is that I was stuck on the implementation of policy gradient methods due to the lack of flexibility of the scikit-learn ecosystem. I chose Keras as a replacement: it’s nice and modular like scikit-learn, but in addition it’s much more flexible. In particular, the ability to provide custom loss functions has been the main selling point. Another selling point was that some environments require more sophisticated neural nets than a simple MLP, and these are readily available in Keras.
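
To illustrate the custom-loss flexibility, here is a generic Keras-style policy-gradient loss. It is not keras-gym code; the packing convention (advantages in the first column of y_true, one-hot actions in the rest) and all names are illustrative:

    from tensorflow import keras
    from tensorflow.keras import backend as K

    def reinforce_loss(Adv_and_onehot_actions, Pi):
        """REINFORCE-style loss: -E[ Adv * log pi(a|s) ].

        'y_true' packs advantages (first column) with one-hot actions (rest);
        'y_pred' holds the policy's action probabilities.
        """
        Adv = Adv_and_onehot_actions[:, :1]
        A_onehot = Adv_and_onehot_actions[:, 1:]
        log_pi = K.log(K.sum(A_onehot * Pi, axis=1, keepdims=True) + 1e-8)
        return -K.mean(Adv * log_pi)

    # any keras.Model can then be compiled with the custom loss:
    # model.compile(optimizer=keras.optimizers.Adam(), loss=reinforce_loss)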

  • added compatibility wrapper for scikit-learn function approximators
  • ported all value functions to use keras.Model
  • ported predefined models LinearV and LinearQ to keras
  • ported algorithms to keras
  • ported all notebooks to keras
  • changed the package name to keras-gym and the root module to keras_gym

Other changes:

  • added propensity score outputs to policy objects
  • created a stub for directly updateable policies

v0.2.6

  • refactored BaseAlgorithm to simplify implementation (at the cost of more code, but it’s worth it)
  • refactored notebooks: they are now bundled by environment / algo type
  • added n-step bootstrap algorithms:
    • NStepQLearning
    • NStepSarsa
    • NStepExpectedSarsa

v0.2.5

  • added algorithm: keras_gym.algorithms.ExpectedSarsa
  • added object: keras_gym.utils.ExperienceCache
  • rewrote MonteCarlo to use ExperienceCache

v0.2.4

  • added algorithm: keras_gym.algorithms.MonteCarlo

v0.2.3

  • added algorithm: keras_gym.algorithms.Sarsa

v0.2.2

  • changed doc theme from sklearn to readthedocs

v0.2.1

  • first working implementation of value function + policy + algorithm
  • added first working example in a notebook
  • added algorithm: keras_gym.algorithms.QLearning