Release Notes

v0.2.17

  • Made keras-gym compatible with tensorflow v2.0 (unfortunately had to disable eager mode)
  • Added SoftActorCritic class
  • Added frozen_lake/sac script and notebook
  • Added atari/sac script, which is still WIP

v0.2.16

Major update: support Box action spaces.

  • introduced the keras_gym.proba_dists sub-module, which implements differentiable probability distributions (incl. differentiable sample() methods)
  • removed policy-based losses in favor of BaseUpdateablePolicy.policy_loss_with_metrics(), which now uses the differentiable ProbaDist objects
  • removed ConjointActorCritic (was redundant)
  • changed how we implement target models: we no longer rely on global namespaces; instead we use keras.models.clone_model()
  • changed BaseFunctionApproximator.sync_target_model() to use model.{get,set}_weights() (see the sketch after this list)
  • added script and notebook for Pendulum-v0 with PPO
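
The target-model changes above follow the standard Keras pattern: create a structural copy with keras.models.clone_model() and move weights across with get_weights()/set_weights(). A minimal sketch of that pattern, assuming a stand-in model and an illustrative soft-update factor tau (neither is taken from keras-gym itself):

    from tensorflow import keras

    # stand-in for whatever model the function approximator builds
    model = keras.Sequential([
        keras.layers.Dense(16, activation='relu', input_shape=(4,)),
        keras.layers.Dense(2),
    ])

    # structural copy; cloned weights are freshly initialized, so sync explicitly
    target_model = keras.models.clone_model(model)
    target_model.set_weights(model.get_weights())

    def sync_target_model(model, target_model, tau=1.0):
        """Copy (tau=1.0) or soft-update (0 < tau < 1) weights into the target model."""
        new_weights = [
            tau * w + (1.0 - tau) * w_target
            for w, w_target in zip(model.get_weights(), target_model.get_weights())
        ]
        target_model.set_weights(new_weights)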

v0.2.15

This is a relatively minor update. Just a couple of small bug fixes.

  • fixed logging, which was broken by abseil (a dependency of tensorflow>=1.14)
  • added enable_logging helper
  • updated some docs

v0.2.13

This version is another major overhaul. In particular, it introduces the FunctionApproximator class, which offers a unified interface for all function approximator types, i.e. state(-action) value functions and updateable policies. This makes it a lot easier to create your own custom function approximator: you only have to define your own forward pass by creating a subclass of FunctionApproximator and providing a body method. Further flexibility is provided by allowing the head method(s) to be overridden.
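
To make the pattern concrete, here is a minimal sketch of a custom forward pass. It follows the subclass-plus-body recipe described above; the layer sizes are illustrative and the exact constructor details may differ from this release:

    import keras_gym as km
    from tensorflow import keras

    class MLP(km.FunctionApproximator):
        """Custom function approximator: only the forward pass (body) is defined."""
        def body(self, S):
            X = keras.layers.Flatten()(S)
            X = keras.layers.Dense(units=16, activation='relu')(X)
            return X

The resulting object can then be wrapped by the value functions and updateable policies mentioned below.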

  • added FunctionApproximator class
  • refactored value functions and policies to just be a wrapper around a FunctionApproximator object
  • MILESTONE: got AlphaZero to work on ConnectFour (although this game is likely too simple to see the real power of AlphaZero - MCTS on its own works fine)

v0.2.12

  • MILESTONE: got PPO working on Atari Pong
  • added PolicyKLDivergence and PolicyEntropy
  • added entropy_beta and ppo_clip_eps kwargs to updateable policies

v0.2.11

  • optimized ActorCritic so that S is fed in only once instead of three times
  • removed all mention of bootstrap_model
  • implemented PPO with ClippedSurrogateLoss (see the sketch below)
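
ClippedSurrogateLoss is based on PPO's standard clipped surrogate objective. A self-contained sketch of that objective (plain NumPy; the function name and the eps default are illustrative, not keras-gym's implementation):

    import numpy as np

    def clipped_surrogate_objective(ratio, advantage, eps=0.2):
        """
        PPO's clipped surrogate objective (to be maximized):

            L = E[ min(r * A, clip(r, 1 - eps, 1 + eps) * A) ]

        where r = pi(a|s) / pi_old(a|s) and A is the estimated advantage.
        """
        unclipped = ratio * advantage
        clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
        return np.mean(np.minimum(unclipped, clipped))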

v0.2.10

This is the second overhaul, a complete rewrite in fact. There was just too much of the old scikit-gym structure that was standing in the way of progress.

The main thing that changed in this version is that I ditched the notion of an algorithm. Instead, function approximators carry their own “update strategy”. In the case of Q-functions, this is ‘sarsa’, ‘q_learning’, etc., while policies have the options ‘vanilla’, ‘ppo’, etc.

Value functions carry another property that was previously attributed to algorithm objects. This is the bootstrap-n, i.e. the number of steps over which to delay bootstrapping.

This new structure accommodates modularity much better than the old one.
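
A rough sketch of what these settings look like in use. The class names and signatures below are placeholders in the spirit of the later keras-gym API and are not guaranteed to match this version; ‘func’ stands for whatever function approximator you have built:

    import keras_gym as km

    # Q-function that carries its own update strategy and bootstrap horizon
    # (class name and signature are assumptions, not this version's exact API)
    q = km.Q(func, gamma=0.99,
             bootstrap_n=1,                 # n-step bootstrapping, formerly an algorithm-level setting
             update_strategy='q_learning')  # or 'sarsa', ...

    # updateable policy with its own update strategy
    pi = km.SoftmaxPolicy(func, update_strategy='ppo')  # or 'vanilla', ...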

  • removed algorithms, replaced by ‘bootstrap_n’ and ‘update_strategy’ settings on function approximators
  • implemented ExperienceReplayBuffer
  • MILESTONE: added DQN implementation for Atari 2600 envs.
  • other than that, too much to mention: it really was a complete rewrite

v0.2.9

  • changed definitions of Q-functions to GenericQ and GenericQTypeII
  • added option for efficient bootstrapped updating (bootstrap_model argument in value functions, see example usage: NStepBootstrapV)
  • renamed ValuePolicy to ValueBasedPolicy

v0.2.8

  • implemented base class for updateable policy objects
  • implemented first example of updateable policy: GenericSoftmaxPolicy
  • implemented predefined softmax policy: LinearSoftmaxPolicy
  • added first policy gradient algorithm: Reinforce
  • added REINFORCE example notebook
  • updated documentation

v0.2.7

This was a MAJOR overhaul in which I ported everything from scikit-learn to Keras. The reason for this is that I was stuck on the implementation of policy gradient methods due to the lack of flexibility of the scikit-learn ecosystem. I chose Keras as a replacement: it’s nice and modular like scikit-learn, but in addition it’s much more flexible. In particular, the ability to provide custom loss functions has been the main selling point. Another selling point was that some environments require more sophisticated neural nets than a simple MLP, and these are readily available in Keras.
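
To illustrate the custom-loss flexibility, here is a generic Keras-style policy-gradient loss. It is not keras-gym code; the packing convention (advantages in the first column of y_true, one-hot actions in the rest) and all names are illustrative:

    from tensorflow import keras
    from tensorflow.keras import backend as K

    def reinforce_loss(Adv_and_onehot_actions, Pi):
        """REINFORCE-style loss: -E[ Adv * log pi(a|s) ].

        'y_true' packs advantages (first column) with one-hot actions (rest);
        'y_pred' holds the policy's action probabilities.
        """
        Adv = Adv_and_onehot_actions[:, :1]
        A_onehot = Adv_and_onehot_actions[:, 1:]
        log_pi = K.log(K.sum(A_onehot * Pi, axis=1, keepdims=True) + 1e-8)
        return -K.mean(Adv * log_pi)

    # any keras.Model can then be compiled with the custom loss:
    # model.compile(optimizer=keras.optimizers.Adam(), loss=reinforce_loss)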

  • added compatibility wrapper for scikit-learn function approximators
  • ported all value functions to use keras.Model
  • ported predefined models LinearV and LinearQ to keras
  • ported algorithms to keras
  • ported all notebooks to keras
  • changed the package name to keras-gym and the root module to keras_gym

Other changes:

  • added propensity score outputs to policy objects
  • created a stub for directly updateable policies

v0.2.6

  • refactored BaseAlgorithm to simplify implementation (at the cost of more code, but it’s worth it)
  • refactored notebooks: they are now bundled by environment / algo type
  • added n-step bootstrap algorithms:
    • NStepQLearning
    • NStepSarsa
    • NStepExpectedSarsa

v0.2.5

  • added algorithm: keras_gym.algorithms.ExpectedSarsa
  • added object: keras_gym.utils.ExperienceCache
  • rewrote MonteCarlo to use ExperienceCache

v0.2.4

  • added algorithm: keras_gym.algorithms.MonteCarlo

v0.2.3

  • added algorithm: keras_gym.algorithms.Sarsa

v0.2.2

  • changed doc theme from sklearn to readthedocs

v0.2.1

  • first working implementation of value function + policy + algorithm
  • added first working example in a notebook
  • added algorithm: keras_gym.algorithms.QLearning