Release Notes
v0.2.17

- Made keras-gym compatible with tensorflow v2.0 (unfortunately had to disable eager mode)
- Added `SoftActorCritic` class
- Added `frozen_lake/sac` script and notebook
- Added `atari/sac` script, which is still WIP
v0.2.16

Major update: support for Box action spaces.

- introduced the `keras_gym.proba_dists` sub-module, which implements differentiable probability distributions (incl. differentiable `sample()` methods; see the sketch below)
- removed policy-based losses in favor of `BaseUpdateablePolicy.policy_loss_with_metrics()`, which now uses the differentiable `ProbaDist` objects
- removed `ConjointActorCritic` (was redundant)
- changed how we implement target models: no longer rely on global namespaces; instead we use `keras.models.clone_model()`
- changed `BaseFunctionApproximator.sync_target_model()`: use `model.{get,set}_weights()`
- added script and notebook for Pendulum-v0 with PPO
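The differentiable `sample()` methods are what make policy gradients work directly on Box (continuous) action spaces. Below is a minimal sketch of the underlying idea for a diagonal Gaussian, using the reparametrization trick; this illustrates the concept only and is not the actual `keras_gym.proba_dists` implementation.

```python
import tensorflow as tf

def sample_diag_gaussian(mu, log_sigma):
    """Differentiable sample from N(mu, exp(log_sigma)**2) via reparametrization.

    Gradients flow back into mu and log_sigma because the randomness is
    isolated in an independent standard-normal noise term.
    """
    eps = tf.random.normal(tf.shape(mu))   # noise term carries no gradient
    return mu + tf.exp(log_sigma) * eps    # differentiable w.r.t. mu and log_sigma
```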
v0.2.15

This is a relatively minor update. Just a couple of small bug fixes.

- fixed logging, which was broken by abseil (a dependency of tensorflow>=1.14)
- added `enable_logging` helper
- updated some docs
v0.2.13

This version is another major overhaul. In particular, the `FunctionApproximator` class is introduced, which offers a unified interface for all function approximator types, i.e. state(-action) value functions and updateable policies. This makes it a lot easier to create your own custom function approximator: you only have to define your own forward pass by creating a subclass of `FunctionApproximator` and providing a `body` method. Further flexibility is provided by allowing the head method(s) to be overridden.
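For concreteness, here is a minimal sketch of that subclassing pattern. The exact constructor and `body` signatures may differ from the released API, and the layer choices are illustrative only.

```python
import keras_gym as km
from tensorflow import keras

class MLPFunctionApproximator(km.FunctionApproximator):
    """Custom function approximator: only the forward pass (body) is defined."""

    def body(self, S):
        # S: the (batched) state observation as a Keras tensor
        X = keras.layers.Flatten()(S)
        X = keras.layers.Dense(256, activation='relu')(X)
        return keras.layers.Dense(64, activation='relu')(X)
```

The resulting object can then back a state(-action) value function or an updateable policy, as the bullets below describe.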
- added the `FunctionApproximator` class
- refactored value functions and policies to just be wrappers around a `FunctionApproximator` object
- MILESTONE: got AlphaZero to work on ConnectFour (although this game is likely too simple to see the real power of AlphaZero - MCTS on its own works fine)
v0.2.12

- MILESTONE: got PPO working on Atari Pong
- added `PolicyKLDivergence` and `PolicyEntropy`
- added `entropy_beta` and `ppo_clip_eps` kwargs to updateable policies
v0.2.11

- optimized ActorCritic so that S is fed in once instead of three times
- removed all mention of `bootstrap_model`
- implemented PPO with the `ClippedSurrogateLoss`
v0.2.10

This is the second overhaul, a complete rewrite in fact. There was just too much of the old scikit-gym structure standing in the way of progress.

The main change in this version is that I ditched the notion of an algorithm. Instead, function approximators carry their own "update strategy". In the case of Q-functions, this is 'sarsa', 'q_learning', etc., while policies have the options 'vanilla', 'ppo', etc.

Value functions carry another property that was previously attributed to algorithm objects: the bootstrap-n, i.e. the number of steps over which to delay bootstrapping.

This new structure accommodates modularity much better than the old one.
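As a rough sketch of what this shift means in code, here is a hypothetical Q-function object that owns its update strategy and bootstrap-n. The names and signatures are illustrative, not the actual keras-gym API of this release.

```python
import numpy as np

class QFunction:
    """Hypothetical sketch: the update strategy and bootstrap-n live on the
    function approximator itself, not on a separate algorithm object."""

    def __init__(self, model, update_strategy='q_learning', bootstrap_n=1, gamma=0.99):
        self.model = model                      # underlying keras.Model
        self.update_strategy = update_strategy  # 'sarsa', 'q_learning', ...
        self.bootstrap_n = bootstrap_n          # steps over which to delay bootstrapping
        self.gamma = gamma

    def n_step_target(self, rewards, S_next, A_next=None):
        """n-step return: discounted rewards plus a bootstrapped tail."""
        Rn = sum(self.gamma ** i * r for i, r in enumerate(rewards))
        Q_next = self.model.predict(S_next)
        if self.update_strategy == 'q_learning':
            tail = Q_next.max(axis=1)                      # greedy bootstrap
        elif self.update_strategy == 'sarsa':
            tail = Q_next[np.arange(len(A_next)), A_next]  # bootstrap on the taken action
        else:
            raise ValueError(f"unknown update_strategy: {self.update_strategy}")
        return Rn + self.gamma ** self.bootstrap_n * tail
```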
- removed algorithms, replaced by ‘bootstrap_n’ and ‘update_strategy’ settings on function approximators
- implemented `ExperienceReplayBuffer`
- milestone: added DQN implementation for Atari 2600 envs.
- other than that... too much to mention. It really was a complete rewrite
v0.2.9

- changed definitions of Q-functions to `GenericQ` and `GenericQTypeII`
- added option for efficient bootstrapped updating (the `bootstrap_model` argument in value functions; see example usage: `NStepBootstrapV`)
- renamed `ValuePolicy` to `ValueBasedPolicy`
v0.2.8

- implemented base class for updateable policy objects
- implemented first example of an updateable policy: `GenericSoftmaxPolicy`
- implemented predefined softmax policy: `LinearSoftmaxPolicy`
- added first policy gradient algorithm: `Reinforce`
- added REINFORCE example notebook
- updated documentation
v0.2.7

This was a MAJOR overhaul in which I ported everything from scikit-learn to Keras. The reason for this is that I was stuck on the implementation of policy gradient methods due to the lack of flexibility of the scikit-learn ecosystem. I chose Keras as a replacement: it's nice and modular like scikit-learn, but in addition it's much more flexible. In particular, the ability to provide custom loss functions has been the main selling point. Another selling point was that some environments require more sophisticated neural nets than a simple MLP, and these are readily available in Keras.
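To illustrate the custom-loss point: Keras lets you compile a model against any differentiable loss, which is exactly what policy gradient objectives need. A small generic example (not keras-gym code; the loss and model below are purely illustrative):

```python
import tensorflow as tf
from tensorflow import keras

def policy_gradient_loss(y_true, y_pred):
    # y_true: one-hot encoding of the chosen action, pre-scaled by the advantage
    # y_pred: action probabilities from the softmax head
    return -tf.reduce_sum(y_true * tf.math.log(y_pred + 1e-8), axis=1)

model = keras.Sequential([
    keras.layers.Dense(16, activation='relu', input_shape=(4,)),
    keras.layers.Dense(2, activation='softmax'),
])
model.compile(optimizer='adam', loss=policy_gradient_loss)  # custom loss plugs right in
```

Scikit-learn estimators hard-code their loss functions, so an objective like this cannot be expressed there; in Keras it is a one-liner.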
- added compatibility wrapper for scikit-learn function approximators
- ported all value functions to use keras.Model
- ported predefined models `LinearV` and `LinearQ` to keras
- ported algorithms to keras
- ported all notebooks to keras
- changed the name of the package to keras-gym and the root module to `keras_gym`
Other changes:
- added propensity score outputs to policy objects
- created a stub for directly updateable policies
v0.2.6

- refactored BaseAlgorithm to simplify implementation (at the cost of more code, but it’s worth it)
- refactored notebooks: they are now bundled by environment / algo type
- added n-step bootstrap algorithms: `NStepQLearning`, `NStepSarsa`, `NStepExpectedSarsa`
v0.2.5

- added algorithm: `keras_gym.algorithms.ExpectedSarsa`
- added object: `keras_gym.utils.ExperienceCache`
- rewrote `MonteCarlo` to use `ExperienceCache`
v0.2.4

- added algorithm: `keras_gym.algorithms.MonteCarlo`
v0.2.3

- added algorithm: `keras_gym.algorithms.Sarsa`
v0.2.2

- changed doc theme from sklearn to readthedocs
v0.2.1

- first working implementation of value function + policy + algorithm
- added first working example in a notebook
- added algorithm: `keras_gym.algorithms.QLearning`