# Release Notes

## v0.2.17

- Made keras-gym compatible with tensorflow v2.0 (unfortunately had to disable eager mode)
- Added `SoftActorCritic` class
- Added `frozen_lake/sac` script and notebook
- Added `atari/sac` script, which is still WIP

## v0.2.16

Major update: support for Box action spaces.

- introduced the `keras_gym.proba_dists` sub-module, which implements differentiable probability distributions (incl. differentiable `sample()` methods)
- removed policy-based losses in favor of `BaseUpdateablePolicy.policy_loss_with_metrics()`, which now uses the differentiable `ProbaDist` objects
- removed `ConjointActorCritic` (was redundant)
- changed how we implement target models: no longer rely on global namespaces; instead we use `keras.models.clone_model()`
- changed `BaseFunctionApproximator.sync_target_model()`: use `model.{get,set}_weights()`
- added script and notebook for Pendulum-v0 with PPO
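The `sync_target_model()` change boils down to copying (or blending) weight arrays between the online and target model via `model.{get,set}_weights()`. A minimal dependency-free sketch of the idea — `sync_weights` is a hypothetical helper, not the library's actual code:

```python
def sync_weights(online, target, tau=1.0):
    """Blend online-model weights into target-model weights.

    With tau=1.0 this is a hard copy (classic DQN-style target sync);
    tau < 1.0 gives a soft "Polyak" update. Weights are shown here as a
    flat list of floats; with Keras the same element-wise blend would
    apply to each array returned by ``model.get_weights()``.
    """
    return [tau * o + (1.0 - tau) * t for o, t in zip(online, target)]


# Hypothetical usage with a pair of Keras models:
#   target_model.set_weights(
#       sync_weights(online_model.get_weights(), target_model.get_weights()))
hard = sync_weights([1.0, 2.0], [0.0, 0.0], tau=1.0)  # hard copy of online
soft = sync_weights([1.0, 2.0], [0.0, 0.0], tau=0.1)  # small step toward online
```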

## v0.2.15

This is a relatively minor update. Just a couple of small bug fixes.

- fixed logging, which was broken by abseil (a dependency of tensorflow>=1.14)
- added `enable_logging` helper
- updated some docs

## v0.2.13¶

This version is another major overhaul. In particular, the
`FunctionApproximator`

class is
introduced, which offers a unified interface for all function approximator
types, i.e. state(-action) value functions and updateable policies. This makes
it a lot easier to create your own custom function approximator, whereby you
only ahve to define your own forward-pass by creating a subclass of
`FunctionApproximator`

and providing a
`body`

method. Further flexibility
is provided by allowing the head method(s) to be overridden.

- added
`FunctionApproximator`

class - refactored value functions and policies to just be a wrapper around a
`FunctionApproximator`

object - MILESTONE: got AlphaZero to work on ConnectFour (although this game is likely too simple to see the real power of AlphaZero - MCTS on its own works fine)
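The subclassing pattern described above — define `body`, optionally override the head(s) — can be sketched schematically without Keras. Everything here other than the names `FunctionApproximator` and `body` is a simplified stand-in for the real API:

```python
class FunctionApproximator:
    """Schematic of the unified interface: a forward pass consists of a
    shared feature extractor (body) followed by a head."""

    def __call__(self, state):
        return self.head_v(self.body(state))

    def body(self, state):
        # The one method a subclass must provide: the custom forward pass.
        raise NotImplementedError("subclass must define body()")

    def head_v(self, features):
        # Default value head; may be overridden for further flexibility.
        return sum(features)


class MyApproximator(FunctionApproximator):
    def body(self, state):
        # Toy feature map standing in for a real neural-net forward pass.
        return [x * 2.0 for x in state]


func = MyApproximator()
value = func([1.0, 2.0])  # body -> head_v, i.e. sum([2.0, 4.0]) = 6.0
```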

## v0.2.12

- MILESTONE: got PPO working on Atari Pong
- added `PolicyKLDivergence` and `PolicyEntropy`
- added `entropy_beta` and `ppo_clip_eps` kwargs to updateable policies

## v0.2.11

- optimized `ActorCritic` to avoid feeding in S three times instead of once
- removed all mention of `bootstrap_model`
- implemented PPO with `ClippedSurrogateLoss`
## v0.2.10

This is the second overhaul, a complete rewrite in fact. There was just too much of the old scikit-gym structure that was standing in the way of progress.

The main thing that changed in this version is that I ditched the notion of an algorithm. Instead, function approximators carry their own “update strategy”. In the case of Q-functions, this is ‘sarsa’, ‘q_learning’ etc., while policies have the options ‘vanilla’, ‘ppo’, etc.

Value functions carry another property that was previously attributed to algorithm objects. This is the bootstrap-n, i.e. the number of steps over which to delay bootstrapping.

This new structure accommodates modularity much better than the old structure.

- removed algorithms, replaced by ‘bootstrap_n’ and ‘update_strategy’ settings on function approximators
- implemented `ExperienceReplayBuffer`
- MILESTONE: added DQN implementation for Atari 2600 envs
- other than that: too much to mention. It really was a complete rewrite
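The bootstrap-n setting described above corresponds to the standard n-step return: sum n discounted rewards, then bootstrap with the value estimate of the state n steps ahead. A plain-Python sketch (hypothetical helper, not the library API):

```python
def n_step_return(rewards, bootstrap_value, gamma=0.9):
    # G_t = r_t + gamma * r_{t+1} + ... + gamma^{n-1} * r_{t+n-1}
    #       + gamma^n * V(s_{t+n})
    # where n = len(rewards) is the bootstrap-n.
    n = len(rewards)
    g = sum(gamma ** k * r for k, r in enumerate(rewards))
    return g + gamma ** n * bootstrap_value


# bootstrap_n = 2: two real rewards, then bootstrap from V(s_{t+2}):
g = n_step_return([1.0, 1.0], bootstrap_value=10.0, gamma=0.9)
# 1 + 0.9 + 0.81 * 10 = 10.0
```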

## v0.2.9

- changed definitions of Q-functions to `GenericQ` and `GenericQTypeII`
- added option for efficient bootstrapped updating (`bootstrap_model` argument in value functions; see example usage: `NStepBootstrapV`)
- renamed `ValuePolicy` to `ValueBasedPolicy`

## v0.2.8

- implemented base class for updateable policy objects
- implemented first example of updateable policy: `GenericSoftmaxPolicy`
- implemented predefined softmax policy: `LinearSoftmaxPolicy`
- added first policy gradient algorithm: `Reinforce`
- added REINFORCE example notebook
- updated documentation

## v0.2.7

This was a *MAJOR* overhaul in which I ported everything from scikit-learn to
Keras. The reason for this is that I was stuck on the implementation of policy
gradient methods due to the lack of flexibility of the scikit-learn ecosystem.
I chose Keras as a replacement: it’s nice and modular like scikit-learn,
but in addition it’s much more flexible. In particular, the ability to provide
custom loss functions has been the main selling point. Another selling point
was that some environments require more sophisticated neural nets than a
simple MLP, which are readily available in Keras.

- added compatibility wrapper for scikit-learn function approximators
- ported all value functions to use `keras.Model`
- ported predefined models `LinearV` and `LinearQ` to keras
- ported algorithms to keras
- ported all notebooks to keras
- changed name of the package to keras-gym and root module to `keras_gym`

Other changes:

- added propensity score outputs to policy objects
- created a stub for directly updateable policies

## v0.2.6

- refactored `BaseAlgorithm` to simplify implementation (at the cost of more code, but it’s worth it)
- refactored notebooks: they are now bundled by environment / algo type
- added n-step bootstrap algorithms: `NStepQLearning`, `NStepSarsa`, `NStepExpectedSarsa`

## v0.2.5

- added algorithm: `keras_gym.algorithms.ExpectedSarsa`
- added object: `keras_gym.utils.ExperienceCache`
- rewrote `MonteCarlo` to use `ExperienceCache`

## v0.2.4

- added algorithm: `keras_gym.algorithms.MonteCarlo`

## v0.2.3

- added algorithm: `keras_gym.algorithms.Sarsa`

## v0.2.2

- changed doc theme from sklearn to readthedocs

## v0.2.1

- first working implementation of value function + policy + algorithm
- added first working example in a notebook
- added algorithm: `keras_gym.algorithms.QLearning`