# Function Approximators

The central object in this package is the `keras_gym.FunctionApproximator` class, which provides an interface between a gym-type environment and function approximators such as value functions and updateable policies.

## FunctionApproximator class

We define a function approximator by specifying a body. For instance, the example below specifies a simple multi-layer perceptron:

```
import gym
import keras_gym as km
from tensorflow import keras


class MLP(km.FunctionApproximator):
    """ multi-layer perceptron with one hidden layer """
    def body(self, S):
        X = keras.layers.Flatten()(S)
        X = keras.layers.Dense(units=4)(X)
        return X


# environment
env = gym.make(...)

# value function and its derived policy
function_approximator = MLP(env, lr=0.01)
```

This `function_approximator` can now be used to construct a value function or updateable policy, which we cover in the remainder of this page.

## Predefined Function Approximators

Although it’s pretty easy to create a custom function approximator,
**keras-gym** also provides some predefined function approximators. They are
listed here.
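
As a hedged sketch, using a predefined approximator looks much like using the custom `MLP` above. The class name and constructor signature below are assumptions for illustration; consult the package reference for the actual predefined approximators:

```
import gym
import keras_gym as km

env = gym.make('PongDeterministic-v4')

# hypothetical: a predefined approximator with an MLP-like constructor signature
function_approximator = km.predefined.AtariFunctionApproximator(env, lr=0.00025)
```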

## Value Functions

Value functions estimate the expected (discounted) sum of future rewards. For instance, state value functions are defined as:

\[v(s)\ =\ \mathbb{E}\left\{R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots\ \Big|\ S_t=s\right\}\]

Here, the \(R_t\) are the individual rewards we receive from the Markov Decision Process (MDP) at each time step.

In **keras-gym** we define a state value function as follows:

```
v = km.V(function_approximator, gamma=0.9, bootstrap_n=1)
```

The `function_approximator` is discussed above. The other arguments set the discount factor \(\gamma\in[0,1]\) and the number of steps over which to bootstrap.
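
For reference, with `bootstrap_n=n` the bootstrapped target takes the usual \(n\)-step form (this is the standard definition, not keras-gym-specific notation):

\[G^{(n)}_t\ =\ R_{t+1} + \gamma R_{t+2} + \dots + \gamma^{n-1} R_{t+n} + \gamma^n\,v(S_{t+n})\]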

Similar to state value functions, we can also define state-action value functions:

\[q(s,a)\ =\ \mathbb{E}\left\{R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots\ \Big|\ S_t=s,\,A_t=a\right\}\]

**keras-gym** provides two distinct ways to define such a Q-function, which are referred to as type-I and type-II Q-functions. The difference between the two is in how the function approximator models the Q-function. A type-I Q-function models the Q-function as \((s, a)\mapsto q(s, a)\in\mathbb{R}\), whereas a type-II Q-function models it as \(s\mapsto q(s,\cdot)\in\mathbb{R}^n\). Here, \(n\) is the number of actions, which means that this is only well-defined for discrete action spaces.
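
As a minimal sketch (plain Keras, not keras-gym's internal head implementation), the two head shapes differ roughly like this:

```
from tensorflow import keras

n_actions = 4   # assumption: a discrete action space with 4 actions

# type-I: state features and action go in together, a single scalar q(s, a) comes out
X = keras.Input(shape=(8,))               # state features produced by the body
A = keras.Input(shape=(n_actions,))       # one-hot encoded action
q_sa = keras.layers.Dense(1)(keras.layers.Concatenate()([X, A]))

# type-II: only the state features go in, one q-value per action comes out
q_s = keras.layers.Dense(n_actions)(X)
```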

In **keras-gym** we define a type-I Q-function as follows:

```
q = km.QTypeI(function_approximator, update_strategy='sarsa')
```

and similarly for type-II:

```
q = km.QTypeII(function_approximator, update_strategy='sarsa')
```

The `update_strategy` argument specifies our bootstrapping target. Available choices are `'sarsa'`, `'q_learning'` and `'double_q_learning'`.
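
Roughly speaking (one-step form, standard definitions rather than keras-gym's exact implementation), these strategies correspond to the following bootstrapped targets:

\[
\begin{aligned}
\text{sarsa:}\qquad & G_t = R_{t+1} + \gamma\, q(S_{t+1}, A_{t+1})\\
\text{q\_learning:}\qquad & G_t = R_{t+1} + \gamma\, \max_{a}\, q(S_{t+1}, a)\\
\text{double\_q\_learning:}\qquad & G_t = R_{t+1} + \gamma\, q'\!\left(S_{t+1}, \arg\max_{a} q(S_{t+1}, a)\right)
\end{aligned}
\]

where \(q'\) denotes a second (target) Q-function.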

The main reason for using a Q-function is value-based control. In other words, we typically want to derive a policy from the Q-function. This is pretty straightforward too:

```
pi = km.EpsilonGreedy(q, epsilon=0.1)
# the epsilon parameter may be updated dynamically
pi.set_epsilon(0.25)
```
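
For intuition, the epsilon-greedy rule itself is simple; the sketch below is illustrative only, not keras-gym's internal implementation:

```
import numpy as np

def epsilon_greedy_action(q_values, epsilon, rng=np.random):
    """ pick a random action with probability epsilon, else the greedy one """
    if rng.rand() < epsilon:
        return rng.randint(len(q_values))   # explore
    return int(np.argmax(q_values))         # exploit
```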

## Updateable Policies

Besides value-based control in which we derive a policy from a Q-function, we can also do policy-based control. In policy-based methods we learn a policy directly as a probability distribution over the space of actions \(\pi(a|s)\).

The updateable policies for discrete action spaces are known as softmax policies:

\[\pi(a|s)\ =\ \frac{\exp z(s,a)}{\sum_{a'}\exp z(s,a')}\]

where the logits are defined over the real line, \(z(s,a)\in\mathbb{R}\).

In **keras-gym** we define a softmax policy as follows:

```
pi = km.SoftmaxPolicy(function_approximator, update_strategy='vanilla')
```

Similar to Q-functions, we can pick different update strategies. Available options for policies are `'vanilla'`, `'ppo'` and `'cross_entropy'`. These specify the objective function used in our policy updates.
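
For instance (standard textbook forms, not keras-gym's exact loss code), the `'vanilla'` policy-gradient objective and the `'ppo'` clipped-surrogate objective look roughly like:

\[
\begin{aligned}
J_\text{vanilla}(\theta)\ &=\ \mathbb{E}\left[\log\pi_\theta(a|s)\,\mathcal{A}(s,a)\right]\\
J_\text{ppo}(\theta)\ &=\ \mathbb{E}\left[\min\!\left(\rho_\theta(s,a)\,\mathcal{A}(s,a),\ \text{clip}\!\left(\rho_\theta(s,a),\,1-\epsilon_\text{clip},\,1+\epsilon_\text{clip}\right)\mathcal{A}(s,a)\right)\right],
\qquad \rho_\theta(s,a)=\frac{\pi_\theta(a|s)}{\pi_{\theta_\text{old}}(a|s)}
\end{aligned}
\]

where \(\mathcal{A}(s,a)\) is an advantage estimate and \(\epsilon_\text{clip}\) is the clipping parameter.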

## Actor-Critics

It’s often useful to combine a policy with a value function into what is called an actor-critic. The value function (critic) can be used to aid the update procedure for the policy (actor). The **keras-gym** package provides a simple way of constructing an actor-critic using the `ActorCritic` class:

```
# separate policy and value function
pi = km.SoftmaxPolicy(function_approximator, update_strategy='vanilla')
v = km.V(function_approximator, gamma=0.9, bootstrap_n=1)
# combine them into a single actor-critic
actor_critic = km.ActorCritic(pi, v)
```
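
As a rough sketch of how such an actor-critic might sit in an episode loop (the `update(s, a, r, done)` call below is an assumption made for illustration; check the package's training examples for the actual signature):

```
s = env.reset()
done = False

while not done:
    a = pi(s)                              # sample an action from the policy
    s_next, r, done, info = env.step(a)

    # hypothetical update call; the actual signature may differ
    actor_critic.update(s, a, r, done)

    s = s_next
```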