Predefined Function Approximators

keras_gym.predefined.LinearFunctionApproximator A linear function approximator.
keras_gym.predefined.AtariFunctionApproximator A function approximator specifically designed for Atari 2600 environments.
keras_gym.predefined.ConnectFourFunctionApproximator A function approximator specifically designed for the ConnectFour environment.
class keras_gym.predefined.LinearFunctionApproximator(env, interaction=None, optimizer=None, **optimizer_kwargs)

A linear function approximator.

Parameters:
env : environment

A gym-style environment.

interaction : str or keras.layers.Layer, optional

The desired feature interactions that are fed to the linear regression model. A predefined preprocessor can be selected by passing one of the following strings:

‘full_quadratic’

This option generates full-quadratic interactions, which include all linear, bilinear and quadratic terms. It does not include an intercept. Let \(b\) and \(n\) be the batch size and number of features. Then, the input shape is \((b, n)\) and the output shape is \((b, (n + 1) (n + 2) / 2 - 1)\). For example, with \(n = 2\) features \(x_1, x_2\), the generated terms are \(x_1, x_2, x_1^2, x_1 x_2, x_2^2\), i.e. \((2 + 1)(2 + 2)/2 - 1 = 5\) outputs.

Note: This option requires the tensorflow backend.

‘elementwise_quadratic’

This option generates element-wise quadratic interactions, which only include linear and quadratic terms. It does not include bilinear terms or an intercept. Let \(b\) and \(n\) be the batch size and number of features. Then, the input shape is \((b, n)\) and the output shape is \((b, 2n)\).

Alternatively, a custom keras.layers.Layer can be passed as the interaction layer. If left unspecified (interaction=None), the interaction layer is omitted altogether.

optimizer : keras.optimizers.Optimizer, optional

If left unspecified (optimizer=None), the function approximator’s DEFAULT_OPTIMIZER is used. See keras documentation for more details.

**optimizer_kwargs : keyword arguments

Keyword arguments for the optimizer. This is useful when you want to use the default optimizer with a different setting, e.g. changing the learning rate.

DEFAULT_OPTIMIZER

alias of tensorflow.python.keras.optimizer_v2.gradient_descent.SGD
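
A minimal usage sketch (the CartPole environment and the keras_gym.V state-value wrapper are illustrative assumptions from the wider keras-gym API, not requirements of this class):

    import gym
    import keras_gym as km

    env = gym.make('CartPole-v0')

    # linear approximator with element-wise quadratic feature interactions;
    # learning_rate is forwarded to the default SGD optimizer via **optimizer_kwargs
    func = km.predefined.LinearFunctionApproximator(
        env, interaction='elementwise_quadratic', learning_rate=0.01)

    # wrap the approximator in a state-value function (assumed wrapper API)
    v = km.V(func, gamma=0.9, bootstrap_n=1)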

body(self, S)

This is the part of the computation graph that may be shared between e.g. policy (actor) and value function (critic). It is typically the part of a neural net that does most of the heavy lifting. One may think of the body() as an elaborate automatic feature extractor.

Parameters:
S : nd Tensor, shape: [batch_size, …]

The input state observation.

Returns:
X : nd Tensor, shape: [batch_size, …]

The intermediate keras tensor.
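
body() is also the main hook to override when defining a custom function approximator instead of using a predefined one. A minimal sketch, assuming the keras_gym.FunctionApproximator base class and the Keras functional API:

    import keras_gym as km
    from tensorflow import keras

    class MLPFunctionApproximator(km.FunctionApproximator):
        """Hypothetical custom approximator; only body() is overridden."""
        def body(self, S):
            X = keras.layers.Flatten()(S)
            X = keras.layers.Dense(256, activation='relu')(X)
            X = keras.layers.Dense(64, activation='relu')(X)
            return X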

body_q1(self, S, A)

This is similar to body(), except that it takes a state-action pair as input instead of only state observations.

Parameters:
S : nd Tensor, shape: [batch_size, …]

The input state observation.

A : nd Tensor, shape: [batch_size, …]

The input actions.

Returns:
X : nd Tensor, shape: [batch_size, …]

The intermediate keras tensor.

head_pi(self, X)

This is the policy head. It returns logits, i.e. not probabilities. Use a softmax to turn the output into probabilities.

Parameters:
X : nd Tensor, shape: [batch_size, …]

X is an intermediate tensor in the full forward-pass of the computation graph; it’s the output of the last layer of the body() method.

Returns:
*params : Tensor or tuple of Tensors, shape: [batch_size, …]

These constitute the raw policy distribution parameters.
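
Because head_pi() returns logits, a softmax must be applied downstream to obtain probabilities. A minimal sketch, where Z stands for a hypothetical logits tensor produced by this head:

    from tensorflow.keras import backend as K

    # Z: logits tensor of shape [batch_size, num_actions] (illustrative name)
    P = K.softmax(Z, axis=-1)  # action probabilities, rows sum to 1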

head_q1(self, X)

This is the type-I Q-value head. It returns a scalar Q-value \(q(s,a)\in\mathbb{R}\).

Parameters:
X : nd Tensor, shape: [batch_size, …]

X is an intermediate tensor in the full forward-pass of the computation graph; it’s the output of the last layer of the body() method.

Returns:
Q_sa : 2d Tensor, shape: [batch_size, 1]

The output type-I Q-values \(q(s,a)\in\mathbb{R}\).

head_q2(self, X)

This is the type-II Q-value head. It returns a vector of Q-values \(q(s,.)\in\mathbb{R}^n\).

Parameters:
X : nd Tensor, shape: [batch_size, …]

X is an intermediate tensor in the full forward-pass of the computation graph; it’s the output of the last layer of the body() method.

Returns:
Q_s : 2d Tensor, shape: [batch_size, num_actions]

The output type-II Q-values \(q(s,.)\in\mathbb{R}^n\).
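
Since head_q2() outputs one Q-value per action, a greedy action can be read off with an argmax over the action axis. A small sketch, assuming Q_s is a batch of type-II Q-values as a numpy array (illustrative name):

    import numpy as np

    # Q_s: array of shape [batch_size, num_actions]
    greedy_actions = np.argmax(Q_s, axis=1)  # one greedy action per state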

head_v(self, X)

This is the state value head. It returns a scalar V-value \(v(s)\in\mathbb{R}\).

Parameters:
X : nd Tensor, shape: [batch_size, …]

X is an intermediate tensor in the full forward-pass of the computation graph; it’s the output of the last layer of the body() method.

Returns:
V : 2d Tensor, shape: [batch_size, 1]

The output state values \(v(s)\in\mathbb{R}\).

class keras_gym.predefined.AtariFunctionApproximator(env, optimizer=None, **optimizer_kwargs)

A function approximator specifically designed for Atari 2600 environments.

Parameters:
env : environment

An Atari 2600 gym environment.

optimizer : keras.optimizers.Optimizer, optional

If left unspecified (optimizer=None), the function approximator’s DEFAULT_OPTIMIZER is used. See keras documentation for more details.

**optimizer_kwargs : keyword arguments

Keyword arguments for the optimizer. This is useful when you want to use the default optimizer with a different setting, e.g. changing the learning rate.

DEFAULT_OPTIMIZER

alias of tensorflow.python.keras.optimizer_v2.adam.Adam
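
A minimal usage sketch (the environment id and the keras_gym.QTypeII wrapper are illustrative assumptions; in practice the Atari frames are typically preprocessed and stacked before being fed to this approximator):

    import gym
    import keras_gym as km

    env = gym.make('PongDeterministic-v4')

    # Adam learning rate passed via **optimizer_kwargs
    func = km.predefined.AtariFunctionApproximator(env, learning_rate=0.00025)

    # wrap in a type-II Q-function (assumed wrapper API)
    q = km.QTypeII(func, gamma=0.99, bootstrap_n=1)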

body(self, S)

This is the part of the computation graph that may be shared between e.g. policy (actor) and value function (critic). It is typically the part of a neural net that does most of the heavy lifting. One may think of the body() as an elaborate automatic feature extractor.

Parameters:
S : nd Tensor, shape: [batch_size, …]

The input state observation.

Returns:
X : nd Tensor, shape: [batch_size, …]

The intermediate keras tensor.

body_q1(self, S, A)

This is similar to body(), except that it takes a state-action pair as input instead of only state observations.

Parameters:
S : nd Tensor, shape: [batch_size, …]

The input state observation.

A : nd Tensor, shape: [batch_size, …]

The input actions.

Returns:
X : nd Tensor, shape: [batch_size, …]

The intermediate keras tensor.

head_pi(self, X)

This is the policy head. It returns logits, i.e. not probabilities. Use a softmax to turn the output into probabilities.

Parameters:
X : nd Tensor, shape: [batch_size, …]

X is an intermediate tensor in the full forward-pass of the computation graph; it’s the output of the last layer of the body() method.

Returns:
*params : Tensor or tuple of Tensors, shape: [batch_size, …]

These constitute the raw policy distribution parameters.

head_q1(self, X)

This is the type-I Q-value head. It returns a scalar Q-value \(q(s,a)\in\mathbb{R}\).

Parameters:
X : nd Tensor, shape: [batch_size, …]

X is an intermediate tensor in the full forward-pass of the computation graph; it’s the output of the last layer of the body() method.

Returns:
Q_sa : 2d Tensor, shape: [batch_size, 1]

The output type-I Q-values \(q(s,a)\in\mathbb{R}\).

head_q2(self, X)

This is the type-II Q-value head. It returns a vector of Q-values \(q(s,.)\in\mathbb{R}^n\).

Parameters:
X : nd Tensor, shape: [batch_size, …]

X is an intermediate tensor in the full forward-pass of the computation graph; it’s the output of the last layer of the body() method.

Returns:
Q_s : 2d Tensor, shape: [batch_size, num_actions]

The output type-II Q-values \(q(s,.)\in\mathbb{R}^n\).

head_v(self, X)

This is the state value head. It returns a scalar V-value \(v(s)\in\mathbb{R}\).

Parameters:
X : nd Tensor, shape: [batch_size, …]

X is an intermediate tensor in the full forward-pass of the computation graph; it’s the output of the last layer of the body() method.

Returns:
V : 2d Tensor, shape: [batch_size, 1]

The output state values \(v(s)\in\mathbb{R}\).

class keras_gym.predefined.ConnectFourFunctionApproximator(env, optimizer=None, **optimizer_kwargs)

A function approximator specifically designed for the ConnectFour environment.

Parameters:
env : environment

A ConnectFour gym environment.

optimizer : keras.optimizers.Optimizer, optional

If left unspecified (optimizer=None), the function approximator’s DEFAULT_OPTIMIZER is used. See keras documentation for more details.

**optimizer_kwargs : keyword arguments

Keyword arguments for the optimizer. This is useful when you want to use the default optimizer with a different setting, e.g. changing the learning rate.

DEFAULT_OPTIMIZER

alias of tensorflow.python.keras.optimizer_v2.adam.Adam
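
A minimal usage sketch (the ConnectFour environment class and the policy/value/actor-critic wrappers are assumptions based on the wider keras-gym API):

    import keras_gym as km

    env = km.envs.ConnectFourEnv()
    func = km.predefined.ConnectFourFunctionApproximator(env, learning_rate=0.001)

    # shared body, separate policy and value heads (assumed wrapper API)
    pi = km.SoftmaxPolicy(func, update_strategy='ppo')
    v = km.V(func, gamma=0.99, bootstrap_n=1)
    actor_critic = km.ActorCritic(pi, v)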

body(self, S)

This is the part of the computation graph that may be shared between e.g. policy (actor) and value function (critic). It is typically the part of a neural net that does most of the heavy lifting. One may think of the body() as an elaborate automatic feature extractor.

Parameters:
S : nd Tensor, shape: [batch_size, …]

The input state observation.

Returns:
X : nd Tensor, shape: [batch_size, …]

The intermediate keras tensor.

body_q1(self, S, A)

This is similar to body(), except that it takes a state-action pair as input instead of only state observations.

Parameters:
S : nd Tensor, shape: [batch_size, …]

The input state observation.

A : nd Tensor, shape: [batch_size, …]

The input actions.

Returns:
X : nd Tensor, shape: [batch_size, …]

The intermediate keras tensor.

head_pi(self, X)

This is the policy head. It returns logits, i.e. not probabilities. Use a softmax to turn the output into probabilities.

Parameters:
X : nd Tensor, shape: [batch_size, …]

X is an intermediate tensor in the full forward-pass of the computation graph; it’s the output of the last layer of the body() method.

Returns:
*params : Tensor or tuple of Tensors, shape: [batch_size, …]

These constitute the raw policy distribution parameters.

head_q1(self, X)

This is the type-I Q-value head. It returns a scalar Q-value \(q(s,a)\in\mathbb{R}\).

Parameters:
X : nd Tensor, shape: [batch_size, …]

X is an intermediate tensor in the full forward-pass of the computation graph; it’s the output of the last layer of the body() method.

Returns:
Q_sa : 2d Tensor, shape: [batch_size, 1]

The output type-I Q-values \(q(s,a)\in\mathbb{R}\).

head_q2(self, X)

This is the type-II Q-value head. It returns a vector of Q-values \(q(s,.)\in\mathbb{R}^n\).

Parameters:
X : nd Tensor, shape: [batch_size, …]

X is an intermediate tensor in the full forward-pass of the computation graph; it’s the output of the last layer of the body() method.

Returns:
Q_s : 2d Tensor, shape: [batch_size, num_actions]

The output type-II Q-values \(q(s,.)\in\mathbb{R}^n\).

head_v(self, X)

This is the state value head. It returns a scalar V-value \(v(s)\in\mathbb{R}\).

Parameters:
X : nd Tensor, shape: [batch_size, …]

X is an intermediate tensor in the full forward-pass of the computation graph; it’s the output of the last layer of the body() method.

Returns:
V : 2d Tensor, shape: [batch_size, 1]

The output state values \(v(s)\in\mathbb{R}\).