Special Policies

keras_gym.policies.RandomPolicy A policy that selects actions at random by sampling from the action space.
keras_gym.policies.UserInputPolicy A policy that prompts the user to take an action.
class keras_gym.policies.RandomPolicy(env, random_seed=None)

A policy that selects actions at random by sampling from the action space.

Parameters:
env : gym environment

The gym environment is used to sample from the action space.

random_seed : int, optional

Sets the random state to get reproducible results.

__call__(self, s)

Draw an action from the current policy \(\pi(a|s)\).

Parameters:
s : state observation

A single state observation.

Returns:
a : action

A single action proposed under the current policy.
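The effect of random_seed can be illustrated without keras_gym installed. What follows is a minimal sketch, not the library's implementation: SketchRandomPolicy and its n_actions argument are hypothetical stand-ins, and the sketch assumes a discrete action space whose actions are the integers 0 to n_actions - 1.

```python
import random


class SketchRandomPolicy:
    """Hypothetical stand-in for a random policy over a discrete action space."""

    def __init__(self, n_actions, random_seed=None):
        self.n_actions = n_actions
        # A private generator keeps the seed from touching global random state.
        self._rng = random.Random(random_seed)

    def __call__(self, s):
        # The state observation is ignored: every action is equally likely.
        return self._rng.randrange(self.n_actions)


policy_a = SketchRandomPolicy(n_actions=4, random_seed=13)
policy_b = SketchRandomPolicy(n_actions=4, random_seed=13)

# Two policies built with the same seed produce the same action sequence.
actions_a = [policy_a(s=None) for _ in range(10)]
actions_b = [policy_b(s=None) for _ in range(10)]
```

Here actions_a and actions_b are identical lists, which is the reproducibility that a fixed seed is meant to provide.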

dist_params(self, s)

Get the parameters of the (conditional) probability distribution \(\pi(a|s)\).

Parameters:
s : state observation

A single state observation.

Returns:
params : ndarray

An array containing the distribution parameters.
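For a policy that picks uniformly among n discrete actions, \(\pi(a|s)\) does not depend on s. As a sketch only, assuming the distribution parameters are expressed as categorical probabilities (the library may well use a different parametrization, such as logits), the array could look like the one below. uniform_dist_params is a hypothetical helper, not part of keras_gym.

```python
def uniform_dist_params(n_actions):
    """Hypothetical helper: categorical probabilities of a uniform policy."""
    if n_actions <= 0:
        raise ValueError("n_actions must be positive")
    # Each action gets the same probability mass, regardless of the state.
    return [1.0 / n_actions] * n_actions


params = uniform_dist_params(4)
```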

greedy(self, s)

Draw the greedy action, i.e. \(\arg\max_a\pi(a|s)\).

Parameters:
s : state observation

A single state observation.

Returns:
a : action

A single action proposed under the current policy.
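The \(\arg\max\) above can be sketched over a plain list of action probabilities. Note that under a uniform policy every action ties, so the result depends entirely on the tie-breaking rule. The rule used here (lowest index wins) is an assumption made for illustration, and greedy_action is a hypothetical helper rather than the library's method.

```python
def greedy_action(probs):
    """Return the index of the largest probability (lowest index on ties)."""
    # max() returns the first maximal element it encounters.
    return max(range(len(probs)), key=lambda a: probs[a])


skewed = greedy_action([0.1, 0.6, 0.3])        # a clear winner at index 1
tied = greedy_action([0.25, 0.25, 0.25, 0.25])  # all equal, index 0 is returned
```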

class keras_gym.policies.UserInputPolicy(env, render_before_prompt=False)

A policy that prompts the user to take an action.

Parameters:
env : gym environment

The gym environment is used to sample from the action space.

render_before_prompt : bool, optional

Whether to render the env before prompting the user to pick an action.

__call__(self, s)

Draw an action from the current policy \(\pi(a|s)\).

Parameters:
s : state observation

A single state observation.

Returns:
a : action

A single action proposed under the current policy.
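A prompt-driven policy needs to cope with invalid input. The loop below is a minimal sketch of one way to do that, assuming a discrete action space indexed from 0. prompt_for_action and its read parameter are hypothetical, and the actual prompt text and validation in keras_gym may differ.

```python
def prompt_for_action(n_actions, read=input):
    """Hypothetical helper: ask until the user enters a valid action index."""
    while True:
        raw = read(f"Pick an action (0-{n_actions - 1}): ")
        try:
            a = int(raw)
        except ValueError:
            continue  # not an integer, ask again
        if 0 <= a < n_actions:
            return a
        # An out-of-range integer falls through and triggers another prompt.


# Scripted replies stand in for keyboard input so the sketch runs unattended.
replies = iter(["left", "7", "2"])
chosen = prompt_for_action(3, read=lambda prompt: next(replies))
```

Passing the reader as an argument keeps the default interactive behaviour (input) while allowing the loop to be exercised with scripted replies.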

greedy(self, s)

Draw the greedy action, i.e. \(\arg\max_a\pi(a|s)\).

Parameters:
s : state observation

A single state observation.

Returns:
a : action

A single action proposed under the current policy.