Special Policies

`keras_gym.policies.RandomPolicy`	A policy that selects actions at random by sampling from the action space.
`keras_gym.policies.UserInputPolicy`	A policy that prompts the user to take an action.

class keras_gym.policies.RandomPolicy(env, random_seed=None)

A policy that selects actions at random by sampling from the action space.

Parameters:
    env : gym environment
        The gym environment is used to sample from the action space.
    random_seed : int, optional
        Sets the random state to get reproducible results.

__call__(self, s)

Draw an action from the current policy \(\pi(a|s)\).

Parameters:
    s : state observation
        A single state observation.
Returns:
    a : action
        A single action proposed under the current policy.

dist_params(self, s)

Get the parameters of the (conditional) probability distribution \(\pi(a|s)\).

Parameters:
    s : state observation
        A single state observation.
Returns:
    params : nd array
        An array containing the distribution parameters.

greedy(self, s)

Draw the greedy action, i.e. \(\arg\max_a\pi(a|s)\).

Parameters:
    s : state observation
        A single state observation.
Returns:
    a : action
        A single action proposed under the current policy.
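
The three methods above can be illustrated with a small stand-in class. This is a minimal sketch, not the `keras_gym` implementation: `SketchRandomPolicy` and its `num_actions` argument are hypothetical, the action space is assumed to be discrete, and `dist_params` is assumed to return uniform action probabilities. Because every action is equally likely under such a policy, the arg max is not unique; the sketch breaks the tie by returning the lowest action index, which is also an assumption.

```python
import random


class SketchRandomPolicy:
    """Hypothetical stand-in for a uniform random policy (not keras_gym code)."""

    def __init__(self, num_actions, random_seed=None):
        self.num_actions = num_actions
        # A private generator keeps results reproducible for a given seed.
        self._rng = random.Random(random_seed)

    def __call__(self, s):
        # The state is ignored: every action is equally likely.
        return self._rng.randrange(self.num_actions)

    def dist_params(self, s):
        # Assumption: the parameters are the action probabilities pi(a|s).
        return [1.0 / self.num_actions] * self.num_actions

    def greedy(self, s):
        # All probabilities tie, so max() returns the first (lowest) index.
        params = self.dist_params(s)
        return max(range(self.num_actions), key=params.__getitem__)


policy_a = SketchRandomPolicy(num_actions=4, random_seed=13)
policy_b = SketchRandomPolicy(num_actions=4, random_seed=13)
state = None  # placeholder observation; this policy never inspects it

actions_a = [policy_a(state) for _ in range(5)]
actions_b = [policy_b(state) for _ in range(5)]
```

Two policies built with the same `random_seed` produce the same action sequence, which is the reproducibility the `random_seed` parameter describes.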

class keras_gym.policies.UserInputPolicy(env, render_before_prompt=False)

A policy that prompts the user to take an action.

Parameters:
    env : gym environment
        The gym environment is used to sample from the action space.
    render_before_prompt : bool, optional
        Whether to render the env before prompting the user to pick an action.

__call__(self, s)

Draw an action from the current policy \(\pi(a|s)\).

Parameters:
    s : state observation
        A single state observation.
Returns:
    a : action
        A single action proposed under the current policy.

greedy(self, s)

Draw the greedy action, i.e. \(\arg\max_a\pi(a|s)\).

Parameters:
    s : state observation
        A single state observation.
Returns:
    a : action
        A single action proposed under the current policy.
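
A prompt-driven policy of this kind can be sketched as follows. This is an illustration under assumptions, not the `keras_gym` code: `SketchUserInputPolicy`, `num_actions`, `render_fn`, and `input_fn` are hypothetical names, the action space is assumed to be discrete, and both the re-prompting on invalid input and having `greedy` defer to the same prompt are guesses at reasonable behavior. Passing `input_fn` lets the prompt be scripted, so the sketch runs without a terminal.

```python
class SketchUserInputPolicy:
    """Hypothetical stand-in for a policy that asks the user for an action."""

    def __init__(self, num_actions, render_fn=None, input_fn=input):
        self.num_actions = num_actions
        self.render_fn = render_fn  # optional callable that displays the env
        self.input_fn = input_fn    # replaceable so the prompt can be scripted

    def __call__(self, s):
        if self.render_fn is not None:
            self.render_fn()  # show the environment before prompting
        prompt = "Pick an action in [0, {}): ".format(self.num_actions)
        while True:
            reply = self.input_fn(prompt)
            try:
                a = int(reply)
            except ValueError:
                continue  # not an integer: ask again
            if 0 <= a < self.num_actions:
                return a
            # out of range: fall through and ask again

    def greedy(self, s):
        # Assumption: there is no distribution to maximize, so ask the user.
        return self(s)


# Scripted replies stand in for keyboard input: two invalid, then a valid one.
replies = iter(["left", "7", "2"])
policy = SketchUserInputPolicy(num_actions=3, input_fn=lambda prompt: next(replies))
chosen = policy(None)
```

Here `render_fn` plays the role that `render_before_prompt=True` describes: the environment is shown first, then the user is asked for an action.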