Value-Based Policies

keras_gym.policies.EpsilonGreedy: Value-based policy to select actions using an epsilon-greedy strategy.
class keras_gym.policies.EpsilonGreedy(q_function, epsilon=0.1, random_seed=None)

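Value-based policy to select actions using an epsilon-greedy strategy: with probability epsilon an action is drawn uniformly at random; otherwise the greedy action is selected.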

Parameters:
q_function : callable

A state-action value function object.

epsilon : float between 0 and 1

The probability of selecting an action uniformly at random.

random_seed : int, optional

Sets the random state to get reproducible results.
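The epsilon-greedy rule described by these parameters can be sketched without the library. The snippet below is a minimal standalone illustration, not keras_gym's implementation: `toy_q_function` and `epsilon_greedy_action` are hypothetical names, and the Q-function is assumed to return one value per discrete action.

```python
import random


def toy_q_function(s):
    # Hypothetical stand-in for a state-action value function:
    # returns one value per action, ignoring the state for simplicity.
    return [0.1, 0.7, 0.2]


def epsilon_greedy_action(q_function, s, epsilon=0.1, rng=None):
    """With probability epsilon pick uniformly at random, else pick the argmax."""
    if rng is None:
        rng = random.Random()
    q_values = q_function(s)
    if rng.random() < epsilon:
        # Explore: every action is equally likely, including the greedy one.
        return rng.randrange(len(q_values))
    # Exploit: index of the largest Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


# A seeded generator plays the role that random_seed plays in the constructor.
rng = random.Random(42)
actions = [
    epsilon_greedy_action(toy_q_function, None, epsilon=0.1, rng=rng)
    for _ in range(1000)
]
greedy_fraction = actions.count(1) / len(actions)
```

With epsilon=0.1 and three actions, the greedy action should be chosen in roughly 1 - 0.1 + 0.1/3 of the draws, since the random branch can also land on it.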

__call__(self, s)

Draw an action from the current policy \(\pi(a|s)\).

Parameters:
s : state observation

A single state observation.

Returns:
a : action

A single action proposed under the current policy.

dist_params(self, s)

Get the parameters of the (conditional) probability distribution \(\pi(a|s)\).

Parameters:
s : state observation

A single state observation.

Returns:
params : ndarray

An array containing the distribution parameters.
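The layout of the returned array is not specified here. One common way to parametrize an epsilon-greedy policy over a discrete action space is as a vector of action probabilities; the sketch below assumes that layout, which may differ from what keras_gym actually returns. The function name `epsilon_greedy_probs` is hypothetical.

```python
def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities pi(a|s) for an epsilon-greedy policy (assumed layout)."""
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])
    # Every action receives an equal share of the exploration mass ...
    probs = [epsilon / n] * n
    # ... and the greedy action receives the remaining 1 - epsilon on top.
    probs[greedy] += 1.0 - epsilon
    return probs


probs = epsilon_greedy_probs([0.1, 0.7, 0.2], epsilon=0.1)
```

Under this assumption the parameters always sum to 1, and the greedy action has probability 1 - epsilon + epsilon/n.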

greedy(self, s)

Draw the greedy action, i.e. \(\arg\max_a\pi(a|s)\).

Parameters:
s : state observation

A single state observation.

Returns:
a : action

The greedy action for the given state observation.
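For an epsilon-greedy policy with epsilon < 1 and no ties among the Q-values, the action that maximizes \(\pi(a|s)\) is the same one that maximizes the Q-values. The standalone check below illustrates this; `action_probs` and `argmax` are hypothetical helpers, not part of keras_gym.

```python
def action_probs(q_values, epsilon):
    """Epsilon-greedy probabilities: epsilon/n each, plus 1 - epsilon for the best."""
    n = len(q_values)
    best = max(range(n), key=lambda a: q_values[a])
    return [epsilon / n + (1.0 - epsilon if a == best else 0.0) for a in range(n)]


def argmax(values):
    # Index of the largest entry (first one on ties).
    return max(range(len(values)), key=lambda a: values[a])


q_values = [0.3, -1.2, 2.5, 0.0]
same_argmax = all(
    argmax(action_probs(q_values, eps)) == argmax(q_values)
    for eps in (0.0, 0.1, 0.5, 0.9)
)
```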

set_epsilon(self, epsilon)

Change the value for epsilon.

Parameters:
epsilon : float between 0 and 1

The probability of selecting an action uniformly at random.

Returns:
self

The updated instance.
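Because set_epsilon returns the instance, it can be chained and is convenient for annealing epsilon during training. The sketch below uses a hypothetical `ToyPolicy` that mirrors only the documented set_epsilon contract, together with an illustrative `linear_schedule`; neither is part of keras_gym, and the range check is an assumption based on the documented "between 0 and 1".

```python
class ToyPolicy:
    """Hypothetical stand-in exposing only a set_epsilon method."""

    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon

    def set_epsilon(self, epsilon):
        # Assumption: reject values outside the documented range.
        if not 0.0 <= epsilon <= 1.0:
            raise ValueError("epsilon must be between 0 and 1")
        self.epsilon = epsilon
        return self  # returning self allows call chaining


def linear_schedule(step, start=1.0, end=0.01, decay_steps=100):
    """Linearly interpolate from start to end, then hold at end."""
    fraction = min(step / decay_steps, 1.0)
    return start + fraction * (end - start)


policy = ToyPolicy()
history = []
for step in range(0, 201, 50):
    # Chained call: update epsilon, then read it back from the same instance.
    history.append(policy.set_epsilon(linear_schedule(step)).epsilon)
```

Starting with a large epsilon and decaying it shifts the policy from exploration toward exploitation as the value estimates improve.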