Value-Based Policies

keras_gym.policies.EpsilonGreedy: Value-based policy to select actions using an epsilon-greedy strategy.

class keras_gym.policies.EpsilonGreedy(q_function, epsilon=0.1, random_seed=None)

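Value-based policy that selects actions using an epsilon-greedy strategy: with probability epsilon it draws an action uniformly at random, and otherwise it takes the greedy action, i.e. the action with the highest value under q_function.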

Parameters:
    q_function : callable
        A state-action value function object.
    epsilon : float between 0 and 1
        The probability of selecting an action uniformly at random.
    random_seed : int, optional
        Sets the random state to get reproducible results.

__call__(self, s)

Draw an action from the current policy \(\pi(a|s)\).

Parameters:
    s : state observation
        A single state observation.
Returns:
    a : action
        A single action proposed under the current policy.
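The sampling rule behind __call__ can be sketched in plain Python. This is a minimal sketch, not the keras_gym implementation: it assumes a discrete action space and a hypothetical q_function that maps a state to a list holding one Q-value per action. The helper name epsilon_greedy_action is hypothetical, and a seeded random.Random stands in for the random_seed parameter.

```python
import random
from typing import Callable, List


def epsilon_greedy_action(
    q_function: Callable[[object], List[float]],
    s: object,
    epsilon: float,
    rng: random.Random,
) -> int:
    """Draw one action index under an epsilon-greedy policy (sketch)."""
    # Assumption: q_function(s) returns one Q-value per discrete action.
    q_values = q_function(s)
    n_actions = len(q_values)
    if rng.random() < epsilon:
        # Explore: uniform over all actions, which can include the greedy one.
        return rng.randrange(n_actions)
    # Exploit: highest Q-value; ties resolve to the lowest index.
    return max(range(n_actions), key=lambda a: q_values[a])


def toy_q(s: object) -> List[float]:
    """Hypothetical Q-function over three actions that ignores the state."""
    return [0.0, 1.0, 0.5]


rng = random.Random(13)  # a fixed seed makes the sequence of draws reproducible
actions = [epsilon_greedy_action(toy_q, None, 0.1, rng) for _ in range(10)]
```

Passing the random generator in explicitly, rather than using the module-level random functions, keeps two policies with the same seed producing identical action sequences.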

dist_params(self, s)

Get the parameters of the (conditional) probability distribution \(\pi(a|s)\).

Parameters:
    s : state observation
        A single state observation.
Returns:
    params : ndarray
        An array containing the distribution parameters.
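For a discrete action space with \(n\) actions, a natural set of distribution parameters is the vector of action probabilities \(\pi(a|s) = \epsilon/n + (1-\epsilon)\,\mathbb{1}[a = \arg\max_{a'} q(s,a')]\): every action receives \(\epsilon/n\) from the uniform exploration branch, and the greedy action additionally receives \(1-\epsilon\). The sketch below computes that vector from a list of Q-values. Whether dist_params returns exactly this array is an assumption, and epsilon_greedy_probs is a hypothetical name.

```python
from typing import List, Sequence


def epsilon_greedy_probs(q_values: Sequence[float], epsilon: float) -> List[float]:
    """Probabilities pi(a|s) for each discrete action (sketch, assumed layout)."""
    n_actions = len(q_values)
    greedy = max(range(n_actions), key=lambda a: q_values[a])
    # Every action gets epsilon / n from the uniform exploration branch.
    probs = [epsilon / n_actions] * n_actions
    # The greedy action also gets the remaining probability mass 1 - epsilon.
    probs[greedy] += 1.0 - epsilon
    return probs


probs = epsilon_greedy_probs([0.0, 1.0, 0.5], epsilon=0.1)
```

With epsilon = 0.1 and three actions, the greedy action ends up with probability 0.9 + 0.1/3 and each of the other two with 0.1/3, so the vector sums to 1.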

greedy(self, s)

Select the greedy action, i.e., \(\arg\max_a\pi(a|s)\).

Parameters:
    s : state observation
        A single state observation.
Returns:
    a : action
        The greedy action, i.e. the most probable action under the current policy.
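For \(\epsilon < 1\), the greedy action has probability \(1-\epsilon+\epsilon/n\), which is strictly larger than the \(\epsilon/n\) assigned to every other action, so \(\arg\max_a\pi(a|s)\) coincides with the action of highest Q-value and does not depend on epsilon. The hypothetical helpers below check this on toy Q-values, assuming a discrete action space with one Q-value per action; they are a sketch, not the library's code.

```python
from typing import Sequence


def greedy_action(q_values: Sequence[float]) -> int:
    """Index of the largest Q-value; ties resolve to the lowest index."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def policy_mode(q_values: Sequence[float], epsilon: float) -> int:
    """argmax_a pi(a|s) computed from the epsilon-greedy probabilities."""
    n = len(q_values)
    a_star = greedy_action(q_values)
    # pi(a|s) = epsilon / n + (1 - epsilon) * [a is the greedy action]
    probs = [epsilon / n + (1.0 - epsilon) * (a == a_star) for a in range(n)]
    # For epsilon == 1 all probabilities are equal, so the mode is not unique.
    return max(range(n), key=lambda a: probs[a])


q_values = [0.0, 1.0, 0.5]
same = policy_mode(q_values, 0.1) == greedy_action(q_values)
```

Because the greedy action is deterministic given the Q-values, it is a natural choice for evaluating a trained policy without exploration noise.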

set_epsilon(self, epsilon)

Change the value of epsilon.

Parameters:
    epsilon : float between 0 and 1
        The probability of selecting an action uniformly at random.
Returns:
    self
        The updated instance.
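Because set_epsilon returns the instance, it can be called once per episode inside a training loop or chained with other calls. A common pattern, assumed here rather than taken from keras_gym, is to anneal epsilon from a high value toward a small floor. The sketch uses a hypothetical EpsilonHolder stub that mimics only the documented set_epsilon contract, together with a hypothetical linear_decay schedule; the range check inside set_epsilon is an addition of this sketch.

```python
class EpsilonHolder:
    """Hypothetical stub that mimics only the set_epsilon contract."""

    def __init__(self, epsilon: float = 0.1) -> None:
        self.epsilon = epsilon

    def set_epsilon(self, epsilon: float) -> "EpsilonHolder":
        # Range check added for this sketch; the documented method may not validate.
        if not 0.0 <= epsilon <= 1.0:
            raise ValueError("epsilon must lie in [0, 1]")
        self.epsilon = epsilon
        return self  # returning self enables method chaining


def linear_decay(episode: int, start: float = 1.0, end: float = 0.05,
                 decay_episodes: int = 100) -> float:
    """Linearly interpolate epsilon from start to end, then hold it at end."""
    frac = min(episode / decay_episodes, 1.0)
    return start + frac * (end - start)


policy = EpsilonHolder()
# Set epsilon at the start of each episode and record the value used.
schedule = [policy.set_epsilon(linear_decay(ep)).epsilon for ep in range(150)]
```

A linear schedule is only one option; exponential decay toward the same floor would plug into the same set_epsilon call.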