Self-Play Environments

keras_gym.envs.ConnectFourEnv An adversarial environment for playing the Connect-Four game.
class keras_gym.envs.ConnectFourEnv

An adversarial environment for playing the Connect-Four game.

action_space : gym.spaces.Discrete(7)

The action space, with one action per insertion slot (column) of the board.

observation_space : MultiDiscrete(nvec)

The state observation space, representing the position of the current player’s tokens (s[1:,:,0]) and the other player’s tokens (s[1:,:,1]) as well as a mask over the space of actions, indicating which actions are available to the current player (s[0,:,0]) or the other player (s[0,:,1]).

Note: The “current” player is relative to whose turn it is, which means that the entries s[:,:,0] and s[:,:,1] swap between turns.
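As a rough illustration of this layout, the sketch below builds a small observation by hand and slices it the way the description does. It is not the library's code: plain nested lists stand in for the array so the snippet has no dependencies, the helper names `empty_observation`, `split_observation` and `swap_perspective` are hypothetical, and it assumes a 6-row board, that a mask value of 1 means "available", and that the last row index is the bottom of the board.

```python
NUM_ROWS, NUM_COLS, NUM_PLAYERS = 6, 7, 2  # assumed board dimensions


def empty_observation():
    """Observation for an empty board (assumes mask value 1 means 'available')."""
    s = [[[0] * NUM_PLAYERS for _ in range(NUM_COLS)]
         for _ in range(NUM_ROWS + 1)]
    for col in range(NUM_COLS):
        s[0][col] = [1, 1]  # row 0 holds the action mask for both players
    return s


def split_observation(s):
    """Return (mask, own_tokens, other_tokens) from the current player's view."""
    mask = [cell[0] for cell in s[0]]                     # s[0, :, 0]
    own = [[cell[0] for cell in row] for row in s[1:]]    # s[1:, :, 0]
    other = [[cell[1] for cell in row] for row in s[1:]]  # s[1:, :, 1]
    return mask, own, other


def swap_perspective(s):
    """Exchange the two player channels: s[:, :, 0] <-> s[:, :, 1]."""
    return [[[cell[1], cell[0]] for cell in row] for row in s]


s = empty_observation()
s[NUM_ROWS][3][0] = 1  # one token of the current player in column 3

mask, own, other = split_observation(s)

# After the turn passes, the same token shows up in the "other player" channel.
s_swapped = swap_perspective(s)
_, own_after, other_after = split_observation(s_swapped)
```

With an actual array, the same slices would be written directly as `s[0, :, 0]`, `s[1:, :, 0]` and `s[1:, :, 1]`.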

max_time_steps : int

Maximum number of timesteps within each episode.

available_actions : array of int

Array of the actions that are currently available. The array shrinks as columns fill up.

win_reward : 1.0

The reward associated with a win.

loss_reward : -1.0

The reward associated with a loss.

draw_reward : 0.0

The reward associated with a draw.


close(self)

Override close in your subclass to perform any necessary cleanup.

Environments will automatically close() themselves when garbage collected or when the program exits.

render(self, *args, **kwargs)

Render the current state of the environment.


reset(self)

Reset the environment to the starting position.

s : 3d-array, shape: [num_rows + 1, num_cols, num_players]

A state observation, representing the position of the current player’s tokens (s[1:,:,0]) and the other player’s tokens (s[1:,:,1]) as well as a mask over the space of actions, indicating which actions are available to the current player (s[0,:,0]) or the other player (s[0,:,1]).

Note: The “current” player is relative to whose turn it is, which means that the entries s[:,:,0] and s[:,:,1] swap between turns.

seed(self, seed=None)

Sets the seed for this env’s random number generator(s).

Some environments use multiple pseudorandom number generators. We want to capture all such seeds used in order to ensure that there aren’t accidental correlations between multiple generators.

list<bigint>: Returns the list of seeds used in this env’s random number generators. The first value in the list should be the “main” seed, or the value which a reproducer should pass to ‘seed’. Often, the main seed equals the provided ‘seed’, but this won’t be true if seed=None, for example.

step(self, a)

Take one step in the MDP, following the single-player convention from gym.

a : int, options: {0, 1, 2, 3, 4, 5, 6}

The action to be taken. The action is the zero-based index of the insertion slot (column) into which the token is dropped, counted from the left of the board.

s_next : 3d-array, shape: [num_rows + 1, num_cols, num_players]

A next-state observation, representing the position of the current player’s tokens (s[1:,:,0]) and the other player’s tokens (s[1:,:,1]) as well as a mask over the space of actions, indicating which actions are available to the current player (s[0,:,0]) or the other player (s[0,:,1]).

Note: The “current” player is relative to whose turn it is, which means that the entries s[:,:,0] and s[:,:,1] swap between turns.

r : float

Reward associated with the transition \((s, a)\to s_\text{next}\).

Note: Since the “current” player is relative to whose turn it is, take care to align each reward with the correct state or state-action pair. In particular, this reward \(r\) belongs to \(s\) and \(a\); it is not aligned with \(s_\text{next}\).

done : bool

Whether the episode is done.

info : dict or None

A dict with some extra information (or None).
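The note on reward alignment can be made concrete with a small sketch. It assumes a two-player episode in which the players alternate moves and only the final move produces a non-zero reward; the function name `credit_rewards` is hypothetical and is not part of keras_gym. The default values mirror win_reward and loss_reward as documented above.

```python
def credit_rewards(rewards, win_reward=1.0, loss_reward=-1.0):
    """Return per-move rewards in which the loser's last move is also penalized.

    rewards[t] is the r returned by step() for the move made at time t, so it
    belongs to the player who made that move. If the final move wins the game,
    the opponent's preceding move is credited with the loss reward.
    """
    credited = list(rewards)
    if len(credited) >= 2 and credited[-1] == win_reward:
        credited[-2] = loss_reward
    return credited


# The fourth move wins, so the third move (made by the opponent) gets the loss.
won = credit_rewards([0.0, 0.0, 0.0, 1.0])

# A drawn episode carries no win, so nothing changes.
drawn = credit_rewards([0.0, 0.0, 0.0])
```

Whether the penalty should be attached to the losing player's last transition or handled some other way is a design choice for the learning code; this sketch only shows the bookkeeping.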


unwrapped

Completely unwrap this env.

gym.Env: The base non-wrapped gym.Env instance.
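To show how the members documented on this page fit together, here is a minimal sketch of an episode loop that plays uniformly random available actions. It relies only on reset, step, available_actions and max_time_steps as described above. So that the snippet runs without keras_gym installed, it uses a hypothetical `StubEnv` that ends every episode after five moves; `play_random_episode` is likewise an illustrative name, not a library function.

```python
import random


class StubEnv:
    """Hypothetical stand-in exposing the interface described on this page."""
    available_actions = [0, 1, 2, 3, 4, 5, 6]
    max_time_steps = 42  # stub value, not taken from the library

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return None  # a real environment would return the observation s

    def step(self, a):
        self.t += 1
        done = self.t >= 5  # the stub simply stops after five moves
        return None, 0.0, done, None  # (s_next, r, done, info)


def play_random_episode(env, rng):
    """Play random available actions until done or max_time_steps is reached."""
    s = env.reset()
    transitions = []
    for _ in range(env.max_time_steps):
        a = rng.choice(list(env.available_actions))
        s_next, r, done, info = env.step(a)
        transitions.append((s, a, r, done))
        s = s_next
        if done:
            break
    return transitions


episode = play_random_episode(StubEnv(), random.Random(0))
```

Under the assumption that the real environment follows the same interface, passing an instance of it in place of `StubEnv()` should work unchanged.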