Self-Play Environments
keras_gym.envs.ConnectFourEnv : An adversarial environment for playing the Connect-Four game.
class keras_gym.envs.ConnectFourEnv
An adversarial environment for playing the Connect-Four game.
Attributes:
- action_space : gym.spaces.Discrete(7)
  The action space.
- observation_space : MultiDiscrete(nvec)
  The state observation space, representing the position of the current player’s tokens (s[1:,:,0]) and the other player’s tokens (s[1:,:,1]), as well as a mask over the space of actions indicating which actions are available to the current player (s[0,:,0]) or the other player (s[0,:,1]).
  Note: The “current” player is relative to whose turn it is, which means that the entries s[:,:,0] and s[:,:,1] swap between turns. (A usage sketch follows this list.)
- max_time_steps : int
  Maximum number of timesteps within each episode.
- available_actions : array of int
  Array of available actions. This array shrinks as columns saturate.
- win_reward : 1.0
  The reward associated with a win.
- loss_reward : -1.0
  The reward associated with a loss.
- draw_reward : 0.0
  The reward associated with a draw.
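A minimal usage sketch for these attributes (assuming keras-gym and its gym dependency are installed; the import path follows the class name above):

    from keras_gym.envs import ConnectFourEnv

    env = ConnectFourEnv()
    env.reset()

    print(env.action_space)       # Discrete(7): one action per column
    print(env.observation_space)  # MultiDiscrete(nvec) over the state array
    print(env.available_actions)  # columns that are not yet saturated
    print(env.win_reward, env.loss_reward, env.draw_reward)  # 1.0 -1.0 0.0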
close(self)
Override close in your subclass to perform any necessary cleanup.
Environments will automatically close() themselves when garbage collected or when the program exits.
reset(self)
Reset the environment to the starting position.

Returns:
- s : 3d-array, shape: [num_rows + 1, num_cols, num_players]
  A state observation, representing the position of the current player’s tokens (s[1:,:,0]) and the other player’s tokens (s[1:,:,1]), as well as a mask over the space of actions indicating which actions are available to the current player (s[0,:,0]) or the other player (s[0,:,1]).
  Note: The “current” player is relative to whose turn it is, which means that the entries s[:,:,0] and s[:,:,1] swap between turns.
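For concreteness, a short sketch of unpacking the returned observation; the slicing follows the layout described above, and the shape assertion assumes the standard 6-row, 7-column Connect-Four board:

    from keras_gym.envs import ConnectFourEnv

    env = ConnectFourEnv()
    s = env.reset()

    # num_rows + 1 = 7 (one extra row holds the action mask),
    # num_cols = 7, num_players = 2
    assert s.shape == (7, 7, 2)

    current_tokens = s[1:, :, 0]  # 6x7 occupancy grid for the player to move
    other_tokens = s[1:, :, 1]    # 6x7 occupancy grid for the opponent
    action_mask = s[0, :, 0]      # 1 where the column is still playable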
seed(self, seed=None)
Sets the seed for this env’s random number generator(s).

Note: Some environments use multiple pseudorandom number generators. We want to capture all such seeds used in order to ensure that there aren’t accidental correlations between multiple generators.

Returns:
- list<bigint> : The list of seeds used in this env’s random number generators. The first value in the list should be the “main” seed, or the value which a reproducer should pass to seed(). Often, the main seed equals the provided seed, but this won’t be true if seed=None, for example.
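This is the standard gym.Env.seed() contract; a brief sketch of reproducing a run (the seed value 42 is arbitrary):

    from keras_gym.envs import ConnectFourEnv

    env = ConnectFourEnv()
    seeds = env.seed(42)  # list of seeds actually used by the env's RNGs
    env.seed(seeds[0])    # re-seeding with the "main" seed reproduces the run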
step(self, a)
Take one step in the MDP, following the single-player convention from gym.

Parameters:
- a : int, options: {0, 1, 2, 3, 4, 5, 6}
  The action to be taken. The action is the zero-based index of the column in which to drop a token, counting from the left of the board.

Returns:
- s_next : 3d-array, shape: [num_rows + 1, num_cols, num_players]
  A next-state observation, representing the position of the current player’s tokens (s[1:,:,0]) and the other player’s tokens (s[1:,:,1]), as well as a mask over the space of actions indicating which actions are available to the current player (s[0,:,0]) or the other player (s[0,:,1]).
  Note: The “current” player is relative to whose turn it is, which means that the entries s[:,:,0] and s[:,:,1] swap between turns.
- r : float
  Reward associated with the transition \((s, a) \to s_\text{next}\).
  Note: Since the “current” player is relative to whose turn it is, you need to be careful about aligning rewards with the correct state or state-action pair. In particular, this reward \(r\) is the one associated with \(s\) and \(a\), i.e. it is not aligned with \(s_\text{next}\). (See the sketch after this list.)
- done : bool
  Whether the episode is done.
- info : dict or None
  A dict with some extra information (or None).
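To make the reward alignment concrete, here is a sketch of a random self-play episode; the sign flip in the comment is a consequence of the adversarial setup (a win for the player who just moved is a loss for the opponent), not a separate API feature:

    import numpy as np
    from keras_gym.envs import ConnectFourEnv

    env = ConnectFourEnv()
    s = env.reset()
    done = False

    while not done:
        a = np.random.choice(env.available_actions)  # pick a legal column
        s_next, r, done, info = env.step(a)
        # r belongs to the (s, a) of the player who just moved; from the
        # perspective of the player to move in s_next, it is worth -r.
        s = s_next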
unwrapped
Completely unwrap this env.

Returns:
- gym.Env : The base non-wrapped gym.Env instance.