Monitors

class keras_gym.wrappers.TrainMonitor(env, tensorboard_dir=None)[source]

Environment wrapper for monitoring the training process.

This wrapper logs diagnostics at the end of each episode and exposes several handy attributes (listed below).

Parameters:
env : gym environment

A gym environment.

tensorboard_dir : str, optional

If provided, TrainMonitor will log all diagnostics for viewing in TensorBoard. To view them, point TensorBoard to the same directory:

$ tensorboard --logdir {tensorboard_dir}
Attributes:
T : positive int

Global step counter. This counter is not reset by env.reset(); use env.reset_global() to reset it.

ep : positive int

Global episode counter. This counter is not reset by env.reset(); use env.reset_global() to reset it.

t : positive int

Step counter within an episode.

G : float

The return, i.e. the amount of reward accumulated from the start of the current episode.

avg_G : float

The return G averaged over the past 100 episodes.

dt_ms : float

The average wall time of a single step, in milliseconds.
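Taken together, the counters above behave roughly as in the following minimal sketch. This is a simplified stand-in for illustration only: the real TrainMonitor wraps a gym environment and additionally handles step timing and TensorBoard logging, and the one-step dummy environment here is made up.

```python
from collections import deque


class _DummyEnv:
    """Made-up one-step environment, standing in for any gym env."""
    def reset(self):
        return 0

    def step(self, a):
        return 0, 1.0, True, {}  # (observation, reward, done, info)


class MiniTrainMonitor:
    """Toy sketch of TrainMonitor's counters (not the actual implementation)."""
    def __init__(self, env):
        self.env = env
        self.T = 0    # global step counter, survives env.reset()
        self.ep = 0   # global episode counter, survives env.reset()
        self.t = 0    # step counter within the current episode
        self.G = 0.0  # return accumulated since the start of the episode
        self._G_history = deque(maxlen=100)  # for avg_G

    @property
    def avg_G(self):
        return sum(self._G_history) / max(len(self._G_history), 1)

    def reset(self):
        if self.t > 0:  # a finished episode: remember its return
            self._G_history.append(self.G)
        self.ep += 1
        self.t = 0
        self.G = 0.0
        return self.env.reset()

    def reset_global(self):
        self.T = 0
        self.ep = 0
        return self.reset()

    def step(self, a):
        s, r, done, info = self.env.step(a)
        self.T += 1
        self.t += 1
        self.G += r
        return s, r, done, info


monitor = MiniTrainMonitor(_DummyEnv())
for _ in range(3):  # run three one-step episodes
    s = monitor.reset()
    done = False
    while not done:
        s, r, done, info = monitor.step(0)
```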

close(self)

Override close in your subclass to perform any necessary cleanup.

Environments will automatically close() themselves when garbage collected or when the program exits.

record_losses(self, losses)[source]

Record losses during the training process.

These are used to print more diagnostics.

Parameters:
losses : dict

A dict of losses/metrics, of type {name <str>: value <float>}.
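To illustrate the expected {name <str>: value <float>} format, the sketch below keeps one running smoothed value per loss name. The exponential smoothing factor is an assumption made for illustration; it is not necessarily how keras_gym aggregates these values internally.

```python
class LossRecorder:
    """Sketch of record_losses-style bookkeeping (hypothetical smoothing)."""

    def __init__(self, smoothing=0.9):
        self.smoothing = smoothing
        self.losses = {}  # {name: smoothed value}

    def record_losses(self, losses):
        for name, value in losses.items():
            if name not in self.losses:
                self.losses[name] = float(value)  # first observation
            else:
                s = self.smoothing
                self.losses[name] = s * self.losses[name] + (1 - s) * float(value)


rec = LossRecorder()
rec.record_losses({'policy_loss': 0.5, 'value_loss': 1.2})
rec.record_losses({'policy_loss': 0.3})
```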

render(self, mode='human', **kwargs)

Renders the environment.

The set of supported modes varies per environment. (And some environments do not support rendering at all.) By convention, if mode is:

  • human: render to the current display or terminal and return nothing. Usually for human consumption.
  • rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.
  • ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).
Note:
Make sure that your class's metadata 'render.modes' key includes the list of supported modes. It's recommended to call super() in implementations to use the functionality of this method.
Args:
mode (str): the mode to render with

Example:

class MyEnv(Env):
    metadata = {'render.modes': ['human', 'rgb_array']}

    def render(self, mode='human'):
        if mode == 'rgb_array':
            return np.array(...)  # return RGB frame suitable for video
        elif mode == 'human':
            ...  # pop up a window and render
        else:
            super(MyEnv, self).render(mode=mode)  # just raise an exception
reset(self)[source]

Resets the state of the environment and returns an initial observation.

Returns:
observation (object): the initial observation.
reset_global(self)[source]

Reset the global counters, not just the episodic ones.

seed(self, seed=None)

Sets the seed for this env's random number generator(s).

Note:
Some environments use multiple pseudorandom number generators. We want to capture all such seeds used in order to ensure that there aren’t accidental correlations between multiple generators.
Returns:
list<bigint>: the list of seeds used in this env's random number generators. The first value in the list should be the "main" seed, i.e. the value which a reproducer should pass to 'seed'. Often the main seed equals the provided 'seed', but this won't be true if seed=None, for example.
step(self, a)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Args:
action (object): an action provided by the agent
Returns:
observation (object): agent's observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
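This step/reset contract gives the familiar interaction loop. The sketch below is runnable with a made-up random-walk environment; any real gym env (or a TrainMonitor-wrapped one) slots into the same loop.

```python
import random


class RandomWalkEnv:
    """Made-up env following the (observation, reward, done, info) contract."""

    def __init__(self, horizon=10):
        self.horizon = horizon
        self.pos = 0
        self.t = 0

    def reset(self):
        self.pos = 0
        self.t = 0
        return self.pos

    def step(self, a):
        self.pos += 1 if a == 1 else -1  # move right or left
        self.t += 1
        done = self.t >= self.horizon    # end the episode after `horizon` steps
        return self.pos, float(self.pos), done, {}


env = RandomWalkEnv()
s = env.reset()
G, done = 0.0, False
while not done:  # the caller is responsible for calling reset() afterwards
    s, r, done, info = env.step(random.choice([0, 1]))  # random policy
    G += r
```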
unwrapped

Completely unwrap this env.

Returns:
gym.Env: The base non-wrapped gym.Env instance