Atari 2600: Pong with PPO

In this notebook we solve the PongDeterministic-v4 environment using a TD actor-critic algorithm with PPO policy updates.
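For reference, the PPO policy update maximizes the clipped surrogate objective

\[
L^{\text{clip}}(\theta)
= \mathbb{E}_t\Big[\min\big(\rho_t(\theta)\,\mathcal{A}_t,\;
\text{clip}\big(\rho_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\mathcal{A}_t\big)\Big],
\qquad
\rho_t(\theta) = \frac{\pi_\theta(a_t|s_t)}{\pi_{\theta_\text{old}}(a_t|s_t)}
\]

where the advantages \(\mathcal{A}_t\) come from the TD side of the algorithm, e.g. the one-step estimate \(\mathcal{A}_t = r_t + \gamma\,v(s_{t+1}) - v(s_t)\).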

We use convolutional neural nets (without pooling) as function approximators for the state value function \(v(s)\) and the policy \(\pi(a|s)\); see AtariFunctionApproximator.
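To give a concrete picture, here is a minimal sketch of such a pooling-free actor-critic network in Keras. The layer sizes are an assumption (borrowed from the classic DQN architecture) and need not match what AtariFunctionApproximator actually uses; note that downsampling is done with strided convolutions instead of pooling.

```python
# Minimal sketch of a pooling-free CNN with a shared body and separate
# policy and value heads; layer sizes are assumptions, not necessarily
# those of AtariFunctionApproximator.
import tensorflow as tf
from tensorflow.keras import layers

def build_actor_critic(num_actions, input_shape=(84, 84, 4)):
    X = layers.Input(shape=input_shape)  # stack of preprocessed frames
    h = layers.Conv2D(16, 8, strides=4, activation='relu')(X)  # strided convs
    h = layers.Conv2D(32, 4, strides=2, activation='relu')(h)  # downsample;
    h = layers.Flatten()(h)                                    # no pooling
    h = layers.Dense(256, activation='relu')(h)
    logits = layers.Dense(num_actions)(h)  # policy head: logits of pi(a|s)
    value = layers.Dense(1)(h)             # value head: v(s)
    return tf.keras.Model(inputs=X, outputs=[logits, value])
```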

This notebook periodically generates GIFs so that we can inspect how training is progressing.
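One way to produce such GIFs is to roll out an episode every so often, render each frame as an RGB array, and write the frames out with PIL. The helper below is a hypothetical sketch written against the classic gym API (which PongDeterministic-v4 uses); the notebook may rely on its own utility instead.

```python
# Hypothetical sketch: record one episode as an animated GIF.
# Written against the classic gym API; `policy` is any callable that
# maps an observation to an action.
from PIL import Image

def record_episode_gif(env, policy, filepath, duration_ms=30):
    frames, s, done = [], env.reset(), False
    while not done:
        frames.append(Image.fromarray(env.render(mode='rgb_array')))
        s, r, done, info = env.step(policy(s))
    frames[0].save(filepath, save_all=True, append_images=frames[1:],
                   duration=duration_ms, loop=0)  # loop=0: repeat forever
```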

After a few hundred episodes, this is what you can expect:

[GIF: Beating Atari 2600 Pong after a few hundred episodes.]

To view the notebook in a new tab, click here. To interact with the notebook in Google Colab, hit the “Open in Colab” button.
