Atari 2600: Pong with PPO¶
In this notebook we solve the PongDeterministic-v4 environment using a TD actor-critic algorithm with PPO policy updates.
We use convolutional neural nets (without pooling) as our function approximators for the state value function \(v(s)\) and the policy \(\pi(a|s)\); see AtariFunctionApproximator.
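The PPO policy update referred to above maximizes a clipped surrogate objective built from the probability ratio between the new and old policies. A minimal NumPy sketch of that objective (the function name and the default \(\epsilon = 0.2\) are illustrative assumptions, not part of this notebook's code):

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, epsilon=0.2):
    # Probability ratio r = pi_new(a|s) / pi_old(a|s), computed in log-space
    # for numerical stability.
    ratio = np.exp(logp_new - logp_old)
    # Clipped surrogate: take the pessimistic (element-wise minimum) of the
    # unclipped and clipped terms, then average over the batch.
    return np.minimum(
        ratio * advantages,
        np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantages,
    ).mean()
```

Clipping the ratio to \([1-\epsilon, 1+\epsilon]\) keeps each update close to the old policy, which is what makes repeated minibatch updates on the same batch of experience safe.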
This notebook periodically generates GIFs so that we can inspect how training is progressing.
After a few hundred episodes, this is what you can expect:
To view the notebook in a new tab, click here. To interact with the notebook in Google Colab, hit the “Open in Colab” button below.