Reinforcement Learning (RL) is all the rage nowadays. Whether you want to play Atari 2600 games, master Go (twice), or even chess, you should probably use RL. Almost two years ago Andrej Karpathy wrote an excellent post, “Pong from Pixels”. He did a wonderful job of explaining the RL problem and a specific RL algorithm (Policy Gradient) through the example of learning to play Pong. Here we extend his work and compare multiple approaches to learning Pong. In contrast to the work of Karpathy or DeepMind, we do not solve Pong “from pixels”, but from the actual internal state of the game. It’s easier, and it allows us to focus on the RL part of the problem rather than the image processing part.

We rely on deep learning as a building block within classical RL algorithms. Solving Pong is not a very challenging task. The focus of this project is on understanding the merits and pitfalls of RL algorithms when combined with deep learning.
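To make the distinction between “from pixels” and “from internal state” concrete, here is a minimal sketch of what an internal game state might look like. The variable names and layout here are illustrative assumptions, not the exact representation used in this project:

```python
import numpy as np

# Illustrative only: a hypothetical internal game state for Pong.
# The actual field names and layout used in this project may differ.
def internal_state(ball_x, ball_y, ball_vx, ball_vy, left_paddle_y, right_paddle_y):
    """Pack the game variables into a small feature vector for the agent."""
    return np.array(
        [ball_x, ball_y, ball_vx, ball_vy, left_paddle_y, right_paddle_y],
        dtype=np.float32,
    )

# Compare with the "from pixels" setting, where the observation is a raw frame,
# e.g. a 210x160x3 image that must be preprocessed before any learning happens.
state = internal_state(0.5, 0.3, 0.01, -0.02, 0.4, 0.6)
print(state.shape)  # (6,) -- six numbers instead of tens of thousands of pixels
```

Working with a handful of numbers rather than a raw frame is what lets us spend our effort on the RL algorithms themselves.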

The structure of this website is as follows:

  • In the Introduction we give a very short introduction to RL, to make sure we’re all on the same page, and expand on what we are trying to do here.

  • Perhaps the simplest approach to solving RL problems is to imitate an expert. In Chapter 1 we describe this approach and underscore its limitations.

  • In Chapter 2: When You Know the Model we present the most common mathematical formulation of the problem (the Markov Decision Process, or MDP) and some classic solutions. We also show how they actually work when applied to Pong.

  • In Chapter 3: AlphaZero we present DeepMind’s breakthrough algorithm AlphaGo Zero, and our own implementation, AlphaPong Zero.

  • In Chapter 4: Learning While Playing Part 1 we move on to learning how to play Pong without knowing the rules in advance, and present the Policy Gradient algorithm. We also give a variant of this algorithm with improved performance.

  • In Chapter 5: Learning While Playing Part 2 we present DeepMind’s Deep-Q-Learning algorithm, along with some previously published variants and a few of our own.

  • In Chapter 6: Learning While Playing Part 3 we present Actor-Critic methods: methods that attempt to combine the other algorithms we described. This combination does indeed obtain the best results in our experiments.

  • Finally, there are some References.

Should you read this article? If you’re interested in RL, but can’t find your feet in all the mess of concepts and algorithms, you should certainly read this. If you’re an RL expert, you probably already know most of it, but you might find some of the empirical results interesting. If you’re not at all interested in RL, then you should most certainly not read this article. Here, play some Pong instead:

Play Pong. Use the arrows to move the right paddle.

Anyway, we hope you’ll enjoy reading through. And if you don’t know yet where to begin, simply start at the Introduction.