
Why DeepMind’s breakthrough ML algorithm was so tricky to train


In the past decade, Deep Reinforcement Learning has conquered games ranging from Go to StarCraft and Poker. This progress can be traced back to DeepMind’s DQN algorithm [1], the first time Deep Learning and Reinforcement Learning were successfully combined.

First published in 2013, Deep Q-Networks (DQN) learn to play a suite of Atari games better than humans, taking only the pixel values of the game screen as input. Learning such effective behaviour directly from experience is a remarkable achievement. It was the first hint at a viable path towards Artificial General Intelligence - the ability of computers to act intelligently across a wide range of tasks, at a level similar to or greater than humans!

Part of the reason DQN was such an impressive advance is that Q-learning - the subtype of Reinforcement Learning it uses - is very hard to train. To understand why, we'll first look at how Q-Learning works under the hood. We'll then explore why it's so tricky, and dig into how DeepMind trained DQN successfully.

How Q-Learning works

DeepMind’s breakthrough combined neural networks with reinforcement learning (RL) for the first time. An RL agent interacts with an environment, receiving rewards if it takes favourable actions. The goal of an RL algorithm is to maximise the long-term sum of rewards it receives. Specifically, DQN uses a type of RL called Q-learning (hence the name DQN) - we’ll see what this means shortly.
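To make this loop concrete, here's a minimal sketch of the agent-environment interaction described above. It uses the Gymnasium API purely for illustration (an assumption on my part - DQN itself was trained on Atari via the Arcade Learning Environment), and the agent simply picks random actions; the point is the shape of the loop and the sum of rewards the agent is trying to maximise.

```python
import gymnasium as gym

# A simple environment as an illustrative stand-in for an Atari game.
env = gym.make("CartPole-v1")

obs, info = env.reset(seed=0)
total_reward = 0.0
done = False

while not done:
    # A real agent (e.g. DQN) would pick the action with the highest
    # predicted Q-value; here we sample randomly to show the loop's shape.
    action = env.action_space.sample()

    # The environment returns the next observation and a scalar reward.
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated

print(f"Episode return (sum of rewards): {total_reward}")
env.close()
```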

This framing makes RL incredibly powerful: in principle, any sequential decision-making task that can be expressed as states, actions and rewards can be tackled with RL.