Introduction to this Chapter:
Hello, brave explorer! At the end of the previous chapter, we ran into the high wall of "the curse of dimensionality": traditional tabular Q-Learning breaks down when faced with massive state spaces, such as Atari games where the input is raw screen pixels. In 2013, the DeepMind team (later acquired by Google) dropped a bombshell: the Deep Q-Network (DQN). It successfully combined deep convolutional neural networks (CNNs) with Q-Learning, enabling agents to learn to play a wide range of Atari games just by observing the game screen, and even to surpass professional human players in many of them.
The advent of DQN marked the official beginning of the era of Deep Reinforcement Learning (DRL). In this chapter, we will delve into the two core secrets behind DQN's success.
🧠 DQN's first core idea is to replace the Q-Table with a deep neural network. This network is called the Q-network and is written as Q(s, a; w):

- s: The state (e.g., the pixel data of the game screen).
- a: The action (e.g., move left, move right).
- w: The weights and biases of the neural network. Our goal is to learn this set of optimal parameters w.

Now, the Q-Learning update no longer rewrites a single cell in a table; instead, it updates the network's parameters w through gradient descent.
Loss Function: We want the Q-network's prediction Q(s, a; w) to be as close as possible to the TD target y. Therefore, we can define a Mean Squared Error (MSE) loss function:

L(w) = E[ (y - Q(s, a; w))² ]

where the TD target y is:

y = R + γ * max_{a'} Q(s', a'; w)
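To make this loss concrete, here is a minimal sketch of one gradient-descent step, assuming the hypothetical PyTorch QNetwork from the previous sketch, an optimizer such as torch.optim.Adam over its parameters, and a batch of transitions (s, a, r, s', done) already collected from the game; all variable names are illustrative. Note that, exactly as in the formula above, the same network produces both the prediction and the target y.

```python
import torch
import torch.nn.functional as F

def dqn_loss_step(q_net, optimizer, batch, gamma=0.99):
    """One gradient step on L(w) = E[(y - Q(s, a; w))^2].

    batch: tensors (states, actions, rewards, next_states, dones) with shapes
           (B, 4, 84, 84), (B,), (B,), (B, 4, 84, 84), (B,).
    Naive version: the TD target y is computed with the *same* network,
    which is the source of the "moving target" problem discussed next.
    """
    states, actions, rewards, next_states, dones = batch

    # Q(s, a; w): the Q-values of the actions actually taken.
    q_pred = q_net(states).gather(1, actions.long().unsqueeze(1)).squeeze(1)

    # TD target y = R + gamma * max_a' Q(s', a'; w); y = R at terminal states.
    with torch.no_grad():
        max_next_q = q_net(next_states).max(dim=1).values
        y = rewards + gamma * (1.0 - dones.float()) * max_next_q

    # Mean Squared Error between prediction and target, then a step on w.
    loss = F.mse_loss(q_pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```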
Seems simple, right? But it hides a crisis! 💥

If we train directly with this approach, we will find that the network struggles to converge and may even collapse. This is mainly due to two problems:

1. Correlated samples: Consecutive game frames are highly correlated, which violates the independent and identically distributed (i.i.d.) assumption that gradient-based training relies on, making learning unstable.
2. Moving target: The network used to compute the TD target y is the same Q-network we are currently updating. This means that with every update step, the target y itself also changes. It is like chasing a moving target, which makes training hard to stabilize.

✨ To solve the two problems mentioned above, DQN introduced two pioneering techniques: