Est. time to complete: 1 hour 30 mins

Parts of this tutorial have been adapted from Sutton and Barto's Reinforcement Learning: An Introduction

Tutorial 1 Terminology Recap Quiz

https://docs.google.com/forms/d/e/1FAIpQLSdfViYc_OEILgtFzMCRn40IpnYy1hCaDfbTOzdyHlvrpuh-sA/viewform

In the previous tutorial, we saw how reinforcement learning algorithms learn a policy. The algorithm's aim is to find the optimal policy: the policy whose actions maximise the sum of future rewards received.

In this tutorial, we start by defining more precisely the goal of learning the optimal policy. We then introduce the key concept (the value function) and the key equation (the Bellman equation) that will allow us to build our first reinforcement learning algorithm in Tutorial 3!

1. Return $G_t$

In Tutorial 1 we discussed, informally, the objective of reinforcement learning algorithms. We said that the goal of a reinforcement learning algorithm is to maximise the cumulative reward it receives in the long run.

We define this as the return, denoted $G_t$.
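In the simplest case, following Sutton and Barto, the task is episodic (it ends at some final time step $T$) and the return is just the sum of the rewards received after time step $t$:

$$G_t = R_{t+1} + R_{t+2} + R_{t+3} + \dots + R_T$$

As a minimal sketch (not part of the tutorial's code, and the helper name compute_return is our own), computing this undiscounted return from a list of rewards is a single sum:

```python
def compute_return(rewards):
    """Return G_t for an episodic task: the sum of the rewards
    R_{t+1}, R_{t+2}, ..., R_T received after time step t."""
    return sum(rewards)

# Example: three rewards received after time step t
print(compute_return([1.0, 0.0, 2.0]))  # -> 3.0
```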