Preamble

What is this?

In a nutshell: I recently worked on reimplementing reinforcement learning algorithms following the curriculum recommended by OpenAI's Spinning Up. This write-up covers my experiences, which I hope will be a useful supplement for any learners considering doing the same.

An alternative title for this meta-guide could have been "Spinning Up In Spinning Up", but that would be a terribly uninformative title just for the sake of a rather weak joke.

Motivation

Half a year ago, I was listening to The 80,000 Hours Podcast, and two episodes caught my attention: #3 with Dario Amodei and #47 with Catherine Olsson & Daniel Ziegler. They spoke matter-of-factly about a straightforward method for testing individual fit for AI research careers - simply reimplement research papers.

In particular, Daniel Ziegler details his experience spending six full-time weeks ramping up by reading and implementing key papers in Deep Reinforcement Learning (RL), leading to a full-time position at OpenAI.

Six full-time weeks to get a taste of RL work seemed like a more-than-reasonable tradeoff to me. Of course, I still had a full-time job, so I would need more than six weeks, but this was 2020 during the lockdown - I had some weekends to spare anyway.

How Did It Go?

I'll cut to the chase. Working on this part-time, about 15 hours a week for six months, took me from "Who is a policy?" to where I am today. Six months sounds like a lot, but at 15 hours a week over roughly 24 weeks that's just 360 hours - not too far from Daniel's six full-time weeks (about 240 hours, assuming 40-hour weeks).

Of those six months, the first three were spent getting up to speed on the prerequisites I lacked, and the latter three months focused on implementing key algorithms.

Here's what I managed in those 360 hours:

Completed a basic course on reinforcement learning.
Reimplemented the main RL algorithms that are implemented in Spinning Up.
Read close to 20 research papers on RL (classics as well as modern ones).
Became comfortable with the main ideas in RL, and got a feel for some interesting research directions in the field.

I am still very much a beginner, but I am very pleased with what I have learned, and I now have many directions for further RL projects.

What is Spinning Up, and Why Use It?

If you're simply looking to replicate the results of an RL paper, you could do without any of this - you can go straight from paper to code.