In a nutshell: I recently worked on reimplementing reinforcement learning algorithms following the curriculum recommended by OpenAI's Spinning Up. This write-up covers my experience, which I hope will be a useful supplement for any learners considering doing the same.
An alternative title for this meta-guide could have been "Spinning Up In Spinning Up", but that would be a terribly uninformative title just for the sake of a rather weak joke.
Half a year ago, I was listening to The 80,000 Hours Podcast, and episodes #3 with Dario Amodei and #47 with Catherine Olsson & Daniel Ziegler caught my attention. They spoke matter-of-factly about a straightforward method for testing one's fit for an AI research career - simply reimplement research papers.
In particular, Daniel Ziegler details his experience spending six full-time weeks ramping up by reading and implementing key papers in Deep Reinforcement Learning (RL), leading to a full-time position at OpenAI.
Six full-time weeks to get a taste of RL work seemed like a more-than-reasonable tradeoff to me. Of course, I still had a full-time job so I would need more than six weeks, but this was 2020 during the lockdown - I had some weekends to spare anyway.
I'll cut to the chase. Working on this part-time for about 15 hours a week for six months took me from "Who is a policy?" to where I am today. Six months sounds like a lot but at 15 hours a week that's just 360 hours - not too far from Daniel's six full-time weeks.
Of those six months, the first three were spent getting up to speed on the prerequisites I lacked, and the latter three months focused on implementing key algorithms.
Here's what I managed in those 360 hours:
I am still very much a beginner, but I am very pleased with what I have learned, and I now have many directions for further RL projects.
If you're simply looking to replicate the results of an RL paper, you could do without any of this - you can go straight from paper to code.