Why Can't I Reproduce Their Results?

Or

What I Wish I Knew as a Graduate Student.

I well remember the experience, at the start of my PhD, of reading state-of-the-art papers and trying to re-implement them to reproduce their results.

To say this was a frustrating experience would be an understatement, and I consistently achieved the same result: it didn't work. Whatever I tried, I couldn't reproduce their results, and even when I got something running it wasn't the same - the quality was worse and it broke all the time.

Along with this, I'd seen the skepticism which had destroyed the motivation of many of my friends and colleagues. I started to wonder if some of the things they were saying were true: "it's all cronyism and favors at the top" - "authors purposefully remove details to maintain their competitive advantage" - "results are all cherry picked" - "it's just nonsense hidden behind mathematical jargon". Well - it's hard to deny that sometimes some of these things do have an element of truth to them.

But what I didn't realize as a graduate student is this: reproducing papers is the learning-by-rote of academia. If you're feeling the pain it's because you're doing it right. The goal of the fake-it-until-you-make-it school of learning isn't to actually succeed - it's to build up enough familiarity with something so that the next time you attempt to do it, you're not blinded by fear or undermined by a lack of confidence.

Nonetheless, the pain and frustration you can feel as a graduate student are real, and if I could go back in time to give myself some advice, at least to ease the pain a bit, this is what I might say:

Why doesn't it work?

I'm about to tell you something which can sometimes be harder to believe than conspiracy theories about academia: you've got a bug in your code.

I can't tell you what it is, because it could literally be one of a million things - ranging from the mundane to the fundamental. Perhaps you have a typo? Perhaps you used the wrong variable name? Perhaps you called a function with the arguments in the wrong order? Perhaps you're calling the wrong function? Perhaps you misunderstood what a particular function does? Perhaps you have an off-by-one error? Or you indexed an array incorrectly? Perhaps you have a bug in your data pre-processing? Perhaps your data isn't clean in the first place? Perhaps it has outliers or invalid entries? Perhaps there is a bug in your visualization? Perhaps you're visualizing the wrong thing? Perhaps you need to transpose your matrices? Perhaps you're using the path to the wrong file? Perhaps you have numerical issues? Perhaps you need to add a small epsilon value to some equation? The list is endless...
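To pick just one item from that list, numerical issues, here is a small sketch of how a formula that is correct on paper can quietly fail in floating point. The function names and the data are made up for illustration and are not taken from any particular paper.

```python
def variance_naive(xs):
    # Textbook identity Var[x] = E[x^2] - E[x]^2. Correct on paper, but the
    # two terms are huge and nearly equal, so their difference loses
    # almost all of its significant digits in floating point.
    n = len(xs)
    mean = sum(xs) / n
    mean_sq = sum(x * x for x in xs) / n
    return mean_sq - mean * mean


def variance_two_pass(xs):
    # Subtract the mean first, then square: the numbers being squared are
    # small, so nothing important is rounded away.
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


# Hypothetical measurements: small variation on top of a large offset.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

print("naive:   ", variance_naive(data))
print("two-pass:", variance_two_pass(data))  # the true value is 22.5
```

Neither function crashes or warns. The deviations from the mean are -6, -3, 3 and 6, so the variance worked out by hand is 90 / 4 = 22.5, and only comparing against that number tells you which version to trust.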

Debugging research code is extremely difficult, and requires a different mindset - the attention to detail you need to adopt will be beyond anything you've done before. With research code, and numerical or data-driven code in particular, bugs often will not manifest themselves as crashes. Buggy code will frequently run to completion and still produce some kind of broken result. It is up to you to validate that every line of code you write is correct. No one is going to do that for you. And yes, sometimes that means doing things the hard way: examining data by hand, and staring at lines of code one by one until you spot what is wrong.
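As an illustration of a bug that runs without complaint, here is a sketch of a pre-processing step that is supposed to subtract the per-feature mean from a tiny dataset. The data and function names are hypothetical, and the check at the end is only one example of the kind of validation I mean.

```python
def column_means(rows):
    # Mean of each feature (column) across all samples (rows).
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def center_buggy(rows):
    # Bug: subtracts each sample's own mean instead of the per-feature mean.
    # The output has the right shape and plausible values, so nothing crashes.
    return [[x - sum(row) / len(row) for x in row] for row in rows]


def center(rows):
    # Intended behaviour: subtract the per-feature mean from every sample.
    means = column_means(rows)
    return [[x - m for x, m in zip(row, means)] for row in rows]


def is_centered(rows, tol=1e-9):
    # Sanity check: after centering, every feature should have mean ~ 0.
    return all(abs(m) < tol for m in column_means(rows))


# Three samples with two features each, small enough to verify by hand.
data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]

print(is_centered(center(data)))        # True
print(is_centered(center_buggy(data)))  # False
```

The dataset is deliberately small enough to check on paper: the feature means are 2 and 20, so the first centered sample should be [-1, -10]. The buggy version returns [-4.5, 4.5] for that sample, which looks perfectly reasonable until you compare it against the hand-computed answer.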

Why are my results worse?

One thing that I think attracts a lot of us to academia is the thought that it may be a domain where ideas, rather than other factors, can triumph. We all want to work in a field where a good idea, a good way of doing something, or solving something, or thinking about something, is all that is required for being recognized.

But we also work in the field of computer science, and that means people don't just want an idea - they want a proof - they want the idea implemented in code on a computer, with experiments, and evaluations, and comparisons.

And here is where it gets tricky, because programming is undeniably a skill to be practiced and improved, and no good idea will produce good results if you don't have the skill to implement it properly. Experience matters too. Things like how quickly you can iterate, how intuitively you can work out what is wrong, how easily you can fix it, how deeply you understand the concepts you're using, how many times you've programmed this sort of thing before, all make a massive difference in the manifestation of the idea.

In fact, most often research in computer science is not at all the meritocracy of ideas we imagine. An average idea executed well tends to produce better results than a good idea executed poorly. And I'm sorry to say, but this is most likely why your results are worse - the original authors just have more practice and experience doing this sort of thing than you - that's all.

So give it time. With experience and practice your results will improve, and eventually you will be ready to combine that skill with an excellent idea - ready for the perfect slam-dunk.

Why is the notation so difficult and imprecise?

Have you ever read a mathematical text from before the time modern mathematical notation was invented? Take a look at this quote from a mathematical text of 1557 which introduces the equality sign: