Effective Baselines

These are mostly just a collection of disjointed thoughts I had whilst dealing with my most recent f’up.

https://twitter.com/reach_vb/status/1534564806904238085?s=20&t=oRgSrmJIKAgnTqUoyyFfKA

In the last 2 years I’ve worked with 3 different research groups, and all of them put a particular focus on establishing a strong baseline before testing any idea in the wild. I, being the naïve engineer I am, kept ignoring that advice and focused more on the engineering/building aspect of the whole idea.

Turns out, I couldn’t have been more wrong. “Build fast, fail faster” works well in a setup where you already have a strong foundation to rely on. You can’t really “fail fast” if you don’t have something to build fast against, something to compare with. In most cases, progress is measured relatively, not absolutely.

Anyway, enough BS, let’s look at what it takes to establish a strong baseline. Typically you need 3 things to put together an effective baseline:

1. Objective - The key criterion whose presence you want to test.
2. Ground Truth - The equivalent of training samples: instances where you know what the actual result for the Objective defined above is.
3. Test Samples - A controlled yet easily interpretable set of instances against which you benchmark the baselines and your “method”.

While this looks trivial, #1 is where you should spend the most time, and in most cases you should have multiple objectives to benchmark against. The more precise the objective function, the better.
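A minimal sketch of how the three pieces fit together, assuming a binary classification task. The metric functions, the `majority_baseline` helper, the keyword-based `my_method` stand-in and the toy data are all hypothetical illustrations, not taken from any particular project or library. The point is structural: every model is scored on the same test samples under every objective, so “your method” is always judged relative to a baseline rather than in isolation.

```python
from collections import Counter
from typing import Callable, Dict, Sequence

# Hypothetical types: an "objective" scores predictions against ground truth,
# a "model" maps a single input text to a 0/1 label.
Objective = Callable[[Sequence[int], Sequence[int]], float]
Model = Callable[[str], int]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that exactly match the ground-truth labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def positive_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of ground-truth positives (label 1) predicted as positive."""
    hits = [p == 1 for t, p in zip(y_true, y_pred) if t == 1]
    return sum(hits) / len(hits) if hits else 0.0


def majority_baseline(train_labels: Sequence[int]) -> Model:
    """Trivial baseline: always predict the most frequent training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda _text: most_common


def evaluate(
    models: Dict[str, Model],
    test_texts: Sequence[str],
    test_labels: Sequence[int],
    objectives: Dict[str, Objective],
) -> Dict[str, Dict[str, float]]:
    """Score every model on the same test samples under every objective."""
    results: Dict[str, Dict[str, float]] = {}
    for model_name, predict in models.items():
        preds = [predict(text) for text in test_texts]
        results[model_name] = {
            name: score(test_labels, preds) for name, score in objectives.items()
        }
    return results


# Toy data, purely illustrative: 1 = positive class, 0 = negative class.
train_labels = [0, 0, 0, 1]               # ground truth used to fit the baseline
test_texts = ["fine", "bad!", "ok", "worse!"]
test_labels = [0, 1, 0, 1]                # ground truth for the test samples

models = {
    "majority_baseline": majority_baseline(train_labels),
    # Hypothetical stand-in for "your method": flag any text containing "!".
    "my_method": lambda text: int("!" in text),
}
objectives = {"accuracy": accuracy, "positive_recall": positive_recall}

for name, scores in evaluate(models, test_texts, test_labels, objectives).items():
    print(name, scores)
```

On this toy data the majority baseline gets 0.5 accuracy but 0.0 positive recall. Looking at accuracy alone hides the fact that it never catches a single positive, which is exactly why benchmarking against more than one objective pays off.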

Let’s unpack this a bit more: say you are supposed to build a classifier to identify toxic tweets.