TLDR: I go from wanting a machine learning model to getting that trained model, without actually having a dataset.

I finally got GPT-3 access (big thanks to GDB at OpenAI) and took a stab at a fun side project with it.

I decided that I wanted to be able to say "train me a classifier for X", and to have a procedure spit out a trained classifier without manually collecting a dataset or doing any annotation.

I was curious whether GPT-3 would let me do this: can I use generation to create a dataset for a language task, and then train a model for that task that actually works?
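Concretely, the kind of loop I have in mind looks something like the sketch below: prompt GPT-3 to generate labeled examples, then fit a small off-the-shelf classifier on the purely synthetic data. (The task, prompt, engine, and classifier here are just placeholders for illustration, not necessarily what I end up using.)

```python
import os

import openai
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

openai.api_key = os.environ["OPENAI_API_KEY"]

# Hypothetical task: binary sentiment classification of movie reviews.
LABELS = ["positive", "negative"]


def generate_examples(label, n=50):
    """Prompt GPT-3 to write short reviews with the given sentiment."""
    prompt = f"Write a short movie review that is clearly {label}:\n"
    examples = []
    for _ in range(n):
        response = openai.Completion.create(
            engine="davinci",
            prompt=prompt,
            max_tokens=60,
            temperature=0.9,
        )
        examples.append(response["choices"][0]["text"].strip())
    return examples


# Step 1: generate a purely synthetic labeled dataset.
texts, labels = [], []
for label in LABELS:
    generated = generate_examples(label)
    texts += generated
    labels += [label] * len(generated)

# Step 2: train a small, cheap classifier on the synthetic data.
vectorizer = TfidfVectorizer()
clf = LogisticRegression(max_iter=1000)
clf.fit(vectorizer.fit_transform(texts), labels)

# The resulting (vectorizer, clf) pair is the trained classifier:
# no human-annotated data involved.
```

In practice the prompt would likely need a few hand-written examples and some filtering of the generations, but this is the basic shape of the pipeline.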

Why? My main motivation was research curiosity: specifically, understanding the fidelity of few-shot generation with large language models. I do a lot of work on data augmentation (e.g. model patching, preprint coming soon), so I'm always interested in ways to cheaply increase the amount of data available for a task.

Another, more practical consideration is that it would be nice to bootstrap a model quickly without spending time on data collection, which is an expensive and slow process.

Let me add here that I don't actually expect this to work very well.

Building machine learning models is hard, and reducing this process to the press of a button is highly non-trivial. But it's still fun to try to understand what's possible and what's not, and hopefully learn a few things along the way.

Lastly, you could just use GPT-3 itself as a few-shot model, but there are plenty of reasons you might not want to: maybe you're in a resource-constrained setting and want a small model that's very good at one thing only, or maybe you don't want to share your (test) data with OpenAI.

So to summarize, I want a procedure that looks like this:

  1. Me: Ask for a classifier for my task
  2. Machine: Automatically curates an appropriate (artificial) dataset for training