🎧 Podcast | Lenny’s Podcast | 74 min
👩🔬 Karina Nguyen, AI Researcher at OpenAI (ex‑Anthropic, New York Times, Dropbox, Square)
⏱️ Full Episode Listening Time: 74 mins
⏱️ Summary Reading Time: ~8 mins
“Any method can yield strategic or tactical insights—what matters is the problem you solve.”
Model training is an art, not a recipe.
Debugging LLMs looks a lot like debugging software: you find where the model gets confused (e.g. “I have no body to set a real‑world alarm”) and iterate from there.
Synthetic data unlocks rapid iteration.
By auto‑generating training examples for specific behaviours (triggering, editing, commenting in Canvas), teams can teach new features without endless human labelling.
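As a rough sketch of what auto‑generating behaviour‑specific examples might look like (everything below is hypothetical and not OpenAI's actual pipeline; the template stub stands in for a real model call that would paraphrase or expand prompts):

```python
import json
import random

# Hypothetical sketch: synthesize (prompt, behaviour) pairs for Canvas-style
# features. In practice an LLM would generate diverse phrasings; here a
# simple template stub illustrates the shape of the data.

BEHAVIOURS = {
    "trigger": "Write a short story about {topic} in canvas",
    "edit": "Make my draft about {topic} more concise",
    "comment": "Leave feedback on my {topic} outline",
}

TOPICS = ["space travel", "sourdough baking", "tax law"]

def synthesize_examples(n_per_behaviour: int, seed: int = 0) -> list[dict]:
    """Produce labelled examples for training and validating a feature."""
    rng = random.Random(seed)
    examples = []
    for behaviour, template in BEHAVIOURS.items():
        for _ in range(n_per_behaviour):
            topic = rng.choice(TOPICS)
            examples.append({
                "prompt": template.format(topic=topic),
                "label": behaviour,
            })
    return examples

dataset = synthesize_examples(n_per_behaviour=2)
print(json.dumps(dataset[0], indent=2))
```

The same generator can feed both the training set and a held‑out validation split, which is what makes the iteration loop fast: no waiting on human labelling to test a new behaviour.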
Evals are your north star.
Clear, automated and human‑in‑the‑loop evaluations (“does this prompt fire Canvas?”, “which model completion wins?”) measure progress and prevent regressions.
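A minimal pass/fail eval of the “does this prompt fire Canvas?” kind could be sketched like this (entirely illustrative: `model_decides_canvas` is a keyword heuristic standing in for a real model call, and the cases are made up):

```python
# Hypothetical sketch of an automated pass/fail eval. Each case pairs a
# prompt with the expected trigger decision; the harness counts agreement.

def model_decides_canvas(prompt: str) -> bool:
    # Stand-in for the model: a real eval would call the model under test.
    text = prompt.lower()
    return "canvas" in text or "draft" in text

EVAL_CASES = [
    ("Open a canvas and outline my essay", True),
    ("What is the capital of France?", False),
    ("Edit my draft to be friendlier", True),
]

def run_eval(cases):
    results = [(p, model_decides_canvas(p) == expected) for p, expected in cases]
    passed = sum(ok for _, ok in results)
    return passed, len(results)

passed, total = run_eval(EVAL_CASES)
print(f"{passed}/{total} passed")
```

Running this after every model or prompt change is what turns the eval into a regression guard rather than a one‑off benchmark.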
Problem‑first, method‑agnostic mindset.
Start with the team’s biggest blocker, then pick or combine methods—tactical usability tests or strategic field studies—to surface the right insight.
Soft skills drive impact.
Creativity, prioritisation, influence, and collaboration—especially “insight primers” before roadmap planning—earn you a seat at the table.
| Concept | How to Use It |
| --- | --- |
| Synthetic Data Training | Auto‑generate examples for new behaviours (e.g. Canvas edit, comment, trigger) to train and validate features. |
| Automated + Human Evals | Build simple pass/fail tests and human rating workflows to benchmark models continuously. |
| Problem‑First Planning | Define the core user or business problem, then choose your research or prototyping method. |
| Insight Primers | Deliver concise research briefs ahead of strategic planning cycles to influence direction. |
| Hybrid Insights | Weave strategic questions into tactical sessions (and vice versa) via warm‑up phases. |
Karina’s peek behind OpenAI’s curtain transformed how I think about model‑driven product research. The combination of synthetic data + rigorous evals feels like the UX equivalent of guerrilla testing on steroids—rapid, targeted, and measurable.
Most of all, it reinforced that no method is sacred. Whether you’re running a field study or prototyping via prompt engineering, the real value is the insight you deliver and the problems you unblock.