🎧 Podcast | Lenny’s Podcast | 74 min
👩🔬 Karina Nguyen, AI Researcher at OpenAI (ex‑Anthropic, New York Times, Dropbox, Square)
⏱️ Full Episode Listening Time: 74 mins
⏱️ Summary Reading Time: ~8 mins
“Any method can yield strategic or tactical insights—what matters is the problem you solve.”
Model training is an art, not a recipe.
Debugging LLMs looks a lot like debugging software: you find where the model gets confused (e.g. “I have no body to set a real‑world alarm”) and iterate from there.
Synthetic data unlocks rapid iteration.
By auto‑generating training examples for specific behaviours (triggering, editing, commenting in Canvas), teams can teach new features without endless human labelling.
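As a rough sketch of what auto‑generating behaviour‑specific examples might look like (everything below is hypothetical and not OpenAI's actual pipeline; the template stub stands in for a real model call that would paraphrase or expand prompts):

```python
import json
import random

# Hypothetical sketch: synthesize (prompt, behaviour) pairs for Canvas-style
# features. In practice an LLM would generate diverse phrasings; here a
# simple template stub illustrates the shape of the data.

BEHAVIOURS = {
    "trigger": "Write a short story about {topic} in canvas",
    "edit": "Make my draft about {topic} more concise",
    "comment": "Leave feedback on my {topic} outline",
}

TOPICS = ["space travel", "sourdough baking", "tax law"]

def synthesize_examples(n_per_behaviour: int, seed: int = 0) -> list[dict]:
    """Produce labelled examples for training and validating a feature."""
    rng = random.Random(seed)
    examples = []
    for behaviour, template in BEHAVIOURS.items():
        for _ in range(n_per_behaviour):
            topic = rng.choice(TOPICS)
            examples.append({
                "prompt": template.format(topic=topic),
                "label": behaviour,
            })
    return examples

dataset = synthesize_examples(n_per_behaviour=2)
print(json.dumps(dataset[0], indent=2))
```

The same generator can feed both the training set and a held‑out validation split, which is what makes the iteration loop fast: no waiting on human labelling to test a new behaviour.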
Evals are your north star.
Clear, automated and human‑in‑the‑loop evaluations (“does this prompt fire Canvas?”, “which model completion wins?”) measure progress and prevent regressions.
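A minimal pass/fail eval of the “does this prompt fire Canvas?” kind could be sketched like this (entirely illustrative: `model_decides_canvas` is a keyword heuristic standing in for a real model call, and the cases are made up):

```python
# Hypothetical sketch of an automated pass/fail eval. Each case pairs a
# prompt with the expected trigger decision; the harness counts agreement.

def model_decides_canvas(prompt: str) -> bool:
    # Stand-in for the model: a real eval would call the model under test.
    text = prompt.lower()
    return "canvas" in text or "draft" in text

EVAL_CASES = [
    ("Open a canvas and outline my essay", True),
    ("What is the capital of France?", False),
    ("Edit my draft to be friendlier", True),
]

def run_eval(cases):
    results = [(p, model_decides_canvas(p) == expected) for p, expected in cases]
    passed = sum(ok for _, ok in results)
    return passed, len(results)

passed, total = run_eval(EVAL_CASES)
print(f"{passed}/{total} passed")
```

Running this after every model or prompt change is what turns the eval into a regression guard rather than a one‑off benchmark.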
Problem‑first, method‑agnostic mindset.
Start with the team’s biggest blocker, then pick or combine methods—tactical usability tests or strategic field studies—to surface the right insight.
Soft skills drive impact.
Creativity, prioritisation, influence, and collaboration—especially “insight primers” before roadmap planning—earn you a seat at the table.
| Concept | How to Use It |
| --- | --- |
| Synthetic Data Training | Auto‑generate examples for new behaviours (e.g. Canvas edit, comment, trigger) to train and validate features. |
| Automated + Human Evals | Build simple pass/fail tests and human rating workflows to benchmark models continuously. |
| Problem‑First Planning | Define the core user or business problem, then choose your research or prototyping method. |
| Insight Primers | Deliver concise research briefs ahead of strategic planning cycles to influence direction. |
| Hybrid Insights | Weave strategic questions into tactical sessions (and vice versa) via warm‑up phases. |
Karina’s peek behind OpenAI’s curtain transformed how I think about model‑driven product research. The combination of synthetic data + rigorous evals feels like the UX equivalent of guerrilla testing on steroids—rapid, targeted, and measurable.
Most of all, it reinforced that no method is sacred. Whether you’re running a field study or prototyping via prompt engineering, the real value is the insight you deliver and the problems you unblock.