Role: Data Engineer (VoiceAI)

Location: Toulouse, Paris

Job type: Full-time

Work setup: 2-3 days remote per week

Start: ASAP


Job offer

About pyannoteAI

pyannoteAI is pioneering Speaker Intelligence AI, transforming how AI processes and understands spoken language. Our speaker diarization technology distinguishes speakers with unmatched precision, regardless of the spoken language, enabling AI to understand not just what is said, but who said it and when.

Founded by voice AI experts with 10+ years in the industry (ex-CNRS research scientists), we've built the 9th most downloaded open-source model on Hugging Face, with 52 million monthly downloads and over 140,000 users worldwide. After raising €8M from leading international VCs (Crane Venture Partners and Serena) and angel investors from Hugging Face and OpenAI, we're now scaling our enterprise platform.

From meeting transcription and call center analytics to video dubbing and voice agents, pyannoteAI powers the next generation of voice-enabled applications across industries that depend on understanding who speaks and when.

🧵 Your role

As a Data Engineer at pyannoteAI, you'll be embedded in our world-class research team, building the data infrastructure that powers breakthrough speaker diarization models. You'll own the entire data pipeline, from acquisition to quality assessment, supporting researchers who train state-of-the-art models on massive audio datasets across multiple tasks: speaker diarization, separation, transcription, streaming, and tagging. Through high-quality, curated datasets, your work will take our already industry-leading models to the next level.

You'll: