Cloudglue - Video Understanding Infrastructure

Cloudglue is a Y Combinator-backed startup building developer APIs that turn video and audio into structured, searchable data. We handle the hard infrastructure - transcription, visual analysis, search, extraction - so developers can build on top of video without managing ML pipelines themselves.

We process millions of minutes of video for customers building search, analytics, and automation products. The research problems are real: how do you retrieve the right 10 seconds from 10,000 hours of video? How do you extract structured facts from noisy, multimodal content? How do you reason across visual and spoken information at scale?

Our team has shipped large-scale systems at Snapchat and Amazon, with work presented at NeurIPS, ICCV, CVPR, KubeCon, and DEF CON. We’re a small, technical team where researchers ship code and engineers read papers.

The Role

We’re looking for a research engineer to work on the core multimodal retrieval and video reasoning systems that power Cloudglue. This is a 50/50 research and engineering role - you’ll design novel approaches to hard retrieval and understanding problems, and you’ll ship them into production where real customers depend on them.

You’ll work across:

This is not a pure research role. You’ll be expected to take ideas from paper to prototype to production. But it’s also not a pure engineering role - we need someone with genuine research depth who can identify the right problems to work on and design novel solutions.

What You’ll Do

What We’re Looking For

Required