3K, 60fps, 130ms: achieving it with Rust | tonari blog

How we chose the Rust programming language to advance the state-of-the-art in real-time communication

This post was written collectively with Ryo Kawaguchi, Andrea Law, Brian Schwind.

Our goal for tonari is to build a virtual doorway to another space that allows for truly natural human interactions. Nearly two years in development, tonari is, to the best of our knowledge, the lowest-latency high resolution production-ready "teleconferencing" (we are truly not fond of that word) product available.

130ms glass-to-glass latency (the time from light hitting the camera to when it appears on-screen on the other side)
3K, 60fps video transmission
Audiophile-quality stereo audio

Compare this to the typical 315-500ms latency for Zoom and WebRTC, as measured between two laptops (X1 Carbon and MacBook Pro) on the same network at our office. It's a huge difference. It's the difference between constantly interrupting each other versus having a natural flow of conversation. It's the difference between a blurry face from a camera seemingly pointed up someone's nose versus a wide-view high fidelity image that smoothly transfers all the subtle body language of an in-person conversation.

Since launching our first pilot in February, we've experienced no software-related downtime (tripping over ethernet cables is a different story). And as much as we would love to think we're infallible engineers, we truly don't believe we could have achieved these numbers with this level of stability without Rust.

In the beginning (or: why we're not WebRTC)

The very first tonari proof-of-concept used a basic projector, bluetooth speakers, and a website running on top of vanilla WebRTC (JavaScript). We've come a long way since those days.

While that prototype (and our opinionated vision of the future) got us grant funding, we knew that tonari would be dead on arrival unless we could achieve significantly lower latency and higher fidelity than WebRTC—two things that aren't currently associated with video chat in 2020.

We figured, “Okay, so we can just modify WebRTC directly and wrap it up with a slick UI in C++ and launch it in no time.”

A week of struggling with WebRTC’s nearly 750,000 LoC behemoth of a codebase revealed just how painful a single small change could be — how hard it was to test, and feel truly safe, with the code you were dealing with.

Let there be light...weight code

So in a furious (read: calm and thoroughly-discussed) rage quit we decided it was easier to re-implement the whole stack from scratch. We wanted to know and understand every line of code being run on our hardware, and it should be designed for the exact hardware we wanted.

Thus began our journey to the depths beyond high-level interfaces like a browser or existing RTC project, and into the world of low-level systems and hardware interaction from scratch.

We needed it to be inherently secure to protect the privacy of those who use tonari. We needed it to be performant to make it feel as human and real-time as possible. And we needed it to be maintainable as the code becomes more mature, as new brains show up and have to learn our work and expand on it.

We discussed and ruled out a handful of alternative approaches:

Security: C and C++ are memory- and concurrency-unsafe, and their disparate and seemingly infinite build systems make it hard to have a consistent and simple development experience.