The Role

We’re building a world of multimodal AIs that you can interact with as easily and naturally as the people you talk to every day - via fluid, engaging voice conversations. We envision a future where you engage with technology simply by saying what you want to a truly knowledgeable agent - neither struggling with limited user interfaces nor feeling underwhelmed by the capabilities of today’s AI assistants (e.g., Siri).

With this vision in mind, we’re building AI models that have the reasoning capabilities of today’s text-based LLMs but can also consume and output voice and video data in real time. Unlike today’s LLMs, these models will be able to understand the considerable nuance that is present in human speech but absent from plain text, which will also allow them to respond to the user more quickly and fluently.

We have a v1 of the system up and running in our prototype app, hisanta.ai, and we’ve seen excellent engagement since launch. While v1 is built around a text-based LLM, we’re actively working toward the speech-to-speech vision for v2.

We’re looking for a stellar AI expert with experience in LLMs and multimodal models (and who is available for in-person collaboration in Seattle) to help us build this future.

What you’ll do

Things we’re looking for

Benefits