A no-BS guide based on actually building real-time two-way voice into a production health tracking app. Every mistake, dead end, and "aha" moment included.
Google's real-time, bidirectional audio AI model. You stream audio in, it streams audio back — like a phone call with AI. It does speech recognition, understanding, reasoning, and voice response ALL in one model. No separate STT → LLM → TTS pipeline needed.
Key specs:
- API version: v1alpha (NOT v1beta — critical!)
- Model: gemini-3.1-flash-live-preview
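To make those specs concrete, here is a minimal sketch of a Live session using the Python google-genai SDK. The SDK choice is ours, not from the original app, and get_mic_chunk() / play() are hypothetical stand-ins for real microphone and speaker I/O:

```python
# Sketch of one Live session: mic audio in, spoken audio out, one model.
# Assumes the google-genai Python SDK; get_mic_chunk() and play() are
# hypothetical stand-ins for real audio capture and playback.
import asyncio

from google import genai
from google.genai import types

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"api_version": "v1alpha"},  # NOT v1beta (see specs above)
)

MODEL = "gemini-3.1-flash-live-preview"


def get_mic_chunk() -> bytes:
    # Hypothetical: returns 100 ms of 16 kHz 16-bit mono PCM from the mic.
    return b"\x00" * 3200


def play(pcm: bytes) -> None:
    # Hypothetical: sends the model's PCM audio to the speakers.
    pass


async def talk() -> None:
    config = {"response_modalities": ["AUDIO"]}
    async with client.aio.live.connect(model=MODEL, config=config) as session:

        async def send_mic() -> None:
            while True:
                await session.send_realtime_input(
                    audio=types.Blob(
                        data=get_mic_chunk(),
                        mime_type="audio/pcm;rate=16000",
                    )
                )
                await asyncio.sleep(0.1)  # pace uploads to real time

        async def play_replies() -> None:
            while True:
                async for msg in session.receive():
                    if msg.data:  # inline audio bytes from the model
                        play(msg.data)

        # Upstream and downstream run concurrently: that's the "phone call".
        await asyncio.gather(send_mic(), play_replies())


asyncio.run(talk())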
How is it different from regular Gemini?
Regular Gemini models (gemini-2.5-flash, etc.) support generateContent: you send text or images and get text back over a plain REST API. The Live model ONLY supports bidiGenerateContent, which is bidirectional streaming over a WebSocket. You can't use curl or a regular API call. This tripped us up early.
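For contrast, a sketch of the two call styles side by side. The REST path is the documented generateContent endpoint; the WebSocket URL follows Google's published BidiGenerateContent pattern, but treat the exact v1alpha path as an assumption to verify against the current docs:

```python
# Sketch: why curl works for regular models but not for the Live model.
import asyncio
import json

import requests    # pip install requests
import websockets  # pip install websockets

API_KEY = "YOUR_API_KEY"

# Regular model: one-shot request/response over REST. curl-able.
resp = requests.post(
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"gemini-2.5-flash:generateContent?key={API_KEY}",
    json={"contents": [{"parts": [{"text": "Hello"}]}]},
)
print(resp.json())

# Live model: a persistent WebSocket speaking bidiGenerateContent.
# The v1alpha path segment is assumed from the published v1beta pattern.
LIVE_URL = (
    "wss://generativelanguage.googleapis.com/ws/"
    "google.ai.generativelanguage.v1alpha."
    f"GenerativeService.BidiGenerateContent?key={API_KEY}"
)


async def open_live_session() -> None:
    async with websockets.connect(LIVE_URL) as ws:
        # The first frame must be a setup message naming the model.
        await ws.send(json.dumps(
            {"setup": {"model": "models/gemini-3.1-flash-live-preview"}}
        ))
        print(await ws.recv())  # expect a setupComplete ack, then stream


asyncio.run(open_live_session())
```

In practice an SDK hides this handshake; the raw frames only matter when you're debugging the connection itself.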
This is where most people get confused. Here's the exact flow: