Voice Chat

Real-time audio-to-audio conversation. Native speech understanding with 30 selectable voices.

Disconnected

Configuration

Audio Controls

Input
Output

Transcription

Click "Microphone Off" to grant mic access and start talking.

Text Chat

You type; the model replies with native audio. Choose transcript only (speaker off), audio only, or both. This matches the Live API requirements for Gemini 3.1 Flash Live.

Configuration

Chat

Type a message and press Enter; the session connects automatically.

Vision

Send camera or screen video alongside audio/text. The model sees JPEG frames at up to 1 FPS.
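The 1 FPS ceiling implies some kind of throttle between the capture source and the session. The sketch below is one possible way to do that, not the app's actual code; `FrameThrottle` and `should_send` are hypothetical names, and timestamps are plain seconds so the logic can be checked without a camera.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrameThrottle:
    """Lets a frame through only if enough time has passed since the last one."""

    max_fps: float = 1.0  # the 1 FPS ceiling stated above
    _last_sent: Optional[float] = None

    def should_send(self, now: float) -> bool:
        min_interval = 1.0 / self.max_fps
        if self._last_sent is None or now - self._last_sent >= min_interval:
            self._last_sent = now
            return True
        return False


# Simulate a 4 FPS capture source for two seconds (timestamps in seconds).
throttle = FrameThrottle()
timestamps = [i * 0.25 for i in range(9)]  # 0.0, 0.25, ..., 2.0
sent = [t for t in timestamps if throttle.should_send(t)]
print(sent)  # [0.0, 1.0, 2.0]
```

Frames that arrive too early are simply dropped, so the model always sees the most recent frame at each send point rather than a backlog.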

Video Source

Preview

Response

Click Camera or Screen Share to grant access and start.

Function Calling

The model can call functions you define. Supports synchronous tool execution with the Live API.

Registered Functions

get_weather

Get current weather for a city. Params: city (string)

calculate

Evaluate a math expression. Params: expression (string)

get_current_time

Get the current time. Params: timezone (string, optional)

roll_dice

Roll dice. Params: sides (int), count (int)
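As a rough illustration of how the four functions above could be wired up for synchronous execution, here is a minimal sketch. The function names and parameters come from the list above; everything else (the `REGISTRY` table, the `dispatch` helper, the return shapes, and the stubbed weather result) is an assumption made for the example, not the app's implementation or the Live API's wire format.

```python
import ast
import operator
import random
from datetime import datetime, timezone as dt_timezone
from typing import Any, Callable, Dict, Optional
from zoneinfo import ZoneInfo

# Arithmetic operators the calculator accepts; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def _eval(node: ast.AST) -> float:
    """Evaluate a parsed arithmetic expression without calling eval()."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.operand))
    raise ValueError("unsupported expression")


def get_weather(city: str) -> Dict[str, Any]:
    # Stub: a real handler would query a weather service here.
    return {"city": city, "status": "stub, no live data"}


def calculate(expression: str) -> Dict[str, Any]:
    return {"result": _eval(ast.parse(expression, mode="eval").body)}


def get_current_time(timezone: Optional[str] = None) -> Dict[str, Any]:
    # Falls back to UTC when the optional timezone is omitted (an assumption).
    tz = ZoneInfo(timezone) if timezone else dt_timezone.utc
    return {"time": datetime.now(tz).isoformat()}


def roll_dice(sides: int, count: int) -> Dict[str, Any]:
    rolls = [random.randint(1, sides) for _ in range(count)]
    return {"rolls": rolls, "total": sum(rolls)}


REGISTRY: Dict[str, Callable[..., Dict[str, Any]]] = {
    "get_weather": get_weather,
    "calculate": calculate,
    "get_current_time": get_current_time,
    "roll_dice": roll_dice,
}


def dispatch(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """Run one call synchronously and wrap the result (hypothetical shape)."""
    handler = REGISTRY.get(name)
    if handler is None:
        return {"name": name, "response": {"error": f"unknown function: {name}"}}
    try:
        return {"name": name, "response": handler(**args)}
    except Exception as exc:  # report handler failures back instead of crashing
        return {"name": name, "response": {"error": str(exc)}}


print(dispatch("calculate", {"expression": "2 + 3 * 4"}))
```

The calculator walks the parsed syntax tree rather than calling `eval`, since the expression text originates from the model and should not be trusted as code.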

Conversation

Try: "What's the weather in Tokyo?" or "Roll 3 dice"

Function Call Log

Function calls and responses will appear here.

Thinking

Configure thinking depth: minimal (fastest) to high (most thorough). View the model's reasoning process.

Configuration

Conversation

Ask a complex question to see the thinking process.

Settings

Global configuration for session behavior.

Connection

Configured via .env

Voice Activity Detection

Used for Voice Chat and Vision (realtime audio). Not sent for text-only features.

Session Management

Allows reconnecting to a session within 2 hours if disconnected.
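A client that wants to honor the 2-hour window needs to remember when it last saved a resumable session. The sketch below shows one way to check that; `SavedSession`, its fields, and `can_resume` are hypothetical names, and it assumes the clock starts when the handle was saved, which the text above does not specify.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

RESUME_WINDOW = timedelta(hours=2)  # the 2-hour limit stated above


@dataclass(frozen=True)
class SavedSession:
    handle: str          # opaque resumption token (hypothetical field name)
    saved_at: datetime   # when the client stored the handle


def can_resume(session: SavedSession, now: datetime) -> bool:
    """True while the saved handle is still inside the resumption window."""
    return now - session.saved_at < RESUME_WINDOW


saved = SavedSession("example-handle", datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc))
print(can_resume(saved, saved.saved_at + timedelta(minutes=90)))  # True
print(can_resume(saved, saved.saved_at + timedelta(hours=3)))     # False
```

When the check fails, the client would start a fresh session instead of attempting to reconnect.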

Sliding window compression for unlimited session length.
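To make the sliding-window idea concrete, the sketch below keeps only the most recent turns that fit a token budget. It is a conceptual illustration only: the compression itself is presumably applied by the service rather than the browser, and `slide_window`, the turn tuples, and the token counts are all invented for the example.

```python
from collections import deque
from typing import Deque, List, Tuple

Turn = Tuple[str, int]  # (text, token count); counts here are made up


def slide_window(turns: List[Turn], max_tokens: int) -> List[Turn]:
    """Keep the newest turns whose combined token count fits max_tokens."""
    kept: Deque[Turn] = deque()
    used = 0
    for text, tokens in reversed(turns):  # walk from newest to oldest
        if used + tokens > max_tokens:
            break  # everything older than this point is dropped
        kept.appendleft((text, tokens))
        used += tokens
    return list(kept)


history = [("a", 40), ("b", 30), ("c", 50), ("d", 20)]
print(slide_window(history, max_tokens=100))  # [('b', 30), ('c', 50), ('d', 20)]
```

Because old turns fall off the front as new ones arrive, the context never exceeds the budget no matter how long the session runs.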

Audio

Audio formats are fixed by the API. The browser handles conversion automatically.
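The conversion mentioned above typically means turning the floating-point samples a browser captures into a fixed integer format. The sketch below shows that step under the assumption that the target is 16-bit signed little-endian PCM; the text above does not name the format or sample rate, and `float_to_pcm16` is a hypothetical helper. Resampling is left out.

```python
import struct
from typing import Iterable


def float_to_pcm16(samples: Iterable[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to 16-bit little-endian PCM bytes."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))         # clamp out-of-range samples
        ints.append(int(round(s * 32767)))  # scale to the signed 16-bit range
    return struct.pack(f"<{len(ints)}h", *ints)


pcm = float_to_pcm16([0.0, 1.0, -1.0, 2.0])
print(len(pcm))  # 8 (four samples, two bytes each)
```

Clamping before scaling keeps loud input from wrapping around, which would otherwise be heard as harsh clicks.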

Model Capabilities

Audio Input/Output
Text Input/Output
Image/Video Input
Input Transcription
Output Transcription
Voice Selection (30)
Function Calling (sync)
Google Search
Thinking (4 levels)
Session Resumption
Context Compression
VAD / Barge-in
97 Languages
Ephemeral Tokens
Async Function Calling
Code Execution
Image Generation
Structured Output
Proactive Audio
Affective Dialog

Protocol Log

Raw WebSocket messages exchanged with the Gemini Live API.

WebSocket messages will appear here when a session is active.