
Gemini Live (Vertex AI, native audio) speaks twice after function calls #4554

@IngLP

Bug Description

When using LiveKit Agents with Gemini Live on Vertex AI (native audio), the agent consistently produces two spoken responses after a function call. The first response occurs during/after the tool call, and a second response occurs after the tool result is sent back. This happens even when the tool itself does not call generate_reply or say and simply returns a small string output.

Why can't I move to the Gemini API (and why you shouldn't either)? All Gemini API models are in preview and not ready for production. The Gemini realtime 12-2025 model has issues with tool calls, unreliable and highly variable latency, and so on.

Expected Behavior

Only one spoken response per user turn. A tool call should not trigger a second, separate spoken response for the same turn.
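
To make the expectation above checkable, here is a minimal sketch that counts agent speech segments per user turn from a recorded event log. The `Event` class and the labels `user_turn`, `agent_speech`, and `tool_call` are hypothetical and only illustrate the idea; they are not part of the LiveKit or Gemini APIs, and you would need to build such a log yourself from whatever session events you record.

```python
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical labels: "user_turn", "agent_speech", or "tool_call".
    kind: str
    text: str = ""


def agent_replies_per_turn(events):
    """Return the number of agent speech segments for each user turn, in order."""
    counts = []
    for ev in events:
        if ev.kind == "user_turn":
            counts.append(0)  # a new user turn starts a new counter
        elif ev.kind == "agent_speech" and counts:
            counts[-1] += 1  # speech before the first user turn is ignored
    return counts


def turns_with_duplicate_speech(events):
    """Return indices of user turns that got more than one spoken response."""
    return [i for i, n in enumerate(agent_replies_per_turn(events)) if n > 1]


# Illustrative log: the first turn shows the reported double response
# (one segment around the tool call, one after the tool result).
log = [
    Event("user_turn", "My name is Anna."),
    Event("agent_speech", "Thanks Anna, let me save that."),
    Event("tool_call", "store_call_info"),
    Event("agent_speech", "Got it, Anna. What is your job?"),
    Event("user_turn", "I'm a nurse."),
    Event("tool_call", "store_call_info"),
    Event("agent_speech", "Nursing is demanding work."),
]

print(agent_replies_per_turn(log))       # [2, 1]
print(turns_with_duplicate_speech(log))  # [0]
```

With the expected behavior, every entry in the first list would be 1 and the second list would be empty.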

Reproduction Steps

# repro.py
import asyncio
from typing import Any
from dotenv import load_dotenv
from livekit.agents import cli, JobContext, WorkerOptions, function_tool, RunContext
from livekit.agents.voice import AgentSession, Agent
from livekit.plugins import google

load_dotenv()

@function_tool
async def store_call_info(context: RunContext, info: dict[str, Any]) -> str:
    # No extra say/generate_reply here
    # await asyncio.sleep(0.1)
    return "Information saved."

async def entrypoint(ctx: JobContext):
    await ctx.connect()
    participant = await ctx.wait_for_participant()

    agent = Agent(
        instructions=(
            # Trailing spaces keep the concatenated sentences from running together.
            "Introduce yourself (you are John). Ask the user for their name, then call store_call_info when you have it to save their name. "
            "Then ask for their job, and save it with the function as well. "
            "Then continue the conversation by saying something about their job."
        ),
        tools=[store_call_info],
    )

    session = AgentSession(
        llm=google.realtime.RealtimeModel(
            vertexai=True,
            model="gemini-live-2.5-flash-native-audio",
            voice="Charon",
        ),
    )

    await session.start(agent=agent, room=ctx.room)

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

Operating System

macOS, Linux dev server

Models Used

gemini-live-2.5-flash-native-audio

Package Versions

python 3.12.11
livekit 1.0.23
livekit-agents 1.3.11
livekit-plugins-google 1.3.11
google-genai 1.58.0

Session/Room/Call IDs

No response

Proposed Solution

Additional Context

No response

Screenshots and Recordings

No response

Labels

bug
