Agent Generates First Message but No Audio Playback #4587

@Ahmedhaggg

Bug Description

The AI agent intermittently fails to produce audible speech in LiveKit rooms. In affected cases, the agent successfully joins the room, receives and processes jobs, initializes without errors, and publishes an audio track. The client can also receive the agent’s initial text response, confirming that the LLM logic and job execution are functioning correctly.

However, despite the audio track being published, no speech is heard by the client. This indicates that the failure occurs at the media/audio pipeline level rather than in the AI reasoning or job handling logic. The issue does not occur consistently and is observed only in some sessions, while other sessions work as expected.
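One way to narrow this down is to check whether the published track carries silent frames (pointing at the TTS or agent output side) or real audio that the client never plays (pointing at subscription or playback). The sketch below only covers the first check: it measures the RMS level of a raw buffer, assuming 16-bit signed little-endian PCM. The function names and the threshold of 50 are illustrative assumptions, not part of LiveKit, and how the frames are captured from the track is not shown.

```python
import math
import sys
from array import array


def pcm16_rms(pcm: bytes) -> float:
    """Return the RMS amplitude of signed 16-bit little-endian PCM data."""
    samples = array("h")
    # Drop a trailing odd byte so the buffer divides into whole samples.
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])
    if sys.byteorder == "big":
        samples.byteswap()  # input is little-endian by assumption
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_silent(pcm: bytes, threshold: float = 50.0) -> bool:
    """Treat a buffer as silent when its RMS is below the (assumed) threshold."""
    return pcm16_rms(pcm) < threshold
```

If every frame from an affected session reports as silent, the problem is likely upstream of the track; if the frames contain audio, the client side deserves a closer look.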

Expected Behavior

When the AI agent joins a LiveKit room and receives a job, it should consistently generate both text and audible speech output for every session. After initializing successfully and publishing an audio track, the agent’s synthesized speech should be clearly heard by all connected clients without delay or silence.

Reproduction Steps

from livekit.agents import AgentSession, RoomInputOptions, RoomOutputOptions
from livekit.plugins import noise_cancellation
from livekit.plugins.turn_detector.multilingual import MultilingualModel

# The fragment below runs inside the agent's async entrypoint method;
# `params`, `coach`, `ctx`, and `logger` are defined elsewhere.

# Get model language (map 'eg' to 'ar')
lang_model = params.get_model_language()

# Get VAD
vad = self.model_provider.get_vad()

# Setup turn detector
turn_detector = MultilingualModel()

# Create session runner
session_runner = AgentSession(
    llm=self.model_provider.get_llm(creativity=params.creativity),
    stt=self.model_provider.get_stt(language=lang_model),
    tts=self.model_provider.get_tts(voice_type=params.voice_type, speed=params.voice_speed),
    vad=vad,
    turn_detection=turn_detector,
)

# Start session (this is blocking and won't return until session ends)
try:
    logger.info(f"Starting session in room: {ctx.room.name}")
    await session_runner.start(
        room=ctx.room,
        agent=coach,
        room_input_options=RoomInputOptions(
            noise_cancellation=noise_cancellation.BVC()
        ),
        room_output_options=RoomOutputOptions(
            audio_enabled=True,
            transcription_enabled=True,
            sync_transcription=False,
        ),
    )
except Exception:
    logger.exception("Failed to start session runner")
    raise  # re-raise with the original traceback

Operating System

Ubuntu 2024 and Windows 10

Models Used

No response

Package Versions

"livekit-agents==1.3.11"
"livekit-plugins-silero",
"livekit-plugins-noise-cancellation",
"livekit-plugins-turn-detector",
"livekit-plugins-openai",
"livekit-plugins-deepgram",

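Only livekit-agents is pinned in the list above, so the plugin versions actually installed are unknown. A minimal sketch for collecting them with the standard library, assuming the distribution names match the ones listed; `installed_versions` is an illustrative helper, not a LiveKit API:

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names copied from the package list in this report.
PACKAGES = [
    "livekit-agents",
    "livekit-plugins-silero",
    "livekit-plugins-noise-cancellation",
    "livekit-plugins-turn-detector",
    "livekit-plugins-openai",
    "livekit-plugins-deepgram",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}=={ver}" if ver else f"{name} (not installed)")
```

Running this in the environment where the agent fails gives exact versions to attach to the report.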
Session/Room/Call IDs

No response

Proposed Solution

Additional Context

No response

Screenshots and Recordings

No response

Metadata

Assignees

No one assigned

Labels

bug (Something isn't working)
