Agent Generates First Message but No Audio Playback

### Bug Description

The AI agent intermittently fails to produce audible speech in LiveKit rooms. In affected cases, the agent successfully joins the room, receives and processes jobs, initializes without errors, and publishes an audio track. The client can also receive the agent’s initial text response, confirming that the LLM logic and job execution are functioning correctly.

However, although the audio track is published, the client hears no speech. This suggests that the failure occurs in the media/audio pipeline rather than in the AI reasoning or job-handling logic. The issue is intermittent: it appears only in some sessions, while others work as expected.
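The reasoning above (text arrives and a track is published, yet nothing is audible) amounts to finding the first pipeline stage that fails in a given session. The following is a minimal triage sketch. It assumes the application records its own per-session observations. `SessionObservation`, its fields and the stage names are hypothetical and are not part of the LiveKit API.

```python
from dataclasses import dataclass


@dataclass
class SessionObservation:
    """Hypothetical per-session observations collected by the application itself."""
    joined_room: bool
    text_received: bool         # client saw the agent's text response
    track_published: bool       # agent published an audio track
    track_subscribed: bool      # client subscribed to that audio track
    audio_frames_received: int  # audio frames the client actually received


def first_failing_stage(obs: SessionObservation) -> str:
    """Return the earliest pipeline stage that did not complete, or 'ok'."""
    # Stages are listed in pipeline order; the first failed check marks the failure point.
    checks = [
        ("room_join", obs.joined_room),
        ("llm_text", obs.text_received),
        ("track_publish", obs.track_published),
        ("track_subscribe", obs.track_subscribed),
        ("audio_frames", obs.audio_frames_received > 0),
    ]
    for stage, passed in checks:
        if not passed:
            return stage
    return "ok"


# A failing session as described in this report: everything succeeds except audio.
silent = SessionObservation(True, True, True, True, 0)
print(first_failing_stage(silent))  # audio_frames
```

Collecting this per session makes it easier to see whether every silent session stops at the same stage.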

### Expected Behavior

When the AI agent joins a LiveKit room and receives a job, it should generate both text and audible speech in every session. After initializing and publishing an audio track, the agent's synthesized speech should be clearly heard by all connected clients, without delay or silence.

### Reproduction Steps

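```python
import logging

from livekit.agents import AgentSession, RoomInputOptions, RoomOutputOptions
from livekit.plugins import noise_cancellation
from livekit.plugins.turn_detector.multilingual import MultilingualModel

logger = logging.getLogger(__name__)

# Excerpt from the agent's async entrypoint method. `self.model_provider`,
# `params`, `coach` (the Agent) and `ctx` (the JobContext) are defined
# elsewhere in the application.

# Get model language (map 'eg' to 'ar')
lang_model = params.get_model_language()

# Get VAD
vad = self.model_provider.get_vad()

# Set up turn detector
turn_detector = MultilingualModel()

# Create session runner
session_runner = AgentSession(
    llm=self.model_provider.get_llm(creativity=params.creativity),
    stt=self.model_provider.get_stt(language=lang_model),
    tts=self.model_provider.get_tts(voice_type=params.voice_type, speed=params.voice_speed),
    vad=vad,
    turn_detection=turn_detector,
)

# Start the session. In livekit-agents 1.x, start() returns once the session
# is running; it does not block until the session ends.
try:
    logger.info(f"Starting session in room: {ctx.room.name}")
    await session_runner.start(
        room=ctx.room,
        agent=coach,
        room_input_options=RoomInputOptions(
            noise_cancellation=noise_cancellation.BVC()
        ),
        room_output_options=RoomOutputOptions(
            audio_enabled=True,
            transcription_enabled=True,
            sync_transcription=False,
        ),
    )
except Exception:
    logger.exception("Failed to start session runner")
    raise  # re-raise with the original traceback
```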
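Because the failure is intermittent, it can help to flag silent sessions automatically rather than by ear. The following is a minimal sketch of a timeout watchdog in plain `asyncio`. It assumes the application can call a hook each time it observes an outbound audio frame. `AudioWatchdog`, `on_audio_frame` and the 5-second default are hypothetical and are not part of `livekit-agents`.

```python
import asyncio


class AudioWatchdog:
    """Hypothetical helper: reports whether audio followed the agent's first text reply."""

    def __init__(self, timeout_s: float = 5.0) -> None:
        self.timeout_s = timeout_s
        self._audio_seen = asyncio.Event()

    def on_audio_frame(self) -> None:
        # Call this wherever the application observes an outbound audio frame.
        self._audio_seen.set()

    async def wait_for_audio(self) -> bool:
        """Return True if audio arrived within the timeout, False if the session stayed silent."""
        try:
            await asyncio.wait_for(self._audio_seen.wait(), timeout=self.timeout_s)
            return True
        except asyncio.TimeoutError:
            return False


async def demo() -> tuple:
    # Healthy session: an audio frame arrives shortly after the text reply.
    healthy = AudioWatchdog(timeout_s=0.5)
    asyncio.get_running_loop().call_later(0.05, healthy.on_audio_frame)
    # Silent session: text is produced but no audio frame ever arrives.
    silent = AudioWatchdog(timeout_s=0.1)
    return await healthy.wait_for_audio(), await silent.wait_for_audio()


print(asyncio.run(demo()))  # (True, False)
```

The watchdog only detects silence and does not explain it. Logging its result next to the room name helps correlate silent sessions with other diagnostics.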

### Operating System

Ubuntu (2024 release) and Windows 10

### Models Used

_No response_

### Package Versions

```text
livekit-agents==1.3.11
livekit-plugins-silero
livekit-plugins-noise-cancellation
livekit-plugins-turn-detector
livekit-plugins-openai
livekit-plugins-deepgram
```

### Session/Room/Call IDs

_No response_

### Proposed Solution

_No response_

### Additional Context

_No response_

### Screenshots and Recordings

_No response_

Agent Generates First Message but No Audio Playback #4587
