Add HLS and YouTube streaming examples to Python samples#59

Open
yidakra wants to merge 8 commits into gladiaio:main from yidakra:main

Conversation

@yidakra

@yidakra yidakra commented Mar 13, 2025

Description

This PR adds two new Python examples demonstrating how to use Gladia's API for real-time transcription of HLS and YouTube streams:

  • live-from-hls.py: Transcribe audio from any HLS stream
  • live-from-youtube.py: Transcribe audio from YouTube videos or livestreams

Features

Both examples include:

  • Proper signal handling for graceful shutdown
  • Configurable language and custom vocabulary support
  • Real-time audio streaming with FFmpeg
  • Clear error handling and user feedback
  • Type hints and comprehensive documentation

Requirements

The examples require:

  • FFmpeg installed on the system
  • yt-dlp (for YouTube example)
  • Python packages: websockets, requests
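
Because both scripts shell out to external tools, a start-up check can report missing ones before any session is opened. The following is a minimal sketch, not code from this PR; the helper name `missing_tools` is hypothetical, and it assumes the executables are called `ffmpeg` and `yt-dlp` on the user's PATH.

```python
import shutil
from typing import List


def missing_tools(required: List[str]) -> List[str]:
    """Return the names in `required` that cannot be found on PATH."""
    return [name for name in required if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed executable names; adjust if your installation differs.
    missing = missing_tools(["ffmpeg", "yt-dlp"])
    if missing:
        print("Missing required tools: " + ", ".join(missing))
```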

Usage

HLS streaming:

python src/streaming/live-from-hls.py YOUR_GLADIA_API_KEY

YouTube streaming:

python src/streaming/live-from-youtube.py YOUR_GLADIA_API_KEY

Testing

Both scripts have been tested with:

  • Various HLS streams
  • YouTube videos and livestreams
  • Different language configurations
  • Custom vocabulary settings
  • Graceful shutdown scenarios

Notes

  • Example URLs are provided but can be easily replaced with any valid stream URL
  • Configuration examples follow the same pattern as existing samples
  • Error handling matches the style of other examples in the repository

Summary by CodeRabbit

  • New Features
    • Added live audio transcription support for HTTP Live Streaming sources with robust error handling and graceful shutdown.
    • Introduced real-time transcription of audio directly from YouTube videos, delivering immediate transcription results.

@coderabbitai
Contributor

coderabbitai Bot commented Mar 13, 2025

Walkthrough

Two new Python scripts have been added to handle live audio transcription using the Gladia API. One script streams audio from an HLS source while the other works with audio extracted from a YouTube video. Both scripts define several TypedDicts for configuration and responses, and implement functions for API key retrieval, session initialization, audio streaming with FFmpeg (and yt-dlp for YouTube), message handling over a WebSocket, and graceful shutdown via signal handling.

Changes

| File(s) | Changes Summary |
| --- | --- |
| python/src/streaming/live-from-hls.py | Introduces a new script for HLS audio transcription. Adds functions: get_gladia_key, init_live_session, format_duration, stream_audio_from_hls, print_messages_from_socket, stop_recording, and main. Defines TypedDicts: InitiateResponse, LanguageConfiguration, and StreamingConfiguration for structured data handling. |
| python/src/streaming/live-from-youtube.py | Introduces a new script for YouTube audio transcription. Similar set of functions implemented: get_gladia_key, init_live_session, format_duration, stream_audio_from_youtube, print_messages_from_socket, stop_recording, and main. Uses yt-dlp to retrieve audio and defines the same TypedDicts for configuration. |

Sequence Diagram(s)

sequenceDiagram
    participant User as User
    participant Main as main()
    participant API as Gladia API
    participant FFmpeg as FFmpeg Processor
    participant WS as WebSocket

    User->>Main: Provide API key & HLS URL
    Main->>Main: Call get_gladia_key()
    Main->>API: Call init_live_session(config)
    API-->>Main: Return session info
    Main->>FFmpeg: Execute stream_audio_from_hls()
    FFmpeg->>WS: Send audio chunks
    WS-->>Main: Return transcription messages
    Main->>Main: Process messages via print_messages_from_socket()
    Main->>WS: Trigger stop_recording()
sequenceDiagram
    participant User as User
    participant Main as main()
    participant API as Gladia API
    participant YTDL as yt-dlp/FFmpeg Processor
    participant WS as WebSocket

    User->>Main: Provide API key & YouTube URL
    Main->>Main: Call get_gladia_key()
    Main->>API: Call init_live_session(config)
    API-->>Main: Return session info
    Main->>YTDL: Execute stream_audio_from_youtube()
    YTDL->>WS: Send audio chunks
    WS-->>Main: Return transcription messages
    Main->>Main: Process messages via print_messages_from_socket()
    Main->>WS: Trigger stop_recording()

Poem

I'm a bunny with hops so fleet,
Coding streams that sound so sweet!
From HLS waves to YouTube tunes,
My lines of code make joyful boons.
Gladia helps my dreams take flight—
A rabbit's rhythm in the night! 🐇✨

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (10)
python/src/streaming/live-from-hls.py (5)

1-11: Imports are well-organized, but consider handling FFmpeg/Requests version constraints.

All necessary modules are imported, including asyncio for concurrency, requests for HTTP calls, and websockets for real-time communication. As a best practice, verify that the installed versions of FFmpeg and Requests meet your project's stability and security needs.


12-14: Consider using environment variables for GLADIA_API_URL.

Having the API endpoint coded as a constant is functional. However, storing environment-specific information (e.g., GLADIA_API_URL) in an environment variable or a configuration file increases flexibility and protects against accidental commits of sensitive data.
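
One way to follow this suggestion is to read the endpoint from the environment and fall back to a default. This is a sketch only: the helper `get_api_url` is hypothetical, and the default value shown is an assumption about what the script's constant contains.

```python
import os

# Assumed default; replace with whatever GLADIA_API_URL is set to in the script.
DEFAULT_GLADIA_API_URL = "https://api.gladia.io"


def get_api_url() -> str:
    """Return the API base URL, preferring the GLADIA_API_URL environment variable."""
    return os.environ.get("GLADIA_API_URL", DEFAULT_GLADIA_API_URL).rstrip("/")
```

Stripping the trailing slash keeps later path concatenation predictable regardless of how the variable was written.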


24-28: Optionally annotate default values for typed fields.

Even though languages and code_switching are optional, consider clarifying defaults within docstrings or specifying them in the data structures if that's the intended usage. This reduces guesswork when these fields are missing.


39-60: STREAMING_CONFIGURATION is comprehensive but watch out for complex nested structures.

Having a nested realtime_processing dictionary supports custom vocabulary. Over time, with more nested keys, the code may become difficult to follow. Consider a dedicated typed structure or factory function for config generation if complexity grows further.
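
A factory function along these lines could look like the sketch below. Only the names `LanguageConfiguration`, `StreamingConfiguration`, `languages`, `code_switching`, `sample_rate`, `channels`, and `realtime_processing` appear in this PR; every other key, the default values, and the `build_streaming_config` helper are assumptions made for illustration and should be checked against the Gladia API reference.

```python
from typing import List, Optional, TypedDict


class LanguageConfiguration(TypedDict, total=False):
    languages: List[str]
    code_switching: bool


class StreamingConfiguration(TypedDict, total=False):
    # Keys other than sample_rate, channels and realtime_processing are assumed.
    encoding: str
    bit_depth: int
    sample_rate: int
    channels: int
    language_config: LanguageConfiguration
    realtime_processing: dict


def build_streaming_config(
    sample_rate: int = 16_000,
    channels: int = 1,
    languages: Optional[List[str]] = None,
    vocabulary: Optional[List[str]] = None,
) -> StreamingConfiguration:
    """Build a fresh configuration dict so callers never mutate a shared constant."""
    config: StreamingConfiguration = {
        "encoding": "wav/pcm",  # assumed value
        "bit_depth": 16,        # assumed value
        "sample_rate": sample_rate,
        "channels": channels,
        "language_config": {
            "languages": list(languages or []),
            "code_switching": False,
        },
    }
    # Only add the nested block when a vocabulary was actually supplied.
    if vocabulary:
        config["realtime_processing"] = {
            "custom_vocabulary": True,  # assumed key names
            "custom_vocabulary_config": {"vocabulary": list(vocabulary)},
        }
    return config
```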


71-84: Graceful error-handling is in place, but consider logging for improved observability.

The code checks response.ok and exits with the status code if an error occurs. Adding structured logging statements would improve debugging in production scenarios, especially when the HTTP request fails or times out.

python/src/streaming/live-from-youtube.py (5)

1-11: Imports are suitable for YouTube streaming, but clarify dependencies.

This script depends on yt-dlp for retrieving YouTube audio. Confirm that users installing your code clearly understand this dependency, ideally in a requirements file or README.


15-17: Example URL is helpful; encourage user-defined values.

Providing an example YouTube link is beneficial. Consider adding inline documentation or command-line arguments so users can supply their own URLs without modifying the code.
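
A command-line option is one way to let users supply their own URL. The sketch below uses `argparse` from the standard library; the `--url` flag, the `parse_args` helper, and the placeholder URL are hypothetical and not part of the PR.

```python
import argparse
from typing import List, Optional

# Placeholder only; the script's real EXAMPLE_YOUTUBE_URL is not reproduced here.
EXAMPLE_YOUTUBE_URL = "https://www.youtube.com/watch?v=EXAMPLE"


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the API key (positional) and an optional stream URL."""
    parser = argparse.ArgumentParser(
        description="Transcribe a YouTube stream with Gladia."
    )
    parser.add_argument("api_key", help="Your Gladia API key")
    parser.add_argument(
        "--url",
        default=EXAMPLE_YOUTUBE_URL,
        help="YouTube video or livestream URL (defaults to the example URL)",
    )
    return parser.parse_args(argv)
```

With this in place, a call such as `python live-from-youtube.py YOUR_GLADIA_API_KEY --url <your-url>` would override the example without editing the source.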


29-38: StreamingConfiguration typed fields are consistent with the HLS script.

Since both scripts define a near-identical structure, consider factoring out the shared logic or typed definitions into a common Python module to reduce duplication and simplify maintenance.


63-69: Ensure consistent error messaging for missing Gladia API key.

The script prints a user-facing error and immediately exits if the key is absent. This is appropriate for a CLI-based approach. If programmatic usage is foreseen, consider raising a custom exception instead.
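
If the function is ever called from other code, raising an exception leaves the exit decision to the caller. This is a sketch under assumptions: `MissingApiKeyError` is a hypothetical name, and the variant below takes `argv` as a parameter, which may differ from the signature used in the PR.

```python
import sys
from typing import List


class MissingApiKeyError(Exception):
    """Raised when no Gladia API key was supplied."""


def get_gladia_key(argv: List[str]) -> str:
    """Return the API key from argv, or raise instead of exiting."""
    if len(argv) != 2 or not argv[1]:
        raise MissingApiKeyError("You must provide a Gladia key as the first argument.")
    return argv[1]


def cli_entry(argv: List[str]) -> int:
    """CLI wrapper: turn the exception into a message and an exit code."""
    try:
        get_gladia_key(argv)
    except MissingApiKeyError as error:
        print(error, file=sys.stderr)
        return 1
    return 0
```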


161-173: Message printing is done well, but consider deeper data usage.

The script prints final transcripts and timestamps, which is great for demonstration. For advanced scenarios, you might parse or store these transcripts in a database or message queue.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 75368d7 and 2b0224a.

📒 Files selected for processing (2)
  • python/src/streaming/live-from-hls.py (1 hunks)
  • python/src/streaming/live-from-youtube.py (1 hunks)
🔇 Additional comments (4)
python/src/streaming/live-from-hls.py (2)

18-23: TypedDict usage is appropriate and enhances clarity.

Defining InitiateResponse as a TypedDict ensures your code benefits from static type checks. This is excellent for maintainability and clarity.


142-197: Signal handling and concurrency usage appear correct, but confirm resource cleanup.

Using stop_recording and terminating the FFmpeg process in the loop is a solid approach. Ensure any open resources—such as file descriptors—are closed if the process is forcibly terminated. Consider using a context manager or final cleanup block for robust resource handling.

Please confirm that any leftover temporary resources or processes are closed or killed cleanly on all operating systems by testing with various HLS streams.
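
A context manager is one way to guarantee the cleanup described here. The sketch below is illustrative rather than the PR's implementation: `managed_process` is a hypothetical helper, and the usage example runs a short Python child process as a stand-in for FFmpeg.

```python
import subprocess
import sys
from contextlib import contextmanager
from typing import Iterator, List


@contextmanager
def managed_process(command: List[str]) -> Iterator[subprocess.Popen]:
    """Start a child process and always stop it and close its pipes on exit."""
    process = subprocess.Popen(
        command, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    try:
        yield process
    finally:
        if process.poll() is None:
            process.terminate()
            try:
                process.wait(timeout=5)
            except subprocess.TimeoutExpired:
                # Escalate if the child ignores the polite request.
                process.kill()
                process.wait()
        for stream in (process.stdout, process.stderr):
            if stream is not None:
                stream.close()


if __name__ == "__main__":
    # Stand-in for the FFmpeg command used by the script.
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    with managed_process(sleeper) as child:
        print("child running:", child.poll() is None)
    print("child stopped:", child.poll() is not None)
```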

python/src/streaming/live-from-youtube.py (2)

18-23: TypedDict usage ensures robust structure.

Clearly defining InitiateResponse clarifies data shape expectations. This type-based approach will scale well as the API evolves.


182-235: Graceful shutdown logic is good, ensure resilience after failures.

Using asyncio.Event and signal handlers effectively coordinates tasks. Double-check that abrupt errors (e.g., network outages) lead to clean shutdowns. Confirm final transcripts are still printed or stored if the connection drops mid-stream.

Please run tests with an intentionally dropped connection or invalid YouTube URL to ensure partial transcripts and resources are cleaned up gracefully.

Comment thread python/src/streaming/live-from-hls.py
Comment thread python/src/streaming/live-from-youtube.py
yidakra and others added 2 commits March 17, 2025 16:08
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (10)
python/src/streaming/live-from-hls.py (4)

1-11: Use narrower imports or confirm necessity.

All the imported modules appear relevant (asyncio, json, subprocess, etc.) for the streaming logic. However, if some imports (like time from datetime) are used only once, consider inline usage or verifying that each import is strictly necessary to improve clarity.


12-16: Clarify usage of constants.

The GLADIA_API_URL and EXAMPLE_HLS_STREAM_URL constants are well-defined, but it's essential to communicate that EXAMPLE_HLS_STREAM_URL is purely illustrative. For maintainability, consider a comment or docstring clarifying that developers must replace this link with a valid HLS URL for real-use scenarios.


41-60: Consider scoping or dynamic configuration.

Defining STREAMING_CONFIGURATION at the module level is convenient, but for dynamic usage, it might be beneficial to construct this dictionary at runtime or allow the user to override fields. This could improve reusability if users need different sampling rates or bit depths without modifying the source code directly.


71-84: Add retry or fallback mechanism.

Currently, init_live_session() exits the entire program upon an unsuccessful API call. Depending on the broader usage context, consider adding a retry mechanism or error handling that provides meaningful feedback (e.g., asking the user to retry or check credentials) instead of outright exiting.
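
A retry wrapper with exponential backoff is one possible shape for this. The sketch is generic and does not call the Gladia API; `with_retries` is a hypothetical helper, and it assumes the wrapped call raises an exception on failure instead of exiting the process.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_retries(
    action: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action` up to `attempts` times, doubling the delay after each failure."""
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:  # narrow this to network errors in real code
            last_error = error
            if attempt < attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"Gave up after {attempts} attempts") from last_error
```

The `sleep` parameter exists so the delay can be replaced in tests; in normal use the default `time.sleep` applies.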

python/src/streaming/live-from-youtube.py (6)

1-11: Review optional built-ins versus standard imports.

Most imports look appropriate (asyncio, json, subprocess, etc.). If any remain unused (like signal for graceful shutdown), they can be removed. Conversely, if time from datetime is used occasionally, it’s acceptable as is.


13-16: Provide clarity on example URLs.

Like the HLS script, make it explicit that EXAMPLE_YOUTUBE_URL is only a placeholder. Encourage users to replace it with their own YouTube link to avoid confusion.


40-60: Centralize configuration.

STREAMING_CONFIGURATION is largely the same as in the HLS script. Consider a shared utility or module to reduce duplication and ensure default streaming settings remain consistent across scripts.


63-69: Consider reusability of the API key retrieval.

get_gladia_key() is duplicated from the HLS script. If these scripts continue to evolve, centralizing the retrieval of environment variables or command-line arguments could eliminate duplication and reduce future maintenance overhead.


71-84: Handle partial failures gracefully.

init_live_session() currently exits on any non-OK response. While suitable for a standalone script, you might want to allow partial error handling or user prompts for re-entry of credentials. This is especially relevant in interactive or service-based contexts.


182-235: Robust cancellation flow.

Captured signals lead to a cancellation approach that tasks are cancel()ed after FIRST_COMPLETED. Ensure that partial transcriptions are handled correctly. If yt-dlp or FFmpeg is still running, you might want to terminate them or read their stderr to confirm the reason for stopping (user or network error).
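
The cancel-after-FIRST_COMPLETED pattern mentioned here can be sketched in isolation as follows. The helper name `run_until_first_done` is hypothetical, and the coroutines in the usage example are stand-ins for the streaming and printing tasks.

```python
import asyncio


async def run_until_first_done(*coros):
    """Run coroutines concurrently; when one finishes, cancel and await the rest."""
    tasks = [asyncio.ensure_future(coro) for coro in coros]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    # Awaiting the cancelled tasks lets their cleanup code (finally blocks) run.
    await asyncio.gather(*pending, return_exceptions=True)
    return done, pending


async def demo():
    async def quick():
        return "done"

    async def slow():
        await asyncio.sleep(60)

    done, pending = await run_until_first_done(quick(), slow())
    return [t.result() for t in done], [t.cancelled() for t in pending]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Terminating yt-dlp or FFmpeg would belong in a `finally` block inside the streaming coroutine, so that it runs when the task is cancelled.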

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2b0224a and 1468852.

📒 Files selected for processing (2)
  • python/src/streaming/live-from-hls.py (1 hunks)
  • python/src/streaming/live-from-youtube.py (1 hunks)
🔇 Additional comments (9)
python/src/streaming/live-from-hls.py (5)

19-38: Validate optional fields in TypedDicts.

The LanguageConfiguration and StreamingConfiguration types indicate certain fields are optional (languages can be None, for example). Ensure that logic in subsequent functions gracefully handles these optional fields. If not, consider adding checks or default values to reduce potential runtime errors.


64-69: Graceful approach to missing API key.

The get_gladia_key() function immediately terminates execution upon missing arguments. This is perfectly valid for a CLI script. Just ensure that the calling environment indeed wants the process to exit, rather than handle or re-ask for the missing parameter. If you plan to integrate into a larger system, you might handle this error more gracefully.


86-95: Supports cross-script reuse.

format_duration() function is commonly used throughout these streaming scripts, making it a good candidate for reuse. If multiple scripts require the same functionality, consider creating a utility module to avoid duplication.
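
For reference, a shared `format_duration` helper could look like the sketch below. The function body is not shown in this PR, so this is an assumed implementation built on `datetime.time`; it only supports durations under 24 hours.

```python
from datetime import time


def format_duration(seconds: float) -> str:
    """Format a duration in seconds as HH:MM:SS.mmm (durations under 24 hours)."""
    milliseconds = int(seconds * 1_000)
    return time(
        hour=milliseconds // 3_600_000,
        minute=(milliseconds // 60_000) % 60,
        second=(milliseconds // 1_000) % 60,
        microsecond=(milliseconds % 1_000) * 1_000,
    ).isoformat(timespec="milliseconds")
```

Placed in a small utility module, the same function could be imported by both streaming scripts.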


97-114: [Duplicate from prior suggestion regarding FFmpeg termination]

The prior review (#Ref: coderabbitai[bot] comment) already suggested monitoring stderr and the ffmpeg_process exit status to avoid silent hangs. Marking this as a duplicate.


125-137: Check for robust concurrency controls.

The code sends audio chunks to the WebSocket asynchronously in a loop. Ensure there's no concurrency conflict with other tasks that might also send messages on the same socket. Typically, a single-producer/multi-consumer flow is safe if consistently awaited, but confirm that the rest of the code does not send interleaving chunks to the same socket in parallel.

python/src/streaming/live-from-youtube.py (4)

19-22: TypedDict correctness.

The InitiateResponse fields are minimal but critical. If the API returns other fields, either add them or confirm ignoring them is correct. In typed contexts, partial definitions of responses can cause confusion if unrecognized fields appear in the data.


24-38: Optional fields usage checks.

As with the HLS script, confirm that the optional fields (e.g., code_switching within LanguageConfiguration) are either validated or set to defaults. This prevents accidental NoneType usage in subsequent calls.


133-147: [Duplicate from prior suggestion regarding stderr checks for external processes]

As in the HLS script, a previous comment recommended capturing error messages from external processes. Marking this note as a duplicate for tracking consistency across scripts.


161-173: Verify assumptions about final transcript.

Within print_messages_from_socket(), if the post_final_transcript event is triggered multiple times or never, confirm that it won't cause unexpected behavior, e.g., printing multiple "End of session" blocks or skipping the final message. If the API can produce multiple final transcripts, handle them accordingly.
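
One defensive option is a flag that lets the end-of-session block be emitted only once. The sketch below works on plain JSON strings and does not open a WebSocket; the message shapes (`type`, `data.is_final`, `data.utterance.text`) are assumptions about the API payload, and `summarize_messages` is a hypothetical helper.

```python
import json
from typing import Iterable, List


def summarize_messages(raw_messages: Iterable[str]) -> List[str]:
    """Collect final utterances and emit a single end-of-session marker."""
    lines: List[str] = []
    session_ended = False
    for raw in raw_messages:
        content = json.loads(raw)
        kind = content.get("type")
        if kind == "transcript" and content.get("data", {}).get("is_final"):
            lines.append(content["data"]["utterance"]["text"].strip())
        elif kind == "post_final_transcript" and not session_ended:
            # Guard against the event arriving more than once.
            session_ended = True
            lines.append("End of session")
    return lines
```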

Comment on lines +163 to +199
async def main():
    """Main function to transcribe an HLS stream."""
    print("\nThis script demonstrates how to transcribe audio from an HLS stream.")
    print("Requirements:")
    print("- FFmpeg installed on your system")
    print("- A valid HLS stream URL")
    print("\nExample usage: python live-from-hls.py YOUR_GLADIA_API_KEY\n")

    # Initialize session
    response = init_live_session(STREAMING_CONFIGURATION)

    async with connect(response["url"]) as websocket:
        print("\n################ Begin session ################\n")

        # Setup signal handler for graceful shutdown
        loop = asyncio.get_running_loop()
        loop.add_signal_handler(
            signal.SIGINT,
            loop.create_task,
            stop_recording(websocket),
        )

        try:
            tasks = [
                asyncio.create_task(
                    stream_audio_from_hls(websocket, EXAMPLE_HLS_STREAM_URL)
                ),
                asyncio.create_task(print_messages_from_socket(websocket)),
            ]
            await asyncio.wait(tasks)
        except asyncio.exceptions.CancelledError:
            for task in tasks:
                task.cancel()
            await stop_recording(websocket)


if __name__ == "__main__":

🛠️ Refactor suggestion

Graceful signal handling and cleanup.

The signal handling approach (especially registering SIGINT and calling stop_recording()) is neat. However, ensure that the ongoing FFmpeg process is terminated promptly on all OS platforms. In some environments, intercepting SIGINT might not always allow Popen processes to shut down gracefully. Double-check cross-platform behavior if your user base might run on Windows or other platforms.
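
On the cross-platform point: `loop.add_signal_handler` is documented as Unix-only and raises `NotImplementedError` on Windows event loops. A possible fallback is sketched below; `install_stop_handler` is a hypothetical helper, and it sets an `asyncio.Event` rather than calling `stop_recording()` directly, which is a design change relative to the PR.

```python
import asyncio
import signal


def install_stop_handler(
    loop: asyncio.AbstractEventLoop, stop_event: asyncio.Event
) -> str:
    """Register a SIGINT handler that sets `stop_event`; return which path was used."""
    try:
        loop.add_signal_handler(signal.SIGINT, stop_event.set)
        return "loop"
    except NotImplementedError:
        # Fallback for platforms without loop-level signal support.
        signal.signal(
            signal.SIGINT,
            lambda signum, frame: loop.call_soon_threadsafe(stop_event.set),
        )
        return "signal"
```

The main coroutine can then await `stop_event.wait()` alongside the streaming tasks and send the stop message itself once the event fires.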

Comment on lines +97 to +116
async def stream_audio_from_youtube(socket: ClientConnection, youtube_url: str) -> None:
    """Stream audio from YouTube livestream to the WebSocket."""
    yt_dlp_command = [
        "yt-dlp",
        "--buffer-size", "16K",
        "-f", "bestaudio",  # Select best audio format
        "-o", "-",  # Output to stdout
        youtube_url,
    ]

    ffmpeg_command = [
        "ffmpeg",
        "-re",  # Read input at native framerate
        "-i", "pipe:0",  # Read from stdin
        "-ar", str(STREAMING_CONFIGURATION["sample_rate"]),
        "-ac", str(STREAMING_CONFIGURATION["channels"]),
        "-f", "wav",
        "-bufsize", "16K",
        "pipe:1",
    ]

🛠️ Refactor suggestion

Multi-process interplay.

yt-dlp and FFmpeg run concurrently. Consider robust error-checking on both processes, especially if one process fails or hangs unexpectedly. Logging stderr from both might provide insight. Consider collecting or reading from their stderr streams.
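
Draining stderr from both processes could be sketched as below. This is not the PR's code: `run_pipeline` and `drain` are hypothetical helpers, the example reads the consumer's whole output at once instead of streaming chunks to a WebSocket, and the usage example uses small Python child processes as stand-ins for yt-dlp and FFmpeg.

```python
import subprocess
import sys
import threading
from typing import List, Tuple


def drain(stream, label: str, sink: List[str]) -> None:
    """Read a stderr pipe line by line so the child never blocks on a full pipe."""
    for line in iter(stream.readline, b""):
        sink.append(f"[{label}] {line.decode(errors='replace').rstrip()}")
    stream.close()


def run_pipeline(
    producer_cmd: List[str], consumer_cmd: List[str]
) -> Tuple[bytes, int, int, List[str]]:
    """Pipe producer stdout into consumer stdin and collect stderr from both."""
    logs: List[str] = []
    producer = subprocess.Popen(
        producer_cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    consumer = subprocess.Popen(
        consumer_cmd,
        stdin=producer.stdout,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # Close our copy so the producer sees a broken pipe if the consumer exits.
    producer.stdout.close()
    threads = [
        threading.Thread(target=drain, args=(proc.stderr, name, logs))
        for proc, name in ((producer, "producer"), (consumer, "consumer"))
    ]
    for thread in threads:
        thread.start()
    output = consumer.stdout.read()
    consumer.stdout.close()
    for thread in threads:
        thread.join()
    return output, producer.wait(), consumer.wait(), logs


if __name__ == "__main__":
    # Stand-ins for yt-dlp and FFmpeg.
    producer = [sys.executable, "-c",
                "import sys; sys.stdout.write('audio'); print('warn', file=sys.stderr)"]
    consumer = [sys.executable, "-c",
                "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    print(run_pipeline(producer, consumer))
```

Checking both return codes after the pipeline ends makes it possible to tell which process failed.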

@sboudouk
Contributor

Thanks for your contribution @yidakra, we'll review this one soon! :)

@yidakra
Author

yidakra commented Apr 10, 2025

Thanks for your contribution @yidakra, we'll review this one soon! :)

Happy to contribute! Currently, I am also working on a sample that mirrors the HLS stream with generated subtitles. It would be amazing if someone could have a look at it.


2 participants