Add HLS and YouTube streaming examples to Python samples #59
…m vocabulary functionality for YT and HLS scripts
Walkthrough
Two new Python scripts have been added to handle live audio transcription using the Gladia API. One script streams audio from an HLS source while the other works with audio extracted from a YouTube video. Both scripts define several TypedDicts for configuration and responses, and implement functions for API key retrieval, session initialization, audio streaming with FFmpeg (and yt-dlp for YouTube), message handling over a WebSocket, and graceful shutdown via signal handling.
Sequence Diagram(s)
sequenceDiagram
participant User as User
participant Main as main()
participant API as Gladia API
participant FFmpeg as FFmpeg Processor
participant WS as WebSocket
User->>Main: Provide API key & HLS URL
Main->>Main: Call get_gladia_key()
Main->>API: Call init_live_session(config)
API-->>Main: Return session info
Main->>FFmpeg: Execute stream_audio_from_hls()
FFmpeg->>WS: Send audio chunks
WS-->>Main: Return transcription messages
Main->>Main: Process messages via print_messages_from_socket()
Main->>WS: Trigger stop_recording()
sequenceDiagram
participant User as User
participant Main as main()
participant API as Gladia API
participant YTDL as yt-dlp/FFmpeg Processor
participant WS as WebSocket
User->>Main: Provide API key & YouTube URL
Main->>Main: Call get_gladia_key()
Main->>API: Call init_live_session(config)
API-->>Main: Return session info
Main->>YTDL: Execute stream_audio_from_youtube()
YTDL->>WS: Send audio chunks
WS-->>Main: Return transcription messages
Main->>Main: Process messages via print_messages_from_socket()
Main->>WS: Trigger stop_recording()
Actionable comments posted: 2
🧹 Nitpick comments (10)
python/src/streaming/live-from-hls.py (5)
1-11: Imports are well-organized, but consider handling FFmpeg/Requests version constraints.
All necessary modules are imported, including `asyncio` for concurrency, `requests` for HTTP calls, and `websockets` for real-time communication. As a best practice, verify that the installed versions of FFmpeg and Requests meet your project's stability and security needs.
12-14: Consider using environment variables for GLADIA_API_URL.
Having the API endpoint coded as a constant is functional. However, storing environment-specific information (e.g., `GLADIA_API_URL`) in an environment variable or a configuration file increases flexibility and protects against accidental commits of sensitive data.
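The 12-14 suggestion could be sketched as follows. This is a minimal illustration, not the PR's code: the default value `https://api.gladia.io` and the helper name `resolve_api_url` are assumptions, since the actual constant is not shown in this review.

```python
import os
from typing import Mapping, Optional

# Assumed default; the PR's actual constant value is not visible in this review.
DEFAULT_GLADIA_API_URL = "https://api.gladia.io"


def resolve_api_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API base URL from GLADIA_API_URL, falling back to the default."""
    env = os.environ if env is None else env
    url = env.get("GLADIA_API_URL", "").strip()
    # Drop a trailing slash so callers can append paths without doubling it.
    return (url or DEFAULT_GLADIA_API_URL).rstrip("/")
```

Passing the environment as a parameter keeps the helper easy to test; a script would simply call `resolve_api_url()` with no arguments.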
24-28: Optionally annotate default values for typed fields.
Even though `languages` and `code_switching` are optional, consider clarifying defaults within docstrings or specifying them in the data structures if that's the intended usage. This reduces guesswork when these fields are missing.
39-60: STREAMING_CONFIGURATION is comprehensive but watch out for complex nested structures.
Having a nested `realtime_processing` dictionary supports custom vocabulary. Over time, with more nested keys, the code may become difficult to follow. Consider a dedicated typed structure or factory function for config generation if complexity grows further.
71-84: Graceful error-handling is in place, but consider logging for improved observability.
The code checks `response.ok` and exits with the status code if an error occurs. Adding structured logging statements would improve debugging in production scenarios, especially when the HTTP request fails or times out.
python/src/streaming/live-from-youtube.py (5)
1-11: Imports are suitable for YouTube streaming, but clarify dependencies.
This script depends on `yt-dlp` for retrieving YouTube audio. Confirm that users installing your code clearly understand this dependency, ideally in a requirements file or README.
15-17: Example URL is helpful; encourage user-defined values.
Providing an example YouTube link is beneficial. Consider adding inline documentation or command-line arguments so users can supply their own URLs without modifying the code.
29-38: StreamingConfiguration typed fields are consistent with the HLS script.
Since both scripts define a near-identical structure, consider factoring out the shared logic or typed definitions into a common Python module to reduce duplication and simplify maintenance.
63-69: Ensure consistent error messaging for missing Gladia API key.
The script prints a user-facing error and immediately exits if the key is absent. This is appropriate for a CLI-based approach. If programmatic usage is foreseen, consider raising a custom exception instead.
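For the 63-69 comment, a custom exception might look like the sketch below. The names `MissingApiKeyError` and `cli_main` are hypothetical, and it assumes (as the review implies) that the key is read from the first command-line argument.

```python
import sys
from typing import Optional, Sequence


class MissingApiKeyError(Exception):
    """Raised when no Gladia API key was supplied (hypothetical name)."""


def get_gladia_key(argv: Optional[Sequence[str]] = None) -> str:
    """Return the API key from the first CLI argument, or raise."""
    args = sys.argv if argv is None else argv
    if len(args) < 2 or not args[1]:
        raise MissingApiKeyError("You must provide a Gladia key as the first argument.")
    return args[1]


def cli_main(argv: Optional[Sequence[str]] = None) -> int:
    """CLI wrapper: converts the exception back into an exit code."""
    try:
        get_gladia_key(argv)
    except MissingApiKeyError as error:
        print(error, file=sys.stderr)
        return 1
    print("API key found.")
    return 0
```

Library callers can catch `MissingApiKeyError` themselves, while the command-line entry point keeps its current exit-on-error behaviour via `sys.exit(cli_main())`.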
161-173: Message printing is done well, but consider deeper data usage.
The script prints final transcripts and timestamps, which is great for demonstration. For advanced scenarios, you might parse or store these transcripts in a database or message queue.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
python/src/streaming/live-from-hls.py (1 hunks)
python/src/streaming/live-from-youtube.py (1 hunks)
🔇 Additional comments (4)
python/src/streaming/live-from-hls.py (2)
18-23: TypedDict usage is appropriate and enhances clarity.
Defining `InitiateResponse` as a `TypedDict` ensures your code benefits from static type checks. This is excellent for maintainability and clarity.
142-197: Signal handling and concurrency usage appear correct, but confirm resource cleanup.
Using `stop_recording` and terminating the FFmpeg process in the loop is a solid approach. Ensure any open resources, such as file descriptors, are closed if the process is forcibly terminated. Consider using a context manager or final cleanup block for robust resource handling.
Please confirm that any leftover temporary resources or processes are closed or killed cleanly on all operating systems by testing with various HLS streams.
python/src/streaming/live-from-youtube.py (2)
18-23: TypedDict usage ensures robust structure.
Clearly defining `InitiateResponse` clarifies data shape expectations. This type-based approach will scale well as the API evolves.
182-235: Graceful shutdown logic is good; ensure resilience after failures.
Using `asyncio.Event` and signal handlers effectively coordinates tasks. Double-check that abrupt errors (e.g., network outages) lead to clean shutdowns. Confirm final transcripts are still printed or stored if the connection drops mid-stream.
Please run tests with an intentionally dropped connection or invalid YouTube URL to ensure partial transcripts and resources are cleaned up gracefully.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Actionable comments posted: 2
🧹 Nitpick comments (10)
python/src/streaming/live-from-hls.py (4)
1-11: Use narrower imports or confirm necessity.
All the imported modules appear relevant (asyncio, json, subprocess, etc.) for the streaming logic. However, if some imports (like `time` from `datetime`) are used only once, consider inline usage or verifying that each import is strictly necessary to improve clarity.
12-16: Clarify usage of constants.
The `GLADIA_API_URL` and `EXAMPLE_HLS_STREAM_URL` constants are well-defined, but it's essential to communicate that `EXAMPLE_HLS_STREAM_URL` is purely illustrative. For maintainability, consider a comment or docstring clarifying that developers must replace this link with a valid HLS URL for real-use scenarios.
41-60: Consider scoping or dynamic configuration.
Defining `STREAMING_CONFIGURATION` at the module level is convenient, but for dynamic usage, it might be beneficial to construct this dictionary at runtime or allow the user to override fields. This could improve reusability if users need different sampling rates or bit depths without modifying the source code directly.
71-84: Add retry or fallback mechanism.
Currently, `init_live_session()` exits the entire program upon an unsuccessful API call. Depending on the broader usage context, consider adding a retry mechanism or error handling that provides meaningful feedback (e.g., asking the user to retry or check credentials) instead of outright exiting.
python/src/streaming/live-from-youtube.py (6)
1-11: Review optional built-ins versus standard imports.
Most imports look appropriate (asyncio, json, subprocess, etc.). If any remain unused (like `signal` for graceful shutdown), they can be removed. Conversely, if `time` from `datetime` is used occasionally, it's acceptable as is.
13-16: Provide clarity on example URLs.
Like the HLS script, make it explicit that `EXAMPLE_YOUTUBE_URL` is only a placeholder. Encourage users to replace it with their own YouTube link to avoid confusion.
40-60: Centralize configuration.
`STREAMING_CONFIGURATION` is largely the same as in the HLS script. Consider a shared utility or module to reduce duplication and ensure default streaming settings remain consistent across scripts.
63-69: Consider reusability of the API key retrieval.
`get_gladia_key()` is duplicated from the HLS script. If these scripts continue to evolve, centralizing the retrieval of environment variables or command-line arguments could eliminate duplication and reduce future maintenance overhead.
71-84: Handle partial failures gracefully.
`init_live_session()` currently exits on any non-OK response. While suitable for a standalone script, you might want to allow partial error handling or user prompts for re-entry of credentials. This is especially relevant in interactive or service-based contexts.
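The retry idea raised in the 71-84 comments for both scripts could be sketched with a small generic helper. The helper name `call_with_retry` and its parameters are hypothetical; it assumes the wrapped call raises an exception on failure rather than exiting the process.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying with exponential backoff when it raises."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # Out of attempts: let the caller decide what to do.
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

A script could then wrap session creation, for example `call_with_retry(lambda: init_live_session(STREAMING_CONFIGURATION))`, provided `init_live_session()` is changed to raise on a non-OK response.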
182-235: Robust cancellation flow.
Captured signals trigger a cancellation flow in which the remaining tasks are `cancel()`ed after `FIRST_COMPLETED`. Ensure that partial transcriptions are handled correctly. If `yt-dlp` or FFmpeg is still running, you might want to terminate them or read their `stderr` to confirm the reason for stopping (user or network error).
🔇 Additional comments (9)
python/src/streaming/live-from-hls.py (5)
19-38: Validate optional fields in TypedDicts.
The `LanguageConfiguration` and `StreamingConfiguration` types indicate certain fields are optional (`languages` can be `None`, for example). Ensure that logic in subsequent functions gracefully handles these optional fields. If not, consider adding checks or default values to reduce potential runtime errors.
64-69: Graceful approach to missing API key.
The `get_gladia_key()` function immediately terminates execution upon missing arguments. This is perfectly valid for a CLI script. Just ensure that the calling environment indeed wants the process to exit, rather than handle or re-ask for the missing parameter. If you plan to integrate into a larger system, you might handle this error more gracefully.
86-95: Supports cross-script reuse.
The `format_duration()` function is commonly used throughout these streaming scripts, making it a good candidate for reuse. If multiple scripts require the same functionality, consider creating a utility module to avoid duplication.
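A shared utility module, as suggested for 86-95, might contain something like the sketch below. The PR's actual `format_duration()` body is not shown in this review, so the output format (`HH:MM:SS.mmm`) and the implementation are assumptions for illustration only.

```python
def format_duration(seconds: float) -> str:
    """Format a non-negative duration in seconds as HH:MM:SS.mmm."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    # Work in whole milliseconds to avoid floating-point drift.
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"
```

Both scripts could then import the function from one place instead of each carrying its own copy.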
97-114: [Duplicate from prior suggestion regarding FFmpeg termination]
The prior review (#Ref: coderabbitai[bot] comment) already suggested monitoring `stderr` and the `ffmpeg_process` exit status to avoid silent hangs. Marking this as a duplicate.
125-137: Check for robust concurrency controls.
The code sends audio chunks to the WebSocket asynchronously in a loop. Ensure there's no concurrency conflict with other tasks that might also send messages on the same socket. Typically, a single-producer/multi-consumer flow is safe if consistently awaited, but confirm that the rest of the code does not send interleaving chunks to the same socket in parallel.
python/src/streaming/live-from-youtube.py (4)
19-22: TypedDict correctness.
The `InitiateResponse` fields are minimal but critical. If the API returns other fields, either add them or confirm ignoring them is correct. In typed contexts, partial definitions of responses can cause confusion if unrecognized fields appear in the data.
24-38: Optional fields usage checks.
As with the HLS script, confirm that the optional fields (e.g., `code_switching` within `LanguageConfiguration`) are either validated or set to defaults. This prevents accidental `NoneType` usage in subsequent calls.
133-147: [Duplicate from prior suggestion regarding stderr checks for external processes]
As in the HLS script, a previous comment recommended capturing error messages from external processes. Marking this note as a duplicate for tracking consistency across scripts.
161-173: Verify assumptions about final transcript.
Within `print_messages_from_socket()`, if the `post_final_transcript` event is triggered multiple times or never, confirm that it won't cause unexpected behavior, e.g., printing multiple "End of session" blocks or skipping the final message. If the API can produce multiple final transcripts, handle them accordingly.
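The 161-173 concern can be illustrated with a message handler that closes the session at most once and reports when the closing event never arrives. The message shape used here (`type`, `data.is_final`, `data.utterance.text`) is an assumption about the API payload, and `render_messages` is a hypothetical name; only `post_final_transcript` and the "End of session" banner come from the review.

```python
import json
from typing import Iterable, List


def render_messages(raw_messages: Iterable[str]) -> List[str]:
    """Turn raw JSON messages into printable lines, closing the session once."""
    lines: List[str] = []
    session_closed = False
    for raw in raw_messages:
        content = json.loads(raw)
        kind = content.get("type")
        data = content.get("data") or {}
        if kind == "transcript" and data.get("is_final"):
            utterance = data.get("utterance") or {}
            lines.append(str(utterance.get("text", "")).strip())
        elif kind == "post_final_transcript":
            if session_closed:
                continue  # Ignore repeats so the footer appears only once.
            session_closed = True
            lines.append("################ End of session ################")
    if not session_closed:
        # The event never arrived, e.g. the connection dropped mid-stream.
        lines.append("Session ended without a post_final_transcript event.")
    return lines
```

Separating rendering from the socket loop also makes this behaviour testable without a live connection.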
async def main():
    """Main function to transcribe an HLS stream."""
    print("\nThis script demonstrates how to transcribe audio from an HLS stream.")
    print("Requirements:")
    print("- FFmpeg installed on your system")
    print("- A valid HLS stream URL")
    print("\nExample usage: python live-from-hls.py YOUR_GLADIA_API_KEY\n")

    # Initialize session
    response = init_live_session(STREAMING_CONFIGURATION)

    async with connect(response["url"]) as websocket:
        print("\n################ Begin session ################\n")

        # Setup signal handler for graceful shutdown
        loop = asyncio.get_running_loop()
        loop.add_signal_handler(
            signal.SIGINT,
            loop.create_task,
            stop_recording(websocket),
        )

        try:
            tasks = [
                asyncio.create_task(
                    stream_audio_from_hls(websocket, EXAMPLE_HLS_STREAM_URL)
                ),
                asyncio.create_task(print_messages_from_socket(websocket)),
            ]
            await asyncio.wait(tasks)
        except asyncio.exceptions.CancelledError:
            for task in tasks:
                task.cancel()
            await stop_recording(websocket)


if __name__ == "__main__":
🛠️ Refactor suggestion
Graceful signal handling and cleanup.
The signal handling approach (especially registering SIGINT and calling stop_recording()) is neat. However, ensure that the ongoing FFmpeg process is terminated promptly on all OS platforms. In some environments, intercepting SIGINT might not always allow Popen processes to shut down gracefully. Double-check cross-platform behavior if your user base might run on Windows or other platforms.
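A portable variant of this shutdown logic could look like the sketch below. It relies on two standard-library behaviours: `loop.add_signal_handler` raises `NotImplementedError` on Windows event loops, and `Popen.terminate()` followed by `wait(timeout=...)` and `kill()` works on all platforms. The helper names `install_sigint_handler` and `terminate_process` are hypothetical and not part of the PR.

```python
import asyncio
import signal
import subprocess
import sys
from typing import Callable, Optional


def install_sigint_handler(
    loop: asyncio.AbstractEventLoop, on_interrupt: Callable[[], None]
) -> str:
    """Register on_interrupt for Ctrl+C; return which mechanism was used."""
    try:
        loop.add_signal_handler(signal.SIGINT, on_interrupt)
        return "loop"
    except NotImplementedError:
        # Windows event loops do not implement add_signal_handler, so fall
        # back to signal.signal and hop back onto the loop thread-safely.
        signal.signal(
            signal.SIGINT,
            lambda signum, frame: loop.call_soon_threadsafe(on_interrupt),
        )
        return "signal"


def terminate_process(process: subprocess.Popen, timeout: float = 5.0) -> Optional[int]:
    """Stop a child process politely, then forcefully if it does not exit."""
    if process.poll() is not None:
        return process.returncode  # Already exited.
    process.terminate()
    try:
        return process.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        process.kill()
        return process.wait()
```

In the script, `on_interrupt` could set an `asyncio.Event` that the streaming task checks, and `terminate_process(ffmpeg_process)` could run in a `finally` block so FFmpeg is stopped on every exit path.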
async def stream_audio_from_youtube(socket: ClientConnection, youtube_url: str) -> None:
    """Stream audio from YouTube livestream to the WebSocket."""
    yt_dlp_command = [
        "yt-dlp",
        "--buffer-size", "16K",
        "-f", "bestaudio",  # Select best audio format
        "-o", "-",  # Output to stdout
        youtube_url,
    ]

    ffmpeg_command = [
        "ffmpeg",
        "-re",  # Read input at native framerate
        "-i", "pipe:0",  # Read from stdin
        "-ar", str(STREAMING_CONFIGURATION["sample_rate"]),
        "-ac", str(STREAMING_CONFIGURATION["channels"]),
        "-f", "wav",
        "-bufsize", "16K",
        "pipe:1",
    ]
🛠️ Refactor suggestion
Multi-process interplay.
yt-dlp and FFmpeg run concurrently. Consider robust error-checking on both processes, especially if one process fails or hangs unexpectedly. Logging stderr from both might provide insight. Consider collecting or reading from their stderr streams.
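One way to collect stderr and exit codes is sketched below. It uses `asyncio.create_subprocess_exec` rather than whatever process API the PR uses (not visible in this excerpt), and the names `collect_stderr` and `run_and_watch` are hypothetical. The same reader could be attached to both the yt-dlp and the FFmpeg process.

```python
import asyncio
import sys
from typing import List, Tuple


async def collect_stderr(name: str, stream: asyncio.StreamReader, sink: List[str]) -> None:
    """Read a process's stderr line by line so its pipe never fills up."""
    while True:
        line = await stream.readline()
        if not line:
            break  # EOF: the process closed stderr.
        sink.append(f"[{name}] {line.decode(errors='replace').rstrip()}")


async def run_and_watch(name: str, command: List[str]) -> Tuple[int, List[str]]:
    """Run a command, capture its stderr, and return (exit_code, stderr_lines)."""
    process = await asyncio.create_subprocess_exec(
        *command,
        stdout=asyncio.subprocess.DEVNULL,
        stderr=asyncio.subprocess.PIPE,
    )
    lines: List[str] = []
    await collect_stderr(name, process.stderr, lines)
    return await process.wait(), lines
```

In a streaming script, `collect_stderr` would typically run as a background task per process while stdout is forwarded to the WebSocket, and a non-zero exit code from either process would end the session with the captured lines as the error message.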
Thanks for your contribution @yidakra, we'll review this one soon! :)
Happy to contribute! Currently, I am also working on a sample that mirrors the HLS stream with generated subtitles. It would be amazing if someone could have a look at it.
Description
This PR adds two new Python examples demonstrating how to use Gladia's API for real-time transcription of HLS and YouTube streams:
live-from-hls.py: Transcribe audio from any HLS stream
live-from-youtube.py: Transcribe audio from YouTube videos or livestreams
Features
Both examples include:
Requirements
The examples require:
yt-dlp (for YouTube example)
websockets, requests
Usage
HLS streaming:
YouTube streaming:
Testing
Both scripts have been tested with:
Notes