Fix unsynchronized timestamps using rewritten OrcasoundHLSClient #461
Fixes #430 (timestamp drift) and #457 (expensive S3 listing).

Key changes:
- New src/orcasound_hls/ module with deterministic timestamp computation from S3 folder epoch + actual M3U8 segment durations (not system time)
- Prefix-filtered S3 folder listing via hls_locator.py (avoids scanning all ~1000+ folders)
- LiveHLSStream and DateRangeHLSStream as drop-in replacements
- Remove orca-hls-utils==0.0.4 dependency; add ffmpeg-python, m3u8, boto3
- Add tests/test_orcasound_hls.py with 11 tests against real S3 data

https://claude.ai/code/session_0183cFGd7Ji4UohnNypgckNC
Replace stateful stream classes with clean functional design:
- HLSSegment: frozen dataclass with metadata (folder_epoch, segment indices, offsets, URLs) + download_as_wav() / download_as_flac()
- date_range_segments(): generator yielding HLSSegment for a time range
- live_segments(): generator yielding HLSSegment from live stream
- No mutable state: standard Python iterators, composable

Simplify orchestrator run_loop to: for segment in hls_segments: ...

https://claude.ai/code/session_0183cFGd7Ji4UohnNypgckNC
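For readers skimming the thread, here is a minimal sketch of what a frozen dataclass plus a generator can look like. It is not the code in this PR: the field names other than `folder_epoch`, the generator's signature, and the in-memory `durations` list (standing in for a parsed M3U8 playlist) are all assumptions made for illustration, and the download helpers are left out.

```python
from dataclasses import dataclass
from typing import Iterator, Sequence


@dataclass(frozen=True)
class HLSSegment:
    folder_epoch: int   # Unix time taken from the S3 folder name
    index: int          # position in the playlist (hypothetical field name)
    offset_s: float     # seconds from folder_epoch to segment start (hypothetical)
    duration_s: float   # duration listed for this segment in the playlist

    @property
    def start_unix(self) -> float:
        return self.folder_epoch + self.offset_s

    @property
    def end_unix(self) -> float:
        return self.start_unix + self.duration_s


def date_range_segments(
    folder_epoch: int,
    durations: Sequence[float],
    start_unix: float,
    end_unix: float,
) -> Iterator[HLSSegment]:
    """Yield segments that overlap [start_unix, end_unix).

    Hypothetical signature: the real function presumably reads the
    playlist from S3 rather than taking durations as an argument.
    """
    offset = 0.0
    for index, duration in enumerate(durations):
        segment = HLSSegment(folder_epoch, index, offset, duration)
        if segment.end_unix > start_unix and segment.start_unix < end_unix:
            yield segment
        offset += duration  # offsets come from summed durations, not wall clock
```

Because the generator holds no state beyond its local variables, it composes with ordinary iterator tools, which is what makes a loop like `for segment in hls_segments: ...` possible.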
Haven't done a proper review, but just a heads up that at some point in the future the folder structure and format for HLS streams may change (e.g. we may remove So it would be ideal to keep using That said, don't want to create a ton of extra overhead, and I totally understand if it's easier/faster to split away right now, go for it. But I thought it's worth a mention and if you do decide to move off
@paulcretu - agreed and noted here #457. Just prefer to rewrite incrementally, to avoid triggering a major refactor everywhere at once. Once we're comfortable with the new API, can split into a separate issue to track upstream merge. Also @dthaler - do you have the right permissions to merge and trigger CI on https://github.com/orcasound/orca-hls-utils/tree/master?
I didn't previously, but I think I do now.
Pull request overview
This PR replaces the external orca-hls-utils dependency in InferenceSystem with an internal src/orcasound_hls/ module to compute HLS segment timestamps deterministically from S3 folder epochs + M3U8 durations (fixing timestamp drift) and to reduce expensive S3 listing operations. It also refactors LiveInferenceOrchestrator.py so the orchestrator owns polling/cadence while the client provides synchronous segment metadata.
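As a rough illustration of the split described above, where the orchestrator owns cadence and the client only answers queries, the sketch below polls several hydrophones in one pass. Everything except the `get_segments(start_unix, end_unix)` call shape is hypothetical: `SegmentSource`, `poll_once`, the cursor bookkeeping, and the assumption that a query returns segments whose start time falls in `[start_unix, end_unix)` are illustrative choices, not the PR's implementation.

```python
import time
from typing import Callable, Dict, List, Protocol, Sequence


class SegmentSource(Protocol):
    """Hypothetical stand-in for a synchronous HLS client."""

    def get_segments(self, start_unix: float, end_unix: float) -> Sequence[object]:
        ...


def poll_once(
    clients: Dict[str, SegmentSource],
    cursors: Dict[str, float],
    now_unix: float,
) -> Dict[str, List[object]]:
    """Query every hydrophone once and advance its cursor.

    Assumes get_segments returns segments starting in [start, end),
    so advancing the cursor to now_unix neither skips nor repeats any.
    """
    new_segments: Dict[str, List[object]] = {}
    for name, client in clients.items():
        new_segments[name] = list(client.get_segments(cursors[name], now_unix))
        cursors[name] = now_unix
    return new_segments


def run_loop(
    clients: Dict[str, SegmentSource],
    start_unix: float,
    interval_s: float,
    max_iterations: int,
    process: Callable[[str, object], None],
) -> None:
    """The caller, not the client, decides when to poll and how long to sleep."""
    cursors = {name: start_unix for name in clients}
    for _ in range(max_iterations):
        batch = poll_once(clients, cursors, time.time())
        for name, segments in batch.items():
            for segment in segments:
                process(name, segment)
        time.sleep(interval_s)
```

With a blocking call inside the client, the loop over `clients` would stall on the first hydrophone; a call that returns immediately lets one cycle cover all of them.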
Changes:
- Introduces `orcasound_hls` (client/utils/types) to locate folders, parse playlists, and represent downloadable timestamped HLS segments.
- Refactors the inference orchestrator to fetch segments synchronously and process them via `_process_segment()` for both DateRange and LiveHLS modes.
- Adds new integration-style tests for the HLS module and updates CI smoke-test expectations.
Reviewed changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| InferenceSystem/src/orcasound_hls/utils.py | Adds S3 prefix-filtered folder lookup, M3U8 helpers, and latest-folder discovery. |
| InferenceSystem/src/orcasound_hls/types.py | Adds OrcasoundHLSSegment frozen dataclass with WAV/FLAC download+conversion helpers. |
| InferenceSystem/src/orcasound_hls/client.py | Adds synchronous OrcasoundHLSClient.get_segments() over time ranges and latest_stream_start(). |
| `InferenceSystem/src/orcasound_hls/__init__.py` | Exposes the new client and segment types as a module API. |
| InferenceSystem/src/LiveInferenceOrchestrator.py | Refactors to use OrcasoundHLSClient and moves polling/sleep into run_loop(). |
| InferenceSystem/tests/test_orcasound_hls.py | Adds end-to-end tests validating folder lookup, timestamp determinism, and downloads against real S3. |
| InferenceSystem/tests/test_orchestrator.py | Updates subprocess test formatting and relaxes LiveHLS smoke expectations to match new logs. |
| InferenceSystem/tests/orch_configs/LiveHLS/LiveHLS_OrcasoundLab.yml | Replaces polling interval with live delay buffer + inference segment sizing. |
| InferenceSystem/pyproject.toml | Removes orca-hls-utils and adds direct deps (boto3, m3u8, ffmpeg-python). |
| InferenceSystem/uv.lock | Updates locked dependency graph to reflect removal/additions. |
| .github/workflows/InferenceSystem.yaml | Updates Docker smoke-test assertions to match new orchestrator logging behavior. |
Summary
- Removes the `orca-hls-utils==0.0.4` dependency; adds internal `src/orcasound_hls/` module that fixes timestamp drift (#430, "Unsynchronized and inconsistent detection timestamps") and reduces S3 API calls via prefix-filtered folder lookup (#457, "Fix expensive HLS S3 listing and expose precise seek location metadata")
- `OrcasoundHLSClient` with synchronous `get_segments()` returning a list of `OrcasoundHLSSegment` metadata; polling/sleep/download now lives in the orchestrator, not the module
- `OrcasoundHLSSegment` metadata serves as a pointer to HLS audio on S3, with methods for inferring timestamps and downloading
- `LiveInferenceOrchestrator` is synchronized to trigger at `XX:01` segment-aligned clock boundaries, with cleaner logging

What changed
| File | Change |
|---|---|
| `src/orcasound_hls/types.py` | `OrcasoundHLSSegment` frozen dataclass with `download_as_wav()` / `download_as_flac()` |
| `src/orcasound_hls/utils.py` | `fetch_latest_folder_epoch()` |
| `src/orcasound_hls/client.py` | `OrcasoundHLSClient` with `get_segments(start_unix, end_unix)` and `latest_stream_start()` |
| `src/orcasound_hls/__init__.py` | `OrcasoundHLSClient`, `OrcasoundHLSSegment` |
| `src/LiveInferenceOrchestrator.py` | `build_orcasound_client()` replaces `build_hls_stream()`; `run_loop()` owns polling loop; per-segment logic in `_process_segment()` |
| `tests/test_orcasound_hls.py` | |
| `pyproject.toml` | Removed `orca-hls-utils`; added `ffmpeg-python`, `m3u8`, `boto3` as direct deps |

Why
Timestamp drift (#430): The old library computed `start_time = end_index * avg_duration + folder_epoch - 60`. Rounding error in the average compounds over hundreds of segments. The new module uses cumulative durations from M3U8 metadata, so there is no drift.

Client API (#457): The old code used `.get_next_clip()` (with an internal wait loop), which made it impossible to poll multiple hydrophones in one cycle. The new `OrcasoundHLSClient.get_segments() -> list[OrcasoundHLSSegment]` is synchronous; the orchestrator controls cadence.

Test plan
- `uv run pytest tests/test_orcasound_hls.py -v` (module-level, hits real S3)
- `uv run pytest tests/test_orchestrator.py -v` (needs HuggingFace model access)
- `uv run python src/LiveInferenceOrchestrator.py --orch_config tests/orch_configs/LiveHLS/LiveHLS_OrcasoundLab.yml --max_iterations 2`

Fixes: #430
Fixes: #457
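To make the drift argument under "Why" concrete, here is a small worked comparison. The first function is the formula quoted in this description; the second replaces `end_index * avg_duration` with the summed per-segment durations. The function names, the segment duration of 10.005333 s, and the rounded average of 10.0 s are illustrative assumptions, not values taken from the old library or from real playlists. Both functions subtract the same constant, so it cancels in the difference.

```python
import math
from typing import Sequence


def clip_start_from_average(
    folder_epoch: int, end_index: int, avg_duration: float, clip_s: float = 60.0
) -> float:
    """Formula quoted in the PR description for the old library."""
    return end_index * avg_duration + folder_epoch - clip_s


def clip_start_from_playlist(
    folder_epoch: int, durations: Sequence[float], end_index: int, clip_s: float = 60.0
) -> float:
    """Same quantity, but using the summed per-segment durations."""
    return folder_epoch + math.fsum(durations[:end_index]) - clip_s


def drift_seconds(durations: Sequence[float], end_index: int, avg_duration: float) -> float:
    """Average-based estimate minus cumulative estimate (epoch and clip length cancel)."""
    return end_index * avg_duration - math.fsum(durations[:end_index])


# Illustrative numbers only: 600 segments of 10.005333 s, average rounded to 10.0 s.
durations = [10.005333] * 600
drift = drift_seconds(durations, end_index=600, avg_duration=10.0)
```

Under these assumed numbers the average-based estimate ends up about 3.2 s early after 600 segments, and the gap grows linearly with `end_index`. Summing the durations has no such growing term.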