Fix unsynchronized timestamps using rewritten OrcasoundHLSClient #461

Merged
akashmjn merged 23 commits into main from claude/debug-timestamp-issue-5iZAA on Mar 20, 2026

@akashmjn
Collaborator

@akashmjn akashmjn commented Mar 18, 2026

Summary

  • Removes the orca-hls-utils==0.0.4 dependency and adds an internal src/orcasound_hls/ module that fixes timestamp drift (#430, "Unsynchronized and inconsistent detection timestamps") and reduces S3 API calls via prefix-filtered folder lookup (#457, "Fix expensive HLS S3 listing and expose precise seek location metadata").
  • Refactors to OrcasoundHLSClient with a synchronous get_segments() that returns a list of OrcasoundHLSSegment metadata. Polling, sleeping, and downloading now live in the orchestrator, not the module.
  • OrcasoundHLSSegment metadata serves as a pointer to HLS audio in S3, with methods for inferring timestamps and downloading the audio.
  • Polling logic in LiveInferenceOrchestrator is synchronized to trigger at XX:01 segment-aligned clock boundaries, with cleaner logging. Sample output from a LiveHLS run:
2026-03-20 11:18:03,197 INFO --- [iter 1] LiveHLS poll: fetching segments in [1774030320, 1774030380] (now=1774030680, delay=300.0s)
2026-03-20 11:18:03,475 INFO Found 1 folders in date range
2026-03-20 11:18:03,715 INFO Dropping 10.0s tail audio (1 ts_segments)
2026-03-20 11:18:03,716 INFO [iter 1] LiveHLS poll: got 1 segments
2026-03-20 11:18:03,716 INFO Segment: folder=1773990017, indices=[4030:4036), start=2026-03-20T18:11:59Z, duration=60.0s
2026-03-20 11:18:05,412 INFO Processing clip: rpi-north-sjc_2026_03_20_11_11_59_PDT.wav, start_timestamp=2026-03-20T18:11:59Z
2026-03-20 11:18:05,931 DEBUG Generated spectrogram: wav_dir/rpi-north-sjc_2026_03_20_11_11_59_PDT.png

=== Performance ===
File duration:   59.00s
Processing time: 0.29s
Realtime factor: 204.43x

=== Summary ===
3/29 segments predicted positive
Detected at times: [30.0, 38.0, 44.0]
global_confidence: 0.746
global_prediction: 1
2026-03-20 11:18:06,222 INFO Inference: prediction=1, confidence=0.746, positive_segments=3/29
2026-03-20 11:18:06,222 INFO Orca detected (confidence=0.746)
2026-03-20 11:18:06,222 DEBUG Deleted local files: wav_dir/rpi-north-sjc_2026_03_20_11_11_59_PDT.wav, wav_dir/rpi-north-sjc_2026_03_20_11_11_59_PDT.png
2026-03-20 11:18:06,222 DEBUG Sleeping for 53.8s until 1774030740
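
The window arithmetic visible in the log above can be reproduced in a few lines. This is a minimal sketch, not the orchestrator's actual code: the function names `poll_window` and `seconds_until_next_poll` and their parameters are hypothetical, and it assumes the poll time is floored to a segment boundary before the delay buffer is subtracted, which is consistent with the logged values.

```python
import math


def poll_window(now_unix: float, delay_s: int = 300, segment_s: int = 60) -> tuple[int, int, int]:
    """Return (start, end, aligned_now) for one poll.

    Assumption: "now" is floored to a multiple of segment_s, and the window
    is the segment_s-long span that ends delay_s before that aligned time.
    """
    aligned_now = math.floor(now_unix / segment_s) * segment_s
    end = aligned_now - delay_s
    start = end - segment_s
    return start, end, aligned_now


def seconds_until_next_poll(now_unix: float, segment_s: int = 60) -> float:
    """Seconds to sleep until the next segment-aligned boundary."""
    next_boundary = (math.floor(now_unix / segment_s) + 1) * segment_s
    return next_boundary - now_unix


if __name__ == "__main__":
    # Illustrative wall-clock values a few seconds past a minute boundary.
    print(poll_window(1774030683.197))
    print(seconds_until_next_poll(1774030686.222))
```

Flooring before subtracting the delay keeps every window on the same grid regardless of how long the previous iteration took, so consecutive polls neither overlap nor leave gaps.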

What changed

| File | Change |
| --- | --- |
| src/orcasound_hls/types.py | New: OrcasoundHLSSegment frozen dataclass with download_as_wav()/download_as_flac() |
| src/orcasound_hls/utils.py | New: S3 folder lookup (prefix-filtered bisect), M3U8 playlist loading, fetch_latest_folder_epoch() |
| src/orcasound_hls/client.py | New: OrcasoundHLSClient with get_segments(start_unix, end_unix) and latest_stream_start() |
| src/orcasound_hls/__init__.py | New: exports OrcasoundHLSClient, OrcasoundHLSSegment |
| src/LiveInferenceOrchestrator.py | build_orcasound_client() replaces build_hls_stream(); run_loop() owns polling loop; per-segment logic in _process_segment() |
| tests/test_orcasound_hls.py | New: 20 tests covering locator, segment metadata, client API, cross-hydrophone, download |
| pyproject.toml | Removed orca-hls-utils; added ffmpeg-python, m3u8, boto3 as direct deps |
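
The "prefix-filtered bisect" lookup mentioned for utils.py could work roughly as follows. This is a sketch under assumptions, not the module's code: it assumes folder names are Unix-epoch strings of equal length, and the helper names `epoch_prefix` and `folder_for` and the `lookback_s` parameter are hypothetical. The prefix is the kind of string one might pass as a listing filter so that only a small slice of folders is returned; the bisect then picks the folder that was already recording at the requested time.

```python
import bisect
import os
from typing import Optional


def epoch_prefix(start_unix: int, end_unix: int, lookback_s: int = 0) -> str:
    """Longest common leading digits of the (widened) range bounds.

    The folder containing start_unix may have been created earlier, so the
    lower bound is widened by lookback_s (a hypothetical tuning parameter).
    """
    return os.path.commonprefix([str(start_unix - lookback_s), str(end_unix)])


def folder_for(sorted_epochs: list[int], t_unix: int) -> Optional[int]:
    """Return the latest folder epoch that is <= t_unix, or None if there is none."""
    i = bisect.bisect_right(sorted_epochs, t_unix) - 1
    return sorted_epochs[i] if i >= 0 else None


if __name__ == "__main__":
    folders = [1773900000, 1773990017, 1774040000]  # illustrative epochs
    print(epoch_prefix(1774030320, 1774030380, lookback_s=86400))
    print(folder_for(folders, 1774030320))
```

Without a lookback the prefix can be too narrow: a folder created hours before the requested window would not share the window's leading digits, which is why the lower bound is widened in this sketch.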

Why

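Timestamp drift (#430): The old library computed start_time = end_index * avg_duration + folder_epoch - 60. Because avg_duration is only an approximation of each segment's real length, the error is multiplied by the segment index and grows over hundreds of segments. The new module instead sums the actual per-segment durations from the M3U8 metadata, so the computed start time does not drift.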
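
A small numeric sketch of why the two approaches diverge. The old formula is the one quoted above; the function names and the segment durations used here (10.01 s actual versus an assumed average of 10.0 s) are made up for illustration and are not measurements from a real stream.

```python
import math


def start_time_avg(folder_epoch: int, end_index: int, avg_duration: float,
                   clip_s: float = 60.0) -> float:
    """Old-style estimate: end_index * avg_duration + folder_epoch - 60."""
    return end_index * avg_duration + folder_epoch - clip_s


def start_time_cumulative(folder_epoch: int, durations: list[float],
                          start_index: int) -> float:
    """Start time from the summed durations of every segment before start_index."""
    return folder_epoch + math.fsum(durations[:start_index])


if __name__ == "__main__":
    folder_epoch = 1773990017
    durations = [10.01] * 4036  # hypothetical per-segment durations from a playlist
    for start, end in [(0, 6), (400, 406), (4030, 4036)]:
        old = start_time_avg(folder_epoch, end, avg_duration=10.0)
        new = start_time_cumulative(folder_epoch, durations, start)
        print(f"segments [{start}:{end}) differ by {new - old:.2f}s")
```

The difference between the two estimates scales with the segment index, so clips taken late in a long-running folder are affected most.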

Client API (#457): The old code used .get_next_clip(), whose internal wait loop made it impossible to poll multiple hydrophones in one cycle. The new OrcasoundHLSClient.get_segments() is synchronous and returns a list of OrcasoundHLSSegment; the orchestrator controls cadence.
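
To illustrate why a synchronous call makes multi-hydrophone polling straightforward, here is a sketch with stand-in types. `Segment`, `SegmentClient`, `FakeClient`, and `poll_once` are hypothetical names for this example only; the real OrcasoundHLSClient constructor and segment fields are not shown in this PR description, so only the get_segments(start_unix, end_unix) shape is taken from it.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for OrcasoundHLSSegment."""
    hydrophone: str
    start_unix: float
    duration_s: float


class SegmentClient(Protocol):
    def get_segments(self, start_unix: int, end_unix: int) -> list[Segment]: ...


class FakeClient:
    """In-memory client used only for this example; performs no network access."""

    def __init__(self, hydrophone: str, segment_s: int = 60) -> None:
        self.hydrophone = hydrophone
        self.segment_s = segment_s

    def get_segments(self, start_unix: int, end_unix: int) -> list[Segment]:
        # Return one segment per full segment_s-long slot inside the window.
        starts = range(start_unix, end_unix - self.segment_s + 1, self.segment_s)
        return [Segment(self.hydrophone, float(s), float(self.segment_s)) for s in starts]


def poll_once(clients: dict[str, SegmentClient], start_unix: int,
              end_unix: int) -> dict[str, list[Segment]]:
    """Query every hydrophone for the same window; each call returns immediately."""
    return {name: c.get_segments(start_unix, end_unix) for name, c in clients.items()}


if __name__ == "__main__":
    clients = {n: FakeClient(n) for n in ("hydrophone_a", "hydrophone_b")}
    for name, segments in poll_once(clients, 1774030320, 1774030380).items():
        print(name, segments)
```

Because no call blocks waiting for new audio, the caller decides when to sleep and can process all hydrophones within a single cycle.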

Test plan

  • uv run pytest tests/test_orcasound_hls.py -v (module-level, hits real S3)
  • uv run pytest tests/test_orchestrator.py -v (needs HuggingFace model access)
  • LiveHLS smoke test: uv run python src/LiveInferenceOrchestrator.py --orch_config tests/orch_configs/LiveHLS/LiveHLS_OrcasoundLab.yml --max_iterations 2
  • Docker build

Fixes: #430
Fixes: #457

claude and others added 7 commits March 17, 2026 23:22
Fixes #430 (timestamp drift) and #457 (expensive S3 listing).

Key changes:
- New src/orcasound_hls/ module with deterministic timestamp computation
  from S3 folder epoch + actual M3U8 segment durations (not system time)
- Prefix-filtered S3 folder listing via hls_locator.py (avoids scanning
  all ~1000+ folders)
- LiveHLSStream and DateRangeHLSStream as drop-in replacements
- Remove orca-hls-utils==0.0.4 dependency; add ffmpeg-python, m3u8, boto3
- Add tests/test_orcasound_hls.py with 11 tests against real S3 data

https://claude.ai/code/session_0183cFGd7Ji4UohnNypgckNC
Replace stateful stream classes with clean functional design:

- HLSSegment: frozen dataclass with metadata (folder_epoch, segment
  indices, offsets, URLs) + download_as_wav() / download_as_flac()
- date_range_segments(): generator yielding HLSSegment for a time range
- live_segments(): generator yielding HLSSegment from live stream
- No mutable state — standard Python iterators, composable

Simplify orchestrator run_loop to: for segment in hls_segments: ...

https://claude.ai/code/session_0183cFGd7Ji4UohnNypgckNC
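
The frozen-dataclass design described in this commit message can be sketched as below. This is an illustration only: the class name `SegmentSketch` and its field names are hypothetical, the real dataclass also carries URLs and download methods that are omitted here, and the 40302.0 s offset is a value chosen so that the derived start matches the segment shown in the PR description's log.

```python
from dataclasses import FrozenInstanceError, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SegmentSketch:
    """Immutable metadata pointing at a run of .ts files inside one folder."""
    folder_epoch: int   # Unix time encoded in the S3 folder name
    start_index: int    # first .ts segment index (inclusive)
    end_index: int      # last .ts segment index (exclusive)
    offset_s: float     # summed durations of all segments before start_index
    duration_s: float   # summed durations of segments in [start_index, end_index)

    @property
    def start_unix(self) -> float:
        # The timestamp depends only on stored metadata, never on system time.
        return self.folder_epoch + self.offset_s

    @property
    def start_iso(self) -> str:
        start = datetime.fromtimestamp(self.start_unix, tz=timezone.utc)
        return start.strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    seg = SegmentSketch(1773990017, 4030, 4036, offset_s=40302.0, duration_s=60.0)
    print(seg.start_iso)
    try:
        seg.offset_s = 0.0  # type: ignore[misc]
    except FrozenInstanceError:
        print("segment metadata is immutable")
```

Since instances cannot be mutated, the same segment always yields the same timestamp, which is what makes the computation deterministic and safe to pass between the client and the orchestrator.
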
Comment thread InferenceSystem/src/orcasound_hls/types.py Outdated
@paulcretu
Member

Haven't done a proper review, but just a heads up that at some point in the future the folder structure and format for HLS streams may change (e.g. we may remove latest.txt and the separate folders for each restart, or even bake timestamps into .ts filenames).

So it would be ideal to keep using orca-hls-utils so we can keep a single place for breaking changes down the line. In other words, the HLS stream folder structure should be considered a private API subject to change and not something to build off long-term. If OrcasoundHLSClient is the way forward, I think those changes should just go into orca-hls-utils.

That said, I don't want to create a ton of extra overhead, and I totally understand if it's easier/faster to split away right now, so go for it. But I thought it was worth a mention, and if you do decide to move off orca-hls-utils, it could use a follow-up issue to keep track of a future re-integration once the upstream code can be updated.

@akashmjn
Collaborator Author

akashmjn commented Mar 18, 2026

@paulcretu - agreed, and noted in #457. I'd just prefer to rewrite incrementally, to avoid triggering a major refactor everywhere at once.

Once we're comfortable with the new API, can split into a separate issue to track upstream merge.

Also @dthaler - do you have the right permissions to merge and trigger CI on https://github.com/orcasound/orca-hls-utils/tree/master?

@akashmjn akashmjn marked this pull request as ready for review March 19, 2026 02:26
@akashmjn akashmjn requested a review from scottveirs as a code owner March 19, 2026 02:33
@akashmjn akashmjn requested a review from dthaler March 19, 2026 15:36
@dthaler
Collaborator

dthaler commented Mar 19, 2026

> Also @dthaler - do you have the right permissions to merge and trigger CI on https://github.com/orcasound/orca-hls-utils/tree/master?

I didn't previously, but I think I do now.

@dthaler dthaler requested a review from Copilot March 19, 2026 17:34
Contributor

Copilot AI left a comment

Pull request overview

This PR replaces the external orca-hls-utils dependency in InferenceSystem with an internal src/orcasound_hls/ module to compute HLS segment timestamps deterministically from S3 folder epochs + M3U8 durations (fixing timestamp drift) and to reduce expensive S3 listing operations. It also refactors LiveInferenceOrchestrator.py so the orchestrator owns polling/cadence while the client provides synchronous segment metadata.

Changes:

  • Introduces orcasound_hls (client/utils/types) to locate folders, parse playlists, and represent downloadable timestamped HLS segments.
  • Refactors the inference orchestrator to fetch segments synchronously and process them via _process_segment() for both DateRange and LiveHLS modes.
  • Adds new integration-style tests for the HLS module and updates CI smoke-test expectations.

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| InferenceSystem/src/orcasound_hls/utils.py | Adds S3 prefix-filtered folder lookup, M3U8 helpers, and latest-folder discovery. |
| InferenceSystem/src/orcasound_hls/types.py | Adds OrcasoundHLSSegment frozen dataclass with WAV/FLAC download+conversion helpers. |
| InferenceSystem/src/orcasound_hls/client.py | Adds synchronous OrcasoundHLSClient.get_segments() over time ranges and latest_stream_start(). |
| InferenceSystem/src/orcasound_hls/__init__.py | Exposes the new client and segment types as a module API. |
| InferenceSystem/src/LiveInferenceOrchestrator.py | Refactors to use OrcasoundHLSClient and moves polling/sleep into run_loop(). |
| InferenceSystem/tests/test_orcasound_hls.py | Adds end-to-end tests validating folder lookup, timestamp determinism, and downloads against real S3. |
| InferenceSystem/tests/test_orchestrator.py | Updates subprocess test formatting and relaxes LiveHLS smoke expectations to match new logs. |
| InferenceSystem/tests/orch_configs/LiveHLS/LiveHLS_OrcasoundLab.yml | Replaces polling interval with live delay buffer + inference segment sizing. |
| InferenceSystem/pyproject.toml | Removes orca-hls-utils and adds direct deps (boto3, m3u8, ffmpeg-python). |
| InferenceSystem/uv.lock | Updates locked dependency graph to reflect removal/additions. |
| .github/workflows/InferenceSystem.yaml | Updates Docker smoke-test assertions to match new orchestrator logging behavior. |

Comment thread InferenceSystem/src/orcasound_hls/utils.py
Comment thread InferenceSystem/pyproject.toml Outdated
Comment thread InferenceSystem/src/LiveInferenceOrchestrator.py Outdated
Comment thread InferenceSystem/src/LiveInferenceOrchestrator.py Outdated
Comment thread InferenceSystem/src/LiveInferenceOrchestrator.py
Comment thread InferenceSystem/src/orcasound_hls/types.py Outdated
Comment thread InferenceSystem/src/orcasound_hls/client.py Outdated
Comment thread InferenceSystem/src/orcasound_hls/utils.py Outdated
Comment thread InferenceSystem/tests/orch_configs/LiveHLS/LiveHLS_OrcasoundLab.yml Outdated
Comment thread InferenceSystem/src/LiveInferenceOrchestrator.py
@akashmjn akashmjn merged commit 27d5534 into main Mar 20, 2026
26 checks passed
@akashmjn akashmjn deleted the claude/debug-timestamp-issue-5iZAA branch March 20, 2026 19:17