Fix unsynchronized timestamps using rewritten OrcasoundHLSClient by akashmjn · Pull Request #461 · orcasound/orcahello

akashmjn · 2026-03-18T08:08:07Z

Summary

Removes the orca-hls-utils==0.0.4 dependency and adds an internal src/orcasound_hls/ module that fixes timestamp drift (Unsynchronized and inconsistent detection timestamps #430) and reduces S3 API calls via prefix-filtered folder lookup (Fix expensive HLS S3 listing and expose precise seek location metadata #457).
Refactors to OrcasoundHLSClient with a synchronous get_segments() that returns a list of OrcasoundHLSSegment metadata; polling, sleep, and download now live in the orchestrator, not the module.
OrcasoundHLSSegment metadata serves as a pointer to HLS audio in S3, with methods for inferring timestamps and downloading.
Polling in LiveInferenceOrchestrator is synchronized to trigger at XX:01, on segment-aligned clock boundaries, with cleaner logging:

2026-03-20 11:18:03,197 INFO --- [iter 1] LiveHLS poll: fetching segments in [1774030320, 1774030380] (now=1774030680, delay=300.0s)
2026-03-20 11:18:03,475 INFO Found 1 folders in date range
2026-03-20 11:18:03,715 INFO Dropping 10.0s tail audio (1 ts_segments)
2026-03-20 11:18:03,716 INFO [iter 1] LiveHLS poll: got 1 segments
2026-03-20 11:18:03,716 INFO Segment: folder=1773990017, indices=[4030:4036), start=2026-03-20T18:11:59Z, duration=60.0s
2026-03-20 11:18:05,412 INFO Processing clip: rpi-north-sjc_2026_03_20_11_11_59_PDT.wav, start_timestamp=2026-03-20T18:11:59Z
2026-03-20 11:18:05,931 DEBUG Generated spectrogram: wav_dir/rpi-north-sjc_2026_03_20_11_11_59_PDT.png

=== Performance ===
File duration:   59.00s
Processing time: 0.29s
Realtime factor: 204.43x

=== Summary ===
3/29 segments predicted positive
Detected at times: [30.0, 38.0, 44.0]
global_confidence: 0.746
global_prediction: 1
2026-03-20 11:18:06,222 INFO Inference: prediction=1, confidence=0.746, positive_segments=3/29
2026-03-20 11:18:06,222 INFO Orca detected (confidence=0.746)
2026-03-20 11:18:06,222 DEBUG Deleted local files: wav_dir/rpi-north-sjc_2026_03_20_11_11_59_PDT.wav, wav_dir/rpi-north-sjc_2026_03_20_11_11_59_PDT.png
2026-03-20 11:18:06,222 DEBUG Sleeping for 53.8s until 1774030740
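
The poll window and sleep time in the log above are consistent with simple clock-boundary arithmetic. The following is a minimal sketch of that arithmetic, not the orchestrator's actual code: the function names `poll_window` and `sleep_until_next_boundary` and the parameters `delay_s` and `segment_s` are hypothetical, and the 300 s delay and 60 s segment size are read off the log line rather than the config.

```python
def poll_window(now_unix: float, delay_s: int = 300, segment_s: int = 60) -> tuple[int, int, int]:
    """Return (start, end, aligned_now) for one poll, all in Unix seconds."""
    # Snap the current time down to the previous segment-aligned boundary.
    aligned_now = int(now_unix // segment_s) * segment_s
    # Stay delay_s behind live so the audio has had time to reach S3.
    end = aligned_now - delay_s
    # Request exactly one segment-sized window.
    start = end - segment_s
    return start, end, aligned_now


def sleep_until_next_boundary(now_unix: float, segment_s: int = 60) -> tuple[float, int]:
    """Return (seconds_to_sleep, next_boundary) for the next aligned poll."""
    next_boundary = (int(now_unix // segment_s) + 1) * segment_s
    return next_boundary - now_unix, next_boundary


# 11:18:03,197 PDT on 2026-03-20 is 1774030683.197 in Unix time.
start, end, aligned_now = poll_window(1774030683.197)
print(start, end, aligned_now)  # 1774030320 1774030380 1774030680

# 11:18:06,222 PDT is 1774030686.222, when the iteration finished.
wait_s, boundary = sleep_until_next_boundary(1774030686.222)
print(f"Sleeping for {wait_s:.1f}s until {boundary}")  # Sleeping for 53.8s until 1774030740
```

Computing the sleep from the wall clock at the end of each iteration, rather than sleeping a fixed interval, keeps processing time from accumulating into the schedule.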

What changed

File	Change
`src/orcasound_hls/types.py`	New — `OrcasoundHLSSegment` frozen dataclass with `download_as_wav()`/`download_as_flac()`
`src/orcasound_hls/utils.py`	New — S3 folder lookup (prefix-filtered bisect), M3U8 playlist loading, `fetch_latest_folder_epoch()`
`src/orcasound_hls/client.py`	New — `OrcasoundHLSClient` with `get_segments(start_unix, end_unix)` and `latest_stream_start()`
`src/orcasound_hls/__init__.py`	New — exports `OrcasoundHLSClient`, `OrcasoundHLSSegment`
`src/LiveInferenceOrchestrator.py`	`build_orcasound_client()` replaces `build_hls_stream()`; `run_loop()` owns polling loop; per-segment logic in `_process_segment()`
`tests/test_orcasound_hls.py`	New — 20 tests covering locator, segment metadata, client API, cross-hydrophone, download
`pyproject.toml`	Removed `orca-hls-utils`; added `ffmpeg-python`, `m3u8`, `boto3` as direct deps
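
To illustrate the bisect part of the folder lookup, here is a minimal sketch under stated assumptions. It assumes each stream folder is named by the Unix epoch at which it started and holds audio until the next folder begins; `folders_for_range` is a hypothetical helper, not a function from `utils.py`, and it works on an in-memory sorted list rather than reproducing the prefix-filtered S3 listing.

```python
from bisect import bisect_right


def folders_for_range(folder_epochs: list[int], start_unix: int, end_unix: int) -> list[int]:
    """Return the folders whose audio may overlap [start_unix, end_unix].

    folder_epochs must be sorted ascending. Assumption: a folder covers the
    time from its own epoch up to the epoch of the next folder.
    """
    # The last folder that started at or before start_unix may still cover it.
    lo = max(bisect_right(folder_epochs, start_unix) - 1, 0)
    # Every folder that started at or before end_unix is a candidate.
    hi = bisect_right(folder_epochs, end_unix)
    return folder_epochs[lo:hi]


# Made-up folder epochs for illustration only.
epochs = [100, 200, 300, 400]
print(folders_for_range(epochs, 250, 260))  # [200]
print(folders_for_range(epochs, 250, 350))  # [200, 300]
print(folders_for_range(epochs, 50, 60))    # []
```

Because the list is sorted, each lookup costs two binary searches instead of a scan over every folder.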

Why

Timestamp drift (#430): The old library computed start_time = end_index * avg_duration + folder_epoch - 60. Any rounding error in avg_duration is multiplied by end_index, so it compounds over hundreds of segments. The new module sums the actual per-segment durations from the M3U8 metadata, so there is no drift.
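
A small numeric sketch may help show how the error grows. The first function is the formula quoted above; the second is one plausible reading of "cumulative durations" and is not taken from the new module. The function names, the folder epoch, and the 10.01 s segment duration are all made up for illustration.

```python
def old_start_time(folder_epoch: int, end_index: int, avg_duration: float, clip_s: float = 60.0) -> float:
    """Formula quoted in the PR description for the old library."""
    return end_index * avg_duration + folder_epoch - clip_s


def cumulative_start_time(folder_epoch: int, durations: list[float], start_index: int) -> float:
    """Add up the playlist durations of every segment before start_index."""
    return folder_epoch + sum(durations[:start_index])


# Hypothetical playlist: every .ts segment really lasts 10.01 s,
# but the average is rounded to 10.0 s.
folder_epoch = 1_700_000_000
durations = [10.01] * 4036
start_index, end_index = 4030, 4036  # a six-segment clip

exact = cumulative_start_time(folder_epoch, durations, start_index)
approx = old_start_time(folder_epoch, end_index, avg_duration=10.0)
print(f"drift after {end_index} segments: {exact - approx:.1f}s")  # 40.3s
```

In this made-up case a 0.01 s rounding error per segment becomes roughly 40 s after about 4000 segments, while the cumulative sum stays tied to the durations the playlist actually reports.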

Client API (#457): The old code used .get_next_clip(), whose internal wait loop made it impossible to poll multiple hydrophones in one cycle. The new OrcasoundHLSClient.get_segments() is synchronous and returns a list of OrcasoundHLSSegment; the orchestrator controls cadence.
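
The benefit of a synchronous call can be sketched with stand-in classes. `FakeClient` and `FakeSegment` below are hypothetical stubs and do not mirror the real constructor or fields of `OrcasoundHLSClient` and `OrcasoundHLSSegment`; only the `get_segments(start_unix, end_unix)` call shape comes from the PR description. The hydrophone names are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeSegment:
    """Hypothetical stand-in for OrcasoundHLSSegment."""
    hydrophone: str
    start_unix: int
    duration_s: float


class FakeClient:
    """Hypothetical stand-in exposing only get_segments(start_unix, end_unix)."""

    def __init__(self, hydrophone: str) -> None:
        self.hydrophone = hydrophone

    def get_segments(self, start_unix: int, end_unix: int) -> list[FakeSegment]:
        # Returns immediately with metadata; no waiting, no download.
        return [FakeSegment(self.hydrophone, start_unix, float(end_unix - start_unix))]


def poll_once(clients: list[FakeClient], start_unix: int, end_unix: int) -> list[FakeSegment]:
    """One orchestrator cycle: ask every client for the same time window."""
    segments: list[FakeSegment] = []
    for client in clients:
        segments.extend(client.get_segments(start_unix, end_unix))
    return segments


clients = [FakeClient("hydrophone-a"), FakeClient("hydrophone-b")]
for seg in poll_once(clients, 1774030320, 1774030380):
    print(seg.hydrophone, seg.start_unix, seg.duration_s)
```

Because no client blocks in its own wait loop, a single loop can cover every hydrophone and then sleep once per cycle.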

Test plan

uv run pytest tests/test_orcasound_hls.py -v (module-level, hits real S3)
uv run pytest tests/test_orchestrator.py -v (needs HuggingFace model access)
LiveHLS smoke test: uv run python src/LiveInferenceOrchestrator.py --orch_config tests/orch_configs/LiveHLS/LiveHLS_OrcasoundLab.yml --max_iterations 2
Docker build

Fixes: #430
Fixes: #457

Fixes #430 (timestamp drift) and #457 (expensive S3 listing). Key changes:
- New src/orcasound_hls/ module with deterministic timestamp computation from S3 folder epoch + actual M3U8 segment durations (not system time)
- Prefix-filtered S3 folder listing via hls_locator.py (avoids scanning all ~1000+ folders)
- LiveHLSStream and DateRangeHLSStream as drop-in replacements
- Remove orca-hls-utils==0.0.4 dependency; add ffmpeg-python, m3u8, boto3
- Add tests/test_orcasound_hls.py with 11 tests against real S3 data

https://claude.ai/code/session_0183cFGd7Ji4UohnNypgckNC

Replace stateful stream classes with clean functional design:
- HLSSegment: frozen dataclass with metadata (folder_epoch, segment indices, offsets, URLs) + download_as_wav() / download_as_flac()
- date_range_segments(): generator yielding HLSSegment for a time range
- live_segments(): generator yielding HLSSegment from live stream
- No mutable state; standard Python iterators, composable

Simplify orchestrator run_loop to: for segment in hls_segments: ...

https://claude.ai/code/session_0183cFGd7Ji4UohnNypgckNC

paulcretu · 2026-03-18T22:41:03Z

Haven't done a proper review, but just a heads up that at some point in the future the folder structure and format for HLS streams may change (e.g. we may remove latest.txt & separate folders for each restart, or even bake in timestamps into .ts filenames).

So it would be ideal to keep using orca-hls-utils so we can keep a single place for breaking changes down the line. In other words, the HLS stream folder structure should be considered a private API subject to change and not something to build off long-term. If OrcasoundHLSClient is the way forward, I think those changes should just go into orca-hls-utils.

That said, don't want to create a ton of extra overhead, and I totally understand if it's easier/faster to split away right now, go for it. But I thought it's worth a mention and if you do decide to move off orca-hls-utils, it could use a follow up issue to keep track of a future re-integration once the upstream code can be updated.

akashmjn · 2026-03-18T22:53:01Z

@paulcretu - agreed and noted here #457. Just prefer to rewrite incrementally, to avoid triggering a major refactor everywhere at once.

Once we're comfortable with the new API, can split into a separate issue to track upstream merge.

Also @dthaler - do you have the right permissions to merge and trigger CI on https://github.com/orcasound/orca-hls-utils/tree/master?

dthaler · 2026-03-19T17:28:59Z

I didn't previously, but I think I do now.

Copilot

Pull request overview

This PR replaces the external orca-hls-utils dependency in InferenceSystem with an internal src/orcasound_hls/ module to compute HLS segment timestamps deterministically from S3 folder epochs + M3U8 durations (fixing timestamp drift) and to reduce expensive S3 listing operations. It also refactors LiveInferenceOrchestrator.py so the orchestrator owns polling/cadence while the client provides synchronous segment metadata.

Changes:

Introduces orcasound_hls (client/utils/types) to locate folders, parse playlists, and represent downloadable timestamped HLS segments.
Refactors the inference orchestrator to fetch segments synchronously and process them via _process_segment() for both DateRange and LiveHLS modes.
Adds new integration-style tests for the HLS module and updates CI smoke-test expectations.

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 7 comments.

File	Description
InferenceSystem/src/orcasound_hls/utils.py	Adds S3 prefix-filtered folder lookup, M3U8 helpers, and latest-folder discovery.
InferenceSystem/src/orcasound_hls/types.py	Adds `OrcasoundHLSSegment` frozen dataclass with WAV/FLAC download+conversion helpers.
InferenceSystem/src/orcasound_hls/client.py	Adds synchronous `OrcasoundHLSClient.get_segments()` over time ranges and `latest_stream_start()`.
InferenceSystem/src/orcasound_hls/__init__.py	Exposes the new client and segment types as a module API.
InferenceSystem/src/LiveInferenceOrchestrator.py	Refactors to use `OrcasoundHLSClient` and moves polling/sleep into `run_loop()`.
InferenceSystem/tests/test_orcasound_hls.py	Adds end-to-end tests validating folder lookup, timestamp determinism, and downloads against real S3.
InferenceSystem/tests/test_orchestrator.py	Updates subprocess test formatting and relaxes LiveHLS smoke expectations to match new logs.
InferenceSystem/tests/orch_configs/LiveHLS/LiveHLS_OrcasoundLab.yml	Replaces polling interval with live delay buffer + inference segment sizing.
InferenceSystem/pyproject.toml	Removes `orca-hls-utils` and adds direct deps (`boto3`, `m3u8`, `ffmpeg-python`).
InferenceSystem/uv.lock	Updates locked dependency graph to reflect removal/additions.
.github/workflows/InferenceSystem.yaml	Updates Docker smoke-test assertions to match new orchestrator logging behavior.

claude and others added 7 commits March 17, 2026 23:22

Add agent-workspace handoff doc for HLS rewrite

5930999

https://claude.ai/code/session_0183cFGd7Ji4UohnNypgckNC

rm tmp file

2fcfd40

refactor w orcasound hls client

9b0c210

ruff

f793611

Merge branch 'main' into claude/debug-timestamp-issue-5iZAA

f6ba7f6

akashmjn requested a review from dthaler March 18, 2026 20:29

akashmjn mentioned this pull request Mar 18, 2026

Unsynchronized and inconsistent detection timestamps #430

Closed

dthaler reviewed Mar 18, 2026

Comment thread InferenceSystem/src/orcasound_hls/types.py Outdated

akashmjn added 4 commits March 18, 2026 16:25

fix hanging livehls pytest for offline hydrophone

718e7b0

cleanup and tail cross folder edge case fix

879c3d5

offset cleanup

53f8571

cleanup tests

1b098d5

akashmjn marked this pull request as ready for review March 19, 2026 02:26

akashmjn requested review from TruaShamu and micya as code owners March 19, 2026 02:26

akashmjn added 2 commits March 18, 2026 19:29

tweak

dab5bdf

fix test-docker

7d00b76

akashmjn requested a review from scottveirs as a code owner March 19, 2026 02:33

akashmjn added 5 commits March 18, 2026 19:43

fix test-docker

c1339a5

update orch config params

745efcf

update orch config params

3e769fa

tweak

1ac4d46

fix logs

65ffc2f

akashmjn requested a review from dthaler March 19, 2026 15:36

dthaler requested a review from Copilot March 19, 2026 17:34

Copilot started reviewing on behalf of dthaler March 19, 2026 17:34

Copilot AI reviewed Mar 19, 2026

dthaler reviewed Mar 19, 2026

Comment thread InferenceSystem/src/orcasound_hls/utils.py Outdated

Comment thread InferenceSystem/tests/orch_configs/LiveHLS/LiveHLS_OrcasoundLab.yml Outdated

akashmjn added 3 commits March 20, 2026 10:04

pr nits

010b4c7

fix clock-aligned polling

6d3c9a1

max_live_iterations rename

991fc60

dthaler reviewed Mar 20, 2026

Comment thread InferenceSystem/src/LiveInferenceOrchestrator.py

akashmjn added 2 commits March 20, 2026 11:37

fix nit

0ecb83f

fix delay

f99b6b5

dthaler approved these changes Mar 20, 2026

akashmjn merged commit 27d5534 into main Mar 20, 2026
26 checks passed

akashmjn deleted the claude/debug-timestamp-issue-5iZAA branch March 20, 2026 19:17

akashmjn merged 23 commits into main from claude/debug-timestamp-issue-5iZAA


5 participants
