perf(exr): optimize EXR read pipeline for faster playback #231

Open
throb wants to merge 7 commits into AcademySoftwareFoundation:main from throb:perf/exr-read-pipeline

throb commented Mar 15, 2026

Summary

Targeted optimizations to the EXR read pipeline, reducing per-frame overhead and improving precache throughput. Benchmarked on 4312x2274 ZIPS half-float EXR sequences on NVMe storage with a 32-thread CPU.

Changes

1. Cache EXR header metadata (high impact)

dump_json_headers() was called on every frame read, serializing all EXR header attributes to JSON via ~400 RTTI dynamic_cast operations. Since metadata is identical across all frames in a sequence, this is now computed once per part index and cached in the OpenEXRMediaReader instance.

Files: openexr.cpp, openexr.hpp
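
A minimal sketch of the per-part caching idea, assuming (as the PR does) that header metadata does not change between frames of a sequence. `HeaderCache` and its `Serializer` callback are hypothetical names used for illustration; the callback stands in for `dump_json_headers()`, and this is not the actual `OpenEXRMediaReader` code.

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical helper: memoises serialized header metadata per EXR part index.
class HeaderCache {
  public:
    using Serializer = std::function<std::string(int part_index)>;

    explicit HeaderCache(Serializer serializer) : serializer_(std::move(serializer)) {}

    // Returns the cached JSON for a part. The (expensive) serializer runs
    // only on the first request for a given part index.
    const std::string &get(int part_index) {
        auto it = cache_.find(part_index);
        if (it == cache_.end()) {
            it = cache_.emplace(part_index, serializer_(part_index)).first;
        }
        return it->second;
    }

    // Drop all entries, e.g. if the reader is pointed at different media.
    void clear() { cache_.clear(); }

  private:
    Serializer serializer_;
    std::unordered_map<int, std::string> cache_;
};
```

With this shape, the cost of serialization is paid once per part rather than once per frame; whether the real cache needs invalidation beyond the reader's lifetime depends on details not shown in this PR description.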

2. Cache MultiPartInputFile handle (medium impact)

Imf::MultiPartInputFile was constructed fresh on every frame, paying the cost of a file open syscall + full header parse each time. Now cached per reader instance — when the same file path is re-read (scrub-back, multi-stream access), the existing handle is reused. Cache is invalidated automatically when the path changes.

Files: openexr.cpp, openexr.hpp
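
The handle reuse described above can be sketched as a single-entry cache keyed on the file path. `HandleCache`, `acquire()` and the `Opener` callable are hypothetical; `FileT` stands in for a type such as `Imf::MultiPartInputFile`, and the real reader may manage lifetime differently.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical single-entry cache: keeps the most recently opened handle
// and reuses it for as long as the requested path stays the same.
template <typename FileT> class HandleCache {
  public:
    // `open` is any callable taking a path and returning std::shared_ptr<FileT>.
    template <typename Opener>
    std::shared_ptr<FileT> acquire(const std::string &path, Opener &&open) {
        if (!handle_ || path != path_) {
            // Path changed (or nothing cached yet): open a fresh handle.
            // If open() throws, the previously cached handle is left intact.
            auto fresh = std::forward<Opener>(open)(path);
            handle_    = std::move(fresh);
            path_      = path;
        }
        return handle_;
    }

    // Explicit invalidation, e.g. if the file is known to have changed on disk.
    void reset() {
        handle_.reset();
        path_.clear();
    }

  private:
    std::string path_;
    std::shared_ptr<FileT> handle_;
};
```

A single entry is enough for the scrub-back and multi-stream cases named above, since both re-read the same path; it does not help when consecutive reads target different frame files.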

3. Batch precache cache checks (medium impact)

The do_precache() dispatch loop previously sent individual preserve_atom request/response messages to the cache actor for each frame — up to N sequential round-trips through the CAF mailbox. Now pops all available requests first, groups them by cache actor + playhead, and sends a single batched preserve_atom per group. The cache returns only uncached entries, and reads are dispatched only for those.

Extracted dispatch_precache_read() helper to keep the restructured do_precache() readable.

Files: media_reader_actor.cpp, media_reader_actor.hpp
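
To illustrate the batching step outside of CAF, the sketch below groups drained requests by (cache, playhead) and filters out keys that are already cached. All names here (`PrecacheRequest`, `group_requests`, `filter_uncached`) are hypothetical, integer ids stand in for actor handles, and a `std::set` stands in for the cache actor's reply to a batched `preserve_atom`.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical request record; the ids stand in for actor handles.
struct PrecacheRequest {
    int cache_id;
    int playhead_id;
    std::string frame_key;
};

using GroupKey = std::pair<int, int>; // (cache_id, playhead_id)

// Group all drained requests so that each (cache, playhead) pair needs
// only one batched message instead of one message per frame.
std::map<GroupKey, std::vector<std::string>>
group_requests(const std::vector<PrecacheRequest> &requests) {
    std::map<GroupKey, std::vector<std::string>> groups;
    for (const auto &r : requests) {
        groups[{r.cache_id, r.playhead_id}].push_back(r.frame_key);
    }
    return groups;
}

// Stand-in for the cache's batched reply: the subset of keys that are
// not cached yet, in request order. Only these need a read dispatched.
std::vector<std::string> filter_uncached(
    const std::vector<std::string> &keys, const std::set<std::string> &cached) {
    std::vector<std::string> missing;
    for (const auto &k : keys) {
        if (cached.count(k) == 0) {
            missing.push_back(k);
        }
    }
    return missing;
}
```

With N requests spread over G groups, this shape needs G round-trips instead of N, which is the saving the change above is after.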

4. Increase max_in_flight from 4 to 8 (low-medium impact)

Allows 8 concurrent precache reads per playhead instead of 4. Benchmark shows ~5-7% higher throughput on 32-thread systems (83→88 fps). Latency rises noticeably due to thread contention (median 45→77 ms, P95 63→186 ms in the results below), but this is acceptable for precaching, where total throughput matters more than per-frame consistency.

Files: media_reader_actor.cpp
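
For readers unfamiliar with the in-flight limit, here is a small single-threaded sketch of the bookkeeping. `InFlightLimiter` is a hypothetical class for illustration only; the actual dispatch lives in the actor code and is not shown in this description.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical bookkeeping for a bounded number of concurrent reads.
class InFlightLimiter {
  public:
    explicit InFlightLimiter(std::size_t max_in_flight) : max_in_flight_(max_in_flight) {}

    void enqueue(int frame) { pending_.push_back(frame); }

    // Start queued reads until the limit is reached.
    // Returns the frames that were started by this call.
    std::vector<int> dispatch() {
        std::vector<int> started;
        while (in_flight_ < max_in_flight_ && !pending_.empty()) {
            started.push_back(pending_.front());
            pending_.pop_front();
            ++in_flight_;
        }
        return started;
    }

    // Call when a read finishes, whether it succeeded or failed, so the
    // slot is released and the pipeline does not stall on a bad frame.
    void complete() {
        if (in_flight_ > 0) {
            --in_flight_;
        }
    }

    std::size_t in_flight() const { return in_flight_; }

  private:
    std::size_t max_in_flight_;
    std::size_t in_flight_ = 0;
    std::deque<int> pending_;
};
```

Raising the limit from 4 to 8 in this model simply lets `dispatch()` start more reads before it has to wait for `complete()`.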

5. Prior commit: configurable EXR thread count, concurrent precache dispatch, overscan default

The foundation commit (d4f20ef) added:

  • Configurable exr_thread_count preference (was hardcoded 16)
  • max_in_flight increased from 1→4 (now 8 in this PR)
  • Default overscan changed from 5% to 0%
  • Benchmark tool (bench/exr_benchmark/) for profiling read configurations
  • Precache error handling to prevent pipeline stalls on unreadable frames

Benchmark Results

Test data: 437 frames, 4312x2274, HALF float RGB/RGBA, ZIPS compression, NVMe SSD
System: 32 hardware threads

| Configuration | FPS | Decoded MB/s | Median (ms) | P95 (ms) |
|---|---|---|---|---|
| Sequential, 1 thread | 10 | 618 | 90 | 129 |
| 4 ext × 16 EXR threads | 83 | 5,192 | 45 | 63 |
| 8 ext × 16 EXR threads | 88 | 5,524 | 77 | 186 |
The sweet spot on a 32-thread system is 4 external × 16 EXR internal threads for smooth playback (low P95), with 8 external threads used for precache fill where total throughput dominates.
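
For reference, one common way to derive median and P95 figures from per-frame read times is the nearest-rank method, sketched below. This is an assumption about how such numbers could be computed, not a description of what the exr_benchmark tool does; `percentile` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile: sort the samples and pick the value at
// rank ceil(p / 100 * n), counting ranks from 1. Requires 0 < p <= 100.
double percentile(std::vector<double> samples_ms, double p) {
    assert(!samples_ms.empty());
    assert(p > 0.0 && p <= 100.0);
    std::sort(samples_ms.begin(), samples_ms.end());
    const double n  = static_cast<double>(samples_ms.size());
    std::size_t rank = static_cast<std::size_t>(std::ceil(p / 100.0 * n));
    rank = std::max<std::size_t>(rank, 1);
    return samples_ms[rank - 1];
}
```

Calling `percentile(times, 50)` gives the median under this definition and `percentile(times, 95)` the P95.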

Test plan

  • Load EXR sequence, verify playback at 24fps fills cache without dropped frames
  • Scrub back and forth — file handle cache should eliminate re-open overhead
  • Load multi-part/multi-layer EXR — verify correct layer selection still works
  • Verify metadata panel shows correct EXR attributes (cached headers)
  • Run exr_benchmark --dir <path> to confirm no regression on target hardware
  • Test with overscan EXRs (data_window != display_window) — cropped path still works
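
The overscan case can be illustrated with inclusive integer boxes, which is how OpenEXR data and display windows are expressed. The sketch below uses a hypothetical `Box` struct in place of `Imath::Box2i`; it shows the inclusive size calculation (max-min+1, the off-by-one fixed in the foundation commit) and a crop of the data window to the display window.

```cpp
#include <algorithm>

// Hypothetical stand-in for an inclusive integer box (both corners are
// part of the box, as with EXR data and display windows).
struct Box {
    int min_x, min_y, max_x, max_y;
};

// Inclusive bounds: a box from 0 to 4311 is 4312 pixels wide.
int box_width(const Box &b) { return b.max_x - b.min_x + 1; }
int box_height(const Box &b) { return b.max_y - b.min_y + 1; }

// Crop the data window to the display window (0% overscan): the result
// is the intersection of the two boxes.
Box crop_to_display(const Box &data, const Box &display) {
    return Box{
        std::max(data.min_x, display.min_x),
        std::max(data.min_y, display.min_y),
        std::min(data.max_x, display.max_x),
        std::min(data.max_y, display.max_y)};
}
```

For a 4312x2274 display window starting at the origin, a larger data window crops back to exactly 4312x2274; a data window smaller than the display window is left unchanged by the intersection.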

🤖 Generated with Claude Code

throb and others added 7 commits March 10, 2026 16:56
…e loading

Three issues prevented image sequences (EXR, etc.) from loading on Windows:

1. Broken backslash regex in scan_posix_path() - the character class [\]
   matched ']' instead of '\', leaving backslashes in scanned file paths.
2. pad_size() returned 0 for non-zero-padded frame numbers (e.g. "1000"),
   producing invalid %00d / {:00d} format specifiers that failed extension
   matching in is_file_supported().
3. posix_path_to_uri() did not normalise backslashes to forward slashes on
   Windows, leaking them into file: URIs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
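
As an illustration of the backslash issue in the commit above: in ECMAScript regex syntax a literal backslash inside a character class has to be escaped, otherwise it escapes the closing bracket. The sketch below is a hypothetical `normalise_separators` helper, not the code from `scan_posix_path()` or `posix_path_to_uri()`.

```cpp
#include <regex>
#include <string>

// Replace every backslash with a forward slash.
// The raw string R"([\\])" yields the pattern [\\], i.e. a character
// class containing one escaped backslash.
std::string normalise_separators(const std::string &path) {
    static const std::regex backslash(R"([\\])");
    return std::regex_replace(path, backslash, "/");
}
```

For a single-character substitution like this, `std::replace` from `<algorithm>` would do the same job without a regex; the regex form is shown only because the original bug was in a character class.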
Add --review / -v CLI flag to launch xstudio in "Present" layout
(viewport only, no playlists or timeline). This overrides saved user
layout preferences on startup.

Also add drag-drop handling to the viewport panel so files can be
dropped directly onto the viewer in any layout. Previously drops were
only handled by the media list panel, which doesn't exist in the
Present layout.

Bump version to 1.1.1.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Remove Gamma and Saturation from the default hidden toolbar items so
they are visible by default in the viewport toolbar.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

EXR Performance:
- Dispatch up to 4 concurrent precache requests in do_precache()
- Add configurable EXR decompression thread count preference (default 16)
- Fix off-by-one in EXR resolution reporting (max-min → max-min+1)
- Crop data window to display window by default (0% overscan)
- Add standalone EXR benchmark tool

Hotkey Editor:
- Replace read-only hotkey viewer with full interactive editor
- Click-to-capture key rebinding with conflict detection and warnings
- Persistent hotkey overrides saved to %LOCALAPPDATA%/xstudio/
- Search filtering, per-key and reset-all functionality
- Scrollbar for overflow content

Viewport Fixes:
- Fix SSBO shader debug colors (red/blue → black) for out-of-bounds pixels
- Fix FBO texture wrap mode (GL_CLAMP_TO_EDGE → GL_CLAMP_TO_BORDER)
- Fix overscan display: crop EXR data window to display window by default

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add a Layer dropdown to the OCIO colour pipeline plugin that lets
users switch between EXR layers/AOVs (e.g. RGBA, sky, mask, displace)
directly from the viewport toolbar and right-click context menu.

The backend stream switching already existed — this wires it to the UI:
- New StringChoiceAttribute populated dynamically from media streams
- Sends current_media_stream_atom on selection change
- Base ColourPipeline stores media source actor ref for stream queries
- CLAUDE.md updated with correct plugin deployment paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Extracted from PR AcademySoftwareFoundation#198 and adapted for Windows. Provides a VFX-oriented
filesystem browser panel with directory tree, file sequence detection
(via fileseq), version grouping, and thumbnail generation.

Key Windows fixes:
- Drive letter enumeration under virtual "/" root ("This PC")
- Case-insensitive path comparison throughout QML tree
- Forward-slash normalization on all path returns
- Fixed shadowed `time` import in _get_subdirs
- Direct attribute write from DirectoryTree (bypasses signal chain)
- Tree auto-syncs to current path on launch
- Up-directory button in path bar
- Depth spinner persists across sessions (title mismatch fix)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Cache dump_json_headers() per part index — metadata is identical across
  all frames in a sequence, eliminates ~400 RTTI dynamic_casts per frame
- Cache MultiPartInputFile handle — reuse when same file path is re-read
  (scrub-back, multi-stream), avoids repeated open+header-parse syscalls
- Batch precache cache checks — single preserve_atom message per group
  instead of N individual request/response round-trips to cache actor
- Bump max_in_flight from 4 to 8 for better pipeline saturation on
  multi-core CPUs (benchmark: 83→88 fps on 32-thread system)

Benchmark on 4312x2274 ZIPS EXR (NVMe): confirms no regression,
app-level overhead reduced by eliminating per-frame redundant work.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>