perf(exr): optimize EXR read pipeline for faster playback #231
Open
throb wants to merge 7 commits into AcademySoftwareFoundation:main from
…e loading
Three issues prevented image sequences (EXR, etc.) from loading on Windows:
1. Broken backslash regex in scan_posix_path() - the character class [\]
matched ']' instead of '\', leaving backslashes in scanned file paths.
2. pad_size() returned 0 for non-zero-padded frame numbers (e.g. "1000"),
producing invalid %00d / {:00d} format specifiers that failed extension
matching in is_file_supported().
3. posix_path_to_uri() did not normalise backslashes to forward slashes on
Windows, leaking them into file: URIs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
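As a rough illustration of fixes 1 and 2 above, the sketch below normalises backslashes with a correctly escaped character class and builds a frame specifier that never degenerates to `%00d`. The helper names `normalise_separators` and `frame_spec` are hypothetical and are not the functions touched by this commit; falling back to a plain `%d` for unpadded frame numbers is an assumption about what a reasonable fix looks like, not a statement of what `pad_size()` now returns.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replace every backslash with a forward slash.
// Inside the raw string the regex is [\\], i.e. a character class that
// contains a single escaped backslash.
std::string normalise_separators(const std::string &path) {
    static const std::regex backslash(R"([\\])");
    return std::regex_replace(path, backslash, "/");
}

// Hypothetical helper: build a printf-style specifier for a frame token.
// Zero-padded tokens such as "0012" keep their width ("%04d"); unpadded
// tokens such as "1000" fall back to "%d" (assumed behaviour) instead of
// producing the invalid "%00d".
std::string frame_spec(const std::string &frame_digits) {
    const bool padded = frame_digits.size() > 1 && frame_digits[0] == '0';
    if (!padded) {
        return "%d";
    }
    return "%0" + std::to_string(frame_digits.size()) + "d";
}
```

With these helpers, a scanned Windows path keeps forward slashes throughout, and both `shot.0012.exr` and `shot.1000.exr` yield a valid specifier.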
Add --review / -v CLI flag to launch xstudio in "Present" layout (viewport only, no playlists or timeline). This overrides saved user layout preferences on startup.
Also add drag-drop handling to the viewport panel so files can be dropped directly onto the viewer in any layout. Previously drops were only handled by the media list panel, which doesn't exist in the Present layout.
Bump version to 1.1.1.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove Gamma and Saturation from the default hidden toolbar items so they are visible by default in the viewport toolbar.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
EXR Performance:
- Dispatch up to 4 concurrent precache requests in do_precache()
- Add configurable EXR decompression thread count preference (default 16)
- Fix off-by-one in EXR resolution reporting (max-min → max-min+1)
- Crop data window to display window by default (0% overscan)
- Add standalone EXR benchmark tool
Hotkey Editor:
- Replace read-only hotkey viewer with full interactive editor
- Click-to-capture key rebinding with conflict detection and warnings
- Persistent hotkey overrides saved to %LOCALAPPDATA%/xstudio/
- Search filtering, per-key and reset-all functionality
- Scrollbar for overflow content
Viewport Fixes:
- Fix SSBO shader debug colors (red/blue → black) for out-of-bounds pixels
- Fix FBO texture wrap mode (GL_CLAMP_TO_EDGE → GL_CLAMP_TO_BORDER)
- Fix overscan display: crop EXR data window to display window by default
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
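The off-by-one fix and the crop-to-display default both follow from EXR windows being inclusive integer boxes. A minimal sketch, using a stand-in `Box2i` struct rather than the real Imath type, and hypothetical function names that do not come from this commit:

```cpp
#include <algorithm>

// Stand-in for an inclusive integer box such as an EXR data or display window.
struct Box2i {
    int min_x, min_y, max_x, max_y;
};

struct Size {
    int width, height;
};

// Both corners are inclusive, so the pixel count along each axis is
// max - min + 1, not max - min.
Size window_size(const Box2i &w) {
    return {w.max_x - w.min_x + 1, w.max_y - w.min_y + 1};
}

// Intersect the data window with the display window (0% overscan).
// Assumes the two windows overlap.
Box2i crop_to_display(const Box2i &data, const Box2i &display) {
    return {std::max(data.min_x, display.min_x),
            std::max(data.min_y, display.min_y),
            std::min(data.max_x, display.max_x),
            std::min(data.max_y, display.max_y)};
}
```

For a display window of (0, 0) to (4311, 2273), `window_size` reports 4312x2274, where `max - min` alone would report 4311x2273.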
Add a Layer dropdown to the OCIO colour pipeline plugin that lets users switch between EXR layers/AOVs (e.g. RGBA, sky, mask, displace) directly from the viewport toolbar and right-click context menu. The backend stream switching already existed; this wires it to the UI:
- New StringChoiceAttribute populated dynamically from media streams
- Sends current_media_stream_atom on selection change
- Base ColourPipeline stores media source actor ref for stream queries
- CLAUDE.md updated with correct plugin deployment paths
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extracted from PR AcademySoftwareFoundation#198 and adapted for Windows. Provides a VFX-oriented filesystem browser panel with directory tree, file sequence detection (via fileseq), version grouping, and thumbnail generation.
Key Windows fixes:
- Drive letter enumeration under virtual "/" root ("This PC")
- Case-insensitive path comparison throughout QML tree
- Forward-slash normalization on all path returns
- Fixed shadowed `time` import in _get_subdirs
- Direct attribute write from DirectoryTree (bypasses signal chain)
- Tree auto-syncs to current path on launch
- Up-directory button in path bar
- Depth spinner persists across sessions (title mismatch fix)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Cache dump_json_headers() per part index: metadata is identical across all frames in a sequence, eliminates ~400 RTTI dynamic_casts per frame
- Cache MultiPartInputFile handle: reuse when same file path is re-read (scrub-back, multi-stream), avoids repeated open+header-parse syscalls
- Batch precache cache checks: single preserve_atom message per group instead of N individual request/response round-trips to cache actor
- Bump max_in_flight from 4 to 8 for better pipeline saturation on multi-core CPUs (benchmark: 83→88 fps on 32-thread system)
Benchmark on 4312x2274 ZIPS EXR (NVMe): confirms no regression, app-level overhead reduced by eliminating per-frame redundant work.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
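The two reader-side caches in this commit have different keys: header JSON is cached per part index and survives across frames, while the file handle is cached per path and is replaced when the path changes. The sketch below shows that shape only. `CachingReader` and `FakeFile` are hypothetical stand-ins (the real code caches an `Imf::MultiPartInputFile` inside `OpenEXRMediaReader`), and the counters exist purely to make the behaviour observable.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Stand-in for a handle that is expensive to open and parse.
struct FakeFile {
    explicit FakeFile(std::string p) : path(std::move(p)) {}
    std::string path;
};

class CachingReader {
  public:
    // Reuse the handle while the path is unchanged; reopen otherwise.
    std::shared_ptr<FakeFile> open(const std::string &path) {
        if (!file_ || file_->path != path) {
            file_ = std::make_shared<FakeFile>(path);
            ++opens;
        }
        return file_;
    }

    // Build the header JSON once per part index and reuse it afterwards.
    // The cache is deliberately not cleared when the path changes, on the
    // assumption that every frame of a sequence carries the same metadata.
    const std::string &headers_json(
        int part, const std::function<std::string(int)> &build) {
        auto it = headers_.find(part);
        if (it == headers_.end()) {
            it = headers_.emplace(part, build(part)).first;
            ++header_builds;
        }
        return it->second;
    }

    int opens = 0;         // number of real opens performed
    int header_builds = 0; // number of times the JSON was actually built

  private:
    std::shared_ptr<FakeFile> file_;
    std::unordered_map<int, std::string> headers_;
};
```

One trade-off of keying metadata by part index alone is that per-frame attributes (if a sequence has any) would be reported from the first frame read; the sketch accepts that, as the commit message states metadata is identical across frames.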
Summary
Targeted optimizations to the EXR read pipeline, reducing per-frame overhead and improving precache throughput. Benchmarked on 4312x2274 ZIPS half-float EXR sequences on NVMe storage with a 32-thread CPU.
Changes
1. Cache EXR header metadata (high impact)
`dump_json_headers()` was called on every frame read, serializing all EXR header attributes to JSON via ~400 RTTI `dynamic_cast` operations. Since metadata is identical across all frames in a sequence, this is now computed once per part index and cached in the `OpenEXRMediaReader` instance.
Files: `openexr.cpp`, `openexr.hpp`
2. Cache `MultiPartInputFile` handle (medium impact)
`Imf::MultiPartInputFile` was constructed fresh on every frame, paying the cost of a file open syscall + full header parse each time. Now cached per reader instance: when the same file path is re-read (scrub-back, multi-stream access), the existing handle is reused. Cache is invalidated automatically when the path changes.
Files: `openexr.cpp`, `openexr.hpp`
3. Batch precache cache checks (medium impact)
The `do_precache()` dispatch loop previously sent individual `preserve_atom` request/response messages to the cache actor for each frame, up to N sequential round-trips through the CAF mailbox. Now pops all available requests first, groups them by cache actor + playhead, and sends a single batched `preserve_atom` per group. The cache returns only uncached entries, and reads are dispatched only for those.
Extracted `dispatch_precache_read()` helper to keep the restructured `do_precache()` readable.
Files: `media_reader_actor.cpp`, `media_reader_actor.hpp`
4. Increase `max_in_flight` from 4 to 8 (low-medium impact)
Allows 8 concurrent precache reads per playhead instead of 4. Benchmark shows ~5-7% higher throughput on 32-thread systems (83→88 fps). Tail latency (P95) increases slightly due to thread contention, but this is acceptable for precaching where total throughput matters more than per-frame consistency.
Files: `media_reader_actor.cpp`
5. Prior commit: configurable EXR thread count, concurrent precache dispatch, overscan default
The foundation commit (`d4f20ef`) added:
- `exr_thread_count` preference (was hardcoded 16)
- `max_in_flight` increased from 1→4 (now 8 in this PR)
- Benchmark tool (`bench/exr_benchmark/`) for profiling read configurations
Benchmark Results
Test data: 437 frames, 4312x2274, HALF float RGB/RGBA, ZIPS compression, NVMe SSD
System: 32 hardware threads
The sweet spot on a 32-thread system is 4 external × 16 EXR internal threads for smooth playback (low P95), with 8 external threads used for precache fill where total throughput dominates.
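For readers unfamiliar with the batching in change 3, the grouping step can be sketched without any actor framework. Everything here is illustrative: `PrecacheRequest`, `group_requests` and `uncached` are hypothetical names, integer ids stand in for actor handles, and the real code exchanges a `preserve_atom` message with the cache actor rather than calling a function.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// One pending precache request (ids stand in for actor handles).
struct PrecacheRequest {
    int cache_id;
    int playhead_id;
    std::string media_key;
};

using GroupKey = std::pair<int, int>; // (cache, playhead)

// Drain the pending queue into one batch per (cache, playhead) pair, so
// each group costs a single round-trip instead of one per frame.
std::map<GroupKey, std::vector<std::string>>
group_requests(const std::vector<PrecacheRequest> &pending) {
    std::map<GroupKey, std::vector<std::string>> groups;
    for (const auto &r : pending) {
        groups[GroupKey{r.cache_id, r.playhead_id}].push_back(r.media_key);
    }
    return groups;
}

// Stand-in for the cache's batched reply: only the keys it does not
// already hold, in request order. Reads are dispatched for these alone.
std::vector<std::string> uncached(const std::set<std::string> &cached,
                                  const std::vector<std::string> &keys) {
    std::vector<std::string> missing;
    for (const auto &k : keys) {
        if (cached.count(k) == 0) {
            missing.push_back(k);
        }
    }
    return missing;
}
```

With N pending frames for one playhead, this turns N sequential cache queries into one, and the number of reads issued depends only on how many keys come back as missing.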
Test plan
`exr_benchmark --dir <path>` to confirm no regression on target hardware
🤖 Generated with Claude Code