Caching for cuslide2#1034

Merged
rapids-bot[bot] merged 28 commits into release/26.04 from feature/caching
Apr 1, 2026

Conversation

@cdinea
Contributor

@cdinea cdinea commented Mar 3, 2026

Description

Implements tile-level caching for cuslide2's single-region read_region() path.

Key changes

  • Tile-level caching (ifd.cpp)
  • Decomposes each read_region() ROI into its constituent TIFF tiles, performs per-tile cache lookups via cuCIM's existing ImageCache API, decodes only cache-miss tiles via nvImageCodec, inserts decoded tiles into the cache, and assembles the final output raster.
  • Reuses cuslide's cache infrastructure (create_key, find, lock/unlock, allocate, insert) — no new cache backend code.
  • Fixed hash_value_ to include the file handle hash (file_hash ^ splitmix64(ifd_index)) for cross-file cache key uniqueness.
  • Edge tiles are clipped to actual image bounds; decode failures fill with background_value.
  • Tiles always cached in host memory; GPU output transferred via a single cudaMemcpy(H2D) after assembly.
  • Falls back to direct ROI decode when caching is not applicable (strip-based images, out-of-bounds ROI, or no_cache mode).
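The tile decomposition and cache-key scheme above can be sketched as follows. This is a minimal illustration, not cuCIM's actual code: splitmix64 uses the standard public-domain constants, but cache_hash and tiles_for_roi (and their row-major tile indexing) are hypothetical names invented for this sketch.

```cpp
#include <cstdint>
#include <vector>

// splitmix64 finalizer (standard public-domain constants). The PR mixes the
// IFD index through it before XOR-ing with the file hash so cache keys stay
// unique across files and across IFDs within a file.
uint64_t splitmix64(uint64_t x)
{
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

// Combine the file handle hash with the IFD index, as described above.
uint64_t cache_hash(uint64_t file_hash, uint64_t ifd_index)
{
    return file_hash ^ splitmix64(ifd_index);
}

// Decompose an ROI into the row-major indices of the TIFF tiles it touches.
// Edge tiles are clipped to the image bounds by the assembly code (not shown).
std::vector<uint32_t> tiles_for_roi(uint32_t x, uint32_t y, uint32_t w, uint32_t h,
                                    uint32_t tile_w, uint32_t tile_h,
                                    uint32_t tiles_per_row)
{
    std::vector<uint32_t> tiles;
    for (uint32_t ty = y / tile_h; ty <= (y + h - 1) / tile_h; ++ty)
        for (uint32_t tx = x / tile_w; tx <= (x + w - 1) / tile_w; ++tx)
            tiles.push_back(ty * tiles_per_row + tx);
    return tiles;
}
```

Each index returned by tiles_for_roi would be looked up in the ImageCache; only misses go through nvImageCodec decode before assembly.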

@cdinea cdinea requested review from a team as code owners March 3, 2026 16:53
@copy-pr-bot

copy-pr-bot Bot commented Mar 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cdinea cdinea self-assigned this Mar 3, 2026
@cdinea cdinea added the improvement (Improves an existing functionality) and non-breaking (Introduces a non-breaking change) labels Mar 3, 2026
@cdinea
Contributor Author

cdinea commented Mar 3, 2026

/ok to test 87618ac

@cdinea cdinea changed the title [WIP] Caching [WIP] Caching for cuslide2 Mar 3, 2026
@cdinea
Contributor Author

cdinea commented Mar 3, 2026

/ok to test 5ee97fa

@cdinea cdinea changed the title [WIP] Caching for cuslide2 Caching for cuslide2 Mar 16, 2026
@jakirkham jakirkham changed the base branch from main to release/26.04 March 17, 2026 18:37
@jakirkham jakirkham requested review from a team as code owners March 17, 2026 18:37
@jakirkham jakirkham requested a review from bdice March 17, 2026 18:37
Contributor

@gigony gigony left a comment


@cdinea
Contributor Author

cdinea commented Mar 29, 2026

/ok to test b5f8660

Apply pre-commit hook auto-fixes: remove unnecessary f-string prefixes,
fix line length, and add missing blank lines.

Made-with: Cursor
@cdinea
Contributor Author

cdinea commented Mar 29, 2026

/ok to test e986c14

Remove unused region_overlap assignments (F841), fix unnecessary
f-string prefixes, fix line length, and remove extra blank line.

Made-with: Cursor
@cdinea
Contributor Author

cdinea commented Mar 29, 2026

/ok to test 7c69564

Member

@jakirkham jakirkham left a comment


Thanks Cristiana! 🙏

Generally looks good

Had a few comments to clean up a couple things

cucim_free(host_raster);
throw std::runtime_error("Failed to allocate GPU buffer for tile-cached output");
}
cudaMemcpy(gpu_buf, host_raster, one_raster_size, cudaMemcpyHostToDevice);
Member


In the future we may want to look at using stream to make this async

Contributor Author


thank you for the feedback @jakirkham - I agree, cudaMemcpyAsync with a stream would allow overlapping the H2D transfer with other work. For now this only runs on the single-location tile-cached path (location_len == 1, batch_size == 1), so there's no pipeline to overlap with, but it would be a worthwhile optimization if we later add prefetching or pipelining to this path. The multi-location batch path already avoids this copy entirely via zero-copy OutputBufferProvider - keeping track of this RFE here #1064

if (out_device.type() == cucim::io::DeviceType::kCUDA)
{
uint8_t* gpu_buf = nullptr;
cudaError_t err = cudaMalloc(reinterpret_cast<void**>(&gpu_buf), one_raster_size);
Member


Similarly in the future we could look at making this allocation async

Contributor Author


thank you for the feedback @jakirkham - do you suggest adding cudaMallocAsync here? The batch decode path already uses a custom CUDA stream (decode_stream_ in NvImageCodecProcessor, created with cudaStreamNonBlocking) for async operations. This single-location tile-cached path doesn't have a stream because it runs synchronously in IFD::read() outside the NvImageCodecProcessor pipeline. Plumbing a stream into this path (or using cudaMallocAsync / cudaMemcpyAsync) would be a natural follow-up - tracking this here #1065

Comment thread scripts/test_philips_tiff.py Outdated
print(f" Cache type: {CuImage.cache().type}")
print(f" Stat recording: {CuImage.cache().record()}")

import numpy as np
Member


Can we move this import to the top-level?

Contributor Author


thank you @jakirkham - addressed this feedback here fbb7e6e

Comment thread scripts/test_philips_tiff.py Outdated

except Exception as e:
print(f" ❌ Caching test failed: {e}")
import traceback
Member


Same with this import

Contributor Author


thank you for the feedback @jakirkham - addressed this feedback here a2cc720

Comment thread scripts/test_aperio_svs.py Outdated
Raises:
RuntimeError: If caching assertions fail.
"""
import numpy as np
Member


Let's move this out of the function and to the top-level

Contributor Author


thank you for the feedback @jakirkham - feedback already addressed in commit fbb7e6e

cdinea added 2 commits March 30, 2026 12:33
Move inline `import numpy as np` statements to the module top-level
per Python conventions and reviewer feedback.

Made-with: Cursor
@cdinea
Contributor Author

cdinea commented Mar 30, 2026

/ok to test a2cc720

cdinea added 3 commits March 31, 2026 11:11
Add a non-cached baseline read (direct decode, no cache) before
enabling the tile cache, then verify pixel-exact match against the
cached cold read. This catches any tile decomposition or assembly
bugs that produce different output from the direct nvImageCodec path.

Made-with: Cursor
Assert that the overlapping read produces both cache hits (shared tiles
from previous region) and cache misses (new tiles). Compare the cached
overlapping result pixel-for-pixel against a non-cached direct decode
of the same region.

Made-with: Cursor
1. Replace manual image_cache.unlock() calls with a CacheLockGuard
   RAII struct that auto-unlocks on destruction. This prevents lock
   leaks if SharedMemoryImageCache::allocate() or create_value()
   throw while the per-tile lock is held.

2. Add bits_per_sample_ == 8 && samples_per_pixel_ == 3 to the
   tile-caching eligibility check, matching the format constraints
   in is_read_optimizable(). Non-8-bit or non-RGB images now fall
   through to the direct decode path.

Made-with: Cursor
@cdinea
Contributor Author

cdinea commented Mar 31, 2026

/ok to test 1e19e3c

@cdinea
Contributor Author

cdinea commented Mar 31, 2026

/ok to test 23c3fee

The compile-time guard for TIFF tag metadata used #if defined() on enum
values (NVIMGCODEC_METADATA_KIND_TIFF_TAG, etc.), which always evaluates
to false since these are C enum constants, not preprocessor macros. This
caused tile_width_/tile_height_ to remain 0 and disabled tile-level
caching entirely. Replace the broken guard with an unconditional #define.

Also fix the overlapping-read cache test: the "non-cached baseline" read
went through the global cache, pre-populating tiles before the cached
read and causing a false assertion failure on zero new misses.

Add a deprecation notice to the README for cuslide in favor of cuslide2.

Made-with: Cursor
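The guard bug described above is easy to reproduce in isolation: `#if defined(X)` is true only when X is a preprocessor macro, so a C enum constant never satisfies it. The enum name and value below are stand-ins for the real nvImageCodec constants:

```cpp
// Stand-in for the real nvImageCodec enum constant; the value is irrelevant.
enum { NVIMGCODEC_METADATA_KIND_TIFF_TAG = 1 };

// defined() only tests preprocessor macros, never enumerators, so this guard
// is always false: the TIFF-tag metadata code was silently compiled out,
// leaving tile_width_/tile_height_ at 0 and disabling tile caching.
bool tiff_tag_guard_enabled()
{
#if defined(NVIMGCODEC_METADATA_KIND_TIFF_TAG)
    return true;
#else
    return false;
#endif
}
```

The fix replaces the broken `#if defined()` guard with an unconditional `#define`, so the guarded code always compiles.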
@cdinea
Contributor Author

cdinea commented Mar 31, 2026

/ok to test 379e3d6

@cdinea
Contributor Author

cdinea commented Mar 31, 2026

/ok to test 29c2136

@jakirkham
Member

/ok to test 96dd52a

Member

@jakirkham jakirkham left a comment


Thanks Cristiana! 🙏

Only remaining suggestion from me is to put the version of the deprecation

For anything that remains unaddressed, we will make sure to file an issue so we can continue to iterate on the good feedback we have gotten here (thanks all for taking the time to look and share your thoughts 🙏)

Comment thread README.md Outdated
cdinea and others added 2 commits March 31, 2026 22:52
@cdinea
Contributor Author

cdinea commented Apr 1, 2026

/ok to test 48b0416


@mkepa-nv mkepa-nv left a comment


I think there is a bug now with how the user-provided buffer is used. IIRC Joaquin noted before that the user-provided buffer is never used currently? But it would be good to fix that anyway


if (caller_owns_buffer)
{
host_raster = output_buffer;


@cdinea The user-provided buffer can be on device, right? Then this code will not work (it assumes host_raster is in host memory, as it uses memcpy and memset to work with it).

Contributor Author


thank you for the feedback @mkepa-nv - I added a caller_buffer_on_device check: if the caller's buffer is on CUDA, a temporary host staging buffer is allocated for assembly, and the result is cudaMemcpy'd into the caller's device buffer at the end. Only when the caller's buffer is on the CPU does the code write into it directly. Pushed this change in commit ccdb2b6

cdinea added 2 commits April 1, 2026 09:58
The tile assembly loop uses host-side memcpy/memset, so host_raster must
point to CPU memory. Previously, when the caller provided a pre-allocated
GPU buffer, host_raster was set directly to that device pointer, which
would segfault. Now a caller_buffer_on_device check ensures a temporary
host staging buffer is used for assembly, with the result cudaMemcpy'd
into the caller's device buffer at the end.

Made-with: Cursor
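The control flow in this commit can be sketched roughly as follows. This is a minimal illustration, not the actual ifd.cpp code: device_copy stands in for cudaMemcpy(..., cudaMemcpyHostToDevice) so the sketch builds without CUDA, and the function signature is hypothetical.

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Stand-in for cudaMemcpy(dst, src, n, cudaMemcpyHostToDevice).
void device_copy(void* dst, const void* src, size_t n) { std::memcpy(dst, src, n); }

// Tile assembly uses host-side memcpy/memset, so host_raster must point to
// CPU memory. If the caller's buffer lives on the device, assemble into a
// temporary host staging buffer and do one H2D copy at the end; only a CPU
// caller buffer is written in place.
uint8_t* assemble(uint8_t* caller_buffer, bool caller_buffer_on_device, size_t raster_size)
{
    bool use_staging = (caller_buffer == nullptr) || caller_buffer_on_device;
    uint8_t* host_raster = use_staging ? static_cast<uint8_t*>(std::malloc(raster_size))
                                       : caller_buffer;

    std::memset(host_raster, 0, raster_size); // ...per-tile memcpy assembly happens here...

    if (caller_buffer_on_device)
    {
        device_copy(caller_buffer, host_raster, raster_size); // single H2D transfer
        std::free(host_raster);
        return caller_buffer;
    }
    return host_raster;
}
```

Before the fix, host_raster was set directly to the caller's device pointer, so the host-side memset/memcpy would have dereferenced device memory and crashed.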
@cdinea
Contributor Author

cdinea commented Apr 1, 2026

/ok to test f032db2

The hardcoded (128,128) offset did not extend past the cached tile grid
for 240x240 tiles, causing zero new misses. Derive the shift and read
size from the actual tile dimensions so the overlapping read always
produces both cache hits and misses regardless of tile size.

Made-with: Cursor
@cdinea
Contributor Author

cdinea commented Apr 1, 2026

/ok to test c8acab3

Member

@jakirkham jakirkham left a comment


Based on offline discussion it sounds like we want to include cuslide's removal date. Have tried to capture that below

Comment thread README.md Outdated
Co-authored-by: jakirkham <jakirkham@gmail.com>
@cdinea
Contributor Author

cdinea commented Apr 1, 2026

/ok to test 533786c

Member

@jakirkham jakirkham left a comment


Reapproving. Thanks again Cristiana! Also thanks again to the reviewers 🙏

@cdinea
Contributor Author

cdinea commented Apr 1, 2026

/merge

@rapids-bot rapids-bot Bot merged commit c121adf into release/26.04 Apr 1, 2026
131 of 132 checks passed
@jakirkham jakirkham deleted the feature/caching branch April 4, 2026 22:36