diff --git a/.claude/sweep-performance-state.csv b/.claude/sweep-performance-state.csv
index 9c43540ef..098fc5f95 100644
--- a/.claude/sweep-performance-state.csv
+++ b/.claude/sweep-performance-state.csv
@@ -18,6 +18,7 @@ fire,2026-03-31T18:00:00Z,SAFE,compute-bound,0,,
 flood,2026-03-31T18:00:00Z,SAFE,compute-bound,0,,
 focal,2026-03-31T18:00:00Z,SAFE,compute-bound,0,,
 geodesic,2026-03-31T18:00:00Z,N/A,compute-bound,0,,
+geotiff,2026-05-15,SAFE,IO-bound,0,1923,"Pass 9 (2026-05-15): re-audit after #1888 writers/_writers split, #1885-1888 backends extraction, #1909 chunked GPU dtype cast, #1910 band-kwarg validation, #1911-1918 various. 1 new MEDIUM found and fixed. _read_vrt_chunked in _backends/vrt.py:500 embedded the full parsed VRTDataset in every chunk task's kwargs (kwargs['parsed_vrt']=vrt) instead of routing it through a shared dask.delayed(vrt, pure=True) like the sibling http_meta_key (dask.py:147) and GDS meta_key (gpu.py:1035) patterns. Under distributed/process schedulers each task pickle ships the full source list independently; structurally verified pre-fix: 64 of 64 _vrt_chunk_read tasks held an inline VRTDataset in kwargs (1000-source VRT * 1000 chunks => ~57 MB graph). Filed #1923, fixed in this branch; 3 new tests in test_vrt_chunked_shared_dataset_1923.py (structural shared-input check, decode unchanged, band validation unchanged). 438 existing VRT tests pass; the 1 unrelated tile_size=4 GPU validator failure predates this change. Also noted but not fixed: LOW _apply_nodata_mask_gpu (_backends/_gpu_helpers.py:62-67) uses cupy.where alloc for fresh GPU buffers where in-place arr_gpu[mask] = nan would save one chunk-sized allocation; deferred since arr buffers are small (<= chunk size) and current code is already correct. SAFE/IO-bound verdict holds. | Pass 8 (2026-05-12): 1 new MEDIUM found and fixed. _assemble_standard_layout/_assemble_cog_layout returned bytes(bytearray), doubling peak memory transiently during eager writes. Filed #1756, fixed by returning the bytearray directly. Measured: 95 MB uint8 raster peak drops 202 MB -> 107 MB. _write_bytes / parse_header already accepted the buffer protocol so the change is transparent to callers. 6 new tests in test_assemble_layout_no_bytes_copy_1756.py. 2123 existing geotiff tests pass; the 10 unrelated failures (test_no_georef_windowed_coords_1710, test_predictor2_big_endian_gpu_1517) reference the now-private read_to_array attribute (commit 8adb749, issue #1708) and predate this change. SAFE/IO-bound verdict holds. | Pass 7 (2026-05-12): re-audit identified 4 MEDIUM findings, all real, all backed by microbenches. (1) unpack_bits sub-byte loops for bps=2/4/12 in _compression.py:836-878 were 100-200x slower than vectorised numpy (filed #1713, fixed in this branch: bps=4 2M pixels drops from 165ms to 3ms = 55x; bps=2/12 similar). (2) _write_vrt_tiled at __init__.py:1708 uses scheduler='synchronous' on independent tile writes; measured 33% slowdown on 256-tile zstd write vs threads scheduler (filed #1714, no fix yet). (3) _nvcomp_batch_compress at _gpu_decode.py:2522-2526 still does per-tile cupy.get().tobytes() despite #1552 / #1659 fixing the same pattern elsewhere; measured 45% reduction with concat+single get on n=1024 (filed #1712, no fix yet). (4) _nvcomp_batch_compress at _gpu_decode.py:2457 uses per-tile cupy.empty allocations; 1024 tiles 16KB drops from 4.7ms to 1.0ms with single contiguous + views (bundled into #1712). Cat 6 OOM verdict: SAFE/IO-bound holds -- read_geotiff_dask caps task count at _MAX_DASK_CHUNKS=50_000 and per-chunk memory is bounded by chunk size. _inflate_tiles_kernel resource usage on Ampere: 67 regs/thread, 2896B local/thread, 8192B shared/block (LZW kernel: 29 regs, 24576B shared) -- register pressure under control; high local memory in inflate is unavoidable (LZ77 state) but only thread 0 in each block uses it. | Pass 4 (2026-05-10): re-audit after #1559 (centralise attrs across all read backends). New _populate_attrs_from_geo_info helper at __init__.py:301 runs once per read, not per-chunk -- no perf impact. Probe: 2560x2560 deflate-tiled file opened via read_geotiff_dask yields 400 tasks (4 tasks/chunk for 100 chunks), well under 1M cap. read_geotiff_gpu(1024x1024) returns cupy.ndarray end-to-end with no host round-trip (226ms incl. write+decode). No new HIGH/MEDIUM findings. SAFE/IO-bound holds. | Pass 3 (2026-05-10): SAFE/IO-bound. Audited 4 perf commits: #1558 (in-place NaN writes on uniquely-owned buffers correct), #1556 (fp-predictor ngjit ~297us/tile for 256x256 float32), #1552 (single cupy.concatenate + one .get() for batched D2H at _gpu_decode.py:870-913), #1551 (parallel decode threshold >=65536px engages 256x256 default at _reader.py:1121). Bench: 8192x8192 f32 deflate+pred2 256-tile write 782ms; 4096x4096 f32 deflate read 83ms with parallel decode. Deferred LOW (none filed, all <10% MEDIUM threshold): _writer.py:459/1109 redundant .copy() before predictor encode (~1% per tile), _compression.py:280 lzw_decompress dst[:n].copy() (~2% per LZW tile decode), _writer.py:1419 seg_np.copy() before in-place NaN substitution (negligible, conditional path), _CloudSource.read_range opens fresh fsspec handle per range (pre-existing, predates audit scope). nvCOMP per-tile D2H batching break-even confirmed (variable sizes need staging buffer, no win). | Pass 3 (2026-05-10): audited f157746,39322c3,f23ec8f,1aac3b7. All 5 commits correct. Redundant .copy() in _writer.py:459,1109 and _compression.py:280 (1-2% overhead, LOW). _CloudSource.read_range() per-call open is pre-existing arch issue. No HIGH/MEDIUM regressions. SAFE. | re-audit 2026-05-02: 6 commits since 2026-04-16 (predictor=3 CPU encode/decode, GPU predictor stride fix, validate_tile_layout, BigTIFF LONG8 offsets, AREA_OR_POINT VRT, per-tile alloc guard). 1M dask chunk cap intact at __init__.py:948; adler32 batch transfer intact at _gpu_decode.py:1825. New code is metadata validation and dispatcher logic with no extra materialization or per-tile sync points. No HIGH/MEDIUM regressions. | Pass 5 (2026-05-12): re-audit identified MEDIUM in _gpu_decode.py:1577 _try_nvcomp_from_device_bufs: per-tile cupy.empty + trailing cupy.concatenate doubled peak VRAM and added serial concat. Filed #1659 and fixed to single-buffer + pointer offsets (matches LZW/deflate/host-buffer patterns at L1847/L1878/L1114). Microbench (alloc+concat overhead only, not full nvCOMP latency): n=256 tile_bytes=65536 drops 3.66ms->0.69ms, n=256 tile_bytes=262144 drops 8.18ms->0.13ms. Tests: 5 new tests in test_nvcomp_from_device_bufs_single_alloc_1659.py (codec short-circuit, no-lib short-circuit, memory-guard contract, real ZSTD round-trip via nvCOMP, structural single-buffer check). 1458 existing geotiff tests pass, 3 unrelated matplotlib/py3.14 failures pre-existing. SAFE/IO-bound verdict holds. | Pass 6 (2026-05-12): re-audit on top of #1659. New HIGH in _try_kvikio_read_tiles at _gpu_decode.py:941: per-tile cupy.empty() + blocking IOFuture.get() inside loop serialised GDS reads to ~1 outstanding pread, missed parallelism the kvikio worker pool was designed for, paid per-tile cupy.empty setup (matches #1659 anti-pattern in nvCOMP path), and lacked _check_gpu_memory guard. Filed #1688 and fixed to single contiguous buffer + batched submit + guard. Microbench with 8-worker pool simulation: 256 tiles@1ms latency drops 256ms->38.7ms (~6.6x); single-thread simulation 256ms->28.5ms (9x). Tests: 9 new tests in test_kvikio_batched_pread_1688.py (kvikio-absent path, single-buffer pointer arithmetic, submit-before-get ordering, memory guard, partial-read fallback, round-trip data, zero-size/all-sparse tiles). All 1577 geotiff tests pass except pre-existing matplotlib/py3.14 failures."
 geotiff,2026-05-15,SAFE,IO-bound,0,1756,"Rockout 2026-05-15: LOW filed #1934 -- _apply_nodata_mask_gpu used cupy.where (allocating); switched to cupy.putmask on the already-owned buffer (float path) and on the post-astype float64 buffer (int path). Saves one chunk-sized device allocation per call. 7 new tests in test_apply_nodata_mask_gpu_inplace_1934.py; 52 related nodata tests pass. | Pass 8 (2026-05-12): 1 new MEDIUM found and fixed. _assemble_standard_layout/_assemble_cog_layout returned bytes(bytearray), doubling peak memory transiently during eager writes. Filed #1756, fixed by returning the bytearray directly. Measured: 95 MB uint8 raster peak drops 202 MB -> 107 MB. _write_bytes / parse_header already accepted the buffer protocol so the change is transparent to callers. 6 new tests in test_assemble_layout_no_bytes_copy_1756.py. 2123 existing geotiff tests pass; the 10 unrelated failures (test_no_georef_windowed_coords_1710, test_predictor2_big_endian_gpu_1517) reference the now-private read_to_array attribute (commit 8adb749, issue #1708) and predate this change. SAFE/IO-bound verdict holds. | Pass 7 (2026-05-12): re-audit identified 4 MEDIUM findings, all real, all backed by microbenches. (1) unpack_bits sub-byte loops for bps=2/4/12 in _compression.py:836-878 were 100-200x slower than vectorised numpy (filed #1713, fixed in this branch: bps=4 2M pixels drops from 165ms to 3ms = 55x; bps=2/12 similar). (2) _write_vrt_tiled at __init__.py:1708 uses scheduler='synchronous' on independent tile writes; measured 33% slowdown on 256-tile zstd write vs threads scheduler (filed #1714, no fix yet). (3) _nvcomp_batch_compress at _gpu_decode.py:2522-2526 still does per-tile cupy.get().tobytes() despite #1552 / #1659 fixing the same pattern elsewhere; measured 45% reduction with concat+single get on n=1024 (filed #1712, no fix yet). (4) _nvcomp_batch_compress at _gpu_decode.py:2457 uses per-tile cupy.empty allocations; 1024 tiles 16KB drops from 4.7ms to 1.0ms with single contiguous + views (bundled into #1712). Cat 6 OOM verdict: SAFE/IO-bound holds -- read_geotiff_dask caps task count at _MAX_DASK_CHUNKS=50_000 and per-chunk memory is bounded by chunk size. _inflate_tiles_kernel resource usage on Ampere: 67 regs/thread, 2896B local/thread, 8192B shared/block (LZW kernel: 29 regs, 24576B shared) -- register pressure under control; high local memory in inflate is unavoidable (LZ77 state) but only thread 0 in each block uses it. | Pass 4 (2026-05-10): re-audit after #1559 (centralise attrs across all read backends). New _populate_attrs_from_geo_info helper at __init__.py:301 runs once per read, not per-chunk -- no perf impact. Probe: 2560x2560 deflate-tiled file opened via read_geotiff_dask yields 400 tasks (4 tasks/chunk for 100 chunks), well under 1M cap. read_geotiff_gpu(1024x1024) returns cupy.ndarray end-to-end with no host round-trip (226ms incl. write+decode). No new HIGH/MEDIUM findings. SAFE/IO-bound holds. | Pass 3 (2026-05-10): SAFE/IO-bound. Audited 4 perf commits: #1558 (in-place NaN writes on uniquely-owned buffers correct), #1556 (fp-predictor ngjit ~297us/tile for 256x256 float32), #1552 (single cupy.concatenate + one .get() for batched D2H at _gpu_decode.py:870-913), #1551 (parallel decode threshold >=65536px engages 256x256 default at _reader.py:1121). Bench: 8192x8192 f32 deflate+pred2 256-tile write 782ms; 4096x4096 f32 deflate read 83ms with parallel decode. Deferred LOW (none filed, all <10% MEDIUM threshold): _writer.py:459/1109 redundant .copy() before predictor encode (~1% per tile), _compression.py:280 lzw_decompress dst[:n].copy() (~2% per LZW tile decode), _writer.py:1419 seg_np.copy() before in-place NaN substitution (negligible, conditional path), _CloudSource.read_range opens fresh fsspec handle per range (pre-existing, predates audit scope). nvCOMP per-tile D2H batching break-even confirmed (variable sizes need staging buffer, no win). | Pass 3 (2026-05-10): audited f157746,39322c3,f23ec8f,1aac3b7. All 5 commits correct. Redundant .copy() in _writer.py:459,1109 and _compression.py:280 (1-2% overhead, LOW). _CloudSource.read_range() per-call open is pre-existing arch issue. No HIGH/MEDIUM regressions. SAFE. | re-audit 2026-05-02: 6 commits since 2026-04-16 (predictor=3 CPU encode/decode, GPU predictor stride fix, validate_tile_layout, BigTIFF LONG8 offsets, AREA_OR_POINT VRT, per-tile alloc guard). 1M dask chunk cap intact at __init__.py:948; adler32 batch transfer intact at _gpu_decode.py:1825. New code is metadata validation and dispatcher logic with no extra materialization or per-tile sync points. No HIGH/MEDIUM regressions. | Pass 5 (2026-05-12): re-audit identified MEDIUM in _gpu_decode.py:1577 _try_nvcomp_from_device_bufs: per-tile cupy.empty + trailing cupy.concatenate doubled peak VRAM and added serial concat. Filed #1659 and fixed to single-buffer + pointer offsets (matches LZW/deflate/host-buffer patterns at L1847/L1878/L1114). Microbench (alloc+concat overhead only, not full nvCOMP latency): n=256 tile_bytes=65536 drops 3.66ms->0.69ms, n=256 tile_bytes=262144 drops 8.18ms->0.13ms. Tests: 5 new tests in test_nvcomp_from_device_bufs_single_alloc_1659.py (codec short-circuit, no-lib short-circuit, memory-guard contract, real ZSTD round-trip via nvCOMP, structural single-buffer check). 1458 existing geotiff tests pass, 3 unrelated matplotlib/py3.14 failures pre-existing. SAFE/IO-bound verdict holds. | Pass 6 (2026-05-12): re-audit on top of #1659. New HIGH in _try_kvikio_read_tiles at _gpu_decode.py:941: per-tile cupy.empty() + blocking IOFuture.get() inside loop serialised GDS reads to ~1 outstanding pread, missed parallelism the kvikio worker pool was designed for, paid per-tile cupy.empty setup (matches #1659 anti-pattern in nvCOMP path), and lacked _check_gpu_memory guard. Filed #1688 and fixed to single contiguous buffer + batched submit + guard. Microbench with 8-worker pool simulation: 256 tiles@1ms latency drops 256ms->38.7ms (~6.6x); single-thread simulation 256ms->28.5ms (9x). Tests: 9 new tests in test_kvikio_batched_pread_1688.py (kvikio-absent path, single-buffer pointer arithmetic, submit-before-get ordering, memory guard, partial-read fallback, round-trip data, zero-size/all-sparse tiles). All 1577 geotiff tests pass except pre-existing matplotlib/py3.14 failures."
 glcm,2026-03-31T18:00:00Z,SAFE,compute-bound,0,,"Downgraded to MEDIUM. da.stack without rechunk is scheduling overhead, not OOM risk."
 hillshade,2026-04-16T12:00:00Z,SAFE,compute-bound,0,,"Re-audit after Horn's method rewrite (PR 1175): clean stencil, map_overlap depth=(1,1), no materialization. Zero findings."
diff --git a/xrspatial/geotiff/_backends/vrt.py b/xrspatial/geotiff/_backends/vrt.py
index 487c68b52..14016c6ca 100644
--- a/xrspatial/geotiff/_backends/vrt.py
+++ b/xrspatial/geotiff/_backends/vrt.py
@@ -499,6 +499,18 @@ def _read_vrt_chunked(source, *, window, band, name, chunks, gpu, dtype,
 
     delayed_read = dask.delayed(_vrt_chunk_read)
 
+    # Wrap the parsed VRTDataset in a single dask Delayed so every
+    # chunk task takes it as one graph input rather than a per-task
+    # closure copy. Without this, dask embeds the full source list
+    # (filenames, src/dst rects, per-source nodata) into every task's
+    # pickled graph entry; a 1000-source VRT split into 1000 chunks
+    # would build a ~57 MB driver graph and re-serialise the same
+    # metadata 1000 times under distributed/process schedulers. The
+    # sibling COG-HTTP and GDS chunked paths use the same single-
+    # delayed-input pattern (see ``http_meta_key`` in ``dask.py`` and
+    # ``meta_key`` in ``gpu.py``). See issue #1923.
+    parsed_vrt_key = dask.delayed(vrt, pure=True)
+
     if gpu:
         import cupy
         meta = cupy.empty((0,) * (3 if out_has_band_axis else 2),
@@ -526,7 +538,7 @@ def _read_vrt_chunked(source, *, window, band, name, chunks, gpu, dtype,
                 missing_sources=missing_sources,
                 declared_dtype=declared_dtype,
                 gpu=gpu,
-                parsed_vrt=vrt,
+                parsed_vrt=parsed_vrt_key,
             )
             block = da.from_delayed(d, shape=block_shape,
                                     dtype=declared_dtype, meta=meta)
diff --git a/xrspatial/geotiff/tests/test_vrt_chunked_shared_dataset_1923.py b/xrspatial/geotiff/tests/test_vrt_chunked_shared_dataset_1923.py
new file mode 100644
index 000000000..37427014d
--- /dev/null
+++ b/xrspatial/geotiff/tests/test_vrt_chunked_shared_dataset_1923.py
@@ -0,0 +1,168 @@
+"""Regression test for issue #1923.
+
+``_read_vrt_chunked`` previously passed the parsed ``VRTDataset`` as a
+plain kwarg to each per-chunk ``dask.delayed`` call, so dask embedded
+the full source list (filenames, src/dst rects, per-source nodata) into
+every task graph entry. With N sources and M chunks the graph held
+``N * M`` copies of the same metadata; a 1000-source VRT split into
+1000 chunks built a ~57 MB driver graph and reserialised that payload
+1000 times under distributed/process schedulers.
+
+The fix wraps the dataset in ``dask.delayed(vrt, pure=True)`` once before
+the per-chunk loop and passes that single shared reference into every
+task, mirroring the ``http_meta_key`` pattern in ``_backends/dask.py``
+and the ``meta_key`` pattern in ``_backends/gpu.py``.
+
+These tests assert the shared-input behaviour structurally so a
+regression would either flip the structural check or pump the graph
+size back over the cap.
+"""
+from __future__ import annotations
+
+import os
+
+import numpy as np
+import pytest
+import xarray as xr
+
+from xrspatial.geotiff import read_vrt, to_geotiff
+
+
+def _make_tile_vrt(tmp_path, n_tiles_per_side=4):
+    """Build a small multi-source VRT for testing the chunked path.
+
+    Each source is a 64x64 tile written to a temp directory; the VRT
+    stitches them into a ``(64*N, 64*N)`` mosaic. ``N=4`` keeps the
+    fixture cheap while still producing a multi-source VRT whose
+    embedded metadata is measurable.
+    """
+    tile_dir = os.path.join(tmp_path, "tiles")
+    os.makedirs(tile_dir, exist_ok=True)
+    tile_size = 64
+    sources = []
+    for r in range(n_tiles_per_side):
+        for c in range(n_tiles_per_side):
+            arr = np.full(
+                (tile_size, tile_size),
+                fill_value=r * n_tiles_per_side + c,
+                dtype=np.float32,
+            )
+            ox = c * tile_size
+            oy = -(r * tile_size)
+            # GeoTransform tuple ordering matches rasterio: pixel_width
+            # first, then column rotation (forced to zero), then origin_x,
+            # then row rotation (forced to zero), then pixel_height,
+            # then origin_y.
+            da = xr.DataArray(
+                arr,
+                dims=["y", "x"],
+                attrs={"transform": (1.0, 0.0, ox, 0.0, -1.0, oy)},
+            )
+            path = os.path.join(tile_dir, f"tile_{r}_{c}.tif")
+            to_geotiff(da, path, compression="deflate", tiled=True,
+                       tile_size=64)
+            sources.append((path, r, c, tile_size))
+
+    vrt_path = os.path.join(tmp_path, "mosaic.vrt")
+    width = n_tiles_per_side * tile_size
+    height = n_tiles_per_side * tile_size
+    lines = [
+        f'<VRTDataset rasterXSize="{width}" rasterYSize="{height}">',
+        '<SRS></SRS>',
+        '<GeoTransform>0.0, 1.0, 0.0, 0.0, 0.0, -1.0</GeoTransform>',
+        '<VRTRasterBand dataType="Float32" band="1">',
+    ]
+    for (path, r, c, ts) in sources:
+        lines.extend([
+            "<SimpleSource>",
+            f'<SourceFilename relativeToVRT="0">{path}</SourceFilename>',
+            "<SourceBand>1</SourceBand>",
+            f'<SrcRect xOff="0" yOff="0" xSize="{ts}" ySize="{ts}"/>',
+            f'<DstRect xOff="{c * ts}" yOff="{r * ts}" xSize="{ts}" '
+            f'ySize="{ts}"/>',
+            "</SimpleSource>",
+        ])
+    lines.extend(["</VRTRasterBand>", "</VRTDataset>"])
+    with open(vrt_path, "w") as f:
+        f.write("\n".join(lines))
+    return vrt_path, len(sources)
+
+
+def test_vrt_chunked_dataset_is_shared_graph_input(tmp_path):
+    """Issue #1923: parsed VRTDataset is wrapped as a single Delayed.
+
+    Walks each ``_vrt_chunk_read`` task's kwargs dict in the dask graph
+    and verifies that ``parsed_vrt`` is NOT an inline ``VRTDataset``
+    instance (the pre-fix shape). With the fix, ``parsed_vrt`` is
+    routed through ``dask.delayed(vrt, pure=True)`` so each task's
+    ``kwargs['parsed_vrt']`` is a graph reference (an ``Alias`` /
+    ``TaskRef``-style placeholder pointing into a single shared
+    ``from-value`` layer) rather than a literal embedded dataset.
+
+    Under the synchronous / threaded scheduler tasks are not pickled
+    at all, so an embedded copy is harmless in that path. The bug
+    surfaces under the distributed / multi-process scheduler where
+    each task is serialised independently and the full dataset is
+    shipped once per task -- so the structural shape of the graph,
+    not in-process behaviour, is what matters.
+    """
+    from xrspatial.geotiff._vrt import VRTDataset
+
+    vrt_path, n_sources = _make_tile_vrt(str(tmp_path), n_tiles_per_side=4)
+
+    # Use small chunks so the grid produces multiple tasks. 4 tiles per
+    # side at 64 px each is 256x256 image; chunks=32 gives 64 chunks.
+    result = read_vrt(vrt_path, chunks=32)
+    graph = result.__dask_graph__()
+
+    assert n_sources == 16, "fixture build sanity check"
+
+    chunk_task_count = 0
+    embedded_vrt_count = 0
+    for layer_name, layer in graph.layers.items():
+        if "_vrt_chunk_read" not in layer_name:
+            continue
+        for _key, task in layer.items():
+            # New-style dask Task: kwargs is a dict attribute.
+            kwargs = getattr(task, "kwargs", None)
+            if kwargs is None:
+                continue
+            parsed_vrt = kwargs.get("parsed_vrt")
+            if parsed_vrt is None:
+                continue
+            chunk_task_count += 1
+            if isinstance(parsed_vrt, VRTDataset):
+                embedded_vrt_count += 1
+
+    assert chunk_task_count > 1, (
+        f"fixture sanity: expected multiple chunk tasks, got "
+        f"{chunk_task_count}"
+    )
+    assert embedded_vrt_count == 0, (
+        f"#1923 regression: {embedded_vrt_count} of {chunk_task_count} "
+        f"_vrt_chunk_read tasks still embed an inline VRTDataset in "
+        f"kwargs['parsed_vrt']. The fix wraps the dataset in "
+        f"dask.delayed(vrt, pure=True) so kwargs['parsed_vrt'] should "
+        f"be a TaskRef-style graph reference, not a VRTDataset."
+    )
+
+
+def test_vrt_chunked_decode_unchanged_after_shared_wrap(tmp_path):
+    """The shared-Delayed wrap must not change decoded pixel values."""
+    vrt_path, _ = _make_tile_vrt(str(tmp_path), n_tiles_per_side=3)
+
+    # Eager read produces the baseline.
+    eager = read_vrt(vrt_path)
+    chunked = read_vrt(vrt_path, chunks=32).compute()
+
+    np.testing.assert_array_equal(np.asarray(eager), np.asarray(chunked))
+
+
+def test_vrt_chunked_band_kwarg_still_validates(tmp_path):
+    """Wrapping the dataset must not change band validation behaviour."""
+    vrt_path, _ = _make_tile_vrt(str(tmp_path), n_tiles_per_side=2)
+
+    with pytest.raises(ValueError):
+        # The fixture is single-band; band=5 should still raise the
+        # same ValueError the chunked path raised pre-fix.
+        read_vrt(vrt_path, chunks=32, band=5)