diff --git a/cuda_bindings/docs/build_docs.sh b/cuda_bindings/docs/build_docs.sh
Lines changed: 4 additions & 0 deletions
diff --git a/cuda_bindings/docs/source/conf.py b/cuda_bindings/docs/source/conf.py
Lines changed: 13 additions & 0 deletions
diff --git a/cuda_bindings/docs/source/examples.rst b/cuda_bindings/docs/source/examples.rst
Lines changed: 16 additions & 16 deletions
diff --git a/cuda_bindings/docs/source/overview.rst b/cuda_bindings/docs/source/overview.rst
Lines changed: 1 addition & 1 deletion
@@ -25,6 +25,10 @@ if [[ -z "${SPHINX_CUDA_BINDINGS_VER}" ]]; then
                                       | awk -F'+' '{print $1}')
 fi
 
+if [[ "${LATEST_ONLY}" == "1" && -z "${BUILD_PREVIEW:-}" && -z "${BUILD_LATEST:-}" ]]; then
+    export BUILD_LATEST=1
+fi
+
 # build the docs (in parallel)
 SPHINXOPTS="-j 4 -d build/.doctrees" make html
 
 
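The `LATEST_ONLY` block in `build_docs.sh` only fills in `BUILD_LATEST` when the caller has set neither `BUILD_PREVIEW` nor `BUILD_LATEST`, and `-z "${VAR:-}"` treats an empty value the same as an unset one. The Python sketch below restates that rule so its edge cases are easy to check; `promote_latest` is a hypothetical helper for illustration, not part of the build scripts.

```python
from typing import Dict, Mapping


def promote_latest(env: Mapping[str, str]) -> Dict[str, str]:
    """Illustrative restatement of the LATEST_ONLY block in build_docs.sh.

    promote_latest is hypothetical; the real logic lives in bash.
    """
    out = dict(env)
    # `-z "${VAR:-}"` is true when VAR is unset or set to the empty string.
    preview_unset = not out.get("BUILD_PREVIEW")
    latest_unset = not out.get("BUILD_LATEST")
    # `[[ "${LATEST_ONLY}" == "1" ]]` is an exact string comparison.
    if out.get("LATEST_ONLY") == "1" and preview_unset and latest_unset:
        out["BUILD_LATEST"] = "1"
    return out


# LATEST_ONLY=1 on its own switches the build to "latest" mode.
print(promote_latest({"LATEST_ONLY": "1"}))
# -> {'LATEST_ONLY': '1', 'BUILD_LATEST': '1'}

# An explicit BUILD_PREVIEW wins, so BUILD_LATEST stays unset.
print(promote_latest({"LATEST_ONLY": "1", "BUILD_PREVIEW": "1"}))
# -> {'LATEST_ONLY': '1', 'BUILD_PREVIEW': '1'}
```

An explicit `BUILD_LATEST=0` is non-empty, so the rule leaves it untouched.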
@@ -26,6 +26,15 @@
 release = os.environ["SPHINX_CUDA_BINDINGS_VER"]
 
 
+def _github_examples_ref():
+    if int(os.environ.get("BUILD_PREVIEW") or 0) or int(os.environ.get("BUILD_LATEST") or 0):
+        return "main"
+    return f"v{release}"
+
+
+GITHUB_EXAMPLES_REF = _github_examples_ref()
+
+
 # -- General configuration ---------------------------------------------------
 
 # Add any Sphinx extension module names here, as strings. They can be
@@ -99,6 +108,10 @@
 # skip cmdline prompts
 copybutton_exclude = ".linenos, .gp"
 
+rst_epilog = f"""
+.. |cuda_bindings_github_ref| replace:: {GITHUB_EXAMPLES_REF}
+"""
+
 intersphinx_mapping = {
     "python": ("https://docs.python.org/3/", None),
     "numpy": ("https://numpy.org/doc/stable/", None),
 
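Together, the two `conf.py` hunks turn the build flags into a Git ref and expose it as the `|cuda_bindings_github_ref|` substitution. The sketch below restates `_github_examples_ref()` with the environment passed in explicitly so the cases can be checked directly. `github_examples_ref`, `make_epilog`, and the release string `"12.9.0"` are hypothetical, and the sketch treats an empty flag as 0, an assumption chosen to match the `-z "${VAR:-}"` checks in `build_docs.sh`.

```python
from typing import Mapping


def _flag(env: Mapping[str, str], name: str) -> bool:
    # Unset and empty values both count as 0 (assumption, see lead-in).
    return bool(int(env.get(name) or 0))


def github_examples_ref(env: Mapping[str, str], release: str) -> str:
    """Illustrative copy of conf.py's _github_examples_ref()."""
    # Preview and "latest" builds track the main branch...
    if _flag(env, "BUILD_PREVIEW") or _flag(env, "BUILD_LATEST"):
        return "main"
    # ...while versioned builds link to the matching release tag.
    return f"v{release}"


def make_epilog(ref: str) -> str:
    # Same shape as the rst_epilog string that conf.py builds.
    return f"""
.. |cuda_bindings_github_ref| replace:: {ref}
"""


for env in ({}, {"BUILD_PREVIEW": "1"}, {"BUILD_LATEST": "1"}):
    print(env, "->", github_examples_ref(env, "12.9.0"))
# prints v12.9.0, then main, then main
```

Combined with the `LATEST_ONLY` promotion in `build_docs.sh`, a `LATEST_ONLY=1` build with neither flag set therefore links to `main`.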
@@ -5,64 +5,64 @@ Examples
 ========
 
 This page links to the ``cuda.bindings`` examples shipped in the
-`cuda-python repository <https://github.com/NVIDIA/cuda-python/tree/main/cuda_bindings/examples>`_.
+`cuda-python repository <https://github.com/NVIDIA/cuda-python/tree/|cuda_bindings_github_ref|/cuda_bindings/examples>`_.
 Use it as a quick index when you want a runnable sample for a specific API area
 or CUDA feature.
 
 Introduction
 ------------
 
-- `clock_nvrtc_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/0_Introduction/clock_nvrtc_test.py>`_
+- `clock_nvrtc.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/clock_nvrtc.py>`_
   uses NVRTC-compiled CUDA code and the device clock to time a reduction
   kernel.
-- `simpleCubemapTexture_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/0_Introduction/simpleCubemapTexture_test.py>`_
+- `simple_cubemap_texture.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/simple_cubemap_texture.py>`_
   demonstrates cubemap texture sampling and transformation.
-- `simpleP2P_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/0_Introduction/simpleP2P_test.py>`_
+- `simple_p2p.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/simple_p2p.py>`_
   shows peer-to-peer memory access and transfers between multiple GPUs.
-- `simpleZeroCopy_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/0_Introduction/simpleZeroCopy_test.py>`_
+- `simple_zero_copy.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/simple_zero_copy.py>`_
   uses zero-copy mapped host memory for vector addition.
-- `systemWideAtomics_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/0_Introduction/systemWideAtomics_test.py>`_
+- `system_wide_atomics.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/system_wide_atomics.py>`_
   demonstrates system-wide atomic operations on managed memory.
-- `vectorAddDrv_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/0_Introduction/vectorAddDrv_test.py>`_
+- `vector_add_drv.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/vector_add_drv.py>`_
   uses the CUDA Driver API and unified virtual addressing for vector addition.
-- `vectorAddMMAP_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/0_Introduction/vectorAddMMAP_test.py>`_
+- `vector_add_mmap.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/vector_add_mmap.py>`_
   uses virtual memory management APIs such as ``cuMemCreate`` and
   ``cuMemMap`` for vector addition.
 
 Concepts and techniques
 -----------------------
 
-- `streamOrderedAllocation_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/2_Concepts_and_Techniques/streamOrderedAllocation_test.py>`_
+- `stream_ordered_allocation.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/2_Concepts_and_Techniques/stream_ordered_allocation.py>`_
   demonstrates ``cudaMallocAsync`` and ``cudaFreeAsync`` together with
   memory-pool release thresholds.
 
 CUDA features
 -------------
 
-- `globalToShmemAsyncCopy_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/3_CUDA_Features/globalToShmemAsyncCopy_test.py>`_
+- `global_to_shmem_async_copy.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/3_CUDA_Features/global_to_shmem_async_copy.py>`_
   compares asynchronous global-to-shared-memory copy strategies in matrix
   multiplication kernels.
-- `simpleCudaGraphs_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/3_CUDA_Features/simpleCudaGraphs_test.py>`_
+- `simple_cuda_graphs.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/3_CUDA_Features/simple_cuda_graphs.py>`_
   shows both manual CUDA graph construction and stream-capture-based replay.
 
 Libraries and tools
 -------------------
 
-- `conjugateGradientMultiBlockCG_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/4_CUDA_Libraries/conjugateGradientMultiBlockCG_test.py>`_
+- `conjugate_gradient_multi_block_cg.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/4_CUDA_Libraries/conjugate_gradient_multi_block_cg.py>`_
   implements a conjugate-gradient solver with cooperative groups and
   multi-block synchronization.
-- `nvidia_smi.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/4_CUDA_Libraries/nvidia_smi.py>`_
+- `nvidia_smi.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/4_CUDA_Libraries/nvidia_smi.py>`_
   uses NVML to implement a Python subset of ``nvidia-smi``.
 
 Advanced and interoperability
 -----------------------------
 
-- `isoFDModelling_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/extra/isoFDModelling_test.py>`_
+- `iso_fd_modelling.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/extra/iso_fd_modelling.py>`_
   runs isotropic finite-difference wave propagation across multiple GPUs with
   peer-to-peer halo exchange.
-- `jit_program_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/extra/jit_program_test.py>`_
+- `jit_program.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/extra/jit_program.py>`_
   JIT-compiles a SAXPY kernel with NVRTC and launches it through the Driver
   API.
-- `numba_emm_plugin.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/extra/numba_emm_plugin.py>`_
+- `numba_emm_plugin.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/extra/numba_emm_plugin.py>`_
   shows how to back Numba's EMM interface with the NVIDIA CUDA Python Driver
   API.
@@ -522,7 +522,7 @@ CUDA objects
 Certain CUDA kernels use native CUDA types as their parameters such as ``cudaTextureObject_t``. These types require special handling since they're neither a primitive ctype nor a custom user type. Since ``cuda.bindings`` exposes each of them as Python classes, they each implement ``getPtr()`` and ``__int__()``. These two callables are used to support the NumPy and ctypes approaches. The difference between each call is further described under `Tips and Tricks <https://nvidia.github.io/cuda-python/cuda-bindings/latest/tips_and_tricks.html#>`_.
 
 For this example, let's use the ``transformKernel`` from
-`examples/0_Introduction/simpleCubemapTexture_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/0_Introduction/simpleCubemapTexture_test.py>`_.
+`simple_cubemap_texture.py <https://github.com/NVIDIA/cuda-python/blob/|cuda_bindings_github_ref|/cuda_bindings/examples/0_Introduction/simple_cubemap_texture.py>`_.
 The :doc:`examples` page links to more samples covering textures, graphs,
 memory mapping, and multi-GPU workflows.