Add GPU CTE and GPU Infinite Memory (UVM demand paging) docs #9

Merged
lukemartinlogan merged 5 commits into main from quickstart-and-sidebar-reorder
Mar 9, 2026

@lukemartinlogan
Contributor

Summary

  • gpu-cte.md: Documents GPU kernel integration with CTE — CPU fork-client setup, three-backend memory layout, IpcManagerGpuInfo, kernel-side AsyncPutBlob/AsyncGetBlob, routing table (ToLocalCpu / Local / LocalGpuBcast), and stop-flag polling pattern.
  • gpu-inf-mem.md: Documents the wrp_cte_uvm demand-paging module (GpuVirtualMemoryManager) — GpuVmmConfig, touchPage/touchRange/evictPage, prefetch window, CTE blob store backing, separate transfer/compute streams, state queries, full example, CMake integration, and hardware requirements.
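For reviewers unfamiliar with demand paging, here is a minimal host-only sketch of the touch / evict / prefetch-window idea described in the second bullet. It is a toy model, not the real GpuVirtualMemoryManager: the class `ToyPager` is hypothetical, and while the method names mirror those listed above, their signatures and return values here are assumptions made for illustration. The actual API is in gpu_vmm.h.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Toy host-only model of demand paging. NOT the real GpuVirtualMemoryManager;
// ToyPager and all signatures below are hypothetical.
class ToyPager {
 public:
  ToyPager(std::size_t num_pages, std::size_t prefetch_window)
      : num_pages_(num_pages), prefetch_window_(prefetch_window) {}

  // Make `page` resident, then also map up to `prefetch_window_` pages
  // that follow it. Returns the number of pages newly mapped.
  std::size_t touchPage(std::size_t page) {
    assert(page < num_pages_);
    std::size_t last = page + prefetch_window_;
    if (last >= num_pages_) last = num_pages_ - 1;  // clamp to the last page
    std::size_t mapped = 0;
    for (std::size_t p = page; p <= last; ++p) {
      if (resident_.insert(p).second) ++mapped;  // count only new mappings
    }
    return mapped;
  }

  // Drop a page from the resident set. Returns true if it was resident.
  bool evictPage(std::size_t page) { return resident_.erase(page) > 0; }

  bool isMapped(std::size_t page) const { return resident_.count(page) > 0; }
  std::size_t getMappedPageCount() const { return resident_.size(); }

 private:
  std::size_t num_pages_;
  std::size_t prefetch_window_;
  std::unordered_set<std::size_t> resident_;  // pages currently "on device"
};
```

With 8 pages and a prefetch window of 2, touching page 0 in this model maps pages 0 through 2; a later touch of page 1 only adds page 3, since 1 and 2 are already resident.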

Test plan

  • Verify both pages render correctly in the Docusaurus sidebar under Context Transfer Engine
  • Check all code blocks are valid (headers, API calls match actual source)
  • Confirm gpu-inf-mem.md accurately reflects context-transfer-engine/uvm/include/wrp_cte/uvm/gpu_vmm.h

🤖 Generated with Claude Code

lukemartinlogan and others added 5 commits March 3, 2026 05:55
Document how CUDA/ROCm kernels interact with the Chimaera runtime,
including task definition, client API, host-side setup, configuration
parameters, and troubleshooting.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Move base-modules section after the new GPU clients page to maintain
logical sidebar ordering.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Update all queue field names to current API (cpu2gpu_queue, gpu2cpu_queue,
  gpu2gpu_queue) throughout examples and narrative
- Update ClientConnectTask handshake table to new field names
- Add gpu_heap_backend (GpuMalloc, 9000+gpu_id) and gpu2gpu device-memory
  backend (3000+gpu_id) to server backend table and memory layout diagrams
- Replace single-ArenaAllocator description with dual-allocator architecture:
  ArenaAllocator (HSHM_DEFAULT_ALLOC_GPU_T, primary bump-pointer) +
  BuddyAllocator (CHI_GPU_HEAP_T, serialization heap with individual free)
- Document CHIMAERA_GPU_ORCHESTRATOR_INIT(gpu_info, num_blocks) macro for
  multi-block client kernels; CHI_CLIENT_GPU_INIT alias
- Add GetClientGpuInfo(gpu_id) to host-side setup — fills all IpcManagerGpuInfo
  fields automatically for same-process kernel launches
- Add Performance section: ~200 µs BuddyAllocator vs ~400 µs device malloc,
  corrected latency measurement explanation, arena-reset semantics
- Update server phase-1 init sequence for new backends and queue layout
- Update GPU memory backend size guidance for primary arena vs heap backends
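As a rough illustration of the bump-pointer and arena-reset semantics mentioned in the list above, here is a host-only sketch. `ToyArena` is a hypothetical class and is not the project's ArenaAllocator; the alignment handling and the interface are assumptions. It shows why a bump-pointer arena cannot free individual allocations, which is the gap a heap with individual free would fill.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bump-pointer arena; not the project's ArenaAllocator.
class ToyArena {
 public:
  explicit ToyArena(std::size_t capacity) : buf_(capacity), off_(0) {}

  // Allocate `size` bytes at an offset aligned to `align` (a power of two),
  // measured from the start of the buffer. Returns nullptr when exhausted.
  void* allocate(std::size_t size, std::size_t align = 8) {
    std::size_t start = (off_ + align - 1) & ~(align - 1);
    if (start > buf_.size() || size > buf_.size() - start) return nullptr;
    off_ = start + size;  // bump the pointer past this allocation
    return buf_.data() + start;
  }

  // There is no per-allocation free: the only way to reclaim space is to
  // reset the whole arena, which invalidates every earlier pointer.
  void reset() { off_ = 0; }

  std::size_t used() const { return off_; }

 private:
  std::vector<unsigned char> buf_;
  std::size_t off_;
};
```

In this model an allocation is just an offset increment, and `reset()` reclaims everything at once, so the same address is handed out again afterwards.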

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

gpu-cte.md: documents GPU kernel integration with CTE — CPU fork-client
setup, three-backend memory layout, IpcManagerGpuInfo, kernel-side
AsyncPutBlob/GetBlob, routing table, and stop-flag polling pattern.
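The stop-flag polling pattern named above can be sketched in plain C++ as follows. This is a generic illustration under stated assumptions, not code from gpu-cte.md: `pollUntilStopped` is a hypothetical helper, and which side (host or kernel) sets or polls the flag in the real integration is not specified here.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical helper: run `step` repeatedly until `stop` becomes nonzero.
// Returns the number of iterations that were executed.
template <typename Step>
std::size_t pollUntilStopped(const std::atomic<int>& stop, Step step) {
  std::size_t iterations = 0;
  // Acquire load pairs with a release store made by whoever sets the flag,
  // so work published before the store is visible once the loop exits.
  while (stop.load(std::memory_order_acquire) == 0) {
    step();
    ++iterations;
  }
  return iterations;
}
```

In practice the flag would be set by another thread or device; for a single-threaded check, the step itself can store 1 into the flag after a fixed number of calls, and the loop exits on the next poll.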

gpu-inf-mem.md: documents the GpuShmMmap UVM backend — why UVM is
chosen over pinned memory (no HostNativeAtomicSupported required),
shm_init API, memory layout, kernel passing, IPC manager registration,
per-block allocator construction, destruction rules, and comparison
table against MallocBackend / PosixShmMmap / GpuMalloc.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace placeholder UVM backend doc with accurate documentation of the
GpuVirtualMemoryManager class in context-transfer-engine/uvm.  Covers:
- GpuVmmConfig fields and defaults
- init/destroy lifecycle
- touchPage / touchRange / touchPageAsync demand page-in
- evictPage / evictPageAsync page-out to host RAM or CTE blob store
- prefetch_window auto-prefetch
- state queries (isMapped, isEvictedToHost, getMappedPageCount, ...)
- separate transfer/compute stream model
- CTE blob store backing option
- full end-to-end example
- CMake integration and hardware requirements

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@lukemartinlogan lukemartinlogan merged commit 0ca139a into main Mar 9, 2026
2 checks passed