Add GPU CTE and GPU Infinite Memory (UVM demand paging) docs #9
Merged
lukemartinlogan merged 5 commits into main on Mar 9, 2026
Conversation
Document how CUDA/ROCm kernels interact with the Chimaera runtime, including task definition, client API, host-side setup, configuration parameters, and troubleshooting. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move base-modules section after the new GPU clients page to maintain logical sidebar ordering. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Update all queue field names to current API (cpu2gpu_queue, gpu2cu_queue, gpu2gpu_queue) throughout examples and narrative
- Update ClientConnectTask handshake table to new field names
- Add gpu_heap_backend (GpuMalloc, 9000+gpu_id) and gpu2gpu device-memory backend (3000+gpu_id) to server backend table and memory layout diagrams
- Replace single-ArenaAllocator description with dual-allocator architecture: ArenaAllocator (HSHM_DEFAULT_ALLOC_GPU_T, primary bump-pointer) + BuddyAllocator (CHI_GPU_HEAP_T, serialization heap with individual free)
- Document CHIMAERA_GPU_ORCHESTRATOR_INIT(gpu_info, num_blocks) macro for multi-block client kernels; CHI_CLIENT_GPU_INIT alias
- Add GetClientGpuInfo(gpu_id) to host-side setup; fills all IpcManagerGpuInfo fields automatically for same-process kernel launches
- Add Performance section: ~200 µs BuddyAllocator vs ~400 µs device malloc, corrected latency measurement explanation, arena-reset semantics
- Update server phase-1 init sequence for new backends and queue layout
- Update GPU memory backend size guidance for primary arena vs heap backends

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
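The dual-allocator split above (a bump-pointer arena for fast transient allocation plus a buddy heap that supports individual free) follows a common design. As an illustrative sketch only, not the Chimaera ArenaAllocator implementation, a bump-pointer arena looks like this; note that it can only reclaim memory wholesale via Reset(), which is exactly why a second heap is needed for serialization buffers that must be freed individually:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy bump-pointer arena (hypothetical; not the Chimaera API).
// Alloc() is a pointer increment; there is no per-allocation free,
// only a wholesale Reset() that reclaims everything at once.
class BumpArena {
 public:
  explicit BumpArena(size_t capacity) : buf_(capacity), off_(0) {}

  void* Alloc(size_t n, size_t align = 8) {
    size_t p = (off_ + align - 1) & ~(align - 1);  // align offset up
    if (p + n > buf_.size()) return nullptr;       // arena exhausted
    off_ = p + n;
    return buf_.data() + p;
  }

  void Reset() { off_ = 0; }          // frees all allocations at once
  size_t Used() const { return off_; }

 private:
  std::vector<uint8_t> buf_;
  size_t off_;
};
```

The trade-off mirrored here is speed versus flexibility: the arena's allocation path is branch-light and lock-free per thread, while anything that outlives an arena reset must come from a heap with real free support.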
gpu-cte.md: documents GPU kernel integration with CTE: CPU fork-client setup, three-backend memory layout, IpcManagerGpuInfo, kernel-side AsyncPutBlob/GetBlob, routing table, and stop-flag polling pattern.

gpu-inf-mem.md: documents the GpuShmMmap UVM backend: why UVM is chosen over pinned memory (no HostNativeAtomicSupported required), shm_init API, memory layout, kernel passing, IPC manager registration, per-block allocator construction, destruction rules, and comparison table against MallocBackend / PosixShmMmap / GpuMalloc.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace placeholder UVM backend doc with accurate documentation of the GpuVirtualMemoryManager class in context-transfer-engine/uvm. Covers:

- GpuVmmConfig fields and defaults
- init/destroy lifecycle
- touchPage / touchRange / touchPageAsync demand page-in
- evictPage / evictPageAsync page-out to host RAM or CTE blob store
- prefetch_window auto-prefetch
- state queries (isMapped, isEvictedToHost, getMappedPageCount, ...)
- separate transfer/compute stream model
- CTE blob store backing option
- full end-to-end example
- CMake integration and hardware requirements

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
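As a rough conceptual sketch in plain C++ (hypothetical names borrowed from the doc's touchPage/evictPage vocabulary; not the real GpuVirtualMemoryManager API), demand page-in with a prefetch window and eviction to a host-side store can be modeled as a page table whose entries move between mapped and evicted states:

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Toy demand-paging model (illustrative only). Pages start unmapped;
// touching a page maps it plus a trailing prefetch window; evicting
// a page unmaps it and records a host-side copy, which a later touch
// brings back.
class ToyVmm {
 public:
  ToyVmm(size_t num_pages, size_t prefetch_window)
      : mapped_(num_pages, false), prefetch_window_(prefetch_window) {}

  void touchPage(size_t p) {
    // Map the touched page and up to prefetch_window_ pages after it.
    for (size_t i = p; i < mapped_.size() && i <= p + prefetch_window_; ++i) {
      mapped_[i] = true;
      evicted_.erase(i);  // a touch restores an evicted page
    }
  }

  void evictPage(size_t p) {
    if (mapped_[p]) {
      mapped_[p] = false;
      evicted_.insert({p, 0});  // stand-in for the host-RAM copy
    }
  }

  bool isMapped(size_t p) const { return mapped_[p]; }
  bool isEvictedToHost(size_t p) const { return evicted_.count(p) != 0; }
  size_t getMappedPageCount() const {
    size_t n = 0;
    for (bool m : mapped_) n += (m ? 1 : 0);
    return n;
  }

 private:
  std::vector<bool> mapped_;
  std::unordered_map<size_t, int> evicted_;  // page -> host copy
  size_t prefetch_window_;
};
```

The real class additionally splits transfers onto a dedicated stream and can spill evicted pages to the CTE blob store instead of host RAM; this sketch only captures the state machine.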
Summary

- gpu-cte.md: documents GPU kernel integration with CTE, including IpcManagerGpuInfo, kernel-side AsyncPutBlob/AsyncGetBlob, the routing table (ToLocalCpu/Local/LocalGpuBcast), and the stop-flag polling pattern.
- gpu-inf-mem.md: documents the wrp_cte_uvm demand-paging module (GpuVirtualMemoryManager): GpuVmmConfig, touchPage/touchRange/evictPage, prefetch window, CTE blob store backing, separate transfer/compute streams, state queries, full example, CMake integration, and hardware requirements.

Test plan

- gpu-inf-mem.md accurately reflects context-transfer-engine/uvm/include/wrp_cte/uvm/gpu_vmm.h

🤖 Generated with Claude Code
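The stop-flag polling pattern mentioned in the summary (a long-running worker loops until the host flips a shared flag) can be sketched in host-side C++; this is an illustrative analog only, since the documented GPU version would instead poll a flag placed in host-visible mapped memory from inside a persistent kernel:

```cpp
#include <atomic>
#include <thread>

// Host-side analog of stop-flag polling (illustrative; function name
// is hypothetical). A worker thread spins doing "work" until the
// controller raises the stop flag, then exits cleanly.
long RunUntilStopped(long min_iters) {
  std::atomic<bool> stop{false};
  std::atomic<long> iterations{0};

  // Worker: poll the stop flag between units of work.
  std::thread worker([&] {
    while (!stop.load(std::memory_order_acquire)) {
      iterations.fetch_add(1, std::memory_order_relaxed);  // "work"
    }
  });

  // Controller: wait for some progress, then signal shutdown.
  while (iterations.load(std::memory_order_relaxed) < min_iters) {
  }
  stop.store(true, std::memory_order_release);
  worker.join();
  return iterations.load();
}
```

The acquire/release pairing matters in the GPU case too: the kernel must observe the flag write made by the host, which is why the flag lives in memory visible to both sides (UVM or pinned host memory) rather than in plain device memory.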