chore(benchmarks): enhance safetensors loading strategies#146
wolegechu wants to merge 2 commits into
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b1ee0aa84b
host_pack_pool =
    std::make_unique<common::memory::PinnedBufferPool>(pack_total_bytes, pack_chunk_bytes, std::move(pool_opts));
Align host_pack pinned pool size to 4K
PinnedBufferPool allocates its slab with aligned_alloc(4096, slab_bytes) (see core/common/memory/pinned_buffer_pool.cc:115–118), which requires the total size to be a multiple of 4096. Here pack_total_bytes is derived as pack_chunk_bytes * 2, but pack_chunk_bytes is only aligned to 512, so totals such as 1024, 3072, or 5120 bytes are possible for small or narrow slices. In those cases the constructor will LOG(FATAL) and the host_pack benchmark will abort. Consider rounding pack_total_bytes (or pack_chunk_bytes) up to PinnedBufferPool::kMemoryAlignment before constructing the pool.
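A minimal sketch of the rounding suggested here, assuming the alignment is 4096 as stated in the comment. The names kAssumedAlignment and RoundUpTo are hypothetical and not taken from the repository; the real code would presumably use PinnedBufferPool::kMemoryAlignment.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: mirrors the 4096-byte alignment the review comment attributes to
// PinnedBufferPool; the actual constant lives in the repository, not here.
constexpr std::size_t kAssumedAlignment = 4096;

// Hypothetical helper: rounds n up to the next multiple of align (align > 0).
constexpr std::size_t RoundUpTo(std::size_t n, std::size_t align) {
  return (n + align - 1) / align * align;
}

// Chunk sizes aligned only to 512 give totals (chunk * 2) that are not
// multiples of 4096; rounding the total up repairs that.
static_assert(RoundUpTo(512 * 2, kAssumedAlignment) == 4096, "1024 rounds to 4096");
static_assert(RoundUpTo(1536 * 2, kAssumedAlignment) == 4096, "3072 rounds to 4096");
static_assert(RoundUpTo(2560 * 2, kAssumedAlignment) == 8192, "5120 rounds to 8192");
static_assert(RoundUpTo(4096, kAssumedAlignment) == 4096, "aligned sizes are unchanged");
```

Rounding pack_total_bytes alone leaves the chunk size untouched, so whether the pool accepts a total that is no longer exactly twice the chunk size would need to be checked against the PinnedBufferPool constructor.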
Note
Introduces a new loader strategy and H2D microbenchmark, plus documentation.
- host_pack (Strategy C_host_pack): Implements CPU-side pack + 1D H2D path (--strategy=host_pack/hp), including planning, host staging, pinned-pack buffers, async H2D, and metrics; adds StrategyKind::kC_HostPack, execution flow, and logging.
- h2d_2d_baseline mode: Benchmarks cudaMemcpy2DAsync (H2D) for 2D stride-to-packed copies; adds mode parsing, per-GPU execution, timing, and aggregated throughput output.
- New flags (--h2d_2d_width_bytes, --h2d_2d_height, --h2d_2d_src_pitch_bytes, --h2d_2d_dst_pitch_bytes) and extended --h2d_per_gpu_pinned_pool semantics; updated --mode and --strategy help.
- HostBufferSink for host staging writes; minor planner/stat counters and result logging updates to include C_host_pack.
- Docs: host_pack usage and h2d_2d_baseline; new detailed benchmark report covering results and comparisons.

Written by Cursor Bugbot for commit b1ee0aa.
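To illustrate the CPU-side packing step that host_pack is described as performing, here is a minimal sketch of copying strided rows into a tightly packed buffer. PackRows and PackExample are hypothetical names, not the PR's implementation; the parameters loosely mirror the width, height, and source pitch flags listed in the note. In the real path the packed buffer would presumably be pinned memory followed by a single 1D async H2D copy, whereas h2d_2d_baseline hands the strided layout directly to cudaMemcpy2DAsync.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copies `height` rows of `width_bytes` each from a
// strided source (rows start `src_pitch_bytes` apart) into a contiguous
// destination of width_bytes * height bytes.
inline void PackRows(const std::uint8_t* src, std::size_t src_pitch_bytes,
                     std::size_t width_bytes, std::size_t height,
                     std::uint8_t* dst) {
  assert(src_pitch_bytes >= width_bytes);  // a pitch must cover a full row
  for (std::size_t row = 0; row < height; ++row) {
    std::memcpy(dst + row * width_bytes, src + row * src_pitch_bytes,
                width_bytes);
  }
}

// Small worked case: 2 rows of 3 payload bytes, each padded to a 5-byte pitch.
inline std::vector<std::uint8_t> PackExample() {
  const std::size_t width = 3, height = 2, pitch = 5;
  const std::vector<std::uint8_t> src = {1, 2, 3, 0, 0,   // row 0 + padding
                                         4, 5, 6, 0, 0};  // row 1 + padding
  std::vector<std::uint8_t> packed(width * height);
  PackRows(src.data(), pitch, width, height, packed.data());
  return packed;  // padding bytes are dropped: {1, 2, 3, 4, 5, 6}
}
```

The trade-off being benchmarked is roughly this: packing costs CPU time and an extra host copy but yields one contiguous transfer, while the 2D copy avoids the host pass at the price of a strided transfer.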