chore(benchmarks): enhance safetensors loading strategies by wolegechu · Pull Request #146 · tensorcast-ai/tensorcast

wolegechu · 2026-01-20T02:08:23Z

Note

Introduces a new loader strategy and H2D microbenchmark, plus documentation.

Loader host_pack (Strategy C_host_pack): Implements CPU-side pack + 1D H2D path (--strategy=host_pack/hp), including planning, host staging, pinned-pack buffers, async H2D, and metrics; adds StrategyKind::kC_HostPack, execution flow, and logging.
New h2d_2d_baseline mode: Benchmarks cudaMemcpy2DAsync(H2D) for 2D stride-to-packed copies; adds mode parsing, per-GPU execution, timing, and aggregated throughput output.
CLI additions: New flags for 2D H2D (--h2d_2d_width_bytes, --h2d_2d_height, --h2d_2d_src_pitch_bytes, --h2d_2d_dst_pitch_bytes) and extended --h2d_per_gpu_pinned_pool semantics; updated --mode and --strategy help.
Utilities: Adds HostBufferSink for host staging writes; minor planner/stat counters and result logging updates to include C_host_pack.
Docs: README expanded with host_pack usage and h2d_2d_baseline; new detailed benchmark report covering results and comparisons.
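The "CPU-side pack + 1D H2D" idea in the host_pack item can be illustrated with a minimal host-only sketch. It assumes a pitched (strided) source region and copies each row into a contiguous buffer; that packed buffer could then be transferred with a single 1D copy, whereas `cudaMemcpy2DAsync` performs the stride-to-packed step as part of the transfer itself. The name `PackRows` is hypothetical and not taken from the PR, and a `std::vector` stands in for the pinned pack buffer; no CUDA calls are made here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not from the PR): packs `height` rows of `width_bytes`
// each, read from a source whose rows start `src_pitch_bytes` apart, into one
// contiguous buffer. In the PR the destination would be a pinned pack buffer;
// a std::vector is used here only to keep the sketch self-contained.
inline std::vector<std::uint8_t> PackRows(const std::uint8_t* src,
                                          std::size_t src_pitch_bytes,
                                          std::size_t width_bytes,
                                          std::size_t height) {
  // A row cannot be wider than the distance between consecutive row starts.
  assert(src_pitch_bytes >= width_bytes);
  std::vector<std::uint8_t> packed(width_bytes * height);
  if (packed.empty()) {
    return packed;  // Nothing to copy.
  }
  for (std::size_t row = 0; row < height; ++row) {
    std::memcpy(packed.data() + row * width_bytes,
                src + row * src_pitch_bytes,
                width_bytes);
  }
  return packed;
}
```

For example, packing two 2-byte rows out of a source with a 4-byte pitch yields a 4-byte buffer holding the payload bytes of both rows back to back.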

Written by Cursor Bugbot for commit b1ee0aa.


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b1ee0aa84b


chatgpt-codex-connector · 2026-01-20T02:19:09Z

+    host_pack_pool =
+        std::make_unique<common::memory::PinnedBufferPool>(pack_total_bytes, pack_chunk_bytes, std::move(pool_opts));


Align host_pack pinned pool size to 4K

PinnedBufferPool allocates its slab with aligned_alloc(4096, slab_bytes) (see core/common/memory/pinned_buffer_pool.cc:115–118), which requires the total size be a multiple of 4096. Here pack_total_bytes is derived from pack_chunk_bytes * 2, but pack_chunk_bytes is only aligned to 512, so values like 1024/3072/5120 are possible for small or narrow slices. In those cases the constructor will LOG(FATAL) and the host_pack benchmark will abort. Consider rounding pack_total_bytes (or pack_chunk_bytes) up to PinnedBufferPool::kMemoryAlignment before constructing the pool.
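One way to apply the suggested rounding is sketched below. It assumes a 4096-byte alignment, as the comment describes, and rounds the chunk size up before doubling it so the total remains exactly two chunks. `kAlignment`, `RoundUp`, `PackPoolSizes`, and `AlignedPackPoolSizes` are hypothetical names for illustration; the real code would presumably use `PinnedBufferPool::kMemoryAlignment`, and whether the chunk or only the total should be rounded is a choice for the author.

```cpp
#include <cassert>
#include <cstddef>

// Assumed stand-in for PinnedBufferPool::kMemoryAlignment (4096 per the review).
constexpr std::size_t kAlignment = 4096;

// Rounds `bytes` up to the next multiple of `alignment`.
// `alignment` must be a power of two for the mask trick to be valid.
constexpr std::size_t RoundUp(std::size_t bytes, std::size_t alignment) {
  return (bytes + alignment - 1) & ~(alignment - 1);
}

// Hypothetical result type: sizes to pass to the pool constructor.
struct PackPoolSizes {
  std::size_t chunk_bytes;
  std::size_t total_bytes;
};

// Rounds the chunk size up first, then doubles it, so the total is both a
// multiple of kAlignment and exactly two chunks (double buffering).
inline PackPoolSizes AlignedPackPoolSizes(std::size_t pack_chunk_bytes) {
  assert(pack_chunk_bytes > 0);
  const std::size_t chunk = RoundUp(pack_chunk_bytes, kAlignment);
  return PackPoolSizes{chunk, chunk * 2};
}

// The problematic totals named in the review all land on 4K multiples.
static_assert(RoundUp(1024, kAlignment) == 4096, "1024 rounds up to 4096");
static_assert(RoundUp(5120, kAlignment) == 8192, "5120 rounds up to 8192");
```

With this sketch, a 512-byte-aligned chunk such as 1536 bytes becomes a 4096-byte chunk and an 8192-byte pool, at the cost of some unused pinned memory for very small slices.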


wolegechu added 2 commits January 20, 2026 10:07

chore(benchmarks): enhance safetensors loading strategies with host p…

1c6e2e9

…ack and 2D H2D support

Merge branch 'main' into ychu/update-bench

b1ee0aa

chatgpt-codex-connector Bot reviewed Jan 20, 2026

wolegechu wants to merge 2 commits into main from ychu/update-bench
