Open
hero78119 (Collaborator) reviewed on Mar 5, 2026 and left a comment:
A first quick pass, with some perf-related questions.
        self.vm.tracer().step_record(idx)
    }

    fn syscall_witnesses(&self) -> &[SyscallWitness] {
Collaborator
This is called heavily within the tracer; better to add `#[inline(always)]` to align with `step_record` above.
    };
    tracing::debug!("position_next_shard finish in {:?}", time.elapsed());
    let shard_steps = step_iter.shard_steps();
    shard_ctx.syscall_witnesses = Arc::new(step_iter.syscall_witnesses().to_vec());
Collaborator
This `to_vec()` step is slightly expensive because it clones the whole vector. Can we follow the `shard_steps` call above (line 1293) and take only a slice, then retrieve each entry with `slice[index]` at the place where it is needed?
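To make the trade-off in this comment concrete, here is a minimal sketch that contrasts the two approaches. The `Tracer` type and the one-field `SyscallWitness` are hypothetical stand-ins, not the real Ceno definitions, and the sketch does not establish whether a borrowed slice can outlive the step iterator in the actual code.

```rust
use std::sync::Arc;

// Hypothetical stand-in: the real SyscallWitness has more fields.
#[derive(Clone, Debug, PartialEq)]
struct SyscallWitness {
    mem_ops: Vec<u32>,
}

// Hypothetical stand-in for the tracer that owns the witnesses.
struct Tracer {
    syscall_witnesses: Vec<SyscallWitness>,
}

impl Tracer {
    // Hot accessor: returns a borrowed view, no allocation.
    #[inline(always)]
    fn syscall_witnesses(&self) -> &[SyscallWitness] {
        &self.syscall_witnesses
    }
}

// Approach in the diff: deep-clones every witness (including each inner Vec)
// so the result can be shared via Arc.
fn clone_into_arc(tracer: &Tracer) -> Arc<Vec<SyscallWitness>> {
    Arc::new(tracer.syscall_witnesses().to_vec())
}

// Approach suggested in the review: keep only the slice and resolve the
// entry by index at the point of use. `get` returns None when out of range.
fn lookup(witnesses: &[SyscallWitness], index: u32) -> Option<&SyscallWitness> {
    witnesses.get(index as usize)
}
```

`clone_into_arc` costs one allocation per witness plus one for the outer vector, while `lookup` only borrows; the price of borrowing is a lifetime tie to the tracer.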
Motivation
A fixed-size, `Copy`, `repr(C)` `StepRecord` is a prerequisite for GPU-accelerated witness generation. With a deterministic layout, millions of step records can be bulk-copied host-to-device (H2D) to the GPU without any serialization. The `Copy` trait also reduces per-step clone overhead on the CPU side by turning clones into cheap bitwise copies.

Summary
- Change `StepRecord` to implement `Copy` and use `#[repr(C)]` for a deterministic memory layout, making it easy to bulk H2D copy to the GPU for future CUDA witness generation
- Narrow the `RegIdx` type from `usize` to `u8` for compactness (RISC-V has only 32 registers)
- Move `SyscallWitness` out of `StepRecord` into a separate store indexed by `u32`, since `SyscallWitness` contains a `Vec` and cannot be `Copy`

Changes
Core data structure changes (`ceno_emul`)

- `StepRecord` is now `Copy + repr(C)` (136 bytes, 4-byte aligned):
  - Replaced `Option<ReadOp>`/`Option<WriteOp>` fields with inline values + `has_rs1`/`has_rs2`/`has_rd`/`has_memory_op` boolean flags
  - Replaced `Option<SyscallWitness>` with `syscall_index: u32` (index into external store, `u32::MAX` = no syscall)
  - `Default` impl with sentinel values
- `RegIdx` narrowed from `usize` to `u8` (`addr.rs`):
  - `u32` width before narrowing to avoid silent truncation
- `#[repr(C)]`/`#[repr(u8)]` added to supporting types: `ByteAddr`, `WordAddr`, `Instruction`, `InsnKind`, `MemOp<T>`, `Change<T>`
- `FullTracer` gains `syscall_witnesses: Vec<SyscallWitness>`:
  - `track_syscall()` pushes to this vec and stores the index in `StepRecord`
  - `syscall_witnesses() -> &[SyscallWitness]`
  - `reset_step_buffer()` clears `syscall_witnesses`, keeping indices shard-local and avoiding cross-shard accumulation

Layout verification tests (`tracer.rs`):

- `test_step_record_is_copy_and_compact`: asserts `Copy` trait and size <= 144 bytes
- `test_step_record_layout_for_gpu`: asserts exact byte offsets of every field for CUDA header alignment
- `test_supporting_types_are_copy`: asserts `ReadOp`, `WriteOp`, `Change` are `Copy`

API signature changes

- `StepRecord::syscall()` now takes a `&[SyscallWitness]` parameter instead of returning from an internal `Option`
- `StepRecord::has_syscall() -> bool` added
- `StepSource` trait gains `syscall_witnesses() -> &[SyscallWitness]`
- `ShardContext` gains `syscall_witnesses: Arc<Vec<SyscallWitness>>` field
- `keccak_step()` test helper returns `(StepRecord, Vec<Instruction>, Vec<SyscallWitness>)`

Callers updated (`ceno_zkvm`, `ceno_host`)

- `assign_instances` methods updated to read syscall witnesses from `shard_ctx.syscall_witnesses`
- `OpFixedRS` const generic changed from `usize` to `u8`
- `ceno_host/tests/test_elf.rs`: `run()` returns `(Vec<StepRecord>, Vec<SyscallWitness>)`
- `e2e.rs`: `generate_witness` propagates syscall witnesses into `ShardContext`
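
The pattern described above (a `Copy + repr(C)` record that stores a `u32` index into an external witness store, with `u32::MAX` as the "no syscall" sentinel) can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: the `StepRecord` here is a hypothetical five-field cut-down (the real one is 136 bytes), `SyscallWitness` is reduced to one field, `Tracer` stands in for `FullTracer`, and `NO_SYSCALL` and `step_record_layout` are names invented for the example. `offset_of!` requires Rust 1.77 or newer.

```rust
use std::mem::{align_of, offset_of, size_of};

// Sentinel from the PR description: u32::MAX means "no syscall".
// The constant name is an assumption made for this sketch.
const NO_SYSCALL: u32 = u32::MAX;

// Hypothetical, simplified witness. It owns a Vec, so it cannot be Copy.
#[derive(Clone, Debug, Default, PartialEq)]
struct SyscallWitness {
    mem_ops: Vec<u32>,
}

// Hypothetical cut-down record: plain-old-data fields only, so it can be
// Copy and has a fixed, declaration-ordered layout under repr(C).
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct StepRecord {
    pc: u32,
    rs1_value: u32,     // inline value instead of Option<ReadOp>
    syscall_index: u32, // index into the external store, or NO_SYSCALL
    rs1_idx: u8,        // RegIdx narrowed to u8
    has_rs1: bool,      // flag replacing the Option discriminant
}

impl Default for StepRecord {
    fn default() -> Self {
        Self { pc: 0, rs1_value: 0, syscall_index: NO_SYSCALL, rs1_idx: 0, has_rs1: false }
    }
}

impl StepRecord {
    fn has_syscall(&self) -> bool {
        self.syscall_index != NO_SYSCALL
    }

    // The witness store is passed in by the caller; the record holds only an index.
    fn syscall<'a>(&self, witnesses: &'a [SyscallWitness]) -> Option<&'a SyscallWitness> {
        if self.has_syscall() {
            witnesses.get(self.syscall_index as usize)
        } else {
            None
        }
    }
}

// Hypothetical stand-in for FullTracer's witness bookkeeping.
#[derive(Default)]
struct Tracer {
    syscall_witnesses: Vec<SyscallWitness>,
}

impl Tracer {
    // Push the witness and record its shard-local index in the step.
    fn track_syscall(&mut self, step: &mut StepRecord, witness: SyscallWitness) {
        let index = u32::try_from(self.syscall_witnesses.len()).expect("index fits in u32");
        assert_ne!(index, NO_SYSCALL, "index collides with the sentinel");
        self.syscall_witnesses.push(witness);
        step.syscall_index = index;
    }

    // Clearing per shard keeps indices shard-local.
    fn reset_step_buffer(&mut self) {
        self.syscall_witnesses.clear();
    }
}

// Layout facts a GPU-side header would have to match: (size, align, offset of syscall_index).
fn step_record_layout() -> (usize, usize, usize) {
    (
        size_of::<StepRecord>(),
        align_of::<StepRecord>(),
        offset_of!(StepRecord, syscall_index),
    )
}
```

For this cut-down struct, `step_record_layout()` evaluates to `(16, 4, 8)`: three `u32` fields at offsets 0, 4 and 8, two single-byte fields at 12 and 13, and two bytes of tail padding. Pinning such numbers in a test is what lets a mirrored C/CUDA struct be checked against the Rust side.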