Add: L3/L2 host-device mapped region design #861

Open
ccyywwen wants to merge 1 commit into hw-native-sys:main from ccyywwen:host-device_mapped-region

@ccyywwen
Contributor

Summary

  • Add a design document for the proposed HostDeviceMappedRegion primitive.
  • Define the first-layer host/device mapped-region contract extracted from the PR803 lessons: raw mapped data bytes plus cache-line-sized signal slots, explicit host datacopy, and explicit host-side notify/wait.
  • Specify runtime ownership for L3 PROCESS mode, where the chip child owns the real mapped allocation and the L3 parent carries only an opaque region handle through the existing parent-child mailbox RPC path.
  • Define the planned public C ABI and Python API, including config/info structs, datacopy helpers, notify/wait helpers, error mapping, and masked host pointers in public Python info.
  • Document the region layout, signal slot layout, memory-ordering contract, side-band mailbox transport, platform support matrix, thin NPU example, and expected tests.
  • Keep higher-level shared-buffer, send/recv, and queue protocols explicitly out of scope for this design.
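
To make the signal-slot and notify/wait bullets concrete, here is a minimal host-side sketch in C11. The names `SignalSlot`, `signal_notify`, and `signal_wait`, the 64-byte cache-line size, and the spin-count timeout are all assumptions for illustration; they are not taken from the design document. The sketch only expresses CPU-side release/acquire ordering. A real host/device mapping may need platform-specific barriers or cache maintenance on top of it.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define SIGNAL_SLOT_BYTES 64u /* assumed cache-line size, not from the design */

/* Hypothetical slot: one 64-bit value padded out to a full cache line,
 * so two slots never share a line. */
typedef struct {
    alignas(SIGNAL_SLOT_BYTES) _Atomic uint64_t value;
    uint8_t pad[SIGNAL_SLOT_BYTES - sizeof(_Atomic uint64_t)];
} SignalSlot;

_Static_assert(sizeof(SignalSlot) == SIGNAL_SLOT_BYTES,
               "a slot must occupy exactly one cache line");

/* Producer side: data bytes written before this call become visible to a
 * consumer that observes `value` with acquire ordering. */
static inline void signal_notify(SignalSlot *slot, uint64_t value) {
    assert(slot != NULL);
    atomic_store_explicit(&slot->value, value, memory_order_release);
}

/* Consumer side: poll up to max_spins times.
 * Returns 0 once the slot holds `expected`, or -1 if the spin budget runs out. */
static inline int signal_wait(SignalSlot *slot, uint64_t expected,
                              uint64_t max_spins) {
    assert(slot != NULL);
    for (uint64_t i = 0; i < max_spins; ++i) {
        if (atomic_load_explicit(&slot->value, memory_order_acquire) == expected) {
            return 0;
        }
    }
    return -1;
}
```

A caller would write its payload into the data bytes first, call `signal_notify`, and let the other side read the payload only after `signal_wait` returns 0.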

Backend Notes

  • On onboard A2/A3, the design uses device allocation plus halHostRegister so the owner process can initialize and access the region through a host mapping while kernels receive device-visible pointers.
  • On sim backends, ordinary host memory may serve as both the host mapping and simulation-visible device pointer, but the same layout, validation, and signal semantics still apply.
  • On onboard A5, the design calls for explicit -ENOTSUP stubs rather than missing ABI symbols.
  • The parent-child mailbox remains a control/proxy path only. It is not treated as the CPU-NPU mapped-region primitive itself.
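
The "explicit -ENOTSUP stubs" note for onboard A5 could look roughly like the sketch below. `ExampleRegionConfig`, `example_mapped_region_create`, and `example_backend_supports_mapped_region` are hypothetical names invented for this illustration; the actual C ABI is defined in the design document and is not reproduced here. The point is only that the symbol exists and links, while every call reports that the feature is unsupported.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical config struct, for illustration only. */
typedef struct {
    uint64_t data_bytes;
    uint32_t signal_count;
    uint32_t flags;
} ExampleRegionConfig;

/* Stub for a backend without mapped-region support: the symbol is present
 * in the ABI, but the call always fails with -ENOTSUP. */
int example_mapped_region_create(const ExampleRegionConfig *config,
                                 uint64_t *out_handle) {
    (void)config;
    if (out_handle != NULL) {
        *out_handle = 0; /* never leave the caller's handle uninitialized */
    }
    return -ENOTSUP;
}

/* Caller-side probe: returns 1 if the backend accepted the request,
 * 0 if it reported -ENOTSUP. */
int example_backend_supports_mapped_region(void) {
    ExampleRegionConfig config = {.data_bytes = 4096, .signal_count = 1, .flags = 0};
    uint64_t handle = UINT64_MAX;
    int rc = example_mapped_region_create(&config, &handle);
    assert(rc <= 0); /* this sketch's convention: 0 on success, negative errno on failure */
    return rc != -ENOTSUP;
}
```

With stubs like this, callers can probe for support at run time instead of failing at load time on a missing symbol.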


@gemini-code-assist Bot left a comment

Code Review

This pull request introduces the design document for HostDeviceMappedRegion, a low-level parent-to-child/NPU communication primitive in Simpler. The review feedback suggests making the implicit padding in the public ABI structure HostDeviceMappedRegionInfo explicit, to prevent potential security vulnerabilities and FFI errors, and updating the corresponding static assertions to validate the new field offsets.

Comment on lines +90 to +99
typedef struct {
    uint64_t host_data_ptr;
    uint64_t device_data_ptr;
    uint64_t data_bytes;
    uint64_t host_signal_ptr;
    uint64_t device_signal_ptr;
    uint32_t signal_count;
    uint64_t total_bytes;
    uint32_t flags;
} HostDeviceMappedRegionInfo;

medium

The HostDeviceMappedRegionInfo structure contains implicit padding between signal_count (offset 40) and total_bytes (offset 48), as well as trailing padding after flags (offset 56) to align the structure to an 8-byte boundary (64 bytes total). Implicit padding in public ABI structures can lead to security vulnerabilities (such as uninitialized memory/information leakage) when copied or serialized directly across process boundaries. It can also make cross-language bindings (e.g., Python ctypes or Rust FFI) more error-prone. Adding explicit reserved fields ensures that padding is visible, documented, and can be explicitly zero-initialized.

Suggested change
typedef struct {
    uint64_t host_data_ptr;
    uint64_t device_data_ptr;
    uint64_t data_bytes;
    uint64_t host_signal_ptr;
    uint64_t device_signal_ptr;
    uint32_t signal_count;
    uint32_t reserved0;
    uint64_t total_bytes;
    uint32_t flags;
    uint32_t reserved1;
} HostDeviceMappedRegionInfo;
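
As a rough illustration of why explicit reserved fields help, the sketch below zero-initializes the padded struct before it is filled and lets a receiver reject a struct whose reserved fields are nonzero. The helper names `example_info_clear` and `example_info_reserved_ok` are hypothetical and not part of the proposed ABI. The size check assumes a typical ABI where `uint64_t` is 8-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint64_t host_data_ptr;
    uint64_t device_data_ptr;
    uint64_t data_bytes;
    uint64_t host_signal_ptr;
    uint64_t device_signal_ptr;
    uint32_t signal_count;
    uint32_t reserved0;
    uint64_t total_bytes;
    uint32_t flags;
    uint32_t reserved1;
} HostDeviceMappedRegionInfo;

/* If the field sizes add up to sizeof, the compiler inserted no hidden padding. */
_Static_assert(sizeof(HostDeviceMappedRegionInfo) ==
                   6 * sizeof(uint64_t) + 4 * sizeof(uint32_t),
               "struct must contain no implicit padding");

/* Hypothetical helper: clear every byte, reserved fields included, so no
 * stale stack or heap contents leak when the struct crosses a process boundary. */
static inline void example_info_clear(HostDeviceMappedRegionInfo *info) {
    assert(info != NULL);
    memset(info, 0, sizeof(*info));
}

/* Hypothetical helper: a receiver can treat nonzero reserved fields as a
 * malformed or newer-version message. Returns 1 if both are zero. */
static inline int example_info_reserved_ok(const HostDeviceMappedRegionInfo *info) {
    assert(info != NULL);
    return info->reserved0 == 0 && info->reserved1 == 0;
}
```

Because every byte now belongs to a named field, a plain `memset` followed by field assignments fully defines the bytes that get copied or serialized.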

Comment on lines +296 to +304
static_assert(offsetof(HostDeviceMappedRegionInfo, host_data_ptr) == 0);
static_assert(offsetof(HostDeviceMappedRegionInfo, device_data_ptr) == 8);
static_assert(offsetof(HostDeviceMappedRegionInfo, data_bytes) == 16);
static_assert(offsetof(HostDeviceMappedRegionInfo, host_signal_ptr) == 24);
static_assert(offsetof(HostDeviceMappedRegionInfo, device_signal_ptr) == 32);
static_assert(offsetof(HostDeviceMappedRegionInfo, signal_count) == 40);
static_assert(offsetof(HostDeviceMappedRegionInfo, total_bytes) == 48);
static_assert(offsetof(HostDeviceMappedRegionInfo, flags) == 56);
static_assert(sizeof(HostDeviceMappedRegionInfo) == 64);

medium

Update the static assertions for HostDeviceMappedRegionInfo to explicitly validate the offsets of the newly added reserved0 and reserved1 fields. This ensures the structure layout remains exactly 64 bytes with the expected alignment and no implicit padding.

Suggested change
static_assert(offsetof(HostDeviceMappedRegionInfo, host_data_ptr) == 0);
static_assert(offsetof(HostDeviceMappedRegionInfo, device_data_ptr) == 8);
static_assert(offsetof(HostDeviceMappedRegionInfo, data_bytes) == 16);
static_assert(offsetof(HostDeviceMappedRegionInfo, host_signal_ptr) == 24);
static_assert(offsetof(HostDeviceMappedRegionInfo, device_signal_ptr) == 32);
static_assert(offsetof(HostDeviceMappedRegionInfo, signal_count) == 40);
static_assert(offsetof(HostDeviceMappedRegionInfo, reserved0) == 44);
static_assert(offsetof(HostDeviceMappedRegionInfo, total_bytes) == 48);
static_assert(offsetof(HostDeviceMappedRegionInfo, flags) == 56);
static_assert(offsetof(HostDeviceMappedRegionInfo, reserved1) == 60);
static_assert(sizeof(HostDeviceMappedRegionInfo) == 64);
