Hi,

Great question and excellent analysis in your demo. Your understanding of the current data path is correct.

Options to improve performance

1. Batched DMA with pinned memory

Your demo already does this mostly right. A few optimizations:

# Instead of per-cable transfers:
all_gpu_pts = wp.zeros(total_points_all_cables, dtype=wp.vec3f, device="cuda:0")
all_cpu_pts = wp.zeros(total_points_all_cables, dtype=wp.vec3f, device="cpu", pinned=True)
# One kernel launch for ALL cables
wp.launch(kernel=_compute_all_cables, dim=total_points_all_cables, inputs=[all_gpu_pts, ...])
# One DMA transfer
wp.copy(all_cpu_pts, all_gpu_pts)
wp.synchronize_device("cuda:0")
# Slice and set per-curve
offset = 0
for curve, n in zip(curves, points_per_curve):
    curve.GetPointsAttr().Set(Vt.Vec3fArray.FromNumpy(all_cpu_pts.numpy()[offset:offset+n]))
    offset += n

2. Avoid the Python list comprehension

For 64 points (768 bytes), the DMA transfer itself is essentially free. Your actual bottleneck is likely step 3, the Python list comprehension that builds a Vt.Vec3fArray from numpy:

# This constructs one Gf.Vec3f per point in Python, which is slow for large n:
pts_attr.Set(Vt.Vec3fArray([Gf.Vec3f(float(p[0]), float(p[1]), float(p[2])) for p in cpu_pts_np]))
# Try instead (if supported in your version):
pts_attr.Set(Vt.Vec3fArray.FromNumpy(cpu_pts_np))
# Or:
pts_attr.Set(Vt.Vec3fArray(cpu_pts_np.tolist()))
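
The per-curve slicing in option 1 comes down to turning per-curve point counts into offsets into one flat buffer. Here is a minimal plain-Python sketch of that bookkeeping, with no Warp or USD dependency. `split_by_counts` and the sample data are hypothetical names used only for illustration and are not part of any library:

```python
from itertools import accumulate


def split_by_counts(flat_points, points_per_curve):
    """Split one flat point buffer into per-curve chunks.

    flat_points: sequence holding the points of all curves back to back.
    points_per_curve: number of points belonging to each curve, in order.
    """
    if sum(points_per_curve) != len(flat_points):
        raise ValueError("point counts do not add up to the buffer length")
    # Running totals give the end offset of each curve; shifting them by one
    # gives the start offsets (an exclusive prefix sum).
    ends = list(accumulate(points_per_curve))
    starts = [0] + ends[:-1]
    return [flat_points[s:e] for s, e in zip(starts, ends)]


# Three hypothetical cables with 2, 3 and 1 points.
flat = [(float(i), 0.0, 0.0) for i in range(6)]
chunks = split_by_counts(flat, [2, 3, 1])
```

With a NumPy array in place of the list, the same slices are views into the single host buffer, so the only per-curve cost left is the USD Set call.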
Another Question
@wp.kernel(enable_backward=False)
def _set_fabric_transforms(
    fabric_transforms: wp.fabricarray(dtype=wp.mat44d),
    newton_indices: wp.fabricarray(dtype=wp.uint32),
    newton_body_q: wp.array(ndim=1, dtype=wp.transformf),
):
    """Write Newton body transforms to Fabric world matrices.

    For each Fabric prim at thread ``i``, reads the Newton body transform at
    ``newton_body_q[newton_indices[i]]`` and stores it as a column-major
    ``mat44d`` in ``fabric_transforms[i]``.
    """
    i = int(wp.tid())
    idx = int(newton_indices[i])
    transform = newton_body_q[idx]
    fabric_transforms[i] = wp.transpose(wp.mat44d(wp.math.transform_to_matrix(transform)))

Not for plain UsdGeom.BasisCurves.points today, only for transforms and physics-owned point sets.

The newton_manager.py kernel you quoted works because omni:fabric:worldMatrix is the one Fabric attribute that has dual storage on every Xformable prim: Fabric keeps a GPU-resident copy that Hydra/RTX renders directly from, so a Warp kernel writing into wp.fabricarray(dtype=wp.mat44d) is a real zero-copy path. That exact kernel also lives in this repo at fabric.py:30 set_fabric_transforms.
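
To make the layout handling in the quoted kernel concrete, here is a plain-Python sketch of the arithmetic behind its transform-to-matrix-then-transpose step. It assumes Warp's quaternion order (x, y, z, w), that the matrix built from a transform carries the translation in its last column, and that the Fabric world matrix follows USD's convention of storing the translation in the last row. The helpers below are hypothetical stand-ins written for illustration, not the Warp implementation:

```python
def transform_to_matrix(pos, quat):
    """Build a 4x4 matrix (translation in the last column) from a rigid transform.

    pos: (px, py, pz); quat: unit quaternion given as (x, y, z, w).
    """
    px, py, pz = pos
    x, y, z, w = quat
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w), px],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w), py],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y), pz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transpose(m):
    """Swap rows and columns, moving the translation into the last row."""
    return [list(row) for row in zip(*m)]


# Identity rotation with translation (1, 2, 3).
world = transpose(transform_to_matrix((1.0, 2.0, 3.0), (0.0, 0.0, 0.0, 1.0)))
```

After the transpose, the translation sits in `world[3][:3]`, which is the slot a row-vector convention expects. This is why the kernel transposes before writing.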