[Question] Curve point updating in GPU Fabric arrays #614

lyd405121 · 2026-05-09T02:39:15Z

Background

I am doing a cable simulation in Isaac Sim with Newton.

The only way I have found to transfer simulation results from Newton to Isaac Sim is to write the USD points attribute directly:

curve_points_attr = curve.GetPointsAttr()
curve_points_attr.Set(points)

Question

Is there any way to do a fully GPU-side update for curves?
With a large number of cables, this approach drags the frame rate down considerably.

Demo

To simplify my case, I wrote a demo:

"""GPU-computed BasisCurves animation demo.

Computation happens entirely in a Warp GPU kernel.  Results are transferred to
a pre-allocated CPU buffer each frame and then written to the USD
``points`` attribute.

Why not "pure zero-copy via fabricarrayarray":
  Kit's Fabric system only allocates GPU-native (writable) attribute storage for
  *physics-owned* prims (Newton cloth, PhysX deformables, particle systems).
  For a plain USD BasisCurves prim, ``points`` is a CPU-only, USD-synced
  attribute in Fabric — ``wp.fabricarrayarray`` writes to it silently fail.
  The ``omni:fabric:worldMatrix`` zero-copy path works only for transforms, not
  for geometry arrays.

Actual data path (per frame):
  1. Warp kernel  →  GPU ``wp.array`` (CUDA device memory)
  2. ``wp.copy``  →  CPU ``wp.array`` (one PCIe DMA, pre-allocated, ~768 bytes
                     for 64 points — effectively free vs. 60 Hz render budget)
  3. ``attr.Set`` →  USD stage  →  Hydra  →  renderer

Contrast with the CPU baseline (minimal_basis_curve_update.py):
  - Baseline: Python ``math.sin`` per point, then CPU-assembled Vt.Vec3fArray
  - This demo: CUDA kernel for all math, reused buffers, no Python-loop math

Usage:
    ./isaaclab.sh -p scripts/demos/gpu-curve/gpu_curve_demo.py
    ./isaaclab.sh -p scripts/demos/gpu-curve/gpu_curve_demo.py --num_points 256
"""

from __future__ import annotations

import argparse

from isaaclab.app import AppLauncher

parser = argparse.ArgumentParser(description="GPU-computed BasisCurves animation.")
parser.add_argument("--num_points", type=int, default=64, help="Number of curve control points.")
parser.add_argument("--max_steps", type=int, default=0, help="Steps to run (0 = until window closed).")
AppLauncher.add_app_launcher_args(parser)
args_cli = parser.parse_args()

app_launcher = AppLauncher(args_cli)
simulation_app = app_launcher.app

# -- post-launch imports -------------------------------------------------------
import numpy as np
import warp as wp
from pxr import Gf, UsdGeom, Vt

import isaaclab.sim as sim_utils

CURVE_PATH = "/World/GpuCurve/curve_0"


@wp.kernel(enable_backward=False)
def _compute_curve(
    pts: wp.array(dtype=wp.vec3f),
    num_points: int,
    phase: float,
):
    """Sine-wave curve positions computed entirely on GPU.

    Args:
        pts: Output GPU array of positions [m], shape ``[num_points]``.
        num_points: Number of curve control points.
        phase: Animation phase offset [rad].
    """
    i = wp.tid()
    if i >= num_points:
        return
    t = float(i) / float(num_points - 1)
    PI = float(3.141592653589793)
    x = -1.2 + 2.4 * t
    y = 0.25 * wp.sin(2.0 * PI * t + phase)
    z = 0.7 + 0.25 * wp.sin(PI * t) + 0.08 * wp.cos(3.0 * PI * t + phase)
    pts[i] = wp.vec3f(x, y, z)


# ------------------------------------------------------------------------------


def _create_curve(stage, num_points: int) -> UsdGeom.BasisCurves:
    """Define the BasisCurves prim."""
    sim_utils.create_prim("/World/GpuCurve", "Xform")
    pts = [Gf.Vec3f(-1.2 + 2.4 * i / float(num_points - 1), 0.0, 0.7) for i in range(num_points)]
    curve = UsdGeom.BasisCurves.Define(stage, CURVE_PATH)
    curve.CreateTypeAttr(UsdGeom.Tokens.linear)
    curve.CreateWrapAttr(UsdGeom.Tokens.nonperiodic)
    curve.CreateCurveVertexCountsAttr(Vt.IntArray([num_points]))
    curve.CreatePointsAttr(Vt.Vec3fArray(pts))
    curve.CreateWidthsAttr(Vt.FloatArray([0.03]))
    curve.SetWidthsInterpolation(UsdGeom.Tokens.constant)
    curve.CreateDisplayColorAttr(Vt.Vec3fArray([Gf.Vec3f(0.1, 0.8, 1.0)]))
    return curve


def main() -> None:
    """Run the GPU curve animation demo."""
    n = args_cli.num_points
    sim_cfg = sim_utils.SimulationCfg(dt=1.0 / 60.0, device=args_cli.device)
    sim = sim_utils.SimulationContext(sim_cfg)
    sim.set_camera_view([0.0, -3.5, 1.5], [0.0, 0.0, 0.75])

    stage = sim_utils.get_current_stage()
    light_cfg = sim_utils.DomeLightCfg(intensity=1500.0, color=(1.0, 1.0, 1.0))
    light_cfg.func("/World/DomeLight", light_cfg)

    curve = _create_curve(stage, n)
    sim.reset()

    # Pre-allocate GPU and CPU buffers once — reused every frame.
    gpu_pts = wp.zeros(n, dtype=wp.vec3f, device="cuda:0")
    cpu_pts = wp.zeros(n, dtype=wp.vec3f, device="cpu")
    # Numpy view into the CPU Warp array — zero-copy on the host side.
    cpu_pts_np: np.ndarray = cpu_pts.numpy()

    pts_attr = curve.GetPointsAttr()
    print(f"[INFO]: GPU curve demo ready — {n} points. Close the viewport to stop.", flush=True)

    frame = 0
    while simulation_app.is_running() and (args_cli.max_steps <= 0 or frame < args_cli.max_steps):
        phase = float(frame) * 0.05

        # 1. Compute positions on GPU.
        wp.launch(kernel=_compute_curve, dim=n, inputs=[gpu_pts, n, phase], device="cuda:0")

        # 2. DMA transfer GPU → pre-allocated CPU buffer (one PCIe transfer).
        wp.copy(cpu_pts, gpu_pts)
        wp.synchronize_device("cuda:0")

        # 3. Write to USD.  pxr does not accept raw numpy rows as GfVec3f, so we
        #    build the Vt array from a Python list of Gf.Vec3f objects.  For the
        #    point counts used in this demo (≤256) the Python-level conversion is
        #    negligible relative to the GPU kernel and DMA transfer.
        pts_attr.Set(Vt.Vec3fArray([Gf.Vec3f(float(p[0]), float(p[1]), float(p[2])) for p in cpu_pts_np]))

        sim.step(render=True)
        simulation_app.update()
        frame += 1


if __name__ == "__main__":
    main()
    simulation_app.close()

PeterL-NV (Collaborator) · 2026-05-11T21:08:13Z

Hi, great question and excellent analysis in your demo. Your understanding of the current data path is correct.

Options to improve performance

1. Batched DMA with pinned memory

Your demo already does this mostly right. A few optimizations:

- Use pinned CPU memory (wp.array(..., device="cpu", pinned=True)) for the staging buffer. This can significantly speed up the PCIe transfer.
- Batch all cables into a single DMA transfer: allocate one large GPU array for all cable points, compute all cables in one kernel launch, then do a single wp.copy for the entire batch rather than one per cable.

# Instead of per-cable transfers:
all_gpu_pts = wp.zeros(total_points_all_cables, dtype=wp.vec3f, device="cuda:0")
all_cpu_pts = wp.zeros(total_points_all_cables, dtype=wp.vec3f, device="cpu", pinned=True)

# One kernel launch for ALL cables
wp.launch(kernel=_compute_all_cables, dim=total_points_all_cables, inputs=[all_gpu_pts, ...])

# One DMA transfer
wp.copy(all_cpu_pts, all_gpu_pts)
wp.synchronize_device("cuda:0")

# Slice and set per-curve
offset = 0
for curve, n in zip(curves, points_per_curve):
    curve.GetPointsAttr().Set(Vt.Vec3fArray.FromNumpy(all_cpu_pts.numpy()[offset:offset+n]))
    offset += n
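
The per-curve slicing step above can also be precomputed once instead of tracking a running offset every frame. The following is only a sketch in plain Python so it runs without Kit or Warp: `curve_ranges`, `flat_points`, and the point counts are hypothetical names and values used for illustration, and the flat list stands in for the host buffer returned by `all_cpu_pts.numpy()`.

```python
from itertools import accumulate


def curve_ranges(points_per_curve):
    """Return (start, end) index pairs into one flat point buffer."""
    ends = list(accumulate(points_per_curve))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


# Hypothetical example: three cables with different point counts.
points_per_curve = [4, 2, 3]
ranges = curve_ranges(points_per_curve)

# Stand-in for the flat host buffer that holds every cable's points.
flat_points = [(float(i), 0.0, 0.0) for i in range(sum(points_per_curve))]

for curve_index, (start, end) in enumerate(ranges):
    curve_points = flat_points[start:end]
    # In Kit, this is where curve.GetPointsAttr().Set(...) would be called.
    print(curve_index, start, end, len(curve_points))
```

Because the ranges depend only on the point counts, they can be built once at setup time and reused for every frame as long as the cable topology does not change.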

2. Avoid the Python list comprehension

For 64 points (768 bytes), the DMA transfer itself is essentially free. Your actual bottleneck is likely step 3 — the Python list comprehension building Vt.Vec3fArray from numpy:

# This builds one Python Gf.Vec3f object per point, which is slow for large n:
pts_attr.Set(Vt.Vec3fArray([Gf.Vec3f(float(p[0]), float(p[1]), float(p[2])) for p in cpu_pts_np]))

# Try instead (if supported in your version):
pts_attr.Set(Vt.Vec3fArray.FromNumpy(cpu_pts_np))
# Or:
pts_attr.Set(Vt.Vec3fArray(cpu_pts_np.tolist()))
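
To put the 768-byte figure in context, here is a rough back-of-the-envelope sketch of the per-frame transfer size. The bandwidth value and the 1000-cable scenario are assumptions chosen purely for illustration, not measurements, and the estimate ignores launch latency and synchronization overhead.

```python
BYTES_PER_POINT = 3 * 4  # one wp.vec3f is three float32 components


def transfer_bytes(num_points: int) -> int:
    """Size of the device-to-host copy for a flat vec3f buffer."""
    return num_points * BYTES_PER_POINT


def transfer_ms(num_bytes: int, bandwidth_gb_per_s: float) -> float:
    """Idealized copy time in milliseconds at a given sustained bandwidth."""
    return num_bytes / (bandwidth_gb_per_s * 1e9) * 1e3


FRAME_BUDGET_MS = 1000.0 / 60.0      # 60 Hz render budget from the demo
ASSUMED_BANDWIDTH_GB_PER_S = 10.0    # assumption, not a measured value

demo_bytes = transfer_bytes(64)           # the 64-point demo
batch_bytes = transfer_bytes(1000 * 256)  # hypothetical: 1000 cables x 256 points

for label, size in (("demo", demo_bytes), ("batch", batch_bytes)):
    ms = transfer_ms(size, ASSUMED_BANDWIDTH_GB_PER_S)
    print(f"{label}: {size} bytes, ~{ms:.4f} ms of a {FRAME_BUDGET_MS:.2f} ms frame")
```

Under these assumptions even the large batch stays well below the frame budget, which is consistent with the point above that the host-side conversion, not the copy, is the more likely bottleneck.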


lyd405121 (Author) · 2026-05-12T01:28:18Z

Thanks for your reply :)
I get it: make one big buffer to store all curves!

Another Question

Will there be a fully GPU-side data sync method for curves, like the ones for xform data, deformables, and particles?
For example, one using a single Warp kernel, as in the isaaclab[dev] Newton manager:

@wp.kernel(enable_backward=False)
def _set_fabric_transforms(
    fabric_transforms: wp.fabricarray(dtype=wp.mat44d),
    newton_indices: wp.fabricarray(dtype=wp.uint32),
    newton_body_q: wp.array(ndim=1, dtype=wp.transformf),
):
    """Write Newton body transforms to Fabric world matrices.

    For each Fabric prim at thread ``i``, reads the Newton body transform at
    ``newton_body_q[newton_indices[i]]`` and stores it as a column-major
    ``mat44d`` in ``fabric_transforms[i]``.
    """
    i = int(wp.tid())
    idx = int(newton_indices[i])
    transform = newton_body_q[idx]
    fabric_transforms[i] = wp.transpose(wp.mat44d(wp.math.transform_to_matrix(transform)))
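
For readers unfamiliar with the transpose in the last line, here is a plain-Python sketch of what one thread computes. It assumes the transform is a position plus an (x, y, z, w) quaternion and that the untransposed matrix carries the translation in its last column; `transform_to_matrix` and `transpose` below are hypothetical stand-ins written for illustration, not the Warp functions themselves.

```python
import math


def transform_to_matrix(pos, quat):
    """Build a 4x4 matrix with rotation in the upper-left and translation in the last column."""
    x, y, z, w = quat
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w), pos[0]],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w), pos[1]],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y), pos[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transpose(m):
    """Swap rows and columns of a 4x4 nested list."""
    return [[m[r][c] for r in range(4)] for c in range(4)]


# Hypothetical body pose: translate by (1, 2, 3), rotate 90 degrees about z.
half = math.sqrt(0.5)
matrix = transform_to_matrix((1.0, 2.0, 3.0), (0.0, 0.0, half, half))
fabric_matrix = transpose(matrix)

# After the transpose, the translation sits in the last row.
print(fabric_matrix[3])
```

The transpose moves the translation from the last column to the last row, which is the layout the kernel writes into `fabric_transforms[i]`.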


PeterL-NV (Collaborator) · 2026-05-14T21:13:27Z

Not for plain UsdGeom.BasisCurves.points today, only for transforms and physics-owned point sets.

The newton_manager.py kernel you quoted works because omni:fabric:worldMatrix is the one Fabric attribute that has dual-storage on every Xformable prim — Fabric keeps a GPU-resident copy that Hydra/RTX renders directly from, so a Warp kernel writing into wp.fabricarray(dtype=wp.mat44d) is a real zero-copy path. That exact kernel also lives in this repo at fabric.py:30 set_fabric_transforms.

1 reply

lyd405121 (Author) · May 15, 2026

A big thank-you! I learned a lot ^_^
