Enable register spilling to shared memory #1132

Draft
stephenswat wants to merge 1 commit into acts-project:main from stephenswat:perf/spill_to_smem

@stephenswat
Member

CUDA 13.0 allows the PTX assembler to spill registers to shared memory instead of local memory. This should be much faster, and it should also reduce the local memory usage of our fitting and finding kernels, which currently bottleneck our throughput.

stephenswat added the "performance" (Performance-relevant changes) label on Aug 20, 2025

@stephenswat
Member Author

I'm not 100% certain this works as intended in its current form, since this pragma is meant to be attached at function scope. But we can try.
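For reference, a minimal sketch of what function-scope placement could look like, assuming CUDA 13.0 or newer (older assemblers may not accept the pragma). The kernel `scale_kernel`, the host helper `run_scale`, and the launch bound of 256 are hypothetical and not taken from this PR; only the pragma string `enable_smem_spilling` comes from the CUDA 13.0 feature itself. A kernel this small will not actually spill, so the sketch only shows where the pragma goes.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel, not from traccc. Launch bounds are stated explicitly
// on the assumption that ptxas needs the block size to size the per-block
// shared-memory spill area.
__global__ void __launch_bounds__(256)
scale_kernel(const float* in, float* out, float factor, int n) {
    // Function-scope PTX pragma, emitted as the first statement of the body.
    asm volatile(".pragma \"enable_smem_spilling\";");

    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = in[i] * factor;
    }
}

// Hypothetical host helper: copies the input to the device, runs the kernel,
// and returns the result. Returns an empty vector on any CUDA error.
std::vector<float> run_scale(const std::vector<float>& host_in, float factor) {
    const int n = static_cast<int>(host_in.size());
    if (n == 0) {
        return {};
    }
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float* d_in = nullptr;
    float* d_out = nullptr;
    std::vector<float> host_out(n);

    bool ok = cudaMalloc(&d_in, bytes) == cudaSuccess &&
              cudaMalloc(&d_out, bytes) == cudaSuccess &&
              cudaMemcpy(d_in, host_in.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threads = 256;  // must not exceed the launch bound above
        const int blocks = (n + threads - 1) / threads;
        scale_kernel<<<blocks, threads>>>(d_in, d_out, factor, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(host_out.data(), d_out, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);   // cudaFree(nullptr) is a no-op
    cudaFree(d_out);
    if (!ok) {
        host_out.clear();
    }
    return host_out;
}
```

Whether a pragma placed anywhere other than inside each kernel body has the same effect is exactly the open question here, so this is only the conservative variant.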

@beomki-yeo
Contributor

beomki-yeo commented Aug 20, 2025

This is interesting, as we are not actively using shared memory in our finding and fitting kernels.
On the other hand, I hope the compiler is smart enough not to overuse shared memory, as this can reduce the number of concurrent blocks. (It would be great if we could limit the amount of shared memory used for register spilling.) Please let us know if there is any noticeable performance change.
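One way to check the concern above would be to query a compiled kernel's resource usage and the resulting number of resident blocks, once with and once without the pragma. A minimal sketch, assuming that the spill area reserved by the assembler is reported as part of the kernel's static shared memory (not verified here); `dummy_kernel`, `KernelResources`, and `query_kernel_resources` are hypothetical stand-ins, not names from traccc.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical stand-in for a finding or fitting kernel.
__global__ void __launch_bounds__(128) dummy_kernel(float* data, int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] += 1.0f;
    }
}

// Resource figures of interest when judging the effect of spilling.
struct KernelResources {
    int registers_per_thread;
    std::size_t static_smem_bytes_per_block;
    std::size_t local_mem_bytes_per_thread;
    int resident_blocks_per_sm;
};

// Fills `res` for dummy_kernel at the given block size.
// Returns false if any CUDA runtime call fails.
bool query_kernel_resources(int block_size, KernelResources& res) {
    cudaFuncAttributes attr{};
    if (cudaFuncGetAttributes(&attr, dummy_kernel) != cudaSuccess) {
        return false;
    }
    int blocks = 0;
    // Last argument: dynamic shared memory per block (none in this sketch).
    if (cudaOccupancyMaxActiveBlocksPerMultiprocessor(
            &blocks, dummy_kernel, block_size, 0) != cudaSuccess) {
        return false;
    }
    res.registers_per_thread = attr.numRegs;
    res.static_smem_bytes_per_block = attr.sharedSizeBytes;
    res.local_mem_bytes_per_thread = attr.localSizeBytes;
    res.resident_blocks_per_sm = blocks;
    return true;
}
```

If `resident_blocks_per_sm` drops while `local_mem_bytes_per_thread` shrinks, the spill area is costing occupancy. Compiling with `-Xptxas -v` should show the same register, shared memory, and spill figures at build time.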

@stephenswat
Member Author

Physics performance summary

Here is a summary of the physics performance effects of this PR. Command used:

traccc_seeding_example_cuda --input-directory=/data/Acts/odd-simulations-20240506/geant4_ttbar_mu200 --digitization-file=geometries/odd/odd-digi-geometric-config.json --detector-file=geometries/odd/odd-detray_geometry_detray.json --grid-file=geometries/odd/odd-detray_surface_grids_detray.json --material-file=geometries/odd/odd-detray_material_detray.json --input-events=1 --use-acts-geom-source=on --check-performance --truth-finding-min-track-candidates=5 --truth-finding-min-pt=1.0 --truth-finding-min-z=-150 --truth-finding-max-z=150 --truth-finding-max-r=10 --seed-matching-ratio=0.99 --track-matching-ratio=0.5 --track-candidates-range=5:100 --seedfinder-vertex-range=-150:150

Seeding performance

Total number of seeds went from 33828 to 33828 (+0.0%)

Seeding plots


Track finding performance

Total number of found tracks went from 5565 to 5569 (+0.1%)

Finding plots

Track fitting performance

Fitting plots

Seeding to track finding relative performance

Seeding to track finding plots

Note

This is an automated message produced on the explicit request of a human being.
