Improve GPU performance of TLSPH#1084
Improve GPU performance of TLSPH#1084efaulhaber wants to merge 8 commits intotrixi-framework:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR optimizes the TLSPH time-integration path for GPU workloads by reducing per-particle dispatch/branching in hot loops, adding GPU-optimized bulk copies for velocity/positions, and short-circuiting work when source terms/gravity are inactive.
Changes:
- Refactors
drift!velocity update into per-systemset_velocity!and adds a GPU fast-path (copyto!) for velocity assignment. - Adds a GPU fast-path for TLSPH position updates (
update_tlsph_positions!). - Refactors
add_source_terms!to dispatch per system and skip work when acceleration and source terms are inactive; updates special-case handling forParticlePackingSystem.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
src/general/semidiscretization.jl |
Refactors drift! velocity-setting and reworks source-term application/dispatch + skipping logic. |
src/schemes/structure/total_lagrangian_sph/system.jl |
Adds GPU-optimized TLSPH position copy. |
src/schemes/boundary/open_boundary/system.jl |
Updates open-boundary velocity-setting to new set_velocity! API and threaded loop. |
src/preprocessing/particle_packing/system.jl |
Adds ParticlePackingSystem source-term no-op specialization and adapts to set_velocity!. |
test/schemes/structure/total_lagrangian_sph/rhs.jl |
Removes now-unneeded mock add_acceleration! definition. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1084 +/- ##
==========================================
- Coverage 89.38% 89.27% -0.12%
==========================================
Files 122 122
Lines 8958 8987 +29
==========================================
+ Hits 8007 8023 +16
- Misses 951 964 +13
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
/run-gpu-tests |
| add_velocity!(du, v, u, particle, system, semi, t) | ||
| end | ||
| end | ||
| # Set velocity and add acceleration for each system |
There was a problem hiding this comment.
Isn't acceleration set in kick!()?
There was a problem hiding this comment.
Indeed. I just copied the wrong comment without thinking about it. But especially with #1055, the comment is redundant anyway because it's clear from the code what is happening.
This PR
integrate_tlsphchecks out of the hot loops indrift!and source terms (relevant for small problems on GPUs),add_velocity!andupdate_tlsph_positions!for GPU arrays,set_zero!(du)and renamesadd_velocity!toset_velocity!(relevant for small problems on GPUs),add_source_terms!when no source terms are used andsystem.accelerationis zero.Especially for split integration on GPUs, where a small TLSPH-only simulation is run for many sub-steps, this provides a very significant speedup.
Here are the relevant changes in the timer output of my large fin simulation (#844) on an H100. Note that this is before I moved the "source terms" timer behind the dispatch. Now, the "source terms" timer should actually completely disappear when neither source terms nor gravity is used.
main:
this PR:
Here are the full timer outputs for reference.
main:
this PR: