Use SST_MPI_Comm_spawn_multiple in Ariel PIN backend #2638
plavin wants to merge 6 commits into sstsimulator:devel
|
This PR needs to be retargeted at the devel branch |
|
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection is Not Necessary for this Pull Request. |
|
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING
Build Information - Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.9_sst-elements
Build Information - Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.9_sst-elements_Make-Dist
Build Information - Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.9_sst-elements_MT-2
Build Information - Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.9_sst-elements_MR-2
Build Information - Test Name: SST__AutotestGen2_NewFW_OSX-15-XC15-ARM2_OMPI-4.1.6_PY3.10_sst-elements
Using Repos:
Pull Request Author: plavin |
|
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED. Pull Request Auto Testing has PASSED
Build Information - Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.9_sst-elements
Build Information - Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.9_sst-elements_Make-Dist
Build Information - Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.9_sst-elements_MT-2
Build Information - Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.9_sst-elements_MR-2
Build Information - Test Name: SST__AutotestGen2_NewFW_OSX-15-XC15-ARM2_OMPI-4.1.6_PY3.10_sst-elements
|
|
Status Flag 'Pre-Merge Inspection' - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
|
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
|
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
This PR updates Ariel to use MPI_Comm_spawn_multiple for launching applications when core is compiled with MPI support. This simplifies the build system, as the only parts of Ariel that now depend on an MPI compiler are the test applications. It also makes launching applications more robust, as MPI applications are not supposed to call fork. Fixes #2624.

This PR also adds new functionality to the Ariel API. Two new functions are added:
These will each cause a message to be output to stdout, with the region name and the current simulation timestamp. This can be used to correlate stat dumps with locations in the source app.
Justification for changes to fesimple.cc

The file ariel/frontend/pin3/fesimple.cc required extensive changes. One tricky part about tracing MPI applications with PIN is that the MPI library will typically spawn its own threads when MPI_Init is called. This causes two issues: (1) the program has more threads than the user specified in their Ariel config, meaning the shared memory tunnel is not big enough, and (2) if MPI_Init is called before all of the application threads are launched, the MPI threads will receive lower IDs than the application threads, which makes ignoring them harder.

The first solution, which is removed by this PR, was to place an OMP parallel region before MPI_Init so that the application threads would always be numbered 0..N-1. But this only works for OpenMP programs, and it meant that the Ariel API needed to be compiled with -fopenmp.

The new approach is to track which threads are MPI threads by checking whether libmpi.so is in their callstack. This works, but now we have to maintain a map of the thread IDs (typically [0,3,4,...,N+1] -> [0,...,N-1]) and check this map when writing to the tunnel. PIN won't let us change how it numbers threads. In a future update, I hope to move this functionality to a class that wraps the tunnel so that the map can be queried in a single location instead of all over fesimple.cc.

Known Issues

FATAL: ArielComponent[arielcore.cc:486:refillQueue] Error: Ariel did not understand command (128) provided during instruction queue refill. or it may cause the program to hang indefinitely. This error seems to mostly affect test_Ariel_test_ivb_pin.
OMPI_MCA_rmaps_base_oversubscribe=1 in the MPI testsuite file.
fesimple.cc may break if more application threads are launched than the corecount parameter passed to the ariel.ariel component.
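
The thread-ID remapping described under "Justification for changes to fesimple.cc" can be illustrated with a small standalone sketch. This is not the code in fesimple.cc: the names isMpiThread, ThreadIdMap, onThreadStart, and lookup are hypothetical, and call stacks are modeled as plain lists of image-name strings instead of being obtained through PIN. The sketch also assumes that a thread beyond the configured core count is rejected with -1, which is one possible way to handle the last known issue, not necessarily what the PR does.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: a thread is treated as an MPI-internal thread if any
// frame in its call stack comes from an image whose name contains "libmpi.so".
bool isMpiThread(const std::vector<std::string>& callstackImages) {
    for (const auto& image : callstackImages) {
        if (image.find("libmpi.so") != std::string::npos) {
            return true;
        }
    }
    return false;
}

// Hypothetical class: maps sparse PIN thread IDs (e.g. 0, 3, 4) to dense
// application thread IDs (0, 1, 2) so that each one indexes a tunnel slot.
class ThreadIdMap {
public:
    explicit ThreadIdMap(uint32_t coreCount) : coreCount_(coreCount) {
        assert(coreCount > 0);
    }

    // Called when a thread starts. Returns the dense ID assigned to pinTid,
    // or -1 if the thread is an MPI thread or no tunnel slot is left.
    int onThreadStart(uint32_t pinTid,
                      const std::vector<std::string>& callstackImages) {
        if (isMpiThread(callstackImages)) {
            return -1;  // MPI threads are ignored and get no slot
        }
        auto it = map_.find(pinTid);
        if (it != map_.end()) {
            return it->second;  // already registered
        }
        if (next_ >= coreCount_) {
            return -1;  // more application threads than configured cores
        }
        map_[pinTid] = static_cast<int>(next_);
        return static_cast<int>(next_++);
    }

    // Called before writing to the tunnel. Returns -1 for unknown threads.
    int lookup(uint32_t pinTid) const {
        auto it = map_.find(pinTid);
        return it == map_.end() ? -1 : it->second;
    }

private:
    uint32_t coreCount_;
    uint32_t next_ = 0;
    std::unordered_map<uint32_t, int> map_;
};
```

With three configured cores and PIN thread IDs 0, 3, and 4 belonging to the application (1 and 2 being MPI threads), this sketch yields the dense IDs 0, 1, and 2, matching the [0,3,4,...] -> [0,...,N-1] pattern above. Keeping the lookup in one class is in the spirit of the future refactoring mentioned in the description.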