Revamp of the binding layer. #515
base: master
Conversation
devreal left a comment
Much needed 👍 two comments inline, otherwise LGTM
force-pushed from d9006b7 to f5af19f
I'm trying it on leconte. Now, I run a parsec test with this PR: it looks like the two ranks are sharing the same cores, judging from the output? A run of htop in a parallel terminal shows that only cores 0, 1, 13, 20, 21, 40, 41 and 65 have work to do (plus a bit on 77 and 46 sometimes), but definitely not all the cores are active, and it's pretty slow. Also, I tried to rebase the PR onto the current master, but there are conflicts I'm not sure how to solve.
I was not able to find a way to translate between relative and absolute core numbering, so the reported bindings are relative to the allowed procs, and not absolute (as one would expect). Let me look again at the documentation to see if there is a way.
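For what it's worth, hwloc does keep both numberings on every object: logical_index is relative to the (possibly restricted) topology, while os_index is the absolute OS number. A minimal sketch of the translation, assuming the topology was loaded with the default filters so it contains only the allowed PUs:

```c
#include <hwloc.h>

/* relative (logical, within the allowed set) -> absolute (OS/physical) */
static int core_rel_to_abs(hwloc_topology_t topo, unsigned rel)
{
    hwloc_obj_t pu = hwloc_get_obj_by_type(topo, HWLOC_OBJ_PU, rel);
    return (NULL == pu) ? -1 : (int)pu->os_index;
}

/* absolute (OS/physical) -> relative (logical, within the allowed set) */
static int core_abs_to_rel(hwloc_topology_t topo, unsigned abs_idx)
{
    hwloc_obj_t pu = hwloc_get_pu_obj_by_os_index(topo, abs_idx);
    return (NULL == pu) ? -1 : (int)pu->logical_index;
}
```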
Follow the process binding provided by the batch scheduler or process manager. If the application is started with mpirun, follow the bindings provided by the mpirun command. Add an MCA parameter (runtime_report_bindings) to report the final bindings for each virtual process and thread. Report the binding in both logical and physical notation. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
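A self-contained sketch of what such a report can look like using plain hwloc calls; this only illustrates querying the inherited binding and printing both notations, it is not the actual runtime_report_bindings implementation:

```c
#include <hwloc.h>
#include <stdio.h>

int main(void)
{
    hwloc_topology_t topo;
    hwloc_bitmap_t set = hwloc_bitmap_alloc();
    unsigned os_idx;

    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);
    /* The binding inherited from the batch scheduler / mpirun */
    hwloc_get_cpubind(topo, set, HWLOC_CPUBIND_PROCESS);

    hwloc_bitmap_foreach_begin(os_idx, set) {
        hwloc_obj_t pu = hwloc_get_pu_obj_by_os_index(topo, os_idx);
        if( NULL != pu )
            printf("bound to PU physical %u (logical %u)\n",
                   os_idx, pu->logical_index);
    } hwloc_bitmap_foreach_end();

    hwloc_bitmap_free(set);
    hwloc_topology_destroy(topo);
    return 0;
}
```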
This option allows threads to run on a single physical resource. The singlification can be done early (for negative values of the MCA parameter), in which case the threads are packed on the resources, or late (for positive values of the MCA parameter), in which case the threads are spread across the resources. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
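A toy illustration of the two placements (nothing here is the actual MCA handling, only the pack/spread arithmetic the message describes):

```c
/* Illustration only: map thread t (0..nb_threads-1) onto one of nb_res
 * single physical resources. A negative parameter packs consecutive
 * threads together ("early" singlification); a positive one spreads
 * them round-robin ("late" singlification). */
static int resource_for_thread(int t, int nb_threads, int nb_res, int mca_param)
{
    if( mca_param < 0 ) {
        int chunk = (nb_threads + nb_res - 1) / nb_res;  /* ceiling division */
        return t / chunk;   /* pack: 0,0,1,1,... */
    }
    return t % nb_res;      /* spread: 0,1,2,...,0,1,2,... */
}
```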
force-pushed from d6334a6 to be2ee69
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
@bosilca ping ... need it badly :)
@bosilca ping again .. real showstopper
doing the merge now
static int parsec_nb_total_threads = 0;
static int parse_binding_parameter(int vp, int nbth, char * binding);

int vpmap_get_nb_total_threads(void)
This is used by dplasma, and is removed without a replacement. The new parsec_context_query(parsec_context_t*) could be used instead, but we don't have the parsec context at the caller site. I substituted it with parsec_vpmap_get_vp_thread(0) in dplasma, as that was probably the intent anyway (assuming all vps have a symmetrical number of threads). Should we reintroduce the function?
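If the accessor is reintroduced, a minimal sketch on top of the new per-vp query could look like the following; the helper names are modeled on the call mentioned above and are assumptions, not the confirmed new API:

```c
/* Hypothetical compatibility shim: sum the per-vp thread counts.
 * parsec_vpmap_get_nb_vp() and parsec_vpmap_get_vp_thread(vp) are
 * assumed names. */
int vpmap_get_nb_total_threads(void)
{
    int vp, total = 0;
    for( vp = 0; vp < parsec_vpmap_get_nb_vp(); vp++ )
        total += parsec_vpmap_get_vp_thread(vp);
    return total;
}
```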
parsec/parsec.c
Outdated
int core = atoi(option);
if( (core > -1) && (core < parsec_hwloc_nb_real_cores()) )
    context->comm_th_core = core;
/* negative core allowed to force an absolute core selection */
This is not true, passing a negative value to find_core_by_idx will cause an infinite loop.
@bosilca is the comment wrong or the code wrong?
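Whichever way that question is answered, a defensive guard would keep the two consistent; a sketch only, since find_core_by_idx itself is not shown in this hunk:

```c
/* Sketch: reject out-of-range values before they can reach
 * find_core_by_idx, so a negative index can never loop forever. If an
 * absolute core selection is really wanted, it needs its own explicit
 * code path instead of overloading the sign. */
if( (core < 0) || (core >= parsec_hwloc_nb_real_cores()) ) {
    /* parsec_warning is assumed to be the usual reporting helper */
    parsec_warning("comm_th_core %d out of range, binding request ignored", core);
} else {
    context->comm_th_core = core;
}
```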
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
The following command produces the correct binding for testing (as witnessed from hwloc-ls, the binding is correctly restricted by mpiexec to the correct sockets). The resulting binding in Parsec does not appear correct:
It looks like the message above is partially misleading: we initialize the vpmap before we extract the ALLOWED mask, so vpmap_from_flat initializes something that is not the same as the real final binding. This is a problem because using VPs will do something different than not using VPs, and the output is misleading, but the actual binding produced should be correct when not using VPs. A potential solution is to move the initialization of the ALLOWED mask up, early in parsec_init, and use the ALLOWED mask the same way we use it in the
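A sketch of the proposed reordering with plain hwloc calls (the real parsec_init internals are not shown in this thread, so this only illustrates the idea):

```c
/* Sketch: capture the inherited binding (the ALLOWED mask) first, and
 * fall back to the machine's allowed cpuset if the query fails, so the
 * vpmap is built against the real resource set rather than the full
 * machine. */
hwloc_bitmap_t allowed = hwloc_bitmap_alloc();
if( 0 != hwloc_get_cpubind(topology, allowed, HWLOC_CPUBIND_PROCESS) )
    hwloc_bitmap_copy(allowed, hwloc_topology_get_allowed_cpuset(topology));
/* ... only now initialize the vpmap, restricted to 'allowed' ... */
```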
Looks like the inherited binding is correct (besides the problem with the vpmap above, comparing
vpmap initialization creates and fills
At this point I will excise the rework on the vpmap, merge the rework that is effective in the flat case, and defer completion of complex vpmap process binding to 4.1.
write only atm. Use the bindthread to print the actual binding effected, if requested.
…ying the general binding.
Rework the bindings. The main idea is to inherit the bindings from the batch scheduler, and then work from there.