Add support for Feature specification for ComputeResources#3020
Add support for Feature specification for ComputeResources#3020efim-a-efim wants to merge 10 commits intoaws:developfrom
Feature specification for ComputeResources#3020Conversation
demartinofra
left a comment
There was a problem hiding this comment.
Thanks for contributing with this change. I left a few minor comments if you could take a look.
| self.disable_multithreading = compute_resource_config["DisableSimultaneousMultithreading"] | ||
| self.custom_settings = compute_resource_config.get("CustomSlurmSettings", {}) | ||
| # If Features are in CustomSlurmSettings, fetch them and remove from CustomSlurmSettings | ||
| self.custom_features = self.custom_settings.pop("Feature", "") |
There was a problem hiding this comment.
Instead of mutating the custom_settings object by removing features I would keep custom_settings aligned with what is injected from config and filter out features when building the custom += f" {param}={value}" string as part of the nodename rendering.
There was a problem hiding this comment.
OK, I'll check how to do it better. Thanks. I'll come back for review :)
| # this is a simple workaround for empty Features: split will give [''] and we'll discard it | ||
| # as well as all empty strings. | ||
| features.discard('') |
There was a problem hiding this comment.
default to None when reading the Feature string and skip the splitting so that you don't need to discard ''?
There was a problem hiding this comment.
Because it's the easiest way to work around human errors like double spaces, space at the end of the string, etc.
| # remove "system" features from configured features, so that they will never interfere | ||
| for feat in ('static', 'dynamic', 'gpu', 'efa') : | ||
| features.discard(feat) |
There was a problem hiding this comment.
you could consider leaving these unaltered - if for any reason the user passes them in we could just honor them as overrides
There was a problem hiding this comment.
OK, I was thinking that they are "system" and shouldn't be managed by users. If it's not a problem - I'll leave them.
Description of changes
With this changes, users will be able to specify additional
Featurevalues for ComputeResources. ThoseFeature-s will be safely added to the Slurm nodes specification for the partition.Rationale: In many cases, especially with complex cluster setups, it's crucial to have additional
Feature-s that can be used to filter nodes for the jobs.Tests
Added the following tests:
test/unit/slurm/test_slurm_config_generator.py::test_generate_slurm_config_with_custom_features- tests config generation with custom features specified. Corresponding test data is intest/unit/slurm/test_generate_slurm_config_with_custom_features.NOTE: test data for many tests were changed because the new features generation algorithm requires all features in
Feature=...in config files to be sorted. So many test files were affected, but the only change that were made to them is features ordering.References
This PR has a complimentary PR in
aws-parallelclusterrepo: aws/aws-parallelcluster#6962By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.