Releases: aws/aws-parallelcluster-node
AWS ParallelCluster v2.10.3
We're excited to announce the release of AWS ParallelCluster Node 2.10.3.
This is associated with AWS ParallelCluster v2.10.3.
CHANGES
- There were no notable changes for this version.
AWS ParallelCluster v2.10.2
We're excited to announce the release of AWS ParallelCluster Node 2.10.2.
This is associated with AWS ParallelCluster v2.10.2.
CHANGES
- There were no notable changes for this version.
AWS ParallelCluster v2.10.1
We're excited to announce the release of AWS ParallelCluster Node 2.10.1.
This is associated with AWS ParallelCluster v2.10.1.
ENHANCEMENTS
- Improve error handling in Slurm plugin processes when clustermgtd is down.
CHANGES
- Increase max attempts when retrying on Route53 API call failures.
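The retry behavior described above can be pictured with a generic retry loop. This is an illustrative sketch, not the actual ParallelCluster internals; the function names, attempt count, and backoff values are hypothetical:

```python
import time

def call_with_retries(func, max_attempts=5, base_delay=0.01):
    """Retry a callable on failure, up to max_attempts, with exponential backoff.

    Raising max_attempts is the kind of change described above: transient
    API failures (e.g. Route53 throttling) get more chances to succeed
    before the error is surfaced to the caller.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))

# Illustrative flaky call that succeeds on its third attempt:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(call_with_retries(flaky))  # prints "ok" after two retries
```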
AWS ParallelCluster v2.10.0
We're excited to announce the release of AWS ParallelCluster Node 2.10.0.
This is associated with AWS ParallelCluster v2.10.0.
ENHANCEMENTS
- Add new all_or_nothing_batch configuration parameter for the slurm_resume script. When True, slurm_resume will succeed only if all the instances required by all the pending jobs in Slurm are available.
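The all-or-nothing semantics can be illustrated with a small capacity check. The data structures and function name below are hypothetical simplifications for illustration, not the actual slurm_resume implementation:

```python
def can_resume_all(pending_jobs, available_capacity):
    """Return True only if every pending job's instance request fits the
    available capacity, mirroring all_or_nothing_batch=True: either the
    whole batch of nodes can be launched, or nothing is launched.

    pending_jobs: list of dicts like {"instance_type": str, "count": int}
    available_capacity: dict mapping instance type -> launchable instance count
    """
    needed = {}
    for job in pending_jobs:
        needed[job["instance_type"]] = needed.get(job["instance_type"], 0) + job["count"]
    return all(available_capacity.get(itype, 0) >= n for itype, n in needed.items())

jobs = [{"instance_type": "c5.xlarge", "count": 4},
        {"instance_type": "c5.xlarge", "count": 2}]
print(can_resume_all(jobs, {"c5.xlarge": 6}))  # True: all 6 instances fit
print(can_resume_all(jobs, {"c5.xlarge": 5}))  # False: batch cannot be fully satisfied
```

With all_or_nothing_batch left at False, a partial launch would be acceptable instead; the parameter trades faster partial scale-up for the guarantee that tightly coupled jobs never start on an incomplete node set.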
CHANGES
- CentOS 6 is no longer supported.
- Optimize retrieval of nodes info from Slurm scheduler.
- Improve retrieval of instance type info by using the DescribeInstanceType API.
- Increase timeout from 10 to 30 seconds when the clustermgtd and computemgtd daemons invoke Slurm commands.
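The timeout change above amounts to giving scheduler commands more time before the daemon gives up on them. A minimal sketch of that pattern, using a stand-in command rather than a real Slurm binary (the helper name is hypothetical; only the 30-second value comes from the note above):

```python
import subprocess

def run_scheduler_command(args, timeout=30):
    """Run a scheduler command, raising if it hangs past the timeout.

    A timeout of 30 seconds (up from 10) tolerates a busy Slurm
    controller without letting the calling daemon block forever.
    """
    return subprocess.run(args, capture_output=True, text=True,
                          timeout=timeout, check=True)

# Stand-in for a Slurm command such as `sinfo`:
result = run_scheduler_command(["echo", "sinfo-ok"])
print(result.stdout.strip())  # prints "sinfo-ok"
```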
BUG FIXES
- Retrieve the right number of compute instance slots when instance type is updated.
- Fix a bug that was causing the clustermgtd and computemgtd sleep interval to be incorrectly computed when the system timezone is not set to UTC.
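The class of bug fixed above comes from deriving a sleep interval from local wall-clock time; computing it from the epoch instead is timezone-independent. This is an illustrative sketch of that idea, not the daemons' actual code:

```python
import time

def seconds_until_next_tick(loop_period, now=None):
    """Seconds to sleep so the loop wakes on the next loop_period boundary.

    time.time() is seconds since the UTC epoch regardless of the system
    timezone, so the result is identical whether the host clock is set
    to UTC or not -- the property the fix above restores.
    """
    if now is None:
        now = time.time()
    return loop_period - (now % loop_period)

# At epoch second 123.5 with a 60-second period, sleep 56.5 seconds:
print(seconds_until_next_tick(60, now=123.5))  # prints 56.5
```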
AWS ParallelCluster v2.9.1
We're excited to announce the release of AWS ParallelCluster Node 2.9.1.
This is associated with AWS ParallelCluster v2.9.1.
CHANGES
- There were no notable changes for this version.
AWS ParallelCluster v2.9.0
We're excited to announce the release of AWS ParallelCluster Node 2.9.0.
This is associated with AWS ParallelCluster v2.9.0.
ENHANCEMENTS
- Add support for multiple queues and multiple instance types feature with the Slurm scheduler.
- Replace the previously available scaling components with:
  - clustermgtd daemon that takes care of compute fleet management operations, including the processing of health check failures coming from EC2 and cluster start/stop operations;
  - slurm_resume and slurm_suspend scripts that integrate with the Slurm power saving plugin;
  - computemgtd daemon that monitors the health of the system from the compute nodes.
- Replace Auto Scaling Groups with plain EC2 APIs when provisioning cluster nodes.
- Register cluster nodes in a Route53 private hosted zone when DNS resolution is enabled for the cluster.
- Register mapping between Slurm node names and EC2 instances in DynamoDB table.
- Create log files for the new components in the /var/log/parallelcluster/ directory.
AWS ParallelCluster v2.8.1
We're excited to announce the release of AWS ParallelCluster Node 2.8.1.
This is associated with AWS ParallelCluster v2.8.1.
CHANGES
- There were no notable changes for this version.
AWS ParallelCluster v2.8.0
We're excited to announce the release of AWS ParallelCluster Node 2.8.0.
This is associated with AWS ParallelCluster v2.8.0.
ENHANCEMENTS
- Dynamically generate the architecture-specific portion of the path to the SGE binaries directory.
AWS ParallelCluster v2.7.0
We're excited to announce the release of AWS ParallelCluster Node 2.7.0.
This is associated with AWS ParallelCluster v2.7.0.
ENHANCEMENTS
- sqswatcher: The daemon is now compatible with VPC Endpoints so that SQS messages can be passed without traversing the public internet.
AWS ParallelCluster v2.6.1
We're excited to announce the release of AWS ParallelCluster 2.6.1.
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
ENHANCEMENTS
- Improved the management of SQS messages and retries to speed-up recovery times when failures occur.
CHANGES
- Do not launch a replacement for an unhealthy or unresponsive node until that node is terminated. This makes the cluster slower at provisioning new nodes when failures occur, but prevents any temporary over-scaling with respect to the expected capacity.
- Increase parallelism from 10 to 30 when starting slurmd on compute nodes that join the cluster.
- Reduce the verbosity of messages logged by the node daemons.
- Do not dump logs to /home/logs when nodewatcher encounters a failure and terminates the node. CloudWatch can be used to debug such failures.
- Reduce the number of retries for failed REMOVE events in sqswatcher.
BUG FIXES
- Fixed a bug in the ordering and retrying of SQS messages that, under certain heavy-load conditions, was leaving the scheduler configuration in an inconsistent state.
- Delete from the queue the REMOVE events that are discarded due to a hostname collision with another event fetched as part of the same sqswatcher iteration.
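A minimal sketch of that discard-and-delete behavior, using hypothetical event records and function names rather than real SQS messages:

```python
def process_remove_events(events):
    """Keep one REMOVE event per hostname and collect the receipts of the
    colliding duplicates, so the caller can delete them from the queue
    instead of re-fetching them on the next iteration.

    events: list of dicts like {"hostname": str, "receipt": str},
            in the order they were fetched.
    Returns (kept_events, receipts_to_delete).
    """
    kept = {}
    to_delete = []
    for event in events:
        host = event["hostname"]
        if host in kept:
            to_delete.append(event["receipt"])  # hostname collision: discard + delete
        else:
            kept[host] = event
    return list(kept.values()), to_delete

events = [
    {"hostname": "compute-1", "receipt": "r1"},
    {"hostname": "compute-2", "receipt": "r2"},
    {"hostname": "compute-1", "receipt": "r3"},  # collides with r1
]
kept, deleted = process_remove_events(events)
print([e["receipt"] for e in kept], deleted)  # prints ['r1', 'r2'] ['r3']
```

Deleting the discarded duplicates (rather than silently dropping them) is what prevents the same dead-letter events from being fetched again and again.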
Support
Need help / have a feature request?
AWS Support: https://console.aws.amazon.com/support/home
ParallelCluster Issues tracker on GitHub: https://github.com/aws/aws-parallelcluster
The HPC Forum on the AWS Forums page: https://forums.aws.amazon.com/forum.jspa?forumID=192