Releases · aws/aws-parallelcluster

23 Mar 21:04

hanwen-cluster

v3.15.0

b6b4e59

AWS ParallelCluster v3.15.0 (Latest)

We're excited to announce the release of AWS ParallelCluster 3.15.0

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

3.15.0

ENHANCEMENTS

Add support for p6-b300 instances for all OSs except AL2.
Replace cfn-hup in compute nodes with a systemd timer to support in-place updates, improving performance for tightly coupled workloads at scale.
This new mechanism relies on shared storage to sync updates between the head node and compute nodes.
Disable dnf-makecache.timer to improve performance for tightly coupled workloads on RHEL/Rocky at scale.
Support updates of Tags during cluster-updates.
Add LaunchTemplateOverrides to the cluster config to allow network interfaces to be customized by overriding the launch template of a compute resource.
- This overrides the ParallelCluster default using a shallow merge.
Add alarm on missing clustermgtd heartbeat.
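
The shallow merge mentioned for LaunchTemplateOverrides can be illustrated with a minimal Python sketch. The dictionaries and key names below are hypothetical stand-ins, not the launch template ParallelCluster actually generates, and the exact level at which ParallelCluster applies the merge is not stated in these notes. The sketch only shows the general property of a shallow merge: a key present in the override replaces the default value wholesale, and nested values are not combined.

```python
def shallow_merge(default: dict, override: dict) -> dict:
    """Return a new dict in which each top-level key of override replaces the default."""
    merged = dict(default)   # copy so the default is left untouched
    merged.update(override)  # top-level replacement only, no recursion
    return merged


# Hypothetical default and override, for illustration only.
default_template = {
    "NetworkInterfaces": [{"DeviceIndex": 0, "InterfaceType": "interface"}],
    "Monitoring": {"Enabled": False},
}
override = {
    "NetworkInterfaces": [{"DeviceIndex": 0, "InterfaceType": "efa"}],
}

merged = shallow_merge(default_template, override)
print(merged["NetworkInterfaces"])  # the override list, replacing the default list
print(merged["Monitoring"])         # untouched default
```

Because the replacement is wholesale, an override that supplies a list of network interfaces would need to describe every interface it wants, not only the ones that differ.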

CHANGES

When EFA is enabled, ParallelCluster now configures network interfaces as interface and efa-only instead of the combined efa type. NCI-0 is configured with an interface ENI for IP connectivity plus an efa-only ENI for the EFA fabric. Secondary cards are configured with an efa-only ENI. This reduces IP address consumption from one per network card to one per instance. EFA performance is unchanged. Customers who need the legacy efa behavior can set DevSettings: EfaInterfaceType: efa.
Reduce transient build-image failures in RHEL and Rocky caused by out-of-sync repo mirrors by resetting metadata upon retry.
Always start clustermgtd on cluster update and compute fleet status update failure, regardless of the failure condition.
Improve resiliency of the cluster update rollback workflow.
Upgrade Slurm to version 25.11.4 (from 24.11.7).
Upgrade Pmix to 5.0.10 (from 5.0.6).
Upgrade EFA installer to 1.47.0 (from 1.44.0).
- Efa-driver: efa-3.0.0
- Efa-config: efa-config-1.18-1
- Efa-profile: efa-profile-1.7-1
- Libfabric-aws: libfabric-aws-2.4.0-1
- Rdma-core: rdma-core-61.0-1
- Open MPI: openmpi40-aws-4.1.7-2 and openmpi50-aws-5.0.9
Upgrade NVIDIA driver to version 580.105.08 (from 570.172.08) for all OSs except Amazon Linux 2.
Upgrade GDRCopy to version 2.5.2 (from 2.4.4).
Upgrade DCV to version 2025.0-20103 (from 2024.0-19030).
Upgrade CUDA Toolkit to version 13.0.2 (from 12.8.1) for all OSs except Amazon Linux 2.
Upgrade NVIDIA Fabric manager to 580.105.08 for all OSs except Amazon Linux 2.
Upgrade Python to 3.14.2 (from 3.12.11) for all OSs except Amazon Linux 2.
Upgrade aws-cfn-bootstrap to version 2.0-38 (from 2.0-33).
Upgrade DCGM to version 4.5.1 (from 4.4.1) for all OSs except Amazon Linux 2.
Upgrade mysql-community-client to version 8.4.8 (from 8.0.39) for all OSs except Amazon Linux 2.
Upgrade Intel MPI Library to 2021.17.2 (from 2021.16.0).
Upgrade Cinc Client to version 18.8.54 (from 18.7.10).
Upgrade amazon-efs-utils to version 2.4.0 (from 2.1.0) for Amazon Linux AMIs.
Upgrade jmespath to ~=1.0 (from ~=0.10).
Upgrade tabulate to <=0.9.0 (from <=0.8.10).
Add a validator to warn in case in-place updates have been disabled (via DevSettings) on compute and login nodes.
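
The IP address saving described for the new EFA interface layout (one address per instance instead of one per network card) can be checked with simple arithmetic. The sketch below is an illustration only: the function name, the legacy_efa flag, and the fleet size are hypothetical, and it counts just the per-card versus per-instance addresses mentioned in the note above.

```python
def ips_consumed(instances: int, network_cards: int, legacy_efa: bool = False) -> int:
    """Estimate IP addresses consumed by EFA-enabled instances.

    legacy_efa=True models the combined efa type (one IP per network card).
    legacy_efa=False models interface + efa-only (one IP per instance).
    """
    if instances < 0 or network_cards < 1:
        raise ValueError("instances must be >= 0 and network_cards >= 1")
    per_instance = network_cards if legacy_efa else 1
    return instances * per_instance


# Hypothetical fleet: 64 instances with 8 network cards each.
print(ips_consumed(64, 8, legacy_efa=True))   # 512
print(ips_consumed(64, 8, legacy_efa=False))  # 64
```

For subnets sized around the legacy behavior, the difference grows linearly with the number of network cards per instance.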

BUG FIXES

Fix LoginNodes NLB not being publicly accessible when using public subnet with implicit main route table association.
See #7173
Fix a failure when creating a cluster with GPU instances and with DCV enabled but without internet access.
Fix an issue where cluster creation would intermittently fail due to eventual consistency when the head/compute/login nodes share the same security group.
Fix build-image failure during ubuntu-desktop installation on an Ubuntu parent image with outdated OS packages.
Fix validation of HeadNode/LocalStorage. This configuration parameter does not support updates.
Fix validator PlacementGroupCapacityReservationValidator to accept capacity reservations with a cross-account placement group.
Fix the CloudWatch agent configuration to ensure proper parsing of timestamps across all log files.
Fix logging configuration to capture all Slurm health check events (updating log level from WARNING to INFO to prevent missing log entries).
Improve cluster update resiliency by ensuring the update does not fail on nodes completing the bootstrap during the update.
Prevent cluster update failure recovery process from running on AWS Batch clusters. This recovery mechanism should only execute on Slurm clusters.

DEPRECATIONS

The LoginNodes/Pools/Ssh/KeyName configuration parameter, deprecated since 3.14.0, is no longer supported.
This is the last ParallelCluster release supporting Amazon Linux 2, as Amazon Linux 2 will reach end of support on June 30, 2026.
This is the last ParallelCluster release supporting AWS Batch CLI. Starting with v3.16.0, ParallelCluster will no longer support AWS Batch as a scheduler.

16 Feb 22:35

hgreebe

v3.14.2

751037a

AWS ParallelCluster v3.14.2

We're excited to announce the release of AWS ParallelCluster 3.14.2

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

3.14.2

CHANGES

Upgrade munge to version 0.5.18 (from 0.5.16) to address CVE-2026-25506.
Upgrade NodeJS version in installer to version 22.22.0 (from 20.18.3).

23 Dec 02:01

himani2411

v3.14.1

4bea369

AWS ParallelCluster v3.14.1

We're excited to announce the release of AWS ParallelCluster 3.14.1

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

3.14.1

CHANGES

Improve cluster update resiliency by ensuring clustermgtd is started after updates complete successfully, or after failed updates where queue reconfiguration succeeded.
Add chef attribute cluster/in_place_update_on_fleet_enabled to disable in-place updates on compute and login nodes and mitigate performance impact at scale. See #7095.
Upgrade Slurm to version 24.11.7 (from 24.11.6).
Upgrade Werkzeug to ~=3.1 (from ~=2.0) to address CVE-2024-34069.
Upgrade Connexion to ~=2.15.1 (from ~=2.13.0).
Upgrade Flask to ~=3.1.0 (from >=2.2.5,<2.3).
Load kernel module drm_client_lib before installation of NVIDIA driver, if available on the kernel.
Reduce dependency footprint by installing the package sssd-common rather than sssd.
Upgrade libjwt to version 1.18.4 (from 1.17.0) for all OSes except Amazon Linux 2.
Upgrade amazon-efs-utils to version 2.4.0 (from v2.3.1).
Upgrade EFA installer to 1.44.0 (from 1.43.2).
- Efa-driver: efa-2.17.3-1
- Efa-config: efa-config-1.18-1
- Efa-profile: efa-profile-1.7-1
- Libfabric-aws: libfabric-aws-2.3.1-1
- Rdma-core: rdma-core-59.0-1
- Open MPI: openmpi40-aws-4.1.7-2 and openmpi50-aws-5.0.8-11

BUG FIXES

Fix an issue where cfn-hup enters an endless loop on the head node after a rollback to a cluster state older than 24 hours, caused by cfn-signal failing to signal an expired wait condition handle.
Fix race condition where compute nodes could deploy the wrong cluster config version after an update failure.
Prevent cluster readiness check failures due to instances launched while the check is in progress.
Fix incorrect timestamp parsing for chef-client.log in CloudWatch Agent configuration.
Disable snap auto-refresh on Ubuntu during build image to prevent intermittent reboot failures.
Reduce EFA installation time for Ubuntu by ~20 minutes by only holding kernel packages for the installed kernel.
Add GetFunction and GetPolicy permissions to PClusterBuildImageCleanupRole to prevent AccessDenied errors during build image stack deletion.
Fix validation error messages when DevSettings is null or DevSettings/InstanceTypesData is missing required fields.

30 Sep 12:13

hgreebe

v3.14.0

500faa0

AWS ParallelCluster v3.14.0

We're excited to announce the release of AWS ParallelCluster 3.14.0

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

ENHANCEMENTS

Include drivers for P6e-GB200 and P6-B200 instances. ParallelCluster sets up Slurm topology plugin to handle P6e-GB200 UltraServers. See limitations section for important additional setup requirements.
Support prioritized and capacity-optimized-prioritized Allocation Strategy. This allows users to prioritize subnets for instance placement to optimize costs and performance.
Add build-image support for Amazon Linux 2023 AMIs based on kernel 6.12 (in addition to 6.1).
Support DCV on Amazon Linux 2023.
Echo chef-client logs in the instance console when a node fails to bootstrap. This helps with investigating bootstrap failures in cases where CloudWatch logs are not available.

LIMITATIONS

P6e-GB200 instances are only tested on Amazon Linux 2023, Ubuntu 22.04 and Ubuntu 24.04.
Using IMEX on P6e-GB200 requires additional setup. Please refer to the dedicated tutorial in our public documentation.
P6-B200 instances are only tested on Amazon Linux 2023, RHEL 8 & 9, Rocky 8 & 9, Ubuntu 22.04 and Ubuntu 24.04.
GPU HealthChecks are not recommended for instances with GPU memory above 320GB (such as p6-b200.48xlarge). Health check duration can exceed 10 minutes, potentially causing job failures and significantly reducing the job throughput.

CHANGES

Install nvidia-imex for all OSs except Amazon Linux 2.
Remove UnkillableStepTimeout from slurm.conf and let Slurm set this value.
Upgrade Python runtime used by Lambda functions to Python 3.12 (from 3.9). See Lambda Documentation for important information about Python 3.9 EOL: https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html
Support encryption of the EFS file system used for the head node internal shared storage via a new configuration parameter HeadNode/SharedStorageEfsSettings/Encrypted.
Add validator that warns against using non-GPU instances with DCV.
Upgrade Slurm to version 24.11.6 (from 24.05.8).
Upgrade EFA installer to 1.43.2 (from 1.41.0).
- Efa-driver: efa-2.17.2-1
- Efa-config: efa-config-1.18-1
- Efa-profile: efa-profile-1.7-1
- Libfabric-aws: libfabric-aws-2.1.0-5
- Rdma-core: rdma-core-58.0-1
- Open MPI: openmpi40-aws-4.1.7-2 and openmpi50-aws-5.0.6-11
Upgrade Cinc Client to version 18.4.12 (from 18.2.7).
Upgrade NVIDIA driver to version 570.172.08 (from 570.86.15) for all OSs except Amazon Linux 2.
Upgrade CUDA Toolkit to version 12.8.1 (from 12.8.0) for all OSs except Amazon Linux 2.
Upgrade DCGM to version 4.4.1 (from 3.3.6) for all OSs except Amazon Linux 2.
Upgrade Python to 3.12.11 (from 3.12.8) for all OSs except Amazon Linux 2.
Upgrade Python to 3.9.23 (from 3.9.20) for Amazon Linux 2.
Upgrade Intel MPI Library to 2021.16.0 (from 2021.13.1).
Upgrade DCV to version 2024.0-19030.
Upgrade the official ParallelCluster Amazon Linux 2023 AMIs to kernel 6.12 (from 6.1).

BUG FIXES

Prevent build-image stack deletion failures by deploying a global role that automatically deletes the build-image stack after images either succeed or fail the build.
The role is meant to exist even after the stack has been deleted. See #5914.
Fix an issue where Security Group validation failed when a rule contained both IPv4 ranges (IpRanges) and security group references (UserIdGroupPairs).
Fix build-image failure on Rocky 9, occurring when the parent image does not ship the latest kernel version on the latest Rocky minor version.
Fix a cluster ID mismatch issue that causes cluster update failures when Slurm accounting is used.
Fix a race condition in CloudWatch Agent startup that could cause node bootstrap failures.

DEPRECATIONS

The configuration parameter LoginNodes/Pools/Ssh/KeyName has been deprecated, and it will be removed in future releases. The CLI now returns a warning message when it is used in the cluster configuration.
See #6811.
Ubuntu 20.04 is no longer supported.

24 Jun 21:40

hehe7318

v3.13.2

9fe28d4

AWS ParallelCluster v3.13.2

We're excited to announce the release of AWS ParallelCluster 3.13.2

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

3.13.2

BUG FIXES

Fix a bug which may cause update-cluster and update-compute-fleet to fail when compute resources reference an expired Capacity Reservation that is no longer accessible via EC2 APIs.
Fix build-image failure on Rocky 9, occurring when the parent image does not ship the latest kernel version. See #6874.

04 Jun 20:53

gmarciani

v3.13.1

49ce8e0

AWS ParallelCluster v3.13.1

We're excited to announce the release of AWS ParallelCluster 3.13.1

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

CHANGES

Upgrade Slurm to version 24.05.8.
Upgrade EFA installer to 1.41.0 (from 1.38.1).
- Efa-driver: efa-2.15.0-1
- Efa-config: efa-config-1.18-1
- Efa-profile: efa-profile-1.7-1
- Libfabric-aws: libfabric-aws-2.1.0-1
- Rdma-core: rdma-core-57.0-1
- Open MPI: openmpi40-aws-4.1.7-2 and openmpi50-aws-5.0.6
Upgrade amazon-efs-utils to version 2.3.1 (from 2.1.0) for non-Amazon Linux AMIs.
Support DCV in us-isob-east-1 and us-iso-east-1.
Support FSx for Lustre and FSx for ONTAP in us-isob-east-1 and us-iso-east-1.
Ensure kernel consistency throughout ParallelCluster image build by pinning at the beginning and unpinning at completion.

BUG FIXES

Fix a bug in the installation of ARM Performance Library that was causing build-image to fail in isolated environments.
Fix a bug that was preventing the script 'update_directory_service_password.sh' from updating the AD password.

01 Apr 20:39

gmarciani

v3.13.0

9a15d4d

AWS ParallelCluster v3.13.0

We're excited to announce the release of AWS ParallelCluster 3.13.0

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

DEPRECATIONS

This is the last ParallelCluster release supporting Ubuntu 20.04, as Ubuntu 20.04 will reach End of Standard Support in May 2025.

ENHANCEMENTS

Add support for Ubuntu 24.04.
Add support for ap-southeast-7 region.
Disable the unused services cups and wpa_supplicant in official ParallelCluster AMIs to improve security.

CHANGES

Upgrade Slurm to version 24.05.7.
Upgrade NVIDIA driver to version 570.86.15 (from 550.127.08) for all OSs except AL2.
Upgrade CUDA Toolkit to version 12.8.0 (from 12.4.1) for all OSs except AL2.
Upgrade Python to 3.12.8 for all OSs except AL2 (from 3.9.20).
On Ubuntu 22.04, install the Nvidia driver with the same compiler version used to compile the kernel.
Upgrade aws-cfn-bootstrap to version 2.0-33.
Upgrade EFA installer to 1.38.0 (from 1.36.0).
- Efa-driver: efa-2.13.0-1
- Efa-config: efa-config-1.17-1
- Efa-profile: efa-profile-1.7-1
- Libfabric-aws: libfabric-aws-1.22.0-1
- Rdma-core: rdma-core-54.0-1
- Open MPI: openmpi40-aws-4.1.7-1 and openmpi50-aws-5.0.5
Upgrade amazon-efs-utils to version 2.1.0.
Remove third-party cookbook: apt-7.5.22 and pyenv-4.2.3.
Upgrade third-party cookbook dependencies:
- line-4.5.21 (from line-4.5.13)
- nfs-5.1.5 (from nfs-5.1.2)
- openssh-2.11.14 (from openssh-2.11.12)
- yum-7.4.20 (from yum-7.4.13)
- yum-epel-5.0.8 (from yum-epel-5.0.2)
Upgrade Pmix to 5.0.6 (from 5.0.3).
Upgrade ARM PL to version 24.10 (from 23.10).
Upgrade Python to version 3.12.8 (from 3.9.17) in Lambda layer and installer.
Upgrade NodeJS to version 20.18.3 (from 18.20.3) in Lambda layer and installer.
Remove generation of DSA keys for login nodes, as DSA became unsupported in OpenSSH 9.7+.
Set instance ID and instance type information in Slurm upon compute nodes launch.
Install NVIDIA drivers without the option 'no-cc-version-check', which is now deprecated in the NVIDIA installer.
Add validator to enforce a maximum of 10 login node pools.
Update the default root volume size to 45 GB.
Increase HeadNodeBootstrapTimeout by 5 minutes, making it 35 minutes in total.

BUG FIXES

Remove usage of cfn-init for compute node bootstrapping to reduce node scale up time.
Fix an issue causing compute node bootstrap failure when a proxy is used.
On Ubuntu 22.04, install the NVIDIA driver with the same compiler version used to compile the kernel to prevent installation failures.
Fix the execution of overriding the aws-parallelcluster-node package only on the head node during update.
Fix an issue where containerized jobs executed through Pyxis/Enroot in a multi-user environment (integrated with Active Directory) would fail.
Fix usage of authselect causing node bootstrap failures on Rocky 9.5+ when directory service is used.

18 Dec 22:10

dreambeyondorange

v3.12.0

35ae681

AWS ParallelCluster v3.12.0

We're excited to announce the release of AWS ParallelCluster 3.12.0

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

ENHANCEMENTS

Add new build image configuration section Build/Installation to turn on/off Nvidia software and Lustre client installations. By default, Nvidia software, although included in official ParallelCluster AMIs, is not installed by build-image. By default, Lustre client is installed.
The CLI commands export-cluster-logs and export-image-logs can now by default export the logs to the default ParallelCluster bucket or to the CustomS3Bucket if specified in the config.
Extend Amazon DCV support to Ubuntu2204 on ARM instances.

CHANGES

Upgrade NVIDIA driver to version 550.127.08 (from 550.90.07). This addresses a known issue from NVIDIA.
Upgrade Amazon DCV to version 2024.0-18131.
- server: 2024.0-18131-1
- xdcv: 2024.0.631-1
- gl: 2024.0.1078-1
- web_viewer: 2024.0-18131-1
Upgrade EFA installer to 1.36.0.
- Efa-driver: efa-2.13.0-1
- Efa-config: efa-config-1.17-1
- Efa-profile: efa-profile-1.7-1
- Libfabric-aws: libfabric-aws-1.22.0-1
- Rdma-core: rdma-core-54.0-1
- Open MPI: openmpi40-aws-4.1.7-1 and openmpi50-aws-5.0.5
Auto-restart slurmctld on failure.
Upgrade mysql-community-client to version 8.0.39.
Remove support for Python 3.7 and 3.8, which have reached end of life.

BUG FIXES

Fix an issue where changes in sequence of custom actions scripts were not detected during cluster updates.
Add missing permissions for ParallelCluster API to create the service linked roles for Elastic Load Balancing and Auto Scaling, that are required to deploy login nodes.
Fix an issue in the way the region is retrieved when managing volumes, so that Local Zones are handled correctly.
Fix an issue where adding EFS filesystems with AccessPointIds during an update would fail.
Fix an issue where, when using PCAPI, a cluster update could fail when updating a parameter that is not of type String (e.g. MaxCount).
When mounting an external OpenZFS, it is no longer required to set the outbound rules for ports 111, 2049, 20001, 20002, 20003.

21 Oct 16:54

gmarciani

v3.11.1

c877343

AWS ParallelCluster v3.11.1

We're excited to announce the release of AWS ParallelCluster 3.11.1

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

CHANGES

Pyxis is now disabled by default, so it must be manually enabled as documented in the product documentation.
Upgrade Python runtime to version 3.12 in ParallelCluster Lambda Layer.
Remove version pinning for setuptools to version prior to 70.0.0.
Upgrade libjwt to version 1.17.0.

BUG FIXES

Fix an issue in the way we configure the Pyxis Slurm plugin in ParallelCluster that can lead to job submission failures. See #6459.
Add missing permissions required by login nodes to the public template of policies.

26 Sep 18:26

hgreebe

v3.11.0

df041cd

AWS ParallelCluster v3.11.0

We're excited to announce the release of AWS ParallelCluster 3.11.0

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

ENHANCEMENTS

Add support for custom actions on login nodes.
Allow DCV connection to login nodes.
Add support for ap-southeast-3 region.
Add security groups to login node network load balancer.
Add AllowedIps configuration for login nodes.
Add new configuration SharedStorage/EfsSettings/AccessPointId to specify an optional EFS access point for a mount.
Allow up to 10 login node pools.
Install enroot and pyxis in official pcluster AMIs.

CHANGES

[BREAKING] The loginNodes field returned by the API DescribeCluster and the CLI command describe-cluster has been changed from a dictionary to an array to support multiple pools of login nodes.
This change breaks backward compatibility, making these operations incompatible with clusters deployed with older versions.
Upgrade Slurm to 23.11.10 (from 23.11.7).
Upgrade Pmix to 5.0.3 (from 5.0.2).
Upgrade EFA installer to 1.34.0.
- Efa-driver: efa-2.10.0-1
- Efa-config: efa-config-1.17-1
- Efa-profile: efa-profile-1.7-1
- Libfabric-aws: libfabric-aws-1.22.0-1
- Rdma-core: rdma-core-52.0-1
- Open MPI: openmpi40-aws-4.1.6-3 and openmpi50-aws-5.0.3-11
Upgrade NVIDIA driver to version 550.90.07 (from 535.183.01).
Upgrade CUDA Toolkit to version 12.4.1 (from 12.2.2).
Upgrade Python to 3.9.20 (from 3.9.19).
Upgrade Intel MPI Library to 2021.13.1.769 (from 2021.12.1.8).
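
Scripts that parse describe-cluster output across versions may want to tolerate both shapes of the loginNodes field described in the breaking change above. The following is a minimal sketch under stated assumptions: the helper name is hypothetical, and the sample payload keys other than loginNodes (such as status and poolName) are placeholders for illustration, not a statement of the actual response schema.

```python
def login_node_pools(describe_cluster_response: dict) -> list:
    """Return login node entries as a list, whatever shape the response uses.

    Older responses carry a single dictionary; newer ones carry an array.
    """
    login_nodes = describe_cluster_response.get("loginNodes")
    if login_nodes is None:
        return []
    if isinstance(login_nodes, dict):
        return [login_nodes]  # pre-change shape: wrap the single dictionary
    return list(login_nodes)  # post-change shape: already an array


# Hypothetical payloads; the inner keys are placeholders.
old_style = {"loginNodes": {"status": "active"}}
new_style = {"loginNodes": [{"poolName": "pool-a"}, {"poolName": "pool-b"}]}

print(len(login_node_pools(old_style)))  # 1
print(len(login_node_pools(new_style)))  # 2
print(login_node_pools({}))              # []
```

Normalizing to a list at the parsing boundary keeps the rest of a script independent of which ParallelCluster version produced the output.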

BUG FIXES

Fix validator EfaPlacementGroupValidator so that it does not suggest to configure a Placement Group when Capacity Blocks are used.
Fix occasional cluster creation failures by ensuring that FSx for Lustre file systems are created after security group rules.
Fix cluster deletion failure when placement group is enabled.
Fix issue with login nodes being marked unhealthy when restricting SSH access.
Fix retrieve_supported_regions so that it can get the correct S3 URL.
Fix describe_images to use pagination.
Fix "No route tables found" bug when specifying a default VPC subnet in LoginNodes/Networking/SubnetIds.
