Fixed the random number generator issue on large # of GPUs #177

Merged: haykh merged 3 commits into master from 1.3.3rc on Feb 9, 2026

@haykh (Collaborator) commented Feb 9, 2026

The issue was caused by the unnecessary allocation of random_number_pool_t objects on every Domain of the Metadomain, including the placeholder domains that do not belong to the given rank. The fix wraps the object in std::optional and retrieves it with .value() only when .has_value() is true.

API changes

  • The random pool is now accessed via the random_pool() method of the domain instead of directly through the random_pool object. Calling this method on a placeholder domain throws an error.
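
The pattern described above can be illustrated with a minimal, standalone sketch. It is not the code from this PR: RandomPool is a hypothetical stand-in for random_number_pool_t, and the Domain class, its constructors, is_placeholder(), and make_metadomain() are simplified assumptions made only for illustration. Only the use of std::optional, the random_pool() accessor name, and the throw-on-placeholder behavior come from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for random_number_pool_t. In a real code this
// object would own device memory, which is why allocating it on every
// placeholder domain is wasteful.
struct RandomPool {
  std::uint64_t seed;
  explicit RandomPool(std::uint64_t s) : seed(s) {}
};

// Hypothetical, simplified Domain: only a domain owned by the current
// rank holds a pool; placeholder domains keep the optional empty.
class Domain {
  std::optional<RandomPool> m_random_pool;

public:
  // Placeholder domain: nothing is allocated.
  Domain() = default;

  // Local domain: the pool is constructed in place.
  explicit Domain(std::uint64_t seed) : m_random_pool(std::in_place, seed) {}

  bool is_placeholder() const { return !m_random_pool.has_value(); }

  // Accessor replacing direct member access; throws on placeholder domains.
  RandomPool& random_pool() {
    if (!m_random_pool.has_value()) {
      throw std::runtime_error("random_pool() called on a placeholder domain");
    }
    return m_random_pool.value();
  }
};

// Builds one Domain per rank; only the domain at index my_rank gets a pool.
std::vector<Domain> make_metadomain(std::size_t n_ranks, std::size_t my_rank) {
  assert(my_rank < n_ranks);
  std::vector<Domain> domains;
  domains.reserve(n_ranks);
  for (std::size_t r = 0; r < n_ranks; ++r) {
    if (r == my_rank) {
      domains.emplace_back(std::uint64_t { 1234 } + r);
    } else {
      domains.emplace_back();
    }
  }
  return domains;
}
```

With this layout the number of allocated pools per rank stays at one regardless of the total number of ranks, and an accidental access on a placeholder domain fails loudly instead of touching an unused object.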

@haykh haykh self-assigned this Feb 9, 2026
@haykh haykh added the bug Something isn't working label Feb 9, 2026
@haykh haykh marked this pull request as ready for review February 9, 2026 22:47
@haykh haykh merged commit 5bf97fd into master Feb 9, 2026
5 checks passed
@haykh haykh deleted the 1.3.3rc branch February 9, 2026 22:55

Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

  • [BUG] Memory leaks and crashes on AMD MI300A APU
  • [BUG] Memory allocation error from Kokkos random number generator