Fix: Enforce deterministic SDFG graph traversal and code generation #2320

Draft
kotsaloscv wants to merge 14 commits into spcl:main from kotsaloscv:deterministic_behavior

@kotsaloscv kotsaloscv commented Mar 11, 2026

Motivation

When generating code via DaCe (e.g., as a backend for GT4Py/ICON4Py), clearing the Python cache between runs can change the structure of the generated code. Although the variants are mathematically identical, the shifted instruction order disrupts C++ compiler heuristics (such as register allocation and loop unrolling), leading to unpredictable performance regressions.

This non-determinism stems from graph traversal operations falling back on memory addresses or volatile generated UUIDs to break ties during dictionary and set iterations, which cascades into DaCe's internal adjacency lists and NetworkX topological sorts.
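
As a minimal illustration of the effect described above (not DaCe code; the `Node` class is a hypothetical stand-in), iterating over a set of plain Python objects follows their default hashes, which derive from memory addresses, whereas sorting on an intrinsic property gives the same order on every run:

```python
class Node:
    """Hypothetical stand-in for a graph node; not a DaCe class."""

    def __init__(self, label):
        self.label = label


nodes = {Node("b"), Node("a"), Node("c")}

# Default object hashes derive from memory addresses, so this order
# is not guaranteed to be the same from one process to the next.
address_order = [n.label for n in nodes]

# Sorting on an intrinsic property yields the same order on every run.
stable_order = [n.label for n in sorted(nodes, key=lambda n: n.label)]
```

Any traversal that consumes `address_order` inherits its instability, which is why the tie-breaking key has to come from the node itself.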

Solution

This PR introduces a deterministic sorting pass (sort_sdfg_alphabetically) that locks the internal memory layout of the SDFG before code generation by sorting elements based on their intrinsic semantic properties rather than their location in memory.

Specifically, this PR:

  1. Semantic Node & Edge Keys: Introduces get_deterministic_node_key and get_deterministic_edge_key. To safely break ties between identical nodes, these functions evaluate graph topology (in/out degrees), interface semantics (connectors), loop parameters, and a stable MD5 hash of internal Tasklet code.
  2. Strips Volatile Hashes: Implements a compiled regex (VOLATILE_STR_REGEX) to strip memory addresses, full UUIDs, and partial 8-character suffix hashes from node labels, connectors, and memlet data payloads.
  3. Deep Graph Stabilization: Recursively sorts global symbol tables (_arrays, symbols), master node/edge dictionaries, and nested $O(1)$ adjacency lists (in_edges, out_edges) in-place.
  4. Rebuilds NetworkX Backends: Tears down and deterministically rebuilds the underlying _nx multigraphs based on the newly stabilized registries.
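
Items 1 and 2 can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: the pattern `VOLATILE_SKETCH_RE`, the helper `strip_volatile`, and the function `sketch_node_key` with its parameters are hypothetical names, and the actual `VOLATILE_STR_REGEX` and `get_deterministic_node_key` may differ in both pattern and key layout.

```python
import hashlib
import re

# Hypothetical pattern (the PR's VOLATILE_STR_REGEX is not shown here).
VOLATILE_SKETCH_RE = re.compile(
    r"_?[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"  # full UUID
    r"|_?0x[0-9a-f]+"  # memory address such as 0x7f3a2b1c
    r"|_[0-9a-f]{8}\b",  # partial 8-character hex suffix
    re.IGNORECASE,
)


def strip_volatile(text):
    """Remove run-dependent fragments from a label or connector name."""
    return VOLATILE_SKETCH_RE.sub("", text)


def sketch_node_key(label, in_degree, out_degree, connectors, code=""):
    """Build a sortable tuple from properties that do not change between runs."""
    code_hash = hashlib.md5(code.encode("utf-8")).hexdigest()
    return (
        strip_volatile(label),
        in_degree,
        out_degree,
        tuple(sorted(connectors)),
        code_hash,
    )


# Two nodes that differ only in a volatile suffix get the same key,
# while nodes with different tasklet code are still told apart.
key_a = sketch_node_key("tasklet_0x7f3a2b1c", 1, 1, {"inp"}, "out = inp + 1")
key_b = sketch_node_key("tasklet_0x55d0aa10", 1, 1, {"inp"}, "out = inp + 1")
key_c = sketch_node_key("tasklet_0x55d0aa10", 1, 1, {"inp"}, "out = inp * 2")
```

One caveat of a suffix heuristic like the third alternative is that it also strips legitimate names that happen to end in an underscore followed by eight hex characters, so the topology and code-hash components of the key carry the tie-breaking in such cases.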

Testing

Added tests/sdfg/deterministic_sort_test.py.
This test manually scrambles the internal dictionaries of an SDFG and SDFGState, applies the sorting pass, and asserts that the resulting adjacency lists and NetworkX nodes are forced back into a strict, predictable topological order based on the new semantic keys.
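
The scramble-then-sort idea behind the test can be sketched with a plain dictionary. This is not the content of tests/sdfg/deterministic_sort_test.py; `sort_dict_in_place` and the `registry` data are hypothetical, and the sketch relies only on the fact that Python dictionaries preserve insertion order.

```python
import random


def sort_dict_in_place(d, key=None):
    """Reorder a dict by key while keeping the same dict object."""
    items = sorted(d.items(), key=key or (lambda kv: kv[0]))
    d.clear()
    d.update(items)


registry = {"a": 1, "b": 2, "c": 3, "d": 4}
registry_id = id(registry)

# Scramble the insertion order, as the test does for the SDFG internals.
rng = random.Random(0)
scrambled = list(registry.items())
rng.shuffle(scrambled)
registry.clear()
registry.update(scrambled)

# Apply the sorting pass and check that the canonical order is restored.
sort_dict_in_place(registry)
restored_order = list(registry)
```

Sorting in place rather than building a new dictionary matters here because other objects may hold references to the same registry.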

@kotsaloscv kotsaloscv requested a review from tbennun March 11, 2026 08:05
@kotsaloscv kotsaloscv marked this pull request as draft March 11, 2026 15:26
@kotsaloscv kotsaloscv changed the title from "Sort SDFG internals alphabetically for deterministic code generation" to "Fix: Enforce deterministic SDFG graph traversal and code generation" Mar 12, 2026
…le node/edge key generation for deterministic sorting
@tbennun tbennun left a comment

This could have some performance implications. Could we make this configurable?

@philip-paul-mueller philip-paul-mueller left a comment

I have some comments and suggestions.

@philip-paul-mueller philip-paul-mueller left a comment

There are still some things that need to be checked, especially the global object.

@philip-paul-mueller philip-paul-mueller left a comment

There are a lot of small things.


sdfg_alphabetical_sorting:
  type: bool
  default: false
I would change the CI configuration so that sorting is enabled whenever auto optimize is activated (and probably also when only simplification is enabled).
You can do that by modifying .github/workflows/general-ci.yml.


3 participants