Metal: extend address-space inference across call arguments #816
Merged
The exception-reporting runtime functions (report_exception and friends) read GPUCompiler's deduced type-name and stack-frame string globals through a generic pointer argument. Out of line, an address-space inference pass can't trace those reads back to the constant globals, so they stay in the flat (generic) space, and Metal's shader validator crashes its compiler service on a generic-space load of a constant global. Force-inlining the functions before InferAddressSpaces lets it resolve the reads to the constant globals, yielding a clean constant-space load that the validator accepts. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Changing a function's signature is unsound if it has callers outside the module, so restrict the IPO address-space pass to functions with internal or private linkage. By the time finish_ir! runs, the pipeline has already internalized everything except the kernel entrypoints, so the targeted runtime helpers still qualify. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
clone_into! drops a parameter's attributes when it is remapped to the entry addrspacecast rather than to a new argument, so reattach them to the retargeted parameters. Also carry the call-site attributes over to the rewritten calls. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Cloning remaps a function's recursive self-calls to the clone but leaves them with the old signature. Collect those self-calls from the clone and rewrite them through the same path as the external call sites, so a narrowed function that calls itself stays well-typed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The narrowing only relocates a side-effect-free addrspacecast across the call boundary, so it is correct for any pointer with a known address space, not just constant globals. Drop the global restriction so device data threaded through helpers benefits too, and rename the predicate accordingly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Cover the narrowing of agreeing call sites, attribute preservation, and the bail-out cases (disagreeing sources, address-taken or externally-visible callees), plus self-recursion and a non-global (device-pointer) source. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Trim the pass comments to the essentials, drop the duplicated rationale at the call site, and remove em dashes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A single sweep narrows a function only once all its callers already pass an addrspacecast from a specific address space, so a constant reaching a deep callee through a delegating helper was missed unless the functions happened to be visited in the right order. Iterate until nothing changes, so that narrowing is order-independent and transitive; this lets back-ends implement exception reporters by delegation instead of duplicating their bodies. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
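Why a single sweep is order-dependent can be seen in a small toy model of the call graph. Everything below is hypothetical and not GPUCompiler's code: the Func/Call classes, the functions source_space, narrow_once and propagate, the restriction to one pointer parameter per function, and the use of 2 to stand in for Metal's constant space. It is only a sketch of the "all callers agree" rule plus the iterate-until-no-change loop, under those assumptions.

```python
from dataclasses import dataclass

GENERIC = 0   # generic (flat) address space
CONSTANT = 2  # stands in for Metal's constant space, addrspace(2)


@dataclass
class Func:
    name: str
    internal: bool = True       # internal/private linkage: signature may change
    param_space: int = GENERIC  # space of its single pointer parameter


@dataclass
class Call:
    caller: str  # function containing the call
    callee: str  # function being called
    arg: tuple   # ("global", space), or ("param",) to forward the caller's parameter


def source_space(call, funcs):
    """Specific space the argument was cast from, or None if unknown."""
    if call.arg[0] == "global":
        return call.arg[1] if call.arg[1] != GENERIC else None
    # Forwarding the caller's parameter: once the caller has been narrowed,
    # its entry cast makes the argument an addrspacecast(specific -> generic).
    space = funcs[call.caller].param_space
    return space if space != GENERIC else None


def narrow_once(name, funcs, calls):
    """Narrow one function if all of its call sites agree on a specific space."""
    f = funcs[name]
    if not f.internal or f.param_space != GENERIC:
        return False
    spaces = {source_space(c, funcs) for c in calls if c.callee == name}
    if len(spaces) != 1 or None in spaces:
        return False  # no callers, disagreeing sources, or an unknown source
    f.param_space = spaces.pop()
    return True


def propagate(funcs, calls, order, fixed_point=True):
    """Sweep in `order`; optionally repeat until no function changes.

    Terminates because a parameter only ever moves from generic to specific.
    """
    changed = True
    while changed:
        changed = False
        for name in order:
            changed |= narrow_once(name, funcs, calls)
        if not fixed_point:
            break
    return {n: f.param_space for n, f in funcs.items()}


def build():
    """kernel (external) -> helper -> reporter, with a constant global at the top."""
    funcs = {n: Func(n) for n in ("kernel", "helper", "reporter")}
    funcs["kernel"].internal = False  # entrypoint keeps its external linkage
    calls = [
        Call("kernel", "helper", ("global", CONSTANT)),  # passes a constant global
        Call("helper", "reporter", ("param",)),          # delegates to the reporter
    ]
    return funcs, calls


order = ["reporter", "helper", "kernel"]  # callee visited before its caller
print(propagate(*build(), order, fixed_point=False))  # {'kernel': 0, 'helper': 2, 'reporter': 0}
print(propagate(*build(), order))                     # {'kernel': 0, 'helper': 2, 'reporter': 2}
```

Visiting the reporter before its delegating helper, one sweep narrows only the helper; the second iteration sees the helper's now-specific parameter and narrows the reporter as well. Visiting callers first would happen to work in one sweep, which is exactly the order sensitivity the fixed-point loop removes.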
Problem
InferAddressSpaces is intraprocedural. A pointer that reaches a load through a function parameter is invisible to it, because the parameter's address space was established at the call site, one frame up. On Metal that gap is fatal: a constant global that GPUCompiler reads through an out-of-line runtime function (the exception reporters, the -g2 debug strings) arrives as a generic pointer and is read with a generic-space load, and the shader validator crashes on it.
At -g2 the deduced exception name lives in the constant space, but the reporter still reads it through a generic parameter. add_global_address_spaces! has already placed the string in addrspace(2), but the cast back to generic at the call site is exactly the provenance InferAddressSpaces cannot follow into the callee.
Solution
propagate_argument_address_spaces! supplies the missing call-edge inference. When every caller passes the same addrspacecast(specific -> generic) for a generic pointer parameter of an internal function, it retargets the parameter to that space, recreates the cast on entry so the body is untouched, and passes the source directly at the call sites.
The existing InferAddressSpaces run that follows then finishes the job intraprocedurally, folding the entry cast away. The two together give whole-program inference for the case where all callers agree on the source space.
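The two-step rewrite (retarget the parameter and recreate the cast on entry, then fold intraprocedurally) can be illustrated with a toy IR in Python. The tuple encoding of instructions, the ".spec" parameter suffix, the names kernel, reporter and @name, and the functions narrow_param, fold_casts and make_example are all hypothetical. fold_casts is a drastically simplified stand-in for what InferAddressSpaces does inside one function, not LLVM's algorithm, and narrow_param assumes the agreement check has already passed.

```python
GENERIC, CONSTANT = 0, 2  # generic space and (Metal) constant space

# Toy IR, one tuple per instruction:
#   ("cast", dst, src, from_space, to_space)
#   ("load", dst, ptr, space)
#   ("call", callee_name, [args])


def operands(inst):
    """Values an instruction reads."""
    return inst[2] if inst[0] == "call" else [inst[2]]


def narrow_param(callee, callers, space):
    """Retarget callee's pointer parameter to `space`.

    Assumes every call site passes a cast from `space` to generic.
    """
    [(pname, pspace)] = callee["params"]
    assert pspace == GENERIC
    spec = pname + ".spec"
    callee["params"] = [(spec, space)]
    # Re-create the cast on entry so the existing body keeps using `pname`.
    callee["body"].insert(0, ("cast", pname, spec, space, GENERIC))
    for caller in callers:
        casts = {i[1]: i for i in caller["body"] if i[0] == "cast"}
        body = []
        for inst in caller["body"]:
            if inst[0] == "call" and inst[1] == callee["name"]:
                # Pass each cast's source directly instead of the generic pointer.
                args = []
                for a in inst[2]:
                    _, _, src, frm, to = casts[a]
                    assert (frm, to) == (space, GENERIC)
                    args.append(src)
                inst = ("call", inst[1], args)
            body.append(inst)
        caller["body"] = body


def fold_casts(func):
    """Drastically simplified, intraprocedural stand-in for InferAddressSpaces."""
    casts = {i[1]: i for i in func["body"]
             if i[0] == "cast" and i[3] != GENERIC and i[4] == GENERIC}
    body = []
    for inst in func["body"]:
        if inst[0] == "load" and inst[2] in casts:
            _, _, src, frm, _ = casts[inst[2]]
            inst = ("load", inst[1], src, frm)  # load from the source's own space
        body.append(inst)
    used = {op for i in body for op in operands(i)}
    # Drop casts that nothing reads any more.
    func["body"] = [i for i in body if i[0] != "cast" or i[1] in used]


def make_example():
    kernel = {"name": "kernel", "params": [], "body": [
        ("cast", "%p", "@name", CONSTANT, GENERIC),  # constant global cast to generic
        ("call", "reporter", ["%p"]),
    ]}
    reporter = {"name": "reporter", "params": [("%msg", GENERIC)], "body": [
        ("load", "%c", "%msg", GENERIC),  # the generic-space load the validator crashes on
    ]}
    return kernel, reporter


kernel, reporter = make_example()
fold_casts(reporter)  # intraprocedural only: the load stays generic
print(reporter["body"])

kernel, reporter = make_example()
narrow_param(reporter, [kernel], CONSTANT)
fold_casts(reporter)
fold_casts(kernel)
print(reporter["params"], reporter["body"])
print(kernel["body"])
```

Folding the untouched reporter on its own changes nothing, which mirrors the intraprocedural blind spot described under Problem. After narrow_param, the same local folding turns the reporter's load into a constant-space load and leaves the kernel passing @name directly.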