Metal: extend address-space inference across call arguments #816

Merged: maleadt merged 9 commits into main from tb/infer_as_ipo, Jun 1, 2026

maleadt commented Jun 1, 2026

Problem

InferAddressSpaces is intraprocedural. A pointer that reaches a load through a function parameter is invisible to it, because the parameter's address space was established at the call site, one frame up. On Metal that gap is fatal: a GPUCompiler constant global that is read through an out-of-line runtime function (the exception reporters, the -g2 debug strings) arrives as a generic pointer and is read with a generic-space load, and the shader validator crashes on it.

At -g2 the deduced exception name lives in the constant space, but the reporter still reads it through a generic parameter:

@exception = private addrspace(2) constant [10 x i8] c"exception\00"

; caller
call void @report_exception(ptr addrspacecast (ptr addrspace(2) @exception to ptr))

define internal void @report_exception(ptr %ex) {
  %c = load i8, ptr %ex          ; generic load: validator crashes
}

add_global_address_spaces! already placed the string in addrspace(2), but the cast back to generic at the call site is exactly the provenance InferAddressSpaces cannot follow into the callee.

Solution

propagate_argument_address_spaces! supplies the missing call-edge inference. When every caller passes the same addrspacecast(specific -> generic) for a generic pointer parameter of an internal function, it retargets the parameter to that space, recreates the cast on entry so the body is untouched, and passes the source directly at the call sites:

; caller
call void @report_exception(ptr addrspace(2) @exception)

define internal void @report_exception(ptr addrspace(2) %ex) {
  %gen = addrspacecast ptr addrspace(2) %ex to ptr   ; recreated on entry
  %c = load i8, ptr %gen
}

The InferAddressSpaces run that already follows then finishes the job intraprocedurally, folding the entry cast away:

define internal void @report_exception(ptr addrspace(2) %ex) {
  %c = load i8, ptr addrspace(2) %ex                 ; constant-space load
}

The two together give whole-program inference for the case where all callers agree on the source space.
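
The narrowing condition can be modeled in a few lines. What follows is a toy Python sketch, not the Julia implementation in this PR: `Function`, `Cast`, `Direct`, `agreed_source_space`, and `narrow` are hypothetical names, each function is assumed to have a single pointer parameter, and address space 0 stands in for generic. It only illustrates when a parameter would be retargeted (internal linkage, every caller casting from the same specific space) and when the pass would bail out. Address-taken callees, attributes, and the entry cast are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

GENERIC = 0  # assumption: 0 stands in for the generic/flat address space


@dataclass(frozen=True)
class Cast:
    """Call argument of the form addrspacecast(ptr addrspace(src) -> ptr)."""
    src: int


@dataclass(frozen=True)
class Direct:
    """Call argument passed in its own address space, without a cast."""
    space: int


@dataclass
class Function:
    name: str
    internal: bool                      # internal/private linkage
    param_space: int = GENERIC          # space of the single pointer parameter
    call_args: List[Union[Cast, Direct]] = field(default_factory=list)


def agreed_source_space(fn: Function) -> Optional[int]:
    """Return the one specific space all callers cast from, else None."""
    if not fn.internal or fn.param_space != GENERIC or not fn.call_args:
        return None
    spaces = set()
    for arg in fn.call_args:
        if not isinstance(arg, Cast) or arg.src == GENERIC:
            return None                 # this caller has no known specific source
        spaces.add(arg.src)
    return spaces.pop() if len(spaces) == 1 else None


def narrow(fn: Function) -> bool:
    """Retarget the parameter and pass the cast source directly at call sites."""
    space = agreed_source_space(fn)
    if space is None:
        return False
    fn.param_space = space
    fn.call_args = [Direct(space) for _ in fn.call_args]
    return True
```

In this model, `narrow(Function("report_exception", internal=True, call_args=[Cast(2)]))` returns `True` and leaves `param_space == 2`. With callers that cast from different spaces, or a callee that is not internal, it returns `False` and changes nothing.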

maleadt and others added 9 commits May 30, 2026 23:35
The exception-reporting runtime functions (report_exception and friends)
read GPUCompiler's deduced type-name and stack-frame string globals through
a generic pointer argument. Out of line, an address-space inference pass
can't trace those reads back to the constant globals, so they stay in the
flat/generic space, and Metal's shader validator crashes its compiler
service on a generic-space load of a constant global. Force-inlining the
functions before InferAddressSpaces lets it resolve the reads to the
constant globals (a clean constant-space load the validator accepts).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Changing a function's signature is unsound if it has callers outside the
module, so restrict the IPO address-space pass to internal/private linkage.
By finish_ir! the pipeline has already internalized everything but the
kernel entrypoints, so the targeted runtime helpers still qualify.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

clone_into! drops a parameter's attributes when it is remapped to the entry
addrspacecast rather than to a new argument, so reattach them to the
retargeted parameters. Also carry the call-site attributes over to the
rewritten calls.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Cloning remaps a function's recursive self-calls to the clone but leaves
them with the old signature. Collect those self-calls from the clone and
rewrite them through the same path as the external call sites, so a
narrowed function that calls itself stays well-typed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The narrowing only relocates a side-effect-free addrspacecast across the
call boundary, so it is correct for any pointer with a known address space,
not just constant globals. Drop the global restriction so device data
threaded through helpers benefits too, and rename the predicate accordingly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Cover the narrowing of agreeing call sites, attribute preservation, and the
bail-out cases (disagreeing sources, address-taken or externally-visible
callees), plus self-recursion and a non-global (device-pointer) source.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Trim the pass comments to the essentials, drop the duplicated rationale at
the call site, and remove em dashes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

A single sweep only narrows a function once all its callers already pass an
addrspacecast-from-specific, so a constant reaching a deep callee through a
delegating helper was missed unless functions happened to be visited in the
right order. Iterate until no change so narrowing is order-independent and
transitive; this lets back-ends delegate exception reporters instead of
duplicating their bodies.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
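
The order dependence described in the last commit message can be reproduced with a toy model. This is a hedged Python sketch, not the pass itself: the tuple encoding of call arguments and the names `arg_space`, `sweep`, and `propagate` are assumptions made for illustration. Each function has one pointer parameter, address space 0 stands in for generic, and linkage checks, attributes, and self-recursion are left out. A call argument is either a value with a known specific space or the caller's own parameter forwarded to the callee.

```python
GENERIC = 0  # assumption: 0 stands in for the generic/flat address space


def arg_space(arg, param_space):
    """Space a call argument is cast from, or GENERIC if not yet known.

    `arg` is ("const", space) for a value in a specific space, or
    ("param", caller) when the caller forwards its own parameter.
    """
    kind, value = arg
    return value if kind == "const" else param_space[value]


def sweep(order, calls, param_space):
    """Visit every function once; narrow those whose callers all agree."""
    changed = False
    for fn in order:
        if param_space[fn] != GENERIC:
            continue                     # already narrowed
        spaces = {arg_space(arg, param_space)
                  for callee, arg in calls if callee == fn}
        if len(spaces) == 1 and GENERIC not in spaces:
            param_space[fn] = spaces.pop()
            changed = True
    return changed


def propagate(order, calls, param_space):
    """Repeat sweeps until nothing changes; return how many sweeps narrowed."""
    productive = 0
    while sweep(order, calls, param_space):
        productive += 1
    return productive
```

With `calls = [("helper", ("const", 2)), ("report", ("param", "helper"))]` and the unlucky visiting order `["report", "helper"]`, one call to `sweep` narrows only `helper`, because `report` is visited while its caller's parameter is still generic. `propagate` needs a second sweep to reach `report`, after which both parameters are in space 2 regardless of the order.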
maleadt merged commit 66ac4d5 into main on Jun 1, 2026
36 of 37 checks passed
maleadt deleted the tb/infer_as_ipo branch on June 1, 2026 14:03