
ssa: emit go semantic metadata #1728

Open
luoliwoshang wants to merge 35 commits into goplus:main from luoliwoshang:codex/useiface-metadata-producer

Conversation

@luoliwoshang
Member

@luoliwoshang luoliwoshang commented Mar 18, 2026

Background

Ordinary call/ref reachability only sees direct symbol references, but in Go the question of which methods must be kept also depends on additional semantics such as:

  • interface conversions
  • interface method calls
  • MethodByName
  • conservative reflection handling

An ordinary reference graph alone cannot answer:

  • which concrete types enter the interface dispatch domain
  • which interface method slots are actually demanded
  • which method names are requested by name
  • which sites must fall back to conservative reflection handling
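
To make the gap concrete, here is a minimal Go sketch. The type and function names are illustrative and are not taken from this PR. Neither method of `greeter` is ever referenced by a direct call: `Hello` is reached only through interface dispatch, and `Bye` only through `reflect`'s `MethodByName`. A pass that follows direct symbol references alone would see no use of either method.

```go
package main

import (
	"fmt"
	"reflect"
)

// Speaker is a hypothetical interface used only for illustration.
type Speaker interface {
	Hello() string
}

// greeter is a hypothetical concrete type. Nothing below calls
// greeter.Hello or greeter.Bye through a direct symbol reference.
type greeter struct{}

func (greeter) Hello() string { return "hello" }
func (greeter) Bye() string   { return "bye" }

// viaInterface reaches Hello through an interface method slot:
// the call site names Speaker.Hello, not greeter.Hello.
func viaInterface(s Speaker) string {
	return s.Hello()
}

// viaName reaches a method by its name string through reflection.
// It returns "" when the method is missing or does not have the
// shape func() string.
func viaName(v any, name string) string {
	m := reflect.ValueOf(v).MethodByName(name)
	if !m.IsValid() || m.Type().NumIn() != 0 || m.Type().NumOut() != 1 {
		return ""
	}
	out := m.Call(nil)
	if out[0].Kind() != reflect.String {
		return ""
	}
	return out[0].String()
}

func main() {
	var s Speaker = greeter{} // interface conversion: greeter enters the dispatch domain
	fmt.Println(viaInterface(s))
	fmt.Println(viaName(greeter{}, "Bye"))
}
```

The interface conversion, the interface method call, and the name-based lookup in this sketch correspond to the first three bullets of each list above.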

This PR is not the end of the full pipeline. It is one implementation stage of the proposal: it emits these Go semantic facts as llgo.xxx metadata during ssa/cl, and adds stable readback for them.

This PR corresponds to the producer / readback portion of:

  • llgo#1727: Proposal: Semantic Pruning of Unreachable Methods from a Global Graph

This PR only covers the producer / readback layer of the proposal. It does not include:

  • whole-program analysis after metadata aggregation
  • global method liveness driven by OrdinaryEdges / TypeChildren

What This PR Does

1. Emit Go semantic metadata during ssa/cl

This PR adds and wires up the following named metadata:

  • !llgo.useiface
  • !llgo.useifacemethod
  • !llgo.interfaceinfo
  • !llgo.methodinfo
  • !llgo.usenamedmethod
  • !llgo.reflectmethod

They represent:

  • useiface
    • which concrete types enter the interface semantic domain once an owner becomes reachable
  • useifacemethod
    • which interface method demands are produced once an owner becomes reachable
  • interfaceinfo
    • the complete method set of an interface
  • methodinfo
    • the method-table slots of a concrete type, plus MType / IFn / TFn
  • usenamedmethod
    • which method names are requested exactly through MethodByName
  • reflectmethod
    • which owners must fall back to conservative reflection handling

2. Define the metadata encoding baseline

The current encoding is row-oriented to keep emission and later aggregation simple:

  • llgo.interfaceinfo
    • one row per interface method
  • llgo.methodinfo
    • one row per method slot

In particular:

  • methodinfo is emitted only when a type actually has method slots
  • unexported method names are normalized with package qualification
  • MethodInfo keeps Index / MethodSig / IFn / TFn, matching the needs of later analysis
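
As a rough illustration of the name normalization mentioned above, the following sketch qualifies unexported method names with their package path. The function name `normalizeMethodName` and the `pkgPath.name` separator are assumptions made for this example; the encoding actually used by this PR may differ. The motivation is a Go language rule: unexported identifiers declared in different packages are distinct even when they are spelled the same, so an unqualified name would be ambiguous after whole-program merge.

```go
package main

import (
	"fmt"
	"go/token"
)

// normalizeMethodName is a hypothetical helper. Exported names are kept
// as-is; unexported names are qualified with the package path so that
// foo/bar.format and baz.format stay distinct. The "." separator is an
// assumption for illustration only.
func normalizeMethodName(pkgPath, name string) string {
	if token.IsExported(name) {
		return name
	}
	return pkgPath + "." + name
}

func main() {
	fmt.Println(normalizeMethodName("foo/bar", "Format"))
	fmt.Println(normalizeMethodName("foo/bar", "format"))
	fmt.Println(normalizeMethodName("baz", "format"))
}
```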

3. Distinguish the emission ownership of MethodInfo and InterfaceInfo

Both are inputs to later analysis, but they intentionally use different emission strategies in the current implementation.

MethodInfo

MethodInfo is tied to the method-table layout of a concrete type. Its ownership is stronger, and duplicate emission is much noisier.

So this PR tightens its emission:

  • for ordinary imported, non-generic named types
    • llgo.methodinfo is no longer emitted redundantly at use sites
  • their method-slot metadata is emitted only by the defining package

Generic instances remain conservatively allowed at use sites for now:

  • current LLGo still materializes generic instance methods primarily at use sites
  • a concrete imported generic named instance may only be instantiated and compiled in the current package
  • if imported generic instances were forced into definition-site-only emission now, metadata could be missing even though the instance methods were materialized locally

So the current policy is:

  • ordinary imported, non-generic named types
    • deduplicated; emitted only by the defining package
  • imported generic instances
    • still conservatively allowed at use sites, prioritizing completeness

InterfaceInfo

InterfaceInfo records the complete method set of an interface. In principle it also looks like definition-site information, but under current LLGo interface lowering, named interfaces are often erased early into their underlying *types.Interface. That makes it unreliable to recover a stable named owner from the definition-side type materialization path.

So the current implementation chooses:

  • emit InterfaceInfo at the use site
  • specifically, along the path that already produces the interface method demand

This is intentional because:

  • that path always has the relevant interface shape in hand
  • duplicate InterfaceInfo contributions from multiple modules remain semantically safe after whole-program merge
  • this is more reliable than forcing a definition-site-only rule under the current lowering model

In other words, this PR intentionally treats the two differently:

  • MethodInfo
    • heavier and more strongly bound to concrete type ownership, so duplicate emission is reduced aggressively
  • InterfaceInfo
    • definition-site ownership is unstable under current lowering, so use-site emission is preferred to keep the input complete

4. Add internal/semmeta

This PR adds internal/semmeta, which is responsible for:

  • reading llgo.xxx metadata back from a single llvm.Module
  • folding it into semantic ModuleInfo
  • owning the metadata protocol on both the write side and the read side

This layer only handles semantic metadata. It does not handle:

  • ordinary LLVM reference graphs
  • TypeChildren
  • whole-program aggregation
  • DCE analyze / rewrite

5. Use the new metadata read API from goplus/llvm

This PR also switches to the metadata read API that has already landed in goplus/llvm, removing the temporary cgo-based reader previously used by tests.

The current code directly uses:

  • Module.NamedMetadataOperands
  • Value.MDNodeOperands
  • Value.MDString

to read named metadata back from llvm.Module.

6. Add llgen.GenModuleFrom

This PR adds GenModuleFrom to internal/llgen, so callers can directly obtain an llvm.Module.

That makes it possible to use the same llvm.Module for both:

  1. obtaining IR text
  2. reading semantic metadata back

without a stringify-then-reparse detour.

7. Build a stable semantic view for metadata

The readback result is not a raw row dump. It is a stable semantic view grouped by meaning, for example:

  • UseIface
  • UseIfaceMethod
  • InterfaceInfo
  • MethodInfo
  • UseNamedMethod
  • ReflectMethod

This captures semantic content rather than LLVM metadata node numbering or row layout details.
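
One way to picture such a view is a renderer that groups rows by owner, drops duplicates, and sorts before printing, so the text does not depend on row order. This is a sketch under assumptions: the `row` type, the `render` function, and the output layout are hypothetical and are not the format produced by internal/semmeta.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// row is a hypothetical raw metadata row: an owner symbol and a target.
type row struct{ Owner, Target string }

// render groups rows by owner, removes duplicates, and sorts both
// owners and targets, so the result is independent of row order.
func render(section string, rows []row) string {
	byOwner := map[string]map[string]bool{}
	for _, r := range rows {
		if byOwner[r.Owner] == nil {
			byOwner[r.Owner] = map[string]bool{}
		}
		byOwner[r.Owner][r.Target] = true
	}
	owners := make([]string, 0, len(byOwner))
	for o := range byOwner {
		owners = append(owners, o)
	}
	sort.Strings(owners)

	var b strings.Builder
	fmt.Fprintf(&b, "[%s]\n", section)
	for _, o := range owners {
		targets := make([]string, 0, len(byOwner[o]))
		for t := range byOwner[o] {
			targets = append(targets, t)
		}
		sort.Strings(targets)
		fmt.Fprintf(&b, "%s:\n", o)
		for _, t := range targets {
			fmt.Fprintf(&b, "  %s\n", t)
		}
	}
	return b.String()
}

func main() {
	rows := []row{{"main.f", "T1"}, {"main.a", "T2"}, {"main.f", "T1"}}
	fmt.Print(render("UseIface", rows))
}
```

A golden file compared against text like this changes only when the semantic content changes, not when LLVM renumbers or reorders metadata nodes.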

Current Scope

This PR only covers the producer / readback layer of the proposal. It does not include:

  • whole-program DCE analysis after metadata aggregation
  • global method liveness driven by OrdinaryEdges / TypeChildren

In other words, this PR establishes the input contract and observability foundation required by the later algorithm.

Representative Coverage

This PR covers several important semantic categories:

  • ifaceconv
    • interface conversions driving useiface
  • reader / invoke / interface
    • interface method demands
  • embedunexport / geometry1370
    • interface methods with unexported names
  • reflectmk
    • Method, MethodByName, and conservative reflection
  • abimethod
    • named / anonymous / promoted methods and embedded method sets
  • cursor / reflectconv / abinamed
    • large method-slot sets and more complex type coverage

Follow-up Work

Later PRs will continue from here with:

  1. whole-program aggregation of OrdinaryEdges, TypeChildren, and semantic metadata
  2. the analysis that computes type symbol -> live method indexes

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the LLGo compiler's SSA generation by adding a mechanism to emit semantic metadata. Specifically, it now tracks and annotates instances where concrete types are converted to interface types using !llgo.useiface metadata. This change provides more detailed information in the generated LLVM IR, which can be beneficial for downstream analysis or optimization passes. The update also includes the necessary regeneration of test output files to align with this new metadata emission.

Highlights

  • Semantic Metadata Emission: Introduced a new package-level semantic metadata emitter infrastructure within the SSA (Static Single Assignment) form generation.
  • Interface Conversion Metadata: Implemented the emission of !llgo.useiface metadata for non-interface to interface conversions originating from the MakeInterface operation.
  • Test File Regeneration: Regenerated numerous existing IR golden files to reflect the newly introduced metadata, ensuring consistency and correctness.

@xgopilot
Contributor

xgopilot bot commented Mar 18, 2026

Overall: Clean, well-scoped PR. The semantic metadata emitter is a solid foundation for recording interface conversions. The golden file updates are consistent. A few items below worth addressing — mainly a redundant abiTypeGlobal call and a missing license header.

Comment thread ssa/interface.go Outdated
Comment thread ssa/metadata.go
Comment thread ssa/metadata.go Outdated
Comment thread ssa/metadata.go Outdated
@luoliwoshang luoliwoshang force-pushed the codex/useiface-metadata-producer branch from 1958ac9 to 8e77b05 on March 18, 2026 06:58

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces an infrastructure for emitting package-level semantic metadata in the SSA backend and uses it to emit !llgo.useiface metadata for non-interface to interface conversions. The implementation includes a new semanticMetadataEmitter to manage and prevent duplicate metadata entries in the LLVM module. The changes are well-structured, with a logical refactoring in abitype.go to support the new functionality in interface.go. The code is clean, correct, and the regenerated golden files confirm the intended behavior. Overall, this is a solid improvement to the compiler's metadata emission capabilities.

@luoliwoshang luoliwoshang changed the title from "Add llgo.useiface metadata emission" to "ssa: emit llgo.useiface metadata" on Mar 18, 2026
@luoliwoshang luoliwoshang force-pushed the codex/useiface-metadata-producer branch from 57e867d to b0f7c04 on March 18, 2026 08:53
@codecov

codecov bot commented Mar 18, 2026

Codecov Report

❌ Patch coverage is 96.65272% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.56%. Comparing base (2c3c5c1) to head (957d908).
⚠️ Report is 4 commits behind head on main.

Files with missing lines       Patch %   Lines
internal/semmeta/semmeta.go    95.74%    3 Missing and 3 partials ⚠️
ssa/metadata.go                66.66%    1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1728      +/-   ##
==========================================
+ Coverage   88.44%   88.56%   +0.12%     
==========================================
  Files          50       52       +2     
  Lines       13656    13891     +235     
==========================================
+ Hits        12078    12303     +225     
- Misses       1369     1374       +5     
- Partials      209      214       +5     

@luoliwoshang luoliwoshang force-pushed the codex/useiface-metadata-producer branch 4 times, most recently from 646581d to 1b2e89b on March 18, 2026 10:13
@luoliwoshang luoliwoshang changed the title from "ssa: emit llgo.useiface metadata" to "ssa: emit deadcode metadata" on Mar 18, 2026
@luoliwoshang luoliwoshang force-pushed the codex/useiface-metadata-producer branch 4 times, most recently from 728feba to a4b76a5 on March 19, 2026 02:47
Comment thread ssa/mdtest/metadata.go Outdated
Member Author

@luoliwoshang luoliwoshang Mar 19, 2026

will use goplus/llvm instead

Member Author

Already switched to goplus/llvm in commit "ssa: use llvm metadata read APIs".

@luoliwoshang luoliwoshang force-pushed the codex/useiface-metadata-producer branch 3 times, most recently from 3e35f66 to 7222776 on March 19, 2026 11:09
@luoliwoshang luoliwoshang changed the title from "ssa: emit deadcode metadata" to "ssa: emit go semantic metadata" on Mar 19, 2026
@luoliwoshang luoliwoshang force-pushed the codex/useiface-metadata-producer branch 4 times, most recently from b2b62bb to 6cad6b2 on March 20, 2026 07:14
@luoliwoshang luoliwoshang force-pushed the codex/useiface-metadata-producer branch 3 times, most recently from f41a4c8 to e8fbec6 on April 15, 2026 09:42
@luoliwoshang luoliwoshang force-pushed the codex/useiface-metadata-producer branch from e8fbec6 to ee34c8d on April 15, 2026 10:17
@luoliwoshang luoliwoshang changed the title from "[wip] ssa: emit go semantic metadata" to "ssa: emit go semantic metadata" on Apr 15, 2026
@zhouguangyuan0718
Contributor

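I would not recommend adding meta-expect.txt-style test cases here. They have the same problem as the earlier out.ll files: they are not targeted enough, and they are affected by changes in unrelated parts. I suggest adding some targeted test cases that reuse the same LITTEST mechanism for checking, rather than broadly testing the output of a large amount of code.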

@luoliwoshang
Member Author

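> I would not recommend adding meta-expect.txt-style test cases here. They have the same problem as the earlier out.ll files: they are not targeted enough, and they are affected by changes in unrelated parts.

Understood. Emitting a meta-expect.txt for every cl/testxxx creates too much noise, so I plan to switch to targeted cases that verify the metadata output. That said, as long as the emitter output logic and the abi.Type names are stable, this regression output should not change either; and if the abi.Type names do change, an update here is actually expected.

> I suggest adding some targeted test cases that reuse the same LITTEST mechanism for checking, rather than broadly testing the output of a large amount of code.

By reusing the LITTEST mechanism, do you mean adding a corresponding in.go under cl/testxxx and then annotating it to match the metadata nodes emitted in the LLVM module?
What this regression is really meant to guard is the stability of the metadata output, not the metadata node layout in llvm.Module (if caching gets involved later, the metadata may end up in other files). So I still lean toward comparing the serialized semantic module info, which is also more readable.

So I would lean toward adding targeted cases such as ifaceuse, interface, and reflect under cl/_testmeta, where each package contains an in.go plus a meta-expect.txt in its current shape.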

@zhouguangyuan0718
Contributor

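> Understood. Emitting a meta-expect.txt for every cl/testxxx creates too much noise, so I plan to switch to targeted cases that verify the metadata output. That said, as long as the emitter output logic and the abi.Type names are stable, this regression output should not change either; and if the abi.Type names do change, an update here is actually expected.

The main concern is the extra maintenance. For example, if a type name, an ordering, or a field in an unrelated test case changes for some other reason, the metadata output has to be updated as well. That is similar to the current situation, where an occasional change forces a refresh of many out.ll files.

> So I would lean toward adding targeted cases such as ifaceuse, interface, and reflect under cl/_testmeta, where each package contains an in.go plus a meta-expect.txt in its current shape.

Yes, I agree with that approach. My main point is that isolating what each test case focuses on is better than broadly checking all output.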

@luoliwoshang luoliwoshang force-pushed the codex/useiface-metadata-producer branch 5 times, most recently from 4c2a85e to 038df93 on April 16, 2026 14:26
@luoliwoshang luoliwoshang force-pushed the codex/useiface-metadata-producer branch from 038df93 to 3b43c7f on April 16, 2026 14:35
Comment thread ssa/abitype.go Outdated
	if obj == nil || obj.Pkg() == nil {
		return true
	}
	return abi.PathOf(obj.Pkg()) == p.Path()
Contributor

@zhouguangyuan0718 zhouguangyuan0718 Apr 16, 2026

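One question here: it looks like this only evaluates to true when the type is used in its defining package. For an imported type that is not used in the defining package and is only used in other packages, would the MethodInfo for that type be missing?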

Member Author

#1728 (comment)
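That is indeed a mis-optimization. MethodInfo is currently emitted on demand along with abiUncommonMethods/abiType, rather than being produced uniformly by the defining package, so it should not be restricted to emission in the owner package only. I will remove this condition so that it stays consistent with the current InterfaceInfo behavior.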

Member Author

Fixed in d553c34.

Comment thread ssa/interface.go Outdated
// only on definition/type-processing paths would miss rows such as
// "_llgo_foo/bar.IFmt" even though later whole-program analysis still
// needs that complete interface method set.
if _, ok := types.Unalias(intf.raw.Type).(*types.Named); ok {
Contributor

Do anonymous interfaces also need InterfaceInfo?

Member Author

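Fixed. Anonymous interfaces also need InterfaceInfo; otherwise the later whole-program computation could misjudge whether a type implements an anonymous interface. A regression case is in cl/_testmeta/interface_anonymous/in.go.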

@luoliwoshang luoliwoshang force-pushed the codex/useiface-metadata-producer branch from e41d813 to d553c34 on April 16, 2026 17:07