fix(github): create domain accounts for non-committer authors (#8886) #8894

Open
JAORMX wants to merge 1 commit into apache:main from JAORMX:fix/github-orphan-accounts-8886

@JAORMX JAORMX commented May 30, 2026

Problem

issues.creator_id and pull_requests.author_id are written for every author, but a domain accounts row is only created for users we collected a full profile for (effectively, committers). Authors who only filed an issue, or opened a PR without committing, become orphan foreign keys:

SELECT i.url, i.creator_id, i.creator_name, a.user_name
  FROM issues i
  LEFT JOIN accounts a ON a.id = i.creator_id
 WHERE i.url = 'https://github.com/stacklok/toolhive/issues/4297';

creator_id is set and creator_name is milichev, but a.user_name is NULL. Bot filters keying on accounts.user_name LIKE '%[bot]' miss those rows for the same reason.
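
To make the bot-filter consequence concrete, here is a minimal Go sketch that emulates the LEFT JOIN above in memory. The types, ids, and names (`Issue`, `joinedRow`, `acct:1`, `example[bot]`) are hypothetical stand-ins, not the plugin's real models; the only point is that a NULL `user_name` can never match the `LIKE '%[bot]'` filter.

```go
package main

import (
	"fmt"
	"strings"
)

// Issue is a hypothetical, trimmed-down stand-in for a domain issues row.
type Issue struct {
	URL         string
	CreatorID   string
	CreatorName string
}

// joinedRow mirrors one row of issues LEFT JOIN accounts: UserName is nil
// when no accounts row matches creator_id (SQL NULL).
type joinedRow struct {
	Issue    Issue
	UserName *string
}

// leftJoinAccounts emulates the join; accounts maps account id to user_name.
func leftJoinAccounts(issues []Issue, accounts map[string]string) []joinedRow {
	rows := make([]joinedRow, 0, len(issues))
	for _, is := range issues {
		row := joinedRow{Issue: is}
		if name, ok := accounts[is.CreatorID]; ok {
			n := name
			row.UserName = &n
		}
		rows = append(rows, row)
	}
	return rows
}

// botIssues emulates a filter on accounts.user_name LIKE '%[bot]'.
// A NULL user_name never matches, so orphan rows are silently skipped.
func botIssues(rows []joinedRow) []string {
	var urls []string
	for _, r := range rows {
		if r.UserName != nil && strings.HasSuffix(*r.UserName, "[bot]") {
			urls = append(urls, r.Issue.URL)
		}
	}
	return urls
}

func main() {
	issues := []Issue{
		{URL: "issue/1", CreatorID: "acct:1", CreatorName: "some-committer"},
		{URL: "issue/2", CreatorID: "acct:2", CreatorName: "example[bot]"},
	}
	// acct:2 has no accounts row, which is the orphan case described above.
	accounts := map[string]string{"acct:1": "some-committer"}

	rows := leftJoinAccounts(issues, accounts)
	fmt.Println("bot issues found:", len(botIssues(rows)))
}
```

In this sketch the bot-authored issue is not reported, even though `creator_name` clearly ends in `[bot]`, because the filter only looks at the joined account.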

Fixes #8886.

Root cause

The issue and PR extractors already record every author in _tool_github_repo_accounts. But ConvertAccounts sourced the domain accounts table FROM _tool_github_accounts, which is only populated for users we fetched full profiles for. So the convertors generate creator_id / author_id for everyone, while accounts only ever gets the committers.

Change

  • ConvertAccounts now reads FROM _tool_github_repo_accounts LEFT JOIN _tool_github_accounts, so every user the repo references becomes a domain account: enriched with name/email/avatar when we have the detail, login-only otherwise. The domain id uses the same didgen generator the issue/PR convertors use, so the foreign keys line up.
  • pr_extractor also emits a repo_account for the merged_by user, which fixes pull_requests.merged_by_id (that user wasn't recorded anywhere before).
  • The query is MySQL/PostgreSQL-agnostic (COALESCE not IFNULL, no backtick quoting, values parameterized via the dal). The join mirrors the existing one in account_org_collector.go.
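
The shape of the new conversion can be sketched in plain Go, without a database. This is an illustration under stated assumptions, not the code in this PR: the struct names, the `coalesce` helper, and the account id used for milichev are made up, and the id format is inferred from the fixture rows (`github:GithubAccount:<connection>:<account>`).

```go
package main

import "fmt"

// RepoAccount is a hypothetical stand-in for one _tool_github_repo_accounts
// row: every user the repo references.
type RepoAccount struct {
	ConnectionID uint64
	AccountID    int
	Login        string
}

// AccountDetail is a hypothetical stand-in for one _tool_github_accounts row,
// present only when a full profile was fetched.
type AccountDetail struct {
	Login, Name, Email, AvatarURL string
}

// DomainAccount is a hypothetical stand-in for one domain accounts row.
type DomainAccount struct {
	ID, UserName, FullName, Email, AvatarURL string
}

type accountKey struct {
	ConnectionID uint64
	AccountID    int
}

// domainID imitates the shared id generator, using the id format visible in
// the fixture rows. The real generator may differ in detail.
func domainID(connectionID uint64, accountID int) string {
	return fmt.Sprintf("github:GithubAccount:%d:%d", connectionID, accountID)
}

// coalesce returns the first non-empty string. The empty string stands in
// for SQL NULL here.
func coalesce(values ...string) string {
	for _, v := range values {
		if v != "" {
			return v
		}
	}
	return ""
}

// convertAccounts emulates repo_accounts LEFT JOIN accounts: every referenced
// user yields a row, with profile fields left empty on a join miss.
func convertAccounts(refs []RepoAccount, details map[accountKey]AccountDetail) []DomainAccount {
	out := make([]DomainAccount, 0, len(refs))
	for _, ref := range refs {
		d := details[accountKey{ref.ConnectionID, ref.AccountID}] // zero value on a miss
		out = append(out, DomainAccount{
			ID:        domainID(ref.ConnectionID, ref.AccountID),
			UserName:  coalesce(d.Login, ref.Login),
			FullName:  d.Name,
			Email:     d.Email,
			AvatarURL: d.AvatarURL,
		})
	}
	return out
}

func main() {
	refs := []RepoAccount{
		{ConnectionID: 1, AccountID: 7496278, Login: "panjf2000"},
		{ConnectionID: 1, AccountID: 12345, Login: "milichev"}, // account id is made up
	}
	details := map[accountKey]AccountDetail{
		{ConnectionID: 1, AccountID: 7496278}: {Login: "panjf2000", Name: "Andy Pan", Email: "i@andypan.me"},
	}
	for _, a := range convertAccounts(refs, details) {
		fmt.Printf("%s user=%s name=%q\n", a.ID, a.UserName, a.FullName)
	}
}
```

Driving the loop from the repo-level references rather than from the detail rows is what guarantees one domain account per referenced user; the detail lookup only enriches.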

Testing

  • The e2e fixture didn't exercise the bug (every referenced account already had a profile row), so I added the orphan case from the issue: milichev, referenced by the repo with no detail row. It now gets a login-only accounts row.
  • Added a referential-integrity assertion to TestAccountDataFlow: every account the repo references must resolve to a domain accounts row, generated with the same id generator the convertors use. Verified it fails against the old convertor and passes with the fix.
  • Full plugins/github/e2e suite passes on both MySQL and PostgreSQL.
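
The idea behind the referential-integrity assertion can be sketched as a small standalone check. This is not the code in TestAccountDataFlow; the `Reference` type, the `unresolved` helper, and the account id 12345 are hypothetical, and the id format is again inferred from the fixture rows.

```go
package main

import "fmt"

// Reference is a hypothetical record of one user the repo references,
// for example as an issue creator, PR author, or merged_by user.
type Reference struct {
	ConnectionID uint64
	AccountID    int
	Role         string
}

// domainID imitates the shared id generator (format inferred from the
// fixture rows; the real generator may differ).
func domainID(connectionID uint64, accountID int) string {
	return fmt.Sprintf("github:GithubAccount:%d:%d", connectionID, accountID)
}

// unresolved returns the domain ids that references point at but that have
// no matching row in the set of domain account ids.
func unresolved(refs []Reference, accountIDs map[string]bool) []string {
	var missing []string
	for _, r := range refs {
		id := domainID(r.ConnectionID, r.AccountID)
		if !accountIDs[id] {
			missing = append(missing, id)
		}
	}
	return missing
}

func main() {
	refs := []Reference{
		{ConnectionID: 1, AccountID: 7496278, Role: "pr author"},
		{ConnectionID: 1, AccountID: 12345, Role: "issue creator"}, // made-up id
	}
	// Simulates the old behavior: only the profiled user has an accounts row.
	oldAccounts := map[string]bool{"github:GithubAccount:1:7496278": true}
	fmt.Println("unresolved before fix:", unresolved(refs, oldAccounts))

	// Simulates the fixed behavior: every reference has an accounts row.
	newAccounts := map[string]bool{
		"github:GithubAccount:1:7496278": true,
		"github:GithubAccount:1:12345":   true,
	}
	fmt.Println("unresolved after fix:", len(unresolved(refs, newAccounts)))
}
```

Computing the expected id with the same generator on both sides is what makes the check meaningful: a mismatch in id format would surface as an unresolved reference rather than pass silently.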

Out of scope

Plural issue/PR assignees and PR requested reviewers aren't seeded into _tool_github_repo_accounts, so those FKs can still be unresolved. The github_graphql plugin likely shares the root cause. Happy to follow up on these separately.

🤖 Generated with Claude Code

…#8886)

ConvertAccounts sourced the domain `accounts` table FROM _tool_github_accounts,
which is only populated for users we collected full profiles for (effectively,
committers). Issue and PR authors who never committed were written into
_tool_github_repo_accounts but never converted, so issues.creator_id and
pull_requests.author_id pointed at accounts rows that didn't exist.

Source ConvertAccounts FROM _tool_github_repo_accounts LEFT JOIN
_tool_github_accounts instead, so every user the repo references gets a domain
account, enriched with profile detail when we have it and login-only otherwise.
The domain id uses the same generator the issue/PR convertors use, so the FKs
line up. Also emit a repo_account for a PR's merged_by user so
pull_requests.merged_by_id resolves too.

The query stays MySQL/PostgreSQL-agnostic (COALESCE, no backtick quoting,
parameterized via the dal) and mirrors the join already in
account_org_collector.go.

Adds the non-committer orphan case to the e2e fixture plus a referential-
integrity assertion in TestAccountDataFlow. Verified on both MySQL and
PostgreSQL.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@dosubot (bot) added labels on May 30, 2026: size:M (this PR changes 30-99 lines, ignoring generated files), component/plugins (this issue or PR relates to plugins), pr-type/bug-fix (this PR fixes a bug)
github:GithubAccount:1:7496278,i@andypan.me,Andy Pan,panjf2000,https://avatars.githubusercontent.com/u/7496278?v=4,,"{""ConnectionId"":1,""Name"":""panjf2000/ants""}",_raw_github_api_accounts,5,
github:GithubAccount:1:8518239,badger@gitter.im,The Gitter Badger,gitter-badger,https://avatars.githubusercontent.com/u/8518239?v=4,,"{""ConnectionId"":1,""Name"":""panjf2000/ants""}",_raw_github_api_accounts,9,
github:GithubAccount:1:964542,sarath.sp06@gmail.com,Sarath Sadasivan Pillai,sarathsp06,https://avatars.githubusercontent.com/u/964542?v=4,"exotel,leadmrktr,shellagilehub,odysseyhack,boodltech","{""ConnectionId"":1,""Name"":""panjf2000/ants""}",_raw_github_api_accounts,1,
github:GithubAccount:1:1052632,runner.mei@,runner,runner-mei,https://avatars.githubusercontent.com/u/1052632?v=4,,,,0,
Contributor

The _raw_data fields are missing; please fix them. Thanks.

Contributor Author

Oops, will do when I'm back on my computer

Development

Successfully merging this pull request may close these issues.

GitHub plugin: domain accounts row missing for non-contributor authors, leaving issues.creator_id / pull_requests.author_id as orphan FKs