feat(indexer): vendor go-openaudio/pkg/etl as the block indexer #840

Open. raymondjacobson wants to merge 2 commits into main from api/vendor-etl

raymondjacobson commented May 20, 2026

Summary

Replaces the in-tree CoreIndexer block-fetching loop (which only handled CreateUser) with the full ETL indexer from github.com/OpenAudio/go-openaudio/pkg/etl.

ETL gives us the full 31-entity-type handler suite (users, tracks, playlists, follows, saves, reposts, comments, events, grants, developer apps, tip reactions, associated wallets, etc.) plus the scheduled-release publisher.

Pinning policy

Pinned to main (pseudo-version), not a release tag. This PR's go.mod resolves to the current HEAD of go-openaudio/main at the time of merge. Subsequent bumps land via:

go get github.com/OpenAudio/go-openaudio@main github.com/OpenAudio/go-openaudio/pkg/etl@main
go mod tidy

Why main, not tag: pkg/etl is the locus of all parity work for the cutover and is changing every few days; release-please tag cuts add real lag right when we want to ship a fix. Pseudo-versions still lock a specific commit via go.sum, so builds remain reproducible — we just trade nicer-looking version strings for fewer round trips with release-please.

Switching back later: when pkg/etl stabilizes post-cutover, switch to tagged pins with a one-line edit (go get @v1.4.x).

Changes

| File | What |
| --- | --- |
| go.mod / go.sum | Pin github.com/OpenAudio/go-openaudio and github.com/OpenAudio/go-openaudio/pkg/etl to a main pseudo-version |
| indexer/indexer.go | Rewritten. CoreIndexer.Start runs etl.Indexer.Run() alongside the existing AggregatesCalculator via errgroup. The previous block-fetching loop (run, attemptProcessNextBlock, handleBlock, handleManageEntity) is gone. |
| indexer/index_user.go + test | Deleted. The only operation the old handler implemented was CreateUser, which ETL now handles along with the other 30 entity types. |
| indexer/constants.go | Kept: the Action_* constants are still used by api/api/v1_*.go handlers when building outgoing ManageEntity write transactions (not part of the indexing path). |
| api/health_check.go | Switched the indexer-lag query from indexing_checkpoints.last_checkpoint (the old in-tree tracker) to MAX(height) FROM core_indexed_blocks (ETL's per-block tracker). Same semantic, different table. |

ETL configuration choices

  • SkipMigrations: false (default). Migrations are idempotent against api/'s schema — every ETL migration uses CREATE TABLE IF NOT EXISTS / ADD COLUMN IF NOT EXISTS. Verified by applying all current ETL migrations on top of a fresh DB seeded with api/'s schema: zero errors, only NOTICE messages for already-existing relations. ETL tracks its own migration state in etl_db_migrations separate from api/'s schema_version, so no state-table collision.
  • DisableMaterializedViewRefresh(): turns off the refresher for the mv_dashboard_* views, which don't exist in api/'s schema.
  • DisablePgNotifyListener(): turns off the component that publishes block/play events to a channel api/ has no consumer for.
  • ScheduledReleasePublisher stays enabled — it's the same job apps' Python publish_scheduled_releases celery task did and we want it running here.

Caveat

etl.Indexer.Run() uses its own internal context.Background() rather than honoring api/'s shutdown ctx, so graceful shutdown via ctx cancellation isn't supported by the upstream API today. Process termination (SIGTERM) still works via Go's normal exit path. DB connections are not drained gracefully in that case; they are closed by the OS when the process exits (Go does not guarantee that finalizers run at exit). Acceptable tradeoff to avoid forking ETL; can be patched upstream later if it matters.

Concurrency hazard during cutover

If the legacy Python discovery-provider is still running against the same DB when this deploys, two of ETL's tables will see racy writes from both writers and could end up double-counted:

  • hourly_play_counts (Python's index_hourly_play_counts)
  • user_listening_history (Python's index_user_listening_history)

These both use checkpoint-cursored additive upserts. Coordinate the cutover: stop the Python jobs before deploying api/ with this PR.

Trending, scheduled-release publish, prune, delist statuses, and the other ETL/jobs flows are idempotent and safe to run alongside Python during the transition window.

Stacking with #834

PR #834 (parity jobs) adds startParityJobs(ctx) to CoreIndexer.Start. This PR rewrites that file. Whichever lands second needs a one-line rebase to slot ci.startParityJobs(ctx) back into the errgroup section. Both PRs are small enough that order doesn't matter much.

Verified

  • go build ./... clean
  • go vet ./indexer/ ./api/ ./jobs/ clean
  • go get @main resolves cleanly; pseudo-version captures all 20+ post-v1.3.0 ETL commits including the parity-5 work, fan-club fields, playlist_seen PK fix, developer_apps UNIQUE drop, dispatch-error fix, and auto-subscribe-on-remix-contest behavior.
  • ETL migrations apply cleanly on top of api/'s schema (proven separately).

🤖 Generated with Claude Code

Replaces the in-tree CoreIndexer block-fetching loop (which only handled
CreateUser) with the full ETL indexer from
github.com/OpenAudio/go-openaudio/pkg/etl@v1.3.0. ETL gives us the 31-entity-
type handler suite plus scheduled-release publishing, kept in sync with
upstream via tagged releases (lockstep with go-openaudio root per upstream
release-please config).

Changes:

- go.mod: add github.com/OpenAudio/go-openaudio/pkg/etl v1.3.0 and bump
  parent github.com/OpenAudio/go-openaudio to v1.3.0.
- indexer/indexer.go: rewritten. CoreIndexer.Start runs etl.Indexer.Run()
  alongside the existing AggregatesCalculator via errgroup. The previous
  block-fetching loop (run / attemptProcessNextBlock / handleBlock /
  handleManageEntity) is gone.
- indexer/index_user.go + index_user_test.go: deleted. The only operation
  the old handler implemented was CreateUser, which ETL now handles along
  with the other 30 entity types.
- indexer/constants.go: kept — the Action_* constants are still used by
  api/api/v1_*.go handlers when building outgoing ManageEntity write
  transactions (not part of the indexing path).
- api/health_check.go: switched the indexer-lag query from
  indexing_checkpoints.last_checkpoint (the old in-tree tracker) to
  MAX(height) FROM core_indexed_blocks (ETL's per-block tracker). Same
  semantic, different table.

ETL config: SkipMigrations is left false. Migrations are idempotent against
api/'s schema (verified by applying all 21 current ETL migrations on top of
a fresh DB seeded with api/'s schema: zero errors, only NOTICE messages for
already-existing relations). ETL tracks its own state in etl_db_migrations
separate from api/'s schema_version, so there's no collision.

Two ETL components are explicitly disabled when embedded here:

- MaterializedViewRefresh: refreshes mv_dashboard_* views that don't exist
  in api/'s schema.
- PgNotifyListener: publishes block/play events to a channel api/ has no
  consumer for.

ScheduledReleasePublisher stays enabled — it covers the
publish_scheduled_releases celery task gap.

Caveat: etl.Indexer.Run() uses its own internal context.Background()
rather than honoring api/'s shutdown ctx — graceful shutdown via ctx
cancellation isn't supported by the upstream API today. Process termination
(SIGTERM) still works via Go's normal exit path. Acceptable tradeoff to
avoid forking ETL.

Switches the pin from v1.3.0 to the current commit on go-openaudio's
main branch (resolves to 9a14058 at time of writing). This captures 20+
commits that landed after v1.3.0, including all of the parity-5 work,
playlist_seen PK fix, the developer_apps UNIQUE drop, fan-club text post
fields, and the auto-subscribe-on-remix-contest behavior.

Pinning policy going forward: bump via `go get @main` rather than chasing
release-please tag cuts. Pseudo-versions still lock a specific commit, so
builds remain reproducible via go.sum. The trade-off is uglier go.mod
strings in exchange for not waiting on a tag bump every time we land a
parity fix in go-openaudio.

After pkg/etl stabilizes post-cutover this can switch back to tagged
pinning with a one-line edit.