feat(indexer): vendor go-openaudio/pkg/etl as the block indexer #840
Open
raymondjacobson wants to merge 2 commits into
Replaces the in-tree CoreIndexer block-fetching loop (which only handled CreateUser) with the full ETL indexer from github.com/OpenAudio/go-openaudio/pkg/etl@v1.3.0. ETL gives us the 31-entity-type handler suite plus scheduled-release publishing, kept in sync with upstream via tagged releases (lockstep with go-openaudio root per upstream release-please config).

Changes:
- go.mod: add github.com/OpenAudio/go-openaudio/pkg/etl v1.3.0 and bump parent github.com/OpenAudio/go-openaudio to v1.3.0.
- indexer/indexer.go: rewritten. CoreIndexer.Start runs etl.Indexer.Run() alongside the existing AggregatesCalculator via errgroup. The previous block-fetching loop (run / attemptProcessNextBlock / handleBlock / handleManageEntity) is gone.
- indexer/index_user.go + index_user_test.go: deleted. The only operation the old handler implemented was CreateUser, which ETL now handles along with the other 30 entity types.
- indexer/constants.go: kept. The Action_* constants are still used by api/api/v1_*.go handlers when building outgoing ManageEntity write transactions (not part of the indexing path).
- api/health_check.go: switched the indexer-lag query from indexing_checkpoints.last_checkpoint (the old in-tree tracker) to MAX(height) FROM core_indexed_blocks (ETL's per-block tracker). Same semantics, different table.

ETL config: SkipMigrations is left false. Migrations are idempotent against api/'s schema (verified by applying all 21 current ETL migrations on top of a fresh DB seeded with api/'s schema: zero errors, only NOTICE messages for already-existing relations). ETL tracks its own state in etl_db_migrations, separate from api/'s schema_version, so there's no collision.

Two ETL components are explicitly disabled when embedded here:
- MaterializedViewRefresh: refreshes mv_dashboard_* views that don't exist in api/'s schema.
- PgNotifyListener: publishes block/play events to a channel api/ has no consumer for.

ScheduledReleasePublisher stays enabled: it covers the publish_scheduled_releases celery task gap.

Caveat: etl.Indexer.Run() uses its own internal context.Background() rather than honoring api/'s shutdown ctx, so graceful shutdown via ctx cancellation isn't supported by the upstream API today. Process termination (SIGTERM) still works via Go's normal exit path. Acceptable tradeoff to avoid forking ETL.
Switches the pin from v1.3.0 to the current commit on go-openaudio's main branch (resolves to 9a14058 at time of writing). This captures 20+ commits that landed after v1.3.0, including all of the parity-5 work, playlist_seen PK fix, the developer_apps UNIQUE drop, fan-club text post fields, and the auto-subscribe-on-remix-contest behavior. Pinning policy going forward: bump via `go get @main` rather than chasing release-please tag cuts. Pseudo-versions still lock a specific commit, so builds remain reproducible via go.sum. The trade-off is uglier go.mod strings in exchange for not waiting on a tag bump every time we land a parity fix in go-openaudio. After pkg/etl stabilizes post-cutover this can switch back to tagged pinning with a one-line edit.
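To make "pseudo-versions still lock a specific commit" concrete, here is a minimal sketch that pulls the timestamp and commit prefix out of a Go module pseudo-version string. The version string used in `main` is made up for illustration and is not the one in this PR's go.mod; the helper name is also hypothetical. It relies only on the documented shape of pseudo-versions (a 14-digit UTC timestamp followed by a 12-character commit hash prefix) and ignores suffixes such as `+incompatible`.

```go
package main

import (
	"fmt"
	"regexp"
)

// pseudoVersionRE matches the tail of a Go module pseudo-version:
// a 14-digit UTC timestamp followed by a 12-character commit hash prefix.
var pseudoVersionRE = regexp.MustCompile(`[-.](\d{14})-([0-9a-f]{12})$`)

// commitFromPseudoVersion returns the timestamp and commit prefix encoded in
// a pseudo-version, or ok=false if v is not shaped like one (e.g. a plain tag).
func commitFromPseudoVersion(v string) (timestamp, commit string, ok bool) {
	m := pseudoVersionRE.FindStringSubmatch(v)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	// Hypothetical version strings, for illustration only.
	for _, v := range []string{"v1.3.1-0.20250101000000-0123456789ab", "v1.3.0"} {
		ts, commit, ok := commitFromPseudoVersion(v)
		fmt.Printf("%s -> pseudo=%v timestamp=%q commit=%q\n", v, ok, ts, commit)
	}
}
```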
Summary
Replaces the in-tree `CoreIndexer` block-fetching loop (which only handled `CreateUser`) with the full ETL indexer from `github.com/OpenAudio/go-openaudio/pkg/etl`.

ETL gives us the full 31-entity-type handler suite (users, tracks, playlists, follows, saves, reposts, comments, events, grants, developer apps, tip reactions, associated wallets, etc.) plus the scheduled-release publisher.
Pinning policy
Pinned to `main` (pseudo-version), not a release tag. This PR's `go.mod` resolves to the current HEAD of `go-openaudio/main` at the time of merge. Subsequent bumps land via `go get @main`.

Why main, not tag: pkg/etl is the locus of all parity work for the cutover and is changing every few days; release-please tag cuts add real lag right when we want to ship a fix. A pseudo-version still pins a specific commit in `go.mod`, with its checksum recorded in `go.sum`, so builds remain reproducible. We just trade nicer-looking version strings for fewer round trips with release-please.

Switching back later: when pkg/etl stabilizes post-cutover, switch to tagged pins with a one-line edit (`go get @v1.4.x`).

Changes
- `go.mod` / `go.sum`: pin `github.com/OpenAudio/go-openaudio` and `github.com/OpenAudio/go-openaudio/pkg/etl` to a `main` pseudo-version.
- `indexer/indexer.go`: rewritten. `CoreIndexer.Start` runs `etl.Indexer.Run()` alongside the existing `AggregatesCalculator` via errgroup. The previous block-fetching loop (`run`, `attemptProcessNextBlock`, `handleBlock`, `handleManageEntity`) is gone.
- `indexer/index_user.go` + test: deleted. The only operation the old handler implemented was `CreateUser`, which ETL now handles along with the other 30 entity types.
- `indexer/constants.go`: kept. The `Action_*` constants are still used by `api/api/v1_*.go` handlers when building outgoing ManageEntity write transactions (not part of the indexing path).
- `api/health_check.go`: switched the indexer-lag query from `indexing_checkpoints.last_checkpoint` (the old in-tree tracker) to `MAX(height) FROM core_indexed_blocks` (ETL's per-block tracker). Same semantics, different table.

ETL configuration choices
- `SkipMigrations: false` (default). Migrations are idempotent against api/'s schema: every ETL migration uses `CREATE TABLE IF NOT EXISTS` / `ADD COLUMN IF NOT EXISTS`. Verified by applying all current ETL migrations on top of a fresh DB seeded with api/'s schema: zero errors, only NOTICE messages for already-existing relations. ETL tracks its own migration state in `etl_db_migrations`, separate from api/'s `schema_version`, so there is no state-table collision.
- `DisableMaterializedViewRefresh()`: turns off the job that refreshes `mv_dashboard_*` views, which don't exist in api/'s schema.
- `DisablePgNotifyListener()`: turns off the listener that publishes block/play events to a channel api/ has no consumer for.
- `ScheduledReleasePublisher` stays enabled: it's the same job apps' Python `publish_scheduled_releases` celery task did, and we want it running here.

Caveat
`etl.Indexer.Run()` uses its own internal `context.Background()` rather than honoring api/'s shutdown ctx, so graceful shutdown via ctx cancellation isn't supported by the upstream API today. Process termination (SIGTERM) still works via Go's normal exit path, and DB connections drain via pool finalizers on process exit. Acceptable tradeoff to avoid forking ETL; can be patched upstream later if it matters.

Concurrency hazard during cutover
If the legacy Python discovery-provider is still running against the same DB when this deploys, two of ETL's tables will see racy writes from both writers and could end up double-counted:
- `hourly_play_counts` (Python's `index_hourly_play_counts`)
- `user_listening_history` (Python's `index_user_listening_history`)

These both use checkpoint-cursored additive upserts. Coordinate the cutover: stop the Python jobs before deploying api/ with this PR.
Trending, scheduled-release publish, prune, delist statuses, and the other ETL/jobs flows are idempotent and safe to run alongside Python during the transition window.
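The double-counting risk can be shown with a deliberately simplified in-memory model. This is a sketch under assumptions, not the ETL or Python implementation: the `play` type, the `additiveUpsert` helper, and the idea that each writer tracks its own cursor are all hypothetical, and the map stands in for a table updated with a `count = count + n` style upsert.

```go
package main

import "fmt"

// play is a hypothetical source row: a monotonically increasing id plus the
// hour bucket it should be counted in.
type play struct {
	id   int
	hour string
}

// additiveUpsert adds every play with id > cursor to its hour bucket and
// returns the advanced cursor. It mimics an additive upsert guarded only by
// the calling writer's own checkpoint.
func additiveUpsert(counts map[string]int, plays []play, cursor int) int {
	next := cursor
	for _, p := range plays {
		if p.id <= cursor {
			continue // already counted by this writer
		}
		counts[p.hour]++
		if p.id > next {
			next = p.id
		}
	}
	return next
}

func main() {
	plays := []play{{1, "10:00"}, {2, "10:00"}, {3, "11:00"}}

	// One writer: re-running with the advanced cursor adds nothing.
	solo := map[string]int{}
	cursor := additiveUpsert(solo, plays, 0)
	additiveUpsert(solo, plays, cursor)
	fmt.Println("one writer: ", solo)

	// Two writers, each with its own cursor, sharing one table:
	// every play is added twice.
	shared := map[string]int{}
	additiveUpsert(shared, plays, 0) // writer A
	additiveUpsert(shared, plays, 0) // writer B
	fmt.Println("two writers:", shared)
}
```

In this model the checkpoint protects a writer only from its own earlier runs, which is why stopping one of the two writers before the cutover avoids the problem.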
Stacking with #834
PR #834 (parity jobs) adds `startParityJobs(ctx)` to `CoreIndexer.Start`. This PR rewrites that file. Whichever lands second needs a one-line rebase to slot `ci.startParityJobs(ctx)` back into the errgroup section. Both PRs are small enough that order doesn't matter much.

Verified
- `go build ./...` clean
- `go vet ./indexer/ ./api/ ./jobs/` clean
- `go get @main` resolves cleanly; pseudo-version captures all 20+ post-v1.3.0 ETL commits including the parity-5 work, fan-club fields, playlist_seen PK fix, developer_apps UNIQUE drop, dispatch-error fix, and auto-subscribe-on-remix-contest behavior.

🤖 Generated with Claude Code