feat(ent): Ent schema parity with raw SQL#179
Conversation
Phase 1 of the Ent migration (thoughts/plans/2026-04-18-ent-migration-postgresql.md):
make Ent schemas a superset of the raw SQL columns so Phase 2 can safely
collapse the two-database architecture without AutoMigrate dropping data.
- Agent: add ~25 operational fields (phase/activity/tool_name, connection
and container state, limits tracking, stalled detection, ancestry,
state_version, message, deleted_at, last_activity_event, applied_config,
etc.); alias column names via StorageKey so "slug" maps to agent_id and
timestamps to created_at/updated_at; index names match raw SQL DDL;
partial index on deleted_at.
- Grove: add default_runtime_broker_id, shared_dirs, GitHub App integration
fields (installation_id int64, permissions, app_status), git_identity;
named indexes matching raw SQL.
- User: add last_seen; rename created -> created_at; named email index.
- types.go: mirror SharedDir, GitHubTokenPermissions, GitHubAppGroveStatus,
GitIdentityConfig. AgentAppliedConfig is kept as json.RawMessage at the
Ent layer to avoid pulling pkg/api into pkg/ent/schema.
- RuntimeBroker: new schema (1 of 20 raw-SQL-only tables).
- pkg/store/entstore: new package hosting the schema-diff safety gate
(build tag schemadiff). The gate runs sqlite.Migrate and Ent.AutoMigrate
against two in-memory databases and asserts table/column coverage,
excluding the V5 stub tables and api_keys. Invoke via:
go test -tags schemadiff ./pkg/store/entstore/...
- Caller fix in entadapter/group_store.go for the created -> created_at
rename on Agent.
Adds Ent schemas for every non-V5-stub raw SQL table: Template,
HarnessConfig, EnvVar, Secret, UserAccessToken, BrokerSecret,
BrokerJoinToken, NotificationSubscription, Notification,
SubscriptionTemplate, ScheduledEvent, Schedule, GCPServiceAccount,
GitHubInstallation, MessageRecord, MaintenanceOperation,
MaintenanceOperationRun, GroveContributor, GroveSyncState.
Notable encodings:
- GitHubInstallation overrides id to int64 with StorageKey("installation_id")
to match the int64 PK convention used by GitHub App installation ids.
- BrokerSecret/BrokerJoinToken override id to map onto broker_id (1:1 with
runtime_brokers).
- MessageRecord is the Ent type name for the "messages" table to avoid
collision with Ent internals.
- GroveContributor/GroveSyncState: raw SQL uses a composite (grove_id,
broker_id) PK; Ent requires a surrogate id, so we add one and enforce
the composite as a UNIQUE index. Functionally equivalent.
- Partial indexes on scheduled_events.fire_at (WHERE status='pending') and
schedules.next_run_at (WHERE status='active') are expressed via
entsql.IndexWhere.
The schema-diff gate at pkg/store/entstore/schemacoverage_test.go now
passes both the table-presence and column-presence checks, completing the
Phase 1 safety prerequisite from
thoughts/plans/2026-04-18-ent-migration-postgresql.md. Ent-only extras
(access_policies, group_child_groups, group_memberships) are reported as
informational drift — these are Ent's authoritative group/policy schemas
that Phase 2 will replace the V5 stubs with.
Phase 2 prep work: close the remaining drift between raw SQL and Ent schemas so Phase 2 can point Ent at hub.db without surprises. - sqlite: add migration V46 adding delegation_enabled INTEGER NOT NULL DEFAULT 0 to agents. This aligns the raw SQL schema with the Ent-side field used by the policy engine to mark agents whose creator relationship is policy-addressable. - entstore/schemacoverage: add TestSchemaCoverage_ColumnAttributes which compares column type, NOT NULL, default, and PK flag between the two stores. Canonicalizes SQLite type synonyms (UUID/JSON -> TEXT, bool -> integer) and bool defaults (true/false -> 1/0). Skips notnull on PK columns (SQLite does not enforce NOT NULL on TEXT PRIMARY KEY). All remaining drift is documented in knownAttributeDrift with a short reason, covering: Go-side timestamp defaults vs CURRENT_TIMESTAMP, template.NOT NULL weakening for shadow records, composite PK tables (grove_contributors, grove_sync_state — planned to drop in Phase 2), and several raw-SQL DEFAULT-without-NOT-NULL columns where Ent tightens the constraint safely. The gate now asserts three invariants: 1. Every raw SQL table has a matching Ent table (modulo V5 stubs, api_keys). 2. Every raw SQL column has a matching Ent column. 3. Column type/notnull/default/pk match, or the drift is allowlisted. Run with: go test -tags schemadiff ./pkg/store/entstore/...
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
0fd75c9 to
2bb8ab4
Compare
There was a problem hiding this comment.
Code Review
This pull request introduces significant updates to the data model, including the addition of several new entities (BrokerJoinToken, BrokerSecret, EnvVar, GCPServiceAccount, GitHubInstallation, GroveContributor) and the expansion of the Agent and Grove entities with new fields. It also updates the database client initialization to return the underlying *sql.DB handle, allowing for better connection pool management. I have no feedback to provide as these changes appear to be a standard expansion of the schema and infrastructure.
…ults
Round out Phase 1 after code review:
- Schema-diff gate now checks indexes (TestSchemaCoverage_Indexes) in
addition to tables/columns/column-attributes. Catches the existing
expression-index gap on notification_subscriptions (documented in
knownIndexDrift) and will catch any StorageKey typos going forward.
- entc.Open{SQLite,Postgres} now return the underlying *sql.DB so the
gate no longer needs a parallel sql.Open handle. Also unblocks Phase 5
pool config and the advisory-lock migration wrapper.
- GroveContributor/GroveSyncState id fields use DefaultFunc(uuid.NewString)
so Phase 3 callsites don't each have to generate ids.
- Agent.delegation_enabled comment updated — V46 shipped in this branch,
no longer a pending follow-up.
Signed-off-by: Edvin Norling <edvin.norling@kognic.com>
2bb8ab4 to
3b989c4
Compare
Collapses pkg/ent/*.go and pkg/ent/*/*.go in GitHub diff views so reviewers see the hand-written schema changes (pkg/ent/schema/**) and migration/test changes instead of tens of thousands of lines of Ent codegen boilerplate. Schema files remain fully diffable. Signed-off-by: Edvin Norling <edvin.norling@kognic.com>
8e4178e to
281a00e
Compare
Resolves conflict on migrationV46 in pkg/store/sqlite/sqlite.go: both branches claimed slot 46. Upstream's V46 (templates.default_harness_config) keeps the slot since it merged first; this branch's delegation_enabled migration is renumbered to V47. - sqlite: keep upstream V46, add migrationV47 (delegation_enabled) and append it to the migrations slice - ent/schema/agent.go: update comment reference V46 -> V47 - ent/schema/template.go: add default_harness_config field for parity with upstream's V46 so the Phase 1 schema-diff gate keeps passing - regenerate ent code (template, migrate/schema, mutation) Verified: go build ./..., go test ./pkg/store/... ./pkg/ent/... ./cmd/..., and go test -tags schemadiff ./pkg/store/entstore/... all pass. The 7 pkg/hub failures observed locally also fail on pure upstream/main and are unrelated environmental issues (init.defaultBranch=main vs tests expecting master).
My hope with this PR is to start making it possible to enable postgresql by having ent schema parity.
These are the actually intersting files to look at.
I can of course split up this PR into smaller piceis.
Related to #53 (does not close it — this is Phase 1 of the multi-phase migration).
Summary
First in a series of PRs that migrate Scion's persistence layer off the raw SQL
pkg/store/sqliteimplementation onto Ent ORM, with PostgreSQL as a second supported backend.This PR lands only Phase 1: make the Ent schemas a complete superset of every table and column in the raw SQL schema. No behavioral code paths change. The point is to prepare the ground so the next PR (Phase 2) can safely point Ent at the existing
hub.dbwithout Ent'sAutoMigratesilently dropping columns.Why
Today the Hub runs on two databases:
hub.db— raw SQL (pkg/store/sqlite), 45 migrations, 187 methods, holds the bulk of the data (agents, groves, secrets, brokers, etc.).hub.db_ent— Ent ORM, holds groups, policies, and shadow records mirroring users/agents/groves that the authorization engine needs to reason about.This split was a stepping stone. The destination is a single database managed by Ent, with either SQLite or PostgreSQL underneath. Getting there requires four more phases after this one.
What's in this PR
1. Expanded Ent schemas for existing entities
pkg/ent/schema/{agent,grove,user}.gogrew from minimal "principal identity" schemas to full schemas matching every column present in raw SQL.phase,activity,tool_name, connection/container/runtime state, stalled-detection columns, limits tracking,ancestry,state_version,applied_config,deleted_at, etc. Column aliasing viaStorageKeyso Ent's Go fieldSlugstores as raw SQL columnagent_id, and timestamps land oncreated_at/updated_at. Indexes renamed to match raw SQL.default_runtime_broker_id,shared_dirs, GitHub App integration fields (github_installation_idint64,github_permissions,github_app_status),git_identity.last_seen.SharedDir,GitHubTokenPermissions,GitHubAppGroveStatus,GitIdentityConfig).AgentAppliedConfigis kept asjson.RawMessageto avoid pullingpkg/apiintopkg/ent/schema.2. Twenty new Ent schemas for raw-SQL-only tables
pkg/ent/schema/*.go(20 new files): Template, HarnessConfig, EnvVar, Secret, UserAccessToken, BrokerSecret, BrokerJoinToken, NotificationSubscription, Notification, SubscriptionTemplate, ScheduledEvent, Schedule, GCPServiceAccount, GitHubInstallation, MessageRecord, MaintenanceOperation, MaintenanceOperationRun, GroveContributor, GroveSyncState, RuntimeBroker.Notable encodings:
idto anint64column namedinstallation_id, matching the int64 PK that GitHub uses.StorageKey("broker_id")on the id field (1:1 withruntime_brokers).messagestable — "Message" would collide with Ent internals.(grove_id, broker_id). Ent requires a surrogate id, so we add one (withDefaultFunc(uuid.NewString)so Phase 3 callsites don't each have to generate ids) and enforce the composite as aUNIQUEindex. See "Planned drift" below.WHERE status='pending',WHERE status='active') onscheduled_events.fire_atandschedules.next_run_atare expressed viaentsql.IndexWhere.3. Raw SQL migration V46
Adds
delegation_enabled INTEGER NOT NULL DEFAULT 0to theagentstable. This column previously existed only in Ent'shub.db_entshadow table; bringing it intohub.dbcloses a drift point before Phase 2. TrivialALTER TABLE ADD COLUMN, default 0 (false), zero data risk.4.
entc.Open{SQLite,Postgres}expose*sql.DBBoth factory functions now return
(*ent.Client, *sql.DB, error)instead of(*ent.Client, error). The underlying connection was always created by these helpers but was hidden — the schema-diff gate was opening a parallelsql.Openhandle to the same shared-cache in-memory DB just to runPRAGMAintrospection. Exposing the handle lets the gate use the real connection and also unblocks Phase 5's pool-config wiring (SetMaxOpenConns,SetConnMaxLifetime) and the advisory-lock migration wrapper. Six callsites updated; callers that don't need the DB discard it with_.5. Schema-diff safety gate
New package
pkg/store/entstore, test file gated by build tagschemadiff. Run with:Four checks:
api_keystable from V7).true/false→1/0). Remaining drift is listed inknownAttributeDriftwith a short reason — no silent tolerations.WHEREclause. Ent's auto-generated UNIQUE indexes (*_key,*_col1_col2on UNIQUE composites) are reported as informational extras. Known drift: raw SQL'sidx_notification_subs_uniqueis a functional index onCOALESCE(agent_id, '')that Ent's index API can't express — documented inknownIndexDriftand slated for enforcement at the application layer in Phase 3j.The gate is the first line of defense for Phase 2: any schema change that breaks parity fails the test explicitly rather than surprising users at migration time.
Not wired into CI by default — the build tag keeps it as a deliberate local/manual check, consistent with how the parent plan specifies it.
Runtime impact
None in the current binary. Specifically:
initStore()assembly incmd/server_foreground.gois untouched. The Hub still opens both databases, still wires upCompositeStore, still routes groups/policies tohub.db_entand everything else tohub.db.AutoMigratestill runs againsthub.db_entat startup. With the expanded schemas it will now create ~20 additional empty tables inhub.db_ent. They sit unused until Phase 3 routes sub-interfaces to them.hub.db_ent(groups,group_memberships,access_policies,policy_bindings) are unaffected — those Ent schemas were not touched.slug→agent_id,created→created_at, etc.).AutoMigratewill drop the old-named columns and add the new-named columns on the shadow tables inhub.db_ent. Shadow records are derived lazily fromhub.dbviaensureEntUser/Agent/Grove, so rebuilt on next access. User-invisible.If you roll back this PR by checking out the previous commit, nothing needs to be un-done at the database level — the extra empty Ent tables are inert, and V46 is a safe forward-only column add.
Plan for future phases
The parent plan breaks the migration into five phases. This PR lands Phase 1; here's what comes next.
Phase 2 — Database consolidation (next PR)
Point Ent at
hub.dbdirectly and retirehub.db_ent. Rough shape:dropPreEntTablesdrops the V5 stub tables (groups,group_members,policies,policy_bindings) and the composite-PK tables (grove_contributors,grove_sync_state) inhub.dbbefore Ent touches it. The V5 stubs have never held real data on the raw SQL side — CompositeStore has always routed tohub.db_ent. The composite-PK tables use a surrogate id + UNIQUE in Ent but composite PKs in raw SQL; rebuilding in place would cost 2× complex migrations for minimal user benefit, so we drop and let Ent recreate them empty. Release note needed: users will need to re-link their brokers to groves once (a minute of UI clicking); the next workspace sync after upgrade re-uploads once, then runs incrementally as usual.hub.db_entintohub.dbvia the Ent client (idempotent on conflict).hub.db_enttohub.db_ent.bak.AutoMigrateagainsthub.db— now safe, because Phase 1 made the Ent schema a superset.ensureEntUser/Agent/Grove) from CompositeStore.User-visible changes in Phase 2: existing SQLite deployments will re-consolidate their database on startup. One-time broker re-link prompt. Fresh installs are unaffected.
Phase 3 — Sub-interface migration (several small PRs)
Incrementally move each of the 20
store.Storesub-interfaces from the raw SQL implementation to Ent, one or two at a time. CompositeStore overrides each interface as it gets migrated. Tests run unchanged throughout. Ordered by simplest-first in the parent plan.Phase 4 — Cleanup
Delete
pkg/store/sqliteentirely. Replace theCompositeStorewith a directEntStoreimplementingstore.Store. Also the first PR that runs the full test suite against PostgreSQL via testcontainers to catch SQLite-specific assumptions before PG support is declared ready.Phase 5 — PostgreSQL support
Add
case \"postgres\"toinitStore(). Implement connection pool config (already specified in the design doc). ImplementmigrationLockvia PostgreSQL advisory locks so concurrent Hub startups don't race.Planned drift (documented, not blockers)
Recorded in
knownAttributeDriftin the diff test:Each entry carries a one-line reason in the allowlist. Nothing is silently ignored.
Test plan
Pre-existing hub test failures on `main` (OAuth/authorization tests around template file handlers) are unchanged — confirmed present before this branch.
🤖 Generated with Claude Code