Skip to content

feat(ent): Ent schema parity with raw SQL#179

Open
nissessenap wants to merge 6 commits into
GoogleCloudPlatform:mainfrom
nissessenap:postgresql_phase1
Open

feat(ent): Ent schema parity with raw SQL#179
nissessenap wants to merge 6 commits into
GoogleCloudPlatform:mainfrom
nissessenap:postgresql_phase1

Conversation

@nissessenap
Copy link
Copy Markdown
Contributor

@nissessenap nissessenap commented Apr 22, 2026

My hope with this PR is to start making it possible to enable postgresql by having ent schema parity.

These are the actually intersting files to look at.
I can of course split up this PR into smaller piceis.

  - pkg/ent/schema/ — all the schema definitions (agent, grove, user, + 19 new ones)
  - pkg/ent/entc/client.go, pkg/ent/entc/client_test.go
  - pkg/store/entstore/schemacoverage_test.go
  - pkg/store/entadapter/composite_test.go, group_store_test.go, policy_store_test.go
  - pkg/store/sqlite/sqlite.go (V46 migration)
  - cmd/server_foreground.go
  - .gitattributes # for redability in the PR

Related to #53 (does not close it — this is Phase 1 of the multi-phase migration).

  • Tests pass
  • Appropriate changes to documentation are included in the PR

Summary

First in a series of PRs that migrate Scion's persistence layer off the raw SQL pkg/store/sqlite implementation onto Ent ORM, with PostgreSQL as a second supported backend.

This PR lands only Phase 1: make the Ent schemas a complete superset of every table and column in the raw SQL schema. No behavioral code paths change. The point is to prepare the ground so the next PR (Phase 2) can safely point Ent at the existing hub.db without Ent's AutoMigrate silently dropping columns.

Why

Today the Hub runs on two databases:

  • hub.db — raw SQL (pkg/store/sqlite), 45 migrations, 187 methods, holds the bulk of the data (agents, groves, secrets, brokers, etc.).
  • hub.db_ent — Ent ORM, holds groups, policies, and shadow records mirroring users/agents/groves that the authorization engine needs to reason about.

This split was a stepping stone. The destination is a single database managed by Ent, with either SQLite or PostgreSQL underneath. Getting there requires four more phases after this one.

What's in this PR

1. Expanded Ent schemas for existing entities

pkg/ent/schema/{agent,grove,user}.go grew from minimal "principal identity" schemas to full schemas matching every column present in raw SQL.

  • Agent — added ~25 operational fields: phase, activity, tool_name, connection/container/runtime state, stalled-detection columns, limits tracking, ancestry, state_version, applied_config, deleted_at, etc. Column aliasing via StorageKey so Ent's Go field Slug stores as raw SQL column agent_id, and timestamps land on created_at / updated_at. Indexes renamed to match raw SQL.
  • Grove — added default_runtime_broker_id, shared_dirs, GitHub App integration fields (github_installation_id int64, github_permissions, github_app_status), git_identity.
  • User — added last_seen.
  • types.go — mirrored the JSON blob types (SharedDir, GitHubTokenPermissions, GitHubAppGroveStatus, GitIdentityConfig). AgentAppliedConfig is kept as json.RawMessage to avoid pulling pkg/api into pkg/ent/schema.

2. Twenty new Ent schemas for raw-SQL-only tables

pkg/ent/schema/*.go (20 new files): Template, HarnessConfig, EnvVar, Secret, UserAccessToken, BrokerSecret, BrokerJoinToken, NotificationSubscription, Notification, SubscriptionTemplate, ScheduledEvent, Schedule, GCPServiceAccount, GitHubInstallation, MessageRecord, MaintenanceOperation, MaintenanceOperationRun, GroveContributor, GroveSyncState, RuntimeBroker.

Notable encodings:

  • GitHubInstallation overrides Ent's default id to an int64 column named installation_id, matching the int64 PK that GitHub uses.
  • BrokerSecret / BrokerJoinToken use StorageKey("broker_id") on the id field (1:1 with runtime_brokers).
  • MessageRecord is the Ent type name for the messages table — "Message" would collide with Ent internals.
  • GroveContributor / GroveSyncState — raw SQL uses a composite PK (grove_id, broker_id). Ent requires a surrogate id, so we add one (with DefaultFunc(uuid.NewString) so Phase 3 callsites don't each have to generate ids) and enforce the composite as a UNIQUE index. See "Planned drift" below.
  • Partial indexes (WHERE status='pending', WHERE status='active') on scheduled_events.fire_at and schedules.next_run_at are expressed via entsql.IndexWhere.

3. Raw SQL migration V46

Adds delegation_enabled INTEGER NOT NULL DEFAULT 0 to the agents table. This column previously existed only in Ent's hub.db_ent shadow table; bringing it into hub.db closes a drift point before Phase 2. Trivial ALTER TABLE ADD COLUMN, default 0 (false), zero data risk.

4. entc.Open{SQLite,Postgres} expose *sql.DB

Both factory functions now return (*ent.Client, *sql.DB, error) instead of (*ent.Client, error). The underlying connection was always created by these helpers but was hidden — the schema-diff gate was opening a parallel sql.Open handle to the same shared-cache in-memory DB just to run PRAGMA introspection. Exposing the handle lets the gate use the real connection and also unblocks Phase 5's pool-config wiring (SetMaxOpenConns, SetConnMaxLifetime) and the advisory-lock migration wrapper. Six callsites updated; callers that don't need the DB discard it with _.

5. Schema-diff safety gate

New package pkg/store/entstore, test file gated by build tag schemadiff. Run with:

go test -tags schemadiff ./pkg/store/entstore/...

Four checks:

  1. Table presence — every raw SQL table has a matching Ent table (modulo V5 stubs and the orphaned api_keys table from V7).
  2. Column presence — every raw SQL column has a matching Ent column.
  3. Column attributes — type, NOT NULL, default, and PK flag match across the two schemas, with SQLite-aware canonicalization (UUID/JSON → TEXT, bool → integer affinity, true/false1/0). Remaining drift is listed in knownAttributeDrift with a short reason — no silent tolerations.
  4. Index coverage — every user-declared raw SQL index has a matching Ent index with the same column list, UNIQUE flag, and partial-index WHERE clause. Ent's auto-generated UNIQUE indexes (*_key, *_col1_col2 on UNIQUE composites) are reported as informational extras. Known drift: raw SQL's idx_notification_subs_unique is a functional index on COALESCE(agent_id, '') that Ent's index API can't express — documented in knownIndexDrift and slated for enforcement at the application layer in Phase 3j.

The gate is the first line of defense for Phase 2: any schema change that breaks parity fails the test explicitly rather than surprising users at migration time.

Not wired into CI by default — the build tag keeps it as a deliberate local/manual check, consistent with how the parent plan specifies it.

Runtime impact

None in the current binary. Specifically:

  • initStore() assembly in cmd/server_foreground.go is untouched. The Hub still opens both databases, still wires up CompositeStore, still routes groups/policies to hub.db_ent and everything else to hub.db.
  • No new write paths. The 20 new Ent schemas generate Go types that are unused by any handler or service.
  • AutoMigrate still runs against hub.db_ent at startup. With the expanded schemas it will now create ~20 additional empty tables in hub.db_ent. They sit unused until Phase 3 routes sub-interfaces to them.
  • The authoritative group/policy/binding tables in hub.db_ent (groups, group_memberships, access_policies, policy_bindings) are unaffected — those Ent schemas were not touched.
  • Shadow record side effect: the Agent/Grove/User schema expansions include column renames (slugagent_id, createdcreated_at, etc.). AutoMigrate will drop the old-named columns and add the new-named columns on the shadow tables in hub.db_ent. Shadow records are derived lazily from hub.db via ensureEntUser/Agent/Grove, so rebuilt on next access. User-invisible.
  • Raw SQL migration V46 adds one column with a default of 0; existing rows get the default. No data changes.

If you roll back this PR by checking out the previous commit, nothing needs to be un-done at the database level — the extra empty Ent tables are inert, and V46 is a safe forward-only column add.

Plan for future phases

The parent plan breaks the migration into five phases. This PR lands Phase 1; here's what comes next.

Phase 2 — Database consolidation (next PR)

Point Ent at hub.db directly and retire hub.db_ent. Rough shape:

  1. dropPreEntTables drops the V5 stub tables (groups, group_members, policies, policy_bindings) and the composite-PK tables (grove_contributors, grove_sync_state) in hub.db before Ent touches it. The V5 stubs have never held real data on the raw SQL side — CompositeStore has always routed to hub.db_ent. The composite-PK tables use a surrogate id + UNIQUE in Ent but composite PKs in raw SQL; rebuilding in place would cost 2× complex migrations for minimal user benefit, so we drop and let Ent recreate them empty. Release note needed: users will need to re-link their brokers to groves once (a minute of UI clicking); the next workspace sync after upgrade re-uploads once, then runs incrementally as usual.
  2. Copy groups/policies/memberships/bindings from hub.db_ent into hub.db via the Ent client (idempotent on conflict).
  3. Rename hub.db_ent to hub.db_ent.bak.
  4. Run Ent's AutoMigrate against hub.db — now safe, because Phase 1 made the Ent schema a superset.
  5. Remove shadow syncing (ensureEntUser/Agent/Grove) from CompositeStore.

User-visible changes in Phase 2: existing SQLite deployments will re-consolidate their database on startup. One-time broker re-link prompt. Fresh installs are unaffected.

Phase 3 — Sub-interface migration (several small PRs)

Incrementally move each of the 20 store.Store sub-interfaces from the raw SQL implementation to Ent, one or two at a time. CompositeStore overrides each interface as it gets migrated. Tests run unchanged throughout. Ordered by simplest-first in the parent plan.

Phase 4 — Cleanup

Delete pkg/store/sqlite entirely. Replace the CompositeStore with a direct EntStore implementing store.Store. Also the first PR that runs the full test suite against PostgreSQL via testcontainers to catch SQLite-specific assumptions before PG support is declared ready.

Phase 5 — PostgreSQL support

Add case \"postgres\" to initStore(). Implement connection pool config (already specified in the design doc). Implement migrationLock via PostgreSQL advisory locks so concurrent Hub startups don't race.

Planned drift (documented, not blockers)

Recorded in knownAttributeDrift in the diff test:

  • `agents.template`: Ent keeps it `Optional()` so shadow records don't need to set it. Raw SQL has `NOT NULL`. Phase 2 AutoMigrate weakens the constraint (safe).
  • `grove_contributors` / `grove_sync_state` composite PK: will be dropped in Phase 2 (see above).
  • Timestamp defaults: raw SQL uses `CURRENT_TIMESTAMP`; Ent uses Go-side `time.Now()`. No caller relies on the SQL default. Phase 2 AutoMigrate drops the SQL defaults.
  • `DEFAULT ...` without `NOT NULL` on several agent/broker columns: Ent tightens to `NOT NULL DEFAULT x`. Safe because the backfill migrations ensure no existing rows are NULL.

Each entry carries a one-line reason in the allowlist. Nothing is silently ignored.

Test plan

  • `go generate ./pkg/ent/...` clean
  • `go build ./...` clean
  • `go vet ./...` clean
  • `make test-fast` green
  • `go test -tags schemadiff ./pkg/store/entstore/...` green across all four gate tests (tables, columns, column attributes, indexes)
  • `go test ./pkg/store/sqlite/... ./pkg/store/entadapter/... ./pkg/ent/entc/...` green
  • Manually inspect a fresh Hub start to confirm new `hub.db_ent` tables are created and no existing data is touched. (reviewer)
  • Manually upgrade a Hub that already has `hub.db` + `hub.db_ent`; confirm the new shadow-table columns appear and no startup errors are logged. (reviewer)

Pre-existing hub test failures on `main` (OAuth/authorization tests around template file handlers) are unchanged — confirmed present before this branch.

🤖 Generated with Claude Code

Phase 1 of the Ent migration (thoughts/plans/2026-04-18-ent-migration-postgresql.md):
make Ent schemas a superset of the raw SQL columns so Phase 2 can safely
collapse the two-database architecture without AutoMigrate dropping data.

- Agent: add ~25 operational fields (phase/activity/tool_name, connection
  and container state, limits tracking, stalled detection, ancestry,
  state_version, message, deleted_at, last_activity_event, applied_config,
  etc.); alias column names via StorageKey so "slug" maps to agent_id and
  timestamps to created_at/updated_at; index names match raw SQL DDL;
  partial index on deleted_at.
- Grove: add default_runtime_broker_id, shared_dirs, GitHub App integration
  fields (installation_id int64, permissions, app_status), git_identity;
  named indexes matching raw SQL.
- User: add last_seen; rename created -> created_at; named email index.
- types.go: mirror SharedDir, GitHubTokenPermissions, GitHubAppGroveStatus,
  GitIdentityConfig. AgentAppliedConfig is kept as json.RawMessage at the
  Ent layer to avoid pulling pkg/api into pkg/ent/schema.
- RuntimeBroker: new schema (1 of 20 raw-SQL-only tables).
- pkg/store/entstore: new package hosting the schema-diff safety gate
  (build tag schemadiff). The gate runs sqlite.Migrate and Ent.AutoMigrate
  against two in-memory databases and asserts table/column coverage,
  excluding the V5 stub tables and api_keys. Invoke via:
    go test -tags schemadiff ./pkg/store/entstore/...
- Caller fix in entadapter/group_store.go for the created -> created_at
  rename on Agent.
Adds Ent schemas for every non-V5-stub raw SQL table: Template,
HarnessConfig, EnvVar, Secret, UserAccessToken, BrokerSecret,
BrokerJoinToken, NotificationSubscription, Notification,
SubscriptionTemplate, ScheduledEvent, Schedule, GCPServiceAccount,
GitHubInstallation, MessageRecord, MaintenanceOperation,
MaintenanceOperationRun, GroveContributor, GroveSyncState.

Notable encodings:
- GitHubInstallation overrides id to int64 with StorageKey("installation_id")
  to match the int64 PK convention used by GitHub App installation ids.
- BrokerSecret/BrokerJoinToken override id to map onto broker_id (1:1 with
  runtime_brokers).
- MessageRecord is the Ent type name for the "messages" table to avoid
  collision with Ent internals.
- GroveContributor/GroveSyncState: raw SQL uses a composite (grove_id,
  broker_id) PK; Ent requires a surrogate id, so we add one and enforce
  the composite as a UNIQUE index. Functionally equivalent.
- Partial indexes on scheduled_events.fire_at (WHERE status='pending') and
  schedules.next_run_at (WHERE status='active') are expressed via
  entsql.IndexWhere.

The schema-diff gate at pkg/store/entstore/schemacoverage_test.go now
passes both the table-presence and column-presence checks, completing the
Phase 1 safety prerequisite from
thoughts/plans/2026-04-18-ent-migration-postgresql.md. Ent-only extras
(access_policies, group_child_groups, group_memberships) are reported as
informational drift — these are Ent's authoritative group/policy schemas
that Phase 2 will replace the V5 stubs with.
Phase 2 prep work: close the remaining drift between raw SQL and Ent
schemas so Phase 2 can point Ent at hub.db without surprises.

- sqlite: add migration V46 adding delegation_enabled INTEGER NOT NULL
  DEFAULT 0 to agents. This aligns the raw SQL schema with the Ent-side
  field used by the policy engine to mark agents whose creator
  relationship is policy-addressable.

- entstore/schemacoverage: add TestSchemaCoverage_ColumnAttributes which
  compares column type, NOT NULL, default, and PK flag between the two
  stores. Canonicalizes SQLite type synonyms (UUID/JSON -> TEXT, bool ->
  integer) and bool defaults (true/false -> 1/0). Skips notnull on PK
  columns (SQLite does not enforce NOT NULL on TEXT PRIMARY KEY).

  All remaining drift is documented in knownAttributeDrift with a short
  reason, covering: Go-side timestamp defaults vs CURRENT_TIMESTAMP,
  template.NOT NULL weakening for shadow records, composite PK tables
  (grove_contributors, grove_sync_state — planned to drop in Phase 2),
  and several raw-SQL DEFAULT-without-NOT-NULL columns where Ent
  tightens the constraint safely.

The gate now asserts three invariants:
1. Every raw SQL table has a matching Ent table (modulo V5 stubs, api_keys).
2. Every raw SQL column has a matching Ent column.
3. Column type/notnull/default/pk match, or the drift is allowlisted.

Run with: go test -tags schemadiff ./pkg/store/entstore/...
@google-cla
Copy link
Copy Markdown

google-cla Bot commented Apr 22, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces significant updates to the data model, including the addition of several new entities (BrokerJoinToken, BrokerSecret, EnvVar, GCPServiceAccount, GitHubInstallation, GroveContributor) and the expansion of the Agent and Grove entities with new fields. It also updates the database client initialization to return the underlying *sql.DB handle, allowing for better connection pool management. I have no feedback to provide as these changes appear to be a standard expansion of the schema and infrastructure.

…ults

Round out Phase 1 after code review:

- Schema-diff gate now checks indexes (TestSchemaCoverage_Indexes) in
  addition to tables/columns/column-attributes. Catches the existing
  expression-index gap on notification_subscriptions (documented in
  knownIndexDrift) and will catch any StorageKey typos going forward.
- entc.Open{SQLite,Postgres} now return the underlying *sql.DB so the
  gate no longer needs a parallel sql.Open handle. Also unblocks Phase 5
  pool config and the advisory-lock migration wrapper.
- GroveContributor/GroveSyncState id fields use DefaultFunc(uuid.NewString)
  so Phase 3 callsites don't each have to generate ids.
- Agent.delegation_enabled comment updated — V46 shipped in this branch,
  no longer a pending follow-up.

Signed-off-by: Edvin Norling <edvin.norling@kognic.com>
@nissessenap nissessenap marked this pull request as ready for review April 22, 2026 21:26
Collapses pkg/ent/*.go and pkg/ent/*/*.go in GitHub diff views so
reviewers see the hand-written schema changes (pkg/ent/schema/**) and
migration/test changes instead of tens of thousands of lines of Ent
codegen boilerplate. Schema files remain fully diffable.

Signed-off-by: Edvin Norling <edvin.norling@kognic.com>
@nissessenap nissessenap marked this pull request as draft April 22, 2026 21:38
@nissessenap nissessenap marked this pull request as draft April 22, 2026 21:38
@nissessenap nissessenap marked this pull request as ready for review April 22, 2026 21:48
@ptone ptone mentioned this pull request Apr 25, 2026
Resolves conflict on migrationV46 in pkg/store/sqlite/sqlite.go: both
branches claimed slot 46. Upstream's V46 (templates.default_harness_config)
keeps the slot since it merged first; this branch's delegation_enabled
migration is renumbered to V47.

- sqlite: keep upstream V46, add migrationV47 (delegation_enabled) and
  append it to the migrations slice
- ent/schema/agent.go: update comment reference V46 -> V47
- ent/schema/template.go: add default_harness_config field for parity
  with upstream's V46 so the Phase 1 schema-diff gate keeps passing
- regenerate ent code (template, migrate/schema, mutation)

Verified: go build ./..., go test ./pkg/store/... ./pkg/ent/... ./cmd/...,
and go test -tags schemadiff ./pkg/store/entstore/... all pass. The 7
pkg/hub failures observed locally also fail on pure upstream/main and
are unrelated environmental issues (init.defaultBranch=main vs tests
expecting master).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant