From 4fcb14f1bba4921441f9f4f4f5d866c94298b6e0 Mon Sep 17 00:00:00 2001
From: Lars George
Date: Thu, 21 May 2026 18:45:42 +0200
Subject: [PATCH] docs(prd): amend Entra PRD with Directory abstraction + multi-provider scope
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Documents what shipped under PRs #406 / #407 / #412 / #413 / #416 / #417:

- Renames the integration's manager / routes / settings keys in the PRD to match the implementation (Directory layer, /api/directory/*, DIRECTORY_* settings, Settings → Directory tab).
- Documents the DirectoryProvider interface and the (DirectoryProviderContext, DirectoryProviderConfig) factory signature so future provider plug-ins know what to implement.
- Documents the v1 provider set, which expanded during planning from Entra-only to entra + lakebase + file. The Lakebase table schema and CSV format are included so operators have a single reference.
- Preserves story content, the disambiguation rule, both picker modes, storage-compatibility guarantees, and graceful-degradation rules from the PRD body unchanged.
- Re-confirms the out-of-scope list (Okta/Ping, service principals, OBO, profile photos, manager hierarchy, role/team Select replacement, CSV bulk import) which the abstraction makes cheap to revisit.
---
 docs/prds/prd-entra-id-graph-integration.md | 87 +++++++++++++++++++++
 1 file changed, 87 insertions(+)

diff --git a/docs/prds/prd-entra-id-graph-integration.md b/docs/prds/prd-entra-id-graph-integration.md
index 91da7505..3e354762 100644
--- a/docs/prds/prd-entra-id-graph-integration.md
+++ b/docs/prds/prd-entra-id-graph-integration.md
@@ -230,3 +230,90 @@ A short manual smoke test against a real UC HTTP Connection wired to a tenant
 - **Performance**: 250ms debounce + 2-char minimum + 5-minute backend TTL cache should keep us well under Graph's per-app throttling thresholds for normal usage.
 - **Accessibility**: badges use `aria-label` containing `displayName` + `subLabel`; the popup dialog is focus-trapped; the type filter chips are reachable via Tab.
 - **Forward path**: a v2 PRD can introduce structured Principal columns (with type discriminator) on the highest-traffic tables (e.g. `app_roles.assigned_users`) once we have data on how often homonym collisions cause real bugs.
+
+---
+
+## Amendment: Generalisation to a Directory provider abstraction
+
+> The PRD above was written Entra-first. During planning we generalised the manager / routes / settings keys so additional IdPs can plug in without breaking changes. This amendment documents what shipped — it is purely a re-naming of the abstraction layer and an enumeration of the providers v1 actually ships; the user-visible behaviour, disambiguation rule, picker contract, and graceful-degradation rules are unchanged from the original PRD.
+
+See the implementation plan at [`plans/directory-lookup-and-principal-picker.md`](../../plans/directory-lookup-and-principal-picker.md) (#375) for the phase-by-phase breakdown.
+
+### What changed in naming
+
+| Original PRD term                       | What shipped                            |
+| --------------------------------------- | --------------------------------------- |
+| "Entra integration"                     | **Directory layer** (provider-agnostic) |
+| `EntraSettingsManager` / `EntraManager` | **`DirectoryManager`**                  |
+| `/api/entra/*`                          | **`/api/directory/*`**                  |
+| `ENTRA_PROVIDER_TYPE` setting key       | **`DIRECTORY_PROVIDER_TYPE`**           |
+| `ENTRA_UC_HTTP_CONNECTION_NAME`         | **`DIRECTORY_UC_HTTP_CONNECTION_NAME`** |
+| "Entra ID" Settings tab                 | **"Directory"** tab under Integrations  |
+
+The PrincipalPicker, search API shape, status endpoint, and all storage compatibility rules are exactly as written in the PRD body. The amendment is just "rename the integration to be provider-agnostic at the surface layer".
+
+### The `DirectoryProvider` interface
+
+Every concrete provider implements:
+
+```python
+class DirectoryProvider(ABC):
+    def search_users(self, prefix: str, top: int) -> List[Principal]: ...
+    def search_groups(self, prefix: str, top: int) -> List[Principal]: ...
+    def get_user(self, id: str) -> Principal: ...
+    def get_group(self, id: str) -> Principal: ...
+    def test(self) -> None: ...  # raise DirectoryError on failure
+```
+
+Factories take `(DirectoryProviderContext, DirectoryProviderConfig)` and return a provider instance. The context carries transport handles (`ws_client` for Entra, `db_engine` for Lakebase, …); the config is one bag containing every directory setting, and each provider reads only the fields relevant to its type. Adding a new provider is one entry in `_PROVIDER_REGISTRY` plus an optional `_REQUIRED_KEYS` row — no other code changes.
+
+### Providers shipped in v1
+
+The PRD's "Out of Scope" line about second providers ("Okta / Ping / others … future-phase work") was conservative. The abstraction was already needed in v1, which also delivers a Postgres-table-backed and a CSV-file-backed provider; both are useful from day one (Lakebase for customers without an IdP integration, File for tests and offline dev). v1 ships:
+
+| Provider   | Setting key carrying its primary config | Transport                                      |
+| ---------- | --------------------------------------- | ---------------------------------------------- |
+| `entra`    | `DIRECTORY_UC_HTTP_CONNECTION_NAME`     | Microsoft Graph via UC HTTP Connection         |
+| `lakebase` | `DIRECTORY_LAKEBASE_TABLE` (FQN)        | App's own SQLAlchemy engine; Postgres table    |
+| `file`     | `DIRECTORY_FILE_PATH` (absolute path)   | CSV on local filesystem (mtime-cached re-load) |
+
+The Settings → Directory tab's provider Select offers all three. Per-provider config inputs and help blocks switch on the active provider.
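The factory and registry mechanism described above can be sketched roughly as follows. This is a minimal illustration, not the shipped code. Only the five interface methods, `_PROVIDER_REGISTRY`, `_REQUIRED_KEYS`, and `DIRECTORY_PROVIDER_TYPE` come from the text; the `static` provider, the `DIRECTORY_STATIC_USERS` key, `build_provider`, and the field layouts of `Principal`, `DirectoryProviderContext`, and `DirectoryProviderConfig` are assumptions made for the example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional


class DirectoryError(Exception):
    """Raised when a provider is misconfigured or a lookup fails."""


@dataclass
class Principal:  # assumed shape, mirroring the four documented columns
    type: str  # 'user' | 'group'
    id: str
    display_name: str
    sub_label: Optional[str] = None


@dataclass
class DirectoryProviderContext:  # transport handles; types are placeholders
    ws_client: Optional[object] = None
    db_engine: Optional[object] = None


@dataclass
class DirectoryProviderConfig:  # one bag of settings (assumed: a plain dict)
    settings: Dict[str, str] = field(default_factory=dict)


class DirectoryProvider(ABC):
    @abstractmethod
    def search_users(self, prefix: str, top: int) -> List[Principal]: ...
    @abstractmethod
    def search_groups(self, prefix: str, top: int) -> List[Principal]: ...
    @abstractmethod
    def get_user(self, id: str) -> Principal: ...
    @abstractmethod
    def get_group(self, id: str) -> Principal: ...
    @abstractmethod
    def test(self) -> None: ...


class StaticProvider(DirectoryProvider):
    """Hypothetical in-memory provider, used only to exercise the registry."""

    def __init__(self, principals: Iterable[Principal]):
        self._principals = list(principals)

    def _search(self, kind: str, prefix: str, top: int) -> List[Principal]:
        needle = prefix.lower()
        hits = [p for p in self._principals
                if p.type == kind and p.display_name.lower().startswith(needle)]
        return hits[:top]

    def _get(self, kind: str, id: str) -> Principal:
        for p in self._principals:
            if p.type == kind and p.id == id:
                return p
        raise DirectoryError(f"{kind} not found: {id}")

    def search_users(self, prefix, top): return self._search("user", prefix, top)
    def search_groups(self, prefix, top): return self._search("group", prefix, top)
    def get_user(self, id): return self._get("user", id)
    def get_group(self, id): return self._get("group", id)

    def test(self) -> None:
        if not self._principals:
            raise DirectoryError("static directory is empty")


ProviderFactory = Callable[[DirectoryProviderContext, DirectoryProviderConfig],
                           DirectoryProvider]
_PROVIDER_REGISTRY: Dict[str, ProviderFactory] = {}
_REQUIRED_KEYS: Dict[str, List[str]] = {}


def _static_factory(ctx, cfg) -> DirectoryProvider:
    names = [n.strip() for n in cfg.settings["DIRECTORY_STATIC_USERS"].split(",")]
    return StaticProvider(Principal("user", n.lower(), n) for n in names if n)


# Registering a provider: one registry entry plus an optional required-keys row.
_PROVIDER_REGISTRY["static"] = _static_factory
_REQUIRED_KEYS["static"] = ["DIRECTORY_STATIC_USERS"]


def build_provider(ctx: DirectoryProviderContext,
                   cfg: DirectoryProviderConfig) -> DirectoryProvider:
    """Look up the factory for the configured type and check its required keys."""
    provider_type = cfg.settings.get("DIRECTORY_PROVIDER_TYPE", "")
    factory = _PROVIDER_REGISTRY.get(provider_type)
    if factory is None:
        raise DirectoryError(f"unknown provider type: {provider_type!r}")
    missing = [k for k in _REQUIRED_KEYS.get(provider_type, [])
               if not cfg.settings.get(k)]
    if missing:
        raise DirectoryError(f"missing settings for {provider_type}: {missing}")
    return factory(ctx, cfg)
```

In this sketch the config bag is a plain dictionary, so a provider that needs no transport handle simply ignores the context. A real factory for `entra` or `lakebase` would presumably read `ws_client` or `db_engine` from it instead.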
+
+### Lakebase table schema
+
+The `lakebase` provider expects a Postgres table at the configured FQN with columns:
+
+```sql
+CREATE TABLE <table_fqn> (
+    type         TEXT NOT NULL,  -- 'user' | 'group'
+    id           TEXT NOT NULL,  -- UPN/email for users, displayName for groups
+    display_name TEXT NOT NULL,
+    sub_label    TEXT
+);
+-- recommended for snappy prefix search:
+CREATE INDEX ON <table_fqn> (LOWER(display_name));
+CREATE INDEX ON <table_fqn> (LOWER(id));
+```
+
+Populating the table is the operator's responsibility; it is typically synced from an IdP via a scheduled job. The provider parameterises every value passed to SQL and validates the table FQN identifier-by-identifier, so an unvalidated table name can never reach a query.
+
+### File CSV format
+
+The `file` provider expects a CSV file with a header row containing `type`, `id`, `display_name` (and an optional `sub_label`). `type` must be `user` or `group`. Blank rows are skipped; missing required columns or invalid `type` values raise on first read.
+
+```csv
+type,id,display_name,sub_label
+user,alice@example.com,Alice Liddell,alice@example.com
+group,Producers,Data Producers,producers-guid
+```
+
+### What didn't change
+
+- The PrincipalPicker contract, both modes (configured / unconfigured), both UI variants (inline + popup dialog), and the disambiguation rule.
+- Storage compatibility: every existing principal-bearing column keeps its current `string` / `List[str]` shape.
+- Graceful degradation: a failing search call (any provider) drops the picker into manual-entry mode for the rest of the session.
+- "No backend storage migration" — still true; the new settings keys live in the existing `app_settings` key/value table.
+
+### What's still out of scope (unchanged from the PRD body)
+
+Okta / Ping / SCIM providers, service principals, OBO-delegated Graph, profile photos, manager hierarchies, group-membership traversal, replacing role/team Selects in the workflow designer, and CSV bulk import.
The `DirectoryProvider` abstraction is what turns the first three (Okta, Ping, SCIM) from "ten-file changes" into "register one factory".
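The File CSV format rules described in the amendment (required header columns, `type` restricted to `user` or `group`, blank rows skipped, errors raised on first read) can be sketched as below. This is an illustrative parser under stated assumptions, not the shipped `file` provider: `load_principals`, the use of `ValueError`, and parsing from a string rather than from `DIRECTORY_FILE_PATH` are choices made for the example, and the mtime-based reload is left out.

```python
import csv
import io
from typing import Dict, List

REQUIRED_COLUMNS = ("type", "id", "display_name")
ALL_COLUMNS = REQUIRED_COLUMNS + ("sub_label",)  # sub_label is optional
VALID_TYPES = {"user", "group"}


def load_principals(text: str) -> List[Dict[str, str]]:
    """Parse directory CSV text into row dicts (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing required columns: {missing}")

    rows: List[Dict[str, str]] = []
    for row in reader:
        # Keep only known columns; short rows yield None, so normalise to "".
        rec = {c: (row.get(c) or "").strip() for c in ALL_COLUMNS}
        if not any(rec.values()):
            continue  # a row of empty cells counts as blank
        if rec["type"] not in VALID_TYPES:
            raise ValueError(
                f"line {reader.line_num}: invalid type {rec['type']!r}"
            )
        rows.append(rec)
    return rows


sample = (
    "type,id,display_name,sub_label\n"
    "user,alice@example.com,Alice Liddell,alice@example.com\n"
    "\n"
    "group,Producers,Data Producers,producers-guid\n"
)
principals = load_principals(sample)
```

Validating the whole file up front matches the "raise on first read" rule, so a malformed file fails at load time rather than partway through a search.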