Merged
87 changes: 87 additions & 0 deletions docs/prds/prd-entra-id-graph-integration.md
@@ -230,3 +230,90 @@ A short manual smoke test against a real UC HTTP Connection wired to a tenant
- **Performance**: 250ms debounce + 2-char minimum + 5-minute backend TTL cache should keep us well under Graph's per-app throttling thresholds for normal usage.
- **Accessibility**: badges use `aria-label` containing `displayName` + `subLabel`; the popup dialog is focus-trapped; the type filter chips are reachable via Tab.
- **Forward path**: a v2 PRD can introduce structured Principal columns (with type discriminator) on the highest-traffic tables (e.g. `app_roles.assigned_users`) once we have data on how often homonym collisions cause real bugs.
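
The backend half of the performance note above (minimum query length plus a 5-minute TTL cache) could be sketched roughly as follows. This is a minimal illustration, not the shipped code: `CachedSearch`, `MIN_QUERY_CHARS`, and `CACHE_TTL_SECONDS` are hypothetical names, and it assumes the 2-character minimum is also enforced server-side and that search results can be cached per `(prefix, top)` pair.

```python
import time
from typing import Callable, Dict, List, Tuple

MIN_QUERY_CHARS = 2          # assumption: same minimum as the picker uses
CACHE_TTL_SECONDS = 5 * 60   # 5-minute TTL from the performance note


class CachedSearch:
    """Wrap a search callable with a length guard and a TTL cache."""

    def __init__(
        self,
        search: Callable[[str, int], List[str]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._search = search
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[Tuple[str, int], Tuple[float, List[str]]] = {}

    def __call__(self, prefix: str, top: int = 10) -> List[str]:
        trimmed = prefix.strip()
        if len(trimmed) < MIN_QUERY_CHARS:
            return []  # too short: never reaches the directory backend
        key = (trimmed.lower(), top)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < CACHE_TTL_SECONDS:
            return hit[1]  # still fresh: serve from cache
        results = self._search(trimmed, top)
        self._entries[key] = (now, results)
        return results
```

Lower-casing the cache key assumes the underlying search is case-insensitive; if it is not, the key should use the trimmed prefix unchanged.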

---

## Amendment: Generalisation to a Directory provider abstraction

> The PRD above was written Entra-first. During planning we generalised the manager / routes / settings keys so additional IdPs can plug in without breaking changes. This amendment documents what shipped — it is purely a re-naming of the abstraction layer and an enumeration of the providers v1 actually ships; the user-visible behaviour, disambiguation rule, picker contract, and graceful-degradation rules are unchanged from the original PRD.

See the implementation plan at [`plans/directory-lookup-and-principal-picker.md`](../../plans/directory-lookup-and-principal-picker.md) (#375) for the phase-by-phase breakdown.

### What changed in naming

| Original PRD term | What shipped |
| ------------------------------------- | ------------------------------------------- |
| "Entra integration" | **Directory layer** (provider-agnostic) |
| `EntraSettingsManager` / `EntraManager` | **`DirectoryManager`** |
| `/api/entra/*` | **`/api/directory/*`** |
| `ENTRA_PROVIDER_TYPE` setting key | **`DIRECTORY_PROVIDER_TYPE`** |
| `ENTRA_UC_HTTP_CONNECTION_NAME` | **`DIRECTORY_UC_HTTP_CONNECTION_NAME`** |
| "Entra ID" Settings tab | **"Directory"** tab under Integrations |

The PrincipalPicker, search API shape, status endpoint, and all storage compatibility rules are exactly as written in the PRD body. Only the names in the table above changed, which makes the integration provider-agnostic at the surface layer.

### The `DirectoryProvider` interface

Every concrete provider implements:

```python
from __future__ import annotations  # Principal is defined elsewhere

from abc import ABC
from typing import List


class DirectoryProvider(ABC):
    def search_users(self, prefix: str, top: int) -> List[Principal]: ...
    def search_groups(self, prefix: str, top: int) -> List[Principal]: ...
    def get_user(self, id: str) -> Principal: ...
    def get_group(self, id: str) -> Principal: ...
    def test(self) -> None: ...  # raise DirectoryError on failure
```

Factories take `(DirectoryProviderContext, DirectoryProviderConfig)` and return a provider instance. The context carries transport handles (`ws_client` for Entra, `db_engine` for Lakebase, …); the config is one bag containing every directory setting and each provider reads only the fields relevant to its type. Adding a new provider is one entry in `_PROVIDER_REGISTRY` plus an optional `_REQUIRED_KEYS` row — no other code changes.
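
The registry-plus-factory arrangement described above might look roughly like the sketch below. Only `_PROVIDER_REGISTRY`, `_REQUIRED_KEYS`, `DirectoryProviderContext`, `DirectoryProviderConfig`, `ws_client`, and `db_engine` come from this amendment; the config field names, the `FileProvider` stand-in, and `build_provider` are hypothetical and exist only to make the example self-contained.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple


@dataclass
class DirectoryProviderContext:
    # Transport handles; each provider uses only the ones it needs.
    ws_client: Optional[Any] = None
    db_engine: Optional[Any] = None


@dataclass
class DirectoryProviderConfig:
    # One bag of settings; these field names are illustrative assumptions.
    provider_type: str = ""
    uc_http_connection_name: Optional[str] = None
    lakebase_table: Optional[str] = None
    file_path: Optional[str] = None


class FileProvider:
    """Stand-in; a real provider would implement DirectoryProvider."""

    def __init__(self, ctx: DirectoryProviderContext, cfg: DirectoryProviderConfig):
        self.path = cfg.file_path


ProviderFactory = Callable[[DirectoryProviderContext, DirectoryProviderConfig], Any]

# Adding a provider means adding one entry here ...
_PROVIDER_REGISTRY: Dict[str, ProviderFactory] = {
    "file": lambda ctx, cfg: FileProvider(ctx, cfg),
}
# ... plus, optionally, the settings it cannot run without.
_REQUIRED_KEYS: Dict[str, Tuple[str, ...]] = {
    "file": ("file_path",),
}


def build_provider(ctx: DirectoryProviderContext, cfg: DirectoryProviderConfig) -> Any:
    """Look up the factory for the configured type and validate its settings."""
    factory = _PROVIDER_REGISTRY.get(cfg.provider_type)
    if factory is None:
        raise ValueError(f"unknown directory provider: {cfg.provider_type!r}")
    required = _REQUIRED_KEYS.get(cfg.provider_type, ())
    missing = [key for key in required if not getattr(cfg, key)]
    if missing:
        raise ValueError(f"missing settings for {cfg.provider_type!r}: {missing}")
    return factory(ctx, cfg)
```

Keeping the required-key check next to the registry lets a status endpoint report a misconfigured provider before any transport call is attempted.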

### Providers shipped in v1

The PRD's "Out of Scope" line about second providers ("Okta / Ping / others … future-phase work") was conservative. The abstraction turned out to be needed within v1 itself, which also delivers a Postgres-table-backed provider and a CSV-file-backed provider. Both are useful from day one: Lakebase for customers without an IdP integration, File for tests and offline dev. v1 ships:

| Provider | Setting key carrying its primary config | Transport |
| ---------- | --------------------------------------- | ---------------------------------------------------- |
| `entra` | `DIRECTORY_UC_HTTP_CONNECTION_NAME` | Microsoft Graph via UC HTTP Connection |
| `lakebase` | `DIRECTORY_LAKEBASE_TABLE` (FQN) | App's own SQLAlchemy engine; Postgres table |
| `file` | `DIRECTORY_FILE_PATH` (absolute path) | CSV on local filesystem (mtime-cached re-load) |

The Settings → Directory tab's provider Select offers all three. Per-provider config inputs and help blocks switch on the active provider.

### Lakebase table schema

The `lakebase` provider expects a Postgres table at the configured FQN with columns:

```sql
CREATE TABLE <fqn> (
    type         TEXT NOT NULL,  -- 'user' | 'group'
    id           TEXT NOT NULL,  -- UPN/email for users, displayName for groups
    display_name TEXT NOT NULL,
    sub_label    TEXT
);
-- recommended for snappy prefix search:
CREATE INDEX ON <fqn> (LOWER(display_name));
CREATE INDEX ON <fqn> (LOWER(id));
```

Populating the table is the operator's responsibility; it is typically synced from an IdP via a scheduled job. The provider parameterises every value passed to SQL and validates the table FQN identifier by identifier, so an unvalidated table name can never enter a query.
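
As an illustration of the two safeguards just described, the sketch below validates each dot-separated part of the FQN against a strict identifier pattern and passes every user-supplied value as a bind parameter. It is an assumption-laden example rather than the provider's actual code: `validate_fqn`, `build_prefix_query`, the three-part limit, and the `:name` bind style (as used by SQLAlchemy's `text()`) are all hypothetical choices.

```python
import re
from typing import Dict, Tuple

# Assumption: plain unquoted identifiers only (letters, digits, underscore).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def validate_fqn(fqn: str) -> str:
    """Return the FQN unchanged if every dot-separated part is a safe identifier."""
    parts = fqn.split(".")
    if len(parts) > 3:
        raise ValueError(f"too many parts in table name: {fqn!r}")
    for part in parts:
        if not _IDENTIFIER.fullmatch(part):
            raise ValueError(f"invalid identifier {part!r} in table name {fqn!r}")
    return fqn


def build_prefix_query(
    fqn: str, principal_type: str, prefix: str, top: int
) -> Tuple[str, Dict[str, object]]:
    """Build a prefix-search statement plus its bind parameters."""
    table = validate_fqn(fqn)  # the only string interpolated into the SQL
    # Escape LIKE wildcards so the prefix is matched literally; backslash is
    # PostgreSQL's default LIKE escape character.
    escaped = (
        prefix.lower().replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    sql = (
        f"SELECT type, id, display_name, sub_label FROM {table} "
        "WHERE type = :type "
        "AND (LOWER(display_name) LIKE :pattern OR LOWER(id) LIKE :pattern) "
        "ORDER BY display_name LIMIT :top"
    )
    return sql, {"type": principal_type, "pattern": escaped + "%", "top": top}
```

Matching on `LOWER(display_name)` and `LOWER(id)` lines up with the two expression indexes recommended in the schema above.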

### File CSV format

The `file` provider expects a CSV file with a header row containing `type`, `id`, and `display_name`, plus an optional `sub_label`. `type` must be `user` or `group`. Blank rows are skipped; missing required columns or invalid `type` values raise an error on first read.

```csv
type,id,display_name,sub_label
user,alice@example.com,Alice Liddell,alice@example.com
group,Producers,Data Producers,producers-guid
```
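
A loader that follows these rules, together with the mtime-cached reload mentioned in the provider table, could be sketched as below. The function names, the module-level cache, and the dictionary row shape are assumptions made for illustration; the real provider presumably returns `Principal` objects.

```python
import csv
import os
from typing import Dict, List, Optional, Tuple

REQUIRED_COLUMNS = ("type", "id", "display_name")
VALID_TYPES = ("user", "group")

Row = Dict[str, Optional[str]]


def load_directory_csv(path: str) -> List[Row]:
    """Parse the CSV, skipping blank rows and rejecting malformed input."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [col for col in REQUIRED_COLUMNS if col not in header]
        if missing:
            raise ValueError(f"{path}: missing required columns {missing}")
        rows: List[Row] = []
        for record in reader:
            values = {
                col: (record.get(col) or "").strip()
                for col in (*REQUIRED_COLUMNS, "sub_label")
            }
            if not any(values.values()):
                continue  # blank row
            if values["type"] not in VALID_TYPES:
                raise ValueError(
                    f"{path}:{reader.line_num}: invalid type {values['type']!r}"
                )
            rows.append({**values, "sub_label": values["sub_label"] or None})
        return rows


# path -> (mtime at load, parsed rows)
_CACHE: Dict[str, Tuple[float, List[Row]]] = {}


def get_directory_rows(path: str) -> List[Row]:
    """Return cached rows, re-reading the file only when its mtime changes."""
    mtime = os.path.getmtime(path)
    cached = _CACHE.get(path)
    if cached is None or cached[0] != mtime:
        cached = (mtime, load_directory_csv(path))
        _CACHE[path] = cached
    return cached[1]
```

Checking the mtime on every call costs one `stat` per lookup, which keeps edits to the file visible without restarting the app.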

### What didn't change

- The PrincipalPicker contract, both modes (configured / unconfigured), both UI variants (inline + popup dialog), and the disambiguation rule.
- Storage compatibility: every existing principal-bearing column keeps its current `string` / `List[str]` shape.
- Graceful degradation: a failing search call (any provider) drops the picker into manual-entry mode for the rest of the session.
- "No backend storage migration" — still true; the new settings keys live in the existing `app_settings` key/value table.

### What's still out of scope (unchanged from the PRD body)

Okta / Ping / SCIM providers, service principals, OBO-delegated Graph, profile photos, manager hierarchies, group-membership traversal, replacing role/team Selects in the workflow designer, and CSV bulk import. The `DirectoryProvider` abstraction is the thing that turns the first three from "ten-file changes" into "register one factory".