323 changes: 23 additions & 300 deletions CLAUDE.md

Large diffs are not rendered by default.

7 changes: 6 additions & 1 deletion docs/INDEX.md
@@ -33,6 +33,7 @@
- [`lif-query-cache.md`](design/components/lif-query-cache.md) — LIF Query Cache service: caching layer for resolved queries.
- [`lif-query-planner.md`](design/components/lif-query-planner.md) — LIF Query Planner service: query routing and optimization.
- [`mdr.md`](design/components/mdr.md) — MDR (Metadata Repository) service: schema registry and contract authority.
- [`semantic-search.md`](design/components/semantic-search.md) — Semantic Search MCP server: FastMCP tools (`lif_query`/`lif_mutation`), HTTP endpoints, embeddings, startup schema load.
- [`translator.md`](design/components/translator.md) — Translator service: source-data-to-LIF transformation engine.

### `docs/design/adr/` — Architectural Decision Records
@@ -58,9 +59,10 @@

### `docs/design/cross-cutting/` — Topics spanning services

- [`schema-loading.md`](design/cross-cutting/schema-loading.md) — Schema loading pattern (MDR-at-startup, no silent file fallback), `SchemaStateManager`, PascalCase/camelCase convention, Strawberry GraphQL implementation details.
- [`self-serve-tenant-auth.md`](design/cross-cutting/self-serve-tenant-auth.md) — Self-serve tenant onboarding narrative: Cognito sign-up → post-confirmation Lambda → schema-per-tenant provisioning → workspace selection cookie → invite tokens (#882/#883/#884).

*Other planned topics: `auth.md` (all-service auth model), `schema-loading.md`, `polylith-conventions.md`.*
*Other planned topics: `auth.md` (all-service auth model), `polylith-conventions.md`.*

---

@@ -74,7 +76,10 @@
- [`adding-a-new-microservice.md`](operations/guides/adding-a-new-microservice.md) — Runbook for standing up a new HTTP microservice: Polylith brick layout, pyproject hygiene, Dockerfile2, AuthMiddleware wiring, docker-compose entry.
- [`creating-a-data-source-adapter.md`](operations/guides/creating-a-data-source-adapter.md) — Reference for the data source adapter contract: what adapters are, what they receive, what they return.
- [`demo-environment-update.md`](operations/guides/demo-environment-update.md) — End-to-end runbook for promoting dev images to demo.
- [`deployment.md`](operations/guides/deployment.md) — Deployment scripts, env config (dev vs. demo), MDR schema migrations, Docker build dependency resolution, ECS/CloudWatch debugging, querying the MDR API.
- [`graphql-api-keys.md`](operations/guides/graphql-api-keys.md) — GraphQL org1 API key authentication: SSM key storage, `X-API-Key` flow, `setup-graphql-api-keys.sh` service/workshop modes.
- [`load-testing.md`](operations/guides/load-testing.md) — Load testing notes for LIF services.
- [`testing.md`](operations/guides/testing.md) — Unit/integration test principles, sample data orgs, the 6 test users, service-layer testing order.

### `docs/operations/proposals/` — Proposed work

22 changes: 22 additions & 0 deletions docs/design/components/semantic-search.md
@@ -0,0 +1,22 @@
# Semantic Search MCP Server

The semantic search service (`bases/lif/semantic_search_mcp_server/`) provides MCP tools for AI-powered learner data queries.

**Architecture:**
- Uses FastMCP for Model Context Protocol
- Loads schema from MDR at startup (sync initialization required for tool registration) — see [`schema-loading.md`](../cross-cutting/schema-loading.md)
- Connected to org1's GraphQL API for data queries
- Embeddings computed via Sentence-Transformers

**HTTP Endpoints:**
- `GET /health` - Readiness check
- `GET /schema/status` - Schema metadata (source, leaf count, roots, filter models)
- `POST /schema/refresh` - Reload schema from MDR (state only, not tool definitions)

**MCP Tools:**
- `lif_query` - Semantic search over LIF data fields
- `lif_mutation` - Update LIF data fields (if mutation model available)

**Docker port:** 8003 (exposed for integration testing)

**GraphQL authentication:** Uses `graphql_client` component for all GraphQL HTTP calls, which automatically sends `X-API-Key` from the `LIF_GRAPHQL_API_KEY` env var when set. See [`graphql-api-keys.md`](../../operations/guides/graphql-api-keys.md).
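
As a rough illustration of the client-side behavior described above, the sketch below builds request headers and attaches `X-API-Key` only when `LIF_GRAPHQL_API_KEY` is set. The function name `build_graphql_headers` is hypothetical and is not taken from the `graphql_client` component; only the env var and header names come from this document.

```python
import os
from typing import Mapping


def build_graphql_headers(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return HTTP headers for a GraphQL call (hypothetical helper, not the real component)."""
    headers = {"Content-Type": "application/json"}
    # Assumption: an unset or blank variable means "send no key at all".
    api_key = env.get("LIF_GRAPHQL_API_KEY", "").strip()
    if api_key:
        headers["X-API-Key"] = api_key
    return headers


with_key = build_graphql_headers({"LIF_GRAPHQL_API_KEY": "abc123"})
without_key = build_graphql_headers({})
```

Passing the environment as a parameter keeps the helper easy to test without mutating `os.environ`.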
96 changes: 96 additions & 0 deletions docs/design/cross-cutting/schema-loading.md
@@ -0,0 +1,96 @@
# Schema Loading & Data Model

How services load the LIF schema, the PascalCase/camelCase naming convention, and the GraphQL implementation details that depend on it. The normative rules live in [`docs/specs/data-model-rules.md`](../../specs/data-model-rules.md); this doc is the agent-oriented implementation reference.

## Schema Hierarchy
1. **`schemas/lif-schema.json`** - Source of truth for LIF data model rules and policies
2. **MDR (Metadata Repository)** - Captures schema dynamically, allows extension by deployers
3. **Seed data** - Must validate against the schema from MDR
4. **Components** - Must honor the schema, load from MDR with short cache if needed
5. **GraphQL queries** - Should align with schema as best as practical

## Schema Loading Pattern (IMPORTANT)

Services load the OpenAPI schema from MDR at startup. Key design decisions:

**No silent fallback to file:**
- If MDR is configured but unavailable, the service **fails with a clear error** (does not silently fall back to bundled file)
- This prevents using stale/outdated schema data in production
- Use `USE_OPENAPI_DATA_MODEL_FROM_FILE=true` to explicitly use bundled file (development only)

**Configuration via `LIFSchemaConfig`:**
- All schema-related config should use `LIFSchemaConfig.from_environment()` (not direct `os.getenv()`)
- Provides centralized validation and consistent defaults
- Key env vars: `OPENAPI_DATA_MODEL_ID`, `LIF_MDR_API_URL`, `USE_OPENAPI_DATA_MODEL_FROM_FILE`

**SchemaStateManager component** (`components/lif/schema_state_manager/`):
- Shared component for services that need schema data (semantic search, GraphQL)
- Handles sync and async initialization
- Thread-safe state access via lock
- Tracks schema source ("mdr" or "file")
- Supports schema refresh without restart

```python
from lif.schema_state_manager import SchemaStateManager
from lif.lif_schema_config import LIFSchemaConfig

config = LIFSchemaConfig.from_environment()
manager = SchemaStateManager(config)
manager.initialize_sync() # or await manager.initialize()

state = manager.state # Access schema leaves, filter models, embeddings
```
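
To make "thread-safe state access via lock" and "refresh without restart" concrete, here is a minimal sketch of the general pattern. It is an assumption about how such a manager could be built, not the real `SchemaStateManager`; `StateHolder`, `SchemaState`, and the loader callable are hypothetical.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class SchemaState:
    """Immutable snapshot, so readers never observe a half-updated state."""
    source: str  # "mdr" or "file"
    leaves: tuple[str, ...]


class StateHolder:
    """Hypothetical lock-guarded holder illustrating the pattern."""

    def __init__(self, loader: Callable[[], SchemaState]) -> None:
        self._loader = loader
        self._lock = threading.Lock()
        self._state: Optional[SchemaState] = None

    def refresh(self) -> None:
        # Load outside the lock so slow I/O does not block readers,
        # then swap the reference while holding the lock.
        new_state = self._loader()
        with self._lock:
            self._state = new_state

    @property
    def state(self) -> SchemaState:
        with self._lock:
            if self._state is None:
                raise RuntimeError("schema state not initialized")
            return self._state


snapshots = iter([
    SchemaState("mdr", ("person.Name.firstName",)),
    SchemaState("mdr", ("person.Name.firstName", "person.Name.lastName")),
])
holder = StateHolder(lambda: next(snapshots))
holder.refresh()           # initial load
first = holder.state
holder.refresh()           # later refresh, no restart needed
second = holder.state
```

Swapping a frozen snapshot keeps the critical section tiny; callers that captured `first` keep a consistent view even after a refresh.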

## Capitalization Convention (IMPORTANT)

The LIF schema uses a specific naming convention based on data type:

| Type | Case | Examples |
|------|------|----------|
| **Entity/Object/Array properties** | PascalCase | `Name`, `Contact`, `Identifier`, `EmploymentLearningExperience`, `CredentialAward`, `Proficiency` |
| **Scalar attributes** | camelCase | `firstName`, `lastName`, `identifier`, `identifierType`, `informationSourceId`, `startDate` |

**Example structure:**
```json
{
  "person": [{
    "Name": [{                          // PascalCase - array of objects
      "firstName": "John",              // camelCase - scalar attribute
      "lastName": "Doe",
      "informationSourceId": "Org1"
    }],
    "Identifier": [{                    // PascalCase - array of objects
      "identifier": "12345",            // camelCase - scalar attribute
      "identifierType": "SCHOOL_ASSIGNED_NUMBER"
    }],
    "EmploymentPreferences": [{         // PascalCase - array of objects
      "organizationTypes": ["Public"]   // camelCase - scalar attribute
    }]
  }]
}
```
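
One way to check seed data or fixtures against this convention is a small recursive walk. The sketch below is illustrative only: `find_case_violations` is a hypothetical helper, it skips root keys such as `person` (which the example above shows in lowercase), and it treats empty lists as scalars because their element type cannot be inferred.

```python
import re

PASCAL_CASE = re.compile(r"^[A-Z][A-Za-z0-9]*$")
CAMEL_CASE = re.compile(r"^[a-z][A-Za-z0-9]*$")


def is_entity(value) -> bool:
    """Objects and arrays of objects count as entities; everything else is scalar."""
    if isinstance(value, dict):
        return True
    return isinstance(value, list) and any(isinstance(item, dict) for item in value)


def find_case_violations(node, path: str = "", check_keys: bool = False) -> list[str]:
    """Return dotted paths whose key case does not match the value's type."""
    violations: list[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if check_keys:
                pattern = PASCAL_CASE if is_entity(value) else CAMEL_CASE
                if not pattern.match(key):
                    violations.append(child)
            # Keys below the root level are always checked.
            violations.extend(find_case_violations(value, child, check_keys=True))
    elif isinstance(node, list):
        for item in node:
            violations.extend(find_case_violations(item, path, check_keys))
    return violations


sample = {
    "person": [{
        "Name": [{"firstName": "John"}],
        "identifier": [{"identifier": "12345"}],                      # entity in camelCase
        "EmploymentPreferences": [{"OrganizationTypes": ["Public"]}],  # scalar in PascalCase
    }]
}
violations = find_case_violations(sample)
```

Here `violations` lists the two offending paths, `person.identifier` and `person.EmploymentPreferences.OrganizationTypes`.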

### Files That Must Follow This Convention
- **Seed data**: `projects/mongodb/sample_data/**/*.json`
- **GraphQL queries**: `components/lif/data_source_adapters/**/*.graphql`
- **Config files**: `deployments/**/information_sources_config*.yml` (fragment paths like `person.Name`)
- **Test fixtures**: Any test data in `test/`

## Key Implementation Details

1. **Strawberry GraphQL types** (`type_factory.py`):
   - Uses `strawberry.field(name=field_name)` to preserve original schema case
   - `resolve_actual_type()` preserves `List` wrappers for proper type resolution
   - `dict_to_dataclass()` handles nested type conversion
   - **Resolver annotations must use `Info` type**: dynamic resolvers in `build_root_query_type` and `build_root_mutation_type` must annotate `info` as `strawberry.types.Info`, not `object` or `Any`. Strawberry identifies the `info` parameter by type (name-based fallback was removed in 0.297.0).
   - **MDR schemas have no `$ref`**: the MDR `generate_openapi_schema()` function deep-copies and inlines all referenced schemas. The `$ref` branch in `create_type()` exists but is not exercised by production schemas.

2. **Fragment paths** use format `person.EntityName` (e.g., `person.EmploymentPreferences`)

3. **Translator service** returns data with PascalCase root (`Person` not `person`)
   - The `adjust_lif_fragments_for_initial_orchestrator_simplification()` function uses case-insensitive key lookup to handle this

4. **Filter inputs** in GraphQL also use PascalCase for entity names:
```graphql
person(filter: { Identifier: { identifier: "12345", identifierType: "..." } })
```
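
The case-insensitive key lookup mentioned for the translator output (`Person` vs. `person`) could look roughly like the sketch below. `get_case_insensitive` is a hypothetical helper written for illustration; it is not the body of `adjust_lif_fragments_for_initial_orchestrator_simplification()`.

```python
from typing import Any, Mapping


def get_case_insensitive(mapping: Mapping[str, Any], key: str, default: Any = None) -> Any:
    """Look up `key`, preferring an exact match and falling back to a case-insensitive one."""
    if key in mapping:
        return mapping[key]
    wanted = key.lower()
    for candidate, value in mapping.items():
        if isinstance(candidate, str) and candidate.lower() == wanted:
            return value
    return default


translator_output = {"Person": [{"Name": [{"firstName": "John"}]}]}
people = get_case_insensitive(translator_output, "person")
```

Preferring the exact match first keeps behavior predictable if a payload ever contains both spellings.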
90 changes: 90 additions & 0 deletions docs/operations/guides/deployment.md
@@ -0,0 +1,90 @@
# Deployment & Operations

Deployment scripts, environment config, MDR migrations, and debugging runbooks. For the full demo-promotion walkthrough see [`demo-environment-update.md`](demo-environment-update.md); for SAM/database details see [`sam/README.md`](../../../sam/README.md).

## Environment Configuration
- `{env}.aws` files (repo root) — define `AWS_REGION`, `SAM_CONFIG_ENV`, `STACKS` map, and `STACK_ORDER` for each environment
- `cloudformation/{env}-*.params` — CloudFormation parameter files per stack, including `ImageUrl` for ECS services
- Environments: `dev`, `demo` (demo is manually promoted from dev)

## Deployment Scripts

| Script | Purpose |
|--------|---------|
| `aws-deploy.sh` | Deploy CloudFormation stacks (`-s demo`, `--only-stack`, `--update-ecs`, `--update-sam`) |
| `scripts/release-demo.sh` | Update demo param files with latest ECR image tags from dev |
| `scripts/release-demo-frontend.sh` | Build and deploy MDR frontend to S3/CloudFront from a git ref |
| `scripts/verify-demo-images.sh` | Compare param file image tags against running ECS tasks |
| `scripts/setup-mdr-api-keys.sh` | Generate and store MDR service API keys in SSM Parameter Store |
| `scripts/setup-graphql-api-keys.sh` | Generate and store GraphQL org1 API keys in SSM (service + workshop modes) |
| `scripts/reset-mdr-database.sh` | Reset MDR database (flyway clean + migrate) when V1.1 SQL is replaced |
| `sam/deploy-sam.sh` | Build Flyway Docker image, push to ECR, run SAM deploy for database stacks |

## Environment Differences
- **Dev** uses `:latest` ECR image tags in param files; **demo** uses pinned version tags (e.g., `:1.2.3`)
- `scripts/release-demo.sh` copies the current dev image tags to demo param files for promotion
- Dev has a single-org setup (`dev-single-org`); demo has multi-org (`advisor-demo-org1/2/3`)

## MDR Schema Migrations (V1.2+)

- **Deployed envs** run migrations via Flyway (tracked in `flyway_schema_history`).
- **Local docker-compose** loads `backup.sql` (a pg_dump snapshot of V1.1 content) and then runs every `V1.*.sql` file in the Flyway directory through `psql` — it does *not* use real Flyway and does *not* track history. See `projects/lif_mdr_database/restore.sh`.
  - The restore runs only on first init (empty data dir) or after `docker compose down -v`. `up`/`down` cycles that keep the persistent volume are safe.
- **Authoring convention:** every V1.2+ migration must be idempotent so local re-init is safe. Use `CREATE OR REPLACE FUNCTION`, `CREATE TABLE IF NOT EXISTS`, `DROP TRIGGER IF EXISTS … CREATE TRIGGER`, etc. rather than raw `CREATE`. This is a local-dev concession; deployed envs would tolerate non-idempotent migrations via Flyway's history tracking, but we keep one style across both.
- Wiring real Flyway into local docker-compose is a known gap; deferred pending the broader MDR database tooling evaluation (`docs/proposals/mdr-mongodb-evaluation.md`).
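
The idempotency convention above can be illustrated with a small sketch. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL (an assumption made so the example runs anywhere), the table and trigger names are hypothetical, and `CREATE OR REPLACE FUNCTION` is not shown because SQLite has no equivalent. The script is applied twice with no history table, mimicking a local re-init through `psql`.

```python
import sqlite3

# Hypothetical migration written in the idempotent style described above.
MIGRATION = """
CREATE TABLE IF NOT EXISTS data_model (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    updated_at TEXT
);
DROP TRIGGER IF EXISTS data_model_touch;
CREATE TRIGGER data_model_touch AFTER UPDATE OF name ON data_model
BEGIN
    UPDATE data_model SET updated_at = datetime('now') WHERE id = NEW.id;
END;
"""


def apply_without_history(conn: sqlite3.Connection, script: str, times: int = 2) -> None:
    """Run the same script repeatedly, as a runner with no history tracking would."""
    for _ in range(times):
        conn.executescript(script)


conn = sqlite3.connect(":memory:")
apply_without_history(conn, MIGRATION)
objects = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name"
).fetchall()
```

With a raw `CREATE TABLE` or `CREATE TRIGGER` instead, the second pass would fail because the objects already exist.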

## Key Operational Notes
- **Demo update guide**: See [`demo-environment-update.md`](demo-environment-update.md) for the full end-to-end process
- **SAM databases**: See `sam/README.md` for database deployment architecture and Flyway migration details
- **Apple Silicon**: Docker images for Lambda must use `--platform linux/amd64` (already handled in scripts)
- **SSM parameters**: ECS tasks fail to start if referenced SSM parameters are missing, even optional ones like `ApiKeys`
- **Deploy sequentially**: Running multiple `aws-deploy.sh` commands in parallel causes SSO login conflicts
- **MDR frontend**: Deployed to S3 + CloudFront (not ECS); use `scripts/release-demo-frontend.sh` for demo
- **Bash `grep -v` with `pipefail`**: In scripts using `set -o pipefail`, `grep -v` exits with status 1 when every input line matches the pattern, so nothing is left to print. Wrap it in `(grep -v ... || true)` to prevent script failure.

## Docker Build Dependency Resolution (IMPORTANT)

Project Dockerfiles (`projects/*/Dockerfile2`) build wheels independently from the monorepo lock file. The runtime stage installs the wheel with `uv pip install --system`, which resolves dependencies from PyPI based on the wheel's metadata constraints — **not** from `uv.lock`. This means Docker images can get different dependency versions than local development.

**PEP 440 `~=` gotcha**: `~=0.275` means `>= 0.275, < 1.0`, NOT `>= 0.275, < 0.276`. To constrain to a minor version range, use `~=0.275.0` (which means `>= 0.275.0, < 0.276.0`). This distinction caused a production crash when `strawberry-graphql~=0.275` resolved to `0.297.0` in Docker.
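
The difference between the two specifiers can be checked with a short sketch of the compatible-release rule. This is a simplified stand-in written for illustration, handling only plain numeric versions (no pre, post, or dev releases); the helper names are hypothetical and a real resolver should be trusted over it.

```python
def _release(version: str) -> tuple[int, ...]:
    """Parse a plain numeric version such as "0.275.0" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def _pad(release: tuple[int, ...], width: int) -> tuple[int, ...]:
    """Right-pad with zeros so that 0.275 and 0.275.0 compare as equal."""
    return release + (0,) * (width - len(release))


def compatible_release_bounds(version: str) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Return (inclusive lower, exclusive upper) bounds for `~=<version>`."""
    lower = _release(version)
    if len(lower) < 2:
        raise ValueError("~= needs at least two release segments")
    prefix = lower[:-1]                       # drop the last segment
    upper = prefix[:-1] + (prefix[-1] + 1,)   # bump the new last segment
    return lower, upper


def satisfies(candidate: str, version: str) -> bool:
    """True if `candidate` falls inside the `~=<version>` range."""
    lower, upper = compatible_release_bounds(version)
    cand = _release(candidate)
    width = max(len(cand), len(lower), len(upper))
    return _pad(lower, width) <= _pad(cand, width) < _pad(upper, width)


loose = satisfies("0.297.0", "0.275")     # ~=0.275 allows any 0.x >= 0.275
tight = satisfies("0.297.0", "0.275.0")   # ~=0.275.0 allows only 0.275.x
```

Here `loose` is `True` and `tight` is `False`, which matches the crash scenario described above.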

## Debugging ECS Services

**CloudWatch log group**: All dev ECS services share a single log group named `dev`. Log streams are prefixed by service name (e.g., `graphql-org1/graphql-org1/<task-id>`).

```bash
# Tail recent logs for a service
AWS_PROFILE=lif AWS_REGION=us-east-1 aws logs filter-log-events \
--log-group-name dev --log-stream-name-prefix graphql-org1 \
--start-time $(python3 -c "import time; print(int((time.time()-3600)*1000))") \
--limit 100 --query 'events[].message' --output text

# Filter for errors only
AWS_PROFILE=lif AWS_REGION=us-east-1 aws logs filter-log-events \
--log-group-name dev --log-stream-name-prefix graphql-org1 \
--filter-pattern "ERROR" --limit 20 --query 'events[].message' --output text

# Check service status and recent events
AWS_PROFILE=lif AWS_REGION=us-east-1 aws ecs describe-services \
--cluster dev --services graphql-org1-FARGATE \
--query 'services[0].{status:status,running:runningCount,events:events[:3]}'
```

**ECS Exec** is enabled on dev services but requires the Session Manager Plugin (`session-manager-plugin`) installed locally.

## Querying the MDR API

The MDR API provides the OpenAPI schema that GraphQL services load at startup. Useful for debugging schema-related issues.

```bash
# Get the MDR API key for a service (stored in SSM)
AWS_PROFILE=lif AWS_REGION=us-east-1 aws ssm get-parameter \
--name /dev/graphql-org1/MdrApiKey --with-decryption \
--query 'Parameter.Value' --output text

# Fetch the OpenAPI schema (data model ID 17 for dev)
curl -s -H "X-API-Key: <key>" \
"https://mdr-api.dev.lif.unicon.net/datamodels/open_api_schema/17?include_attr_md=true&include_entity_md=false"
```

**MDR auth uses `X-API-Key` header** (not Bearer token). The endpoint path is `/datamodels/open_api_schema/{data_model_id}`.
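
For scripted debugging, the same request can be assembled in Python. The sketch below only builds the request object, using the endpoint path, query parameters, and `X-API-Key` header shown above; `build_schema_request` is a hypothetical helper and the key is a placeholder.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_schema_request(base_url: str, data_model_id: int, api_key: str) -> Request:
    """Build a GET request for the MDR OpenAPI schema endpoint (hypothetical helper)."""
    query = urlencode({"include_attr_md": "true", "include_entity_md": "false"})
    url = f"{base_url.rstrip('/')}/datamodels/open_api_schema/{data_model_id}?{query}"
    # MDR expects the key in X-API-Key, not in an Authorization: Bearer header.
    return Request(url, headers={"X-API-Key": api_key})


request = build_schema_request("https://mdr-api.dev.lif.unicon.net", 17, "<key>")
```

Passing `request` to `urllib.request.urlopen` and decoding the body with `json.load` would then fetch the schema, assuming network access and a valid key.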
37 changes: 37 additions & 0 deletions docs/operations/guides/graphql-api-keys.md
@@ -0,0 +1,37 @@
# GraphQL API Key Authentication

GraphQL org1 supports API key authentication. Keys are managed in AWS SSM Parameter Store.

**How it works:**
- Server-side: `/{env}/graphql-org1/ApiKeys` stores comma-separated `key:client-name` pairs (e.g., `abc123:semantic-search,def456:workshop-01`)
- Client-side: Each client has its own SSM param with the bare key (e.g., `/{env}/semantic-search/GraphqlApiKey`)
- The `graphql_client` component reads `LIF_GRAPHQL_API_KEY` env var and sends it as `X-API-Key` header
- GraphQL server validates incoming keys against its `GRAPHQL_AUTH__API_KEYS` env var
- When `GRAPHQL_AUTH__API_KEYS` is empty/unset, authentication is disabled (local dev default)

**Key env vars:**
| Variable | Service | Purpose |
|----------|---------|---------|
| `GRAPHQL_AUTH__API_KEYS` | GraphQL org1 | Server-side: comma-separated `key:label` pairs to accept |
| `LIF_GRAPHQL_API_KEY` | Semantic search | Client-side: bare API key to send with requests |
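
The server-side check can be pictured with the following sketch. It is an assumption about the general shape of the logic, not the GraphQL service's actual code: `parse_api_keys` and `identify_client` are hypothetical names, and skipping malformed pairs is an illustrative choice.

```python
import hmac
from typing import Mapping, Optional


def parse_api_keys(raw: Optional[str]) -> dict[str, str]:
    """Parse "key:label,key:label" (the GRAPHQL_AUTH__API_KEYS format) into {key: label}."""
    keys: dict[str, str] = {}
    for pair in (raw or "").split(","):
        key, sep, label = pair.strip().partition(":")
        if key and sep and label:  # assumption: silently skip malformed entries
            keys[key] = label
    return keys


def identify_client(headers: Mapping[str, str], configured: Mapping[str, str]) -> Optional[str]:
    """Return the client label, "anonymous" when auth is disabled, or None if rejected."""
    if not configured:
        return "anonymous"  # empty/unset key list disables authentication
    presented = headers.get("X-API-Key", "")
    for key, label in configured.items():
        # Constant-time comparison avoids leaking key contents through timing.
        if hmac.compare_digest(presented.encode(), key.encode()):
            return label
    return None


configured = parse_api_keys("abc123:semantic-search,def456:workshop-01")
client = identify_client({"X-API-Key": "def456"}, configured)
```

In this example `client` is `"workshop-01"`; a missing or unknown key yields `None`, and an empty configuration yields `"anonymous"`.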

**Managing keys:**
```bash
# Preview what will happen
AWS_PROFILE=lif ./scripts/setup-graphql-api-keys.sh demo

# Create/update service key (semantic-search)
AWS_PROFILE=lif ./scripts/setup-graphql-api-keys.sh demo --apply

# Generate workshop participant keys
AWS_PROFILE=lif ./scripts/setup-graphql-api-keys.sh demo --workshop 10 --apply

# Remove all workshop keys (preserves service keys)
AWS_PROFILE=lif ./scripts/setup-graphql-api-keys.sh demo --workshop 0 --apply
```

After key changes, redeploy affected services:
```bash
./aws-deploy.sh -s demo --only-stack demo-lif-semantic-search
./aws-deploy.sh -s demo --only-stack demo-lif-graphql-org1
```