Issue #906: Phase 1a: Stand up LDE microservice#920
Conversation
## Summary - **`ARCHITECTURE.md`** at repo root — landing page with a polylith primer, a service-map Mermaid diagram, and pointers to the deeper docs below - **`docs/design/cross-cutting/self-serve-tenant-auth.md`** — narrative of the #882 → #883 → #884 chain (Cognito self-serve → schema-per-tenant → workspace selection & invites), explaining the design and the *why* for engineers joining mid-project - **`docs/operations/guides/adding-a-new-microservice.md`** — runbook covering Polylith brick layout, `pyproject.toml` hygiene (avoiding the dependency bloat that hit PR #920), `Dockerfile2`, `AuthMiddleware` integration, and docker-compose wiring; ends with a pre-PR checklist - **`docs/specs/data-model-rules.md`** — enhanced the bare 5-line *Naming Styles* section with the case-table, JSON example, and "files that must follow this convention" list previously only in `CLAUDE.md` - **`cspell.json`** — added HMAC, PKCE, httpx, pyjwt, boto, selfserve, idempotently to the dictionary ## Why Reviewer bandwidth is the constraint, so this PR bundles four documentation-only changes that don't need review and can land directly. The goal of each: faster onboarding (`ARCHITECTURE.md`), preserve hard-won design rationale (self-serve auth narrative), prevent the next microservice from repeating known mistakes (runbook), and surface naming conventions that the codebase already enforces but only `CLAUDE.md` describes (data-model rules). ## Test plan - [x] `uv run pre-commit run cspell --files <new docs>` passes - [ ] Render check on GitHub: Mermaid diagram in `ARCHITECTURE.md`, tables, code blocks - [ ] Spot-check the relative links between docs after merge 🤖 Generated with [Claude Code](https://claude.com/claude-code)
## Summary Adds a short README to each base brick so people navigating the repo on GitHub can tell what each service does without opening `core.py`. Each follows the same shape: purpose, endpoints, auth (where relevant), components composed, and the deploying project. ## Coverage 11 new READMEs: - `api_graphql`, `advisor_restapi`, `query_cache_restapi`, `example_data_source_rest_api`, `semantic_search_mcp_server`, `orchestrator_restapi`, `translator_restapi`, `identity_mapper_restapi`, `query_planner_restapi` - `mdr_restapi` — with an endpoint-group table summarizing its 16 endpoint modules under their URL prefixes - `query_cache_module` — a note that this is an empty stub for a future non-HTTP cache interface Skipped: `learner_data_export_api` (only on the open #920 branch; its README will land with that PR). ## Why Reviewer bandwidth constraint means we're chipping away at docs that don't need review. Tier 1 of the README sweep — bases first because they're well-bounded; components batch will follow. ## Test plan - [x] `uv run pre-commit run cspell --files bases/lif/*/README.md` passes - [ ] Spot-check that each README's claimed endpoints and component list match `core.py` after merge 🤖 Generated with [Claude Code](https://claude.com/claude-code)
## Summary Tier 1 README sweep, **batch 4 of 4** — closes out the goal of "every directory that has at least a file or two has a README". Adds 26 short READMEs across the remaining component bricks and a few subdirectories. ## Coverage - **Auth + middleware:** api_key_auth, auth - **Data plumbing:** composer, datatypes, exceptions, logging, mongodb_connection, string_utils - **Schema + config:** lif_schema_config, openapi_schema_parser, openapi_to_graphql, schema_state_manager - **Source-adapter framework:** data_source_adapters + two adapter subdirs (lif_to_lif_adapter, example_data_source_rest_api_to_lif_adapter) - **Identity mapper trio:** identity_mapper_service, identity_mapper_storage, identity_mapper_storage_sql - **Service-specific bricks:** example_data_source_service, graphql_client, langchain_agent (+ prompts/), semantic_search_service, translator - **Tenant routing pure helpers:** tenant_routing - **Subdirs:** langchain_agent/prompts (5 prompt templates), mdr_client/resources (bundled OpenAPI snapshot) ## Wrapping up the sweep | Batch | PR | Scope | |---|---|---| | 1 | #922 | 11 bases | | 2 | #923 | 5 MDR components | | 3 | #924 | 7 query/orchestration components | | 4 | **this PR** | 26 remaining components + subdirs | After this merges, the only missing README under `bases/lif/` is `learner_data_export_api/` — which will land with cbeach's open PR #920. ## Test plan - [x] `uv run pre-commit run cspell --files <new READMEs>` passes - [ ] Spot-check the "Used by" lists after merge - [ ] Render check on GitHub: tables, cross-links between READMEs 🤖 Generated with [Claude Code](https://claude.com/claude-code)
There was a problem hiding this comment.
Pull request overview
This PR introduces the initial “Phase 1a” scaffolding for the Learner Data Export (LDE) FastAPI microservice, wiring it into the repo’s auth/config plumbing and providing a docker-compose deployment target.
Changes:
- Added a new
lif_learner_data_export_apiproject (pyproject, Dockerfiles, build scripts) plus a FastAPI app with stub/exportand/available-data-formatsendpoints. - Extended shared auth/config to support an LDE service API key and added DTOs for the available data formats response.
- Added basic API tests and updated the repo lockfile (including
ty).
Reviewed changes
Copilot reviewed 16 out of 18 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
uv.lock |
Updates locked dependency versions (incl. ty). |
test/bases/lif/learner_data_export_api/test_core.py |
Adds endpoint/auth tests for the new service. |
test/bases/lif/learner_data_export_api/__init__.py |
Test package init for the new test module. |
projects/lif_learner_data_export_api/README.md |
Adds build/run and curl examples for the microservice. |
projects/lif_learner_data_export_api/pyproject.toml |
Defines the new project packaging/deps and polylith brick mappings. |
projects/lif_learner_data_export_api/Dockerfile2 |
Multi-stage container build that syncs deps and builds the wheel in-container. |
projects/lif_learner_data_export_api/Dockerfile |
Single-stage runtime image that installs a prebuilt wheel + exported requirements. |
projects/lif_learner_data_export_api/build.sh |
Build helper to export requirements and build wheel artifacts. |
projects/lif_learner_data_export_api/build-docker.sh |
Build helper for the Docker image from repo root. |
projects/lif_learner_data_export_api/.gitignore |
Ignores build artifacts for this project. |
projects/lif_learner_data_export_api/.dockerignore |
Excludes local venv from docker build context. |
deployments/advisor-demo-docker/docker-compose.yml |
Adds the new LDE service container to the demo deployment. |
components/lif/mdr_utils/config.py |
Adds a new service API key setting for learner data export. |
components/lif/mdr_auth/core.py |
Registers the new API key in the service API key map (AuthMiddleware). |
components/lif/datatypes/core.py |
Adds DTOs for available data formats and a health check response model. |
bases/lif/learner_data_export_api/learner_data_export_endpoints.py |
Implements stub endpoints for export + available formats. |
bases/lif/learner_data_export_api/core.py |
Sets up the FastAPI app, AuthMiddleware, CORS, and wiring for endpoints. |
bases/lif/learner_data_export_api/__init__.py |
Exposes the new service’s core module. |
Comments suppressed due to low confidence (2)
bases/lif/learner_data_export_api/learner_data_export_endpoints.py:28
- Issue #906 specifies
GET /available-data-formats?urlEscaped=true|false, but this endpoint currently has nourlEscapedquery parameter. Add the parameter (with a default) so the API surface matches the design and clients can rely on it.
@router.get("/available-data-formats", response_model=TargetTransformationDataModelsDTO)
async def get_available_data_formats(request: Request):
# TODO: Build this out.
logger.info("Received request for available data formats as %s", request.state.principal)
bases/lif/learner_data_export_api/learner_data_export_endpoints.py:33
- The Issue #906 design shows
/available-data-formatsreturning a JSON array of data format objects, but this implementation returns an object withmetadataanddataFormats. Either adjust the response shape to match the design, or update the design/README to document the wrapper so downstream clients don't implement against the wrong contract.
@router.get("/available-data-formats", response_model=TargetTransformationDataModelsDTO)
async def get_available_data_formats(request: Request):
# TODO: Build this out.
logger.info("Received request for available data formats as %s", request.state.principal)
data_formats = TargetTransformationDataModelsDTO(
metadata={"total": 3},
dataFormats=[
TargetTransformationDataModelDTO(
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
bjagg
left a comment
There was a problem hiding this comment.
Read through the new microservice plus the diffs to shared modules. The Dockerfiles, build scripts, pyproject, docker-compose entry, and AuthMiddleware integration all match the established lif_translator_api-style template — nice to see the new service slot in cleanly.
Flagging a handful of items inline. The three I'd want to see addressed before merge:
bases/lif/learner_data_export_api/core.pyreadssettings.cors_*but those fields don't appear to exist on theSettingsclass incomponents/lif/mdr_utils/config.py(which only knows aboutmdr__*-namespaced fields). As written, the service should crash at first request. Either add the fields (consistent with the existingmdr__*namespace) or reados.environdirectly.test_core.pyhas two duplicate test functions —test_auth_info_401andtest_auth_info_default_tokenare each defined twice. Looks like a copy-paste artifact; the second def silently shadows the first.- The
/test/auth-infodebug endpoint ships hardcoded"auth_type": "API token"regardless of how the caller authenticated and exposes the principal to any authenticated caller — recommend gating or removing.
The rest are style/convention nits. Happy to keep them as comments rather than blockers if you want to land the stub quickly. Thanks for picking up #906!
|
|
||
| app.add_middleware(AuthMiddleware) | ||
| # Configure CORS middleware | ||
| cors_origins = [origin.strip() for origin in settings.cors_allow_origins.split(",") if origin.strip()] |
There was a problem hiding this comment.
These reads — settings.cors_allow_origins, cors_allow_credentials, cors_allow_methods, cors_allow_headers — assume the Settings class in components/lif/mdr_utils/config.py defines them, but I don't see them there. The docker-compose passes CORS_* env vars (no MDR__ prefix), and the existing Settings only declares mdr__*-namespaced fields. As written, the service should crash at startup with an AttributeError the first time these are dereferenced.
Either (a) add the fields to Settings namespaced as mdr__cors__* (with env aliases if you want to keep the bare CORS_* env vars in compose), or (b) read os.environ directly here. Either way the PR is missing the field definitions.
There was a problem hiding this comment.
Those settings exist -
I plan to hold off adjusting them until we have time to generalize the auth flows.
| @@ -0,0 +1,139 @@ | |||
| import pytest | |||
| from deepdiff import DeepDiff | |||
There was a problem hiding this comment.
Existing LIF tests use assert resp.json() == expected directly — pytest's assertion-rewriting prints a structural diff on failure. DeepDiff adds a runtime dep for the same outcome and produces a noisier failure message. Recommend dropping for consistency with the rest of the test suite.
There was a problem hiding this comment.
I've found DeepDiff to be more flexible as the objects to compare become more complex.
Adjusted the simple diffs to use ==, leaving the ones that will become complex as future facing examples of DeepDiff.
| @@ -0,0 +1,3 @@ | |||
| from lif.learner_data_export_api import core | |||
There was a problem hiding this comment.
Other bases (e.g. bases/lif/translator_restapi/__init__.py) are empty. The eager from … import core here triggers FastAPI app instantiation as a side effect of importing the package — can cause subtle test-order issues if anything imports the package without wanting the app initialized. Recommend dropping the body.
There was a problem hiding this comment.
All bases sub-folders follow this pattern of:
from ... import ...
__all__ = [...]Do you feel all the `base/lif/.../init.py files should be empty?
|
FYI — heads up that PR #914 landed on No rush — happy to wait until you're ready to address my review comments anyway. Just flagging so you're not surprised by the conflict marker. |
As per the first phase of #906 , this builds out the Learner Data Export (LDE) microservice, with stubs for endpoints.
Microservice launches as single dockerfile and in reference docker compose deployment.
Note, updated TY to address false positives with the FastAPI middleware: astral-sh/ty#1635