Skip to content

Issue #906: Phase 1a: Stand up LDE microservice#920

Open
cbeach47 wants to merge 7 commits into
mainfrom
feature/906/learner-data-export-api
Open

Issue #906: Phase 1a: Stand up LDE microservice#920
cbeach47 wants to merge 7 commits into
mainfrom
feature/906/learner-data-export-api

Conversation

@cbeach47
Copy link
Copy Markdown
Contributor

@cbeach47 cbeach47 commented May 15, 2026

As per the first phase of #906 , this builds out the Learner Data Export (LDE) microservice, with stubs for endpoints.

Microservice launches as single dockerfile and in reference docker compose deployment.

Note, updated TY to address false positives with the FastAPI middleware: astral-sh/ty#1635

Comment thread projects/lif_learner_data_export_api/pyproject.toml Outdated
Comment thread projects/lif_learner_data_export_api/pyproject.toml Outdated
Comment thread components/lif/datatypes/core.py Outdated
bjagg added a commit that referenced this pull request May 16, 2026
## Summary
- **`ARCHITECTURE.md`** at repo root — landing page with a polylith
primer, a service-map Mermaid diagram, and pointers to the deeper docs
below
- **`docs/design/cross-cutting/self-serve-tenant-auth.md`** — narrative
of the #882#883#884 chain (Cognito self-serve → schema-per-tenant
→ workspace selection & invites), explaining the design and the *why*
for engineers joining mid-project
- **`docs/operations/guides/adding-a-new-microservice.md`** — runbook
covering Polylith brick layout, `pyproject.toml` hygiene (avoiding the
dependency bloat that hit PR #920), `Dockerfile2`, `AuthMiddleware`
integration, and docker-compose wiring; ends with a pre-PR checklist
- **`docs/specs/data-model-rules.md`** — enhanced the bare 5-line
*Naming Styles* section with the case-table, JSON example, and "files
that must follow this convention" list previously only in `CLAUDE.md`
- **`cspell.json`** — added HMAC, PKCE, httpx, pyjwt, boto, selfserve,
idempotently to the dictionary

## Why
Reviewer bandwidth is the constraint, so this PR bundles four
documentation-only changes that don't need review and can land directly.
The goal of each: faster onboarding (`ARCHITECTURE.md`), preserve
hard-won design rationale (self-serve auth narrative), prevent the next
microservice from repeating known mistakes (runbook), and surface naming
conventions that the codebase already enforces but only `CLAUDE.md`
describes (data-model rules).

## Test plan
- [x] `uv run pre-commit run cspell --files <new docs>` passes
- [ ] Render check on GitHub: Mermaid diagram in `ARCHITECTURE.md`,
tables, code blocks
- [ ] Spot-check the relative links between docs after merge

🤖 Generated with [Claude Code](https://claude.com/claude-code)
bjagg added a commit that referenced this pull request May 16, 2026
## Summary
Adds a short README to each base brick so people navigating the repo on
GitHub can tell what each service does without opening `core.py`. Each
follows the same shape: purpose, endpoints, auth (where relevant),
components composed, and the deploying project.

## Coverage
11 new READMEs:
- `api_graphql`, `advisor_restapi`, `query_cache_restapi`,
`example_data_source_rest_api`, `semantic_search_mcp_server`,
`orchestrator_restapi`, `translator_restapi`, `identity_mapper_restapi`,
`query_planner_restapi`
- `mdr_restapi` — with an endpoint-group table summarizing its 16
endpoint modules under their URL prefixes
- `query_cache_module` — a note that this is an empty stub for a future
non-HTTP cache interface

Skipped: `learner_data_export_api` (only on the open #920 branch; its
README will land with that PR).

## Why
Reviewer bandwidth constraint means we're chipping away at docs that
don't need review. Tier 1 of the README sweep — bases first because
they're well-bounded; components batch will follow.

## Test plan
- [x] `uv run pre-commit run cspell --files bases/lif/*/README.md`
passes
- [ ] Spot-check that each README's claimed endpoints and component list
match `core.py` after merge

🤖 Generated with [Claude Code](https://claude.com/claude-code)
bjagg added a commit that referenced this pull request May 16, 2026
## Summary
Tier 1 README sweep, **batch 4 of 4** — closes out the goal of "every
directory that has at least a file or two has a README". Adds 26 short
READMEs across the remaining component bricks and a few subdirectories.

## Coverage
- **Auth + middleware:** api_key_auth, auth
- **Data plumbing:** composer, datatypes, exceptions, logging,
mongodb_connection, string_utils
- **Schema + config:** lif_schema_config, openapi_schema_parser,
openapi_to_graphql, schema_state_manager
- **Source-adapter framework:** data_source_adapters + two adapter
subdirs (lif_to_lif_adapter,
example_data_source_rest_api_to_lif_adapter)
- **Identity mapper trio:** identity_mapper_service,
identity_mapper_storage, identity_mapper_storage_sql
- **Service-specific bricks:** example_data_source_service,
graphql_client, langchain_agent (+ prompts/), semantic_search_service,
translator
- **Tenant routing pure helpers:** tenant_routing
- **Subdirs:** langchain_agent/prompts (5 prompt templates),
mdr_client/resources (bundled OpenAPI snapshot)

## Wrapping up the sweep

| Batch | PR | Scope |
|---|---|---|
| 1 | #922 | 11 bases |
| 2 | #923 | 5 MDR components |
| 3 | #924 | 7 query/orchestration components |
| 4 | **this PR** | 26 remaining components + subdirs |

After this merges, the only missing README under `bases/lif/` is
`learner_data_export_api/` — which will land with cbeach's open PR #920.

## Test plan
- [x] `uv run pre-commit run cspell --files <new READMEs>` passes
- [ ] Spot-check the "Used by" lists after merge
- [ ] Render check on GitHub: tables, cross-links between READMEs

🤖 Generated with [Claude Code](https://claude.com/claude-code)
@cbeach47 cbeach47 marked this pull request as ready for review May 19, 2026 21:54
@cbeach47 cbeach47 requested a review from bjagg May 20, 2026 14:41
@bjagg bjagg requested a review from Copilot May 20, 2026 16:22
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces the initial “Phase 1a” scaffolding for the Learner Data Export (LDE) FastAPI microservice, wiring it into the repo’s auth/config plumbing and providing a docker-compose deployment target.

Changes:

  • Added a new lif_learner_data_export_api project (pyproject, Dockerfiles, build scripts) plus a FastAPI app with stub /export and /available-data-formats endpoints.
  • Extended shared auth/config to support an LDE service API key and added DTOs for the available data formats response.
  • Added basic API tests and updated the repo lockfile (including ty).

Reviewed changes

Copilot reviewed 16 out of 18 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
uv.lock Updates locked dependency versions (incl. ty).
test/bases/lif/learner_data_export_api/test_core.py Adds endpoint/auth tests for the new service.
test/bases/lif/learner_data_export_api/__init__.py Test package init for the new test module.
projects/lif_learner_data_export_api/README.md Adds build/run and curl examples for the microservice.
projects/lif_learner_data_export_api/pyproject.toml Defines the new project packaging/deps and polylith brick mappings.
projects/lif_learner_data_export_api/Dockerfile2 Multi-stage container build that syncs deps and builds the wheel in-container.
projects/lif_learner_data_export_api/Dockerfile Single-stage runtime image that installs a prebuilt wheel + exported requirements.
projects/lif_learner_data_export_api/build.sh Build helper to export requirements and build wheel artifacts.
projects/lif_learner_data_export_api/build-docker.sh Build helper for the Docker image from repo root.
projects/lif_learner_data_export_api/.gitignore Ignores build artifacts for this project.
projects/lif_learner_data_export_api/.dockerignore Excludes local venv from docker build context.
deployments/advisor-demo-docker/docker-compose.yml Adds the new LDE service container to the demo deployment.
components/lif/mdr_utils/config.py Adds a new service API key setting for learner data export.
components/lif/mdr_auth/core.py Registers the new API key in the service API key map (AuthMiddleware).
components/lif/datatypes/core.py Adds DTOs for available data formats and a health check response model.
bases/lif/learner_data_export_api/learner_data_export_endpoints.py Implements stub endpoints for export + available formats.
bases/lif/learner_data_export_api/core.py Sets up the FastAPI app, AuthMiddleware, CORS, and wiring for endpoints.
bases/lif/learner_data_export_api/__init__.py Exposes the new service’s core module.
Comments suppressed due to low confidence (2)

bases/lif/learner_data_export_api/learner_data_export_endpoints.py:28

  • Issue #906 specifies GET /available-data-formats?urlEscaped=true|false, but this endpoint currently has no urlEscaped query parameter. Add the parameter (with a default) so the API surface matches the design and clients can rely on it.
@router.get("/available-data-formats", response_model=TargetTransformationDataModelsDTO)
async def get_available_data_formats(request: Request):
    # TODO: Build this out.
    logger.info("Received request for available data formats as %s", request.state.principal)

bases/lif/learner_data_export_api/learner_data_export_endpoints.py:33

  • The Issue #906 design shows /available-data-formats returning a JSON array of data format objects, but this implementation returns an object with metadata and dataFormats. Either adjust the response shape to match the design, or update the design/README to document the wrapper so downstream clients don't implement against the wrong contract.
@router.get("/available-data-formats", response_model=TargetTransformationDataModelsDTO)
async def get_available_data_formats(request: Request):
    # TODO: Build this out.
    logger.info("Received request for available data formats as %s", request.state.principal)

    data_formats = TargetTransformationDataModelsDTO(
        metadata={"total": 3},
        dataFormats=[
            TargetTransformationDataModelDTO(

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread test/bases/lif/learner_data_export_api/test_core.py Outdated
Comment thread projects/lif_learner_data_export_api/README.md Outdated
Comment thread components/lif/datatypes/core.py Outdated
Comment thread bases/lif/learner_data_export_api/learner_data_export_endpoints.py Outdated
Comment thread deployments/advisor-demo-docker/docker-compose.yml Outdated
Comment thread projects/lif_learner_data_export_api/Dockerfile2
Copy link
Copy Markdown
Contributor

@bjagg bjagg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read through the new microservice plus the diffs to shared modules. The Dockerfiles, build scripts, pyproject, docker-compose entry, and AuthMiddleware integration all match the established lif_translator_api-style template — nice to see the new service slot in cleanly.

Flagging a handful of items inline. The three I'd want to see addressed before merge:

  1. bases/lif/learner_data_export_api/core.py reads settings.cors_* but those fields don't appear to exist on the Settings class in components/lif/mdr_utils/config.py (which only knows about mdr__*-namespaced fields). As written, the service should crash at first request. Either add the fields (consistent with the existing mdr__* namespace) or read os.environ directly.
  2. test_core.py has two duplicate test functions — test_auth_info_401 and test_auth_info_default_token are each defined twice. Looks like a copy-paste artifact; the second def silently shadows the first.
  3. The /test/auth-info debug endpoint ships hardcoded "auth_type": "API token" regardless of how the caller authenticated and exposes the principal to any authenticated caller — recommend gating or removing.

The rest are style/convention nits. Happy to keep them as comments rather than blockers if you want to land the stub quickly. Thanks for picking up #906!


app.add_middleware(AuthMiddleware)
# Configure CORS middleware
cors_origins = [origin.strip() for origin in settings.cors_allow_origins.split(",") if origin.strip()]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These reads — settings.cors_allow_origins, cors_allow_credentials, cors_allow_methods, cors_allow_headers — assume the Settings class in components/lif/mdr_utils/config.py defines them, but I don't see them there. The docker-compose passes CORS_* env vars (no MDR__ prefix), and the existing Settings only declares mdr__*-namespaced fields. As written, the service should crash at startup with an AttributeError the first time these are dereferenced.

Either (a) add the fields to Settings namespaced as mdr__cors__* (with env aliases if you want to keep the bare CORS_* env vars in compose), or (b) read os.environ directly here. Either way the PR is missing the field definitions.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those settings exist -

cors_allow_origins: str = "*" # comma-separated list of allowed origins

I plan to hold off adjusting them until we have time to generalize the auth flows.

Comment thread test/bases/lif/learner_data_export_api/test_core.py Outdated
Comment thread bases/lif/learner_data_export_api/core.py Outdated
Comment thread test/bases/lif/learner_data_export_api/test_core.py Outdated
@@ -0,0 +1,139 @@
import pytest
from deepdiff import DeepDiff
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Existing LIF tests use assert resp.json() == expected directly — pytest's assertion-rewriting prints a structural diff on failure. DeepDiff adds a runtime dep for the same outcome and produces a noisier failure message. Recommend dropping for consistency with the rest of the test suite.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've found DeepDiff to be more flexible as the objects to compare become more complex.

Adjusted the simple diffs to use ==, leaving the ones that will become complex as future facing examples of DeepDiff.

Comment thread components/lif/datatypes/core.py Outdated
Comment thread components/lif/datatypes/core.py Outdated
Comment thread bases/lif/learner_data_export_api/learner_data_export_endpoints.py Outdated
@@ -0,0 +1,3 @@
from lif.learner_data_export_api import core
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other bases (e.g. bases/lif/translator_restapi/__init__.py) are empty. The eager from … import core here triggers FastAPI app instantiation as a side effect of importing the package — can cause subtle test-order issues if anything imports the package without wanting the app initialized. Recommend dropping the body.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All bases sub-folders follow this pattern of:

from ... import ...
__all__ = [...]

Do you feel all the `base/lif/.../init.py files should be empty?

Comment thread deployments/advisor-demo-docker/docker-compose.yml Outdated
@bjagg
Copy link
Copy Markdown
Contributor

bjagg commented May 21, 2026

FYI — heads up that PR #914 landed on main last night, which marked this branch as DIRTY (merge conflict). The conflicts will be in components/lif/mdr_auth/core.py and components/lif/mdr_utils/config.py since both your PR and #914 added entries to the API-keys dict / Settings class. git merge upstream/main should resolve cleanly.

No rush — happy to wait until you're ready to address my review comments anyway. Just flagging so you're not surprised by the conflict marker.

@cbeach47 cbeach47 requested a review from bjagg May 29, 2026 19:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

3 participants