refactor: reorganize infrastructure/ by adapted technology, not by domain #142

@rorybyrne

Problem

infrastructure/ currently mixes two taxonomies:

  • By technology: persistence/ (Postgres), s3/, http/, k8s/, oci/
  • By domain: auth/, data/, event/, ingest/

The result is that the same kind of thing lands in different places:

  • The data read store (a Postgres adapter) sits in data/ while every other Postgres adapter sits in persistence/.
  • The role repository (a Postgres repo) sits in auth/.
  • The filesystem storage adapters live under persistence/adapter/ while their S3 twins live under s3/ — two implementations of the same ports in unrelated corners of the tree.
  • persistence/adapter/spreadsheet.py is openpyxl template generation, not persistence.
  • messaging/ is an empty package (0 lines).
  • persistence/ itself is a grab-bag: engine setup, migrations, seeding, static table definitions (one 387-line tables.py), dynamic-table builders, dynamic-table stores, repositories, mappers, read adapters, and query utilities all at one level.

Target layout

Principle: top-level packages name an infrastructure concern — the external system being adapted, or the port when multiple backends implement it — and never a bounded context. Within each package, organize by adapter role (setup / table definitions / write repos / read queries). Domain names appear at the file level, never the directory level.

Two concern shapes are both valid at the top level:

  • Single-technology concerns get the technology's name: postgres/, http/.
  • Ports with multiple backends get the port's name with technology subdirectories: storage/ (fs + S3), runner/ (OCI + K8s) — keeping interchangeable implementations as siblings is the point.

What is not valid is a bounded-context name: the current auth/ and event/ packages are renamed to idp/ and worker/ so a reader can't mistake them for "adapters owned by the auth/event domains".

infrastructure/
├── logging.py
├── postgres/                      # ← renamed persistence/ (it's all PG-specific)
│   ├── setup/                     #   lifecycle & wiring
│   │   ├── database.py            #   engine/session factory
│   │   ├── migrate.py
│   │   ├── seed.py
│   │   └── di.py                  #   PersistenceProvider
│   ├── tables/                    #   ALL table shapes — the one place schemas live
│   │   ├── records.py, events.py, auth.py, ...   # tables.py split by domain
│   │   ├── feature_table.py       #   dynamic builders
│   │   ├── metadata_table.py
│   │   ├── column_mapper.py       #   ColumnDef → sa.Column
│   │   └── naming.py              #   ← api_naming.py
│   ├── repository/                #   write side: aggregate repositories
│   │   ├── (existing 8 repos)
│   │   ├── role.py                #   ← from auth/role_repository.py
│   │   └── mappers/               #   row ↔ aggregate (only repos use them)
│   ├── store/                     #   dynamic-table DDL + bulk writes
│   │   ├── feature_store.py
│   │   └── metadata_store.py
│   └── query/                     #   read side (CQRS: queries ≠ repositories)
│       ├── data_read_store.py     #   ← from infrastructure/data/
│       ├── feature_reader.py      #   ← from persistence/adapter/
│       ├── readers.py             #   ← cross-domain read ports
│       └── keyset.py
├── storage/                       # file-storage port — both backends together
│   ├── layout.py
│   ├── fs/                        #   ← persistence/adapter/{storage,ingest_storage}.py
│   └── s3/                        #   ← s3/{client,storage,ingest_storage}.py
├── runner/                        # validator/ingester execution backends
│   ├── shared.py                  #   ← runner_utils.py
│   ├── oci/
│   └── k8s/
├── idp/                           # ← renamed auth/ — external identity providers (orcid, provider_registry, di)
├── worker/                        # ← renamed event/ — APScheduler WorkerPool / outbox drainer
├── http/                          # outbound HTTP (ontology fetcher; unchanged)
└── spreadsheet/                   # openpyxl adapter — it was never persistence

Deleted outright: messaging/ (empty), infrastructure/data/ (absorbed into postgres/query/), persistence/adapter/ (disbanded — a 'miscellaneous' folder is how this drift started). ingest/di.py moves next to whatever it actually provides (likely runner/ or storage/).

The repository/ vs query/ split mirrors the CQRS layering: repositories serve aggregates to command handlers; the query package serves read models and streams. store/ sits apart because the dynamic-table stores are neither — they are DDL + projection writers driven by events.
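To make the repository/ vs query/ distinction concrete, here is a minimal sketch. Every name in it (Record, RecordSummary, RecordRepository, RecordQueries, InMemoryRecords) is hypothetical and chosen for illustration; none is taken from the codebase. The write-side port hands back whole aggregates, while the read-side port yields flat read models.

```python
from dataclasses import dataclass
from typing import Iterator, Protocol


@dataclass
class Record:
    """Aggregate: loaded and saved as a whole by command handlers."""
    id: str
    title: str
    published: bool = False


@dataclass(frozen=True)
class RecordSummary:
    """Read model: a flat, immutable row shaped for one query."""
    id: str
    title: str


class RecordRepository(Protocol):
    """Write-side port; an adapter would sit under postgres/repository/."""
    def get(self, record_id: str) -> Record: ...
    def save(self, record: Record) -> None: ...


class RecordQueries(Protocol):
    """Read-side port; an adapter would sit under postgres/query/."""
    def list_published(self) -> Iterator[RecordSummary]: ...


class InMemoryRecords:
    """Toy backend satisfying both ports, for illustration only."""

    def __init__(self) -> None:
        self._rows: dict[str, Record] = {}

    def get(self, record_id: str) -> Record:
        return self._rows[record_id]

    def save(self, record: Record) -> None:
        self._rows[record.id] = record

    def list_published(self) -> Iterator[RecordSummary]:
        # Stream read models without exposing the aggregate itself.
        for row in self._rows.values():
            if row.published:
                yield RecordSummary(row.id, row.title)
```

In the real tree the two ports would be implemented by separate Postgres adapters; the single in-memory class here only keeps the sketch short.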

Guardrail: directory-scoped CLAUDE.md

Add server/osa/infrastructure/CLAUDE.md recording the placement rule so the drift doesn't recur:

Top-level packages under infrastructure/ name an infrastructure concern: the external system being adapted (postgres/, http/), or the port when multiple backends implement it (storage/, runner/). Never create a package named after a bounded context. Within a package, separate setup, table definitions, write-side repositories, and read-side queries. Domain names appear at the file level only.

Also update the repository-structure section of the root CLAUDE.md to match the new tree.
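The written rule could also be backed by an automated check in the test suite. The following is a minimal sketch, assuming the target tree above. The function name find_disallowed_packages and the ALLOWED_PACKAGES allow-list are hypothetical and do not exist in the codebase.

```python
from pathlib import Path

# Allowed top-level packages, copied from the target layout in this issue.
ALLOWED_PACKAGES = {
    "postgres", "storage", "runner", "idp", "worker", "http", "spreadsheet",
}


def find_disallowed_packages(infrastructure_dir: Path) -> list[str]:
    """Return top-level directories that are not in the allow-list."""
    return sorted(
        entry.name
        for entry in infrastructure_dir.iterdir()
        if entry.is_dir()
        # Skip __pycache__ and hidden directories; plain files such as
        # logging.py are ignored by the is_dir() check above.
        and not entry.name.startswith(("_", "."))
        and entry.name not in ALLOWED_PACKAGES
    )
```

A test could then assert that find_disallowed_packages(Path("server/osa/infrastructure")) returns an empty list, so a new bounded-context package fails CI rather than relying on review.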

Execution notes

  • Purely mechanical move: git mv + import rewrites, zero logic changes. Existing test suites are the safety net.
  • Touches imports across the whole server, plus Alembic's target_metadata import path. Do it as a standalone PR after #139 (feat: unified /data/ read surface; remove legacy index/search/export) merges, to avoid conflicting with open review threads.
  • Splitting tables.py per domain is the only step requiring judgment; it can trail in a second commit within the same PR.
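The import rewrites can be driven by a prefix table. This is a minimal sketch, not a finished codemod. It assumes the package root is osa.infrastructure (inferred from the server/osa/infrastructure/ path), and the names PREFIX_MOVES and rewrite_module are hypothetical. Only moves stated in this issue are listed; files that land in a new subpackage (setup/, tables/, query/, and so on) would each need their own entry.

```python
# Old -> new dotted module prefixes. File-level moves must be listed
# explicitly; they win over package-level renames via longest-prefix match.
PREFIX_MOVES = {
    "osa.infrastructure.persistence": "osa.infrastructure.postgres",
    "osa.infrastructure.s3": "osa.infrastructure.storage.s3",
    "osa.infrastructure.auth": "osa.infrastructure.idp",
    "osa.infrastructure.event": "osa.infrastructure.worker",
    "osa.infrastructure.auth.role_repository":
        "osa.infrastructure.postgres.repository.role",
}


def rewrite_module(module: str) -> str:
    """Map a dotted module path to its new location, longest prefix first."""
    for old in sorted(PREFIX_MOVES, key=len, reverse=True):
        # Match whole path segments only, so "...event" does not
        # accidentally capture a sibling such as "...events".
        if module == old or module.startswith(old + "."):
            return PREFIX_MOVES[old] + module[len(old):]
    return module
```

Matching the longest prefix first is what lets the role repository go to postgres/repository/ while the rest of auth/ goes to idp/.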

Metadata

Labels

  • refactor: Internal restructuring, no behavior change
  • tech-debt: Known shortcuts to address later
