Version: 1.1.2
Status: active
Class: published
Owner: BioETL Team
Reviewers:

  • BioETL Team

Last verified: 2026-04-29

BioETL Project Navigator

Synced with RULES.md v6.1.3 | Last updated: 2026-04-29

Documentation Update: 2026-04-29

  • 2026-04-29: navigator trimmed to navigation-only duties; manual document-status matrix removed to avoid freshness drift
  • Issue #3091 Resolution: Fixed ADR status contradiction (ADR-001..043 → ADR-001..045)
  • Source-code map updated: added missing directories and modules (domain/lineage/, domain/control_plane/, domain/config/control_plane.py, domain/composite/checkpoint/, application/services/control_plane/, application/services/dq/, application/services/execution/, application/services/lineage/, composition/monitoring/, infrastructure/adr/, infrastructure/audit/, infrastructure/compat/, infrastructure/control_plane/, infrastructure/system/)
  • Compatibility inventory synced with the current measured CLI shim registry
  • Source-code map updated for the storage subpackage decomposition (bronze/, silver/, gold/, metadata/, delta/, support/)
  • Snapshot-style file/test counts removed from active navigation blocks to reduce drift
  • Active entry points clarified: RULES.md, TOOLS.md, and canonical layer docs in docs/02-architecture/
  • 2026-03-20: stale config-loader entry updated to current composition/runtime and infrastructure config seams
  • 2026-03-24: composition/domain references synced with RF-021 config ownership and RF-022 runtime port contracts
  • 2026-03-27: navigator synced with ADR-044/ADR-045, GitHub local workflow guide, and active traceability runbooks
  • 2026-04-01: control-plane documentation pack re-synced with RunManifest / RunLedger runtime, storage layout, rollout flags, inspection CLI, and event baseline
  • 2026-04-02: navigator re-synced with 04-reference/index.md and 05-operations/archive-index.md
  • 2026-04-04: published docs verification guide added to active entrypoints and mixed-environment workflow references

Quick Links

| Need to... | Go to |
|---|---|
| Understand the rules | RULES.md |
| Look up terminology | glossary.md |
| Find tool commands | TOOLS.md |
| Verify docs quality gates | docs-verification.md |
| Check entity config parity | docs-parity-gate.md |
| Review structure / retention rules | 03-file-policy.md |
| Govern documentation | D-01 |
| Create a new pipeline | governance/04-extending-bioetl.md |
| Review a pipeline | pipeline-review-checklist.md |
| Browse published reference docs | index.md |
| Find doc templates | templates/index.md |
| Inspect run traceability | run-manifest-ledger.md |
| Use inspection CLI | cli.md |
| Check normalization governance | non-chembl-normalization-overview.md |
| Review identifier family policy | reference-identifiers.md |
| Run control-plane triage | run-manifest-inspection.md |
| Understand control-plane decision | ADR-044 |
| Understand rollout / DQ decision | ADR-045 |
| Handle a prod error | runbooks/index.md |
| Browse historical ops material | archive-index.md |
| Understand architecture | 00-overview.md |
| Check data contracts | chembl_activity-v1.0.json |
| Check DQ contracts | dq-contracts.md |
| Browse ADR registry | adr-registry.md |
| Need historical context | Repository path docs/99-archive/README.md (non-canonical) |

Language Policy

| Category | Language | Examples |
|---|---|---|
| Public-facing | English | README.md, CONTRIBUTING.md, CHANGELOG.md |
| User guides | English | docs/03-guides/, docs/04-reference/ |
| Internal governance | Russian | RULES.md, AGENT.md, docs/00-project/governance/* |
| Architecture docs | Russian | docs/02-architecture/* |
| Code comments | Russian | Docstrings, inline comments |

Documentation Structure

docs/
├── 00-project/                  # Project rules & governance
│   ├── 00-map.md                # This file (Project Navigator)
│   ├── index.md                 # Welcome page
│   ├── RULES.md                 # Canonical rules document (v6.1.3)
│   ├── glossary.md              # Ubiquitous Language terminology
│   ├── TOOLS.md                 # Active tools hub & unified entry points
│   ├── rules-summary.md         # TL;DR of RULES.md
│   └── governance/              # Project governance policies
│       ├── 01-documentation-governance-style-guide.md  # D-01 documentation metapolicy
│       ├── 02-naming-policy.md  # Entity naming conventions
│       ├── 03-file-policy.md
│       ├── 04-extending-bioetl.md
│       ├── 05-github-policy.md  # CI/CD, branch protection, reviews
│       └── 06-doc-publication-policy.md  # Documentation publication policy
│
├── 01-requirements/             # Requirements
│   └── REQUIREMENTS.md          # Testable requirements catalog
│
├── 02-architecture/             # Architecture & Decisions
│   ├── 00-overview.md           # Architecture overview
│   ├── decisions/               # ADR decision files (see generated adr-registry.md)
│   ├── diagrams/                # Canonical Mermaid source files and rendered views
│   └── ... (Layer docs: 01-domain, 02-application, etc.)
│
├── 03-guides/                   # Guides & Manuals
│   ├── development/             # Developer guides (config schema, etc.)
│   └── ... (User guides: getting-started, testing, etc.)
│
├── 04-reference/                # Reference Documentation
│   ├── index.md                 # Reference landing page
│   ├── api/                     # API Reference
│   ├── cli.md                   # CLI Reference
│   ├── providers/               # Provider documentation (ChEMBL, PubMed, etc.)
│   ├── pipelines/               # Pipeline specifications
│   ├── contracts/               # Data and control-plane contracts
│   ├── schemas/                 # Auxiliary schemas & field maps
│   └── templates/               # Code & doc templates + published template index
│
├── 05-operations/               # Operations & Runbooks
│   ├── archive-index.md         # Historical / archive-only operations surface
│   ├── runbooks/                # Incident response playbooks
│   ├── verification/            # Data verification reports
│   └── ... (Ops guides: vacuum, performance)
│
└── 99-archive/                  # Historical / superseded (repo-only, non-canonical)
    ├── reports/                 # Old project reports
    └── ...
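
A quick way to confirm that a checkout still matches the layout above is to check that the top-level sections exist. The following is a minimal sketch, not a project tool: the function name `missing_sections` and the assumption that the script runs from the repository root are illustrative only.

```python
from pathlib import Path

# Top-level sections copied from the tree above.
EXPECTED_SECTIONS = (
    "00-project",
    "01-requirements",
    "02-architecture",
    "03-guides",
    "04-reference",
    "05-operations",
    "99-archive",
)


def missing_sections(docs_root: Path) -> list[str]:
    """Return the expected top-level sections that are absent under docs_root."""
    return [name for name in EXPECTED_SECTIONS if not (docs_root / name).is_dir()]


if __name__ == "__main__":
    # Assumes the current working directory is the repository root.
    missing = missing_sections(Path("docs"))
    print("missing sections:", missing or "none")
```

The check only covers directory presence; file-level freshness remains the job of the canonical verification guides listed under Quick Links.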

By Topic

Getting Started

  1. RULES.md - Project rules (start here)
  2. rules-summary.md - Quick reference
  3. TOOLS.md - Active tools hub and script entry points
  4. 04-extending-bioetl.md - Adding providers/pipelines

Architecture

Document Covers RULES.md
system-context.md Entity models, IDs, relationships §2.8
container-diagram.md C4 Container, Local-Only runtime §5.6
data-flow.md Ports & Adapters, layer responsibilities §1.1
05-composition-layer.md Composition Root, DI, Factories §1.1
ADR-001: Delta Lake Storage engine choice §2.1, §3
ADR-002: Medallion Data layering pattern §1
ADR-003: In-Memory Locking MemoryLock strategy §6
ADR-004: Pydantic Validation approach -
ADR-005: Composition Layer DI and layer separation §1.1
ADR-006: Logger/Metrics Ports Port abstractions §1.1
ADR-007: Circuit Breaker Failure handling pattern §3.1.4
ADR-008: Graceful Shutdown SIGTERM/SIGINT handling §5.3
ADR-009: Paginated Fetcher Pagination abstraction App D
ADR-010: Local-Only Deploy File-based deployment (no Docker) §5.6
ADR-011: Watermark Removal Simplified checkpoint model §2.4
ADR-012: Storage Clear Contract Storage clear API, run-id injection §2.1
ADR-013: Async Storage Cleanup MedallionLifecycleService pattern §2.1
ADR-014: Deterministic Writes SCD2 ingestion-ts, reproducible writes §2.1
ADR-015: Pipeline Services Lifecycle Port lifecycle contracts §1.1
ADR-016: Error Handling Strategy Unified error classification §3.1
ADR-017: Observability Architecture Metrics, tracing, logging ports §5.1
ADR-018: Gold Strict Validation Pandera Gold validation §2.7
ADR-019: Observability Port Enforcement REQ-OBS-001 compliance §5.1
ADR-020: BasePipeline Decomposition God Object refactoring §1.1
ADR-021: DDD Aggregates DDD aggregates adoption -
ADR-022: Tracing NoOp NoOp for tracing -
ADR-023: Entity Type Patterns Entity type patterns -
ADR-024: Entity Naming Unification Entity naming unification -
ADR-025: Pipeline Config Unification Pipeline config unification -
ADR-026: Composite Pipeline Pattern Composite pipeline pattern -
ADR-027: DQ Rules Externalization Hierarchical DQ configuration §3.1.2
ADR-028: Filter Rules Externalization Hierarchical filter configuration App D
ADR-029: Output Metadata Unification Unified output metadata contracts §2.4
ADR-030: Publication Pagination Strategy Publication pagination strategy -
ADR-031: Loading Strategy Formalization Loading strategy formalization -
ADR-032: Unified HTTP Client Unified HTTP client pattern App A
ADR-033: Publication Validation Strategy Five-level publication validation §3.4
ADR-034: Schema↔Domain Pairs Schema↔Domain configuration pairs §2.8
ADR-035: JSON Field Typing Policy JSON field typing (Silver↔Gold) §2.8
ADR-036: Gold Contract Versioning Gold contract versioning policy §2.7
ADR-037: Canonical Schema Generation Canonical schema source and generation §2.8
ADR-038: Enum Externalization ChEMBL enum values externalization App D
ADR-039: Unified Entity Config Unified entity configuration format App D
ADR-040: Diagram Governance Mermaid diagram standards and governance §7.5
ADR-041: Naming Policy Skills/Agents Naming conventions for skills and agents §7.1
ADR-042: Testing Strategy Matrix Test categorization and coverage strategy §5
ADR-043: Documentation Knowledge Management Documentation governance and knowledge management §7
ADR-044: Run Manifest and Run Ledger Control Plane Immutable manifest, append-only ledger, inspection CLI §2.4, §5.5
ADR-045: Data Quality Contract System DQ contract semantics and rollout alignment §3.4, §5.5

Data Management

| Topic | Document | RULES.md |
|---|---|---|
| Medallion Layers | data-flow.md | §2.1 |
| Schema Drift | RULES.md | §2.2 |
| Data Lineage | system-context.md | §2.3 |
| Backfill/Replay | RULES.md | §2.4 |
| Quarantine | RULES.md | §2.6 |
| Content Hash | system-context.md | §2.8 |

Control Plane & Traceability

| Topic | Document | RULES.md |
|---|---|---|
| Published control-plane contract | run-manifest-ledger.md | §2.4, §5.5 |
| Supported inspection CLI | cli.md | §5.5 |
| Mandatory inspection runbook | run-manifest-inspection.md | §2.4, §5.5 |
| Control-plane ADR | ADR-044 | §2.4, §5.5 |
| DQ / rollout ADR | ADR-045 | §3.4, §5.5 |
| Documentation metapolicy | D-01 | §7 |
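
ADR-044 is summarised on this page as "immutable manifest, append-only ledger". The sketch below illustrates that general pattern only. The field names, the JSON Lines storage format, and the `events` helper are assumptions made for the example; the published contract in run-manifest-ledger.md is the authority on the real models.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class RunManifest:
    """Hypothetical fields; frozen so the record cannot change after creation."""

    run_id: str
    pipeline: str
    started_at: str


class RunLedger:
    """Append-only event log stored as JSON Lines (assumed format)."""

    def __init__(self, path: Path) -> None:
        self._path = path

    def append(self, run_id: str, event: str) -> None:
        # Mode "a" only adds lines; existing entries are never rewritten.
        with self._path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps({"run_id": run_id, "event": event}) + "\n")

    def events(self, run_id: str) -> list[str]:
        """Return the events recorded for one run, in write order."""
        if not self._path.exists():
            return []
        with self._path.open(encoding="utf-8") as fh:
            rows = [json.loads(line) for line in fh if line.strip()]
        return [row["event"] for row in rows if row["run_id"] == run_id]
```

Keeping the manifest frozen and the ledger write path append-only means an inspection tool can replay a run's history without worrying about in-place edits.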

Schema Documentation

| Provider | Entity | Schema Document | RULES.md |
|---|---|---|---|
| ChEMBL | Activity | activity-schema.md | §2.8 |
| ChEMBL | Molecule | molecule-schema.md | §2.8 |
| ChEMBL | Target | target-schema.md | §2.8 |
| ChEMBL | Assay | assay-schema.md | §2.8 |

Operations

| Topic | Document | RULES.md |
|---|---|---|
| Error Handling | ADR-016 | §3.1 |
| Circuit Breaker | ADR-007 | §3.1.4 |
| Locking | ADR-003 | §3.3 |
| DQ Metrics | RULES.md | §3.4 |
| Graceful Shutdown | ADR-008 | §5.3 |
| DR Procedures | runbooks/index.md | §5.5 |
| Control-Plane Contract | run-manifest-ledger.md | §2.4, §5.5 |
| Inspection CLI | cli.md | §5.5 |
| Run Traceability | run-manifest-inspection.md | §2.4, §5.5 |
| Historical Ops Artifacts | archive-index.md | §7 |
| Cleanup | cleanup-policy.md | §2.1.1 |

Development

| Topic | Document | RULES.md |
|---|---|---|
| Adding Providers | add-new-source.md | App D |
| Adding Pipelines | add-pipeline-existing-source.md | App D |
| Pipeline Review | pipeline-review-checklist.md | §4.2 |
| GitHub Workflow | github-local-workflow.md | §7.3 |
| Testing | testing.md | §4.2 |
| Coverage Config | coverage-configuration.md | §4.2 |
| E2E Testing | ADR-010 | §4.2.3 |
| Date Handling | date-handling.md | §2.4 |
| Code Style | RULES.md §4 | §4 |

Source Code Map

src/bioetl/
├── domain/                      # Pure logic, no I/O (§1.1)
│   ├── ports/                   # Protocol interfaces
│   │   ├── __init__.py          # Facade — single import point (ARCH-008)
│   │   ├── data_source.py       # DataSourcePort, FilterableDataSourcePort
│   │   ├── storage.py           # StoragePort
│   │   ├── locking.py           # LockPort
│   │   ├── checkpoint.py        # CheckpointPort
│   │   ├── quarantine.py        # QuarantinePort
│   │   ├── observability/       # Observability port package
│   │   │   ├── __init__.py      # Facade — single import point for observability ports
│   │   │   ├── logging.py       # LoggerPort
│   │   │   ├── metrics.py       # MetricsPort, MetricsServerPort, MetricsPublisherPort
│   │   │   └── tracing.py       # TracingPort
│   │   ├── validation.py        # GoldValidatorPort
│   │   ├── filtering.py         # InputFilterPort
│   │   ├── health_check.py      # HealthCheckPort
│   │   ├── resilience.py        # CircuitBreakerPort
│   │   ├── runner.py            # PipelineRunnerPort
│   │   ├── shutdown.py          # ShutdownPort
│   │   ├── memory.py            # MemoryPort
│   │   ├── metadata.py          # MetadataPort
│   │   ├── metadata_coordinator.py  # MetadataCoordinatorPort
│   │   ├── delta_reader.py      # DeltaReaderPort
│   │   ├── data_normalization.py    # DataNormalizationPort
│   │   ├── dq_config.py         # DQConfigPort
│   │   ├── dq_report.py         # DQReportPort
│   │   ├── serialization.py     # SerializationPort
│   │   ├── idmapping.py         # IdMappingPort
│   │   ├── adr.py               # AdrPort
│   │   ├── audit.py             # AuditPort
│   │   ├── pii.py               # PiiPort
│   │   └── noop.py              # NoOp implementations (Null Object Pattern)
│   ├── config/                  # Domain config models (package)
│   │   ├── pipeline.py          # PipelineConfig
│   │   ├── runtime.py           # RuntimeConfig
│   │   ├── dq.py                # DQ config models
│   │   ├── table.py             # TableConfig
│   │   ├── memory.py            # MemoryConfig
│   │   ├── validation.py        # ValidationConfig
│   │   └── control_plane.py     # Control plane config models
│   ├── exceptions/              # Domain exceptions hierarchy (package)
│   │   ├── base.py              # BioETLError base
│   │   ├── validation.py        # ValidationError
│   │   ├── network.py           # NetworkError
│   │   ├── infrastructure.py    # InfrastructureError
│   │   ├── internal.py          # InternalError
│   │   └── data_quality.py      # DataQualityError
│   ├── aggregates/              # DDD Aggregates (ADR-021)
│   │   ├── batch.py             # Batch aggregate
│   │   ├── pipeline_run.py      # PipelineRun aggregate
│   │   ├── quarantine_entry.py  # QuarantineEntry aggregate
│   │   └── events.py            # Domain events
│   ├── composite/               # Composite pipeline domain (ADR-026)
│   │   ├── strategy.py          # Merge strategies
│   │   ├── state.py             # Composite state machine
│   │   ├── result.py            # CompositeResult
│   │   ├── lineage.py           # Field-level lineage
│   │   ├── cross_validation.py  # Pre-merge validation
│   │   ├── field_groups.py      # Column ordering
│   │   ├── config.py            # Composite config models
│   │   ├── aggregation.py       # Enricher aggregation
│   │   └── checkpoint/          # Composite checkpoint models
│   ├── contracts/gold/          # Gold contract modules
│   │   ├── chembl.py            # ChEMBL Gold contract exports
│   │   ├── composite.py         # Composite Gold contract exports
│   │   ├── publications.py      # Publication Gold contract exports
│   │   ├── pubchem.py           # PubChem Gold contract exports
│   │   └── uniprot.py           # UniProt Gold contract exports
│   ├── entities/                # Domain entities
│   ├── filtering/               # Filter domain models (ADR-028)
│   │   ├── input_config.py      # InputFilterConfig
│   │   ├── silver_config.py     # SilverFilterConfig
│   │   ├── gold_config.py       # GoldFilterConfig
│   │   └── ...                  # column_filter, range_filter, list_filters
│   ├── lineage/                 # Data lineage tracking
│   │   ├── field_lineage.py     # Field-level lineage tracking
│   │   ├── pipeline_lineage.py  # Pipeline-level lineage
│   │   └── ...                  # Lineage utilities
│   ├── control_plane/           # Control plane domain models
│   │   ├── run_manifest.py      # Run manifest models
│   │   ├── run_ledger.py        # Run ledger models
│   │   └── ...                  # Control plane contracts
│   ├── mapping/                 # Field mapping definitions
│   │   ├── publication_fields.py        # Publication field mappings
│   │   ├── publication_type_mapping.py  # Type classification
│   │   ├── activity_fields.py           # Activity field mappings
│   │   └── molecule_fields.py           # Molecule field mappings
│   ├── models/                  # Domain models
│   ├── registry/                # Entity registries
│   ├── schemas/                 # PyArrow Silver schemas (by provider)
│   │   ├── common/              # Base schemas (publication_base, molecule_base)
│   │   ├── chembl/              # 13 ChEMBL entity schemas
│   │   ├── crossref/            # CrossRef schemas
│   │   ├── openalex/            # OpenAlex schemas
│   │   ├── pubchem/             # PubChem schemas
│   │   ├── pubmed/              # PubMed schemas
│   │   ├── semanticscholar/     # SemanticScholar schemas
│   │   └── uniprot/             # UniProt schemas
│   ├── services/                # Domain services
│   │   ├── normalization_service.py     # Data normalization
│   │   ├── identity_service.py          # Entity ID generation
│   │   ├── text_similarity.py           # Text similarity
│   │   ├── dq_metrics_calculator.py     # DQ metrics
│   │   ├── unit_converter.py            # Unit conversion
│   │   └── ...
│   ├── value_objects/           # Value objects
│   │   ├── run_context.py       # RunContext
│   │   ├── dq_result.py         # DQResult
│   │   ├── silver_result.py     # SilverResult
│   │   └── ...
│   ├── transformations.py       # Pure transformation functions
│   └── types.py                 # Shared types (RunType, HealthStatus, ErrorCode)
│
├── application/                 # Pipeline orchestration (§1.1)
│   ├── core/                    # Core pipeline infrastructure
│   │   ├── base.py              # Base pipeline primitives
│   │   ├── base_transformer/    # Base transformer contracts and structural policy
│   │   ├── batch_executor.py    # Batch executor
│   │   ├── runner.py            # PipelineRunner (Driving Adapter logic)
│   │   ├── record_processor.py  # Record processing
│   │   ├── lifecycle/           # Shutdown, checkpoint, locks, cleanup, heartbeat
│   │   ├── preflight/           # Pre-run validation and health aggregation
│   │   ├── postrun/             # Post-run cleanup, DQ, metadata and VACUUM
│   │   ├── quarantine_manager.py    # Quarantine management
│   │   ├── batch_memory_manager.py  # Memory management
│   │   └── ...
│   ├── composite/               # Composite pipeline orchestration
│   ├── pipelines/               # Concrete pipeline implementations
│   │   ├── common/              # Shared pipeline helpers
│   │   ├── chembl/              # ChEMBL transformers and pipeline helpers
│   │   │   ├── activity_transformer.py
│   │   │   ├── assay_transformer.py
│   │   │   ├── molecule_transformer.py
│   │   │   ├── target_transformer.py
│   │   │   ├── publication_transformer.py
│   │   │   ├── cell_line_transformer.py
│   │   │   ├── protein_class_transformer.py
│   │   │   ├── compound_record_transformer.py
│   │   │   ├── tissue_transformer.py
│   │   │   ├── target_component_transformer.py
│   │   │   ├── assay_parameters_transformer.py
│   │   │   ├── publication_term_transformer.py
│   │   │   ├── publication_similarity_transformer.py
│   │   │   ├── subcellular_fraction_transformer.py
│   │   │   ├── base_chembl_transformer.py
│   │   │   └── _pipelines.py
│   │   ├── pubchem/             # PubChem transformers
│   │   ├── pubmed/              # PubMed transformers
│   │   ├── uniprot/             # UniProt transformers
│   │   ├── crossref/            # CrossRef transformers
│   │   ├── openalex/            # OpenAlex transformers
│   │   └── semanticscholar/     # SemanticScholar transformers
│   ├── services/                # Application services
│   │   ├── control_plane/       # Control plane services
│   │   ├── dq/                  # Data quality services
│   │   ├── execution/           # Execution services
│   │   └── lineage/             # Lineage services
│   └── observability/           # Application-level observability
│
├── composition/                 # Composition Root (DI container)
│   ├── bootstrap/               # Bootstrap helpers
│   │   ├── assembly/            # Storage/checkpoint assembly
│   │   ├── cli/                 # CLI bootstrap (adr, checkpoint, config, health, ...)
│   │   └── runtime/             # Runtime assembly, composite, pipeline, runner
│   ├── bootstrap_contexts.py    # Bootstrap contexts
│   ├── bootstrap_logger.py      # Bootstrap logging setup
│   ├── builders.py              # Composition builders
│   ├── entrypoints.py           # CLI/runner entrypoints
│   ├── monitoring/              # Monitoring and health checks
│   ├── observability.py         # Observability wiring
│   ├── providers/               # Provider registration
│   ├── registry.py              # Pipeline discovery
│   ├── runtime_builders/        # Runtime builder helpers
│   ├── services/                # Composition services
│   └── factories/               # Consolidated factories
│       ├── pipeline_factory.py  # Pipeline factory
│       ├── runner_factory.py    # Runner factory
│       ├── storage_factory.py   # Multi-layer storage factory
│       ├── http_client_factory.py   # HTTP client factory
│       ├── datasource/data_source_factory.py   # Data source factory
│       ├── transformer_factory.py   # Transformer factory
│       └── ...
│
├── infrastructure/              # I/O adapters (§1.1)
│   ├── adapters/                # External API clients
│   │   ├── http/                # HTTP client infrastructure (rate limiter, circuit breaker)
│   │   ├── chembl/              # ChEMBL API adapter
│   │   ├── crossref/            # CrossRef API adapter
│   │   ├── openalex/            # OpenAlex API adapter
│   │   ├── pubchem/             # PubChem API adapter
│   │   ├── pubmed/              # PubMed API adapter
│   │   ├── semanticscholar/     # Semantic Scholar API adapter
│   │   ├── uniprot/             # UniProt API adapter
│   │   ├── common/              # Shared adapter utilities
│   │   ├── decorators/          # circuit_breaker, retry decorators
│   │   └── input/               # CSV filter reader
│   ├── storage/                 # Data persistence with canonical public seams + internal subpackages
│   │   ├── bronze/              # Bronze writer internals
│   │   ├── silver/              # Silver writer internals
│   │   ├── gold/                # Gold writer internals
│   │   ├── metadata/            # Metadata builder/writer internals
│   │   ├── delta/               # Shared Delta helpers
│   │   ├── support/             # Retention/checkpoint/atomic helpers
│   │   ├── bronze_writer.py     # Bronze layer public writer seam
│   │   ├── silver_writer.py     # Silver layer public writer seam
│   │   ├── gold_writer.py       # Gold layer public writer seam
│   │   ├── base_delta_writer.py # Shared Delta writer seam
│   │   ├── delta_reader.py      # Delta table reader seam
│   │   ├── metadata_builder.py  # Metadata builder seam
│   │   ├── metadata_writer.py   # Metadata writer seam
│   │   └── atomic.py            # Atomic file-write facade
│   ├── adr/                     # ADR utilities
│   ├── audit/                   # Audit utilities
│   ├── config/                  # Config loaders (package)
│   ├── control_plane/           # Control plane infrastructure
│   ├── errors/                  # Error handling
│   ├── export/                  # Export utilities
│   ├── locking/                 # Local in-process locking
│   │   └── memory_lock.py       # In-memory single-instance lock (ADR-010)
│   ├── checkpoint/              # Checkpoint persistence
│   ├── compat/                  # Compatibility utilities
│   ├── quarantine/              # DQ failure handling
│   ├── observability/           # Metrics, logging, tracing adapters
│   ├── schemas/                 # Pydantic config schemas
│   ├── security/                # PII hashing
│   ├── serialization/           # JSON encoders
│   ├── validation/              # Pandera validator
│   ├── config_merge.py          # Config merge utility
│   └── system/                  # System utilities
│
└── interfaces/                  # External interfaces
    ├── cli/                     # CLI package (bioetl run/quarantine/checkpoint)
    │   ├── __init__.py          # CLI package entry
    │   ├── __main__.py          # CLI module entrypoint
    │   ├── exit_codes.py        # CLI exit codes
    │   ├── formatters.py        # CLI output formatting
    │   ├── main.py              # Click command group
    │   └── commands/            # CLI command entrypoints + support/compat modules
    ├── http/                    # HTTP interfaces (health server)
    │   ├── health_server.py     # Health server
    │   └── types.py             # HTTP types
    └── orchestration/           # Orchestration helpers
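
The map above places Protocol interfaces in domain/ports/ and NoOp implementations in noop.py (Null Object Pattern). As a rough illustration of how those pieces relate, here is a minimal sketch. The `info` method signature, `ListLogger`, and `run_step` are hypothetical; the real `LoggerPort` may expose a different surface.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class LoggerPort(Protocol):
    """Hypothetical port shape used only for this sketch."""

    def info(self, message: str, **fields: object) -> None: ...


class NoOpLogger:
    """Null Object: satisfies the port and deliberately does nothing."""

    def info(self, message: str, **fields: object) -> None:
        return None


class ListLogger:
    """In-memory adapter that records calls, convenient in unit tests."""

    def __init__(self) -> None:
        self.records: list[tuple[str, dict[str, object]]] = []

    def info(self, message: str, **fields: object) -> None:
        self.records.append((message, fields))


def run_step(name: str, logger: LoggerPort = NoOpLogger()) -> str:
    """Caller code depends on the port only, never on a concrete adapter."""
    logger.info("step finished", step=name)
    return name.upper()
```

Because the NoOp default is stateless, callers that do not care about logging can omit the argument, while tests can pass `ListLogger()` and assert on `records`.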

tests/
├── unit/                        # Isolated unit tests (mock I/O)
│   ├── domain/                  # Domain logic tests
│   ├── application/             # Pipeline/transformer tests
│   └── infrastructure/          # Adapter tests
├── integration/                 # Integration tests (VCR cassettes)
│   └── adapters/                # HTTP adapter tests
├── e2e/                         # E2E tests (Local-Only arch)
├── architecture/                # Architecture validation tests
└── fixtures/                    # Test fixtures
    └── vcr/                     # VCR cassettes for HTTP
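
The tests/architecture/ directory is described as holding architecture validation tests, and the source map marks domain/ as pure logic with no I/O. One common way to enforce such a boundary is to scan imports with the standard `ast` module. The sketch below assumes the rule "domain modules must not import from bioetl.infrastructure or bioetl.interfaces"; both the rule and the helper name are illustrative, not taken from the actual test suite.

```python
import ast

# Assumed boundary rule for this sketch.
FORBIDDEN_PREFIXES = ("bioetl.infrastructure", "bioetl.interfaces")


def forbidden_imports(
    source: str, forbidden: tuple[str, ...] = FORBIDDEN_PREFIXES
) -> list[str]:
    """Return absolute imports in `source` that fall under a forbidden package."""
    found: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 restricts the check to absolute imports.
            names = [node.module]
        else:
            continue
        found.extend(
            name
            for name in names
            if any(name == p or name.startswith(p + ".") for p in forbidden)
        )
    return found
```

A test could read every file under src/bioetl/domain/, call `forbidden_imports` on its text, and fail if any result is non-empty.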

Config Files Map

graph TD
    subgraph configs
        direction LR
        A(entities) --> A1("chembl/activity.yaml")
        A --> A2("pubmed/publication.yaml")
        B(providers) --> B1("chembl.yaml")
        B --> B2("openalex.yaml")
        C(base) --> C1("pipeline.yaml")
        C --> C2("quality.yaml")
        D(composites) --> D1("publication.yaml")
    end

Key Files

| File | Purpose |
|---|---|
| docs/00-project/RULES.md | Master rules document |
| docs/00-project/glossary.md | Ubiquitous Language terminology |
| CHANGELOG.md | Version history |
| configs/entities/{provider}/{entity}.yaml | Pipeline configuration |
| src/bioetl/domain/ports/ | Protocol interfaces (package) |
| src/bioetl/composition/bootstrap_contexts.py | Composition root |
| src/bioetl/infrastructure/config/composite_config_api.py | Canonical composite-config loading seam |
| src/bioetl/infrastructure/config/ | Infrastructure config loaders and normalization package |
| docs/02-architecture/system-context.md | High-level system diagram |
| docs/04-reference/contracts/gold/{provider}_{entity}_v{major}.{minor}.json | Published Gold data contract exports |
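
The `configs/entities/{provider}/{entity}.yaml` template, together with the base and providers groups in the Config Files Map, suggests which files are relevant to a single entity. The helper below is a sketch under that reading: the function name is hypothetical, and the ordering (base, then provider, then entity) is an assumption of the example rather than a documented merge precedence.

```python
from pathlib import PurePosixPath

CONFIG_ROOT = PurePosixPath("configs")


def candidate_config_files(provider: str, entity: str) -> list[PurePosixPath]:
    """List the config files this page associates with one provider/entity pair.

    The order is illustrative only; consult the config loader docs for the
    actual resolution rules.
    """
    return [
        CONFIG_ROOT / "base" / "pipeline.yaml",
        CONFIG_ROOT / "base" / "quality.yaml",
        CONFIG_ROOT / "providers" / f"{provider}.yaml",
        CONFIG_ROOT / "entities" / provider / f"{entity}.yaml",
    ]


if __name__ == "__main__":
    for path in candidate_config_files("chembl", "activity"):
        print(path)
```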

CI/CD & GitHub

| Topic | Document | RULES.md |
|---|---|---|
| GitHub Policy | 05-github-policy.md | §4, §5 |
| Contributing | CONTRIBUTING.md | |
| Security | SECURITY.md | §5.4 |

Related Resources


Navigator freshness is carried by the page header and per-document metadata. This page intentionally avoids a separate manual status table because it drifts faster than the canonical owner docs themselves.