Version: 1.1.2
Status: active
Class: published
Owner: BioETL Team
Reviewers:
- BioETL Team
Last verified: '2026-04-29'
Synced with RULES.md v6.1.3 | Last updated: 2026-04-29
Documentation Update: 2026-04-29
- 2026-04-29: navigator trimmed to navigation-only duties; manual document-status matrix removed to avoid freshness drift
- Issue #3091 Resolution: Fixed ADR status contradiction (ADR-001..043 → ADR-001..045)
- Source-code map updated: added missing directories (domain/lineage/, domain/control_plane/, domain/config/control_plane.py, domain/composite/checkpoint/, application/services/control_plane/, application/services/dq/, application/services/execution/, application/services/lineage/, composition/monitoring/, infrastructure/adr/, infrastructure/audit/, infrastructure/compat/, infrastructure/control_plane/, infrastructure/system/)
- Compatibility inventory synced with the current measured CLI shim registry
- Source-code map updated for the storage subpackage decomposition (bronze/, silver/, gold/, metadata/, delta/, support/)
- Snapshot-style file/test counts removed from active navigation blocks to reduce drift
- Active entry points clarified: RULES.md, TOOLS.md, and canonical layer docs in docs/02-architecture/
- 2026-03-20: stale config-loader entry updated to current composition/runtime and infrastructure config seams
- 2026-03-24: composition/domain references synced with RF-021 config ownership and RF-022 runtime port contracts
- 2026-03-27: navigator synced with ADR-044/ADR-045, GitHub local workflow guide, and active traceability runbooks
- 2026-04-01: control-plane documentation pack re-synced with RunManifest / RunLedger runtime, storage layout, rollout flags, inspection CLI, and event baseline
- 2026-04-02: navigator re-synced with 04-reference/index.md and 05-operations/archive-index.md
- 2026-04-04: published docs verification guide added to active entrypoints and mixed-environment workflow references
| Need to... | Go to |
|---|---|
| Understand the rules | RULES.md |
| Look up terminology | glossary.md |
| Find tool commands | TOOLS.md |
| Verify docs quality gates | docs-verification.md |
| Check entity config parity | docs-parity-gate.md |
| Review structure / retention rules | 03-file-policy.md |
| Govern documentation | D-01 |
| Create a new pipeline | governance/04-extending-bioetl.md |
| Review a pipeline | pipeline-review-checklist.md |
| Browse published reference docs | index.md |
| Find doc templates | templates/index.md |
| Inspect run traceability | run-manifest-ledger.md |
| Use inspection CLI | cli.md |
| Check normalization governance | non-chembl-normalization-overview.md |
| Review identifier family policy | reference-identifiers.md |
| Run control-plane triage | run-manifest-inspection.md |
| Understand control-plane decision | ADR-044 |
| Understand rollout / DQ decision | ADR-045 |
| Handle a prod error | runbooks/index.md |
| Browse historical ops material | archive-index.md |
| Understand architecture | 00-overview.md |
| Check data contracts | chembl_activity-v1.0.json |
| Check DQ contracts | dq-contracts.md |
| Browse ADR registry | adr-registry.md |
| Find historical context | Repository path docs/99-archive/README.md (non-canonical) |
| Category | Language | Examples |
|---|---|---|
| Public-facing | English | README.md, CONTRIBUTING.md, CHANGELOG.md |
| User guides | English | docs/03-guides/, docs/04-reference/ |
| Internal governance | Russian | RULES.md, AGENT.md, docs/00-project/governance/* |
| Architecture docs | Russian | docs/02-architecture/* |
| Code comments | Russian | Docstrings, inline comments |
docs/
├── 00-project/ # Project rules & governance
│ ├── 00-map.md # This file (Project Navigator)
│ ├── index.md # Welcome page
│ ├── RULES.md # Canonical rules document (v6.1.3)
│ ├── glossary.md # Ubiquitous Language terminology
│ ├── TOOLS.md # Active tools hub & unified entry points
│ ├── rules-summary.md # TL;DR of RULES.md
│ └── governance/ # Project governance policies
│ ├── 01-documentation-governance-style-guide.md # D-01 documentation metapolicy
│ ├── 02-naming-policy.md # Entity naming conventions
│ ├── 03-file-policy.md
│ ├── 04-extending-bioetl.md
│ ├── 05-github-policy.md # CI/CD, branch protection, reviews
│ └── 06-doc-publication-policy.md # Documentation publication policy
│
├── 01-requirements/ # Requirements
│ └── REQUIREMENTS.md # Testable requirements catalog
│
├── 02-architecture/ # Architecture & Decisions
│ ├── 00-overview.md # Architecture overview
│ ├── decisions/ # ADR decision files (see generated adr-registry.md)
│ ├── diagrams/ # Canonical Mermaid source files and rendered views
│ └── ... (Layer docs: 01-domain, 02-application, etc.)
│
├── 03-guides/ # Guides & Manuals
│ ├── development/ # Developer guides (config schema, etc.)
│ └── ... (User guides: getting-started, testing, etc.)
│
├── 04-reference/ # Reference Documentation
│ ├── index.md # Reference landing page
│ ├── api/ # API Reference
│ ├── cli.md # CLI Reference
│ ├── providers/ # Provider documentation (ChEMBL, PubMed, etc.)
│ ├── pipelines/ # Pipeline specifications
│ ├── contracts/ # Data and control-plane contracts
│ ├── schemas/ # Auxiliary schemas & field maps
│ └── templates/ # Code & doc templates + published template index
│
├── 05-operations/ # Operations & Runbooks
│ ├── archive-index.md # Historical / archive-only operations surface
│ ├── runbooks/ # Incident response playbooks
│ ├── verification/ # Data verification reports
│ └── ... (Ops guides: vacuum, performance)
│
└── 99-archive/ # Historical / superseded (repo-only, non-canonical)
├── reports/ # Old project reports
└── ...
- RULES.md - Project rules (start here)
- rules-summary.md - Quick reference
- TOOLS.md - Active tools hub and script entry points
- 04-extending-bioetl.md - Adding providers/pipelines
| Document | Covers | RULES.md |
|---|---|---|
| system-context.md | Entity models, IDs, relationships | §2.8 |
| container-diagram.md | C4 Container, Local-Only runtime | §5.6 |
| data-flow.md | Ports & Adapters, layer responsibilities | §1.1 |
| 05-composition-layer.md | Composition Root, DI, Factories | §1.1 |
| ADR-001: Delta Lake | Storage engine choice | §2.1, §3 |
| ADR-002: Medallion | Data layering pattern | §1 |
| ADR-003: In-Memory Locking | MemoryLock strategy | §6 |
| ADR-004: Pydantic | Validation approach | - |
| ADR-005: Composition Layer | DI and layer separation | §1.1 |
| ADR-006: Logger/Metrics Ports | Port abstractions | §1.1 |
| ADR-007: Circuit Breaker | Failure handling pattern | §3.1.4 |
| ADR-008: Graceful Shutdown | SIGTERM/SIGINT handling | §5.3 |
| ADR-009: Paginated Fetcher | Pagination abstraction | App D |
| ADR-010: Local-Only Deploy | File-based deployment (no Docker) | §5.6 |
| ADR-011: Watermark Removal | Simplified checkpoint model | §2.4 |
| ADR-012: Storage Clear Contract | Storage clear API, run-id injection | §2.1 |
| ADR-013: Async Storage Cleanup | MedallionLifecycleService pattern | §2.1 |
| ADR-014: Deterministic Writes | SCD2 ingestion-ts, reproducible writes | §2.1 |
| ADR-015: Pipeline Services Lifecycle | Port lifecycle contracts | §1.1 |
| ADR-016: Error Handling Strategy | Unified error classification | §3.1 |
| ADR-017: Observability Architecture | Metrics, tracing, logging ports | §5.1 |
| ADR-018: Gold Strict Validation | Pandera Gold validation | §2.7 |
| ADR-019: Observability Port Enforcement | REQ-OBS-001 compliance | §5.1 |
| ADR-020: BasePipeline Decomposition | God Object refactoring | §1.1 |
| ADR-021: DDD Aggregates | DDD aggregates adoption | - |
| ADR-022: Tracing NoOp | NoOp for tracing | - |
| ADR-023: Entity Type Patterns | Entity type patterns | - |
| ADR-024: Entity Naming Unification | Entity naming unification | - |
| ADR-025: Pipeline Config Unification | Pipeline config unification | - |
| ADR-026: Composite Pipeline Pattern | Composite pipeline pattern | - |
| ADR-027: DQ Rules Externalization | Hierarchical DQ configuration | §3.1.2 |
| ADR-028: Filter Rules Externalization | Hierarchical filter configuration | App D |
| ADR-029: Output Metadata Unification | Unified output metadata contracts | §2.4 |
| ADR-030: Publication Pagination Strategy | Publication pagination strategy | - |
| ADR-031: Loading Strategy Formalization | Loading strategy formalization | - |
| ADR-032: Unified HTTP Client | Unified HTTP client pattern | App A |
| ADR-033: Publication Validation Strategy | Five-level publication validation | §3.4 |
| ADR-034: Schema↔Domain Pairs | Schema↔Domain configuration pairs | §2.8 |
| ADR-035: JSON Field Typing Policy | JSON field typing (Silver↔Gold) | §2.8 |
| ADR-036: Gold Contract Versioning | Gold contract versioning policy | §2.7 |
| ADR-037: Canonical Schema Generation | Canonical schema source and generation | §2.8 |
| ADR-038: Enum Externalization | ChEMBL enum values externalization | App D |
| ADR-039: Unified Entity Config | Unified entity configuration format | App D |
| ADR-040: Diagram Governance | Mermaid diagram standards and governance | §7.5 |
| ADR-041: Naming Policy Skills/Agents | Naming conventions for skills and agents | §7.1 |
| ADR-042: Testing Strategy Matrix | Test categorization and coverage strategy | §5 |
| ADR-043: Documentation Knowledge Management | Documentation governance and knowledge management | §7 |
| ADR-044: Run Manifest and Run Ledger Control Plane | Immutable manifest, append-only ledger, inspection CLI | §2.4, §5.5 |
| ADR-045: Data Quality Contract System | DQ contract semantics and rollout alignment | §3.4, §5.5 |
| Topic | Document | RULES.md |
|---|---|---|
| Medallion Layers | data-flow.md | §2.1 |
| Schema Drift | RULES.md | §2.2 |
| Data Lineage | system-context.md | §2.3 |
| Backfill/Replay | RULES.md | §2.4 |
| Quarantine | RULES.md | §2.6 |
| Content Hash | system-context.md | §2.8 |
| Topic | Document | RULES.md |
|---|---|---|
| Published control-plane contract | run-manifest-ledger.md | §2.4, §5.5 |
| Supported inspection CLI | cli.md | §5.5 |
| Mandatory inspection runbook | run-manifest-inspection.md | §2.4, §5.5 |
| Control-plane ADR | ADR-044 | §2.4, §5.5 |
| DQ / rollout ADR | ADR-045 | §3.4, §5.5 |
| Documentation metapolicy | D-01 | §7 |
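
The control-plane rows above point at ADR-044, which this page summarizes as an immutable manifest plus an append-only ledger. As a rough illustration of that general pattern only, the sketch below uses a frozen dataclass and a JSON Lines file. The field names (`run_id`, `pipeline`), the helper names, and the JSONL storage format are all assumptions; the published contract is in run-manifest-ledger.md.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class RunManifest:
    """Immutable after creation. Both fields are hypothetical examples."""

    run_id: str
    pipeline: str


def append_event(ledger: Path, event: dict) -> None:
    """Append one JSON line; existing lines are never rewritten."""
    with ledger.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(event, sort_keys=True) + "\n")


def read_events(ledger: Path) -> list[dict]:
    """Read the ledger back in the order events were appended."""
    with ledger.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Assigning to a field of a `RunManifest` instance raises an error because the dataclass is frozen, while the ledger only ever grows by one line per event.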
| Provider | Entity | Schema Document | RULES.md |
|---|---|---|---|
| ChEMBL | Activity | activity-schema.md | §2.8 |
| ChEMBL | Molecule | molecule-schema.md | §2.8 |
| ChEMBL | Target | target-schema.md | §2.8 |
| ChEMBL | Assay | assay-schema.md | §2.8 |
| Topic | Document | RULES.md |
|---|---|---|
| Error Handling | ADR-016 | §3.1 |
| Circuit Breaker | ADR-007 | §3.1.4 |
| Locking | ADR-003 | §3.3 |
| DQ Metrics | RULES.md | §3.4 |
| Graceful Shutdown | ADR-008 | §5.3 |
| DR Procedures | runbooks/index.md | §5.5 |
| Control-Plane Contract | run-manifest-ledger.md | §2.4, §5.5 |
| Inspection CLI | cli.md | §5.5 |
| Run Traceability | run-manifest-inspection.md | §2.4, §5.5 |
| Historical Ops Artifacts | archive-index.md | §7 |
| Cleanup | cleanup-policy.md | §2.1.1 |
| Topic | Document | RULES.md |
|---|---|---|
| Adding Providers | add-new-source.md | App D |
| Adding Pipelines | add-pipeline-existing-source.md | App D |
| Pipeline Review | pipeline-review-checklist.md | §4.2 |
| GitHub Workflow | github-local-workflow.md | §7.3 |
| Testing | testing.md | §4.2 |
| Coverage Config | coverage-configuration.md | §4.2 |
| E2E Testing | ADR-010 | §4.2.3 |
| Date Handling | date-handling.md | §2.4 |
| Code Style | RULES.md | §4 |
src/bioetl/
├── domain/ # Pure logic, no I/O (§1.1)
│ ├── ports/ # Protocol interfaces
│ │ ├── __init__.py # Facade — single import point (ARCH-008)
│ │ ├── data_source.py # DataSourcePort, FilterableDataSourcePort
│ │ ├── storage.py # StoragePort
│ │ ├── locking.py # LockPort
│ │ ├── checkpoint.py # CheckpointPort
│ │ ├── quarantine.py # QuarantinePort
│ │ ├── observability/ # Observability port package
│ │ │ ├── __init__.py # Facade — single import point for observability ports
│ │ │ ├── logging.py # LoggerPort
│ │ │ ├── metrics.py # MetricsPort, MetricsServerPort, MetricsPublisherPort
│ │ │ └── tracing.py # TracingPort
│ │ ├── validation.py # GoldValidatorPort
│ │ ├── filtering.py # InputFilterPort
│ │ ├── health_check.py # HealthCheckPort
│ │ ├── resilience.py # CircuitBreakerPort
│ │ ├── runner.py # PipelineRunnerPort
│ │ ├── shutdown.py # ShutdownPort
│ │ ├── memory.py # MemoryPort
│ │ ├── metadata.py # MetadataPort
│ │ ├── metadata_coordinator.py # MetadataCoordinatorPort
│ │ ├── delta_reader.py # DeltaReaderPort
│ │ ├── data_normalization.py # DataNormalizationPort
│ │ ├── dq_config.py # DQConfigPort
│ │ ├── dq_report.py # DQReportPort
│ │ ├── serialization.py # SerializationPort
│ │ ├── idmapping.py # IdMappingPort
│ │ ├── adr.py # AdrPort
│ │ ├── audit.py # AuditPort
│ │ ├── pii.py # PiiPort
│ │ └── noop.py # NoOp implementations (Null Object Pattern)
│ ├── config/ # Domain config models (package)
│ │ ├── pipeline.py # PipelineConfig
│ │ ├── runtime.py # RuntimeConfig
│ │ ├── dq.py # DQ config models
│ │ ├── table.py # TableConfig
│ │ ├── memory.py # MemoryConfig
│ │ ├── validation.py # ValidationConfig
│ │ └── control_plane.py # Control plane config models
│ ├── exceptions/ # Domain exceptions hierarchy (package)
│ │ ├── base.py # BioETLError base
│ │ ├── validation.py # ValidationError
│ │ ├── network.py # NetworkError
│ │ ├── infrastructure.py # InfrastructureError
│ │ ├── internal.py # InternalError
│ │ └── data_quality.py # DataQualityError
│ ├── aggregates/ # DDD Aggregates (ADR-021)
│ │ ├── batch.py # Batch aggregate
│ │ ├── pipeline_run.py # PipelineRun aggregate
│ │ ├── quarantine_entry.py # QuarantineEntry aggregate
│ │ └── events.py # Domain events
│ ├── composite/ # Composite pipeline domain (ADR-026)
│ │ ├── strategy.py # Merge strategies
│ │ ├── state.py # Composite state machine
│ │ ├── result.py # CompositeResult
│ │ ├── lineage.py # Field-level lineage
│ │ ├── cross_validation.py # Pre-merge validation
│ │ ├── field_groups.py # Column ordering
│ │ ├── config.py # Composite config models
│ │ ├── aggregation.py # Enricher aggregation
│ │ └── checkpoint/ # Composite checkpoint models
│ ├── contracts/gold/ # Gold contract modules
│ │ ├── chembl.py # ChEMBL Gold contract exports
│ │ ├── composite.py # Composite Gold contract exports
│ │ ├── publications.py # Publication Gold contract exports
│ │ ├── pubchem.py # PubChem Gold contract exports
│ │ └── uniprot.py # UniProt Gold contract exports
│ ├── entities/ # Domain entities
│ ├── filtering/ # Filter domain models (ADR-028)
│ │ ├── input_config.py # InputFilterConfig
│ │ ├── silver_config.py # SilverFilterConfig
│ │ ├── gold_config.py # GoldFilterConfig
│ │ └── ... # column_filter, range_filter, list_filters
│ ├── lineage/ # Data lineage tracking
│ │ ├── field_lineage.py # Field-level lineage tracking
│ │ ├── pipeline_lineage.py # Pipeline-level lineage
│ │ └── ... # Lineage utilities
│ ├── control_plane/ # Control plane domain models
│ │ ├── run_manifest.py # Run manifest models
│ │ ├── run_ledger.py # Run ledger models
│ │ └── ... # Control plane contracts
│ ├── mapping/ # Field mapping definitions
│ │ ├── publication_fields.py # Publication field mappings
│ │ ├── publication_type_mapping.py # Type classification
│ │ ├── activity_fields.py # Activity field mappings
│ │ └── molecule_fields.py # Molecule field mappings
│ ├── models/ # Domain models
│ ├── registry/ # Entity registries
│ ├── schemas/ # PyArrow Silver schemas (by provider)
│ │ ├── common/ # Base schemas (publication_base, molecule_base)
│ │ ├── chembl/ # 13 ChEMBL entity schemas
│ │ ├── crossref/ # CrossRef schemas
│ │ ├── openalex/ # OpenAlex schemas
│ │ ├── pubchem/ # PubChem schemas
│ │ ├── pubmed/ # PubMed schemas
│ │ ├── semanticscholar/ # SemanticScholar schemas
│ │ └── uniprot/ # UniProt schemas
│ ├── services/ # Domain services
│ │ ├── normalization_service.py # Data normalization
│ │ ├── identity_service.py # Entity ID generation
│ │ ├── text_similarity.py # Text similarity
│ │ ├── dq_metrics_calculator.py # DQ metrics
│ │ ├── unit_converter.py # Unit conversion
│ │ └── ...
│ ├── value_objects/ # Value objects
│ │ ├── run_context.py # RunContext
│ │ ├── dq_result.py # DQResult
│ │ ├── silver_result.py # SilverResult
│ │ └── ...
│ ├── transformations.py # Pure transformation functions
│ └── types.py # Shared types (RunType, HealthStatus, ErrorCode)
│
├── application/ # Pipeline orchestration (§1.1)
│ ├── core/ # Core pipeline infrastructure
│ │ ├── base.py # Base pipeline primitives
│ │ ├── base_transformer/ # Base transformer contracts and structural policy
│ │ ├── batch_executor.py # Batch executor
│ │ ├── runner.py # PipelineRunner (Driving Adapter logic)
│ │ ├── record_processor.py # Record processing
│ │ ├── lifecycle/ # Shutdown, checkpoint, locks, cleanup, heartbeat
│ │ ├── preflight/ # Pre-run validation and health aggregation
│ │ ├── postrun/ # Post-run cleanup, DQ, metadata and VACUUM
│ │ ├── quarantine_manager.py # Quarantine management
│ │ ├── batch_memory_manager.py # Memory management
│ │ └── ...
│ ├── composite/ # Composite pipeline orchestration
│ ├── pipelines/ # Concrete pipeline implementations
│ │ ├── common/ # Shared pipeline helpers
│ │ ├── chembl/ # ChEMBL transformers and pipeline helpers
│ │ │ ├── activity_transformer.py
│ │ │ ├── assay_transformer.py
│ │ │ ├── molecule_transformer.py
│ │ │ ├── target_transformer.py
│ │ │ ├── publication_transformer.py
│ │ │ ├── cell_line_transformer.py
│ │ │ ├── protein_class_transformer.py
│ │ │ ├── compound_record_transformer.py
│ │ │ ├── tissue_transformer.py
│ │ │ ├── target_component_transformer.py
│ │ │ ├── assay_parameters_transformer.py
│ │ │ ├── publication_term_transformer.py
│ │ │ ├── publication_similarity_transformer.py
│ │ │ ├── subcellular_fraction_transformer.py
│ │ │ ├── base_chembl_transformer.py
│ │ │ └── _pipelines.py
│ │ ├── pubchem/ # PubChem transformers
│ │ ├── pubmed/ # PubMed transformers
│ │ ├── uniprot/ # UniProt transformers
│ │ ├── crossref/ # CrossRef transformers
│ │ ├── openalex/ # OpenAlex transformers
│ │ └── semanticscholar/ # SemanticScholar transformers
│ ├── services/ # Application services
│ │ ├── control_plane/ # Control plane services
│ │ ├── dq/ # Data quality services
│ │ ├── execution/ # Execution services
│ │ └── lineage/ # Lineage services
│ └── observability/ # Application-level observability
│
├── composition/ # Composition Root (DI container)
│ ├── bootstrap/ # Bootstrap helpers
│ │ ├── assembly/ # Storage/checkpoint assembly
│ │ ├── cli/ # CLI bootstrap (adr, checkpoint, config, health, ...)
│ │ └── runtime/ # Runtime assembly, composite, pipeline, runner
│ ├── bootstrap_contexts.py # Bootstrap contexts
│ ├── bootstrap_logger.py # Bootstrap logging setup
│ ├── builders.py # Composition builders
│ ├── entrypoints.py # CLI/runner entrypoints
│ ├── monitoring/ # Monitoring and health checks
│ ├── observability.py # Observability wiring
│ ├── providers/ # Provider registration
│ ├── registry.py # Pipeline discovery
│ ├── runtime_builders/ # Runtime builder helpers
│ ├── services/ # Composition services
│ └── factories/ # Consolidated factories
│ ├── pipeline_factory.py # Pipeline factory
│ ├── runner_factory.py # Runner factory
│ ├── storage_factory.py # Multi-layer storage factory
│ ├── http_client_factory.py # HTTP client factory
│ ├── datasource/data_source_factory.py # Data source factory
│ ├── transformer_factory.py # Transformer factory
│ └── ...
│
├── infrastructure/ # I/O adapters (§1.1)
│ ├── adapters/ # External API clients
│ │ ├── http/ # HTTP client infrastructure (rate limiter, circuit breaker)
│ │ ├── chembl/ # ChEMBL API adapter
│ │ ├── crossref/ # CrossRef API adapter
│ │ ├── openalex/ # OpenAlex API adapter
│ │ ├── pubchem/ # PubChem API adapter
│ │ ├── pubmed/ # PubMed API adapter
│ │ ├── semanticscholar/ # Semantic Scholar API adapter
│ │ ├── uniprot/ # UniProt API adapter
│ │ ├── common/ # Shared adapter utilities
│ │ ├── decorators/ # circuit_breaker, retry decorators
│ │ └── input/ # CSV filter reader
│ ├── storage/ # Data persistence with canonical public seams + internal subpackages
│ │ ├── bronze/ # Bronze writer internals
│ │ ├── silver/ # Silver writer internals
│ │ ├── gold/ # Gold writer internals
│ │ ├── metadata/ # Metadata builder/writer internals
│ │ ├── delta/ # Shared Delta helpers
│ │ ├── support/ # Retention/checkpoint/atomic helpers
│ │ ├── bronze_writer.py # Bronze layer public writer seam
│ │ ├── silver_writer.py # Silver layer public writer seam
│ │ ├── gold_writer.py # Gold layer public writer seam
│ │ ├── base_delta_writer.py # Shared Delta writer seam
│ │ ├── delta_reader.py # Delta table reader seam
│ │ ├── metadata_builder.py # Metadata builder seam
│ │ ├── metadata_writer.py # Metadata writer seam
│ │ └── atomic.py # Atomic file-write facade
│ ├── adr/ # ADR utilities
│ ├── audit/ # Audit utilities
│ ├── config/ # Config loaders (package)
│ ├── control_plane/ # Control plane infrastructure
│ ├── errors/ # Error handling
│ ├── export/ # Export utilities
│ ├── locking/ # Local in-process locking
│ │ └── memory_lock.py # In-memory single-instance lock (ADR-010)
│ ├── checkpoint/ # Checkpoint persistence
│ ├── compat/ # Compatibility utilities
│ ├── quarantine/ # DQ failure handling
│ ├── observability/ # Metrics, logging, tracing adapters
│ ├── schemas/ # Pydantic config schemas
│ ├── security/ # PII hashing
│ ├── serialization/ # JSON encoders
│ ├── validation/ # Pandera validator
│ ├── config_merge.py # Config merge utility
│ └── system/ # System utilities
│
└── interfaces/ # External interfaces
├── cli/ # CLI package (bioetl run/quarantine/checkpoint)
│ ├── __init__.py # CLI package entry
│ ├── __main__.py # CLI module entrypoint
│ ├── exit_codes.py # CLI exit codes
│ ├── formatters.py # CLI output formatting
│ ├── main.py # Click command group
│ └── commands/ # CLI command entrypoints + support/compat modules
├── http/ # HTTP interfaces (health server)
│ ├── health_server.py # Health server
│ └── types.py # HTTP types
└── orchestration/ # Orchestration helpers
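
The `domain/ports/` package in the tree above lists Protocol interfaces next to `noop.py`, which holds Null Object implementations. A minimal sketch of how such a pairing might look follows. The name `LoggerPort` comes from the tree, but the `info` method signature, `NoOpLogger`, `ListLogger`, and `run_step` are hypothetical and exist only for illustration.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class LoggerPort(Protocol):
    """Port name taken from the tree; the method signature is an assumption."""

    def info(self, message: str, **fields: object) -> None: ...


class NoOpLogger:
    """Null Object: satisfies the port structurally and does nothing."""

    def info(self, message: str, **fields: object) -> None:
        return None


class ListLogger:
    """Hypothetical in-memory adapter that records calls."""

    def __init__(self) -> None:
        self.records: list[tuple[str, dict[str, object]]] = []

    def info(self, message: str, **fields: object) -> None:
        self.records.append((message, dict(fields)))


def run_step(logger: LoggerPort) -> str:
    # Calling code depends on the port only, never on a concrete adapter.
    logger.info("step finished", rows=3)
    return "ok"
```

Passing `NoOpLogger()` lets the same code run without any logging backend, which is the usual motivation for keeping NoOp implementations beside the ports.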
tests/
├── unit/ # Isolated unit tests (mock I/O)
│ ├── domain/ # Domain logic tests
│ ├── application/ # Pipeline/transformer tests
│ └── infrastructure/ # Adapter tests
├── integration/ # Integration tests (VCR cassettes)
│ └── adapters/ # HTTP adapter tests
├── e2e/ # E2E tests (Local-Only arch)
├── architecture/ # Architecture validation tests
└── fixtures/ # Test fixtures
└── vcr/ # VCR cassettes for HTTP
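
The tree reserves `tests/architecture/` for architecture validation, and the source-code map marks `domain/` as pure logic with no I/O. One way such a layering check could be written is sketched below with the standard `ast` module. The helper name `forbidden_imports` and the list of forbidden prefixes are assumptions, not the project's actual test code.

```python
import ast

# Assumption: domain modules must not import from these layers.
FORBIDDEN_PREFIXES = (
    "bioetl.infrastructure",
    "bioetl.application",
    "bioetl.composition",
)


def forbidden_imports(source: str) -> list[str]:
    """Return absolute imports in `source` that fall under a forbidden prefix."""
    found: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        found.extend(
            name
            for name in names
            if any(name == p or name.startswith(p + ".") for p in FORBIDDEN_PREFIXES)
        )
    return found
```

In a pytest test, this could be applied to every file under `src/bioetl/domain/` with an assertion that the returned list is empty.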
graph TD
subgraph configs
direction LR
A(entities) --> A1("chembl/activity.yaml")
A --> A2("pubmed/publication.yaml")
B(providers) --> B1("chembl.yaml")
B --> B2("openalex.yaml")
C(base) --> C1("pipeline.yaml")
C --> C2("quality.yaml")
D(composites) --> D1("publication.yaml")
end
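
The diagram groups configuration into `base`, `providers`, `entities`, and `composites`. Layered layouts like this are often resolved by deep-merging from general to specific. The sketch below assumes that precedence (base, then provider, then entity) and uses a hypothetical `deep_merge` helper with made-up keys; the actual behavior of `infrastructure/config_merge.py` is not described on this page.

```python
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where `override` wins; nested dicts merge recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical contents standing in for base/pipeline.yaml,
# providers/chembl.yaml, and entities/chembl/activity.yaml.
base = {"batch_size": 1000, "http": {"timeout_s": 30, "retries": 3}}
provider = {"http": {"timeout_s": 60}}
entity = {"batch_size": 500}

effective = deep_merge(deep_merge(base, provider), entity)
```

Here `effective` keeps `retries` from the base layer, takes `timeout_s` from the provider layer, and takes `batch_size` from the entity layer, while the input dicts stay unmodified.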
| File | Purpose |
|---|---|
| docs/00-project/RULES.md | Master rules document |
| docs/00-project/glossary.md | Ubiquitous Language terminology |
| CHANGELOG.md | Version history |
| configs/entities/{provider}/{entity}.yaml | Pipeline configuration |
| src/bioetl/domain/ports/ | Protocol interfaces (package) |
| src/bioetl/composition/bootstrap_contexts.py | Composition root |
| src/bioetl/infrastructure/config/composite_config_api.py | Canonical composite-config loading seam |
| src/bioetl/infrastructure/config/ | Infrastructure config loaders and normalization package |
| docs/02-architecture/system-context.md | High-level system diagram |
| docs/04-reference/contracts/gold/{provider}_{entity}_v{major}.{minor}.json | Published Gold data contract exports |
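
Two entries in the table above use placeholders (`{provider}`, `{entity}`, `{major}`, `{minor}`). A minimal sketch of expanding them is shown below; the helper names are hypothetical, and only the path patterns themselves come from the table.

```python
from pathlib import PurePosixPath


def entity_config_path(provider: str, entity: str) -> PurePosixPath:
    """Expand configs/entities/{provider}/{entity}.yaml."""
    return PurePosixPath("configs/entities") / provider / f"{entity}.yaml"


def gold_contract_path(provider: str, entity: str, major: int, minor: int) -> PurePosixPath:
    """Expand docs/04-reference/contracts/gold/{provider}_{entity}_v{major}.{minor}.json."""
    name = f"{provider}_{entity}_v{major}.{minor}.json"
    return PurePosixPath("docs/04-reference/contracts/gold") / name
```

For example, `entity_config_path("chembl", "activity")` yields `configs/entities/chembl/activity.yaml`.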
| Topic | Document | RULES.md |
|---|---|---|
| GitHub Policy | 05-github-policy.md | §4, §5 |
| Contributing | CONTRIBUTING.md | - |
| Security | SECURITY.md | §5.4 |
- Repository: SatoryKono/BioactivityDataAcquisition
- Issues: Report bugs and feature requests
- CI/CD: GitHub Actions workflows (GitHub Policy)
Navigator freshness is carried by the page header and per-document metadata. This page intentionally avoids a separate manual status table because it drifts faster than the canonical owner docs themselves.
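
Because freshness lives in the page header, a staleness check only needs to read the `Last verified` field. The sketch below assumes the header keeps the `Last verified: 'YYYY-MM-DD'` form used on this page; the function names and the 90-day threshold are hypothetical, not project policy.

```python
import re
from datetime import date
from typing import Optional

_LAST_VERIFIED = re.compile(r"Last verified:\s*'?(\d{4}-\d{2}-\d{2})'?")


def last_verified(header: str) -> Optional[date]:
    """Extract the 'Last verified' date from header text, if present."""
    match = _LAST_VERIFIED.search(header)
    return date.fromisoformat(match.group(1)) if match else None


def is_stale(header: str, today: date, max_age_days: int = 90) -> bool:
    """Treat a missing date as stale. The 90-day default is an assumption."""
    verified = last_verified(header)
    return verified is None or (today - verified).days > max_age_days
```

Passing `today` explicitly keeps the check deterministic, which makes it easy to exercise in a unit test.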