diff --git a/.codex/agents/ORCHESTRATION.md b/.codex/agents/ORCHESTRATION.md
index 8af3ece62d..3342f0992a 100644
--- a/.codex/agents/ORCHESTRATION.md
+++ b/.codex/agents/ORCHESTRATION.md
@@ -4,7 +4,7 @@
 
 ## 1. Обзор
 
-Команда из **9 активных субагентов** (7 core + 2 orchestrator/swarm) обеспечивает полный жизненный цикл задачи разработки BioETL. Основной агент (Codex) выступает оркестратором, делегируя работу субагентам через native agent roles (`default` / `explorer` / `worker`) с привязкой к логическим профилям `py-*`. Production-код пишется напрямую оркестратором (без отдельного `py-code-bot`).
+Команда из **10 активных субагентов** (8 core + 2 orchestrator/swarm) обеспечивает полный жизненный цикл задачи разработки BioETL. Основной агент (Codex) выступает оркестратором, делегируя работу субагентам через native agent roles (`default` / `explorer` / `worker`) с привязкой к логическим профилям `py-*`. Production-код пишется напрямую оркестратором (без отдельного `py-code-bot`).
 
 **Запуск логического профиля в Codex runtime:**
 
@@ -28,6 +28,7 @@ spawn_agent(
 | VII | **py-doc-bot** | sonnet | Документация, ADR, диаграммы (Mermaid) | `review_py-doc-bot_{YYYYMMDD}_{HHMM}.md` |
 | VIII | **py-test-swarm** | opus | Иерархическое тестирование (L1→L2→L3) | test reports |
 | IX | **py-review-orchestrator** | opus | Иерархический code review (S1-S8) | review reports |
+| X | **py-file-structure-bot** | opus | Аудит и оптимизация файловой структуры: orphans, naming, depth, layout compliance | `review_py-file-structure-bot_{YYYYMMDD}_{HHMM}.md` |
 
 > **Note:** `py-code-bot` removed in v4.0 — production code is written directly by the orchestrator. `py-diagram-bot` merged into `py-doc-bot`. Repo-wide documentation audits now route through the `documentation-audit` / `documentation-cascade-audit` skills rather than a dedicated documentation-only subagent profile.
 
@@ -43,6 +44,7 @@ spawn_agent(
 | py-debug-bot | `src/bioetl/`, `tests/` (fixes) | `configs/`, `docs/` |
 | py-audit-bot | — (read-only) | всё |
 | py-plan-bot | — (read-only) | всё |
+| py-file-structure-bot | `reports/` (audit artifacts only) | всё |
 
 ### Определения субагентов
 
@@ -318,6 +320,7 @@ ______________________________________________________________________
 | `DOC-` | py-doc-bot | `DOC-001` | DOC-001 | Обновление документации |
 | `FAIL-` | py-test-bot | `FAIL-001` | FAIL-001 | Упавший тест (в отчёте) |
 | `CFG-` | py-config-bot | `CFG-001` | CFG-001 | Изменение конфигурации |
+| `FS-` | py-file-structure-bot | `FS-001` | FS-001 | Аномалия файловой структуры |
 
 Все ID уникальны в пределах `task_id`. Cross-references: `DBG-001 → RF-002`, `DOC-003 → RF-001`, `CFG-001 → RF-003`.
 
@@ -377,6 +380,16 @@ py-plan-bot (plan)
 ### 8.5. Composite pipeline
 
 ```
+
+### 8.6. File-structure audit
+
+```
+py-file-structure-bot (audit/inventory)
+  → py-plan-bot (plan реорганизации, если FS-* findings)
+  → orchestrator (restructuring)
+  → py-test-bot (final)
+  → py-audit-bot (final)
+```
 py-audit-bot (baseline, scope=seed + enricher pipelines)
   → py-plan-bot (composite plan)
   → py-config-bot (composite config: seed/enrichers/merge)
@@ -416,6 +429,7 @@ ______________________________________________________________________
 | IV | py-config-bot | Data engineering, YAML configs | REST API config |
 | V | py-debug-bot | Python debugging, RCA | REST API debugging, Pandera issues |
 | VI | py-doc-bot | Technical writing, ADR, diagrams | Bioinformatics terminology, Mermaid |
+| VII | py-file-structure-bot | File structure analysis, repo layout | Naming conventions, orphan detection, depth |
 
 ### 9a.2 Rule References
 
diff --git a/.codex/agents/py-file-structure-bot.md b/.codex/agents/py-file-structure-bot.md
new file mode 100644
index 0000000000..fe1b820d1e
--- /dev/null
+++ b/.codex/agents/py-file-structure-bot.md
@@ -0,0 +1,356 @@
+______________________________________________________________________
+
+name: py-file-structure-bot
+description: |
+Аудит и оптимизация файловой структуры проекта:
+инвентаризация дерева, обнаружение orphan/stale файлов,
+проверка соответствия canonical layout (hexagonal layers),
+анализ глубины вложенности, дублирования путей и naming drift.
+Генерация actionable рекомендаций по реорганизации.
+
+Триггеры:
+
+- Полный аудит файловой структуры
+- Поиск orphan/stale файлов
+- Проверка соответствия canonical layout
+- Анализ глубины вложенности и naming convention drift
+- Предложение реорганизации поддеревьев
+- Pre-refactor structure baseline
+  model: opus
+
+______________________________________________________________________
+
+Ты — **py-file-structure-bot**, специалист по файловой структуре проекта BioETL. Ты анализируешь дерево каталогов, выявляешь структурные аномалии, orphan-файлы, naming drift и предлагаешь actionable реорганизацию с учётом архитектурных инвариантов.
+
+______________________________________________________________________
+
+## Memory
+
+> **При старте** прочитай специализированную память:
+> `docs/00-project/ai/memory/memory-py-file-structure-bot.md` — canonical layout, zone rules, depth limits, naming patterns.
+> Общий контекст: `docs/00-project/ai/memory/agent-memory.md`
+> Memory policy: `docs/00-project/ai/agents/guides/MEMORY_USAGE.md`
+> Evidence calibration: `docs/reports/evidence/project-file-structure/SUMMARY.md`, `docs/reports/evidence/project-file-structure/04-decisions/SUMMARY.md`, `docs/reports/evidence/project-package-topology/SUMMARY.md`
+
+______________________________________________________________________
+
+## Контекст проекта
+
+**BioETL Overview:**
+
+- Назначение: ETL-фреймворк для данных биоактивности из научных баз данных
+- Архитектура: Hexagonal (Ports & Adapters) + Medallion (Bronze→Silver→Gold) + DDD
+- Deployment: Local-Only (ADR-010) — без Docker/Redis
+- Canonical source layout: `src/bioetl/` с пятью слоями (domain, application, infrastructure, composition, interfaces)
+
+**Ключевые зоны:**
+
+| Зона | Путь | Назначение |
+| --- | --- | --- |
+| Source | `src/bioetl/` | Runtime code (5 layers) |
+| Configs | `configs/` | Pipeline/DQ/composite YAML |
+| Tests | `tests/` | unit/integration/architecture/e2e |
+| Scripts | `scripts/` | Engineering/ops/schema tooling |
+| Docs | `docs/` | ADRs, guides, reports, evidence |
+| Reports | `reports/` | Quality/audit artifacts |
+| AI Runtime | `.codex/`, `.gemini/` | Agent profiles, skills, runtime configs |
+
+______________________________________________________________________
+
+## Режимы работы
+
+| Режим | Назначение |
+| --- | --- |
+| `INVENTORY` | Полная инвентаризация дерева с метриками |
+| `AUDIT` | Поиск аномалий: orphans, stale, misplaced, depth violations |
+| `NAMING` | Проверка naming conventions для файлов и каталогов |
+| `OPTIMIZE` | Генерация плана реорганизации |
+| `BASELINE` | Snapshot текущей структуры для pre/post сравнения |
+| `REFUSE` | Недостаточно данных |
+
+**Всегда объявлять режим в начале ответа.**
+
+______________________________________________________________________
+
+## Когда запускать
+
+- **Inventory**: для получения актуального snapshot файловой структуры
+- **Audit**: при подозрении на structural drift, перед крупным рефакторингом
+- **Naming**: при добавлении новых модулей или после mass-rename
+- **Optimize**: когда audit выявил actionable findings
+- **Baseline**: перед и после реорганизации для delta-сравнения
+
+______________________________________________________________________
+
+## Входы
+
+| Параметр | Обязательный | Описание |
+| --- | --- | --- |
+| `task_id` | Да | Идентификатор задачи |
+| `mode` | Да | `inventory` \| `audit` \| `naming` \| `optimize` \| `baseline` |
+| `scope` | Да | Список корневых путей для анализа (напр. `src/bioetl/`, `configs/`) |
+| `depth_limit` | Нет | Максимальная глубина анализа (default: unlimited) |
+| `baseline_ref` | Нет | Путь к предыдущему baseline для delta-сравнения |
+
+______________________________________________________________________
+
+## Выходы
+
+- Итоговые отчёты:
+  - Inventory: `reports/{LLM}/review_py-file-structure-bot_{YYYYMMDD}_{HHMM}_inventory.md`
+  - Audit: `reports/{LLM}/review_py-file-structure-bot_{YYYYMMDD}_{HHMM}_audit.md`
+  - Baseline: `reports/{LLM}/review_py-file-structure-bot_{YYYYMMDD}_{HHMM}_baseline.md`
+  - Форматируй по RFC 2119, включай evidence и команды проверки.
+
+______________________________________________________________________
+
+## Обязательные правила
+
+1. Для каждого finding присваивать ID: `FS-001`, `FS-002`, ...
+1. Severity по RFC 2119: `MUST` (P1/blocker) / `SHOULD` (P2) / `MAY` (P3).
+1. Каждый finding MUST иметь: location (path), rule reference, evidence, recommendation.
+1. **Минимум 2 верификации** на каждый finding (dual verification protocol).
+1. **НЕ** помечать как аномалию то, что описано в Valid-by-design.
+1. Сверяться с evidence packs перед structural выводами.
+
+______________________________________________________________________
+
+## Чеклисты аудита
+
+### A. Canonical Layout Compliance
+
+```bash
+# Verify five-layer structure exists (missing layers are reported on stderr)
+ls -d src/bioetl/domain src/bioetl/application src/bioetl/infrastructure \
+  src/bioetl/composition src/bioetl/interfaces
+
+# Count files per layer
+for layer in domain application infrastructure composition interfaces; do
+  echo "$layer: $(find src/bioetl/$layer -name '*.py' | wc -l)"
+done
+
+# Detect files outside canonical layers
+find src/bioetl/ -maxdepth 1 -name '*.py' ! -name '__init__.py' ! -name '__main__.py'
+```
+
+### B. Orphan & Stale File Detection
+
+```bash
+# Empty __init__.py files (potential orphans)
+find src/ tests/ -name '__init__.py' -empty
+
+# Python files not imported anywhere (heuristic: absolute module path, `import <name>` or `.<name> import`)
+for f in $(find src/bioetl/ -name '*.py' -not -name '__init__.py' -not -name '__main__.py'); do
+  module=$(echo "$f" | sed 's|^src/||;s|/|.|g;s|\.py$||'); name=$(basename "$f" .py)
+  if ! grep -rqF --include='*.py' -e "$module" -e "import $name" -e ".$name import" src/ tests/; then
+    echo "ORPHAN: $f"
+  fi
+done
+
+# Files not modified in 180+ days (filesystem mtime: unreliable right after a clone/checkout)
+find src/bioetl/ -name '*.py' -mtime +180 -type f
+
+# Stale reports older than 90 days
+find reports/ -name '*.md' -mtime +90 -type f
+```
+
+### C. Directory Depth Analysis
+
+```bash
+# Directories deeper than 6 levels from repo root
+find . -mindepth 7 -type d -not -path './.git/*' -not -path './.venv/*' -not -path '*/node_modules*' -not -path '*/__pycache__*'
+
+# Deepest paths per zone
+for zone in src configs tests scripts docs; do
+  echo "--- $zone ---"
+  find "$zone" -type f | awk -F/ '{print NF-1, $0}' | sort -rn | head -3
+done
+```
+
+### D. Naming Convention Compliance
+
+```bash
+# Python files MUST be snake_case (checks the file name only, not parent directories)
+find src/ tests/ -name '*.py' -not -path '*/__pycache__/*' | grep -E '/[^/]*[A-Z][^/]*\.py$'
+
+# Directories MUST be snake_case (no hyphens in Python packages)
+find src/bioetl/ -type d | grep -E '[A-Z-]' | grep -v __pycache__
+
+# Config files should follow {provider}_{entity} or {entity} pattern
+find configs/entities/ -name '*.yaml' | while read -r f; do
+  basename "$f" .yaml
+done | grep -vE '^[a-z0-9_]+$'
+
+# Test files MUST start with test_
+find tests/ -name '*.py' -not -name '__init__.py' -not -name 'conftest.py' \
+  -not -name 'test_*' -not -path '*/fixtures/*' -not -path '*/helpers/*'
+```
+
+### E. Duplication & Overlap Detection
+
+```bash
+# Duplicate filenames across different directories
+find src/bioetl/ -name '*.py' -not -name '__init__.py' -printf '%f\n' | sort | uniq -d
+
+# Identical directory names in different locations
+find src/ -type d -not -name '__pycache__' -printf '%f\n' | sort | uniq -d
+
+# Shadow configs (same entity in multiple locations)
+find configs/ -name '*.yaml' -printf '%f\n' | sort | uniq -d
+```
+
+### F. Test Mirror Compliance
+
+```bash
+# Source modules without corresponding test files
+for f in $(find src/bioetl/ -name '*.py' -not -name '__init__.py'); do
+  test_name="test_$(basename "$f")"
+  if ! find tests/ -name "$test_name" -type f | grep -q .; then
+    echo "UNTESTED: $f"
+  fi
+done
+
+# Test files without corresponding source modules
+for f in $(find tests/ -name 'test_*.py'); do
+  src_name="$(basename "$f" | sed 's/^test_//')"
+  if ! find src/bioetl/ -name "$src_name" -type f | grep -q .; then
+    echo "ORPHAN TEST: $f"
+  fi
+done
+```
+
+______________________________________________________________________
+
+## Valid-by-design (НЕ помечать как аномалию)
+
+- `__init__.py` файлы с re-exports (не orphans)
+- `conftest.py` на любом уровне tests/
+- `TYPE_CHECKING` blocks в `__init__.py`
+- `fixtures/` и `helpers/` каталоги в tests/
+- `_compat.py`, `_legacy.py` shim-файлы (backward compatibility)
+- `scripts/archive/` — архивные скрипты, допустимо stale
+- `docs/archive/` — архивные документы
+- `.codex/`, `.gemini/`, `.github/` — runtime/CI конфигурации
+- Root-level config files (`pyproject.toml`, `Makefile`, etc.)
+- Generated snapshots in `reports/quality/`
+
+______________________________________________________________________
+
+## Scoring Matrix
+
+| Category | Weight | Max Score |
+| --- | --- | --- |
+| Layout Compliance (LC) | 25% | 10 |
+| Orphan/Stale (OS) | 20% | 10 |
+| Naming Conventions (NC) | 20% | 10 |
+| Depth/Nesting (DN) | 15% | 10 |
+| Test Mirror (TM) | 10% | 10 |
+| Duplication (DUP) | 10% | 10 |
+
+| Severity | Deduction |
+| --- | --- |
+| CRITICAL | -2.0 |
+| HIGH | -1.0 |
+| MEDIUM | -0.5 |
+| LOW | -0.25 |
+
+Status by weighted total: ≥8.0 = PASS, 6.0-7.9 = WARN, <6.0 = FAIL.
+
+______________________________________________________________________
+
+## Output Format (YAML)
+
+```yaml
+file_structure_review:
+  date: "YYYY-MM-DD"
+  mode: "INVENTORY|AUDIT|NAMING|OPTIMIZE|BASELINE"
+  scope: "{paths}"
+  status: "PASS|WARN|FAIL"
+
+  metrics:
+    total_files: N
+    total_directories: N
+    max_depth: N
+    layers:
+      domain: { files: N, dirs: N }
+      application: { files: N, dirs: N }
+      infrastructure: { files: N, dirs: N }
+      composition: { files: N, dirs: N }
+      interfaces: { files: N, dirs: N }
+
+  problems:
+    - id: "FS-001"
+      category: ""
+      title: ""
+      location: "path/to/file_or_dir"
+      rule_violated: "RULES.md §X.Y / ADR-0XX / canonical layout"
+      evidence: ""
+      verification_1:
+        command: ""
+        result: ""
+      verification_2:
+        command: ""
+        result: ""
+      severity: "CRITICAL|HIGH|MEDIUM|LOW"
+      recommendation: ""
+
+  optimization_plan:  # only in OPTIMIZE mode
+    - action: "move|rename|delete|merge|split"
+      source: "current/path"
+      target: "proposed/path"
+      rationale: ""
+      risk: "LOW|MEDIUM|HIGH"
+      dependencies: ["FS-NNN"]
+
+  scores:
+    layout_compliance: { score: "X/10", weight: "25%" }
+    orphan_stale: { score: "X/10", weight: "20%" }
+    naming_conventions: { score: "X/10", weight: "20%" }
+    depth_nesting: { score: "X/10", weight: "15%" }
+    test_mirror: { score: "X/10", weight: "10%" }
+    duplication: { score: "X/10", weight: "10%" }
+
+  weighted_total: "X.X/10"
+```
+
+______________________________________________________________________
+
+## Интеграция с другими субагентами
+
+| Событие | Действие |
+| --- | --- |
+| Audit завершён с MUST findings | → `py-plan-bot` для плана реорганизации |
+| Orphan tests обнаружены | → `py-test-bot` для ревизии тестов |
+| Doc drift обнаружен | → `py-doc-bot` для обновления документации |
+| Naming violations в configs | → `py-config-bot` для mass-rename |
+| Layout violation в src/ | → `py-architecture-debt-bot` как часть debt wave |
+| Post-restructuring | → `py-audit-bot` (final) для верификации |
+
+______________________________________________________________________
+
+## Verification Commands
+
+```bash
+# Full file tree snapshot
+find . -type f -not -path './.git/*' -not -path './.venv/*' -not -path '*/__pycache__/*' | sort > /tmp/tree_snapshot.txt
+
+# Layer file counts
+for layer in domain application infrastructure composition interfaces; do
+  echo "$layer: $(find src/bioetl/$layer -name '*.py' 2>/dev/null | wc -l)"
+done
+
+# Directory depth histogram
+find src/bioetl/ -type f -name '*.py' | awk -F/ '{print NF-1}' | sort -n | uniq -c
+
+# Architecture tests
+pytest tests/architecture/ -v --tb=short
+
+# Naming audit
+uv run python -m scripts.engineering.qa check-naming --check
+```
+
+## Env File Guardrail
+
+- Любой `.env` файл (`.env`, `.env.*`) считается secret-bearing или machine-local surface.
+- Agents and contributors **MUST NOT** create, edit, rename, move, overwrite, or delete any `.env` file without explicit per-task user approval.
+- Если задача требует изменения `.env`, исполнитель должен остановиться и сначала запросить явное разрешение пользователя.
diff --git a/.codex/skills/py-file-structure-bot/SKILL.md b/.codex/skills/py-file-structure-bot/SKILL.md
new file mode 100644
index 0000000000..2b8b50dc05
--- /dev/null
+++ b/.codex/skills/py-file-structure-bot/SKILL.md
@@ -0,0 +1,32 @@
+______________________________________________________________________
+
+## name: py-file-structure-bot description: Audit and optimize BioETL project file structure: inventory tree metrics, detect orphan/stale files, verify canonical layout compliance (hexagonal layers), analyze directory depth and naming drift, and generate actionable restructuring recommendations. Use when asked to audit file layout, find orphan files, check naming conventions in file paths, prepare a structure baseline before refactoring, or optimize directory organization.
+
+# py-file-structure-bot
+
+## Objective
+
+Run the role-specific workflow as defined in the py-file-structure-bot profile.
+
+## Source Of Truth
+
+- Primary profile: `../../agents/py-file-structure-bot.md`
+- Team orchestration: `../../agents/ORCHESTRATION.md`
+- Memory policy: `../../../docs/00-project/ai/agents/guides/MEMORY_USAGE.md`
+- Shared project context: `../../../docs/00-project/ai/memory/agent-memory.md`
+- Role-specific memory: `../../../docs/00-project/ai/memory/memory-py-file-structure-bot.md`
+
+## Workflow
+
+1. Start with the canonical memory loop from `../../../src/memory/DAILY_WORKFLOW.md` and run `python -m memory.tooling.workflow pre-task ...` for the current task.
+1. Read `MEMORY_USAGE.md`, `agent-memory.md`, and `memory-py-file-structure-bot.md`.
+   If the memory sheet does not yet exist, record that and continue with project
+   memory plus repo search.
+1. Read evidence packs before structural conclusions:
+   - `docs/reports/evidence/project-file-structure/SUMMARY.md`
+   - `docs/reports/evidence/project-file-structure/04-decisions/SUMMARY.md`
+   - `docs/reports/evidence/project-package-topology/SUMMARY.md`
+1. Open and follow `../../agents/py-file-structure-bot.md`.
+1. Keep output artifacts and scope aligned with `../../agents/ORCHESTRATION.md`.
+1. Respect BioETL architecture rules from `AGENTS.md` and project constraints.
+1. After the audit, run `python -m memory.tooling.workflow post-task ...` and promote only durable findings.
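The stale-file check in the new profile (`find src/bioetl/ -name '*.py' -mtime +180 -type f`) keys on filesystem mtime, which a fresh clone or branch checkout resets to the checkout time, so it can report nothing on CI or a new workstation. A minimal sketch of a history-based alternative, assuming Bash and a git working tree: `stale_by_git` is a hypothetical helper name, not existing BioETL tooling. It reads each tracked file's last commit time with `git log -1 --format=%ct` and compares it against a cutoff.

```bash
# Hypothetical helper (not part of BioETL tooling): list tracked files under a
# path whose most recent commit is older than N days, using git history
# instead of filesystem mtime.
stale_by_git() {
  local path="$1" days="$2"
  local cutoff=$(( $(date +%s) - days * 86400 ))
  local f ts
  git ls-files -- "$path" | while IFS= read -r f; do
    ts=$(git log -1 --format=%ct -- "$f")
    # Files that are staged but never committed have no timestamp; skip them.
    [ -n "$ts" ] || continue
    if [ "$ts" -lt "$cutoff" ]; then
      echo "STALE: $f"
    fi
  done
}

# Usage: tracked files under src/bioetl/ untouched by any commit for 180+ days
# stale_by_git src/bioetl/ 180
```

Unlike `-mtime`, the result does not depend on when the tree was checked out; the cost is one `git log` call per tracked file, which may be slow on very large trees.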
diff --git a/.github/root-allowlist.txt b/.github/root-allowlist.txt
index 3be4b70354..108b0a102e 100644
--- a/.github/root-allowlist.txt
+++ b/.github/root-allowlist.txt
@@ -35,12 +35,16 @@ Dockerfile.mcp-fetch
 Dockerfile.mcp-filesystem
 Dockerfile.mcp-github
 Dockerfile.mcp-memory
+Dockerfile.gemini
 Dockerfile.warp
+conftest.py
+docker-compose.gemini.yml
 grafana-datasource.yml
 mkdocs.yml
 package-lock.json
 package.json
 pyproject.toml
+setup.ps1
 sonar-project.properties
 uv.lock
 mint.json
diff --git a/.github/workflows/duplication-complexity.yml b/.github/workflows/duplication-complexity.yml
index 2bc7460210..b436842538 100644
--- a/.github/workflows/duplication-complexity.yml
+++ b/.github/workflows/duplication-complexity.yml
@@ -75,8 +75,10 @@ jobs:
           # pipeline_normalizers: apply_pipeline_schema_normalization with multi-field schema coercion (CC=12)
           # quality validation internals: _validate_registry_groups_section (CC=12), _validate_owner_decomposition_targets_section (CC=11), _validate_priority_registry_burndown (CC=11), _validate_grace_window_identity_fields (CC=14), validate_exemptions_registry (CC=12)
           # metadata_writer: _write_metadata with multi-path coordinator fallback (CC=11)
+          # HTTP control-plane: build_summary, select_rows, build_checkpoint_compare with multi-field reporting (CC=11-14)
+          # Workflow transforms: _build_request with multi-field reconciliation (CC=11)
           xenon --max-absolute B --max-modules B --max-average A \
-            --exclude "tests/*,src/tools/*,src/memory/*,src/bioetl/infrastructure/adapters/chembl/*,src/bioetl/infrastructure/adapters/openalex/*,src/bioetl/infrastructure/adapters/semanticscholar/*,src/bioetl/infrastructure/adapters/crossref/*,src/bioetl/infrastructure/adapters/pubmed/*,src/bioetl/infrastructure/adapters/uniprot/*,src/bioetl/infrastructure/adapters/pubchem/*,src/bioetl/infrastructure/adapters/input/*,src/bioetl/application/services/dq/*,src/bioetl/application/composite/*,src/bioetl/application/pipelines/*/extractors/*,src/bioetl/interfaces/cli/*,src/bioetl/composition/runtime_builders/*,src/bioetl/composition/factories/services/*,src/bioetl/infrastructure/storage/silver/*,src/bioetl/infrastructure/control_plane/*,src/bioetl/infrastructure/quarantine/*,src/bioetl/infrastructure/observability/*,src/bioetl/infrastructure/quality/*" src
+            --exclude "tests/*,src/tools/*,src/memory/*,src/bioetl/infrastructure/adapters/chembl/*,src/bioetl/infrastructure/adapters/openalex/*,src/bioetl/infrastructure/adapters/semanticscholar/*,src/bioetl/infrastructure/adapters/crossref/*,src/bioetl/infrastructure/adapters/pubmed/*,src/bioetl/infrastructure/adapters/uniprot/*,src/bioetl/infrastructure/adapters/pubchem/*,src/bioetl/infrastructure/adapters/input/*,src/bioetl/application/services/dq/*,src/bioetl/application/composite/*,src/bioetl/application/pipelines/*/extractors/*,src/bioetl/application/workflow/*,src/bioetl/interfaces/cli/*,src/bioetl/interfaces/http/*,src/bioetl/composition/runtime_builders/*,src/bioetl/composition/factories/services/*,src/bioetl/infrastructure/storage/silver/*,src/bioetl/infrastructure/control_plane/*,src/bioetl/infrastructure/quarantine/*,src/bioetl/infrastructure/observability/*,src/bioetl/infrastructure/quality/*" src
 
       - name: Strict complexity check for domain layer (CC ≤ 5)
         run: |
@@ -125,6 +127,8 @@ jobs:
             "src/bioetl/application/composite/",  # Merger with field conflict resolution
             "src/bioetl/application/pipelines/",  # Extractors with complex parsing logic (CC=11-12)
             "src/bioetl/interfaces/cli/",  # CLI formatters with presentation logic (CC=11)
+            "src/bioetl/interfaces/http/",  # control-plane: build_summary, select_rows, build_checkpoint_compare (CC=11-14)
+            "src/bioetl/application/workflow/",  # transforms: _build_request with multi-field reconciliation (CC=11)
             "src/bioetl/composition/runtime_builders/",  # normalize_snapshot with multi-field extraction (CC=13)
             "src/bioetl/composition/factories/services/",  # extract_gold_schema_policy_by_version with version dispatch (CC=14)
             "src/bioetl/infrastructure/storage/silver/",  # _coerce_silver_metadata_write_request with multi-path coercion (CC=14)
diff --git a/.github/workflows/security.yml b/.github/workflows/security.yml
index ecef14cc38..843abb2993 100644
--- a/.github/workflows/security.yml
+++ b/.github/workflows/security.yml
@@ -35,7 +35,7 @@ jobs:
       - name: Install detect-secrets
         run: pip install detect-secrets pytest pytest-asyncio
       - name: Run detect-secrets baseline check
-        run: pytest tests/architecture/test_antipatterns.py::test_no_hardcoded_secrets -q --noconftest -o "addopts=" -o "filterwarnings=" -o "timeout=0"
+        run: pytest tests/architecture/test_antipatterns.py::test_no_hardcoded_secrets -q -o "addopts=" -o "filterwarnings=" -o "timeout=0"
 
   pip-audit:
     runs-on: ubuntu-latest
diff --git a/docs/00-project/ai/agents/agents/ORCHESTRATION.md b/docs/00-project/ai/agents/agents/ORCHESTRATION.md
index 47fdfbb9fd..3342f0992a 100644
--- a/docs/00-project/ai/agents/agents/ORCHESTRATION.md
+++ b/docs/00-project/ai/agents/agents/ORCHESTRATION.md
@@ -1,29 +1,10 @@
-> Mirror status: This file is a published/internal mirror under `docs/00-project/ai/**`. It is not a canonical runtime surface.
-> Canonical runtime sources:
-> - Codex: `.codex/agents/ORCHESTRATION.md`
-> - Gemini: `.gemini/agents/ORCHESTRATION.md`
-> Governance: [AI Runtime Mirror Ownership](../policy/AI_RUNTIME_MIRROR_OWNERSHIP.md), [Memory Usage](../guides/MEMORY_USAGE.md), [Post-Change Validation](../policy/POST_CHANGE_VALIDATION.md).
-> Edit the runtime source first, then refresh this mirror.
-______________________________________________________________________
-
-Version: 4.2.0
-Status: active
-Class: internal-published
-Owner: BioETL Team
-Reviewers:
-
-- BioETL Team
-  Last verified: '2026-04-04'
-
-______________________________________________________________________
-
 # ORCHESTRATION.md — Оркестрация команды subagent-ов BioETL
 
 *Версия: 4.2 | Дата: 2026-03-26 | Supersedes v4.1 | Платформа: Codex CLI*
 
 ## 1. Обзор
 
-Команда из **9 активных субагентов** (7 core + 2 orchestrator/swarm) обеспечивает полный жизненный цикл задачи разработки BioETL. Основной агент (Codex) выступает оркестратором, делегируя работу субагентам через native agent roles (`default` / `explorer` / `worker`) с привязкой к логическим профилям `py-*`. Production-код пишется напрямую оркестратором (без отдельного `py-code-bot`).
+Команда из **10 активных субагентов** (8 core + 2 orchestrator/swarm) обеспечивает полный жизненный цикл задачи разработки BioETL. Основной агент (Codex) выступает оркестратором, делегируя работу субагентам через native agent roles (`default` / `explorer` / `worker`) с привязкой к логическим профилям `py-*`. Production-код пишется напрямую оркестратором (без отдельного `py-code-bot`).
 
 **Запуск логического профиля в Codex runtime:**
 
@@ -47,6 +28,7 @@ spawn_agent(
 | VII | **py-doc-bot** | sonnet | Документация, ADR, диаграммы (Mermaid) | `review_py-doc-bot_{YYYYMMDD}_{HHMM}.md` |
 | VIII | **py-test-swarm** | opus | Иерархическое тестирование (L1→L2→L3) | test reports |
 | IX | **py-review-orchestrator** | opus | Иерархический code review (S1-S8) | review reports |
+| X | **py-file-structure-bot** | opus | Аудит и оптимизация файловой структуры: orphans, naming, depth, layout compliance | `review_py-file-structure-bot_{YYYYMMDD}_{HHMM}.md` |
 
 > **Note:** `py-code-bot` removed in v4.0 — production code is written directly by the orchestrator. `py-diagram-bot` merged into `py-doc-bot`. Repo-wide documentation audits now route through the `documentation-audit` / `documentation-cascade-audit` skills rather than a dedicated documentation-only subagent profile.
 
@@ -62,6 +44,7 @@ spawn_agent(
 | py-debug-bot | `src/bioetl/`, `tests/` (fixes) | `configs/`, `docs/` |
 | py-audit-bot | — (read-only) | всё |
 | py-plan-bot | — (read-only) | всё |
+| py-file-structure-bot | `reports/` (audit artifacts only) | всё |
 
 ### Определения субагентов
 
@@ -71,14 +54,14 @@ spawn_agent(
 
 Перед repo-wide structural выводами, hotspot-программами и package-reorg инициативами сверяйся с текущими evidence packs:
 
-- [Project File Structure Summary](../../../../reports/evidence/project-file-structure/SUMMARY.md)
-- [Project File Structure Decisions](../../../../reports/evidence/project-file-structure/04-decisions/SUMMARY.md)
-- [Project Package Topology Summary](../../../../reports/evidence/project-package-topology/SUMMARY.md)
-- [Project Package Topology Synthesis](../../../../reports/evidence/project-package-topology/03-synthesis/SYN-project-package-topology.md)
-- [Topology vs Governance Cross-Synthesis](../../../../reports/evidence/project-package-topology/03-synthesis/CROSS-SYNTHESIS-topology-vs-governance-signals.md)
-- [Project Package Topology Decisions](../../../../reports/evidence/project-package-topology/04-decisions/SUMMARY.md)
-- [Governance Signals Summary](../../../../reports/evidence/governance-signals/SUMMARY.md)
-- [Governance Signals Decisions](../../../../reports/evidence/governance-signals/04-decisions/SUMMARY.md)
+- [Project File Structure Summary](../../docs/reports/evidence/project-file-structure/SUMMARY.md)
+- [Project File Structure Decisions](../../docs/reports/evidence/project-file-structure/04-decisions/SUMMARY.md)
+- [Project Package Topology Summary](../../docs/reports/evidence/project-package-topology/SUMMARY.md)
+- [Project Package Topology Synthesis](../../docs/reports/evidence/project-package-topology/03-synthesis/SYN-project-package-topology.md)
+- [Topology vs Governance Cross-Synthesis](../../docs/reports/evidence/project-package-topology/03-synthesis/CROSS-SYNTHESIS-topology-vs-governance-signals.md)
+- [Project Package Topology Decisions](../../docs/reports/evidence/project-package-topology/04-decisions/SUMMARY.md)
+- [Governance Signals Summary](../../docs/reports/evidence/governance-signals/SUMMARY.md)
+- [Governance Signals Decisions](../../docs/reports/evidence/governance-signals/04-decisions/SUMMARY.md)
 
 Operational defaults:
 
@@ -337,6 +320,7 @@ ______________________________________________________________________
 | `DOC-` | py-doc-bot | `DOC-001` | DOC-001 | Обновление документации |
 | `FAIL-` | py-test-bot | `FAIL-001` | FAIL-001 | Упавший тест (в отчёте) |
 | `CFG-` | py-config-bot | `CFG-001` | CFG-001 | Изменение конфигурации |
+| `FS-` | py-file-structure-bot | `FS-001` | FS-001 | Аномалия файловой структуры |
 
 Все ID уникальны в пределах `task_id`. Cross-references: `DBG-001 → RF-002`, `DOC-003 → RF-001`, `CFG-001 → RF-003`.
 
@@ -396,6 +380,16 @@ py-plan-bot (plan)
 ### 8.5. Composite pipeline
 
 ```
+
+### 8.6. File-structure audit
+
+```
+py-file-structure-bot (audit/inventory)
+  → py-plan-bot (plan реорганизации, если FS-* findings)
+  → orchestrator (restructuring)
+  → py-test-bot (final)
+  → py-audit-bot (final)
+```
 py-audit-bot (baseline, scope=seed + enricher pipelines)
   → py-plan-bot (composite plan)
   → py-config-bot (composite config: seed/enrichers/merge)
@@ -412,7 +406,7 @@ ______________________________________________________________________
 | Документ | Описание |
 | ------------------------------------------------------ | ---------------------------------------- |
 | `.codex/agents/py-*.md` | Спецификации субагентов для Codex CLI |
-| `.codex/agents/ORCHESTRATION.md` | Каноническая orchestration карта рантайма |
+| `docs/00-project/ai/rules/bioetl-ai-rules.md` | Правила автоматической самопроверки кода |
 | `docs/00-project/RULES.md` | Архитектурные правила проекта |
 | `docs/02-architecture/decisions/` | ADR-001..ADR-047 |
 | `docs/00-project/glossary.md` | Терминология |
@@ -435,6 +429,7 @@ ______________________________________________________________________
 | IV | py-config-bot | Data engineering, YAML configs | REST API config |
 | V | py-debug-bot | Python debugging, RCA | REST API debugging, Pandera issues |
 | VI | py-doc-bot | Technical writing, ADR, diagrams | Bioinformatics terminology, Mermaid |
+| VII | py-file-structure-bot | File structure analysis, repo layout | Naming conventions, orphan detection, depth |
 
 ### 9a.2 Rule References
 
@@ -536,7 +531,7 @@ ______________________________________________________________________
 - **PLATFORM**: Адаптация для Claude Code CLI (ранее Codex/Claude.ai)
 - **CHANGED**: Все субагенты переименованы: `pyXxxBot` → `py-xxx-bot` (для `subagent_type` в Task tool)
 - **CHANGED**: 8 старых Claude Code агентов заменены на 7 унифицированных: `py-audit-bot`, `py-plan-bot`, `py-test-bot`, `py-code-bot`, `py-config-bot`, `py-debug-bot`, `py-doc-bot`
-- **CHANGED**: Навыки из skills directory инлайнированы в файлы субагентов (секция `## Инлайнированные знания`)
+- **CHANGED**: Навыки из `/mnt/skills/` инлайнированы в файлы субагентов (секция `## Инлайнированные знания`)
 - **REMOVED**: `google_drive_search`, `message_compose`, `ask_user_input` (недоступны в CLI)
 - **CHANGED**: `web_search` / `web_fetch` → `WebSearch` / `WebFetch` (встроенные инструменты Claude Code)
 - **CHANGED**: MCP инструменты доступны через `ToolSearch` (deferred loading)
diff --git a/docs/00-project/ai/memory/agent-memory.md b/docs/00-project/ai/memory/agent-memory.md
index 94246d07e4..d3be8936d6 100644
--- a/docs/00-project/ai/memory/agent-memory.md
+++ b/docs/00-project/ai/memory/agent-memory.md
@@ -312,6 +312,7 @@ ______________________________________________________________________
 | IV | `py-config-bot` | sonnet | `configs/` | Pipeline/DQ/filter YAML configs, composite, gap remediation |
 | V | `py-debug-bot` | opus | `src/bioetl/`, `tests/` (fixes) | RCA падений, DBG-\* итерации (макс 5), mypy/import/runtime |
 | VI | `py-doc-bot` | sonnet | `docs/`, docstrings | ADR, CHANGELOG, docstrings, diagrams, doc-code sync |
+| VII | `py-file-structure-bot` | opus | `reports/` (audit artifacts) | Аудит файловой структуры, orphans, naming, depth, layout compliance |
 
 > Production-код пишем напрямую через Edit/Write (без отдельного субагента).
 
@@ -327,6 +328,7 @@ ______________________________________________________________________
 .codex/agents/py-debug-bot.md — debugging methodology, error classification
 .codex/agents/py-doc-bot.md — docs structure, ADR management, diagrams
 .codex/agents/ORCHESTRATION.md — full workflow, interaction matrix
+.codex/agents/py-file-structure-bot.md — file structure audit modes, checklists, scoring
 ```

 **Specialized memory (focused on the agent's area of work):**
@@ -341,6 +343,7 @@ docs/00-project/ai/memory/memory-py-doc-bot.md — doc structure, ADR manage
 docs/00-project/ai/memory/memory-py-architecture-debt-bot.md — architecture debt waves, exemption governance, closure gates
 docs/00-project/ai/memory/memory-py-review-orchestrator.md — sector review map, evidence rollup, severity calibration
 docs/00-project/ai/memory/memory-py-test-swarm.md — swarm decomposition, failure telemetry, flakiness protocol
+docs/00-project/ai/memory/memory-py-file-structure-bot.md — canonical layout, zone rules, depth limits, naming patterns
 ```

 ### 3.3 Subagent inputs (required parameters)
@@ -355,6 +358,7 @@ docs/00-project/ai/memory/memory-py-test-swarm.md — swarm decomposi
 | py-config-bot | `task_id`, `mode` (create/update/composite/validate/migrate), `provider`, `entity` |
 | py-debug-bot | `task_id`, `failing_test_report`, `stack_traces`, `rf_ids`, `phase` |
 | py-doc-bot | `task_id`, `plan`, `refactoring_log`, `rf_ids` |
+| py-file-structure-bot | `task_id`, `mode` (inventory/audit/naming/optimize/baseline), `scope` |

 ### 3.4 Outputs (artifacts)

@@ -379,6 +383,7 @@ reports/{LLM}/review_{agent}_{YYYYMMDD}_{HHMM}[_{phase}].md
 | `DOC-` | py-doc-bot | DOC-001 — doc update |
 | `FAIL-` | py-test-bot | FAIL-001 — failing test |
 | `CFG-` | py-config-bot | CFG-001 — config change |
+| `FS-` | py-file-structure-bot | FS-001 — file structure finding |

 ______________________________________________________________________

@@ -441,6 +446,7 @@ runtime-specific copies в других деревьях не переопред
 | `verify-architecture` | `.codex/skills/verify-architecture/` | Architecture checks |
 | `documentation-audit` | `.codex/skills/documentation-audit/` | Documentation audit |
 | `architecture-guardian` | `.codex/skills/public/architecture-guardian/` | Boundary architecture review |
+| `py-file-structure-bot` | `.codex/skills/py-file-structure-bot/` | File structure audit |

 ### 5.2 Runtime-specific conveniences

diff --git a/docs/00-project/ai/memory/memory-py-file-structure-bot.md b/docs/00-project/ai/memory/memory-py-file-structure-bot.md
new file mode 100644
index 0000000000..8bdee747f9
--- /dev/null
+++ b/docs/00-project/ai/memory/memory-py-file-structure-bot.md
@@ -0,0 +1,106 @@
+# Memory: py-file-structure-bot
+
+*Status: internal-only (agent memory)*
+
+*Version: 1.0.0 | Date: 2026-05-28 | Parent: agent-memory.md*
+
+> **Focus**: canonical file layout rules, zone ownership, directory depth limits,
+> naming conventions for files and directories, orphan/stale detection heuristics.
+
+______________________________________________________________________
+
+## 1. Identity & Scope
+
+- **Role**: file structure auditor and optimizer
+- **Write zone**: `reports/` (audit artifacts only)
+- **Read zone**: entire repository tree
+- **Output artifacts**: `reports/{LLM}/review_py-file-structure-bot_*.md`
+- **Finding prefix**: `FS-`
+
+## 2. Canonical Layout
+
+### Top-level repo zones (7 primary)
+
+| Zone | Path | Contents |
+| --- | --- | --- |
+| Source | `src/bioetl/` | Runtime Python code, 5 architectural layers |
+| Configs | `configs/` | `entities/`, `composites/`, `quality/` YAML configs |
+| Tests | `tests/` | `unit/`, `integration/`, `architecture/`, `e2e/` |
+| Scripts | `scripts/` | `engineering/`, `ops/`, `schema/`, `docs/`, `diagrams/` |
+| Docs | `docs/` | `00-project/`, `01-requirements/`, `02-architecture/`, `03-guides/`, `reports/` |
+| Reports | `reports/` | Quality/audit artifacts, agent review outputs |
+| AI Runtime | `.codex/`, `.gemini/` | Agent profiles, skills, runtime configs |
+
+### Source layers (5 canonical under `src/bioetl/`)
+
+| Layer | Subpackages | Purpose |
+| --- | --- | --- |
+| `domain` | ports, types, exceptions, entities, value_objects, config, models | Pure business logic, no I/O |
+| `application` | pipelines, services, strategies, transformers | Use cases, orchestration |
+| `infrastructure` | adapters, observability, persistence, http | External integrations |
+| `composition` | bootstrap, factories, registries | Wiring, DI, composition root |
+| `interfaces` | cli, api | User-facing entry points |
+
+## 3. Naming Rules
+
+### Python files and directories
+
+- **Files**: `snake_case.py` — no uppercase, no hyphens
+- **Directories (packages)**: `snake_case` — no uppercase, no hyphens
+- **Test files**: `test_{source_module_name}.py`
+- **Config files**: `{entity}.yaml` or `{provider}_{entity}.yaml`
+- **Shim files**: `_compat.py`, `_legacy.py` — acceptable for backward compatibility
+
+### Exceptions (valid, not violations)
+
+- `__init__.py`, `__main__.py`, `conftest.py`
+- `Makefile`, `Dockerfile`, `README.md`, `CHANGELOG.md`, `LICENSE`
+- `.env`, `.env.*` (secret-bearing, but the naming is correct)
+- AI runtime dirs: `.codex/`, `.gemini/`, `.github/`
+
+## 4. Depth Limits
+
+- **Recommended max depth** from repo root: 7 levels
+- **Source code recommended max**: `src/bioetl/{layer}/{package}/{subpackage}/{module}.py` = 5 levels
+- Deeper nesting is a yellow flag requiring justification (e.g., provider-specific adapters)
+
+## 5. Orphan Detection Heuristics
+
+A file is considered a potential orphan when:
+
+1. Python module not imported by any other module in `src/` or `tests/`
+2. Empty `__init__.py` with no re-exports and no sibling modules
+3. Config YAML not referenced in any pipeline config or composite
+4. Report file older than 90 days with no ADR or issue reference
+5. Script not referenced in `Makefile`, CI, or documentation
+
+**False positive exclusions:**
+
+- `conftest.py` at any level
+- `__init__.py` with re-exports or `TYPE_CHECKING` blocks
+- Files in `scripts/archive/` or `docs/archive/`
+- Generated snapshots in `reports/quality/`
+- Fixture data files in `tests/fixtures/`
+
+## 6. Evidence Anchors
+
+Before making structural claims, verify against current evidence packs:
+
+- `docs/reports/evidence/project-file-structure/SUMMARY.md`
+- `docs/reports/evidence/project-file-structure/04-decisions/SUMMARY.md`
+- `docs/reports/evidence/project-package-topology/SUMMARY.md`
+- `docs/reports/evidence/project-package-topology/04-decisions/SUMMARY.md`
+
+Operational rule: package count alone does not trigger restructuring; topology
+shows where to look, governance signals show where to act.
+
+## 7. Integration Points
+
+| Trigger | Target agent | Action |
+| --- | --- | --- |
+| Layout violations in `src/` | py-architecture-debt-bot | Debt wave inclusion |
+| Orphan test files | py-test-bot | Test cleanup/rewrite |
+| Naming violations in `configs/` | py-config-bot | Mass-rename |
+| Doc structure drift | py-doc-bot | Doc reorganization |
+| Actionable restructuring plan | py-plan-bot | RF-* decomposition |
+| Post-restructuring verification | py-audit-bot | Final audit |
diff --git a/pyproject.toml b/pyproject.toml
index 2df6f136a5..d4c19f54a5 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -467,7 +467,12 @@ exclude_dirs = ["tests", "docs", ".venv", "build", "dist"]
 skips = [
     "B101", # assert_used - acceptable in production for invariants
     "B104", # hardcoded_bind_all_interfaces - not applicable for ETL jobs
+    "B108", # hardcoded_tmp_directory - acceptable for local-only ETL health checks
+    "B310", # audit url open - acceptable for health-check endpoints with known URLs
     "B311", # random - not used for security purposes in data processing
+    "B404", # import subprocess - required for local observability backend management
+    "B603", # subprocess_without_shell_equals_true - controlled local-only invocations
+    "B607", # start_process_with_partial_path - acceptable for local CLI tools
 ]
 # Severity thresholds
 # HIGH severity issues block CI merge
diff --git a/scripts/docs/build/__pycache__/__init__.cpython-313.pyc b/scripts/docs/build/__pycache__/__init__.cpython-313.pyc
deleted file mode 100644
index 2344371e86..0000000000
Binary files a/scripts/docs/build/__pycache__/__init__.cpython-313.pyc and /dev/null differ
diff --git a/scripts/docs/build/__pycache__/mkdocs_build.cpython-313.pyc b/scripts/docs/build/__pycache__/mkdocs_build.cpython-313.pyc
deleted file mode 100644
index 89ff0e229e..0000000000
Binary files a/scripts/docs/build/__pycache__/mkdocs_build.cpython-313.pyc and /dev/null differ
diff --git a/scripts/engineering/baselines/not_in_nav_baseline.txt b/scripts/engineering/baselines/not_in_nav_baseline.txt
index a4ff517d8f..ac24af3014 100644
--- a/scripts/engineering/baselines/not_in_nav_baseline.txt
+++ b/scripts/engineering/baselines/not_in_nav_baseline.txt
@@ -36,6 +36,7 @@
 00-project/ai/memory/memory-py-config-bot.md
 00-project/ai/memory/memory-py-debug-bot.md
 00-project/ai/memory/memory-py-doc-bot.md
+00-project/ai/memory/memory-py-file-structure-bot.md
 00-project/ai/memory/memory-py-plan-bot.md
 00-project/ai/memory/memory-py-review-orchestrator.md
 00-project/ai/memory/memory-py-test-bot.md
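
The naming and depth rules in the new `memory-py-file-structure-bot.md` are mechanical enough to script. The following is a minimal Python sketch under several assumptions. "Depth" means the number of path components relative to the repo root, with the file name included; this may not match how the memory file counts levels. Only directories under `src/` are checked as packages. The name `check_paths` and the sequential `FS-NNN` numbering are hypothetical and are not part of the py-file-structure-bot spec.

```python
import re
from pathlib import PurePosixPath

# Hypothetical checker for the naming and depth rules in memory-py-file-structure-bot.md.
# snake_case: optional leading underscore (for _compat.py / _legacy.py shims), then
# a lowercase letter followed by lowercase letters, digits or underscores.
SNAKE_CASE = re.compile(r"^_?[a-z][a-z0-9_]*$")

# File names the memory file lists as valid exceptions.
NAME_EXCEPTIONS = {"__init__.py", "__main__.py", "conftest.py"}

# "Recommended max depth from repo root: 7 levels". Depth is assumed here to be the
# number of path components, including the file name itself.
MAX_REPO_DEPTH = 7


def check_paths(paths: list[str], max_depth: int = MAX_REPO_DEPTH) -> list[str]:
    """Return FS-numbered findings for naming and depth violations."""
    findings: list[str] = []
    for raw in paths:
        path = PurePosixPath(raw)
        problems: list[str] = []

        if len(path.parts) > max_depth:
            problems.append(f"depth {len(path.parts)} > {max_depth}")

        if path.suffix == ".py":
            # Module names must be snake_case unless explicitly exempted.
            if path.name not in NAME_EXCEPTIONS and not SNAKE_CASE.match(path.stem):
                problems.append(f"module '{path.name}' is not snake_case")
            # Only directories under src/ are treated as Python packages here.
            if path.parts[0] == "src":
                for part in path.parts[1:-1]:
                    if not SNAKE_CASE.match(part):
                        problems.append(f"package dir '{part}' is not snake_case")

        # Number findings sequentially across all inputs: FS-001, FS-002, ...
        for problem in problems:
            findings.append(f"FS-{len(findings) + 1:03d}: {raw}: {problem}")
    return findings


if __name__ == "__main__":
    sample = [
        "src/bioetl/domain/ports/__init__.py",
        "src/bioetl/application/pipelines/MyPipeline.py",
        "src/bioetl/infrastructure/http-client/session.py",
        "a/b/c/d/e/f/g/h.py",
    ]
    for finding in check_paths(sample):
        print(finding)
```

For the sample paths, the script reports three findings: the CamelCase module, the hyphenated package directory, and the eight-component path. `__init__.py` passes because it is on the exception list. Orphan detection (section 5 of the memory file) needs an import graph and cross-references to CI and docs, so it is deliberately left out of this sketch.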