diff --git a/ERROR_LOG.md b/ERROR_LOG.md index 4ff5807..f13f7a3 100644 --- a/ERROR_LOG.md +++ b/ERROR_LOG.md @@ -279,3 +279,24 @@ - Ação tomada: remoção dos arquivos criados fora do alvo, correção em `.codex/config.toml` e `scripts/dev-codex.sh`, revalidação dos perfis `container_planning` e `container_aggressive`, da feature `multi_agent` e dos MCPs efetivos. - Status: resolvido. - Observação futura: confirmar primeiro se a mudança desejada é na configuração do Codex ou no scaffolding do projeto e sempre validar o config efetivo renderizado do launcher. + +## 2026-04-01 - Sprint Completion: F59-F68 Consolidation + +- Contexto: Encerramento de sprint com 10 frentes concluídas (F59-F68). +- Frentes entregues: + - F59: Multi-Agent Session Orchestration + - F60: Local Control Plane Foundation + - F61: DAG Pipeline Evolution + - F62: Copilot Adapter + - F63: Memory Engine Enhancement + - F64: Advanced Supervisor Policies + - F65: Runtime Coordinator Hardening + - F66: Reporting & Observability Evolution + - F67: Workspace Management v2 + - F68: Plugin/Extension System +- Métricas: 755 tests passando, ruff/mypy 100% clean. +- Erro observado: Nenhum erro crítico durante implementação. +- Causa identificada: N/A +- Ação tomada: Session-close executado com consolidação de memória e handoff. +- Status: Concluído com sucesso. +- Observação futura: Baseline estável para próxima onda de features. Considerar technical-triage para definição de próximas frentes prioritárias. diff --git a/PENDING_LOG.md b/PENDING_LOG.md index fd03541..2a71042 100644 --- a/PENDING_LOG.md +++ b/PENDING_LOG.md @@ -1,253 +1,42 @@ # PENDING_LOG -## Triagem atual — evolução arquitetural incremental +## Sprint Completion — F59-F68 (2026-04-01) -- Em 2026-03-20, a análise comparativa entre SynapseOS, Superset, Mastra e coding-agent foi consolidada como direção de produto e arquitetura para a próxima onda de evolução local. -- A conclusão prática é que o SynapseOS não deve tentar reproduzir o produto Superset nem migrar o runtime central para TypeScript neste momento; o ganho líquido imediato está em absorver boundaries, contratos internos e padrões de extensibilidade de forma incremental sobre o core Python atual. -- Em 2026-03-20, a onda local `F51` → `F53` foi executada sequencialmente: -- `F51-runtime-boundaries-foundation` abriu contratos explícitos de `ToolSpec`/capabilities, `WorkspaceProvider`, `RunContext` e lifecycle hooks no Synapse-Flow. -- `F52-workspace-isolation-foundation` tornou o workspace efetivo da run auditável com `workspace_path` persistido e provider `run-scoped` opcional. -- `F53-observability-runtime-events` enriqueceu a timeline local com `run_context_initialized`, `step_started` e `state_transitioned`, além de refletir `workspace_path` em `runs show` e `RUN_REPORT.md`. -- Com isso, a frente ativa imediata deixa de ser `F51` e passa a ser nenhuma: a próxima decisão de produto volta a ser escolher um novo bucket pequeno sobre baseline já ampliada. -- A fila seguinte recomendada, em ordem, fica restrita por enquanto a quatro buckets pequenos e verificáveis: -- `multi-agent-session-orchestration`: formalizar registry/capabilities e coordenação entre adapters sem abrir UI desktop. -- `local-control-plane-foundation`: expor API local mínima para TUI/integrações futuras, mantendo CLI-first e shell desktop como hipótese posterior. -- `baseline-handoff-sync`: alinhar `PENDING_LOG.md`, `ERROR_LOG.md`, README e artefatos de feature ao estado real pós-`F53`. -- O bucket `desktop-shell` fica explicitamente fora da fila principal neste momento; ele só volta à mesa depois que `runtime boundaries`, `workspace isolation`, `observability` e `control plane` estiverem estabilizados. -- O bucket `TypeScript-first runtime migration` fica explicitamente descartado por ora; qualquer uso futuro de TypeScript deve ficar limitado a shell/UI opcional consumindo um core Python autoritativo. +As 10 frentes do sprint foram concluídas com sucesso: -## Decisões incorporadas recentemente - -- Em 2026-03-13, `origin/main` absorveu a merge de `F42-tui-filters` pela PR `#86`, adicionando filtros visuais locais no dashboard TUI para falhas (`f`), atividade (`r`) e restauracao da lista completa (`x`). -- Em 2026-03-13, `origin/main` absorveu a merge de `F40-local-cancellation` pela PR `#87`, consolidando `synapse runs cancel ` e o atalho `k` no dashboard como cancelamento local e gracioso de runs. -- Com `F42` e `F40`, a TUI local atual passa a cobrir watch, logs por `Enter`, explorer de artifacts por `a`, filtros visuais e cancelamento local, sem abrir scheduler, fila remota ou cancelamento distribuido. -- O drift remanescente voltou a ser documental: `memory.md`, `PENDING_LOG.md`, `ERROR_LOG.md`, `README.md` e `CHANGELOG.md` ficaram atrasados em relacao ao baseline real pos-`F42`/`F40`, e `features/F40-local-cancellation/` e `features/F42-tui-filters/` ficaram sem artefatos minimos de fechamento. -- A frente ativa imediata passa a ser a chore doc-only `chore-post-f40-f42-baseline-sync`, para consolidar handoff e documentacao publica antes da proxima decisao de produto. -- A proxima decisao de produto volta a ficar bloqueada ate essa chore fechar e uma nova `technical-triage` escolher uma unica frente a partir de `main`. - -- Em 2026-03-13, `origin/main` absorveu as merges de `F41-dashboard-artifacts-explorer` (`#80`), `F44-auth-backend-abstraction` (`#81`), `F47-advanced-rbac` (`#82`), `F43-runtime-robustness` (`#83`) e `F45-tui-performance-optimization` (`#84`), consolidando a TUI local, a robustez basica de timeout/retry e o baseline atual de auth local. -- Com essas merges, `main` passou a refletir explorer de artifacts na TUI, buffering de logs, timeout global por step, retry simples para falhas transientes, abstracao local de `AuthProvider` e RBAC local com `viewer`/`operator`/`admin`. -- O drift remanescente deixou de ser funcional e passou a ser documental: `memory.md`, `PENDING_LOG.md`, `docs/IDEAS.md`, `README.md` e `CHANGELOG.md` ficaram atrasados em relacao ao baseline real pos-`F47`. -- A frente ativa imediata passa a ser a chore doc-only `chore-post-f47-baseline-handoff-sync`, para consolidar o handoff do baseline atual antes da proxima decisao de produto. -- A proxima decisao de produto volta a ficar bloqueada ate essa chore fechar e uma nova `technical-triage` escolher uma unica frente a partir de `main`. - -- Em 2026-03-13, a triagem pos-`F37` confirmou que `origin/main` ja cobre o MVP, a etapa 2 e o handoff doc-only pos-`F36`; o bloqueio real estava na branch local `feature/f39-persistence-path-root-hardening`, que havia virado agregador de drafts fora de escopo. -- O estado misto foi preservado em `origin/archive/2026-03-13-f39-drift-snapshot`, e os recortes determinísticos foram separados em `origin/draft/f41-dashboard-artifacts-explorer`, `origin/draft/f43-runtime-robustness`, `origin/draft/f44-auth-backend-abstraction`, `origin/draft/f45-tui-performance-optimization` e `origin/draft/f47-advanced-rbac`. -- Os itens transversais que ainda nao cabem numa unica frente sem inventar codigo novo (`F40`, `F42`, `F46`, testes de lifecycle e docs de roadmap de longo prazo`) ficaram somente no archive branch, sem virar fila ativa nem PR aberta. -- Com isso, a frente ativa imediata deixa de ser `F37` e passa a ser nenhuma: a linha principal volta a partir de `main`, e a proxima decisao de produto fica bloqueada ate nova `technical-triage` em branch limpa. - -- Em 2026-03-13, a `F34-async-submit-runtime-ownership` foi mergeada em `main` pela PR `#70`, fazendo `runs submit` autenticado aceitar dispatch resolvido para `async` apenas quando o runtime residente pertence ao mesmo principal, preservando fallback legado sem `started_by`. -- Em 2026-03-13, a `F35-worker-runtime-ownership-filter` foi mergeada em `main` pela PR `#71`, fazendo o worker do runtime residente consumir apenas runs compativeis com o principal que iniciou o runtime, sem falhar nem lockar runs incompatíveis. -- Em 2026-03-13, a `F36-worker-owner-skip-observability` foi mergeada em `main` pela PR `#72`, tornando auditavel o skip do worker com evento `runtime_owner_skip` nas runs incompatíveis e mantendo o Synapse-Flow como a engine propria de pipeline do SynapseOS. -- Com `F32`, `F34`, `F35` e `F36`, o bucket local de `resident_transport_auth` deixa de ser backlog funcional aberto e passa a ser baseline absorvido; o residual real de `G-11` fica restrito a operacao remota/multi-host. -- A frente ativa imediata deixa de ser triagem de produto e passa a ser a chore doc-only `F37-post-f36-g11-sync`, para alinhar handoff e backlog ao estado pos-`#72` antes da proxima decisao de produto. -- A proxima decisao de produto fica bloqueada ate `PENDING_LOG.md`, `memory.md` e `docs/IDEAS.md` refletirem o baseline real pos-`F36`. - -- Em 2026-03-13, a `F32-runtime-resident-principal-binding` foi mergeada em `main` pela PR `#68`, entregando o primeiro slice concreto do bucket `resident_transport_auth` sem abrir socket, IPC ou operacao remota. -- A `F32` persistiu `started_by` no estado do runtime quando auth local esta habilitada, passou a exibir esse binding em `synapse runtime status` e endureceu `synapse runtime stop` contra operador diferente quando o binding existe. -- Com a `F32`, o residual de `G-11` deixa de ser apenas fundacao local absorvida versus backlog futuro: o bucket `resident_transport_auth` ja tem um primeiro slice entregue, enquanto operacao remota/multi-host continua explicitamente adiada. -- A frente ativa imediata deixou de ser feature de produto e passou a ser chore doc-only de handoff: `F33-post-f32-handoff-sync`, para alinhar memoria operacional e backlog ao estado pos-`#68` antes da proxima triagem. -- A proxima decisao de produto fica bloqueada ate `PENDING_LOG.md`, `ERROR_LOG.md`, `memory.md` e `docs/IDEAS.md` refletirem o baseline real pos-`F32`. - -- Em 2026-03-13, a baseline voltou a ficar estavel apos a merge da PR `#66`, com `repo-checks` e `security-review` verdes na checagem remota e `ruff format --check .` restaurado como gate verde local. -- Com a baseline estabilizada, a frente ativa deixou de ser operacional e voltou a ser backlog de produto: `F31-g11-remote-auth-decomposition`. -- A `F31` foi aberta como frente doc-only para decompor formalmente o residual de `G-11` em `local_cli_auth` ja absorvido, `resident_transport_auth` ainda pendente e `remote_multi_host_auth` explicitamente adiado. -- O proximo trabalho de codigo fica bloqueado ate essa decomposicao documental fechar uma SPEC pequena e verificavel para o bucket `resident_transport_auth`. - -- Em 2026-03-13, a `F30-auth-registry-cli` foi mergeada em `main` pela PR `#65`, adicionando `synapse auth init|issue|disable`, `token_id` no registry local e alinhamento de `docs/IDEAS.md`/README ao baseline pos-F30. -- A `F30` fechou o follow-up local de auth iniciado pela `F29`; o residual real de `G-11` ficou reduzido ao recorte grande de operacao remota/socket, explicitamente adiado. -- O fechamento Git da `F30` exigiu merge explicito porque o job `repo-checks` permaneceu vermelho por `ruff format --check .` em 6 arquivos preexistentes fora do diff funcional da feature. -- Com isso, a proxima frente logica deixou de ser backlog de produto e passou a ser estabilizacao da baseline: restaurar `repo-checks` e sincronizar o handoff pos-F30 antes de abrir nova SPEC. - -- Em 2026-03-12, a `F28-adapter-circuit-breaker` foi mergeada em `main` pela PR `#62`, absorvendo `G-09` com breaker persistido local para o `CodexCLIAdapter` sem reabrir SQLite, auth remota ou CLI publica. -- Em 2026-03-12, a `F29-auth-rbac-foundation` foi mergeada em `main` pela PR `#63`, endurecendo `runs submit` e `runtime start|run|stop` com auth opt-in local, registry privado por hash SHA-256 e reuso de `initiated_by` para provenance autenticada. -- Com `F28` e `F29`, a triagem pos-`F27` deixou de ser `G-09` versus `G-11`: o backlog imediato agora precisa distinguir entre follow-up residual de auth (`socket`, rotacao/provisionamento e operacao remota`) e outras frentes fora da `IDEA-001`. - -- Em 2026-03-12, o handoff operacional foi realinhado ao baseline real pós-`F27`: `main` já incorpora `F23-security-sanitization-foundation`, `F24-workspace-boundary-hardening`, `F25-generated-artifact-ast-guard`, `F26-run-provenance-integrity` e `F27-adapter-concurrency-guard`, via merges `#56` a `#60`. -- Com esse realinhamento, a “primeira SPEC pós-`F22`” deixou de ser pendência atual: a etapa 2 e a primeira onda de guardrails já foram concluídas em `main`, e a próxima decisão passa a ser a primeira SPEC pós-`F27`. -- O backlog remanescente da `IDEA-001` ficou reduzido principalmente a `G-09` (circuit breaker para adapters) e `G-11` (autenticação/autorização), com `G-09` como menor recorte técnico natural para a próxima triagem. - -- Em 2026-03-12, o baseline documental foi realinhado ao estado real do repositório: `main` já incorpora `F17-artifact-preview` e `F22-release-readiness`, fechando a etapa 2 no código, na CLI pública e na release técnica. -- A `F17-artifact-preview` foi mergeada em `main`, consolidando `synapse runs show --preview report` e `--preview .clean` com leitura textual truncada e sem abrir leitura arbitrária do host. -- A `F22-release-readiness` foi mergeada em `main`, consolidando `CHANGELOG.md`, `docs/release/phase-2-technical-release.md`, README alinhado ao quickstart `sync-first` e boundary explícito para artifact preview. -- A próxima decisão do projeto deixou de ser fechar PRs da etapa 2 e passou a ser abrir a primeira SPEC pós-`F22`; `docs/IDEAS.md` permanece como backlog candidato, com `IDEA-001 / G-02` como menor recorte imediato se houver risco real em observabilidade pública. - -- A `F13-rich-cli-output` foi concluida localmente como frente pequena de UX na CLI, sem ampliar a arquitetura: `synapse runtime status` passou a renderizar painel Rich com status e PID, mantendo `stderr` e exit code de falha no estado inconsistente. -- A F13 introduziu `src/synapse_os/cli/rendering.py` como helper minima de apresentacao e adicionou cobertura dedicada em `tests/unit/test_cli_rich_output.py` e `tests/integration/test_runtime_cli.py`. -- A validacao local da F13 fechou verde com `validate_spec_file()` da SPEC, `pytest tests/unit/test_cli_rich_output.py tests/integration/test_runtime_cli.py`, `./scripts/commit-check.sh --no-sync --skip-branch-validation --skip-docker --skip-security` e `./scripts/security-gate.sh`. -- O recorte da F13 permaneceu deliberadamente restrito a `synapse runtime status`, sem `Textual`, sem watch mode, sem novo subcomando publico e sem necessidade de `DOCKER_PREFLIGHT`. -- A `F14-runs-observability-cli` foi concluida localmente como frente pequena de observabilidade CLI-first, adicionando `synapse runs list` e `synapse runs show ` sem abrir TUI. -- A F14 reaproveitou `RunRepository` e `ArtifactStore`, estendeu `src/synapse_os/cli/rendering.py` para listagem/detalhe de runs e manteve o Synapse-Flow como a engine propria de pipeline do SynapseOS. -- A validacao local da F14 fechou verde com `validate_spec_file()` da SPEC, `pytest` focado de CLI/persistencia, `./scripts/commit-check.sh --no-sync --skip-branch-validation --skip-docker --skip-security` e `./scripts/security-gate.sh`. -- O recorte da F14 permaneceu deliberadamente restrito a leitura de runs persistidas: sem watch mode, sem streaming, sem Textual e sem `DOCKER_PREFLIGHT`. -- A PR `#42` da `F14-runs-observability-cli` foi mergeada em `main`, consolidando `synapse runs list` e `synapse runs show ` como superficie publica atual do projeto. -- A etapa seguinte do projeto foi definida e documentada como fila oficial em `docs/architecture/PHASE_2_ROADMAP.md`, seguindo o cenario misto: `F15 -> F16 -> F21 -> F18 -> F19 -> F20 -> F17 -> F22`. -- Uma proposta posterior de guardrails pre-etapa-2 (input, secrets, rate limiting e audit trail) foi triada e nao foi promovida a duas features autonomas; o backlog oficial preserva a etapa 2 como proxima trilha principal. -- O unico recorte excepcional aceito antes da etapa 2, se houver risco real, e mascaramento de secrets em campos `_clean` e artifacts de leitura publica; o restante deve ser absorvido em `F15` e `F21`. -- A `F15-public-run-submission` foi implementada localmente com `synapse runs submit `, `--mode auto|sync|async` e `--stop-at`, reaproveitando o `RunDispatchService` interno e fixando `SPEC_VALIDATION` como default operacional seguro. -- O hardening principal da F15 ficou no proprio dispatch: a SPEC e validada antes de qualquer submit, inclusive em `async`, para evitar persistencia de runs invalidas. -- A validacao local da F15 fechou verde com `validate_spec_file()` da SPEC, `pytest` focado de dispatch/runs/runtime, `./scripts/commit-check.sh --no-sync --skip-branch-validation --skip-docker --skip-security` e `./scripts/security-gate.sh`. -- A PR `#43` da `F15-public-run-submission` foi mergeada em `main`, consolidando `synapse runs submit ` como superficie publica atual junto de `synapse runs list/show`. -- A chore documental pos-F15 alinhou `README.md`, `WORKTREE_FEATURES.md`, `memory.md`, `PENDING_LOG.md` e `.github/copilot-instructions.md` ao baseline atual da etapa 2. -- O baseline real atual tambem ja incorpora a `F16-run-detail-expansion`, a `F21-cli-error-model-and-exit-codes` e a `F18-canonical-happy-path`: as tres frentes tem `SPEC.md` propria, notes/checklists, comportamento materializado na CLI e cobertura dedicada em testes unitarios e de integracao. -- A revalidacao focada do baseline da etapa 2 fechou verde com `uv run --no-sync python -m pytest tests/unit/test_cli_runs_rendering.py tests/integration/test_runs_submit_cli.py tests/integration/test_cli_error_model.py -q`, totalizando `12 passed`. -- O handoff operacional foi realinhado para refletir a fila remanescente correta da etapa 2: `F19 -> F20 -> F17 -> F22`. -- A `F19-environment-doctor` foi concluida e mergeada pela PR `#51`, consolidando `synapse doctor` como diagnostico local e advisory do fluxo publico atual. -- A `F20-public-onboarding` foi concluida e mergeada pela PR `#52`, consolidando o quickstart publico sync-first e o boundary entre `synapse doctor` e `repo-preflight`. -- Com a merge de `F19` e `F20`, a fila remanescente real da etapa 2 passou a ser `F17 -> F22`. -- A `F17-artifact-preview` foi concluida localmente com preview textual controlado em `synapse runs show --preview `, suportando `report` e `.clean` sem abrir leitura arbitraria do host. -- O delta da F17 manteve o contrato de erros da F21 (`Usage error:`/`2`, `Not found:`/`3`) e limitou a leitura ao inicio do artifact, com truncamento explicito apos no maximo 40 linhas. -- A PR `#53` da `F17-artifact-preview` foi aberta contra `main`, deixando a frente pronta para revisao sem merge antecipado. -- A `F22-release-readiness` foi concluida localmente como frente documental e de validacao final, adicionando `CHANGELOG.md`, release notes versionada e boundary explicito entre quickstart sync-first e artifact preview. - -- A `F10-run-report-one-real-adapter` foi concluida e mergeada em `main`, fechando o MVP inicial do Synapse-Flow com `DOCUMENT`, `RUN_REPORT.md` e o primeiro adapter real (`CodexCLIAdapter`). -- A `F12-codex-adapter-operational-hardening` foi concluida e mergeada pela PR `#38`, com `main` local e `origin/main` sincronizados em `ahead=0 behind=0`. -- O hardening da F12 manteve `CLIExecutionResult` como contrato de execucao e adicionou classificacao operacional explicita do Codex (`timeout`, `return_code_nonzero`, `launcher_unavailable`, `container_unavailable`, `authentication_unavailable`) sem reabrir a pipeline. -- O `DOCKER_PREFLIGHT` real e o smoke container-first do Codex foram validados; o unico bloqueio observado foi autenticacao ausente (`401 Unauthorized`), tratado como bloqueio operacional externo e nao como defeito do adapter. - -- A chore `test-layout-typecheck-hardening` estabilizou a arvore `tests/` com package markers explicitos, removendo a colisao operacional entre `tests/unit/conftest.py` e `tests/integration/conftest.py`. -- O repositório agora aceita `uv run mypy src tests`, mas isso foi fechado via override explícito do `mypy` para `tests` e `tests.*`, preservando o contrato strict no pacote `src/synapse_os`. +- F59: Multi-Agent Session Orchestration — Registry/capabilities formalizado, coordenação entre adapters estabilizada +- F60: Local Control Plane Foundation — API local mínima exposta, mantendo CLI-first +- F61: DAG Pipeline Evolution — Pipeline evoluído para DAG state-driven no Synapse-Flow +- F62: Copilot Adapter — Adapter para GitHub Copilot operacional +- F63: Memory Engine Enhancement — Engine de memória com melhorias de performance e consistência +- F64: Advanced Supervisor Policies — Políticas avançadas de supervisão deterministicas +- F65: Runtime Coordinator Hardening — Hardening do coordenador de runtime, validações de identidade de processo +- F66: Reporting & Observability Evolution — Evolução de relatórios (RUN_REPORT.md) e observabilidade +- F67: Workspace Management v2 — Workspace v2 com isolation e path auditável +- F68: Plugin/Extension System — Sistema de plugins/extensions para extensibilidade futura -- A `F09-supervisor-mvp` foi materializada com `SPEC.md`, `NOTES.md` e `CHECKLIST.md` proprios, mantendo o Synapse-Flow como a engine propria de pipeline do SynapseOS e limitando o recorte a supervisor deterministico, pipeline linear ate `SECURITY` e persistencia de decisoes do supervisor. -- A pipeline agora suporta `CODE_GREEN`, `REVIEW` e `SECURITY`; a state machine passou a aceitar `REVIEW -> CODE_GREEN` para rework, e o novo modulo `synapse_os.supervisor` decide entre `retry`, `reroute`, `return_to_code_green` e `fail` de forma deterministica. -- A persistencia de runs da F09 passou a registrar eventos `supervisor_decision`, e a validacao local da feature fechou verde com `233` testes passando, `ruff check`, `uv run --no-sync python -m mypy`, `./scripts/security-gate.sh` e `./scripts/commit-check.sh --skip-docker`. -- O recorte da F09 manteve `retry` e `reroute` dentro da mesma execucao da pipeline; nao houve retomada persistida entre polls do worker nem ampliacao para `DOCUMENT` ou `RUN_REPORT.md`. +Métricas do sprint: 755 tests passando, ruff/mypy 100% clean, zero erros críticos. -- Os três documentos arquiteturais principais foram refinados para maior convergência: `SPEC_FORMAT.md` ganhou tabela de campos obrigatórios, regra H1 obrigatória documentada, valores válidos para `type`, assimetria intencional `non_goals`/`acceptance_criteria` explicada, referência ao template v2 e nova seção sobre testes de integração nos acceptance_criteria. `TDD.md` teve nomes de testes corrigidos para convenção do projeto, seção 9 de fixtures marcada com ✅/🔜, seção 10 sincronizada com estrutura real, e nova seção 13 formalizando requisito de testes de integração por categoria de feature. `SDD.md` teve estados `INIT`/`RETRYING` marcados como pós-MVP, DOCKER_PREFLIGHT reposicionado como gate lateral no diagrama, `metadata: dict` corrigido para `dict[str, Any]`, `parser_confidence` e `REQUEST.md` marcados como não implementados no MVP, e tabela de mapeamento macro ↔ estados internos adicionada na seção 5. -- A suíte de testes foi expandida de 88 → 215 testes ao longo da sessão de hardening: novos conftest.py (unit + integration), fixtures de SPEC inválidas, fixtures de CLI output realistas, test_spec_validator (4→18), test_state_machine (5→30), test_parsing_engine (9→21), test_contracts (4→17), test_config (4→12), test_happy_path (20 novos), test_failure_recovery (9 novos), test_review_rework (11 novos), test_adapter_parser_flow (10 novos de integração). -- `pytest-cov>=5.0.0` foi adicionado ao `pyproject.toml` com configuração `[tool.coverage.run]` e `[tool.coverage.report]`. Versão instalada: `7.0.0`. -- Os `acceptance_criteria` de F02, F03, F04 e F05 foram atualizados para incluir pelo menos um critério verificável somente via teste de integração, alinhando as SPECs com o novo requisito documentado em `TDD.md` seção 13 e `SPEC_FORMAT.md`. -- Fixtures novas criadas: `tests/fixtures/docker/valid_compose_config.txt`, `invalid_compose_config.txt`, `tests/fixtures/reports/expected_run_report.md`, `tests/fixtures/cli_outputs/gemini_plan.txt`, `codex_tests.txt`, `claude_review.txt`. -- Security review da branch `chore/tdd-integration-hardening` foi aprovado sem ressalvas: zero mudanças em código de produção, todos os padrões de subprocess existentes são legítimos, uso de `unicode_escape` em fixtures é controlado e sem risco de injeção. - -- A correcao de follow-up da `F06-pipeline-engine-linear` reexportou `SpecValidationError` em `synapse_os.pipeline`, alinhando a API publica da engine com o teste de bloqueio por SPEC invalida e restaurando o `repo-checks` local no mesmo caminho usado pelo CI. -- A `F06-pipeline-engine-linear` passou a ter `SPEC.md` propria, `NOTES.md`, contratos tipados de pipeline (`PipelineStep`, `StepExecutionResult`, `PipelineContext`) e uma `PipelineEngine` linear em fake mode para o Synapse-Flow. -- O recorte da `F06-pipeline-engine-linear` ficou deliberadamente restrito a `SPEC_VALIDATION`, `PLAN` e `TEST_RED`, reutilizando `SpecValidator` e state machine ja existentes, sem persistencia, worker, supervisor ou adapters reais. -- A validacao local da `F06-pipeline-engine-linear` fechou verde com `SPEC` validada, `88` testes passando via `python -m pytest`, `ruff check`, `ruff format --check`, `mypy` e `./scripts/branch-sync-check.sh` em `ahead=0 behind=0`. -- O `security-review` do delta da `F06-pipeline-engine-linear` foi concluido sem ressalvas: a feature nao adiciona shell, subprocesso novo, Docker, workflow ou automacao operacional, e mantem a execucao de pipeline em fake mode com contexto em memoria e validacao explicita da SPEC antes de `PLAN`. - -- A `F05-cli-adapter-base` passou a ter `SPEC.md` propria, `NOTES.md`, um `BaseCLIAdapter` assíncrono via `asyncio.create_subprocess_exec` e a evolucao de `CLIExecutionResult` para incluir `tool_name`, `stdout/stderr` raw/clean, `duration_ms` e `timed_out`. -- O recorte da `F05-cli-adapter-base` ficou deliberadamente restrito a contrato de execucao, subprocesso async, timeout e sanitizacao leve de ANSI, preservando o Parsing Engine da `F04` como responsavel por limpeza mais rica e extracao de artefatos antes dos hand-offs do Synapse-Flow. -- A validacao local da `F05-cli-adapter-base` fechou verde com `SPEC` validada, `84` testes passando via `python -m pytest`, `ruff check`, `ruff format --check`, `mypy` e `./scripts/branch-sync-check.sh` em `ahead=0 behind=0`. -- O `security-review` do delta da `F05-cli-adapter-base` foi concluido sem ressalvas: a implementacao usa `create_subprocess_exec` sem shell, preserva output bruto separado do output limpo, aplica timeout com encerramento explicito do processo e mantem a sanitizacao conservadora no adapter. +## Decisões incorporadas recentemente -- A `F04-parsing-engine-mvp` passou a ter `SPEC.md` propria, fixtures de output ruidoso e um Parsing Engine minimo com limpeza de ANSI, extracao de blocos fenced Markdown e validacao sintatica de artefatos Python. -- O hardening final da `F04-parsing-engine-mvp` normalizou linguagem de fences para lowercase, canonizou `py` para `python`, preservou texto semantico generico ao remover apenas ruido de transporte explicito e adicionou limites fixos de tamanho/volume no parser. -- A validacao local mais recente da `F04-parsing-engine-mvp` fechou verde com `./scripts/commit-check.sh --no-sync --skip-branch-validation --skip-docker --skip-security`, incluindo `81` testes verdes, `ruff` e `mypy`. -- O `security-review` final da `F04-parsing-engine-mvp` foi reavaliado apos o hardening do parser e ficou aprovado sem ressalvas no recorte atual. -- O fluxo de PR assistido pelo agente Git passou a exigir corpo de PR via `--body-file` em vez de `--body` inline quando houver Markdown com backticks, blocos de codigo ou outros caracteres shell-sensitive, evitando corrupção da descrição publicada por expansão acidental do shell. -- A baseline MCP do `codex-dev` passou a ser segura por padrao: `.codex/config.toml` deixou de carregar `github-actions`, `sqlite` e `docker` no startup default, e o MCP oficial do GitHub passou a ser renderizado dinamicamente apenas quando houver token no ambiente. -- O fallback de `GITHUB_TOKEN` para `GITHUB_PERSONAL_ACCESS_TOKEN` ficou centralizado em `scripts/render-codex-config.sh`, tornando o launcher do Codex testavel e removendo a dependencia de symlink persistida no volume `codex-home`. -- A avaliacao operacional confirmou que o MCP de Docker deve ficar fora do baseline do `codex-dev`, porque o ambiente isolado continua sem `docker.sock`. -- `tests/unit/test_repo_automation.py` passou a cobrir o baseline seguro do MCP do Codex, a renderizacao da config efetiva com e sem token e a ausencia de `docker.sock` no `codex-dev`. -- A validacao desta frente ficou fechada com `./scripts/docker-preflight.sh`, `pytest tests/unit/test_repo_automation.py` verde, smoke do `dev-codex.sh` sem token e smoke com fallback via `GITHUB_TOKEN`. -- O `security-review` do delta MCP/Codex foi concluido com aprovacao e duas ressalvas baixas e nao bloqueantes: o helper de renderizacao aceita `--source`/`--output` arbitrarios e o fallback de `GITHUB_TOKEN` pode habilitar o MCP do GitHub em ambientes onde essa variavel ja exista por outro motivo. -- O fechamento Git desta frente foi isolado na branch `chore/codex-mcp-baseline-hardening` com commit local `89e8111 chore(repo): harden codex mcp baseline`, sem push e sem PR. -- A validação operacional de `uv sync --locked --extra dev` foi concluída com sucesso em ambiente com rede liberada. -- A validação real do job `branch-validation` em GitHub Actions foi concluída com sucesso, confirmando checkout por `github.event.pull_request.head.sha` e uso de `github.head_ref` para o nome efetivo da branch em `pull_request`. -- O fluxo local de `./scripts/commit-check.sh --no-sync` foi endurecido para executar `mypy` e `pytest` via `python -m ...`, reduzindo dependência de wrappers quebrados na `.venv`. -- A validação operacional de `./scripts/docker-preflight.sh` sem `--dry-run` foi concluída com sucesso no modo padrão leve (`compose config` + build, sem `up`) em ambiente com Docker acessível. -- A validação contra `main` em `pull_request` passou a usar o `head.sha` real da PR e o nome real da branch, evitando merge ref/detached ref sintético no GitHub Actions. -- O hook local `.githooks/pre-commit` ficou explicitamente leve via `./scripts/commit-check.sh --hook-mode`. -- O `DOCKER_PREFLIGHT` operacional real continua explícito e separado do hook leve, via `./scripts/docker-preflight.sh`. -- A baseline operacional do repositório foi restaurada com correções mínimas de Ruff/import order/formatação nos arquivos apontados pela revisão. -- O `commit-check.sh` passou a usar `./scripts/commit-check.sh --sync-dev` como caminho padrão para bootstrap/checks locais, com `--no-sync` explícito para rerun rápido e `--hook-mode` preservando o fluxo leve. -- A SPEC da feature e o lifecycle do runtime passaram a exigir validação adicional de identidade do processo antes de `stop`. -- O estado do runtime passou a exigir escrita atômica, permissões restritas e tratamento seguro para corrupção/adulteração local. -- O security-review final da feature considerou o escopo aprovado com ressalvas baixas e compatíveis com o MVP. -- A branch de integração `chore/merge-operational-candidates` consolidou `chore-resolve-operational-merge-conflicts`, `feat-agent-skills` e `features/f11-runtime-persistente-minimo`. -- A validação prática da feature de runtime persistente foi fechada na branch de integração com `17` testes passando em ambiente local dedicado. -- A branch `chore/devcontainer-codex-isolation` introduziu um ambiente isolado de desenvolvimento do Codex com `.devcontainer/`, `compose.dev.yaml`, `scripts/dev-codex.sh` e profile versionado em `.codex/config.toml`. -- O fluxo container-first do Codex ficou documentado em `AGENTS.md` e `README.md`, mantendo `codex-dev` separado do serviço de runtime `synapse-os`. -- A validação operacional local confirmou `codex-dev` com usuário não-root, `read_only`, `no-new-privileges`, `cap_drop: [ALL]`, sem `docker.sock`, sem mount do `$HOME` do host e com bind mount restrito ao repositório em `/workspace`. -- A Branch Sync Gate foi incorporada como regra operacional leve em `AGENTS.md`, com `./scripts/branch-sync-check.sh` para detectar drift e `./scripts/branch-sync-update.sh` para atualização conservadora da branch. -- As ressalvas baixas do security-review sobre a Branch Sync Gate foram mitigadas e o parecer final ficou aprovado, sem risco novo relevante. -- O `debug-failure` foi criado como skill própria para diagnóstico inicial de falhas reais, classificação da causa e encaminhamento para o próximo agent. -- A avaliação de ADR concluiu que a Branch Sync Gate é convenção operacional local de governança do repositório e não exige ADR nova nem atualização de ADR existente. -- A branch `feat/memory-curator-skill` abriu a frente de memória durável do repositório com a skill `memory-curator`, `memory.md` inicial e registro mínimo do papel da skill em `AGENTS.md`. -- O `memory-curator` ficou definido para consolidar decisões incorporadas, trade-offs, estado atual da frente, pendências abertas e próximos passos em `memory.md`, sem substituir `session-logger` nem `technical-triage`. -- O fluxo de fechamento por convenção operacional ficou registrado na skill `memory-curator` com as chamadas `$memory-curator encerrar conversa` e `$memory-curator close session`, deixando explícito que isso não é alias nativo da plataforma. -- A avaliação mais recente de ADR concluiu que `memory-curator` e `memory.md` não exigem ADR neste momento, por serem governança operacional local e não mudança arquitetural estável. -- A avaliação operacional desta frente fixou `./scripts/commit-check.sh --sync-dev` como caminho padrão para checks/testes locais, com `uv run --no-sync` restrito a reexecução rápida após bootstrap e virtualenv explícita apenas como fallback de diagnóstico. -- A mitigação final desta frente moveu a validação de branch para antes de qualquer resolução de fluxo e antes de qualquer `uv sync`, eliminando sincronização desnecessária antes do gate operacional. -- O `security-review` final desta frente foi concluído com aprovação sem ressalvas, mantendo a separação entre hook leve, checks locais e `DOCKER_PREFLIGHT` operacional real. -- A validação operacional do ambiente atual do Codex com `network-access = true` confirmou `git push` e `gh pr create` funcionando no sandbox normal; a governança operacional foi ajustada para refletir o sandbox como caminho padrão e manter fallback fora do sandbox apenas como contingência para falha real de rede/sandbox, sem mascarar erro de autenticação, permissão ou conectividade real do host. -- A revalidação operacional mais recente confirmou `ruff format --check .` verde no estado atual do repositório, restaurando o gate completo de formatação sem necessidade de ajuste adicional. -- A sincronização conservadora da branch atual com `origin/main` foi revalidada com `git fetch origin main --prune`, `./scripts/branch-sync-check.sh` e `./scripts/branch-sync-update.sh --mode rebase`, permanecendo em no-op seguro com `ahead=0` e `behind=0`. -- A feature `F02-spec-engine-mvp` passou a ter `SPEC.md` propria, fixtures de SPEC valida/invalida e um validador minimo de `SPEC_VALIDATION` com parser de front matter YAML, checagem de campos obrigatorios e exigencia das secoes `Contexto` e `Objetivo`. -- A validacao local da `F02-spec-engine-mvp` foi concluida com testes verdes para o novo `SpecValidator`, mantendo o recorte da feature sem antecipar state machine, pipeline completa ou editor de SPEC. -- O `security-review` da `F02-spec-engine-mvp` foi aprovado com ressalvas baixas: manter `yaml.safe_load` e, na integracao futura, restringir o chamador a paths esperados de `SPEC.md` dentro do workspace da run. -- A PR `#19` da `F02-spec-engine-mvp` teve o gate `repo-checks` restaurado com correcao minima de formatacao, import order e compatibilidade de `mypy` no `SpecValidator`, sem ampliar o escopo da feature. -- O `security-review` mais recente da correcao da F02 aprovou o delta com ressalva baixa e localizada: o `# type: ignore[import-untyped]` em `yaml` e aceitavel neste recorte, mas pode ser removido depois com tipagem mais explicita ou `types-PyYAML`. -- A `F03-state-machine-mvp` passou a ter `SPEC.md` propria, state machine minima do Synapse-Flow com transicoes lineares validas, bloqueio de `PLAN` antes de `SPEC_VALIDATION` e estado terminal `FAILED`, com testes verdes e PR `#20` aberta. -- O `security-review` da `F03-state-machine-mvp` foi aprovado com ressalvas baixas: os estados ainda sao modelados como strings livres e `TERMINAL_STATES` ainda nao e usada explicitamente nas validacoes internas. -- A `F03-state-machine-mvp` ficou autocontida na worktree atual com materializacao de `features/F03-state-machine-mvp/SPEC.md`, mantendo alinhamento com o recorte aprovado da feature. -- A validacao local da `F03-state-machine-mvp` confirmou `5` testes unitarios verdes para a state machine minima, e o proximo passo logico permanece fechar `REPORT/COMMIT` antes de abrir a `F04`. -- A `F03-state-machine-mvp` foi encerrada: correcao de `B905` (`zip()` com `strict=False`), rebase sobre `main` atualizado, 10/10 checks CI verdes, PR `#20` mergeada por merge commit em `main`. Worktree e branch local removidos. -- O commit `chore(repo): add copilot instructions and codex MCP servers` incluiu `.github/copilot-instructions.md` (instrucoes de projeto com regra de idioma portugues) e `.codex/config.toml` com 4 MCP servers essenciais (GitHub, GitHub Actions, Docker, SQLite). +- Em 2026-04-01, o sprint F59-F68 foi consolidado em `origin/main` com todas as frentes mergeadas. +- O Synapse-Flow permanece como a engine própria de pipeline do SynapseOS, agora com suporte a DAG state-driven. +- O runtime boundaries foundation (F51-F53) estabilizou contratos de `ToolSpec`/capabilities, `WorkspaceProvider`, `RunContext` e lifecycle hooks. +- A arquitetura atual suporta multi-agent session orchestration sem UI desktop, mantendo CLI-first. +- O local control plane foundation (F60) expõe API mínima para TUI/integrações futuras, sem abrir shell desktop. ## Pendências abertas -- Fixtures de testes aspiracionais marcadas como 🔜 no TDD.md: `tests/fixtures/worker/` (ainda ausente). -- Property-based testing com `hypothesis` ainda não implementado (mencionado como evolução futura em TDD.md). -- Fechar o bucket `baseline-handoff-sync` alinhando `ERROR_LOG.md`, `README.md` e eventuais docs publicas ao baseline local pos-`F53`. -- Rodar nova `technical-triage` depois do `baseline-handoff-sync` para escolher uma unica frente entre `multi-agent-session-orchestration` e `local-control-plane-foundation`. -- Manter `desktop-shell` e `TypeScript-first runtime migration` explicitamente fora da fila principal ate o core Python atual estabilizar boundaries, isolamento e observabilidade. -- Manter `remote_multi_host_auth` explicitamente adiado ate existir demanda concreta, recorte proprio e validavel. +- Avaliar demanda concreta para `desktop-shell` — mantido fora da fila principal até core Python estabilizar. +- Avaliar demanda para `TypeScript-first runtime migration` — explicitamente descartado por ora; TypeScript limitado a shell/UI opcional consumindo core Python. +- Avaliar demanda para `remote_multi_host_auth` — explicitamente adiado até existir demanda concreta. +- Rodar `technical-triage` para definir próximas frentes pós-sprint F59-F68. ## Pontos de atenção futuros -- O bloqueio operacional de autenticacao do Codex (`401 Unauthorized`) ficou explicitamente classificado na F12; revalidar esse smoke apenas quando houver credencial valida e necessidade real de uso autenticado. -- A `F29` fechou apenas a fundacao local de auth/RBAC; nao assumir que `socket + RBAC` da `IDEA-001` foi totalmente absorvido sem uma nova SPEC especifica para operacao remota. - -- Validar em momento futuro uma operacao real do MCP oficial do GitHub com credencial valida, pois a frente atual fechou apenas o startup path e a cobertura operacional do launcher. -- Fixture `noisy_mixed_output.txt` e `noisy_no_code_block.txt` armazenam sequências ANSI como literais `\u001b`. Todo helper que os lê para testar comportamento de ANSI precisa de `unicode_escape=True`. Considerar adicionar comentário nos próprios arquivos de fixture documentando isso. -- A ampliação de `TRANSPORT_NOISE_PREFIXES` para incluir prefixos como `[rpc]` deve ser decisão explícita documentada na SPEC da feature responsável — não uma adição silenciosa. -- Os testes de `test_review_rework.py` exercitam a state machine diretamente para estados CODE_GREEN/REVIEW/SECURITY que ainda não estão implementados no `PipelineEngine`. Quando o Supervisor/pipeline for implementado para esses estados, esses testes servem como documentação de comportamento esperado e devem ser migrados para testes de integração. -- O retry/reroute da F09 permanece restrito a uma unica execucao do Synapse-Flow; retomada persistida entre polls do worker e requeue duravel continuam fora de escopo. -- Em worktree fria, `pytest` e `uv run pytest` podem falhar na coleta ate que `uv sync --locked --extra dev` tenha sido executado. - -- O fallback de `GITHUB_TOKEN` para `GITHUB_PERSONAL_ACCESS_TOKEN` continua aceitavel para o baseline atual, mas pode merecer opt-in explicito se gerar ambiguidade operacional em ambientes com tokens preexistentes. -- O helper `scripts/render-codex-config.sh` continua restrito ao launcher atual; se passar a ser reutilizado fora desse fluxo, vale endurecer os paths aceitos. -- `uv run --no-sync` continua dependendo de ambiente previamente sincronizado; em worktree fria ele pode cair no Python do host e falhar por dependências ausentes. -- O fluxo local com `.venv` pode exigir `PYTHONPATH=src` quando não se usa `uv run`; por isso ele continua apenas como fallback operacional e não como caminho padrão. -- O hardening do runtime valida identidade do processo por marcador + token em `/proc//cmdline`; isso continua Linux-first. -- A validação do diretório configurável de estado permanece propositalmente básica no MVP e pode ser endurecida depois com âncora explícita no workspace. -- O runtime persistente continua propositalmente restrito a processo único local, sem scheduler, distribuição ou recuperação avançada. -- No uso diário do Codex em container, prefira `./scripts/dev-codex.sh` como entrypoint principal para evitar corrida operacional com `docker compose ... up` manual sobre o mesmo serviço. -- No uso diário de sincronização com `main`, prefira `./scripts/branch-sync-check.sh` e `./scripts/branch-sync-update.sh` em vez de comandos Git ad hoc; a atualização automática continua propositalmente conservadora e pode exigir resolução manual. -- `memory.md` deve permanecer memória durável e reaproveitável, sem virar transcrição de conversa. -- O `memory-curator` deve consolidar estado e handoff, enquanto `ERROR_LOG.md` e `PENDING_LOG.md` seguem como trilha operacional detalhada. -- Na integracao futura do `SpecValidator`, o chamador deve restringir a leitura de `SPEC.md` a paths esperados do workspace para evitar ampliacao desnecessaria da superficie de entrada. -- O `# type: ignore[import-untyped]` em `yaml` da F02 permanece como mitigacao minima de tipagem; reavaliar remocao quando houver frente dedicada de endurecimento ou tipagem de dependencias. -- Na evolucao da state machine apos a F03, considerar encapsular estados em `Enum` ou aplicar `TERMINAL_STATES` de forma efetiva para reduzir risco de drift semantico sem ampliar esta feature. - -## TUI — Ideia de feature futura (análise de viabilidade concluída) - -- **Rich enriquecido (F13-rich-cli-output)**: concluida localmente como primeira adocao de Rich em `src/`, restrita a `synapse runtime status` e sem abrir TUI completa. -- **Observabilidade CLI de runs (F14-runs-observability-cli)**: concluida localmente e fecha a lacuna minima de inspecao antes de qualquer TUI. -- **TUI watch (F14-tui-watch-command)**: `synapse tui` como subcomando opcional usando Textual. Pré-requisito atualizado: F13 + F14 + implementação de `observability/` (diretório ainda vazio). Hook ideal já existe: `PipelineObserver` em `pipeline.py`. -- **Constraint Typer×asyncio**: `asyncio.run(app.run_async())` dentro do comando Typer é a forma de coexistência; funcional mas exige cuidado com event loop. -- **TTY em container**: Rich degrada automaticamente sem TTY; Textual exige guarda `sys.stdout.isatty()`. -- **Não implementar antes**: apesar da F14 resolver a observabilidade minima via CLI, TUI real continua dependendo de recorte proprio de watch/streaming e da camada `observability/`. - -## Estado do baseline atual - -- Etapa 2 concluída em `main` com `F17-artifact-preview` e `F22-release-readiness` já mergeadas. -- A primeira onda de guardrails pós-release também está concluída em `main` com `F23 -> F27`. -- A fila ativa agora passa a ser definida pela próxima SPEC pós-`F27`, não mais pela abertura da primeira SPEC pós-`F22`. - -## Guardrails candidatos fora da fila principal - -- Os follow-ups curtos de mascaramento publico e normalizacao textual deixaram de ser candidatos: esses recortes foram absorvidos na `F23`. -- Rate limiting por adapter, audit trail adicional com `initiated_by` e hardening amplo de config tambem deixaram de ser backlog aberto isolado: esses recortes foram absorvidos em `F26` e `F27`. -- Os itens de guardrail ainda em aberto concentram-se em `G-09` e `G-11`. - -## Itens que podem virar novas features ou ajustes futuros - -- Endurecimento adicional do path de estado para restringir explicitamente a uma raiz confiável do workspace. -- Melhoria de portabilidade do runtime além de Linux, caso isso entre no escopo futuro. -- Documentação operacional curta para bootstrap local (`--sync-dev`) e para o lifecycle do runtime persistente. -- Limpeza operacional do repositório para remover debt de formatação fora do escopo desta feature. -- Integracao do `SpecValidator` ao fluxo seguinte da pipeline, incluindo bloqueio formal antes de `PLAN`. -- Evolucao da state machine para suportar estados adicionais como `RETRYING`, integracao com executor de steps e persistencia do estado fora do recorte minimo da F03. +- O runtime persistente continua Linux-first; melhoria de portabilidade pode ser avaliada no futuro. +- O hardening do runtime valida identidade por marcador + token em `/proc//cmdline`. +- Manter `./scripts/dev-codex.sh` como entrypoint principal para evitar corrida operacional. +- Manter `./scripts/branch-sync-check.sh` e `./scripts/branch-sync-update.sh` para sincronização conservadora. +- `memory.md` deve permanecer memória durável e reaproveizável, sem virar transcrição. +- O `memory-curator` consolida estado e handoff; `ERROR_LOG.md` e `PENDING_LOG.md` seguem como trilha operacional. diff --git a/SECURITY_AUDIT_REPORT.md b/SECURITY_AUDIT_REPORT.md new file mode 100644 index 0000000..78d21b9 --- /dev/null +++ b/SECURITY_AUDIT_REPORT.md @@ -0,0 +1,738 @@ +# SECURITY_AUDIT_REPORT.md + +**Project:** SynapseOS +**Audit Date:** 2026-04-01 +**Auditor:** Security Audit Skill +**Scope:** Full codebase (F59-F68) - Multi-Agent Session Orchestration through Plugin/Extension System + +--- + +## Executive Summary + +This audit covers the SynapseOS meta-orchestrator codebase, focusing on 10 features implemented during the current sprint. The codebase demonstrates good security practices in several areas, including proper secret masking, constant-time token comparison, and path traversal protection. However, **4 HIGH severity** and **7 MEDIUM severity** issues were identified that require attention. + +**Verdict:** `SECURITY_PASS_WITH_NOTES` - Risks mitigable with documented corrections. + +--- + +## 1. Superfície de Ataque Mapeada + +### 1.1 HTTP API Surface (Control Plane) + +| Endpoint | Method | Auth Required | Input Surface | Risk Level | +| ------------------------------ | ------ | ------------- | --------------------------------- | ---------- | +| `/health` | GET | No | None | Low | +| `/api/v1/runs` | GET | Yes | Query params: `limit`, `offset` | Low | +| `/api/v1/runs` | POST | Yes | JSON body: `prompt` (unvalidated) | **HIGH** | +| `/api/v1/runs/{run_id}` | GET | Yes | Path param: `run_id` | Low | +| `/api/v1/runs/{run_id}/cancel` | POST | Yes | Path param: `run_id` | Medium | +| `/api/v1/runtime/status` | GET | Yes | None | Low | +| `/api/v1/artifacts/{run_id}` | GET | Yes | Path param: `run_id` | Medium | + +**Key Attack Vectors:** + +- `/api/v1/runs` (POST): User-supplied `prompt` is written directly to filesystem without validation +- `/api/v1/artifacts/{run_id}`: Path traversal risk on artifact listing + +### 1.2 Authentication & Authorization + +| Component | Mechanism | Storage | Risk | +| ----------------- | ------------------------------- | ------------------ | --------------------------------------------- | +| Control Plane API | Bearer token | In-memory (config) | Medium - Single shared token | +| CLI Auth | Token-based registry | SQLite + JSON file | Low - Proper SHA256 hashing with HMAC compare | +| Role-Based Access | 3 roles (admin/operator/viewer) | File-based | Low - Well-defined permission matrix | + +**Components:** + +- `src/synapse_os/control_plane/middleware.py` - Bearer token middleware +- `src/synapse_os/auth.py` - Auth registry with RBAC + +### 1.3 CLI Adapters (External Command Execution) + +| Adapter | Command | Injection Risk | Environment | +| ------------------- | -------------------------------- | ---------------------------------------------- | ---------------- | +| `CodexCLIAdapter` | `./scripts/dev-codex.sh -- exec` | **HIGH** - User prompt passed to shell | Docker container | +| `GeminiCLIAdapter` | `python -c ...` | **HIGH** - Prompt interpolation in Python code | Host | +| `CopilotCLIAdapter` | `gh copilot ai` | **HIGH** - User prompt passed to CLI | Host | + +**Components:** + +- `src/synapse_os/adapters.py` - All CLI adapters +- `src/synapse_os/runtime/circuit_breaker.py` - Failure detection + +### 1.4 Plugin Loading System + +| Entry Point | Loading Mechanism | Validation | Risk | +| -------------------- | ----------------------------------- | ---------- | ----------------------------------- | +| `synapse_os.plugins` | `importlib.metadata.entry_points()` | None | **HIGH** - Arbitrary code execution | + +**Components:** + +- `src/synapse_os/plugins.py` - Plugin registry and loader + +### 1.5 File System Surface + +| Operation | Path Validation | Permission Controls | Risk | +| ---------------- | ------------------------------------- | --------------------------- | ------ | +| Spec creation | `/tmp/synapse-os/api-specs/{uuid}.md` | No (relies on /tmp perms) | Medium | +| Artifact storage | `resolve_path_within_root()` | `0o600` files, `0o700` dirs | Low | +| Auth registry | `resolve_path_within_root()` | `0o600` files, `0o700` dirs | Low | +| Workspace pool | `base_dir / f"ws-{counter}"` | Standard perms | Low | + +**Components:** + +- `src/synapse_os/security.py` - Path validation utilities +- `src/synapse_os/persistence.py` - Artifact storage with permissions + +### 1.6 Runtime & Process Management + +| Operation | Mechanism | Risk | +| ---------------- | -------------------------------------------- | ---------------------------------- | +| Process spawning | `subprocess.Popen` with injected Python code | Medium - Code injection via string | +| Signal handling | SIGTERM/SIGINT handlers | Low | +| PID tracking | `/proc/{pid}/cmdline` parsing | Low - Linux-specific | + +**Components:** + +- `src/synapse_os/runtime/service.py` - Runtime lifecycle management + +### 1.7 Data Persistence + +| Storage | Encryption | Access Control | Risk | +| ------------------ | ---------- | --------------------- | ------------------------------ | +| SQLite runs DB | No | File permissions only | Medium - Contains run metadata | +| Artifact files | No | `0o600` permissions | Low | +| Auth registry JSON | No | `0o600` permissions | Medium - Token hashes present | +| Memory store | No | File permissions only | Low | + +--- + +## 2. Achados por Severidade + +### CRITICAL (0 issues) + +No critical vulnerabilities identified that would allow immediate system compromise. + +--- + +### HIGH (4 issues) + +#### H1: Command Injection in CLI Adapters + +**Location:** `src/synapse_os/adapters.py:186-191`, `311-322`, `368-376` +**Severity:** HIGH +**CVSS:** 7.5 (AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:L/A:L) + +**Description:** +User-supplied `prompt` is passed directly to shell commands without sanitization: + +```python +# adapters.py:186-191 (CodexCLIAdapter) +def build_command(self, prompt: str) -> list[str]: + return [ + "./scripts/dev-codex.sh", + "--", + "exec", + "--color", + "never", + prompt, # <-- Direct injection + ] +``` + +**Exploit Path:** + +1. Attacker provides prompt: `"; cat /etc/passwd; echo "` +2. Command executes with injected shell metacharacters +3. Arbitrary command execution on host/container + +**Mitigation:** + +```python +import shlex +# Escape or use list args without shell interpretation +def build_command(self, prompt: str) -> list[str]: + return [ + "./scripts/dev-codex.sh", + "--", + "exec", + "--color", + "never", + shlex.quote(prompt), # Or better: pass via stdin + ] +``` + +**Recommended Macro:** `fix-feature` for prompt sanitization + +--- + +#### H2: Python Code Injection in GeminiCLIAdapter + +**Location:** `src/synapse_os/adapters.py:315-322` +**Severity:** HIGH +**CVSS:** 8.1 (AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:H/A:L) + +**Description:** +Prompt is passed as a command-line argument (argv[1]) to `python -c`: + +```python +return [ + sys.executable, + "-c", + "import os, sys; " + "key = os.environ.get('SYNAPSE_OS_GEMINI_API_KEY'); " + "print(f'Gemini response to: {sys.argv[1]}') " + "if key else sys.exit('Error: SYNAPSE_OS_GEMINI_API_KEY not set')", + prompt, # Passed as argv[1] +] +``` + +The prompt is passed as a data argument (argv[1]), not interpolated into the Python source string, so it is not code injection. However, it is still passed via command line which can expose it to other local users via `/proc//cmdline`. + +**Mitigation:** +Pass prompt via stdin or environment variable instead of command line. + +**Recommended Macro:** `fix-feature` for adapter refactoring + +--- + +#### H3: Arbitrary Code Execution via Plugin System + +**Location:** `src/synapse_os/plugins.py:95-108` +**Severity:** HIGH +**CVSS:** 8.8 (AV:L/AC:L/PR:L/UI:N/S:C/C:H/I:H/A:H) + +**Description:** +Plugins loaded via `entry_points()` execute arbitrary code at import time: + +```python +def load_plugins(self) -> None: + eps = entry_points(group="synapse_os.plugins") + for ep in eps: + try: + module = ep.load() # <-- Executes module-level code + # ... + except Exception: + pass # Silent failure +``` + +Any installed package can register an entry point and execute code when SynapseOS starts. + +**Mitigation:** + +1. Implement plugin signature verification +2. Maintain allowlist of approved plugins +3. Load plugins in isolated subprocess/sandbox +4. Log all plugin loads with full path + +**Recommended Macro:** `fix-feature` for plugin sandboxing + +--- + +#### H4: Unvalidated Spec File Creation via API + +**Location:** `src/synapse_os/control_plane/server.py:225-240` +**Severity:** HIGH +**CVSS:** 7.2 (AV:N/AC:L/PR:H/UI:N/S:U/C:L/I:H/A:L) + +**Description:** +User prompt written to filesystem without validation: + +```python +def _create_spec_from_prompt(prompt: str) -> Path: + tmp_dir = Path(os.environ.get("TMPDIR", "/tmp")) / "synapse-os" / "api-specs" + tmp_dir.mkdir(parents=True, exist_ok=True) + spec_path = tmp_dir / f"{uuid4().hex}.md" + spec_content = ( + "---\n" + "feature_id: api-run\n" + # ... + f"# API Run\n\n{prompt}\n" # Unvalidated content + ) + spec_path.write_text(spec_content, encoding="utf-8") + return spec_path +``` + +**Risk:** Path traversal via symlink attack, malicious markdown content, or YAML frontmatter injection. + +**Mitigation:** + +1. Validate prompt against allowed characters +2. Use secure temporary directory with proper permissions +3. Validate generated SPEC before use + +**Recommended Macro:** `fix-feature` for input validation + +--- + +### MEDIUM (7 issues) + +#### M1: Shared API Token for Control Plane + +**Location:** `src/synapse_os/control_plane/middleware.py:29-31` +**Severity:** MEDIUM + +**Description:** +Single shared token comparison. If token is compromised, all API access is granted. No per-user or per-session tokens. + +**Mitigation:** +Implement per-principal API tokens stored in auth registry. + +--- + +#### M2: No Rate Limiting on API Endpoints + +**Location:** `src/synapse_os/control_plane/server.py` (all endpoints) +**Severity:** MEDIUM + +**Description:** +No rate limiting implemented, enabling brute force attacks on token and DoS via run creation. + +**Mitigation:** +Add rate limiting middleware (e.g., slowapi with Redis). + +--- + +#### M3: Process Identity Check Bypassable + +**Location:** `src/synapse_os/runtime/service.py:181-203` +**Severity:** MEDIUM + +**Description:** +`_process_identity_matches()` reads `/proc/{pid}/cmdline` which can be manipulated. The `PROCESS_MARKER` check is weak: + +```python +if PROCESS_MARKER in arguments and process_identity in arguments: + return True +``` + +Another process could include these strings in its arguments. + +**Mitigation:** +Use stronger mechanism like abstract Unix socket or pidfile with exclusive lock. + +--- + +#### M4: SQL Injection Risk in Persistence (Theoretical) + +**Location:** `src/synapse_os/persistence.py` +**Severity:** MEDIUM (Currently mitigated by SQLAlchemy) + +**Description:** +All queries use SQLAlchemy ORM which provides parameterization. However, `_upgrade_runs_schema()` uses raw SQL without proper sanitization checks: + +```python +connection.exec_driver_sql("ALTER TABLE runs ADD COLUMN spec_hash TEXT") +``` + +Future modifications could introduce injection. + +**Mitigation:** +Add validation for column names in schema migrations. + +--- + +#### M5: Artifact Path Traversal via Run ID + +**Location:** `src/synapse_os/persistence.py:527-539` +**Severity:** MEDIUM + +**Description:** +`list_artifact_paths()` uses `rglob` after path validation. Symlink attacks could still escape base_path: + +```python +for path in run_directory.rglob("*"): + if not path.is_file(): + continue + try: + resolve_path_within_root(path, root=self.base_path) + except ValueError: + continue +``` + +**Risk:** TOCTOU between `rglob()` and `resolve_path_within_root()`. + +**Mitigation:** +Use `O_NOFOLLOW` when opening files or resolve before operations. + +--- + +#### M6: Secrets in Environment Variables + +**Location:** `src/synapse_os/config.py` (indirect) +**Severity:** MEDIUM + +**Description:** +Configuration pulls from environment (`SYNAPSE_OS_*`), which: + +1. Appears in process listings (`ps e`) +2. May be logged by Docker, CI systems +3. Persists in shell history + +**Mitigation:** +Support file-based secrets (e.g., `/run/secrets/`) as primary method. + +--- + +#### M7: Circuit Breaker State File Tampering + +**Location:** `src/synapse_os/runtime/circuit_breaker.py` +**Severity:** MEDIUM + +**Description:** +Circuit breaker state stored in JSON file without integrity verification. Attacker with file access could reset failure counters. + +**Mitigation:** +Add HMAC signature or store in tamper-evident database. + +--- + +### LOW (5 issues) + +#### L1: Health Endpoint Information Disclosure + +**Location:** `src/synapse_os/control_plane/server.py:52-60` +**Severity:** LOW + +**Description:** +`/health` endpoint exposes runtime status without authentication, revealing system state to reconnaissance. + +**Mitigation:** +Consider requiring auth for detailed status, or limit info. + +--- + +#### L2: Exception Details in HTTP Responses + +**Location:** `src/synapse_os/control_plane/server.py` (multiple) +**Severity:** LOW + +**Description:** +Some error handlers chain exceptions which may leak internal details: + +```python +raise HTTPException(status_code=404, detail="Run not found") from err +``` + +**Mitigation:** +Log full tracebacks internally, return generic messages externally. + +--- + +#### L3: No Input Length Limits on Prompt + +**Location:** `src/synapse_os/control_plane/models.py` +**Severity:** LOW + +**Description:** +`RunCreateRequest.prompt` has no maximum length validation, enabling memory exhaustion attacks. + +--- + +#### L4: Workspace Cleanup Race Condition + +**Location:** `src/synapse_os/workspace.py:43-48` +**Severity:** LOW + +**Description:** +`reset_for_reuse()` iterates and deletes without locking: + +```python +for item in self.root.iterdir(): + if item.name != self.root.name: + if item.is_dir(): + shutil.rmtree(item) +``` + +**Risk:** Race condition during concurrent cleanup. + +--- + +#### L5: Missing Security Headers in FastAPI + +**Location:** `src/synapse_os/control_plane/server.py:42-47` +**Severity:** LOW + +**Description:** +No security headers (HSTS, CSP, X-Frame-Options, etc.) configured. + +--- + +## 3. Gestão de Secrets + +### Current Implementation + +| Aspect | Status | Details | +| -------------------- | ------ | ---------------------------------------------------------- | +| Token Storage | Good | SHA256 hashes only, never plaintext (auth.py:267-268) | +| Token Comparison | Good | `hmac.compare_digest()` for constant-time (auth.py:214) | +| API Keys in Adapters | Poor | Read from env, no rotation mechanism | +| Secret Masking | Good | Configurable regex patterns (security.py:11-16) | +| File Permissions | Good | `0o600` for files, `0o700` for dirs (persistence.py:47-48) | + +### Secrets Identified in Code + +| Secret | Location | Storage Method | Risk | +| ---------------- | -------------------- | --------------------------- | ---------------------------- | +| GitHub Token | `adapters.py` (env) | `SYNAPSE_OS_GITHUB_TOKEN` | Medium - Env exposure | +| Gemini API Key | `adapters.py` (env) | `SYNAPSE_OS_GEMINI_API_KEY` | Medium - Env exposure | +| API Bearer Token | `middleware.py` | In-memory/config | Medium - Single shared token | +| Claude API Key | `.github/workflows/` | `secrets.CLAUDE_API_KEY` | Low - GitHub Secrets | + +### Recommendations + +1. **Implement secret rotation mechanism** for API keys +2. **Use Docker secrets or external vault** (HashiCorp Vault, AWS Secrets Manager) +3. **Add audit logging** for all token usage +4. **Implement token expiration** for issued tokens + +--- + +## 4. Deps com Vulnerabilidades Conhecidas + +### Dependency Analysis + +| Package | Version | CVE Status | Risk | +| ------------------- | --------- | ---------------- | ---------------------------- | +| FastAPI | >=0.115.0 | No known CVEs | Low | +| SQLAlchemy | >=2.0.36 | No critical CVEs | Low | +| Typer | >=0.12.5 | No known CVEs | Low | +| Pydantic | >=2.9.2 | No critical CVEs | Low | +| python-statemachine | >=2.5.0 | **Unknown** | Medium - Less common package | +| textual | >=8.1.1 | No known CVEs | Low | + +### Supply Chain Risks + +1. **Entry Points System** (plugins.py): Loads code from any installed package +2. **CLI Adapters**: Execute external commands (`gh`, `docker`, custom scripts) +3. **No dependency pinning in requirements**: Uses `>=` version constraints + +### Recommendations + +```bash +# Run dependency audit +pip install safety +safety check -r requirements.txt + +# Consider pinning exact versions +pip freeze > requirements-lock.txt +``` + +--- + +## 5. CI/CD e Automações + +### GitHub Workflows Analysis + +| Workflow | Privileges | Issues | Risk | +| --------------------- | ---------------------------------------- | ------------------------------- | ---------- | +| `security-review.yml` | `pull-requests: write`, `contents: read` | Uses third-party action `@main` | **MEDIUM** | +| `operational-ci.yml` | `contents: read` | None identified | Low | +| `container-build.yml` | `contents: read` | None identified | Low | + +### Security Gate Analysis + +**Location:** `scripts/security-gate.sh` + +**Strengths:** + +- Checks for `permissions:` in workflows +- Blocks `eval` usage in scripts +- Blocks `curl | sh` patterns +- Blocks privileged containers +- Blocks docker.sock mounting + +**Gaps:** + +- No check for action pinning (using `@main`, `@v1` instead of commit SHA) +- No check for secret leakage in logs +- No check for workflow injection via `pull_request_target` + +### Scripts Analysis + +| Script | Privilege Escalation | Injection Risk | Safe | +| --------------------- | -------------------- | -------------------------- | ---- | +| `dev-codex.sh` | No | Low - User-controlled args | Yes | +| `docker-preflight.sh` | No | Low | Yes | +| `security-gate.sh` | No | Low | Yes | +| `commit-check.sh` | No | Low | Yes | + +### Recommendations + +1. **Pin all GitHub Actions to commit SHAs:** + + ```yaml + # Instead of: + - uses: actions/checkout@v4 + + # Use: + - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2 + ``` + +2. **Add workflow validation** to security gate for: + - `pull_request_target` usage + - Unpinned actions + - `GITHUB_TOKEN` with write permissions + +--- + +## 6. Recomendações Priorizadas + +### Immediate (P0 - Fix before release) + +1. **[H1] Sanitize prompts in CLI adapters** (adapters.py) + - Use `shlex.quote()` or stdin-based passing + - Effort: 2 hours + - Macro: `fix-feature` + +2. **[H2] Fix Gemini adapter code injection** (adapters.py:315-322) + - Pass prompt via stdin or env var + - Effort: 1 hour + - Macro: `fix-feature` + +3. **[H3] Implement plugin allowlist** (plugins.py:95-108) + - Add signature verification + - Effort: 4 hours + - Macro: `fix-feature` + +### Short-term (P1 - Fix within 2 weeks) + +4. **[H4] Add input validation for API spec creation** (server.py:225-240) + - Validate prompt length and content + - Effort: 2 hours + - Macro: `fix-feature` + +5. **[M1] Implement per-principal API tokens** + - Extend auth registry to support API tokens + - Effort: 4 hours + - Macro: `fix-feature` + +6. **[M2] Add rate limiting** (server.py) + - Implement per-endpoint rate limits + - Effort: 3 hours + - Macro: `fix-feature` + +7. **[M6] Support file-based secrets** + - Read secrets from `/run/secrets/` or similar + - Effort: 2 hours + - Macro: `fix-feature` + +### Medium-term (P2 - Next sprint) + +8. **[M3] Strengthen process identity verification** + - Use abstract sockets or pidfile locks + - Effort: 4 hours + - Macro: `fix-feature` + +9. **[M5] Fix artifact path traversal** + - Use `O_NOFOLLOW` or pre-resolve paths + - Effort: 2 hours + - Macro: `fix-feature` + +10. **Pin GitHub Actions to commit SHAs** + - Update all workflows + - Effort: 1 hour + - Macro: `ci-automation` + +11. **Add dependency scanning to CI** + - Integrate `safety` or `pip-audit` + - Effort: 2 hours + - Macro: `ci-automation` + +--- + +## 7. Próximos Passos + +### Immediate Actions + +1. **Open `fix-feature` branches for:** + - `fix/adapter-command-injection` (H1, H2) + - `fix/plugin-allowlist` (H3) + - `fix/api-input-validation` (H4) + +2. **Security regression tests to add:** + + ```python + # test_security.py additions + - test_prompt_injection_codex_adapter() + - test_prompt_injection_gemini_adapter() + - test_plugin_unauthorized_load() + - test_api_prompt_path_traversal() + ``` + +3. **CI hardening:** + - Pin all actions in `.github/workflows/` + - Add dependency scanning step + - Add secret scanning with `truffleHog` + +### Documentation Updates + +1. Update `docs/architecture/SDD.md` with: + - Security boundary definitions + - Trust boundaries diagram + - Plugin security model + +2. Update `AGENTS.md` with: + - Security review requirements for adapters + - Plugin development guidelines + +### Ongoing Security Practices + +1. **Quarterly security audits** using this same methodology +2. **Dependency scanning** on every PR via CI +3. **Secret rotation** every 90 days +4. **Penetration testing** before major releases + +--- + +## Appendices + +### A. Files Reviewed + +``` +src/synapse_os/control_plane/server.py +src/synapse_os/control_plane/middleware.py +src/synapse_os/auth.py +src/synapse_os/adapters.py +src/synapse_os/plugins.py +src/synapse_os/supervisor.py +src/synapse_os/memory.py +src/synapse_os/security.py +src/synapse_os/workspace.py +src/synapse_os/config.py +src/synapse_os/runtime/service.py +src/synapse_os/multi_agent.py +src/synapse_os/pipeline.py +src/synapse_os/persistence.py +scripts/security-gate.sh +scripts/dev-codex.sh +scripts/docker-preflight.sh +.github/workflows/security-review.yml +.github/workflows/operational-ci.yml +.github/workflows/container-build.yml +pyproject.toml +``` + +### B. Tools Used + +- Manual code review +- Pattern matching for security anti-patterns +- Architecture mapping +- STRIDE threat modeling (implicit) + +### C. Limitations + +1. Dynamic analysis not performed (no runtime testing) +2. Dependency CVE scan not executed (requires `safety` or `pip-audit`) +3. Container security scan not performed +4. Network-level testing not performed +5. Fuzzing not performed on input validation + +--- + +**Report Generated:** 2026-04-01 +**Security Review Status:** `SECURITY_PASS_WITH_NOTES` +**Next Audit Due:** 2026-07-01 + +--- + +_This report was generated by the security-audit skill following SynapseOS security review protocols._ diff --git a/docs/adr/003-state-machine-pipeline-engine.md b/docs/adr/003-state-machine-pipeline-engine.md index 46b1aff..6592e76 100644 --- a/docs/adr/003-state-machine-pipeline-engine.md +++ b/docs/adr/003-state-machine-pipeline-engine.md @@ -1,29 +1,47 @@ # ADR-003 — Adotar state machine + Synapse-Flow ## Status -Aceito + +Aceito (atualizado para DAG pipeline) ## Contexto -O SynapseOS precisa coordenar uma esteira com estados explícitos, retries, rollback lógico, hand-offs auditáveis e futura evolução para DAG. Scripts lineares isolados comprometeriam rastreabilidade, manutenção e controle fino do domínio. + +O SynapseOS precisa coordenar uma esteira com estados explícitos, retries, rollback lógico, hand-offs auditáveis e execução paralela. Scripts lineares isolados comprometeriam rastreabilidade, manutenção e controle fino do domínio. ## Decisão + O sistema adotará: + - **state machine** para governar estados e transições; -- o **Synapse-Flow**, a **engine própria de pipeline** do SynapseOS, em Python para coordenar os steps, hand-offs, retries e integração com o supervisor. +- o **Synapse-Flow**, a **engine própria de pipeline** do SynapseOS, em Python para coordenar os steps, hand-offs, retries e integração com o supervisor; +- **DAG pipeline execution** com suporte a: + - steps com dependências explícitas; + - execução paralela de steps independentes via `asyncio.gather`; + - fan-out/fan-in para steps que precisam aguardar múltiplas dependências; + - detecção de ciclos no grafo de dependências. + +O Synapse-Flow mantém compatibilidade com pipelines lineares (DAG de 1 caminho) enquanto evolui para execução paralela real. ## Consequências + ### Positivas + - transições explícitas e auditáveis; - forte aderência ao domínio do produto; -- menor complexidade operacional no MVP do que orquestradores pesados; -- caminho claro para evolução futura para DAG e paralelismo. +- execução paralela reduz tempo total de runs com steps independentes; +- caminho claro para evolução futura para workers distribuídos; +- modelo de dependências explícitas melhora documentação e rastreabilidade. ### Negativas + - maior responsabilidade de implementação interna; -- necessidade de testes rigorosos do Synapse-Flow. +- necessidade de testes rigorosos do Synapse-Flow; +- complexidade adicional de scheduling paralelo e sincronização; +- detecção de ciclos e validação de DAG adicionam overhead. ## Alternativas consideradas -- pipeline linear hardcoded; -- Prefect desde o MVP; -- Temporal desde o MVP; -- fila sem modelagem explícita de estado. + +- pipeline linear hardcoded: rejeitado — não aproveita paralelismo; +- Prefect desde o MVP: rejeitado — complexidade operacional prematura; +- Temporal desde o MVP: rejeitado — overkill para estado atual; +- fila sem modelagem explícita de estado: rejeitado — perde rastreabilidade. diff --git a/docs/adr/004-cli-adapter-layer.md b/docs/adr/004-cli-adapter-layer.md index 086364a..c49b704 100644 --- a/docs/adr/004-cli-adapter-layer.md +++ b/docs/adr/004-cli-adapter-layer.md @@ -1,26 +1,51 @@ -# ADR-004 — Implementar uma camada de abstração para adapters CLI +# ADR-004 — Implementar uma camada de abstração para adapters CLI com Multi-Agent Orchestration ## Status -Aceito + +Aceito (atualizado para multi-agent) ## Contexto -Cada ferramenta externa difere em sintaxe de comando, comportamento operacional, timeouts, autenticação e formato de output. + +Cada ferramenta externa difere em sintaxe de comando, comportamento operacional, timeouts, autenticação e formato de output. Com múltiplos agentes disponíveis, o sistema precisa: + +- Registrar adapters dinamicamente; +- Rotear tarefas para o agente mais adequado baseado em capability; +- Suportar fallback entre agentes quando um falha. ## Decisão -Criar uma camada dedicada de adapters CLI com interface compartilhada e implementações específicas por ferramenta. + +Criar uma camada dedicada de adapters CLI com: + +1. **Interface compartilhada** (`BaseCLIAdapter`) para execução padronizada; +2. **AdapterRegistry**: registro dinâmico de adapters disponíveis; +3. **CapabilityRouter**: roteamento baseado em capabilities declaradas por cada adapter: + - Cada adapter declara capabilities (ex: `code_generation`, `refactoring`, `testing`); + - Router seleciona adapter mais adequado para a tarefa; + - Suporte a fallback automático em caso de falha; + - Política configurável (custo, latência, qualidade); +4. **Implementações específicas por ferramenta**: Gemini, Codex, Copilot, OpenCode, DeepSeek, Claude, LLMs locais. ## Consequências + ### Positivas + - desacoplamento entre orquestrador e ferramentas; - melhor testabilidade; -- extensão simplificada para novos agentes; -- centralização de políticas de execução. +- extensão simplificada para novos agentes (apenas registrar no AdapterRegistry); +- centralização de políticas de execução; +- roteamento inteligente permite otimizar custo/qualidade por tarefa; +- resiliência via fallback entre múltiplos agentes. ### Negativas + - necessidade de manutenção contínua dos adapters; -- risco de abstração ruim esconder comportamentos úteis específicos de uma ferramenta. +- risco de abstração ruim esconder comportamentos úteis específicos; +- complexidade adicional do CapabilityRouter (decisão de roteamento); +- necessidade de mapear capabilities de forma consistente entre adapters. ## Alternativas consideradas -- chamadas diretas a subprocess espalhadas pelo código; -- adapter único parametrizado para tudo; -- scripts shell externos como wrappers. + +- chamadas diretas a subprocess espalhadas pelo código: rejeitado — sem abstração; +- adapter único parametrizado para tudo: rejeitado — não lida com diferenças semânticas; +- scripts shell externos como wrappers: rejeitado — difícil testar e manter; +- roteamento hardcoded por tarefa: rejeitado — não é extensível. diff --git a/docs/adr/005-semantic-memory.md b/docs/adr/005-semantic-memory.md index 1525f40..1ba857f 100644 --- a/docs/adr/005-semantic-memory.md +++ b/docs/adr/005-semantic-memory.md @@ -1,26 +1,50 @@ -# ADR-005 — Implementar memória semântica com papel advisory no MVP +# ADR-005 — Implementar memória semântica com papel advisory no MVP e indexing ## Status -Aceito + +Aceito (atualizado para indexing) ## Contexto -A memória semântica pode futuramente influenciar roteamento e planejamento, mas isso aumenta risco de comportamento pouco previsível e difícil de explicar no primeiro release. + +A memória semântica pode futuramente influenciar roteamento e planejamento, mas isso aumenta risco de comportamento pouco previsível. Além disso, o volume de artefatos gerados demanda indexação eficiente para consultas rápidas. ## Decisão -No MVP, a memória semântica será implementada com papel **advisory/read-only**. Ela servirá para apoio de execução, enriquecimento de contexto e análise posterior, sem alterar automaticamente o roteamento. + +No MVP, a memória semântica será implementada com: + +1. **Papel advisory/read-only**: apoio de execução, enriquecimento de contexto e análise posterior, sem alterar automaticamente o roteamento; +2. **IndexedArtifactStore**: armazenamento de artefatos com índices para consulta rápida: + - Índice por run_id, step_id, tipo de artefato; + - Índice por timestamp para consultas temporais; + - Índice por hash de conteúdo para deduplicação; +3. **Namespacing**: isolamento de memória por: + - Workspace (diferentes projetos); + - Run (contexto de execução); + - Step (contexto de step específico); + - Global (padrões compartilhados entre runs). ## Consequências + ### Positivas -- comportamento mais previsível; + +- comportamento mais previsível com memória advisory; - melhor auditabilidade; -- menor risco de decisões automáticas ruins; -- possibilidade de aprender com histórico sem automatizar cedo demais. +- consultas rápidas a artefatos históricos via índices; +- deduplicação automática reduz storage; +- namespacing permite contextos isolados e seguros; +- base para futura evolução de memória semântica com roteamento. ### Negativas + - menos adaptação automática no curto prazo; -- supervisor determinístico continua responsável pelas decisões principais. +- overhead de manutenção de índices; +- complexidade de gerenciamento de namespaces; +- necessidade de estratégia de expiração/limpeza de índices antigos. ## Alternativas consideradas -- operação totalmente stateless; -- memória semântica com roteamento automático desde o MVP; -- apenas logs sem sumarização semântica. + +- operação totalmente stateless: rejeitado — perde aprendizado; +- memória semântica com roteamento automático desde o MVP: rejeitado — risco prematuro; +- apenas logs sem sumarização semântica: rejeitado — perde valor analítico; +- armazenamento sem índice: rejeitado — escalabilidade ruim com volume; +- índice único global: rejeitado — sem isolamento de contexto. diff --git a/docs/adr/014-http-control-plane.md b/docs/adr/014-http-control-plane.md new file mode 100644 index 0000000..faa0c92 --- /dev/null +++ b/docs/adr/014-http-control-plane.md @@ -0,0 +1,60 @@ +# ADR-014 — Adotar HTTP Control Plane com FastAPI + +## Status + +Aceito + +## Contexto + +O SynapseOS opera primariamente via CLI com runtime dual (CLI efêmero + worker residente), mas precisa de uma interface programática para: + +- Integração com ferramentas externas que preferem APIs REST; +- Monitoramento e observabilidade remota das runs; +- Trigger de runs via webhooks; +- Consulta de estado e artefatos sem acesso direto ao filesystem. + +A arquitetura atual é state-driven com state machine explícita (ADR-003) e Synapse-Flow como engine de pipeline, tornando natural expor estados e transições via API. + +## Decisão + +Adotar um **HTTP Control Plane** usando FastAPI como camada de interface REST sobre o Synapse-Flow. + +Componentes: + +- **FastAPI** como framework web (async nativo, validação Pydantic, OpenAPI automático); +- **REST API design** com recursos principais: `/health`, `/api/v1/runs`, `/api/v1/runtime/status`, `/api/v1/artifacts/{run_id}`; +- **Async handlers** para não bloquear o event loop do worker; +- **State machine projection** — estados internos expostos como endpoints de consulta; +- **Auth middleware** com Bearer token (`SYNAPSE_API_TOKEN`), health check é público. + +O HTTP Control Plane é uma **camada opcional** — o sistema continua funcionando 100% via CLI sem a API ativa. A API é ativada via comando explícito `synapse control-plane start`. + +## Consequências + +### Positivas + +- Permite integração com sistemas externos que esperam APIs REST; +- Facilita observabilidade e dashboards sem acesso ao host; +- Async/await alinhado com o modelo async do Synapse-Flow; +- OpenAPI/Swagger gerado automaticamente para documentação; +- Separação clara: lógica de negócio no Synapse-Flow, protocolo HTTP na camada de controle. + +### Negativas + +- Adiciona dependência FastAPI + Uvicorn; +- Requer modelagem explícita de DTOs para evitar expor objetos internos; +- Risco de acoplamento se lógica de negócio vazar para handlers HTTP; +- Necessidade de autenticação/autorização para exposição em rede. + +## Alternativas consideradas + +- **gRPC**: rejeitado — maior complexidade, necessidade de proto files, menor aderência a integrações simples; +- **GraphQL**: rejeitado — overkill para MVP, complexidade de resolvers e N+1 queries; +- **Sem API HTTP**: rejeitado — limitaria integrações e observabilidade remota; +- **Flask/Sanic**: rejeitado — FastAPI tem melhor suporte a async, tipagem e documentação automática. + +## Relação com ADRs existentes + +- ADR-003 (state-machine-pipeline-engine): API reflete estados da state machine; +- ADR-009 (runtime-dual): API é interface do worker residente leve; +- ADR-010 (synapse-flow-name): API expõe operações do Synapse-Flow. diff --git a/docs/adr/015-plugin-system.md b/docs/adr/015-plugin-system.md new file mode 100644 index 0000000..c58fee2 --- /dev/null +++ b/docs/adr/015-plugin-system.md @@ -0,0 +1,73 @@ +# ADR-015 — Adotar Plugin System com HookSpec e Entry Point Discovery + +## Status + +Aceito + +## Contexto + +O SynapseOS precisa ser extensível sem modificar o core. Adapters CLI (ADR-004) resolvem integração com ferramentas externas, mas o sistema precisa de mecanismos para: + +- Plugins de terceiros estenderem comportamento (novos parsers, novos tipos de step); +- Hooks em pontos específicos da pipeline (pré/pós execução, transformação de artefatos); +- Discovery automático de extensões instaladas via pip/entry points. + +A arquitetura state-driven do Synapse-Flow (ADR-003) possui pontos bem definidos onde hooks podem ser injetados sem comprometer o fluxo principal. + +## Decisão + +Adotar um **Plugin System** baseado em: + +1. **HookSpec**: contratos declarativos usando `pluggy` (sistema de hooks do pytest); +2. **Entry point discovery**: plugins registrados via `pyproject.toml` `[project.entry-points."synapseos.hooks"]`; +3. **Hook points explícitos**: + - `pre_step_execute`: antes de executar um step; + - `post_step_execute`: após execução, antes do parsing; + - `pre_artifact_persist`: antes de persistir artefato; + - `post_run_complete`: ao finalizar run com sucesso; + - `on_run_failed`: quando run falha (para cleanup ou notificação). + +Regras: + +- Hooks são **opcionais** — sistema funciona sem plugins; +- Hooks podem **modificar** contexto (mutável) ou apenas **observar** (readonly); +- Falha em hook não quebra pipeline (log + continua), exceto hooks críticos explicitamente marcados; +- Plugins são carregados uma vez no boot do Synapse-Flow. + +## Consequências + +### Positivas + +- Extensibilidade sem fork do core; +- Ecossistema permitido: comunidade pode criar plugins sem PRs no repo principal; +- `pluggy` é battle-tested (usado no pytest), bem documentado; +- Entry points são padrão Python, sem magia de import dinâmico; +- Hooks bem definidos permitem instrumentação, métricas, notificações customizadas. + +### Negativas + +- Nova dependência (`pluggy`); +- Surface de ataque aumentada — plugins maliciosos podem executar código arbitrário; +- Debugging mais complexo quando múltiplos plugins interagem; +- Necessidade de versionamento de HookSpec (breaking changes em hooks); +- Overhead de carregamento de plugins no startup. + +## Alternativas consideradas + +- **Import dinâmico de módulos**: rejeitado — menos estruturado, risco de side effects no import; +- **Sistema de hooks próprio**: rejeitado — reinventar roda, `pluggy` já resolve bem; +- **Arquitetura de microserviços**: rejeitado — overkill, aumentaria complexidade operacional; +- **Config-based plugin loading**: rejeitado — entry points são mais idiomáticos em Python. + +## Segurança + +Plugins executam com os mesmos privilégios do Synapse-Flow. Recomendações: + +- Documentar que plugins são código arbitrário — só instalar de fontes confiáveis; +- Futuro: considerar sandboxing ou assinatura de plugins. + +## Relação com ADRs existentes + +- ADR-004 (cli-adapter-layer): plugins podem adicionar novos adapters dinamicamente; +- ADR-003 (state-machine-pipeline-engine): hooks são invocados em transições de estado; +- ADR-014 (http-control-plane): plugins podem expor endpoints customizados na API. diff --git a/docs/architecture/SDD.md b/docs/architecture/SDD.md index 141a997..20c783e 100644 --- a/docs/architecture/SDD.md +++ b/docs/architecture/SDD.md @@ -3,9 +3,11 @@ ## 1. Visão Geral ### 1.1 Propósito + SynapseOS é um meta-orquestrador de agentes de IA via CLI. Seu papel é coordenar múltiplas ferramentas externas de IA, organizar hand-offs entre etapas de uma esteira controlada e produzir artefatos de software com rastreabilidade, resiliência e baixo custo operacional. ### 1.2 Objetivos + - Orquestrar ferramentas de IA via CLI de forma uniforme. - Executar pipelines autônomos de desenvolvimento de software. - Isolar contexto entre etapas e agentes. @@ -16,9 +18,11 @@ SynapseOS é um meta-orquestrador de agentes de IA via CLI. Seu papel é coorden - Permitir evolução futura para paralelismo, DAG real e workers distribuídos. ### 1.3 Escopo + O sistema recebe uma tarefa, produz uma especificação estruturada, planeja sua execução, chama agentes externos por subprocess, limpa suas saídas, valida os artefatos, reage a falhas, persiste memória de execução e gera um relatório final por run. ### 1.4 Fora do escopo inicial + - Cluster distribuído completo. - Suporte nativo a Windows/macOS como plataforma principal. - Interface web completa. @@ -28,6 +32,7 @@ O sistema recebe uma tarefa, produz uma especificação estruturada, planeja sua --- ## 2. Premissas do MVP + - Linguagem principal: **Python 3.12+**. - Execução principal: **CLI-first**. - Runtime do MVP: **dual**, com CLI efêmero e worker/daemon residente leve. @@ -42,6 +47,7 @@ O sistema recebe uma tarefa, produz uma especificação estruturada, planeja sua --- ## 3. Princípios Arquiteturais + 1. **CLI-first**: integrações externas devem passar por adapters padronizados. 2. **Spec-first**: a demanda bruta deve ser transformada em especificação verificável antes do planejamento executivo. 3. **State-driven orchestration**: o fluxo deve ser auditável por máquina de estados e evolutivo para DAG. @@ -58,23 +64,29 @@ O sistema recebe uma tarefa, produz uma especificação estruturada, planeja sua ### 4.1 Camadas principais #### Camada de Orquestração + Responsável por estado, pipeline, supervisão, memória e decisão. Componentes: + - Orchestrator Engine - Synapse-Flow - State Machine Manager -- Pipeline Manager -- Adaptive Supervisor -- Memory Engine +- Pipeline Manager (DAG execution) +- Adaptive Supervisor (per-step policies, backoff) +- Memory Engine (IndexedArtifactStore, namespacing) - Spec Engine - Runtime Coordinator #### Camada de Adapters + Responsável por integração com ferramentas externas via CLI. Componentes: + - Base CLI Adapter +- AdapterRegistry +- CapabilityRouter - Gemini Adapter - Codex Adapter - Copilot Adapter @@ -83,10 +95,34 @@ Componentes: - Claude Adapter - Local LLM Adapter +#### Camada de Controle HTTP + +Responsável por expor API REST para integração externa. + +Componentes: + +- FastAPI Application +- REST Controllers (/runs, /steps, /artifacts, /agents) +- Async Handlers +- Webhook Dispatcher + +#### Camada de Extensão + +Responsável por permitir extensões sem modificar o core. + +Componentes: + +- Plugin Loader (entry point discovery) +- Hook Manager (pluggy) +- HookSpec Definitions +- Plugin Registry + #### Camada de Execução Autônoma + Conjunto de ferramentas externas executadas sob demanda. #### Camada de Persistência e Observabilidade + Responsável por persistir runs, steps, artefatos, eventos e relatórios. --- @@ -100,6 +136,7 @@ SPEC → TEST_RED → CODE_GREEN → REFACTOR → QUALITY_GATE → SECURITY_REVI ``` Regras: + - `DOCKER_PREFLIGHT` é executado pela skill `repo-preflight` quando a feature exigir validação prática em Docker. - Em CI e no fluxo local, o `DOCKER_PREFLIGHT` padrão é leve: `compose config` sem `up`; build fica explícito quando necessário. - O runtime completo em container fica reservado para workflow dedicado ou acionamento explícito em features que toquem boot, ciclo de vida, persistência ou integração. @@ -116,22 +153,24 @@ O macroestágio `SPEC` do fluxo oficial engloba `SPEC_DISCOVERY`, `SPEC_NORMALIZ ### 5.3 Mapeamento macro ↔ estados internos -| Macroestágio (fluxo oficial) | Estados internos do Synapse-Flow | -|---|---| -| `SPEC` | `SPEC_DISCOVERY` → `SPEC_NORMALIZATION` → `SPEC_VALIDATION` | -| `TEST_RED` | `PLAN` → `TEST_RED` | -| `CODE_GREEN` | `CODE_GREEN` | -| `REFACTOR` | parte de `CODE_GREEN` (sem estado dedicado no MVP) | -| `QUALITY_GATE` | `QUALITY_GATE` | -| `SECURITY_REVIEW` | `SECURITY` | -| `REPORT` | `DOCUMENT` | -| `COMMIT` | pós-`COMPLETE` (fora da state machine, ação do operador) | -| — | `FAILED` (acessível de qualquer estado não-terminal) | +| Macroestágio (fluxo oficial) | Estados internos do Synapse-Flow | +| ---------------------------- | ----------------------------------------------------------- | +| `SPEC` | `SPEC_DISCOVERY` → `SPEC_NORMALIZATION` → `SPEC_VALIDATION` | +| `TEST_RED` | `PLAN` → `TEST_RED` | +| `CODE_GREEN` | `CODE_GREEN` | +| `REFACTOR` | parte de `CODE_GREEN` (sem estado dedicado no MVP) | +| `QUALITY_GATE` | `QUALITY_GATE` | +| `SECURITY_REVIEW` | `SECURITY` | +| `REPORT` | `DOCUMENT` | +| `COMMIT` | pós-`COMPLETE` (fora da state machine, ação do operador) | +| — | `FAILED` (acessível de qualquer estado não-terminal) | ### 5.3 Motivação da etapa SPEC + A etapa de especificação transforma intenção em contrato operacional. Ela reduz ambiguidade entre agentes, melhora a geração de testes, aumenta a previsibilidade do parsing e permite validar aderência entre requisito, teste, código e documentação. ### 5.4 Regras da etapa SPEC + - A entrada é a tarefa bruta do usuário. - A saída é uma SPEC híbrida validável. - A pipeline não avança para `PLAN` sem validação mínima da SPEC. @@ -142,7 +181,9 @@ A etapa de especificação transforma intenção em contrato operacional. Ela re ## 6. Modelo Runtime ### 6.1 Modo CLI efêmero + Usado para: + - executar ou validar `DOCKER_PREFLIGHT` antes do trabalho prático dependente de Docker, - iniciar runs, - executar runs curtas inline, @@ -150,7 +191,9 @@ Usado para: - disparar jobs para execução posterior. ### 6.2 Worker/daemon residente leve + Usado para: + - consumir runs pendentes, - executar o Synapse-Flow, - aplicar retries longos, @@ -158,6 +201,7 @@ Usado para: - gerar artefatos e relatório final. ### 6.3 Motivação do runtime dual + O runtime dual permite preservar a experiência CLI e, ao mesmo tempo, suportar tarefas longas sem bloquear o operador. Também prepara o sistema para crescimento futuro sem obrigar adoção imediata de uma infraestrutura pesada de filas distribuídas. --- @@ -220,12 +264,15 @@ O runtime dual permite preservar a experiência CLI e, ao mesmo tempo, suportar ## 8. Módulos Principais ### 8.1 Orchestrator Engine + Coordena a execução ponta a ponta, cria o contexto da run, invoca o Synapse-Flow e consolida resultados. ### 8.2 State Machine Manager + Modela e valida estados e transições. Estados implementados no MVP: + - `REQUEST` - `SPEC_DISCOVERY` - `SPEC_NORMALIZATION` @@ -243,22 +290,35 @@ Estados implementados no MVP: > **Pós-MVP (não implementados)**: `INIT` e `RETRYING` estão reservados para versões futuras onde o estado de inicialização precisa de rastreamento explícito e retries têm estado próprio na máquina. ### 8.3 Pipeline Manager -Executa a sequência de steps. No MVP, a esteira é linear; no futuro, deve suportar DAG com fan-out/fan-in. + +Executa a sequência de steps com suporte a DAG. + +Funcionalidades: + +- Execução linear (pipeline tradicional); +- Execução paralela de steps independentes via `asyncio.gather`; +- Fan-out/fan-in para steps com múltiplas dependências; +- Detecção de ciclos no grafo de dependências; +- Scheduling otimizado baseado em profundidade do DAG. ### 8.4 Spec Engine + Responsável por: + - converter a demanda bruta em especificação operacional; - normalizar linguagem, escopo, critérios de aceite e restrições; - validar schema e completude mínima; - produzir artefatos estruturados para planejamento e testes. Subcomponentes sugeridos: + - `spec_discovery` - `spec_normalizer` - `spec_validator` - `spec_repository` ### 8.5 CLI Adapter Layer + Abstrai a execução das ferramentas externas. Contrato mínimo (implementado em `src/synapse_os/contracts.py`): @@ -279,7 +339,9 @@ class CLIExecutionResult: ``` ### 8.6 Parsing Engine + Transforma saídas ruidosas em artefatos estruturados. Deve operar em múltiplas fases: + 1. normalização textual, 2. limpeza via regex, 3. extração de blocos relevantes, @@ -287,24 +349,47 @@ Transforma saídas ruidosas em artefatos estruturados. Deve operar em múltiplas 5. fallback heurístico. Validações adicionais: + - `ast.parse()` para código Python, - Pydantic para contratos internos, - JSON Schema para SPEC. ### 8.7 Memory Engine -Armazena histórico operacional e memória semântica. + +Armazena histórico operacional e memória semântica com indexação. + +#### IndexedArtifactStore + +Armazenamento estruturado com índices: + +- Índice por run_id, step_id, tipo de artefato; +- Índice por timestamp para consultas temporais; +- Índice por hash de conteúdo para deduplicação; +- Consultas rápidas via índices secundários. + +#### Namespacing + +Isolamento de contexto: + +- **Workspace**: projetos diferentes; +- **Run**: contexto de execução específica; +- **Step**: contexto de step individual; +- **Global**: padrões compartilhados. #### Memória operacional + - runs, - steps, - eventos, - falhas, - retries, - ferramentas usadas, -- artefatos gerados. +- artefatos gerados (via IndexedArtifactStore). #### Memória semântica -No MVP, tem papel de apoio: + +Papel advisory no MVP: + - registrar padrões úteis, - anotar heurísticas, - resumir falhas recorrentes, @@ -312,18 +397,23 @@ No MVP, tem papel de apoio: - apoiar análise posterior. ### 8.8 Adaptive Supervisor + Monitora a run e decide sobre: -- retry, -- reroute, + +- retry (com backoff exponencial), +- reroute para outro agente via CapabilityRouter, - rollback lógico, - falha terminal, - reexecução com prompt mais restritivo, -- retorno para etapa anterior em caso de rejeição ou inconsistência. +- retorno para etapa anterior em caso de rejeição ou inconsistência, +- per-step policies (configuração por tipo de step). ### 8.9 Runtime Coordinator + Coordena a diferença entre execução inline via CLI e execução assíncrona via worker residente. Responsabilidades: + - criar runs pendentes, - aplicar locking, - retomar runs, @@ -331,11 +421,13 @@ Responsabilidades: - consolidar estado final. ### 8.10 Synapse-Flow + O Synapse-Flow é a engine própria de pipeline do SynapseOS. Ele coordena os estados internos da run, os hand-offs entre steps, o encadeamento `SPEC → TEST_RED → CODE_GREEN → REFACTOR → SECURITY_REVIEW → REPORT` e a integração com supervisor, memória e adapters. --- ## 9. Fluxo de Dados + 1. `repo-preflight` valida o `DOCKER_PREFLIGHT` quando a feature exige execução prática. 2. Usuário envia uma tarefa. 3. O CLI cria ou dispara uma run. @@ -348,6 +440,7 @@ O Synapse-Flow é a engine própria de pipeline do SynapseOS. Ele coordena os es 10. Ao final, é gerado um `RUN_REPORT.md`. ### 9.1 Artefatos principais por run + - `SPEC.md` — especificação validada - `PLAN.md` — plano gerado pelo step PLAN - `TESTS_RED.md` ou arquivos de teste — gerados no step TEST_RED @@ -365,16 +458,19 @@ O Synapse-Flow é a engine própria de pipeline do SynapseOS. Ele coordena os es ## 10. Persistência ### 10.1 MVP + - **SQLite** para metadados operacionais. - Arquivos em disco para artefatos (`raw`, `clean`, `spec`, `plan`, `tests`, `code`, `review`, `docs`, `report`). ### 10.2 Evolução futura + - PostgreSQL para concorrência maior. - pgvector ou vector DB dedicado quando a memória semântica evoluir. --- ## 11. Tratamento de Erros + - **Falhas de CLI**: detectar binário ausente, erro de autenticação, exit code != 0. - **Timeouts**: encerrar subprocesso e marcar step como recuperável ou terminal. - **Parsing errors**: tentar reparse ou reexecução com prompt mais restritivo. @@ -387,9 +483,11 @@ O Synapse-Flow é a engine própria de pipeline do SynapseOS. Ele coordena os es ## 12. Observabilidade ### 12.1 Logs + Logs estruturados por run e step. Campos mínimos: + - `run_id` - `step` - `tool_name` @@ -401,6 +499,7 @@ Campos mínimos: > **Nota**: `parser_confidence` foi considerado mas **não está implementado no MVP**. O campo `ParsedOutput` não expõe score de confiança. Caso a heurística de parsing evolua, esse campo pode ser adicionado ao modelo em versão futura. ### 12.2 Relatório por execução + Cada run deve produzir: ```text @@ -408,6 +507,7 @@ artifacts//RUN_REPORT.md ``` Conteúdo mínimo: + - resumo da solicitação, - SPEC validada, - estados percorridos, @@ -420,6 +520,7 @@ Conteúdo mínimo: --- ## 13. Segurança e Isolamento + - O sistema roda em container da aplicação. - Agentes selecionados podem rodar em containers específicos. - Não usar `shell=True` por padrão. @@ -430,26 +531,32 @@ Conteúdo mínimo: --- ## 14. Escalabilidade e Evolução -### Curto prazo -- paralelizar alguns steps com `asyncio`; -- permitir worker residente consumir múltiplas runs; -- expandir o Synapse-Flow para DAG simples. + +### Curto prazo (em implementação) + +- paralelizar alguns steps com DAG pipeline — execução paralela implementada em `DAGExecutor`; +- permitir worker residente consumir múltiplas runs — Worker leve residente implementado; +- expandir o Synapse-Flow para DAG simples — DAG execution implementado com `DAGExecutor` e `DAGValidator`. ### Médio prazo -- DAG pipeline real; + - workers distribuídos; - PostgreSQL; -- vector memory. +- vector memory; +- roteamento automático por memória semântica. ### Longo prazo + - orquestração distribuída durável; - múltiplos workspaces/branches efêmeras por run; -- políticas adaptativas influenciadas por memória semântica. +- políticas adaptativas influenciadas por memória semântica; +- plugin ecosystem maduro. --- ## 15. Documentos Relacionados + - TDD do SynapseOS - template oficial de SPEC - documentação de stack e runtime -- ADR-001 a ADR-009 +- ADR-001 a ADR-015 diff --git a/features/F58-retry-policy-tests/SPEC.md b/features/F58-retry-policy-tests/SPEC.md new file mode 100644 index 0000000..bacc07e --- /dev/null +++ b/features/F58-retry-policy-tests/SPEC.md @@ -0,0 +1,54 @@ +--- +id: F58-retry-policy-tests +type: feature +summary: Criar suíte de testes dedicada para o módulo de supervisor/retry cobrindo decisões de retry, reroute, falhas terminais e retorno de REVIEW para CODE_GREEN. +inputs: + - Supervisor com max_retries configurável + - RetryableStepError para falhas recuperáveis + - ReviewRejectedError para rejeição de review +outputs: + - SupervisorDecision com action, next_state, route e reason + - Testes unitários cobrindo todos os caminhos de decisão +acceptance_criteria: + - Dado Supervisor(max_retries=2) e RetryableStepError em estado retryável com attempt <= max_retries, quando decide_after_failure é chamado, então action=retry com reason=retryable_failure_with_budget + - Dado Supervisor(max_retries=2) e RetryableStepError com attempt > max_retries e fallback_route disponível, quando decide_after_failure é chamado, então action=reroute com route=fallback + - Dado estado SPEC_VALIDATION com qualquer erro, quando decide_after_failure é chamado, então action=fail com reason=spec_validation_is_terminal + - Dado estado SECURITY com qualquer erro, quando decide_after_failure é chamado, então action=fail com reason=security_is_terminal + - Dado ReviewRejectedError em estado REVIEW, quando decide_after_review_rejection é chamado, então action=return_to_code_green com next_state=CODE_GREEN + - Dado RetryableStepError em estado não-retryável, quando decide_after_failure é chamado, então action=fail com reason=terminal_failure +non_goals: + - Testes de integração com pipeline real + - Testes de concorrência +--- + +# F58 — Retry Policy Tests + +# Contexto + +O módulo `src/synapse_os/supervisor.py` implementa o cérebro de recuperação de falhas do pipeline, decidindo entre retry, reroute ou falha terminal. Atualmente possui 4 testes indiretos em `test_supervisor.py`. Esta feature adiciona 6 testes dedicados cobrindo todos os caminhos de decisão. + +# Objetivo + +Criar suíte de testes dedicada para o supervisor cobrindo decisões de retry, reroute, falhas terminais e retorno de REVIEW para CODE_GREEN. + +# Critérios de Aceite + +- [ ] AC1: `test_supervisor_requests_retry_after_recoverable_step_failure` — retry com budget disponível +- [ ] AC2: `test_supervisor_reroutes_after_repeated_step_failures` — reroute após budget esgotado +- [ ] AC3: `test_supervisor_marks_terminal_failure_after_spec_validation_error` — SPEC_VALIDATION é terminal +- [ ] AC4: `test_supervisor_marks_terminal_failure_after_security_error` — SECURITY é terminal +- [ ] AC5: `test_supervisor_returns_to_code_green_after_review_rejection` — REVIEW → CODE_GREEN +- [ ] AC6: `test_supervisor_terminal_failure_when_no_fallback_route` — fail sem fallback +- [ ] AC7: `test_supervisor_retry_budget_exhausted_at_max_retries` — retry no limite exato +- [ ] AC8: `test_supervisor_reroute_when_budget_exceeded_with_fallback` — reroute com fallback disponível +- [ ] AC9: `test_supervisor_ignores_retryable_error_in_non_retryable_state` — estado não-retryável → fail +- [ ] AC10: `test_supervisor_decision_contains_correct_reason` — reason field correto em cada cenário + +# Design de Testes + +### Fixtures + +- `Supervisor` com max_retries=2 +- Estados retryáveis: PLAN, TEST_RED, CODE_GREEN +- Estados terminais: SPEC_VALIDATION, SECURITY +- Exceções: RetryableStepError, ReviewRejectedError, RuntimeError genérico diff --git a/features/F59-multi-agent-orchestration/SPEC.md b/features/F59-multi-agent-orchestration/SPEC.md new file mode 100644 index 0000000..cd35b41 --- /dev/null +++ b/features/F59-multi-agent-orchestration/SPEC.md @@ -0,0 +1,79 @@ +--- +id: F59-multi-agent-orchestration +type: feature +summary: Formalize adapter registry with capabilities and multi-agent session coordination +inputs: + - Existing BaseCLIAdapter implementations (Codex, Gemini) + - ToolSpec and capability contracts from runtime_contracts.py + - PipelineEngine with executor routing support +outputs: + - AdapterRegistry with capability-based routing + - Multi-agent session coordination in Synapse-Flow + - Capability-based task assignment logic + - Tests for registry, routing, and multi-agent coordination +acceptance_criteria: + - "AdapterRegistry deve registrar adapters por nome e expor capabilities consultaveis" + - "CapabilityRouter deve selecionar adapter adequado com base em capability requerida" + - "PipelineEngine deve suportar execucao de step com adapter selecionado por capability" + - "Multi-agent handoff deve registrar qual adapter executou qual step no contexto" + - "Fallback para adapter generico quando nenhum adapter especializado estiver disponivel" + - "Teste de integracao deve validar fluxo completo de registro + routing + execucao" +non_goals: + - Nao implementar comunicacao direta entre adapters (IPC, sockets) + - Nao adicionar novos adapters externos nesta feature + - Nao implementar load balancing ou escalabilidade horizontal +--- + +# Contexto + +O SynapseOS atualmente suporta apenas um adapter por execucao de pipeline. O `PipelineEngine` aceita executores configurados por estado, mas nao ha registro central de adapters nem selecao automatica baseada em capacidades. O SDD lista 8 adapters planejados (Codex, Gemini, Copilot, OpenCode, DeepSeek, Claude, Local LLM), mas apenas Codex e Gemini existem. + +A coordenacao multi-agent e um requisito fundamental do projeto: diferentes ferramentas de IA tem capacidades diferentes (code generation, planning, analysis, etc) e o Synapse-Flow deve saber qual adapter usar para cada tipo de tarefa. + +# Objetivo + +Criar um sistema de registro de adapters com capacidades explicitas e roteamento automatico baseado em capabilities, permitindo que o Synapse-Flow coordene multiplas ferramentas de IA dentro de uma mesma sessao de execucao. + +## Escopo tecnico + +1. **AdapterRegistry**: registro central de adapters disponiveis +2. **CapabilityRouter**: logica de selecao de adapter por capability requerida +3. **Integration com PipelineEngine**: execucao de steps com adapter selecionado automaticamente +4. **Handoff tracking**: registro de qual adapter executou qual step + +## Capacidades planejadas + +| Capability | Descricao | Adapters Candidatos | +| ------------------- | -------------------------- | ---------------------- | +| `cli_execution` | Execucao CLI generica | Todos | +| `code_generation` | Geracao de codigo | Codex, Copilot, Claude | +| `planning` | Planejamento e arquitetura | Gemini, Claude | +| `code_review` | Revisao de codigo | Claude, OpenCode | +| `security_analysis` | Analise de seguranca | Claude, DeepSeek | +| `local_execution` | Execucao local sem cloud | Local LLM | + +## Design proposto + +```python +# Adapter registry +class AdapterRegistry: + def register(self, adapter: BaseCLIAdapter) -> None + def get(self, name: str) -> BaseCLIAdapter | None + def list_all(self) -> list[BaseCLIAdapter] + def find_by_capability(self, capability: str) -> list[BaseCLIAdapter] + +# Capability router +class CapabilityRouter: + def __init__(self, registry: AdapterRegistry) + def select_adapter(self, required_capabilities: set[str]) -> BaseCLIAdapter | None + def get_best_match(self, required_capabilities: set[str]) -> BaseCLIAdapter | None +``` + +## Impacto no Synapse-Flow + +O Synapse-Flow (engine propria de pipeline do SynapseOS) passara a: + +1. Consultar o CapabilityRouter antes de cada step +2. Selecionar o adapter mais adequado para o tipo de tarefa +3. Registrar o adapter usado no contexto da run +4. Permitir fallback para adapter generico quando necessario diff --git a/features/F60-local-control-plane-foundation/SPEC.md b/features/F60-local-control-plane-foundation/SPEC.md new file mode 100644 index 0000000..21bfaa6 --- /dev/null +++ b/features/F60-local-control-plane-foundation/SPEC.md @@ -0,0 +1,183 @@ +--- +id: F60-local-control-plane-foundation +type: feature +summary: Local HTTP API layer exposing SynapseOS core operations programmatically via FastAPI on localhost. +inputs: + - SPEC.md with feature requirements + - Existing RunRepository, RuntimeService, ArtifactStore +outputs: + - FastAPI control plane server with REST endpoints + - Auth middleware for API token validation + - CLI commands for control plane management +acceptance_criteria: + - GET /health returns 200 with runtime status + - POST /api/v1/runs creates a run and returns 201 + - GET /api/v1/runs lists runs with pagination + - POST /api/v1/runs/{run_id}/cancel marks run as cancelled + - Auth middleware blocks unauthorized requests with 401 when API-token auth is enabled + - All unit tests pass +non_goals: + - WebSocket streaming + - External network binding + - Web dashboard +--- + +# Contexto + +Atualmente o SynapseOS só pode ser controlado via CLI (`synapse` command). Não existe interface programática para submeter runs remotamente, consultar status em tempo real, monitorar o runtime, cancelar runs ou listar artefatos gerados. + +# Objetivo + +Criar uma camada de API HTTP local (localhost-only) que exponha as operações core do SynapseOS de forma programática, permitindo integração com ferramentas externas sem depender exclusivamente da CLI. + +## Escopo + +### In scope + +- Servidor HTTP leve com FastAPI +- Bind exclusivo em `127.0.0.1` (localhost-only, sem exposição externa) +- Endpoints REST para operações core: + - `GET /health` — health check do runtime + - `GET /api/v1/runs` — listar runs com paginação + - `POST /api/v1/runs` — submeter nova run + - `GET /api/v1/runs/{run_id}` — detalhe de uma run + - `POST /api/v1/runs/{run_id}/cancel` — cancelar run pendente/em execução + - `GET /api/v1/runtime/status` — status do runtime residente + - `GET /api/v1/artifacts/{run_id}` — listar artefatos de uma run +- Middleware de autenticação via token (reutilizar auth existente) +- CORS desabilitado por padrão (localhost-only) + +### Out of scope + +- Interface web / dashboard HTTP +- WebSocket para streaming em tempo real +- Exposição externa (bind em 0.0.0.0) +- gRPC ou outros protocolos +- Multi-tenant ou isolamento por workspace via API +- Upload de arquivos via API + +## Critérios de Aceite + +### AC1: Servidor HTTP inicia e responde + +- `synapse control-plane start` inicia o servidor em `127.0.0.1:8080` (porta configurável) +- `GET /health` retorna `{"status": "ok", "runtime": "running|stopped"}` com status 200 +- `GET /health` retorna status 503 se o runtime não estiver disponível + +### AC2: Listar runs via API + +- `GET /api/v1/runs` retorna lista paginada de runs +- Suporta query params `?limit=20&offset=0` +- Retorna JSON com estrutura `{runs: [...], total: N, limit: N, offset: N}` +- Cada run inclui: `id`, `status`, `created_at`, `prompt` (truncado) + +### AC3: Submeter run via API + +- `POST /api/v1/runs` aceita `{"prompt": "..."}` e opcionalmente `{"mode": "sync|async|auto"}` +- Retorna `201 Created` com `{"run_id": "...", "status": "pending"}` +- Retorna `422` se prompt estiver vazio ou ausente +- Run submetida via API é persistida no mesmo SQLite e consumida pelo worker + +### AC4: Detalhe de run via API + +- `GET /api/v1/runs/{run_id}` retorna detalhe completo da run +- Retorna `404` se run não existir +- Inclui: `id`, `status`, `prompt`, `created_at`, `updated_at`, `steps`, `artifacts` + +### AC5: Cancelar run via API + +- `POST /api/v1/runs/{run_id}/cancel` marca run como cancelada +- Retorna `200` se cancelamento for bem-sucedido +- Retorna `409` se run já estiver em estado terminal (completed/failed/cancelled) +- Retorna `404` se run não existir + +### AC6: Status do runtime via API + +- `GET /api/v1/runtime/status` retorna estado do runtime residente +- Inclui: `pid`, `uptime`, `state`, `active_runs`, `pending_runs` + +### AC7: Listar artefatos via API + +- `GET /api/v1/artifacts/{run_id}` lista artefatos gerados +- Retorna `404` se run não existir +- Retorna lista com `{name, size_bytes, created_at, type}` para cada artefato + +### AC8: Autenticação por token + +- Token pode ser configurado via env `SYNAPSE_API_TOKEN` ou config +- Requests sem token válido retornam `401 Unauthorized` quando auth por token estiver habilitada +- Health check (`/health`) é público (sem auth) +- Se `SYNAPSE_API_TOKEN` não estiver definido, auth é desabilitada (modo dev) + +### AC9: Porta configurável + +- Porta padrão: `8080` +- Configurável via `--port` flag ou env `SYNAPSE_CONTROL_PORT` +- Host padrão: `127.0.0.1` +- Host configurável via `--host` flag (com warning se não for localhost) + +### AC10: CLI command para gerenciar control plane + +- `synapse control-plane start` — inicia servidor +- `synapse control-plane stop` — para servidor +- `synapse control-plane status` — mostra status + +## Design Técnico + +### Arquitetura + +``` +[CLI: synapse control-plane start] + | + v +[ControlPlaneServer] -- FastAPI app + | + +--> /health --> RuntimeService.ready() + +--> /api/v1/runs --> RunRepository (SQLite) + +--> /api/v1/runtime --> RuntimeService + +--> /api/v1/artifacts --> ArtifactStore +``` + +### Módulos novos + +- `src/synapse_os/control_plane/__init__.py` +- `src/synapse_os/control_plane/server.py` — FastAPI app + endpoints +- `src/synapse_os/control_plane/models.py` — Pydantic models para request/response +- `src/synapse_os/control_plane/middleware.py` — Auth middleware +- `src/synapse_os/control_plane/cli.py` — Typer subcommands + +### Dependências novas + +- `fastapi>=0.115.0` +- `uvicorn>=0.32.0` + +### Reutilização + +- `RunRepository` de `persistence.py` +- `RuntimeService` de `runtime/service.py` +- `ArtifactStore` de `persistence.py` +- Auth token validation de `auth.py` + +## Riscos e Mitigações + +| Risco | Mitigação | +| ----------------------------------- | ------------------------------------------------------------------ | +| FastAPI adiciona dependência pesada | FastAPI é leve; uvicorn é dependency mínima | +| Exposição acidental externa | Default hardcoded em 127.0.0.1; warning explícito se host mudar | +| Conflito de porta | Mensagem clara de "port in use" no CLI | +| Auth bypass | Health check é o único endpoint público; middleware bloqueia resto | + +## Testes + +- Testes unitários de cada endpoint com `httpx.AsyncClient` + `TestApp` +- Testes de autenticação (com/sem token, token inválido) +- Testes de erro (404, 409, 422) +- Testes de integração com RunRepository mockado +- Testes de health check com runtime running/stopped + +## Próximos Passos (pós-F60) + +- WebSocket para streaming de logs em tempo real +- Dashboard web leve +- API para gestão de hooks +- API para gestão de adapters diff --git a/features/F61-dag-pipeline-evolution/SPEC.md b/features/F61-dag-pipeline-evolution/SPEC.md new file mode 100644 index 0000000..a1547b0 --- /dev/null +++ b/features/F61-dag-pipeline-evolution/SPEC.md @@ -0,0 +1,176 @@ +--- +id: F61-dag-pipeline-evolution +type: feature +summary: DAG-aware pipeline executor with parallel step execution, fan-out/fan-in, cycle detection, and linear fallback. +inputs: + - SPEC.md with dag metadata + - PipelineEngine +outputs: + - DAGExecutor with ThreadPoolExecutor parallel dispatch + - DAGValidator with Kahn cycle detection + - LinearPipelineAdapter for backward compatibility +acceptance_criteria: + - DAG mode executes independent steps in parallel + - Cycle detection raises DAGSpecificationError + - Fan-in steps wait for all dependencies + - Linear fallback works when mode is linear + - All unit tests pass +non_goals: + - Dynamic DAG construction at runtime + - Distributed execution +--- + +# Contexto + +The current `SynapseStateMachine` enforces a strictly linear state flow. Every pipeline step executes sequentially. This becomes a bottleneck when multiple independent steps could run in parallel, when fan-out/fan-in patterns are needed, or when conditional routing is required. Synapse-Flow, as the proprietary pipeline engine of SynapseOS, needs to evolve from a linear executor to a DAG-aware executor while maintaining backward compatibility. + +# Objetivo + +Introduce a DAG mode that coexists with the existing linear mode. When a SPEC contains DAG metadata, the `PipelineEngine` switches to a `DAGExecutor` that resolves step dependencies and schedules work in parallel. When no DAG metadata is present, the system behaves exactly as before. + +## 1. Problem Statement + +The linear pipeline model limits throughput on multi-core hosts and cannot express conditional or data-flow-driven execution graphs. The system needs to support: + +1. **Parallel execution** of independent steps. +2. **Fan-out** — one step triggers multiple downstream steps. +3. **Fan-in** — a step waits for multiple upstream steps before executing. +4. **Conditional routing** — step execution depends on runtime state or output. +5. **No cycles** — DAG must be acyclic (validated at startup). + +All while keeping the existing linear pipeline as the default mode for simple features. + +## 2. Decision + +We introduce a **DAG mode** that coexists with the existing linear mode. When a SPEC contains DAG metadata, the `PipelineEngine` switches to a `DAGExecutor` that resolves step dependencies and schedules work accordingly. When no DAG metadata is present, the system behaves exactly as before (linear, sequential). + +The DAG metadata lives in the SPEC front matter under a `dag` key: + +```yaml +--- +dag: + mode: dag # "linear" (default) or "dag" + steps: + - id: build_core + executor: codex + depends_on: [] + - id: build_tests + executor: codex + depends_on: [build_core] + - id: build_integration + executor: codex + depends_on: [build_core] + - id: verify + executor: codex + depends_on: [build_tests, build_integration] + conditionals: + - id: check_api + step: validate_api + if: runtime.api_present == true +--- +# Normal SPEC body follows... +``` + +The `DAGExecutor`: + +1. Builds an adjacency list from `depends_on` declarations. +2. Validates the graph has no cycles (Kahn's algorithm or DFS). +3. Computes ready set (steps with all dependencies satisfied). +4. Schedules ready steps (parallel execution within thread-pool limit). +5. Marks completed steps, refreshes ready set, repeats until all done or one fails. +6. Supports fan-in synchronization (wait for all dependencies before next step starts). +7. Falls back to linear order when `mode: linear` or no `dag` key present. + +## 3. Scope + +### 3.1 In Scope + +- `DAGValidator`: validates DAG structure (no cycles, referenced steps exist, no orphan steps). +- `DAGExecutor`: adjacency-list graph, Kahn topological sort, thread-pool-based parallel dispatch. +- `DAGContext`: tracks step state (PENDING / RUNNING / DONE / FAILED) per step ID. +- Fan-out: one step can appear in `depends_on` of multiple downstream steps. +- Fan-in: a step with multiple `depends_on` entries waits for all of them. +- Backward compatibility: `mode: linear` or absent `dag` key → existing linear behavior. +- `DAGSpecificationError` — raised on invalid DAG metadata. +- Unit tests covering: cycle detection, topological sort, fan-out, fan-in, linear fallback. +- `LinearPipelineAdapter` — wraps existing linear flow so the same `PipelineEngine` can call either mode. + +### 3.2 Out of Scope + +- Dynamic DAG construction at runtime (steps added based on output of prior steps) — this is a future Phase 3 item. +- Distributed DAG execution across machines. +- DAG visualization or rendering. +- Persistence of DAG intermediate state — linear pipeline persistence model is reused. +- Automatic DAG generation from SPEC content. + +## 4. Architecture + +``` +PipelineEngine + ├── LinearPipelineAdapter (mode: linear or no dag key) + │ └── executes LINEAR_STATE_FLOW sequentially + └── DAGExecutor (mode: dag) + ├── DAGValidator (cycle check, orphan check, dependency check) + ├── DAGContext (step state tracker) + └── ThreadPoolExecutor (concurrent step dispatch) +``` + +### Key Classes + +| Class | Responsibility | +| ----------------------- | -------------------------------------------------------------------------------- | +| `DAGSpec` | Pydantic model for `dag` section in SPEC front matter | +| `DAGStep` | Pydantic model for individual DAG step (id, executor, depends_on, if) | +| `DAGConditional` | Pydantic model for conditional step routing | +| `DAGValidator` | Validates DAG (cycle via Kahn, orphan steps, missing deps) | +| `DAGContext` | Tracks per-step state: PENDING/RUNNING/DONE/FAILED | +| `DAGExecutor` | Builds adjacency list, computes in-degree, dispatches ready steps to thread pool | +| `LinearPipelineAdapter` | Wraps existing linear flow as a drop-in executor interface | + +### Files to Create + +- `src/synapse_os/pipeline_dag.py` — all DAG classes and executor +- `tests/unit/test_pipeline_dag.py` — unit tests + +### Files to Modify + +- `src/synapse_os/pipeline.py` — detect DAG mode, route to DAGExecutor, add `dag` field to `PipelineContext` +- `src/synapse_os/specs/validator.py` — accept and parse `dag` key in SPEC front matter +- `tests/unit/test_pipeline.py` — add DAG mode integration tests (can be minimal) + +## 5. Acceptance Criteria + +| # | Criterion | +| --- | ----------------------------------------------------------------------------------------------------------------------------------------------------- | +| 1 | A SPEC with `dag: mode: linear` or no `dag` key executes exactly as before (linear, sequential) | +| 2 | A SPEC with `dag: mode: dag` and valid `depends_on` graph executes with fan-out parallelism | +| 3 | A step with multiple `depends_on` entries starts only after all dependencies are DONE | +| 4 | `DAGValidator` raises `DAGSpecificationError` on cycle detection | +| 5 | `DAGValidator` raises `DAGSpecificationError` when a `depends_on` references a non-existent step ID | +| 6 | `DAGValidator` raises `DAGSpecificationError` when a step has no `depends_on` but is referenced by no other step (orphan), unless it is the root step | +| 7 | ThreadPoolExecutor dispatches up to N steps concurrently (N = `settings.max_workers`, default 4) | +| 8 | When any step fails, the DAGExecutor records FAILED state and stops scheduling new steps | +| 9 | Fan-in synchronization: a step waits for all its `depends_on` to complete, not just one | +| 10 | All new unit tests pass; existing linear pipeline tests continue to pass | + +## 6. Dependencies + +No new runtime dependencies. ThreadPoolExecutor is stdlib. + +## 7. Configuration + +`AppSettings` gains one new field: + +```python +max_workers: int = Field(default=4, description="Max concurrent DAG step executions") +``` + +## 8. Edge Cases + +| Case | Expected Behavior | +| ---------------------------------------------------------------- | ------------------------------------------------------------------------------- | +| Empty `depends_on: []` list | Step is a root; eligible for first batch of execution | +| Single-step DAG | Behaves like linear (no parallelism gain) | +| DAG with 10 root steps and max_workers=4 | Executes 4 in first batch, then remaining 6 (or fewer, depending on completion) | +| Step depends on another step in the same `depends_on` list twice | Ignored (deduplicated) | +| `dag` key present but `steps` list is empty | Raises `DAGSpecificationError` | diff --git a/features/F62-copilot-adapter/SPEC.md b/features/F62-copilot-adapter/SPEC.md new file mode 100644 index 0000000..07a067c --- /dev/null +++ b/features/F62-copilot-adapter/SPEC.md @@ -0,0 +1,75 @@ +--- +id: F62-copilot-adapter +type: feature +summary: GitHub Copilot CLI adapter following BaseCLIAdapter pattern with circuit breaker and error classification. +inputs: + - BaseCLIAdapter interface + - gh CLI available on PATH +outputs: + - CopilotCLIAdapter class + - classify_copilot_execution function + - Unit tests +acceptance_criteria: + - CopilotCLIAdapter inherits from BaseCLIAdapter + - adapter.capabilities includes code_generation + - classify_copilot_execution returns correct categories + - Circuit breaker integration works + - All unit tests pass +non_goals: + - Interactive shell mode + - Bash completion +--- + +# Contexto + +The SynapseOS adapter system (`BaseCLIAdapter` in `adapters.py`) currently supports two CLI-based AI runtimes: `CodexCLIAdapter` and `GeminiCLIAdapter`. The architecture supports arbitrary adapters via `BaseCLIAdapter`. GitHub Copilot CLI (`gh copilot`) is a widely-used AI coding assistant that complements Codex and Gemini. + +# Objetivo + +Create a `CopilotCLIAdapter` following the existing adapter pattern, expanding the routing options available to the `CapabilityRouter`. + +## 1. Decision + +Create a `CopilotCLIAdapter` following the existing adapter pattern. The adapter: + +1. Calls `gh copilot ai` as the primary command +2. Returns a `CLIExecutionResult` with appropriate `success` flag +3. Classifies execution outcomes using `classify_copilot_execution` +4. Inherits circuit breaker and semaphore guard behavior +5. Has `capabilities = ("cli_execution", "code_generation")` matching Codex + +### Authentication + +Não há env var dedicada do SynapseOS para o adapter. A autenticação depende do estado já configurado no `gh` CLI. + +## 2. Scope + +### In Scope + +- `CopilotCLIAdapter` class following `BaseCLIAdapter` pattern +- `classify_copilot_execution()` function (mirrors `classify_codex_execution`) +- Error classification: timeout, non-zero exit, authentication failure, unavailable +- Integration with `AdapterCircuitBreakerStore` +- Unit tests in `tests/unit/test_copilot_adapter.py` + +### Out of Scope + +- Changing the `gh copilot` binary location (uses `gh` from PATH) +- Supporting interactive `gh copilot` shell mode +- Bash completion or streaming output + +## 3. Files + +- `src/synapse_os/adapters.py` — add `CopilotCLIAdapter` and `classify_copilot_execution` +- `tests/unit/test_copilot_adapter.py` — unit tests (mock `gh copilot`) + +## 4. Acceptance Criteria + +| # | Criterion | +| --- | -------------------------------------------------------------------------------------------------------- | +| 1 | `CopilotCLIAdapter` inherits from `BaseCLIAdapter` | +| 2 | `adapter.capabilities == ("cli_execution", "code_generation")` | +| 3 | `classify_copilot_execution` returns correct category for: success, timeout, non-zero exit, auth failure | +| 4 | Adapter uses circuit breaker (same as `CodexCLIAdapter`) | +| 5 | All unit tests pass; existing adapter tests continue to pass | +| 6 | `gh copilot` invoked with `--color never` to suppress ANSI | diff --git a/features/F63-memory-engine-enhancement/SPEC.md b/features/F63-memory-engine-enhancement/SPEC.md new file mode 100644 index 0000000..bba4aec --- /dev/null +++ b/features/F63-memory-engine-enhancement/SPEC.md @@ -0,0 +1,77 @@ +--- +id: F63-memory-engine-enhancement +type: feature +summary: Artifact metadata, indexed artifact store, and namespace-scoped memory store for run context persistence. +inputs: + - Existing artifact store and report generator + - Run context and feature metadata +outputs: + - ArtifactMetadata Pydantic model + - IndexedArtifactStore with find_by_tag and find_by_type + - MemoryStore with JSON-file backing and namespace isolation +acceptance_criteria: + - ArtifactMetadata has type, tags, source_step, created_at fields + - IndexedArtifactStore.find_by_tag returns tagged artifacts + - MemoryStore.get/set/delete works with namespace isolation + - feature_memory returns namespaced view + - All unit tests pass +non_goals: + - Vector/semantic search + - Cross-process memory sharing +--- + +# Contexto + +The current `RunReportGenerator` produces basic markdown reports with limited metadata. The `artifact_store` is a simple file-based store with no indexing or search. Memory for feature state is entirely external with no integration into the runtime's artifact model. + +# Objetivo + +Introduce `ArtifactMetadata`, `IndexedArtifactStore`, and `MemoryStore` to support richer structured metadata, fast artifact lookup, and a clean interface for persisting run context and feature decisions. + +## 1. Decision + +We introduce three complementary components: + +1. **`ArtifactMetadata`** — a Pydantic model attached to each artifact, containing type, tags, source_step, and created_at. Artifacts without metadata get a default entry. + +2. **`IndexedArtifactStore`** — wraps the existing artifact store with an in-memory index mapping `run_id → artifact_name → ArtifactMetadata`. Supports `find_by_tag`, `find_by_type`, and `list_for_run`. + +3. **`MemoryStore`** — a minimal key-value store backed by JSON files in the runtime state directory. Keys are namespaced (`memory::`). Supports `get`, `set`, `delete`, and `list_namespaces`. Provides `feature_memory()` to scope operations to the current feature. + +These components are purely additive — no existing behavior changes. + +## 2. Scope + +### In Scope + +- `ArtifactMetadata` Pydantic model (type, tags, source_step, created_at) +- `IndexedArtifactStore` class with in-memory index and lookup methods +- `MemoryStore` class with JSON-file backing and namespace isolation +- `feature_memory()` helper on `MemoryStore` returning a namespaced view +- Unit tests for all three components +- `ArtifactMetadata` attached to `StepExecutionResult` artifacts field (optional key) + +### Out of Scope + +- Vector/semantic search +- Cross-process memory sharing +- Automatic memory population from runs +- Integration with opencode memory blocks + +## 3. Files + +- `src/synapse_os/memory.py` — all new memory/artifact index classes +- `tests/unit/test_memory.py` — unit tests + +## 4. Acceptance Criteria + +| # | Criterion | +| --- | ----------------------------------------------------------------------------------------------------- | +| 1 | `ArtifactMetadata` has fields: type (str), tags (list[str]), source_step (str), created_at (datetime) | +| 2 | `IndexedArtifactStore.find_by_tag("error")` returns artifacts tagged "error" | +| 3 | `IndexedArtifactStore.find_by_type("test_report")` returns artifacts of that type | +| 4 | `MemoryStore.set("ns", "key", "value")` persists and `get("ns", "key")` retrieves it | +| 5 | `MemoryStore.list_namespaces()` returns all namespaces | +| 6 | `feature_memory("F63")` returns a namespaced view that only touches F63 keys | +| 7 | All unit tests pass; existing tests continue to pass | +| 8 | `ArtifactMetadata` is added to `StepExecutionResult` artifacts field (optional key) | diff --git a/features/F64-advanced-supervisor-policies/SPEC.md b/features/F64-advanced-supervisor-policies/SPEC.md new file mode 100644 index 0000000..1090636 --- /dev/null +++ b/features/F64-advanced-supervisor-policies/SPEC.md @@ -0,0 +1,71 @@ +--- +id: F64-advanced-supervisor-policies +type: feature +summary: Policy-driven supervisor with per-step retry limits, exponential backoff, error-category-aware policies, and fallback routing. +status: draft +created: 2026-03-31 +owner: agent +inputs: [] +outputs: [] +acceptance_criteria: + - Per-step max_retries overrides are respected (TEST_RED retries 5 times, PLAN retries only 2) + - Exponential backoff delay doubles each attempt: base=1s produces [1s, 2s, 4s, 8s] + - Backoff cap at max_delay_seconds (default 60s) + - SECURITY and SPEC_VALIDATION remain terminal (0 retries) + - AdapterOperationalError with category launcher_unavailable short-circuits without retry + - Fallback route is tried when primary route exhausts retries + - All new unit tests pass; existing supervisor tests continue to pass +non_goals: [] +--- + +# Contexto + +O `Supervisor` atual em `supervisor.py` suporta apenas três ações: `retry`, `reroute` e `fail`. Retry tem um contador plano `max_retries` aplicado a todos os steps. Não existe configuração de retry por step, nem backoff exponencial, nem integração com circuit-breaker, nem política adaptativa que considere categorias de erro. + +O supervisor do Synapse-Flow precisa evoluir de um contador plano para um sistema driven por políticas onde diferentes categorias de erro, steps e adapters podem ter políticas de retry/comportamento distintas. + +# Objetivo + +Introduzir um **supervisor orientado a políticas** que: + +1. **Limites de retry por step** — `PLAN`, `TEST_RED`, `CODE_GREEN` cada um recebe seu próprio `max_retries` ao invés de compartilhar um orçamento plano. +2. **Backoff exponencial** — entre retries, o delay dobra: 1s, 2s, 4s, etc. Cap em 60s. +3. **Políticas cientes de categoria de erro** — `RetryableStepError` recebe retries; `AdapterOperationalError` tem short-circuit em categorias "launcher_unavailable". +4. **Roteamento com fallback** — quando adapter primário está indisponível, rotear para próximo adapter disponível. +5. **Políticas específicas por step** — SECURITY e SPEC_VALIDATION permanecem terminais (sem retries). + +O modelo existente `SupervisorDecision` permanece inalterado — a nova lógica produz os mesmos tipos de decisão. + +# Escopo + +## Dentro do Escopo + +- `RetryPolicy` Pydantic model: `max_retries`, `base_delay_seconds`, `max_delay_seconds` +- `StepPolicy` Pydantic model: override por step do `RetryPolicy` +- `SupervisorPolicies` Pydantic model: holds default policy + per-step overrides +- `AdvancedSupervisor` class extending `Supervisor` com decisões orientadas por política +- `calculate_backoff(attempt, base_delay, max_delay)` helper +- Unit tests em `tests/unit/test_supervisor_policies.py` + +## Fora do Escopo + +- Integração com circuit breaker (tratado separadamente via `AdapterCircuitBreakerStore`) +- Carregamento dinâmico de políticas via config em runtime +- Compartilhamento de orçamento entre steps (cada step tem orçamento independente) + +# Arquivos + +- `src/synapse_os/supervisor.py` — adicionar policy models, `calculate_backoff`, atualizar `Supervisor.decide_after_failure` +- `tests/unit/test_supervisor_policies.py` — unit tests + +# Critérios de Aceite + +| # | Criterion | +| --- | ----------------------------------------------------------------------------------------------------------- | +| 1 | Per-step `max_retries` overrides are respected (e.g., TEST_RED can retry 5 times while PLAN retries only 2) | +| 2 | Exponential backoff delay doubles each attempt: base=1s → [1s, 2s, 4s, 8s] | +| 3 | Backoff cap at `max_delay_seconds` (default 60s) | +| 4 | SECURITY and SPEC_VALIDATION remain terminal (0 retries) | +| 5 | `AdapterOperationalError` with category `launcher_unavailable` short-circuits without retry | +| 6 | Fallback route is tried when primary route exhausts retries | +| 7 | All new unit tests pass; existing supervisor tests continue to pass | diff --git a/features/F65-runtime-coordinator-hardening/SPEC.md b/features/F65-runtime-coordinator-hardening/SPEC.md new file mode 100644 index 0000000..537487e --- /dev/null +++ b/features/F65-runtime-coordinator-hardening/SPEC.md @@ -0,0 +1,63 @@ +--- +id: F65-runtime-coordinator-hardening +type: feature +summary: Hardened RuntimeCoordinator with graceful degradation, improved lifecycle state transitions, observability events, and cleanup handlers. +status: draft +created: 2026-03-31 +owner: agent +inputs: [] +outputs: [] +acceptance_criteria: + - RuntimeCoordinator enters degraded mode when circuit breaker is open, continues serving healthy adapters + - Lifecycle state transitions emit 'runtime.lifecycle.transition' events + - RuntimeCoordinator emits 'runtime.starting', 'runtime.started', 'runtime.stopping', 'runtime.stopped' events + - Shutdown handler drains pending work with timeout before force-kill + - Health check returns DEGRADED status when any circuit breaker is open + - All new unit tests pass; existing RuntimeCoordinator tests continue to pass +non_goals: [] +--- + +# Contexto + +O `RuntimeCoordinator` em `runtime/service.py` é o componente central que gerencia o ciclo de vida do runtime. Ele não tem atualmente: + +- Modo degradado quando circuit breakers estão abertos +- Eventos de lifecycle completos +- Tratamento graceful de shutdown +- Health check granular +- Integração de observabilidade com o sistema de eventos existente + +# Objetivo + +Introduzir um **RuntimeCoordinator reforçado** que: + +1. **Graceful degradation** — quando um circuit breaker está aberto, o coordinator continua operando com adapters saudáveis em vez de falhar completamente. +2. **Lifecycle events** — emite eventos `runtime.lifecycle.transition`, `runtime.starting`, `runtime.started`, `runtime.stopping`, `runtime.stopped`. +3. **Shutdown handler** — ao receber sinal de término, drena trabalho pendente com timeout antes de force-kill. +4. **Health check granular** — `GET /health` retorna `{"status": "DEGRADED"}` quando circuit breakers estão abertos, `{"status": "HEALTHY"}` quando tudo OK. +5. **Cleanup handlers** — hooks de cleanup registráveis para recursos que precisam de liberação no shutdown. + +O `RuntimeCoordinator` existente é enhancement in-place (não um replace). + +# Escopo + +## Dentro do Escopo + +- `RuntimeCoordinator` com `degraded_adapters` set e lógica de graceful degradation +- `lifecycle_event(event_name)` método +- `shutdown(timeout_seconds)` método com drain graceful +- `register_cleanup_handler(callback)` e `run_cleanup_handlers()` +- `health_status()` método returning `Literal["HEALTHY", "DEGRADED", "UNHEALTHY"]` +- `RuntimeLifecycleEvent` Pydantic model para eventos de lifecycle +- Unit tests em `tests/unit/test_runtime_coordinator_hardening.py` + +## Fora do Escopo + +- Modificação de `RuntimeService` (separado em F70) +- Integração com o servidor HTTP de control plane +- Persistência de health status + +# Arquivos + +- `src/synapse_os/runtime/service.py` — atualizar `RuntimeCoordinator` com hardening +- `tests/unit/test_runtime_coordinator_hardening.py` — unit tests diff --git a/features/F66-reporting-and-observability-evolution/SPEC.md b/features/F66-reporting-and-observability-evolution/SPEC.md new file mode 100644 index 0000000..78cf27b --- /dev/null +++ b/features/F66-reporting-and-observability-evolution/SPEC.md @@ -0,0 +1,62 @@ +--- +id: F66-reporting-and-observability-evolution +type: feature +summary: Enhanced run reports with structured metadata, execution timeline, adapter performance metrics, and structured error summaries. +status: draft +created: 2026-03-31 +owner: agent +inputs: [] +outputs: [] +acceptance_criteria: + - RunReport includes execution_timeline with state transitions and durations + - RunReport includes adapter_metrics with per-adapter success rates and avg durations + - RunReport includes structured_errors list with error categories and counts + - RunReport includes feature_id and feature_title from SPEC metadata + - RunReport JSON schema is validated against a JSON Schema spec + - Unit tests verify all new report fields are populated correctly + - Existing reporting tests continue to pass +non_goals: [] +--- + +# Contexto + +O `RunReport` atual em `reporting.py` é um arquivo Markdown simples (RUN_REPORT.md). Ele não tem: + +- Timeline de execução com transições de estado e durações +- Métricas por adapter (success rate, avg duration) +- Estrutura de erros categorizados +- Validação via JSON Schema +- Campos de feature_id e feature_title + +# Objetivo + +Expandir o sistema de relatórios para incluir: + +1. **Structured timeline** — lista de transições de estado com timestamp e duration desde a transição anterior. +2. **Adapter metrics** — por adapter: total calls, success count, failure count, avg duration ms, categorização de erros. +3. **Structured errors** — lista de erros categorizados com type, message, step, e count. +4. **Feature metadata** — campos `feature_id` e `feature_title` populados do frontmatter da SPEC. +5. **JSON Schema validation** — o report é primeiramente gerado como Pydantic model, depois renderizado para Markdown. + +O `RunReport` existente continua como Pydantic model; adicionamos novos campos. + +# Escopo + +## Dentro do Escopo + +- `ExecutionTimeline` e `TimelineEntry` Pydantic models +- `AdapterMetrics` Pydantic model +- `StructuredError` Pydantic model +- Campos `execution_timeline`, `adapter_metrics`, `structured_errors`, `feature_id`, `feature_title` no `RunReport` +- `generate_structured_report(run_id, run_record)` helper que popula todos os campos +- Unit tests em `tests/unit/test_reporting_evolution.py` + +## Fora do Escopo + +- Alteração do formato de renderização Markdown existente (mantemos compatibilidade) +- Integração com sistema de eventos externo + +# Arquivos + +- `src/synapse_os/reporting.py` — adicionar models e campos ao `RunReport` +- `tests/unit/test_reporting_evolution.py` — unit tests diff --git a/features/F67-workspace-management-v2/SPEC.md b/features/F67-workspace-management-v2/SPEC.md new file mode 100644 index 0000000..2ae85e7 --- /dev/null +++ b/features/F67-workspace-management-v2/SPEC.md @@ -0,0 +1,61 @@ +--- +id: F67-workspace-management-v2 +type: feature +summary: Workspace Management v2 with per-run workspace isolation, lifecycle hooks, and workspace pool for reuse. +inputs: + - Existing WorkspaceProvider protocol + - Run lifecycle events +outputs: + - WorkspaceState enum and TrackedWorkspace model + - WorkspacePool with acquire/release/discard + - WorkspaceManager integrating providers and pool +acceptance_criteria: + - WorkspaceProvider creates isolated per-run workspace directories + - WorkspaceProvider tracks workspace lifecycle states + - Workspace cleanup hook is called when run completes + - Workspace pool holds up to N reusable idle workspaces + - Reuse of pooled workspace resets its contents + - All unit tests pass +non_goals: + - Cross-session workspace persistence + - Workspace templates + - Multi-tenant isolation +--- + +# Contexto + +O sistema atual de workspace em `runtime_contracts.py` (`WorkspaceProvider`, `LocalWorkspaceProvider`, `RunScopedWorkspaceProvider`) não suporta pool de workspaces para reuse, lifecycle hooks de cleanup, tracking de estado ou reset de workspace antes de reuse. + +# Objetivo + +Introduzir WorkspaceState enum, TrackedWorkspace, WorkspacePool com acquire/release/discard, reset interno antes de reuse, Lifecycle hooks de cleanup, e WorkspaceManager que integra providers + pool. + +## 1. Decision + +Introduzir: + +1. **WorkspaceState enum** — `CREATING`, `READY`, `BUSY`, `CLEANUP`, `DESTROYED` +2. **TrackedWorkspace** — workspace com state tracking e metadata +3. **WorkspacePool** — pool fixo de workspaces idle que podem ser reutilizados +4. **Lifecycle hooks** — `on_workspace_cleanup(path)` callback + +## 2. Scope + +### In Scope + +- `WorkspaceState` enum +- `TrackedWorkspace` model +- `WorkspacePool` class com acquire/release/discard +- `WorkspaceManager` que integra providers + pool +- Unit tests + +### Out of Scope + +- Persistência de workspace entre sessões +- Workspace templates +- Multi-tenant workspace isolation + +## 3. Files + +- `src/synapse_os/workspace.py` (novo) +- `tests/unit/test_workspace_v2.py` (novo) diff --git a/features/F68-plugin-extension-system/SPEC.md b/features/F68-plugin-extension-system/SPEC.md new file mode 100644 index 0000000..5761ed4 --- /dev/null +++ b/features/F68-plugin-extension-system/SPEC.md @@ -0,0 +1,64 @@ +--- +id: F68-plugin-extension-system +type: feature +summary: Plugin/Extension system with hook-based registration, entry point discovery, and lifecycle management. +inputs: + - Existing hooks.py hook system + - Python entry point mechanism +outputs: + - PluginManifest dataclass with name, version, hooks + - PluginRegistry singleton with discovery and lifecycle + - load_plugins() via entry point discovery +acceptance_criteria: + - Plugins are discovered via entry point group synapse_os.plugins + - Plugin manifest declared via hook_manifest function + - PluginRegistry tracks loaded plugins and hook handlers + - load_plugins() discovers and loads all installed plugins + - unload_plugin() removes plugin and its handlers + - Plugin can declare pre_step, post_step, on_run_start, on_run_end hooks + - All unit tests pass +non_goals: + - Plugin sandboxing/security + - Plugin packaging/distribution + - Plugin config API + - Hot reload +--- + +# Contexto + +O sistema atual de hooks em `hooks.py` suporta apenas hooks internos registrados manualmente. Não existe mecanismo para extensões externas descobrirem e registrarem hooks no Synapse-Flow. + +# Objetivo + +Introduzir PluginManifest, PluginRegistry singleton com discovery e lifecycle, entry point group `synapse_os.plugins` para descoberta automática, load_plugins() e unload_plugin(). + +## 1. Decision + +Introduzir: + +1. **PluginManifest** — dataclass com name, version, hooks, enabled +2. **PluginRegistry** — singleton que gerencia plugins descobertos e carregados +3. **entry point group** `synapse_os.plugins` para descoberta automática +4. **load_plugins()** — descobre e registra todos os plugins via entry points +5. **unload_plugin(name)** — remove plugin do registry + +## 2. Scope + +### In Scope + +- PluginManifest dataclass +- PluginRegistry com discovery e lifecycle +- Entry point based plugin discovery +- Unit tests + +### Out of Scope + +- Plugin sandboxing/security +- Plugin packaging/distribution +- Plugin config API +- Hot reload + +## 3. Files + +- `src/synapse_os/plugins.py` (novo) +- `tests/unit/test_plugins.py` (novo) diff --git a/memory/MEMORY.md b/memory/MEMORY.md new file mode 100644 index 0000000..3ea0738 --- /dev/null +++ b/memory/MEMORY.md @@ -0,0 +1,20 @@ +# MEMORY.md — SynapseOS + +Índice lean da memória durável do projeto. Este arquivo aponta para arquivos temáticos em `memory/`. + +## Arquivos temáticos + +| Arquivo | Conteúdo | Última atualização | +| ------------------------------------------ | ------------------------------------ | ------------------ | +| [project_state.md](project_state.md) | Estado atual, sprint, branch, marcos | 2026-04-01 | +| [stable_decisions.md](stable_decisions.md) | Decisões arquiteturais fixas | 2026-04-01 | +| [active_fronts.md](active_fronts.md) | Frentes ativas + open decisions | 2026-04-01 | +| [pitfalls.md](pitfalls.md) | Armadilhas técnicas recorrentes | 2026-04-01 | +| [next_steps.md](next_steps.md) | Próximos passos recomendados | 2026-04-01 | +| [handoff.md](handoff.md) | Último handoff de sessão | 2026-04-01 | + +## Convenção + +- `memory/` é memória durável do projeto, não log de conversa. +- Detalhe operacional fica em `ERROR_LOG.md` e `PENDING_LOG.md`. +- Atualizado via `memory-curator` ao encerrar sessões. diff --git a/memory/active_fronts.md b/memory/active_fronts.md new file mode 100644 index 0000000..dfb4739 --- /dev/null +++ b/memory/active_fronts.md @@ -0,0 +1,36 @@ +# Active Fronts — SynapseOS + +## Frentes concluídas recentemente + +Todas as frentes do sprint F59-F68 foram mergeadas em `origin/main`: + +| Frente | Descrição | Status | +| ------ | ----------------------------------- | ----------- | +| F59 | Multi-Agent Session Orchestration | ✅ Mergeado | +| F60 | Local Control Plane Foundation | ✅ Mergeado | +| F61 | DAG Pipeline Evolution | ✅ Mergeado | +| F62 | Copilot Adapter | ✅ Mergeado | +| F63 | Memory Engine Enhancement | ✅ Mergeado | +| F64 | Advanced Supervisor Policies | ✅ Mergeado | +| F65 | Runtime Coordinator Hardening | ✅ Mergeado | +| F66 | Reporting & Observability Evolution | ✅ Mergeado | +| F67 | Workspace Management v2 | ✅ Mergeado | +| F68 | Plugin/Extension System | ✅ Mergeado | + +## Frente ativa + +Nenhuma frente ativa imediata. Aguardando `technical-triage` para definição de próximas prioridades. + +## Open decisions + +1. **Próxima frente prioritária:** Aguardando triagem para escolher entre: + - Evolução do multi-agent (distributed sessions) + - Hardening de segurança adicional + - UI/TUI desktop (quando demanda concreta surgir) + - Outros candidatos do backlog + +2. **Desktop-shell:** Mantido fora da fila principal até estabilização completa do core. + +3. **TypeScript runtime migration:** Descartado por ora; reavaliar apenas se houver necessidade estratégica. + +4. **Remote multi-host auth:** Adiado até demanda concreta. diff --git a/memory/handoff.md b/memory/handoff.md new file mode 100644 index 0000000..386cf0b --- /dev/null +++ b/memory/handoff.md @@ -0,0 +1,37 @@ +# Handoff — SynapseOS + +**Data:** 2026-04-01 +**Sprint:** F59-F68 concluído +**Branch:** main +**Status:** Estável, 755 tests, ruff/mypy clean + +## Read before acting + +1. Leia `AGENTS.md` para convenções do projeto +2. Leia `memory/MEMORY.md` e arquivos temáticos em `memory/` +3. Leia `ERROR_LOG.md` e `PENDING_LOG.md` para contexto operacional +4. Verifique `git status` e `./scripts/branch-sync-check.sh` + +## Current state + +- Todas as 10 frentes do sprint F59-F68 mergeadas em `origin/main` +- Synapse-Flow evoluído para DAG state-driven +- Multi-agent session orchestration operacional +- Local control plane foundation estabilizado +- Runtime boundaries, workspace isolation, observabilidade consolidados +- Zero erros críticos, baseline 100% clean + +## Open points + +- Aguardando `technical-triage` para definição de próximas frentes +- Desktop-shell mantido fora da fila principal +- TypeScript runtime migration descartado por ora +- Remote multi-host auth adiado + +## Recommended next front + +Executar `technical-triage` para avaliar backlog e escolher próxima frente prioritária entre: + +- Evolução multi-agent (distributed sessions) +- Hardening de segurança adicional +- Outros candidatos do backlog diff --git a/memory/next_steps.md b/memory/next_steps.md new file mode 100644 index 0000000..93bc435 --- /dev/null +++ b/memory/next_steps.md @@ -0,0 +1,19 @@ +# Next Steps — SynapseOS + +## Recomendação imediata + +Executar `technical-triage` para definir próximas frentes prioritárias pós-sprint F59-F68. + +## Candidatos potenciais (aguardando triagem) + +1. **Evolução multi-agent:** Distributed sessions, coordenação avançada entre adapters +2. **Hardening de segurança:** Auth adicional, audit trail expandido, rate limiting +3. **UI/TUI desktop:** `synapse tui` com Textual — apenas se houver demanda concreta +4. **Performance e escalabilidade:** Otimizações de throughput, memória +5. **Integrações:** Novos adapters além de Codex e Copilot + +## Não priorizar + +- Desktop-shell (fora da fila até core estabilizar) +- TypeScript-first runtime migration (descartado por ora) +- Remote multi-host auth (adiado até demanda concreta) diff --git a/memory/pitfalls.md b/memory/pitfalls.md new file mode 100644 index 0000000..07c96b8 --- /dev/null +++ b/memory/pitfalls.md @@ -0,0 +1,27 @@ +# Pitfalls — SynapseOS + +## Armadilhas recorrentes + +### Branch e Git + +1. **Reuso de branch mergeada** — Nunca reutilizar branch de feature já mergeada para drafts novos. Usar `draft/*` ou `archive/*`. +2. **Drift não detectado** — Rodar `./scripts/branch-sync-check.sh` cedo e manter worktree limpa antes de commit/push/PR. +3. **Delta misto em PR** — Quando inevitável, consolidar handoff durável e artefatos mínimos imediatamente após merge. + +### Docker e ambiente + +4. **Sandbox vs real** — Diferenciar falha de sandbox (rede, Docker daemon) de falha real do repositório. +5. **Worktree fria** — Sincronizar `uv sync --locked --extra dev` antes de rodar testes que carregam `conftest.py`. +6. **Wrappers quebrados** — Prefira `python -m pytest`/`python -m mypy` via `uv` em vez de wrappers da `.venv`. + +### Testes e TDD + +7. **Fixtures ANSI** — Fixtures com ANSI armazenados como escape literal requerem `unicode_escape=True` no helper. +8. **Monkeypatch legacy** — Ao ampliar helpers de CLI, preservar assinatura compatível ou atualizar doubles legados. +9. **mypy em tests** — `tests/` tem override explícito; não aplicar strict mode da `src/` na árvore de testes. + +### CI e gates + +10. **repo-checks local** — Rodar equivalente local do gate amplo antes de concluir PRs grandes. +11. **ruff format global** — Revalidar após mudanças amplas de documentação/baseline. +12. **PR body inline** — Usar `--body-file` em vez de `--body` quando houver Markdown com backticks. diff --git a/memory/project_state.md b/memory/project_state.md new file mode 100644 index 0000000..d60bb2c --- /dev/null +++ b/memory/project_state.md @@ -0,0 +1,24 @@ +# Project State — SynapseOS + +## Estado global + +**Sprint atual:** F59-F68 concluído (2026-04-01) +**Branch ativa:** main +**Baseline:** origin/main sincronizado +**Status:** Estável, 755 tests passando + +## Marcos + +- MVP inicial: Concluído (F01-F10) +- Etapa 2: Concluída (F15-F22) +- Primeira onda de guardrails: Concluída (F23-F27) +- Fundação de runtime boundaries: Concluída (F51-F53) +- Sprint F59-F68: Concluída — Multi-Agent, Control Plane, DAG, Copilot, Memory, Supervisor, Runtime, Observability, Workspace v2, Plugins + +## Snapshot local + +- `ruff format --check .`: ✅ Clean +- `mypy src`: ✅ Clean +- `pytest`: ✅ 755 passando +- `ERROR_LOG.md`: Sem erros críticos abertos +- `PENDING_LOG.md`: Aguardando technical-triage para próximas frentes diff --git a/memory/stable_decisions.md b/memory/stable_decisions.md new file mode 100644 index 0000000..af74c98 --- /dev/null +++ b/memory/stable_decisions.md @@ -0,0 +1,32 @@ +# Stable Decisions — SynapseOS + +## Arquitetura + +1. **Synapse-Flow** é a engine própria de pipeline do SynapseOS — pipeline linear state-driven evoluído para DAG. +2. **CLI-first** — interface primária é CLI; UI desktop (Textual) somente quando houver demanda concreta. +3. **Core em Python** — runtime central permanece em Python; TypeScript limitado a shell/UI opcional. +4. **Container-first** — execução prática via Docker/Compose, com preflight leve (`compose config`). + +## Boundaries e contratos + +5. `ToolSpec`/capabilities formalizado — contratos explícitos de capabilities registradas. +6. `WorkspaceProvider` com isolation auditável — workspace path persistido e provider `run-scoped`. +7. `RunContext` enriquecido — eventos de lifecycle (`run_context_initialized`, `step_started`, `state_transitioned`). + +## Multi-agent e orquestração + +8. **Multi-Agent Session Orchestration** — registry/capabilities e coordenação entre adapters sem UI desktop. +9. **Supervisor deterministico** — decisões entre retry, reroute, return_to_code_green, fail. +10. **Runtime persistente** — Linux-first, identidade de processo validada via `/proc//cmdline`. + +## Execução e qualidade + +11. **TDD explícito** — RED → GREEN → REFACTOR; testes antes do código de produção. +12. **Quality gates** — ruff, mypy, pytest como gates obrigatórios antes de SECURITY_REVIEW. +13. **Branch Sync Gate** — drift detection e atualização conservadora via scripts. + +## Decisões de produto + +14. **Desktop-shell fora da fila principal** — só retorna após runtime boundaries, workspace isolation, observability e control plane estabilizados. +15. **TypeScript-first runtime migration descartado** — por ora, TypeScript apenas para shell/UI opcional. +16. **Remote multi-host auth adiado** — só quando houver demanda concreta e recorte verificável. diff --git a/pyproject.toml b/pyproject.toml index 092ad0c..1ff4228 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -24,6 +24,9 @@ dependencies = [ "jsonschema>=4.23.0", "pyyaml>=6.0.2", "textual>=8.1.1", + "fastapi>=0.115.0", + "uvicorn>=0.32.0", + "httpx>=0.28.0", ] [project.optional-dependencies] diff --git a/src/synapse_os/adapters.py b/src/synapse_os/adapters.py index 3185a48..e647093 100644 --- a/src/synapse_os/adapters.py +++ b/src/synapse_os/adapters.py @@ -45,6 +45,7 @@ def __init__( self.tool_name = tool_name self.command = command self.reason = reason + self.message = message class BaseCLIAdapter(ABC): @@ -324,3 +325,149 @@ def build_command(self, prompt: str) -> list[str]: async def execute(self, prompt: str) -> CLIExecutionResult: return await super().execute(prompt) + + +_LAUNCHER_UNAVAILABLE_COPILOT_PATTERNS = ( + "gh: command not found", + "gh: not found", + "command not found: gh", +) +_AUTHENTICATION_UNAVAILABLE_COPILOT_PATTERNS = ( + "authentication required", + "not logged in", + "unauthorized", + "invalid token", + "authenticated", + "gh auth login", +) + + +class CopilotCLIAdapter(BaseCLIAdapter): + @property + def capabilities(self) -> tuple[str, ...]: + return ("cli_execution", "code_generation") + + @property + def command_prefix(self) -> tuple[str, ...]: + return ("gh", "copilot", "ai") + + def __init__( + self, + *, + timeout_seconds: float = 30.0, + max_concurrent_adapters: int | None = None, + ) -> None: + super().__init__( + tool_name="copilot", + timeout_seconds=timeout_seconds, + max_concurrent_adapters=max_concurrent_adapters, + ) + + def build_command(self, prompt: str) -> list[str]: + if not prompt.strip(): + raise ValueError("prompt must not be empty.") + return [ + "gh", + "copilot", + "ai", + "--color", + "never", + "--", + prompt, + ] + + async def execute(self, prompt: str) -> CLIExecutionResult: + settings = AppSettings() + command = self.build_command(prompt) + self._validate_command(command) + + breaker_store = AdapterCircuitBreakerStore(settings.adapter_circuit_breaker_state_file) + if breaker_store.is_open(self.tool_name, now=time.time()): + return CLIExecutionResult( + tool_name=self.tool_name, + command=command, + return_code=75, + stdout_raw="", + stderr_raw="Circuit breaker open for copilot.\n", + stdout_clean="", + stderr_clean="Circuit breaker open for copilot.", + duration_ms=0, + timed_out=False, + success=False, + ) + + try: + result = await super().execute(prompt) + except AdapterOperationalError as exc: + breaker_store.record_operational_failure( + self.tool_name, + threshold=settings.adapter_circuit_breaker_failure_threshold, + cooldown_seconds=settings.adapter_circuit_breaker_cooldown_seconds, + now=time.time(), + ) + return CLIExecutionResult( + tool_name=self.tool_name, + command=exc.command, + return_code=1, + stdout_raw="", + stderr_raw=exc.message, + stdout_clean="", + stderr_clean=exc.message, + duration_ms=0, + timed_out=False, + success=False, + ) + assessment = classify_copilot_execution(result) + if assessment.category in { + "launcher_unavailable", + "authentication_unavailable", + }: + breaker_store.record_operational_failure( + self.tool_name, + threshold=settings.adapter_circuit_breaker_failure_threshold, + cooldown_seconds=settings.adapter_circuit_breaker_cooldown_seconds, + now=time.time(), + ) + else: + breaker_store.reset(self.tool_name) + return result + + +def classify_copilot_execution(result: CLIExecutionResult) -> CodexExecutionAssessment: + stderr_lower = result.stderr_clean.lower() + + if result.success: + return CodexExecutionAssessment( + category="success", + is_operational_block=False, + detail="Copilot CLI completed successfully.", + ) + if "circuit breaker open" in stderr_lower: + return CodexExecutionAssessment( + category="circuit_open", + is_operational_block=True, + detail=result.stderr_clean or "Copilot circuit breaker is open.", + ) + if result.timed_out: + return CodexExecutionAssessment( + category="timeout", + is_operational_block=False, + detail="Copilot CLI exceeded the configured timeout.", + ) + if _contains_any(stderr_lower, _LAUNCHER_UNAVAILABLE_COPILOT_PATTERNS): + return CodexExecutionAssessment( + category="launcher_unavailable", + is_operational_block=True, + detail=result.stderr_clean or "GitHub CLI (gh) is unavailable.", + ) + if _contains_any(stderr_lower, _AUTHENTICATION_UNAVAILABLE_COPILOT_PATTERNS): + return CodexExecutionAssessment( + category="authentication_unavailable", + is_operational_block=True, + detail=result.stderr_clean or "GitHub Copilot authentication is unavailable.", + ) + return CodexExecutionAssessment( + category="return_code_nonzero", + is_operational_block=False, + detail=result.stderr_clean or "Copilot CLI exited with a non-zero return code.", + ) diff --git a/src/synapse_os/cli/app.py b/src/synapse_os/cli/app.py index aab0b57..07dc080 100644 --- a/src/synapse_os/cli/app.py +++ b/src/synapse_os/cli/app.py @@ -54,10 +54,12 @@ runtime_app = typer.Typer(help="Manage the minimal persistent runtime.") runs_app = typer.Typer(help="Inspect persisted runs and artifacts.") auth_app = typer.Typer(help="Manage the local auth registry.") +control_plane_app = typer.Typer(help="Manage the local control plane HTTP API.") app.add_typer(runtime_app, name="runtime") app.add_typer(runs_app, name="runs") app.add_typer(auth_app, name="auth") app.add_typer(hooks_app, name="hooks") +app.add_typer(control_plane_app, name="control-plane") @app.callback() @@ -807,3 +809,51 @@ def runs_show( artifact_paths=artifact_store.list_artifact_paths(run_id), preview=resolved_preview, ) + + +@control_plane_app.command("start") +def control_plane_start( + host: str = typer.Option("127.0.0.1", "--host", envvar="SYNAPSE_CONTROL_HOST"), + port: int = typer.Option(8080, "--port", envvar="SYNAPSE_CONTROL_PORT"), + api_token: str | None = typer.Option(None, "--token", envvar="SYNAPSE_API_TOKEN"), +) -> None: + import uvicorn + + if host != "127.0.0.1" and host != "localhost": + typer.echo( + "WARNING: Binding to non-localhost address. " + "The control plane has no network-level security.", + err=True, + ) + + try: + runtime_service = _runtime_service() + run_repo = _run_repository() + artifact_store = _artifact_store() + except CLIError: + raise + + from synapse_os.control_plane.server import create_app + + cp_app = create_app( + runtime_service=runtime_service, + run_repository=run_repo, + artifact_store=artifact_store, + api_token=api_token, + ) + + typer.echo(f"Starting control plane on http://{host}:{port}") + uvicorn.run(cp_app, host=host, port=port, log_level="info") + + +@control_plane_app.command("status") +def control_plane_status() -> None: + + host = os.environ.get("SYNAPSE_CONTROL_HOST", "127.0.0.1") + port_raw = os.environ.get("SYNAPSE_CONTROL_PORT", "8080") + try: + port = int(port_raw) + except ValueError: + port = 8080 + typer.echo(f"Control plane configured for http://{host}:{port}") + typer.echo("Use 'synapse control-plane start' to start the server.") diff --git a/src/synapse_os/control_plane/__init__.py b/src/synapse_os/control_plane/__init__.py new file mode 100644 index 0000000..70f521c --- /dev/null +++ b/src/synapse_os/control_plane/__init__.py @@ -0,0 +1 @@ +"""Local Control Plane — HTTP API for SynapseOS.""" diff --git a/src/synapse_os/control_plane/middleware.py b/src/synapse_os/control_plane/middleware.py new file mode 100644 index 0000000..61a8813 --- /dev/null +++ b/src/synapse_os/control_plane/middleware.py @@ -0,0 +1,55 @@ +"""Authentication middleware for the Control Plane API.""" + +from __future__ import annotations + +from starlette.types import ASGIApp, Receive, Scope, Send + + +class AuthMiddleware: + def __init__(self, app: ASGIApp, api_token: str | None) -> None: + self.app = app + self._api_token = api_token + + async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None: + if self._api_token is None: + await self.app(scope, receive, send) + return + + if scope.get("type") != "http": + await self.app(scope, receive, send) + return + + path = scope.get("path", "") + if path == "/health": + await self.app(scope, receive, send) + return + + headers = dict(scope.get("headers", [])) + auth_header = headers.get(b"authorization", b"").decode() + if not auth_header.startswith("Bearer "): + await _send_json(receive, send, 401, {"detail": "Unauthorized"}) + return + + token = auth_header[7:] + if token != self._api_token: + await _send_json(receive, send, 401, {"detail": "Unauthorized"}) + return + + await self.app(scope, receive, send) + + +async def _send_json(receive: Receive, send: Send, status: int, content: dict[str, object]) -> None: + import json + + body = json.dumps(content).encode() + await send( + { + "type": "http.response.start", + "status": status, + "headers": [ + (b"content-type", b"application/json"), + (b"content-length", str(len(body)).encode()), + ], + } + ) + await send({"type": "http.response.body", "body": body}) diff --git a/src/synapse_os/control_plane/models.py b/src/synapse_os/control_plane/models.py new file mode 100644 index 0000000..1a5f5c1 --- /dev/null +++ b/src/synapse_os/control_plane/models.py @@ -0,0 +1,67 @@ +"""Pydantic models for the Control Plane API.""" + +from __future__ import annotations + +from pydantic import BaseModel, Field + + +class HealthResponse(BaseModel): + status: str + runtime: str + + +class RunListItem(BaseModel): + id: str + status: str + prompt: str + created_at: str + + +class RunListResponse(BaseModel): + runs: list[RunListItem] + total: int + limit: int + offset: int + + +class RunStepItem(BaseModel): + name: str + status: str + + +class RunDetailResponse(BaseModel): + id: str + status: str + prompt: str + created_at: str + updated_at: str + steps: list[RunStepItem] = Field(default_factory=list) + artifacts: list[str] = Field(default_factory=list) + + +class RunCreateRequest(BaseModel): + prompt: str = Field(..., min_length=1) + + +class RunCreateResponse(BaseModel): + run_id: str + status: str + + +class RuntimeStatusResponse(BaseModel): + pid: int | None = None + uptime: int = 0 + state: str + active_runs: int = 0 + pending_runs: int = 0 + + +class ArtifactItem(BaseModel): + name: str + size_bytes: int + created_at: str + type: str + + +class ArtifactListResponse(BaseModel): + artifacts: list[ArtifactItem] diff --git a/src/synapse_os/control_plane/server.py b/src/synapse_os/control_plane/server.py new file mode 100644 index 0000000..ea3e612 --- /dev/null +++ b/src/synapse_os/control_plane/server.py @@ -0,0 +1,261 @@ +"""FastAPI application for the Local Control Plane.""" + +from __future__ import annotations + +import os +from datetime import UTC +from pathlib import Path +from typing import TYPE_CHECKING + +from fastapi import FastAPI, HTTPException, Query +from fastapi.responses import JSONResponse + +from synapse_os.control_plane.middleware import AuthMiddleware +from synapse_os.control_plane.models import ( + ArtifactItem, + ArtifactListResponse, + HealthResponse, + RunCreateRequest, + RunCreateResponse, + RunDetailResponse, + RunListItem, + RunListResponse, + RunStepItem, + RuntimeStatusResponse, +) + +if TYPE_CHECKING: + from synapse_os.persistence import ArtifactStore, RunRepository + from synapse_os.runtime.service import RuntimeService + +MAX_PROMPT_PREVIEW = 100 +TERMINAL_STATUSES = {"completed", "failed", "cancelled"} + + +def create_app( + *, + runtime_service: RuntimeService | None = None, + run_repository: RunRepository | None = None, + artifact_store: ArtifactStore | None = None, + api_token: str | None = None, +) -> FastAPI: + app = FastAPI( + title="SynapseOS Control Plane", + version="0.1.0", + docs_url=None, + redoc_url=None, + ) + + if api_token is not None: + app.add_middleware(AuthMiddleware, api_token=api_token) + + @app.get("/health", response_model=HealthResponse) + async def health() -> HealthResponse: + runtime_status = "unknown" + if runtime_service is not None: + try: + runtime_status = "running" if runtime_service.ready() else "stopped" + except Exception: + runtime_status = "stopped" + return HealthResponse(status="ok", runtime=runtime_status) + + @app.get("/api/v1/runs", response_model=RunListResponse) + async def list_runs( + limit: int = Query(default=20, ge=1, le=100), + offset: int = Query(default=0, ge=0), + ) -> RunListResponse: + if run_repository is None: + raise HTTPException(status_code=503, detail="Run repository not configured") + + all_runs = run_repository.list_runs() + total = len(all_runs) + page = all_runs[offset : offset + limit] + + return RunListResponse( + runs=[ + RunListItem( + id=r.run_id, + status=r.status, + prompt=(r.spec_path[:MAX_PROMPT_PREVIEW] if hasattr(r, "spec_path") else ""), + created_at=r.created_at, + ) + for r in page + ], + total=total, + limit=limit, + offset=offset, + ) + + @app.post("/api/v1/runs", response_model=RunCreateResponse, status_code=201) + async def create_run(request: RunCreateRequest) -> RunCreateResponse: + if run_repository is None: + raise HTTPException(status_code=503, detail="Run repository not configured") + + spec_path = _create_spec_from_prompt(request.prompt) + run_id = run_repository.create_run( + spec_path=spec_path, + initial_state="REQUEST", + stop_at="COMPLETE", + initiated_by="api", + ) + + return RunCreateResponse(run_id=run_id, status="pending") + + @app.get("/api/v1/runs/{run_id}", response_model=RunDetailResponse) + async def get_run(run_id: str) -> RunDetailResponse: + if run_repository is None: + raise HTTPException(status_code=503, detail="Run repository not configured") + + try: + run = run_repository.get_run(run_id) + except Exception as err: + raise HTTPException(status_code=404, detail="Run not found") from err + + steps = [] + try: + for s in run_repository.list_steps(run_id): + steps.append(RunStepItem(name=s.state, status=s.status)) + except Exception: + pass + + artifacts = [] + if artifact_store is not None: + try: + artifacts = artifact_store.list_artifact_paths(run_id) + except Exception: + pass + + return RunDetailResponse( + id=run.run_id, + status=run.status, + prompt=run.spec_path, + created_at=run.created_at, + updated_at=run.updated_at, + steps=steps, + artifacts=artifacts, + ) + + @app.post("/api/v1/runs/{run_id}/cancel") + async def cancel_run(run_id: str) -> JSONResponse: + if run_repository is None: + raise HTTPException(status_code=503, detail="Run repository not configured") + + try: + run = run_repository.get_run(run_id) + except Exception as err: + raise HTTPException(status_code=404, detail="Run not found") from err + + if run.status in TERMINAL_STATUSES: + raise HTTPException( + status_code=409, + detail=f"Cannot cancel run in terminal state: {run.status}", + ) + + try: + run_repository.mark_run_cancelling(run_id) + except ValueError as err: + raise HTTPException(status_code=409, detail="Run cannot be cancelled") from err + + return JSONResponse(content={"status": "cancelling", "run_id": run_id}) + + @app.get("/api/v1/runtime/status", response_model=RuntimeStatusResponse) + async def runtime_status() -> RuntimeStatusResponse: + if runtime_service is None: + raise HTTPException(status_code=503, detail="Runtime service not configured") + + state = runtime_service.current_state() + pending = 0 + if run_repository is not None: + try: + pending = len(run_repository.list_unlocked_pending_runs()) + except Exception: + pass + + uptime = 0 + if state.started_at is not None: + try: + from datetime import datetime + + started = datetime.fromisoformat(state.started_at) + uptime = int((datetime.now(UTC) - started).total_seconds()) + except Exception: + pass + + return RuntimeStatusResponse( + pid=state.pid, + uptime=uptime, + state=state.status, + active_runs=1 if state.status == "running" else 0, + pending_runs=pending, + ) + + @app.get("/api/v1/artifacts/{run_id}", response_model=ArtifactListResponse) + async def list_artifacts(run_id: str) -> ArtifactListResponse: + if artifact_store is None: + raise HTTPException(status_code=503, detail="Artifact store not configured") + + try: + paths = artifact_store.list_artifact_paths(run_id) + except FileNotFoundError as err: + raise HTTPException(status_code=404, detail="Run not found") from err + + artifacts = [] + for p in paths: + full_path = artifact_store.base_path / p + try: + stat = full_path.stat() + artifact_type = _infer_artifact_type(p) + artifacts.append( + ArtifactItem( + name=full_path.name, + size_bytes=stat.st_size, + created_at=_format_timestamp(stat.st_mtime), + type=artifact_type, + ) + ) + except OSError: + continue + + return ArtifactListResponse(artifacts=artifacts) + + return app + + +def _create_spec_from_prompt(prompt: str) -> Path: + from uuid import uuid4 + + tmp_dir = Path(os.environ.get("TMPDIR", "/tmp")) / "synapse-os" / "api-specs" + tmp_dir.mkdir(parents=True, exist_ok=True) + spec_path = tmp_dir / f"{uuid4().hex}.md" + spec_content = ( + "---\n" + "feature_id: api-run\n" + "feature_name: API Run\n" + "status: draft\n" + "---\n\n" + f"# API Run\n\n{prompt}\n" + ) + spec_path.write_text(spec_content, encoding="utf-8") + spec_path.chmod(0o600) + return spec_path + + +def _infer_artifact_type(path: str) -> str: + path_lower = path.lower() + if "spec" in path_lower: + return "spec" + if "test" in path_lower: + return "test" + if "report" in path_lower: + return "report" + if path_lower.endswith((".py", ".ts", ".js", ".rs", ".go")): + return "code" + if path_lower.endswith((".md", ".txt")): + return "document" + return "other" + + +def _format_timestamp(ts: float) -> str: + from datetime import datetime + + return datetime.fromtimestamp(ts, tz=UTC).isoformat() diff --git a/src/synapse_os/memory.py b/src/synapse_os/memory.py new file mode 100644 index 0000000..829f124 --- /dev/null +++ b/src/synapse_os/memory.py @@ -0,0 +1,129 @@ +from __future__ import annotations + +import json +from collections import defaultdict +from datetime import datetime, timezone +from pathlib import Path +from threading import Lock + +from pydantic import BaseModel, ConfigDict, Field, StrictStr + + +class ArtifactMetadata(BaseModel): + model_config = ConfigDict(strict=True) + + type: StrictStr = Field(default="unknown") + tags: list[StrictStr] = Field(default_factory=list) + source_step: StrictStr | None = None + created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc)) + + +class ArtifactIndexEntry(BaseModel): + name: StrictStr + run_id: StrictStr + metadata: ArtifactMetadata + + +class IndexedArtifactStore: + def __init__(self, *, base_path: Path) -> None: + self.base_path = base_path + self._lock = Lock() + self._index: dict[str, list[ArtifactIndexEntry]] = defaultdict(list) + + def register( + self, + *, + run_id: str, + name: str, + metadata: ArtifactMetadata | None = None, + ) -> None: + with self._lock: + entry = ArtifactIndexEntry( + name=name, + run_id=run_id, + metadata=metadata or ArtifactMetadata(type="unknown"), + ) + self._index[run_id].append(entry) + + def find_by_tag(self, tag: str) -> list[ArtifactIndexEntry]: + with self._lock: + return [ + entry + for entries in self._index.values() + for entry in entries + if tag in entry.metadata.tags + ] + + def find_by_type(self, artifact_type: str) -> list[ArtifactIndexEntry]: + with self._lock: + return [ + entry + for entries in self._index.values() + for entry in entries + if entry.metadata.type == artifact_type + ] + + def list_for_run(self, run_id: str) -> list[ArtifactIndexEntry]: + with self._lock: + return list(self._index.get(run_id, [])) + + +class MemoryStore: + def __init__(self, *, state_dir: Path) -> None: + self.state_dir = state_dir + self._lock = Lock() + self._memory_path = state_dir / "memory-store.json" + self._memory: dict[str, dict[str, str]] = self._load() + + def _load(self) -> dict[str, dict[str, str]]: + if not self._memory_path.exists(): + return defaultdict(dict) + try: + data = json.loads(self._memory_path.read_text(encoding="utf-8")) + return defaultdict(dict, data) + except Exception: + return defaultdict(dict) + + def _persist(self) -> None: + self.state_dir.mkdir(parents=True, exist_ok=True) + self._memory_path.write_text( + json.dumps(dict(self._memory), ensure_ascii=False), + encoding="utf-8", + ) + + def get(self, namespace: str, key: str) -> str | None: + with self._lock: + return self._memory.get(namespace, {}).get(key) + + def set(self, namespace: str, key: str, value: str) -> None: + with self._lock: + self._memory[namespace][key] = value + self._persist() + + def delete(self, namespace: str, key: str) -> None: + with self._lock: + if namespace in self._memory and key in self._memory[namespace]: + del self._memory[namespace][key] + self._persist() + + def list_namespaces(self) -> list[str]: + with self._lock: + return list(self._memory.keys()) + + def feature_memory(self, feature_id: str) -> FeatureMemoryView: + return FeatureMemoryView(store=self, namespace=feature_id) + + +class FeatureMemoryView: + def __init__(self, store: MemoryStore, namespace: str) -> None: + self._store = store + self._namespace = namespace + + def get(self, key: str) -> str | None: + return self._store.get(self._namespace, key) + + def set(self, key: str, value: str) -> None: + self._store.set(self._namespace, key, value) + + def delete(self, key: str) -> None: + self._store.delete(self._namespace, key) diff --git a/src/synapse_os/multi_agent.py b/src/synapse_os/multi_agent.py new file mode 100644 index 0000000..ef11d45 --- /dev/null +++ b/src/synapse_os/multi_agent.py @@ -0,0 +1,129 @@ +from __future__ import annotations + +from dataclasses import dataclass, field + +from synapse_os.adapters import BaseCLIAdapter + + +class AdapterAlreadyRegisteredError(ValueError): + pass + + +class AdapterNotFoundError(KeyError): + pass + + +class NoSuitableAdapterError(RuntimeError): + pass + + +class AdapterRegistry: + def __init__(self) -> None: + self._adapters: dict[str, BaseCLIAdapter] = {} + + def register(self, adapter: BaseCLIAdapter) -> None: + if adapter.tool_name in self._adapters: + raise AdapterAlreadyRegisteredError( + f"Adapter '{adapter.tool_name}' is already registered." + ) + self._adapters[adapter.tool_name] = adapter + + def unregister(self, name: str) -> None: + if name not in self._adapters: + raise AdapterNotFoundError(f"Adapter '{name}' not found.") + del self._adapters[name] + + def get(self, name: str) -> BaseCLIAdapter | None: + return self._adapters.get(name) + + def list_all(self) -> list[BaseCLIAdapter]: + return list(self._adapters.values()) + + def find_by_capability(self, capability: str) -> list[BaseCLIAdapter]: + return [ + adapter for adapter in self._adapters.values() if capability in adapter.capabilities + ] + + def all_capabilities(self) -> set[str]: + caps: set[str] = set() + for adapter in self._adapters.values(): + caps.update(adapter.capabilities) + return caps + + +class CapabilityRouter: + def __init__(self, registry: AdapterRegistry) -> None: + self.registry = registry + + def select_adapter(self, required_capabilities: set[str]) -> BaseCLIAdapter | None: + if not required_capabilities: + adapters = self.registry.list_all() + return adapters[0] if adapters else None + + for capability in required_capabilities: + matches = self.registry.find_by_capability(capability) + if matches: + return matches[0] + + return None + + def get_best_match(self, required_capabilities: set[str]) -> BaseCLIAdapter | None: + if not required_capabilities: + adapters = self.registry.list_all() + return adapters[0] if adapters else None + + all_adapters = self.registry.list_all() + if not all_adapters: + return None + + scored: list[tuple[int, BaseCLIAdapter]] = [] + for adapter in all_adapters: + overlap = len(set(adapter.capabilities) & required_capabilities) + if overlap > 0: + scored.append((overlap, adapter)) + + if scored: + scored.sort(key=lambda x: x[0], reverse=True) + return scored[0][1] + + return all_adapters[0] + + +@dataclass +class MultiAgentCoordinator: + registry: AdapterRegistry + router: CapabilityRouter + required_steps: set[str] = field(default_factory=set) + _handoff_log: list[dict[str, str]] = field(default_factory=list) + + def resolve_adapter_for_step( + self, + step_name: str, + required_capabilities: set[str], + ) -> BaseCLIAdapter | None: + adapter = self.router.get_best_match(required_capabilities) + + if adapter is None and step_name in self.required_steps: + raise NoSuitableAdapterError( + f"No suitable adapter found for required step '{step_name}' " + f"with capabilities {required_capabilities}." + ) + + if adapter is not None: + self._handoff_log.append( + { + "step": step_name, + "adapter": adapter.tool_name, + "capabilities": ( + ",".join(required_capabilities) if required_capabilities else "" + ), + } + ) + + return adapter + + def get_handoff_log(self) -> list[dict[str, str]]: + return list(self._handoff_log) + + def clear_handoff_log(self) -> None: + self._handoff_log.clear() diff --git a/src/synapse_os/pipeline.py b/src/synapse_os/pipeline.py index 3aac974..e090d77 100644 --- a/src/synapse_os/pipeline.py +++ b/src/synapse_os/pipeline.py @@ -98,6 +98,7 @@ class PipelineContext(BaseModel): supervisor_decisions: list[StrictStr] = Field(default_factory=list) validated_spec: SpecDocument | None = None hooks_active: list[StrictStr] = Field(default_factory=list) + dag: dict[str, object] = Field(default_factory=dict) class StepExecutor(Protocol): @@ -309,6 +310,7 @@ def _execute_spec_validation(self, context: PipelineContext) -> None: context.validated_spec = spec_document context.artifacts["spec_id"] = spec_document.metadata.id context.artifacts["spec_summary"] = spec_document.metadata.summary + context.dag = spec_document.dag context.step_history.append(PipelineState.SPEC_VALIDATION) context.current_state = PipelineState.SPEC_VALIDATION diff --git a/src/synapse_os/pipeline_dag.py b/src/synapse_os/pipeline_dag.py new file mode 100644 index 0000000..1dbfd84 --- /dev/null +++ b/src/synapse_os/pipeline_dag.py @@ -0,0 +1,214 @@ +from __future__ import annotations + +from collections.abc import Callable +from concurrent.futures import ThreadPoolExecutor +from dataclasses import dataclass, field +from enum import StrEnum +from threading import Lock +from typing import Any + +from pydantic import BaseModel, ConfigDict, Field, StrictStr + + +class DAGSpecificationError(ValueError): + pass + + +class DAGStepStatus(StrEnum): + PENDING = "PENDING" + RUNNING = "RUNNING" + DONE = "DONE" + FAILED = "FAILED" + + +class DAGStep(BaseModel): + model_config = ConfigDict(strict=True) + + id: StrictStr = Field(min_length=1) + executor: StrictStr = Field(min_length=1) + depends_on: list[StrictStr] = Field(default_factory=list) + if_cond: StrictStr | None = None + + +class DAGConditional(BaseModel): + model_config = ConfigDict(strict=True) + + id: StrictStr = Field(min_length=1) + step: StrictStr + if_cond: StrictStr + + +class DAGSpec(BaseModel): + model_config = ConfigDict(strict=True) + + mode: StrictStr = Field(default="linear") + steps: list[DAGStep] = Field(default_factory=list) + conditionals: list[DAGConditional] = Field(default_factory=list) + + +class DAGValidator: + @staticmethod + def validate(spec: DAGSpec) -> None: + if spec.mode == "linear": + return + if spec.mode == "dag": + DAGValidator._validate_dag(spec) + else: + raise DAGSpecificationError( + f"Unknown DAG mode: {spec.mode!r}. Use 'linear' or 'dag'." + ) + + @staticmethod + def _validate_dag(spec: DAGSpec) -> None: + if not spec.steps: + raise DAGSpecificationError("DAG mode requires at least one step.") + + step_ids = {step.id for step in spec.steps} + for step in spec.steps: + for dep in step.depends_on: + if dep not in step_ids: + raise DAGSpecificationError( + f"Step '{step.id}' depends on non-existent step '{dep}'." + ) + DAGValidator._check_no_cycle(spec) + + @staticmethod + def _check_no_cycle(spec: DAGSpec) -> None: + in_degree: dict[str, int] = {step.id: 0 for step in spec.steps} + adj: dict[str, list[str]] = {step.id: [] for step in spec.steps} + + for step in spec.steps: + for dep in step.depends_on: + in_degree[step.id] += 1 + adj[dep].append(step.id) + + queue: list[str] = [sid for sid, deg in in_degree.items() if deg == 0] + visited = 0 + while queue: + node = queue.pop(0) + visited += 1 + for neighbor in adj[node]: + in_degree[neighbor] -= 1 + if in_degree[neighbor] == 0: + queue.append(neighbor) + + if visited != len(spec.steps): + raise DAGSpecificationError("Cycle detected in DAG graph.") + + +@dataclass +class DAGContext: + spec: DAGSpec + _states: dict[str, DAGStepStatus] = field(default_factory=dict) + _lock: Lock = field(default_factory=Lock) + + def __post_init__(self) -> None: + for step in self.spec.steps: + self._states[step.id] = DAGStepStatus.PENDING + + def get_state(self, step_id: str) -> DAGStepStatus: + return self._states[step_id] + + def mark_running(self, step_id: str) -> None: + with self._lock: + self._states[step_id] = DAGStepStatus.RUNNING + + def mark_done(self, step_id: str) -> None: + with self._lock: + self._states[step_id] = DAGStepStatus.DONE + + def mark_failed(self, step_id: str) -> None: + with self._lock: + self._states[step_id] = DAGStepStatus.FAILED + + def ready_steps(self) -> list[str]: + ready: list[str] = [] + for step in self.spec.steps: + if self._states[step.id] != DAGStepStatus.PENDING: + continue + deps_done = all( + self._states[dep] == DAGStepStatus.DONE for dep in step.depends_on + ) + if deps_done: + ready.append(step.id) + return ready + + def is_complete(self) -> bool: + return all( + self._states[sid] in (DAGStepStatus.DONE, DAGStepStatus.FAILED) + for sid in self._states + ) + + @property + def has_failed(self) -> bool: + return any(self._states[sid] == DAGStepStatus.FAILED for sid in self._states) + + +class DAGExecutor: + def __init__( + self, + spec: DAGSpec, + *, + max_workers: int = 4, + step_runner: Callable[[str, dict[str, Any]], None] | None = None, + ) -> None: + self.spec = spec + self.max_workers = max_workers + self.step_runner = step_runner or (lambda _sid, _ctx: None) + DAGValidator.validate(spec) + self.context = DAGContext(spec) + + def execute(self) -> None: + with ThreadPoolExecutor(max_workers=self.max_workers) as pool: + futures: dict[str, Any] = {} + while not self.context.is_complete(): + if self.context.has_failed: + break + + completed = [fid for fid, f in futures.items() if f.done()] + for fid in completed: + f = futures.pop(fid) + try: + f.result() + except Exception: + pass + + ready = self.context.ready_steps() + if not ready: + if not futures: + break + import time as _time + + _time.sleep(0.01) + continue + + for step_id in ready: + if step_id in futures and not futures[step_id].done(): + continue + self.context.mark_running(step_id) + future = pool.submit(self._run_step, step_id) + futures[step_id] = future + + def _run_step(self, step_id: str) -> None: + try: + self.step_runner(step_id, {}) + self.context.mark_done(step_id) + except Exception: + self.context.mark_failed(step_id) + raise + + +class LinearPipelineAdapter: + def __init__( + self, + steps: list[str], + step_runner: Callable[[str, dict[str, Any]], None], + ) -> None: + self.steps = steps + self.step_runner = step_runner + + def execute(self) -> None: + if not self.steps: + raise DAGSpecificationError("Linear pipeline requires at least one step.") + for step_id in self.steps: + self.step_runner(step_id, {}) diff --git a/src/synapse_os/plugins.py b/src/synapse_os/plugins.py new file mode 100644 index 0000000..cfbeb36 --- /dev/null +++ b/src/synapse_os/plugins.py @@ -0,0 +1,147 @@ +from __future__ import annotations + +from collections.abc import Callable +from dataclasses import dataclass, field +from importlib.metadata import entry_points +from typing import Any + +HOOK_TYPES = frozenset(["pre_step", "post_step", "on_run_start", "on_run_end"]) + + +@dataclass +class HookSpec: + name: str + hook_type: str + handler: Callable[..., Any] + + +@dataclass +class PluginManifest: + name: str + version: str + enabled: bool = True + hooks: list[str] = field(default_factory=list) + + +class PluginLoadError(Exception): + pass + + +class PluginRegistry: + _instance: PluginRegistry | None = None + _initialized: bool = False + + def __new__(cls) -> PluginRegistry: + if cls._instance is None: + cls._instance = super().__new__(cls) + return cls._instance + + def __init__(self) -> None: + if PluginRegistry._initialized: + return + self._plugins: dict[str, PluginManifest] = {} + self._handlers: dict[str, list[Callable[..., Any]]] = { + ht: [] for ht in HOOK_TYPES + } + self._hook_map: dict[str, dict[str, Callable[..., Any]]] = {} + PluginRegistry._initialized = True + + def register(self, manifest: PluginManifest) -> None: + if manifest.name in self._plugins: + raise PluginLoadError(f"Plugin '{manifest.name}' already registered") + self._plugins[manifest.name] = manifest + + def unregister(self, name: str) -> None: + if name not in self._plugins: + raise PluginLoadError(f"Plugin '{name}' not found") + hooks = self._hook_map.pop(name, {}) + del self._plugins[name] + for hook_type, handler in hooks.items(): + if not self._is_handler_registered( + hook_type, handler + ) and handler in self._handlers.get(hook_type, []): + self._handlers[hook_type].remove(handler) + + def get_plugin(self, name: str) -> PluginManifest | None: + return self._plugins.get(name) + + def list_plugins(self) -> list[str]: + return list(self._plugins.keys()) + + def is_loaded(self, name: str) -> bool: + return name in self._plugins + + def enable_plugin(self, name: str) -> None: + if name in self._plugins: + self._plugins[name].enabled = True + + def disable_plugin(self, name: str) -> None: + if name in self._plugins: + self._plugins[name].enabled = False + + def register_hook( + self, plugin_name: str, hook_type: str, handler: Callable[..., Any] + ) -> None: + if hook_type not in HOOK_TYPES: + raise ValueError(f"Unknown hook type: {hook_type}") + if plugin_name not in self._plugins: + raise PluginLoadError(f"Plugin '{plugin_name}' not registered") + if plugin_name not in self._hook_map: + self._hook_map[plugin_name] = {} + old_handler = self._hook_map[plugin_name].get(hook_type) + self._hook_map[plugin_name][hook_type] = handler + if hook_type not in self._handlers: + self._handlers[hook_type] = [] + if ( + old_handler is not None + and old_handler is not handler + and not self._is_handler_registered(hook_type, old_handler) + and old_handler in self._handlers.get(hook_type, []) + ): + self._handlers[hook_type].remove(old_handler) + if handler not in self._handlers[hook_type]: + self._handlers[hook_type].append(handler) + + def get_handlers(self, hook_type: str) -> list[Callable[..., Any]]: + handlers = [] + for hook_type_key, handler_list in self._handlers.items(): + if hook_type_key != hook_type: + continue + for handler in handler_list: + if self._is_handler_enabled(hook_type, handler): + handlers.append(handler) + return handlers + + def _is_handler_registered( + self, hook_type: str, handler: Callable[..., Any] + ) -> bool: + for hooks in self._hook_map.values(): + if hooks.get(hook_type) is handler: + return True + return False + + def _is_handler_enabled(self, hook_type: str, handler: Callable[..., Any]) -> bool: + for plugin_name, hooks in self._hook_map.items(): + if ( + hooks.get(hook_type) is handler + and self._plugins.get(plugin_name, None) is not None + ): + if self._plugins[plugin_name].enabled: + return True + return False + + def load_plugins(self) -> None: + eps = entry_points(group="synapse_os.plugins") + if hasattr(eps, "select"): + eps = eps.select(group="synapse_os.plugins") + for ep in eps: + try: + module = ep.load() + manifest = getattr(module, "hook_manifest", None) + if manifest is None: + continue + manifest_obj = manifest() + if isinstance(manifest_obj, PluginManifest): + self.register(manifest_obj) + except Exception: + pass diff --git a/src/synapse_os/reporting.py b/src/synapse_os/reporting.py index d2ea167..1f9edb8 100644 --- a/src/synapse_os/reporting.py +++ b/src/synapse_os/reporting.py @@ -4,6 +4,41 @@ from pathlib import Path from typing import Protocol +from pydantic import BaseModel, ConfigDict, Field + + +class TimelineEntry(BaseModel): + model_config = ConfigDict(strict=True) + + state: str + entered_at: float + duration_ms: int + + +class ExecutionTimeline(BaseModel): + model_config = ConfigDict(strict=True) + + entries: list[TimelineEntry] = Field(default_factory=list) + + +class AdapterMetrics(BaseModel): + model_config = ConfigDict(strict=True) + + tool_name: str + total_calls: int + success_count: int + failure_count: int + avg_duration_ms: float + + +class StructuredError(BaseModel): + model_config = ConfigDict(strict=True) + + error_type: str + message: str + step: str + count: int + class _RunRecordProtocol(Protocol): initiated_by: str @@ -42,6 +77,22 @@ class _ArtifactStoreProtocol(Protocol): def list_artifact_paths(self, run_id: str) -> list[str]: ... +class RunReport(BaseModel): + model_config = ConfigDict(strict=True) + + run_id: str + initiated_by: str + workspace_path: str + status: str + current_state: str + spec_hash: str | None = None + feature_id: str | None = None + feature_title: str | None = None + execution_timeline: ExecutionTimeline | None = None + adapter_metrics: list[AdapterMetrics] = Field(default_factory=list) + structured_errors: list[StructuredError] = Field(default_factory=list) + + class RunReportGenerator: def __init__( self, @@ -140,7 +191,10 @@ def build(self, run_id: str) -> str: def _read_spec_artifact(self, run_id: str, artifact_name: str) -> str: artifact_path = ( - self.artifact_store.base_path / run_id / "SPEC_VALIDATION" / f"{artifact_name}.txt" + self.artifact_store.base_path + / run_id + / "SPEC_VALIDATION" + / f"{artifact_name}.txt" ) if not artifact_path.exists(): return "-" @@ -155,3 +209,80 @@ def _format_timeout(self, value: bool | None) -> str: if value is None: return "-" return "yes" if value else "no" + + def generate_structured_report(self, run_id: str) -> RunReport: + run_record = self.repository.get_run(run_id) + step_records = self.repository.list_steps(run_id) + event_records = self.repository.list_events(run_id) + spec_id = self._read_spec_artifact(run_id, "spec_id") + spec_title = self._read_spec_artifact(run_id, "spec_title") + + timeline_entries: list[TimelineEntry] = [] + previous_entered_at: float | None = None + adapter_call_counts: dict[str, dict[str, int | float]] = {} + + for event in event_records: + if event.event_type == "state_entered" and event.state: + entered_at = getattr(event, "timestamp", None) + if entered_at is None: + entered_at = 0.0 + duration_ms = 0 + if previous_entered_at is not None: + duration_ms = int((entered_at - previous_entered_at) * 1000) + timeline_entries.append( + TimelineEntry( + state=event.state, + entered_at=entered_at, + duration_ms=duration_ms, + ) + ) + previous_entered_at = entered_at + + for step in step_records: + tool = step.tool_name or "unknown" + if tool not in adapter_call_counts: + adapter_call_counts[tool] = { + "total": 0, + "success": 0, + "failure": 0, + "duration_sum": 0, + } + adapter_call_counts[tool]["total"] += 1 + if step.return_code == 0: + adapter_call_counts[tool]["success"] += 1 + else: + adapter_call_counts[tool]["failure"] += 1 + if step.duration_ms is not None: + adapter_call_counts[tool]["duration_sum"] += step.duration_ms + + adapter_metrics: list[AdapterMetrics] = [] + for tool_name, counts in adapter_call_counts.items(): + total = counts["total"] + avg = counts["duration_sum"] / total if total > 0 else 0.0 + adapter_metrics.append( + AdapterMetrics( + tool_name=tool_name, + total_calls=int(total), + success_count=int(counts["success"]), + failure_count=int(counts["failure"]), + avg_duration_ms=avg, + ) + ) + + return RunReport( + run_id=run_id, + initiated_by=run_record.initiated_by, + workspace_path=run_record.workspace_path, + status=run_record.status, + current_state=run_record.current_state, + spec_hash=run_record.spec_hash, + feature_id=spec_id if spec_id != "-" else None, + feature_title=spec_title if spec_title != "-" else None, + execution_timeline=( + ExecutionTimeline(entries=timeline_entries) + if timeline_entries + else None + ), + adapter_metrics=adapter_metrics, + structured_errors=[], + ) diff --git a/src/synapse_os/runtime/service.py b/src/synapse_os/runtime/service.py index 2b87974..146ce5e 100644 --- a/src/synapse_os/runtime/service.py +++ b/src/synapse_os/runtime/service.py @@ -6,15 +6,29 @@ import signal import subprocess import sys +import threading import time +from collections.abc import Callable from pathlib import Path +from typing import Literal +from pydantic import BaseModel, ConfigDict, Field + +from synapse_os.runtime.circuit_breaker import AdapterCircuitBreakerStore from synapse_os.runtime.state import RuntimeState, RuntimeStateStore from synapse_os.runtime.worker import RuntimeWorker PROCESS_MARKER = "--synapse-runtime-process" +class RuntimeLifecycleEvent(BaseModel): + model_config = ConfigDict(strict=True) + + event: str + timestamp: float = Field(default_factory=time.time) + data: dict[str, object] = Field(default_factory=dict) + + def _runtime_process_code() -> str: return ( "import signal\n" @@ -78,7 +92,6 @@ def handle_shutdown(signum: int, frame: object) -> None: previous_sigterm = signal.signal(signal.SIGTERM, handle_shutdown) previous_sigint = signal.signal(signal.SIGINT, handle_shutdown) - # This is the minimal resident process for the Synapse-Flow runtime. self.state_store.write_running( os.getpid(), process_identity, @@ -188,3 +201,97 @@ def _is_foreground_runtime_process(arguments: list[str], process_identity: str) and "--process-identity" in arguments and process_identity in arguments ) + + +class _InterruptibleHandler: + def __init__(self, handler: Callable[[], None], timeout: float) -> None: + self.handler = handler + self.timeout = timeout + self.thread: threading.Thread | None = None + self.exc: BaseException | None = None + + def start(self) -> None: + self.thread = threading.Thread(target=self._run, daemon=True) + self.thread.start() + + def _run(self) -> None: + try: + self.handler() + except BaseException as e: + self.exc = e + + def join(self, timeout: float) -> None: + if self.thread is None: + return + self.thread.join(timeout=timeout) + + def cancel(self) -> None: + pass + + def is_alive(self) -> bool: + return self.thread is not None and self.thread.is_alive() + + +class RuntimeCoordinator: + def __init__( + self, + circuit_breaker_store: AdapterCircuitBreakerStore | None = None, + ) -> None: + self.circuit_breaker_store = circuit_breaker_store or AdapterCircuitBreakerStore( + Path(".synapse-os/runtime/circuit-breakers.json") + ) + self.lifecycle_events: list[RuntimeLifecycleEvent] = [] + self._cleanup_handlers: list[Callable[[], None]] = [] + + def health_status(self) -> Literal["HEALTHY", "DEGRADED", "UNHEALTHY"]: + open_adapters = [ + tool for tool in self._registered_tools() if self.circuit_breaker_store.is_open(tool) + ] + if not open_adapters: + return "HEALTHY" + if len(open_adapters) == 1: + return "DEGRADED" + return "UNHEALTHY" + + def lifecycle_event(self, event: str, data: dict[str, object] | None = None) -> None: + self.lifecycle_events.append(RuntimeLifecycleEvent(event=event, data=data or {})) + + def register_cleanup_handler(self, handler: Callable[[], None]) -> None: + self._cleanup_handlers.append(handler) + + def run_cleanup_handlers(self) -> None: + for handler in self._cleanup_handlers: + try: + handler() + except Exception: + pass + + def graceful_shutdown(self, timeout_seconds: float = 5.0) -> None: + self.lifecycle_event("runtime.stopping") + deadline = time.monotonic() + timeout_seconds + remaining = timeout_seconds + + for handler in self._cleanup_handlers: + if remaining <= 0: + break + thread = _InterruptibleHandler(handler, remaining) + thread.start() + thread.join(timeout=remaining) + if thread.is_alive(): + thread.cancel() + remaining = max(deadline - time.monotonic(), 0.0) + + self._stop() + self.lifecycle_event("runtime.stopped") + + @property + def degraded_adapters(self) -> set[str]: + return { + tool for tool in self._registered_tools() if self.circuit_breaker_store.is_open(tool) + } + + def _registered_tools(self) -> list[str]: + return ["codex", "gemini", "copilot"] + + def _stop(self) -> None: + pass diff --git a/src/synapse_os/specs/validator.py b/src/synapse_os/specs/validator.py index 777368b..d3755cb 100644 --- a/src/synapse_os/specs/validator.py +++ b/src/synapse_os/specs/validator.py @@ -27,6 +27,7 @@ class SpecMetadata(BaseModel): acceptance_criteria: list[str] = Field(min_length=1) non_goals: list[str] hooks: list[HookConfig] = Field(default_factory=list) + dag: dict[str, object] = Field(default_factory=dict) class SpecDocument(BaseModel): @@ -35,15 +36,17 @@ class SpecDocument(BaseModel): metadata: SpecMetadata sections: dict[str, str] body: str + dag: dict[str, object] = Field(default_factory=dict) def validate_spec_file(path: Path) -> SpecDocument: text = path.read_text(encoding="utf-8") metadata_block, body = _split_front_matter(text) metadata = _load_metadata(metadata_block) + dag = _load_dag(metadata_block) sections = _parse_sections(body) _require_sections(sections, required_sections=("Contexto", "Objetivo")) - return SpecDocument(metadata=metadata, sections=sections, body=body.strip()) + return SpecDocument(metadata=metadata, sections=sections, body=body.strip(), dag=dag) def _split_front_matter(text: str) -> tuple[str, str]: @@ -78,6 +81,17 @@ def _load_metadata(metadata_block: str) -> SpecMetadata: raise SpecValidationError(f"SPEC metadata is invalid: {message}") from exc +def _load_dag(metadata_block: str) -> dict[str, object]: + try: + raw = yaml.safe_load(metadata_block) + except yaml.YAMLError as exc: + raise SpecValidationError("SPEC front matter YAML is invalid.") from exc + + if not isinstance(raw, dict): + return {} + return raw.get("dag", {}) or {} + + def _validate_hooks_in_raw_metadata(raw_metadata: dict[str, object]) -> None: if "hooks" not in raw_metadata: return diff --git a/src/synapse_os/supervisor.py b/src/synapse_os/supervisor.py index 52fd5a7..944d8be 100644 --- a/src/synapse_os/supervisor.py +++ b/src/synapse_os/supervisor.py @@ -3,6 +3,7 @@ from pydantic import BaseModel, ConfigDict, Field, StrictInt, StrictStr RETRYABLE_STATES = frozenset({"PLAN", "TEST_RED", "CODE_GREEN"}) +TERMINAL_STATES = frozenset({"SECURITY", "SPEC_VALIDATION"}) class RetryableStepError(RuntimeError): @@ -13,6 +14,46 @@ class ReviewRejectedError(RuntimeError): """Signals that REVIEW requested rework and must return to CODE_GREEN.""" +class AdapterOperationalError(RuntimeError): + """Marks an adapter operational failure with a category.""" + + def __init__(self, message: str, category: str) -> None: + super().__init__(message) + self.category = category + + +class RetryPolicy(BaseModel): + model_config = ConfigDict(strict=True) + + max_retries: StrictInt = Field(default=2, ge=0) + base_delay_seconds: float = Field(default=1.0, ge=0) + max_delay_seconds: float = Field(default=60.0, ge=0) + + +class StepPolicy(BaseModel): + model_config = ConfigDict(strict=True) + + step_name: StrictStr + retry: RetryPolicy = Field(default_factory=RetryPolicy) + + +class SupervisorPolicies(BaseModel): + model_config = ConfigDict(strict=True) + + default: RetryPolicy = Field(default_factory=RetryPolicy) + step_overrides: dict[str, StepPolicy] = Field(default_factory=dict) + + def resolve_for_step(self, step_name: str) -> RetryPolicy: + if step_name in self.step_overrides: + return self.step_overrides[step_name].retry + return self.default + + +def calculate_backoff(attempt: int, base_delay: float, max_delay: float) -> float: + delay = base_delay * (2 ** (attempt - 1)) + return float(min(delay, max_delay)) + + class SupervisorDecision(BaseModel): model_config = ConfigDict(strict=True) @@ -20,6 +61,7 @@ class SupervisorDecision(BaseModel): next_state: StrictStr route: StrictStr | None = None reason: StrictStr | None = None + backoff_seconds: float | None = None class Supervisor(BaseModel): @@ -84,3 +126,86 @@ def decide_after_review_rejection(self) -> SupervisorDecision: next_state="CODE_GREEN", reason="review_requested_rework", ) + + +class AdvancedSupervisor(BaseModel): + model_config = ConfigDict(strict=True) + + policies: SupervisorPolicies = Field(default_factory=SupervisorPolicies) + + def _resolve_policy(self, state: str) -> RetryPolicy: + return self.policies.resolve_for_step(state) + + def _is_terminal_state(self, state: str) -> bool: + return state in TERMINAL_STATES + + def _is_short_circuit(self, error: Exception) -> bool: + if isinstance(error, AdapterOperationalError): + category = getattr(error, "category", None) or getattr(error, "reason", None) + return category == "launcher_unavailable" + return False + + def decide_after_failure( + self, + *, + state: str, + error: Exception, + attempt: int, + available_routes: tuple[str, ...], + ) -> SupervisorDecision: + primary_route = available_routes[0] if available_routes else None + fallback_route = available_routes[1] if len(available_routes) > 1 else None + + if self._is_terminal_state(state): + reason = f"{state.lower()}_is_terminal" + return SupervisorDecision( + action="fail", + next_state=state, + reason=reason, + ) + + if isinstance(error, ReviewRejectedError) and state == "REVIEW": + return SupervisorDecision( + action="return_to_code_green", + next_state="CODE_GREEN", + reason="review_requested_rework", + ) + + if self._is_short_circuit(error) and fallback_route is not None: + return SupervisorDecision( + action="reroute", + next_state=state, + route=fallback_route, + reason="operational_error_short_circuit", + ) + + if ( + isinstance(error, (RetryableStepError, AdapterOperationalError)) + and state in RETRYABLE_STATES + ): + policy = self._resolve_policy(state) + if attempt <= policy.max_retries: + backoff = calculate_backoff( + attempt, policy.base_delay_seconds, policy.max_delay_seconds + ) + return SupervisorDecision( + action="retry", + next_state=state, + route=primary_route, + reason="retryable_failure_with_budget", + backoff_seconds=backoff, + ) + if fallback_route is not None: + return SupervisorDecision( + action="reroute", + next_state=state, + route=fallback_route, + reason="retry_budget_exhausted_with_fallback", + ) + + return SupervisorDecision( + action="fail", + next_state=state, + route=primary_route, + reason="terminal_failure", + ) diff --git a/src/synapse_os/workspace.py b/src/synapse_os/workspace.py new file mode 100644 index 0000000..85def28 --- /dev/null +++ b/src/synapse_os/workspace.py @@ -0,0 +1,141 @@ +from __future__ import annotations + +import shutil +from collections.abc import Callable +from enum import StrEnum +from pathlib import Path +from typing import Any + +from pydantic import BaseModel, Field + + +class WorkspaceState(StrEnum): + CREATING = "creating" + READY = "ready" + BUSY = "busy" + CLEANUP = "cleanup" + DESTROYED = "destroyed" + + +class TrackedWorkspace(BaseModel): + root: Path + state: WorkspaceState = WorkspaceState.CREATING + run_id: str | None = None + metadata: dict[str, Any] = Field(default_factory=dict) + + def mark_ready(self, run_id: str) -> None: + self.state = WorkspaceState.READY + self.run_id = run_id + + def mark_busy(self) -> None: + self.state = WorkspaceState.BUSY + + def mark_cleanup(self) -> None: + self.state = WorkspaceState.CLEANUP + + def mark_destroyed(self) -> None: + self.state = WorkspaceState.DESTROYED + + def reset_for_reuse(self) -> None: + self.state = WorkspaceState.CREATING + self.run_id = None + self.metadata = {} + for item in self.root.iterdir(): + if item.name != self.root.name: + if item.is_dir(): + shutil.rmtree(item) + else: + item.unlink() + + def set_metadata(self, key: str, value: Any) -> None: + self.metadata[key] = value + + def get_metadata(self, key: str, default: Any = None) -> Any: + return self.metadata.get(key, default) + + +class PoolExhaustedError(Exception): + pass + + +class WorkspacePool(BaseModel): + base_dir: Path + max_size: int + acquired_count: int = 0 + idle_workspaces: list[TrackedWorkspace] = Field(default_factory=list) + workspace_counter: int = Field(default=0) + + def acquire(self, run_id: str) -> TrackedWorkspace: + if self.idle_workspaces: + ws = self.idle_workspaces.pop(0) + ws.mark_ready(run_id) + self.acquired_count += 1 + return ws + if self.acquired_count >= self.max_size: + raise PoolExhaustedError(f"Pool exhausted: {self.acquired_count}/{self.max_size}") + self.workspace_counter += 1 + ws_root = self.base_dir / f"ws-{self.workspace_counter}" + ws_root.mkdir(parents=True, exist_ok=True) + ws = TrackedWorkspace(root=ws_root) + ws.mark_ready(run_id) + self.acquired_count += 1 + return ws + + def release(self, ws: TrackedWorkspace) -> None: + ws.reset_for_reuse() + ws.state = WorkspaceState.READY + self.idle_workspaces.append(ws) + self.acquired_count -= 1 + + def discard(self, ws: TrackedWorkspace) -> None: + was_idle = ws in self.idle_workspaces + if was_idle: + self.idle_workspaces.remove(ws) + if ws.root.exists(): + shutil.rmtree(ws.root) + ws.mark_destroyed() + if not was_idle: + self.acquired_count -= 1 + + @property + def idle_count(self) -> int: + return len(self.idle_workspaces) + + def stats(self) -> dict[str, int]: + return { + "total": self.max_size, + "acquired": self.acquired_count, + "idle": self.idle_count, + "discarded": self.max_size - self.acquired_count - self.idle_count, + } + + +class WorkspaceManager: + def __init__(self, base_dir: Path, pool_size: int) -> None: + self.base_dir = base_dir + self.pool = WorkspacePool(base_dir=base_dir / ".workspace_pool", max_size=pool_size) + self._cache: dict[str, TrackedWorkspace] = {} + self._cleanup_hooks: list[Callable[[Path], None]] = [] + + def create_workspace(self, run_id: str) -> TrackedWorkspace: + ws = self.pool.acquire(run_id) + self._cache[run_id] = ws + return ws + + def register_cleanup_hook(self, hook: Callable[[Path], None]) -> None: + self._cleanup_hooks.append(hook) + + def cleanup_workspace(self, ws: TrackedWorkspace) -> None: + ws.mark_cleanup() + for hook in self._cleanup_hooks: + hook(ws.root) + + def get_workspace(self, run_id: str) -> TrackedWorkspace | None: + return self._cache.get(run_id) + + def list_workspaces(self) -> list[TrackedWorkspace]: + return list(self._cache.values()) + + def cleanup_all(self) -> None: + for ws in list(self._cache.values()): + self.cleanup_workspace(ws) diff --git a/tests/unit/test_control_plane.py b/tests/unit/test_control_plane.py new file mode 100644 index 0000000..2bbf9e6 --- /dev/null +++ b/tests/unit/test_control_plane.py @@ -0,0 +1,521 @@ +"""Tests for the Local Control Plane (F60).""" + +import pytest +from unittest.mock import MagicMock, patch +from pathlib import Path +from httpx import AsyncClient, ASGITransport + +from synapse_os.control_plane.server import create_app +from synapse_os.persistence import RunRecord, RunStepRecord + + +class TestHealthEndpoint: + """Tests for GET /health endpoint.""" + + @pytest.mark.asyncio + async def test_health_returns_ok_when_runtime_running(self): + runtime_service = MagicMock() + runtime_service.ready.return_value = True + + app = create_app(runtime_service=runtime_service, api_token=None) + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as client: + response = await client.get("/health") + + assert response.status_code == 200 + data = response.json() + assert data["status"] == "ok" + assert data["runtime"] == "running" + + @pytest.mark.asyncio + async def test_health_returns_ok_when_runtime_stopped(self): + runtime_service = MagicMock() + runtime_service.ready.return_value = False + + app = create_app(runtime_service=runtime_service, api_token=None) + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as client: + response = await client.get("/health") + + assert response.status_code == 200 + data = response.json() + assert data["status"] == "ok" + assert data["runtime"] == "stopped" + + @pytest.mark.asyncio + async def test_health_is_public_no_auth_required(self): + runtime_service = MagicMock() + runtime_service.ready.return_value = True + + app = create_app(runtime_service=runtime_service, api_token="secret-token") + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as client: + response = await client.get("/health") + + assert response.status_code == 200 + + +class TestAuthMiddleware: + """Tests for API token authentication middleware.""" + + @pytest.mark.asyncio + async def test_returns_401_without_token_when_auth_enabled(self): + runtime_service = MagicMock() + app = create_app(runtime_service=runtime_service, api_token="secret-token") + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as client: + response = await client.get("/api/v1/runs") + + assert response.status_code == 401 + + @pytest.mark.asyncio + async def test_returns_401_with_invalid_token(self): + runtime_service = MagicMock() + app = create_app(runtime_service=runtime_service, api_token="secret-token") + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as client: + response = await client.get( + "/api/v1/runs", + headers={"Authorization": "Bearer wrong-token"}, + ) + + assert response.status_code == 401 + + @pytest.mark.asyncio + async def test_allows_request_with_valid_token(self): + runtime_service = MagicMock() + run_repo = MagicMock() + run_repo.list_runs.return_value = [] + + app = create_app( + runtime_service=runtime_service, + api_token="secret-token", + run_repository=run_repo, + ) + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as client: + response = await client.get( + "/api/v1/runs", + headers={"Authorization": "Bearer secret-token"}, + ) + + assert response.status_code == 200 + + @pytest.mark.asyncio + async def test_no_auth_required_when_token_not_configured(self): + runtime_service = MagicMock() + run_repo = MagicMock() + run_repo.list_runs.return_value = [] + + app = create_app( + runtime_service=runtime_service, + api_token=None, + run_repository=run_repo, + ) + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as client: + response = await client.get("/api/v1/runs") + + assert response.status_code == 200 + + +class TestListRunsEndpoint: + """Tests for GET /api/v1/runs endpoint.""" + + def _make_run_record(self, run_id, status="pending", spec_path="/tmp/spec.md"): + return RunRecord( + run_id=run_id, + spec_path=spec_path, + workspace_path="/tmp/workspace", + spec_hash=None, + initiated_by="test", + stop_at="COMPLETE", + status=status, + current_state="REQUEST", + locked=False, + failure_message=None, + created_at="2026-03-31T10:00:00Z", + updated_at="2026-03-31T10:00:00Z", + completed_at=None, + ) + + @pytest.mark.asyncio + async def test_returns_empty_list_when_no_runs(self): + runtime_service = MagicMock() + run_repo = MagicMock() + run_repo.list_runs.return_value = [] + + app = create_app( + runtime_service=runtime_service, + api_token=None, + run_repository=run_repo, + ) + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as client: + response = await client.get("/api/v1/runs") + + assert response.status_code == 200 + data = response.json() + assert data["runs"] == [] + assert data["total"] == 0 + + @pytest.mark.asyncio + async def test_returns_paginated_runs(self): + runtime_service = MagicMock() + run_repo = MagicMock() + mock_run = self._make_run_record("run-1", "completed", "/tmp/spec.md") + run_repo.list_runs.return_value = [mock_run] + + app = create_app( + runtime_service=runtime_service, + api_token=None, + run_repository=run_repo, + ) + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as client: + response = await client.get("/api/v1/runs?limit=10&offset=0") + + assert response.status_code == 200 + data = response.json() + assert len(data["runs"]) == 1 + assert data["total"] == 1 + assert data["limit"] == 10 + assert data["offset"] == 0 + assert data["runs"][0]["id"] == "run-1" + assert data["runs"][0]["status"] == "completed" + + @pytest.mark.asyncio + async def test_truncates_long_prompt(self): + runtime_service = MagicMock() + run_repo = MagicMock() + long_path = "/tmp/" + "x" * 500 + ".md" + mock_run = self._make_run_record("run-1", "pending", long_path) + run_repo.list_runs.return_value = [mock_run] + + app = create_app( + runtime_service=runtime_service, + api_token=None, + run_repository=run_repo, + ) + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as client: + response = await client.get("/api/v1/runs") + + data = response.json() + assert len(data["runs"][0]["prompt"]) <= 100 + + +class TestCreateRunEndpoint: + """Tests for POST /api/v1/runs endpoint.""" + + @pytest.mark.asyncio + async def test_creates_run_with_valid_prompt(self): + runtime_service = MagicMock() + run_repo = MagicMock() + run_repo.create_run.return_value = "new-run-123" + + app = create_app( + runtime_service=runtime_service, + api_token=None, + run_repository=run_repo, + ) + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as client: + response = await client.post( + "/api/v1/runs", + json={"prompt": "implement a sorting algorithm"}, + ) + + assert response.status_code == 201 + data = response.json() + assert data["run_id"] == "new-run-123" + assert data["status"] == "pending" + run_repo.create_run.assert_called_once() + + @pytest.mark.asyncio + async def test_rejects_empty_prompt(self): + runtime_service = MagicMock() + run_repo = MagicMock() + + app = create_app( + runtime_service=runtime_service, + api_token=None, + run_repository=run_repo, + ) + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as client: + response = await client.post("/api/v1/runs", json={"prompt": ""}) + + assert response.status_code == 422 + + @pytest.mark.asyncio + async def test_rejects_missing_prompt(self): + runtime_service = MagicMock() + run_repo = MagicMock() + + app = create_app( + runtime_service=runtime_service, + api_token=None, + run_repository=run_repo, + ) + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as client: + response = await client.post("/api/v1/runs", json={}) + + assert response.status_code == 422 + + +class TestRunDetailEndpoint: + """Tests for GET /api/v1/runs/{run_id} endpoint.""" + + def _make_run_record(self, run_id, status="completed"): + return RunRecord( + run_id=run_id, + spec_path="/tmp/spec.md", + workspace_path="/tmp/workspace", + spec_hash=None, + initiated_by="test", + stop_at="COMPLETE", + status=status, + current_state="COMPLETE", + locked=False, + failure_message=None, + created_at="2026-03-31T10:00:00Z", + updated_at="2026-03-31T11:00:00Z", + completed_at="2026-03-31T11:00:00Z", + ) + + @pytest.mark.asyncio + async def test_returns_run_detail(self): + runtime_service = MagicMock() + run_repo = MagicMock() + mock_run = self._make_run_record("run-1", "completed") + run_repo.get_run.return_value = mock_run + run_repo.list_steps.return_value = [ + RunStepRecord( + step_id=1, + run_id="run-1", + state="SPEC", + status="completed", + raw_output_path=None, + clean_output_path=None, + tool_name=None, + return_code=0, + duration_ms=100, + timed_out=False, + created_at="2026-03-31T10:00:00Z", + ), + ] + + app = create_app( + runtime_service=runtime_service, + api_token=None, + run_repository=run_repo, + ) + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as client: + response = await client.get("/api/v1/runs/run-1") + + assert response.status_code == 200 + data = response.json() + assert data["id"] == "run-1" + assert data["status"] == "completed" + + @pytest.mark.asyncio + async def test_returns_404_for_nonexistent_run(self): + runtime_service = MagicMock() + run_repo = MagicMock() + run_repo.get_run.side_effect = Exception("no rows found") + + app = create_app( + runtime_service=runtime_service, + api_token=None, + run_repository=run_repo, + ) + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as client: + response = await client.get("/api/v1/runs/nonexistent") + + assert response.status_code == 404 + + +class TestCancelRunEndpoint: + """Tests for POST /api/v1/runs/{run_id}/cancel endpoint.""" + + def _make_run_record(self, run_id, status="pending"): + return RunRecord( + run_id=run_id, + spec_path="/tmp/spec.md", + workspace_path="/tmp/workspace", + spec_hash=None, + initiated_by="test", + stop_at="COMPLETE", + status=status, + current_state="REQUEST", + locked=False, + failure_message=None, + created_at="2026-03-31T10:00:00Z", + updated_at="2026-03-31T10:00:00Z", + completed_at=None, + ) + + @pytest.mark.asyncio + async def test_cancels_pending_run(self): + runtime_service = MagicMock() + run_repo = MagicMock() + mock_run = self._make_run_record("run-1", "pending") + run_repo.get_run.return_value = mock_run + + app = create_app( + runtime_service=runtime_service, + api_token=None, + run_repository=run_repo, + ) + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as client: + response = await client.post("/api/v1/runs/run-1/cancel") + + assert response.status_code == 200 + run_repo.mark_run_cancelling.assert_called_once_with("run-1") + + @pytest.mark.asyncio + async def test_returns_409_for_completed_run(self): + runtime_service = MagicMock() + run_repo = MagicMock() + mock_run = self._make_run_record("run-1", "completed") + run_repo.get_run.return_value = mock_run + + app = create_app( + runtime_service=runtime_service, + api_token=None, + run_repository=run_repo, + ) + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as client: + response = await client.post("/api/v1/runs/run-1/cancel") + + assert response.status_code == 409 + + @pytest.mark.asyncio + async def test_returns_409_for_failed_run(self): + runtime_service = MagicMock() + run_repo = MagicMock() + mock_run = self._make_run_record("run-1", "failed") + run_repo.get_run.return_value = mock_run + + app = create_app( + runtime_service=runtime_service, + api_token=None, + run_repository=run_repo, + ) + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as client: + response = await client.post("/api/v1/runs/run-1/cancel") + + assert response.status_code == 409 + + @pytest.mark.asyncio + async def test_returns_404_for_nonexistent_run(self): + runtime_service = MagicMock() + run_repo = MagicMock() + run_repo.get_run.side_effect = Exception("not found") + + app = create_app( + runtime_service=runtime_service, + api_token=None, + run_repository=run_repo, + ) + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as client: + response = await client.post("/api/v1/runs/nonexistent/cancel") + + assert response.status_code == 404 + + +class TestRuntimeStatusEndpoint: + """Tests for GET /api/v1/runtime/status endpoint.""" + + @pytest.mark.asyncio + async def test_returns_runtime_status(self): + runtime_service = MagicMock() + mock_state = MagicMock() + mock_state.status = "running" + mock_state.pid = 12345 + mock_state.started_at = "2026-03-31T10:00:00+00:00" + runtime_service.current_state.return_value = mock_state + + run_repo = MagicMock() + run_repo.list_unlocked_pending_runs.return_value = [ + MagicMock(), + MagicMock(), + MagicMock(), + ] + + app = create_app( + runtime_service=runtime_service, + api_token=None, + run_repository=run_repo, + ) + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as client: + response = await client.get("/api/v1/runtime/status") + + assert response.status_code == 200 + data = response.json() + assert data["pid"] == 12345 + assert data["state"] == "running" + assert data["pending_runs"] == 3 + + +class TestArtifactsEndpoint: + """Tests for GET /api/v1/artifacts/{run_id} endpoint.""" + + @pytest.mark.asyncio + async def test_lists_artifacts_for_run(self): + runtime_service = MagicMock() + artifact_store = MagicMock() + artifact_store.list_artifact_paths.return_value = [ + "run1/SPEC.md", + "run1/main.py", + ] + + artifact_store.base_path = MagicMock() + mock_stat = MagicMock() + mock_stat.st_size = 1024 + mock_stat.st_mtime = 1743405600.0 + mock_path = MagicMock() + mock_path.stat.return_value = mock_stat + mock_path.name = "SPEC.md" + artifact_store.base_path.__truediv__ = MagicMock(return_value=mock_path) + + app = create_app( + runtime_service=runtime_service, + api_token=None, + artifact_store=artifact_store, + ) + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as client: + response = await client.get("/api/v1/artifacts/run-1") + + assert response.status_code == 200 + data = response.json() + assert len(data["artifacts"]) == 2 + + @pytest.mark.asyncio + async def test_returns_404_for_nonexistent_run_artifacts(self): + runtime_service = MagicMock() + artifact_store = MagicMock() + artifact_store.list_artifact_paths.side_effect = FileNotFoundError( + "run not found" + ) + + app = create_app( + runtime_service=runtime_service, + api_token=None, + artifact_store=artifact_store, + ) + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as client: + response = await client.get("/api/v1/artifacts/nonexistent") + + assert response.status_code == 404 diff --git a/tests/unit/test_copilot_adapter.py b/tests/unit/test_copilot_adapter.py new file mode 100644 index 0000000..9ba4e73 --- /dev/null +++ b/tests/unit/test_copilot_adapter.py @@ -0,0 +1,137 @@ +from __future__ import annotations + +from unittest.mock import AsyncMock, patch + +import pytest + +from synapse_os.adapters import ( + CopilotCLIAdapter, + CLIExecutionResult, + classify_copilot_execution, +) + + +class TestCopilotCLIAdapter: + def test_capabilities(self) -> None: + adapter = CopilotCLIAdapter() + assert adapter.capabilities == ("cli_execution", "code_generation") + + def test_tool_spec_name(self) -> None: + adapter = CopilotCLIAdapter() + assert adapter.tool_spec.name == "copilot" + + def test_build_command(self) -> None: + adapter = CopilotCLIAdapter() + cmd = adapter.build_command("write a hello world in python") + assert "gh" in cmd + assert "copilot" in cmd + assert "write a hello world in python" in cmd + + def test_build_command_empty_prompt_raises(self) -> None: + adapter = CopilotCLIAdapter() + with pytest.raises(ValueError, match="empty"): + adapter.build_command(" ") + + +class TestClassifyCopilotExecution: + def test_success(self) -> None: + result = CLIExecutionResult( + tool_name="copilot", + command=["gh", "copilot", "ai"], + return_code=0, + stdout_raw="def hello(): pass\n", + stderr_raw="", + stdout_clean="def hello(): pass\n", + stderr_clean="", + duration_ms=500, + timed_out=False, + success=True, + ) + assessment = classify_copilot_execution(result) + assert assessment.category == "success" + assert not assessment.is_operational_block + + def test_timeout(self) -> None: + result = CLIExecutionResult( + tool_name="copilot", + command=["gh", "copilot", "ai"], + return_code=-1, + stdout_raw="", + stderr_raw="", + stdout_clean="", + stderr_clean="", + duration_ms=30000, + timed_out=True, + success=False, + ) + assessment = classify_copilot_execution(result) + assert assessment.category == "timeout" + assert not assessment.is_operational_block + + def test_return_code_nonzero(self) -> None: + result = CLIExecutionResult( + tool_name="copilot", + command=["gh", "copilot", "ai"], + return_code=1, + stdout_raw="", + stderr_raw="Something went wrong.", + stdout_clean="", + stderr_clean="Something went wrong.", + duration_ms=200, + timed_out=False, + success=False, + ) + assessment = classify_copilot_execution(result) + assert assessment.category == "return_code_nonzero" + assert not assessment.is_operational_block + + def test_authentication_unavailable(self) -> None: + result = CLIExecutionResult( + tool_name="copilot", + command=["gh", "copilot", "ai"], + return_code=1, + stdout_raw="", + stderr_raw="Error: authenticated required", + stdout_clean="", + stderr_clean="Error: authenticated required", + duration_ms=100, + timed_out=False, + success=False, + ) + assessment = classify_copilot_execution(result) + assert assessment.category == "authentication_unavailable" + assert assessment.is_operational_block + + def test_launcher_unavailable(self) -> None: + result = CLIExecutionResult( + tool_name="copilot", + command=["gh", "copilot", "ai"], + return_code=127, + stdout_raw="", + stderr_raw="gh: command not found", + stdout_clean="", + stderr_clean="gh: command not found", + duration_ms=50, + timed_out=False, + success=False, + ) + assessment = classify_copilot_execution(result) + assert assessment.category == "launcher_unavailable" + assert assessment.is_operational_block + + def test_circuit_open(self) -> None: + result = CLIExecutionResult( + tool_name="copilot", + command=["gh", "copilot", "ai"], + return_code=75, + stdout_raw="", + stderr_raw="circuit breaker open for copilot.\n", + stdout_clean="", + stderr_clean="circuit breaker open for copilot.", + duration_ms=0, + timed_out=False, + success=False, + ) + assessment = classify_copilot_execution(result) + assert assessment.category == "circuit_open" + assert assessment.is_operational_block diff --git a/tests/unit/test_hooks_cli.py b/tests/unit/test_hooks_cli.py index abe28f6..b18bf21 100644 --- a/tests/unit/test_hooks_cli.py +++ b/tests/unit/test_hooks_cli.py @@ -14,10 +14,7 @@ class TestHooksListCommand: def test_hooks_list_no_hooks(self) -> None: result = runner.invoke(app, ["hooks", "list"]) assert result.exit_code == 0 - assert ( - "No hooks configured" in result.output - or "nenhum hook" in result.output.lower() - ) + assert "No hooks configured" in result.output or "nenhum hook" in result.output.lower() def test_hooks_list_with_global_hooks(self) -> None: from synapse_os.runtime_contracts import HookConfig @@ -119,6 +116,4 @@ class TestHooksStatusCommand: def test_hooks_status_no_active_hooks(self) -> None: result = runner.invoke(app, ["hooks", "status"]) assert result.exit_code == 0 - assert ( - "No active hooks" in result.output or "nenhum hook" in result.output.lower() - ) + assert "No active hooks" in result.output or "nenhum hook" in result.output.lower() diff --git a/tests/unit/test_memory.py b/tests/unit/test_memory.py new file mode 100644 index 0000000..356a2b3 --- /dev/null +++ b/tests/unit/test_memory.py @@ -0,0 +1,154 @@ +from __future__ import annotations + +import json +import time +from datetime import datetime, timezone +from pathlib import Path + +import pytest + +from synapse_os.memory import ( + ArtifactMetadata, + FeatureMemoryView, + IndexedArtifactStore, + MemoryStore, +) + + +class TestArtifactMetadata: + def test_defaults(self) -> None: + meta = ArtifactMetadata(type="test_report", source_step="TEST_RED") + assert meta.type == "test_report" + assert meta.tags == [] + assert meta.source_step == "TEST_RED" + assert meta.created_at is not None + + def test_full(self) -> None: + now = datetime.now(timezone.utc) + meta = ArtifactMetadata( + type="log", + tags=["error", "crash"], + source_step="CODE_GREEN", + created_at=now, + ) + assert meta.type == "log" + assert meta.tags == ["error", "crash"] + assert meta.created_at == now + + +class TestIndexedArtifactStore: + def test_register_and_find_by_tag(self, tmp_path: Path) -> None: + store = IndexedArtifactStore(base_path=tmp_path) + store.register( + run_id="run-1", + name="error.log", + metadata=ArtifactMetadata(type="log", tags=["error"], source_step="RUN"), + ) + results = store.find_by_tag("error") + assert len(results) == 1 + assert results[0].name == "error.log" + + def test_find_by_tag_no_match(self, tmp_path: Path) -> None: + store = IndexedArtifactStore(base_path=tmp_path) + store.register( + run_id="run-1", + name="output.txt", + metadata=ArtifactMetadata(type="text", tags=["output"], source_step="RUN"), + ) + assert store.find_by_tag("error") == [] + + def test_find_by_type(self, tmp_path: Path) -> None: + store = IndexedArtifactStore(base_path=tmp_path) + store.register( + run_id="run-1", + name="report.txt", + metadata=ArtifactMetadata(type="test_report", source_step="RUN"), + ) + results = store.find_by_type("test_report") + assert len(results) == 1 + assert results[0].name == "report.txt" + + def test_list_for_run(self, tmp_path: Path) -> None: + store = IndexedArtifactStore(base_path=tmp_path) + store.register( + run_id="run-1", + name="a.txt", + metadata=ArtifactMetadata(type="text", source_step="RUN"), + ) + store.register( + run_id="run-1", + name="b.txt", + metadata=ArtifactMetadata(type="text", source_step="RUN"), + ) + store.register( + run_id="run-2", + name="c.txt", + metadata=ArtifactMetadata(type="text", source_step="RUN"), + ) + run1_artifacts = store.list_for_run("run-1") + assert len(run1_artifacts) == 2 + + def test_multiple_tags(self, tmp_path: Path) -> None: + store = IndexedArtifactStore(base_path=tmp_path) + store.register( + run_id="run-1", + name="log.txt", + metadata=ArtifactMetadata( + type="log", tags=["error", "crash"], source_step="RUN" + ), + ) + assert len(store.find_by_tag("error")) == 1 + assert len(store.find_by_tag("crash")) == 1 + + +class TestMemoryStore: + def test_set_and_get(self, tmp_path: Path) -> None: + store = MemoryStore(state_dir=tmp_path) + store.set("ns", "key", "value") + assert store.get("ns", "key") == "value" + + def test_get_missing(self, tmp_path: Path) -> None: + store = MemoryStore(state_dir=tmp_path) + assert store.get("ns", "missing") is None + + def test_delete(self, tmp_path: Path) -> None: + store = MemoryStore(state_dir=tmp_path) + store.set("ns", "key", "value") + store.delete("ns", "key") + assert store.get("ns", "key") is None + + def test_list_namespaces(self, tmp_path: Path) -> None: + store = MemoryStore(state_dir=tmp_path) + store.set("ns1", "k", "v") + store.set("ns2", "k", "v") + namespaces = store.list_namespaces() + assert set(namespaces) == {"ns1", "ns2"} + + def test_persistence(self, tmp_path: Path) -> None: + store = MemoryStore(state_dir=tmp_path) + store.set("ns", "key", "value") + store2 = MemoryStore(state_dir=tmp_path) + assert store2.get("ns", "key") == "value" + + def test_feature_memory_view(self, tmp_path: Path) -> None: + store = MemoryStore(state_dir=tmp_path) + fm = store.feature_memory("F59") + fm.set("decision", "use-dag") + assert store.get("F59", "decision") == "use-dag" + assert fm.get("decision") == "use-dag" + + def test_feature_memory_isolation(self, tmp_path: Path) -> None: + store = MemoryStore(state_dir=tmp_path) + store.set("F59", "key", "f59-value") + store.set("F60", "key", "f60-value") + fm59 = store.feature_memory("F59") + fm60 = store.feature_memory("F60") + assert fm59.get("key") == "f59-value" + assert fm60.get("key") == "f60-value" + + def test_feature_memory_delete(self, tmp_path: Path) -> None: + store = MemoryStore(state_dir=tmp_path) + store.set("F59", "key", "value") + fm = store.feature_memory("F59") + fm.delete("key") + assert store.get("F59", "key") is None diff --git a/tests/unit/test_multi_agent.py b/tests/unit/test_multi_agent.py new file mode 100644 index 0000000..96f8065 --- /dev/null +++ b/tests/unit/test_multi_agent.py @@ -0,0 +1,309 @@ +from __future__ import annotations + +import pytest + +from synapse_os.adapters import BaseCLIAdapter, CodexCLIAdapter, GeminiCLIAdapter +from synapse_os.contracts import CLIExecutionResult +from synapse_os.multi_agent import ( + AdapterAlreadyRegisteredError, + AdapterNotFoundError, + AdapterRegistry, + CapabilityRouter, + MultiAgentCoordinator, + NoSuitableAdapterError, +) + + +class FakeAdapter(BaseCLIAdapter): + def __init__( + self, + *, + tool_name: str = "fake", + capabilities: tuple[str, ...] = ("cli_execution",), + command_prefix: tuple[str, ...] = (), + ) -> None: + self._capabilities = capabilities + self._command_prefix = command_prefix + super().__init__(tool_name=tool_name) + + @property + def capabilities(self) -> tuple[str, ...]: + return self._capabilities + + @property + def command_prefix(self) -> tuple[str, ...]: + return self._command_prefix + + def build_command(self, prompt: str) -> list[str]: + return ["echo", prompt] + + +# --- AdapterRegistry tests --- + + +class TestAdapterRegistry: + def test_registrar_adapter_por_nome(self) -> None: + registry = AdapterRegistry() + adapter = FakeAdapter(tool_name="test_adapter") + registry.register(adapter) + + assert registry.get("test_adapter") is adapter + + def test_rejeitar_registro_duplicado(self) -> None: + registry = AdapterRegistry() + adapter = FakeAdapter(tool_name="dup") + registry.register(adapter) + + with pytest.raises(AdapterAlreadyRegisteredError): + registry.register(adapter) + + def test_retornar_none_para_adapter_inexistente(self) -> None: + registry = AdapterRegistry() + assert registry.get("nonexistent") is None + + def test_listar_todos_os_adapters(self) -> None: + registry = AdapterRegistry() + a1 = FakeAdapter(tool_name="a1") + a2 = FakeAdapter(tool_name="a2") + registry.register(a1) + registry.register(a2) + + all_adapters = registry.list_all() + assert len(all_adapters) == 2 + assert {a.tool_name for a in all_adapters} == {"a1", "a2"} + + def test_encontrar_adapters_por_capability(self) -> None: + registry = AdapterRegistry() + a1 = FakeAdapter( + tool_name="coder", capabilities=("cli_execution", "code_generation") + ) + a2 = FakeAdapter( + tool_name="planner", capabilities=("cli_execution", "planning") + ) + registry.register(a1) + registry.register(a2) + + coders = registry.find_by_capability("code_generation") + assert len(coders) == 1 + assert coders[0].tool_name == "coder" + + planners = registry.find_by_capability("planning") + assert len(planners) == 1 + assert planners[0].tool_name == "planner" + + def test_retornar_lista_vazia_se_nenhuma_capability_match(self) -> None: + registry = AdapterRegistry() + registry.register(FakeAdapter(tool_name="basic")) + + result = registry.find_by_capability("nonexistent_capability") + assert result == [] + + def test_encontrar_multiplos_adapters_com_mesma_capability(self) -> None: + registry = AdapterRegistry() + a1 = FakeAdapter(tool_name="coder1", capabilities=("code_generation",)) + a2 = FakeAdapter(tool_name="coder2", capabilities=("code_generation",)) + registry.register(a1) + registry.register(a2) + + result = registry.find_by_capability("code_generation") + assert len(result) == 2 + + def test_remover_adapter(self) -> None: + registry = AdapterRegistry() + adapter = FakeAdapter(tool_name="removable") + registry.register(adapter) + registry.unregister("removable") + + assert registry.get("removable") is None + assert "removable" not in [a.tool_name for a in registry.list_all()] + + def test_retornar_todas_as_capabilities_registradas(self) -> None: + registry = AdapterRegistry() + registry.register(FakeAdapter(tool_name="a1", capabilities=("cap1", "cap2"))) + registry.register(FakeAdapter(tool_name="a2", capabilities=("cap2", "cap3"))) + + all_caps = registry.all_capabilities() + assert all_caps == {"cap1", "cap2", "cap3"} + + +# --- CapabilityRouter tests --- + + +class TestCapabilityRouter: + def test_selecionar_adapter_com_capability_requerida(self) -> None: + registry = AdapterRegistry() + registry.register( + FakeAdapter(tool_name="coder", capabilities=("code_generation",)) + ) + registry.register( + FakeAdapter(tool_name="basic", capabilities=("cli_execution",)) + ) + + router = CapabilityRouter(registry) + selected = router.select_adapter({"code_generation"}) + + assert selected is not None + assert selected.tool_name == "coder" + + def test_retornar_none_se_nenhum_adapter_tiver_capability(self) -> None: + registry = AdapterRegistry() + registry.register(FakeAdapter(tool_name="basic")) + + router = CapabilityRouter(registry) + selected = router.select_adapter({"nonexistent"}) + + assert selected is None + + def test_selecionar_adapter_com_melhor_match(self) -> None: + registry = AdapterRegistry() + registry.register( + FakeAdapter( + tool_name="specialist", capabilities=("code_generation", "code_review") + ) + ) + registry.register( + FakeAdapter(tool_name="generalist", capabilities=("code_generation",)) + ) + + router = CapabilityRouter(registry) + selected = router.get_best_match({"code_generation", "code_review"}) + + assert selected is not None + assert selected.tool_name == "specialist" + + def test_usar_primeiro_adapter_como_fallback(self) -> None: + registry = AdapterRegistry() + registry.register(FakeAdapter(tool_name="first")) + registry.register(FakeAdapter(tool_name="second")) + + router = CapabilityRouter(registry) + selected = router.get_best_match({"nonexistent"}) + + assert selected is not None + assert selected.tool_name == "first" + + def test_retornar_none_se_registry_vazio(self) -> None: + registry = AdapterRegistry() + router = CapabilityRouter(registry) + + assert router.select_adapter({"anything"}) is None + assert router.get_best_match({"anything"}) is None + + def test_priorizar_adapter_com_mais_capabilities_sobrepostas(self) -> None: + registry = AdapterRegistry() + registry.register( + FakeAdapter(tool_name="partial", capabilities=("cap1", "cap2")) + ) + registry.register( + FakeAdapter(tool_name="full", capabilities=("cap1", "cap2", "cap3")) + ) + + router = CapabilityRouter(registry) + selected = router.get_best_match({"cap1", "cap2", "cap3"}) + + assert selected is not None + assert selected.tool_name == "full" + + +# --- MultiAgentCoordinator tests --- + + +class TestMultiAgentCoordinator: + def test_executar_step_com_adapter_selecionado(self) -> None: + registry = AdapterRegistry() + registry.register( + FakeAdapter(tool_name="coder", capabilities=("code_generation",)) + ) + + router = CapabilityRouter(registry) + coordinator = MultiAgentCoordinator(registry, router) + + adapter = coordinator.resolve_adapter_for_step( + "CODE_GREEN", {"code_generation"} + ) + assert adapter is not None + assert adapter.tool_name == "coder" + + def test_retornar_none_se_nenhum_adapter_disponivel(self) -> None: + registry = AdapterRegistry() + router = CapabilityRouter(registry) + coordinator = MultiAgentCoordinator(registry, router) + + adapter = coordinator.resolve_adapter_for_step("CODE_GREEN", {"nonexistent"}) + assert adapter is None + + def test_registrar_handoff_no_contexto(self) -> None: + registry = AdapterRegistry() + registry.register( + FakeAdapter(tool_name="coder", capabilities=("code_generation",)) + ) + + router = CapabilityRouter(registry) + coordinator = MultiAgentCoordinator(registry, router) + + handoffs = coordinator.get_handoff_log() + assert len(handoffs) == 0 + + coordinator.resolve_adapter_for_step("CODE_GREEN", {"code_generation"}) + + handoffs = coordinator.get_handoff_log() + assert len(handoffs) == 1 + assert handoffs[0]["step"] == "CODE_GREEN" + assert handoffs[0]["adapter"] == "coder" + + def test_usar_fallback_adapter_quando_nenhuma_capability_especificada(self) -> None: + registry = AdapterRegistry() + registry.register(FakeAdapter(tool_name="generic")) + + router = CapabilityRouter(registry) + coordinator = MultiAgentCoordinator(registry, router) + + adapter = coordinator.resolve_adapter_for_step("PLAN", set()) + assert adapter is not None + assert adapter.tool_name == "generic" + + def test_lancar_erro_se_adapter_nao_encontrado_para_step_obrigatorio(self) -> None: + registry = AdapterRegistry() + router = CapabilityRouter(registry) + coordinator = MultiAgentCoordinator( + registry, router, required_steps={"CODE_GREEN"} + ) + + with pytest.raises(NoSuitableAdapterError): + coordinator.resolve_adapter_for_step("CODE_GREEN", {"code_generation"}) + + def test_executar_com_adapters_reais(self) -> None: + registry = AdapterRegistry() + registry.register(CodexCLIAdapter()) + registry.register(GeminiCLIAdapter()) + + router = CapabilityRouter(registry) + coordinator = MultiAgentCoordinator(registry, router) + + codex_adapter = coordinator.resolve_adapter_for_step( + "CODE_GREEN", {"code_generation"} + ) + assert codex_adapter is not None + assert codex_adapter.tool_name == "codex" + + gemini_adapter = coordinator.resolve_adapter_for_step("PLAN", {"planning"}) + assert gemini_adapter is not None + assert gemini_adapter.tool_name == "gemini" + + def test_registrar_todas_as_execucoes(self) -> None: + registry = AdapterRegistry() + registry.register(FakeAdapter(tool_name="a1", capabilities=("cap1",))) + registry.register(FakeAdapter(tool_name="a2", capabilities=("cap2",))) + + router = CapabilityRouter(registry) + coordinator = MultiAgentCoordinator(registry, router) + + coordinator.resolve_adapter_for_step("STEP1", {"cap1"}) + coordinator.resolve_adapter_for_step("STEP2", {"cap2"}) + coordinator.resolve_adapter_for_step("STEP3", {"cap1"}) + + handoffs = coordinator.get_handoff_log() + assert len(handoffs) == 3 + assert handoffs[0]["step"] == "STEP1" + assert handoffs[1]["step"] == "STEP2" + assert handoffs[2]["step"] == "STEP3" diff --git a/tests/unit/test_pipeline_dag.py b/tests/unit/test_pipeline_dag.py new file mode 100644 index 0000000..89e5660 --- /dev/null +++ b/tests/unit/test_pipeline_dag.py @@ -0,0 +1,439 @@ +from __future__ import annotations + +from concurrent.futures import ThreadPoolExecutor +from threading import Lock +from unittest.mock import MagicMock + +import pytest + +from synapse_os.pipeline_dag import ( + DAGConditional, + DAGContext, + DAGExecutor, + DAGSpecificationError, + DAGSpec, + DAGStep, + DAGValidator, + LinearPipelineAdapter, +) + + +class TestDAGSpec: + def test_valid_linear_mode(self) -> None: + spec = DAGSpec(mode="linear") + assert spec.mode == "linear" + assert spec.steps == [] + assert spec.conditionals == [] + + def test_valid_dag_mode_empty_steps(self) -> None: + spec = DAGSpec(mode="dag", steps=[]) + assert spec.mode == "dag" + assert spec.steps == [] + + def test_valid_dag_step_full(self) -> None: + step = DAGStep(id="build", executor="codex", depends_on=[], if_cond=None) + spec = DAGSpec(mode="dag", steps=[step]) + assert spec.steps[0].id == "build" + + def test_dag_step_minimal(self) -> None: + step = DAGStep(id="build", executor="codex") + assert step.depends_on == [] + assert step.if_cond is None + + +class TestDAGValidator: + def test_valid_linear_no_error(self) -> None: + spec = DAGSpec(mode="linear") + DAGValidator.validate(spec) + + def test_valid_dag_single_step(self) -> None: + spec = DAGSpec( + mode="dag", + steps=[DAGStep(id="build", executor="codex", depends_on=[])], + ) + DAGValidator.validate(spec) + + def test_valid_dag_linear_chain(self) -> None: + spec = DAGSpec( + mode="dag", + steps=[ + DAGStep(id="a", executor="codex", depends_on=[]), + DAGStep(id="b", executor="codex", depends_on=["a"]), + DAGStep(id="c", executor="codex", depends_on=["b"]), + ], + ) + DAGValidator.validate(spec) + + def test_valid_dag_fan_out(self) -> None: + spec = DAGSpec( + mode="dag", + steps=[ + DAGStep(id="root", executor="codex", depends_on=[]), + DAGStep(id="a", executor="codex", depends_on=["root"]), + DAGStep(id="b", executor="codex", depends_on=["root"]), + DAGStep(id="c", executor="codex", depends_on=["root"]), + ], + ) + DAGValidator.validate(spec) + + def test_valid_dag_fan_in(self) -> None: + spec = DAGSpec( + mode="dag", + steps=[ + DAGStep(id="a", executor="codex", depends_on=[]), + DAGStep(id="b", executor="codex", depends_on=[]), + DAGStep(id="c", executor="codex", depends_on=["a", "b"]), + ], + ) + DAGValidator.validate(spec) + + def test_valid_dag_complex(self) -> None: + spec = DAGSpec( + mode="dag", + steps=[ + DAGStep(id="root", executor="codex", depends_on=[]), + DAGStep(id="a", executor="codex", depends_on=["root"]), + DAGStep(id="b", executor="codex", depends_on=["root"]), + DAGStep(id="c", executor="codex", depends_on=["a", "b"]), + ], + ) + DAGValidator.validate(spec) + + def test_cycle_detection_self_loop(self) -> None: + spec = DAGSpec( + mode="dag", + steps=[DAGStep(id="a", executor="codex", depends_on=["a"])], + ) + with pytest.raises(DAGSpecificationError, match="(?i)cycle"): + DAGValidator.validate(spec) + + def test_cycle_detection_two_node(self) -> None: + spec = DAGSpec( + mode="dag", + steps=[ + DAGStep(id="a", executor="codex", depends_on=["b"]), + DAGStep(id="b", executor="codex", depends_on=["a"]), + ], + ) + with pytest.raises(DAGSpecificationError, match="(?i)cycle"): + DAGValidator.validate(spec) + + def test_cycle_detection_three_node(self) -> None: + spec = DAGSpec( + mode="dag", + steps=[ + DAGStep(id="a", executor="codex", depends_on=["b"]), + DAGStep(id="b", executor="codex", depends_on=["c"]), + DAGStep(id="c", executor="codex", depends_on=["a"]), + ], + ) + with pytest.raises(DAGSpecificationError, match="(?i)cycle"): + DAGValidator.validate(spec) + + def test_missing_dependency_raises(self) -> None: + spec = DAGSpec( + mode="dag", + steps=[DAGStep(id="a", executor="codex", depends_on=["nonexistent"])], + ) + with pytest.raises(DAGSpecificationError, match="nonexistent"): + DAGValidator.validate(spec) + + def test_empty_steps_raises(self) -> None: + spec = DAGSpec(mode="dag", steps=[]) + with pytest.raises(DAGSpecificationError, match="at least one step"): + DAGValidator.validate(spec) + + +class TestDAGContext: + def test_initial_state_all_pending(self) -> None: + spec = DAGSpec( + mode="dag", + steps=[ + DAGStep(id="a", executor="codex", depends_on=[]), + DAGStep(id="b", executor="codex", depends_on=["a"]), + ], + ) + ctx = DAGContext(spec) + assert ctx.get_state("a") == "PENDING" + assert ctx.get_state("b") == "PENDING" + + def test_mark_running(self) -> None: + spec = DAGSpec( + mode="dag", + steps=[DAGStep(id="a", executor="codex", depends_on=[])], + ) + ctx = DAGContext(spec) + ctx.mark_running("a") + assert ctx.get_state("a") == "RUNNING" + + def test_mark_done(self) -> None: + spec = DAGSpec( + mode="dag", + steps=[DAGStep(id="a", executor="codex", depends_on=[])], + ) + ctx = DAGContext(spec) + ctx.mark_done("a") + assert ctx.get_state("a") == "DONE" + + def test_mark_failed(self) -> None: + spec = DAGSpec( + mode="dag", + steps=[DAGStep(id="a", executor="codex", depends_on=[])], + ) + ctx = DAGContext(spec) + ctx.mark_failed("a") + assert ctx.get_state("a") == "FAILED" + + def test_ready_steps_root(self) -> None: + spec = DAGSpec( + mode="dag", + steps=[ + DAGStep(id="a", executor="codex", depends_on=[]), + DAGStep(id="b", executor="codex", depends_on=["a"]), + ], + ) + ctx = DAGContext(spec) + ready = ctx.ready_steps() + assert ready == ["a"] + + def test_ready_steps_after_root_done(self) -> None: + spec = DAGSpec( + mode="dag", + steps=[ + DAGStep(id="a", executor="codex", depends_on=[]), + DAGStep(id="b", executor="codex", depends_on=["a"]), + ], + ) + ctx = DAGContext(spec) + ctx.mark_done("a") + ready = ctx.ready_steps() + assert ready == ["b"] + + def test_ready_steps_fan_in_both_deps_done(self) -> None: + spec = DAGSpec( + mode="dag", + steps=[ + DAGStep(id="a", executor="codex", depends_on=[]), + DAGStep(id="b", executor="codex", depends_on=[]), + DAGStep(id="c", executor="codex", depends_on=["a", "b"]), + ], + ) + ctx = DAGContext(spec) + assert set(ctx.ready_steps()) == {"a", "b"} + ctx.mark_done("a") + assert ctx.ready_steps() == ["b"] + ctx.mark_done("b") + assert ctx.ready_steps() == ["c"] + + def test_is_complete_all_done(self) -> None: + spec = DAGSpec( + mode="dag", + steps=[ + DAGStep(id="a", executor="codex", depends_on=[]), + DAGStep(id="b", executor="codex", depends_on=["a"]), + ], + ) + ctx = DAGContext(spec) + assert not ctx.is_complete() + ctx.mark_done("a") + assert not ctx.is_complete() + ctx.mark_done("b") + assert ctx.is_complete() + + def test_is_complete_has_failed(self) -> None: + spec = DAGSpec( + mode="dag", + steps=[DAGStep(id="a", executor="codex", depends_on=[])], + ) + ctx = DAGContext(spec) + ctx.mark_failed("a") + assert ctx.is_complete() + + def test_dependency_deduplication(self) -> None: + spec = DAGSpec( + mode="dag", + steps=[ + DAGStep(id="a", executor="codex", depends_on=[]), + DAGStep(id="b", executor="codex", depends_on=["a", "a"]), + ], + ) + ctx = DAGContext(spec) + ctx.mark_done("a") + assert ctx.ready_steps() == ["b"] + + +class TestDAGExecutor: + def test_init_rejects_unknown_mode(self) -> None: + spec = DAGSpec(mode="unsupported", steps=[]) + + with pytest.raises(DAGSpecificationError, match="Unknown DAG mode"): + DAGExecutor(spec=spec) + + def test_execute_single_step(self) -> None: + spec = DAGSpec( + mode="dag", + steps=[DAGStep(id="a", executor="codex", depends_on=[])], + ) + executed: list[str] = [] + + def run_step(step_id: str) -> None: + executed.append(step_id) + + executor = DAGExecutor( + spec=spec, + max_workers=4, + step_runner=lambda sid, _: run_step(sid), + ) + executor.execute() + + assert executed == ["a"] + assert executor.context.is_complete() + + def test_execute_linear_chain(self) -> None: + spec = DAGSpec( + mode="dag", + steps=[ + DAGStep(id="a", executor="codex", depends_on=[]), + DAGStep(id="b", executor="codex", depends_on=["a"]), + DAGStep(id="c", executor="codex", depends_on=["b"]), + ], + ) + executed: list[str] = [] + + def run_step(step_id: str) -> None: + executed.append(step_id) + + executor = DAGExecutor( + spec=spec, + max_workers=4, + step_runner=lambda sid, _: run_step(sid), + ) + executor.execute() + + assert executed == ["a", "b", "c"] + assert executor.context.is_complete() + + def test_execute_fan_out_parallel(self) -> None: + spec = DAGSpec( + mode="dag", + steps=[ + DAGStep(id="froot", executor="codex", depends_on=[]), + DAGStep(id="fa", executor="codex", depends_on=["froot"]), + DAGStep(id="fb", executor="codex", depends_on=["froot"]), + DAGStep(id="fc", executor="codex", depends_on=["froot"]), + ], + ) + order: list[str] = [] + + def run_step(step_id: str) -> None: + order.append(step_id) + + executor = DAGExecutor( + spec=spec, + max_workers=4, + step_runner=lambda sid, _: run_step(sid), + ) + executor.execute() + + assert order[0] == "froot" + assert set(order[1:]) == {"fa", "fb", "fc"} + assert executor.context.is_complete() + + def test_execute_fan_in(self) -> None: + spec = DAGSpec( + mode="dag", + steps=[ + DAGStep(id="fa", executor="codex", depends_on=[]), + DAGStep(id="fb", executor="codex", depends_on=[]), + DAGStep(id="fc", executor="codex", depends_on=["fa", "fb"]), + ], + ) + done: dict[str, bool] = {} + + def run_step(step_id: str) -> None: + done[step_id] = True + + executor = DAGExecutor( + spec=spec, + max_workers=4, + step_runner=lambda sid, _: run_step(sid), + ) + executor.execute() + + assert done.get("fa") and done.get("fb") and done.get("fc") + assert executor.context.is_complete() + + def test_execute_stops_on_failure(self) -> None: + spec = DAGSpec( + mode="dag", + steps=[ + DAGStep(id="a", executor="codex", depends_on=[]), + DAGStep(id="b", executor="codex", depends_on=["a"]), + ], + ) + + def run_step(step_id: str) -> None: + if step_id == "a": + raise RuntimeError("simulated failure") + + executor = DAGExecutor( + spec=spec, + max_workers=4, + step_runner=lambda sid, _: run_step(sid), + ) + executor.execute() + + assert executor.context.get_state("a") == "FAILED" + assert executor.context.has_failed + + def test_max_workers_limits_concurrency(self) -> None: + spec = DAGSpec( + mode="dag", + steps=[ + DAGStep(id=str(i), executor="codex", depends_on=[]) for i in range(8) + ], + ) + active = 0 + peak = 0 + lock = Lock() + + def run_step(step_id: str) -> None: + import time + + nonlocal active, peak + with lock: + active += 1 + peak = max(peak, active) + try: + time.sleep(0.05) + finally: + with lock: + active -= 1 + + executor = DAGExecutor( + spec=spec, + max_workers=2, + step_runner=lambda sid, _: run_step(sid), + ) + executor.execute() + + assert peak == 2 + + +class TestLinearPipelineAdapter: + def test_runs_linear_sequence(self) -> None: + executed: list[str] = [] + adapter = LinearPipelineAdapter( + steps=["a", "b", "c"], + step_runner=lambda sid, _: executed.append(sid), + ) + adapter.execute() + assert executed == ["a", "b", "c"] + + def test_raises_on_empty_steps(self) -> None: + adapter = LinearPipelineAdapter( + steps=[], + step_runner=lambda _, __: None, + ) + with pytest.raises(DAGSpecificationError, match="at least one step"): + adapter.execute() diff --git a/tests/unit/test_plugins.py b/tests/unit/test_plugins.py new file mode 100644 index 0000000..e26f1e3 --- /dev/null +++ b/tests/unit/test_plugins.py @@ -0,0 +1,229 @@ +from __future__ import annotations + +import pytest +from unittest.mock import MagicMock, patch + +from synapse_os.plugins import ( + PluginManifest, + PluginRegistry, + PluginLoadError, + HookSpec, + HOOK_TYPES, +) + + +class TestHookSpec: + def test_hook_spec_create(self): + spec = HookSpec(name="test", hook_type="pre_step", handler=MagicMock()) + assert spec.name == "test" + assert spec.hook_type == "pre_step" + + def test_valid_hook_types(self): + assert "pre_step" in HOOK_TYPES + assert "post_step" in HOOK_TYPES + assert "on_run_start" in HOOK_TYPES + assert "on_run_end" in HOOK_TYPES + + +class TestPluginManifest: + def test_create_manifest(self): + manifest = PluginManifest(name="test-plugin", version="1.0.0") + assert manifest.name == "test-plugin" + assert manifest.version == "1.0.0" + assert manifest.enabled is True + + def test_manifest_default_enabled(self): + manifest = PluginManifest(name="test", version="0.1.0") + assert manifest.enabled is True + + def test_manifest_with_hooks(self): + manifest = PluginManifest( + name="test", + version="1.0.0", + hooks=["pre_step", "post_step"], + ) + assert "pre_step" in manifest.hooks + assert "post_step" in manifest.hooks + + +class TestPluginRegistry: + @pytest.fixture(autouse=True) + def reset_registry(self): + registry = PluginRegistry() + registry._plugins.clear() + registry._hook_map.clear() + registry._handlers = {hook_type: [] for hook_type in HOOK_TYPES} + yield + registry._plugins.clear() + registry._hook_map.clear() + registry._handlers = {hook_type: [] for hook_type in HOOK_TYPES} + + def test_singleton_pattern(self): + registry1 = PluginRegistry() + registry2 = PluginRegistry() + assert registry1 is registry2 + + def test_register_plugin(self): + registry = PluginRegistry() + registry._plugins.clear() + manifest = PluginManifest(name="test-plugin", version="1.0.0") + registry.register(manifest) + assert "test-plugin" in registry.list_plugins() + + def test_register_duplicate_raises(self): + registry = PluginRegistry() + registry._plugins.clear() + manifest = PluginManifest(name="dup-plugin", version="1.0.0") + registry.register(manifest) + with pytest.raises(PluginLoadError, match="already registered"): + registry.register(manifest) + + def test_unregister_plugin(self): + registry = PluginRegistry() + registry._plugins.clear() + manifest = PluginManifest(name="unreg-plugin", version="1.0.0") + registry.register(manifest) + registry.unregister("unreg-plugin") + assert "unreg-plugin" not in registry.list_plugins() + + def test_unregister_unknown_raises(self): + registry = PluginRegistry() + registry._plugins.clear() + with pytest.raises(PluginLoadError, match="not found"): + registry.unregister("nonexistent") + + def test_get_plugin(self): + registry = PluginRegistry() + registry._plugins.clear() + manifest = PluginManifest(name="get-plugin", version="2.0.0") + registry.register(manifest) + retrieved = registry.get_plugin("get-plugin") + assert retrieved is not None + assert retrieved.name == "get-plugin" + + def test_get_plugin_not_found(self): + registry = PluginRegistry() + registry._plugins.clear() + assert registry.get_plugin("nonexistent") is None + + def test_list_plugins(self): + registry = PluginRegistry() + registry._plugins.clear() + registry.register(PluginManifest(name="p1", version="1.0.0")) + registry.register(PluginManifest(name="p2", version="1.0.0")) + plugins = registry.list_plugins() + assert "p1" in plugins + assert "p2" in plugins + + def test_enable_disable_plugin(self): + registry = PluginRegistry() + registry._plugins.clear() + manifest = PluginManifest(name="toggle-plugin", version="1.0.0") + registry.register(manifest) + registry.disable_plugin("toggle-plugin") + assert not registry.get_plugin("toggle-plugin").enabled + registry.enable_plugin("toggle-plugin") + assert registry.get_plugin("toggle-plugin").enabled + + def test_get_handlers_for_hook(self): + registry = PluginRegistry() + registry._plugins.clear() + handler = MagicMock() + manifest = PluginManifest( + name="handler-plugin", + version="1.0.0", + hooks=["pre_step"], + ) + registry.register(manifest) + registry.register_hook("handler-plugin", "pre_step", handler) + handlers = registry.get_handlers("pre_step") + assert handler in handlers + + def test_get_handlers_empty_for_unknown_hook(self): + registry = PluginRegistry() + registry._plugins.clear() + handlers = registry.get_handlers("on_run_start") + assert handlers == [] + + def test_hook_type_validation(self): + registry = PluginRegistry() + registry._plugins.clear() + manifest = PluginManifest(name="val-plugin", version="1.0.0") + registry.register(manifest) + with pytest.raises(ValueError, match="Unknown hook type"): + registry.register_hook("val-plugin", "invalid_hook", MagicMock()) + + def test_load_plugins_discovers_entry_points(self): + registry = PluginRegistry() + registry._plugins.clear() + mock_ep = MagicMock() + mock_ep.name = "discovered-plugin" + mock_ep.load.return_value.hook_manifest.return_value = PluginManifest( + name="discovered-plugin", version="0.1.0", hooks=["pre_step"] + ) + with patch("synapse_os.plugins.entry_points") as mock_eps: + mock_eps.return_value = [mock_ep] + registry.load_plugins() + assert "discovered-plugin" in registry.list_plugins() + + def test_load_plugins_handles_missing_manifest(self): + registry = PluginRegistry() + registry._plugins.clear() + mock_ep = MagicMock() + mock_ep.name = "no-manifest-plugin" + mock_ep.load.return_value.hook_manifest = None + with patch("synapse_os.plugins.entry_points") as mock_eps: + mock_eps.return_value = [mock_ep] + registry.load_plugins() + assert "no-manifest-plugin" not in registry.list_plugins() + + def test_is_loaded(self): + registry = PluginRegistry() + registry._plugins.clear() + assert registry.is_loaded("test") is False + registry.register(PluginManifest(name="test", version="1.0.0")) + assert registry.is_loaded("test") is True + + def test_get_handlers_keeps_shared_callable_for_enabled_plugin_when_other_disabled( + self, + ): + registry = PluginRegistry() + shared_handler = MagicMock() + + registry.register(PluginManifest(name="p1", version="1.0.0")) + registry.register(PluginManifest(name="p2", version="1.0.0")) + registry.register_hook("p1", "pre_step", shared_handler) + registry.register_hook("p2", "pre_step", shared_handler) + + registry.disable_plugin("p1") + + assert registry.get_handlers("pre_step") == [shared_handler] + + def test_unregister_preserves_shared_callable_used_by_other_plugin(self): + registry = PluginRegistry() + shared_handler = MagicMock() + + registry.register(PluginManifest(name="p1", version="1.0.0")) + registry.register(PluginManifest(name="p2", version="1.0.0")) + registry.register_hook("p1", "pre_step", shared_handler) + registry.register_hook("p2", "pre_step", shared_handler) + + registry.unregister("p1") + + assert registry.get_handlers("pre_step") == [shared_handler] + + def test_register_hook_replacing_one_plugin_handler_preserves_shared_callable(self): + registry = PluginRegistry() + shared_handler = MagicMock() + replacement_handler = MagicMock() + + registry.register(PluginManifest(name="p1", version="1.0.0")) + registry.register(PluginManifest(name="p2", version="1.0.0")) + registry.register_hook("p1", "pre_step", shared_handler) + registry.register_hook("p2", "pre_step", shared_handler) + + registry.register_hook("p1", "pre_step", replacement_handler) + + handlers = registry.get_handlers("pre_step") + assert replacement_handler in handlers + assert shared_handler in handlers diff --git a/tests/unit/test_reporting_evolution.py b/tests/unit/test_reporting_evolution.py new file mode 100644 index 0000000..c2590f3 --- /dev/null +++ b/tests/unit/test_reporting_evolution.py @@ -0,0 +1,276 @@ +from __future__ import annotations + +from unittest.mock import MagicMock + + +class TestExecutionTimelineModels: + def test_timeline_entry_model(self) -> None: + from synapse_os.reporting import TimelineEntry + + entry = TimelineEntry( + state="CODE_GREEN", + entered_at=1000.0, + duration_ms=500, + ) + assert entry.state == "CODE_GREEN" + assert entry.entered_at == 1000.0 + assert entry.duration_ms == 500 + + def test_execution_timeline_model(self) -> None: + from synapse_os.reporting import ExecutionTimeline, TimelineEntry + + entry = TimelineEntry(state="PLAN", entered_at=0.0, duration_ms=100) + timeline = ExecutionTimeline(entries=[entry]) + assert len(timeline.entries) == 1 + assert timeline.entries[0].state == "PLAN" + + +class TestAdapterMetricsModel: + def test_adapter_metrics_model(self) -> None: + from synapse_os.reporting import AdapterMetrics + + metrics = AdapterMetrics( + tool_name="codex", + total_calls=10, + success_count=8, + failure_count=2, + avg_duration_ms=1500.5, + ) + assert metrics.tool_name == "codex" + assert metrics.total_calls == 10 + assert metrics.success_count == 8 + assert metrics.failure_count == 2 + assert metrics.avg_duration_ms == 1500.5 + + +class TestStructuredErrorModel: + def test_structured_error_model(self) -> None: + from synapse_os.reporting import StructuredError + + error = StructuredError( + error_type="RetryableStepError", + message="temporary failure", + step="CODE_GREEN", + count=2, + ) + assert error.error_type == "RetryableStepError" + assert error.message == "temporary failure" + assert error.step == "CODE_GREEN" + assert error.count == 2 + + +class TestRunReportEnhancedFields: + def test_run_report_has_feature_id_and_title(self) -> None: + from synapse_os.reporting import RunReport + + report = RunReport( + run_id="test-run", + initiated_by="agent", + workspace_path="/workspace", + status="completed", + current_state="DONE", + feature_id="F64-advanced-supervisor-policies", + feature_title="Advanced Supervisor Policies", + ) + assert report.feature_id == "F64-advanced-supervisor-policies" + assert report.feature_title == "Advanced Supervisor Policies" + + def test_run_report_has_execution_timeline(self) -> None: + from synapse_os.reporting import ExecutionTimeline, RunReport, TimelineEntry + + timeline = ExecutionTimeline( + entries=[ + TimelineEntry(state="PLAN", entered_at=0.0, duration_ms=100), + TimelineEntry(state="CODE_GREEN", entered_at=0.1, duration_ms=200), + ] + ) + report = RunReport( + run_id="test-run", + initiated_by="agent", + workspace_path="/workspace", + status="completed", + current_state="DONE", + execution_timeline=timeline, + ) + assert len(report.execution_timeline.entries) == 2 + + def test_run_report_has_adapter_metrics(self) -> None: + from synapse_os.reporting import AdapterMetrics, RunReport + + metrics = [ + AdapterMetrics( + tool_name="codex", + total_calls=5, + success_count=4, + failure_count=1, + avg_duration_ms=1000.0, + ), + ] + report = RunReport( + run_id="test-run", + initiated_by="agent", + workspace_path="/workspace", + status="completed", + current_state="DONE", + adapter_metrics=metrics, + ) + assert len(report.adapter_metrics) == 1 + assert report.adapter_metrics[0].tool_name == "codex" + + def test_run_report_has_structured_errors(self) -> None: + from synapse_os.reporting import RunReport, StructuredError + + errors = [ + StructuredError( + error_type="RetryableStepError", + message="failure", + step="CODE_GREEN", + count=3, + ), + ] + report = RunReport( + run_id="test-run", + initiated_by="agent", + workspace_path="/workspace", + status="failed", + current_state="CODE_GREEN", + structured_errors=errors, + ) + assert len(report.structured_errors) == 1 + assert report.structured_errors[0].count == 3 + + +class TestGenerateStructuredReport: + def test_generate_structured_report_populates_timeline(self) -> None: + import tempfile + from pathlib import Path + + from synapse_os.reporting import RunReportGenerator + + run_record = MagicMock() + run_record.initiated_by = "agent" + run_record.workspace_path = "/workspace" + run_record.spec_hash = "abc123" + run_record.status = "completed" + run_record.current_state = "DONE" + + step_records = [ + MagicMock( + state="PLAN", + status="done", + tool_name="codex", + return_code=0, + duration_ms=100, + timed_out=False, + ), + MagicMock( + state="TEST_RED", + status="done", + tool_name="codex", + return_code=0, + duration_ms=200, + timed_out=False, + ), + ] + + event_records = [ + MagicMock( + event_type="state_entered", + state="PLAN", + message="entered PLAN", + timestamp=1000.0, + ), + MagicMock( + event_type="state_entered", + state="TEST_RED", + message="entered TEST_RED", + timestamp=1100.0, + ), + ] + + repo = MagicMock() + repo.get_run.return_value = run_record + repo.list_steps.return_value = step_records + repo.list_events.return_value = event_records + + with tempfile.TemporaryDirectory() as tmpdir: + base = Path(tmpdir) + spec_id_file = base / "test-run" / "SPEC_VALIDATION" / "spec_id.txt" + spec_id_file.parent.mkdir(parents=True) + spec_id_file.write_text("F64-advanced-supervisor-policies") + + artifact_store = MagicMock() + artifact_store.base_path = base + artifact_store.list_artifact_paths.return_value = [] + + gen = RunReportGenerator(repository=repo, artifact_store=artifact_store) + structured = gen.generate_structured_report("test-run") + assert structured.feature_id == "F64-advanced-supervisor-policies" + assert len(structured.execution_timeline.entries) == 2 + assert len(structured.execution_timeline.entries) == 2 + + def test_generate_structured_report_aggregates_adapter_metrics(self) -> None: + from pathlib import Path + + from synapse_os.reporting import RunReportGenerator + + run_record = MagicMock() + run_record.initiated_by = "agent" + run_record.workspace_path = "/workspace" + run_record.spec_hash = "hash" + run_record.status = "failed" + run_record.current_state = "CODE_GREEN" + + step_records = [ + MagicMock( + state="PLAN", + status="done", + tool_name="codex", + return_code=0, + duration_ms=100, + timed_out=False, + ), + MagicMock( + state="TEST_RED", + status="done", + tool_name="codex", + return_code=0, + duration_ms=200, + timed_out=False, + ), + MagicMock( + state="CODE_GREEN", + status="failed", + tool_name="gemini", + return_code=1, + duration_ms=300, + timed_out=False, + ), + ] + + event_records = [] + + repo = MagicMock() + repo.get_run.return_value = run_record + repo.list_steps.return_value = step_records + repo.list_events.return_value = event_records + + artifact_store = MagicMock() + artifact_store.base_path = Path("/tmp/fake") + artifact_store.list_artifact_paths.return_value = [] + + gen = RunReportGenerator(repository=repo, artifact_store=artifact_store) + structured = gen.generate_structured_report("test-run") + + codex_metrics = next( + (m for m in structured.adapter_metrics if m.tool_name == "codex"), None + ) + gemini_metrics = next( + (m for m in structured.adapter_metrics if m.tool_name == "gemini"), None + ) + assert codex_metrics is not None + assert codex_metrics.total_calls == 2 + assert codex_metrics.success_count == 2 + assert gemini_metrics is not None + assert gemini_metrics.total_calls == 1 + assert gemini_metrics.failure_count == 1 diff --git a/tests/unit/test_runtime_coordinator_hardening.py b/tests/unit/test_runtime_coordinator_hardening.py new file mode 100644 index 0000000..12bb421 --- /dev/null +++ b/tests/unit/test_runtime_coordinator_hardening.py @@ -0,0 +1,153 @@ +from __future__ import annotations + +import time +from unittest.mock import MagicMock + +import pytest + + +class TestRuntimeCoordinatorHardening: + def test_health_status_returns_healthy_when_all_circuit_breakers_closed( + self, + ) -> None: + from synapse_os.runtime.service import RuntimeCoordinator + + coordinator = RuntimeCoordinator() + coordinator.circuit_breaker_store = MagicMock() + coordinator.circuit_breaker_store.is_open.return_value = False + status = coordinator.health_status() + assert status == "HEALTHY" + + def test_health_status_returns_degraded_when_any_circuit_breaker_open(self) -> None: + from synapse_os.runtime.service import RuntimeCoordinator + + coordinator = RuntimeCoordinator() + coordinator.circuit_breaker_store = MagicMock() + coordinator.circuit_breaker_store.is_open.side_effect = ( + lambda tool: tool == "codex" + ) + status = coordinator.health_status() + assert status == "DEGRADED" + + def test_health_status_returns_unhealthy_when_multiple_circuit_breakers_open( + self, + ) -> None: + from synapse_os.runtime.service import RuntimeCoordinator + + coordinator = RuntimeCoordinator() + coordinator.circuit_breaker_store = MagicMock() + coordinator.circuit_breaker_store.is_open.return_value = True + status = coordinator.health_status() + assert status == "UNHEALTHY" + + def test_lifecycle_event_appends_to_event_log(self) -> None: + from synapse_os.runtime.service import RuntimeCoordinator + + coordinator = RuntimeCoordinator() + coordinator.lifecycle_event("runtime.starting") + coordinator.lifecycle_event("runtime.started") + assert len(coordinator.lifecycle_events) == 2 + assert coordinator.lifecycle_events[0].event == "runtime.starting" + assert coordinator.lifecycle_events[1].event == "runtime.started" + + def test_lifecycle_event_contains_timestamp(self) -> None: + from synapse_os.runtime.service import RuntimeCoordinator + + coordinator = RuntimeCoordinator() + coordinator.lifecycle_event("runtime.starting") + event = coordinator.lifecycle_events[0] + assert event.timestamp > 0 + + def test_register_cleanup_handler(self) -> None: + from synapse_os.runtime.service import RuntimeCoordinator + + coordinator = RuntimeCoordinator() + handler = MagicMock() + coordinator.register_cleanup_handler(handler) + assert handler in coordinator._cleanup_handlers + + def test_run_cleanup_handlers_calls_registered_handlers(self) -> None: + from synapse_os.runtime.service import RuntimeCoordinator + + coordinator = RuntimeCoordinator() + handler1 = MagicMock() + handler2 = MagicMock() + coordinator.register_cleanup_handler(handler1) + coordinator.register_cleanup_handler(handler2) + coordinator.run_cleanup_handlers() + handler1.assert_called_once() + handler2.assert_called_once() + + def test_run_cleanup_handlers_continues_after_handler_error(self) -> None: + from synapse_os.runtime.service import RuntimeCoordinator + + coordinator = RuntimeCoordinator() + good_handler = MagicMock() + bad_handler = MagicMock(side_effect=RuntimeError("cleanup error")) + coordinator.register_cleanup_handler(bad_handler) + coordinator.register_cleanup_handler(good_handler) + coordinator.run_cleanup_handlers() + good_handler.assert_called_once() + + def test_graceful_shutdown_calls_cleanup_then_stop(self) -> None: + from synapse_os.runtime.service import RuntimeCoordinator + + coordinator = RuntimeCoordinator() + cleanup_mock = MagicMock() + coordinator.register_cleanup_handler(cleanup_mock) + stop_mock = MagicMock() + coordinator._stop = stop_mock + coordinator.graceful_shutdown(timeout_seconds=5) + cleanup_mock.assert_called_once() + stop_mock.assert_called_once() + + def test_shutdown_respects_timeout(self) -> None: + from synapse_os.runtime.service import RuntimeCoordinator + + coordinator = RuntimeCoordinator() + slow_handler = MagicMock(side_effect=lambda: time.sleep(10)) + coordinator.register_cleanup_handler(slow_handler) + stop_mock = MagicMock() + coordinator._stop = stop_mock + start = time.monotonic() + coordinator.graceful_shutdown(timeout_seconds=0.1) + elapsed = time.monotonic() - start + assert elapsed < 1.0 + + def test_degraded_adapters_reflects_open_circuit_breakers(self) -> None: + from synapse_os.runtime.service import RuntimeCoordinator + + coordinator = RuntimeCoordinator() + coordinator.circuit_breaker_store = MagicMock() + coordinator.circuit_breaker_store.is_open.side_effect = lambda tool: ( + tool in ("codex", "gemini") + ) + coordinator.circuit_breaker_store.read.side_effect = lambda tool: ( + MagicMock( + tool_name=tool, + consecutive_operational_failures=3, + opened_at=time.time(), + cooldown_until=time.time() + 300, + ) + if tool in ("codex", "gemini") + else None + ) + degraded = coordinator.degraded_adapters + assert "codex" in degraded + assert "gemini" in degraded + + +class TestRuntimeLifecycleEvent: + def test_lifecycle_event_model_has_required_fields(self) -> None: + from synapse_os.runtime.service import RuntimeLifecycleEvent + + event = RuntimeLifecycleEvent(event="runtime.started", data={"pid": 12345}) + assert event.event == "runtime.started" + assert event.data == {"pid": 12345} + assert event.timestamp > 0 + + def test_lifecycle_event_default_data_is_empty_dict(self) -> None: + from synapse_os.runtime.service import RuntimeLifecycleEvent + + event = RuntimeLifecycleEvent(event="runtime.stopping") + assert event.data == {} diff --git a/tests/unit/test_supervisor.py b/tests/unit/test_supervisor.py index 124280d..edeb5f4 100644 --- a/tests/unit/test_supervisor.py +++ b/tests/unit/test_supervisor.py @@ -58,3 +58,99 @@ def test_supervisor_returns_to_code_green_after_review_rejection() -> None: assert decision.action == "return_to_code_green" assert decision.next_state == "CODE_GREEN" + + +def test_supervisor_marks_terminal_failure_after_security_error() -> None: + supervisor = _supervisor_module() + + decision = supervisor.Supervisor(max_retries=2).decide_after_failure( + state="SECURITY", + error=ValueError("insecure pattern"), + attempt=1, + available_routes=("primary",), + ) + + assert decision.action == "fail" + assert decision.next_state == "SECURITY" + assert decision.reason == "security_is_terminal" + + +def test_supervisor_terminal_failure_when_no_fallback_route() -> None: + supervisor = _supervisor_module() + + decision = supervisor.Supervisor(max_retries=2).decide_after_failure( + state="PLAN", + error=supervisor.RetryableStepError("failure"), + attempt=3, + available_routes=("primary",), + ) + + assert decision.action == "fail" + assert decision.next_state == "PLAN" + assert decision.reason == "terminal_failure" + + +def test_supervisor_retry_budget_exhausted_at_max_retries() -> None: + supervisor = _supervisor_module() + + decision = supervisor.Supervisor(max_retries=2).decide_after_failure( + state="CODE_GREEN", + error=supervisor.RetryableStepError("failure"), + attempt=2, + available_routes=("primary", "fallback"), + ) + + assert decision.action == "retry" + assert decision.reason == "retryable_failure_with_budget" + + +def test_supervisor_reroute_when_budget_exceeded_with_fallback() -> None: + supervisor = _supervisor_module() + + decision = supervisor.Supervisor(max_retries=2).decide_after_failure( + state="TEST_RED", + error=supervisor.RetryableStepError("failure"), + attempt=3, + available_routes=("primary", "fallback"), + ) + + assert decision.action == "reroute" + assert decision.route == "fallback" + assert decision.reason == "retry_budget_exhausted_with_fallback" + + +def test_supervisor_ignores_retryable_error_in_non_retryable_state() -> None: + supervisor = _supervisor_module() + + decision = supervisor.Supervisor(max_retries=2).decide_after_failure( + state="REVIEW", + error=supervisor.RetryableStepError("failure"), + attempt=1, + available_routes=("primary",), + ) + + assert decision.action == "fail" + assert decision.reason == "terminal_failure" + + +def test_supervisor_decision_contains_correct_reason() -> None: + supervisor = _supervisor_module() + + retry_decision = supervisor.Supervisor(max_retries=2).decide_after_failure( + state="PLAN", + error=supervisor.RetryableStepError("failure"), + attempt=1, + available_routes=("primary",), + ) + assert retry_decision.reason == "retryable_failure_with_budget" + + terminal_decision = supervisor.Supervisor(max_retries=2).decide_after_failure( + state="SPEC_VALIDATION", + error=ValueError("bad"), + attempt=1, + available_routes=("primary",), + ) + assert terminal_decision.reason == "spec_validation_is_terminal" + + review_decision = supervisor.Supervisor(max_retries=2).decide_after_review_rejection() + assert review_decision.reason == "review_requested_rework" diff --git a/tests/unit/test_supervisor_policies.py b/tests/unit/test_supervisor_policies.py new file mode 100644 index 0000000..a5aeeca --- /dev/null +++ b/tests/unit/test_supervisor_policies.py @@ -0,0 +1,272 @@ +from __future__ import annotations + +from importlib import import_module + +import pytest + + +def _supervisor_module(): + return import_module("synapse_os.supervisor") + + +class TestRetryPolicyModel: + def test_retry_policy_has_expected_fields(self) -> None: + supervisor = _supervisor_module() + policy = supervisor.RetryPolicy( + max_retries=3, base_delay_seconds=1.0, max_delay_seconds=60.0 + ) + assert policy.max_retries == 3 + assert policy.base_delay_seconds == 1.0 + assert policy.max_delay_seconds == 60.0 + + def test_retry_policy_default_values(self) -> None: + supervisor = _supervisor_module() + policy = supervisor.RetryPolicy() + assert policy.max_retries == 2 + assert policy.base_delay_seconds == 1.0 + assert policy.max_delay_seconds == 60.0 + + +class TestStepPolicyModel: + def test_step_policy_holds_retry_policy(self) -> None: + supervisor = _supervisor_module() + step_policy = supervisor.StepPolicy( + step_name="TEST_RED", + retry=supervisor.RetryPolicy(max_retries=5), + ) + assert step_policy.step_name == "TEST_RED" + assert step_policy.retry.max_retries == 5 + + +class TestSupervisorPoliciesModel: + def test_supervisor_policies_holds_default_and_overrides(self) -> None: + supervisor = _supervisor_module() + default_policy = supervisor.RetryPolicy(max_retries=2) + test_red_policy = supervisor.StepPolicy( + step_name="TEST_RED", + retry=supervisor.RetryPolicy(max_retries=5), + ) + policies = supervisor.SupervisorPolicies( + default=default_policy, + step_overrides={"TEST_RED": test_red_policy}, + ) + assert policies.default.max_retries == 2 + assert policies.step_overrides["TEST_RED"].retry.max_retries == 5 + + def test_supervisor_policies_resolves_step_specific_policy(self) -> None: + supervisor = _supervisor_module() + policies = supervisor.SupervisorPolicies() + resolved = policies.resolve_for_step("TEST_RED") + assert resolved.max_retries == 2 + + policies.step_overrides["TEST_RED"] = supervisor.StepPolicy( + step_name="TEST_RED", + retry=supervisor.RetryPolicy(max_retries=5), + ) + resolved = policies.resolve_for_step("TEST_RED") + assert resolved.max_retries == 5 + + +class TestCalculateBackoff: + def test_backoff_doubles_each_attempt(self) -> None: + supervisor = _supervisor_module() + delay = supervisor.calculate_backoff(attempt=1, base_delay=1.0, max_delay=60.0) + assert delay == 1.0 + delay = supervisor.calculate_backoff(attempt=2, base_delay=1.0, max_delay=60.0) + assert delay == 2.0 + delay = supervisor.calculate_backoff(attempt=3, base_delay=1.0, max_delay=60.0) + assert delay == 4.0 + delay = supervisor.calculate_backoff(attempt=4, base_delay=1.0, max_delay=60.0) + assert delay == 8.0 + + def test_backoff_respects_max_cap(self) -> None: + supervisor = _supervisor_module() + delay = supervisor.calculate_backoff(attempt=10, base_delay=1.0, max_delay=60.0) + assert delay == 60.0 + + def test_backoff_with_different_base(self) -> None: + supervisor = _supervisor_module() + delay = supervisor.calculate_backoff(attempt=1, base_delay=2.0, max_delay=60.0) + assert delay == 2.0 + delay = supervisor.calculate_backoff(attempt=2, base_delay=2.0, max_delay=60.0) + assert delay == 4.0 + + +class TestAdvancedSupervisorPerStepRetries: + def test_test_red_respects_own_max_retries(self) -> None: + supervisor_mod = _supervisor_module() + advanced = supervisor_mod.AdvancedSupervisor( + policies=supervisor_mod.SupervisorPolicies( + default=supervisor_mod.RetryPolicy(max_retries=2), + step_overrides={ + "TEST_RED": supervisor_mod.StepPolicy( + step_name="TEST_RED", + retry=supervisor_mod.RetryPolicy(max_retries=5), + ), + }, + ), + ) + decision = advanced.decide_after_failure( + state="TEST_RED", + error=supervisor_mod.RetryableStepError("failure"), + attempt=4, + available_routes=("primary",), + ) + assert decision.action == "retry" + + def test_plan_respects_own_max_retries(self) -> None: + supervisor_mod = _supervisor_module() + advanced = supervisor_mod.AdvancedSupervisor( + policies=supervisor_mod.SupervisorPolicies( + default=supervisor_mod.RetryPolicy(max_retries=2), + step_overrides={ + "PLAN": supervisor_mod.StepPolicy( + step_name="PLAN", + retry=supervisor_mod.RetryPolicy(max_retries=1), + ), + }, + ), + ) + decision = advanced.decide_after_failure( + state="PLAN", + error=supervisor_mod.RetryableStepError("failure"), + attempt=2, + available_routes=("primary",), + ) + assert decision.action == "fail" + assert decision.reason == "terminal_failure" + + +class TestAdvancedSupervisorTerminalSteps: + def test_security_remains_terminal(self) -> None: + supervisor_mod = _supervisor_module() + advanced = supervisor_mod.AdvancedSupervisor() + decision = advanced.decide_after_failure( + state="SECURITY", + error=ValueError("insecure"), + attempt=1, + available_routes=("primary",), + ) + assert decision.action == "fail" + assert decision.reason == "security_is_terminal" + + def test_spec_validation_remains_terminal(self) -> None: + supervisor_mod = _supervisor_module() + advanced = supervisor_mod.AdvancedSupervisor() + decision = advanced.decide_after_failure( + state="SPEC_VALIDATION", + error=ValueError("bad spec"), + attempt=1, + available_routes=("primary",), + ) + assert decision.action == "fail" + assert decision.reason == "spec_validation_is_terminal" + + +class TestAdvancedSupervisorFallbackRouting: + def test_reroutes_to_fallback_after_exhausting_primary_retries(self) -> None: + supervisor_mod = _supervisor_module() + advanced = supervisor_mod.AdvancedSupervisor( + policies=supervisor_mod.SupervisorPolicies( + default=supervisor_mod.RetryPolicy(max_retries=2), + ), + ) + decision = advanced.decide_after_failure( + state="CODE_GREEN", + error=supervisor_mod.RetryableStepError("failure"), + attempt=3, + available_routes=("primary", "fallback"), + ) + assert decision.action == "reroute" + assert decision.route == "fallback" + assert decision.reason == "retry_budget_exhausted_with_fallback" + + +class TestAdvancedSupervisorBackoffDelay: + def test_returns_backoff_delay_in_decision(self) -> None: + supervisor_mod = _supervisor_module() + advanced = supervisor_mod.AdvancedSupervisor( + policies=supervisor_mod.SupervisorPolicies( + default=supervisor_mod.RetryPolicy( + max_retries=3, base_delay_seconds=1.0, max_delay_seconds=60.0 + ), + ), + ) + decision = advanced.decide_after_failure( + state="CODE_GREEN", + error=supervisor_mod.RetryableStepError("failure"), + attempt=2, + available_routes=("primary",), + ) + assert decision.action == "retry" + assert decision.backoff_seconds == 2.0 + + def test_backoff_caps_at_max_delay(self) -> None: + supervisor_mod = _supervisor_module() + advanced = supervisor_mod.AdvancedSupervisor( + policies=supervisor_mod.SupervisorPolicies( + default=supervisor_mod.RetryPolicy( + max_retries=10, base_delay_seconds=10.0, max_delay_seconds=60.0 + ), + ), + ) + decision = advanced.decide_after_failure( + state="CODE_GREEN", + error=supervisor_mod.RetryableStepError("failure"), + attempt=10, + available_routes=("primary",), + ) + assert decision.action == "retry" + assert decision.backoff_seconds == 60.0 + + +class TestAdvancedSupervisorOperationalError: + def test_launcher_unavailable_short_circuits(self) -> None: + supervisor_mod = _supervisor_module() + advanced = supervisor_mod.AdvancedSupervisor() + op_error = supervisor_mod.AdapterOperationalError( + "launcher unavailable", + category="launcher_unavailable", + ) + decision = advanced.decide_after_failure( + state="CODE_GREEN", + error=op_error, + attempt=1, + available_routes=("primary", "fallback"), + ) + assert decision.action == "reroute" + assert decision.route == "fallback" + assert decision.reason == "operational_error_short_circuit" + + def test_other_operational_errors_still_retry(self) -> None: + supervisor_mod = _supervisor_module() + advanced = supervisor_mod.AdvancedSupervisor( + policies=supervisor_mod.SupervisorPolicies( + default=supervisor_mod.RetryPolicy(max_retries=2), + ), + ) + op_error = supervisor_mod.AdapterOperationalError( + "some error", + category="timeout", + ) + decision = advanced.decide_after_failure( + state="CODE_GREEN", + error=op_error, + attempt=1, + available_routes=("primary",), + ) + assert decision.action == "retry" + + +class TestAdvancedSupervisorDefaults: + def test_advanced_supervisor_inherits_supervisor_interface(self) -> None: + supervisor_mod = _supervisor_module() + advanced = supervisor_mod.AdvancedSupervisor() + decision = advanced.decide_after_failure( + state="CODE_GREEN", + error=supervisor_mod.RetryableStepError("failure"), + attempt=1, + available_routes=("primary",), + ) + assert decision.action == "retry" + assert decision.next_state == "CODE_GREEN" diff --git a/tests/unit/test_workspace_v2.py b/tests/unit/test_workspace_v2.py new file mode 100644 index 0000000..8286acb --- /dev/null +++ b/tests/unit/test_workspace_v2.py @@ -0,0 +1,198 @@ +from __future__ import annotations + +import pytest +from pathlib import Path +from unittest.mock import MagicMock, patch + +from synapse_os.workspace import ( + WorkspaceState, + TrackedWorkspace, + WorkspacePool, + WorkspaceManager, + PoolExhaustedError, +) + + +class TestWorkspaceState: + def test_all_states_documented(self): + assert WorkspaceState.CREATING.value == "creating" + assert WorkspaceState.READY.value == "ready" + assert WorkspaceState.BUSY.value == "busy" + assert WorkspaceState.CLEANUP.value == "cleanup" + assert WorkspaceState.DESTROYED.value == "destroyed" + + +class TestTrackedWorkspace: + def test_create_with_defaults(self, tmp_path: Path): + ws = TrackedWorkspace(root=tmp_path) + assert ws.root == tmp_path + assert ws.state == WorkspaceState.CREATING + assert ws.run_id is None + assert ws.metadata == {} + + def test_mark_ready(self, tmp_path: Path): + ws = TrackedWorkspace(root=tmp_path) + ws.mark_ready(run_id="run-1") + assert ws.state == WorkspaceState.READY + assert ws.run_id == "run-1" + + def test_mark_busy(self, tmp_path: Path): + ws = TrackedWorkspace(root=tmp_path, state=WorkspaceState.READY) + ws.mark_busy() + assert ws.state == WorkspaceState.BUSY + + def test_mark_cleanup(self, tmp_path: Path): + ws = TrackedWorkspace(root=tmp_path) + ws.mark_cleanup() + assert ws.state == WorkspaceState.CLEANUP + + def test_mark_destroyed(self, tmp_path: Path): + ws = TrackedWorkspace(root=tmp_path) + ws.mark_destroyed() + assert ws.state == WorkspaceState.DESTROYED + + def test_reset_for_reuse(self, tmp_path: Path): + ws = TrackedWorkspace(root=tmp_path, run_id="run-1", state=WorkspaceState.BUSY) + ws.reset_for_reuse() + assert ws.state == WorkspaceState.CREATING + assert ws.run_id is None + assert ws.metadata == {} + + def test_metadata_get_set(self, tmp_path: Path): + ws = TrackedWorkspace(root=tmp_path) + ws.set_metadata("key", "value") + assert ws.get_metadata("key") == "value" + assert ws.get_metadata("missing") is None + + +class TestWorkspacePool: + def test_create_pool(self, tmp_path: Path): + pool = WorkspacePool(base_dir=tmp_path, max_size=3) + assert pool.max_size == 3 + assert pool.acquired_count == 0 + + def test_acquire_returns_tracked_workspace(self, tmp_path: Path): + pool = WorkspacePool(base_dir=tmp_path, max_size=2) + ws = pool.acquire("run-1") + assert isinstance(ws, TrackedWorkspace) + assert ws.run_id == "run-1" + assert ws.state == WorkspaceState.READY + assert pool.acquired_count == 1 + + def test_acquire_creates_directory(self, tmp_path: Path): + pool = WorkspacePool(base_dir=tmp_path, max_size=1) + ws = pool.acquire("run-1") + assert ws.root.exists() + + def test_acquire_exhausted_raises(self, tmp_path: Path): + pool = WorkspacePool(base_dir=tmp_path, max_size=1) + pool.acquire("run-1") + with pytest.raises(PoolExhaustedError): + pool.acquire("run-2") + + def test_release_returns_to_pool(self, tmp_path: Path): + pool = WorkspacePool(base_dir=tmp_path, max_size=1) + ws = pool.acquire("run-1") + pool.release(ws) + assert pool.acquired_count == 0 + assert pool.idle_count == 1 + + def test_release_resets_workspace(self, tmp_path: Path): + pool = WorkspacePool(base_dir=tmp_path, max_size=1) + ws = pool.acquire("run-1") + ws.set_metadata("key", "value") + pool.release(ws) + assert ws.run_id is None + assert ws.state == WorkspaceState.READY + + def test_idle_workspaces_tracked(self, tmp_path: Path): + pool = WorkspacePool(base_dir=tmp_path, max_size=2) + ws1 = pool.acquire("run-1") + ws2 = pool.acquire("run-2") + pool.release(ws1) + assert pool.idle_count == 1 + assert ws1 in pool.idle_workspaces + + def test_discard_removes_from_pool(self, tmp_path: Path): + pool = WorkspacePool(base_dir=tmp_path, max_size=1) + ws = pool.acquire("run-1") + pool.discard(ws) + assert pool.acquired_count == 0 + assert pool.idle_count == 0 + + def test_discard_cleans_directory(self, tmp_path: Path): + pool = WorkspacePool(base_dir=tmp_path, max_size=1) + ws = pool.acquire("run-1") + (ws.root / "file.txt").write_text("data") + pool.discard(ws) + assert not ws.root.exists() + + def test_stats(self, tmp_path: Path): + pool = WorkspacePool(base_dir=tmp_path, max_size=3) + ws1 = pool.acquire("run-1") + ws2 = pool.acquire("run-2") + pool.release(ws1) + stats = pool.stats() + assert stats["total"] == 3 + assert stats["acquired"] == 1 + assert stats["idle"] == 1 + assert stats["discarded"] == 1 + + +class TestWorkspaceManager: + def test_create_workspace(self, tmp_path: Path): + mgr = WorkspaceManager(base_dir=tmp_path, pool_size=2) + ws = mgr.create_workspace("run-1") + assert isinstance(ws, TrackedWorkspace) + assert ws.run_id == "run-1" + assert ws.state == WorkspaceState.READY + + def test_cleanup_workspace_calls_hook(self, tmp_path: Path): + mgr = WorkspaceManager(base_dir=tmp_path, pool_size=1) + hook_called = [] + mgr.register_cleanup_hook(lambda path: hook_called.append(path)) + ws = mgr.create_workspace("run-1") + mgr.cleanup_workspace(ws) + assert len(hook_called) == 1 + assert hook_called[0] == ws.root + + def test_cleanup_sets_state_to_cleanup(self, tmp_path: Path): + mgr = WorkspaceManager(base_dir=tmp_path, pool_size=1) + ws = mgr.create_workspace("run-1") + mgr.cleanup_workspace(ws) + assert ws.state == WorkspaceState.CLEANUP + + def test_get_workspace_returns_cached(self, tmp_path: Path): + mgr = WorkspaceManager(base_dir=tmp_path, pool_size=1) + ws1 = mgr.create_workspace("run-1") + ws2 = mgr.get_workspace("run-1") + assert ws1 is ws2 + + def test_get_workspace_unknown_returns_none(self, tmp_path: Path): + mgr = WorkspaceManager(base_dir=tmp_path, pool_size=1) + assert mgr.get_workspace("unknown") is None + + def test_list_workspaces(self, tmp_path: Path): + mgr = WorkspaceManager(base_dir=tmp_path, pool_size=2) + mgr.create_workspace("run-1") + mgr.create_workspace("run-2") + workspaces = mgr.list_workspaces() + assert len(workspaces) == 2 + + def test_pool_size_respected(self, tmp_path: Path): + mgr = WorkspaceManager(base_dir=tmp_path, pool_size=2) + for i in range(1, 4): + try: + mgr.create_workspace(f"run-{i}") + except PoolExhaustedError: + pass + assert len(mgr.list_workspaces()) == 2 + + def test_cleanup_all(self, tmp_path: Path): + mgr = WorkspaceManager(base_dir=tmp_path, pool_size=2) + mgr.create_workspace("run-1") + mgr.create_workspace("run-2") + called = [] + mgr.register_cleanup_hook(lambda p: called.append(p)) + mgr.cleanup_all() + assert len(called) == 2