Add preview agentic self-healing deployment workflow

## Problem

Deterministic deploy recovery will cover the most common, well-understood failure modes, but it will never cover every local Docker and Azure deployment edge case.

Once the CLI has:
- a stronger deploy-time recovery baseline,
- Copilot SDK integration, and
- AI-powered diagnosis with structured deployment context,

it should offer a **preview self-healing workflow** that can inspect deployment state, choose from explicit repair tools, apply bounded remediations, and either recover the deployment or stop with precise next steps.

## Proposed Solution

Add a **preview agentic self-healing deployment workflow** built on the GitHub Copilot SDK.

This is **not** a freeform shell agent. The model should orchestrate a small set of explicit, typed repair tools implemented in Go, with approval boundaries and an audit trail.

Possible command surfaces:

```bash
gh devlake repair
gh devlake diagnose --fix
gh devlake deploy local --self-heal=preview
```

Final command naming can be decided during implementation, but the behavior should be the same: inspect -> choose bounded repair -> apply -> retry -> report.

## Architecture

### Layering

1. **Deterministic recovery first** — rely on the deploy-time classifier and bounded-retry groundwork from #142.
2. **Diagnosis second** — reuse the structured health/connection/pipeline context from `gh devlake diagnose`.
3. **Agentic repair third** — let Copilot decide among explicit repair tools and approval-gated actions.
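
The ordering above can be sketched as a simple fall-through: each layer gets one chance, and the first layer that resolves the failure ends the run. This is only an illustration. `Stage`, `RunLayers`, and the stage bodies are hypothetical names and do not reflect existing code in the CLI.

```go
package main

import (
	"errors"
	"fmt"
)

// Stage is a hypothetical recovery layer. Run reports whether the layer
// resolved the failure, plus a short note for the audit output.
type Stage struct {
	Name string
	Run  func(failure error) (resolved bool, note string)
}

// RunLayers tries each stage in order and stops at the first one that
// resolves the failure. It returns the resolving stage name ("" if none)
// and the notes gathered along the way.
func RunLayers(failure error, stages []Stage) (string, []string) {
	var notes []string
	for _, s := range stages {
		resolved, note := s.Run(failure)
		notes = append(notes, s.Name+": "+note)
		if resolved {
			return s.Name, notes
		}
	}
	return "", notes
}

func main() {
	failure := errors.New("port 8080 already in use") // example failure only
	stages := []Stage{
		{"deterministic", func(error) (bool, string) { return false, "no matching classifier rule" }},
		{"diagnosis", func(error) (bool, string) { return false, "collected structured context" }},
		{"agentic", func(error) (bool, string) { return true, "selected a bounded repair tool" }},
	}
	by, notes := RunLayers(failure, stages)
	for _, n := range notes {
		fmt.Println(n)
	}
	fmt.Println("resolved by:", by)
}
```

Keeping the agentic layer last means the model is only consulted after the cheaper, predictable layers have had their turn.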

### Repair model

- Reuse the `internal/copilot/` foundation from #63.
- Add a repair-oriented tool surface rather than generic shell execution.
- Require approval for mutating actions.
- Keep every repair step auditable in CLI output.
- Stop after bounded attempts; when confidence is low, fall back to diagnosis and recommended commands.
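
A minimal sketch of how these rules might combine into one loop: plan, gate mutating actions on approval, apply, verify, and record every step. All names here (`Action`, `AuditEntry`, `Repair`, and the callback parameters) are assumptions made for illustration. In a real implementation the `plan` callback would presumably be backed by the Copilot session in `internal/copilot/`.

```go
package main

import "fmt"

// Action is a hypothetical repair step proposed by the planner.
type Action struct {
	Tool     string
	Mutating bool
}

// AuditEntry records one attempted step for CLI output.
type AuditEntry struct {
	Attempt int
	Tool    string
	Outcome string
}

// Repair runs at most maxAttempts plan/apply/verify cycles.
// plan returns nil when no safe action exists, approve gates mutating
// actions, apply executes the action, and healthy verifies the deployment.
func Repair(maxAttempts int, plan func(attempt int) *Action, approve func(Action) bool,
	apply func(Action) error, healthy func() bool) (bool, []AuditEntry) {
	var audit []AuditEntry
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		a := plan(attempt)
		if a == nil {
			audit = append(audit, AuditEntry{attempt, "", "no safe action; falling back to diagnosis"})
			return false, audit
		}
		if a.Mutating && !approve(*a) {
			audit = append(audit, AuditEntry{attempt, a.Tool, "declined by user"})
			return false, audit
		}
		if err := apply(*a); err != nil {
			audit = append(audit, AuditEntry{attempt, a.Tool, "failed: " + err.Error()})
			continue
		}
		if healthy() {
			audit = append(audit, AuditEntry{attempt, a.Tool, "applied; deployment healthy"})
			return true, audit
		}
		audit = append(audit, AuditEntry{attempt, a.Tool, "applied; still unhealthy"})
	}
	return false, audit // attempt budget exhausted
}

func main() {
	plans := []Action{
		{Tool: "inspect_local_port_conflicts", Mutating: false},
		{Tool: "rewrite_local_ports_to_alt_bundle", Mutating: true},
	}
	applied := 0
	ok, audit := Repair(3,
		func(attempt int) *Action {
			if attempt > len(plans) {
				return nil
			}
			return &plans[attempt-1]
		},
		func(Action) bool { return true }, // stand-in for an interactive prompt
		func(Action) error { applied++; return nil },
		func() bool { return applied >= 2 },
	)
	for _, e := range audit {
		fmt.Printf("attempt %d: %s: %s\n", e.Attempt, e.Tool, e.Outcome)
	}
	fmt.Println("recovered:", ok)
}
```

The loop has exactly three exits: recovery, a declined or missing action, or an exhausted attempt budget. Each exit leaves an audit entry behind.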

### Candidate repair tools

```text
inspect_local_port_conflicts
rewrite_local_ports_to_alt_bundle
cleanup_partial_local_artifacts
retry_local_compose_up
check_azure_prereqs
start_mysql_if_stopped
purge_soft_deleted_key_vault
rerun_bicep_deploy
collect_deploy_logs
```

The tool names are illustrative; the binding constraint is that each tool is:
- explicit
- typed
- narrow in side effects
- implemented in deterministic Go code

### Safety boundaries

- No arbitrary shell planning/execution by the model.
- No silent destructive cleanup.
- Mutating steps must either:
  - require confirmation, or
  - be explicitly classified as safe/idempotent in code.
- Repairs must log what changed and why.
- Bounded retry only; no open-ended loops.
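
These boundaries could be enforced with a hard-coded classification table and a default-deny gate, roughly as sketched below. The `Safety` levels, the `Gate` function, the `ChangeRecord` type, and especially the classification assigned to each tool are assumptions for illustration. The actual safety class of each tool would need to be decided during implementation.

```go
package main

import "fmt"

// Safety is a hypothetical per-tool safety class, fixed in code.
type Safety int

const (
	ReadOnly       Safety = iota // inspects state only
	SafeIdempotent               // mutating, but classified safe to repeat
	NeedsApproval                // mutating; requires user confirmation
)

// classification is an illustrative policy table. The classes chosen
// here are assumptions, not decisions made in this issue.
var classification = map[string]Safety{
	"collect_deploy_logs":          ReadOnly,
	"inspect_local_port_conflicts": ReadOnly,
	"retry_local_compose_up":       SafeIdempotent,
	"purge_soft_deleted_key_vault": NeedsApproval,
}

// Gate reports whether a tool may run. Unknown tools are always denied,
// and approval-gated tools run only if confirm returns true.
func Gate(tool string, confirm func(tool string) bool) bool {
	s, known := classification[tool]
	if !known {
		return false
	}
	switch s {
	case ReadOnly, SafeIdempotent:
		return true
	case NeedsApproval:
		return confirm != nil && confirm(tool)
	default:
		return false
	}
}

// ChangeRecord captures what a repair changed and why, for the audit log.
type ChangeRecord struct {
	Tool    string
	Changed string
	Reason  string
}

func main() {
	deny := func(string) bool { return false } // stand-in for a declined prompt
	for _, tool := range []string{"collect_deploy_logs", "purge_soft_deleted_key_vault", "run_shell"} {
		fmt.Printf("%s allowed=%v\n", tool, Gate(tool, deny))
	}
	rec := ChangeRecord{
		Tool:    "retry_local_compose_up",
		Changed: "restarted local compose services (example)",
		Reason:  "previous attempt exited before services became healthy (example)",
	}
	fmt.Printf("%s: %s (%s)\n", rec.Tool, rec.Changed, rec.Reason)
}
```

Default-deny matters here: a tool name the model invents never reaches execution, and a missing confirmation callback is treated the same as a refusal.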

## Likely Files

| File | Change |
|------|--------|
| `cmd/diagnose.go` or `cmd/repair.go` | Preview repair command surface |
| `internal/copilot/` | Repair-oriented session/tool orchestration |
| `internal/repair/` | Deterministic repair helpers and safety boundaries |
| `cmd/deploy_local.go` / `cmd/deploy_azure.go` | Shared recovery primitives consumed by repair tools |
| `README.md` / docs | Preview workflow, safety model, and examples |

## Acceptance Criteria

- [ ] A preview repair workflow exists (`repair` or `diagnose --fix`).
- [ ] The workflow uses explicit repair tools, not arbitrary shell execution.
- [ ] The model can inspect deployment state and select from bounded repair actions.
- [ ] Mutating repairs are approval-gated or explicitly safe/idempotent by implementation.
- [ ] Repair output includes an audit trail of the attempted fixes and retry results.
- [ ] When repair confidence is low or no safe action exists, the workflow stops and prints precise next steps instead of guessing.
- [ ] `go build ./...`, `go test ./...`, and `go vet ./...` pass.
- [ ] README/docs clearly describe the preview status and safety boundaries.

## Dependencies

Blocked by:
- #142 — deterministic deploy recovery groundwork
- #63 — Copilot SDK integration + `gh devlake insights`
- #64 — `gh devlake diagnose`

## Target Version

**v0.4.4** — preview work within the active v0.4.x line once the deterministic recovery and AI diagnosis foundations are in place.

## References

- #142 — deterministic deploy recovery groundwork
- #63 — Copilot SDK integration
- #64 — AI-powered diagnosis
- `internal/copilot/` — future shared SDK/session foundation

