Description
Problem
When something goes wrong with a DevLake deployment, users must manually:
- Run `gh devlake status` and read the output
- Test each connection individually
- Check pipeline logs in the Config UI
- Correlate error messages across services
There's no single command that inspects the entire stack, identifies problems, and explains what's wrong with actionable remediation steps.
Proposed Solution
Add `gh devlake diagnose`, an AI-powered diagnostic command that runs all health checks, connection tests, and pipeline inspections, then synthesizes a diagnosis with remediation commands.
Command surface
# Full diagnostic
gh devlake diagnose
# Focus on a specific area
gh devlake diagnose --scope connections
gh devlake diagnose --scope pipelines
How it works
- Gather data: run all checks programmatically (no user interaction needed):
  - Ping all endpoints (backend, Config UI, Grafana)
  - Test all saved connections across all plugins
  - Fetch recent pipeline runs and their error messages
  - Check DB connectivity
  - Read state file for deployment context
- Send to Copilot SDK: package all results into a structured context and send to the LLM with a diagnostic prompt
- Stream diagnosis: the LLM synthesizes findings into a plain-language explanation with actionable `gh devlake` commands
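The gather-then-package steps above can be sketched as follows. This is a minimal illustration, not the proposed implementation: `Check`, `CheckResult`, `gather`, `buildContext`, and `errUnauthorized` are hypothetical names, and the probes are stubs standing in for the real endpoint pings and connection tests. The point it shows is that a failing check is recorded rather than aborting the run, so the LLM still receives partial results.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// errUnauthorized is a stand-in failure used by the stub probes below.
var errUnauthorized = errors.New("401 Unauthorized")

// CheckResult is a hypothetical record for one diagnostic check.
type CheckResult struct {
	Name    string `json:"name"`
	Healthy bool   `json:"healthy"`
	Message string `json:"message,omitempty"`
}

// Check is a hypothetical named probe; a nil error means healthy.
type Check struct {
	Name string
	Run  func() error
}

// gather runs every check and records failures instead of returning early,
// so one unreachable service does not hide the state of the others.
func gather(checks []Check) []CheckResult {
	results := make([]CheckResult, 0, len(checks))
	for _, c := range checks {
		r := CheckResult{Name: c.Name, Healthy: true}
		if err := c.Run(); err != nil {
			r.Healthy = false
			r.Message = err.Error()
		}
		results = append(results, r)
	}
	return results
}

// buildContext serializes the results as JSON to embed in the diagnostic prompt.
func buildContext(results []CheckResult) (string, error) {
	b, err := json.MarshalIndent(results, "", "  ")
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	checks := []Check{
		{Name: "backend", Run: func() error { return nil }},
		{Name: "connection github/1", Run: func() error { return errUnauthorized }},
	}
	ctx, err := buildContext(gather(checks))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ctx)
}
```

Running the checks sequentially keeps the sketch short; a real command might run them concurrently with per-check timeouts.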
Example output
$ gh devlake diagnose
🔍 Running diagnostics...
✅ Backend API: http://localhost:8080 (healthy)
✅ Config UI: http://localhost:4000 (healthy)
✅ Grafana: http://localhost:3002 (healthy)
❌ Connection "GitHub - my-org" (github, id=1): 401 Unauthorized
✅ Connection "Copilot - my-ent" (gh-copilot, id=2): healthy
⚠️ Pipeline #12: FAILED (2 hours ago)
📋 Diagnosis:
Your GitHub connection "GitHub - my-org" is returning 401 Unauthorized.
This typically means the PAT has expired or been revoked.
To fix:
1. Generate a new PAT with scopes: repo, read:org, read:user
2. Update the connection:
gh devlake configure connection update --plugin github --id 1 --token ghp_NEW_TOKEN
Pipeline #12 failed because it depends on this connection.
After updating the token, re-trigger collection:
gh devlake configure project add --project-name my-team
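The status lines in the example output could be produced by a small formatter like the one below. This is only a sketch: `Status`, `icon`, and `renderLine` are hypothetical names, and the CLI's existing output helpers (not shown in this issue) would presumably be reused instead.

```go
package main

import "fmt"

// Status is a hypothetical severity for one diagnostic line.
type Status int

const (
	StatusOK Status = iota
	StatusWarn
	StatusFail
)

// icon maps a status to the emoji used in the example output.
func icon(s Status) string {
	switch s {
	case StatusOK:
		return "✅"
	case StatusWarn:
		return "⚠️"
	default:
		return "❌"
	}
}

// renderLine formats one check as "<icon> <label>: <detail>".
func renderLine(s Status, label, detail string) string {
	return fmt.Sprintf("%s %s: %s", icon(s), label, detail)
}

func main() {
	fmt.Println(renderLine(StatusOK, "Backend API", "http://localhost:8080 (healthy)"))
	fmt.Println(renderLine(StatusFail, `Connection "GitHub - my-org" (github, id=1)`, "401 Unauthorized"))
}
```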
Architecture
Reuses the internal/copilot/ package from #63. Adds diagnostic-specific tools:
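// Tool: test_all_connections
// Batch-tests every connection across all plugins and returns results.
// Errors are recorded per plugin/connection instead of being discarded,
// so a single failure still yields partial results.
var testAllConnectionsTool = copilot.DefineTool("test_all_connections",
	"Test all saved DevLake connections and return pass/fail status for each",
	func(params struct{}, inv copilot.ToolInvocation) (any, error) {
		client := devlake.NewClient(apiURL)
		var results []ConnectionTestResult
		for _, def := range connectionRegistry {
			conns, err := client.ListConnections(def.Plugin)
			if err != nil {
				// Listing failed for this plugin: report it and move on.
				results = append(results, ConnectionTestResult{
					Plugin: def.Plugin, Message: err.Error(),
				})
				continue
			}
			for _, conn := range conns {
				test, err := client.TestSavedConnection(def.Plugin, conn.ID)
				if err != nil {
					// The test call itself failed: mark the connection unhealthy.
					results = append(results, ConnectionTestResult{
						Plugin: def.Plugin, ID: conn.ID, Name: conn.Name,
						Message: err.Error(),
					})
					continue
				}
				results = append(results, ConnectionTestResult{
					Plugin: def.Plugin, ID: conn.ID, Name: conn.Name,
					Healthy: test.Success, Message: test.Message,
				})
			}
		}
		return results, nil
	})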
// Tool: get_recent_pipeline_errors
// Fetches recent failed pipelines with error details
var getRecentPipelineErrorsTool = copilot.DefineTool("get_recent_pipeline_errors",
	"Get recent failed DevLake pipeline runs with error messages and timestamps",
	func(params struct{ Limit int `json:"limit,omitempty"` }, inv copilot.ToolInvocation) (any, error) {
		// ... fetch pipelines, filter for failures, include error details ...
	})
// Tool: check_all_endpoints
// Pings backend, Config UI, Grafana and returns status for each
var checkEndpointsTool = copilot.DefineTool("check_all_endpoints",
	"Check health of all DevLake endpoints (backend API, Config UI, Grafana)",
	func(params struct{}, inv copilot.ToolInvocation) (any, error) {
		// ... ping each endpoint from state file or discovery ...
	})
Output mode
Unlike insights (which streams), diagnose uses batch mode: collect the full response, then render with the CLI's standard emoji/box-drawing formatting. This ensures the diagnostic output has consistent visual structure.
// Wait for full response instead of streaming
response, err := session.SendAndWait(ctx, copilot.MessageOptions{
Prompt: diagnosticPrompt,
})
// Format and print with standard CLI output conventions
System prompt for diagnosis
The system message includes:
- DevLake architecture context (three-layer model, plugin structure)
- Available `gh devlake` commands for remediation
- Common failure patterns and their fixes
- The user's deployment type (local vs Azure) from the state file
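One way to assemble a system message from the four ingredients above is sketched here. `PromptInputs` and `buildSystemPrompt` are hypothetical names, the wording of the prompt is illustrative only, and how the result is passed to the SDK session is left out because that API is not shown in this issue.

```go
package main

import (
	"fmt"
	"strings"
)

// PromptInputs is a hypothetical container for the pieces of the system message.
type PromptInputs struct {
	Architecture   string   // e.g. three-layer model, plugin structure
	Commands       []string // gh devlake commands the model may suggest
	FailureHints   []string // common failure patterns and their fixes
	DeploymentType string   // "local" or "azure", read from the state file
}

// buildSystemPrompt joins the inputs into one plain-text system message.
func buildSystemPrompt(in PromptInputs) string {
	var b strings.Builder
	b.WriteString("You are a DevLake diagnostic assistant.\n\n")
	fmt.Fprintf(&b, "Architecture:\n%s\n\n", in.Architecture)
	b.WriteString("Remediation commands you may suggest:\n")
	for _, c := range in.Commands {
		fmt.Fprintf(&b, "- %s\n", c)
	}
	b.WriteString("\nCommon failure patterns:\n")
	for _, h := range in.FailureHints {
		fmt.Fprintf(&b, "- %s\n", h)
	}
	fmt.Fprintf(&b, "\nDeployment type: %s\n", in.DeploymentType)
	return b.String()
}

func main() {
	fmt.Print(buildSystemPrompt(PromptInputs{
		Architecture:   "three-layer model, plugin structure",
		Commands:       []string{"gh devlake status", "gh devlake configure connection update"},
		FailureHints:   []string{"401 Unauthorized usually means the PAT expired or was revoked"},
		DeploymentType: "local",
	}))
}
```

Listing the allowed commands explicitly in the prompt is meant to keep the model from suggesting commands the CLI does not have.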
Files to create/modify
| File | Change |
|---|---|
| `cmd/diagnose.go` | NEW: `gh devlake diagnose` command |
| `internal/copilot/tools.go` | ADD: diagnostic-specific tools (`test_all_connections`, `get_recent_pipeline_errors`, `check_all_endpoints`) |
| `internal/copilot/system.go` | ADD: diagnostic system prompt variant |
Acceptance Criteria
- `gh devlake diagnose` gathers all health/connection/pipeline data and produces a synthesis
- `--scope connections` limits diagnosis to connection health only
- `--scope pipelines` limits diagnosis to pipeline failures only
- Diagnosis includes actionable `gh devlake` commands for remediation
- Graceful error if Copilot CLI is not installed (same as `insights`)
- Diagnostic data gathering works even if some endpoints are down (partial results)
- Output uses batch mode with standard CLI formatting (not streaming)
- `go build ./...` and `go test ./...` pass
- README updated
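The two `--scope` criteria could be enforced with a small validation helper along these lines. This is a sketch under assumptions: the `Scope` type, `parseScope`, and `includes` are hypothetical, and treating an empty flag value as "run everything" is a guess at the intended default.

```go
package main

import "fmt"

// Scope is a hypothetical type for the --scope flag value.
type Scope string

const (
	ScopeAll         Scope = "" // assumption: flag omitted means full diagnostic
	ScopeConnections Scope = "connections"
	ScopePipelines   Scope = "pipelines"
)

// parseScope validates the raw flag value and rejects unknown scopes.
func parseScope(s string) (Scope, error) {
	switch Scope(s) {
	case ScopeAll, ScopeConnections, ScopePipelines:
		return Scope(s), nil
	default:
		return "", fmt.Errorf("unknown scope %q (want connections or pipelines)", s)
	}
}

// includes reports whether checks for the given area should run under this scope.
func (sc Scope) includes(area Scope) bool {
	return sc == ScopeAll || sc == area
}

func main() {
	sc, err := parseScope("connections")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("run connection checks:", sc.includes(ScopeConnections))
	fmt.Println("run pipeline checks:", sc.includes(ScopePipelines))
}
```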
Target Version
v0.4.3: AI-powered operations within the active v0.4.x line.
Dependencies
- Integrate Copilot SDK (Go) - `internal/copilot` package + `gh devlake insights` (#63): Copilot SDK integration (`internal/copilot/` package, SDK dependency)
- Add `gh devlake query` command with extensible query engine (#62): query engine (for pipeline/metric data)
- Add `--json` output flag to read commands (#60): `--json` output flag (for `--json` mode if desired)
References
- Copilot SDK (Go): `github/copilot-sdk/go` - `DefineTool`, `SendAndWait` (batch mode), system messages
  - `go/README.md` - full API reference
  - `go/definetool.go` - type-safe tool definitions
  - `go/session.go` - `Send`, `SendAndWait`, event handling
- Copilot SDK overview: `github/copilot-sdk` - architecture, auth, custom tools
- DevLake API patterns: `apache/incubator-devlake/AGENTS.md` - plugin structure, API routes
- `cmd/status.go` - existing health check logic to reuse
- `cmd/configure_connection_test_cmd.go` - existing connection test logic
- `internal/devlake/client.go` - `Health()`, `TestSavedConnection()`, `ListConnections()`, `GetPipeline()`
- `internal/copilot/` - shared SDK client from #63