Commit 4c2b7a5
feat(metrics): implement Prometheus observability (#45)
* feat(metrics): implement Prometheus observability with dedicated server
Replace generateRuntimeMetrics() with prometheus/client_golang and add
flexible metrics server architecture supporting same-port or dedicated
port deployment.
Changes:
- Add internal/metrics package with custom Prometheus registry
- Configurable metrics port via --metrics-port flag (default: 8084)
- Two-server architecture with proper WaitGroup coordination
- Graceful shutdown for both main and metrics servers
- Export kagent_tools_mcp_server_info (version metadata)
- Export kagent_tools_mcp_registered_tools (tool providers)
- Include Go runtime metrics (goroutines, memory, GC stats)
- Include process metrics (CPU, memory, file descriptors)
Architecture improvement: Move http.Server instantiation outside
goroutines to prevent race condition between assignment and shutdown.
Test coverage: 5 unit tests validating registry, collectors, and metrics.
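The race fix described above can be sketched as follows. This is a minimal illustration, not the commit's code: the function names (runServers, demo), the empty mux handlers, and the 5-second shutdown timeout are assumptions. Both http.Server values are created before any goroutine starts, so the shutdown path never reads a variable that a goroutine may still be assigning.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
	"sync"
	"time"
)

// runServers is a hypothetical helper. Both servers are instantiated up front,
// outside the goroutines, so Shutdown always sees fully assigned values.
func runServers(ctx context.Context, mainLn, metricsLn net.Listener) error {
	mainSrv := &http.Server{Handler: http.NewServeMux()}
	metricsSrv := &http.Server{Handler: http.NewServeMux()}

	var wg sync.WaitGroup
	errs := make(chan error, 2) // one slot per server, so sends never block
	serve := func(srv *http.Server, ln net.Listener) {
		defer wg.Done()
		if err := srv.Serve(ln); err != nil && !errors.Is(err, http.ErrServerClosed) {
			errs <- err
		}
	}
	wg.Add(2)
	go serve(mainSrv, mainLn)
	go serve(metricsSrv, metricsLn)

	// Block until the caller asks for shutdown, then stop both servers.
	<-ctx.Done()
	shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	result := errors.Join(mainSrv.Shutdown(shutdownCtx), metricsSrv.Shutdown(shutdownCtx))

	// Wait for both Serve goroutines to return before collecting their errors.
	wg.Wait()
	close(errs)
	for err := range errs {
		result = errors.Join(result, err)
	}
	return result
}

// demo starts both servers on ephemeral loopback ports and shuts them down
// shortly afterwards.
func demo() error {
	mainLn, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	metricsLn, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		mainLn.Close()
		return err
	}
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()
	return runServers(ctx, mainLn, metricsLn)
}

func main() {
	fmt.Println("shutdown error:", demo())
}
```

Had the servers been assigned inside the goroutines instead, a shutdown signal arriving early could observe a nil or half-initialised server.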
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: MatteoMori <morimatteo14@gmail.com>
* feat(metrics): auto-register tool metrics using ListTools() diff
Use MCPServer.ListTools() to automatically detect which tools each
provider registers, eliminating the need to modify individual tool
packages.
The approach snapshots the tool list before and after each provider's
RegisterTools() call, then records the newly added tools in Prometheus
with the correct tool_provider label.
This means:
- Zero changes required in any pkg/ file
- Future tools are automatically tracked
- No risk of forgetting to add a metric for a new tool
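The before/after snapshot idea can be sketched roughly as below. The toolServer type is a stdlib-only stand-in for the real MCP server, and the provider and tool names are made up for illustration; only the ListTools() diffing technique comes from the description above.

```go
package main

import (
	"fmt"
	"sort"
)

// toolServer is a hypothetical stand-in for the MCP server. It only tracks
// the names of the tools registered so far.
type toolServer struct{ tools map[string]struct{} }

func (s *toolServer) AddTool(name string) { s.tools[name] = struct{}{} }

// ListTools returns a copy, so a snapshot is unaffected by later registrations.
func (s *toolServer) ListTools() map[string]struct{} {
	out := make(map[string]struct{}, len(s.tools))
	for name := range s.tools {
		out[name] = struct{}{}
	}
	return out
}

// registerAndDiff snapshots the tool list, runs one provider's registration,
// and returns the sorted names that appeared in between.
func registerAndDiff(s *toolServer, register func(*toolServer)) []string {
	before := s.ListTools()
	register(s)
	var added []string
	for name := range s.ListTools() {
		if _, seen := before[name]; !seen {
			added = append(added, name)
		}
	}
	sort.Strings(added)
	return added
}

func main() {
	s := &toolServer{tools: map[string]struct{}{}}
	byProvider := map[string][]string{}
	// Provider and tool names below are illustrative only.
	byProvider["k8s"] = registerAndDiff(s, func(s *toolServer) {
		s.AddTool("k8s_get_pods")
		s.AddTool("k8s_describe")
	})
	byProvider["helm"] = registerAndDiff(s, func(s *toolServer) {
		s.AddTool("helm_list")
	})
	fmt.Println(byProvider)
}
```

Each returned slice is what would be recorded under that provider's tool_provider label; the providers themselves need no knowledge of the metrics code.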
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: MatteoMori <morimatteo14@gmail.com>
* feat(metrics): instrument tool handlers with invocation counters
Add kagent_tools_mcp_invocations_total and
kagent_tools_mcp_invocations_failure_total counters using the
wrapper/middleware pattern. All handlers are centrally instrumented
in wrapToolHandlersWithMetrics with zero changes to pkg/ files.
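A minimal sketch of the wrapper/middleware shape, under stated assumptions: toolHandler is a simplified handler signature, counterVec is a stdlib-only stand-in for a labelled Prometheus counter, and wrapWithMetrics is a hypothetical name (the commit's own function is wrapToolHandlersWithMetrics, whose body is not shown here). Only the invocation counter is illustrated; failure counting is left out of this sketch.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// toolHandler is a hypothetical, simplified tool handler signature.
type toolHandler func(ctx context.Context, args string) (string, error)

// counterVec is a stand-in for a labelled counter, keyed by tool name.
type counterVec struct {
	mu     sync.Mutex
	counts map[string]int
}

func newCounterVec() *counterVec { return &counterVec{counts: map[string]int{}} }

func (c *counterVec) Inc(label string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[label]++
}

func (c *counterVec) Get(label string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[label]
}

// wrapWithMetrics returns a handler that records the invocation and then
// delegates to the original handler unchanged.
func wrapWithMetrics(tool string, invocations *counterVec, next toolHandler) toolHandler {
	return func(ctx context.Context, args string) (string, error) {
		invocations.Inc(tool)
		return next(ctx, args)
	}
}

// demo wraps an echo handler, calls it twice, and reports the recorded
// invocation count together with the last result.
func demo() (calls int, out string) {
	invocations := newCounterVec()
	echo := func(_ context.Context, args string) (string, error) {
		return "echo: " + args, nil
	}
	wrapped := wrapWithMetrics("echo_tool", invocations, echo)
	wrapped(context.Background(), "first")
	out, _ = wrapped(context.Background(), "hi")
	return invocations.Get("echo_tool"), out
}

func main() {
	calls, out := demo()
	fmt.Println(calls, out)
}
```

Because the wrapping happens where handlers are registered, the handlers in pkg/ stay untouched, which matches the "zero changes" claim above.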
Update README with Observability section and CLI flags reference.
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: MatteoMori <morimatteo14@gmail.com>
* feat(observability): add Helm chart support and Grafana dashboard
Add comprehensive Prometheus Operator integration via Helm chart:
- ServiceMonitor resource for automatic target discovery
- Dedicated metrics service (kagent-tools-metrics)
- Deployment args for --metrics-port configuration
- Configurable scrape interval, timeout, and labels
Include Grafana dashboard with 8 panels visualizing:
- Server version and health metrics
- Tool invocation rates by provider
- Success/failure rates and trends
- Top invoked tools table with heat mapping
Add CLAUDE.md with architecture documentation covering:
- Tool provider pattern and MCP server lifecycle
- Observability architecture (metrics wrapper pattern)
- Development commands and key implementation patterns
- Helm chart structure and troubleshooting guide
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: MatteoMori <morimatteo14@gmail.com>
* fix(metrics): default metrics-port to 0 (same as --port)
Previously --metrics-port defaulted to 8084, causing a mismatch when
the server ran on any other port (e.g. E2E tests use port 18190). The
metrics server would start on 8084 instead of sharing the main port,
so /metrics was unreachable at the expected address.
Change the default to 0, resolved at runtime as "same as --port".
Update Helm templates to fall back to the main targetPort when
tools.metrics.port is unset.
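The "0 means same as --port" rule can be sketched as below. The function names are hypothetical and the port values are examples; only the fallback rule itself comes from the description above.

```go
package main

import "fmt"

// resolveMetricsPort treats 0 as "not set" and falls back to the main port.
func resolveMetricsPort(port, metricsPort int) int {
	if metricsPort == 0 {
		return port
	}
	return metricsPort
}

// needsDedicatedServer reports whether metrics require a second listener.
func needsDedicatedServer(port, metricsPort int) bool {
	return resolveMetricsPort(port, metricsPort) != port
}

func main() {
	fmt.Println(resolveMetricsPort(18190, 0))      // 18190: shares the main port
	fmt.Println(resolveMetricsPort(8080, 9090))    // 9090: dedicated port
	fmt.Println(needsDedicatedServer(18190, 0))    // false
	fmt.Println(needsDedicatedServer(8080, 9090))  // true
}
```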
Signed-off-by: MatteoMori <morimatteo14@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
* fix(metrics): count result.IsError as invocation failure
The failure counter previously only incremented on non-nil Go errors.
Handlers in this codebase signal tool-level failures by returning
NewToolResultError(...), nil — result.IsError=true, err=nil — a pattern
used 214 times across pkg/. This meant the failure metric was always 0
for tool-level errors.
Fix the wrapper condition to check both:
err != nil || (result != nil && result.IsError)
Add three tests in cmd/metrics_wrap_test.go:
- IsError=true increments failure counter (regression test)
- Successful call does not increment failure counter
- Real Go error increments failure counter
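The corrected condition and the three cases listed above can be sketched as a small predicate. The toolResult type is a simplified stand-in for the real result type, and isFailure and errBoom are hypothetical names introduced for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// toolResult is a simplified stand-in carrying only the IsError flag.
type toolResult struct{ IsError bool }

// errBoom is an illustrative Go-level error.
var errBoom = errors.New("boom")

// isFailure counts a call as failed on a non-nil Go error or on a
// tool-level error signalled through result.IsError.
func isFailure(result *toolResult, err error) bool {
	return err != nil || (result != nil && result.IsError)
}

func main() {
	fmt.Println(isFailure(&toolResult{IsError: true}, nil)) // true: tool-level error
	fmt.Println(isFailure(&toolResult{}, nil))              // false: success
	fmt.Println(isFailure(nil, errBoom))                    // true: Go error
}
```

The nil check on result matters because a handler that returns a Go error may return no result at all.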
Remove CLAUDE.md from the repository.
Signed-off-by: MatteoMori <morimatteo14@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
---------
Signed-off-by: MatteoMori <morimatteo14@gmail.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
1 parent eaaefde commit 4c2b7a5
12 files changed
Lines changed: 1516 additions & 60 deletions
File tree
- cmd
- dashboard
- helm/kagent-tools
- templates
- internal/metrics