36 changes: 36 additions & 0 deletions CLAUDE_OBSERVABILITY.md
@@ -328,6 +328,42 @@ The event data provides detailed insights into Claude Code interactions:

**Performance Monitoring**: Track API request durations and tool execution times to identify performance bottlenecks.

### Derived Metrics

Some useful metrics can be calculated from the raw telemetry data:

#### Context Window Utilization

Calculate the percentage of context window used per API request. This helps identify when sessions are approaching context limits.

**From Loki (per-request granularity):**

```logql
# Average context window utilization % (assumes 200k token context)
avg_over_time(
{service_name="claude-code"} |= "claude_code.api_request"
| json
| unwrap input_tokens [$__interval]
) / 200000 * 100
```

**Attributes available for filtering:**
* `model`: Filter by model (context limits vary by model)
* `session_id`: Track utilization per session
* `organization_id`: Aggregate by organization
* `user_account_uuid`: Track per-user patterns

**Context window limits by model:**
| Model | Context Window |
| ----- | -------------- |
| claude-opus-4-5-20251101 | 200,000 tokens |
| claude-sonnet-4-20250514 | 200,000 tokens |
| claude-haiku-4-5-20251001 | 200,000 tokens |

<Note>
High context utilization (>80%) may indicate sessions at risk of hitting context limits. Consider alerting when utilization exceeds a chosen threshold.
</Note>
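The same calculation can be sketched in Python for offline analysis of exported log events, for example to list which sessions crossed the 80% threshold. This is a minimal sketch, not part of Claude Code. It assumes each `claude_code.api_request` event has already been parsed into a dict of its attributes, takes per-model limits from the table above with a 200k fallback, and treats `cache_read_tokens` / `cache_creation_tokens` as hypothetical, opt-in attributes in case cached prompt tokens are reported separately from `input_tokens`.

```python
"""Sketch: context window utilization from claude_code.api_request events.

Assumptions (not confirmed by this doc): each event is a dict of its JSON
log attributes, numeric values may arrive as strings, and the optional
cache_read_tokens / cache_creation_tokens attribute names are hypothetical.
"""
from typing import Any, Iterable, Mapping

# Limits copied from the table above; unknown models fall back to the same
# 200k assumption used in the LogQL query.
CONTEXT_LIMITS: dict[str, int] = {
    "claude-opus-4-5-20251101": 200_000,
    "claude-sonnet-4-20250514": 200_000,
    "claude-haiku-4-5-20251001": 200_000,
}
DEFAULT_LIMIT = 200_000
ALERT_THRESHOLD_PCT = 80.0  # matches the >80% guidance in the note above


def context_utilization_pct(event: Mapping[str, Any], include_cache: bool = False) -> float:
    """Percentage of the model's context window used by a single request."""
    tokens = int(event.get("input_tokens", 0))
    if include_cache:
        # Hypothetical attributes: only relevant if cached prompt tokens are
        # reported separately from input_tokens.
        tokens += int(event.get("cache_read_tokens", 0))
        tokens += int(event.get("cache_creation_tokens", 0))
    limit = CONTEXT_LIMITS.get(str(event.get("model", "")), DEFAULT_LIMIT)
    return tokens / limit * 100


def sessions_at_risk(
    events: Iterable[Mapping[str, Any]],
    threshold: float = ALERT_THRESHOLD_PCT,
    include_cache: bool = False,
) -> list[str]:
    """Session IDs whose peak per-request utilization exceeds the threshold."""
    peaks: dict[str, float] = {}
    for event in events:
        session = str(event["session_id"])
        pct = context_utilization_pct(event, include_cache)
        peaks[session] = max(peaks.get(session, 0.0), pct)
    return sorted(s for s, pct in peaks.items() if pct > threshold)


if __name__ == "__main__":
    sample = [
        {"session_id": "a", "model": "claude-sonnet-4-20250514", "input_tokens": "150000"},
        {"session_id": "a", "model": "claude-sonnet-4-20250514", "input_tokens": "170000"},
        {"session_id": "b", "model": "claude-haiku-4-5-20251001", "input_tokens": 40000},
    ]
    print(context_utilization_pct(sample[0]))  # 75.0
    print(sessions_at_risk(sample))  # ['a']
```

Taking the per-session peak mirrors the `max_over_time` approach of the dashboard panel rather than the average used in the LogQL query; replace `max` with a running mean if the averaged view is what you want to alert on.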

## Backend Considerations

Your choice of metrics and logs backends will determine the types of analyses you can perform:
92 changes: 82 additions & 10 deletions claude-code-dashboard.json
@@ -63,7 +63,7 @@
},
"gridPos": {
"h": 4,
-"w": 6,
+"w": 5,
"x": 0,
"y": 1
},
@@ -86,13 +86,13 @@
"type": "prometheus",
"uid": "prometheus"
},
-"expr": "sum(increase(claude_code_session_count_total{job=\"otel-collector\"}[1h]))",
+"expr": "count(count by (session_id)(claude_code_cost_usage_USD_total{job=\"otel-collector\"})) or vector(0)",
"interval": "",
-"legendFormat": "Sessions (1h)",
+"legendFormat": "Sessions",
"refId": "A"
}
],
-"title": "Active Sessions (1h)",
+"title": "Total Sessions",
"type": "stat"
},
{
@@ -129,8 +129,8 @@
},
"gridPos": {
"h": 4,
-"w": 6,
-"x": 6,
+"w": 5,
+"x": 5,
"y": 1
},
"id": 2,
@@ -195,8 +195,8 @@
},
"gridPos": {
"h": 4,
-"w": 6,
-"x": 12,
+"w": 5,
+"x": 10,
"y": 1
},
"id": 3,
@@ -261,8 +261,8 @@
},
"gridPos": {
"h": 4,
-"w": 6,
-"x": 18,
+"w": 5,
+"x": 15,
"y": 1
},
"id": 4,
@@ -293,6 +293,78 @@
"title": "Lines of Code (1h)",
"type": "stat"
},
{
"datasource": {
"type": "loki",
"uid": "loki"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "yellow",
"value": 50
},
{
"color": "orange",
"value": 70
},
{
"color": "red",
"value": 85
}
]
},
"unit": "percent",
"min": 0,
"max": 100
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 4,
"x": 20,
"y": 1
},
"id": 20,
"options": {
"colorMode": "background",
"graphMode": "area",
"justifyMode": "center",
"orientation": "auto",
"reduceOptions": {
"values": false,
"calcs": ["lastNotNull"],
"fields": ""
},
"textMode": "auto"
},
"targets": [
{
"datasource": {
"type": "loki",
"uid": "loki"
},
"expr": "max_over_time({service_name=\"claude-code\"} |= \"claude_code.api_request\" | json | unwrap input_tokens [1h]) / 200000 * 100",
"interval": "",
"legendFormat": "Context %",
"refId": "A"
}
],
"title": "Context Window (Max 1h)",
"type": "stat"
},
{
"title": "💰 Cost & Usage Analysis",
"type": "row",