TraceLens

Understand what your system is doing, why it failed, and what changed — from one tool.

TraceLens is a local-first diagnostics platform for Linux and macOS. It captures system evidence, diagnoses incidents, generates reports, and provides a browser dashboard — all without requiring any cloud service or AI API key.

At a glance

  • Capture logs, kernel messages, service state, resource pressure, and package changes into a structured case
  • Diagnose common incident patterns offline, then optionally enrich results with AI
  • Generate Markdown, HTML, JSON, or text reports for sharing and archiving
  • Explore cases in a browser with Overview, Timeline, Services, Reports, Diff, Terminal, and AI Insights pages

The Problem

Linux debugging is powerful but fragmented. When something goes wrong — a crash, a hang, a slow boot, a service that keeps restarting — you're left jumping between a dozen disconnected tools:

You just got a kernel panic or freeze

You reboot. Now what? You run journalctl -b -1 hoping the previous boot's logs survived. You scan dmesg for hardware errors. You check which services failed. You cross-reference timestamps manually. You try to remember if you updated any packages yesterday. There's no single place to see what happened.

A systemd service keeps dying

You run systemctl status myservice. Failed. You run journalctl -u myservice. Wall of text. Was it an OOM kill? A dependency that isn't starting? A config change? You check dmesg for memory pressure. You check disk space. You check if something else restarted at the same time. Every clue is in a different tool with a different interface.

Your system is mysteriously slow

Load average is high. But why? You open top: nothing obvious. Check iostat: disk seems fine. Check journalctl: thousands of lines. Is a service flapping? Is the kernel warning about something? Did a recent update change something? You spend 30 minutes playing detective across tools before you even form a hypothesis.

You need to explain what happened to someone else

Your server went down. Your team asks what happened. You paste journalctl output into a chat. Then dmesg. Then systemctl list-units --failed. Then some df and free output. There's no structured incident report. Just scattered terminal output.

You want to compare "before it broke" to "after"

Something changed. A package update, a config edit, a kernel upgrade. But you have no baseline. You can't diff your system state from yesterday against today. Linux doesn't snapshot its own diagnostic state for you.

You run a homelab and things break while you sleep

Services restart. Disks fill up. Network hiccups cause cascading failures. You wake up and everything looks fine now, but something clearly happened at 3am. Without continuous monitoring, transient failures leave no trace.


What TraceLens Actually Does

TraceLens unifies the fragmented Linux/macOS debugging experience into one coherent workflow:

┌──────────────────────────────────────────────────────────────┐
│                        Your System                           │
│                                                              │
│  journalctl ─┐                                               │
│  dmesg ──────┤                                               │
│  systemctl ──┤                                               │
│  /proc ──────┼──► TraceLens ──► Structured    ──► Diagnosis  │
│  package mgr ┤                  Evidence           Report    │
│  boot info ──┤                  Case               Dashboard │
│  disk/mem ───┘                  Timeline            Diff     │
│                                                              │
└──────────────────────────────────────────────────────────────┘

One capture command collects evidence from all sources into a structured case. One diagnosis command analyzes that evidence and tells you what's likely wrong. One report command produces a clean, shareable document. One dashboard lets you explore everything visually in your browser.


Real Problems This Solves

| Problem | Without TraceLens | With TraceLens |
| --- | --- | --- |
| Service keeps crashing | Manually cross-reference systemctl, journalctl, dmesg | tracelens diagnose detects restart loops and correlates with OOM/resource pressure |
| Slow boot | Run systemd-analyze blame, guess which services are slow | tracelens capture --boot current captures full boot evidence with timeline |
| System froze yesterday | Hope logs survived, manually search previous boot | tracelens diagnose --boot previous analyzes previous boot automatically |
| Need to explain an outage | Copy-paste terminal output into Slack | tracelens report latest --format md generates a structured incident report |
| Something changed after update | No baseline to compare against | tracelens diff case-a case-b shows what changed between captures |
| Recurring 3am failures | No visibility without monitoring setup | tracelens service enable runs background capture that catches transient issues |
| Disk filling up slowly | Notice when it's too late | Diagnosis engine flags near-full disks and growth trends |
| OOM kills happening silently | Buried in kernel logs | Kernel collector surfaces OOM events and correlates with affected services |

Quick Start

Install

# Install uv if needed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone the repo
git clone https://github.com/agodianel/Trace-Lens-Linux.git
cd Trace-Lens-Linux

# Install dependencies
uv sync --extra dev

# Run TraceLens (from inside the repo)
uv run tracelens doctor

All tracelens commands are run via uv run tracelens ... from the project directory.

To install it globally so tracelens works from anywhere:

# Install as a global tool from the local source
uv tool install -e /path/to/Trace-Lens-Linux

Linux has the most complete support today, especially on systemd-based distributions. On macOS, TraceLens automatically uses unified logs, launchd service discovery, Homebrew package detection, and native storage paths. Background service mode remains systemd-only.

Check your environment

uv run tracelens doctor
╭─ TraceLens Doctor ────────────────────────╮
│ ✓ Python 3.11+                            │
│ ✓ journalctl accessible                   │
│ ✓ systemctl accessible                    │
│ ✓ dmesg accessible                        │
│ ✓ Data directory writable                 │
│ ✗ AI provider: not configured (optional)  │
│                                           │
│ Status: Ready                             │
╰───────────────────────────────────────────╯

Capture system evidence

# Capture current state
uv run tracelens capture

# Capture last hour
uv run tracelens capture --since "1 hour ago"

# Capture specific boot
uv run tracelens capture --boot previous

# Capture specific service
uv run tracelens capture --unit docker.service

Diagnose issues

# Diagnose current system
uv run tracelens diagnose

# Diagnose a captured case
uv run tracelens diagnose latest

# Diagnose previous boot
uv run tracelens diagnose --boot previous
╭─ Diagnosis ──────────────────────────────────────────╮
│                                                      │
│ Severity: WARNING                                    │
│                                                      │
│ Findings:                                            │
│  ⚠ Service restart loop: docker.service              │
│    Restarted 4 times in 15 minutes                   │
│    Correlated with memory pressure spike at 14:32    │
│                                                      │
│  ⚠ Near-full filesystem: /var (91% used)             │
│    Growth rate suggests full in ~3 days               │
│                                                      │
│  ℹ 12 kernel warnings (nouveau driver)               │
│    Recurring GPU timeout — likely driver issue        │
│                                                      │
│ Suggested commands:                                  │
│  journalctl -u docker.service --since "14:00"        │
│  df -h /var                                          │
│  dmesg | grep nouveau                               │
│                                                      │
╰──────────────────────────────────────────────────────╯
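
The "full in ~3 days" estimate in the sample output could come from a simple linear extrapolation over recent disk-usage samples. The following is a minimal sketch of that idea, not TraceLens's actual implementation; the function name `days_until_full` and the sample format are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def days_until_full(samples, capacity_bytes):
    """Estimate days until a filesystem fills up.

    samples: list of (timestamp, used_bytes) tuples, oldest first (hypothetical format).
    Uses the average growth rate between the first and last sample.
    Returns None when there is too little data or usage is flat or shrinking.
    """
    if len(samples) < 2:
        return None
    (t0, used0), (t1, used1) = samples[0], samples[-1]
    elapsed_days = (t1 - t0).total_seconds() / 86400
    if elapsed_days <= 0:
        return None
    growth_per_day = (used1 - used0) / elapsed_days
    if growth_per_day <= 0:
        return None
    return (capacity_bytes - used1) / growth_per_day


# Two samples taken two days apart: usage grew from 85% to 91% of capacity.
now = datetime(2026, 4, 2, 12, 0)
samples = [(now - timedelta(days=2), 85_000), (now, 91_000)]
estimate = days_until_full(samples, capacity_bytes=100_000)
print(f"{estimate:.1f} days")  # prints "3.0 days"
```

A two-point average is deliberately crude. A real rule would likely want more samples and some smoothing before raising a warning.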

Generate a report

uv run tracelens report latest --format md
uv run tracelens report latest --format html
uv run tracelens report latest --format json

Open the dashboard

uv run tracelens ui --open

Opens a local browser dashboard with:

  • Overview — system health, quick actions (capture/diagnose/doctor), recent cases
  • Timeline — system events plotted by severity over time with stats summary
  • Services — clickable service list with name/state filters and log viewer
  • Logs — filterable log explorer with priority, unit, and keyword search
  • Kernel — hardware events, driver issues, OOM detection
  • Reports — interactive case browser with inline findings viewer and action buttons
  • Diff — compare two captures or boots
  • Terminal — run TraceLens commands directly from the dashboard
  • AI Insights — optional AI-powered root cause analysis viewer

Compare system states

uv run tracelens diff case-2026-04-01_001 case-2026-04-02_001
uv run tracelens diff --boot previous --boot current
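
To illustrate what a comparison between two captures might compute, here is a minimal sketch that diffs two service-state snapshots. It assumes each snapshot is a plain mapping of unit name to state; the real services.json schema is not documented here, so treat the data shape and the function name `diff_service_states` as hypothetical.

```python
def diff_service_states(before, after):
    """Compare two {unit: state} mappings and group the differences."""
    return {
        "added": {u: after[u] for u in sorted(after.keys() - before.keys())},
        "removed": {u: before[u] for u in sorted(before.keys() - after.keys())},
        "changed": {
            u: (before[u], after[u])
            for u in sorted(before.keys() & after.keys())
            if before[u] != after[u]
        },
    }


# Invented example snapshots from two captures.
before = {"docker.service": "active", "sshd.service": "active", "cups.service": "active"}
after = {"docker.service": "failed", "sshd.service": "active", "nginx.service": "active"}

result = diff_service_states(before, after)
for kind, units in result.items():
    print(kind, units)
```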

Background Monitoring (Optional)

Enable TraceLens as a systemd service for continuous, lightweight evidence capture:

# Install and enable
uv run tracelens service install
uv run tracelens service enable

# Or manually
sudo systemctl enable --now tracelens.service

# Check status
uv run tracelens service status

The service runs with minimal overhead: periodic snapshots, failure detection, and rolling history — all stored locally.


How It Works

Collectors

Gather evidence from system subsystems:

  • journald — system logs with priority, unit, and boot filtering
  • dmesg — kernel messages, hardware events, OOM detection
  • systemd — unit states, failures, restart counts
  • processes — CPU/memory consumers, load average
  • resources — disk usage, memory pressure
  • boot — boot sessions, timing, previous boot analysis
  • packages — update history (distro-aware, Arch supported)

Diagnosis Engine

Pattern-based analysis that runs entirely offline:

  • Service restart loop detection
  • OOM kill correlation
  • Boot failure analysis
  • Disk pressure warnings
  • Kernel warning clustering
  • Package-update-to-failure correlation
  • Authentication failure detection
  • Network service instability

Reports

Structured outputs in Markdown, HTML, JSON, or plain text — ready to attach to a GitHub issue, send to a team, or archive.
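
As a sketch of how findings could be turned into a Markdown report, the snippet below renders a list of finding records into headings and paragraphs. The record fields (`severity`, `title`, `detail`) and the function name `render_markdown` are invented for the example; the actual findings.json schema may differ.

```python
def render_markdown(case_id, findings):
    """Render findings (dicts with severity, title, detail) as a Markdown document."""
    lines = [f"# Incident report: {case_id}", ""]
    if not findings:
        lines.append("No findings.")
    for finding in findings:
        lines.append(f"## [{finding['severity'].upper()}] {finding['title']}")
        lines.append("")
        lines.append(finding["detail"])
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


findings = [
    {
        "severity": "warning",
        "title": "Near-full filesystem: /var",
        "detail": "91% used.",
    },
]
report = render_markdown("2026-04-02_001", findings)
print(report)
```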

Dashboard

Local Dash/Plotly web app with interactive charts, filterable logs, service health visualization, timeline correlation, a built-in command runner, and AI insights viewer. Every diagnostic action can be triggered from the browser.


No AI Required

TraceLens works completely offline with zero API keys. The diagnosis engine uses deterministic pattern matching and rules — not LLM prompts.

AI support is optional and additive. If you configure an API key, TraceLens can:

  • Summarize incidents in plain English
  • Suggest likely root causes
  • Generate improved report narratives
  • Cluster similar incidents

AI never modifies raw evidence, never deletes data, and all AI outputs are clearly marked as advisory.

Activating AI (Optional)

# 1. Install the AI dependencies
uv sync --extra ai

# 2. Activate with one command (prompts for your API key)
uv run tracelens ai activate

# 3. Verify
uv run tracelens doctor
# The AI line should show ✓ instead of "not configured"

The activate command writes your settings to settings.toml and saves the API key to your shell config (~/.bashrc or ~/.zshrc). If ANTHROPIC_API_KEY is already in your environment, it detects it automatically.

# Check AI status
uv run tracelens ai status

# Disable AI
uv run tracelens ai deactivate

Data Storage

Everything is stored locally in human-inspectable formats:

~/.local/share/tracelens/
  cases/
    2026-04-02_001/
      metadata.json
      journal.jsonl
      kernel.jsonl
      services.json
      system_snapshot.json
      findings.json
      report.md
  config/
    settings.toml
  logs/

No databases. No opaque blobs. You can inspect, copy, or delete any case directly.
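
Because cases are plain JSON and JSONL files, they can be read with a few lines of standard-library Python. The sketch below loads every .json and .jsonl file in a case directory and picks the newest case by name. The helper names and the demo field names are hypothetical, and the demo writes to a temporary directory rather than your real data directory.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Parse one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def load_case(case_dir):
    """Load the JSON and JSONL files found in a case directory, keyed by file stem."""
    case = {}
    for path in sorted(Path(case_dir).iterdir()):
        if path.suffix == ".json":
            case[path.stem] = json.loads(path.read_text(encoding="utf-8"))
        elif path.suffix == ".jsonl":
            case[path.stem] = read_jsonl(path)
    return case


def latest_case(cases_root):
    """Pick the newest case, relying on date-prefixed names sorting lexically."""
    names = sorted(p.name for p in Path(cases_root).iterdir() if p.is_dir())
    return Path(cases_root) / names[-1] if names else None


# Demo against a throwaway directory; the record contents are invented.
with tempfile.TemporaryDirectory() as tmp:
    case_dir = Path(tmp) / "cases" / "2026-04-02_001"
    case_dir.mkdir(parents=True)
    (case_dir / "metadata.json").write_text(json.dumps({"host": "example"}))
    (case_dir / "journal.jsonl").write_text('{"message": "a"}\n{"message": "b"}\n')
    newest = latest_case(Path(tmp) / "cases")
    demo_case = load_case(newest)

print(newest.name, len(demo_case["journal"]))  # prints "2026-04-02_001 2"
```

To try it on real data, point `latest_case` at `Path.home() / ".local/share/tracelens/cases"`.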


Configuration

# ~/.local/share/tracelens/config/settings.toml

[dashboard]
host = "127.0.0.1"
port = 8765

[capture]
default_window = "6h"
storage_path = "~/.local/share/tracelens"

[service]
polling_interval = 300  # seconds

[ai]
enabled = false
provider = "none"  # "anthropic", "openai"
# API keys via environment: ANTHROPIC_API_KEY, OPENAI_API_KEY

Requirements

  • Python 3.11+
  • Linux with systemd (most modern distributions) or macOS
  • journalctl, systemctl, dmesg accessible (Linux); log, sysctl (macOS)
  • No root required for basic usage (some collectors benefit from elevated access)

Privacy & Security

  • Local-first: all data stays on your machine
  • No telemetry: nothing is sent anywhere
  • No cloud dependency: works fully offline
  • Redaction support: sensitive fields can be masked in exported reports
  • AI is opt-in: you choose if/when data goes to an AI provider
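
As an illustration of what masking sensitive fields can look like, the sketch below replaces IPv4 addresses and secret-looking key=value pairs with placeholders. Which fields TraceLens actually redacts is not specified here; the patterns and the `redact` helper are assumptions chosen for the example.

```python
import re

# Each pair is (pattern, replacement). Both patterns are illustrative only.
PATTERNS = [
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "[REDACTED_IP]"),
    (re.compile(r"(?i)\b(password|token|api[_-]?key)=\S+"), r"\1=[REDACTED]"),
]


def redact(text):
    """Apply every masking pattern to the text and return the result."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text


line = "login from 192.168.1.20 with token=abc123"
print(redact(line))  # prints "login from [REDACTED_IP] with token=[REDACTED]"
```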

License

MIT
