
debug: add memory_snapshot WS command for leak diagnosis#935

Open
marcelveldt wants to merge 2 commits into main from debug-memory-snapshot


@marcelveldt
Contributor

What does this implement/fix?

Users report dashboard RAM growing during and after firmware builds, but we have no way to tell what grew. Adds a tracemalloc-backed WS command so a bug report can carry an actual top-N allocator diff instead of "RAM goes up".

Off by default. Set ESPHOME_DEBUG_MEMORY=1 to start tracking from process boot (catches catalog loads + startup allocs), or call the command once to enable tracking lazily.
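Why boot-time enabling matters: tracemalloc only records allocations made after `tracemalloc.start()`, so a lazily enabled tracker cannot attribute anything allocated earlier (such as catalog loads). A minimal sketch of an idempotent enable that also reports whether tracking was already on, so a caller could warn about that blind spot. `ensure_tracking` is a hypothetical helper; the PR's own `start_tracking` returns `None`.

```python
import tracemalloc


def ensure_tracking(frames: int = 25) -> bool:
    """Start tracemalloc if it is off; return True if it was already on.

    Hypothetical helper. tracemalloc records only allocations made after
    start(), so enabling it lazily (on the first WS call) cannot attribute
    anything allocated earlier, such as catalog loads at startup.
    """
    if tracemalloc.is_tracing():
        return True  # already tracing: keep the existing frame limit
    tracemalloc.start(frames)
    return False


if __name__ == "__main__":
    tracemalloc.stop()  # start the demo from a known state
    print(ensure_tracking())  # False: tracking was just started
    print(ensure_tracking())  # True: the second call is a no-op
    print(tracemalloc.get_traceback_limit())  # 25
    tracemalloc.stop()
```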

Changes:

  • new debug/memory_snapshot WS command — returns top-N allocators, system stats, optional diff against a saved baseline
  • new helpers.memory module wrapping tracemalloc + an in-memory baseline store
  • env-var gate in __main__ so tracemalloc.start(25) runs as the very first step when enabled
  • DebugController wired into the controller registry
  • docs entry in API.md
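The core tracemalloc mechanics behind a baseline diff can be sketched with the standard library alone: snapshot, do some work, snapshot again, and rank the `Snapshot.compare_to()` results. `top_growth` and the noise filters are illustrative assumptions, not the PR's `helpers.memory` code; `tracemalloc.get_traced_memory()` stands in here for whatever "system stats" the command actually returns.

```python
import tracemalloc

# Exclude import machinery and tracemalloc's own bookkeeping, as the
# tracemalloc docs suggest, so the diff shows application allocations.
_NOISE_FILTERS = (
    tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
    tracemalloc.Filter(False, "<unknown>"),
    tracemalloc.Filter(False, tracemalloc.__file__),
)


def top_growth(
    baseline: tracemalloc.Snapshot,
    current: tracemalloc.Snapshot,
    top_n: int = 10,
) -> list[tracemalloc.StatisticDiff]:
    """Return the top_n allocation sites with the largest size change."""
    baseline = baseline.filter_traces(_NOISE_FILTERS)
    current = current.filter_traces(_NOISE_FILTERS)
    # "traceback" groups by full stack (up to the frame limit passed to
    # start()); compare_to() sorts by absolute size_diff, largest first.
    return current.compare_to(baseline, "traceback")[:top_n]


if __name__ == "__main__":
    tracemalloc.start(25)
    baseline = tracemalloc.take_snapshot()
    held = [bytearray(1024) for _ in range(2000)]  # roughly 2 MB of growth
    current = tracemalloc.take_snapshot()
    for diff in top_growth(baseline, current, top_n=3):
        # traceback[-1] is the most recent frame, i.e. the allocation site.
        print(f"{diff.size_diff:+d} B {diff.count_diff:+d} blocks at {diff.traceback[-1]}")
    traced_now, traced_peak = tracemalloc.get_traced_memory()
    print(f"traced: {traced_now} B now, {traced_peak} B peak")
    tracemalloc.stop()
```

Grouping by `"traceback"` rather than `"lineno"` is what makes the 25-frame limit pay off: two leaks that allocate on the same line but are reached from different callers show up as separate entries.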

Related issue or feature (if applicable):

Types of changes

  • New feature (non-breaking change which adds functionality) — new-feature

Frontend coordination

  • No frontend change needed

Checklist

  • The code change is tested and works locally.
  • Pre-commit hooks pass (ruff, codespell, yaml/json/python checks).
  • Tests have been added or updated under tests/ where applicable.
  • components.json has not been hand-edited (regenerate via script/sync_components.py if a sync is needed).
  • Architecture-level changes are reflected in docs/ARCHITECTURE.md and/or docs/API.md.

Users see dashboard RAM grow during/after builds but there's no way to
tell what grew. Adds an opt-in tracemalloc-backed WS command that
returns top-N allocators plus a diff against a saved baseline, so a
bug report can carry actionable evidence.

Off by default — set ESPHOME_DEBUG_MEMORY=1 to start tracking from
process boot (catches catalog loads + startup allocs), or call the
command once to enable lazily.
Copilot AI review requested due to automatic review settings May 21, 2026 22:47
@github-actions (bot) added the new-feature label May 21, 2026
@codspeed-hq

codspeed-hq Bot commented May 21, 2026

Merging this PR will not alter performance

✅ 25 untouched benchmarks


Comparing debug-memory-snapshot (b948822) with main (8f61f63)


@codecov-commenter

codecov-commenter commented May 21, 2026

Codecov Report

❌ Patch coverage is 97.91667% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 99.23%. Comparing base (332c3f8) to head (b948822).
⚠️ Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| esphome_device_builder/__main__.py | 66.66% | 2 Missing ⚠️ |
Additional details and impacted files


```diff
@@            Coverage Diff             @@
##             main     #935      +/-   ##
==========================================
- Coverage   99.24%   99.23%   -0.01%     
==========================================
  Files         191      193       +2     
  Lines       14200    14339     +139     
==========================================
+ Hits        14093    14230     +137     
- Misses        107      109       +2     
```
| Flag | Coverage Δ |
|---|---|
| py3.12 | 99.16% <93.75%> (-0.04%) ⬇️ |
| py3.14 | 99.23% <96.87%> (-0.02%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| esphome_device_builder/controllers/debug.py | 100.00% <100.00%> (ø) |
| esphome_device_builder/device_builder.py | 97.66% <100.00%> (+0.01%) ⬆️ |
| esphome_device_builder/helpers/memory.py | 100.00% <100.00%> (ø) |
| esphome_device_builder/__main__.py | 91.15% <66.66%> (-1.38%) ⬇️ |

... and 5 files with indirect coverage changes
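The headline numbers can be checked by hand. They are consistent with coverage = hits / lines, rounded down to two decimals (assumed here to be Codecov's default `round: down`, `precision: 2` configuration); with ordinary rounding the head figure would read 99.24%, not 99.23%. The patch figure, 2 missed lines at 97.91667%, implies 96 patch lines with 94 hit. `floor_percent` is a hypothetical helper for this check.

```python
def floor_percent(hits: int, lines: int, precision: int = 2) -> float:
    """Coverage percentage, rounded *down* to `precision` decimal places."""
    scale = 10**precision
    # Integer floor division avoids float error nudging a value up.
    return (hits * 100 * scale // lines) / scale


if __name__ == "__main__":
    print(floor_percent(14093, 14200))  # base (main): 99.24
    print(floor_percent(14230, 14339))  # head: 99.23
    print(round(14230 / 14339 * 100, 2))  # ordinary rounding: 99.24
    print(round(94 / 96 * 100, 5))  # patch: 97.91667
```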



Copilot AI left a comment


Pull request overview

Adds a diagnostics-only WebSocket command to capture tracemalloc snapshots (optionally diffed against a saved baseline) to help investigate reported RAM growth during/after firmware builds. This integrates a new DebugController into the controller registry, provides a small helpers.memory wrapper around tracemalloc, and documents the new API.

Changes:

  • Add debug/memory_snapshot WS command (lazy enable, save/compare/drop baseline, top-N allocators + system stats).
  • Add helpers/memory.py to manage tracking, snapshots, formatting, and in-memory baselines.
  • Add early-process env-var gate in __main__ plus docs + tests to pin the wire contract.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

| File | Description |
|---|---|
| tests/test_debug_memory.py | New tests pin the debug/memory_snapshot wire shape and baseline behavior. |
| esphome_device_builder/helpers/memory.py | New tracemalloc wrapper + baseline store + system stats formatting. |
| esphome_device_builder/device_builder.py | Registers the new DebugController so its WS commands are discoverable. |
| esphome_device_builder/controllers/debug.py | Implements debug/memory_snapshot command and input handling. |
| esphome_device_builder/__main__.py | Adds ESPHOME_DEBUG_MEMORY startup gate to enable tracemalloc early. |
| docs/API.md | Documents the new WS command and recommended usage for leak bisection. |

Comment thread tests/test_debug_memory.py Outdated
Comment on lines +1 to +5

```python
"""Tests for ``controllers/debug.py`` + ``helpers/memory.py``.

Pins the wire shape of ``debug/memory_snapshot`` and the
save / compare / drop baseline contract. ``tracemalloc`` is
process-global, so each test stops + clears state in a fixture
```
Contributor Author

fixed in b948822 — moved the body to the line after the opening """.

Comment on lines +29 to +50
```python
import tracemalloc

# _DEFAULT_FRAMES is defined above this excerpt (value not shown here).

_baselines: dict[str, tracemalloc.Snapshot] = {}


def start_tracking(frames: int = _DEFAULT_FRAMES) -> None:
    """Enable ``tracemalloc`` allocation tracking. Idempotent."""
    if not tracemalloc.is_tracing():
        tracemalloc.start(frames)


def is_tracking() -> bool:
    """Return whether ``tracemalloc`` is currently tracking allocations."""
    return tracemalloc.is_tracing()


def take_snapshot() -> tracemalloc.Snapshot:
    """Return a fresh ``tracemalloc`` snapshot. Caller ensures tracking is on."""
    return tracemalloc.take_snapshot()


def save_baseline(name: str, snapshot: tracemalloc.Snapshot) -> None:
    """Store *snapshot* under *name* for later ``compare_with``."""
    _baselines[name] = snapshot
```
Contributor Author

hmm, not sure I agree with this: it's a debug endpoint.

Comment on lines +106 to +113
```python
from typing import Any


def _stat_to_dict(stat: Any) -> dict[str, Any]:
    """Convert a ``tracemalloc`` Statistic / StatisticDiff to wire shape."""
    return {
        "traceback": [str(frame) for frame in stat.traceback],
        "size_bytes": stat.size,
        "size_diff_bytes": getattr(stat, "size_diff", 0),
        "count": stat.count,
        "count_diff": getattr(stat, "count_diff", 0),
        # ... the quoted range ends here (line +113)
    }
```
Contributor Author

this is an authed debug endpoint behind an env var — full paths are what you need to localise the leak. callers paste these straight into bug reports; basenames would drop the diagnostic value. happy to revisit if we ever expose it outside authed contexts.
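The quoted `_stat_to_dict` handles both result types because `Snapshot.statistics()` returns `Statistic` objects (no `*_diff` attributes) while `Snapshot.compare_to()` returns `StatisticDiff` objects; the `getattr(..., 0)` defaults cover the difference. The sketch below reuses only the keys visible in the excerpt (the full function may contain more) and checks that the result serializes to JSON. `snapshot_payload` is a hypothetical helper, not the PR's handler.

```python
import json
import tracemalloc
from typing import Any


def stat_to_dict(stat: Any) -> dict[str, Any]:
    """Same keys as the quoted excerpt; works for Statistic and StatisticDiff."""
    return {
        "traceback": [str(frame) for frame in stat.traceback],  # "file:lineno"
        "size_bytes": stat.size,
        # Statistic (from statistics()) has no *_diff attributes, so default to 0.
        "size_diff_bytes": getattr(stat, "size_diff", 0),
        "count": stat.count,
        "count_diff": getattr(stat, "count_diff", 0),
    }


def snapshot_payload(top_n: int = 5) -> str:
    """Hypothetical helper: JSON with top-N totals plus top-N growth."""
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start(10)
    try:
        baseline = tracemalloc.take_snapshot()
        _held = [bytes(256) for _ in range(1000)]  # simulated growth
        current = tracemalloc.take_snapshot()
        top = current.statistics("traceback")[:top_n]
        diff = current.compare_to(baseline, "traceback")[:top_n]
        return json.dumps(
            {
                "top": [stat_to_dict(s) for s in top],
                "diff": [stat_to_dict(d) for d in diff],
            }
        )
    finally:
        if started_here:
            tracemalloc.stop()  # leave tracing as we found it


if __name__ == "__main__":
    print(snapshot_payload(top_n=2))
```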

Comment on lines +49 to +57
```python
if not isinstance(top_n, int) or top_n < 1 or top_n > _MAX_TOP_N:
    raise CommandError(
        ErrorCode.INVALID_ARGS,
        f"top_n must be an int between 1 and {_MAX_TOP_N}",
    )

if drop_baseline is not None:
    memory.drop_baseline(drop_baseline)
```
Contributor Author

fixed in b948822 — non-string save_as / compare_with / drop_baseline now reject with INVALID_ARGS (also length-capped to 100 chars) before reaching the dict.
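A sketch of the stricter argument checks described in that reply, assuming a hypothetical `InvalidArgs` exception in place of the project's `CommandError(ErrorCode.INVALID_ARGS, ...)`. One design note: `bool` is a subclass of `int`, so the quoted `isinstance(top_n, int)` check accepts `True` as `1`; this sketch rejects it explicitly. The `_MAX_TOP_N` value is not shown in the excerpt, so the bound is a required parameter here.

```python
from typing import Any, Optional

MAX_NAME_LEN = 100  # cap stated in the review reply


class InvalidArgs(ValueError):
    """Hypothetical stand-in for CommandError(ErrorCode.INVALID_ARGS, ...)."""


def validate_optional_name(field: str, value: Any) -> Optional[str]:
    """Accept None or a str of at most MAX_NAME_LEN characters."""
    if value is None:
        return None
    if not isinstance(value, str) or len(value) > MAX_NAME_LEN:
        raise InvalidArgs(f"{field} must be a string of at most {MAX_NAME_LEN} characters")
    return value


def validate_top_n(value: Any, max_top_n: int) -> int:
    """Accept an int in [1, max_top_n]; reject bool even though it subclasses int."""
    if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= max_top_n:
        raise InvalidArgs(f"top_n must be an int between 1 and {max_top_n}")
    return value
```

Running these before any `_baselines[...]` lookup means a payload like `{"compare_with": ["x"]}` fails as a client error instead of surfacing as a `TypeError` (lists are unhashable) wrapped in INTERNAL_ERROR.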

Comment on lines +104 to +112
```python
# Enable tracemalloc as the very first step so the
# ``debug/memory_snapshot`` WS command (helpers/memory.py) can
# produce diffs that include the catalog loads and other
# startup allocations. Off by default; adds per-allocation
# overhead.
if os.environ.get("ESPHOME_DEBUG_MEMORY"):
    import tracemalloc  # noqa: PLC0415

    tracemalloc.start(25)
```
Contributor Author

fixed in b948822 — only 1 / true / yes / on (case-insensitive, whitespace-tolerant) enable it now, so =0 / empty / false / typos leave it off.

Comment on lines +104 to +112
Contributor Author

added in b948822 — extracted the gate into _memory_tracking_enabled_from_env() and pinned the truthy/falsy table in tests/test_main_cli.py.
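A sketch of how an env-var gate like `_memory_tracking_enabled_from_env()` can parse the flag so that only explicit on-values enable tracking. The name `tracking_enabled_from_env` and its `environ` parameter (added so the truthy/falsy table can be tested without touching the real environment) are assumptions, not the PR's code.

```python
import os
from collections.abc import Mapping

# On-values named in the review reply; anything else (unset, "", "0",
# "false", typos) leaves tracking off.
_TRUTHY = frozenset({"1", "true", "yes", "on"})


def tracking_enabled_from_env(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True only if ESPHOME_DEBUG_MEMORY holds an explicit on-value."""
    raw = environ.get("ESPHOME_DEBUG_MEMORY", "")
    # Case-insensitive and whitespace-tolerant, per the reply.
    return raw.strip().lower() in _TRUTHY
```

In `__main__`, the quoted `if os.environ.get("ESPHOME_DEBUG_MEMORY"):` line would become `if tracking_enabled_from_env():` with the body unchanged; the plain `os.environ.get(...)` check treats any non-empty string, including `"0"`, as on.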

- type-validate save_as / compare_with / drop_baseline before dict
  access — non-string values now reject with INVALID_ARGS instead of
  bubbling a TypeError out as INTERNAL_ERROR.
- parse ESPHOME_DEBUG_MEMORY as an explicit on/off boolean ("1" / "true"
  / "yes" / "on") so ESPHOME_DEBUG_MEMORY=0 doesn't silently enable
  tracking the way bool(os.environ.get(...)) would.
- extract the gate into _memory_tracking_enabled_from_env() so the
  truthy table can be unit-tested directly.
- multi-line test module docstring now starts on the line after the
  opening triple-quote (CLAUDE.md style).

Labels

new-feature New feature


3 participants