debug: add memory_snapshot WS command for leak diagnosis #935
marcelveldt wants to merge 2 commits
Users see dashboard RAM grow during/after builds but there's no way to tell what grew. Adds an opt-in tracemalloc-backed WS command that returns top-N allocators plus a diff against a saved baseline, so a bug report can carry actionable evidence. Off by default — set ESPHOME_DEBUG_MEMORY=1 to start tracking from process boot (catches catalog loads + startup allocs), or call the command once to enable lazily.
Codecov Report
❌ Patch coverage is

Coverage Diff

| | main | #935 | +/- |
|---|---|---|---|
| Coverage | 99.24% | 99.23% | -0.01% |
| Files | 191 | 193 | +2 |
| Lines | 14200 | 14339 | +139 |
| Hits | 14093 | 14230 | +137 |
| Misses | 107 | 109 | +2 |
Pull request overview
Adds a diagnostics-only WebSocket command to capture tracemalloc snapshots (optionally diffed against a saved baseline) to help investigate reported RAM growth during/after firmware builds. This integrates a new DebugController into the controller registry, provides a small helpers.memory wrapper around tracemalloc, and documents the new API.
Changes:
- Add debug/memory_snapshot WS command (lazy enable, save/compare/drop baseline, top-N allocators + system stats).
- Add helpers/memory.py to manage tracking, snapshots, formatting, and in-memory baselines.
- Add early-process env-var gate in __main__ plus docs + tests to pin the wire contract.
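To make the save/compare flow concrete, here is a minimal sketch of a tracemalloc-backed snapshot with an in-memory baseline store. It is not the code from this PR: `snapshot_report`, its parameters, and the `baselines` dict are hypothetical names chosen for illustration, and the real wire shape is whatever docs/API.md defines.

```python
from __future__ import annotations

import tracemalloc

# Hypothetical in-memory baseline store (illustrative, not the PR's module).
baselines: dict[str, tracemalloc.Snapshot] = {}


def snapshot_report(
    top_n: int = 5,
    save_as: str | None = None,
    compare_with: str | None = None,
) -> dict:
    """Return top-N allocators, optionally diffed against a saved baseline."""
    if not tracemalloc.is_tracing():
        # Lazy enable: keep up to 25 frames per allocation traceback.
        tracemalloc.start(25)
    snap = tracemalloc.take_snapshot()
    report: dict = {
        "top": [str(stat) for stat in snap.statistics("traceback")[:top_n]],
    }
    if compare_with is not None and compare_with in baselines:
        # compare_to() returns StatisticDiff entries, largest change first.
        diff = snap.compare_to(baselines[compare_with], "traceback")
        report["diff"] = [
            {"size_diff_bytes": d.size_diff, "count_diff": d.count_diff}
            for d in diff[:top_n]
        ]
    if save_as is not None:
        baselines[save_as] = snap
    return report


first = snapshot_report(save_as="before")
leak = [bytearray(1024) for _ in range(1000)]  # simulated growth
second = snapshot_report(compare_with="before")
tracemalloc.stop()
```

The first call only records a baseline; the second call reports what grew since then, which is the kind of evidence a bug report could carry.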
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| tests/test_debug_memory.py | New tests pin the debug/memory_snapshot wire shape and baseline behavior. |
| esphome_device_builder/helpers/memory.py | New tracemalloc wrapper + baseline store + system stats formatting. |
| esphome_device_builder/device_builder.py | Registers the new DebugController so its WS commands are discoverable. |
| esphome_device_builder/controllers/debug.py | Implements debug/memory_snapshot command and input handling. |
| esphome_device_builder/__main__.py | Adds ESPHOME_DEBUG_MEMORY startup gate to enable tracemalloc early. |
| docs/API.md | Documents the new WS command and recommended usage for leak bisection. |
    """Tests for ``controllers/debug.py`` + ``helpers/memory.py``.

    Pins the wire shape of ``debug/memory_snapshot`` and the
    save / compare / drop baseline contract. ``tracemalloc`` is
    process-global, so each test stops + clears state in a fixture
fixed in b948822 — moved the body to the line after the opening """.
    _baselines: dict[str, tracemalloc.Snapshot] = {}


    def start_tracking(frames: int = _DEFAULT_FRAMES) -> None:
        """Enable ``tracemalloc`` allocation tracking. Idempotent."""
        if not tracemalloc.is_tracing():
            tracemalloc.start(frames)


    def is_tracking() -> bool:
        """Return whether ``tracemalloc`` is currently tracking allocations."""
        return tracemalloc.is_tracing()


    def take_snapshot() -> tracemalloc.Snapshot:
        """Return a fresh ``tracemalloc`` snapshot. Caller ensures tracking is on."""
        return tracemalloc.take_snapshot()


    def save_baseline(name: str, snapshot: tracemalloc.Snapshot) -> None:
        """Store *snapshot* under *name* for later ``compare_with``."""
        _baselines[name] = snapshot
hmmm not sure I agree with this - this is a debug endpoint
    def _stat_to_dict(stat: Any) -> dict[str, Any]:
        """Convert a ``tracemalloc`` Statistic / StatisticDiff to wire shape."""
        return {
            "traceback": [str(frame) for frame in stat.traceback],
            "size_bytes": stat.size,
            "size_diff_bytes": getattr(stat, "size_diff", 0),
            "count": stat.count,
            "count_diff": getattr(stat, "count_diff", 0),
this is an authed debug endpoint behind an env var — full paths are what you need to localise the leak. callers paste these straight into bug reports; basenames would drop the diagnostic value. happy to revisit if we ever expose it outside authed contexts.
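For readers unfamiliar with the two tracemalloc result types, the sketch below shows why the `getattr(..., 0)` defaults matter: a plain `Statistic` has no `size_diff` or `count_diff`, while a `StatisticDiff` from `compare_to()` does. It also shows that `str(frame)` renders as `filename:lineno` using the path tracemalloc recorded. The `stat_to_dict` helper here is a completed copy of the truncated snippet above, so the closing of the dict is an assumption.

```python
import tracemalloc
from typing import Any


def stat_to_dict(stat: Any) -> dict[str, Any]:
    """Convert a Statistic or StatisticDiff to a plain dict (sketch)."""
    return {
        # Each frame renders as "filename:lineno".
        "traceback": [str(frame) for frame in stat.traceback],
        "size_bytes": stat.size,
        # Only StatisticDiff carries *_diff; default to 0 for Statistic.
        "size_diff_bytes": getattr(stat, "size_diff", 0),
        "count": stat.count,
        "count_diff": getattr(stat, "count_diff", 0),
    }


tracemalloc.start(25)
before = tracemalloc.take_snapshot()
payload = [bytes(4096) for _ in range(100)]  # allocations made after baseline
after = tracemalloc.take_snapshot()
tracemalloc.stop()

plain = stat_to_dict(after.statistics("traceback")[0])
diffed = stat_to_dict(after.compare_to(before, "traceback")[0])
```

Both results share one shape, so a client can treat a snapshot without a baseline as a diff of zero.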
    if not isinstance(top_n, int) or top_n < 1 or top_n > _MAX_TOP_N:
        raise CommandError(
            ErrorCode.INVALID_ARGS,
            f"top_n must be an int between 1 and {_MAX_TOP_N}",
        )

    if drop_baseline is not None:
        memory.drop_baseline(drop_baseline)
fixed in b948822 — non-string save_as / compare_with / drop_baseline now reject with INVALID_ARGS (also length-capped to 100 chars) before reaching the dict.
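A rough sketch of what such a check could look like follows. It is not the code from b948822: `validate_baseline_name` and `_MAX_NAME_LEN` are hypothetical names, and `ErrorCode` / `CommandError` are minimal stand-ins for the project's own classes so the example runs on its own. Only the 100-character cap and the INVALID_ARGS outcome come from the reply above.

```python
from __future__ import annotations

from enum import Enum
from typing import Any


class ErrorCode(Enum):
    """Stand-in for the project's error-code enum (assumption)."""

    INVALID_ARGS = "invalid_args"


class CommandError(Exception):
    """Stand-in for the project's command error (assumption)."""

    def __init__(self, code: ErrorCode, message: str) -> None:
        super().__init__(message)
        self.code = code


_MAX_NAME_LEN = 100  # hypothetical constant name; the cap is from the reply


def validate_baseline_name(field: str, value: Any) -> str | None:
    """Return *value* if it is None or a short string, else raise INVALID_ARGS."""
    if value is None:
        return None
    if not isinstance(value, str) or len(value) > _MAX_NAME_LEN:
        raise CommandError(
            ErrorCode.INVALID_ARGS,
            f"{field} must be a string of at most {_MAX_NAME_LEN} characters",
        )
    return value
```

Checking the type before the dict lookup matters because an unhashable value such as a list would otherwise raise `TypeError` when used as a key.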
    # Enable tracemalloc as the very first step so the
    # ``debug/memory_snapshot`` WS command (helpers/memory.py) can
    # produce diffs that include the catalog loads and other
    # startup allocations. Off by default — adds per-allocation
    # overhead.
    if os.environ.get("ESPHOME_DEBUG_MEMORY"):
        import tracemalloc  # noqa: PLC0415

        tracemalloc.start(25)
fixed in b948822 — only 1 / true / yes / on (case-insensitive, whitespace-tolerant) enable it now, so =0 / empty / false / typos leave it off.
added in b948822 — extracted the gate into _memory_tracking_enabled_from_env() and pinned the truthy/falsy table in tests/test_main_cli.py.
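As an illustration of the truthy table described in these replies, a gate along the following lines would treat only "1", "true", "yes" and "on" as enabling. This is a sketch, not the function from b948822: the optional `environ` parameter and the `_TRUTHY` constant are assumptions added so the behaviour can be exercised without touching the real process environment.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Values that enable tracking, compared case-insensitively after stripping.
_TRUTHY = frozenset({"1", "true", "yes", "on"})


def memory_tracking_enabled_from_env(
    environ: Mapping[str, str] | None = None,
) -> bool:
    """Return True only for an explicit on-value of ESPHOME_DEBUG_MEMORY."""
    env = os.environ if environ is None else environ
    raw = env.get("ESPHOME_DEBUG_MEMORY", "")
    return raw.strip().lower() in _TRUTHY
```

An allow-list is the safer design here: `bool(os.environ.get(...))` is true for any non-empty string, including "0" and "false".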
- type-validate save_as / compare_with / drop_baseline before dict
access — non-string values now reject with INVALID_ARGS instead of
bubbling a TypeError out as INTERNAL_ERROR.
- parse ESPHOME_DEBUG_MEMORY as a real on-shape boolean ("1" / "true"
/ "yes" / "on") so ESPHOME_DEBUG_MEMORY=0 doesn't silently enable
tracking the way bool(os.environ.get(...)) would.
- extract the gate into _memory_tracking_enabled_from_env() so the
truthy table can be unit-tested directly.
- multi-line test module docstring now starts on the line after the
opening triple-quote (CLAUDE.md style).
What does this implement/fix?
Users report dashboard RAM growing during and after firmware builds, but we have no way to tell what grew. Adds a tracemalloc-backed WS command so a bug report can carry an actual top-N allocator diff instead of "RAM goes up".

Off by default. Set ESPHOME_DEBUG_MEMORY=1 to start tracking from process boot (catches catalog loads + startup allocs), or call the command once to enable lazily.

Changes:
- debug/memory_snapshot WS command: returns top-N allocators, system stats, optional diff against a saved baseline
- helpers.memory module wrapping tracemalloc + an in-memory baseline store
- __main__ so tracemalloc.start(25) runs as the very first step when enabled
- DebugController wired into the controller registry
- API.md

Related issue or feature (if applicable):

Types of changes
- new-feature

Frontend coordination

Checklist
- ruff, codespell, yaml/json/python checks).
- tests/ where applicable.
- components.json has not been hand-edited (regenerate via script/sync_components.py if a sync is needed).
- docs/ARCHITECTURE.md and/or docs/API.md.