cut static memory footprint via lazy-loading catalogs + firmware logs

follow-up to #922 — an idle dashboard sits at ~700MB before a compile even runs. on the 2GB HA-addon VM in that issue there's no headroom for gcc.

three big eager loads contribute (sizes on disk):

- `definitions/components.json` — 22MB, ~14k mashumaro instances after parse
- `definitions/automations.json` — 15MB
- firmware-job persistence — restores up to 55 historical jobs × 2000 lines of build output (~15MB worst case) at startup

most of this is wasted RAM: list/search endpoints don't need `config_entries`, users mostly open one component card at a time, and old build-logs are rarely re-opened.

**approach** — slim-index + lazy-body, keeping orjson + mashumaro on the fast path:

- catalogs split build-time into an `index.json` (id / name / description / category / etc.) loaded eagerly, plus per-entry body files loaded on demand into a small LRU (~64 entries)
- firmware logs split into job-metadata (in the existing metadata blob) + a per-job sidecar log file under `ext_storage_path("dashboard-jobs")/<job_id>.log` — `FirmwareJob.output` populates lazily when the frontend opens a job detail view

ruled out:
- **sqlite** — read-only catalog regenerated by sync script, no relational queries we can't do in python, would lose the mashumaro fast path
- **`__slots__` + drop `DataClassORJSONMixin`** — would save ~4MB pre-lazy-body but only ~400KB once the LRU is bounded. not worth losing schema consistency and mashumaro's wire codegen, or taking on a hand-rolled `to_dict()` per leaf model that has to stay in lockstep with field additions.

**estimated saving** — ~50-80MB resident off the ~700MB baseline, which should give #922's 2GB VM enough headroom to stop getting SIGKILLed during compiles.

### checklist

- [x] **PR 1** — measurement scaffolding: tracemalloc-based memory benchmark in `tests/benchmarks/test_catalog_memory.py` covering all three catalog loaders + the firmware job restore path. regression gate, no production change.
- [ ] **PR 2** — `sys.intern()` on closed-vocabulary strings in `_load_component` / `_load_config_entry` (`category` / `platform_type` / `supported_platforms` / `references_component`). free win, no wire change. estimated 3-8MB.
- [ ] **PR 3** — components: slim index + lazy bodies. build-time: `sync_components.py` emits `definitions/components.index.json` + `definitions/components/<id>.json`. runtime: `ComponentCatalog.load()` parses index only, `get_body(id)` reads body on demand through bounded LRU. `_FeaturedRecord` stores `underlying_id: str`. `get_components` wire shape drops `config_entries` — coordinate with frontend repo PR same release window.
- [ ] **PR 4** — automations: same split shape as components. slim index must keep `id` + `domain` + `applies_to` + `is_device_level` because `triggers_for_domains` / `actions_for_domains` partition by domain on every request. refactor `@functools.cache` module-global to an `AutomationsCatalog` controller owned by `DeviceBuilder.automations`.
- [ ] **PR 5** — firmware logs: split `FirmwareJob.output` to per-job sidecar file. `load_jobs` restores metadata only. `_ingest_output_line` appends to in-memory list (for live followers) AND to the sidecar. `persist_jobs` stops rewriting outputs — sidecar is append-only. new `firmware/get_job_output` WS command (or flag on existing `firmware/get_job`).

boards (3MB) stays untouched — below the threshold for the complexity cost.

a diagnostic helper for separately-reported runtime memory growth landed in #935 (`debug/memory_snapshot` WS command) so future leak reports can carry tracemalloc diffs.

<details>
<summary>full design notes</summary>

## Context

an idle device-builder instance currently sits at ~700MB resident — before a single firmware compile launches. the new dashboard ships three eagerly-loaded read-only catalogs that the old `esphome dashboard` did not carry in memory:

| file | disk | resident objects | loader |
|---|---|---|---|
| `definitions/components.json` | **22MB** | ~904 `ComponentCatalogEntry` + ~13k `ConfigEntry` (no `__slots__`) | `controllers/components.py:86` (`ComponentCatalog.load()`) |
| `definitions/automations.json` | **15MB** | triggers / actions / conditions / light_effects | `controllers/automations/catalog.py:32` (`@functools.cache` at module-import) |
| `definitions/boards.json` | **3MB** | 490 boards | `BoardCatalog.load()` |

disk → python object explosion for catalogs of this shape is typically 3-5× — each JSON string becomes an `str` object (~50 bytes overhead), every list/dict gets PyObject headers, and the ~14k mashumaro dataclass instances each carry a ~280-byte `__dict__`. that's consistent with the 700MB baseline once you add esphome-library imports (`esphome.config`, `esphome.codegen`, the platform-specific component modules) which the dashboard's validation paths pull in at startup.
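
To reproduce the disk-to-object blowup on synthetic data, here is a minimal sketch. It parses a catalog-shaped JSON payload under `tracemalloc` and divides the traced bytes by the payload size. `make_payload` and its fields are invented and do not follow the real `components.json` schema. Stdlib `json` stands in for orjson, and nothing is hydrated into mashumaro dataclasses, so the reported ratio will not match the real catalog exactly.

```python
import json
import tracemalloc


def make_payload(n_components: int = 200, entries_per_component: int = 14) -> bytes:
    """Build a hypothetical catalog-shaped JSON document (not the real schema)."""
    components = [
        {
            "id": f"component_{i}",
            "category": "sensor",
            "config_entries": [
                {
                    "key": f"option_{j}",
                    "type": "string",
                    "description": f"Option {j} of component {i}",
                    "required": j % 2 == 0,
                }
                for j in range(entries_per_component)
            ],
        }
        for i in range(n_components)
    ]
    return json.dumps(components).encode()


def blowup_ratio(payload: bytes) -> float:
    """Python-heap bytes held by the parsed result, divided by the payload size."""
    tracemalloc.start()
    try:
        baseline, _ = tracemalloc.get_traced_memory()
        parsed = json.loads(payload)  # kept alive until the second reading below
        current, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return (current - baseline) / len(payload)


if __name__ == "__main__":
    data = make_payload()
    print(f"{len(data) / 1e6:.2f} MB on disk -> {blowup_ratio(data):.1f}x in memory")
```

Every repeated value (e.g. `"sensor"`, `"string"`) becomes its own `str` object here, which is exactly the duplication PR 2 targets.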

issue #922 reports SIGKILL on a 2GB HA-addon VM the moment a compile launches: gcc/g++ alone can claim 300-500MB per invocation, and there's no headroom left because the dashboard's permanent footprint already consumes ~700MB.

today's access pattern already does most of the work for us:

- `components/get_components` (paginated list / search) → list view does not need `config_entries`
- `components/get_categories`, `components/get_integration_docs` → only `id` + a couple of flat fields
- `components/get_component` (detail view) → needs the full entry
- `add_component`, `resolve_default_components` → full entry for one component at a time
- catalogs are never pushed in `subscribe_events.initial_state`

so every dashboard load pays for ~13k `ConfigEntry` trees, while a typical session only opens the detail view for a handful of components.

## why not the alternatives

- **sqlite**: read-only catalog regenerated by a script, no relational queries, no FTS we can't do cheaper in python. adds wheel-bundling complexity (binary blob), gives up mashumaro type validation, costs a cursor + row→dataclass adapter on every read. the only argument is "single file, atomic regen" and the slim index gives us that for free.
- **`__slots__` + drop `DataClassORJSONMixin`**: `@dataclass(slots=True)` on `ConfigEntry` only saves memory if we also drop the mixin (a base class without `__slots__` silently re-adds `__dict__` via the MRO). before lazy bodies, that would have saved ~4MB across ~14k resident instances and would have been worth the cost. after lazy bodies, only ~64 bodies live in the LRU at a time, so the win collapses to ~400KB: not worth losing schema consistency and mashumaro's wire codegen, or taking on a hand-rolled `to_dict()` per leaf model that has to stay in lockstep with field additions. revisit if mashumaro upstream ever gets a slot-friendly mixin shape.
- **single packed file + offset index** (mmap + seek): one file instead of ~900 saves a bit of inode overhead but pushes complexity into the sync script (compute / verify offsets) and adds a torn-write failure mode the per-entry shape sidesteps. per-entry files have precedent in `definitions/boards/<id>/manifest.yaml` (493 dirs).

## wire shape

- `components/get_components` (list) returns slim entries — no `config_entries`. **frontend coordination required**: the frontend repo PR drops any list-view reads of `config_entries`. same release window.
- `components/get_component` (detail) keeps the full shape — the form renderer relies on `config_entries` here.
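
One way to keep the list and detail shapes from drifting is to derive the slim entry from the full one by field name. Below is a minimal sketch that uses plain dataclasses in place of the mashumaro models. `ComponentIndexEntry`, `ComponentEntry`, `to_index_entry` and the reduced field set are hypothetical; they are not the real `ComponentCatalogIndexEntry` / `ComponentCatalogEntry` definitions.

```python
from dataclasses import asdict, dataclass, field, fields


@dataclass
class ConfigEntry:
    key: str
    type: str


@dataclass
class ComponentIndexEntry:
    """Slim list-view shape: no config_entries."""
    id: str
    name: str
    category: str
    supported_platforms: list[str] = field(default_factory=list)


@dataclass
class ComponentEntry(ComponentIndexEntry):
    """Full detail-view shape: every slim field plus config_entries."""
    config_entries: list[ConfigEntry] = field(default_factory=list)


_SLIM_FIELDS = tuple(f.name for f in fields(ComponentIndexEntry))


def to_index_entry(entry: ComponentEntry) -> ComponentIndexEntry:
    """Project a full entry down to the slim shape served by the list endpoint."""
    return ComponentIndexEntry(**{name: getattr(entry, name) for name in _SLIM_FIELDS})
```

Because the full model inherits from the slim one, a field added to the index automatically appears in the detail shape too. The projection only has to be maintained in one direction.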

## per-PR detail

### PR 1 — measurement scaffolding

regression gate so future PRs are measured against a baseline:

- new `tests/benchmarks/test_catalog_memory.py` with a `tracemalloc` snapshot around `ComponentCatalog.load()`, `automations.catalog.load_catalog()`, `BoardCatalog.load()`, and the firmware job restore path.
- snapshot per-loader resident bytes; assert against a generous ceiling so a regression on `main` after a `sync_components` run surfaces immediately.
- no production code change.

### PR 2 — `sys.intern()` on closed-vocabulary strings

free win, no API change, low risk — lands while PR 3 is in review.

- in `_load_component` / `_load_config_entry`, intern `category`, `platform_type`, `supported_platforms` members, and `references_component`. these are closed vocabularies (~20 categories, ~10 platforms, ~30 entry types) whose values are currently duplicated as separate `str` objects across 13k `ConfigEntry` instances.
- estimated saving: 3-8MB. no wire change.
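
Here is a minimal sketch of the interning step, assuming the loaders see plain dicts before hydration. The `intern_closed_vocab` helper is hypothetical; only the four field names come from the plan. Stdlib `json.loads` returns a separate `str` object for every string value even when values are equal. `sys.intern()` maps each one to a single shared object.

```python
import json
import sys
from typing import Any

_INTERNED_SCALARS = ("category", "platform_type", "references_component")
_INTERNED_LISTS = ("supported_platforms",)


def intern_closed_vocab(raw: dict[str, Any]) -> dict[str, Any]:
    """Intern closed-vocabulary string fields in place so equal values share one object."""
    for key in _INTERNED_SCALARS:
        value = raw.get(key)
        if isinstance(value, str):
            raw[key] = sys.intern(value)
    for key in _INTERNED_LISTS:
        values = raw.get(key)
        if isinstance(values, list):
            raw[key] = [sys.intern(v) if isinstance(v, str) else v for v in values]
    return raw


if __name__ == "__main__":
    docs = json.loads('[{"category": "sensor"}, {"category": "sensor"}]')
    a, b = (intern_closed_vocab(d) for d in docs)
    print(a["category"] is b["category"])  # True: both now point at the interned "sensor"
```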

### PR 3 — components: slim-index + bodies on disk

build-time changes in `script/sync_components.py`:

- emit `definitions/components.index.json`: a list of `ComponentCatalogIndexEntry` carrying `id`, `name`, `description`, `category`, `docs_url`, `image_url`, `dependencies`, `multi_conf`, `supported_platforms`, i.e. every field referenced by the `get_components` filter, `get_categories`, `get_integration_docs`, `_categories_for_board`, or the featured-registry build path.
- emit `definitions/components/<id>.json` per component: the full shape including `config_entries`.
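- atomic regen: write the new tree to `definitions/components.next/`, swap it into place (`os.replace()` raises `OSError` when the destination is a non-empty directory, so move the old tree aside first), then write the single `components.index.json` last via a temp file + `os.replace()`. a Ctrl-C mid-regen must not leave a torn catalog; validate via a manifest hash carried in the index header.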
- update `pyproject.toml`'s `tool.setuptools.package-data` glob.
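
Here is a minimal sketch of the regen swap, assuming a hypothetical `write_catalog(root, bodies)` entry point in the sync script. Stdlib `json` stands in for whatever serializer the script uses, and the index layout (`manifest_sha256` + `entries`) is illustrative. Because `os.replace()` cannot overwrite a non-empty directory, the old tree is moved aside first. The index file, written last via temp file + `os.replace()`, acts as the commit point.

```python
import hashlib
import json
import os
import shutil
from pathlib import Path


def _manifest_hash(bodies_dir: Path) -> str:
    """Hash every body file (name + bytes) in a stable order."""
    digest = hashlib.sha256()
    for path in sorted(bodies_dir.glob("*.json")):
        digest.update(path.name.encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()


def write_catalog(root: Path, bodies: dict[str, dict]) -> None:
    """Hypothetical helper: regenerate root/components/ + root/components.index.json."""
    live = root / "components"
    staging = root / "components.next"
    retired = root / "components.old"
    index = root / "components.index.json"

    # 1. Build the complete new tree off to the side.
    shutil.rmtree(staging, ignore_errors=True)
    staging.mkdir(parents=True)
    for component_id, body in bodies.items():
        (staging / f"{component_id}.json").write_text(json.dumps(body))

    # 2. Swap directories; os.replace() can't overwrite a non-empty dir, so retire the old one.
    shutil.rmtree(retired, ignore_errors=True)
    if live.exists():
        os.replace(live, retired)
    os.replace(staging, live)

    # 3. Commit point: write the index last, atomically, carrying the manifest hash.
    entries = [{"id": cid, "name": body.get("name", cid)} for cid, body in sorted(bodies.items())]
    tmp_index = index.with_suffix(".json.tmp")
    tmp_index.write_text(json.dumps({"manifest_sha256": _manifest_hash(live), "entries": entries}))
    os.replace(tmp_index, index)
    shutil.rmtree(retired, ignore_errors=True)


def index_matches_tree(root: Path) -> bool:
    """Runtime check: does the index header match the body files on disk?"""
    header = json.loads((root / "components.index.json").read_text())
    return header["manifest_sha256"] == _manifest_hash(root / "components")
```

There is still a brief window between the two directory renames where the tree and the old index disagree. The manifest hash is what lets the runtime detect that state instead of serving mismatched bodies.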

runtime changes in `controllers/components.py`:

- new model: `ComponentCatalogIndexEntry` (slim shape) in `models/components.py`.
- `ComponentCatalog.load()` parses `components.index.json` only.
- `ComponentCatalog.get_body(id) -> ComponentCatalogEntry` reads `components/<id>.json` on demand, hydrates with mashumaro, returns through a bounded LRU (`maxsize=64`).
- body reads hop to a thread (`asyncio.to_thread`) to keep blockbuster happy and stay off the event loop.
- `_build_featured_registry()` becomes index-only: `_FeaturedRecord` stores `underlying_id: str`, not the body. bodies are fetched at `_materialise_featured` time. most invasive bit of this PR — flag explicitly in PR description.
- WS surface: `get_components` returns the slim type. `get_component` / `add_component` / `resolve_default_components` go through `get_body`. `get_categories` / `get_integration_docs` stay on the index.
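
Here is a minimal sketch of the lazy-body path, assuming the on-disk layout from the build-time changes. Stdlib `json` and a plain dataclass stand in for orjson + mashumaro. `LazyComponentCatalog` and `ComponentBody` are hypothetical names, not the real `ComponentCatalog` API. `functools.lru_cache` wraps a per-instance function, so each catalog instance gets its own bounded cache. The blocking read runs inside `asyncio.to_thread`.

```python
import asyncio
import functools
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any


@dataclass(frozen=True)
class ComponentBody:
    id: str
    config_entries: list[dict[str, Any]]


class LazyComponentCatalog:
    def __init__(self, root: Path, maxsize: int = 64) -> None:
        self._root = root
        self.index: dict[str, dict[str, Any]] = {}
        # Per-instance bounded LRU around the blocking read + hydrate step.
        self._read_body = functools.lru_cache(maxsize=maxsize)(self._read_body_uncached)

    def load(self) -> None:
        """Parse the slim index only; no body files are touched."""
        raw = json.loads((self._root / "components.index.json").read_text())
        self.index = {entry["id"]: entry for entry in raw["entries"]}

    def _read_body_uncached(self, component_id: str) -> ComponentBody:
        path = self._root / "components" / f"{component_id}.json"
        raw = json.loads(path.read_text())
        return ComponentBody(id=component_id, config_entries=raw.get("config_entries", []))

    async def get_body(self, component_id: str) -> ComponentBody:
        if component_id not in self.index:
            raise KeyError(component_id)
        # Blocking file I/O runs in a worker thread, off the event loop.
        return await asyncio.to_thread(self._read_body, component_id)
```

Every cache hit returns the same object, so callers should treat bodies as read-only. Two concurrent misses for the same id may both read the file. That is harmless for read-only bodies, and it avoids holding a lock across disk I/O.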

test fixture work: `tests/conftest.py` materialises a tiny mock `components/` directory at test setup (a couple of fixture components + an index file) rather than stubbing `_COMPONENTS_JSON`.

### PR 4 — automations

same split shape as components, but the access pattern differs: `triggers_for_domains` / `actions_for_domains` / `conditions_for_domains` walk the whole catalog every request to partition `core` entries first. the slim index must therefore keep `id` + `domain` + `applies_to` + `is_device_level` so domain filtering stays index-only. bodies (the `config_entries` and option schemas) go behind the LRU.

the automations module currently loads at module-import time via `@functools.cache` (a global) — move that ownership onto an `AutomationsCatalog` controller object that mirrors `ComponentCatalog`'s shape and is owned by `DeviceBuilder.automations`.

estimated saving: comparable to PR 3 (~15MB raw → ~3-5MB index + LRU).

### PR 5 — firmware-job output: lazy restore

`controllers/firmware/persistence.py:69` (`load_jobs`) hydrates each `FirmwareJob` including its `output: list[str]` field. limits today:

- `_MAX_OUTPUT_LINES_RETAINED = 2000` per job
- `_MAX_PRIMARY_TERMINAL_JOBS = 50`, `_MAX_AUX_TERMINAL_JOBS = 5`

worst case: 55 × 2000 × ~150 bytes ≈ 15MB of build logs resident the user mostly never looks at, plus secondary churn: `persist_jobs` rewrites the whole jobs dict (outputs included) on every persist call.
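
Spelled out with the limits above (50 primary + 5 aux terminal jobs) and the same ~150 bytes/line average, the worst case works out slightly above 15MB in decimal units:

```math
(50 + 5)\ \text{jobs} \times 2000\ \tfrac{\text{lines}}{\text{job}} \times 150\ \tfrac{\text{B}}{\text{line}} = 16{,}500{,}000\ \text{B} \approx 16.5\ \text{MB} \approx 15.7\ \text{MiB}
```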

approach:

- split persistence: keep job metadata (everything but `output`) in the existing `metadata_transaction` blob; sidecar each job's `output` to a per-job log file under `ext_storage_path("dashboard-jobs")/<job_id>.log`. resolve through `ext_storage_path`, never reconstruct paths — per the deployment-modes invariant.
- `load_jobs` restores metadata only; `FirmwareJob.output` starts empty.
- new `firmware/get_job_output` WS command (or flag on existing `firmware/get_job`) reads the sidecar file when the frontend opens a job-detail view.
- live jobs append both to the in-memory `output` list (for `subscribe_events` follower frames) AND to the sidecar log file.
- pruning deletes the corresponding sidecar log when a job falls off history.
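
Here is a minimal sketch of the sidecar flow. It assumes the directory passed in is whatever `ext_storage_path("dashboard-jobs")` resolves to; the sketch takes it as a parameter rather than constructing a path. `SidecarJobLog`, `Job` and their methods are hypothetical names, not the real `FirmwareJob` / persistence API.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Job:
    """Hypothetical slim job record: metadata plus the in-memory live tail."""
    job_id: str
    status: str = "running"
    output: list[str] = field(default_factory=list)  # starts empty after restore


class SidecarJobLog:
    def __init__(self, jobs_dir: Path) -> None:
        # jobs_dir is assumed to come from ext_storage_path("dashboard-jobs").
        self._dir = jobs_dir
        self._dir.mkdir(parents=True, exist_ok=True)

    def _path(self, job_id: str) -> Path:
        return self._dir / f"{job_id}.log"

    def ingest_line(self, job: Job, line: str) -> None:
        """Live path: keep the line for followers AND append it to the sidecar."""
        job.output.append(line)
        with self._path(job.job_id).open("a", encoding="utf-8") as fh:
            fh.write(line.rstrip("\n") + "\n")

    def read_output(self, job_id: str) -> list[str]:
        """Detail-view path: read the sidecar only when a job is opened."""
        path = self._path(job_id)
        if not path.exists():
            return []
        return path.read_text(encoding="utf-8").splitlines()

    def prune(self, job_id: str) -> None:
        """Delete the sidecar when the job falls off history."""
        self._path(job_id).unlink(missing_ok=True)
```

Opening the file once per line keeps the sketch short. A real implementation would more likely hold one handle per live job and keep the write off the event loop. Trimming the sidecar to `_MAX_OUTPUT_LINES_RETAINED` is not shown.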

estimated saving: ~15MB at idle. bonus: persist write amplification drops from "rewrite all 55 outputs every line trim" to "append to one sidecar".

most localized of the lazy-load changes — small blast radius, single controller, no wire-shape change beyond an additive `firmware/get_job_output` command. could land in parallel with PR 3.

### boards untouched

3MB raw is below the threshold; complicating `definitions/__init__.py` for a 1-2MB save isn't worth it.

## risks / gotchas

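- **atomic regen of the catalog tree.** emit to a sibling directory, move the old tree aside (`os.replace()` cannot overwrite a non-empty directory), `os.replace()` the new tree into place, and write the index last. a torn write would leave the runtime resolving fresh body files against a stale index (or vice versa).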
- **wheel bundling.** ~900 small files compress ~5-10% less efficiently than one packed JSON. acceptable.
- **blockbuster.** body reads must go through `asyncio.to_thread` or blockbuster will complain about sync I/O on the event loop in tests.
- **featured-component first-touch.** each board's `featured_components` references underlying components by id. with lazy bodies, opening a featured card pays one disk read the first time. not worth pre-warming — the LRU absorbs subsequent hits.
- **frontend coordination.** `get_components` slim shape needs the frontend repo PR landing in the same release window. PR 3's description must include the matching frontend PR link before merge.

## verification

1. **unit tests**: `pytest tests/test_components*` round-trips against the mock catalog tree; assert `get_component` / `get_components` / `add_component` / `resolve_default_components` shapes are unchanged end-to-end (modulo the deliberately removed `config_entries` on `get_components` list responses).
2. **memory benchmark (PR 1)**: `pytest tests/benchmarks/test_catalog_memory.py` asserts resident bytes after `load()` are under the new ceiling; PR 3 / PR 4 each tighten that ceiling.
3. **runtime benchmark**: `pytest --codspeed tests/benchmarks/test_startup.py` shows `ComponentCatalog.load()` faster end-to-end (smaller parse). add a per-body decode benchmark to track the new lazy path.
4. **end-to-end smoke**: run the dashboard against a real config dir, open a few component detail views, confirm responses are shape-identical against `main` for the detail path and the list path drops only `config_entries`. watch `ps -o rss` on the process — target idle resident: under 600MB after PR 3, under 500MB after PR 4.
5. **issue #922 repro**: ideally the reporter (or CI on a memory-capped runner) confirms the compile-loop no longer SIGKILLs on a 2GB VM once PR 3 + PR 4 ship.

</details>


file	disk	resident objects	loader
`definitions/components.json`	22MB	~904 `ComponentCatalogEntry` + ~13k `ConfigEntry` (no `__slots__`)	`controllers/components.py:86` (`ComponentCatalog.load()`)
`definitions/automations.json`	15MB	triggers / actions / conditions / light_effects	`controllers/automations/catalog.py:32` (`@functools.cache` at module-import)
`definitions/boards.json`	3MB	490 boards	`BoardCatalog.load()`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

cut static memory footprint via lazy-loading catalogs + firmware logs #934
