Intern closed-vocabulary strings in the component catalog loader #938

Open
marcelveldt wants to merge 1 commit into main from intern-component-catalog-strings

@marcelveldt
Contributor

What does this implement/fix?

The component catalog parses ~13k ConfigEntry and ~900 ComponentCatalogEntry instances at startup, each carrying duplicate copies of a handful of closed-vocabulary strings. Measured against the live catalog: 12 unique platform_type values, 32 unique references_component values, 9 unique supported_platforms values — repeated across thousands of entries.

sys.intern collapses every occurrence of each value onto a single PyUnicode object, trimming several MB off the loaded catalog's resident size at negligible load-time cost. No wire-shape change, no API change.
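
To illustrate the mechanism (this snippet is not taken from the PR), here is a minimal sketch of what interning does to equal strings that were created separately, as JSON-decoded values typically are. The value "sensor" is only an example and is not claimed to appear in the catalog.

```python
import sys

# Build equal strings at runtime so each one is a separate object,
# similar to string values decoded one by one from a JSON file.
raw = ["".join(["sen", "sor"]) for _ in range(3)]
assert raw[0] == raw[1]       # same value
assert raw[0] is not raw[1]   # but distinct objects (CPython behaviour)

# sys.intern returns the canonical copy for each value, so every
# reference ends up pointing at one shared object.
interned = [sys.intern(s) for s in raw]
assert interned[0] is interned[1] is interned[2]
```

Once the original duplicates are no longer referenced they can be garbage-collected, which is where the memory saving would come from.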

Related issue or feature (if applicable):

Changes

  • new _intern_optional / _intern_str_list helpers in controllers/components.py
  • _load_config_entry interns platform_type, references_component, and supported_platforms members
  • _load_component interns supported_platforms members
  • category is already a ComponentCategory enum (singleton), so no interning needed there

Types of changes

  • Bugfix (non-breaking change which fixes an issue) — bugfix
  • New feature (non-breaking change which adds functionality) — new-feature
  • Enhancement to an existing feature — enhancement
  • Breaking change (fix or feature that would cause existing functionality to not work as expected) — breaking-change
  • Refactor (no behaviour change) — refactor
  • Documentation only — docs
  • Maintenance / chore — maintenance
  • CI / workflow change — ci
  • Dependencies bump — dependencies

Frontend coordination

  • No frontend change needed

Checklist

  • The code change is tested and works locally.
  • Pre-commit hooks pass (ruff, codespell, yaml/json/python checks).
  • Tests have been added or updated under tests/ where applicable.
  • components.json has not been hand-edited (regenerate via script/sync_components.py if a sync is needed).
  • Architecture-level changes are reflected in docs/ARCHITECTURE.md and/or docs/API.md.

The component catalog parses ~13k ConfigEntry instances and ~900
ComponentCatalogEntry instances, each carrying duplicate copies of
a handful of closed-vocabulary strings — platform_type, references_component,
and supported_platforms members all draw from a few dozen unique
values across the whole catalog.

sys.intern collapses those duplicates onto a single PyUnicode object
each, trimming several MB off the loaded catalog's resident size with
no wire-shape change. Lands as part of the #934 footprint cleanup
(PR 2 of the checklist).
Copilot AI review requested due to automatic review settings May 22, 2026 07:29
@marcelveldt marcelveldt added the enhancement Improvement to an existing feature label May 22, 2026
@github-actions github-actions Bot added the enhancement Improvement to an existing feature label May 22, 2026
@codspeed-hq

codspeed-hq Bot commented May 22, 2026

Merging this PR will not alter performance

✅ 25 untouched benchmarks


Comparing intern-component-catalog-strings (17c47b8) with main (dd49cd3)

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.24%. Comparing base (dd49cd3) to head (17c47b8).

@@           Coverage Diff           @@
##             main     #938   +/-   ##
=======================================
  Coverage   99.24%   99.24%           
=======================================
  Files         191      191           
  Lines       14243    14252    +9     
=======================================
+ Hits        14136    14145    +9     
  Misses        107      107           
Flag      Coverage Δ
py3.12    99.19% <100.00%> (-0.01%) ⬇️
py3.14    99.24% <100.00%> (+<0.01%) ⬆️

Files with missing lines                            Coverage Δ
esphome_device_builder/controllers/components.py    100.00% <100.00%> (ø)

Contributor

Copilot AI left a comment

Pull request overview

This PR reduces the in-memory footprint of the components catalog at startup by interning a small set of closed-vocabulary strings while parsing definitions/components.json, avoiding thousands of duplicate str objects without changing the API/wire shape.

Changes:

  • Added _intern_optional and _intern_str_list helpers for safe sys.intern() usage during JSON → model loading.
  • Interned platform_type, references_component, and supported_platforms when loading ConfigEntry.
  • Interned supported_platforms when loading ComponentCatalogEntry.
