Intern closed-vocabulary strings in the component catalog loader #938
Open
marcelveldt wants to merge 1 commit into
Conversation
The component catalog parses ~13k ConfigEntry instances and ~900 ComponentCatalogEntry instances, each carrying duplicate copies of a handful of closed-vocabulary strings — platform_type, references_component, and supported_platforms members all draw from a few dozen unique values across the whole catalog. sys.intern collapses those duplicates onto a single PyUnicode object each, trimming several MB off the loaded catalog's resident size with no wire-shape change. Lands as part of the #934 footprint cleanup (PR 2 of the checklist).
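The effect described above can be illustrated with a minimal sketch. The three-entry catalog below is hypothetical and stands in for the real `definitions/components.json`; it only shows that equal string values parsed from JSON end up as one shared object once passed through `sys.intern`.

```python
import json
import sys

# Hypothetical miniature catalog: three entries sharing one platform_type value.
raw = json.dumps([{"platform_type": "sensor"} for _ in range(3)])
entries = json.loads(raw)

# Distinct string objects as produced by the JSON parser.
before = {id(e["platform_type"]) for e in entries}

# After interning, every equal value maps to the same object.
interned = [sys.intern(e["platform_type"]) for e in entries]
after = {id(s) for s in interned}

print(f"distinct objects before: {len(before)}, after: {len(after)}")
```

The saving scales with the number of repeats per unique value, which is why a closed vocabulary of a few dozen values repeated across thousands of entries is a good fit.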
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #938 +/- ##
=======================================
Coverage 99.24% 99.24%
=======================================
Files 191 191
Lines 14243 14252 +9
=======================================
+ Hits 14136 14145 +9
Misses 107 107
Pull request overview
This PR reduces the in-memory footprint of the components catalog at startup by interning a small set of closed-vocabulary strings while parsing definitions/components.json, avoiding thousands of duplicate str objects without changing the API/wire shape.
Changes:
- Added `_intern_optional` and `_intern_str_list` helpers for safe `sys.intern()` usage during JSON → model loading.
- Interned `platform_type`, `references_component`, and `supported_platforms` when loading `ConfigEntry`.
- Interned `supported_platforms` when loading `ComponentCatalogEntry`.
What does this implement/fix?
The component catalog parses ~13k `ConfigEntry` and ~900 `ComponentCatalogEntry` instances at startup, each carrying duplicate copies of a handful of closed-vocabulary strings. Measured against the live catalog: 12 unique `platform_type` values, 32 unique `references_component` values, and 9 unique `supported_platforms` values, repeated across thousands of entries. `sys.intern` collapses every occurrence of a given value onto a single `PyUnicode` object, trimming several MB off the loaded catalog's resident size. No wire-shape change, no API change.
Related issue or feature (if applicable):
Changes
- `_intern_optional` / `_intern_str_list` helpers in `controllers/components.py`
- `_load_config_entry` interns `platform_type`, `references_component`, and `supported_platforms` members
- `_load_component` interns `supported_platforms` members
- `category` is already a `ComponentCategory` enum (singleton), so no interning needed there
Types of changes
- bugfix
- new-feature
- enhancement
- breaking-change
- refactor
- docs
- maintenance
- ci
- dependencies
Frontend coordination
Checklist
- `ruff`, `codespell`, yaml/json/python checks).
- `tests/` where applicable.
- `components.json` has not been hand-edited (regenerate via `script/sync_components.py` if a sync is needed).
- `docs/ARCHITECTURE.md` and/or `docs/API.md`.