Skip to content

sync: repair debug platform descriptions showing config-variable bullet#916

Open
marcelveldt wants to merge 3 commits into
mainfrom
sync-repair-debug-platform-descriptions
Open

sync: repair debug platform descriptions showing config-variable bullet#916
marcelveldt wants to merge 3 commits into
mainfrom
sync-repair-debug-platform-descriptions

Conversation

@marcelveldt
Copy link
Copy Markdown
Contributor

What does this implement/fix?

The sensor.debug and text_sensor.debug cards in the "Add Component" wizard render a raw config-variables bullet (- **free** (*Optional*): Reports the free heap size...) as their description instead of a real component description. Root cause is upstream in esphome/esphome-docs script/schema_doc.pymd_get_paragraph skips titles but accepts the first ### Configuration variables bullet as the section paragraph when there's no prose between ## <Platform> and the configuration-variables heading. That bullet ends up baked into the schema bundle as the platform component's docs field.

Add a small repair pass in the sync pipeline that detects the - **<key>** (*Optional|Required*): shape and substitutes the bare-stem umbrella entry's description — so both cards now show the proper "The debug component can be used to debug problems with ESPHome..." prose. Easy to rip out once the upstream md_get_paragraph fix lands.

  • new _repair_field_bullet_descriptions in script/sync_components.py, wired into build_catalog before the MDX backfill
  • regression check in script/check_catalog.py (_check_no_field_bullet_descriptions) so this can't silently come back
  • 7 unit tests in tests/test_sync_components_field_bullet_repair.py pinning the contract
  • regenerated components.json — exactly 2 entries change: sensor.debug and text_sensor.debug

Related issue or feature (if applicable):

Types of changes

  • Bugfix (non-breaking change which fixes an issue) — bugfix
  • New feature (non-breaking change which adds functionality) — new-feature
  • Enhancement to an existing feature — enhancement
  • Breaking change (fix or feature that would cause existing functionality to not work as expected) — breaking-change
  • Refactor (no behaviour change) — refactor
  • Documentation only — docs
  • Maintenance / chore — maintenance
  • CI / workflow change — ci
  • Dependencies bump — dependencies

Frontend coordination

  • No frontend change needed
  • Companion frontend PR: esphome/device-builder-dashboard-frontend#

Checklist

  • The code change is tested and works locally.
  • Pre-commit hooks pass (ruff, codespell, yaml/json/python checks).
  • Tests have been added or updated under tests/ where applicable.
  • components.json has not been hand-edited (regenerate via script/sync_components.py if a sync is needed).
  • Architecture-level changes are reflected in docs/ARCHITECTURE.md and/or docs/API.md.

Copilot AI review requested due to automatic review settings May 20, 2026 22:44
@github-actions github-actions Bot added the bugfix Bug fix label May 20, 2026
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 20, 2026

Merging this PR will not alter performance

✅ 25 untouched benchmarks


Comparing sync-repair-debug-platform-descriptions (0b4d69b) with main (332c3f8)

Open in CodSpeed

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.24%. Comparing base (332c3f8) to head (0b4d69b).

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #916   +/-   ##
=======================================
  Coverage   99.24%   99.24%           
=======================================
  Files         191      191           
  Lines       14200    14200           
=======================================
  Hits        14093    14093           
  Misses        107      107           
Flag Coverage Δ
py3.12 99.19% <ø> (ø)
py3.14 99.24% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a defensive repair in the component-catalog sync pipeline to prevent platform components (notably sensor.debug / text_sensor.debug) from shipping a raw “configuration variables” bullet as their description, and adds a regression guard + unit tests to keep the catalog clean until the upstream docs generator is fixed.

Changes:

  • Add _repair_field_bullet_descriptions() to detect “field-bullet-shaped” descriptions and replace/clear them during build_catalog().
  • Add a catalog regression check (_check_no_field_bullet_descriptions) to fail CI if any component descriptions still match the bullet pattern.
  • Add a focused unit test module to pin the repair pass’s contract.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 3 comments.

File Description
script/sync_components.py Runs a new post-process pass that repairs bullet-shaped descriptions before MDX backfill.
script/check_catalog.py Adds a new regression check to detect any remaining bullet-shaped component descriptions.
tests/test_sync_components_field_bullet_repair.py Adds unit tests covering replacement/clearing behavior and non-matches.

Comment thread script/sync_components.py Outdated
Comment thread script/sync_components.py Outdated
Comment thread tests/test_sync_components_field_bullet_repair.py Outdated
@bluetoothbot
Copy link
Copy Markdown

PR Review — sync: repair debug platform descriptions showing config-variable bullet

Targeted, well-scoped workaround for a real upstream defect, with a regression guard and focused unit tests — exactly the right shape for a temporary patch. The main concern is the umbrella-resolution heuristic (cid.split('.', 1)[1]): it works for the two entries this PR actually changes (sensor.debug, text_sensor.debug), but silently misbehaves for <domain>.<platform> ids where the second segment collides with an unrelated bare component (the canonical hazard is ota.esphome pulling the ESP core config description). Since the workaround is meant to live until upstream fixes md_get_paragraph, hardening the umbrella lookup is worth doing now. Smaller items: the bullet regex is duplicated across the sync and check sides without a coupling guarantee, catalog._components reaches past a private attribute, and the esphome-docs#TBD cross-reference should be filled in before merge.


🟡 Important

1. Umbrella-by-stem heuristic is wrong for `.` ids (`script/sync_components.py`, L1057-1062)

stem = cid.split('.', 1)[1] assumes the part after the first dot is the umbrella component name. That holds for sensor.debug -> debug and text_sensor.debug -> debug, but it's incorrect for any id where the second segment is a platform name that happens to collide with an unrelated bare component. The clearest example is ota.esphome: if upstream ever bakes a bullet-shaped description into the OTA-platform entry (the very same upstream bug is in play), this code would pull the bare esphome component's description (ESP global config) into ota.esphome (the native OTA platform) and the regression check would pass because the result is no longer bullet-shaped — silent semantic corruption.

Current diff only changes 2 entries (sensor.debug, text_sensor.debug), so this isn't a live bug. But the workaround is meant to survive until the upstream fix lands, and there's no guarantee a future schema bump won't add another affected id with the wrong shape. Two ways to harden it:

  1. Restrict umbrella substitution to a known-safe set of domains (e.g. only when domain in _UMBRELLA_DOMAINSsensor, text_sensor, binary_sensor, switch, ... — where the second segment really is the umbrella stem). For everything else, clear and let MDX backfill take over.
  2. Validate that the resolved umbrella is related to the entry, e.g. by checking that the umbrella's MDX file is the same one the platform entry was scraped from.

Option 1 is the smallest change and matches the existing _UMBRELLA_ENTRIES-style mental model.

if "." in cid:
    stem = cid.split(".", 1)[1]
    umbrella = by_id.get(stem)
    if umbrella is not None:
        umbrella_desc = (umbrella.get('description') or '').strip()

🟢 Suggestions

1. Regex duplicated between check_catalog and sync_components (`script/check_catalog.py`, L273-278)

_FIELD_BULLET_DESCRIPTION here and _FIELD_BULLET_PATTERN in script/sync_components.py:1014 are identical pattern strings declared independently. If the sync-side pattern is ever widened (e.g. to match a new bullet variant upstream emits), the check-side won't keep up and a regression can slip through.

Either import the sync-side pattern (from script.sync_components import _FIELD_BULLET_PATTERN) or add a cross-reference comment on both sides flagging that they must stay in lockstep. The former is cheaper to maintain.

_FIELD_BULLET_DESCRIPTION = re.compile(
    r"^-?\s*\*\*[A-Za-z_][\w]*\*\*\s*\(\s*\*(?:Optional|Required)\*\s*\)\s*[:\-]",
)
2. Reaches into `catalog._components` private attribute (`script/check_catalog.py`, L286-288)

catalog._components is a single-underscore attribute, i.e. nominally internal. Other check functions in this file iterate the catalog via the public surface (look at how _check_option_lists and _check_component_gating access the components). Mirror that pattern instead of reaching past the underscore — otherwise a future refactor of ComponentCatalog could break this check silently.

for component in catalog._components:
3. `esphome-docs#TBD` placeholder (`script/sync_components.py`, L1040-1042)

The docstring says Upstream: esphome/esphome-docs#TBD. — file the upstream issue first and inline the real number, or drop the line. A TBD cross-reference that ships rots immediately and future readers can't tell whether the upstream fix has landed (which is the whole gating condition for ripping this function out).

Remove this whole function (and the regex above) when the upstream
fix lands and the schema bundle stops emitting these descriptions.
Upstream: esphome/esphome-docs#TBD.
4. Missing test for the `.` collision case

Whichever way the umbrella-by-stem behaviour lands (current heuristic, or a domain-restricted one per the warning above), add a test pinning what happens for ota.esphome shape: an entry id whose second segment collides with an unrelated bare component. The current suite covers the happy path (sensor.debug -> debug), the orphan path, and the empty-umbrella path, but not the wrong-umbrella path that's the actual semantic hazard.

Example shape:

def test_does_not_pull_unrelated_bare_component_as_umbrella() -> None:
    entries = [
        {"id": "esphome", "description": "ESPHome core config block."},
        {"id": "ota.esphome", "description": "- **port** (*Optional*): ..."},
    ]
    _repair_field_bullet_descriptions(entries)
    # ota.esphome must NOT inherit the bare `esphome` core description
    assert entries[1]["description"] != "ESPHome core config block."

Checklist

  • No hardcoded secrets or unsafe deserialization
  • Input validation at boundaries
  • Error handling does not silently swallow failures
  • Logic is correct across edge cases (incl. <domain>.<platform> collisions) — warning #1
  • No duplicated invariants without a coupling guarantee — suggestion #2
  • Respects public/private API boundaries — suggestion #3
  • Tests cover the substantive edge cases — suggestion #5
  • Comments / docstrings carry stable references — suggestion #4

Summary

Targeted, well-scoped workaround for a real upstream defect, with a regression guard and focused unit tests — exactly the right shape for a temporary patch. The main concern is the umbrella-resolution heuristic (cid.split('.', 1)[1]): it works for the two entries this PR actually changes (sensor.debug, text_sensor.debug), but silently misbehaves for <domain>.<platform> ids where the second segment collides with an unrelated bare component (the canonical hazard is ota.esphome pulling the ESP core config description). Since the workaround is meant to live until upstream fixes md_get_paragraph, hardening the umbrella lookup is worth doing now. Smaller items: the bullet regex is duplicated across the sync and check sides without a coupling guarantee, catalog._components reaches past a private attribute, and the esphome-docs#TBD cross-reference should be filled in before merge.


To rebase specific severity levels, mention me: @bluetoothbot rebase critical (fixes 🔴 only), @bluetoothbot rebase important (fixes 🔴 + 🟡), or just @bluetoothbot rebase for all.


Automated review by Kōan4599a34

@marcelveldt
Copy link
Copy Markdown
Contributor Author

addressed the review feedback in 5da7466 + 7c173ff:

  • umbrella-by-stem fixed — domain in _UMBRELLA_ENTRIES (ota, time) now skips the stem lookup so ota.esphome can't pull the core esphome description. added a regression test pinning the <domain>.<platform> collision case.
  • _FIELD_BULLET_PATTERN is now imported into check_catalog.py from sync_components.py so the two sides can't drift.
  • switched the check-side iteration to catalog._by_id.values() to match the private-access pattern used by the neighbouring check functions.
  • dropped the #TBD placeholder from the docstring — opted not to file a separate upstream issue, pinged @jesserockz directly on [Bug] Platform and Sensors further feedback (Using debug: as guide) #444 instead.
  • trimmed the test module docstring to a one-liner.

@bdraco
Copy link
Copy Markdown
Member

bdraco commented May 21, 2026

related unfinished pr esphome/esphome-docs#6618

The upstream esphome-docs schema scraper bakes the first
`### Configuration variables` bullet into the `docs` field of
platform components whose MDX section lacks a prose intro
(`sensor.debug`, `text_sensor.debug`). That bullet then renders
as the card / form description in the "Add Component" wizard.

Detect the `- **<key>** (*Optional|Required*):` shape in the
sync pipeline and substitute the bare-stem umbrella entry's
description. Removable once esphome/esphome-docs `md_get_paragraph`
is fixed.
- skip stem-based umbrella lookup for synthetic-umbrella domains
  (`ota`, `time`) — `ota.esphome`'s stem `esphome` would otherwise
  resolve to the unrelated core `esphome` component
- drop the `#TBD` placeholder from the docstring
- trim the test module docstring to a one-liner
- add regression test pinning the umbrella-domain skip
Removes the duplicate regex declaration that could drift out of
sync with the source-of-truth in `_repair_field_bullet_descriptions`.
Switches the iteration to `catalog._by_id.values()` to match the
private-access pattern used by neighbouring check functions.
@marcelveldt marcelveldt force-pushed the sync-repair-debug-platform-descriptions branch from 7c173ff to 0b4d69b Compare May 21, 2026 20:31
@marcelveldt
Copy link
Copy Markdown
Contributor Author

@bdraco OK to merge ? The final fix will land when the schema is fixed upstream but this at least fixes the weirdness in the UI

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix Bug fix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Platform and Sensors further feedback (Using debug: as guide)

5 participants