fix: detect bold/italic/underline from semantic HTML tags (<b>, <strong>, <i>, <em>, <u>) by vibeyclaw · Pull Request #94 · alphanome-ai/sec-parser

vibeyclaw · 2026-02-23T16:24:52Z

Problem

Fixes #61

The HighlightedTextClassifier was silently skipping text styled via semantic HTML tags like `<b>` and `<strong>`. As shown in the issue, many SEC filings use bare `<b>` tags rather than style="font-weight:bold" to apply bold formatting, so those text elements were never classified as HighlightedTextElement or promoted to TitleElement.

Root cause: _compute_effective_style in text_styles_metrics.py only walked the tag tree looking at style="..." attributes. It had no knowledge of the implied CSS properties that HTML semantic tags carry.
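The root cause can be reproduced outside the library. The sketch below is not the sec-parser code: `InlineStyleOnlyCollector` and `collect` are hypothetical names, and the standard-library `html.parser` stands in for the parser the project actually uses. It assumes well-formed markup in which every tag is closed. It resolves styles from `style="..."` attributes only, which is the behaviour described above.

```python
from html.parser import HTMLParser


class InlineStyleOnlyCollector(HTMLParser):
    """Records each text chunk with the inline styles of its open ancestors.

    Hypothetical helper for illustration; it is not part of sec-parser.
    """

    def __init__(self):
        super().__init__()
        self._stack = []  # one dict of inline declarations per open tag
        self.chunks = []  # (text, effective_style) pairs

    def handle_starttag(self, tag, attrs):
        # Only the style attribute is consulted; the tag name is ignored.
        declarations = {}
        for declaration in (dict(attrs).get("style") or "").split(";"):
            if ":" in declaration:
                prop, value = declaration.split(":", 1)
                declarations[prop.strip().lower()] = value.strip().lower()
        self._stack.append(declarations)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        if not data.strip():
            return
        style = {}
        for declarations in reversed(self._stack):  # innermost tag first
            for prop, value in declarations.items():
                style.setdefault(prop, value)
        self.chunks.append((data.strip(), style))


def collect(html):
    parser = InlineStyleOnlyCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks


chunks = collect('<p><span style="font-weight:bold">Item 1.</span> <b>Business</b></p>')
# The inline-styled span is seen as bold; the bare <b> text has no style at all.
print(chunks)
```

Here "Business" comes back with an empty style dictionary, so a classifier that keys on `font-weight` has nothing to match.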

Fix

Extended _compute_effective_style to recognise a small set of semantic tags and map them to their implied CSS properties after any inline style attribute is processed (so inline styles still win):

| Tag | Implied CSS |
| --- | --- |
| `<b>`, `<strong>` | `font-weight: bold` |
| `<i>`, `<em>` | `font-style: italic` |
| `<u>` | `text-decoration: underline` |

The setdefault pattern already used throughout the function ensures correct cascade precedence: an explicit inline style always wins over the implied tag style.
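The precedence rule can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `Node`, `parse_inline_style`, `effective_style`, and `SEMANTIC_TAG_STYLES` are hypothetical names, and `Node` is a minimal stand-in for a parsed tag. The mapping itself is taken from the table above.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Implied CSS per semantic tag, as listed in the table above.
SEMANTIC_TAG_STYLES = {
    "b": ("font-weight", "bold"),
    "strong": ("font-weight", "bold"),
    "i": ("font-style", "italic"),
    "em": ("font-style", "italic"),
    "u": ("text-decoration", "underline"),
}


@dataclass
class Node:
    """Hypothetical stand-in for a parsed HTML tag (not a sec-parser type)."""

    name: str
    attrs: dict[str, str] = field(default_factory=dict)
    parent: Node | None = None


def parse_inline_style(style: str) -> dict[str, str]:
    """Split 'a: b; c: d' into a property -> value dictionary."""
    result = {}
    for declaration in style.split(";"):
        if ":" in declaration:
            prop, value = declaration.split(":", 1)
            result[prop.strip().lower()] = value.strip().lower()
    return result


def effective_style(node: Node) -> dict[str, str]:
    """Walk from the node up to the root; the first value set for a property wins."""
    style: dict[str, str] = {}
    current = node
    while current is not None:
        # 1. Inline declarations on this tag are applied first.
        for prop, value in parse_inline_style(current.attrs.get("style", "")).items():
            style.setdefault(prop, value)
        # 2. The implied tag style only fills properties that are still unset.
        implied = SEMANTIC_TAG_STYLES.get(current.name.lower())
        if implied is not None:
            style.setdefault(*implied)
        current = current.parent
    return style


# Text inside a bare <b> is now bold.
print(effective_style(Node("span", parent=Node("b"))))
# An explicit inline style on the same tag overrides the implied one.
print(effective_style(Node("b", {"style": "font-weight: normal"})))
```

Because every write goes through `setdefault`, the order of the two steps inside the loop is what encodes precedence: swapping them would let `<b>` override an explicit `font-weight: normal` on the same tag.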

Tests added

test_should_detect_bold_from_b_tag
test_should_detect_bold_from_strong_tag
test_should_detect_italic_from_i_tag
test_should_detect_italic_from_em_tag
test_inline_style_should_override_semantic_tag (precedence check)
Two new test_title_step parametrize cases (bold via `<b>` tag and bold via `<strong>` tag) verifying end-to-end promotion to TitleElement

All 12 tests pass. No existing tests were modified.

…ng>, <i>, <em>, <u>)

The HighlightedTextClassifier was not detecting text styled via semantic HTML tags such as <b> and <strong>. The _compute_effective_style function only examined inline CSS style attributes, missing the implied font-weight that <b> and <strong> carry by default.

Fix: Extend _compute_effective_style to map semantic tag names to their implied CSS properties. Inline styles still take precedence (they are processed first, and the implied properties are applied with setdefault semantics).

Supported tag → CSS property mappings added:
<b>, <strong> → font-weight: bold
<i>, <em> → font-style: italic
<u> → text-decoration: underline

Adds tests covering <b>, <strong>, <i>, <em>, inline-style override, and end-to-end HighlightedTextClassifier / TitleClassifier integration.

Fixes alphanome-ai#61

vibeyclaw wants to merge 1 commit into alphanome-ai:main from vibeyclaw:fix/highlighted-text-classifier-b-tags
