
feat(fetch): include page title in extracted content #3739

Open

Christian-Sidak wants to merge 1 commit into modelcontextprotocol:main from Christian-Sidak:feat-fetch-include-page-title


@Christian-Sidak

Summary

When the fetch server extracts content from HTML pages, readabilipy already parses the <title> tag, but the server was discarding the title. As a result, fetched pages lost important context about what the page is.

This PR prepends the page title as a Markdown # heading when one is present, for example:

# What's new in 2.1.0 (Aug 30, 2023)

Content of the page...

  • Handles missing, null, and whitespace-only titles gracefully (no heading is prepended)
  • A 3-line code change in extract_content_from_html()
  • Both the fetch tool and the get-page prompt benefit automatically, since they share the same extraction function
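The title rule described above can be sketched as a small pure function. This is a minimal sketch, not the PR's actual diff: prepend_title is a hypothetical helper name, and it assumes readabilipy reports the title as a string or None and that the page body has already been converted to Markdown.

```python
from typing import Optional


def prepend_title(title: Optional[str], markdown: str) -> str:
    """Hypothetical helper: prepend a page title as a Markdown H1 heading.

    Missing (None), empty, and whitespace-only titles leave the content unchanged.
    """
    if title is not None and title.strip():
        # Assumption: surrounding whitespace is trimmed before the title is used.
        return f"# {title.strip()}\n\n{markdown}"
    return markdown


print(prepend_title("What's new in 2.1.0 (Aug 30, 2023)", "Content of the page..."))
```

Keeping the check in one place mirrors the PR's point that a single extraction function serves both the fetch tool and the prompt, so neither caller needs its own title logic.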

Fixes #2472

Test plan

  • 3 new unit tests: title present, title missing, whitespace-only title
  • Tests mock readabilipy to avoid a Node.js dependency in CI
  • All new tests pass
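The three test cases above can be sketched with unittest.mock. This is a hypothetical sketch, not the PR's test file: instead of patching the readabilipy import, it passes a mock parser into a hypothetical extract_with_title function, and it skips the HTML-to-Markdown conversion step, so the mocked "content" is treated as already-converted text.

```python
import unittest
from typing import Callable, Optional
from unittest import mock


def _readabilipy_parse(html: str) -> dict:
    """Hypothetical wrapper around readabilipy's parser (which needs Node.js)."""
    raise RuntimeError("the real parser is not used in these tests")


def extract_with_title(html: str, parse: Callable[[str], dict] = _readabilipy_parse) -> str:
    """Hypothetical extractor: parse the HTML, then prepend the title as an H1."""
    ret = parse(html)
    title: Optional[str] = ret.get("title")
    content = ret.get("content") or ""
    if title and title.strip():
        return f"# {title.strip()}\n\n{content}"
    return content


class ExtractTitleTests(unittest.TestCase):
    def test_title_present(self):
        parse = mock.Mock(return_value={"title": "Release notes", "content": "Body"})
        self.assertEqual(extract_with_title("<html></html>", parse), "# Release notes\n\nBody")
        parse.assert_called_once_with("<html></html>")

    def test_title_missing(self):
        # Covers both an absent "title" key and an explicit None.
        for ret in ({"content": "Body"}, {"title": None, "content": "Body"}):
            with self.subTest(ret=ret):
                parse = mock.Mock(return_value=ret)
                self.assertEqual(extract_with_title("<html></html>", parse), "Body")

    def test_whitespace_only_title(self):
        parse = mock.Mock(return_value={"title": "   \n", "content": "Body"})
        self.assertEqual(extract_with_title("<html></html>", parse), "Body")
```

Because the parser is a plain parameter, the tests never touch Node.js; they can be run with a standard runner such as python -m unittest.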

The readabilipy library already extracts the HTML page title, but it
was being discarded. Now the title is prepended as a markdown H1
heading when present, giving consumers useful context about the page.

Handles missing, null, and whitespace-only titles gracefully.

Fixes modelcontextprotocol#2472


Development

Successfully merging this pull request may close these issues.

Include title of fetched page in content returned by the fetch server
