fix(scraper): support hash-routed spa crawling (#379) #380

Open
arabold wants to merge 6 commits into main from fix/379-support-hash-routed-spas
Conversation

arabold (Owner) commented Mar 29, 2026

Summary

  • add explicit preserveHashes support across CLI, MCP, web UI, pipeline, refresh, and stored scraper options for hash-routed SPA docs sites
  • preserve hash fragments in crawl identity and page storage while keeping normal anchor-link collapsing as the default behavior
  • upgrade fetch jobs to Playwright when hash preservation is enabled, and cover refresh reuse/override plus redirect/interception edge cases in tests
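
As a rough illustration of the second bullet, the difference between default anchor collapsing and hash preservation can be sketched as follows. This is not the code from this PR: the `crawlIdentity` helper, the `NormalizeOptions` type, and the `docs.example.com` URLs are hypothetical, and only the `preserveHashes` name is taken from the summary above.

```typescript
// Hypothetical option bag; only the `preserveHashes` name comes from this PR.
interface NormalizeOptions {
  preserveHashes?: boolean;
}

// Hypothetical helper: returns the string that identifies a page in the crawl queue.
function crawlIdentity(rawUrl: string, options: NormalizeOptions = {}): string {
  const url = new URL(rawUrl);
  if (!options.preserveHashes) {
    // Default behavior: treat the fragment as an in-page anchor and drop it.
    url.hash = "";
  }
  return url.href;
}

// Example links as they might appear on a hash-routed docs site (made-up URLs).
const links = [
  "https://docs.example.com/#/guide",
  "https://docs.example.com/#/api",
  "https://docs.example.com/",
];

// With the default, all three links collapse to a single crawl identity.
const collapsed = new Set(links.map((link) => crawlIdentity(link)));

// With preserveHashes, each hash route stays a distinct page.
const preserved = new Set(
  links.map((link) => crawlIdentity(link, { preserveHashes: true })),
);

console.log(collapsed.size); // 1
console.log(preserved.size); // 3
```

Keeping collapsing as the default means ordinary sites with in-page anchors are still stored once per page, while an opt-in flag lets hash-routed sites be discovered route by route.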

Validation

  • npm run typecheck
  • npm run lint
  • npx vitest run src/tools/ScrapeTool.test.ts src/cli/commands/scrape.test.ts src/cli/commands/refresh.test.ts src/pipeline/PipelineManager.test.ts src/scraper/strategies/BaseScraperStrategy.test.ts src/scraper/strategies/WebScraperStrategy.test.ts src/scraper/middleware/HtmlPlaywrightMiddleware.test.ts src/mcp/mcpServer.test.ts
  • npx vitest run src/store/DocumentStore.test.ts src/store/DocumentManagementService.test.ts src/tools/ListLibrariesTool.test.ts

Copilot AI left a comment
Pull request overview

Drafts an OpenSpec change plan to add opt-in support for preserving hash fragments during web crawling so hash-routed SPA documentation sites can be discovered and indexed correctly (issue #379).

Changes:

  • Adds OpenSpec proposal/design/spec for a --preserve-hashes / preserveHashes option.
  • Defines implementation tasks covering CLI/config wiring, URL normalization changes, Playwright interception adjustments, and tests.
  • Introduces OpenSpec metadata for the change package.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| openspec/changes/support-hash-routed-spas/.openspec.yaml | Registers the change package with schema + creation date. |
| openspec/changes/support-hash-routed-spas/design.md | Documents goals, decisions, and risks for hash-route preservation and Playwright interception behavior. |
| openspec/changes/support-hash-routed-spas/proposal.md | Explains motivation, high-level approach, and impact of the new opt-in flag. |
| openspec/changes/support-hash-routed-spas/specs/hash-routed-spa-support/spec.md | Adds requirements/scenarios for preserving hashes, enforcing Playwright, and interception matching. |
| openspec/changes/support-hash-routed-spas/tasks.md | Breaks the implementation into concrete steps across config/CLI, core scraper, middleware, docs, and tests. |
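
The "enforcing Playwright" requirement follows from how fragments work: a URL fragment is never sent to the server, so a plain HTTP fetch of two different hash routes returns the same HTML shell, and only client-side JavaScript renders the route content. A minimal sketch of such an upgrade rule is shown below. The `ScrapeMode` values, the `JobOptions` type, and the `effectiveScrapeMode` function are assumptions made for illustration and may not match the names used in this PR.

```typescript
// Hypothetical mode names, used only for illustration.
type ScrapeMode = "fetch" | "playwright" | "auto";

// Hypothetical job options; `preserveHashes` is the option name from this PR.
interface JobOptions {
  scrapeMode: ScrapeMode;
  preserveHashes: boolean;
}

// Hash routes need a real browser to render, so the requested mode is
// overridden whenever hash preservation is enabled.
function effectiveScrapeMode(options: JobOptions): ScrapeMode {
  if (options.preserveHashes) {
    return "playwright";
  }
  return options.scrapeMode;
}

console.log(effectiveScrapeMode({ scrapeMode: "fetch", preserveHashes: true })); // "playwright"
console.log(effectiveScrapeMode({ scrapeMode: "fetch", preserveHashes: false })); // "fetch"
```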

arabold changed the title from "fix(scraper): draft specs for hash-routed SPA support (#379)" to "fix(scraper): support hash-routed spa crawling (#379)" on Mar 30, 2026
arabold requested a review from Copilot on March 30, 2026 02:55
Copilot AI left a comment
Pull request overview

Copilot reviewed 41 out of 41 changed files in this pull request and generated 3 comments.