Spider - Web Article Parser

Spider is a web article parser that strips the clutter from web pages and returns clean, readable content.

Visit spider.jlopes.eu to try it out!

Parsing Strategies

Spider supports multiple parsing strategies optimized for different types of websites:

  • auto (default): Automatically selects the best strategy based on domain
  • googlebot: Mimics Google's crawler for general content
  • facebook: Uses Facebook's link-preview crawler user agent (facebookexternalhit)
  • archive: Optimized for archived or paywalled content
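
To make the idea of strategy selection concrete, here is a minimal sketch of how a strategy name could be resolved and turned into request options. It is not Spider's actual implementation: the function names (`resolveStrategy`, `buildRequest`), the domain table, the fallback to `googlebot`, and the use of a Wayback Machine URL for `archive` are all assumptions made for illustration. Only the two user-agent strings are the publicly documented ones for Googlebot and Facebook's crawler.

```javascript
// Hypothetical sketch: none of these names are taken from Spider's source.

// Publicly documented crawler user-agent strings.
const USER_AGENTS = {
  googlebot: "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
  facebook: "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)",
};

// Assumed per-domain preferences consulted by "auto" (placeholder domains).
const DOMAIN_STRATEGIES = new Map([
  ["news.example.com", "facebook"],
  ["paywalled.example.org", "archive"],
]);

const KNOWN_STRATEGIES = ["auto", "googlebot", "facebook", "archive"];

// Resolve "auto" to a concrete strategy based on the URL's hostname.
function resolveStrategy(url, requested = "auto") {
  if (!KNOWN_STRATEGIES.includes(requested)) {
    throw new Error(`Unknown strategy: ${requested}`);
  }
  if (requested !== "auto") return requested;
  const host = new URL(url).hostname.replace(/^www\./, "");
  // Assumption: fall back to googlebot when no domain rule matches.
  return DOMAIN_STRATEGIES.get(host) ?? "googlebot";
}

// Turn a strategy into the URL and headers to fetch with.
function buildRequest(url, requested = "auto") {
  const strategy = resolveStrategy(url, requested);
  if (strategy === "archive") {
    // Assumption: "archive" reads a Wayback Machine snapshot of the page.
    return { strategy, url: `https://web.archive.org/web/${url}`, headers: {} };
  }
  return { strategy, url, headers: { "User-Agent": USER_AGENTS[strategy] } };
}
```

With these assumptions, `buildRequest("https://news.example.com/story")` would select the `facebook` strategy, while passing an explicit second argument such as `"googlebot"` bypasses the domain lookup entirely.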

Custom Parsers

You can extend Spider with custom parsers for specific domains by modifying the functions/node-fetch/node-fetch.mjs file.
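
As a rough illustration of what a domain-specific parser could look like, the sketch below keeps a table of parser functions keyed by hostname and falls back to a generic parser when no entry matches. The actual structure of functions/node-fetch/node-fetch.mjs is not described here, so every name in this example (`customParsers`, `defaultParser`, `parseArticle`) and the placeholder domain are hypothetical. The regex-based extraction is deliberately crude and stands in for whatever HTML handling the real file uses.

```javascript
// Hypothetical sketch; the real layout of node-fetch.mjs may differ.

// Domain-specific parsers keyed by hostname (placeholder domain).
const customParsers = new Map([
  [
    "blog.example.com",
    // Keep only the contents of the first <article> element, if any.
    (html) => {
      const match = html.match(/<article[^>]*>([\s\S]*?)<\/article>/i);
      return match ? match[1].trim() : null;
    },
  ],
]);

// Generic fallback: return the <body> contents, or the whole input.
function defaultParser(html) {
  const match = html.match(/<body[^>]*>([\s\S]*?)<\/body>/i);
  return match ? match[1].trim() : html.trim();
}

// Use the custom parser for the URL's domain when one exists and
// it returns a result; otherwise use the generic parser.
function parseArticle(url, html) {
  const host = new URL(url).hostname.replace(/^www\./, "");
  const parser = customParsers.get(host);
  const result = parser ? parser(html) : null;
  return result ?? defaultParser(html);
}
```

Under this layout, supporting a new site would mean adding one more entry to `customParsers`; the fallback keeps a custom parser that finds nothing from producing an empty result.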

Acknowledgments