
Article-Reader

A Claude Code skill that fetches web articles and saves them as clean Markdown files locally. It auto-detects the URL type and picks the best scraping strategy.

Supported Sources

Source | Strategy
--- | ---
X (Twitter) posts & articles | fxtwitter API, falls back to Playwright
WeChat Official Account articles | Playwright (mobile UA)
Everything else | Jina Reader, falls back to Playwright
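The routing in the table above comes down to a hostname check. The sketch below is a minimal illustration under stated assumptions: the function name `pick_strategy` and the strategy labels are hypothetical and are not taken from the skill itself.

```python
from urllib.parse import urlparse

# Hypothetical strategy labels; the skill's real internal names may differ.
X_HOSTS = {"x.com", "twitter.com"}
WECHAT_HOSTS = {"mp.weixin.qq.com"}


def pick_strategy(url: str) -> str:
    """Map a URL to a scraping strategy based on its hostname."""
    host = (urlparse(url).hostname or "").lower()
    # Accept the bare domain or any subdomain (www., mobile., ...) of x.com / twitter.com.
    if any(host == h or host.endswith("." + h) for h in X_HOSTS):
        return "fxtwitter"
    if host in WECHAT_HOSTS:
        return "wechat-playwright"
    # Everything else goes to Jina Reader (with a Playwright fallback downstream).
    return "jina"


for u in [
    "https://x.com/elonmusk/status/123456789",  # -> fxtwitter
    "https://mp.weixin.qq.com/s/xxxxx",         # -> wechat-playwright
    "https://example.com/some-article",         # -> jina
]:
    print(u, "->", pick_strategy(u))
```

Matching on the parsed hostname rather than a substring avoids misrouting URLs such as `https://notx.com/...`.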

Features

  • Auto URL routing - detects x.com, twitter.com, mp.weixin.qq.com, or generic URLs and applies the right scraper
  • English-to-Chinese translation - automatically detects English articles and translates them to Chinese before saving
  • Clean Markdown output - preserves headings, bold/italic, links, images, blockquotes, code blocks, and lists
  • Configurable save path - asks for a directory on first use and remembers it for all subsequent fetches
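The README does not say how the skill decides that an article is in English. One simple way to make that call is to compare Latin letters against CJK ideographs. The sketch below is a hypothetical heuristic, and the `looks_english` name and the 0.8 threshold are assumptions, not the skill's actual logic.

```python
import re

# CJK Unified Ideographs (basic block) and plain ASCII letters.
CJK = re.compile(r"[\u4e00-\u9fff]")
LATIN = re.compile(r"[A-Za-z]")


def looks_english(text: str, threshold: float = 0.8) -> bool:
    """Rough guess: True when Latin letters dominate over CJK ideographs.

    The threshold is a hypothetical default, not a value from the skill.
    """
    latin = len(LATIN.findall(text))
    cjk = len(CJK.findall(text))
    total = latin + cjk
    if total == 0:
        return False  # nothing to judge, so skip translation
    return latin / total >= threshold
```

Counting letters rather than words biases the ratio toward English (one English word contributes several letters), so a Chinese article that quotes a few English terms can still fall under the threshold.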

Prerequisites

  • Claude Code CLI
  • Python 3.8+
  • Playwright (for WeChat and fallback scraping):
    pip install playwright
    playwright install chromium

Installation

Copy the Article-Reader folder into your Claude Code skills directory:

~/.claude/skills/Article-Reader/
├── SKILL.md
└── scripts/
    ├── scrape_tweet.py
    └── fetch_wechat.py

Or place it anywhere and register it in your Claude Code configuration.

Usage

In Claude Code, say 帮我读一下 ("help me read this") followed by the article URL, for example:

帮我读一下 https://mp.weixin.qq.com/s/xxxxx
帮我读一下 https://x.com/elonmusk/status/123456789
帮我读一下 https://example.com/some-article

The skill will:

  1. Ask for a save directory (first time only)
  2. Detect the URL type and fetch the content
  3. Translate to Chinese if the article is in English
  4. Save as {title}.md to your chosen directory
  5. Show a preview with title, author, and the first 500 characters
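Steps 4 and 5 can be sketched as below. This is an illustration under assumptions: the `safe_filename` and `save_article` helpers are hypothetical, and so are the filename sanitizing rule and the `# {title}` header written at the top of the file. The README only specifies the `{title}.md` name and a preview of the title, author, and first 500 characters.

```python
import re
from pathlib import Path


def safe_filename(title: str, max_len: int = 100) -> str:
    """Replace characters that common filesystems reject (hypothetical rule)."""
    cleaned = re.sub(r'[\\/:*?"<>|\r\n]+', "_", title).strip(" .")
    return (cleaned or "untitled")[:max_len]


def save_article(save_dir: str, title: str, author: str, body: str) -> Path:
    """Write {title}.md into save_dir and print a short preview."""
    out_dir = Path(save_dir).expanduser()
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{safe_filename(title)}.md"
    path.write_text(f"# {title}\n\n{body}\n", encoding="utf-8")
    # Preview: title, author, then the first 500 characters of the body.
    print(f"Title: {title}\nAuthor: {author}\nSaved to: {path}")
    print(body[:500])
    return path
```

Sanitizing the title matters because article titles often contain characters such as `/`, `:`, or `?` that are not valid in filenames on every platform.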

How It Works

X (Twitter)

scripts/scrape_tweet.py first calls the fxtwitter API (api.fxtwitter.com) to get tweet data, including long-form article content, images, and engagement metrics. If the API call fails, it falls back to scraping the page with a headless Playwright browser.
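The API-first, browser-fallback flow can be sketched as follows. This is a minimal sketch, not the contents of scrape_tweet.py. It assumes the fxtwitter status endpoint has the form `https://api.fxtwitter.com/<user>/status/<id>` and that the JSON response carries the post under a `tweet` key. The function names are hypothetical, and the Playwright fallback is passed in as a plain callable so the control flow can be shown without a browser.

```python
import json
import urllib.request
from urllib.parse import urlparse


def fxtwitter_api_url(tweet_url: str) -> str:
    """Turn https://x.com/<user>/status/<id> into the (assumed) fxtwitter API URL."""
    parts = [p for p in urlparse(tweet_url).path.split("/") if p]
    # Expected path shape: [<user>, "status", <id>, ...]
    if len(parts) < 3 or parts[1] != "status":
        raise ValueError(f"not a tweet URL: {tweet_url}")
    return f"https://api.fxtwitter.com/{parts[0]}/status/{parts[2]}"


def fetch_api(tweet_url: str, timeout: float = 15.0) -> dict:
    """Call the API and return the post object (response key is an assumption)."""
    with urllib.request.urlopen(fxtwitter_api_url(tweet_url), timeout=timeout) as resp:
        data = json.load(resp)
    return data["tweet"]


def fetch_tweet(tweet_url, api=fetch_api, fallback=None) -> dict:
    """Try the API first; on any error, hand off to a browser-based fallback."""
    try:
        return api(tweet_url)
    except Exception:
        if fallback is None:
            raise  # no fallback configured, so surface the original error
        return fallback(tweet_url)
```

Injecting `api` and `fallback` as parameters keeps the retry logic separate from the network and browser code, which also makes it easy to test offline.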

WeChat

scripts/fetch_wechat.py uses Playwright with a mobile Safari user agent to render the WeChat article page, then walks the DOM tree to convert HTML into structured Markdown while preserving formatting.
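The DOM-walking step can be illustrated with a tree walk over HTML. The sketch below swaps in Python's standard `html.parser` instead of walking a Playwright-rendered DOM, and it covers only a small subset of tags (headings, paragraphs, bold/italic, links, images, line breaks). Preferring `data-src` over `src` for images is an assumption about lazy-loaded pages. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class MarkdownWalker(HTMLParser):
    """Convert a small subset of HTML tags to Markdown as the parser walks the tree."""

    INLINE = {"strong": "**", "b": "**", "em": "*", "i": "*"}

    def __init__(self):
        super().__init__()
        self.out = []
        self.href = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag in ("h1", "h2", "h3"):
            self.out.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag == "p":
            self.out.append("\n\n")
        elif tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag == "a":
            self.href = a.get("href")
            self.out.append("[")
        elif tag == "img":
            # Assumption: lazy-loaded images keep the real URL in data-src.
            src = a.get("data-src") or a.get("src") or ""
            self.out.append(f"![]({src})")
        elif tag == "br":
            self.out.append("\n")

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag == "a":
            self.out.append(f"]({self.href or ''})")
            self.href = None

    def handle_data(self, data):
        self.out.append(data)

    def markdown(self) -> str:
        return "".join(self.out).strip()


def html_to_markdown(html: str) -> str:
    walker = MarkdownWalker()
    walker.feed(html)
    walker.close()
    return walker.markdown()
```

A full converter would also need blockquotes, code blocks, and nested lists. Those need a small stack to track nesting, which is omitted here to keep the sketch short.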

Generic URLs

For all other URLs, the skill uses Jina Reader, prepending the r.jina.ai prefix to the article URL to extract its content. If Jina fails or returns incomplete content, the skill falls back to Playwright.
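The Jina-then-Playwright logic can be sketched as below. This is a hedged sketch under assumptions. The full prefix is taken to be `https://r.jina.ai/`. The `looks_incomplete` check and its 500-character threshold are hypothetical stand-ins for whatever "incomplete" means in the skill. The Playwright fallback is passed in as a callable.

```python
import urllib.request

JINA_PREFIX = "https://r.jina.ai/"


def jina_url(url: str) -> str:
    """Jina Reader is invoked by prefixing the target URL."""
    return JINA_PREFIX + url


def fetch_via_jina(url: str, timeout: float = 30.0) -> str:
    with urllib.request.urlopen(jina_url(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def looks_incomplete(text: str, min_chars: int = 500) -> bool:
    """Hypothetical check: treat very short output as a failed extraction."""
    return len(text.strip()) < min_chars


def fetch_generic(url, jina=fetch_via_jina, fallback=None, min_chars=500) -> str:
    """Try Jina Reader; fall back when it errors or the result looks incomplete."""
    try:
        text = jina(url)
        if not looks_incomplete(text, min_chars):
            return text
    except Exception:
        pass  # network or HTTP error: fall through to the browser fallback
    if fallback is None:
        raise RuntimeError(f"Jina Reader failed for {url} and no fallback was given")
    return fallback(url)
```

Both failure modes (an exception and a suspiciously short result) lead to the same fallback path, which mirrors the two conditions named in the paragraph above.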

Scripts

Both scripts can also be used standalone:

# Fetch a tweet
python3 scripts/scrape_tweet.py <tweet_url> [output_dir]

# Fetch a WeChat article
python3 scripts/fetch_wechat.py <article_url> [output_dir]
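The usage lines above imply a command-line contract of one required URL and an optional output directory. The sketch below expresses that contract with `argparse`. It is hypothetical: the README does not say what happens when `output_dir` is omitted, so the `"."` default is an assumption.

```python
import argparse


def parse_cli(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Fetch an article and save it as Markdown.")
    parser.add_argument("url", help="tweet or WeChat article URL")
    # Assumed default: the README does not state behavior when output_dir is omitted.
    parser.add_argument("output_dir", nargs="?", default=".",
                        help="directory to save the .md file")
    return parser.parse_args(argv)
```

Using `nargs="?"` makes the second positional argument optional, matching the `[output_dir]` notation in the usage lines.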

License

MIT