Contributing to ScrapeAgent

The easiest way to contribute is to add a new Agent Skill — a Markdown file that teaches the agent how to scrape a specific website. No Python knowledge required.


Skill Naming Conventions

  • Use kebab-case, all lowercase: amazon-products, reddit-posts, linkedin-jobs
  • Include both the site name and the data type: hacker-news not just news
  • Keep skills focused and single-purpose:
    • reddit-posts and reddit-comments are better than one fat reddit skill
    • Smaller skills = less context loaded per turn = better agent performance
  • No consecutive hyphens, no leading/trailing hyphens (a regex check is sketched after this list)
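
A single regular expression covers the case and hyphen rules above; this is purely an illustrative check, not something ScrapeAgent runs for you:

```python
import re

# lowercase kebab-case: letters/digits separated by single hyphens,
# no leading, trailing, or consecutive hyphens
SKILL_NAME = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")

for name in ["amazon-products", "reddit-posts", "Reddit--Posts", "-news"]:
    print(name, "ok" if SKILL_NAME.fullmatch(name) else "rename it")
```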

SKILL.md Structure

Every skill is a directory under scrapeagent/skills/ containing at minimum a SKILL.md file.

scrapeagent/skills/
└── my-site-data/
    ├── SKILL.md          # required
    ├── references/       # optional: additional .md reference files
    └── assets/           # optional: templates, schemas, examples

SKILL.md format

---
name: my-site-data
description: |
  One to two sentences. Describe WHAT data it scrapes and WHEN to trigger it.
  Example: "Scrape product listings from example.com including name, price, and rating.
  Use when the user mentions example.com or wants product data."
metadata:
  author: your-github-handle
  version: "1.0"
---

# My Site Data Scraper

## Overview
What this skill does and the target URL.

## Parameters (if applicable)
- param: description

## URL Construction
How to build the URL from parameters.

## Tool to use
- `fetch` for static HTML (no JavaScript required)
- `playwright` for JavaScript-rendered pages

## Extraction Instructions

Step-by-step guide with exact CSS selectors. Example:

1. Fetch `https://example.com/listings`
2. For each `.product-card` element extract:
   - `name`: `.product-title` text
   - `price`: `.price` text, strip `$`
   - `rating`: `.stars` data-rating attribute
3. Save with columns: name, price, rating

## Notes
- Rate limits, authentication requirements, edge cases

Description guidelines

The description field is the only part of your skill loaded into the agent's context at startup (level 1 metadata). Make it count:

  • Say what site it scrapes and what data it returns
  • Include the trigger words a user might say ("when the user mentions X")
  • Keep it under 3 sentences
  • Bad: "Scrapes a website." Good: "Scrapes product listings from Amazon including title, price, ASIN, and rating. Use when the user mentions Amazon, product prices, or ASIN numbers."

Testing a Skill Locally

  1. Add your skill directory to scrapeagent/skills/
  2. Start the agent: adk web
  3. Ask the agent to perform the scrape your skill covers
  4. Verify the output file in ./output/ (a sanity-check script is sketched after this list)
  5. Check that the correct skill was triggered (the agent will mention it)
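
Assuming the skill writes a CSV into ./output/ (the template above saves columns like name, price, rating), a short throwaway script is enough for the verification step; adjust the glob and column names to whatever your skill actually produces:

```python
import csv
from pathlib import Path

EXPECTED_COLUMNS = {"name", "price", "rating"}  # the columns your skill promises

# grab the most recent file the agent wrote to ./output/
latest = max(Path("output").glob("*.csv"), key=lambda p: p.stat().st_mtime)

with latest.open(newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

print(f"{latest.name}: {len(rows)} rows")
missing = EXPECTED_COLUMNS - set(rows[0] if rows else [])
print("missing columns:", missing or "none")
```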

Context Window Guidelines

Keep skills focused and under 500 lines.

The ADK Skills system uses progressive disclosure: only the skill description loads at startup (~100 tokens), and the full SKILL.md body loads only when the agent activates the skill. Bloated skills degrade performance for every user because:

  • More tokens loaded per turn = less room for scraped HTML content
  • Large skills are less likely to be followed precisely
  • One skill per data type is better than one mega-skill per site
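
To make the mechanism concrete, here is a rough sketch of progressive disclosure in plain Python. This is not ADK's actual loader, only an illustration of why the description is the only part that costs context at startup:

```python
from pathlib import Path

SKILLS_DIR = Path("scrapeagent/skills")

def startup_index():
    """Startup: only the frontmatter (name + description) of each skill is read."""
    index = {}
    for skill_md in SKILLS_DIR.glob("*/SKILL.md"):
        text = skill_md.read_text(encoding="utf-8")
        # the frontmatter sits between the first two '---' fences
        frontmatter = text.split("---", 2)[1] if text.startswith("---") else ""
        index[skill_md.parent.name] = frontmatter.strip()
    return index  # roughly ~100 tokens per skill stays resident from here on

def activate(skill_name: str) -> str:
    """Activation: the full SKILL.md body enters the context only now."""
    text = (SKILLS_DIR / skill_name / "SKILL.md").read_text(encoding="utf-8")
    return text.split("---", 2)[2] if text.startswith("---") else text
```

Every extra line in the body is paid for only when the skill fires, and once it fires it competes with the scraped HTML for the same window, which is why the 500-line cap matters.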

PR Checklist

Before submitting a pull request with a new skill:

  • Skill name is kebab-case and clearly describes site + data type
  • description field covers what it scrapes AND when to trigger it
  • Extraction instructions include exact CSS selectors, tested against the live site (a quick check is sketched after this checklist)
  • Tool selection is correct (fetch vs playwright)
  • Edge cases are documented (empty fields, JS requirement, pagination)
  • Skill has been tested locally with adk web
  • No hardcoded credentials or API keys in any file
  • Skill body is under 500 lines
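
A quick way to cover the selector item above is to run the documented selectors against the live page before opening the PR. Neither requests nor beautifulsoup4 is a ScrapeAgent dependency, and this only works for static pages (JS-rendered sites need playwright); treat it as a throwaway check:

```python
import requests
from bs4 import BeautifulSoup

URL = "https://example.com/listings"                       # the URL your skill documents
SELECTORS = [".product-card", ".product-title", ".price"]  # the selectors it promises

html = requests.get(URL, timeout=30).text
soup = BeautifulSoup(html, "html.parser")
for selector in SELECTORS:
    print(f"{selector}: {len(soup.select(selector))} match(es)")
```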

Resources