EVAL Skills
Machine-readable tool intelligence for AI agents and ML engineers
We eval the tools so you can ship the models.
EVAL Skill Packs are structured, machine-readable files that teach AI agents how to use ML/AI tools correctly. Each skill file contains everything an agent (or engineer) needs: when to use a tool, when NOT to use it, quick start commands, common patterns, configuration reference, pitfalls, and comparisons to alternatives.
Think of them as man pages for the AI age — but opinionated, practical, and designed to be consumed by both humans reading docs AND agents executing tasks.
AI coding agents are great at writing code. They're terrible at making tooling decisions. Ask an agent to "set up LLM serving" and you'll get a hallucinated config from 2023. These skill files fix that by giving agents:
- Current, tested knowledge — every skill is validated against the actual tool
- Decision logic — explicit "when to use" and "when NOT to use" triggers
- Copy-paste patterns — working code snippets, not pseudocode
- Gotcha awareness — the pitfalls that waste hours when you hit them
Browse the skills/ directory. Each .md file is a self-contained guide to one tool:
```
skills/
├── inference/          # LLM serving & inference engines
│   └── vllm-serving.md
├── data/               # Vector DBs, data pipelines, storage
│   └── qdrant-vector-db.md
├── orchestration/      # LLM frameworks, chaining, agents
│   └── langchain-lcel.md
├── tracking/           # Experiment tracking, observability
│   └── wandb-tracking.md
└── training/           # Fine-tuning, training frameworks
    └── axolotl-finetuning.md
```
Skill files are designed for agent ingestion. Each file has YAML frontmatter with structured metadata followed by markdown content:
```yaml
---
name: vllm
version: 0.7.3
category: inference
trigger: 'when the user needs to serve an LLM locally...'
updated: 2026-03-11
confidence: tested
eval_issue: 1
---
```

Agent integration pattern:
```python
import yaml
import pathlib


def load_skills(skills_dir: str = "skills") -> list[dict]:
    """Load all EVAL skill files into a list of structured dicts."""
    skills = []
    for path in pathlib.Path(skills_dir).rglob("*.md"):
        text = path.read_text()
        if text.startswith("---"):
            _, frontmatter, body = text.split("---", 2)
            meta = yaml.safe_load(frontmatter)
            meta["content"] = body.strip()
            meta["path"] = str(path)
            skills.append(meta)
    return skills


def find_skill(skills: list[dict], query: str) -> dict | None:
    """Find the most relevant skill for a user query.

    In production, use embedding similarity against the 'trigger' field.
    Simple keyword matching is shown here for clarity.
    """
    query_lower = query.lower()
    for skill in skills:
        trigger = skill.get("trigger", "").lower()
        if any(word in trigger for word in query_lower.split()):
            return skill
    return None


# Usage
skills = load_skills("skills")
skill = find_skill(skills, "I need to serve a Llama model")
if skill:
    print(f"Use {skill['name']} v{skill['version']}")
    print(skill["content"])
```

Using the trigger field for tool selection:
The trigger field is a natural-language description of when this tool is the right choice. Agents should match user intent against trigger fields to select the right skill. This works with:
- Embedding similarity — embed the trigger and the user's request, pick highest cosine similarity
- LLM routing — pass all triggers to a cheap model, ask it to pick the best match
- Keyword matching — simple but effective for explicit tool mentions
```python
import urllib.request
import yaml

RAW_BASE = "https://raw.githubusercontent.com/eval-report/skills/main/skills"


def fetch_skill(category: str, filename: str) -> dict:
    """Fetch a single skill file from GitHub."""
    url = f"{RAW_BASE}/{category}/{filename}"
    text = urllib.request.urlopen(url).read().decode()
    _, frontmatter, body = text.split("---", 2)
    meta = yaml.safe_load(frontmatter)
    meta["content"] = body.strip()
    return meta


# Fetch the vLLM skill on-demand
skill = fetch_skill("inference", "vllm-serving.md")
```

Every skill file follows the EVAL Skill Format Specification. The key components:
| Section | Purpose |
|---|---|
| YAML Frontmatter | Machine-readable metadata: name, version, category, trigger, confidence |
| When to Use | Bullet list of scenarios where this tool is the right choice |
| When NOT to Use | Bullet list of scenarios with better alternatives (with recommendations) |
| Quick Start | Minimal steps to get from zero to working — copy-paste ready |
| Common Patterns | Real-world usage patterns with complete code examples |
| Configuration Reference | Table of flags/options/parameters with defaults |
| Pitfalls & Gotchas | Things that will waste your time if you don't know about them |
| Compared To | Feature matrix against alternatives |
See SKILL_FORMAT.md for the complete specification.
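Putting the sections from the table together, a skill file skeleton might look like the sketch below. The section names are taken from this README; the heading levels and exact ordering here are assumptions, so treat SKILL_FORMAT.md as the authoritative layout.

```markdown
---
name: my-tool
version: 1.0.0
category: inference
trigger: 'when the user needs to ...'
updated: 2026-03-11
confidence: tested
eval_issue: 1
---

## When to Use
## When NOT to Use
## Quick Start
## Common Patterns
## Configuration Reference
## Pitfalls & Gotchas
## Compared To
```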
Use the validation script to check skill files against the format spec:
```bash
# Validate a single skill
python scripts/validate_skill.py skills/inference/vllm-serving.md

# Validate all skills
python scripts/validate_skill.py skills/

# Strict mode (warnings become errors)
python scripts/validate_skill.py --strict skills/
```

Requirements: Python 3.10+, pyyaml

```bash
pip install pyyaml
```

| Category | Directory | What's in it |
|---|---|---|
| Inference | skills/inference/ | LLM serving engines, inference optimization, model deployment |
| Data | skills/data/ | Vector databases, data pipelines, embeddings, storage |
| Orchestration | skills/orchestration/ | LLM frameworks, prompt chaining, agent frameworks |
| Tracking | skills/tracking/ | Experiment tracking, LLM observability, monitoring |
| Training | skills/training/ | Fine-tuning frameworks, training pipelines, RLHF |
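Since the `category` frontmatter field mirrors this directory layout, an agent that has loaded the skills (for example with the `load_skills` helper above) can filter by category in one line. A minimal sketch:

```python
def skills_in_category(skills: list[dict], category: str) -> list[dict]:
    """Return all loaded skills whose frontmatter category matches."""
    return [s for s in skills if s.get("category") == category]


# Example with hypothetical loaded metadata
skills = [
    {"name": "vllm", "category": "inference"},
    {"name": "qdrant", "category": "data"},
    {"name": "axolotl", "category": "training"},
]
print([s["name"] for s in skills_in_category(skills, "inference")])  # ['vllm']
```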
We welcome skill contributions! Here's how:
- Fork this repo
- Pick a tool that ML engineers actually use in production
- Copy the template:

  ```bash
  cp SKILL_FORMAT.md skills/<category>/your-tool.md
  ```

- Follow the format spec — every section matters
- Test your examples — all code snippets must work
- Submit a PR
Every skill must meet these criteria:
- Valid YAML frontmatter with all required fields
- `trigger` field accurately describes when to use the tool
- "When NOT to Use" section includes specific alternatives
- Quick Start goes from `pip install` to working output in <5 commands
- Code examples are complete and runnable (no `...` placeholders)
- Configuration table covers the 10 most-used options
- Pitfalls section includes at least 3 non-obvious gotchas
- Comparison table includes at least 2 alternatives
- Passes `python scripts/validate_skill.py --strict`
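The first checklist item can be spot-checked in a few lines. This is a rough sketch, not the actual `scripts/validate_skill.py`: the required field names are inferred from the frontmatter example earlier in this README, and a real validator should parse the YAML properly (e.g. with pyyaml) rather than scan for `key:` lines.

```python
REQUIRED_FIELDS = {"name", "version", "category", "trigger", "updated", "confidence"}


def check_frontmatter(text: str) -> list[str]:
    """Return a list of problems with a skill file's frontmatter (rough check)."""
    if not text.startswith("---"):
        return ["missing YAML frontmatter"]
    frontmatter = text.split("---", 2)[1]
    # Simplification: treat any 'key: value' line as defining a field
    keys = {line.split(":", 1)[0].strip() for line in frontmatter.splitlines() if ":" in line}
    missing = REQUIRED_FIELDS - keys
    return [f"missing field: {f}" for f in sorted(missing)]


# A hypothetical skill file that forgot its trigger
skill_text = (
    "---\nname: vllm\nversion: 0.7.3\ncategory: inference\n"
    "updated: 2026-03-11\nconfidence: tested\n---\nbody"
)
print(check_frontmatter(skill_text))  # ['missing field: trigger']
```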
Don't see the tool you need? Open a skill request.
EVAL is The AI Tooling Intelligence Report — a weekly newsletter for ML engineers making tooling decisions. We systematically track, test, and evaluate the tools that AI builders use in production. No hype. No sponsored reviews. Just honest, opinionated analysis.
- Newsletter: evalreport.com
- Twitter/X: @evalreport
- GitHub: eval-report
Each skill in this repo corresponds to a tool covered in the newsletter. The eval_issue field in the frontmatter links back to the newsletter issue where that tool was evaluated.
MIT — see LICENSE. Use these skills in your agents, products, and workflows. Attribution appreciated but not required.
Built by EVAL — we eval the tools so you can ship the models.