docforge

structured documentation infrastructure for humans and agents.

docforge crawls, renders, versions, and caches live documentation — turning messy, hard-to-scrape docs into clean, searchable artifacts that both humans and agents can reason over.

agents get a stable http interface.
humans get a readable, inspectable UI.
docs stop being scraped repeatedly and start being infrastructure.

why this exists

live documentation is one of the worst inputs for agents:

js-heavy pages
inconsistent structure
high token cost to scrape repeatedly
no versioning or freshness guarantees

docforge fixes this by:

rendering docs once in a real browser (playwright), so js-heavy pages are captured fully
extracting structure, not just text
storing versioned docsets with diffs
exposing a deterministic api agents can trust
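
as a rough illustration of "versioned docsets with diffs", here is a minimal in-memory sketch. it assumes a version is created only when the rendered text's hash changes, and that diffs are line-based. `Docset`, `DocsetVersion`, `ingest`, and `diff` are hypothetical names for this example, not docforge's actual schema or api.

```python
import difflib
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DocsetVersion:
    number: int
    content_hash: str
    text: str


@dataclass
class Docset:
    """Hypothetical in-memory docset; a real store would persist versions."""

    url: str
    versions: List[DocsetVersion] = field(default_factory=list)

    def ingest(self, rendered_text: str) -> Optional[DocsetVersion]:
        """Store a new version only if the content changed; else return None."""
        digest = hashlib.sha256(rendered_text.encode("utf-8")).hexdigest()
        if self.versions and self.versions[-1].content_hash == digest:
            return None  # unchanged content: no new version
        version = DocsetVersion(len(self.versions) + 1, digest, rendered_text)
        self.versions.append(version)
        return version

    def diff(self, old: int, new: int) -> List[str]:
        """Return a unified line diff between two 1-based version numbers."""
        a = self.versions[old - 1].text.splitlines()
        b = self.versions[new - 1].text.splitlines()
        return list(difflib.unified_diff(a, b, f"v{old}", f"v{new}", lineterm=""))
```

hashing before storing is what makes repeated crawls cheap in this sketch: an unchanged page yields no new version, so consumers only ever see real changes.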

what docforge is

an agent-native api for documentation
a docset store with versioning + freshness
a human-readable ui to inspect what agents actually see
infra, not a chatbot

architecture (high level)

next.js: human ui + api gateway (agent entrypoint)
fastapi workers + playwright: render + crawl js-heavy documentation
redis + bullmq: async ingestion and crawling jobs
postgres: metadata, versions, chunks, search
s3-compatible storage (r2 / minio): raw snapshots + extracted artifacts
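
to make the data flow between these components concrete, here is a single-process sketch of one ingestion pass. everything in it is a stand-in: a `deque` plays the job queue, `fake_render` returns canned html where a playwright render would go, and two dicts play object storage and postgres. `SectionExtractor`, `fake_render`, and `run_pipeline` are hypothetical names, and splitting on headings is just one assumed way to extract structure.

```python
from collections import deque
from html.parser import HTMLParser


class SectionExtractor(HTMLParser):
    """Split rendered HTML into sections keyed by their h1-h3 heading."""

    HEADINGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.sections = []  # each item: {"heading": str, "text": str}
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._in_heading = True
            self.sections.append({"heading": "", "text": ""})

    def handle_endtag(self, tag):
        if tag in self.HEADINGS:
            self._in_heading = False

    def handle_data(self, data):
        if not self.sections:
            return  # ignore text that appears before the first heading
        key = "heading" if self._in_heading else "text"
        self.sections[-1][key] += data


def fake_render(url: str) -> str:
    """Stand-in for the render step; returns the same canned HTML for any url."""
    return (
        "<h1>install</h1><p>pip install foo</p>"
        "<h2>usage</h2><p>run <code>foo</code></p>"
    )


def run_pipeline(urls):
    queue = deque(urls)  # stand-in for the job queue
    snapshots = {}       # stand-in for object storage (raw snapshots)
    chunks = {}          # stand-in for the database (extracted chunks)
    while queue:
        url = queue.popleft()
        html = fake_render(url)
        snapshots[url] = html
        parser = SectionExtractor()
        parser.feed(html)
        parser.close()
        # collapse whitespace so chunks are stable across renders
        chunks[url] = [
            {key: " ".join(value.split()) for key, value in section.items()}
            for section in parser.sections
        ]
    return snapshots, chunks
```

keeping the raw snapshot next to the extracted chunks mirrors the split above: the snapshot lets extraction be re-run later without re-crawling, while the chunks are what search and agents would read.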