docforge

structured documentation infrastructure for humans and agents.

docforge crawls, renders, versions, and caches live documentation — turning messy, hard-to-scrape docs into clean, searchable artifacts that both humans and agents can reason over.

agents get a stable http interface.
humans get a readable, inspectable UI.
docs stop being scraped repeatedly and start being infrastructure.


why this exists

live documentation is one of the worst inputs for agents:

  • js-heavy pages
  • inconsistent structure
  • high token cost to scrape repeatedly
  • no versioning or freshness guarantees

docforge fixes this by:

  • rendering docs once (properly)
  • extracting structure, not just text
  • storing versioned docsets with diffs
  • exposing a deterministic api agents can trust
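
the "versioned docsets with diffs" idea can be sketched in a few lines. this is a minimal illustration, not docforge's actual storage layer: `DocsetStore`, `put`, and `diff` are hypothetical names, and an in-memory dict stands in for the real database. a new version is only created when the content hash changes, which is what lets repeated crawls of an unchanged page cost nothing.

```python
import difflib
import hashlib
from dataclasses import dataclass, field


@dataclass
class DocsetStore:
    """Hypothetical in-memory store: a list of (sha256, text) versions per url."""

    versions: dict[str, list[tuple[str, str]]] = field(default_factory=dict)

    def put(self, url: str, text: str) -> tuple[int, bool]:
        """Store text for url. Returns (version number, True if a new version was created)."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        history = self.versions.setdefault(url, [])
        if history and history[-1][0] == digest:
            # Content is unchanged since the latest version: do not store a duplicate.
            return len(history), False
        history.append((digest, text))
        return len(history), True

    def diff(self, url: str, old: int, new: int) -> list[str]:
        """Unified diff between two 1-based version numbers of the same url."""
        history = self.versions[url]
        before = history[old - 1][1].splitlines()
        after = history[new - 1][1].splitlines()
        return list(difflib.unified_diff(before, after, f"v{old}", f"v{new}", lineterm=""))
```

usage, assuming a page whose install line changes between crawls: the first `put` creates v1, an identical second `put` is a no-op, and a changed third `put` creates v2, after which `store.diff(url, 1, 2)` lists the removed and added lines.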

what docforge is

  • an agent-native api for documentation
  • a docset store with versioning + freshness
  • a human-readable ui to inspect what agents actually see
  • infra, not a chatbot

architecture (high level)

  • next.js
    human ui + api gateway (agent entrypoint)

  • fastapi workers + playwright
    render + crawl js-heavy documentation

  • redis + bullmq
    async ingestion and crawling jobs

  • postgres
    metadata, versions, chunks, search

  • s3-compatible storage (r2 / minio)
    raw snapshots + extracted artifacts
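
to make the data flow between these pieces concrete, here is a rough single-process sketch of one ingestion pass. everything in it is an assumption for illustration: `queue.Queue` stands in for redis + bullmq, `fake_render` stands in for a playwright-backed renderer, and two plain dicts stand in for s3-compatible storage and postgres. none of these names come from docforge itself.

```python
import hashlib
import queue
from typing import Callable

# Hypothetical stand-ins for the real services.
jobs: "queue.Queue[str]" = queue.Queue()   # job queue (redis + bullmq in the real system)
snapshots: dict[str, str] = {}             # object key -> raw rendered html (s3 / r2 / minio)
metadata: dict[str, dict] = {}             # url -> snapshot record (postgres)


def fake_render(url: str) -> str:
    """Placeholder renderer; a real worker would drive a headless browser here."""
    return f"<h1>{url}</h1><p>hello</p>"


def run_worker(render: Callable[[str], str]) -> int:
    """Drain the queue: render each url, store the snapshot, record metadata.

    Returns the number of jobs processed.
    """
    processed = 0
    while True:
        try:
            url = jobs.get_nowait()
        except queue.Empty:
            return processed
        html = render(url)
        # Content-addressed key, so identical renders map to the same stored object.
        key = hashlib.sha256(html.encode("utf-8")).hexdigest()
        snapshots[key] = html
        metadata[url] = {"snapshot_key": key, "bytes": len(html)}
        processed += 1
```

usage: enqueue a url with `jobs.put("https://example.com/docs")`, then call `run_worker(fake_render)`. the api layer would then only ever read from `metadata` and `snapshots`, never from the live site.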