Developer documentation for the Open Paws AI ecosystem: vector database, prediction models, generative language models, workflow automations, and HuggingFace datasets. This is the central reference for anyone building AI tools for animal advocacy using Open Paws infrastructure.
No build step -- this is a pure documentation repo. Browse by topic:
- Vector database (Weaviate): `Knowledge/README.md`
- Prediction models: `Predictions/README.md`
- Language models (8B): `Generation/README.md`
- n8n workflow automations: `Automation/README.md`
- `Knowledge/`: Weaviate vector-graph database docs (connection, search, RAG, schema)
- `Predictions/`: HuggingFace text regression models (performance + preference prediction)
- `Generation/`: 8B language models (Llama 3.1 base, continual pre-training + instruct)
- `Automation/`: n8n workflow templates for advocacy automation
- `.github/`: Dependabot config + CI workflows

| File | Purpose |
|---|---|
| `Knowledge/README.md` | Weaviate connection details, search operations, RAG patterns, Content schema. Primary reference for querying the Open Paws vector-graph database |
| `Predictions/README.md` | HuggingFace text regression model usage: performance and preference prediction, batch processing patterns, score clipping |
| `Generation/README.md` | 8B language model usage: Llama 3.1 base with continual pre-training and instruct tuning, generation parameters, known limitations |
| `Automation/README.md` | n8n workflow automation: hosting options (cloud/self-hosted), workflow import/export, activation, advocacy-specific workflow templates |
| `Infrastructure/README.md` | Clean-room agent architecture reference: shared runtime patterns, tool registry, operator controls, and safety boundaries across Open Paws repos. Canonical source for the 2026-04-01 clean-room reuse decision |
| `README.md` | Human-facing overview: quick start, architecture summary, HuggingFace dataset links |
| `CONTRIBUTING.md` | Contribution guidelines for documentation PRs |
| `.gitleaksignore` | Secret scanning exclusions. Covers read-only Weaviate API keys that appear in code examples |

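
The `Predictions/README.md` entry above mentions batch processing and score clipping. As a rough illustration of what that combination can look like, here is a minimal sketch in plain Python. The names `batched`, `clip_score`, and `score_texts` are hypothetical, the `[0.0, 1.0]` range is an assumption rather than the documented score range, and `score_batch` stands in for whatever model call `Predictions/README.md` actually describes.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def clip_score(score: float, low: float = 0.0, high: float = 1.0) -> float:
    """Clamp a raw regression output into [low, high].

    The default bounds are an assumption for illustration only.
    """
    return max(low, min(high, score))


def score_texts(
    texts: Sequence[str],
    score_batch: Callable[[Sequence[str]], List[float]],
    batch_size: int = 8,
) -> List[float]:
    """Score `texts` batch by batch and clip every raw score.

    `score_batch` is a placeholder for the real model call.
    """
    scores: List[float] = []
    for batch in batched(texts, batch_size):
        scores.extend(clip_score(raw) for raw in score_batch(batch))
    return scores
```

Keeping the model call behind a plain callable means the batching and clipping logic can be checked without downloading a model.
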
| Service | Purpose | Docs |
|---|---|---|
| Weaviate Cloud | Vector-graph database (read-only access) | weaviate.io |
| HuggingFace | Model hosting + datasets | huggingface.co/open-paws |
| OpenAI API | Embeddings for Weaviate search | platform.openai.com |
| OpenRouter | LLM routing (used by downstream tools) | openrouter.ai |
| n8n | Workflow automation platform | docs.n8n.io |
- Conversational Fine-Tuning
- Continued Pre-Training
- Visual Q&A
- Animal Alignment Feedback
- Reasoning
- Tool Use
- Adding documentation: Create a new directory with a `README.md` following the existing pattern
- Code examples: Python with `transformers` or `weaviate` client libraries -- keep examples copy-pasteable
- Style: Each section should be self-contained with connection details, code samples, and best practices
Layer: 1 | Lever: Strengthen | Integration: Reference material for platform and ecosystem
This repo documents the AI infrastructure layer that Open Paws tools are built on. It is a reference for Guild developers, bootcamp students, and coalition partners building on the platform.
Settled decisions affecting this repo:
- 2026-04-01: Clean-room agent architecture. `documentation` owns the shared infrastructure note for the clean-room reuse decision. PR #7 merged (part of the 3/4 clean-room rollout completed 2026-04-09: PCC#13, platform#42, docs#7 merged). Remaining: Tools-Platform#1 repo name needs verification before the rollout is marked complete. See `closed-decisions.md`, entry 2026-04-01.
Current status (as of 2026-04-09): Active reference. Clean-room architecture PR #7 merged. The shared infrastructure note is live. Tools-Platform#1 is the outstanding item in the 4-repo rollout.
- DRY — AI clones code at 4x the human rate. Search before writing anything new
- Deep modules — Reject shallow wrappers and pass-through methods
- Single responsibility — Each function does one thing at one level of abstraction
- Error handling — Never catch-all
- Information hiding — Don't expose internal state. Mask API keys (last 4 chars only)
- Ubiquitous language — Use movement terminology consistently
- Design for change — Abstraction layers and loose coupling
- Legacy velocity — Use characterization tests before modifying existing code
- Over-patterning — Simplest structure that works
- Test quality — Every test must fail when the covered behavior breaks
- Desloppify: `desloppify scan --path .`, minimum score ≥85
- Speciesist language: `semgrep --config semgrep-no-animal-violence.yaml` on all docs edits
- Two-failure rule: After two failed fixes on the same problem, stop and restart
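
The information-hiding rule above asks for API keys to be masked so that only the last 4 characters are visible. A minimal sketch of one way to do that, assuming a hypothetical helper named `mask_api_key` (not an existing Open Paws utility):

```python
def mask_api_key(key: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters of `key` with '*'.

    Keys that are no longer than `visible` characters are masked entirely,
    so a short key is never shown in full.
    """
    if visible <= 0 or len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

A log line or error message would then include `mask_api_key(api_key)` in place of the raw value.
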
All 7 concerns apply. Highlighted critical ones:
- Privacy (critical): API keys documented here are read-only Weaviate keys. `.gitleaksignore` covers these. Never commit write-access keys. Check that `.gitleaksignore` is current before adding new examples.
- Security: Code examples must use environment variables for any keys. Never hardcode credentials.
- Advocacy domain: All documentation must use movement terminology. Examples should reference farmed animals and factory farms, not industry euphemisms.
- Accessibility: Documentation must work for developers on low-bandwidth connections. Avoid large embedded images.
- Emotional safety: If documentation examples include advocacy content (animal welfare data, investigation statistics), apply content warnings.
See CONTRIBUTING.md for the full list of required movement terminology and speciesist idioms to avoid.
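
The security concern above requires code examples to read keys from environment variables. One possible shape for that, using only the standard library, is sketched below. The helper `require_env`, the exception `MissingCredentialError`, and any variable name passed to it are illustrative assumptions, not names defined elsewhere in this repo.

```python
import os


class MissingCredentialError(RuntimeError):
    """Raised when a required credential is not set in the environment."""


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, stripped of whitespace.

    Raises MissingCredentialError instead of falling back to a default,
    so an example never runs silently with an empty or hardcoded key.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise MissingCredentialError(
            f"Set the {name} environment variable before running this example."
        )
    return value
```

An example would then start with something like `api_key = require_env("WEAVIATE_API_KEY")`, where the variable name is an assumption. Raising a specific exception also keeps the example in line with the rule against catch-all error handling.
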
For tool-specific AI coding instructions (Claude Code rules, Cursor MDC, Copilot, Windsurf, etc.), copy the corresponding directory from structured-coding-with-ai into this project root.
Last reviewed: 2026-04-11