# FixtureForge

Agentic test data harness for Python. Generate realistic, context-aware fixtures — deterministic in CI, AI-powered in development.
```python
# This is what most test data looks like:
user = User(name="Test User", email="test@test.com", bio="Lorem ipsum...")

# It doesn't catch real-world edge cases.
# It doesn't feel like production data.
# And writing 500 of them by hand? Not happening.
```

FixtureForge solves this in two modes:
```python
# CI mode — deterministic, zero AI, seed-controlled. Same seed = same data. Always.
forge = Forge(use_ai=False, seed=42)
users = forge.create_batch(User, count=500)
```
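The CI-mode guarantee (same seed = same data) is ordinary seeded-PRNG behavior. A minimal stdlib sketch of the pattern — all names here are hypothetical illustrations, not FixtureForge internals:

```python
import random

def make_users(count: int, seed: int) -> list[dict]:
    """Deterministic fixtures: one PRNG seeded once, consumed in order."""
    rng = random.Random(seed)  # private generator; global random state untouched
    first = ["Ada", "Grace", "Alan", "Edsger"]
    last = ["Lovelace", "Hopper", "Turing", "Dijkstra"]
    return [
        {"id": i, "name": f"{rng.choice(first)} {rng.choice(last)}",
         "email": f"user{i}@example.com"}
        for i in range(1, count + 1)
    ]

# Same seed = same data. Always.
assert make_users(5, seed=42) == make_users(5, seed=42)
```

Using a private `random.Random(seed)` rather than the global `random.seed()` keeps fixtures reproducible even when other test code touches the global PRNG.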
```python
# Dev mode — AI-generated, context-aware, realistic
forge = Forge()
reviews = forge.create_batch(Review, count=50, context="angry holiday customers")
```

## Installation

```shell
pip install fixtureforge
```

With your preferred AI provider:
```shell
pip install "fixtureforge[anthropic]"  # Claude
pip install "fixtureforge[openai]"     # GPT
pip install "fixtureforge[gemini]"     # Google Gemini
pip install "fixtureforge[all]"        # All providers
```

## Quick Start

```python
from fixtureforge import Forge
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str
    bio: str

forge = Forge()  # auto-detects provider from env vars
users = forge.create_batch(User, count=50, context="SaaS platform users")
```

That's it. FixtureForge:
- Assigns sequential IDs automatically
- Generates `name` and `email` with Faker (zero API cost)
- Sends only `bio` to the AI — in a single batch call for all 50 records
## Field Tiers

Every field is classified into a tier. Only semantic fields hit the AI:

| Tier | Fields | Generator | Cost |
|---|---|---|---|
| Structural | `id`, `user_id`, `order_id` | Internal counters / FK registry | Free |
| Standard | `name`, `email`, `phone`, `address`, `date` | Faker | Free |
| Computed | `@computed_field` properties | Pydantic | Free |
| Semantic | `bio`, `description`, `review`, `message` | LLM (batched) | API tokens |
100 users with 2 semantic fields = 2 API calls, not 200.
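That arithmetic works because values are batched per semantic *field*, not per record. A minimal sketch of the idea — function names are hypothetical, and `fake_llm` stands in for a real provider call:

```python
def generate_semantic_fields(records: list[dict], semantic_fields: list[str],
                             call_llm) -> int:
    """Fill every semantic field with ONE call per field, not one per record.
    Returns the number of LLM calls made."""
    calls = 0
    for field in semantic_fields:
        # One batched prompt asks for len(records) values at once.
        values = call_llm(field, len(records))
        calls += 1
        for record, value in zip(records, values):
            record[field] = value
    return calls

def fake_llm(field: str, n: int) -> list[str]:
    """Stand-in for a provider call: returns n canned values."""
    return [f"{field} #{i}" for i in range(n)]

records = [{"id": i} for i in range(100)]
calls = generate_semantic_fields(records, ["bio", "review"], fake_llm)
assert calls == 2  # 100 records × 2 semantic fields → 2 calls, not 200
```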
```python
# CI — fully deterministic, no network, reproducible
forge = Forge(use_ai=False, seed=42)

# Dev — AI-powered, realistic context
forge = Forge(provider_name="anthropic", model="claude-haiku-4-5-20251001")

# Large datasets — seed + interpolation, constant cost regardless of count
forge.create_large(Order, count=100_000, seed_ratio=0.01)  # pays for ~1k, delivers 100k
```

## Verbose Mode

See exactly where each value comes from:
```python
forge = Forge(seed=42, verbose=True)
user = forge.create(User)
# [structural] id = 1
# [faker]      name = 'Allison Hill'
# [faker]      email = 'donaldgarcia@example.net'
# [ai]         bio = 'Passionate developer with 8 years...'
```

## Providers

FixtureForge auto-detects your provider from environment variables:
```shell
export ANTHROPIC_API_KEY=...  # → Claude (default: claude-haiku-4-5-20251001)
export OPENAI_API_KEY=...     # → GPT (default: gpt-4o-mini)
export GOOGLE_API_KEY=...     # → Gemini (default: gemini-2.0-flash)
export GROQ_API_KEY=...       # → Groq (default: llama-3.3-70b-versatile)
# No key? → Ollama (localhost:11434) → Deterministic-only
```

Or be explicit:
```python
forge = Forge(provider_name="anthropic", model="claude-sonnet-4-6")
forge = Forge(provider_name="ollama", model="llama3.2")
forge = Forge(use_ai=False)  # zero cost, zero network
```

## Foreign Keys

Register parent records first — child FKs resolve automatically:
```python
# Step 1: generate customers
customers = forge.create_batch(Customer, count=10)

# Step 2: orders automatically reference real customer IDs
orders = forge.create_batch(Order, count=100)
# order.customer_id → always a valid customer.id
```

## Data Swarms

Generate multiple models in parallel with a shared AI cache. The first model warms the cache; every subsequent model inherits it (~90% cheaper per model).
```python
results = forge.swarm(
    models=[User, Order, Product, Payment],
    counts=[10, 50, 100, 30],
    contexts=["SaaS users", "E-commerce orders", None, None],
)
# returns:
# {
#   "User": [...10 users...],
#   "Order": [...50 orders...],
#   "Product": [...100 products...],
#   "Payment": [...30 payments...],
# }
```

5 models ≈ the cost of 1.5 models.
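The swarm economics follow from one cache shared across every model in the run. A toy sketch of the idea — the class and names are hypothetical, and each cache miss stands in for a paid API call:

```python
class SharedCache:
    """One cache shared by every model in a swarm run."""
    def __init__(self):
        self.store: dict[str, str] = {}
        self.misses = 0  # each miss represents one paid API call

    def get_or_generate(self, prompt: str, generate) -> str:
        if prompt not in self.store:
            self.misses += 1
            self.store[prompt] = generate(prompt)
        return self.store[prompt]

def gen(prompt: str) -> str:
    return prompt.upper()  # stand-in for a real generation call

cache = SharedCache()
# The "first model" warms the cache with three prompts...
for p in ("tone", "locale", "industry"):
    cache.get_or_generate(p, gen)
# ...and four subsequent "models" reuse the shared entries for free.
for _ in range(4):
    for p in ("tone", "locale", "industry"):
        cache.get_or_generate(p, gen)
assert cache.misses == 3  # 5 models, but each prompt was paid for once
```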
## Permissions

FixtureForge classifies models by data sensitivity and gates dangerous operations:
```python
class SafeUser(BaseModel):
    id: int
    name: str           # SAFE — auto-approved

class CustomerProfile(BaseModel):
    id: int
    ssn: str            # SENSITIVE — requires FORGE_ALLOW_PII=1
    salary: float       # SENSITIVE

class SecurityTest(BaseModel):
    id: int
    sql_injection: str  # DANGEROUS — requires interactive confirmation
```

```python
# PII auto-approved
forge = Forge(allow_pii=True)

# CI/headless — dangerous ops silently rejected
forge = Forge(interactive=False)
```

Three levels: safe (auto) → sensitive (env gate) → dangerous (human prompt).
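The three tiers can be pictured as a single gate function. This is an illustrative sketch — the field lists and names are hypothetical, not FixtureForge's implementation:

```python
import os

SENSITIVE = {"ssn", "salary"}       # hypothetical tier assignments
DANGEROUS = {"sql_injection"}

def gate(field: str, *, allow_pii: bool = False, interactive: bool = True,
         confirm=input) -> bool:
    """Return True if generating `field` is permitted under the three tiers."""
    if field in DANGEROUS:
        # Dangerous ops need a human; headless runs reject them silently.
        if not interactive:
            return False
        return confirm(f"Generate dangerous field {field!r}? [y/N] ").lower() == "y"
    if field in SENSITIVE:
        # Sensitive ops pass via the env gate or an explicit opt-in.
        return allow_pii or os.environ.get("FORGE_ALLOW_PII") == "1"
    return True  # safe tier — auto-approved

os.environ.pop("FORGE_ALLOW_PII", None)      # make the demo deterministic
assert gate("name")                          # safe
assert not gate("ssn")                       # sensitive, gate closed
assert gate("ssn", allow_pii=True)           # sensitive, explicit opt-in
assert not gate("sql_injection", interactive=False)  # dangerous + headless
```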
## Memory

Persist business rules that survive across sessions.
Rules are re-read on every generation call — update a rule, next call respects it immediately.
```python
forge.memory.add_rule("financial", "Users under 18 get restricted account type")
forge.memory.add_rule("user", "Israeli phone numbers use format 05x-xxx-xxxx")
forge.memory.add_rule("orders", "Max 3 active loans per customer at any time")

# Rules inject into AI prompts automatically
users = forge.create_batch(User, count=50, context="Israeli SaaS platform")
```

**Skeptical Memory** — rules are hints, not truth. FixtureForge validates stored rules against the live schema before every generation call.

**Progressive Forgetting** — field names and types are never stored (they're re-derivable from the model). Only business rules that exist nowhere else in the code are kept.
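The "re-read on every call" behavior amounts to rebuilding the prompt's rule block from the store each time, rather than baking rules in at construction. A minimal sketch with hypothetical names:

```python
class RuleMemory:
    """Business rules grouped by topic; read fresh on every generation call."""
    def __init__(self):
        self._rules: dict[str, list[str]] = {}

    def add_rule(self, topic: str, rule: str) -> None:
        self._rules.setdefault(topic, []).append(rule)

    def prompt_block(self) -> str:
        # Rebuilt on each call, so a rule added a moment ago applies immediately.
        return "\n".join(
            f"- [{topic}] {rule}"
            for topic, rules in sorted(self._rules.items())
            for rule in rules
        )

memory = RuleMemory()
memory.add_rule("user", "Israeli phone numbers use format 05x-xxx-xxxx")
prompt = f"Generate 50 users.\nRules:\n{memory.prompt_block()}"
assert "05x-xxx-xxxx" in prompt

memory.add_rule("financial", "Users under 18 get restricted account type")
assert "restricted account" in memory.prompt_block()  # next call sees it at once
```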
## ForgeDream

Find gaps in your test-data coverage automatically:
```python
import os
os.environ["FORGE_FLAG_DREAM"] = "1"

report = forge.dream(models=[User, Order], force=True)
print(report.summary())
# ForgeDream Report - 2026-04-08
#   Coverage gaps found : 3
#   Rule conflicts found: 0
#   Top gaps:
#     [User.age]    no_boundary: No boundary-value rules for numeric field 'age'
#     [User.email]  no_invalid : No invalid-data rules for well-known field 'email'
#     [Order.total] no_boundary: No boundary-value rules for numeric field 'total'
```

Four phases: Orient (read index) → Gather (find gaps) → Consolidate (merge rules) → Prune (trim to ≤200 lines).
The report is saved to `.forge/coverage_gaps.json`.
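The kind of gap detection shown in the report can be sketched as a schema scan. This toy version — the names and heuristics are hypothetical, not ForgeDream's actual logic — flags numeric fields without boundary rules and well-known text fields without invalid-data rules:

```python
def find_gaps(model_fields: dict[str, dict[str, type]],
              rules: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (field, gap_kind) pairs for uncovered fields.
    `rules` maps 'Model.field' to the rule kinds already covering it."""
    known_text = {"email", "phone", "url"}
    gaps = []
    for model, fields in model_fields.items():
        for name, typ in fields.items():
            key = f"{model}.{name}"
            if typ in (int, float) and "boundary" not in rules.get(key, set()):
                gaps.append((key, "no_boundary"))
            elif name in known_text and "invalid" not in rules.get(key, set()):
                gaps.append((key, "no_invalid"))
    return gaps

models = {"User": {"age": int, "email": str}, "Order": {"total": float}}
gaps = find_gaps(models, {"Order.total": {"boundary"}})
assert ("User.age", "no_boundary") in gaps       # numeric, no boundary rule
assert ("User.email", "no_invalid") in gaps      # well-known field, no invalid rule
assert ("Order.total", "no_boundary") not in gaps  # already covered
```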
## Streaming

```python
# Lazy evaluation — writes to disk one record at a time
for user in forge.create_stream(User, count=1_000_000, filename="users.json"):
    pass  # process one record; the full set is never loaded into memory
```

## Export

Supports `.json`, `.csv`, and `.sql` output formats:
```python
from fixtureforge.core.exporter import DataExporter

users = forge.create_batch(User, count=100)
DataExporter.to_json(users, "users.json")
DataExporter.to_csv(users, "users.csv")
DataExporter.to_sql(users, "users.sql", table_name="users")
```

## Caching

AI responses are cached locally for 7 days. Identical requests cost nothing after the first call.
```python
forge = Forge(use_cache=True)   # default — saves to ~/.fixtureforge/cache/
forge = Forge(use_cache=False)  # disable caching
```

## Feature Flags

```python
from fixtureforge.config import is_enabled, flag_summary

flag_summary()
# {
#   'FORGE_SWARMS': True,       # shipped
#   'FORGE_PERMISSIONS': True,  # shipped
#   'FORGE_COMPRESSION': True,  # shipped
#   'FORGE_MCP': True,          # shipped
#   'FORGE_DREAM': False,       # enable with FORGE_FLAG_DREAM=1
#   'FORGE_KAIROS': False,      # coming in v2.x
#   'FORGE_ULTRAPLAN': False,   # coming in v2.x
# }
```

Enable any staged feature with an env var:
```shell
FORGE_FLAG_DREAM=1 python run_tests.py
```

## Stats & Utilities

```python
forge.stats()
# {
#   "registry": {"user": 50, "order": 200},
#   "session_tokens": 1240,
#   "memory": {"topics": 3, "total_kb": 2.4},
#   "flags": {"FORGE_SWARMS": True, "FORGE_PERMISSIONS": True}
# }

forge.clear_registry()  # reset FK registry between independent test scenarios
```

## Architecture

```text
FixtureForge v2.0
├── Config Layer        feature flags, env-var overrides
├── Security Layer      safe / sensitive / dangerous gates, mailbox pattern
├── Memory Layer        FORGE.md pointer index, on-demand topic files
├── Generation Layer    IntelligentRouter, SmartBatchEngine, DataSwarms
├── Compression Layer   Micro → Auto → Full (three-layer pipeline)
├── Export Layer        JSON / CSV / SQL / streaming
└── Background Layer    ForgeDream coverage analysis (feature-flagged)
```
- **Provider-agnostic:** Claude, GPT, Gemini, Groq, Ollama, or no AI at all.
- **Pydantic v2 native:** full support for `@computed_field`, validators, and constrained types.
- **CI-safe:** the `seed=` parameter guarantees identical output across runs.
## Comparison

| | FixtureForge | factory_boy | faker | hypothesis |
|---|---|---|---|---|
| AI-generated context | Yes | No | No | No |
| Deterministic (seed=) | Yes | Yes | Yes | Yes |
| FK relationships | Auto | Manual | No | No |
| Coverage analysis | Yes | No | No | Partial |
| CI-safe mode | Yes | Yes | Yes | Yes |
| Large datasets | Yes (100k+) | Manual | Manual | No |
| Permission gates | Yes | No | No | No |
FixtureForge is not a replacement for faker — it uses faker internally. It's not a replacement for hypothesis — it solves a different problem. It adds the layer between "I need realistic data" and "I need it to feel like production".
## Requirements

- Python 3.11+
- pydantic >= 2.5
- faker >= 22.0
AI providers are optional extras — the core works with zero dependencies beyond pydantic and faker.
## License

MIT — see LICENSE.
