```
╔═════════════════════════════╗
║                             ║
║   ◈ P U R E R E A S O N ◈   ║
║                             ║
╚═════════════════════════════╝
```
Fast hallucination detection for AI systems
PureReason verifies AI model outputs for hallucinations, contradictions, and overconfidence. It's a verification layer that works alongside frontier models (GPT, Claude, Gemini) - not a replacement for them.
Use it when you need:
- ✅ Fast verification (<5ms per check)
- ✅ Hallucination detection
- ✅ Explainable decisions
- ✅ Offline operation (zero API costs)
- ✅ Safety layer for AI agents
Don't use it for:
- ❌ General reasoning (use GPT-5, Claude, o1)
- ❌ Problem solving (it verifies, doesn't generate)
- ❌ Content generation
PureReason achieves strong performance on hallucination detection benchmarks:
| Benchmark | F1 Score | Task |
|---|---|---|
| HaluEval QA | 0.871 | Question answering verification |
| LogicBench | 0.846 | Structural logic detection |
| TruthfulQA | 0.798 | Misconception detection |
| HalluLens | 0.729 | Grounding + contradiction checks |
| FELM | 0.645 | Segment-level factuality |
| RAGTruth | 0.646 | Grounded hallucination detection |
| HalluMix | 0.664 | Multi-domain hallucination |
| HaluEval Dialogue | 0.634 | Dialogue verification |
| FaithBench | 0.622 | Summarization faithfulness |
Performance gains (v0.3.1):
- +25-30pp F1 improvement over baseline
- 40% latency reduction
- ECS calibration drift held to ±5pp (previously ±15pp)
Full methodology: See docs/BENCHMARK.md and docs/REPRODUCIBILITY.md
Input: "The patient must have cancer."
Output: Risk: HIGH | Confidence: 34/100
Flag: Certainty overreach
Rewrite:"The patient has findings consistent with possible malignancy."
PureReason combines:
- Symbolic logic - Deterministic verification using Z3
- Neural embeddings - Semantic similarity detection (all-MiniLM-L6-v2; illustrated below)
- Domain calibration - Per-domain accuracy tuning
- Knowledge grounding - Entity checking and contradiction detection
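As a standalone illustration of the embedding component (not PureReason's actual internals), semantic similarity with the same `all-MiniLM-L6-v2` model reduces to a cosine check:

```python
from sentence_transformers import SentenceTransformer, util

# Same embedding model PureReason uses for semantic similarity.
model = SentenceTransformer("all-MiniLM-L6-v2")

claim = "Water boils at 100°C at sea level."
evidence = "At standard atmospheric pressure, water boils at 100 degrees Celsius."

embeddings = model.encode([claim, evidence])
# High cosine similarity means the claim is semantically supported.
print(util.cos_sim(embeddings[0], embeddings[1]).item())
```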
The typical workflow (sketched below):

1. A frontier model (GPT, Claude) generates output
2. PureReason verifies and scores it (0-100 ECS)
3. The agent receives the verification plus regulated text
4. High-risk outputs are flagged for human review
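A minimal sketch of that loop, using the `ReasoningGuard` API from the quick start below; `call_frontier_model` is a hypothetical stand-in for your GPT/Claude call:

```python
from pureason.guard import ReasoningGuard

guard = ReasoningGuard(threshold=70)

def call_frontier_model(prompt: str) -> str:
    # Hypothetical stand-in for a GPT/Claude API call.
    return "Water boils at 100°C at sea level."

draft = call_frontier_model("State a fact about water.")
result = guard.verify(draft)

if result.ecs < 40:
    print("High risk - flag for human review.")
else:
    print(f"Accepted with ECS {result.ecs}/100 ({result.provenance})")
```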
Install:

```bash
pip install -e .
```

Verify any text in 3 lines of code:

```python
from pureason.guard import ReasoningGuard

guard = ReasoningGuard(threshold=70)  # 70 = moderate strictness
result = guard.verify("Water boils at 100°C at sea level.")
print(f"ECS: {result.ecs}/100, Provenance: {result.provenance}")
# Output: ECS: 83.0/100, Provenance: verified
```

Decision logic:
```python
if result.ecs >= 70:
    ...  # Accept - high confidence output
elif result.ecs >= 40:
    ...  # Review - medium confidence
else:
    ...  # Reject - low confidence
```

See examples/ for production-ready code:
- `simple_verification.py` - Basic usage (5 min)
- `langchain_integration.py` - LangChain integration (10 min)
- `api_server.py` - Production FastAPI server (15 min)
Run the simple example:
```bash
python examples/simple_verification.py
```

Deploy as a microservice:
```bash
python examples/api_server.py
```

Test it:
```bash
curl -X POST http://localhost:8000/verify \
  -H "Content-Type: application/json" \
  -d '{"text": "The sky is blue.", "min_ecs": 70}'
```
pure-reason review "The patient must have cancer."# Build the MCP server
cargo build --release -p pure-reason-mcp
# Add to your agent's MCP config
# Full guide: docs/MCP-INTEGRATION.mdYour agent (Claude Desktop, Cursor, GitHub Copilot) can then call PureReason verification tools.
```python
from pureason.reasoning import verify_chain

# Verify a chain of reasoning steps
problem = "What is 2 + 2?"
steps = ["Let me add the numbers.", "2 + 2 = 4", "Therefore, the answer is 4."]
result = verify_chain(problem, steps)
print(f"Confidence: {result.ecs}/100")
```

- Hallucination detection - Catches contradictions, fabrications, entity errors
- Confidence scoring - 0-100 ECS with domain-aware calibration
- Reasoning verification - Chain-of-thought and arithmetic step checking
- Text regulation - Rewrites overconfident claims to hedged language (sketched after this list)
- Multiple interfaces - CLI, MCP, Python, Rust library, REST API
- Offline operation - No API keys required, runs completely local
- Explainable results - Traceable verification logic with evidence
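For instance, text regulation can feed the decision logic from the quick start; a sketch where `regulated_text` is a hypothetical field name for the hedged rewrite (see the API reference for the actual one):

```python
from pureason.guard import ReasoningGuard

guard = ReasoningGuard(threshold=70)
result = guard.verify("The patient must have cancer.")

if result.ecs < 70:
    # `regulated_text` is hypothetical - the hedged rewrite of the claim.
    print(getattr(result, "regulated_text", "<no rewrite available>"))
```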
Example: catching an arithmetic error in a reasoning chain:

```
Step 1: A train travels 120 miles in 2 hours.
Step 2: Speed = 120 / 2 = 90 mph
Step 3: Time for 300 miles = 300 / 90 ≈ 3.3 hours

Result: INVALID
First failing step: 2
Reason: arithmetic_error (120 / 2 should be 60, not 90)
```
PureReason verifies each step deterministically and pinpoints exact failures.
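The flawed chain above can be checked with the `verify_chain` API shown earlier; a minimal sketch (the problem statement is paraphrased and exact scores will vary):

```python
from pureason.reasoning import verify_chain

# Step 2 contains the planted error: 120 / 2 is 60, not 90.
problem = "A train travels 120 miles in 2 hours. How long does it need for 300 miles?"
steps = [
    "A train travels 120 miles in 2 hours.",
    "Speed = 120 / 2 = 90 mph",
    "Time for 300 miles = 300 / 90 ≈ 3.3 hours",
]

result = verify_chain(problem, steps)
print(f"Confidence: {result.ecs}/100")  # the arithmetic error drags the score down
```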
```python
# Verify formal syllogisms
from pureason.reasoning import verify_syllogism

report = verify_syllogism(
    premises=["All mammals are warm-blooded.", "Whales are mammals."],
    conclusion="Whales are warm-blooded.",
)
print(report.is_valid)  # True
```
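The negative case, assuming an unsupported conclusion flips `is_valid` (an assumption, not documented behavior):

```python
from pureason.reasoning import verify_syllogism

# Nothing in the premises supports this conclusion.
report = verify_syllogism(
    premises=["All mammals are warm-blooded.", "Whales are mammals."],
    conclusion="Whales are fish.",
)
print(report.is_valid)  # False (assumed)
```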
```python
# Solve arithmetic word problems
from pureason.reasoning import solve_arithmetic

report = solve_arithmetic("Maria earned 50 dollars and spent 23 dollars. How much?")
print(report.answer)  # "27"
```

Build from source:

```bash
git clone https://github.com/sorunokoe/PureReason
cd PureReason
cargo build --release
./target/release/pure-reason review "Your text here"
```

| Topic | Link |
|---|---|
| Benchmarks | docs/BENCHMARK.md - Full results and methodology |
| Reproducibility | docs/REPRODUCIBILITY.md - Seeds, hashes, holdout |
| MCP Integration | docs/MCP-INTEGRATION.md - Agent setup guide |
| Capabilities | docs/CAPABILITIES.md - Feature matrix |
| TRIZ Guide | docs/TRIZ-IMPLEMENTATION.md - Performance improvements |
| API Reference | crates/pure-reason-core/ - Core Rust engine |
| Contributing | .github/CONTRIBUTING.md - How to contribute |
Best for:
- Verifying AI agent outputs before execution
- Detecting hallucinations in RAG systems
- Scoring confidence in generated claims
- Offline reasoning verification
- Production AI safety layers
- Code agents needing local verification
Not suitable for:
- Novel problem solving (use GPT-5, Claude, o1)
- Long-context reasoning (>10K tokens)
- Real-time streaming (optimized for batch)
- Content generation
Apache 2.0 — see LICENSE