policy-gate

Deterministic policy firewall for LLMs.
Allowlist-first, fail-closed, fully auditable.
Built for agents, tool-use pipelines, and SaaS multi-tenant deployments.

import { Firewall } from "policy-gate";

const fw = await Firewall.create();
const verdict = await fw.evaluateForTenant("tenant-a", "What is the capital of France?");

if (!verdict.isPass) throw new Error(`Blocked: ${verdict.blockReason}`);

Status: Experimental — do not use for safety-critical production.

Why not a classifier?

Most AI guardrails are probabilistic. They estimate risk — and can be wrong in both directions.

policy-gate takes the opposite approach: only explicitly allowlisted intents pass. Everything unknown, ambiguous, or disagreed-upon fails closed. The verdict never depends on an LLM or a probability score.

Core features


Deterministic allowlist enforcement	Only known-good intent shapes pass. Unknown = Block.
Fail-closed voter (1oo2)	Two independent channels must agree. Disagreement or fault → Block.
Ingress + Egress firewall	Validates both prompts and model responses (leakage, PII, framing).
Multi-tenant policy hub	Isolated profiles, configs, and audit logs per tenant.
Shadow mode	Evaluate without blocking — safe for rollout and observability.
Proxy mode	Drop-in reverse proxy in front of any LLM API. Zero code changes.

Optional / advanced features

Streaming egress [experimental] — Aho-Corasick scanning across SSE chunk boundaries
Fast-Semantic 2.0 [optional] — sparse embeddings + learned centroid corpus, ~31µs, advisory-only
Session-aware monitor — multi-turn escalation detection (fragmentation, probing, topic drift)
Contextual Anchor Validation (SA-080) — egress output constraints derived from ingress intent

How it works

   App ──► policy-gate ingress ──────────────────────────────► Upstream LLM
              │                                                      │
              │  normalize                                           │
              │  Channel A: FSM + allowlist ──┐                     │
              │  Channel B: rule engine  ─────┴─► voter ──► PASS    │
              │                                   (fault/disagree → BLOCK + audit)
              │                                                      │
   App ◄── policy-gate egress ◄───────────────────────────────────-─┘
              │  Output Channel 1: pattern/PII scan ──┐
              │  Output Channel 2: framing / anchor ──┴─► voter ──► PASS / BLOCK

Ingress channels (A + B) — independent techniques, diverse by design to prevent common-cause failure.
Voter — any disagreement, unknown result, or internal fault → Block.
Channel C [advisory] — heuristic scoring after the verdict, never changes the outcome.
Channel D [optional] — semantic similarity, advisory-only.

Quickstart

Node / TypeScript

npm install
npm run build:native   # builds Rust → native/index.node
npm run build          # compiles TypeScript
npm run smoke          # basic sanity check
npm run conformance    # full corpus

Python

python -m venv .venv && .venv\Scripts\activate
pip install maturin
python -m maturin develop --manifest-path crates/firewall-pyo3/Cargo.toml
python scripts/smoke.py
python scripts/conformance.py

Proxy (zero-code integration)

export UPSTREAM_URL="https://api.openai.com/v1/chat/completions"
export UPSTREAM_API_KEY="sk-..."
cargo run --release -p firewall-proxy
# → point your app at http://localhost:8080/v1

Rust core

cargo test -p firewall-core
cargo clippy -p firewall-core -- -D warnings

Configuration

Copy firewall.example.toml to firewall.toml and adjust.

Key settings:

# Only these intents may pass
permitted_intents = ["QuestionFactual", "TaskCodeGeneration"]

# Block any ambiguous intent for high-sensitivity tenants
on_diagnostic_agreement = "fail_closed"

# Optional: explicit tool allowlist for agentic workflows
allowed_tools = ["weather_tool", "calculator_tool"]

# Shadow mode: evaluate but never block (for rollout)
shadow_mode = true

Project layout

policy-gate/
├── crates/
│   ├── firewall-core/       # Rust safety kernel
│   ├── firewall-proxy/      # Standalone reverse proxy (axum)
│   ├── firewall-cli/        # Policy governance CLI
│   ├── firewall-napi/       # Node.js binding (napi-rs)
│   ├── firewall-pyo3/       # Python binding (PyO3 / maturin)
│   ├── firewall-wasm/       # WASM / edge target
│   └── firewall-proxy-wasm/ # Proxy-Wasm / Envoy target
├── docs/                    # Extended documentation (see below)
├── policy-hub/              # Pre-built TOML profiles and presets
├── verification/            # Z3 models, corpora, benchmarks, operator tooling
└── firewall.example.toml

What this is NOT

not a general-purpose moderation classifier
not a jailbreak detector or prompt toxicity scorer
not a replacement for human policy design and threat modeling
not a certification-grade safety system or IEC 61508 implementation

Design inspiration

policy-gate borrows ideas from functional safety engineering — fail-closed behavior, channel diversity, explicit fault handling. It is not an IEC 61508 implementation, has not been assessed by any third party, and makes no compliance claims. See SAFETY_MANUAL.md for the full design rationale.

License

Apache 2.0 — see LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 41 Commits
.cargo		.cargo
.github/workflows		.github/workflows
crates		crates
docs		docs
examples		examples
fuzz		fuzz
helm/policy-gate		helm/policy-gate
policy-hub		policy-hub
scripts		scripts
tmp		tmp
verification		verification
.dockerignore		.dockerignore
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
RED_TEAM.md		RED_TEAM.md
SAFETY_MANUAL.md		SAFETY_MANUAL.md
SECURITY_AUDIT_REPORT.md		SECURITY_AUDIT_REPORT.md
channel_a.smt2		channel_a.smt2
deployment.md		deployment.md
firewall.broken.toml		firewall.broken.toml
firewall.example.toml		firewall.example.toml
firewall.toml		firewall.toml
index.ts		index.ts
operator_actions.jsonl		operator_actions.jsonl
package-lock.json		package-lock.json
package.json		package.json
simulate_audit.py		simulate_audit.py
tsconfig.json		tsconfig.json
verify_probing_sequences.py		verify_probing_sequences.py


SAFETY_MANUAL.md	Full design, hazard analysis, channel specs, safety requirements
docs/proxy.md	Reverse proxy, Prometheus metrics, hot-reload, CLI, Docker, Helm
docs/multi-tenant.md	Tenant registry, profiles, voter strictness
docs/agents.md	Tool-schema validation, LangGraph integration
docs/performance.md	Benchmarks, parallel batch, BERT semantic mode
docs/verification.md	Z3 proofs, regression datasets, operator review tooling

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

policy-gate

Why not a classifier?

Core features

Optional / advanced features

How it works

Quickstart

Node / TypeScript

Python

Proxy (zero-code integration)

Rust core

Configuration

Project layout

What this is NOT

Design inspiration

License

Further reading

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

policy-gate

Why not a classifier?

Core features

Optional / advanced features

How it works

Quickstart

Node / TypeScript

Python

Proxy (zero-code integration)

Rust core

Configuration

Project layout

What this is NOT

Design inspiration

License

Further reading

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages