Agent Dialectic Resolver

A defensive design pattern for high-stakes agentic workflows. Hybrid neuro-symbolic verification for multi-agent LLM systems that need trustworthy, auditable decisions instead of confident-sounding hallucinations.

Status: Production pattern, distilled from a working media-planning agent. License: CC0 — public domain. Adapt freely.

It targets two failure modes that block enterprise agent adoption: critic fatigue and silent hallucinations in multi-agent consensus loops.

The one-line pitch

LLMs can propose. LLMs can object. LLMs can request evidence. LLMs cannot adjudicate.

Adjudication is deterministic code that reads typed evidence and applies typed rules. Everything else follows from that.
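To make "typed evidence, typed rules" concrete, here is a minimal sketch of pure-code adjudication. The `Evidence`, `Objection`, and `adjudicate` names are hypothetical and much simpler than the contracts in src/types.ts; the two thresholds are illustrative values, and taking the highest-confidence evidence item per objection is an assumption of this sketch rather than the documented rule.

```typescript
// Hypothetical, simplified shapes for illustration only.
type Category = "Regulatory" | "Factual";

interface Evidence {
  objectionId: string; // which objection this evidence answers
  confidence: number;  // 0..1, attached when the evidence is ingested
}

interface Objection {
  id: string;
  category: Category;
}

type Verdict = "resolved" | "unresolved";

// Illustrative per-category thresholds.
const THRESHOLDS: Record<Category, number> = { Regulatory: 0.85, Factual: 0.7 };

// Pure function: no LLM call, no network, no clock, no randomness.
function adjudicate(objection: Objection, chain: readonly Evidence[]): Verdict {
  // Assumption: the strongest evidence item for the objection decides.
  const best = chain
    .filter((e) => e.objectionId === objection.id)
    .reduce((max, e) => Math.max(max, e.confidence), 0);
  return best >= THRESHOLDS[objection.category] ? "resolved" : "unresolved";
}

const chain: Evidence[] = [
  { objectionId: "obj-reg", confidence: 0.8 },
  { objectionId: "obj-fact", confidence: 0.8 },
];

const regulatory = adjudicate({ id: "obj-reg", category: "Regulatory" }, chain);
const factual = adjudicate({ id: "obj-fact", category: "Factual" }, chain);
console.log(regulatory, factual); // unresolved resolved
```

The same confidence clears a Factual objection but not a Regulatory one, and no amount of persuasive prose from either agent can change that outcome.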

Architecture at a glance

```mermaid
flowchart TD
    U[User directive]:::user --> P[Proposer LLM]:::llm
    P -->|typed proposal| PT[Pressure-Tester LLM]:::llm
    PT -->|typed objections| RW[Research Workers<br/>APIs · MCP · files · search]:::tools
    RW -->|typed evidence| R{{Resolver<br/>PURE CODE}}:::code
    R -->|advance / hold / escalate| OUT[Validated Artifact + Receipts]:::out
    R -. another round .-> P

    classDef llm fill:#1f6feb,stroke:#0d419d,color:#fff
    classDef tools fill:#8957e5,stroke:#553098,color:#fff
    classDef code fill:#1a7f37,stroke:#0f5223,color:#fff
    classDef user fill:#9a6700,stroke:#5c3d00,color:#fff
    classDef out fill:#cf222e,stroke:#82071e,color:#fff
```

The Resolver is what makes this pattern different from "two LLMs talking." It is the part you trust because you wrote it.
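One way to enforce the "typed" edges in the diagram is to parse every agent message against closed enums before it reaches the Resolver. The sketch below is an assumption about how that could look: the objection kinds, the `TypedObjection` shape, and `parseObjection` are hypothetical names, not the repo's actual contracts.

```typescript
// Hypothetical closed enums; a real deployment defines its own per domain.
const OBJECTION_KINDS = ["MISSING_EVIDENCE", "CONTRADICTS_SOURCE", "OUT_OF_POLICY"] as const;
type ObjectionKind = (typeof OBJECTION_KINDS)[number];

const SEVERITIES = ["BLOCKING", "HIGH", "MEDIUM", "LOW"] as const;
type Severity = (typeof SEVERITIES)[number];

interface TypedObjection {
  kind: ObjectionKind;
  severity: Severity;
  targetField: string;
}

// Type guard: true only when `value` is one of the allowed string literals.
function isOneOf<T extends string>(allowed: readonly T[], value: unknown): value is T {
  return typeof value === "string" && (allowed as readonly string[]).includes(value);
}

// Parse raw LLM output. Anything outside the closed enums is rejected,
// not interpreted, so free-form prose never reaches the Resolver.
function parseObjection(raw: string): TypedObjection | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const { kind, severity, targetField } = data as Record<string, unknown>;
  if (!isOneOf(OBJECTION_KINDS, kind)) return null;
  if (!isOneOf(SEVERITIES, severity)) return null;
  if (typeof targetField !== "string") return null;
  return { kind, severity, targetField };
}

const accepted = parseObjection('{"kind":"MISSING_EVIDENCE","severity":"BLOCKING","targetField":"budget"}');
const rejected = parseObjection('{"kind":"this feels wrong","severity":"BLOCKING","targetField":"budget"}');
console.log(accepted?.kind, rejected); // MISSING_EVIDENCE null
```

Rejecting an off-enum message outright, rather than trying to coerce it, keeps the audit trail parseable.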

What's in this repo

| Path | Purpose |
| --- | --- |
| docs/agent-dialectic-resolver-pattern.md | The full pattern: four actors, typed contracts, authority hierarchy, confidence thresholds, resolver state machine, hard gates, receipts, anti-patterns, generic use cases |
| src/types.ts | TypeScript contracts (`EvidenceChainItem`, `Proposal`, `Objection`, `Resolution`, `DialecticArtifact`, `DialecticReceipt`) |
| src/resolver.ts | Pure `runDialecticResolver()`: no LLM, no network, no randomness |
| src/gates.ts | The starter hard-gate set, including the invention check |
| src/index.ts | Public surface |
| examples/code-review/ | Runnable mock: code-review proposer/pressure-tester with stub research workers |

Quick start

```typescript
import { runDialecticResolver, defaultAuthorityRank } from "agent-dialectic-resolver";

const result = runDialecticResolver({
  artifact,           // your DialecticArtifact
  evidenceChain,      // EvidenceChainItem[] appended so far this turn
  config: {
    thresholdsByCategory: {
      "Regulatory":              0.85,
      "Safety":                  0.80,
      "External-Availability":   0.75,
      "Factual":                 0.70,
      "Categorical":             0.65,
    },
    authorityRank: defaultAuthorityRank,
    maxRounds: 3,
    minSourceKindForSeverity: {
      BLOCKING: ["user-override", "live-api", "official-source", "domain-evidence"],
      HIGH:     ["user-override", "live-api", "official-source", "domain-evidence", "fresh-research"],
      MEDIUM:   ["user-override", "live-api", "official-source", "domain-evidence", "fresh-research", "modeled-fallback"],
      LOW:      ["user-override", "live-api", "official-source", "domain-evidence", "fresh-research", "modeled-fallback", "llm-inference"],
    },
  },
});

if (result.artifact.validation.readyToAdvance) {
  // ship the artifact
} else if (result.needsAnotherRound) {
  // run the Proposer + Pressure-Tester again with the appended receipts visible
} else {
  // escalate to human review
}
```

See examples/code-review/run.ts for an end-to-end mock.
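The three branches in the Quick start imply an outer loop around the resolver call. A rough sketch of that control flow follows; `RoundResult`, `driveRounds`, and `stubRound` are hypothetical stand-ins so the block runs on its own, and the package may already enforce `maxRounds` internally, in which case the budget check here would be redundant.

```typescript
// Hypothetical stand-in for the two fields of the resolver result that
// drive control flow; the real result object carries much more.
interface RoundResult {
  readyToAdvance: boolean;
  needsAnotherRound: boolean;
}

type Outcome = "advance" | "escalate";

// `runRound` is assumed to run Proposer, Pressure-Tester, and research
// workers, call the resolver, and report that round's decision.
function driveRounds(
  runRound: (round: number) => RoundResult,
  maxRounds: number,
): { outcome: Outcome; rounds: number } {
  for (let round = 1; round <= maxRounds; round++) {
    const result = runRound(round);
    if (result.readyToAdvance) return { outcome: "advance", rounds: round };
    if (!result.needsAnotherRound) return { outcome: "escalate", rounds: round };
  }
  // Round budget exhausted without a clean pass: hand off to a human.
  return { outcome: "escalate", rounds: maxRounds };
}

// Stub: objections stay open in rounds 1 and 2 and clear in round 3.
const stubRound = (round: number): RoundResult => ({
  readyToAdvance: round >= 3,
  needsAnotherRound: round < 3,
});

const withBudget = driveRounds(stubRound, 3);    // advances in round 3
const withoutBudget = driveRounds(stubRound, 2); // escalates after round 2
console.log(withBudget.outcome, withoutBudget.outcome); // advance escalate
```

The loop never lets an exhausted round budget fall through to "ship"; running out of rounds is treated as a reason to escalate.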

When to use this pattern

Use it when the output drives a real action with cost: spend, code, configuration, communication, policy.

Don't use it when:

The output is purely informational and the user is the final adjudicator (a search tool, a Q&A bot).
The cost of being wrong is low and the cost of slowness is high (autocomplete).
There is no domain-specific Red Flag Playbook worth writing — meaning nobody has expertise about how this thing fails.

If you can't name the failures you're guarding against, you're not ready to apply this pattern. Go find a domain expert first.

Generic use cases

The pattern is domain-agnostic. The Red Flag Playbook, the closed enums, the confidence thresholds, and the gates are what you specialize.

Software engineering — Proposer writes a patch, Pressure-Tester applies a security/perf/test-coverage playbook, Research Workers run tests & static analysis, Resolver requires green tests + no blocking critique unresolved.
Customer support — Proposer drafts a refund decision, Pressure-Tester checks policy + fraud, Research Workers hit billing/orders, Resolver blocks high-value refunds without a live-api order confirmation.
Medical triage — Proposer suggests next step, Pressure-Tester applies red-flag symptom list, Research Workers hit EHR + drug interaction + guideline retrieval, Resolver never advances on llm-inference alone for contraindications.
Legal drafting — Proposer drafts a clause, Pressure-Tester argues opposing counsel, Research Workers retrieve case law + statute, Resolver requires official-source citations for any binding claim.
DevOps — Proposer generates a Terraform/K8s change, Pressure-Tester applies SRE red flags, Research Workers run terraform plan + OPA, Resolver gates prod changes on green plan + policy pass.
Scientific paper review — Proposer synthesizes findings, Pressure-Tester critiques methodology, Research Workers verify citations + re-run stats, Resolver requires dataset references and checked test statistics.
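As an illustration of how one of these specializations might turn into a hard gate, the sketch below encodes the customer-support rule (block high-value refunds without a live-api order confirmation). Everything here is hypothetical: the `refundGate` name, the `claim` string convention, and the 500 USD cutoff are assumptions for the example, not the gates shipped in src/gates.ts.

```typescript
// Hypothetical shapes for illustration only.
type SourceKind = "live-api" | "official-source" | "fresh-research" | "llm-inference";

interface EvidenceItem {
  claim: string;        // assumed convention: "order-confirmed:<orderId>"
  sourceKind: SourceKind;
}

interface RefundProposal {
  orderId: string;
  amountUsd: number;
}

interface GateResult {
  pass: boolean;
  reason: string;
}

const HIGH_VALUE_USD = 500; // assumed cutoff for the example

// Hard gate: a pass/fail check that argument quality cannot override.
function refundGate(proposal: RefundProposal, chain: readonly EvidenceItem[]): GateResult {
  if (proposal.amountUsd < HIGH_VALUE_USD) {
    return { pass: true, reason: "below high-value cutoff" };
  }
  const confirmed = chain.some(
    (e) => e.sourceKind === "live-api" && e.claim === `order-confirmed:${proposal.orderId}`,
  );
  return confirmed
    ? { pass: true, reason: "order confirmed by live-api" }
    : { pass: false, reason: "high-value refund lacks live-api order confirmation" };
}

const bigRefund: RefundProposal = { orderId: "B2", amountUsd: 900 };
const inferredOnly: EvidenceItem[] = [{ claim: "order-confirmed:B2", sourceKind: "llm-inference" }];
const apiConfirmed: EvidenceItem[] = [{ claim: "order-confirmed:B2", sourceKind: "live-api" }];

console.log(refundGate(bigRefund, inferredOnly).pass); // false
console.log(refundGate(bigRefund, apiConfirmed).pass); // true
```

The gate returns a reason string alongside the verdict so the decision can be recorded in the receipts.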

Full breakdown in docs/agent-dialectic-resolver-pattern.md.

What this pattern costs you

Honest accounting:

Schema discipline. Every domain field needs a closed enum or a validator.
Calibrated confidences at ingest. Garbage-in, garbage-out: if your data sources don't carry honest confidence, the Resolver propagates lies.
Real Red Flag Playbooks. A weak Pressure-Tester misses real objections.
A separate research layer. Typed adapters around your tools. You wanted those anyway.
More LLM calls, not fewer. Proposer + Pressure-Tester + (sometimes) Research means at least two LLM calls per turn, and every additional round repeats them.

What it buys: a system whose decisions you can defend.

What makes this pattern different

Other multi-agent frameworks (AutoGen, CrewAI, LangGraph) ship the agent loop but leave adjudication to either (a) free-form chat between two LLMs or (b) an "LLM-as-judge." Both collapse under pressure-testing because confidence inflation eventually convinces the critic to relax its standards.

Three elements are not currently available off the shelf:

Evidence Authority Hierarchy. A strict transport-layer ranking (user-override > live-api > official-source > domain-evidence > fresh-research > modeled-fallback > llm-inference) with an explicit rule that llm-inference can never clear a BLOCKING objection alone.
No natural-language drift. Both agents emit only typed message kinds (closed enums of structural moves), not free-form prose. The audit trail is parseable, not skimmable.
Pure-code adjudication. The Resolver is deterministic TypeScript. No network. No model evaluation. Fully reproducible: the same artifact, evidence chain, and config yield the same resolutions every time.
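The authority hierarchy can be sketched as an ordered list plus a per-severity floor. The ranking and the floors below are taken from the hierarchy above and the `minSourceKindForSeverity` lists in the Quick start; the helper names (`WEAKEST_ALLOWED`, `canClear`, `mostAuthoritative`) are hypothetical, and the shipped `defaultAuthorityRank` may be structured differently.

```typescript
// Ranking from the hierarchy above; lower index means higher authority.
const AUTHORITY_ORDER = [
  "user-override",
  "live-api",
  "official-source",
  "domain-evidence",
  "fresh-research",
  "modeled-fallback",
  "llm-inference",
] as const;

type SourceKind = (typeof AUTHORITY_ORDER)[number];
type Severity = "BLOCKING" | "HIGH" | "MEDIUM" | "LOW";

// Weakest source kind allowed to clear each severity (hypothetical helper
// that compresses the Quick start lists into a single floor per severity).
const WEAKEST_ALLOWED: Record<Severity, SourceKind> = {
  BLOCKING: "domain-evidence",
  HIGH: "fresh-research",
  MEDIUM: "modeled-fallback",
  LOW: "llm-inference",
};

const rank = (kind: SourceKind): number => AUTHORITY_ORDER.indexOf(kind);

// True when evidence of this kind is authoritative enough for the severity.
function canClear(severity: Severity, kind: SourceKind): boolean {
  return rank(kind) <= rank(WEAKEST_ALLOWED[severity]);
}

// When evidence items disagree, the most authoritative source kind wins.
function mostAuthoritative(kinds: readonly SourceKind[]): SourceKind | undefined {
  return [...kinds].sort((a, b) => rank(a) - rank(b))[0];
}

console.log(canClear("BLOCKING", "llm-inference")); // false
console.log(canClear("BLOCKING", "live-api"));      // true
console.log(mostAuthoritative(["llm-inference", "official-source", "fresh-research"])); // official-source
```

Because the ranking is plain data, changing the hierarchy is a reviewable diff rather than a prompt tweak.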

Related work

This pattern lives next to a handful of adjacent ideas. It is not a replacement for them; it solves a different slice of the problem.

| Approach | What it does well | Where this pattern is different |
| --- | --- | --- |
| "DIALECTIC" frameworks (multi-agent debate, e.g. VC-style pro/contra critics) | Structures high-stakes decisions as adversarial argument | Those still rely on an LLM judge or scoring heuristic to draw the conclusion. Here the adjudicator is a hard-coded state machine that cannot be bypassed by argument quality. |
| Compiled AI / LLM+P | Runs the LLM once at compile-time to generate intent, then routes execution through deterministic planners and hard validation gates | Optimizes for static, low-latency execution. This pattern optimizes for live, iterative negotiation loops where agents fetch dynamic tools to satisfy a critic before committing. |
| Reasoning Graphs | Anchors evaluation edges directly to retrieved evidence items, mirroring this pattern's rule that "if it's not in the evidence chain, it does not exist" | Designed for agentic self-improvement and memory. This Resolver is a defensive runtime firewall whose job is to block bad execution, not to learn from it. |
| LLM-as-judge (AutoGen, RLHF reward models) | Cheap, flexible, easy to deploy | Judge is still inference. Cannot resolve blockers in this pattern. Used here only as one input among many to the deterministic Resolver. |
| CrewAI / LangGraph multi-agent chat | Great ergonomics, fast iteration | Free-form prose between agents. Confidence inflation is the default failure mode. |

Position this pattern as hybrid neuro-symbolic verification — neural for generation and critique, symbolic for adjudication — sitting on top of whichever multi-agent framework you already use.

Contributing

If you ship something using this pattern, file an issue or PR with a one-line summary of your Red Flag Playbook and gate configuration. The goal is a shared library across domains.

PRs welcome for:

Additional reference adapter examples (Python, Go, Rust).
Domain-specific Red Flag Playbooks.
Real-world case studies (sanitized).

Origin

Distilled from a production multi-agent media-planning system where a "Proposer" agent and a "Pressure-Tester" agent had to agree on a campaign plan before live ad spend was committed. Letting two LLMs converge in free-form chat turned out to be indistinguishable from one LLM rationalizing. Adding the deterministic Resolver between them turned the system from confident-sounding theater into a defensible decision pipeline.

The pattern is now generalized. Use it anywhere an LLM-generated proposal needs to clear a real-world bar before being acted on.

Inventor & Contact

Simon Foster — inventor of the Agent Dialectic Resolver pattern.

Questions, implementation help, real-world case studies, or commercial inquiries: simon@spotrunner.com

Released to the public domain under CC0. Attribution is appreciated but not required.
