340-agent adversarial clinical documentation system
17 features x 4 tiers x 5 providers = 340 concurrent agents
Python 3.14 · FastAPI · React 19 · transformers 5.7.0 · Pydantic v2 · SQLite
ARIA converts a raw clinical transcript into a validated, structured SOAP note.
It does not use a single model. It runs 340 specialized agents in parallel via asyncio. Each agent has one job. Agents from different model families argue over every extracted fact. A fact advances only if it maps to a span of the source transcript, either by exact verbatim match or by high-confidence semantic similarity. Facts that fail this check are dropped at the P2 tier before any downstream agent sees them.
The final output is a SOAPNote Pydantic model. Every field links to the exact words in the source transcript that produced it.
17 features x 4 tiers x 5 providers = 340 total agents
| Dimension | Count | Composition |
|---|---|---|
| Total agents | 340 | 17 features x 4 tiers x 5 providers |
| Agents per provider | 68 | 17 features x 4 tiers |
| Agents per feature | 20 | 4 tiers x 5 providers |
| Named system parts | 22 | layers, modules, tiers, providers, UX |
All 340 agents run concurrently. Full deliberation — from raw transcript to validated SOAPNote — completes in under 15 seconds.
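The arithmetic above can be sketched as a Cartesian product launched with asyncio.gather. This is a minimal illustration, not the project's orchestrator: the feature names are placeholders, the provider labels are shorthand, and run_agent is a hypothetical stub standing in for a real provider call. It also ignores tier ordering and only shows how the 340 combinations arise.

```python
import asyncio
from itertools import product

# Placeholder feature names; the real 17 modules live in src/features/.
FEATURES = [f"feature_{i:02d}" for i in range(1, 18)]
TIERS = ["P1", "P2", "P3", "P4"]
PROVIDERS = ["openai", "anthropic", "xai", "google", "cerebras"]


async def run_agent(feature: str, tier: str, provider: str) -> str:
    """Hypothetical stub: a real agent would call its provider API here."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"{provider}/{tier}/{feature}"


async def run_matrix() -> list[str]:
    # One coroutine per (feature, tier, provider) combination.
    agents = [run_agent(f, t, p) for f, t, p in product(FEATURES, TIERS, PROVIDERS)]
    return await asyncio.gather(*agents)


results = asyncio.run(run_matrix())
print(len(results))  # 340
```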
No provider is treated as authoritative. Each fills a distinct role.
| Provider | Model | Role | Agents | Env Key |
|---|---|---|---|---|
| OpenAI | GPT-4o | General clinical inference | 68 | OPENAI_API_KEY |
| Anthropic | Claude 3.5 | Nuance detection, safety reasoning | 68 | ANTHROPIC_API_KEY |
| xAI | Grok | Adversarial edge-case detection | 68 | GROK_API_KEY |
| Google AI | Gemini 1.5 | Deep medical context and knowledge retrieval | 68 | GOOGLE_AI_STUDIO_API_KEY |
| Cerebras | Cerebras-native | Low-latency adversarial tier execution | 68 | CEREBRAS_API_KEY |
Four tiers run per provider, per feature. Each has a fixed behavioral contract. No tier can be skipped or overridden.
P1 — Specialist
- Performs first-pass extraction from the raw transcript.
- Optimized for high recall over high precision.
- Produces the initial structured clinical hypothesis.
- System prompt grounded in reasoning patterns from OpenMed/MedDialog via src/operations/distillation.py.
P2 — Attending
- Checks every P1 claim against the source transcript.
- A claim that cannot be matched to a verbatim or high-confidence semantic span is blocked here. It does not advance.
- Can challenge P1 output from any provider, not only its own.
- This is the primary fact-filtering gate.
P3 — Chief
- Reviews the full P1 and P2 argument record within its provider family.
- Resolves conflicting interpretations.
- Produces a single consolidated position per provider.
- Weights by logical grounding of the argument, not by model confidence score.
P4 — Synthesis
- Runs after all five P3 outputs are complete.
- Aggregates across all 340 deliberation points.
- Produces the final SOAPNote Pydantic model.
- Enforces that every ClinicalFact carries a populated source_quote before the model is returned.
Two axes run simultaneously for every feature.
Vertical — Intra-Provider
Each provider runs its own internal argument sequence before any result leaves that provider team:
P1 [extract] -> P2 [challenge] -> P3 [resolve] -> P4 [converge]
Horizontal — Inter-Provider
At each tier, all five providers see and challenge each other's outputs:
Round 1 P1 x5 All five specialists generate independent analyses in parallel
Round 2 P2 x5 All five attendings attack all P1 outputs across all providers
Round 3 P3 x5 All five chiefs evaluate the full cross-provider P1/P2 argument log
Round 4 P4 x5 Final cross-provider synthesis; all claims verified against source
Provider disagreement is treated as a signal. Convergence is the result of repeated adversarial challenge — not averaging or voting.
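One way to picture the two axes together: rounds run in order, and inside each round the five providers run in parallel while reading everything produced so far. The sketch below assumes a hypothetical run_tier stub and a plain dictionary as the argument record; the real agents and log format are not shown in this document.

```python
import asyncio

PROVIDERS = ["openai", "anthropic", "xai", "google", "cerebras"]
TIERS = ["P1", "P2", "P3", "P4"]


async def run_tier(provider: str, tier: str, history: dict) -> str:
    """Hypothetical stub for one agent. `history` maps each completed tier
    to every provider's output, so agents see cross-provider results."""
    await asyncio.sleep(0)
    seen = sum(len(outputs) for outputs in history.values())
    return f"{provider}:{tier} reviewed {seen} prior outputs"


async def deliberate() -> dict:
    history: dict = {}
    for tier in TIERS:  # rounds run strictly in order
        # Within a round, all five providers run in parallel.
        outputs = await asyncio.gather(
            *(run_tier(p, tier, history) for p in PROVIDERS)
        )
        history[tier] = dict(zip(PROVIDERS, outputs))
    return history


log = asyncio.run(deliberate())
print(log["P3"]["openai"])  # openai:P3 reviewed 10 prior outputs
```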
Every ClinicalFact carries a required source_quote field. This field holds the verbatim transcript span that supports the fact. It is not optional. Pydantic v2 rejects any ClinicalFact where this field is empty or missing.
The P2 tier enforces this at runtime. If a claim cannot be mapped to the transcript via exact match or high-confidence semantic similarity (computed by the local transformers embedding layer), the fact is dropped silently. It does not reach P3.
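The gate described above can be sketched as "exact match first, similarity second". This is only an illustration under stated assumptions: embed is a toy bag-of-words vector standing in for the transformers sentence embeddings, the 0.8 threshold is an invented value, and gate is a hypothetical name.

```python
import math
import re
from collections import Counter
from typing import Optional

SIMILARITY_THRESHOLD = 0.8  # assumed value; the document does not state one


def embed(text: str) -> Counter:
    """Toy bag-of-words vector, standing in for a real sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gate(claim: str, transcript: str) -> Optional[str]:
    """Return the supporting transcript span, or None if the claim is dropped."""
    if claim and claim in transcript:
        return claim  # exact verbatim match
    # Otherwise fall back to the most similar transcript sentence.
    sentences = [s.strip() for s in re.split(r"(?<=[.?!])\s+", transcript) if s.strip()]
    claim_vec = embed(claim)
    best = max(sentences, key=lambda s: cosine(claim_vec, embed(s)), default=None)
    if best is not None and cosine(claim_vec, embed(best)) >= SIMILARITY_THRESHOLD:
        return best
    return None


transcript = "Patient reports chest pain for two days. Denies shortness of breath."
print(gate("chest pain for two days", transcript))  # exact match is returned as-is
print(gate("History of diabetes", transcript))      # None: no supporting span
```

A real embedding model would also catch paraphrases that share few words with the transcript, which this word-overlap stand-in cannot do.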
File: src/data/schemas.py
```python
from typing import List, Optional

from pydantic import BaseModel, Field


class ClinicalFact(BaseModel):
    category: str
    fact: str
    # Verbatim transcript span. Required; min_length=1 makes Pydantic v2
    # reject empty strings as well as missing values.
    source_quote: str = Field(min_length=1)
    # Explicit default: without it, Pydantic v2 treats Optional[str] as required.
    timestamp: Optional[str] = None
    confidence: float


class SOAPNote(BaseModel):
    subjective: List[ClinicalFact]
    objective: List[ClinicalFact]
    assessment: List[ClinicalFact]
    plan: List[ClinicalFact]
    agent_deliberation_log: str  # full argument record across all 340 agents
```

Pydantic v2 enforces this schema at every tier boundary. An invalid model is rejected before it advances.
Package: transformers 5.7.0
Location: venv/lib/python3.14/site-packages/transformers/
Execution: Runs entirely inside the venv. No external API call is made.
Three jobs run locally before any payload leaves the machine:
Named Entity Recognition (NER)
Tags medical terms in the transcript stream in real time: symptoms, durations, medications, procedures. Tags are returned as JSON to the frontend to trigger live visual formatting.
PII Scrubbing
Strips patient identifiers from the transcript payload. External provider API calls receive only the clinical-signal text. No patient identifier reaches an external endpoint.
Embedding Generation
Computes sentence embeddings used by the P2 tier to perform semantic similarity matching between a ClinicalFact.fact string and the candidate source_quote span. This is how high-confidence non-exact matches are verified.
These three functions are exposed through src/features/ner_pipeline.py, which wraps the transformers library. The Whisper model shipped in transformers handles speech-to-text transcription for the live voice input path.
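To make the scrubbing step concrete, here is a deliberately simple pattern-based sketch. It is a stand-in, not the model-based layer described above: the regular expressions, the placeholder labels, and the scrub function are all assumptions for illustration, and real identifier formats vary by institution.

```python
import re

# Illustrative patterns only; applied in insertion order.
PII_PATTERNS = {
    "MRN": re.compile(r"\bMRN[:\s#]*\d{6,10}\b", re.IGNORECASE),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "DOB": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}


def scrub(text: str) -> str:
    """Replace each matched identifier with a bracketed placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


cleaned = scrub("MRN: 00123456, callback 555-867-5309, DOB 04/12/1961. Chest pain x2 days.")
print(cleaned)  # [MRN], callback [PHONE], DOB [DOB]. Chest pain x2 days.
```

Only the scrubbed string would be passed on to external providers; the clinical content ("Chest pain x2 days") is left untouched.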
File: src/operations/distillation.py
Dataset: OpenMed/MedDialog on Hugging Face (requires HF_TOKEN)
All 340 agents are initialized with reasoning patterns extracted from MedDialog. The pipeline:
- Streams gold-standard clinical dialogues from MedDialog.
- Extracts canonical reasoning patterns — for example: resolving conflicting lab values, handling ambiguous symptom clusters, weighing negative findings.
- Injects those patterns into the system prompt of every agent at startup via src/core/config.py.
Every agent's baseline reasoning reflects documented clinical logic rather than generic language model priors.
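The injection step might look roughly like the following. This is a hedged sketch: build_system_prompt, the prompt wording, and the example role string are hypothetical, and the actual prompt format used by src/core/config.py is not shown in this document.

```python
from typing import Iterable


def build_system_prompt(role: str, patterns: Iterable[str]) -> str:
    """Compose a system prompt from a role label and distilled patterns."""
    lines = [f"You are the {role} agent.", "Apply these clinical reasoning patterns:"]
    lines += [f"{i}. {p}" for i, p in enumerate(patterns, start=1)]
    return "\n".join(lines)


prompt = build_system_prompt(
    "P2 Attending",
    ["Resolve conflicting lab values.", "Weigh negative findings."],
)
print(prompt)
```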
17 independent clinical modules. Each lives in src/features/. Each is governed by its own 20-agent team (4 tiers x 5 providers).
| Category | Feature | File |
|---|---|---|
| Performance | Parallel agent execution via asyncio | parallel_execution.py |
| Performance | Sub-15s transcript processing | sub15s_processing.py |
| Performance | Cross-provider output comparison and ranking | multi_provider_competition.py |
| Traceability | Click-to-highlight transcript span mapping | click_to_highlight.py |
| Traceability | source_quote verification and storage | source_quote_mapping.py |
| Traceability | SQLAlchemy audit log to SQLite | sqlite_audit_trail.py |
| Safety | Safety gate — blocks all downstream processing | stop_first_safety.py |
| Safety | P2 adversarial critique behavioral logic | adversarial_critique.py |
| Safety | System self-evaluation and reliability scoring | ai_readiness_audit.py |
| Reasoning | Argument weighting and convergence logic | deliberative_convergence.py |
| Reasoning | 340 distinct per-agent system prompt definitions | per_role_prompts.py |
| Reasoning | Leadership briefing format and delivery | leadership_briefings.py |
| Transparency | Live deliberation streaming to Glass Box | realtime_reasoning_display.py |
| Transparency | Clinician-facing reasoning explanation | staff_transparency.py |
| Transparency | Glass Box and Clinician Dashboard UX backends | src/ux/glass_box.py, src/ux/dashboard.py |
| Data | Raw transcript to SOAPNote conversion | unstructured_to_structured.py |
| Data | Tier-boundary Pydantic v2 schema enforcement | pydantic_validation.py |
| Layer | Technology | Version |
|---|---|---|
| Backend framework | FastAPI with asyncio | latest |
| ASGI server | Uvicorn | 0.46.0 |
| Data validation | Pydantic | v2 |
| Audit storage | SQLite + SQLAlchemy | latest |
| Local NLP | transformers | 5.7.0 |
| Content hashing | xxhash | 3.7.0 |
| HTTP client | urllib3 | 2.6.3 |
| CLI interface | typer | 0.25.0 |
| Config parsing | PyYAML | latest |
| URL handling | yarl | 1.23.0 |
| Type support | typing_extensions | 4.15.0 |
| Type introspection | typing_inspection | 0.4.2 |
| Frontend | React 19 SPA | latest |
| Frontend build | Vite | see vite.config.js |
| Python runtime | CPython | 3.14 |
REST (POST): Uploads transcripts. Retrieves final SOAPNote Pydantic models.
WebSocket: Streams live agent arguments from the asyncio task pool to the frontend as they execute. Required for the Glass Box real-time deliberation view. Primary endpoint: /ws/scribe.
Three panes. No global scroll. Each pane scrolls independently. Fixed layout: 30% / 30% / 40%.
Every interface element serves a clinical function. The interface renders nothing before data is ready. No placeholders. No skeleton states with dummy content.
- Patient name (Semibold SF Pro), MRN, age, sex.
- Time in ED clock, top right. Turns red after 4 hours.
- Five provider status dots. Green when idle. Pulsing when deliberating.
- Background: #F5F5F7. Pane surfaces: white at 80% opacity with backdrop-blur.
File: src/components/LiveCanvas.jsx
- Circular record button, bottom center. Three states:
  - Idle: microphone icon, no animation.
  - Recording: pulsing blue ring. getUserMedia active. Audio streaming to /ws/scribe.
  - Processing: shimmer effect. asyncio task pool running.
- Transcript renders in real time with diarization labels: MD: in dark gray, Patient: in standard weight.
- NER tags from the local transformers layer trigger live inline formatting: symptoms bold in blue, durations bold in green. Tags arrive as JSON over the WebSocket.
- "Import Transcript" button: routes a text block directly to src/core/orchestrator.py, bypassing voice and Whisper entirely.
File: src/api/ws_scribe.py
- Accepts binary audio chunks.
- Routes to Whisper inside transformers.
- Yields text blocks to the frontend.
- Simultaneously routes text to src/features/ner_pipeline.py for NER tagging.
File: src/components/ClinicalSource.jsx
- Three cards: Chief Complaint (top, high-visibility), Patient Story / HPI, Medical and Family History (two side-by-side).
- All cards are hidden on load. No placeholder text is shown.
- Each card fades in independently when the P1 Specialist agents for that feature stream a convergence event over the WebSocket.
- Fade-in is triggered by a confirmed backend event, not a timer or word count threshold.
File: src/components/IntelligenceMatrix.jsx
- Renders the validated SOAPNote Pydantic model field by field.
- Toggle at top: [ SOAP | APSO ]. APSO reorders the display to show Assessment and Plan first. Reorder is immediate, with no re-fetch and no full component re-render.
- Each rendered fact is a clickable element bound to src/features/click_to_highlight.py.
File: src/components/GlassBox.jsx
- Slide-out drawer. Opens only on fact click.
- Displays the verbatim source_quote from the ClinicalFact model field.
- Displays the one-line conflict resolution log from agent_deliberation_log. Example: Grok challenged OpenAI on MI risk; converged on UA due to negative Trop.
- Simultaneously highlights the source span in the Live Canvas via src/features/click_to_highlight.py.
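Because each source_quote is a verbatim transcript span, the highlight can be located with a plain substring search. The sketch below is an assumption about how such a lookup could work; find_span is a hypothetical helper, not the contents of src/features/click_to_highlight.py, and it returns only the first occurrence of the quote.

```python
from typing import Optional, Tuple


def find_span(transcript: str, source_quote: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) character offsets of the quote, or None if absent."""
    if not source_quote:
        return None
    start = transcript.find(source_quote)
    if start == -1:
        return None
    return start, start + len(source_quote)


transcript = "MD: Any chest pain? Patient: Yes, since Tuesday."
span = find_span(transcript, "since Tuesday")
if span is not None:
    start, end = span
    print(transcript[start:end])  # since Tuesday
```

The frontend could then use the two offsets to wrap the matching characters in a highlight element. A quote that repeats in the transcript would need extra context, such as the timestamp field, to pick the right occurrence.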
```text
Adaptive-Reasoning-Intelligence-Assembly/
│
├── run.py                    # System entry point: launches FastAPI + Vite
├── vite.config.js            # Vite frontend build configuration
├── .env                      # API keys; git-ignored, never committed
├── .gitignore                # Enforces key exclusion on every commit
├── requirements.txt          # Python dependencies
│
├── src/
│   ├── core/
│   │   ├── config.py         # Env loading, model config, tier config, prompt injection
│   │   └── orchestrator.py   # Launches and manages the 340-agent asyncio task pool
│   │
│   ├── agents/
│   │   ├── base.py           # Shared agent interface, lifecycle hooks, logging
│   │   ├── specialist.py     # P1: first-pass extraction, high-recall mode
│   │   ├── attending.py      # P2: adversarial critique, source-quote gating
│   │   ├── chief.py          # P3: intra-provider argument synthesis
│   │   └── synthesis.py      # P4: cross-provider convergence, final SOAPNote output
│   │
│   ├── features/
│   │   ├── parallel_execution.py          # asyncio task pool management across 340 agents
│   │   ├── sub15s_processing.py           # Latency tracking and timeout enforcement
│   │   ├── multi_provider_competition.py  # Cross-provider output comparison and ranking
│   │   ├── click_to_highlight.py          # Maps SOAPNote fact coordinates to transcript spans
│   │   ├── source_quote_mapping.py        # Fact-to-transcript verification; drops unverifiable facts
│   │   ├── sqlite_audit_trail.py          # SQLAlchemy models; logs inputs, outputs, agent IDs, versions
│   │   ├── stop_first_safety.py           # Safety gate; blocks all downstream execution if triggered
│   │   ├── adversarial_critique.py        # P2 behavioral constraints and critique protocol
│   │   ├── ai_readiness_audit.py          # System self-scoring on output reliability
│   │   ├── deliberative_convergence.py    # Argument weighting; produces P3/P4 positions
│   │   ├── per_role_prompts.py            # Defines all 340 distinct agent system prompts
│   │   ├── leadership_briefings.py        # Formats executive summaries from SOAPNote output
│   │   ├── realtime_reasoning_display.py  # Streams deliberation events to the Glass Box WebSocket
│   │   ├── staff_transparency.py          # Formats per-fact reasoning for clinician-facing display
│   │   ├── unstructured_to_structured.py  # Converts raw transcript text to SOAPNote schema input
│   │   └── pydantic_validation.py         # Enforces schema at every tier boundary; rejects invalid output
│   │
│   ├── api/
│   │   └── ws_scribe.py      # WebSocket /ws/scribe: audio -> Whisper -> text stream
│   │
│   ├── data/
│   │   ├── schemas.py        # Pydantic v2: ClinicalFact, SOAPNote
│   │   └── knowledge_base/   # MedDialog-distilled reasoning patterns; loaded at init
│   │
│   ├── ux/
│   │   ├── glass_box.py      # Backend for Glass Box deliberation stream
│   │   └── dashboard.py      # Backend for Clinician Dashboard structured output
│   │
│   └── operations/
│       ├── distillation.py   # MedDialog streaming, pattern extraction, prompt injection
│       └── system_audit.py   # Health checks, latency logging, system maintenance
│
├── tests/
│   ├── test_agents.py        # P1–P4 behavioral contract tests
│   ├── test_features.py      # Per-feature output validation
│   ├── test_validation.py    # Pydantic schema enforcement tests
│   └── test_convergence.py   # Cross-provider convergence stability tests
│
└── venv/                     # CPython 3.14 virtual environment (git-ignored)
    ├── pyvenv.cfg
    └── lib/
        └── python3.14/
            └── site-packages/
                ├── transformers/          # v5.7.0
                │   ├── models/            # whisper/, wav2vec2/, bert/, etc.
                │   ├── pipelines/         # automatic_speech_recognition, token_classification, etc.
                │   ├── quantizers/        # bnb, gptq, awq, torchao, etc.
                │   └── utils/             # logging, hub, import_utils, etc.
                ├── transformers-5.7.0.dist-info/
                ├── uvicorn/               # v0.46.0
                │   ├── protocols/http/
                │   ├── protocols/websockets/
                │   ├── loops/
                │   ├── middleware/
                │   └── supervisors/
                ├── uvicorn-0.46.0.dist-info/
                ├── typer/                 # v0.25.0
                ├── typer-0.25.0.dist-info/
                ├── xxhash/                # v3.7.0
                │   └── _xxhash.cpython-314-darwin.so
                ├── xxhash-3.7.0.dist-info/
                ├── urllib3/               # v2.6.3
                │   ├── contrib/emscripten/
                │   ├── http2/
                │   └── util/
                ├── urllib3-2.6.3.dist-info/
                ├── yaml/                  # PyYAML
                ├── yarl/                  # v1.23.0
                │   └── _quoting_c.cpython-314-darwin.so
                ├── yarl-1.23.0.dist-info/
                ├── typing_extensions.py   # v4.15.0
                ├── typing_extensions-4.15.0.dist-info/
                ├── typing_inspection/     # v0.4.2
                ├── typing_inspection-0.4.2.dist-info/
                └── ...                    # fastapi, pydantic, sqlalchemy, aiohttp, etc.
```
Requirements: Python 3.14, Node.js.
```bash
# Clone
git clone <repo-url>
cd Adaptive-Reasoning-Intelligence-Assembly

# Python environment
python3.14 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Frontend
npm install

# Module resolution
export PYTHONPATH=$PYTHONPATH:.

# Launch
python3 run.py
```

.env file (git-ignored, must never be committed):
```
OPENAI_API_KEY=
ANTHROPIC_API_KEY=
GROK_API_KEY=
GOOGLE_AI_STUDIO_API_KEY=
CEREBRAS_API_KEY=
HF_TOKEN=
```

- API keys must not appear in Git history. .gitignore keeps .env out of every commit.
- venv/ is git-ignored and never committed.
- The local transformers layer runs PII scrubbing before any payload leaves the machine.
- No external API call is made until local NER and scrubbing complete.
- If a provider fails mid-run, the remaining four continue. The matrix does not halt.
- agent_deliberation_log is preserved on every SOAPNote for audit purposes.
| Control | Mechanism |
|---|---|
| Fact traceability | Every ClinicalFact requires a non-empty source_quote; Pydantic v2 rejects violations |
| Agent-level logging | Inputs, outputs, model versions, and agent IDs written to SQLite via SQLAlchemy |
| Tier-boundary validation | Pydantic v2 validates schema at every tier transition; invalid output is rejected, not passed |
| PII protection | Local NER strips patient identifiers before any text reaches an external API endpoint |
| Provider failover | If one provider fails mid-deliberation, remaining four continue the full matrix |
| Deliberation record | agent_deliberation_log string preserved on every SOAPNote output for audit |
| Model versioning | Provider model version strings logged per API call for reproducibility |
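
The agent-level logging control could be sketched as follows. To stay dependency-free this uses the standard-library sqlite3 module as a stand-in for SQLAlchemy, and the table name, column names, and log_call helper are assumptions rather than the project's actual schema.

```python
import sqlite3

# In-memory database for the sketch; a file path would be used in practice.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE agent_audit (
           id INTEGER PRIMARY KEY AUTOINCREMENT,
           agent_id TEXT NOT NULL,
           model_version TEXT NOT NULL,
           input_text TEXT NOT NULL,
           output_text TEXT NOT NULL,
           created_at TEXT DEFAULT CURRENT_TIMESTAMP
       )"""
)


def log_call(agent_id: str, model_version: str, input_text: str, output_text: str) -> None:
    """Record one agent call. Parameters are bound, never string-formatted."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO agent_audit (agent_id, model_version, input_text, output_text) "
            "VALUES (?, ?, ?, ?)",
            (agent_id, model_version, input_text, output_text),
        )


log_call("openai/P2/feature_01", "model-version-string", "claim text", "challenge text")
rows = conn.execute("SELECT agent_id, model_version FROM agent_audit").fetchall()
print(rows)  # [('openai/P2/feature_01', 'model-version-string')]
```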
| Metric | Target | Mechanism |
|---|---|---|
| Processing time | Under 15 seconds per transcript | All 340 agents run concurrently via asyncio |
| Source verification | Every output fact carries a transcript-backed source_quote | P2 gate + Pydantic v2 enforcement |
| Recall | High — no premature filtering at P1 | P1 Specialists optimized for full structured extraction |
| Convergence stability | Stable across all five providers | Iterative cross-provider argument revision before P4 fires |
| Provider fault tolerance | System continues with 4 of 5 providers | Failover handled in src/core/orchestrator.py |
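
The time budget and the fault-tolerance target can be combined in one pattern: wrap each provider in asyncio.wait_for and collect failures instead of raising them. This is a sketch under assumptions, not the failover code in src/core/orchestrator.py; run_provider and run_all are hypothetical names, and the budget is applied per provider here for simplicity.

```python
import asyncio

TIME_BUDGET_S = 15.0  # taken from the "under 15 seconds" target
PROVIDERS = ["openai", "anthropic", "xai", "google", "cerebras"]


async def run_provider(name: str, fail: bool = False) -> str:
    """Hypothetical stub for one provider's full tier sequence."""
    await asyncio.sleep(0)
    if fail:
        raise RuntimeError(f"{name} unavailable")
    return f"{name}: consolidated position"


async def run_all(failing: frozenset = frozenset()) -> dict:
    tasks = [
        asyncio.wait_for(run_provider(n, n in failing), timeout=TIME_BUDGET_S)
        for n in PROVIDERS
    ]
    # return_exceptions=True collects failures (including timeouts) as values
    # instead of raising the first one, so the other results are kept.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return {n: r for n, r in zip(PROVIDERS, results) if not isinstance(r, Exception)}


survivors = asyncio.run(run_all(failing=frozenset({"xai"})))
print(sorted(survivors))  # ['anthropic', 'cerebras', 'google', 'openai']
```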
```bash
pytest tests/ -v
```

Covers 110 parameters: agent behavioral contracts (P1–P4), per-feature output validation, Pydantic schema boundary enforcement, and cross-provider convergence stability.
Binary conditions. Each must pass before the module is marked complete.
Layout and Visual
- Background is exactly #F5F5F7. Pane surfaces are white at 80% opacity with backdrop-blur.
- Layout is 30% / 30% / 40%. Global scroll is disabled. Each pane scrolls independently.
- Patient Banner is fixed at the top with translucent styling.
- Time in ED clock turns red at exactly 4 hours. Change is driven by elapsed time, not a static flag.
- Five provider dots reflect live backend polling: green when idle, pulsing when asyncio tasks are active.
Left Pane — Live Canvas
- Mercury button cycles correctly through all three states with no intermediate stuck states.
- Audio chunks stream over WebSocket to /ws/scribe without blocking the main UI thread.
- Transcript renders MD: and Patient: labels in real time as text arrives.
- NER JSON tags from src/features/ner_pipeline.py trigger live inline formatting: symptoms blue, durations green.
- "Import Transcript" bypasses getUserMedia and Whisper; routes text block directly to src/core/orchestrator.py.
Center Pane — Clinical Source
- All three cards are hidden on load. No placeholder or skeleton content is visible.
- Each card fades in only after the P1 Specialists for that feature stream a convergence event from the backend.
- Cards do not appear based on a timer or word count threshold.
Right Pane — Intelligence Matrix
- SOAPNote fields render from the Pydantic model, not from mock data.
- SOAP/APSO toggle reorders the section array instantly with no re-fetch.
- Clicking a ClinicalFact triggers src/features/click_to_highlight.py and opens the Glass Box drawer.
- Glass Box displays the verbatim source_quote string from the ClinicalFact model field.
- Glass Box displays the one-line conflict resolution entry from agent_deliberation_log.
- The corresponding transcript span is highlighted in the Live Canvas simultaneously with the Glass Box opening.
Backend and Integration
- Clicking "Stop" triggers src/core/orchestrator.py and all 340 agents begin executing.
- A deliberation progress indicator updates to reflect the live status of all five provider families.
- src/agents/synthesis.py populates the Right Pane once P4 completes.
- No external API call is made before src/features/ner_pipeline.py completes PII scrubbing.
- Patient View is absent from all frontend routes and all backend handlers.
- No external tracking, analytics, or unapproved API endpoints exist in the codebase.
- venv/ is in .gitignore and does not appear in Git history.
- .env is in .gitignore and does not appear in Git history.