# LENS

LENS is a role-aware multi-agent grading pipeline for clinical summaries. The same summary is scored in parallel by three role-specific agents:
- Physician
- Triage Nurse
- Bedside Nurse
Each role scores the summary across 8 rubric dimensions on a 1-5 scale. The system then computes a role-level weighted overall score, a cross-role overall score, and an Orchestrator Disagreement view that shows how far the three role scores differ on each dimension.
## Features

- Parallel scoring by three role-specific agents
- Shared 8-dimension LENS rubric
- Two scoring modes:
  - `llm`: OpenAI model-based scoring
  - `heuristic`: local baseline scoring without API calls
- Per-role weighted overall scoring based on questionnaire-derived role priors
- Orchestrator validation, disagreement mapping, and score aggregation
- Human-readable and JSON outputs
## How it works

- Input a clinical summary.
- Load rubric definitions and role configurations.
- Run the three role agents in parallel.
- Validate each role scorecard.
- Build an `Orchestrator Disagreement` map for all 8 dimensions.
- Aggregate the role outputs into:
  - per-role scores
  - per-role overall scores
  - a final overall score across roles
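The parallel fan-out in the steps above can be sketched with the standard library. This is a minimal illustration: the function names, the stand-in agent logic, and the two-dimension rubric subset are assumptions, not the package's actual API.

```python
from concurrent.futures import ThreadPoolExecutor

ROLES = ["Physician", "Triage Nurse", "Bedside Nurse"]
# Two of the 8 rubric dimensions, for brevity
DIMENSIONS = ["Factual Accuracy", "Relevant Chronic Problem Coverage"]

def score_role(role: str, summary: str) -> dict[str, float]:
    """Stand-in for one role agent: returns a 1-5 score per dimension."""
    return {dim: 4.0 for dim in DIMENSIONS}

def run_pipeline(summary: str) -> dict[str, dict[str, float]]:
    """Fan the same summary out to all three role agents in parallel."""
    with ThreadPoolExecutor(max_workers=len(ROLES)) as pool:
        futures = {role: pool.submit(score_role, role, summary) for role in ROLES}
        return {role: f.result() for role, f in futures.items()}
```

The orchestrator then validates each returned scorecard before building the disagreement map and aggregates.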
## Project structure

- `src/grading_pipeline/` — Python package (published on PyPI as `edlens`)
  - `cli.py` — command-line entrypoint and human-readable output formatting
  - `orchestrator.py` — multi-agent pipeline, validation, disagreement mapping, and aggregation
  - `llm_scoring.py` — LLM-based scoring logic
  - `scoring.py` — heuristic baseline scoring and score utilities
  - `openai_client.py` — minimal OpenAI Responses API client
  - `config.py` — rubric/role configuration loaders
  - `validation.py` — input validation
- `config/`
  - `lens_rubric.json` — 8 rubric dimensions and evaluation focus
  - `roles.json` — role agents, persona metadata, and `w_prior` weights
  - `role_profiles/` — role-specific LLM scoring profiles
- `schemas/agent_output.schema.json` — JSON Schema for structured agent output
- `docs/` — API reference (mkdocs + mkdocs-material)
- `tests/` — input-validation and orchestrator tests
- `Dockerfile` — container image for running the pipeline
## Requirements

- Python 3.12+
- OpenAI API key for `llm` mode
## Installation

```bash
pip install edlens
```

Or install from source with dev and docs extras:

```bash
pip install -e ".[dev,docs]"
```

The package has no runtime dependencies beyond the Python standard library.
## API key setup

If you want to run the LLM pipeline, you must use your own OpenAI API key.

Create a file named `.env` in the project root, which is the same folder as:

- `README.md`
- `config/`
- `grading_pipeline/`

Expected file location:

```
LENS Project/.env
```

Add the following line to `.env`:

```
OPENAI_API_KEY=your_openai_api_key_here
```

Optional override:

```
OPENAI_BASE_URL=https://api.openai.com/v1/responses
```

You can use `.env.example` as the template:

```bash
cp .env.example .env
```

Important notes:

- `.env` is already ignored by git and should not be committed.
- If you run with `--engine heuristic`, no API key is required.
- The code reads `OPENAI_API_KEY` from `.env` first, then falls back to your shell environment.
## Usage

Run with the default LLM mode:

```bash
python -m grading_pipeline --summary "Your summary here"
# or, using the installed CLI entry point:
lens --summary "Your summary here"
```

Run with the heuristic baseline:

```bash
lens --engine heuristic --summary "Your summary here"
```

Use a summary file:

```bash
lens --summary-file path/to/summary.txt
```

Output JSON instead of the human-readable report:

```bash
lens --summary "Your summary here" --format json --pretty
```

Select a specific model:

```bash
lens --model gpt-4o-mini --summary "Your summary here"
```

Adjust the disagreement threshold:

```bash
lens --gap-threshold 0.5 --summary "Your summary here"
```

## Docker

```bash
docker build -t lens .
docker run lens --summary "Your summary here" --engine heuristic
```

## Input validation

The CLI validates summary input before the scoring pipeline runs.
The summary must:
- be provided through `--summary` or `--summary-file`
- not be empty
- not be whitespace only
- be at least 30 characters after trimming whitespace
If the summary is invalid, the CLI exits with a non-zero code and no scoring call is made.
## Output

The human-readable output includes:

- role-by-role scores for all 8 dimensions
- a weighted `Overall` score for each role
- `Orchestrator Disagreement` showing score gaps per dimension
- a final `Overall Score` across all three roles
Example output shape:

```
----------------------------------------
Role-Aware Multi-Agent Grading Pipeline:
----------------------------------------
Physician:
Factual Accuracy: 5.0
Relevant Chronic Problem Coverage: 4.0
...
Overall: 4.12
----------------------------------------
Triage Nurse:
...
----------------------------------------
Bedside Nurse:
...
----------------------------------------
----------------------------------------
Orchestrator Disagreement:
----------------------------------------
Factual Accuracy: 1.0
Relevant Chronic Problem Coverage: 0.0
...
----------------------------------------
Overall Score: 4.0
```
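A report section of that shape can be rendered with plain string formatting. This is a sketch only; the actual formatter lives in `cli.py`, and the function name here is hypothetical.

```python
SEPARATOR = "-" * 40

def format_role_section(role: str, scores: dict[str, float], overall: float) -> str:
    """Render one role's block of the human-readable report."""
    lines = [SEPARATOR, f"{role}:"]
    lines += [f"{dim}: {score:.1f}" for dim, score in scores.items()]
    lines.append(f"Overall: {overall:.2f}")
    return "\n".join(lines)
```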
## Scoring

Each role has its own prior weights in `config/roles.json`.

Role-level overall score:

```
Role Overall = weighted average of the 8 dimension scores
```

Cross-role overall score:

```
Overall Score = average of the 3 role overall scores
```

Disagreement per dimension:

```
Gap = highest agent score - lowest agent score
```
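The three formulas above can be written directly. The function names are illustrative, not the package's actual API; the weights correspond to the `w_prior` values in `config/roles.json`.

```python
def role_overall(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of a role's dimension scores (weights from w_prior)."""
    total_weight = sum(weights[dim] for dim in scores)
    return sum(scores[dim] * weights[dim] for dim in scores) / total_weight

def overall_score(role_overalls: list[float]) -> float:
    """Unweighted average of the role-level overall scores."""
    return sum(role_overalls) / len(role_overalls)

def disagreement_gap(agent_scores: list[float]) -> float:
    """Per-dimension gap: highest agent score minus lowest agent score."""
    return max(agent_scores) - min(agent_scores)
```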
## Testing

Run the test suite:

```bash
pytest -q
```

Current tests cover:
- CLI summary input validation
- disagreement-map correctness
- validation and repair behavior
- conditional adjudication behavior
- weighted aggregation behavior
## Status

The current implementation includes:
- parallel three-role scoring
- role-aware weighting
- strict input validation
- orchestrator disagreement reporting
- weighted final score aggregation
- human-readable report formatting for demo and presentation use
