UCM-FDI-DISIA/narrative_coherence_error_detector

Narrative Coherence Error Detector

A hybrid system that detects logical inconsistencies in fiction by combining Large Language Models (for structured information extraction from narrative text) with Answer Set Programming (for formal logical reasoning via Clingo).

The system processes novels chapter-by-chapter, extracts structured facts (characters, locations, items, events, relationships), converts them to ASP facts, and applies logic programs to detect five categories of narrative errors: Causality, Coherence, Temporal, Location, and Emotional.



Architecture Overview

The system follows a strict "logic-first" architecture:

  • Python handles orchestration: state management, entity tracking, data conversion, and persistence.
  • LLMs (OpenAI, Google Gemini, or local) handle natural language understanding: extracting structured JSON from raw chapter text.
  • ASP/Clingo handles all logical reasoning: violation detection through constraint checking and derived facts.
                        ┌─────────────────────────┐
                        │     Chapter Text         │
                        └────────────┬────────────┘
                                     │
                                     ▼
                        ┌─────────────────────────┐
                        │   LLM Extraction Layer   │
                        │  (structured JSON output) │
                        └────────────┬────────────┘
                                     │
                    ┌────────────────┼────────────────┐
                    ▼                ▼                ▼
            ┌──────────────┐ ┌────────────┐ ┌──────────────┐
            │AliasResolver │ │ItemTracker │ │  Continuity  │
            │(canonical ID)│ │(lifecycle) │ │   Context    │
            └──────┬───────┘ └─────┬──────┘ └──────────────┘
                   │               │
                   └───────┬───────┘
                           ▼
                ┌──────────────────────┐
                │    EventExecutor     │
                │  ┌────────────────┐  │
                │  │ to_asp()       │  │   ──► ASP facts
                │  │ ActiveUniverse │  │   ──► Entity filtering
                │  │ MovementGuard  │  │   ──► Implicit transitions
                │  │ check_clingo() │  │   ──► Rule evaluation
                │  └────────────────┘  │
                └──────────┬───────────┘
                           │
          ┌────────────────┼────────────────┐
          ▼                                ▼
   ┌─────────────┐              ┌──────────────┐
   │  Conflict   │              │   Context    │
   │  Resolver   │              │ Persistence  │
   │ (overrides) │              │  (save/load) │
   └─────────────┘              └──────────────┘
                           │
                           ▼
                ┌──────────────────────┐
                │   FinalAnalyzer      │
                │  (story-wide audit)  │
                └──────────────────────┘

Key Principles

  1. Logic-first: Python never encodes narrative logic in conditionals — all reasoning is in ASP rules.
  2. Deterministic: Every conclusion traces back to explicit rules; alias unification is first-seen-wins.
  3. Memory-bounded: No historical state snapshots. Lifecycle management (ACTIVE/LATENT/FROZEN) bounds entity count. Active universe limits ASP grounding.
  4. Structured output only: All outputs are JSON-serializable dataclasses with full provenance.
  5. Audit trail: Alias promotions, rule overrides, and conflict resolutions are all logged.
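Principles 4 and 5 imply a concrete output shape. As a minimal sketch (field names here are illustrative, not the project's exact API), a violation carrying full provenance might look like:

```python
from dataclasses import dataclass, asdict
import json

# Illustrative shapes only; the real classes live in engine/event_executor.py.
@dataclass
class Provenance:
    rule_id: str
    layer: str      # "universal" or "story"
    chapter: int

@dataclass
class StructuredViolation:
    rule: str
    category: str   # e.g. "causality"
    entities: list
    severity: str   # HARD / SOFT / WARNING
    provenance: Provenance

v = StructuredViolation(
    rule="dead_character_acting",
    category="causality",
    entities=["harry_potter"],
    severity="HARD",
    provenance=Provenance("causality.lp#dead_agent", "universal", 3),
)
# Every conclusion serializes with its provenance attached
print(json.dumps(asdict(v)))
```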

Project Structure

narrative_coherence_error_detector/
├── engine/                    # Core evaluation engine (Python)
│   ├── state_manager.py       # World state & Logic Knowledge Graph (LKG)
│   ├── event_executor.py      # ASP fact generation & Clingo evaluation
│   ├── alias_resolver.py      # Cross-chapter entity identity resolution
│   ├── entity_registry.py     # Canonical entity storage with lifecycle
│   ├── item_tracker.py        # Item lifecycle & Chekhov's Gun detection
│   ├── rule_registry.py       # ASP rule file management & priority layers
│   ├── conflict_resolver.py   # Story vs. universal rule conflict handling
│   ├── final_analysis.py      # Story-wide consistency analysis
│   ├── active_universe.py     # ASP-visible entity universe computation
│   ├── relationship_projector.py  # Relationship projection into ASP
│   ├── rule_projector.py      # Story rule projection with filtering
│   ├── movement_continuity_guard.py  # Implicit movement inference
│   ├── continuity_context.py  # LLM prompt context builder
│   ├── context_persistence.py # Cross-session JSON state persistence
│   └── asp_diagnostics.py     # ASP universe size monitoring
│
├── rules/                     # ASP rule programs (Clingo)
│   ├── core.lp                # Foundation: Event Calculus, entity types, aliases
│   ├── base.lp                # Original self-contained rule set (v1)
│   ├── general.lp             # Domain-independent rules (violation/4)
│   ├── general_v2.lp          # Enhanced with descriptions (violation/5)
│   ├── general_abstract.lp    # Comprehensive standalone rules
│   ├── general_narrative.lp   # Appearance & behavioral analysis
│   ├── simple_narrative.lp    # Primary production rules (Phase 3)
│   ├── enhanced_detection.lp  # Multi-signal anomaly detection
│   ├── story_rules.lp         # Story-specific dynamic rules
│   └── universal/             # Lowest-priority overridable rules
│       ├── temporal.lp        # Temporal violation detection
│       ├── location.lp        # Spatial reasoning with containment
│       ├── causality.lp       # Dead agents, effect without cause
│       ├── coherence.lp       # Self-referential & state contradictions
│       ├── emotional.lp       # Relationship-action mismatches
│       ├── appearance.lp      # Physical appearance consistency
│       ├── items.lp           # Item lifecycle & Chekhov's Gun
│       ├── knowledge.lp       # Monotonic knowledge tracking
│       └── narrative_state.lp # Three-tier state model
│
├── scripts/                   # Experiment orchestration & extraction
│   ├── run_narrative_experiment_refactored.py  # Main CLI entry point
│   ├── run_kfold_experiment.py      # K-fold: LLM vs logic comparison
│   ├── kfold_experiment_runner.py   # K-fold on pre-extracted data
│   ├── logic_only_test_runner.py    # ASP-only runner (no LLM)
│   ├── analyze_data_from_experiment.py     # Post-experiment analysis
│   ├── analyze_llm_only_experiment_results.py  # Step 1 result analysis
│   ├── debug_chapter.py            # Quick single-chapter debug tool
│   ├── extraction/            # LLM-based chapter extraction
│   │   ├── api_clients.py     # LLM backend factory (local/OpenAI/Gemini)
│   │   ├── prompts.py         # All LLM prompt templates
│   │   ├── chapter_extractor.py    # Main extraction entry point
│   │   ├── extractors.py     # Phase 2 split extraction pipeline
│   │   ├── entity_registry.py     # Canonical entity management
│   │   ├── relationship_normalizer.py  # Relationship validation
│   │   ├── event_normalizer.py    # Event validation & item classification
│   │   ├── event_type_mapper.py   # Event type refinement
│   │   ├── lifecycle_tracker.py   # Cross-chapter lifecycle tracking
│   │   ├── post_salvage_reconciler.py  # Salvaged event repair
│   │   ├── extraction_diagnostics.py   # Under-extraction detection
│   │   ├── temporal_diagnostics.py     # Temporal recall analysis
│   │   ├── llm_client.py     # LLM-only evaluation client
│   │   └── json_utils.py     # LLM output JSON repair
│   ├── logic/                 # ASP conversion & evaluation
│   │   ├── asp_converter.py   # JSON → ASP fact conversion
│   │   └── evaluator.py      # Main LogicEvaluator orchestrator
│   ├── experiment/            # Experiment management
│   │   ├── runners.py         # Step 1 (LLM) & Step 2 (Logic) runners
│   │   ├── ground_truth.py    # Ground truth loading & metrics
│   │   └── summary.py        # Cross-step comparison reports
│   ├── evaluation/            # Result evaluation
│   │   ├── evidence_checklist.py  # Per-category evidence requirements
│   │   └── detectability.py  # Error detectability classification
│   ├── merge/                 # JSON parsing utilities
│   │   ├── json_parser.py     # Advanced iterative JSON repair
│   │   └── json_repair.py    # Structure validation
│   └── state/                 # Shared configuration
│       ├── config.py          # Paths, API keys, constants
│       ├── data_structures.py # Shared result dataclasses
│       └── logging.py        # Console + file logging
│
└── experiments/               # Experiment results

Core Pipeline

The system processes a story through these stages for each chapter:

1. Continuity Context Building

Before extraction, the system builds a context of known characters, relationships, and states from previous chapters (via ContinuityContextBuilder). This is injected into the LLM prompt so it respects established facts.

2. LLM Extraction

The chapter text is sent to an LLM with structured prompts requesting:

  • Characters: names, aliases, traits, emotions, states
  • Locations: names, containment hierarchy
  • Items: names, carriers, lifecycle state
  • Events: agent, patient, action type, location, destination
  • Relationships: type, directionality, evidence

Extraction supports single-call or split extraction (Phase 2), which makes four independent LLM calls for characters/locations, items, relationships, and events, then merges the results.
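The merge step of split extraction can be sketched as follows; the dictionary keys are assumptions based on the extraction categories listed above, not the pipeline's real schema:

```python
# Sketch of merging four independent extraction calls into one record.
def merge_split_extraction(chars_locs, items, relationships, events):
    """Merge Phase-2 style split extraction results (illustrative keys)."""
    return {
        "characters": chars_locs.get("characters", []),
        "locations": chars_locs.get("locations", []),
        "items": items.get("items", []),
        "relationships": relationships.get("relationships", []),
        "events": events.get("events", []),
    }

merged = merge_split_extraction(
    {"characters": [{"name": "Harry"}], "locations": [{"name": "Hogwarts"}]},
    {"items": [{"name": "wand", "carrier": "Harry"}]},
    {"relationships": [{"type": "friend", "from": "Harry", "to": "Ron"}]},
    {"events": [{"agent": "Harry", "action": "move", "destination": "Hogwarts"}]},
)
```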

3. Normalization Pipeline

The raw extraction is processed through multiple normalization phases:

  • Alias Resolution (AliasResolver): maps all entity references to canonical IDs with conflict detection and canonical promotion
  • Relationship Normalization (RelationshipNormalizer): validates types, expands group references, detects conflicts
  • Event Normalization (EventNormalizer): validates agents exist, generates event times, classifies item usage
  • Item Tracking (ItemTracker): classifies items as causal/latent/background, tracks lifecycle
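The first-seen-wins policy of alias resolution can be sketched with a small table (a simplified stand-in for the real AliasResolver, which also handles canonical promotion):

```python
class AliasResolverSketch:
    """Minimal first-seen-wins alias table (illustrative, not the real API)."""
    def __init__(self):
        self.canonical = {}  # alias -> canonical id

    def register(self, alias, canonical_id):
        # First-seen-wins: an already-bound alias keeps its original canonical id.
        self.canonical.setdefault(alias, canonical_id)
        self.canonical.setdefault(canonical_id, canonical_id)

    def resolve(self, name):
        return self.canonical.get(name, name)

r = AliasResolverSketch()
r.register("the boy who lived", "harry_potter")
r.register("the boy who lived", "neville")  # conflict: first binding wins
```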

4. ASP Fact Generation

The normalized extraction is converted to ASP facts (EventExecutor.to_asp()):

  • Entity declarations (character/1, location/1, item/1)
  • Alias mappings (alias/2)
  • State facts (is_dead/1, emotion/2, trait/2)
  • Event facts (event/6, event_order/2, event_time/2)
  • Relationship facts (relationship/4)
  • Presence facts (present/3, implied_present/3)
  • Cross-chapter persistent state facts
  • Implicit movement transitions (via MovementContinuityGuard)

The Active Universe optimization filters all facts to include only entities relevant to the current chapter, reducing ASP grounding time.
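Active-universe-filtered fact generation can be sketched like this (the real to_asp() emits richer facts such as event/6; a reduced event arity is used here for illustration):

```python
def to_asp_facts(extraction, active_universe):
    """Emit ASP fact strings, keeping only entities in the active universe."""
    facts = []
    for c in extraction["characters"]:
        if c in active_universe:
            facts.append(f"character({c}).")
    for agent, action, loc, t in extraction["events"]:
        if agent in active_universe:
            facts.append(f"event({agent},{action},{loc},{t}).")
    return facts

facts = to_asp_facts(
    {"characters": ["harry", "background_extra"],
     "events": [("harry", "move", "hogwarts", 1)]},
    active_universe={"harry", "hogwarts"},
)
# background_extra is filtered out, shrinking the grounding
```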

5. Clingo Evaluation

The ASP facts are combined with rule files and evaluated by Clingo:

  • Core rules (core.lp) are always loaded
  • Universal rules (rules/universal/) provide default detection
  • Story-specific rules are projected through the active universe filter
  • Violations are extracted as violation/N atoms and parsed into StructuredViolation objects
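Extracting violation atoms from a Clingo answer set can be sketched with a small regex pass (a stand-in for the real StructuredViolation parsing):

```python
import re

def parse_violations(model_atoms):
    """Pull the argument tuples out of violation/N atoms in textual form."""
    pattern = re.compile(r"violation\(([^)]*)\)")
    parsed = []
    for atom in model_atoms:
        m = pattern.match(atom)
        if m:
            parsed.append(m.group(1).split(","))
    return parsed

atoms = ["violation(dead_character_acting,causality,harry,3)",
         "present(harry,hogwarts,3)"]
violations = parse_violations(atoms)
# -> [['dead_character_acting', 'causality', 'harry', '3']]
```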

6. Post-Processing

  • Conflict Resolution: story-specific rules can override universal rules
  • State Update: the world state is updated with new facts from the chapter
  • Persistence: state is saved to disk for cross-session continuity

7. Final Analysis

After the last chapter, FinalAnalyzer performs story-wide analysis:

  • Rule audit: which rules fired, how often, which were overridden
  • Loose ends: unresolved Chekhov's Gun items, unused characters, dangling relationships
  • Long-range inconsistencies: violations spanning multiple chapters (e.g., a dead character repeatedly acting)

Engine

The engine/ package is the Python orchestration layer. All modules are imported through engine/__init__.py (version 0.9.9, Phase 8.11.2).

State Management

state_manager.py — Central world state management maintaining the Logic Knowledge Graph (LKG).

| Class | Purpose |
|-------|---------|
| Entity | Represents a character, location, or item with traits, state, emotion, aliases |
| Relation | Time-indexed relation (present, carries, relationship) |
| StoryRule | Story-specific rules (relationship, trait, location, possession, temporal) |
| WorldState | Snapshot at a timestep; contains entities, relations, derived facts. Has to_asp_facts() |
| StateManager | Main class: add_entity(), mark_dead(), set_emotion(), advance_time(), get_asp_facts_for_clingo(), save()/load() |

Key design: only current state in memory (no historical snapshots). Delegates entity storage to EntityRegistry.

Event Evaluation

event_executor.py — The core evaluation engine converting structured data to ASP facts and invoking Clingo.

| Class | Purpose |
|-------|---------|
| ViolationSeverity | Enum: HARD, SOFT, WARNING |
| Provenance | Tracks how conclusions were reached (rule ID, layer, chapter) |
| StructuredViolation | Violation output with rule, category, type, entities, severity, provenance |
| ChapterEvaluationResult | Complete chapter result with violations, state changes, ASP facts |
| Event | State transition event with agent, patient, location, destination |
| EventExecutor | Main class: to_asp(), check_with_clingo(), evaluate_chapter_structured() |

Three evaluation modes:

  • evaluate_chapter() — legacy mode
  • evaluate_chapter_structured() — batch Clingo evaluation (primary)
  • evaluate_chapter_sequential() — per-event sequential evaluation

Entity & Alias Resolution

alias_resolver.py — Ensures consistent entity identity across chapters.

  • Maps aliases to canonical IDs for characters and locations
  • Canonical promotion: when a better ID is discovered (e.g., potter → harry_potter)
  • Conflict unification: merges two canonical IDs sharing an alias (first-seen-wins)
  • Location containment hierarchy (contains/2 ASP facts)
  • Extraction normalization: normalize_extraction() normalizes all entity references
  • ASP generation: to_asp_facts() produces alias/2 and contains/2

entity_registry.py — Canonical entity storage with deduplication and lifecycle management.

| Lifecycle State | Definition |
|-----------------|------------|
| ACTIVE | Acted within the last 3 chapters |
| LATENT | Inactive for 3–9 chapters |
| FROZEN | Inactive for ≥10 chapters |

Only ACTIVE entities participate in ASP reasoning.
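The lifecycle thresholds above can be expressed as a small classifier (a sketch; the real EntityRegistry tracks more state per entity):

```python
def lifecycle_state(last_active_chapter, current_chapter):
    """Classify an entity by chapters since it last acted (thresholds per the table above)."""
    gap = current_chapter - last_active_chapter
    if gap < 3:
        return "ACTIVE"   # acted within the last 3 chapters
    elif gap < 10:
        return "LATENT"   # inactive for 3-9 chapters
    return "FROZEN"       # inactive for 10+ chapters

# Only ACTIVE entities would be handed to the ASP layer.
```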

Item Tracking

item_tracker.py — Tracks item lifecycle states for narrative significance and Chekhov's Gun detection.

Item lifecycle states: INTRODUCED → CARRIED → USED / DISCARDED / DESTROYED / GIVEN / LATENT

Item relevance classification:

| Relevance | Description |
|-----------|-------------|
| CAUSAL | Participates in constraints — full violation checking |
| LATENT | Tracked for Chekhov's Gun — candidate for unused item violation |
| BACKGROUND | No violations generated — suppressed from reasoning |

Rule Management

rule_registry.py — Central registry for all ASP rule files with priority layers.

| Layer | Priority | Description |
|-------|----------|-------------|
| UNIVERSAL | 1 (lowest) | Default world assumptions, overridable |
| STORY | 2 (highest) | Story-specific rules and overrides |

Key methods: load_legacy_rules(), get_active_rule_files(), get_combined_rules_content(), audit_summary().

Active Universe Optimization

active_universe.py — Computes the set of ASP-visible entities per chapter to reduce grounding time.

5-step algorithm:

  1. Current chapter entities (from extraction)
  2. Previous chapter entities (carry-over)
  3. 1-hop relationship expansion
  4. Carried items expansion
  5. Location presence

All ASP fact generation is filtered through this universe, preventing the knowledge base from growing unboundedly across chapters.
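The 5-step computation can be sketched as follows (step 5, location presence, is omitted for brevity; function and argument names are illustrative):

```python
def compute_active_universe(current, previous, relationships, carried_items):
    """Sketch of steps 1-4 of the active-universe computation."""
    universe = set(current) | set(previous)   # steps 1-2: current + carry-over
    for a, b in relationships:                # step 3: 1-hop relationship expansion
        if a in universe or b in universe:
            universe |= {a, b}
    for owner, item in carried_items:         # step 4: carried items expansion
        if owner in universe:
            universe.add(item)
    return universe

universe = compute_active_universe(
    current={"harry", "snape"},
    previous={"ron"},
    relationships=[("ron", "hermione")],
    carried_items=[("harry", "wand"), ("draco", "broom")],
)
# draco and his broom stay outside the universe and are never grounded
```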

Projection Layer

relationship_projector.py — Re-projects persistent relationships into the current ASP context. Both endpoints must be in the active universe for a relationship to be projected.

rule_projector.py — Projects story-specific rules into ASP, gated by active universe membership. At least one referenced entity must be in the active universe for a rule to be projected.

movement_continuity_guard.py — Detects implicit movement between locations (when a character changes location without an explicit movement event). Generates synthetic implicit_leave events and derived_event/1, implicit_transition/5 ASP facts to reduce false time-travel violations.

Conflict Resolution

conflict_resolver.py — Handles conflicts between story-specific and universal rules. Analyzes violations to determine if they represent story-vs-universal conflicts (ghost exceptions, teleportation, magic systems), and resolves them via override (full deactivation), exception (entity-specific), or retain.

Final Analysis

final_analysis.py — Story-wide consistency analysis run after the final chapter.

Produces a FinalAnalysisResult containing:

  • Rule audit: each rule's activation count, violation count, override status
  • Loose ends: unresolved items (Chekhov's Gun), unused characters, dangling relationships
  • Long-range inconsistencies: violations spanning multiple chapters
  • Statistics: total violations, category distribution, entity counts

Persistence & Context

context_persistence.py — Persists global narrative context to disk as JSON. Supports atomic writes (temp file → rename), incremental updates, and cross-session state restoration.
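The temp-file-then-rename pattern can be sketched as follows (a minimal version; the real module also handles incremental updates and restoration):

```python
import json
import os
import tempfile

def save_context_atomic(path, context):
    """Write JSON atomically: temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(context, f)
        os.replace(tmp, path)  # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp)         # clean up the temp file on failure
        raise
```

Writing to a temp file in the same directory guarantees the rename stays on one filesystem, so readers never observe a half-written context file.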

continuity_context.py — Builds continuity context for injection into LLM extraction prompts, ensuring the LLM respects established facts (known characters, relationships, character states).

Diagnostics

asp_diagnostics.py — Lightweight optional instrumentation for monitoring ASP universe size across chapters. Detects growth patterns and warns on exponential growth. Zero runtime overhead when disabled (toggle via ASP_DIAGNOSTICS=1 environment variable).


Rules

The rules/ folder contains Answer Set Programming (ASP) rule files processed by Clingo. Rules detect logical inconsistencies by computing violation/N atoms from extracted facts.

Core & Base Rules

core.lp (~1000 lines) — The foundation loaded by all other rule layers. Implements:

  • Event Calculus: holds/2, initially/1, initiates/2, terminates/2 with frame axioms
  • Entity type hierarchy: entity/1, character/1, location/1, item/1
  • Alias resolution: canonical/2, alias/2
  • Time management: time/1, before/2, after/2, next_time/2
  • Containment hierarchy: contains_transitive/2, same_container/2
  • Event classification: is_movement_event/1, is_death_event/1, etc.
  • Layered rule precedence: rule_active/1, rule_overridden/1, suppressed/4
  • Posture/consciousness modeling: posture/3, consciousness/3

base.lp (~660 lines) — The original self-contained rule set (v1). Includes Allen's interval algebra, fluent reasoning, common-sense cause/effect axioms, phobia rules, and edibility defaults.

General Rule Sets

Multiple iterations of domain-independent general rules, each progressively more sophisticated:

| File | Format | Key Additions |
|------|--------|---------------|
| general.lp | violation/4 | Clean, self-contained; covers all 5 categories |
| general_v2.lp | violation/5 | Human-readable @format descriptions, material types |
| general_abstract.lp | violation/5 | Auto-generated entity/event names, complete domain hooks |
| general_narrative.lp | violation/5 | Appearance analysis, behavioral rules, sudden emotional shifts |

enhanced_detection.lp — Multi-signal anomaly detection. Accumulates evidence scores across independent signals (negative emotions, mixed relationships, hostile/friendly action mismatches, unusual states) and fires when thresholds are met. Designed for modified-story detection.

simple_narrative.lp (~852 lines) — The primary production rule set (Phase 3). Designed for practical multi-chapter analysis without explicit time intervals. Integrates with the three-tier narrative state model. Includes extensive reflexive action detection, authority figure exceptions, and implicit transition detection.

Universal Rules

Located in rules/universal/, these form the lowest priority layer (priority 1) and can be overridden by story-specific rules. All use rule_active/1 guards and register with rule(ID, universal, Description) for audit.

| File | Focus | Key Violations |
|------|-------|----------------|
| temporal.lp | Temporal ordering & state contradictions | ordering_violation, causal_violation, state_contradiction, time_travel, knowledge_before_acquisition |
| location.lp | Spatial reasoning with containment hierarchy | explicit_ubiquity, impossible_travel, invalid_remote, simultaneous_presence |
| causality.lp | Dead agents, broken causal chains | dead_character_acting, interacting_with_dead, effect_without_cause |
| coherence.lp | Self-referential impossibilities, contradictory states | impossible_self_action, contradictory_state |
| emotional.lp | Relationship-action mismatches (evidence-gated) | relationship_action_mismatch, relationship_betrayal, trait_action_mismatch |
| appearance.lp | Physical appearance consistency | appearance_exclusive_conflict, appearance_sudden_change, appearance_persistent_lost |
| items.lp | Item lifecycle & Chekhov's Gun | chekhov_candidate, suppress_violation for background items |
| knowledge.lp | Monotonic knowledge tracking | Infrastructure for temporal knowledge_before_acquisition |
| narrative_state.lp | Three-tier state separation | dead_in_scene, absent_acting, presence_contradiction |

The three-tier narrative state model (narrative_state.lp) is architecturally significant:

  1. alive(C, T) — existence (alive or dead)
  2. present_in_scene(C, T) — narrative scene presence (can be present without known location)
  3. physically_at(C, L, T) — spatial location (implies scene presence)

This separation means a character can talk in a scene without a known location (valid), and a dead character cannot enter scenes (violation).
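Those two invariants can be sketched as plain checks (a Python rendering of what the ASP rules encode; the names mirror the predicates above):

```python
def scene_violations(alive, present_in_scene, physically_at):
    """Check the two invariants of the three-tier model (illustrative)."""
    violations = []
    # physically_at(C, L, T) implies present_in_scene(C, T)
    for (c, loc, t) in physically_at:
        if (c, t) not in present_in_scene:
            violations.append(("presence_contradiction", c, t))
    # a dead character cannot be present in a scene
    for (c, t) in present_in_scene:
        if not alive.get((c, t), True):
            violations.append(("dead_in_scene", c, t))
    return violations

violations = scene_violations(
    alive={("ghost", 2): False},
    present_in_scene={("harry", 1), ("ghost", 2)},
    physically_at=[("harry", "hogwarts", 1)],
)
# harry talking without a known location at time 2 would be valid;
# the dead "ghost" entering the scene is flagged
```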

Story-Specific Rules

story_rules.lp (~922 lines) — Dynamic rules tracking how events modify relationships, traits, possession, and location. Contains 100+ action-relationship contradiction mappings (e.g., congratulates contradicts hates), action-trait contradiction pairs, and positive/negative relationship modifiers.

Includes story-specific content: Hogwarts location hierarchy, disjoint castle areas, canonical event exceptions (e.g., Hermione petrifying Neville does not flag as hostile).

Error Categories

| Category | Detection Focus | Key Rule Files |
|----------|-----------------|----------------|
| Causality | Dead agents acting, effect without cause, Chekhov's Gun, missing preconditions | causality.lp, items.lp, narrative_state.lp |
| Coherence | Trait conflicts, inedible consumption, self-referential impossibilities, appearance anomalies, contradictory states | coherence.lp, appearance.lp, narrative_state.lp |
| Temporal | Circular time, ordering violations, state contradictions, time travel, duration bounds | temporal.lp, story_rules.lp |
| Location | Ubiquity, impossible travel, unreachable locations, invalid remote interaction | location.lp, story_rules.lp |
| Emotional | Relationship-action mismatches, trait-action conflicts, betrayal, sudden emotional shifts | emotional.lp, story_rules.lp |

Scripts

The scripts/ package provides experiment orchestration, LLM extraction, logic evaluation, and analysis tools.

Extraction Pipeline

The extraction pipeline converts raw chapter text into structured JSON through multiple phases:

Phase 2 — Split Extraction (extraction/extractors.py): Four independent LLM calls per chapter, each extracting a different aspect:

  1. Characters & Locations
  2. Items
  3. Relationships
  4. Events

Results are merged into a unified extraction. Includes truncation recovery (salvage).

Phase 3 — Entity Registry (extraction/entity_registry.py): Canonical entity management with alias resolution and event validation.

Phase 4 — Relationship Normalization (extraction/relationship_normalizer.py): Type validation, group expansion (e.g., "the trio" → individual characters), conflict detection.

Phase 5 — Event Normalization (extraction/event_normalizer.py): Strict event validation (agents must exist), event_time generation, item usage classification (causal vs. latent).

Phase 6 — Lifecycle Tracking (extraction/lifecycle_tracker.py): Cross-chapter entity lifecycle tracking, Chekhov's Gun detection for unused items, dormant character detection.

Supporting modules:

  • api_clients.py — Factory for LLM backends (LocalLLMClient, OpenAIAPIClient, GeminiAPIClient)
  • prompts.py — All LLM prompt templates (~1624 lines): extraction schemas, social action typing, implied presence
  • chapter_extractor.py — Main extraction entry point (structure_chapter_standalone())
  • json_utils.py — Deterministic JSON repair for LLM output (markdown fences, single quotes, trailing commas)
  • extraction_diagnostics.py — Detects under-extraction via regex evidence scanning
  • temporal_diagnostics.py — Detects temporal language not captured by extraction
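The kinds of repairs json_utils.py performs can be sketched with two regex passes (illustrative only; the real module covers more faults, such as single-quoted keys):

```python
import json
import re

def repair_llm_json(text):
    """Deterministic cleanup of common LLM JSON faults (fences, trailing commas)."""
    # strip surrounding markdown code fences
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", text.strip())
    # remove trailing commas before a closing brace or bracket
    text = re.sub(r",\s*([}\]])", r"\1", text)
    return json.loads(text)

obj = repair_llm_json('```json\n{"characters": ["Harry",],}\n```')
# -> {'characters': ['Harry']}
```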

Logic Evaluation

logic/asp_converter.py — Converts structured JSON to ASP facts. Handles characters, locations, items, events, relationships, story rules, and implied presence. Supports active universe filtering.

logic/evaluator.py — Main LogicEvaluator orchestrator: LLM extraction → ASP facts → Clingo checking → result collection. Tracks cross-chapter state (dead characters, emotions, relationships, story rules).

Experiment Runners

run_narrative_experiment_refactored.py — Main CLI entry point with a two-step experiment design:

| Step | Description |
|------|-------------|
| Step 1 — LLM-Only | Reads chapters sequentially; the LLM detects errors directly from text (no formal logic) |
| Step 2 — Logic-Based | Full pipeline: LLM extraction → normalization → ASP → Clingo → violations |

Step 2 supports multiple modes:

  • --engine: Full Phase 5 pipeline with all engine modules
  • --structured: LogicEvaluator.evaluate_chapter_v2() without interpretation LLM call
  • Default: LogicEvaluator.evaluate_chapter() with LLM interpretation
  • --split-extraction: Enables Phase 2 four-function extraction
  • --extraction-only: Extract only, skip logic evaluation

experiment/runners.py — Implements run_step1_llm() and run_step2_logic() / run_step2_engine() / run_step2_debug().

K-Fold Experiments:

  • run_kfold_experiment.py — Compares LLM vs. logic approaches with k-fold cross-validation
  • kfold_experiment_runner.py — Runs logic-only k-fold on pre-extracted data

Other runners:

  • logic_only_test_runner.py — Runs ASP/Clingo on pre-extracted data without LLMs; supports --trace for per-statement evaluation

Evaluation & Analysis

evaluation/evidence_checklist.py — Defines per-category evidence requirements. Each error category has mandatory and alternative predicates that must be present in the ASP facts for an error to be detectable.

evaluation/detectability.py — Classifies errors as:

  • DETECTED — Error was found by the system
  • DETECTABLE_BUT_MISSED — Required predicates were present but no violation fired
  • UNDETECTABLE_BY_DESIGN — Required predicates were not extracted
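The three-way classification can be sketched as a set comparison (predicate names here are illustrative):

```python
def classify_detectability(required_predicates, extracted_predicates, violation_fired):
    """Classify an implanted error against the evidence checklist (sketch)."""
    if violation_fired:
        return "DETECTED"
    if required_predicates <= extracted_predicates:
        # all required evidence was present, yet no rule fired
        return "DETECTABLE_BUT_MISSED"
    # the extraction never produced the evidence the rules need
    return "UNDETECTABLE_BY_DESIGN"
```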

experiment/ground_truth.py — Loads ground truth from CSV files and computes precision, recall, and F1 metrics.

experiment/summary.py — Generates comparison summaries from Step 1 + Step 2 results against ground truth.

Post-experiment analysis scripts:

  • analyze_data_from_experiment.py — Compares implanted errors vs. extraction/evaluation results
  • analyze_llm_only_experiment_results.py — Analyzes Step 1 LLM-only results with aggregate CSV reports

Utilities

merge/json_parser.py — Advanced iterative JSON repair with position-based error fixing (missing commas, unterminated strings, unclosed brackets).

merge/json_repair.py — Simpler JSON repair with structure validation for chapter data.

state/config.py — Path constants, API keys, story list, error categories.

state/data_structures.py — Shared dataclasses: ChapterError, ChapterResult, StepResults.

state/logging.py — Console + file logging with timestamps and step separators.


Running Experiments

Basic Usage

```bash
# Step 1: LLM-only evaluation
python -m scripts.run_narrative_experiment_refactored --step 1 --api-mode openai

# Step 2: Logic-based evaluation with full engine
python -m scripts.run_narrative_experiment_refactored --step 2 --engine --api-mode openai

# Step 2 with split extraction
python -m scripts.run_narrative_experiment_refactored --step 2 --engine --api-mode openai --split-extraction

# Both steps with summary
python -m scripts.run_narrative_experiment_refactored --step both --engine --api-mode openai --summarize
```

CLI Options

| Option | Description |
|--------|-------------|
| --step {1,2,both} | Which evaluation step to run |
| --api-mode {local,gemini,openai,debug} | LLM backend (default: local) |
| --api-model MODEL | Model name (e.g., gpt-4o, gemini-2.0-flash) |
| --engine | Use full Phase 5 engine pipeline |
| --split-extraction | Enable Phase 2 four-function extraction |
| --extraction-only | Extract only, skip logic evaluation |
| --structured | Use evaluate_chapter_v2() |
| --max-chapters N | Limit number of chapters processed |
| --llm-timeout N | LLM call timeout in seconds |
| --summarize | Generate experiment summary after completion |

Logic-Only Testing

```bash
# Run ASP/Clingo on pre-extracted data (no LLM required)
python -m scripts.logic_only_test_runner --story "Harry Potter" --chapter 1

# With per-statement trace
python -m scripts.logic_only_test_runner --story "Harry Potter" --chapter 1 --trace
```

K-Fold Experiments

```bash
# LLM vs Logic comparison
python -m scripts.run_kfold_experiment --api-mode openai
```

Environment Variables

| Variable | Required For | Description |
|----------|--------------|-------------|
| OPENAI_API_KEY | --api-mode openai | OpenAI API key |
| GEMINI_API_KEY | --api-mode gemini | Google Gemini API key |
| ASP_DIAGNOSTICS | Optional | Set to 1 to enable ASP universe size diagnostics |

When using --api-mode local, the system expects a local LLM server at localhost:8080.


Supported Stories

The system has been tested on both original and error-injected ("modified") variants of:

  1. Harry Potter (and the Philosopher's Stone)
  2. The Hunger Games
  3. The Lord of the Rings
  4. Twilight
  5. Goosebumps

Story texts should be placed in original_books/ and modified_books/ directories. Ground truth error annotations are loaded from errors_checklist/ CSV files.
