UCM-FDI-DISIA/narrative_coherence_error_detector

Narrative Coherence Error Detector

A hybrid system that detects logical inconsistencies in fiction by combining Large Language Models (for structured information extraction from narrative text) with Answer Set Programming (for formal logical reasoning via Clingo).

The system processes novels chapter-by-chapter, extracts structured facts (characters, locations, items, events, relationships), converts them to ASP facts, and applies logic programs to detect five categories of narrative errors: Causality, Coherence, Temporal, Location, and Emotional.



Architecture Overview

The system follows a strict "logic-first" architecture:

  • Python handles orchestration: state management, entity tracking, data conversion, and persistence.
  • LLMs (OpenAI, Google Gemini, or local) handle natural language understanding: extracting structured JSON from raw chapter text.
  • ASP/Clingo handles all logical reasoning: violation detection through constraint checking and derived facts.
                        ┌─────────────────────────┐
                        │     Chapter Text         │
                        └────────────┬────────────┘
                                     │
                                     ▼
                        ┌─────────────────────────┐
                        │   LLM Extraction Layer   │
                        │  (structured JSON output) │
                        └────────────┬────────────┘
                                     │
                    ┌────────────────┼────────────────┐
                    ▼                ▼                ▼
            ┌──────────────┐ ┌────────────┐ ┌──────────────┐
            │AliasResolver │ │ItemTracker │ │  Continuity  │
            │(canonical ID)│ │(lifecycle) │ │   Context    │
            └──────┬───────┘ └─────┬──────┘ └──────────────┘
                   │               │
                   └───────┬───────┘
                           ▼
                ┌──────────────────────┐
                │    EventExecutor     │
                │  ┌────────────────┐  │
                │  │ to_asp()       │  │   ──► ASP facts
                │  │ ActiveUniverse │  │   ──► Entity filtering
                │  │ MovementGuard  │  │   ──► Implicit transitions
                │  │ check_clingo() │  │   ──► Rule evaluation
                │  └────────────────┘  │
                └──────────┬───────────┘
                           │
          ┌────────────────┼────────────────┐
          ▼                                ▼
   ┌─────────────┐              ┌──────────────┐
   │  Conflict   │              │   Context    │
   │  Resolver   │              │ Persistence  │
   │ (overrides) │              │  (save/load) │
   └─────────────┘              └──────────────┘
                           │
                           ▼
                ┌──────────────────────┐
                │   FinalAnalyzer      │
                │  (story-wide audit)  │
                └──────────────────────┘

Key Principles

  1. Logic-first: Python never encodes narrative logic in conditionals — all reasoning is in ASP rules.
  2. Deterministic: Every conclusion traces back to explicit rules; alias unification is first-seen-wins.
  3. Memory-bounded: No historical state snapshots. Lifecycle management (ACTIVE/LATENT/FROZEN) bounds entity count. Active universe limits ASP grounding.
  4. Structured output only: All outputs are JSON-serializable dataclasses with full provenance.
  5. Audit trail: Alias promotions, rule overrides, and conflict resolutions are all logged.
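Principles 4 and 5 imply a concrete output shape. As a minimal sketch (field names here are illustrative, not the project's exact API), a violation carrying full provenance might look like:

```python
from dataclasses import dataclass, asdict
import json

# Illustrative shapes only; the real classes live in engine/event_executor.py.
@dataclass
class Provenance:
    rule_id: str
    layer: str      # "universal" or "story"
    chapter: int

@dataclass
class StructuredViolation:
    rule: str
    category: str   # e.g. "causality"
    entities: list
    severity: str   # HARD / SOFT / WARNING
    provenance: Provenance

v = StructuredViolation(
    rule="dead_character_acting",
    category="causality",
    entities=["harry_potter"],
    severity="HARD",
    provenance=Provenance("causality.lp#dead_agent", "universal", 3),
)
# Every conclusion serializes with its provenance attached
print(json.dumps(asdict(v)))
```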

Project Structure

narrative_coherence_error_detector/
├── engine/                    # Core evaluation engine (Python)
│   ├── state_manager.py       # World state & Logic Knowledge Graph (LKG)
│   ├── event_executor.py      # ASP fact generation & Clingo evaluation
│   ├── alias_resolver.py      # Cross-chapter entity identity resolution
│   ├── entity_registry.py     # Canonical entity storage with lifecycle
│   ├── item_tracker.py        # Item lifecycle & Chekhov's Gun detection
│   ├── rule_registry.py       # ASP rule file management & priority layers
│   ├── conflict_resolver.py   # Story vs. universal rule conflict handling
│   ├── final_analysis.py      # Story-wide consistency analysis
│   ├── active_universe.py     # ASP-visible entity universe computation
│   ├── relationship_projector.py  # Relationship projection into ASP
│   ├── rule_projector.py      # Story rule projection with filtering
│   ├── movement_continuity_guard.py  # Implicit movement inference
│   ├── continuity_context.py  # LLM prompt context builder
│   ├── context_persistence.py # Cross-session JSON state persistence
│   └── asp_diagnostics.py     # ASP universe size monitoring
│
├── rules/                     # ASP rule programs (Clingo)
│   ├── core.lp                # Foundation: Event Calculus, entity types, aliases
│   ├── base.lp                # Original self-contained rule set (v1)
│   ├── general.lp             # Domain-independent rules (violation/4)
│   ├── general_v2.lp          # Enhanced with descriptions (violation/5)
│   ├── general_abstract.lp    # Comprehensive standalone rules
│   ├── general_narrative.lp   # Appearance & behavioral analysis
│   ├── simple_narrative.lp    # Primary production rules (Phase 3)
│   ├── enhanced_detection.lp  # Multi-signal anomaly detection
│   ├── story_rules.lp         # Story-specific dynamic rules
│   └── universal/             # Lowest-priority overridable rules
│       ├── temporal.lp        # Temporal violation detection
│       ├── location.lp        # Spatial reasoning with containment
│       ├── causality.lp       # Dead agents, effect without cause
│       ├── coherence.lp       # Self-referential & state contradictions
│       ├── emotional.lp       # Relationship-action mismatches
│       ├── appearance.lp      # Physical appearance consistency
│       ├── items.lp           # Item lifecycle & Chekhov's Gun
│       ├── knowledge.lp       # Monotonic knowledge tracking
│       └── narrative_state.lp # Three-tier state model
│
├── scripts/                   # Experiment orchestration & extraction
│   ├── run_narrative_experiment_refactored.py  # Main CLI entry point
│   ├── run_kfold_experiment.py      # K-fold: LLM vs logic comparison
│   ├── kfold_experiment_runner.py   # K-fold on pre-extracted data
│   ├── logic_only_test_runner.py    # ASP-only runner (no LLM)
│   ├── analyze_data_from_experiment.py     # Post-experiment analysis
│   ├── analyze_llm_only_experiment_results.py  # Step 1 result analysis
│   ├── debug_chapter.py            # Quick single-chapter debug tool
│   ├── extraction/            # LLM-based chapter extraction
│   │   ├── api_clients.py     # LLM backend factory (local/OpenAI/Gemini)
│   │   ├── prompts.py         # All LLM prompt templates
│   │   ├── chapter_extractor.py    # Main extraction entry point
│   │   ├── extractors.py     # Phase 2 split extraction pipeline
│   │   ├── entity_registry.py     # Canonical entity management
│   │   ├── relationship_normalizer.py  # Relationship validation
│   │   ├── event_normalizer.py    # Event validation & item classification
│   │   ├── event_type_mapper.py   # Event type refinement
│   │   ├── lifecycle_tracker.py   # Cross-chapter lifecycle tracking
│   │   ├── post_salvage_reconciler.py  # Salvaged event repair
│   │   ├── extraction_diagnostics.py   # Under-extraction detection
│   │   ├── temporal_diagnostics.py     # Temporal recall analysis
│   │   ├── llm_client.py     # LLM-only evaluation client
│   │   └── json_utils.py     # LLM output JSON repair
│   ├── logic/                 # ASP conversion & evaluation
│   │   ├── asp_converter.py   # JSON → ASP fact conversion
│   │   └── evaluator.py      # Main LogicEvaluator orchestrator
│   ├── experiment/            # Experiment management
│   │   ├── runners.py         # Step 1 (LLM) & Step 2 (Logic) runners
│   │   ├── ground_truth.py    # Ground truth loading & metrics
│   │   └── summary.py        # Cross-step comparison reports
│   ├── evaluation/            # Result evaluation
│   │   ├── evidence_checklist.py  # Per-category evidence requirements
│   │   └── detectability.py  # Error detectability classification
│   ├── merge/                 # JSON parsing utilities
│   │   ├── json_parser.py     # Advanced iterative JSON repair
│   │   └── json_repair.py    # Structure validation
│   └── state/                 # Shared configuration
│       ├── config.py          # Paths, API keys, constants
│       ├── data_structures.py # Shared result dataclasses
│       └── logging.py        # Console + file logging
│
└── experiments/               # Experiment results

Core Pipeline

The system processes a story through these stages for each chapter:

1. Continuity Context Building

Before extraction, the system builds a context of known characters, relationships, and states from previous chapters (via ContinuityContextBuilder). This is injected into the LLM prompt so it respects established facts.

2. LLM Extraction

The chapter text is sent to an LLM with structured prompts requesting:

  • Characters: names, aliases, traits, emotions, states
  • Locations: names, containment hierarchy
  • Items: names, carriers, lifecycle state
  • Events: agent, patient, action type, location, destination
  • Relationships: type, directionality, evidence

Extraction supports single-call or split extraction (Phase 2), which makes four independent LLM calls for characters/locations, items, relationships, and events, then merges the results.
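The merge step of split extraction can be sketched as follows; the dictionary keys are assumptions based on the extraction categories listed above, not the pipeline's real schema:

```python
# Sketch of merging four independent extraction calls into one record.
def merge_split_extraction(chars_locs, items, relationships, events):
    """Merge Phase-2 style split extraction results (illustrative keys)."""
    return {
        "characters": chars_locs.get("characters", []),
        "locations": chars_locs.get("locations", []),
        "items": items.get("items", []),
        "relationships": relationships.get("relationships", []),
        "events": events.get("events", []),
    }

merged = merge_split_extraction(
    {"characters": [{"name": "Harry"}], "locations": [{"name": "Hogwarts"}]},
    {"items": [{"name": "wand", "carrier": "Harry"}]},
    {"relationships": [{"type": "friend", "from": "Harry", "to": "Ron"}]},
    {"events": [{"agent": "Harry", "action": "move", "destination": "Hogwarts"}]},
)
```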

3. Normalization Pipeline

The raw extraction is processed through multiple normalization phases:

  • Alias Resolution (AliasResolver): maps all entity references to canonical IDs with conflict detection and canonical promotion
  • Relationship Normalization (RelationshipNormalizer): validates types, expands group references, detects conflicts
  • Event Normalization (EventNormalizer): validates agents exist, generates event times, classifies item usage
  • Item Tracking (ItemTracker): classifies items as causal/latent/background, tracks lifecycle
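The first-seen-wins policy of alias resolution can be sketched with a small table (a simplified stand-in for the real AliasResolver, which also handles canonical promotion):

```python
class AliasResolverSketch:
    """Minimal first-seen-wins alias table (illustrative, not the real API)."""
    def __init__(self):
        self.canonical = {}  # alias -> canonical id

    def register(self, alias, canonical_id):
        # First-seen-wins: an already-bound alias keeps its original canonical id.
        self.canonical.setdefault(alias, canonical_id)
        self.canonical.setdefault(canonical_id, canonical_id)

    def resolve(self, name):
        return self.canonical.get(name, name)

r = AliasResolverSketch()
r.register("the boy who lived", "harry_potter")
r.register("the boy who lived", "neville")  # conflict: first binding wins
```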

4. ASP Fact Generation

The normalized extraction is converted to ASP facts (EventExecutor.to_asp()):

  • Entity declarations (character/1, location/1, item/1)
  • Alias mappings (alias/2)
  • State facts (is_dead/1, emotion/2, trait/2)
  • Event facts (event/6, event_order/2, event_time/2)
  • Relationship facts (relationship/4)
  • Presence facts (present/3, implied_present/3)
  • Cross-chapter persistent state facts
  • Implicit movement transitions (via MovementContinuityGuard)

The Active Universe optimization filters all facts to include only entities relevant to the current chapter, reducing ASP grounding time.
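Active-universe-filtered fact generation can be sketched like this (the real to_asp() emits richer facts such as event/6; a reduced event arity is used here for illustration):

```python
def to_asp_facts(extraction, active_universe):
    """Emit ASP fact strings, keeping only entities in the active universe."""
    facts = []
    for c in extraction["characters"]:
        if c in active_universe:
            facts.append(f"character({c}).")
    for agent, action, loc, t in extraction["events"]:
        if agent in active_universe:
            facts.append(f"event({agent},{action},{loc},{t}).")
    return facts

facts = to_asp_facts(
    {"characters": ["harry", "background_extra"],
     "events": [("harry", "move", "hogwarts", 1)]},
    active_universe={"harry", "hogwarts"},
)
# background_extra is filtered out, shrinking the grounding
```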

5. Clingo Evaluation

The ASP facts are combined with rule files and evaluated by Clingo:

  • Core rules (core.lp) are always loaded
  • Universal rules (rules/universal/) provide default detection
  • Story-specific rules are projected through the active universe filter
  • Violations are extracted as violation/N atoms and parsed into StructuredViolation objects
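Extracting violation atoms from a Clingo answer set can be sketched with a small regex pass (a stand-in for the real StructuredViolation parsing):

```python
import re

def parse_violations(model_atoms):
    """Pull the argument tuples out of violation/N atoms in textual form."""
    pattern = re.compile(r"violation\(([^)]*)\)")
    parsed = []
    for atom in model_atoms:
        m = pattern.match(atom)
        if m:
            parsed.append(m.group(1).split(","))
    return parsed

atoms = ["violation(dead_character_acting,causality,harry,3)",
         "present(harry,hogwarts,3)"]
violations = parse_violations(atoms)
# -> [['dead_character_acting', 'causality', 'harry', '3']]
```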

6. Post-Processing

  • Conflict Resolution: story-specific rules can override universal rules
  • State Update: the world state is updated with new facts from the chapter
  • Persistence: state is saved to disk for cross-session continuity

7. Final Analysis

After the last chapter, FinalAnalyzer performs story-wide analysis:

  • Rule audit: which rules fired, how often, which were overridden
  • Loose ends: unresolved Chekhov's Gun items, unused characters, dangling relationships
  • Long-range inconsistencies: violations spanning multiple chapters (e.g., a dead character repeatedly acting)

Engine

The engine/ package is the Python orchestration layer. All modules are imported through engine/__init__.py (version 0.9.9, Phase 8.11.2).

State Management

state_manager.py — Central world state management maintaining the Logic Knowledge Graph (LKG).

| Class | Purpose |
|-------|---------|
| Entity | Represents a character, location, or item with traits, state, emotion, aliases |
| Relation | Time-indexed relation (present, carries, relationship) |
| StoryRule | Story-specific rules (relationship, trait, location, possession, temporal) |
| WorldState | Snapshot at a timestep; contains entities, relations, derived facts. Has to_asp_facts() |
| StateManager | Main class: add_entity(), mark_dead(), set_emotion(), advance_time(), get_asp_facts_for_clingo(), save()/load() |

Key design: only current state in memory (no historical snapshots). Delegates entity storage to EntityRegistry.

Event Evaluation

event_executor.py — The core evaluation engine converting structured data to ASP facts and invoking Clingo.

| Class | Purpose |
|-------|---------|
| ViolationSeverity | Enum: HARD, SOFT, WARNING |
| Provenance | Tracks how conclusions were reached (rule ID, layer, chapter) |
| StructuredViolation | Violation output with rule, category, type, entities, severity, provenance |
| ChapterEvaluationResult | Complete chapter result with violations, state changes, ASP facts |
| Event | State transition event with agent, patient, location, destination |
| EventExecutor | Main class: to_asp(), check_with_clingo(), evaluate_chapter_structured() |

Three evaluation modes:

  • evaluate_chapter() — legacy mode
  • evaluate_chapter_structured() — batch Clingo evaluation (primary)
  • evaluate_chapter_sequential() — per-event sequential evaluation

Entity & Alias Resolution

alias_resolver.py — Ensures consistent entity identity across chapters.

  • Maps aliases to canonical IDs for characters and locations
  • Canonical promotion: when a better ID is discovered (e.g., potter → harry_potter)
  • Conflict unification: merges two canonical IDs sharing an alias (first-seen-wins)
  • Location containment hierarchy (contains/2 ASP facts)
  • Extraction normalization: normalize_extraction() normalizes all entity references
  • ASP generation: to_asp_facts() produces alias/2 and contains/2

entity_registry.py — Canonical entity storage with deduplication and lifecycle management.

| Lifecycle State | Definition |
|-----------------|------------|
| ACTIVE | Acted within the last 3 chapters |
| LATENT | Inactive for 3–9 chapters |
| FROZEN | Inactive for ≥10 chapters |

Only ACTIVE entities participate in ASP reasoning.
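The lifecycle thresholds above can be expressed as a small classifier (a sketch; the real EntityRegistry tracks more state per entity):

```python
def lifecycle_state(last_active_chapter, current_chapter):
    """Classify an entity by chapters since it last acted (thresholds per the table above)."""
    gap = current_chapter - last_active_chapter
    if gap < 3:
        return "ACTIVE"   # acted within the last 3 chapters
    elif gap < 10:
        return "LATENT"   # inactive for 3-9 chapters
    return "FROZEN"       # inactive for 10+ chapters

# Only ACTIVE entities would be handed to the ASP layer.
```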

Item Tracking

item_tracker.py — Tracks item lifecycle states for narrative significance and Chekhov's Gun detection.

Item lifecycle states: INTRODUCED → CARRIED → USED / DISCARDED / DESTROYED / GIVEN / LATENT

Item relevance classification:

| Relevance | Description |
|-----------|-------------|
| CAUSAL | Participates in constraints — full violation checking |
| LATENT | Tracked for Chekhov's Gun — candidate for unused item violation |
| BACKGROUND | No violations generated — suppressed from reasoning |

Rule Management

rule_registry.py — Central registry for all ASP rule files with priority layers.

| Layer | Priority | Description |
|-------|----------|-------------|
| UNIVERSAL | 1 (lowest) | Default world assumptions, overridable |
| STORY | 2 (highest) | Story-specific rules and overrides |

Key methods: load_legacy_rules(), get_active_rule_files(), get_combined_rules_content(), audit_summary().

Active Universe Optimization

active_universe.py — Computes the set of ASP-visible entities per chapter to reduce grounding time.

5-step algorithm:

  1. Current chapter entities (from extraction)
  2. Previous chapter entities (carry-over)
  3. 1-hop relationship expansion
  4. Carried items expansion
  5. Location presence

All ASP fact generation is filtered through this universe, preventing the knowledge base from growing unboundedly across chapters.
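The 5-step computation can be sketched as follows (step 5, location presence, is omitted for brevity; function and argument names are illustrative):

```python
def compute_active_universe(current, previous, relationships, carried_items):
    """Sketch of steps 1-4 of the active-universe computation."""
    universe = set(current) | set(previous)   # steps 1-2: current + carry-over
    for a, b in relationships:                # step 3: 1-hop relationship expansion
        if a in universe or b in universe:
            universe |= {a, b}
    for owner, item in carried_items:         # step 4: carried items expansion
        if owner in universe:
            universe.add(item)
    return universe

universe = compute_active_universe(
    current={"harry", "snape"},
    previous={"ron"},
    relationships=[("ron", "hermione")],
    carried_items=[("harry", "wand"), ("draco", "broom")],
)
# draco and his broom stay outside the universe and are never grounded
```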

Projection Layer

relationship_projector.py — Re-projects persistent relationships into the current ASP context. Both endpoints must be in the active universe for a relationship to be projected.

rule_projector.py — Projects story-specific rules into ASP, gated by active universe membership. At least one referenced entity must be in the active universe for a rule to be projected.

movement_continuity_guard.py — Detects implicit movement between locations (when a character changes location without an explicit movement event). Generates synthetic implicit_leave events and derived_event/1, implicit_transition/5 ASP facts to reduce false time-travel violations.

Conflict Resolution

conflict_resolver.py — Handles conflicts between story-specific and universal rules. Analyzes violations to determine if they represent story-vs-universal conflicts (ghost exceptions, teleportation, magic systems), and resolves them via override (full deactivation), exception (entity-specific), or retain.

Final Analysis

final_analysis.py — Story-wide consistency analysis run after the final chapter.

Produces a FinalAnalysisResult containing:

  • Rule audit: each rule's activation count, violation count, override status
  • Loose ends: unresolved items (Chekhov's Gun), unused characters, dangling relationships
  • Long-range inconsistencies: violations spanning multiple chapters
  • Statistics: total violations, category distribution, entity counts

Persistence & Context

context_persistence.py — Persists global narrative context to disk as JSON. Supports atomic writes (temp file → rename), incremental updates, and cross-session state restoration.
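The temp-file-then-rename pattern can be sketched as follows (a minimal version; the real module also handles incremental updates and restoration):

```python
import json
import os
import tempfile

def save_context_atomic(path, context):
    """Write JSON atomically: temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(context, f)
        os.replace(tmp, path)  # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp)         # clean up the temp file on failure
        raise
```

Writing to a temp file in the same directory guarantees the rename stays on one filesystem, so readers never observe a half-written context file.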

continuity_context.py — Builds continuity context for injection into LLM extraction prompts, ensuring the LLM respects established facts (known characters, relationships, character states).

Diagnostics

asp_diagnostics.py — Lightweight optional instrumentation for monitoring ASP universe size across chapters. Detects growth patterns and warns on exponential growth. Zero runtime overhead when disabled (toggle via ASP_DIAGNOSTICS=1 environment variable).


Rules

The rules/ folder contains Answer Set Programming (ASP) rule files processed by Clingo. Rules detect logical inconsistencies by computing violation/N atoms from extracted facts.

Core & Base Rules

core.lp (~1000 lines) — The foundation loaded by all other rule layers. Implements:

  • Event Calculus: holds/2, initially/1, initiates/2, terminates/2 with frame axioms
  • Entity type hierarchy: entity/1, character/1, location/1, item/1
  • Alias resolution: canonical/2, alias/2
  • Time management: time/1, before/2, after/2, next_time/2
  • Containment hierarchy: contains_transitive/2, same_container/2
  • Event classification: is_movement_event/1, is_death_event/1, etc.
  • Layered rule precedence: rule_active/1, rule_overridden/1, suppressed/4
  • Posture/consciousness modeling: posture/3, consciousness/3

base.lp (~660 lines) — The original self-contained rule set (v1). Includes Allen's interval algebra, fluent reasoning, common-sense cause/effect axioms, phobia rules, and edibility defaults.

General Rule Sets

Multiple iterations of domain-independent general rules, each progressively more sophisticated:

| File | Format | Key Additions |
|------|--------|---------------|
| general.lp | violation/4 | Clean, self-contained; covers all 5 categories |
| general_v2.lp | violation/5 | Human-readable @format descriptions, material types |
| general_abstract.lp | violation/5 | Auto-generated entity/event names, complete domain hooks |
| general_narrative.lp | violation/5 | Appearance analysis, behavioral rules, sudden emotional shifts |

enhanced_detection.lp — Multi-signal anomaly detection. Accumulates evidence scores across independent signals (negative emotions, mixed relationships, hostile/friendly action mismatches, unusual states) and fires when thresholds are met. Designed for modified-story detection.

simple_narrative.lp (~852 lines) — The primary production rule set (Phase 3). Designed for practical multi-chapter analysis without explicit time intervals. Integrates with the three-tier narrative state model. Includes extensive reflexive action detection, authority figure exceptions, and implicit transition detection.

Universal Rules

Located in rules/universal/, these form the lowest priority layer (priority 1) and can be overridden by story-specific rules. All use rule_active/1 guards and register with rule(ID, universal, Description) for audit.

| File | Focus | Key Violations |
|------|-------|----------------|
| temporal.lp | Temporal ordering & state contradictions | ordering_violation, causal_violation, state_contradiction, time_travel, knowledge_before_acquisition |
| location.lp | Spatial reasoning with containment hierarchy | explicit_ubiquity, impossible_travel, invalid_remote, simultaneous_presence |
| causality.lp | Dead agents, broken causal chains | dead_character_acting, interacting_with_dead, effect_without_cause |
| coherence.lp | Self-referential impossibilities, contradictory states | impossible_self_action, contradictory_state |
| emotional.lp | Relationship-action mismatches (evidence-gated) | relationship_action_mismatch, relationship_betrayal, trait_action_mismatch |
| appearance.lp | Physical appearance consistency | appearance_exclusive_conflict, appearance_sudden_change, appearance_persistent_lost |
| items.lp | Item lifecycle & Chekhov's Gun | chekhov_candidate, suppress_violation for background items |
| knowledge.lp | Monotonic knowledge tracking | Infrastructure for temporal knowledge_before_acquisition |
| narrative_state.lp | Three-tier state separation | dead_in_scene, absent_acting, presence_contradiction |

The three-tier narrative state model (narrative_state.lp) is architecturally significant:

  1. alive(C, T) — existence (alive or dead)
  2. present_in_scene(C, T) — narrative scene presence (can be present without known location)
  3. physically_at(C, L, T) — spatial location (implies scene presence)

This separation means a character can talk in a scene without a known location (valid), and a dead character cannot enter scenes (violation).
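Those two invariants can be sketched as plain checks (a Python rendering of what the ASP rules encode; the names mirror the predicates above):

```python
def scene_violations(alive, present_in_scene, physically_at):
    """Check the two invariants of the three-tier model (illustrative)."""
    violations = []
    # physically_at(C, L, T) implies present_in_scene(C, T)
    for (c, loc, t) in physically_at:
        if (c, t) not in present_in_scene:
            violations.append(("presence_contradiction", c, t))
    # a dead character cannot be present in a scene
    for (c, t) in present_in_scene:
        if not alive.get((c, t), True):
            violations.append(("dead_in_scene", c, t))
    return violations

violations = scene_violations(
    alive={("ghost", 2): False},
    present_in_scene={("harry", 1), ("ghost", 2)},
    physically_at=[("harry", "hogwarts", 1)],
)
# harry talking without a known location at time 2 would be valid;
# the dead "ghost" entering the scene is flagged
```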

Story-Specific Rules

story_rules.lp (~922 lines) — Dynamic rules tracking how events modify relationships, traits, possession, and location. Contains 100+ action-relationship contradiction mappings (e.g., congratulates contradicts hates), action-trait contradiction pairs, and positive/negative relationship modifiers.

Includes story-specific content: Hogwarts location hierarchy, disjoint castle areas, canonical event exceptions (e.g., Hermione petrifying Neville does not flag as hostile).

Error Categories

| Category | Detection Focus | Key Rule Files |
|----------|-----------------|----------------|
| Causality | Dead agents acting, effect without cause, Chekhov's Gun, missing preconditions | causality.lp, items.lp, narrative_state.lp |
| Coherence | Trait conflicts, inedible consumption, self-referential impossibilities, appearance anomalies, contradictory states | coherence.lp, appearance.lp, narrative_state.lp |
| Temporal | Circular time, ordering violations, state contradictions, time travel, duration bounds | temporal.lp, story_rules.lp |
| Location | Ubiquity, impossible travel, unreachable locations, invalid remote interaction | location.lp, story_rules.lp |
| Emotional | Relationship-action mismatches, trait-action conflicts, betrayal, sudden emotional shifts | emotional.lp, story_rules.lp |

Scripts

The scripts/ package provides experiment orchestration, LLM extraction, logic evaluation, and analysis tools.

Extraction Pipeline

The extraction pipeline converts raw chapter text into structured JSON through multiple phases:

Phase 2 — Split Extraction (extraction/extractors.py): Four independent LLM calls per chapter, each extracting a different aspect:

  1. Characters & Locations
  2. Items
  3. Relationships
  4. Events

Results are merged into a unified extraction. Includes truncation recovery (salvage).

Phase 3 — Entity Registry (extraction/entity_registry.py): Canonical entity management with alias resolution and event validation.

Phase 4 — Relationship Normalization (extraction/relationship_normalizer.py): Type validation, group expansion (e.g., "the trio" → individual characters), conflict detection.

Phase 5 — Event Normalization (extraction/event_normalizer.py): Strict event validation (agents must exist), event_time generation, item usage classification (causal vs. latent).

Phase 6 — Lifecycle Tracking (extraction/lifecycle_tracker.py): Cross-chapter entity lifecycle tracking, Chekhov's Gun detection for unused items, dormant character detection.

Supporting modules:

  • api_clients.py — Factory for LLM backends (LocalLLMClient, OpenAIAPIClient, GeminiAPIClient)
  • prompts.py — All LLM prompt templates (~1624 lines): extraction schemas, social action typing, implied presence
  • chapter_extractor.py — Main extraction entry point (structure_chapter_standalone())
  • json_utils.py — Deterministic JSON repair for LLM output (markdown fences, single quotes, trailing commas)
  • extraction_diagnostics.py — Detects under-extraction via regex evidence scanning
  • temporal_diagnostics.py — Detects temporal language not captured by extraction
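The kinds of repairs json_utils.py performs can be sketched with two regex passes (illustrative only; the real module covers more faults, such as single-quoted keys):

```python
import json
import re

def repair_llm_json(text):
    """Deterministic cleanup of common LLM JSON faults (fences, trailing commas)."""
    # strip surrounding markdown code fences
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", text.strip())
    # remove trailing commas before a closing brace or bracket
    text = re.sub(r",\s*([}\]])", r"\1", text)
    return json.loads(text)

obj = repair_llm_json('```json\n{"characters": ["Harry",],}\n```')
# -> {'characters': ['Harry']}
```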

Logic Evaluation

logic/asp_converter.py — Converts structured JSON to ASP facts. Handles characters, locations, items, events, relationships, story rules, and implied presence. Supports active universe filtering.

logic/evaluator.py — Main LogicEvaluator orchestrator: LLM extraction → ASP facts → Clingo checking → result collection. Tracks cross-chapter state (dead characters, emotions, relationships, story rules).

Experiment Runners

run_narrative_experiment_refactored.py — Main CLI entry point with a two-step experiment design:

| Step | Description |
|------|-------------|
| Step 1 — LLM-Only | Reads chapters sequentially; the LLM detects errors directly from text (no formal logic) |
| Step 2 — Logic-Based | Full pipeline: LLM extraction → normalization → ASP → Clingo → violations |

Step 2 supports multiple modes:

  • --engine: Full Phase 5 pipeline with all engine modules
  • --structured: LogicEvaluator.evaluate_chapter_v2() without interpretation LLM call
  • Default: LogicEvaluator.evaluate_chapter() with LLM interpretation
  • --split-extraction: Enables Phase 2 four-function extraction
  • --extraction-only: Extract only, skip logic evaluation

experiment/runners.py — Implements run_step1_llm() and run_step2_logic() / run_step2_engine() / run_step2_debug().

K-Fold Experiments:

  • run_kfold_experiment.py — Compares LLM vs. logic approaches with k-fold cross-validation
  • kfold_experiment_runner.py — Runs logic-only k-fold on pre-extracted data

Other runners:

  • logic_only_test_runner.py — Runs ASP/Clingo on pre-extracted data without LLMs; supports --trace for per-statement evaluation

Evaluation & Analysis

evaluation/evidence_checklist.py — Defines per-category evidence requirements. Each error category has mandatory and alternative predicates that must be present in the ASP facts for an error to be detectable.

evaluation/detectability.py — Classifies errors as:

  • DETECTED — Error was found by the system
  • DETECTABLE_BUT_MISSED — Required predicates were present but no violation fired
  • UNDETECTABLE_BY_DESIGN — Required predicates were not extracted
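The three-way classification can be sketched as a set comparison (predicate names here are illustrative):

```python
def classify_detectability(required_predicates, extracted_predicates, violation_fired):
    """Classify an implanted error against the evidence checklist (sketch)."""
    if violation_fired:
        return "DETECTED"
    if required_predicates <= extracted_predicates:
        # all required evidence was present, yet no rule fired
        return "DETECTABLE_BUT_MISSED"
    # the extraction never produced the evidence the rules need
    return "UNDETECTABLE_BY_DESIGN"
```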

experiment/ground_truth.py — Loads ground truth from CSV files and computes precision, recall, and F1 metrics.

experiment/summary.py — Generates comparison summaries from Step 1 + Step 2 results against ground truth.

Post-experiment analysis scripts:

  • analyze_data_from_experiment.py — Compares implanted errors vs. extraction/evaluation results
  • analyze_llm_only_experiment_results.py — Analyzes Step 1 LLM-only results with aggregate CSV reports

Utilities

merge/json_parser.py — Advanced iterative JSON repair with position-based error fixing (missing commas, unterminated strings, unclosed brackets).

merge/json_repair.py — Simpler JSON repair with structure validation for chapter data.

state/config.py — Path constants, API keys, story list, error categories.

state/data_structures.py — Shared dataclasses: ChapterError, ChapterResult, StepResults.

state/logging.py — Console + file logging with timestamps and step separators.


Running Experiments

Basic Usage

```bash
# Step 1: LLM-only evaluation
python -m scripts.run_narrative_experiment_refactored --step 1 --api-mode openai

# Step 2: Logic-based evaluation with full engine
python -m scripts.run_narrative_experiment_refactored --step 2 --engine --api-mode openai

# Step 2 with split extraction
python -m scripts.run_narrative_experiment_refactored --step 2 --engine --api-mode openai --split-extraction

# Both steps with summary
python -m scripts.run_narrative_experiment_refactored --step both --engine --api-mode openai --summarize
```

CLI Options

| Option | Description |
|--------|-------------|
| --step {1,2,both} | Which evaluation step to run |
| --api-mode {local,gemini,openai,debug} | LLM backend (default: local) |
| --api-model MODEL | Model name (e.g., gpt-4o, gemini-2.0-flash) |
| --engine | Use full Phase 5 engine pipeline |
| --split-extraction | Enable Phase 2 four-function extraction |
| --extraction-only | Extract only, skip logic evaluation |
| --structured | Use evaluate_chapter_v2() |
| --max-chapters N | Limit number of chapters processed |
| --llm-timeout N | LLM call timeout in seconds |
| --summarize | Generate experiment summary after completion |

Logic-Only Testing

```bash
# Run ASP/Clingo on pre-extracted data (no LLM required)
python -m scripts.logic_only_test_runner --story "Harry Potter" --chapter 1

# With per-statement trace
python -m scripts.logic_only_test_runner --story "Harry Potter" --chapter 1 --trace
```

K-Fold Experiments

```bash
# LLM vs Logic comparison
python -m scripts.run_kfold_experiment --api-mode openai
```

Environment Variables

| Variable | Required For | Description |
|----------|--------------|-------------|
| OPENAI_API_KEY | --api-mode openai | OpenAI API key |
| GEMINI_API_KEY | --api-mode gemini | Google Gemini API key |
| ASP_DIAGNOSTICS | Optional | Set to 1 to enable ASP universe size diagnostics |

When using --api-mode local, the system expects a local LLM server at localhost:8080.


Supported Stories

The system has been tested on both original and error-injected ("modified") variants of:

  1. Harry Potter (and the Philosopher's Stone)
  2. The Hunger Games
  3. The Lord of the Rings
  4. Twilight
  5. Goosebumps

Story texts should be placed in original_books/ and modified_books/ directories. Ground truth error annotations are loaded from errors_checklist/ CSV files.
