A hybrid system that detects logical inconsistencies in fiction by combining Large Language Models (for structured information extraction from narrative text) with Answer Set Programming (for formal logical reasoning via Clingo).
The system processes novels chapter-by-chapter, extracts structured facts (characters, locations, items, events, relationships), converts them to ASP facts, and applies logic programs to detect five categories of narrative errors: Causality, Coherence, Temporal, Location, and Emotional.
- Architecture Overview
- Project Structure
- Core Pipeline
- Engine (`engine/`)
- Rules (`rules/`)
- Scripts (`scripts/`)
- Running Experiments
- Environment Variables
- Supported Stories
The system follows a strict "logic-first" architecture:
- Python handles orchestration: state management, entity tracking, data conversion, and persistence.
- LLMs (OpenAI, Google Gemini, or local) handle natural language understanding: extracting structured JSON from raw chapter text.
- ASP/Clingo handles all logical reasoning: violation detection through constraint checking and derived facts.
┌─────────────────────────┐
│ Chapter Text │
└────────────┬────────────┘
│
▼
┌─────────────────────────┐
│ LLM Extraction Layer │
│ (structured JSON output) │
└────────────┬────────────┘
│
┌────────────────┼────────────────┐
▼ ▼ ▼
┌──────────────┐ ┌────────────┐ ┌──────────────┐
│AliasResolver │ │ItemTracker │ │ Continuity │
│(canonical ID)│ │(lifecycle) │ │ Context │
└──────┬───────┘ └─────┬──────┘ └──────────────┘
│ │
└───────┬───────┘
▼
┌──────────────────────┐
│ EventExecutor │
│ ┌────────────────┐ │
│ │ to_asp() │ │ ──► ASP facts
│ │ ActiveUniverse │ │ ──► Entity filtering
│ │ MovementGuard │ │ ──► Implicit transitions
│ │ check_clingo() │ │ ──► Rule evaluation
│ └────────────────┘ │
└──────────┬───────────┘
│
┌────────────────┼────────────────┐
▼ ▼
┌─────────────┐ ┌──────────────┐
│ Conflict │ │ Context │
│ Resolver │ │ Persistence │
│ (overrides) │ │ (save/load) │
└─────────────┘ └──────────────┘
│
▼
┌──────────────────────┐
│ FinalAnalyzer │
│ (story-wide audit) │
└──────────────────────┘
- Logic-first: Python never encodes narrative logic in conditionals — all reasoning is in ASP rules.
- Deterministic: Every conclusion traces to rules. First-seen-wins for alias unification.
- Memory-bounded: No historical state snapshots. Lifecycle management (ACTIVE/LATENT/FROZEN) bounds entity count. Active universe limits ASP grounding.
- Structured output only: All outputs are JSON-serializable dataclasses with full provenance.
- Audit trail: Alias promotions, rule overrides, and conflict resolutions are all logged.
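As an illustration of the structured-output principle, a violation record might look like the following minimal sketch. The names `StructuredViolation` and `Provenance` mirror the engine's classes, but the fields shown here are assumptions, not the exact schema:

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class Provenance:
    """Where a conclusion came from: which rule, in which layer, at which chapter."""
    rule_id: str
    layer: str       # "universal" or "story"
    chapter: int

@dataclass
class StructuredViolation:
    """JSON-serializable violation record with full provenance (illustrative fields)."""
    category: str            # e.g. "causality"
    violation_type: str      # e.g. "dead_character_acting"
    entities: list = field(default_factory=list)
    severity: str = "HARD"
    provenance: Provenance = None

    def to_json(self) -> str:
        return json.dumps(asdict(self))

v = StructuredViolation(
    category="causality",
    violation_type="dead_character_acting",
    entities=["quirrell"],
    provenance=Provenance(rule_id="causality_01", layer="universal", chapter=17),
)
print(v.to_json())
```

Because every record carries its provenance, audit reports can be generated by plain JSON aggregation rather than re-running the reasoner.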
narrative_coherence_error_detector/
├── engine/ # Core evaluation engine (Python)
│ ├── state_manager.py # World state & Logic Knowledge Graph (LKG)
│ ├── event_executor.py # ASP fact generation & Clingo evaluation
│ ├── alias_resolver.py # Cross-chapter entity identity resolution
│ ├── entity_registry.py # Canonical entity storage with lifecycle
│ ├── item_tracker.py # Item lifecycle & Chekhov's Gun detection
│ ├── rule_registry.py # ASP rule file management & priority layers
│ ├── conflict_resolver.py # Story vs. universal rule conflict handling
│ ├── final_analysis.py # Story-wide consistency analysis
│ ├── active_universe.py # ASP-visible entity universe computation
│ ├── relationship_projector.py # Relationship projection into ASP
│ ├── rule_projector.py # Story rule projection with filtering
│ ├── movement_continuity_guard.py # Implicit movement inference
│ ├── continuity_context.py # LLM prompt context builder
│ ├── context_persistence.py # Cross-session JSON state persistence
│ └── asp_diagnostics.py # ASP universe size monitoring
│
├── rules/ # ASP rule programs (Clingo)
│ ├── core.lp # Foundation: Event Calculus, entity types, aliases
│ ├── base.lp # Original self-contained rule set (v1)
│ ├── general.lp # Domain-independent rules (violation/4)
│ ├── general_v2.lp # Enhanced with descriptions (violation/5)
│ ├── general_abstract.lp # Comprehensive standalone rules
│ ├── general_narrative.lp # Appearance & behavioral analysis
│ ├── simple_narrative.lp # Primary production rules (Phase 3)
│ ├── enhanced_detection.lp # Multi-signal anomaly detection
│ ├── story_rules.lp # Story-specific dynamic rules
│ └── universal/ # Lowest-priority overridable rules
│ ├── temporal.lp # Temporal violation detection
│ ├── location.lp # Spatial reasoning with containment
│ ├── causality.lp # Dead agents, effect without cause
│ ├── coherence.lp # Self-referential & state contradictions
│ ├── emotional.lp # Relationship-action mismatches
│ ├── appearance.lp # Physical appearance consistency
│ ├── items.lp # Item lifecycle & Chekhov's Gun
│ ├── knowledge.lp # Monotonic knowledge tracking
│ └── narrative_state.lp # Three-tier state model
│
├── scripts/ # Experiment orchestration & extraction
│ ├── run_narrative_experiment_refactored.py # Main CLI entry point
│ ├── run_kfold_experiment.py # K-fold: LLM vs logic comparison
│ ├── kfold_experiment_runner.py # K-fold on pre-extracted data
│ ├── logic_only_test_runner.py # ASP-only runner (no LLM)
│ ├── analyze_data_from_experiment.py # Post-experiment analysis
│ ├── analyze_llm_only_experiment_results.py # Step 1 result analysis
│ ├── debug_chapter.py # Quick single-chapter debug tool
│ ├── extraction/ # LLM-based chapter extraction
│ │ ├── api_clients.py # LLM backend factory (local/OpenAI/Gemini)
│ │ ├── prompts.py # All LLM prompt templates
│ │ ├── chapter_extractor.py # Main extraction entry point
│ │ ├── extractors.py # Phase 2 split extraction pipeline
│ │ ├── entity_registry.py # Canonical entity management
│ │ ├── relationship_normalizer.py # Relationship validation
│ │ ├── event_normalizer.py # Event validation & item classification
│ │ ├── event_type_mapper.py # Event type refinement
│ │ ├── lifecycle_tracker.py # Cross-chapter lifecycle tracking
│ │ ├── post_salvage_reconciler.py # Salvaged event repair
│ │ ├── extraction_diagnostics.py # Under-extraction detection
│ │ ├── temporal_diagnostics.py # Temporal recall analysis
│ │ ├── llm_client.py # LLM-only evaluation client
│ │ └── json_utils.py # LLM output JSON repair
│ ├── logic/ # ASP conversion & evaluation
│ │ ├── asp_converter.py # JSON → ASP fact conversion
│ │ └── evaluator.py # Main LogicEvaluator orchestrator
│ ├── experiment/ # Experiment management
│ │ ├── runners.py # Step 1 (LLM) & Step 2 (Logic) runners
│ │ ├── ground_truth.py # Ground truth loading & metrics
│ │ └── summary.py # Cross-step comparison reports
│ ├── evaluation/ # Result evaluation
│ │ ├── evidence_checklist.py # Per-category evidence requirements
│ │ └── detectability.py # Error detectability classification
│ ├── merge/ # JSON parsing utilities
│ │ ├── json_parser.py # Advanced iterative JSON repair
│ │ └── json_repair.py # Structure validation
│ └── state/ # Shared configuration
│ ├── config.py # Paths, API keys, constants
│ ├── data_structures.py # Shared result dataclasses
│ └── logging.py # Console + file logging
│
└── experiments/ # Experiment results
The system processes a story through these stages for each chapter:
Before extraction, the system builds a context of known characters, relationships, and states from previous chapters (via ContinuityContextBuilder). This is injected into the LLM prompt so it respects established facts.
The chapter text is sent to an LLM with structured prompts requesting:
- Characters: names, aliases, traits, emotions, states
- Locations: names, containment hierarchy
- Items: names, carriers, lifecycle state
- Events: agent, patient, action type, location, destination
- Relationships: type, directionality, evidence
Extraction supports single-call or split extraction (Phase 2), which makes four independent LLM calls for characters/locations, items, relationships, and events, then merges the results.
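A merged extraction for one chapter might take a shape like the following. This is a simplified illustration: the field names approximate the categories listed above, not the project's exact extraction schema:

```python
import json

# Illustrative only: field names approximate the extraction categories,
# not the exact JSON schema produced by the pipeline.
extraction = {
    "characters": [
        {"name": "Harry Potter", "aliases": ["Harry", "the boy"],
         "emotions": ["nervous"], "state": "alive"}
    ],
    "locations": [{"name": "Great Hall", "contained_in": "Hogwarts"}],
    "items": [{"name": "wand", "carrier": "Harry Potter", "lifecycle": "CARRIED"}],
    "events": [
        {"agent": "Harry Potter", "action": "enters", "patient": None,
         "location": "Great Hall", "destination": None}
    ],
    "relationships": [
        {"source": "Harry Potter", "target": "Ron Weasley",
         "type": "friend", "evidence": "they sit together"}
    ],
}
print(json.dumps(extraction, indent=2)[:60])
```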
The raw extraction is processed through multiple normalization phases:
- Alias Resolution (`AliasResolver`): maps all entity references to canonical IDs with conflict detection and canonical promotion
- Relationship Normalization (`RelationshipNormalizer`): validates types, expands group references, detects conflicts
- Event Normalization (`EventNormalizer`): validates agents exist, generates event times, classifies item usage
- Item Tracking (`ItemTracker`): classifies items as causal/latent/background, tracks lifecycle
The normalized extraction is converted to ASP facts (`EventExecutor.to_asp()`):

- Entity declarations (`character/1`, `location/1`, `item/1`)
- Alias mappings (`alias/2`)
- State facts (`is_dead/1`, `emotion/2`, `trait/2`)
- Event facts (`event/6`, `event_order/2`, `event_time/2`)
- Relationship facts (`relationship/4`)
- Presence facts (`present/3`, `implied_present/3`)
- Cross-chapter persistent state facts
- Implicit movement transitions (via `MovementContinuityGuard`)
The Active Universe optimization filters all facts to include only entities relevant to the current chapter, reducing ASP grounding time.
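A toy version of the JSON → ASP conversion step can be sketched as follows. The real `EventExecutor.to_asp()` is far more involved (aliases, persistent state, filtering), and the exact argument order of `event/6` is an assumption here:

```python
def to_asp_sketch(extraction: dict, chapter: int) -> list:
    """Convert a simplified extraction dict into ASP fact strings.
    Illustrative only -- argument order of event/6 is assumed."""
    facts = []
    for c in extraction.get("characters", []):
        facts.append(f'character({c["id"]}).')
        for alias in c.get("aliases", []):
            facts.append(f'alias({alias}, {c["id"]}).')
    for i, ev in enumerate(extraction.get("events", [])):
        time = chapter * 100 + i  # simple per-chapter time base
        facts.append(
            f'event(e{i}, {ev["agent"]}, {ev.get("patient") or "none"}, '
            f'{ev["action"]}, {ev.get("location") or "unknown"}, {time}).'
        )
        facts.append(f'event_time(e{i}, {time}).')
    return facts

facts = to_asp_sketch(
    {"characters": [{"id": "harry", "aliases": ["potter"]}],
     "events": [{"agent": "harry", "action": "enters", "location": "great_hall"}]},
    chapter=1,
)
print(facts)
```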
The ASP facts are combined with rule files and evaluated by Clingo:
- Core rules (`core.lp`) are always loaded
- Universal rules (`rules/universal/`) provide default detection
- Story-specific rules are projected through the active universe filter
- Violations are extracted as `violation/N` atoms and parsed into `StructuredViolation` objects
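The atom-parsing step can be sketched with a small regex pass over an answer-set line. The real parser handles both `violation/4` and `violation/5` plus provenance lookup; the field mapping below is a simplifying assumption:

```python
import re

def parse_violations(answer_set: str) -> list:
    """Extract violation(...) atoms from a clingo answer-set line.
    Simplified: assumes arguments contain no nested parentheses."""
    out = []
    for m in re.finditer(r"violation\(([^)]*)\)", answer_set):
        args = [a.strip() for a in m.group(1).split(",")]
        out.append({"rule": args[0], "category": args[1], "entities": args[2:]})
    return out

line = "present(harry,hall,3) violation(causality_01,causality,quirrell,e7)"
print(parse_violations(line))
```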
- Conflict Resolution: story-specific rules can override universal rules
- State Update: the world state is updated with new facts from the chapter
- Persistence: state is saved to disk for cross-session continuity
After the last chapter, FinalAnalyzer performs story-wide analysis:
- Rule audit: which rules fired, how often, which were overridden
- Loose ends: unresolved Chekhov's Gun items, unused characters, dangling relationships
- Long-range inconsistencies: violations spanning multiple chapters (e.g., repeated dead character acting)
The `engine/` package is the Python orchestration layer. All modules are imported through `engine/__init__.py` (version 0.9.9, Phase 8.11.2).
state_manager.py — Central world state management maintaining the Logic Knowledge Graph (LKG).
| Class | Purpose |
|---|---|
| `Entity` | Represents a character, location, or item with traits, state, emotion, aliases |
| `Relation` | Time-indexed relation (present, carries, relationship) |
| `StoryRule` | Story-specific rules (relationship, trait, location, possession, temporal) |
| `WorldState` | Snapshot at a timestep; contains entities, relations, derived facts. Has `to_asp_facts()` |
| `StateManager` | Main class: `add_entity()`, `mark_dead()`, `set_emotion()`, `advance_time()`, `get_asp_facts_for_clingo()`, `save()`/`load()` |
Key design: only current state in memory (no historical snapshots). Delegates entity storage to EntityRegistry.
event_executor.py — The core evaluation engine converting structured data to ASP facts and invoking Clingo.
| Class | Purpose |
|---|---|
| `ViolationSeverity` | Enum: HARD, SOFT, WARNING |
| `Provenance` | Tracks how conclusions were reached (rule ID, layer, chapter) |
| `StructuredViolation` | Violation output with rule, category, type, entities, severity, provenance |
| `ChapterEvaluationResult` | Complete chapter result with violations, state changes, ASP facts |
| `Event` | State transition event with agent, patient, location, destination |
| `EventExecutor` | Main class: `to_asp()`, `check_with_clingo()`, `evaluate_chapter_structured()` |
Three evaluation modes:
- `evaluate_chapter()` — legacy mode
- `evaluate_chapter_structured()` — batch Clingo evaluation (primary)
- `evaluate_chapter_sequential()` — per-event sequential evaluation
alias_resolver.py — Ensures consistent entity identity across chapters.
- Maps aliases to canonical IDs for characters and locations
- Canonical promotion: when a better ID is discovered (e.g., `potter` → `harry_potter`)
- Conflict unification: merges two canonical IDs sharing an alias (first-seen-wins)
- Location containment hierarchy (`contains/2` ASP facts)
- Extraction normalization: `normalize_extraction()` normalizes all entity references
- ASP generation: `to_asp_facts()` produces `alias/2` and `contains/2`
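First-seen-wins unification can be sketched with a simple alias table. This is a hypothetical simplification of `AliasResolver`, not its actual data model:

```python
class AliasMapSketch:
    """Toy alias table: the first canonical ID seen for an alias wins,
    and later canonical IDs sharing that alias are merged into it."""
    def __init__(self):
        self.alias_to_canonical = {}
        self.merged = {}   # losing canonical -> winning canonical

    def register(self, alias: str, canonical: str) -> str:
        canonical = self.merged.get(canonical, canonical)
        existing = self.alias_to_canonical.get(alias)
        if existing is None:
            self.alias_to_canonical[alias] = canonical
            return canonical
        if existing != canonical:
            # Conflict: two canonical IDs share an alias -- first-seen wins.
            self.merged[canonical] = existing
        return existing

r = AliasMapSketch()
r.register("potter", "harry_potter")
winner = r.register("potter", "harry")   # conflict: "harry" merged into "harry_potter"
print(winner, r.merged)
```

Because the merge map is consulted on every lookup, all later references to the losing ID resolve deterministically to the first-seen canonical ID.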
entity_registry.py — Canonical entity storage with deduplication and lifecycle management.
| Lifecycle State | Definition |
|---|---|
| `ACTIVE` | Acted within the last 3 chapters |
| `LATENT` | Inactive for 3–9 chapters |
| `FROZEN` | Inactive for ≥10 chapters |
Only ACTIVE entities participate in ASP reasoning.
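The thresholds above reduce to a single classification function. The cutoffs come from the table; the exact boundary semantics are an assumption:

```python
def lifecycle_state(current_chapter: int, last_active_chapter: int) -> str:
    """Classify an entity by chapters of inactivity.
    Thresholds from the lifecycle table; boundary handling is assumed."""
    inactive = current_chapter - last_active_chapter
    if inactive < 3:
        return "ACTIVE"
    if inactive < 10:
        return "LATENT"
    return "FROZEN"

print(lifecycle_state(12, 11), lifecycle_state(12, 7), lifecycle_state(12, 1))
```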
item_tracker.py — Tracks item lifecycle states for narrative significance and Chekhov's Gun detection.
Item lifecycle states: INTRODUCED → CARRIED → USED / DISCARDED / DESTROYED / GIVEN / LATENT
Item relevance classification:
| Relevance | Description |
|---|---|
| `CAUSAL` | Participates in constraints — full violation checking |
| `LATENT` | Tracked for Chekhov's Gun — candidate for unused item violation |
| `BACKGROUND` | No violations generated — suppressed from reasoning |
rule_registry.py — Central registry for all ASP rule files with priority layers.
| Layer | Priority | Description |
|---|---|---|
| `UNIVERSAL` | 1 (lowest) | Default world assumptions, overridable |
| `STORY` | 2 (highest) | Story-specific rules and overrides |
Key methods: `load_legacy_rules()`, `get_active_rule_files()`, `get_combined_rules_content()`, `audit_summary()`.
active_universe.py — Computes the set of ASP-visible entities per chapter to reduce grounding time.
5-step algorithm:
- Current chapter entities (from extraction)
- Previous chapter entities (carry-over)
- 1-hop relationship expansion
- Carried items expansion
- Location presence
All ASP fact generation is filtered through this universe, preventing the knowledge base from growing unboundedly across chapters.
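The 5-step computation can be condensed into a set-based sketch. The data shapes (sets and dicts) are assumptions for illustration:

```python
def active_universe(current, previous, relationships, carried_by, locations_of):
    """Compute the ASP-visible entity set for a chapter (sketch).
    current/previous: sets of entity IDs; relationships: iterable of (a, b);
    carried_by: item -> carrier; locations_of: entity -> location."""
    universe = set(current) | set(previous)                      # steps 1-2
    for a, b in relationships:                                   # step 3: 1-hop expansion
        if a in universe or b in universe:
            universe |= {a, b}
    universe |= {i for i, c in carried_by.items() if c in universe}          # step 4
    universe |= {locations_of[e] for e in list(universe) if e in locations_of}  # step 5
    return universe

u = active_universe(
    current={"harry"}, previous={"ron"},
    relationships=[("harry", "hermione")],
    carried_by={"wand": "harry"},
    locations_of={"harry": "great_hall"},
)
print(sorted(u))
```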
relationship_projector.py — Re-projects persistent relationships into the current ASP context. Both endpoints must be in the active universe for a relationship to be projected.
rule_projector.py — Projects story-specific rules into ASP, gated by active universe membership. At least one referenced entity must be in the active universe for a rule to be projected.
movement_continuity_guard.py — Detects implicit movement between locations (when a character changes location without an explicit movement event). Generates synthetic implicit_leave events and derived_event/1, implicit_transition/5 ASP facts to reduce false time-travel violations.
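The guard's core idea can be sketched as follows: when consecutive events place a character in different locations with no explicit movement action in between, synthesize a transition fact. The fact arity below is illustrative, not the real `implicit_transition/5` schema:

```python
def implicit_transitions(events):
    """events: list of (agent, action, location, time), time-ordered.
    Emit a synthetic transition when an agent's location changes without
    an explicit movement action. Illustrative sketch only."""
    MOVEMENT = {"moves", "enters", "leaves", "travels"}  # assumed action set
    last_loc = {}
    synthetic = []
    for agent, action, loc, t in events:
        prev = last_loc.get(agent)
        if prev is not None and loc != prev and action not in MOVEMENT:
            synthetic.append(f"implicit_transition({agent},{prev},{loc},{t}).")
        last_loc[agent] = loc
    return synthetic

evs = [("harry", "talks", "common_room", 1), ("harry", "eats", "great_hall", 2)]
print(implicit_transitions(evs))
```

Feeding these synthetic facts to the rules lets the temporal layer treat the location change as an inferred move rather than a time-travel violation.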
conflict_resolver.py — Handles conflicts between story-specific and universal rules. Analyzes violations to determine if they represent story-vs-universal conflicts (ghost exceptions, teleportation, magic systems), and resolves them via override (full deactivation), exception (entity-specific), or retain.
final_analysis.py — Story-wide consistency analysis run after the final chapter.
Produces a FinalAnalysisResult containing:
- Rule audit: each rule's activation count, violation count, override status
- Loose ends: unresolved items (Chekhov's Gun), unused characters, dangling relationships
- Long-range inconsistencies: violations spanning multiple chapters
- Statistics: total violations, category distribution, entity counts
context_persistence.py — Persists global narrative context to disk as JSON. Supports atomic writes (temp file → rename), incremental updates, and cross-session state restoration.
continuity_context.py — Builds continuity context for injection into LLM extraction prompts, ensuring the LLM respects established facts (known characters, relationships, character states).
asp_diagnostics.py — Lightweight optional instrumentation for monitoring ASP universe size across chapters. Detects growth patterns and warns on exponential growth. Zero runtime overhead when disabled (toggle via ASP_DIAGNOSTICS=1 environment variable).
The `rules/` folder contains Answer Set Programming (ASP) rule files processed by Clingo. Rules detect logical inconsistencies by computing `violation/N` atoms from extracted facts.
core.lp (~1000 lines) — The foundation loaded by all other rule layers. Implements:
- Event Calculus: `holds/2`, `initially/1`, `initiates/2`, `terminates/2` with frame axioms
- Entity type hierarchy: `entity/1`, `character/1`, `location/1`, `item/1`
- Alias resolution: `canonical/2`, `alias/2`
- Time management: `time/1`, `before/2`, `after/2`, `next_time/2`
- Containment hierarchy: `contains_transitive/2`, `same_container/2`
- Event classification: `is_movement_event/1`, `is_death_event/1`, etc.
- Layered rule precedence: `rule_active/1`, `rule_overridden/1`, `suppressed/4`
- Posture/consciousness modeling: `posture/3`, `consciousness/3`
base.lp (~660 lines) — The original self-contained rule set (v1). Includes Allen's interval algebra, fluent reasoning, common-sense cause/effect axioms, phobia rules, and edibility defaults.
Multiple iterations of domain-independent general rules, each progressively more sophisticated:
| File | Format | Key Additions |
|---|---|---|
| `general.lp` | `violation/4` | Clean, self-contained; covers all 5 categories |
| `general_v2.lp` | `violation/5` | Human-readable @format descriptions, material types |
| `general_abstract.lp` | `violation/5` | Auto-generated entity/event names, complete domain hooks |
| `general_narrative.lp` | `violation/5` | Appearance analysis, behavioral rules, sudden emotional shifts |
enhanced_detection.lp — Multi-signal anomaly detection. Accumulates evidence scores across independent signals (negative emotions, mixed relationships, hostile/friendly action mismatches, unusual states) and fires when thresholds are met. Designed for modified-story detection.
simple_narrative.lp (~852 lines) — The primary production rule set (Phase 3). Designed for practical multi-chapter analysis without explicit time intervals. Integrates with the three-tier narrative state model. Includes extensive reflexive action detection, authority figure exceptions, and implicit transition detection.
Located in `rules/universal/`, these form the lowest priority layer (priority 1) and can be overridden by story-specific rules. All use `rule_active/1` guards and register with `rule(ID, universal, Description)` for audit.
| File | Focus | Key Violations |
|---|---|---|
| `temporal.lp` | Temporal ordering & state contradictions | `ordering_violation`, `causal_violation`, `state_contradiction`, `time_travel`, `knowledge_before_acquisition` |
| `location.lp` | Spatial reasoning with containment hierarchy | `explicit_ubiquity`, `impossible_travel`, `invalid_remote`, `simultaneous_presence` |
| `causality.lp` | Dead agents, broken causal chains | `dead_character_acting`, `interacting_with_dead`, `effect_without_cause` |
| `coherence.lp` | Self-referential impossibilities, contradictory states | `impossible_self_action`, `contradictory_state` |
| `emotional.lp` | Relationship-action mismatches (evidence-gated) | `relationship_action_mismatch`, `relationship_betrayal`, `trait_action_mismatch` |
| `appearance.lp` | Physical appearance consistency | `appearance_exclusive_conflict`, `appearance_sudden_change`, `appearance_persistent_lost` |
| `items.lp` | Item lifecycle & Chekhov's Gun | `chekhov_candidate`, `suppress_violation` for background items |
| `knowledge.lp` | Monotonic knowledge tracking | Infrastructure for temporal `knowledge_before_acquisition` |
| `narrative_state.lp` | Three-tier state separation | `dead_in_scene`, `absent_acting`, `presence_contradiction` |
The three-tier narrative state model (narrative_state.lp) is architecturally significant:
- `alive(C, T)` — existence (alive or dead)
- `present_in_scene(C, T)` — narrative scene presence (can be present without known location)
- `physically_at(C, L, T)` — spatial location (implies scene presence)
This separation means a character can talk in a scene without a known location (valid), and a dead character cannot enter scenes (violation).
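The two constraints that make this separation useful can be sketched in plain Python (a hypothetical re-implementation for illustration; the actual checks live in `narrative_state.lp`):

```python
def tier_violations(alive, present_in_scene, physically_at):
    """Check two key constraints of the three-tier model (sketch).
    alive: {(char, t): bool}; present_in_scene: set of (char, t);
    physically_at: set of (char, t). Data shapes are assumptions."""
    violations = []
    for (c, t) in physically_at:
        # Tier 3 implies tier 2: a spatially located character is in the scene.
        if (c, t) not in present_in_scene:
            violations.append(("presence_contradiction", c, t))
    for (c, t) in present_in_scene:
        # A dead character cannot be present in a scene.
        if alive.get((c, t)) is False:
            violations.append(("dead_in_scene", c, t))
    return violations

found = tier_violations(
    alive={("quirrell", 5): False},
    present_in_scene={("quirrell", 5)},
    physically_at=set(),
)
print(found)
```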
story_rules.lp (~922 lines) — Dynamic rules tracking how events modify relationships, traits, possession, and location. Contains 100+ action-relationship contradiction mappings (e.g., congratulates contradicts hates), action-trait contradiction pairs, and positive/negative relationship modifiers.
Includes story-specific content: Hogwarts location hierarchy, disjoint castle areas, canonical event exceptions (e.g., Hermione petrifying Neville does not flag as hostile).
| Category | Detection Focus | Key Rule Files |
|---|---|---|
| Causality | Dead agents acting, effect without cause, Chekhov's Gun, missing preconditions | causality.lp, items.lp, narrative_state.lp |
| Coherence | Trait conflicts, inedible consumption, self-referential impossibilities, appearance anomalies, contradictory states | coherence.lp, appearance.lp, narrative_state.lp |
| Temporal | Circular time, ordering violations, state contradictions, time travel, duration bounds | temporal.lp, story_rules.lp |
| Location | Ubiquity, impossible travel, unreachable locations, invalid remote interaction | location.lp, story_rules.lp |
| Emotional | Relationship-action mismatches, trait-action conflicts, betrayal, sudden emotional shifts | emotional.lp, story_rules.lp |
The `scripts/` package provides experiment orchestration, LLM extraction, logic evaluation, and analysis tools.
The extraction pipeline converts raw chapter text into structured JSON through multiple phases:
Phase 2 — Split Extraction (extraction/extractors.py):
Four independent LLM calls per chapter, each extracting a different aspect:
- Characters & Locations
- Items
- Relationships
- Events
Results are merged into a unified extraction. Includes truncation recovery (salvage).
Phase 3 — Entity Registry (extraction/entity_registry.py):
Canonical entity management with alias resolution and event validation.
Phase 4 — Relationship Normalization (extraction/relationship_normalizer.py):
Type validation, group expansion (e.g., "the trio" → individual characters), conflict detection.
Phase 5 — Event Normalization (extraction/event_normalizer.py):
Strict event validation (agents must exist), event_time generation, item usage classification (causal vs. latent).
Phase 6 — Lifecycle Tracking (extraction/lifecycle_tracker.py):
Cross-chapter entity lifecycle tracking, Chekhov's Gun detection for unused items, dormant character detection.
Supporting modules:
- `api_clients.py` — Factory for LLM backends (`LocalLLMClient`, `OpenAIAPIClient`, `GeminiAPIClient`)
- `prompts.py` — All LLM prompt templates (~1624 lines): extraction schemas, social action typing, implied presence
- `chapter_extractor.py` — Main extraction entry point (`structure_chapter_standalone()`)
- `json_utils.py` — Deterministic JSON repair for LLM output (markdown fences, single quotes, trailing commas)
- `extraction_diagnostics.py` — Detects under-extraction via regex evidence scanning
- `temporal_diagnostics.py` — Detects temporal language not captured by extraction
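Deterministic JSON repair of the kind `json_utils.py` performs can be sketched with two regex passes. This is a simplified illustration covering only markdown fences and trailing commas; the real repair set is larger:

```python
import json
import re

def repair_llm_json(text: str) -> dict:
    """Best-effort cleanup of common LLM JSON defects (sketch):
    markdown code fences and trailing commas."""
    # Strip leading ```json / trailing ``` fences.
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", text.strip())
    # Remove trailing commas before } or ].
    text = re.sub(r",\s*([}\]])", r"\1", text)
    return json.loads(text)

raw = '```json\n{"characters": ["Harry",],}\n```'
print(repair_llm_json(raw))
```

Keeping the repairs purely deterministic (no second LLM call) preserves the pipeline's reproducibility.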
logic/asp_converter.py — Converts structured JSON to ASP facts. Handles characters, locations, items, events, relationships, story rules, and implied presence. Supports active universe filtering.
logic/evaluator.py — Main LogicEvaluator orchestrator: LLM extraction → ASP facts → Clingo checking → result collection. Tracks cross-chapter state (dead characters, emotions, relationships, story rules).
run_narrative_experiment_refactored.py — Main CLI entry point with a two-step experiment design:
| Step | Description |
|---|---|
| Step 1 — LLM-Only | Reads chapters sequentially; LLM detects errors directly from text (no formal logic) |
| Step 2 — Logic-Based | Full pipeline: LLM extraction → normalization → ASP → Clingo → violations |
Step 2 supports multiple modes:
- `--engine`: Full Phase 5 pipeline with all engine modules
- `--structured`: `LogicEvaluator.evaluate_chapter_v2()` without interpretation LLM call
- Default: `LogicEvaluator.evaluate_chapter()` with LLM interpretation
- `--split-extraction`: Enables Phase 2 four-function extraction
- `--extraction-only`: Extract only, skip logic evaluation
experiment/runners.py — Implements run_step1_llm() and run_step2_logic() / run_step2_engine() / run_step2_debug().
K-Fold Experiments:
- `run_kfold_experiment.py` — Compares LLM vs. logic approaches with k-fold cross-validation
- `kfold_experiment_runner.py` — Runs logic-only k-fold on pre-extracted data
Other runners:
- `logic_only_test_runner.py` — Runs ASP/Clingo on pre-extracted data without LLMs; supports `--trace` for per-statement evaluation
evaluation/evidence_checklist.py — Defines per-category evidence requirements. Each error category has mandatory and alternative predicates that must be present in the ASP facts for an error to be detectable.
evaluation/detectability.py — Classifies errors as:
- `DETECTED` — Error was found by the system
- `DETECTABLE_BUT_MISSED` — Required predicates were present but no violation fired
- `UNDETECTABLE_BY_DESIGN` — Required predicates were not extracted
experiment/ground_truth.py — Loads ground truth from CSV files and computes precision, recall, and F1 metrics.
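Metric computation reduces to standard precision/recall/F1 over matched errors. A sketch (how detected errors are matched against ground-truth annotations is an assumption not shown here):

```python
def prf1(true_positives: int, false_positives: int, false_negatives: int):
    """Standard precision, recall, and F1 from match counts."""
    p_den = true_positives + false_positives
    r_den = true_positives + false_negatives
    precision = true_positives / p_den if p_den else 0.0
    recall = true_positives / r_den if r_den else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

p, r, f = prf1(true_positives=8, false_positives=2, false_negatives=4)
print(round(p, 3), round(r, 3), round(f, 3))
```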
experiment/summary.py — Generates comparison summaries from Step 1 + Step 2 results against ground truth.
Post-experiment analysis scripts:
- `analyze_data_from_experiment.py` — Compares implanted errors vs. extraction/evaluation results
- `analyze_llm_only_experiment_results.py` — Analyzes Step 1 LLM-only results with aggregate CSV reports
merge/json_parser.py — Advanced iterative JSON repair with position-based error fixing (missing commas, unterminated strings, unclosed brackets).
merge/json_repair.py — Simpler JSON repair with structure validation for chapter data.
state/config.py — Path constants, API keys, story list, error categories.
state/data_structures.py — Shared dataclasses: ChapterError, ChapterResult, StepResults.
state/logging.py — Console + file logging with timestamps and step separators.
```shell
# Step 1: LLM-only evaluation
python -m scripts.run_narrative_experiment_refactored --step 1 --api-mode openai

# Step 2: Logic-based evaluation with full engine
python -m scripts.run_narrative_experiment_refactored --step 2 --engine --api-mode openai

# Step 2 with split extraction
python -m scripts.run_narrative_experiment_refactored --step 2 --engine --api-mode openai --split-extraction

# Both steps with summary
python -m scripts.run_narrative_experiment_refactored --step both --engine --api-mode openai --summarize
```

| Option | Description |
|---|---|
| `--step {1,2,both}` | Which evaluation step to run |
| `--api-mode {local,gemini,openai,debug}` | LLM backend (default: `local`) |
| `--api-model MODEL` | Model name (e.g., `gpt-4o`, `gemini-2.0-flash`) |
| `--engine` | Use full Phase 5 engine pipeline |
| `--split-extraction` | Enable Phase 2 four-function extraction |
| `--extraction-only` | Extract only, skip logic evaluation |
| `--structured` | Use `evaluate_chapter_v2()` |
| `--max-chapters N` | Limit number of chapters processed |
| `--llm-timeout N` | LLM call timeout in seconds |
| `--summarize` | Generate experiment summary after completion |
```shell
# Run ASP/Clingo on pre-extracted data (no LLM required)
python -m scripts.logic_only_test_runner --story "Harry Potter" --chapter 1

# With per-statement trace
python -m scripts.logic_only_test_runner --story "Harry Potter" --chapter 1 --trace

# LLM vs Logic comparison (k-fold)
python -m scripts.run_kfold_experiment --api-mode openai
```

| Variable | Required For | Description |
|---|---|---|
| `OPENAI_API_KEY` | `--api-mode openai` | OpenAI API key |
| `GEMINI_API_KEY` | `--api-mode gemini` | Google Gemini API key |
| `ASP_DIAGNOSTICS` | Optional | Set to `1` to enable ASP universe size diagnostics |
When using `--api-mode local`, the system expects a local LLM server at `localhost:8080`.
The system has been tested on both original and error-injected ("modified") variants of:
- Harry Potter (and the Philosopher's Stone)
- The Hunger Games
- The Lord of the Rings
- Twilight
- Goosebumps
Story texts should be placed in the `original_books/` and `modified_books/` directories. Ground truth error annotations are loaded from `errors_checklist/` CSV files.