Cloze Reader: Comprehensive System Overview

Architecture Foundation

The Cloze Reader is a vanilla JavaScript application with a FastAPI backend that transforms classic literature into interactive reading comprehension exercises. At its core, it represents a fascinating intersection of educational assessment theory and modern AI - using masked language modeling (the same technique that trains modern LLMs) to create reading tests for humans.

The application runs entirely in the browser using ES6 modules, with no build process required. This architectural choice maintains transparency and allows users to inspect every component of the system.

Core Game Flow

1. Initialization Sequence (`app.js:35-45`)

When the application starts, it follows this precise sequence:

Welcome overlay displays first-time user instructions
Game engine initialization loads book data and AI services
First round generation creates the initial cloze exercise
UI activation reveals the game interface and controls

2. Level-Aware Progression System (`clozeGameEngine.js:30-57`)

The game implements a sophisticated difficulty progression:

Levels 1-5: 1 blank per passage, easier vocabulary (4-7 letters)
Levels 6-10: 2 blanks per passage, medium difficulty (4-10 letters)
Level 11+: 3 blanks per passage, challenging vocabulary (5-14 letters)

Level advancement requires passing a single round. The scoring system is intentionally strict:

1 blank: Must be correct (100% accuracy)
2 blanks: Must get both correct (100% accuracy)
3+ blanks: Must get all but one correct (allowing one mistake)

AI Integration and Word Selection

Unified Model Architecture (`aiService.js:8-18`)

The system uses a single-model approach with Google Gemma-3-27b for all AI operations:

Word selection and difficulty assessment
Contextual hint generation
Literary contextualization
Conversational chat responses

For local deployment, it automatically switches to Gemma-3-12b when ?local=true is used, connecting to port 1234 (compatible with LM Studio and similar tools).

Intelligent Word Selection Process (`aiService.js:199-480`)

The AI word selection follows a sophisticated multi-step process:

Level-based constraints define vocabulary difficulty ranges
Passage analysis identifies candidate words, avoiding:
- Capitalized words (proper nouns, sentence beginnings)
- First 10 words of any passage (context establishment)
- Concatenated artifacts like "fromthe", "tothe", "hewas"
Distribution algorithm ensures blanks are spread throughout the passage
Validation filtering confirms words exist in the passage and meet length requirements
Fallback mechanisms provide manual selection if AI fails

Content Quality Filtering (`clozeGameEngine.js:108-310`)

The passage extraction system includes sophisticated quality detection:

Statistical analysis of capitalization ratios, punctuation density, and sentence structure
Pattern recognition for academic material (citations, abbreviations, etymology brackets)
Dictionary detection using hash symbols, reference numbers, and technical terminology
Formatting analysis identifying tables, indexes, and title pages
Progressive scoring system rejects passages above threshold (score > 3)

Book Data Service and Content Streaming

Dual-Source Architecture (`bookDataService.js:1-563`)

The system intelligently manages content from two sources:

Primary: Hugging Face Datasets API streaming from manu/project_gutenberg
- Real-time access to 70,000+ books
- Lazy loading with on-demand text processing
- Quality validation after selection
Fallback: Local embedded classics (10 canonical works)
- Pride and Prejudice, Tom Sawyer, Great Expectations, etc.
- Pre-processed and guaranteed to work offline
- Activated when streaming fails or API unavailable

Smart Content Processing (`bookDataService.js:200-310`)

Books undergo sophisticated cleaning:

Project Gutenberg artifact removal: Start/end markers, metadata headers, scanning notes
Structural cleaning: Chapter headers, page numbers, formatting artifacts
Content identification: Locates actual narrative text vs. front matter
Quality validation: Ensures sufficient length, narrative structure, and readability

The system uses lazy processing - books are initially loaded with minimal validation, then fully processed only when selected for gameplay, optimizing performance.

UI Components and User Interactions

Responsive Interface Design (`index.html:51-69`)

The application features a sticky control panel architecture:

Primary controls: Submit, Next Passage, Show Hints buttons
Leaderboard access: Quick trophy button for score viewing
Mobile optimization: Fixed bottom positioning that stays above mobile keyboards
Accessibility: 48px minimum touch targets, backdrop blur effects

Interactive Cloze Interface

Each blank is rendered as an intelligent input field that:

Dynamic sizing: Width adjusts to expected word length (Math.max(50, word.length * 10)px)
Chat integration: 💬 button next to each blank for contextual help
Navigation flow: Enter key moves to next blank or submits when complete
Visual feedback: Real-time styling for correct/incorrect answers

Modal Chat System (`chatInterface.js:1-329`)

The chat interface provides contextual word-level assistance:

Question Categories:

Grammar: "What type of word is this?"
Meaning: "What does this word mean?"
Context: "Why does this word fit here?"
Clue: "Give me a clue"

Smart Features:

Persistent history: Conversations preserved per blank across the round
Question tracking: Used questions marked with ✓ and disabled
Typing indicators: Visual feedback during AI processing
Current input aware: AI sees user's partial answer for contextual help

Progressive Hint System

The application provides multiple hint layers:

Structural hints: Word length, first letter, last letter (level-dependent)
AI-generated hints: Contextual clues based on passage meaning
Interactive chat: Personalized assistance through conversation

Leaderboard and Scoring System

Arcade-Style Leaderboard (`leaderboardService.js:1-424`)

The leaderboard follows classic arcade game conventions:

Entry Requirements:

3-letter initials: A-Z only, automatically validated and sanitized
Top 10 tracking: Maximum entries maintained with automatic trimming
Fresh session: Data resets on each page load for fair competition

Scoring Hierarchy (Primary to Tertiary):

Highest level reached (most important)
Round number at that level (secondary ranking)
Total passages passed (tiebreaker)
Date achieved (final tiebreaker - newer wins)

Dual Persistence Strategy:

Primary: Hugging Face Hub backend for global persistence
Fallback: localStorage for offline functionality
Automatic sync: HF data downloads to localStorage on startup

Comprehensive Statistics Tracking

The system maintains detailed player analytics:

Performance Metrics:

Accuracy tracking: Correct/total words, success rates
Progression data: Highest level, rounds completed
Streak monitoring: Current and longest consecutive successes
Vocabulary analysis: Unique words correctly identified (stored as Set)

Session Management:

Reset on page load: Fresh competition each session
Real-time updates: Stats update after every passage attempt
Milestone notifications: Special alerts every 5 levels

Initials Entry System (`leaderboardUI.js:182-384`)

The initials modal provides dual input methods:

Modern Text Input:

Direct typing: Standard text field for keyboard users
Auto-sync: Updates arcade controls in real-time
Validation: Live 3-character limit with uppercase conversion

Arcade Controls:

Arrow navigation: Up/down buttons for each letter slot
Keyboard controls: Arrow keys, Enter, Tab navigation
Visual feedback: Active slot highlighting, smooth transitions

Both input methods remain perfectly synchronized throughout the interaction.

State Management and Persistence

Client-Side Storage Architecture

The application uses a pure localStorage strategy with no backend database requirements:

Storage Keys (`leaderboardService.js:11-15`):

cloze-reader-leaderboard: Top 10 high scores with ranking data
cloze-reader-player: Player profile with initials and session info
cloze-reader-stats: Comprehensive performance analytics

Data Serialization Strategy:

Set handling: JavaScript Sets converted to Arrays for JSON storage
Type validation: Runtime checks for data integrity
Fallback creation: Automatic empty object generation on corruption

Game State Flow

The application maintains state across multiple layers:

Round-Level State (`clozeGameEngine.js:10-26`):

this.currentBook = null;        // Active book metadata
this.originalText = '';         // Clean passage text  
this.clozeText = '';           // Text with blank placeholders
this.blanks = [];              // Word positions and answers
this.userAnswers = [];         // Current user input
this.currentLevel = 1;         // Difficulty progression
this.currentRound = 1;         // Round counter
this.contextualization = '';   // AI-generated context

Conversation State (`chatInterface.js:8`):

this.messageHistory = new Map(); // blankId -> message arrays
// Preserves chat history per blank throughout round

Performance State:

Live statistics: Updated after every passage attempt
Streak tracking: Current and historical success runs
Vocabulary learning: Cumulative words correctly identified

Session Lifecycle Management

The application follows a clear state progression:

Page Load: Fresh leaderboard, reset statistics
Game Start: Initialize book service, load first passage
Round Progression: Maintain state, clear chat history between rounds
High Score: Trigger initials entry, update leaderboard
Session End: Data persists until next page load

Error Handling and Fallback Mechanisms

Multi-Layer Fallback Strategy

The application implements comprehensive error recovery at every level:

AI Service Fallbacks (`aiService.js:48-63`):

Retry with exponential backoff: Up to 3 attempts with increasing delays
Response extraction hierarchy:
- Primary: message.content
- Secondary: reasoning field
- Tertiary: reasoning_details array
- Final: Regex pattern matching for partial extraction
Manual word selection: If AI fails, statistical content word selection
Generic hint generation: Fallback responses based on question type

Content Service Resilience (`bookDataService.js:91-115`):

HF API availability check: Test connection before streaming attempts
Preloaded content: Cache books for immediate access
Local book fallback: 10 embedded classics guarantee functionality
Lazy processing: Defer expensive operations until needed
Quality validation: Multiple attempts with different passage selections

Network and API Error Recovery:

// Example from aiService.js:507-520
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 15000);
// ... fetch with timeout handling
if (error.name === 'AbortError') {
  throw new Error('Request timed out - falling back to sequential processing');
}

Progressive Degradation Model

The system gracefully reduces functionality rather than failing completely:

API Key Scenarios:

No API key: Uses local LLM mode on port 1234
API failure: Manual word selection with structural hints
Partial response: Extracts usable data from incomplete JSON

Content Scenarios:

Streaming failure: Fall back to local embedded books
Book processing error: Skip to next available book
Quality filter rejection: Retry with different passage selection

UI Error Handling (`app.js:41-44`):

catch (error) {
  console.error('Failed to initialize app:', error);
  this.showError('Failed to load the game. Please refresh and try again.');
}

Logging and Debugging Infrastructure

The application provides comprehensive diagnostic information:

Quality score breakdown: Detailed passage rejection reasons
AI response parsing: Full response logging for debugging
Performance timing: Book processing duration tracking
State transitions: Level advancement and round progression logging

Development and Deployment Architecture

Environment Configuration

The system supports multiple deployment modes:

Local Development (`CLAUDE.md`):

make dev or python -m http.server 8000: Simple HTTP server
make dev-python or python app.py: FastAPI development server

Production Deployment:

Docker support: Full containerization with docker-compose
Environment injection: FastAPI securely injects API keys via meta tags
Static file serving: No build process required for vanilla JavaScript

API Key Management

The application handles API keys through multiple channels:

Environment variables: OPENROUTER_API_KEY for server-side injection
Browser globals: window.OPENROUTER_API_KEY from meta tags
Runtime setting: window.setOpenRouterKey() for browser console updates
Local mode: ?local=true bypasses API key requirements

Conceptual Framework: The Meta-Commentary

This application represents a convergence of two parallel histories:

Educational Assessment (1953): Wilson L. Taylor's Cloze Procedure

Systematic word deletion to measure reading comprehension
Context-dependent gap filling requiring syntactic and semantic integration
Efficiency through multiple-choice elimination and objective scoring

AI Training (2018): BERT's Masked Language Modeling

Random token masking to train contextual understanding
Prediction accuracy as a measure of language model performance
Scaled training on internet-scale text corpora

The Recursive Loop

The Cloze Reader creates a fascinating recursive relationship:

AI models trained on cloze tasks now generate cloze tests for humans
Assessment methodology becomes training data for future model iterations
Human performance data could theoretically improve AI cloze generation

This system stages the tension between:

Standardized assessment vs serendipitous discovery
Human-curated difficulty vs algorithmically-determined challenge
Transparent educational goals vs black-boxed AI decision making
Local control vs cloud dependency

By using open-weight models (Gemma), streaming from public archives (Project Gutenberg), and maintaining full client-side operation, the system preserves interrogability and agency while exploring the convergence of human and machine language understanding.

This comprehensive overview demonstrates how the Cloze Reader transforms classic literature into an interactive learning experience while exploring fundamental questions about assessment, AI, and reading comprehension in the digital age.

FilesExpand file tree

SYSTEM_OVERVIEW.md

Latest commit

History