Skip to content

Latest commit

 

History

History
338 lines (261 loc) · 15.3 KB

File metadata and controls

338 lines (261 loc) · 15.3 KB

Cloze Reader: Comprehensive System Overview

Architecture Foundation

The Cloze Reader is a vanilla JavaScript application with a FastAPI backend that transforms classic literature into interactive reading comprehension exercises. At its core, it represents a fascinating intersection of educational assessment theory and modern AI - using masked language modeling (the same technique that trains modern LLMs) to create reading tests for humans.

The application runs entirely in the browser using ES6 modules, with no build process required. This architectural choice maintains transparency and allows users to inspect every component of the system.

Core Game Flow

1. Initialization Sequence (app.js:35-45)

When the application starts, it follows this precise sequence:

  1. Welcome overlay displays first-time user instructions
  2. Game engine initialization loads book data and AI services
  3. First round generation creates the initial cloze exercise
  4. UI activation reveals the game interface and controls

2. Level-Aware Progression System (clozeGameEngine.js:30-57)

The game implements a sophisticated difficulty progression:

  • Levels 1-5: 1 blank per passage, easier vocabulary (4-7 letters)
  • Levels 6-10: 2 blanks per passage, medium difficulty (4-10 letters)
  • Level 11+: 3 blanks per passage, challenging vocabulary (5-14 letters)

Level advancement requires passing a single round. The scoring system is intentionally strict:

  • 1 blank: Must be correct (100% accuracy)
  • 2 blanks: Must get both correct (100% accuracy)
  • 3+ blanks: Must get all but one correct (allowing one mistake)

AI Integration and Word Selection

Unified Model Architecture (aiService.js:8-18)

The system uses a single-model approach with Google Gemma-3-27b for all AI operations:

  • Word selection and difficulty assessment
  • Contextual hint generation
  • Literary contextualization
  • Conversational chat responses

For local deployment, it automatically switches to Gemma-3-12b when ?local=true is used, connecting to port 1234 (compatible with LM Studio and similar tools).

Intelligent Word Selection Process (aiService.js:199-480)

The AI word selection follows a sophisticated multi-step process:

  1. Level-based constraints define vocabulary difficulty ranges
  2. Passage analysis identifies candidate words, avoiding:
    • Capitalized words (proper nouns, sentence beginnings)
    • First 10 words of any passage (context establishment)
    • Concatenated artifacts like "fromthe", "tothe", "hewas"
  3. Distribution algorithm ensures blanks are spread throughout the passage
  4. Validation filtering confirms words exist in the passage and meet length requirements
  5. Fallback mechanisms provide manual selection if AI fails

Content Quality Filtering (clozeGameEngine.js:108-310)

The passage extraction system includes sophisticated quality detection:

  • Statistical analysis of capitalization ratios, punctuation density, and sentence structure
  • Pattern recognition for academic material (citations, abbreviations, etymology brackets)
  • Dictionary detection using hash symbols, reference numbers, and technical terminology
  • Formatting analysis identifying tables, indexes, and title pages
  • Progressive scoring system rejects passages above threshold (score > 3)

Book Data Service and Content Streaming

Dual-Source Architecture (bookDataService.js:1-563)

The system intelligently manages content from two sources:

  1. Primary: Hugging Face Datasets API streaming from manu/project_gutenberg

    • Real-time access to 70,000+ books
    • Lazy loading with on-demand text processing
    • Quality validation after selection
  2. Fallback: Local embedded classics (10 canonical works)

    • Pride and Prejudice, Tom Sawyer, Great Expectations, etc.
    • Pre-processed and guaranteed to work offline
    • Activated when streaming fails or API unavailable

Smart Content Processing (bookDataService.js:200-310)

Books undergo sophisticated cleaning:

  1. Project Gutenberg artifact removal: Start/end markers, metadata headers, scanning notes
  2. Structural cleaning: Chapter headers, page numbers, formatting artifacts
  3. Content identification: Locates actual narrative text vs. front matter
  4. Quality validation: Ensures sufficient length, narrative structure, and readability

The system uses lazy processing - books are initially loaded with minimal validation, then fully processed only when selected for gameplay, optimizing performance.

UI Components and User Interactions

Responsive Interface Design (index.html:51-69)

The application features a sticky control panel architecture:

  • Primary controls: Submit, Next Passage, Show Hints buttons
  • Leaderboard access: Quick trophy button for score viewing
  • Mobile optimization: Fixed bottom positioning that stays above mobile keyboards
  • Accessibility: 48px minimum touch targets, backdrop blur effects

Interactive Cloze Interface

Each blank is rendered as an intelligent input field that:

  1. Dynamic sizing: Width adjusts to expected word length (Math.max(50, word.length * 10)px)
  2. Chat integration: 💬 button next to each blank for contextual help
  3. Navigation flow: Enter key moves to next blank or submits when complete
  4. Visual feedback: Real-time styling for correct/incorrect answers

Modal Chat System (chatInterface.js:1-329)

The chat interface provides contextual word-level assistance:

Question Categories:

  • Grammar: "What type of word is this?"
  • Meaning: "What does this word mean?"
  • Context: "Why does this word fit here?"
  • Clue: "Give me a clue"

Smart Features:

  • Persistent history: Conversations preserved per blank across the round
  • Question tracking: Used questions marked with ✓ and disabled
  • Typing indicators: Visual feedback during AI processing
  • Current input aware: AI sees user's partial answer for contextual help

Progressive Hint System

The application provides multiple hint layers:

  1. Structural hints: Word length, first letter, last letter (level-dependent)
  2. AI-generated hints: Contextual clues based on passage meaning
  3. Interactive chat: Personalized assistance through conversation

Leaderboard and Scoring System

Arcade-Style Leaderboard (leaderboardService.js:1-424)

The leaderboard follows classic arcade game conventions:

Entry Requirements:

  • 3-letter initials: A-Z only, automatically validated and sanitized
  • Top 10 tracking: Maximum entries maintained with automatic trimming
  • Fresh session: Data resets on each page load for fair competition

Scoring Hierarchy (Primary to Tertiary):

  1. Highest level reached (most important)
  2. Round number at that level (secondary ranking)
  3. Total passages passed (tiebreaker)
  4. Date achieved (final tiebreaker - newer wins)

Dual Persistence Strategy:

  • Primary: Hugging Face Hub backend for global persistence
  • Fallback: localStorage for offline functionality
  • Automatic sync: HF data downloads to localStorage on startup

Comprehensive Statistics Tracking

The system maintains detailed player analytics:

Performance Metrics:

  • Accuracy tracking: Correct/total words, success rates
  • Progression data: Highest level, rounds completed
  • Streak monitoring: Current and longest consecutive successes
  • Vocabulary analysis: Unique words correctly identified (stored as Set)

Session Management:

  • Reset on page load: Fresh competition each session
  • Real-time updates: Stats update after every passage attempt
  • Milestone notifications: Special alerts every 5 levels

Initials Entry System (leaderboardUI.js:182-384)

The initials modal provides dual input methods:

Modern Text Input:

  • Direct typing: Standard text field for keyboard users
  • Auto-sync: Updates arcade controls in real-time
  • Validation: Live 3-character limit with uppercase conversion

Arcade Controls:

  • Arrow navigation: Up/down buttons for each letter slot
  • Keyboard controls: Arrow keys, Enter, Tab navigation
  • Visual feedback: Active slot highlighting, smooth transitions

Both input methods remain perfectly synchronized throughout the interaction.

State Management and Persistence

Client-Side Storage Architecture

The application uses a pure localStorage strategy with no backend database requirements:

Storage Keys (leaderboardService.js:11-15):

  • cloze-reader-leaderboard: Top 10 high scores with ranking data
  • cloze-reader-player: Player profile with initials and session info
  • cloze-reader-stats: Comprehensive performance analytics

Data Serialization Strategy:

  • Set handling: JavaScript Sets converted to Arrays for JSON storage
  • Type validation: Runtime checks for data integrity
  • Fallback creation: Automatic empty object generation on corruption

Game State Flow

The application maintains state across multiple layers:

Round-Level State (clozeGameEngine.js:10-26):

this.currentBook = null;        // Active book metadata
this.originalText = '';         // Clean passage text  
this.clozeText = '';           // Text with blank placeholders
this.blanks = [];              // Word positions and answers
this.userAnswers = [];         // Current user input
this.currentLevel = 1;         // Difficulty progression
this.currentRound = 1;         // Round counter
this.contextualization = '';   // AI-generated context

Conversation State (chatInterface.js:8):

this.messageHistory = new Map(); // blankId -> message arrays
// Preserves chat history per blank throughout round

Performance State:

  • Live statistics: Updated after every passage attempt
  • Streak tracking: Current and historical success runs
  • Vocabulary learning: Cumulative words correctly identified

Session Lifecycle Management

The application follows a clear state progression:

  1. Page Load: Fresh leaderboard, reset statistics
  2. Game Start: Initialize book service, load first passage
  3. Round Progression: Maintain state, clear chat history between rounds
  4. High Score: Trigger initials entry, update leaderboard
  5. Session End: Data persists until next page load

Error Handling and Fallback Mechanisms

Multi-Layer Fallback Strategy

The application implements comprehensive error recovery at every level:

AI Service Fallbacks (aiService.js:48-63):

  1. Retry with exponential backoff: Up to 3 attempts with increasing delays
  2. Response extraction hierarchy:
    • Primary: message.content
    • Secondary: reasoning field
    • Tertiary: reasoning_details array
    • Final: Regex pattern matching for partial extraction
  3. Manual word selection: If AI fails, statistical content word selection
  4. Generic hint generation: Fallback responses based on question type

Content Service Resilience (bookDataService.js:91-115):

  1. HF API availability check: Test connection before streaming attempts
  2. Preloaded content: Cache books for immediate access
  3. Local book fallback: 10 embedded classics guarantee functionality
  4. Lazy processing: Defer expensive operations until needed
  5. Quality validation: Multiple attempts with different passage selections

Network and API Error Recovery:

// Example from aiService.js:507-520
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 15000);
// ... fetch with timeout handling
if (error.name === 'AbortError') {
  throw new Error('Request timed out - falling back to sequential processing');
}

Progressive Degradation Model

The system gracefully reduces functionality rather than failing completely:

API Key Scenarios:

  • No API key: Uses local LLM mode on port 1234
  • API failure: Manual word selection with structural hints
  • Partial response: Extracts usable data from incomplete JSON

Content Scenarios:

  • Streaming failure: Fall back to local embedded books
  • Book processing error: Skip to next available book
  • Quality filter rejection: Retry with different passage selection

UI Error Handling (app.js:41-44):

catch (error) {
  console.error('Failed to initialize app:', error);
  this.showError('Failed to load the game. Please refresh and try again.');
}

Logging and Debugging Infrastructure

The application provides comprehensive diagnostic information:

  • Quality score breakdown: Detailed passage rejection reasons
  • AI response parsing: Full response logging for debugging
  • Performance timing: Book processing duration tracking
  • State transitions: Level advancement and round progression logging

Development and Deployment Architecture

Environment Configuration

The system supports multiple deployment modes:

Local Development (CLAUDE.md):

  • make dev or python -m http.server 8000: Simple HTTP server
  • make dev-python or python app.py: FastAPI development server

Production Deployment:

  • Docker support: Full containerization with docker-compose
  • Environment injection: FastAPI securely injects API keys via meta tags
  • Static file serving: No build process required for vanilla JavaScript

API Key Management

The application handles API keys through multiple channels:

  1. Environment variables: OPENROUTER_API_KEY for server-side injection
  2. Browser globals: window.OPENROUTER_API_KEY from meta tags
  3. Runtime setting: window.setOpenRouterKey() for browser console updates
  4. Local mode: ?local=true bypasses API key requirements

Conceptual Framework: The Meta-Commentary

This application represents a convergence of two parallel histories:

Educational Assessment (1953): Wilson L. Taylor's Cloze Procedure

  • Systematic word deletion to measure reading comprehension
  • Context-dependent gap filling requiring syntactic and semantic integration
  • Efficiency through multiple-choice elimination and objective scoring

AI Training (2018): BERT's Masked Language Modeling

  • Random token masking to train contextual understanding
  • Prediction accuracy as a measure of language model performance
  • Scaled training on internet-scale text corpora

The Recursive Loop

The Cloze Reader creates a fascinating recursive relationship:

  • AI models trained on cloze tasks now generate cloze tests for humans
  • Assessment methodology becomes training data for future model iterations
  • Human performance data could theoretically improve AI cloze generation

This system stages the tension between:

  • Standardized assessment vs serendipitous discovery
  • Human-curated difficulty vs algorithmically-determined challenge
  • Transparent educational goals vs black-boxed AI decision making
  • Local control vs cloud dependency

By using open-weight models (Gemma), streaming from public archives (Project Gutenberg), and maintaining full client-side operation, the system preserves interrogability and agency while exploring the convergence of human and machine language understanding.


This comprehensive overview demonstrates how the Cloze Reader transforms classic literature into an interactive learning experience while exploring fundamental questions about assessment, AI, and reading comprehension in the digital age.