The Cloze Reader is a vanilla JavaScript application with a FastAPI backend that transforms classic literature into interactive reading comprehension exercises. It sits at the intersection of educational assessment theory and modern AI: it applies the fill-in-the-blank idea behind masked language modeling (the objective used to train BERT-style language models) to create reading tests for humans.
The game logic runs entirely in the browser as ES6 modules, with no build process required. This architectural choice maintains transparency and allows users to inspect every component of the system.
When the application starts, it follows this precise sequence:
- Welcome overlay displays first-time user instructions
- Game engine initialization loads book data and AI services
- First round generation creates the initial cloze exercise
- UI activation reveals the game interface and controls
The game implements a sophisticated difficulty progression:
- Levels 1-5: 1 blank per passage, easier vocabulary (4-7 letters)
- Levels 6-10: 2 blanks per passage, medium difficulty (4-10 letters)
- Level 11+: 3 blanks per passage, challenging vocabulary (5-14 letters)
Level advancement requires passing a single round. The scoring system is intentionally strict:
- 1 blank: Must be correct (100% accuracy)
- 2 blanks: Must get both correct (100% accuracy)
- 3+ blanks: Must get all but one correct (allowing one mistake)
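The level tiers and pass rule above can be sketched as two small functions. The function and property names are hypothetical; only the thresholds come from the lists above.

```javascript
// Hypothetical helper: maps a level to the blank count and word-length
// range described in the difficulty tiers.
function getLevelConfig(level) {
  if (level <= 5) return { blanks: 1, minLength: 4, maxLength: 7 };
  if (level <= 10) return { blanks: 2, minLength: 4, maxLength: 10 };
  return { blanks: 3, minLength: 5, maxLength: 14 };
}

// A round passes when every blank is correct, except that rounds with
// three or more blanks tolerate a single mistake.
function passesRound(correctCount, totalBlanks) {
  const required = totalBlanks >= 3 ? totalBlanks - 1 : totalBlanks;
  return correctCount >= required;
}
```

For example, `passesRound(2, 3)` passes while `passesRound(1, 2)` does not.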
The system uses a single-model approach with Google Gemma-3-27b for all AI operations:
- Word selection and difficulty assessment
- Contextual hint generation
- Literary contextualization
- Conversational chat responses
For local deployment, adding `?local=true` to the URL switches the application to Gemma-3-12b and connects to port 1234 (compatible with LM Studio and similar tools).
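A minimal sketch of that switch, assuming the flag is read from the page's query string. The endpoint URLs and exact model identifiers below are assumptions for illustration; only the model families, the `?local=true` flag, and port 1234 come from the description above.

```javascript
// Hypothetical helper: picks the model and endpoint from a query string.
// URLs and model ids are assumed values, not taken from the project.
function resolveAiConfig(search) {
  const params = new URLSearchParams(search);
  const isLocal = params.get('local') === 'true';
  return isLocal
    ? {
        model: 'gemma-3-12b',
        url: 'http://localhost:1234/v1/chat/completions',
        needsKey: false,
      }
    : {
        model: 'google/gemma-3-27b-it',
        url: 'https://openrouter.ai/api/v1/chat/completions',
        needsKey: true,
      };
}
```

In the browser this would be called as `resolveAiConfig(window.location.search)`.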
The AI word selection follows a sophisticated multi-step process:
- Level-based constraints define vocabulary difficulty ranges
- Passage analysis identifies candidate words, avoiding:
  - Capitalized words (proper nouns, sentence beginnings)
  - First 10 words of any passage (context establishment)
  - Concatenated artifacts like "fromthe", "tothe", "hewas"
- Distribution algorithm ensures blanks are spread throughout the passage
- Validation filtering confirms words exist in the passage and meet length requirements
- Fallback mechanisms provide manual selection if AI fails
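The filtering and distribution steps above might look roughly like the following. This is a sketch under stated assumptions: the function names are hypothetical, the artifact list contains only the three examples given, and even spacing is one possible way to spread blanks, not necessarily the project's algorithm.

```javascript
// Known concatenation artifacts (only the examples named in the text).
const ARTIFACTS = new Set(['fromthe', 'tothe', 'hewas']);

// Returns { index, word } pairs for words eligible to become blanks.
function findCandidates(passage, { minLength, maxLength }) {
  const words = passage.split(/\s+/).filter(Boolean);
  const candidates = [];
  words.forEach((raw, index) => {
    if (index < 10) return; // first 10 words establish context
    // Trim leading/trailing punctuation, then require all-lowercase letters,
    // which also excludes capitalized words.
    const word = raw.replace(/^[^A-Za-z]+|[^A-Za-z]+$/g, '');
    if (!/^[a-z]+$/.test(word)) return;
    if (ARTIFACTS.has(word)) return;
    if (word.length < minLength || word.length > maxLength) return;
    candidates.push({ index, word });
  });
  return candidates;
}

// Spreads `count` blanks through the passage by taking evenly spaced candidates.
function distribute(candidates, count) {
  if (candidates.length <= count) return candidates.slice();
  const step = candidates.length / count;
  return Array.from(
    { length: count },
    (_, i) => candidates[Math.floor(i * step + step / 2)]
  );
}
```

Because every candidate is taken from the passage itself, the "word exists in the passage" validation holds by construction in this sketch; an AI-proposed word list would still need that check.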
The passage extraction system includes sophisticated quality detection:
- Statistical analysis of capitalization ratios, punctuation density, and sentence structure
- Pattern recognition for academic material (citations, abbreviations, etymology brackets)
- Dictionary detection using hash symbols, reference numbers, and technical terminology
- Formatting analysis identifying tables, indexes, and title pages
- Progressive scoring system rejects passages above threshold (score > 3)
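A progressive score of this kind can be sketched as a sum of penalties. The individual signals, weights, and cutoffs below are hypothetical; only the rule that passages scoring above 3 are rejected comes from the list above.

```javascript
// Hypothetical scoring: each suspicious signal adds a penalty and a reason.
function scorePassage(text) {
  const words = text.split(/\s+/).filter(Boolean);
  if (words.length === 0) return { score: Infinity, reasons: ['empty passage'] };
  const reasons = [];
  let score = 0;

  const capRatio = words.filter((w) => /^[A-Z]/.test(w)).length / words.length;
  if (capRatio > 0.3) { score += 2; reasons.push('high capitalization ratio'); }

  const punctuation = (text.match(/[.,;:()\[\]]/g) || []).length;
  if (punctuation / words.length > 0.5) { score += 1; reasons.push('dense punctuation'); }

  const digitsAndHashes = (text.match(/[#\d]/g) || []).length;
  if (digitsAndHashes / words.length > 0.2) { score += 2; reasons.push('reference numbers or hash symbols'); }

  if (/\b(ibid|op\. cit|et al)\b/i.test(text)) { score += 2; reasons.push('citation markers'); }

  return { score, reasons };
}

// Passages are rejected when the accumulated score exceeds the threshold.
function isAcceptable(text, threshold = 3) {
  return scorePassage(text).score <= threshold;
}
```

Returning the reasons alongside the score makes it straightforward to log why a passage was rejected.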
The system intelligently manages content from two sources:
- Primary: Hugging Face Datasets API streaming from `manu/project_gutenberg`
  - Real-time access to 70,000+ books
  - Lazy loading with on-demand text processing
  - Quality validation after selection
- Fallback: Local embedded classics (10 canonical works)
  - Pride and Prejudice, Tom Sawyer, Great Expectations, etc.
  - Pre-processed and guaranteed to work offline
  - Activated when streaming fails or API unavailable
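The primary/fallback relationship can be illustrated with a small loader. In this sketch `streamBook` stands in for the Hugging Face streaming call and `localBooks` for the embedded classics; both are hypothetical parameters, as is the `source` field.

```javascript
// Tries the streaming source first and falls back to a random local book
// when streaming throws or returns nothing usable.
async function loadBook(streamBook, localBooks, random = Math.random) {
  try {
    const book = await streamBook();
    if (book && book.text) return { ...book, source: 'stream' };
  } catch (error) {
    console.warn('Streaming failed, using local books:', error.message);
  }
  const index = Math.floor(random() * localBooks.length);
  return { ...localBooks[index], source: 'local' };
}
```

Injecting `random` keeps the fallback selection deterministic in tests.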
Books undergo sophisticated cleaning:
- Project Gutenberg artifact removal: Start/end markers, metadata headers, scanning notes
- Structural cleaning: Chapter headers, page numbers, formatting artifacts
- Content identification: Locates actual narrative text vs. front matter
- Quality validation: Ensures sufficient length, narrative structure, and readability
The system uses lazy processing - books are initially loaded with minimal validation, then fully processed only when selected for gameplay, optimizing performance.
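As an illustration of the artifact-removal and structural-cleaning steps, the sketch below trims everything outside the standard Project Gutenberg start/end markers and drops lines that look like chapter headers or bare page numbers. The function name and the header/page-number patterns are assumptions, not the project's actual rules.

```javascript
// Hypothetical cleaner: keeps only the text between the Gutenberg markers,
// then removes chapter-header lines and lines containing only a number.
function stripGutenbergBoilerplate(raw) {
  const startMatch = raw.match(/\*\*\*\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK[^\n]*\n/i);
  const endMatch = raw.match(/\*\*\*\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK/i);
  const start = startMatch ? startMatch.index + startMatch[0].length : 0;
  const end = endMatch ? endMatch.index : raw.length;
  return raw
    .slice(start, end)
    .split('\n')
    .filter((line) => !/^\s*(CHAPTER\s+[IVXLC\d]+\.?|\d+)\s*$/i.test(line))
    .join('\n')
    .trim();
}
```

A real cleaner would need more patterns (metadata headers, scanning notes), but the marker-based slice is the core of the approach.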
The application features a sticky control panel architecture:
- Primary controls: Submit, Next Passage, Show Hints buttons
- Leaderboard access: Quick trophy button for score viewing
- Mobile optimization: Fixed bottom positioning that stays above mobile keyboards
- Accessibility: 48px minimum touch targets, backdrop blur effects
Each blank is rendered as an input field with:
- Dynamic sizing: Width adjusts to expected word length (`Math.max(50, word.length * 10)px`)
- Chat integration: 💬 button next to each blank for contextual help
- Navigation flow: Enter key moves to next blank or submits when complete
- Visual feedback: Real-time styling for correct/incorrect answers
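The sizing formula and Enter-key flow can be expressed without any DOM code. The width formula is the one quoted above; `onEnter` is a hypothetical helper, and its rule (focus the next empty blank, submit when none remain) is one reading of "moves to next blank or submits when complete".

```javascript
// Width formula from the list above: 10px per letter with a 50px floor.
function blankWidth(word) {
  return `${Math.max(50, word.length * 10)}px`;
}

// Decides what Enter should do from the current answers: move focus to the
// next empty blank (wrapping around), or submit when every blank is filled.
function onEnter(answers, currentIndex) {
  const total = answers.length;
  for (let step = 1; step <= total; step++) {
    const index = (currentIndex + step) % total;
    if (answers[index].trim() === '') return { action: 'focus', index };
  }
  return { action: 'submit' };
}
```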
The chat interface provides contextual word-level assistance through four preset question types, followed by supporting features:
- Grammar: "What type of word is this?"
- Meaning: "What does this word mean?"
- Context: "Why does this word fit here?"
- Clue: "Give me a clue"
- Persistent history: Conversations preserved per blank across the round
- Question tracking: Used questions marked with ✓ and disabled
- Typing indicators: Visual feedback during AI processing
- Current input aware: AI sees user's partial answer for contextual help
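A minimal sketch of per-blank history and question tracking follows. The class name, method names, and message shape are hypothetical, and `answerFn` is a synchronous stand-in for the real AI call; the four question texts come from the list above.

```javascript
const QUESTIONS = {
  grammar: 'What type of word is this?',
  meaning: 'What does this word mean?',
  context: 'Why does this word fit here?',
  clue: 'Give me a clue',
};

class BlankChat {
  constructor() {
    this.messageHistory = new Map(); // blankId -> array of { role, text }
    this.usedQuestions = new Map(); // blankId -> Set of question keys
  }

  // Returns the reply, or null when the question was already used for this blank.
  ask(blankId, questionKey, currentInput, answerFn) {
    const used = this.usedQuestions.get(blankId) ?? new Set();
    if (used.has(questionKey)) return null;
    used.add(questionKey);
    this.usedQuestions.set(blankId, used);

    const history = this.messageHistory.get(blankId) ?? [];
    history.push({ role: 'user', text: QUESTIONS[questionKey] });
    // The partial answer is passed along so the reply can take it into account.
    const reply = answerFn(QUESTIONS[questionKey], currentInput, history);
    history.push({ role: 'assistant', text: reply });
    this.messageHistory.set(blankId, history);
    return reply;
  }

  // Called between rounds so each round starts with empty conversations.
  clear() {
    this.messageHistory.clear();
    this.usedQuestions.clear();
  }
}
```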
The application provides multiple hint layers:
- Structural hints: Word length, first letter, last letter (level-dependent)
- AI-generated hints: Contextual clues based on passage meaning
- Interactive chat: Personalized assistance through conversation
The leaderboard follows classic arcade game conventions:
- 3-letter initials: A-Z only, automatically validated and sanitized
- Top 10 tracking: Maximum entries maintained with automatic trimming
- Fresh session: Data resets on each page load for fair competition
Entries are ranked by the following criteria, in order:
- Highest level reached (most important)
- Round number at that level (secondary ranking)
- Total passages passed (tiebreaker)
- Date achieved (final tiebreaker; newer wins)
Scores are stored in two places:
- Primary: Hugging Face Hub backend for global persistence
- Fallback: localStorage for offline functionality
- Automatic sync: HF data downloads to localStorage on startup
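The initials rule, ranking order, and top-10 trimming can be sketched as follows. The entry field names (`level`, `round`, `passagesPassed`, `date`) are hypothetical, and treating a higher round number as better is an assumption.

```javascript
// Uppercases, strips anything outside A-Z, and keeps at most 3 characters.
function sanitizeInitials(input) {
  return input.toUpperCase().replace(/[^A-Z]/g, '').slice(0, 3);
}

// Sort comparator: level, then round, then passages passed, then newest date.
function compareEntries(a, b) {
  return (
    b.level - a.level ||
    b.round - a.round ||
    b.passagesPassed - a.passagesPassed ||
    new Date(b.date) - new Date(a.date)
  );
}

// Returns a new leaderboard with the entry inserted and the list trimmed.
function addEntry(leaderboard, entry, maxEntries = 10) {
  return [...leaderboard, { ...entry, initials: sanitizeInitials(entry.initials) }]
    .sort(compareEntries)
    .slice(0, maxEntries);
}
```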
The system maintains detailed player analytics:
- Accuracy tracking: Correct/total words, success rates
- Progression data: Highest level, rounds completed
- Streak monitoring: Current and longest consecutive successes
- Vocabulary analysis: Unique words correctly identified (stored as Set)
- Reset on page load: Fresh competition each session
- Real-time updates: Stats update after every passage attempt
- Milestone notifications: Special alerts every 5 levels
The initials modal provides dual input methods:
- Text input:
  - Direct typing: Standard text field for keyboard users
  - Auto-sync: Updates arcade controls in real-time
  - Validation: Live 3-character limit with uppercase conversion
- Arcade controls:
  - Arrow navigation: Up/down buttons for each letter slot
  - Keyboard controls: Arrow keys, Enter, Tab navigation
  - Visual feedback: Active slot highlighting, smooth transitions
Both input methods remain perfectly synchronized throughout the interaction.
The application uses a pure localStorage strategy with no backend database requirements:
- `cloze-reader-leaderboard`: Top 10 high scores with ranking data
- `cloze-reader-player`: Player profile with initials and session info
- `cloze-reader-stats`: Comprehensive performance analytics
- Set handling: JavaScript Sets converted to Arrays for JSON storage
- Type validation: Runtime checks for data integrity
- Fallback creation: Automatic empty object generation on corruption
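The Set handling and corruption fallback described above can be sketched as a save/load pair. The storage key is the one listed above; the stats field names and function names are hypothetical, and `storage` is any object with the `getItem`/`setItem` interface (`localStorage` in the browser).

```javascript
const STATS_KEY = 'cloze-reader-stats';

function emptyStats() {
  return { correctWords: 0, totalWords: 0, uniqueWords: new Set() };
}

// Sets are not JSON-serializable, so the Set is stored as an Array.
function saveStats(storage, stats) {
  const serializable = { ...stats, uniqueWords: [...stats.uniqueWords] };
  storage.setItem(STATS_KEY, JSON.stringify(serializable));
}

// Restores the Set on load; any parse or shape error yields fresh stats.
function loadStats(storage) {
  try {
    const parsed = JSON.parse(storage.getItem(STATS_KEY));
    if (!parsed || typeof parsed !== 'object' || !Array.isArray(parsed.uniqueWords)) {
      throw new Error('unexpected shape');
    }
    return { ...parsed, uniqueWords: new Set(parsed.uniqueWords) };
  } catch (error) {
    return emptyStats();
  }
}
```

In the browser the calls would be `saveStats(localStorage, stats)` and `loadStats(localStorage)`.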
The application maintains state across multiple layers:
```javascript
this.currentBook = null;        // Active book metadata
this.originalText = '';         // Clean passage text
this.clozeText = '';            // Text with blank placeholders
this.blanks = [];               // Word positions and answers
this.userAnswers = [];          // Current user input
this.currentLevel = 1;          // Difficulty progression
this.currentRound = 1;          // Round counter
this.contextualization = '';    // AI-generated context

this.messageHistory = new Map(); // blankId -> message arrays
// Preserves chat history per blank throughout round
```
- Live statistics: Updated after every passage attempt
- Streak tracking: Current and historical success runs
- Vocabulary learning: Cumulative words correctly identified
The application follows a clear state progression:
- Page Load: Fresh leaderboard, reset statistics
- Game Start: Initialize book service, load first passage
- Round Progression: Maintain state, clear chat history between rounds
- High Score: Trigger initials entry, update leaderboard
- Session End: Data persists until next page load
The application implements comprehensive error recovery at every level:
- Retry with exponential backoff: Up to 3 attempts with increasing delays
- Response extraction hierarchy:
  - Primary: `message.content`
  - Secondary: `reasoning` field
  - Tertiary: `reasoning_details` array
  - Final: Regex pattern matching for partial extraction
- Manual word selection: If AI fails, statistical content word selection
- Generic hint generation: Fallback responses based on question type
- HF API availability check: Test connection before streaming attempts
- Preloaded content: Cache books for immediate access
- Local book fallback: 10 embedded classics guarantee functionality
- Lazy processing: Defer expensive operations until needed
- Quality validation: Multiple attempts with different passage selections
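The retry policy and the response extraction hierarchy listed above can be sketched together. This is an illustration under assumptions: the base delay, the OpenAI-style `choices[0].message` response shape, the `text` property on `reasoning_details` entries, and the final regex are not confirmed by the description, and all function names are hypothetical.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay before the next try: baseMs, 2 * baseMs, 4 * baseMs for attempts 0, 1, 2.
function backoffDelay(attempt, baseMs = 1000) {
  return baseMs * 2 ** attempt;
}

// Runs `operation` up to `attempts` times, waiting longer after each failure.
async function withRetry(operation, { attempts = 3, baseMs = 1000, sleep = defaultSleep } = {}) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < attempts - 1) await sleep(backoffDelay(attempt, baseMs));
    }
  }
  throw lastError;
}

// Walks the hierarchy: content, reasoning, reasoning_details, then a regex
// over the raw response text for partially received JSON.
function extractContent(response, rawText = '') {
  const message = response?.choices?.[0]?.message ?? {};
  if (typeof message.content === 'string' && message.content.trim()) return message.content;
  if (typeof message.reasoning === 'string' && message.reasoning.trim()) return message.reasoning;
  if (Array.isArray(message.reasoning_details)) {
    const joined = message.reasoning_details.map((d) => d?.text ?? '').join(' ').trim();
    if (joined) return joined;
  }
  const partial = rawText.match(/"content"\s*:\s*"((?:[^"\\]|\\.)+)"/);
  return partial ? partial[1] : null;
}
```

A `null` result from `extractContent` is the signal to fall through to manual word selection or a generic hint.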
```javascript
// Example from aiService.js:507-520
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 15000);
// ... fetch with timeout handling
if (error.name === 'AbortError') {
  throw new Error('Request timed out - falling back to sequential processing');
}
```
The system gracefully reduces functionality rather than failing completely:
- No API key: Uses local LLM mode on port 1234
- API failure: Manual word selection with structural hints
- Partial response: Extracts usable data from incomplete JSON
- Streaming failure: Fall back to local embedded books
- Book processing error: Skip to next available book
- Quality filter rejection: Retry with different passage selection
```javascript
try {
  // ... application initialization
} catch (error) {
  console.error('Failed to initialize app:', error);
  this.showError('Failed to load the game. Please refresh and try again.');
}
```
The application provides comprehensive diagnostic information:
- Quality score breakdown: Detailed passage rejection reasons
- AI response parsing: Full response logging for debugging
- Performance timing: Book processing duration tracking
- State transitions: Level advancement and round progression logging
The system supports multiple deployment modes:
- `make dev` or `python -m http.server 8000`: Simple HTTP server
- `make dev-python` or `python app.py`: FastAPI development server
- Docker support: Full containerization with `docker-compose`
- Environment injection: FastAPI securely injects API keys via meta tags
- Static file serving: No build process required for vanilla JavaScript
The application handles API keys through multiple channels:
- Environment variables: `OPENROUTER_API_KEY` for server-side injection
- Browser globals: `window.OPENROUTER_API_KEY` from meta tags
- Runtime setting: `window.setOpenRouterKey()` for browser console updates
- Local mode: `?local=true` bypasses API key requirements
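On the browser side, the channels above could be combined as in the sketch below. The lookup order, the meta tag name `openrouter-api-key`, and the function names are assumptions; `globals` stands in for `window` and `readMeta` for a function that reads a meta tag's content, so the logic stays testable outside a browser.

```javascript
// Hypothetical resolver: local mode wins, then the global key, then the
// meta tag; with no key at all the app falls back to local mode.
function resolveApiKey({ search = '', globals = {}, readMeta = () => null } = {}) {
  if (new URLSearchParams(search).get('local') === 'true') {
    return { mode: 'local', key: null };
  }
  const key = globals.OPENROUTER_API_KEY || readMeta('openrouter-api-key');
  return key ? { mode: 'remote', key } : { mode: 'local', key: null };
}

// Installs a console-callable setter in the spirit of window.setOpenRouterKey().
function installKeySetter(globals) {
  globals.setOpenRouterKey = (key) => {
    globals.OPENROUTER_API_KEY = key;
  };
}
```

In a page this might be called as `resolveApiKey({ search: location.search, globals: window, readMeta: (name) => document.querySelector(`meta[name="${name}"]`)?.content })`.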
This application represents a convergence of two parallel histories:
- Educational cloze testing:
  - Systematic word deletion to measure reading comprehension
  - Context-dependent gap filling requiring syntactic and semantic integration
  - Efficiency through multiple-choice elimination and objective scoring
- Masked language modeling:
  - Random token masking to train contextual understanding
  - Prediction accuracy as a measure of language model performance
  - Scaled training on internet-scale text corpora
The Cloze Reader creates a fascinating recursive relationship:
- AI models trained on cloze-style prediction tasks now generate cloze tests for humans
- Assessment methodology becomes training data for future model iterations
- Human performance data could theoretically improve AI cloze generation
This system stages the tension between:
- Standardized assessment vs serendipitous discovery
- Human-curated difficulty vs algorithmically-determined challenge
- Transparent educational goals vs black-boxed AI decision making
- Local control vs cloud dependency
By using open-weight models (Gemma), streaming from public archives (Project Gutenberg), and maintaining full client-side operation, the system preserves interrogability and agency while exploring the convergence of human and machine language understanding.
This comprehensive overview demonstrates how the Cloze Reader transforms classic literature into an interactive learning experience while exploring fundamental questions about assessment, AI, and reading comprehension in the digital age.