The Narration Converter is a Node.js-based content generation tool developed for the CrackCode gamified learning platform. It transforms raw programming questions from CSV datasets into structured, narrative-driven, multi-language challenges for the CrackCode ecosystem.
The generator operates independently as an offline content preparation tool, ensuring data is ready for both the backend and frontend. Generated content can be exported as JSON files or uploaded directly to MongoDB.
- Added a standalone Upload CLI (`npm run upload`) to upload existing JSON outputs without regenerating content.
- Added upload filters: `--all`, `--learn`, `--challenge`, `--challenge --phase <N>`, and `--file <path>`.
- Added post-generation upload support via `-u`/`--upload` in the generate flow.
- Added generation overrides for `--language`, `--difficulty`, and `--count`.
- Added output cleanup support with `-clr`/`--clear-outputs`.
- Added scoped registry reset commands: full reset, learn-only reset, and challenge-only reset.
- Upgraded AI refinement with persona style guides, few-shot conditioning, phrase blacklists, structural enforcement, and JSON repair/retry fallback.
- Story-driven question narration for immersive learning.
- Multi-language variants (Python, Java, C++, JavaScript) per question.
- AI-powered narrative refinement using LLMs for enhanced storytelling.
- Bloom's Taxonomy tagging for educational tracking.
- Mode-based selection (Learn vs. Challenge).
- Registry-based prevention of duplicate content across runs.
- MongoDB upload with automatic collection routing per mode, language, and difficulty.
Converts plain logic problems into engaging stories. Each programming language follows a distinct thematic arc:
| Language | Narrative Theme |
|---|---|
| Python | Noir Detective storyline |
| Java | Heist Crew storyline |
| C++ | Sentinel Hacker storyline |
| JavaScript | Covert Secret Agent / Spy thriller |
Note: The narrative only affects the flavor text; problem logic remains identical across all versions.
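To make that note concrete, here is a hypothetical shape for one generated question (the field names `problemId`, `variants`, `task`, and the sample titles are illustrative, not the tool's actual schema): every variant carries the same underlying task, and only the narrative wrapper changes per language theme.

```javascript
// Hypothetical generated-question shape (illustrative field names).
// The coding task is identical in every variant; only the story differs.
const question = {
  problemId: 'two-sum',
  difficulty: 'Easy',
  variants: [
    // Noir Detective (Python)
    { language: 'python',     title: 'The Case of the Missing Pair',
      task: 'Return indices of two numbers that sum to the target.' },
    // Heist Crew (Java)
    { language: 'java',       title: 'Crack the Vault Combination',
      task: 'Return indices of two numbers that sum to the target.' },
    // Sentinel Hacker (C++)
    { language: 'cpp',        title: 'Patch the Breach Signature',
      task: 'Return indices of two numbers that sum to the target.' },
    // Secret Agent (JavaScript)
    { language: 'javascript', title: 'Intercept the Twin Ciphers',
      task: 'Return indices of two numbers that sum to the target.' },
  ],
};

// Flavor text varies; the logic statement does not.
const sameTaskEverywhere =
  question.variants.every(v => v.task === question.variants[0].task);
```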
The tool integrates an AI refinement layer powered by Groq's LLaMA 3.3 70B model to polish narrative titles and descriptions. This optional feature enhances storytelling while preserving technical accuracy.
Personas Supported:
- Noir Detective – Gritty, mysterious, world-weary (Python storylines)
- Digital Heist Crew – Slick, confident, tactical (common Java storyline mapping)
- White Hat Sentinel – Precise, defensive security tone (common C++ storyline mapping)
- Covert Secret Agent – Sleek, cool under pressure, tactical (JavaScript storylines)
- Helpful Mentor – Clean, neutral, encouraging (fallback persona)
Key Features:
- Difficulty-aware tone adjustment (Easy: encouraging, Medium: focused, Hard: high-stakes)
- Persona-specific style fingerprints (voice, syntax rhythm, metaphor rules) per storyline
- Few-shot persona exemplars to keep title/description quality consistent
- Anti-pattern blacklist + regex AI-ism scrubber for less generic LLM phrasing
- Structural enforcement for 3-part description flow (opening, body, actionable numbered steps)
- Dynamic flavor phrase injection pools for variation without changing task semantics
- Preserves coding task, constraints, and technical details
- Automatic fallback to original content if refinement fails
- Dual-pass resilience: JSON mode call + non-JSON retry with JSON repair extraction
- Rate-limited API calls (30 requests/min)
Enable with the `-ai` or `--ai-refine` flag.
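A sketch of the JSON repair/retry fallback described above: when the non-JSON retry wraps the payload in prose, one common approach is to pull out the first balanced `{...}` object and parse it. This is an illustrative sketch, not the tool's actual implementation.

```javascript
// Extract the first balanced JSON object from a mixed prose/JSON reply.
// Returns the parsed object, or null if nothing parseable is found.
function extractJson(text) {
  const start = text.indexOf('{');
  if (start === -1) return null;
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (escaped) { escaped = false; continue; }  // skip escaped char
    if (ch === '\\') { escaped = true; continue; }
    if (ch === '"') { inString = !inString; continue; }
    if (inString) continue;                       // braces inside strings don't count
    if (ch === '{') depth++;
    if (ch === '}') {
      depth--;
      if (depth === 0) {
        try {
          return JSON.parse(text.slice(start, i + 1));
        } catch {
          return null; // balanced but still malformed
        }
      }
    }
  }
  return null; // object never closed
}
```

Paired with the JSON-mode first pass, this gives the dual-pass resilience listed above: if both passes fail, the refiner falls back to the original content.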
- Learn Mode: Generates a stable set of 45 questions (15 Easy, 15 Medium, 15 Hard) to build structured roadmaps. Supports filtering by difficulty level and language variants.
- Challenge Mode: Releases advanced practice questions in phased batches (e.g., 30 per phase). Contains mixed difficulty levels (Medium and Hard), ensuring no overlap with Learn mode questions.
Generated JSON output files can be uploaded directly to MongoDB, with each file automatically routed to the correct collection based on its content.
Collection Routing:
| File Pattern | Target Collection | Example |
|---|---|---|
| `learn_programming_{difficulty}_{language}.json` | `learn{Language}{Difficulty}Q` | `learnPythonEasyQ` |
| `challenges_phase_{N}_{language}.json` | `challenge{Language}Q` | `challengePythonQ` |
Learn mode produces up to 12 collections (4 languages × 3 difficulties). Each question is routed based on its difficulty and variant language.
Challenge mode produces up to 4 collections (1 per language). All phases are stored in the same collection, with a phase field distinguishing them.
Key behaviors:
- Uses `findOneAndUpdate` with upsert – safe to re-run without creating duplicates.
- Learn upsert key: `{ problemId }` – one entry per problem per collection.
- Challenge upsert key: `{ problemId, phase }` – the same problem can appear in different phases.
- Separate Mongoose schemas for learn (includes `story`) and challenge (includes `beatId`, `phase`).
- Target collection is determined from item data, not filenames.
Current behavior note: The uploader currently routes using the first variant language (`item.variants[0].language`). For predictable MongoDB routing, generate/export single-language batches (for example, `--language python` or `--language java`) before uploading.
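The routing and upsert rules above can be sketched as a pure function (illustrative names; the real logic lives in `src/db/models/question.js`): collection name and upsert key are derived from item data, including the first-variant language caveat just noted.

```javascript
// Map a variant language to its collection-name segment.
const LANG_SEGMENT = { python: 'Python', java: 'Java', cpp: 'Cpp', javascript: 'Javascript' };

// Derive target collection and upsert key from item data (sketch).
function routeItem(item, mode) {
  // First-variant routing, as noted above.
  const lang = LANG_SEGMENT[item.variants[0].language];
  if (mode === 'learn') {
    return {
      collection: `learn${lang}${item.difficulty}Q`,  // e.g. learnPythonEasyQ
      upsertKey: { problemId: item.problemId },        // one entry per problem per collection
    };
  }
  return {
    collection: `challenge${lang}Q`,                   // all phases share one collection
    upsertKey: { problemId: item.problemId, phase: item.phase }, // per-phase entries
  };
}
```

With Mongoose, the actual write would then be something like `Model.findOneAndUpdate(upsertKey, item, { upsert: true })`, which is what makes re-runs idempotent.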
Narration-Converter-dev/
├── data/
│   ├── input/            # Raw CSV datasets (LeetCode, etc.)
│   ├── output/           # Generated JSON production files
│   └── registry/         # Usage registry (JSON tracking)
├── src/
│   ├── cli/              # Command-line interface logic
│   │   ├── generate.js   # Main generation CLI
│   │   └── upload.js     # Standalone upload CLI
│   ├── db/               # Database layer
│   │   ├── connection.js # MongoDB connect/disconnect
│   │   └── models/
│   │       └── question.js # Mongoose schemas + collection router
│   ├── loaders/          # CSV loading & parsing
│   ├── normalizer/       # Data cleaning & normalization
│   ├── classifier/       # Topic & Bloom classification
│   ├── selector/         # Learn & Challenge selection logic
│   ├── narrative/        # Story and template engines
│   ├── refinement/       # AI refinement engine
│   ├── registry/         # Registry read/write handlers
│   ├── uploader/         # JSON-to-MongoDB upload logic
│   │   └── uploadFromJson.js
│   └── utils/            # Shared utility helpers
├── config/
│   ├── dataset_mappings/ # Per-dataset column mappings
│   ├── selection_rules.json
│   └── stories.json
├── package.json
└── README.md
Key files:
- Generation CLI: `src/cli/generate.js`
- Upload CLI: `src/cli/upload.js`
- AI Refinement: `src/refinement/refinerEngine.js`
- DB Connection: `src/db/connection.js`
- Question Schemas: `src/db/models/question.js`
- Upload Logic: `src/uploader/uploadFromJson.js`
- Config: `config/selection_rules.json`
- Package metadata: `package.json`
- Registry: `data/registry/usage_registry.json`
Create a .env file at the repository root to supply local defaults:
DEFAULT_DATASET=datasetA
DEFAULT_INPUT_PATH=data/input/datasetA.csv
DEFAULT_MODE=learn
# AI Refinement (Optional)
GROQ_API_KEY=your_groq_api_key_here
# MongoDB (Required for --upload flag and standalone upload CLI)
MONGO_URI=mongodb://localhost:27017/narration-converter

API Key Setup:
- Sign up at Groq Console
- Generate an API key
- Add `GROQ_API_KEY` to your `.env` file
MongoDB Setup:
- Ensure MongoDB is running locally, or use a MongoDB Atlas connection string
- Add `MONGO_URI` to your `.env` file
- Install Mongoose: `npm install mongoose`
The CLI will use these defaults when flags are omitted.
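Parsing these defaults is simple enough to sketch in a few lines. The real tool likely uses a library such as `dotenv`; this pure function (an assumption, not the project's code) just shows the mechanics: `KEY=value` pairs, with `#` lines treated as comments.

```javascript
// Parse .env-style text into a plain object of string defaults (sketch).
function parseEnv(text) {
  const out = {};
  for (const line of text.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith('#')) continue; // skip blanks and comments
    const eq = trimmed.indexOf('=');
    if (eq === -1) continue;                           // ignore malformed lines
    out[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  return out;
}
```

A CLI would then fall back to these values when a flag is omitted, e.g. `const mode = flags.mode || env.DEFAULT_MODE;`.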
The program supports two main execution styles: Shortcuts for common tasks and Manual Flags for full control.
Add short npm scripts (example to paste into the scripts object in package.json):
"scripts": {
"generate": "node src/cli/generate.js",
"upload": "node src/cli/upload.js",
"gen:learn": "npm run generate -- -m learn",
"gen:challenge": "npm run generate -- -m challenge"
}

Examples (using defaults from .env or passing dataset):
npm run gen:learn
npm run generate -- -m learn --reset-registry --dataset datasetA
npm run gen:challenge -- --dataset datasetA --phase 1
# With AI refinement enabled
npm run gen:learn -- --ai
npm run gen:challenge -- --dataset datasetA --phase 1 --ai-refine
# Generate and upload to MongoDB
npm run gen:learn -- --upload
npm run gen:challenge -- --dataset datasetA --phase 1 --upload

Use the base generate script and pass flags after `--` to override defaults.
- Generate Learn (explicit):
  npm run generate -- --dataset datasetA --input data/input/datasetA.csv --mode learn
- Generate Learn and reset registry:
  npm run generate -- --mode learn --reset-registry
- Generate Challenge phase 2:
  npm run generate -- --mode challenge --phase 2 --dataset datasetA
- Generate with AI refinement:
  npm run generate -- --mode learn --ai
  npm run generate -- --mode challenge --phase 1 --ai-refine
- Generate and upload to MongoDB:
  npm run generate -- --mode learn --upload
  npm run generate -- --mode learn --difficulty Easy --language python --upload
  npm run generate -- --mode challenge --phase 1 --upload

Upload previously generated output files without re-running generation:
# Upload all output files (learn + challenge)
npm run upload -- --all
# Upload only learn files
npm run upload -- --learn
# Upload only challenge files (all phases)
npm run upload -- --challenge
# Upload a specific challenge phase
npm run upload -- --challenge --phase 2
# Upload a single specific file
npm run upload -- --file data/output/learn_programming_easy_python.json
# Override output directory
npm run upload -- --all --dir data/output/

Core Flags:
- `-d`, `--dataset`: The dataset name (e.g., `datasetA`, `leetcode`).
- `-i`, `--input`: Path to the CSV file (inferred from dataset if omitted).
- `-m`, `--mode`: `learn` or `challenge`.
- `-p`, `--phase`: Challenge phase number (default `1`).
Learn Mode Options:
- `-diff`, `--difficulty`: Filter by difficulty level (`Easy`, `Medium`, or `Hard`); Learn mode only.
- `-c`, `--count`: Override the number of questions to select (Learn mode only).
- `-lang`, `--language`: Override language selection to generate variants for a specific language.
AI Refinement:
- `-ai`, `--ai-refine`: Enable AI narrative refinement (requires `GROQ_API_KEY`).
Registry Management:
- `-R`, `--reset-registry`: Clears the full usage registry.
- `-rl`, `--reset-learn-only`: Clears only Learn mode history.
- `-rc`, `--reset-challenges-only`: Clears only Challenge mode history.
Output Management:
- `-clr`, `--clear-outputs <type>`: Clear previously generated output files.
  - Valid types: `all`, `learn`, `learn:easy`, `learn:medium`, `learn:hard`, `learn:hard:python`, `learn:hard:java`, `learn:hard:cpp`, `learn:hard:javascript`, `challenge:phase<N>`
Upload:
- `-u`, `--upload`: Upload generated output files to MongoDB after writing JSON.
Examples:
# Clear all Learn outputs
npm run generate -- -m learn -clr learn
# Clear only Learn Easy outputs
npm run generate -- -m learn -clr learn:easy
# Clear Challenge Phase 1 outputs
npm run generate -- -m challenge -clr challenge:phase1
# Clear all outputs
npm run generate -- -m learn -clr all

Standalone Upload CLI flags:

- `-a`, `--all`: Upload all learn + challenge files.
- `-l`, `--learn`: Upload all learn files.
- `-ch`, `--challenge`: Upload all challenge files.
- `-p`, `--phase <N>`: Upload only a specific challenge phase (use with `--challenge`).
- `-f`, `--file <path>`: Upload a single specific file.
- `--dir <path>`: Override the output directory (default: `data/output/`).
- Learn Mode: Balanced roadmap – 15 Easy, 15 Medium, 15 Hard (or filtered by difficulty). Supports single-language variant generation. Avoids repeats via registry.
- Challenge Mode: Produces Medium and Hard questions, split into phases (30 per phase). Supports single-language override and ensures no overlap with Learn-used questions or previous challenge phases.
- Narrative Generation: Creates language variants for Python, Java, C++, and JavaScript with story-specific personas.
- AI Refinement: Optionally refines narrative titles and descriptions using LLM with rate limiting (max 30 requests/minute).
- AI Safety & Robustness: Cleans banned generic phrasing, repairs malformed model JSON, and enforces consistent output structure before persisting.
- Registry: `data/registry/usage_registry.json` tracks used questions to prevent duplicates unless manually reset.
- Upload: Routes each output JSON file to MongoDB collections using mode, language, and difficulty with upsert semantics. For reliable language-specific routing, use single-language output files.
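The 30-requests/minute pacing mentioned above amounts to one call every 2000 ms. A minimal sketch of that pacing logic, kept pure so the math is easy to verify (illustrative, not the refiner's actual code):

```javascript
// Minimal fixed-interval rate limiter (sketch): at most one call per
// `minIntervalMs` milliseconds, e.g. 2000 ms for 30 requests/minute.
class RateLimiter {
  constructor(minIntervalMs) {
    this.minIntervalMs = minIntervalMs;
    this.lastCallAt = -Infinity; // no call made yet
  }
  // How long to wait (ms) before the next call is allowed.
  msUntilNext(now) {
    return Math.max(0, this.lastCallAt + this.minIntervalMs - now);
  }
  // Record that a call was just made.
  markCall(now) {
    this.lastCallAt = now;
  }
}
```

A real refiner would do something like `await sleep(limiter.msUntilNext(Date.now()))` before each API call, then `limiter.markCall(Date.now())` after it.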
- Learn output: `data/output/learn_programming.json` – 45 questions (15 Easy, 15 Medium, 15 Hard) with all language variants.
- Filtered Learn outputs: When using the `--difficulty` filter, outputs like `data/output/learn_programming_easy.json`, `learn_programming_medium.json`, etc.
- Language-specific outputs: When using the `--language` filter, outputs like `data/output/learn_programming_hard_python.json`, etc.
- Challenge output: `data/output/challenges_phase_X.json` – 30 questions (Medium and Hard mix) per phase with all language variants.
- Language-specific challenge outputs: When using the `--language` filter, outputs like `data/output/challenges_phase_1_python.json`, etc.
- Registry file: `data/registry/usage_registry.json` prevents duplicates across runs and tracks Learn vs. Challenge usage separately.
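How the registry could gate selection can be sketched as follows. The registry's internal shape (`learn` and `challenge` arrays of problem IDs) and the function name are assumptions for illustration; the actual file is `data/registry/usage_registry.json`, and per the behavior above, Challenge selection excludes both Learn-used questions and questions from earlier phases.

```javascript
// Sketch: decide whether a problem may still be selected in a given mode.
// `registry` is assumed to look like { learn: [...ids], challenge: [...ids] }.
function isAvailable(registry, problemId, mode) {
  if (mode === 'learn') {
    // Learn avoids repeating its own history.
    return !registry.learn.includes(problemId);
  }
  // Challenge avoids both Learn-used questions and earlier challenge phases.
  return !registry.learn.includes(problemId) &&
         !registry.challenge.includes(problemId);
}
```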
Learn collections (up to 12):
| Collection | Content |
|---|---|
| `learnPythonEasyQ` | Easy Python questions |
| `learnPythonMediumQ` | Medium Python questions |
| `learnPythonHardQ` | Hard Python questions |
| `learnJavaEasyQ` | Easy Java questions |
| `learnJavaMediumQ` | Medium Java questions |
| `learnJavaHardQ` | Hard Java questions |
| `learnCppEasyQ` | Easy C++ questions |
| `learnCppMediumQ` | Medium C++ questions |
| `learnCppHardQ` | Hard C++ questions |
| `learnJavascriptEasyQ` | Easy JavaScript questions |
| `learnJavascriptMediumQ` | Medium JavaScript questions |
| `learnJavascriptHardQ` | Hard JavaScript questions |
Challenge collections (up to 4):
| Collection | Content |
|---|---|
| `challengePythonQ` | All phases, Python |
| `challengeJavaQ` | All phases, Java |
| `challengeCppQ` | All phases, C++ |
| `challengeJavascriptQ` | All phases, JavaScript |
- Add dataset-specific npm scripts (e.g., `gen:learn:datasetA`) in package.json for one-command runs.
- Create an optional tiny wrapper CLI `src/cli/short.js` that maps short aliases (`l`, `c`) to full flags so you can run `npm run nc -- l datasetA r`.
- Use defaults in `.env` so `npm run gen:learn` is sufficient for most runs.
- Use the `-clr` flag to quickly clean up outputs before generating fresh batches.
- Use `npm run upload -- --all` to bulk-upload existing outputs without regenerating.
- The CLI forwards extra flags after `--` to the script; use that to override defaults.
- AI Rate Limiting: The refiner enforces 1 request per 2 seconds (30 RPM max) to respect Groq API limits.
- For large CSVs, prefer streaming parsing (`csv-parser` stream) and JSONL outputs to reduce memory.
- Use an in-memory registry cache with batched writes to reduce disk I/O and speed up repeated runs.
- Consider worker threads for CPU-bound classification/narrative generation and lazy language-variant generation to parallelize work.
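The streaming tip above boils down to converting rows as they arrive instead of accumulating the whole dataset in memory; `csv-parser` exposes exactly that row-by-row stream model. The transform below is a stdlib-only sketch of the per-row step (naive comma split, no quoted-field handling; illustrative, not the tool's loader):

```javascript
// Convert one CSV data line to one JSONL record, given the header fields.
// Pure, so it can sit inside any stream pipeline and be tested in isolation.
function csvLineToJsonl(headerFields, line) {
  const values = line.split(','); // naive split: no quoted-comma handling
  const row = {};
  headerFields.forEach((field, i) => { row[field] = (values[i] || '').trim(); });
  return JSON.stringify(row);     // one JSON object per output line
}

// Hypothetical wiring with Node's stdlib streams:
//   const rl = readline.createInterface({ input: fs.createReadStream(csvPath) });
//   rl.on('line', line => out.write(csvLineToJsonl(header, line) + '\n'));
```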