Commit c5d7990

jeremymanning and claude committed
Audit and fix README accuracy: 11 corrections from codebase research
- Fix video count in intro (5,000+ → 5,400+, actual: 5,407)
- Fix estimator description (RBF → Gaussian Process with Matern 3/2 kernel)
- Add embedding dimensionality (768-dim) and projection method (UMAP)
- Fix total question count (2,450 → 2,500 = 50 domains × 50)
- Fix video data description (catalog.json only, not transcripts/embeddings)
- Fix BibTeX citation URL (psyarxiv.com → osf.io/preprints/psyarxiv)
- Simplify transcript embedding description to sliding-window only
- Improve project structure annotations (GP, nanostores, Canvas 2D)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent f2eca3c commit c5d7990

1 file changed

Lines changed: 18 additions & 18 deletions

README.md
@@ -1,6 +1,6 @@
 # Knowledge Mapper

-An interactive visualization that maps your conceptual knowledge across 250,000 Wikipedia articles and 5,000+ Khan Academy videos. Answer questions to watch a real-time heatmap of your strengths and gaps emerge, then get personalized video recommendations to fill knowledge gaps.
+An interactive visualization that maps your conceptual knowledge across 250,000 Wikipedia articles and 5,400+ Khan Academy videos. Answer questions to watch a real-time heatmap of your strengths and gaps emerge, then get personalized video recommendations to fill knowledge gaps.

 **[Try the live demo](https://contextlab.github.io/mapper/)** | **[Read the paper](https://osf.io/preprints/psyarxiv/dh3q2)**

@@ -12,7 +12,7 @@ An interactive visualization that maps your conceptual knowledge across 250,000
 4. **Get video recommendations** -- Khan Academy videos are suggested based on your weakest areas
 5. **Explore freely** -- zoom, pan, hover video trajectories, and click articles for Wikipedia content

-Under the hood, text embedding models place every article, question, and video transcript into a shared high-dimensional vector space, then project them onto a 2D map where related concepts cluster together. Density flattening via optimal transport ensures even spatial coverage. As you answer questions, a Bayesian estimator interpolates your knowledge across the map using radial basis functions.
+Text embedding models place every article, question, and video transcript into a shared 768-dimensional vector space, then project them onto a 2D map via UMAP where related concepts cluster together. Density flattening via optimal transport ensures even spatial coverage. As you answer questions, a Gaussian Process with a Matern 3/2 kernel interpolates your knowledge across the map.

 ## Features

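The updated paragraph in the hunk above names a Gaussian Process with a Matern 3/2 kernel as the estimator. The following is a minimal sketch of that idea in Python, not the repository's estimator (which lives under `src/learning/`). The function names, the length scale, the noise term, and the prior mean of 0.5 are all assumptions chosen for illustration.

```python
import math


def matern32(p, q, length_scale=0.2, variance=1.0):
    """Matern 3/2 covariance: variance * (1 + s) * exp(-s), with s = sqrt(3) * r / length_scale."""
    r = math.dist(p, q)  # Euclidean distance between two map points
    s = math.sqrt(3.0) * r / length_scale
    return variance * (1.0 + s) * math.exp(-s)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gp_posterior_mean(train_pts, train_vals, query_pts,
                      length_scale=0.2, noise=1e-3, prior_mean=0.5):
    """Posterior mean of a GP at query_pts, given answers observed at train_pts.

    All hyperparameters here are hypothetical defaults, not values from the project.
    """
    K = [[matern32(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(train_pts)]
         for i, a in enumerate(train_pts)]
    alpha = solve(K, [v - prior_mean for v in train_vals])
    return [prior_mean + sum(matern32(q, p, length_scale) * a
                             for p, a in zip(train_pts, alpha))
            for q in query_pts]


# One correct answer (1.0) at the map centre: the estimate is high near that
# question and falls back to the prior mean far away from it.
near, far = gp_posterior_mean([(0.5, 0.5)], [1.0], [(0.5, 0.5), (10.0, 10.0)])
```

Under these assumptions, each answered question pulls the estimate toward 1 (correct) or 0 (incorrect) in its neighbourhood, and the length scale controls how far that influence spreads across the map.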
@@ -23,7 +23,7 @@ Under the hood, text embedding models place every article, question, and video t
 - **Video discovery panel** -- left sidebar with toggleable video visibility, scrollable list, and map trajectory highlighting
 - **Video trajectories** -- hover a video dot to see its topic path across the map; click to play
 - **Knowledge insights** -- see your strongest/weakest concepts and learning suggestions
-- **Social sharing** -- export your knowledge map as an image with grid lines and colorbar
+- **Social sharing** -- export your knowledge map as a PNG with grid lines and colorbar
 - **Keyboard navigation** -- full keyboard accessibility for quiz answers and map controls
 - **Fully client-side** -- no data leaves your browser; progress saved to localStorage

@@ -49,20 +49,20 @@ npm run preview # preview the production build locally

 ```
 mapper/
-├── index.html # HTML entry point (layout, styles, modals)
-├── src/ # Application source code
+├── index.html # Single-page app shell (layout, styles, modals)
+├── src/ # Application source (vanilla JS, ES modules)
 │ ├── app.js # Entry point: init, routing, event wiring
 │ ├── domain/ # Domain data loading and registry
-│ ├── learning/ # Adaptive quiz engine + video recommender
-│ ├── state/ # Application state and persistence
+│ ├── learning/ # GP estimator, adaptive sampler, video recommender
+│ ├── state/ # Reactive state (nanostores) and localStorage persistence
 │ ├── ui/ # UI components (controls, quiz, insights, share, video panel/modal)
 │ ├── utils/ # Math, accessibility, feature detection
-│ └── viz/ # Canvas rendering (heatmap, minimap, particles)
+│ └── viz/ # Canvas 2D rendering (heatmap, minimap, particles)
 ├── data/ # Pre-computed data bundles
-│ ├── domains/ # 50 per-domain JSON bundles + index.json
-│ └── videos/ # Video catalog + transcripts + embeddings
-├── scripts/ # Python data pipeline
-├── tests/ # Unit tests (vitest) + E2E tests (Playwright)
+│ ├── domains/ # 50 per-domain JSON bundles + index.json registry
+│ └── videos/ # Video catalog with spatial coordinates (catalog.json)
+├── scripts/ # Python data pipeline (30 scripts)
+├── tests/ # Unit tests (Vitest) + E2E tests (Playwright)
 └── public/ # Static assets
 ```

@@ -71,11 +71,11 @@ mapper/
 The `scripts/` directory contains the Python pipeline that generates the data powering the frontend:

 1. **Embed articles** using `google/embeddinggemma-300m` (768-dim vectors)
-2. **Generate questions** via Claude Opus 4.6 (50 per domain, 2,450 total)
-3. **Embed questions** using the same model (for coordinate consistency)
+2. **Generate questions** via Claude Opus 4.6 (50 per domain, 2,500 total)
+3. **Embed questions** using the same model for coordinate consistency
 4. **Transcribe videos** via Whisper on GPU cluster (5,400+ Khan Academy transcripts)
-5. **Embed transcripts** -- both full-document and sliding-window (512 words, 50-word stride)
-6. **Joint UMAP projection** -- project articles + questions + transcripts TOGETHER to 2D
+5. **Embed transcripts** -- sliding-window embeddings (512 words, 50-word stride)
+6. **Joint UMAP projection** -- project articles + questions + transcripts together to 2D
 7. **Density flattening** via approximate optimal transport (`mu=0.85`)
 8. **Apply coordinates** to all domain bundles and video catalog
 9. **Compute bounding boxes** from question positions (5th-95th percentile)
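Two of the pipeline steps listed in the hunk above are simple enough to sketch: the sliding-window split in step 5 (512 words, 50-word stride) and the percentile bounding box in step 9. The following Python is a hedged illustration only. The function names are hypothetical, and three choices are assumptions rather than facts about the scripts in `scripts/`: dropping the tail words after the last full window, using linear interpolation for percentiles, and the `(x_min, y_min, x_max, y_max)` return order.

```python
def sliding_windows(words, size=512, stride=50):
    """Split a word list into overlapping windows of `size` words, advancing `stride` words each time.

    Assumption: a transcript shorter than one window yields a single short window,
    and words after the last full window are dropped.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    if not words:
        return []
    if len(words) <= size:
        return [list(words)]
    return [list(words[start:start + size])
            for start in range(0, len(words) - size + 1, stride)]


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between sorted values."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of empty sequence")
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def bounding_box(points, lower=5, upper=95):
    """Return (x_min, y_min, x_max, y_max) from the lower/upper percentiles of 2D points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (percentile(xs, lower), percentile(ys, lower),
            percentile(xs, upper), percentile(ys, upper))


# A 1,000-word transcript gives windows starting at words 0, 50, ..., 450.
windows = sliding_windows(list(range(1000)))
# Question positions on a diagonal; trimming to the 5th-95th percentile drops the extremes.
box = bounding_box([(i, 2 * i) for i in range(101)])
```

Trimming to the 5th-95th percentile keeps a few outlying questions from stretching a domain's box, which is presumably why the README specifies percentiles rather than the raw minimum and maximum.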
@@ -84,7 +84,7 @@ The `scripts/` directory contains the Python pipeline that generates the data po

 ```bash
 npx vitest run # 82 unit tests (estimator, sampler, recommender, stability)
-npx playwright test # 9 E2E test specs (quiz flow, video recs, sharing, edge cases)
+npx playwright test # 9 E2E specs across 5 browser projects (Chromium, Firefox, WebKit, mobile)
 ```

 ## Citation
@@ -94,7 +94,7 @@ npx playwright test # 9 E2E test specs (quiz flow, video recs, sharing, edge c
 title={Text embedding models yield detailed conceptual knowledge maps derived from short multiple-choice quizzes},
 author={Fitzpatrick, Paxton C. and Heusser, Andrew C. and Manning, Jeremy R.},
 year={2025},
-url={https://psyarxiv.com/dh3q2}
+url={https://osf.io/preprints/psyarxiv/dh3q2}
 }
 ```
