Skip to content

josephfemia/jepa_learning

Repository files navigation

Predict the Representation — An Interactive JEPA Course

→ Live: josephfemia.github.io/jepa_learning · Jump to the from-scratch labs

Predict the representation, not the pixels — an interactive JEPA & world-models course

An interactive, single-page course that takes you from a basic machine-learning background to JEPA and world-model fluency — approaching ML-researcher depth. It's built around one thesis, the one Yann LeCun keeps returning to:

Predict the representation, not the pixels.

Most self-supervised models waste capacity reconstructing every pixel — including the unpredictable noise. JEPA (Joint-Embedding Predictive Architecture) predicts in a learned latent space instead, so the model spends its budget on what's actually predictable about the world. This course builds that idea up from scratch, no black boxes.

What makes it different

It deliberately mixes three learning modes so it never becomes a wall of text:

  • 📖 Read — concept-first explanations in an Andrej Karpathy "build-from-scratch, concrete-before-abstract" voice.
  • 🧪 Interact — hands-on labs and diagrams: the reconstruction tax, masking strategies, representation collapse (and the tricks that prevent it), an interactive energy-landscape explorer, a VICReg term isolator, a latent-space CEM planner, the H-JEPA hierarchy, a model explorer, and more — all original SVG/Canvas, no external assets.
  • 🧠 Remember — learning-science tactics baked in: predict-first prompts, discovery sequencing, spaced-retrieval checkpoints, and dual coding.

Plus executable from-scratch notebooks (#labs) and an accuracy bar where every numeric or named claim traces to a primary source.

The course reads as separate but cohesive lectures: a left sidebar paginates between them, and a Prev/Next pager threads them into one naturally-flowing arc.

What you'll learn

The core ideawhy latent prediction winsbuilding a JEPArepresentation collapse & how to avoid itunder the hood (objectives & math)JEPA vs. the alternativesthe research timelinethe model family (I-JEPA, V-JEPA, …)world models & planningrecap.

Quick start

npm install
npm run dev      # http://localhost:5173
npm run build    # production build → dist/
npm run preview  # preview the production build
npm test         # Vitest unit tests (quiz scoring, CEM planner, lab logic cores)

Requires Node 18+.

Stack

  • Vite + React 18, a tiny hash router (#labs → the from-scratch notebooks page, else the course).
  • Tailwind CSS (core utilities + arbitrary values via JIT) plus a shared design system in src/index.css.
  • A small framework-agnostic visualization toolkit in src/widgets/ (rllab.js for SVG/DOM + animate.js for reduced-motion-aware tweens), with a React Lab wrapper — the interactive labs are authored against it.
  • No runtime dependencies. Three web fonts: Archivo (display/UI), Source Serif 4 (prose), IBM Plex Mono (code).
  • Light-mode only. Editorial palette: warm-neutral paper, one cobalt accent (#2742CC), an orange "signal" (#E8590C) for emphasis, muted semantic tints, hairline borders.

Project structure

.
├── index.html              # Vite entry (+ Open Graph / Twitter card meta)
├── vite.config.js
├── tailwind.config.js      # scans index.html + src/**
├── public/
│   ├── og-image.png        # social share card
│   └── notebooks/          # executed from-scratch .ipynb (served at /notebooks)
├── src/
│   ├── main.jsx            # React root → renders <JepaCourse />
│   ├── index.css           # shared design system (tokens, sidebar, .lab stage, …)
│   ├── theme.js            # single light palette + ThemeContext / useTheme()
│   ├── data.js             # course data (SECTIONS, NAV_GROUPS, PAGES, MODELS, …)
│   ├── logic.js            # pure, tested helpers (scoreQuiz, planCEM, clamp/lerp, …)
│   ├── logic/              # per-lab numeric cores + vitest tests (collapseSim, energyLandscape, reconTax)
│   ├── widgets/            # rllab.js toolkit + animate.js + Lab.jsx wrapper
│   ├── labs/               # toolkit-based interactive labs (CollapseLab, EnergyLandscapeLab, …)
│   ├── Notebooks.jsx       # the #labs page (renders executed notebooks)
│   └── JepaCourse.jsx      # course shell (sidebar SPA) + remaining components + the lecture bodies
└── docs/superpowers/specs/ # design / review notes

How to extend it (conventions)

  • Run npm run build before considering any change done. It's the real "does it compile" check and catches JSX errors and Tailwind JIT misses.
  • Colors come from useTheme() (C.cyan = cobalt accent, C.amber = orange signal, C.green, C.violet, surfaces, text), never hardcoded hex — except the intentionally-dark code blocks and the dark .lab-stage, where the rllab.js C palette supplies bright-on-dark colors.
  • Module-scope data stores color names as strings ("cyan") and resolves them with C[name] inside components — never reference the theme object at module scope.
  • Tailwind: literal class strings only (max-w-[1080px]), never built by concatenation, or JIT won't emit them.
  • No localStorage/sessionStorage — React state only (SSR-safe, intentional).
  • Interactive sandboxes render on the dark .lab-stage via the toolkit; static schematics stay on white .figure cards. Canvas/animation must respect prefers-reduced-motion and clean up requestAnimationFrame on unmount.
  • Diagrams are schematics, not literal training traces — keep illustrative simplifications labeled.
  • Pure logic that has tests lives in logic.js / logic/ so the components import the exact code the tests cover.

Learning-science tactics (preserve these when extending)

  • Pretesting / "predict first"GuessGate asks the learner to guess before the answer is revealed (guessing wrong first improves retention).
  • Discovery sequencing — the history timeline frames each step as problem → idea, so JEPA feels inevitable, not arbitrary.
  • Generative learning — every abstract idea has a manipulable lab.
  • Dual coding — concepts paired with a visual, never text alone.
  • Spaced retrievalCheckpoint quiz with immediate per-question feedback; recall prompts at lecture transitions.
  • Karpathy voiceInstructor asides, from-scratch CodeBlocks, concrete-before-abstract, explicit demystifying, teach-it-back advice.
  • Congruent color-coding — cobalt = signal/latent, green = correct, orange = the "pixel/generative" foil, violet = energy/abstraction.

Content accuracy reference (verified facts the course relies on)

Every numeric/named claim traces to a primary source (links are in the course footer). Prefer the paper over secondary coverage, and hedge research bets ("critics argue…", "the claim is X under assumption Y, not a universal theorem").

  • JEPA — LeCun, A Path Towards Autonomous Machine Intelligence (2022). Six-module agent: configurator, perception, world model, cost, actor, short-term memory. H-JEPA = hierarchical.
  • I-JEPA (Assran et al., CVPR 2023) — context encoder (ViT, visible patches) + EMA target encoder (full image, stop-gradient) + a narrow ViT predictor conditioned on target position. Multi-block masking: 1 context block scale (0.85, 1.0), 4 target blocks scale (0.15, 0.2), aspect (0.75, 1.5), overlap removed, targets masked at the encoder output. ViT-H/14 on ImageNet, 16 A100s, <72h.
  • V-JEPA (Bardes et al., blog Feb 2024 · arXiv Apr 2024) — 3D (spatiotemporal) multi-block masking; latent prediction.
  • V-JEPA 2 (Assran, Ballas et al., Jun 11 2025) — ViT-g (>1B params), VideoMix22M (1M+ hours), mask-denoising + 3D RoPE. Stage 2 V-JEPA 2-AC: 24-layer predictor, 7-DoF actions, <62 h DROID robot video, encoder frozen. Planning = MPC + Cross-Entropy Method, energy = L1 distance to the goal-image embedding, ~16 s/action vs a reported ~4 min for a diffusion baseline (Cosmos; ≈15×) — the gap is latent-embedding scoring vs full pixel rendering, same CEM both sides. Zero-shot on Franka arms in two labs, no task reward.
  • LeJEPA (Balestriero & LeCun, Nov 2025) — proves the isotropic Gaussian minimizes worst-case downstream risk for linear probes with Gaussian priors (not a universal theorem); enforces it via SIGReg (random 1-D projections + Epps–Pulley normality test; Cramér–Wold justification); removes EMA/stop-gradient; one hyperparameter λ.
  • VICReg (Bardes, Ponce & LeCun, 2022) — three terms: invariance (MSE), variance (hinge on std, γ=1), covariance (squared off-diagonals). Coefficients λ=25, μ=25, ν=1 are the paper's ImageNet values, not universal (tuned per dataset/batch/dimension). Hinge the std, not the variance.
  • Stop-gradient vs EMA — distinct mechanisms: the stop-gradient breaks the learning-signal symmetry; EMA just makes the teacher a slow copy. SimSiam (Chen & He, 2021) dropped the EMA/momentum encoder entirely and still avoided collapse with only a stop-gradient + predictor head — so the asymmetry matters more than EMA specifically. (In BYOL the momentum encoder is the EMA — they are the same mechanism, so BYOL is not evidence of an EMA-free path; SimSiam is.) Exactly why a stop-gradient prevents collapse is still not fully understood.
  • Method categorization — DINO/DINOv2 are non-contrastive (self-distillation: EMA teacher + centering + sharpening, no negatives) — kin to JEPA's lineage, NOT to SimCLR/MoCo. The genuinely contrastive methods (negatives, push-apart) are SimCLR, MoCo, and CPC.
  • DINO-WM (Zhou et al., 2024/25) — frozen DINOv2 + learned predictor; avoids collapse by not training the encoder.
  • PLDM (Sobal et al., 2025) — end-to-end JEPA-WM, VICReg-derived 7-term objective.
  • LeWorldModel / LeWM (Maes, Le Lidec, Scieur, LeCun, Balestriero; arXiv 2603.19312, Mar 2026) — end-to-end from pixels, ViT-Tiny, ~15M params total, two-term loss (next-embedding prediction + a Gaussian-enforcing regularizer, SIGReg-style), no EMA/stop-grad/frozen encoder, a reported 192-dim single token (reported ~200× fewer than DINO-WM-class models), plans a reported ~48× faster than foundation-model world models (DINO-WM-class), a reported +18% over PLDM on Push-T, underperforms on the simplest env (Two-Room — the isotropic-Gaussian prior is too strong for a low-dim distribution).
  • AMI Labs (Advanced Machine Intelligence; "ami" = friend in French) — LeCun left Meta Nov 2025, co-founded AMI Labs (Paris, Dec 2025), ~$1B seed early 2026, executive chairman; Alexandre LeBrun CEO. Built on the world-model (not LLM) bet.

Deploy (GitHub Pages)

A GitHub Actions workflow (.github/workflows/deploy.yml) builds and publishes to GitHub Pages on every push to master/main. No manual config is needed: hash routing (#labs) requires no SPA rewrites, and vite.config.js reads base from VITE_BASE, which the workflow derives automatically from the repository name (/<repo>/ for a project page, / for a <user>.github.io page). The workflow also enables Pages itself.

One-time setup — create a public repo (free Pages requires public), then:

git remote add origin https://github.com/<USER>/<REPO>.git
git push -u origin master        # use `git branch -M main` first if you prefer main

When the Actions → Deploy to GitHub Pages run is green, the site is live at https://<USER>.github.io/<REPO>/ (and /#labs for the notebooks).

(If a run ever fails with a Pages 404, enable Settings → Pages → Source: GitHub Actions, then re-run.)

Notes

  • Light-only theme via React context (no localStorage; SSR-safe by design).
  • All diagrams/animations are original SVG/Canvas; canvas effects respect prefers-reduced-motion.
  • Diagrams are faithful schematics of the underlying mechanisms, not literal training traces.

License

Dual-licensed:

  • Code (source, build config, tooling) — MIT, see LICENSE.
  • Course content (text, diagrams, animations) — CC BY 4.0, see LICENSE-CONTENT.md.

In short: reuse the code freely, and share or adapt the course content as long as you credit Joseph Femia.

About

An interactive, build-from-scratch course on JEPA and world models. Predict the representation, not the pixels.

Topics

Resources

License

Unknown, Unknown licenses found

Licenses found

Unknown
LICENSE
Unknown
LICENSE-CONTENT.md

Stars

Watchers

Forks

Packages

 
 
 

Contributors