luigismith/handsynth

HandSynth

A gestural synthesizer for live performance — play with your hands, mouth, and face.

Move your hands, open your mouth, smile, frown, point, swipe — a generative music brain responds in real time. Designed for stage: webcam in, low-latency audio out, no MIDI hardware needed. Cyberpunk visualizer · MediaPipe webcam tracking · Tone.js audio engine.


What it is

A web app + Electron desktop app that turns your body into an expressive musical instrument for live performance. No keys, no knobs to learn — your hands, head, mouth, and face are the interface. Setup is a webcam and a browser. The audio output works through any system route (laptop speakers, USB interface, BlackHole into a DAW, mixer).

📖 Install on macOS: INSTALL.md (🇮🇹 italiano) — detailed walkthrough including Gatekeeper, audio routing for stage, webcam choice tips.

  • Hand position sweeps the master filter
  • Hand height drives note density and brightness
  • Hand openness controls reverb, delay, drive, and resonance
  • Hand depth (toward / away from camera) raises and lowers the master volume
  • Palm rotation modulates brightness and grit
  • Per-finger curl — each of the 10 fingers maps to its own audio dimension (right hand: delay feedback, filter cutoff, reverb, brightness, delay wet; left hand: drive, Q, reverb extra, master volume, tremolo). Play your hands like an instrument.
  • Pinch right triggers a harmony-aware lead stab
  • Pinch left advances the chord progression
  • Mouth open sweeps delay wet, filter cutoff, reverb, and brightness simultaneously
  • Smile / frown / surprise / anger modulate brightness, cutoff, drive, and resonance — your face is the fourth controller
  • Both fists mute the output; both hands above your head trigger the "drop"

The music brain stays inside the active scale, so it is impossible to play a wrong note, and four built-in vibes (Tycho, Bonobo, Hopkins, Floating Points) set tonality, BPM, and timbre. The PATCH editor lets you override the key (12 chromatic roots) and scale (Major, Minor, Harmonic / Melodic Minor, the seven church modes, Pentatonic Major / Minor, Blues, Chromatic) independently of the vibe: keep the Bonobo style but play in C minor, or run Tycho through Phrygian for a darker drift. Eight factory PATCH presets (LUSH, ACID, DUB, BRIGHT, DARK, TAPE, SPACE, INIT) flip the whole sound character with one click. Each preset reshapes the oscillators and envelopes of the pad/lead/bass voices on top of the FX chain, so they sound genuinely different (not just "same patch through different reverb"). Save your own patches to localStorage.
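
To illustrate the "no wrong notes" idea, here is a minimal sketch of snapping a MIDI note to the active key and scale. It is not the project's actual MusicBrain code: the names `SCALE_INTERVALS` and `snapToScale` are hypothetical, only a few scales are listed, and the tie-breaking rule (prefer the lower neighbour) is an assumption.

```typescript
// Semitone offsets from the root. Standard music theory, only a subset shown.
const SCALE_INTERVALS: Record<string, number[]> = {
  major: [0, 2, 4, 5, 7, 9, 11],
  minor: [0, 2, 3, 5, 7, 8, 10],
  phrygian: [0, 1, 3, 5, 7, 8, 10],
  pentatonicMinor: [0, 3, 5, 7, 10],
};

/**
 * Snap a MIDI note to the nearest pitch that belongs to the scale.
 * rootPc is the key as a pitch class (0 = C, 1 = C#, ... 11 = B).
 * Ties resolve downward (assumption made for this sketch).
 */
function snapToScale(midi: number, rootPc: number, intervals: number[]): number {
  const allowed = new Set(intervals.map((i) => (rootPc + i) % 12));
  // Any non-empty scale has a member within 6 semitones of every note.
  for (let offset = 0; offset <= 6; offset++) {
    for (const candidate of [midi - offset, midi + offset]) {
      const pc = ((candidate % 12) + 12) % 12;
      if (allowed.has(pc)) return candidate;
    }
  }
  return midi; // only reached when the scale is empty
}
```

With C minor as the active key, `snapToScale(61, 0, SCALE_INTERVALS.minor)` pulls a C#4 down to C4 (60), while a note that is already in the scale passes through unchanged.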

Intelligent voicing. Each pitched voice (pad/lead/bass) runs two oscillator stacks through a Tone.js CrossFade — the analog-synth "WAVE" knob feel. The PATCH editor's VOICE section gives you BOTH a waveform dropdown to explicitly pick the A-side oscillator per voice (13 options: sine, triangle, sawtooth, square, pulse, plus the fat/FM/AM variants) AND a Pad/Lead/Bass Morph dial to crossfade continuously between that pick and a clean morph destination (sine for pad, pulse for lead, triangle for bass). On top of that, a SMART toggle (ON by default) nudges each voice's morph by up to ±0.15 based on the running FX state — low cutoff biases the pad toward sine, high drive pulls everything toward harmonic-rich waveforms for the saturator to bite, high reverb pushes the pad glassy. Subtle, never an override — your knob always wins.
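
The SMART nudge described above can be sketched as a bounded offset on top of the user's morph knob. This is an illustration under assumptions, not the shipped logic: `smartMorph` and `padBiasFromCutoff` are hypothetical names, morph is assumed to run from 0 (the A-side pick) to 1 (the morph destination), and the cutoff-to-bias curve is invented for the example. Only the ±0.15 limit comes from the text.

```typescript
const MAX_NUDGE = 0.15; // SMART moves the morph by at most ±0.15

const clamp = (x: number, lo: number, hi: number): number =>
  Math.min(hi, Math.max(lo, x));

/**
 * knob: the user's morph setting, 0 (A-side pick) to 1 (morph destination).
 * bias: -1..1, derived from the FX state; positive leans toward the destination.
 */
function smartMorph(knob: number, bias: number, smartOn = true): number {
  if (!smartOn) return clamp(knob, 0, 1);
  const nudge = clamp(bias, -1, 1) * MAX_NUDGE;
  return clamp(knob + nudge, 0, 1);
}

/** Hypothetical pad heuristic: the darker the filter, the stronger the lean toward sine. */
function padBiasFromCutoff(cutoffHz: number): number {
  // Log position of the cutoff inside an assumed 200 Hz .. 12 kHz range.
  const t = Math.log(cutoffHz / 200) / Math.log(12000 / 200);
  return 1 - clamp(t, 0, 1);
}
```

Because the nudge is added to the knob value and capped, turning the knob always moves the result further than SMART can, which matches the "never an override" behaviour.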

Quick start (web)

pnpm install
pnpm dev

Open http://localhost:5173, click Allow webcam and begin (or Permetti webcam e iniziare if your browser is set to Italian), then raise your hands.

For live shows, run pnpm build && pnpm preview instead — production mode, no HMR overhead, much steadier audio.

Tested in Chrome, Edge, Safari 15+, Firefox 114+. Webcam permission required. Full Mac install walkthrough including Gatekeeper, audio routing, and stage tips: INSTALL.md · INSTALL.it.md.

Bilingual UI (English / Italian)

The entire interface is available in English and Italian. The browser locale picks the starting language automatically (it, it-IT, it-CH → Italian; everything else → English), and a small IT / EN pill in the bottom-right HUD strip flips the UI live without a reload. Your choice is remembered in localStorage. The in-app manual itself is fully translated — see USER_MANUAL.md (English) and USER_MANUAL.it.md (Italian).
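
The language selection rule above could look roughly like this. It is a sketch, assuming a stored choice takes priority over the browser locale; the function names are hypothetical and the real code may differ.

```typescript
type Lang = "en" | "it";

/** Any locale whose primary subtag is "it" (it, it-IT, it-CH) maps to Italian. */
function langFromLocale(locale: string): Lang {
  const primary = locale.toLowerCase().split(/[-_]/)[0];
  return primary === "it" ? "it" : "en";
}

/** A previously saved choice wins; otherwise fall back to the browser locale. */
function initialLang(stored: string | null, browserLocale: string): Lang {
  if (stored === "en" || stored === "it") return stored;
  return langFromLocale(browserLocale);
}
```

In a browser, `stored` would typically come from `localStorage.getItem(...)` under whatever key the app uses, and `browserLocale` from `navigator.language`.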

Desktop installers

# Windows .exe (NSIS installer with directory picker + shortcuts)
pnpm electron:build:win
# → release/HandSynth-Setup-0.1.0.exe (~87 MB)

# macOS .dmg (run on a Mac — cross-build from Windows is unreliable)
pnpm electron:build:mac
# → release/HandSynth-0.1.0-arm64.dmg + HandSynth-0.1.0-x64.dmg

# Dev with hot reload
pnpm electron:dev

See electron/BUILD.md for code-signing + notarization notes.

Controls

Hands (always on)

| Gesture | Audio target | Range |
|---|---|---|
| Hands distance (3D) | Filter cutoff | 200 Hz → 12 kHz, log |
| Mean hand height | Note density + brightness | 1/4 → 1/16, dim → bright |
| Right palm openness | Reverb wet + delay feedback | dry → wash |
| Left palm openness | Saturator drive + filter Q | clean → screaming |
| Right pinch (rising edge) | Trigger harmony-aware lead stab | one-shot |
| Left pinch (rising edge) | Advance chord progression | one-shot |
| Both fists held | Master mute (200 ms fade) | toggle |
| Both hands above head | Drop (max reverb, filter wide open) | held |
| Mean hand depth (Z) | Master volume (closer = louder) | additive |
| Right palm roll | Brightness fine-tune (±0.15) | additive |
| Left palm roll | Saturator drive fine-tune (±0.4) | additive |
| Mean palm pitch | Delay feedback fine-tune (±0.15) | additive |
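
The "200 Hz → 12 kHz, log" range for the filter cutoff corresponds to an exponential mapping of a normalised control value. A minimal sketch, assuming the hand distance has already been normalised to 0..1 (the function name and the direction of the mapping are assumptions):

```typescript
const MIN_HZ = 200;
const MAX_HZ = 12000;

/**
 * Map a normalised control value t in [0, 1] to a cutoff frequency so that
 * equal steps in t give equal musical intervals: f = MIN * (MAX / MIN)^t.
 */
function distanceToCutoff(t: number): number {
  const clamped = Math.min(1, Math.max(0, t));
  return MIN_HZ * Math.pow(MAX_HZ / MIN_HZ, clamped);
}
```

At the midpoint the result is the geometric mean of the two limits (about 1.55 kHz) rather than the arithmetic mean (6.1 kHz), which is why a log sweep tends to feel even to the ear.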

Per-finger (additive, 35% weight on top of openness)

Each finger on each hand emits its own continuous 0..1 curl scalar and drives its own audio dimension. Tuned high-sensitivity (One-Euro β=0.08) so small deliberate motions register.

| Finger | Right (FX hand) | Left (drive hand) |
|---|---|---|
| Thumb | Delay feedback 0.7 → 0.1 | Saturator drive 2.6 → 0.8 |
| Index | Filter cutoff 14 kHz → 600 Hz (log) | Filter Q 12 → 1.5 |
| Middle | Reverb wet 0.85 → 0.05 | Reverb wet extra 0.7 → 0.05 |
| Ring | Brightness offset +0.15 → −0.15 | Master volume offset +0.10 → −0.10 |
| Pinky | Delay wet 0.6 → 0.05 | Tremolo depth (brightness LFO @5 Hz) 0 → 1 |
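
For readers unfamiliar with the One-Euro smoothing mentioned above, here is a compact sketch of the published algorithm applied to a single scalar such as a finger curl. Only β = 0.08 comes from this README; the `minCutoff` and `dCutoff` defaults are placeholder assumptions, and the project's own filter may be structured differently.

```typescript
class OneEuroFilter {
  private xPrev: number | null = null;
  private dxPrev = 0;
  private tPrev = 0;
  private readonly minCutoff: number;
  private readonly beta: number;
  private readonly dCutoff: number;

  constructor(minCutoff = 1.0, beta = 0.08, dCutoff = 1.0) {
    this.minCutoff = minCutoff; // Hz, assumed value
    this.beta = beta;           // speed coefficient, 0.08 per the README
    this.dCutoff = dCutoff;     // Hz, assumed value
  }

  /** Smoothing factor of a first-order low-pass for a given cutoff and time step. */
  private static alpha(cutoffHz: number, dt: number): number {
    const tau = 1 / (2 * Math.PI * cutoffHz);
    return 1 / (1 + tau / dt);
  }

  /** Filter one sample; tSeconds is the sample timestamp in seconds. */
  filter(x: number, tSeconds: number): number {
    if (this.xPrev === null) {
      this.xPrev = x;
      this.tPrev = tSeconds;
      return x;
    }
    const dt = tSeconds - this.tPrev;
    if (dt <= 0) return this.xPrev; // ignore out-of-order samples

    // Low-pass the derivative, then raise the cutoff when the signal moves fast.
    const dx = (x - this.xPrev) / dt;
    const aD = OneEuroFilter.alpha(this.dCutoff, dt);
    const dxHat = aD * dx + (1 - aD) * this.dxPrev;
    const cutoff = this.minCutoff + this.beta * Math.abs(dxHat);

    const a = OneEuroFilter.alpha(cutoff, dt);
    const xHat = a * x + (1 - a) * this.xPrev;

    this.xPrev = xHat;
    this.dxPrev = dxHat;
    this.tPrev = tSeconds;
    return xHat;
  }
}
```

A larger β makes the cutoff rise faster with speed, so quick deliberate motions pass through with less lag while a still hand remains heavily smoothed.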

Discrete gestures (right hand unless noted)

| Gesture | Hand | Effect |
|---|---|---|
| Point | right | Filter Q spike +6 (decays over 600 ms) |
| Peace (V) | right | Brightness pulse (vibrato approximation) |
| Rock on (horns) | right | Saturator drive +0.35 for 1.5 s |
| OK (ring) | right | Delay feedback +0.2 for 1 s (tape flutter) |
| Finger gun | right | Lead chord-tone stab |
| Thumbs up | right | Save quick-patch (logged) |
| Thumbs down | right | Reset to INIT factory preset |
| Three (I+M+R) | right | Apply factory preset slot 3 |
| Four (I+M+R+P) | right | Apply factory preset slot 4 |
| Call me (shaka) | right | Percussion one-shot |
| Snap (fast index extend) | either | Percussion one-shot |
| Swipe right | right | Next factory preset |
| Swipe left | right | Previous factory preset |
| Fist pump (both hands, fast down) | both | Drop bomb (delay fb +0.3 + reverb +0.3) |
| Wave (oscillating, ≥3 flips in 1.2 s) | either | Tremolo wobble (brightness LFO) |

Discrete gestures are gated by a 3-frame consensus + per-gesture cooldown so flicker doesn't fire spurious events. See USER_MANUAL.md for the full cheat sheet.
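
The consensus-plus-cooldown gating can be sketched as a small state machine. This is an assumption-laden illustration: `GestureGate` is a hypothetical name, the 500 ms cooldown is a placeholder (the README does not list the per-gesture values), and one shared cooldown length is tracked separately per gesture label.

```typescript
class GestureGate {
  private candidate: string | null = null;
  private streak = 0;
  private readonly lastFired = new Map<string, number>();
  private readonly consensusFrames: number;
  private readonly cooldownMs: number;

  constructor(consensusFrames = 3, cooldownMs = 500) {
    this.consensusFrames = consensusFrames;
    this.cooldownMs = cooldownMs; // placeholder value
  }

  /**
   * Feed one classifier result per frame (null = no gesture).
   * Returns the gesture name on the frame it should fire, otherwise null.
   */
  update(label: string | null, nowMs: number): string | null {
    if (label !== this.candidate) {
      // Any change of label restarts the count, so flicker never accumulates.
      this.candidate = label;
      this.streak = 0;
    }
    if (label === null) return null;

    this.streak++;
    // Fire once per hold, exactly when consensus is reached.
    if (this.streak !== this.consensusFrames) return null;

    const last = this.lastFired.get(label);
    if (last !== undefined && nowMs - last < this.cooldownMs) return null;

    this.lastFired.set(label, nowMs);
    return label;
  }
}
```

Holding a gesture fires it once on the third consecutive frame; releasing and repeating it inside the cooldown window is ignored.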

Face (when in frame)

| Gesture | Audio target | Visual |
|---|---|---|
| Apparent face size | Reverb wet blend (closer = drier) | |
| Head roll | Brightness offset (±0.15) | |
| Head yaw | Filter resonance offset (±5 Q) | |
| Head pitch | Note density boost | |
| Mouth open | Delay wet + filter cutoff +6k + reverb +0.3 + brightness +0.4 | Mouth-emit particles |
| Mouth open (rising edge) | Lead chord-tone stab | |
| Smile | Brightness +0.2, masterDuck −0.15 (brighter, louder) | |
| Frown | Filter cutoff pulled toward 1.5 kHz (darkens) | |
| Surprise | Reverb +0.3, delay feedback +0.1 (opens up) | |
| Anger | Saturator drive +0.6 (clamped), filter Q +5 (clamped) | |
| Face lost > 1.5 s | Master duck +0.15 | |

Keyboard

| Key | Action |
|---|---|
| t | Toggle the live event terminal (left side) |
| p | Toggle the PATCH editor |
| h or F1 | Toggle the in-app manual (also exposed as the bottom-right help icon) |
| m | Flip the selfie mirror (only if your webcam stream is already pre-mirrored) |
| Escape | Mute / unmute the audio (also exposed as the bottom-right STOP icon) |

All shortcuts use letter or function/control keys so they remain reachable on every international keyboard layout (no symbol-key dependencies).

A small bottom-right HUD strip surfaces these as three tiny icon buttons (STOP / TERMINAL / HELP) for users who prefer a click. See USER_MANUAL.md for the long-form user guide.

Key & scale

The PATCH editor exposes two dropdowns above the knob grid: KEY (12 chromatic roots) and SCALE (13 modes, from Major and the church modes through Harmonic / Melodic Minor, Pentatonics, Blues, and Chromatic). Pick any combination and the music brain re-tunes the lead and bass generators on the next note: the chord progression keeps the vibe's character, and the snap-to-consonance filter handles any clashes. The reset button restores the current vibe's default; your selection persists across sessions in localStorage.

Architecture

Eight modular subsystems with strictly-typed contracts in src/types/contracts.ts:

HandTracker ─┐
FaceTracker ─┴─→ InteractionMapper ─→ AudioEngine ─→ AnalyserNode ─→ Visualizer
                                  └→ MusicBrain ─→ AudioEngine
  • AudioEngine owns Tone.js: master FX chain (filter → saturator → EQ → compressor → widener → ping-pong delay → reverb → limiter), four voice engines (pad, lead, bass, perc), AudioWorklet saturator with Tone.Distortion fallback.
  • MusicBrain is the generative composer: order-2 Markov chains on scale degrees with Bezier contour shaping, harmonic filter that snaps every note to a chord-tone or consonant tension, Bjorklund Euclidean rhythms for percussion, swing on 16ths.
  • HandTracker / FaceTracker wrap MediaPipe Tasks Vision, do per-landmark One-Euro filtering, derive scalars (openness, pinch, depth, roll, pitch, eyesWide, mouthOpen, etc.), emit events at 24 Hz / 8 Hz respectively to keep main-thread headroom for the audio scheduler.
  • InteractionMapper is the patch bay — gesture state in, audio params + music inputs out. Per-param epsilon diff so unchanged knobs don't fire ramp events.
  • Visualizer is a p5.js instance-mode sketch: low-poly orange-on-charcoal cyberpunk style, FFT-reactive triangle particles, hex grid backdrop, hand silhouettes with diamond fingertips, full face mesh with iris rings, fake-arm bezier connectors.
  • SettingsPanel is the analog-synth-style PATCH editor with knobs, factory presets, and patch save/load.
  • Terminal is the translucent left-side event log.
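
As a rough illustration of the Euclidean percussion rhythms mentioned for MusicBrain, the sketch below spreads k hits as evenly as possible over n steps. Note the substitution: it uses a short modular-arithmetic formula rather than Bjorklund's recursive algorithm, and its output can be a rotation of the pattern Bjorklund's procedure gives. The function names are hypothetical.

```typescript
/**
 * Evenly distribute `pulses` hits over `steps` slots.
 * `rotation` shifts the starting point of the pattern.
 */
function euclid(pulses: number, steps: number, rotation = 0): boolean[] {
  if (steps <= 0) return [];
  const k = Math.min(Math.max(0, Math.floor(pulses)), steps);
  const pattern: boolean[] = [];
  for (let i = 0; i < steps; i++) {
    const j = (((i + rotation) % steps) + steps) % steps;
    // Step j is a hit when the running total j*k wraps past a multiple of `steps`.
    pattern.push((j * k) % steps < k);
  }
  return pattern;
}

/** Render a pattern as text: "x" for a hit, "." for a rest. */
const renderPattern = (p: boolean[]): string =>
  p.map((hit) => (hit ? "x" : ".")).join("");
```

For example, `renderPattern(euclid(3, 8))` gives `x..x..x.`, the familiar tresillo, and `euclid(4, 16)` gives a plain four-on-the-floor.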

Full module map and data-flow diagram in ARCHITECTURE.md.

Quality gates

pnpm typecheck   # tsc strict, no errors
pnpm lint        # eslint
pnpm test        # vitest — 275 unit + integration tests
pnpm build       # vite production build

CI runs all four on every push (see .github/workflows/ci.yml).

Tech

| | |
|---|---|
| Build | Vite 6, TypeScript strict (ES2022), pnpm |
| Audio | Tone.js v15, AudioWorklet, custom soft-clip saturator |
| Tracking | MediaPipe Tasks Vision (HandLandmarker + FaceLandmarker) |
| Visualizer | p5.js 1.x in instance mode, Canvas2D, custom bloom + scanline pipeline |
| Music | @tonaljs/tonal for scales / chords / voicings |
| Smoothing | One Euro Filter (Casiez, Roussel, Vogel, CHI 2012) |
| Desktop | Electron 33 + electron-builder |

Performance notes

The app runs heavy real-time work on the main thread (two MediaPipe inferences + p5 sketch + Tone.js scheduler). To keep audio glitch-free under cumulative load:

  • HandLandmarker capped at 24 Hz; FaceLandmarker at 8 Hz
  • p5 frameRate capped at 30
  • Per-particle pool with no per-frame allocations
  • Per-param epsilon diff so static gestures don't queue ramp events
  • AudioContext built with latencyHint: 0.15 and Tone lookAhead: 0.4 s so short main-thread stalls don't audibly drop audio
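
The per-param epsilon diff in the list above can be sketched as follows. This is one plausible shape, not the InteractionMapper's real code: `ParamDiffer`, the parameter names, and the epsilon values are all hypothetical.

```typescript
type Params = Record<string, number>;

class ParamDiffer {
  private readonly last: Params = {};
  private readonly epsilon: Params;
  private readonly defaultEpsilon: number;

  constructor(epsilon: Params = {}, defaultEpsilon = 0.01) {
    this.epsilon = epsilon;               // per-param thresholds
    this.defaultEpsilon = defaultEpsilon; // used when a param has no entry
  }

  /** Return only the params that moved more than their epsilon since they were last sent. */
  diff(next: Params): Params {
    const changed: Params = {};
    for (const [name, value] of Object.entries(next)) {
      const prev = this.last[name];
      const eps = this.epsilon[name] ?? this.defaultEpsilon;
      if (prev === undefined || Math.abs(value - prev) > eps) {
        changed[name] = value;
        this.last[name] = value; // remember what was actually sent
      }
    }
    return changed;
  }
}
```

Comparing against the last sent value (rather than the last seen value) means a slow drift still gets through once it accumulates past the threshold, while a hand held still produces no ramp events at all.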

If you experience audio glitches: try pnpm preview (production build, no Vite HMR overhead) instead of pnpm dev. The production build is significantly lighter on the main thread.

Contributing

PRs welcome. See CONTRIBUTING.md. The project uses a multi-agent contract pattern — modules in src/audio/, src/music/, src/hands/, src/face/, src/visual/, src/interaction/, src/ui/ are owned independently and connected only via interfaces in src/types/contracts.ts. Don't break the contract.

License

MIT — see LICENSE. Use it, fork it, install it on your friends' machines, run a live show, build a follow-on with it. Attribution is appreciated but not required.


Built with Claude Code.
