Spectra Logo

Spectra

Real-time social-emotional intelligence for video calls

An AI-powered desktop overlay that helps neurodivergent individuals understand social cues during video conversations by analyzing facial expressions, voice prosody, and speech patterns.

License: MIT · Electron · React · TypeScript


🎯 Overview

Spectra is a desktop application designed to bridge the social communication gap for neurodivergent individuals during video calls. By combining computer vision, speech recognition, and AI-powered analysis, Spectra provides real-time insights into the emotional state and social cues of conversation partners.

Key Features

  • 🎭 Facial Expression Analysis - Real-time detection of smiles, brow movements, and eye contact using MediaPipe
  • 🎤 Voice Prosody Analysis - Emotion detection from vocal tone and pitch using Hume AI
  • 💬 Live Transcription - Real-time speech-to-text for both conversation participants via Deepgram
  • 🤖 AI-Powered Insights - Contextual social cue interpretation using GPT-5-nano or Llama 3.1
  • ⚡ Dual-Path Synthesis - Fast rule-based alerts (500ms) + deep LLM analysis (2s)
  • 🪟 Floating Overlay - Non-intrusive HUD that stays on top of all applications
  • ⌨️ Keyboard Controls - Quick positioning and visibility toggling

🏗️ Architecture

Spectra Architecture

System Components

Frontend Layer

  • Electron: Cross-platform desktop framework with native macOS integration
  • React 18: Component-based UI with hooks for state management
  • Zustand: Lightweight state management for real-time data flow
  • Framer Motion: Smooth animations and transitions for the overlay UI
  • Tailwind CSS v4: Modern utility-first styling with glassmorphism effects

AI Services Layer

  • MediaPipe FaceLandmarker: 468-point facial landmark detection with expression blendshapes
  • Hume AI Batch API: Voice prosody analysis for emotional tone detection
  • Deepgram Nova-2: Real-time speech-to-text with speaker diarization
  • OpenAI GPT-5-nano / Groq Llama 3.1: Social cue interpretation and contextual advice generation

Processing Pipeline

  1. Input Capture: Desktop screen + system audio capture via Electron's desktopCapturer
  2. Parallel Processing: Simultaneous face analysis (160ms), audio transcription (real-time), and prosody detection (3s chunks)
  3. Context Synthesis: Fusion of visual, auditory, and textual signals into unified emotional state
  4. Dual-Path Analysis:
    • Fast Path (500ms): Rule-based detection for sarcasm, disengagement, escalation
    • Slow Path (2s): LLM-powered nuanced interpretation with actionable advice
  5. Real-Time Display: Emotion visualization, live advice, and alerts on floating overlay
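The dual-path step above can be sketched as follows. This is a minimal illustration, not Spectra's actual code: `ContextFrame`, `fastRules`, `onContextUpdate`, and all thresholds are assumed names and values.

```typescript
// Dual-path idea: a cheap synchronous rule check on every context update,
// plus a rate-limited asynchronous LLM call for deeper analysis.

interface ContextFrame {
  smile: number;        // 0..1, from blendshapes
  eyeContact: number;   // 0..1
  anger: number;        // 0..1, from prosody
  transcript: string;
}

// Fast path: fires immediately on simple rule matches.
function fastRules(ctx: ContextFrame): string | null {
  if (ctx.anger > 0.7) return "escalation";
  if (ctx.eyeContact < 0.2 && ctx.smile < 0.1) return "disengagement";
  return null;
}

const SLOW_PATH_INTERVAL_MS = 2000;
let lastSlowCall = -Infinity;

// Slow path: fire-and-forget LLM call, at most once per interval.
function onContextUpdate(
  ctx: ContextFrame,
  slowAnalyze: (ctx: ContextFrame) => Promise<string>
): string | null {
  const now = Date.now();
  if (now - lastSlowCall >= SLOW_PATH_INTERVAL_MS) {
    lastSlowCall = now;
    void slowAnalyze(ctx);
  }
  return fastRules(ctx);
}
```

The key design point is that the fast path never waits on the network: rule-based alerts surface within a frame, while LLM results arrive later and update the advice text when ready.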

📊 Data Flow

Spectra Data Flow

Detailed Signal Flow

For a comprehensive view of the complete signal processing pipeline with all intermediate steps:

Detailed Signal Flow

🚀 Getting Started

Prerequisites

  • Node.js 18.x or higher
  • macOS 12.0+ (Monterey or later) with Screen Recording permissions
  • API Keys for: Hume AI, Deepgram, and OpenAI or Groq

Installation

# Clone the repository
git clone https://github.com/Dharshan2004/spectra.git
cd spectra

# Install dependencies
npm install

# Create environment file
cp .env.example .env

# Add your API keys to .env
VITE_HUME_API_KEY=your_hume_key
VITE_DEEPGRAM_API_KEY=your_deepgram_key
VITE_OPENAI_API_KEY=your_openai_key
# OR
VITE_GROQ_API_KEY=your_groq_key

Running in Development

npm run dev

The app will launch with DevTools attached. On first run, you'll need to grant Screen Recording permissions:

  1. Go to System Settings → Privacy & Security → Screen Recording
  2. Enable Electron in the list
  3. Restart the app (Cmd+Q, then npm run dev)

Building for Production

# Build the app
npm run build

# Package for macOS
npm run package:mac

🎮 Usage

Basic Controls

  • Move overlay - Cmd + Arrow Keys
  • Hide/Show - Cmd + H
  • Select source - Click monitor icon
  • Start microphone - Click mic icon
  • Collapse/Expand - Click chevron icon

Workflow

  1. Launch Spectra - The overlay appears in the top-right corner
  2. Select Source - Click the monitor icon and choose "Entire Screen" (required for audio)
  3. Start Call - Join your video call (Zoom, Meet, Teams, etc.)
  4. Monitor Insights - Watch real-time emotion detection and social cue advice
  5. Optional: Enable Mic - Add your own voice for better conversation context

Understanding the HUD

  • Emotion Orb: Pulses with detected emotion color (Joy=Gold, Anger=Red, etc.)
  • Emotion Bar: Visual intensity indicator below the toolbar
  • Live Advice: Main text showing contextual social cue interpretation
  • Cue Tags: Detected patterns with confidence scores
  • Alerts: Pop-up notifications for critical situations (escalation, sarcasm)

🔧 Technical Details

Emotion Detection

Facial Expression Mapping:

  • Smile: (mouthSmileLeft + mouthSmileRight) / 2 from MediaPipe blendshapes
  • Brow Tension: (browDownLeft + browDownRight) / 2 indicates concern/frustration
  • Eye Contact: Calculated from horizontal gaze direction (looking straight vs. away)
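The formulas above can be written out directly against MediaPipe blendshape scores (category name → score in 0..1). The helper names are illustrative; only the smile and brow formulas come from the README, and the eye-contact heuristic is an assumption based on the lateral-gaze blendshapes.

```typescript
type Blendshapes = Record<string, number>;

// Smile: average of the left/right mouth-smile blendshapes.
function smileScore(b: Blendshapes): number {
  return ((b["mouthSmileLeft"] ?? 0) + (b["mouthSmileRight"] ?? 0)) / 2;
}

// Brow tension: average of the left/right brow-down blendshapes.
function browTension(b: Blendshapes): number {
  return ((b["browDownLeft"] ?? 0) + (b["browDownRight"] ?? 0)) / 2;
}

// Eye contact: lateral gaze blendshapes rise when the gaze drifts sideways,
// so low lateral movement is treated as "looking straight".
function eyeContact(b: Blendshapes): number {
  const lateral = Math.max(
    b["eyeLookOutLeft"] ?? 0, b["eyeLookOutRight"] ?? 0,
    b["eyeLookInLeft"] ?? 0, b["eyeLookInRight"] ?? 0
  );
  return 1 - lateral;
}
```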

Voice Prosody:

  • Hume AI analyzes 48 emotional dimensions, mapped to 6 core emotions: Joy, Anger, Sadness, Fear, Anxiety, Neutral
  • Audio encoded as 16-bit PCM WAV at 16kHz sample rate
  • Processed in 3-second chunks via Hume's batch API
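The 16-bit PCM step above amounts to converting Web Audio's Float32 samples (range -1..1) to signed 16-bit integers. A minimal sketch, with WAV header construction omitted:

```typescript
// Convert Float32 audio samples (-1..1) to 16-bit signed PCM.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to valid range
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;        // scale to int16 range
  }
  return out;
}
```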

Context Synthesis:

  • Rule-based fast path detects sarcasm (tone-expression mismatch), disengagement (low eye contact + flat affect), escalation (high anger)
  • LLM slow path combines transcript + facial data + prosody for nuanced interpretation
  • Advice generated with emphasis on supportive, actionable guidance
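The fast-path rules described above might look like this. The field names and thresholds are illustrative assumptions, not Spectra's actual values:

```typescript
interface Signals {
  smile: number;      // facial positivity, 0..1
  eyeContact: number; // 0..1
  joy: number;        // vocal joy, 0..1
  anger: number;      // vocal anger, 0..1
}

function detectCues(s: Signals): string[] {
  const cues: string[] = [];
  // Sarcasm: tone-expression mismatch (smiling face, negative voice).
  if (s.smile > 0.6 && s.anger > 0.5) cues.push("sarcasm");
  // Disengagement: looking away with flat affect.
  if (s.eyeContact < 0.2 && s.smile < 0.15 && s.joy < 0.15) cues.push("disengagement");
  // Escalation: high vocal anger.
  if (s.anger > 0.75) cues.push("escalation");
  return cues;
}
```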

macOS Screen Capture

Spectra uses desktopCapturer with special handling for system audio:

// Screen capture with audio. Electron's non-standard `mandatory`
// constraints are not part of the DOM typings, hence the cast.
const stream = await navigator.mediaDevices.getUserMedia({
  audio: {
    mandatory: {
      chromeMediaSource: 'desktop',
      chromeMediaSourceId: sourceId
    }
  },
  video: {
    mandatory: {
      chromeMediaSource: 'desktop',
      chromeMediaSourceId: sourceId
    }
  }
} as unknown as MediaStreamConstraints)

Important: System audio is only available when capturing "Entire Screen", not individual windows. This is a macOS security restriction.
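For context, the `sourceId` consumed above is typically obtained in the main process via `desktopCapturer.getSources` and handed to the renderer over IPC. A sketch, in which the channel name `get-capture-sources` is a hypothetical:

```typescript
// Main process: enumerate capture sources for the renderer's source picker.
import { desktopCapturer, ipcMain } from 'electron'

ipcMain.handle('get-capture-sources', async () => {
  const sources = await desktopCapturer.getSources({ types: ['screen', 'window'] })
  // On macOS, only 'screen' sources carry system audio (see note above).
  return sources.map(s => ({ id: s.id, name: s.name }))
})
```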

Performance Optimization

  • Frame Processing: 6 FPS (every 160ms) for MediaPipe to balance accuracy and CPU usage
  • AudioContext: ScriptProcessor with 4096 buffer size for low-latency processing
  • State Management: Zustand with useShallow to prevent unnecessary re-renders
  • Debouncing: LLM calls rate-limited to 1.5s minimum interval
  • Lazy Loading: MediaPipe WASM loaded from CDN on-demand
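The 1.5 s debouncing bullet can be implemented as a generic minimum-interval gate that drops calls arriving too soon. A sketch (the `now` parameter is injectable only to make the gate testable; this is an assumed helper, not Spectra's actual code):

```typescript
// Wrap a function so it runs at most once per `ms` milliseconds;
// returns true if the call ran, false if it was dropped.
function minInterval<T extends unknown[]>(
  ms: number,
  fn: (...args: T) => void,
  now: () => number = Date.now
): (...args: T) => boolean {
  let last = -Infinity;
  return (...args: T) => {
    const t = now();
    if (t - last < ms) return false; // dropped: too soon after last call
    last = t;
    fn(...args);
    return true;
  };
}

// Usage sketch: const callLLM = minInterval(1500, ctx => analyze(ctx));
```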

🧩 Project Structure

spectra/
├── electron/
│   ├── main.ts              # Main process (window management, IPC)
│   └── preload.ts           # Preload script (IPC bridge)
├── src/
│   ├── components/
│   │   └── HUD/
│   │       └── SocialHUD.tsx    # Main overlay UI
│   ├── services/
│   │   ├── audioCapture.ts      # System audio + mic capture
│   │   ├── screenCapture.ts     # Desktop video capture
│   │   ├── vision.ts            # MediaPipe integration
│   │   ├── hume.ts              # Hume AI prosody API
│   │   ├── transcription.ts     # Deepgram streaming
│   │   └── socialCueDecoder.ts  # LLM-powered analysis
│   ├── core/
│   │   └── contextSynthesis.ts  # Signal fusion logic
│   ├── hooks/
│   │   └── useSocialSynthesis.ts # Dual-path orchestration
│   ├── store/
│   │   └── useSocialStore.ts    # Zustand state management
│   ├── types/
│   │   └── index.ts             # TypeScript definitions
│   ├── App.tsx                  # Root component
│   ├── main.tsx                 # React entry point
│   └── index.css                # Global styles (Tailwind)
├── assets/
│   ├── logo.png                 # Application logo
│   ├── architecture.png         # System architecture diagram
│   └── dataflow.png             # Data flow visualization
├── electron.vite.config.ts      # Vite + Electron build config
├── tailwind.config.js           # Tailwind CSS configuration
├── tsconfig.json                # TypeScript configuration
├── package.json                 # Dependencies and scripts
└── .env.example                 # Environment variables template

🔐 Privacy & Security

  • Cloud Inference: All AI analysis happens via API calls; no data is stored locally
  • No Recording: Spectra analyzes live streams but does not record or save audio/video
  • Encrypted Transit: All API communications use HTTPS/WSS
  • Permissions: Only requests Screen Recording access; no camera or mic access by default
  • Click-Through: Overlay doesn't interfere with underlying applications
  • API Keys: Stored in local .env file, never committed to version control
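The click-through, always-on-top behavior comes from the overlay window's configuration in the main process. A sketch under assumptions (exact option values and `PRELOAD_PATH` are illustrative, not Spectra's actual settings):

```typescript
// Main process: frameless, transparent, click-through overlay window.
import { BrowserWindow } from 'electron'

const overlay = new BrowserWindow({
  frame: false,
  transparent: true,
  alwaysOnTop: true,
  hasShadow: false,
  webPreferences: { preload: PRELOAD_PATH } // PRELOAD_PATH: hypothetical path to your preload script
})
overlay.setAlwaysOnTop(true, 'screen-saver')           // stay above fullscreen apps
overlay.setVisibleOnAllWorkspaces(true, { visibleOnFullScreen: true })
overlay.setIgnoreMouseEvents(true, { forward: true })  // click-through
```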

🐛 Troubleshooting

Screen Recording Permission Issues

Problem: "Unable to capture screen" or "Permission denied"

Solution:

  1. Open System Settings → Privacy & Security → Screen Recording
  2. Find Electron (dev mode) or Spectra (production) in the list
  3. Toggle it OFF and back ON
  4. Fully quit the app (Cmd+Q) and restart

System Audio Not Working

Problem: Transcription and prosody not detecting audio

Solution:

  • Ensure you selected "Entire Screen" or "Screen 1", NOT a window
  • macOS only provides audio for full screen captures, not individual windows
  • Check that audio is playing from the video call (test with YouTube)

Overlay Not Appearing on Fullscreen Apps

Problem: HUD disappears when video call goes fullscreen

Solution:

  • This should be handled automatically via setVisibleOnAllWorkspaces
  • If it persists, try moving the overlay with Cmd+Arrow Keys while in fullscreen
  • Alternative: Use "windowed fullscreen" instead of native fullscreen in your video call app

High CPU Usage

Problem: Fans spinning up, system lag

Solution:

  • MediaPipe face detection is GPU-accelerated but can be intensive
  • Reduce quality: Lower screen capture resolution in screenCapture.ts (line 44-47)
  • Close DevTools in production (npm run build instead of npm run dev)
  • Disable Hume prosody if not needed (comment out in App.tsx line 110-114)

🛠️ Development

Code Style

# Run linter
npm run lint

# Auto-fix issues
npm run lint:fix

# Format code
npm run format

Testing

# Run tests (when implemented)
npm test

# Test in production mode without packaging
npm run build
npm run preview

Debugging

Electron Main Process:

# Run with inspector
npm run dev -- --inspect=5858

React DevTools:

  • DevTools open automatically in development mode
  • Use Console tab for service logs ([MediaPipe], [Hume], [Deepgram], etc.)

📈 Future Roadmap

  • Multi-language Support - Extend beyond English for global accessibility
  • Sentiment History - Track emotional patterns over time with visualizations
  • Custom Alerts - User-defined rules for specific social cue combinations
  • Recording Mode - Optional session recording for post-call review
  • Mobile Companion - iOS/Android app for in-person conversations
  • Accessibility Features - Screen reader support, high-contrast themes
  • Integration APIs - Export insights to therapy apps or journaling tools

🤝 Contributing

We welcome contributions from the community! Whether it's bug reports, feature requests, or code contributions, please feel free to get involved.

How to Contribute

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Please read our CONTRIBUTING.md for detailed guidelines.


👥 Team

NTU WIT Beyond Binary 2026 Hackathon Team

  • K Priyadharshan - Lead Developer
  • Negha M - Teammate
  • Aafia - Teammate

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments

  • MediaPipe by Google - Facial landmark detection framework
  • Hume AI - Voice prosody and emotion recognition API
  • Deepgram - High-accuracy speech-to-text service
  • OpenAI / Groq - LLM inference for social cue interpretation
  • Electron - Cross-platform desktop framework
  • React - UI component library

📧 Contact

For questions, feedback, or collaboration opportunities:


Built for NTU WIT Beyond Binary 2026 Hackathon

Empowering neurodivergent individuals with AI-driven social intelligence

