Project Daredevil

Spatial Audio for Blind Assistance

A computer vision and spatial audio system that combines object detection, monocular depth estimation, and spatial audio processing to create an immersive audio experience based on detected objects in the camera view. Project Daredevil explores how consumer devices (iPhone, AirPods, webcams) can provide affordable, real-time spatial audio feedback to blind and low-vision users.

Overview

Our goal is to create a proof-of-concept system that translates depth perception into sound—like digital echolocation—to enhance spatial awareness in everyday environments.

System Modules

  1. Camera Module - Handles video streaming from various sources including phone cameras
  2. Detection Module - Object detection and tracking (YOLO-based)
  3. Depth Module - Monocular depth estimation using Hugging Face DPT models
  4. Spatial Audio Module - Converts object positions and depths to spatial audio
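
In practice the modules form a pipeline: camera frames go to detection, detected boxes go to depth estimation, and box position plus depth drive the audio output. The sketch below illustrates that flow using the underlying libraries directly (OpenCV, Ultralytics YOLO, Hugging Face transformers) rather than this repo's own wrappers; the model choices and the panning math are illustrative assumptions, not the project's exact implementation.

# Illustrative end-to-end loop; uses the underlying libraries directly,
# not this repo's module wrappers. Model names are common defaults.
import cv2
from PIL import Image
from transformers import pipeline
from ultralytics import YOLO

detector = YOLO("yolov8n.pt")                    # detection module stand-in
depth_estimator = pipeline("depth-estimation")   # DPT-based, as in the depth module

cap = cv2.VideoCapture(0)                        # camera module stand-in
ret, frame = cap.read()
if ret:
    result = detector(frame)[0]
    for box in result.boxes.xyxy.tolist():       # [x1, y1, x2, y2]
        x1, y1, x2, y2 = map(int, box)
        roi = cv2.cvtColor(frame[y1:y2, x1:x2], cv2.COLOR_BGR2RGB)
        depth = depth_estimator(Image.fromarray(roi))["depth"]  # relative depth image
        # Spatial-audio stand-in: horizontal center becomes a pan value.
        pan = ((x1 + x2) / 2) / frame.shape[1] * 2 - 1          # -1 (left) .. +1 (right)
        print(f"object at pan {pan:+.2f}")
cap.release()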

Key Features

  • Phone Camera Support: Stream from iPhone/Android via various methods
  • Real-time Object Detection: Live tracking of objects (focused on water bottles)
  • Monocular Depth Estimation: Depth estimation without depth sensors
  • Spatial Audio: Stereo panning based on object position and depth
  • Modular Design: Each component can be used independently
  • Apple Silicon Optimized: Uses MPS acceleration for depth processing

Project Motivations

  • Make spatial awareness assistance affordable and portable
  • Provide subtle, continuous cues (ambient whooshes, localized pitch shifts) instead of overwhelming object-to-sound mappings
  • Enable detection of key social and safety cues:
    • An approaching handshake
    • Objects moving into one's path
    • "The last 10 feet" problem
    • Ambient depth shifts in hallways or open spaces

Success Criteria

Our prototype should demonstrate:

  • Real-time object detection, depth estimation, and directional audio
  • Clear, intuitive audio depth cues for our co-designers
  • Smooth integration of all system components

Future goals include:

  • Voice command integration
  • iOS native app (LiDAR + ARKit + AirPods)
  • Continuous ambient spatial audio
  • Lightweight deployment on AI/AR glasses (Meta Ray-Ban, open source SDK solutions)

Current Limitations

  • Safety: Audio feedback must not interfere with natural hearing
  • Hardware: Standard webcams have limited field of view
  • Learning Curve: Users need time to interpret depth-based audio cues

Installation

Prerequisites

  • Python 3.8+
  • macOS (for Apple Silicon optimization)
  • Camera access permissions

Setup

  1. Clone the repository:
git clone https://github.com/MIT-Assistive-Technology/Project-Daredevil.git
cd Project-Daredevil
  2. Create and activate a virtual environment:
python3 -m venv env
source env/bin/activate  # On macOS/Linux
  3. Install dependencies:
pip install --upgrade pip
pip install torch torchvision transformers opencv-python numpy ultralytics PyOpenAL pygame
  4. For spatial audio (macOS only), install OpenAL:
brew install openal-soft
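
Because depth processing relies on MPS acceleration on Apple Silicon, it can help to confirm PyTorch sees that backend before the first run. This is a generic PyTorch check, not part of the repo's setup:

import torch

# Generic check: the Metal Performance Shaders (MPS) backend that the
# depth module uses for acceleration should be built and available.
print("MPS built:", torch.backends.mps.is_built())
print("MPS available:", torch.backends.mps.is_available())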

Quick Start Commands

⚡ Quickest Way to Launch

# One command - uses the default settings (camera 1, classes: person and bottle)
./main.sh

# Or use the configurable version
./main.sh --camera 1 --classes person bottle --volume 0.3 --confidence 0.3

# Or use the web interface (most user-friendly)
./setup_web.sh  # First time only
./web_launch.py  # Then open http://localhost:8080

Run Everything (Recommended)

# Full system with detection, depth, and spatial audio
source env/bin/activate && python3 main.py

# Just depth + detection (no audio)
source env/bin/activate && python3 depth/detection_depth_stream.py

# Live depth streaming only
source env/bin/activate && python3 depth/depth_stream.py

Test Everything Works

# List available cameras
python3 camera/index.py

# Test depth processing
python3 depth/test_depth_integration.py

Repository Structure

Project-Daredevil/
├── camera/           # Camera streaming module
├── detection/        # Object detection & tracking
├── depth/            # Depth estimation module
├── spatial-audio/    # Spatial audio processing
├── main.py           # Central Python entrypoint
├── main.sh           # Single launch script (configurable)
├── web_launch.py     # Web control panel
├── env/              # Virtual environment
└── README.md         # This file

Performance

  • Depth Processing: ~12.8 FPS on Apple Silicon
  • Camera Streaming: 30 FPS
  • Memory Usage: ~1GB for depth model
  • Latency: ~78ms per frame for depth processing

Development

Project progress is tracked in our GitHub Project Board.

Testing

Run integration tests:

python3 depth/test_depth_integration.py

Technical Documentation

Module Documentation

Depth Module (depth/)

Monocular depth estimation using Hugging Face DPT models.

Key Features:

  • Offline-capable depth estimation
  • Bounding box ROI processing
  • Multiple normalization methods
  • Integration-ready interface

Usage:

import cv2
from depth import DepthProcessor, create_depth_processor

frame = cv2.imread("example.jpg")  # any BGR frame (hypothetical file)
bbox = [100, 100, 300, 400]        # [x1, y1, x2, y2] from the detection module
processor = create_depth_processor()
result = processor.get_depth_for_spatial_audio(frame, bbox)
depth_value = result['normalized_depth']  # 0.0 to 1.0

Detection Module (detection/)

Object detection and tracking (YOLO-based).

Status: ✅ Completed

Focus: Water bottle detection for demo purposes

Spatial Audio Module (spatial-audio/)

Real 3D spatial audio using OpenAL, compatible with Apple AirPods.

Status: ✅ Completed

Features:

  • True 3D spatial audio using OpenAL
  • AirPods spatial audio compatible
  • Calm white noise for object localization
  • Real-time 3D positioning (30-60 Hz)
  • Depth-based distance rendering
  • Multi-object support (up to 10 simultaneous sources)
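
For a feel of the underlying library, here is a minimal PyOpenAL sketch that sweeps a single source across the listener. It assumes a mono WAV file exists at the (hypothetical) path shown; it is not this repo's spatial-audio code, which is documented in spatial-audio/README.md.

import time
from openal import oalOpen, oalQuit

# Hypothetical mono noise file; the module itself uses calm white noise.
source = oalOpen("white_noise.wav")
source.set_position((1.0, 0.0, -2.0))   # start to the listener's right
source.play()

# Sweep the source right-to-left with ~30 Hz position updates.
for step in range(60):
    x = 1.0 - step / 30.0               # 1.0 .. -1.0
    source.set_position((x, 0.0, -2.0))
    time.sleep(1 / 30)

oalQuit()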

Usage:

# Test spatial audio standalone
python3 spatial-audio/index.py

# Run full integration with detection + depth + audio
python3 spatial-audio/integration.py

See spatial-audio/README.md for full documentation.

Coordinate System

All modules share a single coordinate system:

  • Origin: Top-left corner (0, 0)
  • X-axis: Left to right (0 to frame_width)
  • Y-axis: Top to bottom (0 to frame_height)
  • Bounding Box Format: [x1, y1, x2, y2]
  • Depth Values: Normalized to 0.0-1.0 range
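
To illustrate how these conventions can feed the audio stage, the snippet below maps a bounding box and a normalized depth to stereo gains using constant-power panning. The mapping and the attenuation constant are assumptions for illustration, not the repo's actual rendering:

import math

def stereo_gains(bbox, depth, frame_width):
    """Map a [x1, y1, x2, y2] bbox and a 0.0-1.0 depth to (left, right) gains."""
    x_center = (bbox[0] + bbox[2]) / 2
    pan = x_center / frame_width        # 0.0 (left edge) .. 1.0 (right edge)
    gain = 1.0 - 0.8 * depth            # assumed: farther objects play quieter
    return (math.cos(pan * math.pi / 2) * gain,
            math.sin(pan * math.pi / 2) * gain)

print(stereo_gains([100, 100, 300, 400], depth=0.5, frame_width=1280))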

Phone Camera Setup

Method 1: DroidCam (Android)

  1. Install DroidCam:

    • Download DroidCam from the Google Play Store
  2. Setup:

    • Connect via USB or WiFi
    • DroidCam will appear as a camera device

Method 2: IP Webcam (Android)

  1. Install IP Webcam:

    • Download IP Webcam from Google Play Store
  2. Setup:

    • Start IP Webcam on phone
    • Note the IP address (e.g., 192.168.1.100:8080)
    • Use IP camera streaming in the camera module
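
The camera module handles this stream for you; for reference, OpenCV can also open it directly. The address below is an example, and /video is IP Webcam's usual MJPEG endpoint:

import cv2

# Example address; substitute the IP shown in the IP Webcam app.
cap = cv2.VideoCapture("http://192.168.1.100:8080/video")
ret, frame = cap.read()
print("got frame:", ret, frame.shape if ret else None)
cap.release()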

Method 3: Continuity Camera (iPhone)

  1. Enable Continuity Camera:

    • iPhone must be signed into same Apple ID as Mac
    • iPhone must be nearby and unlocked
    • For a more reliable connection, connect the iPhone to the Mac with a cable and grant permission when prompted
  2. Usage:

    • Continuity Camera appears as camera index 1
    • Automatically detected by the camera module

Troubleshooting

Camera Issues

Problem: "Could not open camera" Solution:

  1. Check camera permissions in System Preferences
  2. Try different camera indices (0, 1, 2)
  3. Restart camera applications
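
camera/index.py lists the cameras for you; the generic OpenCV probe below (not the repo's script) shows the idea behind step 2:

import cv2

# Probe the first few indices; an index that opens is a usable camera.
for index in range(3):
    cap = cv2.VideoCapture(index)
    print(f"camera {index}:", "available" if cap.isOpened() else "not available")
    cap.release()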

Problem: Phone camera not detected

Solution:

  1. Ensure DroidCam is running on phone
  2. Check USB/WiFi connection
  3. Try IP camera method

Depth Processing Issues

Problem: Model loading fails

Solution:

  1. Check internet connection for first download
  2. Verify sufficient disk space
  3. Check PyTorch installation

Problem: Low performance

Solution:

  1. Use smaller frame sizes
  2. Process only necessary bounding boxes
  3. Use the median method for depth calculation (see the sketch below)
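
For step 3, taking the median over the detected bounding box is cheap and robust to stray depth pixels. A generic NumPy sketch, not the depth module's code:

import numpy as np

# Toy relative-depth map; in practice this comes from the DPT model.
depth_map = np.random.rand(480, 640).astype(np.float32)
x1, y1, x2, y2 = 100, 100, 300, 400   # example bbox
print("median depth:", float(np.median(depth_map[y1:y2, x1:x2])))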

Permission Issues

Problem: Camera access denied

Solution:

  1. Go to System Preferences > Security & Privacy > Camera
  2. Enable camera access for Terminal/Python
  3. Restart the application

Credits

  • Hugging Face - DPT depth estimation models
  • Ultralytics - YOLO object detection
  • OpenCV - Computer vision framework
  • Apple - Continuity Camera API

MIT Assistive Technology Club
Fall 2025 Project
