# Spatial Audio Blind Assistance
A computer vision and spatial audio system that combines object detection, monocular depth estimation, and 3D audio rendering to turn the objects detected in a camera's view into an immersive soundscape. Project Daredevil explores how consumer devices (iPhone, AirPods, webcams) can provide affordable, real-time spatial audio feedback to blind and low-vision users.
Our goal is to create a proof-of-concept system that translates depth perception into sound—like digital echolocation—to enhance spatial awareness in everyday environments.
## System Components

- Camera Module - Handles video streaming from various sources including phone cameras
- Detection Module - Object detection and tracking (YOLO-based)
- Depth Module - Monocular depth estimation using Hugging Face DPT models
- Spatial Audio Module - Converts object positions and depths to spatial audio (see the pipeline sketch after this list)
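Together, these modules form a single per-frame loop: the camera yields frames, detection yields bounding boxes, the depth module scores each box, and the audio module renders one cue per object. Below is a minimal sketch of that wiring, with stub functions standing in for the real modules; every name here is illustrative, and the actual entrypoint is main.py.

```python
# Hedged sketch of the per-frame pipeline: camera -> detection -> depth -> audio.
# The stubs below stand in for the real modules; all names are illustrative.
import cv2

def detect_objects(frame):
    """Stub for the detection module: would return [x1, y1, x2, y2] boxes."""
    return []

def estimate_depth(frame, bbox):
    """Stub for the depth module: would return a normalized 0.0-1.0 depth."""
    return 0.5

def play_spatial_cue(bbox, depth, frame_shape):
    """Stub for the spatial audio module: would position a 3D sound source."""
    pass

cap = cv2.VideoCapture(1)               # Continuity Camera typically appears as index 1
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    for bbox in detect_objects(frame):  # one audio source per detected object
        depth = estimate_depth(frame, bbox)
        play_spatial_cue(bbox, depth, frame.shape)
cap.release()
```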
## Features

- Phone Camera Support: Stream from iPhone/Android via various methods
- Real-time Object Detection: Live tracking of objects (focused on water bottles)
- Monocular Depth Estimation: Depth estimation without depth sensors
- Spatial Audio: Stereo panning based on object position and depth
- Modular Design: Each component can be used independently
- Apple Silicon Optimized: Uses MPS acceleration for depth processing
## Goals

- Make spatial awareness assistance affordable and portable
- Provide subtle, continuous cues (ambient whooshes, localized pitch shifts) instead of overwhelming object-to-sound mappings
- Enable detection of key social and safety cues:
  - An approaching handshake
  - Objects moving into one's path
  - "The last 10 feet" problem: locating a door, person, or object once turn-by-turn navigation runs out
  - Ambient depth shifts in hallways or open spaces
Our prototype should demonstrate:
- Real-time object detection, depth estimation, and directional audio
- Clear, intuitive audio depth cues for our co-designers
- Smooth integration of all system components
Future goals include:
- Voice command integration
- iOS native app (LiDAR + ARKit + AirPods)
- Continuous ambient spatial audio
- Lightweight deployment on AI/AR glasses (e.g., Meta Ray-Ban, open-source SDK solutions)
## Challenges

- Safety: Audio feedback must not interfere with natural hearing
- Hardware: Standard webcams have limited field of view
- Learning Curve: Users need time to interpret depth-based audio cues
## Requirements

- Python 3.8+
- macOS (for Apple Silicon optimization)
- Camera access permissions
## Installation

- Clone the repository:

  ```bash
  git clone https://github.com/MIT-Assistive-Technology/Project-Daredevil.git
  cd Project-Daredevil
  ```

- Create and activate a virtual environment:

  ```bash
  python3 -m venv env
  source env/bin/activate  # On macOS/Linux
  ```

- Install dependencies:

  ```bash
  pip install --upgrade pip
  pip install torch torchvision transformers opencv-python numpy ultralytics PyOpenAL pygame
  ```

- For spatial audio (macOS only), install OpenAL:

  ```bash
  brew install openal-soft
  ```

## Quick Start

```bash
# One command - uses your default settings (camera 1, person bottle)
./main.sh
# Or use the configurable version
./main.sh --camera 1 --classes person bottle --volume 0.3 --confidence 0.3
# Or use the web interface (most user-friendly)
./setup_web.sh # First time only
./web_launch.py  # Then open http://localhost:8080
```

## Usage

```bash
# Full system with detection, depth, and spatial audio
source env/bin/activate && python3 main.py
# Just depth + detection (no audio)
source env/bin/activate && python3 depth/detection_depth_stream.py
# Live depth streaming only
source env/bin/activate && python3 depth/depth_stream.py

# List available cameras
python3 camera/index.py
# Test depth processing
python3 depth/test_depth_integration.py
```

## Project Structure

```
Project-Daredevil/
├── camera/          # Camera streaming module
├── detection/       # Object detection & tracking
├── depth/           # Depth estimation module
├── spatial-audio/   # Spatial audio processing
├── main.py          # Central Python entrypoint
├── main.sh          # Single launch script (configurable)
├── web_launch.py    # Web control panel
├── env/             # Virtual environment
└── README.md        # This file
```

## Performance

- Depth Processing: ~12.8 FPS on Apple Silicon
- Camera Streaming: 30 FPS
- Memory Usage: ~1GB for depth model
- Latency: ~78ms per frame for depth processing
## Development

Project progress is tracked in our GitHub Project Board.
Run integration tests:

```bash
python3 depth/test_depth_integration.py
```

## Modules

### Depth Module

Monocular depth estimation using Hugging Face DPT models.
Key Features:
- Offline-capable depth estimation
- Bounding box ROI processing
- Multiple normalization methods
- Integration-ready interface
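The feature list above is built on a Hugging Face DPT model. As a point of reference, here is a minimal standalone sketch using the `transformers` depth-estimation pipeline; the model name, device selection, and normalization are assumptions, not necessarily what `create_depth_processor` ships with.

```python
# Hedged sketch: standalone DPT depth estimation via the transformers pipeline.
# Model choice ("Intel/dpt-hybrid-midas") and normalization are assumptions.
import cv2
import numpy as np
import torch
from PIL import Image
from transformers import pipeline

device = "mps" if torch.backends.mps.is_available() else "cpu"  # Apple Silicon MPS
depth_pipe = pipeline("depth-estimation", model="Intel/dpt-hybrid-midas", device=device)

frame = cv2.imread("frame.jpg")                                 # any BGR frame from OpenCV
rgb = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

depth_map = np.array(depth_pipe(rgb)["depth"], dtype=np.float32)

# Normalize to the project's 0.0-1.0 depth convention
depth_map = (depth_map - depth_map.min()) / (np.ptp(depth_map) + 1e-8)
```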
Usage:

```python
from depth import DepthProcessor, create_depth_processor

processor = create_depth_processor()
result = processor.get_depth_for_spatial_audio(frame, bbox)
depth_value = result['normalized_depth']  # 0.0 to 1.0
```

### Detection Module

Object detection and tracking (YOLO-based).
Status: ✅ Completed
Focus: Water bottle detection for demo purposes
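For reference, a hedged standalone sketch of bottle detection with Ultralytics YOLO; the weights file and COCO class ID follow standard conventions, though the module's actual configuration may differ.

```python
# Hedged sketch: single-frame YOLO detection filtered to COCO's "bottle" class.
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")      # small pretrained COCO model (assumption)
BOTTLE_CLASS_ID = 39            # "bottle" in the COCO class list

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
cap.release()

results = model(frame)
for box in results[0].boxes:
    if int(box.cls) == BOTTLE_CLASS_ID:
        x1, y1, x2, y2 = map(int, box.xyxy[0])  # [x1, y1, x2, y2] format
        print(f"bottle at ({x1}, {y1}, {x2}, {y2}), conf={float(box.conf):.2f}")
```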
### Spatial Audio Module

Real 3D spatial audio using OpenAL, compatible with Apple AirPods.
Status: ✅ Completed
Features:
- True 3D spatial audio using OpenAL (see the sketch after this list)
- AirPods spatial audio compatible
- Calm white noise for object localization
- Real-time 3D positioning (30-60 Hz)
- Depth-based distance rendering
- Multi-object support (up to 10 simultaneous sources)
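As referenced in the feature list, here is a minimal PyOpenAL positioning sketch; the sound file and coordinates are placeholders, and the module's real source management lives in spatial-audio/.

```python
# Hedged sketch: sweep one looping 3D source left-to-right with PyOpenAL.
# "noise.wav" is a placeholder for the module's white-noise cue.
import time
from openal import oalOpen, oalQuit

source = oalOpen("noise.wav")
source.set_looping(True)
source.play()

# OpenAL's default listener sits at the origin facing -Z, so x pans left/right.
for step in range(60):
    x = -1.0 + step * (2.0 / 59)      # -1.0 (left) -> +1.0 (right)
    source.set_position((x, 0.0, -1.0))
    time.sleep(1 / 30)                # ~30 Hz updates, matching the feature list

source.stop()
oalQuit()
```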
Usage:

```bash
# Test spatial audio standalone
python3 spatial-audio/index.py

# Run full integration with detection + depth + audio
python3 spatial-audio/integration.py
```

See spatial-audio/README.md for full documentation.
## Coordinate System

The system uses a universal coordinate system:
- Origin: Top-left corner (0, 0)
- X-axis: Left to right (0 to frame_width)
- Y-axis: Top to bottom (0 to frame_height)
- Bounding Box Format: [x1, y1, x2, y2]
- Depth Values: Normalized to the 0.0-1.0 range (see the mapping sketch below)
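Tying these conventions together, here is a hedged sketch of how a bounding box and normalized depth might map to a stereo pan and volume; the constants and the depth polarity (0.0 = near) are illustrative assumptions, not the spatial-audio module's actual tuning.

```python
# Hedged sketch: map a detection's bbox + normalized depth to stereo cues.
# The real module uses OpenAL 3D positioning; this shows the geometry only.

def bbox_to_audio_cue(bbox, depth, frame_width=1280):
    """bbox: [x1, y1, x2, y2]; depth: 0.0 (near) to 1.0 (far), an assumed polarity."""
    x1, y1, x2, y2 = bbox
    center_x = (x1 + x2) / 2.0

    # Pan: -1.0 = hard left, 0.0 = center, +1.0 = hard right
    pan = (center_x / frame_width) * 2.0 - 1.0

    # Volume: nearer objects sound louder (simple linear falloff)
    volume = max(0.0, 1.0 - depth)
    return pan, volume

pan, volume = bbox_to_audio_cue([600, 200, 700, 500], depth=0.35)
print(f"pan={pan:+.2f}, volume={volume:.2f}")  # pan=+0.02, volume=0.65
```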
## Phone Camera Setup

### Android: DroidCam

- Install DroidCam:
  - Download DroidCam from the Google Play Store
  - Install DroidCam Server on the Mac from https://droidcam.com
- Setup:
  - Connect via USB or WiFi
  - DroidCam will appear as a camera device
### Android: IP Webcam

- Install IP Webcam:
  - Download IP Webcam from the Google Play Store
- Setup:
  - Start IP Webcam on the phone
  - Note the IP address (e.g., 192.168.1.100:8080)
  - Use IP camera streaming in the camera module (see the sketch below)
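For that last step, OpenCV can read IP Webcam's MJPEG stream directly; the /video path is the app's usual endpoint, and the address below is the example from above.

```python
# Hedged sketch: read frames from the IP Webcam Android app over WiFi.
import cv2

cap = cv2.VideoCapture("http://192.168.1.100:8080/video")  # IP Webcam MJPEG endpoint
if not cap.isOpened():
    raise RuntimeError("Could not open IP camera stream")

while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("IP Webcam", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
        break

cap.release()
cv2.destroyAllWindows()
```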
### iPhone: Continuity Camera

- Enable Continuity Camera:
  - iPhone must be signed into the same Apple ID as the Mac
  - iPhone must be nearby and unlocked
  - For a more reliable connection, connect the iPhone to the Mac directly via cable (with permission granted)
- Usage:
  - Continuity Camera appears as camera index 1
  - Automatically detected by the camera module
Problem: "Could not open camera" Solution:
- Check camera permissions in System Preferences
- Try different camera indices (0, 1, 2)
- Restart camera applications
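A quick way to probe indices is a short OpenCV loop; this is a hedged sketch, and camera/index.py provides the project's own camera lister.

```python
# Hedged sketch: probe the first few camera indices for attached devices.
import cv2

for index in range(4):
    cap = cv2.VideoCapture(index)
    if cap.isOpened():
        ok, _ = cap.read()
        print(f"camera {index}: {'OK' if ok else 'opened, but no frame'}")
    else:
        print(f"camera {index}: not available")
    cap.release()
```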
Problem: Phone camera not detected

Solution:
- Ensure DroidCam is running on the phone
- Check the USB/WiFi connection
- Try the IP camera method

Problem: Model loading fails

Solution:
- Check the internet connection for the first model download
- Verify sufficient disk space
- Check the PyTorch installation

Problem: Low performance

Solution:
- Use smaller frame sizes
- Process only necessary bounding boxes
- Use the median method for depth calculation

Problem: Camera access denied

Solution:
- Go to System Preferences > Security & Privacy > Camera
- Enable camera access for Terminal/Python
- Restart the application
## Acknowledgments

- Hugging Face - DPT depth estimation models
- Ultralytics - YOLO object detection
- OpenCV - Computer vision framework
- Apple - Continuity Camera API
MIT Assistive Technology Club
Fall 2025 Project