WorldLens — AI Vision & Voice Assistant

WorldLens is a real-time assistive application designed to help visually impaired individuals navigate and understand their environment. Powered by Amazon Nova Sonic and Amazon Nova Lite, it provides a low-latency, voice-first interface for grocery shopping, document reading, medication safety, and general environmental awareness.

🏗️ Architecture

WorldLens uses a hybrid architecture to balance fast, static frontend delivery with robust, persistent bidirectional communication.

```mermaid
graph TD
    Client[Next.js Web App] -->|HTTPS / POST| API_Routes[Next.js API Routes / Amplify]
    API_Routes -->|Vision/Grounding| Bedrock_Lite[Amazon Nova Lite]
    API_Routes -->|State| DDB[DynamoDB]

    Client <-->|SDK / Bidirectional Stream| Bedrock_Sonic[Amazon Nova Sonic]
    Client -->|Auth| Cognito[Cognito Identity Pool]
```

Key Components:

- Frontend (Next.js): Hosted on AWS Amplify, providing a responsive UI with real-time VAD (Voice Activity Detection), motion sensing, and direct Bedrock streaming.
- Cognito Integration: Provides temporary AWS credentials via an Identity Pool for secure browser-side access to Bedrock and S3.
- BFF (Backend-for-Frontend) API: Server-side Next.js routes that securely proxy Bedrock vision analysis, grounding requests, and DynamoDB state management.
- Real-Time Pipeline: Direct bidirectional streaming between the client and Amazon Nova Sonic on Amazon Bedrock for sub-second voice latency.
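
To make the BFF idea concrete, here is a minimal sketch of how a server-side route might assemble a vision request for Nova Lite before sending it to Bedrock. It models only the part of the Bedrock Converse request shape that it needs. The model ID, the per-mode prompts, the `buildVisionRequest` name, and the inference settings are all assumptions for illustration, not the project's actual code.

```typescript
// Image formats accepted by the Bedrock Converse API.
type ImageFormat = "jpeg" | "png" | "gif" | "webp";

// The four WorldLens vision modes described in this README.
type VisionMode = "grocery" | "document" | "medication" | "environment";

// Subset of the Converse request shape used by this sketch.
type ContentBlock =
  | { text: string }
  | { image: { format: ImageFormat; source: { bytes: Uint8Array } } };

interface ConverseRequest {
  modelId: string;
  messages: { role: "user" | "assistant"; content: ContentBlock[] }[];
  inferenceConfig: { maxTokens: number; temperature: number };
}

// Assumption: base model ID for Nova Lite. Some regions may require an
// inference profile ID instead.
const NOVA_LITE_MODEL_ID = "amazon.nova-lite-v1:0";

// Hypothetical prompts; the project's real prompts are not shown here.
const MODE_PROMPTS: Record<VisionMode, string> = {
  grocery: "List the products, brands, and prices visible in this image.",
  document: "Read the text in this document and summarize its purpose.",
  medication: "Read the medicine name and dosage exactly as printed.",
  environment: "Describe hazards and the overall scene in this image.",
};

// Builds the request body on the server, so prompts and model choice do not
// depend on client input beyond the mode and the image itself.
function buildVisionRequest(
  mode: VisionMode,
  imageBytes: Uint8Array,
  format: ImageFormat = "jpeg",
): ConverseRequest {
  if (imageBytes.length === 0) {
    throw new Error("Image payload is empty");
  }
  return {
    modelId: NOVA_LITE_MODEL_ID,
    messages: [
      {
        role: "user",
        content: [
          { text: MODE_PROMPTS[mode] },
          { image: { format, source: { bytes: imageBytes } } },
        ],
      },
    ],
    // Hypothetical settings: a low temperature to keep label readings literal.
    inferenceConfig: { maxTokens: 512, temperature: 0.1 },
  };
}
```

In a route handler, the returned object could be passed to `ConverseCommand` from `@aws-sdk/client-bedrock-runtime` and sent with a `BedrockRuntimeClient`. Keeping the builder free of SDK imports makes it easy to unit test.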

✨ Features

🎙️ Real-Time Voice Assistant: Natural, bidirectional voice interaction using Nova Sonic.
👁️ Multimodal Vision Modes:
- Grocery Mode: Identifies products, brands, and prices in a shopping aisle.
- Document Mode: High-precision OCR and document analysis for bills, letters, and forms.
- Medication Mode: Critical safety checks for dosage and medicine names.
- Environment Mode: Identifies hazards (traffic lights, obstacles) and scene context.
🧠 Stable Session Memory: Remembers what has been seen across the session to provide context-aware proactive suggestions.
⚡ Proactive Insights: Automatically alerts users to hazards or relevant products based on their current "goal."
📊 Real-Time VAD & Motion Detection: Intelligently captures frames only when significant motion or speech is detected.
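
The capture gating described in the last feature can be approximated with two simple signals: frame-to-frame luminance change for motion, and RMS energy for speech. The sketch below is one way this could look, not the project's actual detector. The function names and both default thresholds are hypothetical and would need tuning on real camera and microphone input.

```typescript
// Mean absolute luminance difference between two RGBA frames (range 0..255).
function frameDifference(prev: Uint8ClampedArray, curr: Uint8ClampedArray): number {
  if (prev.length !== curr.length || prev.length % 4 !== 0) {
    throw new Error("Frames must be RGBA buffers of equal length");
  }
  const pixelCount = prev.length / 4;
  if (pixelCount === 0) return 0;
  let total = 0;
  for (let i = 0; i < prev.length; i += 4) {
    // Rec. 601 luma weights; the alpha channel is ignored.
    const lumaPrev = 0.299 * prev[i] + 0.587 * prev[i + 1] + 0.114 * prev[i + 2];
    const lumaCurr = 0.299 * curr[i] + 0.587 * curr[i + 1] + 0.114 * curr[i + 2];
    total += Math.abs(lumaCurr - lumaPrev);
  }
  return total / pixelCount;
}

// Root-mean-square energy of PCM samples normalized to [-1, 1].
function rmsEnergy(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Returns true when a frame is worth sending for analysis.
// Both thresholds are hypothetical defaults.
function shouldCaptureFrame(
  prev: Uint8ClampedArray | null,
  curr: Uint8ClampedArray,
  audio: Float32Array,
  motionThreshold: number = 12,
  speechThreshold: number = 0.02,
): boolean {
  // Always capture the first frame, since there is nothing to compare against.
  if (prev === null) return true;
  const moved = frameDifference(prev, curr) >= motionThreshold;
  const speaking = rmsEnergy(audio) >= speechThreshold;
  return moved || speaking;
}
```

In a browser, the RGBA buffers could come from `CanvasRenderingContext2D.getImageData(...).data` and the audio samples from an `AnalyserNode` or an audio worklet. An energy threshold is the simplest form of voice activity detection and is more sensitive to background noise than model-based detectors.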

🚀 Getting Started

Prerequisites

- AWS Account with access to Amazon Bedrock (Nova Sonic and Nova Lite models enabled in us-east-1 or your preferred region).
- AWS CLI configured with administrative credentials.
- Node.js 22+ installed locally.

Configuring AWS Credentials

WorldLens implements Zero-Touch IAM. You do NOT need to manually create users or access keys in the AWS Console for the application to function.

Configure your "Bootstrap" Identity: Ensure your local machine has an AWS profile with administrative permissions (needed for the initial deployment).
```
aws configure
```
Run the Deploy Script:
```
./scripts/deploy.sh
```
What this does automatically:
- Bootstraps your AWS environment (if needed).
- Deploys the main CDK stack.
- Creates a dedicated LocalDevUser with scoped permissions (Bedrock, DynamoDB).
- Generates an Access Key/Secret for this user.
- Extracts the credentials and the WebSocket URL.
- Automatically generates a .env.local file with all required configuration.

After running the script, your environment is fully configured and ready for local development.
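
Because the application depends on the generated .env.local file, it can help to fail fast when a value is missing. The following is a minimal sketch of such a check. The variable names in `REQUIRED_KEYS` are assumptions (the generated file may use different ones), and `loadConfig` is a hypothetical helper, not part of the project.

```typescript
// Assumed variable names; check the generated .env.local for the real ones.
const REQUIRED_KEYS = [
  "AWS_REGION",
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
  "NEXT_PUBLIC_WS_URL",
] as const;

type RequiredKey = (typeof REQUIRED_KEYS)[number];
type AppConfig = Record<RequiredKey, string>;

// Validates that every required variable is present and non-blank.
// Takes the environment as a parameter so it can be tested without
// touching the real process environment.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const missing = REQUIRED_KEYS.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
  if (missing.length > 0) {
    throw new Error(
      `Missing configuration: ${missing.join(", ")}. ` +
        "Re-run ./scripts/deploy.sh to regenerate .env.local.",
    );
  }
  const config = {} as AppConfig;
  for (const key of REQUIRED_KEYS) {
    config[key] = env[key] as string;
  }
  return config;
}
```

On the server, this could be called once at startup as `loadConfig(process.env)`. Note that Next.js only exposes variables prefixed with `NEXT_PUBLIC_` to browser code, so secrets should stay unprefixed.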

Installation

Clone the repository:
```
git clone <repo-url>
cd world-lens
```
Install dependencies:
```
npm install
```
Deploy Infrastructure: WorldLens uses AWS CDK to provision the voice pipeline and DynamoDB tables.
```
chmod +x scripts/deploy.sh
./scripts/deploy.sh
```
This is the same script described under Configuring AWS Credentials. It bootstraps CDK, deploys the stack, and generates your .env.local file with the correct WebSocket URL and region.

🛠️ Usage

Running Locally

```
npm run dev
```

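Open http://localhost:3000 in your browser. To test the camera features on a mobile device, serve the app over HTTPS or through a tunnel to your local server, because browsers only allow camera access in secure contexts (HTTPS or localhost).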

Application Modes

- Grocery: Set a goal like "Find healthy cereal." The AI will chime and speak when it sees a match.
- Document: Point at a document; the AI will read it out and allow you to ask questions.
- Medication: Scan a medicine bottle for safety verification and dosage info.
- Environment: Continuous safety monitoring for hazards and scene orientation.

Voice Sessions

- Tap "Start Voice Session" to establish the long-lived connection.
- Use the Mic button to toggle active listening.
- Watch the Transcript overlay for real-time speech-to-text feedback.

Debug Panel

Enable the Debug Panel (via the "Show Debug" link at the bottom) to monitor:

- Memory: Live state of objects correctly seen.
- Latency: Real-time processing speed for vision and voice.
- Tool Calls: Inspect the internal reasoning of the AI.
- WebSocket Status: Verification of the persistent backend connection.
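
To illustrate what the Memory view might be showing, here is a small sketch of a session memory that tracks seen objects and drops them from the live view after a period without sightings. The `SessionMemory` class, its field names, and the 30-second default are hypothetical; the README does not describe the project's actual data model.

```typescript
interface SeenObject {
  label: string;
  mode: string;
  firstSeenAt: number; // epoch milliseconds
  lastSeenAt: number; // epoch milliseconds
  sightings: number;
}

class SessionMemory {
  private readonly objects = new Map<string, SeenObject>();
  private readonly ttlMs: number;

  // ttlMs: how long an object stays "live" after its last sighting.
  constructor(ttlMs: number = 30000) {
    this.ttlMs = ttlMs;
  }

  // Records a sighting, merging labels that differ only by case or padding.
  observe(label: string, mode: string, now: number = Date.now()): SeenObject {
    const key = label.trim().toLowerCase();
    const existing = this.objects.get(key);
    if (existing) {
      existing.lastSeenAt = now;
      existing.mode = mode;
      existing.sightings += 1;
      return existing;
    }
    const created: SeenObject = {
      label: label.trim(),
      mode,
      firstSeenAt: now,
      lastSeenAt: now,
      sightings: 1,
    };
    this.objects.set(key, created);
    return created;
  }

  // Returns the objects seen within the last ttlMs milliseconds.
  current(now: number = Date.now()): SeenObject[] {
    const live: SeenObject[] = [];
    this.objects.forEach((entry) => {
      if (now - entry.lastSeenAt <= this.ttlMs) {
        live.push(entry);
      }
    });
    return live;
  }
}
```

Passing `now` explicitly keeps the class deterministic in tests. A production version would likely persist entries, for example in DynamoDB, rather than holding them only in memory.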

🧪 Testing

The project includes a comprehensive test suite (45 suites, 118 tests) covering orchestrators, safety guards, and UI components.

```
npm run test
```

🗑️ Cleanup

To remove all AWS resources provisioned by the project:

```
chmod +x scripts/teardown.sh
./scripts/teardown.sh
```

🛡️ Safety & Privacy

- WorldLens is designed as an assistive tool, not a medical or safety-critical life-support system.
- Images are processed in-memory and NOT stored by the backend.
- Blurry or low-confidence results trigger a "Retry" prompt to avoid hallucinations in sensitive contexts (like medication dosage).
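
The retry behavior can be pictured as a guard that sits between the vision result and the spoken response. This is a minimal sketch under stated assumptions: the `VisionResult` fields, the `guardResult` name, and every threshold value are hypothetical, chosen only to show a stricter bar for medication than for other modes.

```typescript
type VisionMode = "grocery" | "document" | "medication" | "environment";

interface VisionResult {
  text: string;
  confidence: number; // 0 (no confidence) to 1 (certain); assumed field
  blurScore: number; // 0 (sharp) to 1 (very blurry); assumed field
}

type GuardDecision =
  | { action: "speak"; text: string }
  | { action: "retry"; reason: string };

// Hypothetical thresholds: medication requires the highest confidence.
const MIN_CONFIDENCE: Record<VisionMode, number> = {
  grocery: 0.6,
  document: 0.7,
  medication: 0.9,
  environment: 0.6,
};
const MAX_BLUR = 0.5;

// Decides whether a result is safe to read aloud or should trigger a retry.
function guardResult(mode: VisionMode, result: VisionResult): GuardDecision {
  if (!Number.isFinite(result.confidence) || !Number.isFinite(result.blurScore)) {
    return { action: "retry", reason: "Result scores are missing or invalid." };
  }
  if (result.blurScore > MAX_BLUR) {
    return { action: "retry", reason: "The image is too blurry. Please hold the camera steady." };
  }
  if (result.confidence < MIN_CONFIDENCE[mode]) {
    return { action: "retry", reason: "I could not read that clearly. Please try again." };
  }
  return { action: "speak", text: result.text };
}
```

For example, a reading with confidence 0.8 would be spoken in Grocery mode but rejected in Medication mode, where a misread dosage is far more costly than an extra retry.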

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
