A voice-first conversational AI application that enables real-time voice interaction with an AI agent using Microsoft's DialoGPT model.
This Semester 7 MVP validates the core concept of voice-driven AI conversation with the following key features:
- Voice Input: Web Speech API for real-time speech-to-text
- AI Processing: Microsoft DialoGPT-small for conversation generation
- Voice Output: Browser Speech Synthesis for text-to-speech
- Conversation History: SQLite database for persistent chat storage
- Modern UI: Responsive web interface with real-time feedback
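The conversation-history feature can be sketched with Python's built-in `sqlite3`. The table and column names below (`messages`, `session_id`, `role`, `content`) are assumptions for illustration only; the project's actual schema lives in `backend/database.py`:

```python
import sqlite3

# Hypothetical schema -- the real one is defined in backend/database.py.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    role       TEXT NOT NULL,           -- 'user' or 'bot'
    content    TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""

def save_message(conn, session_id, role, content):
    """Append one chat turn to the history table."""
    conn.execute(
        "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content),
    )
    conn.commit()

def get_history(conn, session_id):
    """Return the session's messages in insertion order."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    )
    return [{"role": r, "content": c} for r, c in rows]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory DB for the demo
    conn.executescript(SCHEMA)
    save_message(conn, "s1", "user", "Hello!")
    save_message(conn, "s1", "bot", "Hi there!")
    print(get_history(conn, "s1"))
```

Using a file path instead of `":memory:"` is what makes the history survive server restarts.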
- Python 3.8+ (recommended: Python 3.10+)
- Modern web browser with Web Speech API support (Chrome, Edge, Safari)
- Microphone access for voice input
1. Clone and navigate to the project:

   ```bash
   git clone <repository-url>
   cd ConversAI-MVP
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv venv

   # On Windows:
   venv\Scripts\activate

   # On macOS/Linux:
   source venv/bin/activate
   ```

3. Install backend dependencies:

   ```bash
   cd backend
   pip install -r requirements.txt
   ```

4. Initialize the database:

   ```bash
   python database.py
   ```

5. Test the model (optional but recommended):

   ```bash
   python test_model.py
   ```

6. Start the backend server:

   ```bash
   python app.py
   ```

7. Open the frontend:
   - Navigate to `http://localhost:5000` in your browser
   - Or open `frontend/index.html` directly in your browser
- Click the microphone button 🎤
- Speak clearly into your microphone
- Wait for the AI to process and respond
- The AI will respond both in text and voice
- If voice is not supported, use the text input box
- Type your message and press Enter or click Send
- Receive text and voice responses
```
Frontend (HTML/CSS/JS)
├── Web Speech API (STT)
├── Chat Interface
└── Speech Synthesis (TTS)

Backend (Python/Flask)
├── REST API (/api/chat)
├── DialoGPT Model
└── SQLite Database
```

Data Flow:

```
Browser Voice → STT → Backend → DialoGPT → Response → TTS → Browser Voice
```
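The server side of this flow can be sketched as plain Python: a transcript comes in, the model produces a reply, and both sides of the turn are appended to the history. `generate_reply` below is a stub standing in for the real DialoGPT call in `backend/model.py`; the function names are illustrative, not the project's actual API:

```python
# One chat turn of the Browser -> STT -> Backend -> DialoGPT -> TTS loop,
# with the model call stubbed out so the sketch runs without downloads.

def generate_reply(message: str, history: list[str]) -> str:
    """Stand-in for DialoGPT generation (see backend/model.py)."""
    return f"(reply to '{message}' with {len(history)} prior turns)"

def chat_turn(transcript: str, history: list[str]) -> tuple[str, list[str]]:
    """Process one transcribed utterance and return the reply plus updated history."""
    reply = generate_reply(transcript, history)
    history = history + [transcript, reply]  # persist both sides of the turn
    return reply, history

history: list[str] = []
reply, history = chat_turn("Hello there", history)
```

In the real app, the updated history is what gets written to SQLite and fed back into the model as context on the next turn.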
```
ConversAI-MVP/
├── frontend/
│   ├── index.html       # Main UI
│   ├── style.css        # Styling
│   └── script.js        # Web Speech API & chat logic
├── backend/
│   ├── app.py           # Flask server
│   ├── model.py         # DialoGPT integration
│   ├── database.py      # SQLite operations
│   ├── test_model.py    # Model testing
│   └── requirements.txt # Python dependencies
├── .gitignore
└── README.md
```
- `GET /` - Serves the frontend
- `POST /api/chat` - Main chat endpoint
  - Input: `{"message": "user input", "session_id": "optional"}`
  - Output: `{"reply": "bot response", "session_id": "session_id"}`
- `GET /api/history/<session_id>` - Get conversation history
- `GET /api/health` - Health check
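A small stdlib-only client is a quick way to exercise `/api/chat` while the server from `backend/app.py` is running on `localhost:5000`. The `build_chat_request` helper is invented for this example, not part of the project:

```python
import json
import urllib.request

API_URL = "http://localhost:5000/api/chat"

def build_chat_request(message, session_id=None):
    """Build a POST request matching the /api/chat input shape shown above."""
    payload = {"message": message}
    if session_id:
        payload["session_id"] = session_id
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("Hello!")
# Sending it (server must be running):
#   with urllib.request.urlopen(req) as resp:
#       print(json.loads(resp.read())["reply"])
```

`curl -X POST http://localhost:5000/api/chat -H "Content-Type: application/json" -d '{"message": "Hello!"}'` does the same thing from a shell.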
```bash
cd backend
python test_model.py
```

- Backend starts without errors
- Frontend loads in browser
- Microphone button responds to clicks
- Voice input is captured and transcribed
- AI generates relevant responses
- Text-to-speech works
- Conversation history is saved to database
- Multiple conversation turns work
- Error handling works (try speaking when mic is off)
Edit `backend/model.py` to adjust:

- `max_length`: Maximum response length
- `temperature`: Response creativity (0.1-1.0)
- `top_k`: Vocabulary diversity
- `top_p`: Nucleus sampling
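What these knobs do can be demonstrated with a small self-contained sampler over toy logits. This is the standard temperature / top-k / top-p (nucleus) logic in miniature, not the project's actual `model.py` code, which delegates sampling to the transformers library:

```python
import math
import random

def sample_next_token(logits, temperature=0.7, top_k=50, top_p=0.9, rng=None):
    """Sample one token index from raw logits using temperature, top-k, then top-p."""
    rng = rng or random.Random()
    # Temperature: <1.0 sharpens the distribution, >1.0 flattens it.
    scaled = [(i, l / temperature) for i, l in enumerate(logits)]
    # Top-k: keep only the k highest-scoring tokens.
    scaled.sort(key=lambda p: p[1], reverse=True)
    scaled = scaled[:top_k]
    # Softmax over the survivors (shifted by the max for numerical stability).
    m = max(s for _, s in scaled)
    exps = [(i, math.exp(s - m)) for i, s in scaled]
    total = sum(e for _, e in exps)
    probs = [(i, e / total) for i, e in exps]
    # Top-p (nucleus): keep the smallest prefix whose cumulative mass reaches top_p.
    kept, mass = [], 0.0
    for i, p in probs:
        kept.append((i, p))
        mass += p
        if mass >= top_p:
            break
    # Renormalize the nucleus and draw from it.
    norm = sum(p for _, p in kept)
    r, acc = rng.random() * norm, 0.0
    for i, p in kept:
        acc += p
        if r <= acc:
            return i
    return kept[-1][0]

token = sample_next_token([2.0, 1.0, 0.5, -1.0], temperature=0.7, top_k=3, top_p=0.9)
```

Lowering `temperature` or `top_k` makes the pick concentrate on the highest-logit token; raising `top_p` toward 1.0 widens the nucleus and admits more diverse choices.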
Edit `frontend/script.js` to adjust:

- `recognition.lang`: Language setting
- `utterance.rate`: Speech speed
- `utterance.pitch`: Voice pitch
"Model not loading"
- Ensure you have sufficient RAM (2GB+ recommended)
- Check internet connection for model download
- Try running `python test_model.py` to diagnose
"Speech recognition not working"
- Check microphone permissions in browser
- Use HTTPS or localhost (required for Web Speech API)
- Try Chrome or Edge browser
"Backend connection failed"
- Ensure Flask server is running on port 5000
- Check for firewall blocking
- Verify CORS settings in `app.py`
"Empty responses"
- Check model loading in backend logs
- Try simpler inputs first
- Verify database is initialized
- First run: Model download may take 5-10 minutes
- Response time: 2-6 seconds for local inference
- Memory usage: ~2GB RAM for DialoGPT-small
- Browser: Chrome/Edge recommended for best Web Speech API support
- ✅ Voice input captured and transcribed accurately
- ✅ AI responses generated within 2-6 seconds
- ✅ Text-to-speech working smoothly
- ✅ Conversation history persisted
- ✅ Multiple conversation turns maintained
- ✅ Error handling graceful
- Avatar Integration: 3D/2D digital human rendering
- Advanced Models: GPT-3.5/4, Claude, or local fine-tuned models
- Multi-language Support: Internationalization
- Emotion Detection: Voice emotion analysis
- Vector Database: Pinecone for long-term memory
- Real-time Streaming: WebSocket for faster responses
- Mobile App: React Native or Flutter
- Cloud Deployment: AWS/Azure with auto-scaling
- Backend: FastAPI for better performance
- Frontend: React/Vue.js for complex UI
- Database: PostgreSQL for production scale
- Caching: Redis for session management
- Monitoring: Prometheus + Grafana
- CI/CD: GitHub Actions for automated testing
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Product Manager: ConversAI MVP Team
- Technical Lead: ConversAI MVP Team
- Code Generator: Cursor AI Assistant
For issues and questions:
- Check the troubleshooting section above
- Review the test results from `python test_model.py`
- Check browser console for frontend errors
- Check backend logs for server errors
Happy Conversing! 🤖💬
Built with ❤️ for Semester 7 MVP validation