A real-time voice-based AI tutor built with Pipecat, Sarvam AI (Indian-language STT/TTS), and Google Gemini 2.5 Pro (reasoning LLM).
Based on the Sarvam AI Tutor Agent cookbook.
Student Audio → Sarvam STT (Saaras v3) → Gemini 2.5 Pro → Sarvam TTS (Bulbul v3) → Audio Output
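The flow above is a linear chain of stages, each consuming the previous stage's output. The sketch below illustrates only that shape, using stand-in functions; `fake_stt`, `fake_llm`, `fake_tts`, and `run_pipeline` are hypothetical names and do not call Pipecat, Sarvam, or Gemini.

```python
import asyncio
from typing import Awaitable, Callable, List

# A stage takes the previous stage's output and returns its own.
Stage = Callable[[str], Awaitable[str]]


async def fake_stt(audio: str) -> str:
    # Stand-in for speech-to-text: treat the "audio" as its own transcript.
    return audio.strip()


async def fake_llm(transcript: str) -> str:
    # Stand-in for the LLM: wrap the transcript in a canned reply.
    return f"Tutor reply to: {transcript}"


async def fake_tts(reply: str) -> str:
    # Stand-in for text-to-speech: tag the reply as synthesized audio.
    return f"<audio:{reply}>"


async def run_pipeline(stages: List[Stage], item: str) -> str:
    # Pass the item through each stage in order.
    for stage in stages:
        item = await stage(item)
    return item


def demo(utterance: str) -> str:
    return asyncio.run(run_pipeline([fake_stt, fake_llm, fake_tts], utterance))
```

In the real agent each stage streams frames rather than returning a single string, so treat this as a picture of the ordering, not of the runtime behavior.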
- 🗣️ Multilingual speech recognition — auto-detects Indian languages
- 🧠 Gemini 2.5 Pro reasoning — strong problem-solving for math, science, and more
- 🔊 Natural Indian-English voice — Sarvam Bulbul v3 with clear articulation
- 📚 Multi-subject tutor — Maths, Science, Languages, Social Studies
- 🎯 Adaptive teaching — adjusts explanations to student level
- 🎤 Browser UI — beautiful mic mute/unmute interface with live transcript
- Python 3.9+
- API keys from:
- Sarvam AI — STT & TTS
- Google AI Studio — Gemini LLM
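Since the agent needs both keys, a startup check can fail fast with a clear message when one is missing. This is a minimal sketch: the variable names `SARVAM_API_KEY` and `GOOGLE_API_KEY` are assumptions, so use whatever names `.env.example` actually lists.

```python
import os
from typing import List, Mapping, Sequence

# Assumed variable names; replace with the names from .env.example.
REQUIRED_KEYS = ("SARVAM_API_KEY", "GOOGLE_API_KEY")


def missing_keys(env: Mapping[str, str],
                 required: Sequence[str] = REQUIRED_KEYS) -> List[str]:
    # A key counts as missing if it is absent or blank.
    return [name for name in required if not env.get(name, "").strip()]


def check_environment(env: Mapping[str, str] = os.environ) -> None:
    # Raise before any service is constructed if a key is not set.
    missing = missing_keys(env)
    if missing:
        raise RuntimeError("Missing API keys: " + ", ".join(missing))
```

Calling `check_environment()` once at the top of the entry point would surface a missing key before any connection is attempted.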
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add your real API keys
python3 server.py
This starts the FastAPI server, which:
- Serves the web UI at http://localhost:7860
- Handles WebRTC signaling at /api/offer
- Spawns the tutor bot for each new connection
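To make the signaling step concrete, the sketch below builds (but does not send) the kind of request a client might post to `/api/offer`. The JSON body with `sdp` and `type` fields is an assumption based on the usual WebRTC offer shape, not something this README specifies, and `build_offer_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_offer_request(
    sdp: str, base_url: str = "http://localhost:7860"
) -> urllib.request.Request:
    # Assumed payload shape: a standard WebRTC session description.
    body = json.dumps({"sdp": sdp, "type": "offer"}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/offer",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The browser UI normally performs this exchange for you; a helper like this is only useful for poking at the endpoint from a script.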
- Open http://localhost:7860 in your browser
- Click "Connect to Tutor"
- Click the mic button to unmute
- Start speaking — the tutor will respond!
tutor_agent/
├── tutor_agent.py # Main agent — Pipecat pipeline (Sarvam + Gemini)
├── static/
│ └── index.html # Browser UI with mic button
├── requirements.txt # Python dependencies
├── .env.example # API key template
└── README.md
Edit tutor_agent.py:
# Hindi tutor
stt = SarvamSTTService(..., language="hi-IN")
tts = SarvamTTSService(..., target_language_code="hi-IN", speaker="simran")
Supported language codes: en-IN, hi-IN, bn-IN, ta-IN, te-IN, gu-IN, kn-IN, ml-IN, mr-IN, pa-IN, od-IN, unknown (auto-detect)
- Female: Ritu, Priya, Neha, Pooja, Simran, Kavya, Ishita (default), Shreya, Roopa, and more
- Male: Shubh, Aditya, Rahul, Rohan, Amit, Dev, and more
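When switching languages or voices, it can help to validate the values before passing them to the services. The helper below is a hypothetical sketch: `tutor_voice_config` is not part of the project, and it assumes that `unknown` (auto-detect) is meaningful only for STT and that speaker IDs are lowercase, as in the `speaker="simran"` example.

```python
from typing import Dict

SUPPORTED_LANGUAGES = (
    "en-IN", "hi-IN", "bn-IN", "ta-IN", "te-IN", "gu-IN",
    "kn-IN", "ml-IN", "mr-IN", "pa-IN", "od-IN", "unknown",
)


def tutor_voice_config(stt_language: str, tts_language: str,
                       speaker: str) -> Dict[str, Dict[str, str]]:
    # STT may use any listed code, including "unknown" for auto-detect.
    if stt_language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported STT language: {stt_language}")
    # Assumption: TTS needs a concrete language, so "unknown" is rejected.
    if tts_language not in SUPPORTED_LANGUAGES or tts_language == "unknown":
        raise ValueError(f"Unsupported TTS language: {tts_language}")
    return {
        "stt": {"language": stt_language},
        "tts": {
            "target_language_code": tts_language,
            # Assumption: speaker IDs are lowercase, as in the example above.
            "speaker": speaker.strip().lower(),
        },
    }
```

The returned dictionaries use the same keyword names as the snippet above, so they could be unpacked into the service constructors alongside the other arguments.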