# Math Tutor Bot: Installation Guide

This guide will help you install a multimodal Math Tutor Bot using Qwen2.5-Omni-3B. The bot supports:
- Text conversations
- Voice input/output
- Image understanding (diagrams, graphs, etc.)
## Choose Your Installation

### Option A: Full Multimodal (Qwen2.5-Omni-3B)

- Size: ~12GB download
- Features: all capabilities (text, speech, images)
- Requirements: 16GB RAM, 15GB disk space
- Best for: machines with sufficient resources
### Option B: Lightweight (Qwen2.5-1.5B + Kokoro + Whisper)

- Size: ~4GB download
- Features: text generation plus separate TTS/STT models
- Requirements: 8GB RAM, 5GB disk space
- Best for: limited resources or no GPU
## Option A: Full Installation

Choose conda (recommended) or venv.

Using conda:

```bash
conda create -n tutor python=3.11 -y
conda activate tutor
```

Using venv:

```bash
python -m venv tutor
source tutor/bin/activate  # On Windows: tutor\Scripts\activate
```

Install the dependencies (quote `kokoro>=0.9.2` so the shell does not treat `>=` as a redirect):

```bash
pip install torch transformers accelerate
pip install qwen-omni-utils soundfile
pip install openai-whisper
pip install "kokoro>=0.9.2"
```

Run the installation script:
```bash
python install_qwen_full.py
```

This will:
- Download ~12GB of model files (first time only)
- Load the model into memory
- Run a test to verify the installation
- Cache the model in `~/.cache/huggingface/`
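Once the download finishes, you can confirm the cache is populated. Here is a small standard-library sketch; the `hub` subdirectory is the Hugging Face default layout, so adjust the path if you have set `HF_HOME`:

```python
from pathlib import Path

def dir_size_gb(path: str) -> float:
    """Sum the sizes of all files under `path`, in gigabytes."""
    root = Path(path).expanduser()
    if not root.exists():
        return 0.0
    total = sum(f.stat().st_size for f in root.rglob("*") if f.is_file())
    return total / 1e9

if __name__ == "__main__":
    # A complete Qwen2.5-Omni-3B download should land around 12GB here.
    print(f"{dir_size_gb('~/.cache/huggingface/hub'):.1f} GB cached")
```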
### Alternative: Manual Installation
```python
from transformers import Qwen2_5OmniForConditionalGeneration, Qwen2_5OmniProcessor
import torch

# Load model (downloads ~12GB on first run)
model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-Omni-3B",
    torch_dtype="auto",
    device_map="auto",  # uses GPU if available, else CPU
)
processor = Qwen2_5OmniProcessor.from_pretrained("Qwen/Qwen2.5-Omni-3B")
print("Model loaded successfully!")
```

Run the example:

```bash
python simple_tutor_example.py
```

Expected output:
```
Loading model...
Model loaded!
Question: How do I solve 2x + 5 = 13?
Answer: [Model's step-by-step explanation]
```
## Option B: Lightweight Installation

```bash
conda create -n tutor python=3.11 -y
conda activate tutor
```

Install the dependencies (again quoting the version specifier):

```bash
pip install transformers torch
pip install "kokoro>=0.9.2" soundfile
pip install openai-whisper
```

Load the three components:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from kokoro import KPipeline
import whisper

# 1. Text generation: Qwen2.5-1.5B (~3GB)
text_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-1.5B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")

# 2. Text-to-Speech: Kokoro (~200MB)
tts = KPipeline(lang_code='a')

# 3. Speech-to-Text: Whisper tiny (~150MB)
stt = whisper.load_model("tiny")
```

Run the combined script:

```bash
python full_tutor_lightweight.py
```

If you have limited GPU memory, use quantization:
```bash
pip install bitsandbytes
python install_qwen_with_quantization.py
```

This reduces memory usage from ~12GB to ~3GB.
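The reduction follows from parameter count times bytes per weight. A back-of-the-envelope sketch (weight memory only; real usage adds activations and framework overhead):

```python
def model_weight_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GB for a model with n_params parameters."""
    return n_params * bits_per_weight / 8 / 1e9

n = 3e9  # ~3B parameters
print(f"fp32:  {model_weight_gb(n, 32):.1f} GB")  # 12.0 GB
print(f"fp16:  {model_weight_gb(n, 16):.1f} GB")  # 6.0 GB
print(f"4-bit: {model_weight_gb(n, 4):.1f} GB")   # 1.5 GB
```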
Manual quantization:

```python
import torch
from transformers import BitsAndBytesConfig, Qwen2_5OmniForConditionalGeneration

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-Omni-3B",
    quantization_config=quantization_config,
    device_map="auto",
)
```

## System Requirements

### Option A (Full)

- OS: Windows, macOS, or Linux
- Python: 3.10 or 3.11
- RAM: 16GB minimum (8GB with quantization)
- Disk: 15GB free space
- GPU: Optional (8GB VRAM recommended for better performance)
- Internet: For initial download only
### Option B (Lightweight)

- OS: Windows, macOS, or Linux
- Python: 3.10 or 3.11
- RAM: 8GB minimum
- Disk: 5GB free space
- GPU: Optional
- Internet: For initial download only
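You can sanity-check disk space and RAM from Python before committing to a download. A standard-library sketch; the thresholds come from the requirement lists above, and the RAM check relies on `os.sysconf`, which is POSIX-only:

```python
import os
import shutil

def free_disk_gb(path: str = ".") -> float:
    """Free disk space at `path`, in GB."""
    return shutil.disk_usage(path).free / 1e9

def total_ram_gb() -> float:
    """Total physical RAM in GB (POSIX only)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9

if __name__ == "__main__":
    disk, ram = free_disk_gb(), total_ram_gb()
    print(f"Disk free: {disk:.1f} GB, RAM: {ram:.1f} GB")
    if ram >= 16 and disk >= 15:
        print("Option A (full) should fit.")
    elif ram >= 8 and disk >= 5:
        print("Option B (lightweight) should fit.")
    else:
        print("Below the minimum requirements for either option.")
```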
## Verify the Installation

Create `test.py`:
```python
from transformers import Qwen2_5OmniForConditionalGeneration, Qwen2_5OmniProcessor

print("Loading model...")
model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-Omni-3B",
    torch_dtype="auto",
    device_map="auto",
)
processor = Qwen2_5OmniProcessor.from_pretrained("Qwen/Qwen2.5-Omni-3B")
print("✓ Model loaded!")

conversation = [{"role": "user", "content": "What is 2+2?"}]
text_input = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
inputs = processor(text=text_input, return_tensors="pt").to(model.device)
generated_ids = model.generate(**inputs, max_new_tokens=50)
response = processor.batch_decode(generated_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0]
print("Test: What is 2+2?")
print(f"Answer: {response}")
print("\n✓ Everything works!")
```

Run:
```bash
python test.py
```

## Troubleshooting

### Out of memory

Solutions:
- Use quantization: `python install_qwen_with_quantization.py`
- Use the lightweight version (see Option B above)
- Close other applications
- Use a smaller model: Qwen2.5-1.5B instead of 3B
### Import errors (module not found)

Solution:

```bash
pip install transformers accelerate
```

### Slow model download

Solutions:
- Be patient: 12GB can take 30-60 minutes
- Check your internet connection
- The download resumes if interrupted
- Models are cached in `~/.cache/huggingface/`
### CUDA / GPU errors

Solutions:

```python
# Use CPU instead
device_map="cpu"

# Or use half precision
torch_dtype=torch.float16
```

### Model download fails

Solution:
- Check your internet connection
- Verify that HuggingFace is accessible
- Try a manual download from https://huggingface.co/Qwen/Qwen2.5-Omni-3B

## Next Steps

After installation, try these examples:
```bash
python simple_tutor_example.py
python qwen_usage_guide.py
python test_qwen_omni.py
python test_kokoro.py
```

Then:
- Test basic functionality: run `simple_tutor_example.py`
- Explore all features: read `qwen_usage_guide.py`
- Customize the system prompt: edit the teaching style
- Build a frontend: web app, desktop app, or CLI
- Add features: progress tracking, quizzes, etc.
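Customizing the teaching style usually means prepending a system message to each conversation. A minimal sketch; the prompt text and helper name are illustrative, not from the project files:

```python
# Hypothetical system prompt for a patient, step-by-step math tutor.
SYSTEM_PROMPT = (
    "You are a friendly math tutor. Explain each step, "
    "ask guiding questions, and never just give the final answer."
)

def build_conversation(question: str) -> list[dict]:
    """Wrap a student question in the chat format used by the examples above."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]

conv = build_conversation("How do I solve 2x + 5 = 13?")
print(conv[0]["role"])  # system
```

The resulting list can be passed to `processor.apply_chat_template` exactly like the user-only conversation in `test.py`.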
## Installation Checklist

- Python environment created and activated
- All dependencies installed via pip
- Model downloaded successfully (~12GB or ~4GB)
- Test script runs without errors
- Model responds to "What is 2+2?" correctly
- Example scripts execute successfully
## Quick Reference

### Option A: Full Installation

```bash
# Create environment
conda create -n tutor python=3.11 -y
conda activate tutor

# Install dependencies
pip install torch transformers accelerate
pip install qwen-omni-utils soundfile
pip install openai-whisper
pip install "kokoro>=0.9.2"

# Install model
python install_qwen_full.py

# Test installation
python simple_tutor_example.py
```

### Option B: Lightweight Installation

```bash
# Create environment
conda create -n tutor python=3.11 -y
conda activate tutor

# Install dependencies
pip install transformers torch
pip install "kokoro>=0.9.2" soundfile
pip install openai-whisper

# Run lightweight version
python full_tutor_lightweight.py
```

## Resources

- Qwen2.5-Omni Documentation: https://github.com/QwenLM/Qwen2.5-Omni
- Model on HuggingFace: https://huggingface.co/Qwen/Qwen2.5-Omni-3B
- Kokoro TTS: https://huggingface.co/hexgrad/Kokoro-82M
- Whisper STT: https://github.com/openai/whisper
- Transformers Docs: https://huggingface.co/docs/transformers
## Getting Help

If you encounter issues:
- Check the troubleshooting section above
- Verify system requirements
- Review error messages carefully
- Check the example scripts for reference
- Ensure all dependencies are installed
You're ready to build your Math Tutor Bot!