Commit f0ba8f0

feat: add model selection, replace tinyllama with llama3.2 and qwen2.5
- Add multi-model support: users choose model via /newsession <number>
- Available models: Llama 3.2 (1B), Qwen 2.5 (1.5B instruct)
- Warn users on deprecated models and prompt to create new session
- Fix API Gateway 29s timeout: reduce Ollama timeout to 22s, smart retry
- Fix Lambda deployment: include requests dependency in zip
- Update README with model management docs and available models table
- Update variables.tf default ollama_model to llama3.2:1b
1 parent e9e6fe6 commit f0ba8f0

5 files changed

Lines changed: 183 additions & 61 deletions

File tree

.gitignore

Lines changed: 2 additions & 0 deletions
```diff
@@ -29,6 +29,8 @@ package/
 
 # lambda
 lambda_function.zip
+lambda.zip
+*.zip
 
 # SSH keys and certificates
 *.pem
```

CHANGELOG.md

Lines changed: 9 additions & 2 deletions
```diff
@@ -15,12 +15,18 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
   - `modules/monitoring/` - CloudWatch metric filter and error alarm
   - `modules/ec2/` - EC2 instance running Ollama for AI inference
 - **Ollama AI Integration**: Self-hosted LLM on EC2 (external API)
-  - EC2 instance (t3.large) running Ollama with tinyllama model
+  - EC2 instance (t3.large) running Ollama with multiple models
+  - Available models: Llama 3.2 (1B), Qwen 2.5 (1.5B instruct)
+  - Model selection via `/newsession <number>` command
+  - Unavailable model detection: warns users on deprecated models
   - Lambda calls `POST /api/chat` for AI-powered chat responses
   - Elastic IP for stable endpoint across stop/start cycles
   - Model persistence: S3 sync on shutdown, restore on boot
   - Lifecycle management script: `scripts/manage-ollama.sh`
-  - Error handling with timeouts, structured logging, graceful fallback
+  - Error handling with timeouts (22s), smart retry on fast failures, graceful fallback
+  - "Thinking..." indicator with message editing for response delivery
+  - Rate limiting: prevents duplicate requests while AI is processing
+  - Update deduplication via update_id tracking in DynamoDB
 - **Observability**: Added structured logging and monitoring
   - Structured JSON logging in `handler.py` (level, timestamp, action, outcome, request_id)
   - CloudWatch metric filter for ERROR-level logs (`{ $.level = "ERROR" }`)
@@ -36,6 +42,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 - **Documentation**: Added observability and remote state sections to README
 
 ### Changed
+- **Models**: Replaced tinyllama with Llama 3.2 (1B) and added Qwen 2.5 (1.5B instruct)
 - **handler.py**: Replaced placeholder with actual Ollama AI integration, updated `/status` and `/help`
 - **main.tf**: Migrated from inline resources to module calls, added monitoring and EC2 modules
 - **outputs.tf**: Updated outputs to reference module outputs, added monitoring and EC2 outputs
```
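The 22-second timeout entry above is sized to fit inside API Gateway's hard 29-second integration limit. A quick arithmetic sketch, using the constants from this commit (5s fast-failure threshold and 1s backoff from the retry logic in `handler.py`), shows the worst-case retry path still fits:

```python
# Worst-case latency budget for one Lambda invocation behind API Gateway,
# using the values introduced in this commit.
API_GATEWAY_LIMIT_S = 29      # hard API Gateway integration timeout
OLLAMA_TIMEOUT_S = 22         # per-request timeout to Ollama
FAST_FAILURE_THRESHOLD_S = 5  # only failures faster than this are retried
RETRY_BACKOFF_S = 1           # sleep before the single retry

# Worst case: a failure just under the 5s threshold, a 1s sleep,
# then a full 22s second attempt.
worst_case_s = FAST_FAILURE_THRESHOLD_S + RETRY_BACKOFF_S + OLLAMA_TIMEOUT_S

assert worst_case_s <= API_GATEWAY_LIMIT_S  # 5 + 1 + 22 = 28 < 29
```

This is why retries are restricted to fast connection errors: retrying after a full 22s timeout would blow past the 29s cap.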

README.md

Lines changed: 45 additions & 4 deletions
````diff
@@ -79,7 +79,8 @@ This project creates a serverless Telegram bot running on AWS. When users send m
 |---------|---------|--------|
 | `/start` or `/hello` | Initialize and greet user | ✅ Working |
 | `/help` | Show available commands | ✅ Working |
-| `/newsession` | Create a new chat session | ✅ Working |
+| `/newsession` | Show available models | ✅ Working |
+| `/newsession <number>` | Create session with chosen model | ✅ Working |
 | `/listsessions` | List all user sessions | ✅ Working |
 | `/switch <number>` | Switch to a different session | ✅ Working |
 | `/history` | Show recent messages in session | ✅ Working |
@@ -428,7 +429,7 @@ Creates an EC2 instance running Ollama for AI inference.
 |----------|-------------|---------|
 | `instance_name` | Name tag for the instance | Required |
 | `instance_type` | EC2 instance type | `t3.large` |
-| `ollama_model` | Model to pull on first boot | `tinyllama` |
+| `ollama_model` | Model to pull on first boot | `llama3.2:1b` |
 | `models_s3_bucket` | S3 bucket for model persistence | Required |
 | `ssh_allowed_cidr` | CIDR for SSH access | `0.0.0.0/0` |
@@ -448,11 +449,21 @@ The bot integrates with [Ollama](https://ollama.com), a self-hosted large langua
 | Endpoint | `POST http://<EC2_EIP>:11434/api/chat` |
 | Protocol | HTTP (REST) |
 | Authentication | API key via `X-API-Key` header (nginx reverse proxy) |
-| Request format | JSON: `{"model": "tinyllama", "messages": [...], "stream": false}` |
+| Request format | JSON: `{"model": "llama3.2:1b", "messages": [...], "stream": false}` |
 | Response format | JSON: `{"message": {"content": "..."}}` |
 
+**Available Models:**
+
+| # | Model | Description | Size |
+|---|-------|-------------|------|
+| 1 | `llama3.2:1b` | Meta Llama 3.2, fast general-purpose | 1.3 GB |
+| 2 | `qwen2.5:1.5b-instruct-q4_K_M` | Alibaba Qwen 2.5, instruction-tuned | 986 MB |
+
+Users select a model when creating a session via `/newsession <number>`. Sessions using removed models show a warning and prompt the user to create a new session.
+
 **Error Handling:**
-- Connection timeouts (45s) with structured JSON error logging
+- Connection timeouts (22s) with structured JSON error logging
+- Smart retry: retries once only on fast connection errors (< 5s), not on timeouts
 - HTTP status code validation (non-200 responses return user-friendly error)
 - Exception handling with stack traces logged to CloudWatch
 - Graceful fallback: bot remains functional even if Ollama is unreachable
@@ -476,8 +487,38 @@ The bot integrates with [Ollama](https://ollama.com), a self-hosted large langua
 ./scripts/manage-ollama.sh start   # Start instance, wait for Ollama API
 ./scripts/manage-ollama.sh stop    # Stop instance (syncs models to S3)
 ./scripts/manage-ollama.sh status  # Check instance and API health
+./scripts/manage-ollama.sh ssh     # SSH into the instance
+```
+
+**Managing Models:**
+
+To add a new model, SSH into the EC2 instance and pull it:
+
+```bash
+# SSH into the instance
+./scripts/manage-ollama.sh ssh
+
+# Pull a model (must set OLLAMA_HOST since Ollama binds to port 11435)
+OLLAMA_HOST=http://127.0.0.1:11435 ollama pull <model_name>
+
+# List installed models
+OLLAMA_HOST=http://127.0.0.1:11435 ollama list
+
+# Remove a model
+OLLAMA_HOST=http://127.0.0.1:11435 ollama rm <model_name>
+```
+
+After pulling a new model, add it to the `AVAILABLE_MODELS` list in `handler.py` and redeploy the Lambda:
+
+```bash
+# Rebuild and deploy
+cp handler.py /tmp/lambda-build/handler.py
+cd /tmp/lambda-build && zip -r /path/to/lambda.zip . -x '__pycache__/*' '*.pyc'
+aws lambda update-function-code --function-name telegram-bot --zip-file fileb://lambda.zip
 ```
 
+> **Note:** Ollama binds to `127.0.0.1:11435` (not the default 11434) because the nginx reverse proxy occupies port 11434 for API key authentication. Always set `OLLAMA_HOST=http://127.0.0.1:11435` when using the `ollama` CLI on the instance.
+
 ---
 
 ## Data Storage
````
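The request/response contract documented in the External API table above can be exercised from any HTTP client. A minimal sketch follows; the Lambda handler itself uses the `requests` library, while this example sticks to the standard library, and the base URL, API key, and function names here are illustrative placeholders:

```python
import json
import urllib.request

def build_chat_payload(model: str, messages: list) -> dict:
    """Build the request body from the External API table (stream disabled)."""
    return {"model": model, "messages": messages, "stream": False}

def extract_reply(response_body: str) -> str:
    """Pull the assistant text out of an Ollama /api/chat response."""
    return json.loads(response_body)["message"]["content"]

def chat(base_url: str, api_key: str, model: str, messages: list, timeout: int = 22) -> str:
    """POST to the nginx-fronted Ollama endpoint with the X-API-Key header."""
    req = urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(build_chat_payload(model, messages)).encode(),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return extract_reply(resp.read().decode())
```

For example, `chat("http://<EC2_EIP>:11434", "<key>", "llama3.2:1b", [{"role": "user", "content": "hi"}])` would return the assistant's reply text, assuming the instance is running and the key is valid.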

handler.py

Lines changed: 126 additions & 54 deletions
```diff
@@ -213,7 +213,7 @@ def get_active_session(user_id: int) -> Optional[Dict[str, Any]]:
     return None
 
 
-def create_session(user_id: int, model_name: str = "tinyllama") -> Dict[str, Any]:
+def create_session(user_id: int, model_name: str = "llama3.2:1b") -> Dict[str, Any]:
     """Create a new session, deactivate old active ones."""
     session_id = str(uuid.uuid4())
     sk = f"MODEL#{model_name}#SESSION#{session_id}"
@@ -263,7 +263,7 @@ def append_to_conversation(session: Dict[str, Any], message_dict: Dict[str, Any]
 
 
 def call_ollama(model: str, messages: List[Dict[str, Any]]) -> str:
-    """Call Ollama API for chat completion."""
+    """Call Ollama API for chat completion. Retries once only on fast connection errors."""
     if not OLLAMA_URL:
         logger.warning("call_ollama", message="OLLAMA_URL not configured")
         return "AI service is not configured. Please contact the administrator."
@@ -274,19 +274,32 @@ def call_ollama(model: str, messages: List[Dict[str, Any]]) -> str:
         "stream": False
     }
     headers = {"X-API-Key": OLLAMA_API_KEY} if OLLAMA_API_KEY else {}
-    try:
-        resp = requests.post(f"{OLLAMA_URL}/api/chat", json=payload, headers=headers, timeout=45)
-        if resp.status_code == 200:
-            data = resp.json()
-            response_content = data['message']['content']
-            logger.info("call_ollama", message=f"Response length {len(response_content)} chars")
-            return response_content
-        else:
-            logger.error("call_ollama", message=f"Ollama API error: HTTP {resp.status_code}")
+    for attempt in range(2):
+        start = time.time()
+        try:
+            resp = requests.post(f"{OLLAMA_URL}/api/chat", json=payload, headers=headers, timeout=22)
+            if resp.status_code == 200:
+                data = resp.json()
+                response_content = data['message']['content']
+                logger.info("call_ollama", message=f"Response length {len(response_content)} chars")
+                return response_content
+            else:
+                elapsed = time.time() - start
+                logger.error("call_ollama", message=f"Ollama API error: HTTP {resp.status_code}, attempt {attempt+1}")
+                # Only retry if failure was fast (connection issue, not slow inference)
+                if attempt == 0 and elapsed < 5:
+                    time.sleep(1)
+                    continue
+                return "Sorry, AI response unavailable. Use /status to check connection."
+        except Exception as e:
+            elapsed = time.time() - start
+            logger.error("call_ollama", message=f"Ollama connection error, attempt {attempt+1}", error=e)
+            # Only retry if failure was fast (connection refused, not timeout)
+            if attempt == 0 and elapsed < 5:
+                time.sleep(1)
+                continue
             return "Sorry, AI response unavailable. Use /status to check connection."
-    except Exception as e:
-        logger.error("call_ollama", message="Ollama connection error", error=e)
-        return "Sorry, AI response unavailable. Use /status to check connection."
+    return "Sorry, AI response unavailable. Use /status to check connection."
 
 
 # ==================== ARCHIVE FUNCTIONS ====================
@@ -426,20 +439,33 @@ def import_archive_to_s3(user_id: int, archive_data: Dict[str, Any]) -> Optional
 
 # ==================== COMMAND HANDLERS ====================
 
+AVAILABLE_MODELS = [
+    {"id": "llama3.2:1b", "name": "Llama 3.2", "desc": "Meta's latest, fast (1B)"},
+    {"id": "qwen2.5:1.5b-instruct-q4_K_M", "name": "Qwen 2.5", "desc": "Instruction-tuned (1.5B)"},
+]
+
+def format_model_list() -> str:
+    """Format available models as a numbered list."""
+    lines = []
+    for i, m in enumerate(AVAILABLE_MODELS):
+        lines.append(f"{i+1}. {m['name']} - {m['desc']}")
+    return "\n".join(lines)
+
 def handle_command(cmd: str, payload: str, chat_id: int, user_id: int, update_id: int) -> str:
     """Handle bot commands."""
     logger.info("handle_command", message=f"Command: {cmd}", command=cmd)
 
     if cmd == "/start" or cmd == "/hello":
         session = get_current_session(user_id)
-        resp = f"Hello! Your current model is {session['model_name']}. Chat away or use /help."
+        resp = f"Hello! Your current model is {session['model_name']}.\n\nAvailable models:\n{format_model_list()}\n\nUse /newsession <number> to start a session with a specific model.\nChat away or use /help."
         send_message(chat_id, resp)
         return "start_or_hello"
 
     if cmd == "/help":
-        resp = """Commands:
-/start or /hello - Greeting and session init
-/newsession - Start a new chat session
+        resp = f"""Commands:
+/start or /hello - Greeting and session info
+/newsession - Show available models
+/newsession <number> - Start session with chosen model
 /listsessions - List your sessions
 /switch <number> - Switch to a session (e.g., /switch 1)
 /history - Show recent messages in current session
@@ -453,6 +479,9 @@ def handle_command(cmd: str, payload: str, chat_id: int, user_id: int, update_id
 /export <number> - Export an archive as a file
 (Send a JSON file to import an archive)
 
+Available models:
+{format_model_list()}
+
 Send any text message to chat with the AI model."""
         send_message(chat_id, resp)
         return "help"
@@ -475,10 +504,27 @@ def handle_command(cmd: str, payload: str, chat_id: int, user_id: int, update_id
         return "status"
 
     if cmd == "/newsession":
-        new_session = create_session(user_id)
-        resp = f"New session created with model '{new_session['model_name']}' (ID: {new_session['session_id'][:8]})."
-        send_message(chat_id, resp)
-        return "newsession"
+        if not payload.strip():
+            resp = f"Choose a model for your new session:\n{format_model_list()}\n\nUsage: /newsession <number> (e.g., /newsession 1)"
+            send_message(chat_id, resp)
+            return "newsession_list"
+        try:
+            idx = int(payload.strip()) - 1
+            if 0 <= idx < len(AVAILABLE_MODELS):
+                model_id = AVAILABLE_MODELS[idx]["id"]
+                model_name_display = AVAILABLE_MODELS[idx]["name"]
+                new_session = create_session(user_id, model_name=model_id)
+                resp = f"New session created with {model_name_display} ({model_id}).\nSession ID: {new_session['session_id'][:8]}"
+                send_message(chat_id, resp)
+                return "newsession"
+            else:
+                resp = f"Invalid choice. Available models:\n{format_model_list()}\n\nUsage: /newsession <number>"
+                send_message(chat_id, resp)
+                return "invalid_newsession"
+        except ValueError:
+            resp = f"Invalid choice. Available models:\n{format_model_list()}\n\nUsage: /newsession <number>"
+            send_message(chat_id, resp)
+            return "invalid_newsession"
 
     if cmd == "/listsessions":
         items = get_user_items(user_id)
@@ -711,7 +757,7 @@ def handle_document(document: Dict[str, Any], chat_id: int, user_id: int) -> str
         return "imported"
 
 
-def handle_message(text: str, chat_id: int, user_id: int, update_id: int, document: Optional[Dict[str, Any]] = None) -> str:
+def handle_message(text: str, chat_id: int, user_id: int, update_id: int, document: Optional[Dict[str, Any]] = None, thinking_msg_id: Optional[int] = None) -> str:
     """Handle incoming messages: commands, chat, or documents."""
 
     if document:
@@ -742,10 +788,25 @@ def handle_message(text: str, chat_id: int, user_id: int, update_id: int, docume
     session = get_current_session(user_id)
     now = int(time.time())
 
+    # Check if session's model is still available
+    available_model_ids = {m["id"] for m in AVAILABLE_MODELS}
+    session_model = session.get("model_name", "")
+    if session_model not in available_model_ids:
+        available_list = format_model_list()
+        warn_msg = f"The model '{session_model}' is no longer available.\n\nPlease create a new session with an available model:\n{available_list}\n\nUsage: /newsession <number>"
+        if thinking_msg_id:
+            edit_message(chat_id, thinking_msg_id, warn_msg)
+        else:
+            send_message(chat_id, warn_msg)
+        return "model_unavailable"
+
     # Check if a request is already being processed (within last 55 seconds)
     pending_since = session.get("pending_request_ts", 0)
     if pending_since and (now - pending_since) < 55:
-        send_message(chat_id, "Please wait, still generating a response to your previous message...")
+        if thinking_msg_id:
+            edit_message(chat_id, thinking_msg_id, "Please wait, still generating a response to your previous message...")
+        else:
+            send_message(chat_id, "Please wait, still generating a response to your previous message...")
         return "rate_limited"
 
     user_msg = {"role": "user", "content": text, "ts": now}
@@ -755,35 +816,32 @@ def handle_message(text: str, chat_id: int, user_id: int, update_id: int, docume
     session['pending_request_ts'] = now
     table.put_item(Item=session)
 
-    # Send "Thinking..." message so user knows the bot is working
-    thinking_resp = send_message(chat_id, "Thinking...")
-    thinking_msg_id = None
-    if thinking_resp and thinking_resp.get("ok"):
-        thinking_msg_id = thinking_resp["result"]["message_id"]
-
-    # Build conversation context for Ollama (strip ts field, limit to last 10 messages)
-    all_msgs = [
-        {"role": m["role"], "content": m["content"]}
-        for m in session.get("conversation", [])
-        if "role" in m and "content" in m
-    ]
-    messages = all_msgs[-10:]
-
-    # Call Ollama for AI response
-    model_name = session.get("model_name", "tinyllama")
-    ai_response = call_ollama(model_name, messages)
-
-    # Clear pending flag
-    session['pending_request_ts'] = 0
-    ass_msg = {"role": "assistant", "content": ai_response, "ts": int(time.time())}
-    append_to_conversation(session, ass_msg)
-
-    # Replace "Thinking..." with the actual response
-    if thinking_msg_id:
-        edit_message(chat_id, thinking_msg_id, ai_response)
-    else:
-        send_message(chat_id, ai_response)
-    return "ai_response"
+    try:
+        # Build conversation context for Ollama (strip ts field, limit to last 10 messages)
+        all_msgs = [
+            {"role": m["role"], "content": m["content"]}
+            for m in session.get("conversation", [])
+            if "role" in m and "content" in m
+        ]
+        messages = all_msgs[-10:]
+
+        # Call Ollama for AI response
+        model_name = session.get("model_name", "tinyllama")
+        ai_response = call_ollama(model_name, messages)
+
+        ass_msg = {"role": "assistant", "content": ai_response, "ts": int(time.time())}
+        append_to_conversation(session, ass_msg)
+
+        # Replace "Thinking..." with the actual response
+        if thinking_msg_id:
+            edit_message(chat_id, thinking_msg_id, ai_response)
+        else:
+            send_message(chat_id, ai_response)
+        return "ai_response"
+    finally:
+        # Always clear pending flag, even on unexpected errors
+        session['pending_request_ts'] = 0
+        table.put_item(Item=session)
 
 
 def process_telegram_update(update: Dict[str, Any]) -> Dict[str, Any]:
@@ -808,18 +866,32 @@ def process_telegram_update(update: Dict[str, Any]) -> Dict[str, Any]:
         logger.warning("process_update", outcome="skipped", message="No chat_id in update")
         return {"processed": False, "reason": "no_chat_id"}
 
+    # For chat messages (not commands, not documents), send "Thinking..." immediately
+    thinking_msg_id = None
+    if text and not text.strip().startswith('/') and not document:
+        thinking_resp = send_message(chat_id, "Thinking...")
+        if thinking_resp and thinking_resp.get("ok"):
+            thinking_msg_id = thinking_resp["result"]["message_id"]
+
     # Deduplicate: check if this update_id was already processed
     try:
         dedup_resp = table.get_item(Key={'pk': OFFSET_PK, 'sk': f'update#{update_id}'})
         if 'Item' in dedup_resp:
             logger.info("process_update", outcome="skipped", message=f"Duplicate update_id={update_id}")
+            # Delete the "Thinking..." message if we sent one
+            if thinking_msg_id:
+                try:
+                    requests.post(f"{TELEGRAM_API}/deleteMessage",
+                                  json={"chat_id": chat_id, "message_id": thinking_msg_id}, timeout=5)
+                except Exception:
+                    pass
             return {"processed": False, "reason": "duplicate"}
         # Mark this update_id as processed
         table.put_item(Item={'pk': OFFSET_PK, 'sk': f'update#{update_id}', 'ts': int(time.time())})
     except Exception:
         pass  # If dedup check fails, process anyway
 
-    handle_result = handle_message(text, chat_id, user_id, update_id, document)
+    handle_result = handle_message(text, chat_id, user_id, update_id, document, thinking_msg_id)
 
     return {
         "processed": True,
```
