Add Streamlit prototype for sales instructor AI by soy-tuber · Pull Request #2 · soy-tuber/SoyLM

soy-tuber · 2026-05-14T00:02:07Z

Summary

Implements a Streamlit-based prototype of the sales instructor AI system, enabling document ingestion, multi-database RAG search, and interactive chat with Claude/Gemini. This is a Phase 1–3 implementation of the design specification, deployable directly to Streamlit Cloud.

Key Changes

Streamlit UI (streamlit_app.py): Multi-page application with four sections:
- 💬 Chat: Query selected databases with streaming responses from Claude/Gemini
- 📥 Data Ingestion: Upload documents (PDF, DOCX, PPTX, XLSX, CSV, TXT, MD) for LLM-driven structuring
- 📚 Data Management: View, export, import, and delete records across databases
- ⚙️ Settings: Configure API keys (stored in session state only, never persisted to disk)
Database Layer (sales_ai/db_manager.py): SQLite + FTS5 storage with:
- Automatic schema initialization with external content tables and triggers
- Trigram tokenizer for Japanese text search
- Per-schema database files under data/sales_ai/
Schema Definitions (sales_ai/schemas.py): Five predefined schemas for sales data:
- Proposals (提案資料)
- Sales talks (営業トーク)
- Reports (レポート)
- Results (成果データ)
- Follow-ups (フォロー事例)
Ingestion Pipeline (sales_ai/pipeline.py): End-to-end document processing:
- Text extraction from multiple file formats
- LLM-driven structuring with schema-aware prompts
- Automatic type coercion for database insertion
LLM Router (sales_ai/llm_router.py): Unified streaming interface for:
- Anthropic Claude (Opus, Sonnet, Haiku)
- Google Gemini (2.5 Pro, Flash, Flash Lite)
- Consistent OpenAI-style message format
RAG Engine (sales_ai/rag_engine.py): Multi-database retrieval and context synthesis:
- Parallel FTS5 search across selected databases
- BM25-ranked result aggregation
- Context block generation for LLM prompts
Configuration: Added Streamlit config (config.toml) and secrets template

Notable Implementation Details

Session-based API key management: Keys are stored only in st.session_state and never written to disk, supporting both Streamlit Cloud Secrets and runtime input
Trigram tokenizer: Replaces SQLite's default unicode61 to enable proper Japanese substring matching
Streaming responses: Both chat and ingestion use streaming for better UX
Data persistence: Database files are volatile on Streamlit Cloud; users can download/upload .db files via the UI for backup/restore
Error handling: Graceful fallbacks for missing API keys, failed extractions, and LLM parsing errors

https://claude.ai/code/session_01N4GWxopPmrYavJ9psb6y43

Multi-LLM (Claude/Gemini) + multi-DB (5 SQLite+FTS5 schemas) prototype deployable to Streamlit Cloud Community. Existing FastAPI SoyLM is untouched; new entry point is streamlit_app.py. - sales_ai/llm_router.py: streaming chat for Anthropic + google-genai - sales_ai/db_manager.py: per-schema SQLite with trigram FTS5 for JA - sales_ai/pipeline.py: PDF/DOCX/PPTX/XLSX/CSV/TXT extraction + LLM JSON - sales_ai/rag_engine.py: parallel multi-DB retrieval + context build - streamlit_app.py: chat / ingest / data-management / settings pages - .streamlit/: config + secrets.toml example (real secrets gitignored)

Streamlit Cloud build was failing on the heavy SoyLM FastAPI side deps (playwright pulls browser binaries; fastapi/uvicorn/etc. are unused by the Streamlit prototype). Move them to requirements-soylm.txt so the prototype installs cleanly on Cloud.

Replace the freeform chat panel with three task-specific feature pages, each backed by a structured form, champion-case retrieval, and an output shape fixed by a per-feature system prompt: - 訪問前ブリーフィング: keyword retrieval over deals_fts, then output キーメッセージ/想定質問/反論対応/差別化/準備チェックリスト. - 提案書作成: bundles full A-E champion context into Opus to emit a 完全な Markdown 提案書 with [案件 #ID] citations. - 訪問後フォローアップ: takes a visit report and returns 商談評価/ 強み/リスク/ToDo 表 (期限付き)/フォローメール文案. Data model: introduce a deals master ("案件マスタ") and move the five existing schemas (proposals/talks/reports/results/followups) under it via deal_id FKs. All tables now live in a single sales_ai.db so deal bundles can be retrieved with a simple join, and FTS5 sticks with the trigram tokenizer for Japanese substring matching. Seed: ship 10 illustrative champion deals on first run so the retrieval/feature pages have something to surface immediately. Ingestion is now deal-scoped (pipeline.ingest takes deal_id). The data management page becomes 案件マスタ with new/detail/export tabs.

claude added 3 commits May 13, 2026 23:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Streamlit prototype for sales instructor AI#2

Add Streamlit prototype for sales instructor AI#2
soy-tuber wants to merge 3 commits into
mainfrom
streamlit-prototype

soy-tuber commented May 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

soy-tuber commented May 14, 2026

Summary

Key Changes

Notable Implementation Details

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants