Add Streamlit prototype for sales instructor AI #2

Open: soy-tuber wants to merge 3 commits into main from streamlit-prototype

@soy-tuber (Owner)

Summary

Implements a Streamlit-based prototype of the sales instructor AI system, enabling document ingestion, multi-database RAG search, and interactive chat with Claude/Gemini. This is a Phase 1–3 implementation of the design specification, deployable directly to Streamlit Cloud.

Key Changes

  • Streamlit UI (streamlit_app.py): Multi-page application with four sections:

    • 💬 Chat: Query selected databases with streaming responses from Claude/Gemini
    • 📥 Data Ingestion: Upload documents (PDF, DOCX, PPTX, XLSX, CSV, TXT, MD) for LLM-driven structuring
    • 📚 Data Management: View, export, import, and delete records across databases
    • ⚙️ Settings: Configure API keys (stored in session state only, never persisted to disk)
  • Database Layer (sales_ai/db_manager.py): SQLite + FTS5 storage with:

    • Automatic schema initialization with external content tables and triggers
    • Trigram tokenizer for Japanese text search
    • Per-schema database files under data/sales_ai/
  • Schema Definitions (sales_ai/schemas.py): Five predefined schemas for sales data:

    • Proposals (提案資料)
    • Sales talks (営業トーク)
    • Reports (レポート)
    • Results (成果データ)
    • Follow-ups (フォロー事例)
  • Ingestion Pipeline (sales_ai/pipeline.py): End-to-end document processing:

    • Text extraction from multiple file formats
    • LLM-driven structuring with schema-aware prompts
    • Automatic type coercion for database insertion
  • LLM Router (sales_ai/llm_router.py): Unified streaming interface for:

    • Anthropic Claude (Opus, Sonnet, Haiku)
    • Google Gemini (2.5 Pro, Flash, Flash Lite)
    • Consistent OpenAI-style message format
  • RAG Engine (sales_ai/rag_engine.py): Multi-database retrieval and context synthesis:

    • Parallel FTS5 search across selected databases
    • BM25-ranked result aggregation
    • Context block generation for LLM prompts
  • Configuration: Added Streamlit config (config.toml) and secrets template
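The database layer described above (external content tables, sync triggers, trigram tokenizer, BM25 ranking) could look roughly like the sketch below. The table name `proposals`, its columns, and the helpers `open_db` and `search` are hypothetical; the PR page does not show the real schema in `sales_ai/db_manager.py`. The trigram tokenizer requires SQLite 3.34 or newer.

```python
import sqlite3

# Hypothetical schema for illustration only; the real column set is not shown in the PR.
SCHEMA = """
CREATE TABLE IF NOT EXISTS proposals (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    body TEXT NOT NULL
);
-- External-content FTS5 index: the text lives in proposals, the index references it.
CREATE VIRTUAL TABLE IF NOT EXISTS proposals_fts USING fts5(
    title, body,
    content='proposals', content_rowid='id',
    tokenize='trigram'
);
-- Triggers keep the index in sync with the content table.
CREATE TRIGGER IF NOT EXISTS proposals_ai AFTER INSERT ON proposals BEGIN
    INSERT INTO proposals_fts(rowid, title, body)
    VALUES (new.id, new.title, new.body);
END;
CREATE TRIGGER IF NOT EXISTS proposals_ad AFTER DELETE ON proposals BEGIN
    INSERT INTO proposals_fts(proposals_fts, rowid, title, body)
    VALUES ('delete', old.id, old.title, old.body);
END;
CREATE TRIGGER IF NOT EXISTS proposals_au AFTER UPDATE ON proposals BEGIN
    INSERT INTO proposals_fts(proposals_fts, rowid, title, body)
    VALUES ('delete', old.id, old.title, old.body);
    INSERT INTO proposals_fts(rowid, title, body)
    VALUES (new.id, new.title, new.body);
END;
"""


def open_db(path=":memory:"):
    """Open a database and create the tables, index, and triggers if missing."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def search(conn, query, limit=5):
    """Return (rowid, title, score) tuples, best match first (lower bm25 is better)."""
    # Quote the query as an FTS5 string so punctuation is not parsed as query syntax.
    phrase = '"' + query.replace('"', '""') + '"'
    return conn.execute(
        "SELECT rowid, title, bm25(proposals_fts) AS score "
        "FROM proposals_fts WHERE proposals_fts MATCH ? "
        "ORDER BY score LIMIT ?",
        (phrase, limit),
    ).fetchall()
```

With the trigram tokenizer, a query such as `search(conn, "クラウド移行")` matches any row whose title or body contains that substring, without needing word boundaries.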

Notable Implementation Details

  • Session-based API key management: Keys are stored only in st.session_state and never written to disk, supporting both Streamlit Cloud Secrets and runtime input
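  • Trigram tokenizer: Replaces SQLite's default unicode61 tokenizer, which does not split unspaced Japanese text into searchable terms, so that FTS5 can match Japanese substrings (full-text queries must be at least three characters long)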
  • Streaming responses: Both chat and ingestion use streaming for better UX
  • Data persistence: Database files are volatile on Streamlit Cloud; users can download/upload .db files via the UI for backup/restore
  • Error handling: Graceful fallbacks for missing API keys, failed extractions, and LLM parsing errors
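The session-based key handling above can be sketched as a small lookup that prefers a key entered at runtime and falls back to deployment secrets, caching the result in memory only. The function name `resolve_api_key` and the key name used in the usage note are hypothetical; in the app the two mappings would presumably be `st.session_state` and `st.secrets`, which is an assumption about `streamlit_app.py`.

```python
from typing import Mapping, MutableMapping, Optional


def resolve_api_key(
    session_state: MutableMapping[str, str],
    secrets: Mapping[str, str],
    name: str,
) -> Optional[str]:
    """Look up an API key without ever touching the filesystem.

    Order: a key typed in during the session wins; otherwise fall back to
    deployment secrets and cache the value in the session mapping.
    """
    key = session_state.get(name)
    if key:
        return key
    key = secrets.get(name)
    if key:
        # Cached in memory for this session only; nothing is written to disk.
        session_state[name] = key
        return key
    # No key available: the caller can show a prompt instead of failing.
    return None
```

A caller might use `resolve_api_key(st.session_state, st.secrets, "ANTHROPIC_API_KEY")` and render a warning on the Settings page when the result is `None`.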

https://claude.ai/code/session_01N4GWxopPmrYavJ9psb6y43

claude added 3 commits May 13, 2026 23:44
Multi-LLM (Claude/Gemini) + multi-DB (5 SQLite+FTS5 schemas) prototype
deployable to Streamlit Cloud Community. Existing FastAPI SoyLM is
untouched; new entry point is streamlit_app.py.

- sales_ai/llm_router.py: streaming chat for Anthropic + google-genai
- sales_ai/db_manager.py: per-schema SQLite with trigram FTS5 for JA
- sales_ai/pipeline.py: PDF/DOCX/PPTX/XLSX/CSV/TXT extraction + LLM JSON
- sales_ai/rag_engine.py: parallel multi-DB retrieval + context build
- streamlit_app.py: chat / ingest / data-management / settings pages
- .streamlit/: config + secrets.toml example (real secrets gitignored)
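A minimal sketch of the parallel multi-DB retrieval and context build mentioned for `rag_engine.py`, assuming each database exposes a search callable. `Hit`, `retrieve`, and `build_context` are hypothetical names, not the module's actual API. Scores follow the FTS5 `bm25()` convention, where lower is better; scores from separate indexes are not strictly comparable, so merging on them is a simplification.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Hit:
    db: str        # which database the row came from
    row_id: int
    text: str
    score: float   # lower is better, as with FTS5 bm25()


# A searcher takes (query, limit) and returns hits from one database.
Searcher = Callable[[str, int], List[Hit]]


def retrieve(searchers: Dict[str, Searcher], query: str,
             per_db: int = 5, top_k: int = 8) -> List[Hit]:
    """Query every selected database in parallel and merge by score."""
    if not searchers:
        return []
    with ThreadPoolExecutor(max_workers=len(searchers)) as pool:
        futures = [pool.submit(fn, query, per_db) for fn in searchers.values()]
        hits = [hit for future in futures for hit in future.result()]
    return sorted(hits, key=lambda h: h.score)[:top_k]


def build_context(hits: List[Hit]) -> str:
    """Render hits as labelled blocks to paste into an LLM prompt."""
    return "\n\n".join(f"[{h.db} #{h.row_id}]\n{h.text}" for h in hits)
```

Because `sqlite3` connections are bound to the thread that created them by default, each searcher in a sketch like this would open its own connection inside the callable.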

Streamlit Cloud build was failing on the heavy SoyLM FastAPI side
deps (playwright pulls browser binaries; fastapi/uvicorn/etc. are
unused by the Streamlit prototype). Move them to requirements-soylm.txt
so the prototype installs cleanly on Cloud.

Replace the freeform chat panel with three task-specific feature pages,
each backed by a structured form, champion-case retrieval, and an
output shape fixed by a per-feature system prompt:

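- Pre-visit briefing (訪問前ブリーフィング): keyword retrieval over deals_fts,
  then output key messages / anticipated questions / objection handling /
  differentiation / preparation checklist.
- Proposal drafting (提案書作成): bundles full A-E champion context into Opus
  to emit a complete Markdown proposal with [案件 #ID] (deal ID) citations.
- Post-visit follow-up (訪問後フォローアップ): takes a visit report and returns
  deal assessment / strengths / risks / to-do table (with deadlines) /
  follow-up email draft.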
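
One way to fix the output shape with a per-feature system prompt is to keep the section list as data and generate the prompt from it. This is only a sketch: the feature keys, the prompt wording, and `build_system_prompt` are hypothetical, and the section names are English renderings of the outputs listed above rather than the app's actual headings.

```python
from typing import Dict, List

# Hypothetical feature keys; section names mirror the outputs listed above.
FEATURE_SECTIONS: Dict[str, List[str]] = {
    "pre_visit_briefing": [
        "Key messages",
        "Anticipated questions",
        "Objection handling",
        "Differentiation",
        "Preparation checklist",
    ],
    "post_visit_followup": [
        "Deal assessment",
        "Strengths",
        "Risks",
        "To-do table (with deadlines)",
        "Follow-up email draft",
    ],
}


def build_system_prompt(feature: str) -> str:
    """Build a system prompt that pins the response to a fixed section order."""
    sections = FEATURE_SECTIONS[feature]
    lines = ["Answer using exactly these Markdown sections, in this order:"]
    lines += [f"## {name}" for name in sections]
    lines.append("Do not add, rename, or reorder sections.")
    return "\n".join(lines)
```

Keeping the sections in a table like this means a page can also validate a response by checking that each expected heading appears.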

Data model: introduce a deals master ("案件マスタ") and move the five
existing schemas (proposals/talks/reports/results/followups) under it
via deal_id FKs. All tables now live in a single sales_ai.db so deal
bundles can be retrieved with a simple join, and FTS5 sticks with the
trigram tokenizer for Japanese substring matching.
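
The deals-master layout can be sketched as follows, with only two of the five child tables shown and hypothetical column names (`customer`, `title`, `body`); the real schema in `sales_ai.db` is not visible in this PR page. `open_db` and `load_bundle` are illustrative helpers, not functions from the codebase.

```python
import sqlite3

# Hypothetical, reduced schema: a deals master plus two child tables keyed by deal_id.
SCHEMA = """
CREATE TABLE deals (
    id INTEGER PRIMARY KEY,
    customer TEXT NOT NULL,
    title TEXT NOT NULL
);
CREATE TABLE proposals (
    id INTEGER PRIMARY KEY,
    deal_id INTEGER NOT NULL REFERENCES deals(id) ON DELETE CASCADE,
    body TEXT NOT NULL
);
CREATE TABLE followups (
    id INTEGER PRIMARY KEY,
    deal_id INTEGER NOT NULL REFERENCES deals(id) ON DELETE CASCADE,
    body TEXT NOT NULL
);
"""

CHILD_TABLES = ("proposals", "followups")


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    # SQLite enforces foreign keys only when this pragma is on for the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def load_bundle(conn, deal_id):
    """Collect every child record for one deal, grouped by table."""
    bundle = {}
    for table in CHILD_TABLES:  # fixed whitelist, so the f-string is safe
        rows = conn.execute(
            f"SELECT c.body FROM deals AS d "
            f"JOIN {table} AS c ON c.deal_id = d.id "
            "WHERE d.id = ? ORDER BY c.id",
            (deal_id,),
        ).fetchall()
        bundle[table] = [row[0] for row in rows]
    return bundle
```

Querying one child table at a time avoids the row multiplication that a single join across several child tables would produce.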

Seed: ship 10 illustrative champion deals on first run so the
retrieval/feature pages have something to surface immediately.

Ingestion is now deal-scoped (pipeline.ingest takes deal_id). The data
management page becomes the deal master (案件マスタ) page, with
new/detail/export tabs.
