136 changes: 60 additions & 76 deletions CLAUDE.md
@@ -4,117 +4,101 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

## Project Overview

InnieMe is a Discord bot that provides AI-powered Q&A capabilities using document knowledge bases. The bot scans and vectorizes documents from specified directories, connects to Discord channels, and responds to user mentions with context-aware responses using OpenAI's GPT models.
InnieMe is a multi-platform bot (Discord and Slack) that provides AI-powered Q&A using document knowledge bases. It scans and vectorizes documents from configured directories and responds to user mentions with context-aware answers via OpenAI's GPT models.

## Common Development Commands
## Development Commands

### Installation and Setup
```bash
# Install dependencies
# Install
pip install -e .
pip install -r requirements-dev.txt

# Create configuration from example
cp config.example.yaml config.yaml
# Edit config.yaml with your Discord token, OpenAI API key, and channel settings
```
# Run Discord bot (default: config.yaml)
innieme discord
innieme discord -c custom_config.yaml

### Running the Bot
```bash
# Run the bot (main entry point)
innieme_bot
# Run Slack bot (default: slack_config.yaml)
innieme slack

# Or run directly with Python
python src/innieme/cli/run_bot.py
```

### Testing
```bash
# Run all tests
pytest

# Run tests with coverage
# Run a single test file
pytest tests/test_slack_bot.py

# Run a specific test
pytest tests/test_slack_bot.py::test_identify_topic

# With coverage
pytest --cov=src/innieme

# Run async tests specifically
pytest -k "async" --asyncio-mode=strict
# Format / lint
black src/ tests/
isort src/ tests/
flake8 src/ tests/
```

### Code Quality
```bash
# Format code
black src/ tests/
## Architecture

# Sort imports
isort src/ tests/
### Data flow

# Lint code
flake8 src/ tests/
1. On startup, each `Topic` runs `scan_and_vectorize()`: documents in `docs_dir` are chunked (1000 chars, 200 overlap) and stored in an **in-memory** Chroma collection. A new collection is created on every startup; nothing is persisted.
2. On mention/message, the bot identifies the `Topic` for that channel, retrieves the `thread_id`, and calls `Topic.process_query()`.
3. `ConversationEngine` does a similarity search (top-5 chunks) and constructs an OpenAI chat completion with: system prompt (`role`) + matched doc chunks + thread conversation history.
4. OpenAI model: `gpt-3.5-turbo`, `temperature=0.1`, `max_tokens=1000`.
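
The chunking in step 1 can be pictured with a minimal sketch. It assumes plain character-based slicing with a fixed stride; `chunk_text` is a hypothetical helper for illustration, not the project's actual splitter, which may prefer paragraph or sentence boundaries.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper: the sizes mirror the documented 1000/200 settings,
    but the slicing strategy itself is an assumption.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # each window starts `step` chars after the previous one
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With the defaults, consecutive chunks share 200 characters, so a sentence that straddles a chunk boundary still appears whole in at least one chunk when it is shorter than the overlap.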

### Config → runtime object hierarchy

Both `DiscordBotConfig` and `SlackBotConfig` share the same YAML shape:

```
BotConfig
└── outies: List[OutieConfig]              # admins
    └── topics: List[TopicConfig]          # knowledge domains
        └── channels: List[ChannelConfig]  # where bot listens
```

## Architecture Overview
Pydantic models wire **back-references** via `model_validator(mode='after')`: `OutieConfig.bot`, `TopicConfig.outie`, `ChannelConfig.topic`. This lets any nested object reach its parent config without extra arguments being passed around.
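
The back-reference idea can be sketched without Pydantic. The example below uses stdlib dataclasses as a stand-in, with `__post_init__` playing the role of an after-validation hook; `ChannelNode`, `TopicNode`, and `OutieNode` are hypothetical names, not the project's config classes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ChannelNode:
    channel_id: str
    # Back-reference filled in by the parent; kept out of repr/eq to avoid cycles.
    topic: Optional["TopicNode"] = field(default=None, repr=False, compare=False)


@dataclass
class TopicNode:
    name: str
    channels: List[ChannelNode] = field(default_factory=list)
    outie: Optional["OutieNode"] = field(default=None, repr=False, compare=False)

    def __post_init__(self) -> None:
        # Runs once all fields are set, similar in spirit to a mode='after' validator.
        for channel in self.channels:
            channel.topic = self


@dataclass
class OutieNode:
    outie_id: str
    topics: List[TopicNode] = field(default_factory=list)

    def __post_init__(self) -> None:
        for topic in self.topics:
            topic.outie = self


outie = OutieNode("U1", [TopicNode("math", [ChannelNode("C1")])])
channel = outie.topics[0].channels[0]
print(channel.topic.name, channel.topic.outie.outie_id)  # prints: math U1
```

Starting from a channel, code can walk up to its topic and administrator without those being passed in separately.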

The application follows a modular architecture with clear separation of concerns:
Key difference between platforms:
- Discord: `outie_id`, `guild_id`, `channel_id` are **integers**; thread IDs are Discord integer IDs.
- Slack: `outie_id` (`U...`), `channel_id` (`C...`) are **strings**; thread IDs are Slack message timestamps (strings).

### Core Components
`ConversationEngine` imports `TopicConfig` from `discord_bot_config` — this is shared by both platforms since the config shapes are identical.

1. **DiscordBot** (`src/innieme/discord_bot.py`): Main bot interface that handles Discord events, commands, and message routing
2. **Innie** (`src/innieme/innie.py`): Container class that manages multiple topics and their configurations
3. **Topic** (`src/innieme/innie.py`): Represents a single topic with its own document store, channels, and conversation engine
4. **ConversationEngine** (`src/innieme/conversation_engine.py`): Handles query processing and response generation using OpenAI
5. **DocumentProcessor** (`src/innieme/document_processor.py`): Manages document scanning, vectorization, and similarity search
6. **KnowledgeManager** (`src/innieme/knowledge_manager.py`): Handles conversation summarization and knowledge base storage
### Channel → Topic routing

### Factory Pattern Components
Both `DiscordBot` and `SlackBot` build a `defaultdict[channel_id, List[Topic]]` at init time. `_identify_topic(channel_id)` returns the first topic for a channel. A channel can theoretically map to multiple topics but only the first is used.
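
A minimal sketch of this routing, assuming each topic simply carries a list of channel IDs. `RoutedTopic`, `build_channel_index`, and `identify_topic` are hypothetical names used for illustration; the bots' real attributes may differ.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, Iterable, List, Optional, Union

ChannelId = Union[int, str]  # Discord IDs are ints, Slack IDs are strings


@dataclass
class RoutedTopic:
    name: str
    channel_ids: List[ChannelId] = field(default_factory=list)


def build_channel_index(topics: Iterable[RoutedTopic]) -> DefaultDict[ChannelId, List[RoutedTopic]]:
    """Map every channel ID to the topics configured for it, in config order."""
    index: DefaultDict[ChannelId, List[RoutedTopic]] = defaultdict(list)
    for topic in topics:
        for channel_id in topic.channel_ids:
            index[channel_id].append(topic)
    return index


def identify_topic(index: DefaultDict[ChannelId, List[RoutedTopic]],
                   channel_id: ChannelId) -> Optional[RoutedTopic]:
    """Return the first topic for a channel, or None if the channel is unknown."""
    topics = index.get(channel_id)  # .get avoids creating empty entries in the defaultdict
    return topics[0] if topics else None
```

Because only the first entry is returned, the order of topics in the config decides which one wins when a channel is listed twice.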

- **EmbeddingsFactory** (`src/innieme/embeddings_factory.py`): Creates embedding instances (OpenAI, HuggingFace, or Fake)
- **VectorStoreFactory** (`src/innieme/vector_store_factory.py`): Creates vector store instances (Chroma, FAISS)
### Thread tracking

### Configuration System
`Topic.active_threads` is a set of thread IDs the bot is actively following. When a user mentions the bot, the thread ID is added to this set. Subsequent messages in that thread are then answered automatically without needing another mention.
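
The follow-the-thread rule can be sketched as a small decision function. `ThreadTracker` and `should_respond` are hypothetical names; only the `active_threads` set comes from the description above.

```python
from typing import Set, Union

ThreadId = Union[int, str]  # Discord thread IDs are ints, Slack uses timestamp strings


class ThreadTracker:
    """Tracks which threads the bot is following (illustrative stand-in for Topic)."""

    def __init__(self) -> None:
        self.active_threads: Set[ThreadId] = set()

    def should_respond(self, thread_id: ThreadId, mentions_bot: bool) -> bool:
        if mentions_bot:
            # A mention starts (or continues) following this thread.
            self.active_threads.add(thread_id)
            return True
        # Without a mention, answer only in threads that are already followed.
        return thread_id in self.active_threads
```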

The bot uses YAML configuration (`config.yaml`) with the following structure:
- Multiple "outies" (administrators) can be defined
- Each outie can have multiple topics
- Each topic has its own role/system prompt, document directory, and Discord channels
- Configuration is loaded via `DiscordBotConfig` class
### Embedding model selection

### Bot Behavior
Configured via `embedding_model` in the YAML. Options: `openai`, `huggingface`, `fake`. Use `fake` in tests to avoid real API calls. The vector store backend is **hardcoded to Chroma** in `innie.py` (`FAISSVectorStoreFactory` is present but commented out).

- Bot responds when mentioned in Discord channels
- Creates threaded conversations for each interaction
- Follows threads where it was mentioned or initially responded
- Supports admin commands like "summary and file" and "please consult outie"
- Admin can approve summaries to add to the knowledge base
### KnowledgeManager (partial implementation)

## Key Implementation Details
`KnowledgeManager.generate_summary()` is a placeholder — it returns a static string, not an actual LLM summary. `store_summary()` saves the pending summary to `./data/summaries/` as JSON. The approval workflow (`!approve` / `/approve`) exists but thread tracking for approval is not fully wired in the Slack bot.

### Thread Management
- New mentions create threads automatically
- Bot tracks active threads in `Topic.active_threads` set
- Thread context is preserved for conversation continuity
### Response length limits

### Document Processing
- Documents are vectorized using configurable embedding models
- Vector stores support both Chroma and FAISS backends
- Document search provides context for LLM responses
- Discord: 2000 chars — overflow sent as `response.txt` file attachment.
- Slack: 4000 chars — overflow uploaded via `files_upload`.
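
The overflow rule can be sketched as a pure function that decides what to send. The limits come from the list above; `plan_response`, the notice text, and the assumption that a message exactly at the limit is still sent inline are illustrative, not taken from the bot code.

```python
from typing import Optional, Tuple

# Per-platform inline message limits, as documented above.
LIMITS = {"discord": 2000, "slack": 4000}


def plan_response(platform: str, text: str) -> Tuple[str, Optional[bytes]]:
    """Return (inline_message, attachment_bytes); the attachment is None when the text fits."""
    limit = LIMITS[platform]
    if len(text) <= limit:
        return text, None
    # Too long: post a short notice inline and ship the full text as a file.
    notice = "Response is too long to post inline; see the attached file."
    return notice, text.encode("utf-8")
```

The caller would then hand the attachment to the platform's own upload mechanism, for example as `response.txt` on Discord.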

### Response Generation
- Uses OpenAI GPT-3.5-turbo model by default
- Combines document context with conversation history
- Handles responses longer than Discord's 2000 character limit by sending as files
## Configuration

### Error Handling
- Comprehensive logging with configurable levels (LOG_LEVEL, INNIEME_LOG_LEVEL environment variables)
- Graceful error messages sent to Discord users
- Exception re-raising for debugging purposes
Copy the example files and fill in credentials:

## Testing Configuration
```bash
cp config.example.yaml config.yaml # Discord
cp slack_config.example.yaml slack_config.yaml # Slack
```

Tests are configured for async support with `pytest.ini` settings:
- `asyncio_mode = strict`
- Various warning filters for dependencies (faiss, pydantic, numpy, etc.)
The `role` field in each topic is the LLM system prompt. Each topic has its own `docs_dir` and set of channels.

## Environment Variables

- `LOG_LEVEL`: Global logging level (default: INFO)
- `INNIEME_LOG_LEVEL`: Package-specific logging level (default: INFO)
- `LOG_LEVEL`: Root logger level (default: `INFO`)
- `INNIEME_LOG_LEVEL`: `innieme` package logger level (default: `INFO`)
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -19,6 +19,8 @@ dev = [

[project.scripts]
innieme_bot = "innieme.cli.run_bot:main"
innieme_slack_bot = "innieme.cli.run_slack_bot:main"
innieme = "innieme.cli.run_unified_bot:main"

[tool.setuptools]
py-modules = []
3 changes: 2 additions & 1 deletion requirements.in
@@ -1,5 +1,6 @@
# Discord Integration
# Platform Integrations
discord.py
slack-bolt

# Environment Configuration
python-dotenv
4 changes: 4 additions & 0 deletions requirements.txt
@@ -376,6 +376,10 @@ six==1.17.0
# kubernetes
# posthog
# python-dateutil
slack-bolt==1.23.0
# via -r requirements.in
slack-sdk==3.36.0
# via slack-bolt
sniffio==1.3.1
# via
# anyio
53 changes: 53 additions & 0 deletions slack_config.example.yaml
@@ -0,0 +1,53 @@
# Slack Bot Configuration Example
# Copy this to slack_config.yaml and fill in your actual values

# Slack Bot Token (starts with xoxb-)
# Get this from your Slack app's OAuth & Permissions page
slack_bot_token: "xoxb-your-bot-token-here"

# Slack App Token (starts with xapp-)
# Get this from your Slack app's Basic Information page (App-Level Tokens section)
# Required for Socket Mode
slack_app_token: "xapp-your-app-token-here"

# OpenAI API Key
openai_api_key: "your-openai-api-key"

# Embedding model to use: openai, huggingface, or fake
embedding_model: "openai"

# Bot administrators and their topics
outies:
  - outie_id: "U1234567890"  # Slack User ID (starts with U)
    topics:
      - name: "math"
        role: "You are a helpful math tutor. Answer math questions clearly and provide step-by-step explanations."
        docs_dir: "./docs/math"  # Directory containing math documents
        channels:
          - channel_id: "C1234567890"  # Slack Channel ID (starts with C)

      - name: "science"
        role: "You are a knowledgeable science teacher. Explain scientific concepts in an accessible way."
        docs_dir: "./docs/science"
        channels:
          - channel_id: "C0987654321"

  - outie_id: "U0987654321"  # Another administrator
    topics:
      - name: "support"
        role: "You are a helpful customer support agent. Be friendly and solution-oriented."
        docs_dir: "./docs/support"
        channels:
          - channel_id: "C5555555555"

# To get Slack IDs:
# - User IDs: Right-click on a user in Slack → Copy → Member ID
# - Channel IDs: Right-click on a channel → Copy → Channel ID
#
# To set up your Slack app:
# 1. Go to https://api.slack.com/apps
# 2. Create a new app "From scratch"
# 3. Add Bot Token Scopes: app_mentions:read, channels:history, channels:read, chat:write, files:write, im:history, im:read, users:read
# 4. Enable Socket Mode and create an App-Level Token with connections:write scope
# 5. Install the app to your workspace
# 6. Invite the bot to your channels: /invite @your-bot-name
37 changes: 37 additions & 0 deletions src/innieme/cli/run_slack_bot.py
@@ -0,0 +1,37 @@
from innieme.slack_bot import SlackBot
from innieme.slack_bot_config import SlackBotConfig

import logging
import os

# Configure root logger with LOG_LEVEL
log_level_name = os.environ.get("LOG_LEVEL", "INFO")
log_level = getattr(logging, log_level_name.upper(), logging.INFO)

logging.basicConfig(
    level=log_level,
    format='%(asctime)s %(levelname)-8s %(name)s %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S'
)

# Configure innieme package logger with INNIEME_LOG_LEVEL
innieme_log_level_name = os.environ.get("INNIEME_LOG_LEVEL", "INFO")
innieme_log_level = getattr(logging, innieme_log_level_name.upper(), logging.INFO)
innieme_logger = logging.getLogger("innieme")
innieme_logger.setLevel(innieme_log_level)

# Load the bot configuration from slack_config.yaml in the current working directory
current_dir = os.getcwd()
yaml_path = os.path.join(current_dir, 'slack_config.yaml')
with open(yaml_path, "r") as yaml_file:
    yaml_content = yaml_file.read()
config = SlackBotConfig.from_yaml(yaml_content)
print(f"Loaded config from {yaml_path}")

def main():
    # Create and run the bot
    bot = SlackBot(config)
    bot.run()

if __name__ == "__main__":
main()