Merged
4 changes: 4 additions & 0 deletions .env.example
@@ -72,6 +72,10 @@ OPENROUTER_API_KEY=
# Web search
BRAVE_API_KEY=

# Paperclip — biomedical paper search (8M+ papers from bioRxiv, medRxiv, PMC, arXiv)
# Get API key at https://paperclip.gxl.ai
# PAPERCLIP_API_KEY=

# GitHub integration
GITHUB_TOKEN=

57 changes: 57 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,62 @@
# Changelog

## v0.6.0

MCP server fixes, per-server mode configuration, @ mention system for referencing resources, parallel file inspection tool, conversation title generation improvements, and security hardening.

### MCP Server Fixes
- **Celery worker MCP loading** -- MCP servers are now loaded in the Celery background worker path, matching the inline session path. Previously MCP tools were completely unavailable when using background jobs
- **Multi-server dispatch** -- Tools from multiple MCP servers now dispatch to their correct originating client. Previously `self._mcp_client` was overwritten by each server, causing tools from earlier servers to fail at execution time
- **Plan mode MCP access** -- MCP tools are no longer blocked by the plan-mode whitelist. Each tool is tracked with its own mode configuration and bypasses the built-in tool restrictions
- **Exception logging** -- `register_mcp_tools` now logs warnings on failure instead of silently swallowing exceptions with bare `except: pass`
- **Tool name collision logging** -- MCP tools that attempt to shadow built-in tool names are logged with a warning for security observability
- **Connection timeout** -- MCP server connections in the Celery worker are wrapped with a 30-second timeout to prevent hanging workers
- **Cleanup** -- MCP connections are properly disconnected in the Celery worker's finally block
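
The multi-server dispatch fix can be sketched as a per-tool client map. This is a minimal, hypothetical sketch (the real routing lives in the MCP manager and tool router; class and method names here are illustrative), assuming each MCP client exposes an async `call_tool`:

```python
import asyncio


class ToolDispatcher:
    """Routes each MCP tool to the client that registered it."""

    def __init__(self):
        # Per-tool mapping instead of a single self._mcp_client,
        # so a later server cannot clobber an earlier server's tools.
        self._clients_by_tool: dict[str, object] = {}

    def register(self, tool_name: str, client) -> None:
        self._clients_by_tool[tool_name] = client

    async def call(self, tool_name: str, args: dict):
        # Dispatch to the originating client for this specific tool.
        client = self._clients_by_tool[tool_name]
        return await client.call_tool(tool_name, args)
```

With a single shared client reference, the last-connected server would answer for every tool; the per-tool map keeps each tool bound to the server that provided it.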

### Per-Server Mode Configuration
- **Mode checkboxes** -- Each MCP server can be configured to be available in Plan mode, Execute mode, or both (default: both) via checkboxes in Settings > MCP Servers
- **Backend enforcement** -- The `modes` field is stored in the server config, passed through `MCPManager.connect_servers` to `register_mcp_tools`, and enforced by `ToolRouter.is_tool_allowed` per-tool
- **Status endpoint** -- `GET /api/mcp/status` now includes `modes` in each server's response
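
Per-tool mode enforcement reduces to a small membership check. A hedged sketch, assuming the real `ToolRouter.is_tool_allowed` consults a per-tool `modes` list that defaults to both modes:

```python
DEFAULT_MODES = ("plan", "execute")


def is_tool_allowed(tool_modes: dict[str, list[str]], tool_name: str, mode: str) -> bool:
    """Allow a tool only if the current mode is in its configured modes.

    Tools with no explicit configuration default to both modes,
    matching the checkbox default in Settings > MCP Servers.
    """
    allowed = tool_modes.get(tool_name, list(DEFAULT_MODES))
    return mode in allowed
```

Tracking modes per tool (rather than per whitelist) is what lets MCP tools bypass the built-in plan-mode restrictions described above.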

### @ Mention System
- **MentionPopover component** -- Type `@` in the chat input to open a dropdown showing MCP servers and workspace files. Supports directory browsing (typing `@code/` lists files in `code/`), keyboard navigation (arrows, Enter, Tab, Escape), and filtering by name
- **Mention chips** -- Active mentions are displayed as colored chips above the input area (blue for MCP servers, amber for files)
- **Resource references** -- Mentions are sent as lightweight structured references (`{type, value}`) alongside the message. The backend prepends reference hints that instruct the agent to use appropriate tools (`read`, `inspect_files`, MCP tools) to interact with the referenced resources
- **Mention model** -- New `Mention` Pydantic model with `type: Literal["server", "file"]` and `value: str` (max 1024 chars). Added `mentions` field to `MessageSend`
- **Input sanitization** -- Mention values are sanitized (backticks, newlines, control characters stripped, length capped at 256) before interpolation into prompt text to prevent LLM prompt injection

### inspect_files Tool
- **Parallel file reading** -- New `inspect_files` tool reads multiple files or directories concurrently via `asyncio.gather` and scores each file for relevance against a user query
- **Keyword relevance scoring** -- Files are scored by keyword overlap between their content and the query, sorted by relevance, and returned within a configurable token budget (100K chars default)
- **Directory expansion** -- Directory paths are expanded to their file listings; hidden files (dotfiles) are excluded
- **Safety limits** -- Max 50 files per call, 200 lines per file for scoring, 2MB file-size gate (large files skipped before reading), negative `max_files` clamped to 1
- **Security** -- Each child file in expanded directories is re-validated via `_validate_path` to catch symlinks escaping the workspace
- **Plan mode access** -- Added to the plan-mode allowlist for read-only context gathering
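
The core of `inspect_files` can be sketched as concurrent reads plus a naive keyword score. Names and scoring here are illustrative, not the tool's exact implementation (the real tool also expands directories, re-validates paths, and enforces the 2MB and per-file line limits):

```python
import asyncio
from pathlib import Path


async def inspect_files(paths: list[str], query: str, max_files: int = 50) -> list[tuple[str, int]]:
    """Read files concurrently and rank them by keyword overlap with the query."""
    max_files = max(1, max_files)  # clamp negative values, per the safety limits
    keywords = {w.lower() for w in query.split()}

    async def read_one(p: str) -> str:
        # Offload blocking file I/O; get_running_loop() avoids the
        # deprecated get_event_loop() noted in the hardening list.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, Path(p).read_text)

    contents = await asyncio.gather(*(read_one(p) for p in paths[:max_files]))
    scored = [
        (p, sum(1 for w in keywords if w in text.lower()))
        for p, text in zip(paths, contents)
    ]
    return sorted(scored, key=lambda x: x[1], reverse=True)
```

`asyncio.gather` gives the parallel fan-out; the keyword-overlap score is the simplest stand-in for the relevance ranking the changelog describes.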

### Conversation Title Generation
- **Deferred generation** -- Title generation no longer triggers after the 1st user message. It now triggers after the 3rd user message or on page refresh, whichever comes first
- **No re-updates** -- Once a title is set, it is not overwritten by subsequent triggers. A race-condition guard in `_auto_title` re-checks the current title from DB before persisting
- **Trigger guard** -- The `send_message` endpoint checks `conv.title == "New conversation"` before triggering, preventing redundant generation

### Security Hardening
- Symlink traversal protection in `inspect_files` -- each child entry in expanded directories is validated via `_validate_path` before reading
- File-size gate in `inspect_files` -- files over 2MB are skipped before `read_text()` to prevent OOM
- `asyncio.get_running_loop()` used instead of deprecated `get_event_loop()` in async contexts
- MCP tool name shadowing logged as a warning for security observability
- Mention values sanitized to strip backticks, newlines, and control characters before prompt interpolation
- `Mention.value` field constrained to max 1024 characters via Pydantic `Field`
- MCP connection timeout (30s) in Celery worker prevents indefinite worker stalls

### UI Fixes
- **Layout gap fix** -- Fixed 1px gap between the main content area and the right sidebar caused by `paddingRight` being 1px larger than the RightPanel's rendered width (`289px` → `288px`, `49px` → `48px`)
- **MCP live connection status** -- MCP server dots in the right panel now turn green when connected. Previously the status was hardcoded to `connected: false` in the REST endpoint. An `mcp_status` SSE event is now broadcast from both the session manager and Celery worker after `MCPManager.connect_servers()` succeeds, and the frontend handles it to update the dots in real time
- **Pre-existing lint fixes** -- Removed extraneous f-string prefix in `papers.py`, removed unused `AsyncMock` import in `test_tools_papers.py`

### Testing
- **34 new backend tests** -- MCP multi-client dispatch (9), inspect_files tool (12), mention enrichment (7), Mention model validation (3), title generation (3 from prior session)
- **Total: 915 backend + 223 frontend = 1,138 tests**
- All ruff checks pass, frontend eslint 0 errors

## v0.5.0

Project-scoped conversations, unified file workspace, Monaco code viewer, TODO approval flow, comprehensive agent guidance, and test infrastructure improvements.
3 changes: 2 additions & 1 deletion README.md
@@ -28,7 +28,8 @@
- **Background jobs** — Celery + Redis. Close the browser, come back later.
- **Multi-provider LLMs** — OpenAI, Anthropic, OpenRouter, plus local models (Ollama, LM Studio). Add custom providers with OpenAI SDK, Anthropic SDK, OpenRouter, or LiteLLM compatibility.
- **Model picker** — Browse models grouped by provider with logos, sorted by release date. Recently used models at the top. Fetches live from [models.dev](https://models.dev).
- **MCP servers** — Connect remote HTTP/HTTPS MCP servers with custom authentication (Bearer, API key, headers).
- **MCP servers** — Connect remote HTTP/HTTPS MCP servers with custom authentication (Bearer, API key, headers). Configure per-server mode availability (Plan, Execute, or both). Live connection status in the sidebar.
- **@ mentions** — Type `@` in the chat to reference MCP servers or workspace files/directories. The agent uses its tools to interact with the referenced resources.
- **Onboarding flow** — Guided setup when no LLM provider is configured.

## Quick Start
11 changes: 11 additions & 0 deletions backend/openmlr/models.py
@@ -70,12 +70,23 @@ class ConversationDetail(BaseModel):
# ---- Messaging ----


class Mention(BaseModel):
"""A resource reference from an @ mention in the chat input."""

type: Literal["server", "file"]
value: str = Field(
max_length=1024,
description="server name or workspace-relative file/directory path",
)


class MessageSend(BaseModel):
message: str
mode: Literal["plan", "execute"] | None = (
None # per-message mode; only plan or execute accepted
)
request_id: str | None = None # client-generated idempotency key
mentions: list[Mention] | None = None # @ mention references


class ApprovalRequest(BaseModel):
73 changes: 65 additions & 8 deletions backend/openmlr/routes/agent.py
@@ -2,6 +2,7 @@

import asyncio
import logging
import re
from typing import Annotated

from fastapi import APIRouter, Depends, HTTPException, Request
@@ -19,6 +20,12 @@

router = APIRouter(prefix="/api", tags=["agent"])

# Default conversation title — used as sentinel to detect untitled conversations
_DEFAULT_TITLE = "New conversation"

# Regex for stripping control characters from mention values (prompt injection defense)
_MENTION_SANITIZE_RE = re.compile(r"[`\x00-\x1f]")


def _sm(request: Request):
return request.app.state.session_manager
@@ -147,8 +154,8 @@ async def get_conversation(
conv = await _get_conv_or_404(db, uuid, user.id)
msgs = await ops.get_messages(db, conv.id)

# Re-generate title if still "New conversation" and has messages
if conv.title == "New conversation" and msgs:
# Re-generate title if still the default and has messages
if conv.title == _DEFAULT_TITLE and msgs:
msg_dicts = [_msg_dict(m) for m in msgs]
_task = asyncio.create_task(
_auto_title(_sm(request), _bus(request), db, conv.id, conv.uuid, msg_dicts)
@@ -420,7 +427,10 @@ async def send_message(
# If conversation has no model, use user's default
effective_model = conv.model or user_default_model

# Title generation after 1st and 3rd messages
# Enrich message with @ mention reference hints
enriched_message = _enrich_with_mentions(body.message, body.mentions)

# Title generation after 3rd user message (if not already titled)
user_count = (conv.user_message_count or 0) + 1

# If background jobs are enabled, use Celery
@@ -429,14 +439,14 @@
db=db,
conversation_id=conv.id,
user_id=user.id,
message=body.message,
message=enriched_message,
mode=body.mode,
model=effective_model,
uuid=conv.uuid,
)

# Title generation (still async in web process for now)
if user_count in (1, 3):
if user_count == 3 and conv.title == _DEFAULT_TITLE:
msg_dicts = await _load_messages(db, conv.id)
_task = asyncio.create_task(
_auto_title(sm, event_bus, db, conv.id, conv.uuid, msg_dicts)
@@ -445,7 +455,7 @@
return {"ok": True, "job_id": job.job_id if job else None, "background": True}

# Synchronous processing (original flow)
# Persist user message to DB
# Persist original message to DB (without enrichment clutter)
await ops.add_message(db, conv.id, "user", body.message)
await ops.increment_user_message_count(db, conv.id)

@@ -475,9 +485,9 @@
_wire_persistence(active, db, conv.id)
active._persist_wired = True

_task = asyncio.create_task(sm.process_message(conv.id, body.message, mode=body.mode))
_task = asyncio.create_task(sm.process_message(conv.id, enriched_message, mode=body.mode))

if user_count in (1, 3):
if user_count == 3 and conv.title == _DEFAULT_TITLE:
msg_dicts = await _load_messages(db, conv.id)
_task = asyncio.create_task(_auto_title(sm, event_bus, db, conv.id, conv.uuid, msg_dicts))

@@ -753,6 +763,46 @@ def _conv_dict(c) -> dict:
}


def _sanitize_mention_value(v: str) -> str:
"""Strip control characters and cap length to prevent prompt injection."""
v = v[:256]
return _MENTION_SANITIZE_RE.sub("", v)


def _enrich_with_mentions(message: str, mentions: list | None) -> str:
"""Prepend resource-reference hints for @ mentions.

Mentions are lightweight pointers — the agent is expected to use its
tools (``read``, ``inspect_files``, MCP tools) to interact with them.
"""
if not mentions:
return message

refs: list[str] = []
for m in mentions:
safe_value = _sanitize_mention_value(m.value)
if m.type == "file":
path = safe_value.rstrip("/")
if m.value.endswith("/"):
refs.append(
f"- Directory {path}/ — list its contents with read or use "
f"inspect_files to inspect relevant files."
)
else:
refs.append(f"- File {path} — use read to inspect this file.")
elif m.type == "server":
refs.append(f"- MCP Server {safe_value} — use tools provided by this server.")

if not refs:
return message

hint = (
"[The user referenced these resources — use the appropriate tools "
"to interact with them before responding:]\n" + "\n".join(refs)
)
return hint + "\n\n" + message


def _msg_dict(m) -> dict:
return {
"id": m.id,
@@ -801,6 +851,13 @@ async def _auto_title(sm, event_bus, db, conv_id, uuid, messages):
title = await LLMProvider.generate_title(messages, config)

if title:
# Re-check the current title to avoid overwriting a title
# that was already set by another trigger (e.g. page refresh).
current_conv = await ops.get_conversation_by_id(db, conv_id)
if current_conv and current_conv.title != _DEFAULT_TITLE:
logger.debug(f"Skipping title update for conv {conv_id}: already titled")
return

await ops.update_conversation_title(db, conv_id, title)
await event_bus.broadcast(
AgentEvent(event_type="conversation_updated", data={"uuid": uuid, "title": title})
1 change: 1 addition & 0 deletions backend/openmlr/routes/mcp.py
@@ -58,6 +58,7 @@ async def get_status(
"url": config.get("url", ""),
"enabled": config.get("enabled", True),
"connected": False, # Will be updated via SSE in real-time
"modes": config.get("modes", ["plan", "execute"]),
}
)

12 changes: 12 additions & 0 deletions backend/openmlr/routes/settings.py
@@ -62,6 +62,7 @@ async def update_setting(
"github_token": "GITHUB_TOKEN",
"semantic_scholar_api_key": "SEMANTIC_SCHOLAR_API_KEY",
"openalex_api_key": "OPENALEX_API_KEY",
"paperclip_api_key": "PAPERCLIP_API_KEY",
"modal_token_id": "MODAL_TOKEN_ID",
"modal_token_secret": "MODAL_TOKEN_SECRET",
"hf_token": "HF_TOKEN",
@@ -127,6 +128,7 @@ def _is_provider_configured(provider_id: str, provider_settings: dict) -> bool:
"github": "GITHUB_TOKEN",
"semantic_scholar": "SEMANTIC_SCHOLAR_API_KEY",
"openalex": "OPENALEX_API_KEY",
"paperclip": "PAPERCLIP_API_KEY",
"modal": "MODAL_TOKEN_ID",
"huggingface": "HF_TOKEN",
}
@@ -144,6 +146,7 @@ def _is_provider_configured(provider_id: str, provider_settings: dict) -> bool:
"github": "github_token",
"semantic_scholar": "semantic_scholar_api_key",
"openalex": "openalex_api_key",
"paperclip": "paperclip_api_key",
"modal": "modal_token_id",
"huggingface": "hf_token",
}.get(provider_id)
@@ -261,6 +264,14 @@ async def list_providers(
"categories": ["compute"],
"docs_url": "https://modal.com/docs",
},
{
"id": "paperclip",
"name": "Paperclip",
"key_env": "PAPERCLIP_API_KEY",
"configured": _is_provider_configured("paperclip", provider_settings),
"categories": ["papers"],
"docs_url": "https://paperclip.gxl.ai/docs",
},
{
"id": "huggingface",
"name": "Hugging Face",
@@ -768,6 +779,7 @@ async def save_config(
"GITHUB_TOKEN",
"SEMANTIC_SCHOLAR_API_KEY",
"OPENALEX_API_KEY",
"PAPERCLIP_API_KEY",
"MODAL_TOKEN_ID",
"MODAL_TOKEN_SECRET",
"HF_TOKEN",
7 changes: 7 additions & 0 deletions backend/openmlr/services/session_manager.py
@@ -191,6 +191,13 @@ async def get_or_create_session(
)
if count > 0:
log.info(f"Session {conversation_id}: loaded {count} MCP tools")
# Broadcast live connection status to the frontend
await self.event_bus.broadcast(
AgentEvent(
event_type="mcp_status",
data={"servers": mcp_manager.get_server_statuses()},
)
)
except Exception as e:
log.warning(f"Session {conversation_id}: failed to load MCP servers - {e}")

Expand Down
37 changes: 37 additions & 0 deletions backend/openmlr/tasks/agent_tasks.py
@@ -128,6 +128,9 @@ async def _async_process_message(
sandbox_manager = SandboxManager()
tool_router = create_tool_router(sandbox_manager)

# Track MCP manager for cleanup in finally block
mcp_manager = None

# Resolve project workspace for workspace tools and local tools
async with worker_session() as db:
try:
@@ -148,6 +151,33 @@
except Exception as e:
logger.warning(f"Worker job {job_id}: failed to resolve project workspace - {e}")

# Load MCP servers from user settings (with timeout to avoid stalling the worker)
async with worker_session() as db:
try:
from ..tools.mcp import MCPManager

user_settings = await ops.get_all_settings(db, user_id, category="mcp")
mcp_servers = user_settings.get("mcp", {}).get("servers", {})
if mcp_servers:
mcp_manager = MCPManager()
count = await asyncio.wait_for(
mcp_manager.connect_servers(mcp_servers, tool_router, blocklist=set()),
timeout=30.0,
)
if count > 0:
logger.info(f"Worker job {job_id}: loaded {count} MCP tools")
# Broadcast live connection status to frontend
await publish_event(
AgentEvent(
event_type="mcp_status",
data={"servers": mcp_manager.get_server_statuses()},
)
)
except TimeoutError:
logger.warning(f"Worker job {job_id}: MCP server connection timed out")
except Exception as e:
logger.warning(f"Worker job {job_id}: failed to load MCP servers - {e}")

# Build and set system prompt
session.context_manager.system_prompt = build_system_prompt(
tool_specs=tool_router.get_raw_specs(),
@@ -264,6 +294,13 @@ async def _poll_interrupt():
except Exception:
pass

# Disconnect MCP servers
if mcp_manager:
try:
await mcp_manager.disconnect_all()
except Exception:
pass

# Clear any lingering interrupt key
try:
from ..services.redis_pubsub import clear_interrupt