An AI-powered research assistant that can find, analyze, and synthesize academic papers, build knowledge over time, and provide intelligent insights. This agent uses LangGraph for workflow orchestration, arXiv for paper discovery, and Mem0 for persistent knowledge management.
- Paper Discovery: Search arXiv for relevant academic papers by topic or keywords
- Paper Analysis: Download and analyze paper content, extracting key insights
- Knowledge Synthesis: Combine findings from multiple papers into comprehensive summaries
- Citation Management: Track and reference analyzed papers
- Persistent Memory: Build a knowledge graph of learned concepts using Mem0
- Knowledge Querying: Ask questions about previously learned topics
- Cross-Session Learning: Knowledge persists across conversations and sessions
- Contextual Retrieval: Find related information from the knowledge base
- Intent Detection: Automatically determines if you want research, knowledge queries, or analysis
- Dynamic Planning: Creates step-by-step execution plans based on your request
- Real-time Progress: Streaming interface shows progress through research workflow
- Error Handling: Graceful degradation and informative error messages
- Chat Interface: Modern, responsive web interface for conversations
- Markdown Support: Rich text formatting for better readability
- Progress Tracking: Visual progress bars showing research workflow steps
- Streaming Responses: Real-time updates as the agent works
- Phoenix Integration: Full tracing and monitoring of agent workflows
- Request Tracking: Monitor performance and debug issues
- OpenTelemetry: Comprehensive observability for all components
The agent uses a modular LangGraph-based architecture:
```
User Request → Intent Detection → Planning → Execution → Response Generation
                                                ↓
                             ┌─ Research Execution
                             │    ├─ Topic Extraction
                             │    ├─ Paper Search (arXiv)
                             │    ├─ Paper Analysis
                             │    └─ Knowledge Storage
                             │
                             └─ Knowledge Query
                                  ├─ Knowledge Search
                                  ├─ Information Retrieval
                                  └─ Response Formulation
```
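The branch after intent detection can be sketched in plain Python. This is a toy stand-in for the LLM-backed nodes: the keyword matching and handler names here are hypothetical illustrations, not the project's actual API.

```python
# Toy sketch of intent-based routing; the real agent uses LLM-backed
# LangGraph nodes, not keyword matching. All names are illustrative.
from typing import Callable, Dict

def detect_intent(request: str) -> str:
    """Keyword-based stand-in for the LLM-powered IntentDetectionNode."""
    text = request.lower()
    if any(phrase in text for phrase in ("find papers", "research")):
        return "research"
    if any(phrase in text for phrase in ("what have i learned", "know about")):
        return "knowledge_query"
    return "general"

def run_research(request: str) -> str:
    return f"research pipeline handles: {request}"

def run_knowledge_query(request: str) -> str:
    return f"knowledge base answers: {request}"

# Map each detected intent to its execution branch.
ROUTES: Dict[str, Callable[[str], str]] = {
    "research": run_research,
    "knowledge_query": run_knowledge_query,
    "general": lambda r: f"general chat reply to: {r}",
}

def handle(request: str) -> str:
    return ROUTES[detect_intent(request)](request)
```

The same dispatch-table shape is what LangGraph's conditional edges express declaratively.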
- LangGraph Agent: Orchestrates the research workflow with multiple specialized nodes
- arXiv Client: Searches and downloads academic papers from arXiv
- Knowledge Graph: Mem0-powered persistent memory for long-term learning
- Prompts System: Centralized prompt management for all AI interactions
- Demo Interface: Flask-powered web UI with streaming capabilities
- FastAPI Server: High-performance API server with OpenTelemetry tracing
- Docker and Docker Compose
- Python 3.12+ with pyenv
- OpenAI API key
- Internet connection for arXiv access
1. Clone and Setup Environment

   ```bash
   git clone <repository-url>
   cd ResearchLearner
   ./bin/bootstrap.sh
   source .venv/bin/activate
   ```

2. Configure Environment Variables

   Create a `.env` file in the project root:

   ```bash
   OPENAI_API_KEY="your-openai-api-key"
   OPENAI_MODEL="gpt-4o"
   OPENAI_TEMPERATURE=0.1
   FASTAPI_URL="http://fastapi:8000"
   PHOENIX_COLLECTOR_ENDPOINT="http://phoenix:6006/v1/traces"
   ```

3. Run the Application

   ```bash
   ./bin/run_agent.sh --build
   ```

4. Access the Interfaces

   - Demo Chat: http://localhost:8080
   - Phoenix Dashboard: http://localhost:6006
Ask the agent to research any academic topic:
- "Find papers about transformer architectures"
- "Research quantum computing applications in machine learning"
- "What are the latest developments in JEPA models?"
The agent will:
- Extract research keywords
- Search arXiv for relevant papers
- Download and analyze top papers
- Store findings in the knowledge graph
- Provide a comprehensive synthesis
Query previously learned information:
- "What have I learned about transformers?"
- "Summarize what you know about neural networks"
- "Tell me about the papers on reinforcement learning"
Analyze specific papers:
- "Analyze the paper arxiv:2024.12345"
- "What are the key insights from the BERT paper?"
```bash
OPENAI_API_KEY="your-api-key"
OPENAI_MODEL="gpt-4o"          # or gpt-4o-mini, gpt-3.5-turbo
OPENAI_TEMPERATURE=0.1         # 0.0-1.0, lower = more focused
FASTAPI_URL="http://fastapi:8000"
PHOENIX_COLLECTOR_ENDPOINT="http://phoenix:6006/v1/traces"
```

```
agent/
├── agent.py               # Main agent orchestrator
├── langgraph_agent.py     # LangGraph workflow implementation
├── arxiv_client.py        # arXiv paper search and download
├── knowledge_graph.py     # Mem0 knowledge management
├── prompts.py             # Centralized prompt templates
├── schema.py              # Request/response models
├── server.py              # FastAPI application server
├── caching.py             # LRU cache for conversations
└── demo_code/             # Web demo interface
    ├── demo_server.py     # Flask demo server
    ├── templates/         # HTML templates
    └── static/            # CSS, JavaScript, assets
```
The agent uses a sophisticated LangGraph workflow with multiple specialized nodes:
- IntentDetectionNode: Determines user intent (research, knowledge_query, analysis, general)
- PlanningNode: Creates step-by-step execution plans
- ResearchExecutionNode: Handles paper search, analysis, and synthesis
- KnowledgeQueryNode: Retrieves and formulates responses from knowledge base
- ResponseGenerationNode: Creates final user-facing responses
- Paper Search: Advanced arXiv queries with category filtering
- Content Analysis: Full paper download and AI-powered analysis
- Knowledge Storage: Automatic storage of insights in persistent memory
- Cross-Reference: Links related papers and concepts
- Mem0 Integration: Vector-based knowledge storage with semantic search
- Persistent Learning: Knowledge survives across sessions and conversations
- Contextual Retrieval: Smart retrieval of relevant information
- Knowledge Synthesis: Combines multiple sources for comprehensive answers
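The retrieval idea can be illustrated with a toy in-memory store. Mem0 itself uses vector embeddings and semantic search; this keyword-overlap version is only a stand-in to show the add/search shape, and the class name is ours.

```python
# Toy stand-in for the Mem0-backed knowledge store. Real Mem0 ranks by
# embedding similarity; here we rank by keyword overlap for illustration.
from dataclasses import dataclass, field

@dataclass
class ToyKnowledgeStore:
    entries: list[tuple[str, dict]] = field(default_factory=list)

    def add(self, insight: str, metadata: dict) -> None:
        """Store an insight with arbitrary metadata (topic, paper ID, ...)."""
        self.entries.append((insight, metadata))

    def search(self, query: str, limit: int = 3) -> list[str]:
        """Return the insights sharing the most words with the query."""
        q = set(query.lower().split())
        scored = [(len(q & set(text.lower().split())), text)
                  for text, _ in self.entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for score, text in scored[:limit] if score > 0]
```

Swapping the overlap score for cosine similarity over embeddings gives the semantic-search behavior the real store provides.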
To add research sources beyond arXiv, extend the `SimpleResearchAgent` class:

```python
from typing import List

class SimpleResearchAgent:
    async def search_papers(self, query: str, sources: List[str] = ["arxiv"]):
        if "pubmed" in sources:
            # Add PubMed search logic
            ...
        if "semantic_scholar" in sources:
            # Add Semantic Scholar search logic
            ...
```

All prompts are centralized in `agent/prompts.py`. Add new prompts as properties:

```python
from langchain_core.prompts import ChatPromptTemplate

@property
def custom_analysis_prompt(self) -> ChatPromptTemplate:
    return ChatPromptTemplate.from_messages([
        ("system", "Your custom system prompt..."),
        ("human", "{user_input}"),
    ])
```

The knowledge graph can be extended to store custom metadata:

```python
from typing import Dict

def add_custom_insight(self, insight: str, metadata: Dict):
    self.memory.add(
        messages=[{"role": "user", "content": insight}],
        user_id=self.user_id,
        metadata=metadata,
    )
```

Access the Phoenix dashboard at http://localhost:6006 to:
- View request traces and spans
- Monitor agent workflow execution
- Debug performance issues
- Analyze usage patterns
View container logs:

```bash
docker logs researchlearner-fastapi-1   # Agent logs
docker logs researchlearner-demo-1      # Demo server logs
docker logs researchlearner-phoenix-1   # Phoenix logs
```

Check service health:

```bash
curl http://localhost:8080/api/health   # Demo server
curl http://localhost:8000/health       # FastAPI server
```

Phoenix Connection Error

- Ensure the Phoenix container is running: `docker ps`
- Check `PHOENIX_COLLECTOR_ENDPOINT` in the `.env` file
API Key Issues
- Verify OPENAI_API_KEY is valid and has sufficient credits
- Check OPENAI_MODEL is supported (gpt-4o, gpt-4o-mini, etc.)
arXiv Access Issues
- Ensure internet connectivity
- arXiv may rate limit requests; the client handles this automatically
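A rate-limit-tolerant client typically retries failed requests with exponential backoff, roughly as in this sketch. It is not the project's actual retry code; the function names are illustrative.

```python
# Illustrative exponential-backoff retry for rate-limited requests;
# not the project's actual implementation.
import time

def with_backoff(fetch, retries: int = 4, base_delay: float = 1.0,
                 sleep=time.sleep):
    """Retry `fetch()` on I/O failure, doubling the delay each attempt."""
    for attempt in range(retries):
        try:
            return fetch()
        except IOError:
            if attempt == retries - 1:
                raise  # out of retries: surface the error
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Injecting `sleep` as a parameter keeps the helper trivially testable without real waiting.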
Knowledge Graph Issues
- Check that the `~/.research_learner/knowledge_db` directory is writable
- Verify the Mem0 installation: `pip install mem0ai`
Container Build Issues
```bash
./bin/run_agent.sh --build   # Rebuild containers
docker system prune          # Clean up old containers
docker-compose logs          # View detailed logs
```

Demo UI Issues
- Hard refresh browser (Ctrl+F5 / Cmd+Shift+R)
- Check browser console for JavaScript errors
- Verify streaming is supported in your browser
Research Performance
- Adjust the `max_papers` parameter in research queries
- Use specific categories for arXiv searches
- Cache expensive operations
Knowledge Graph Performance
- Regular cleanup of old memories
- Optimize vector store configuration
- Use semantic search filters
API Performance
- Monitor OpenAI API usage and costs
- Implement request batching for multiple papers
- Use appropriate model selection (gpt-4o-mini for simpler tasks)
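The last two tips can be sketched together: route light tasks to a cheaper model and fan paper analyses out concurrently. The thresholds and helper names here are made up for illustration, not taken from the project.

```python
# Hedged sketch of the "model selection" and "request batching" tips.
# Task names and the >3-paper threshold are illustrative assumptions.
import asyncio

def pick_model(task: str, paper_count: int = 1) -> str:
    """Use gpt-4o-mini for light tasks, gpt-4o for heavy synthesis."""
    heavy = task in {"synthesis", "full_analysis"} or paper_count > 3
    return "gpt-4o" if heavy else "gpt-4o-mini"

async def analyze_all(papers, analyze):
    """Batch analyses concurrently instead of awaiting them one by one."""
    return await asyncio.gather(*(analyze(p) for p in papers))
```

Concurrent batching cuts wall-clock time roughly to that of the slowest single request, at the cost of burstier API usage.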
- LangGraph: Agent workflow orchestration
- OpenAI: LLM provider for analysis and synthesis
- Mem0: Knowledge graph and long-term memory
- arXiv: Academic paper search and download
- FastAPI: High-performance API server
- Phoenix: Observability and tracing
- Flask: Demo web server
- Bootstrap: UI styling
- JavaScript: Streaming chat interface
This research agent can be connected to Claude Desktop as an MCP (Model Context Protocol) server, allowing Claude to use the research capabilities as tools.
1. Install MCP Dependencies

   ```bash
   pip install mcp
   ```

2. Configure Claude Desktop

   Add the following to your Claude Desktop configuration file:

   - macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
   - Windows: `%APPDATA%/Claude/claude_desktop_config.json`

   ```json
   {
     "mcpServers": {
       "research-agent": {
         "command": "python",
         "args": ["/path/to/ResearchLearner/mcp_server.py"],
         "env": {
           "OPENAI_API_KEY": "your-openai-api-key",
           "OPENAI_MODEL": "gpt-4o",
           "OPENAI_TEMPERATURE": "0.1"
         }
       }
     }
   }
   ```

   Important: Update `/path/to/ResearchLearner/` with the actual path to your project directory.

3. Set Environment Variables

   Make sure your `.env` file contains:

   ```bash
   OPENAI_API_KEY="your-openai-api-key"
   OPENAI_MODEL="gpt-4o"
   OPENAI_TEMPERATURE=0.1
   ```

4. Restart Claude Desktop

   After saving the configuration, restart Claude Desktop to load the MCP server.
Once connected, Claude will have access to these research tools:
Research a specific topic using arXiv papers and knowledge graph
- Input: `topic` (string), `max_papers` (integer, optional)
- Example: "Research papers about transformer architectures"
Query the existing knowledge graph for information
- Input: `query` (string), `limit` (integer, optional)
- Example: "What do I know about neural networks?"
Analyze a specific arXiv paper by ID
- Input: `paper_id` (string)
- Example: "Analyze paper 2301.12345"
Get a comprehensive knowledge summary for a topic
- Input: `topic` (string)
- Example: "Get knowledge summary for machine learning"
Add a research insight to the knowledge graph
- Input: `insight` (string), `topic` (string), `context` (object, optional)
- Example: "Store this insight about attention mechanisms"
Once the MCP server is connected, you can ask Claude:
- "Use the research agent to find papers about quantum machine learning"
- "Query your knowledge base for what you know about transformers"
- "Analyze the latest JEPA paper and add insights to your knowledge"
- "Get a summary of all the research you've done on reinforcement learning"
Configuration Issues
- Verify the path to `mcp_server.py` is correct and absolute
- Ensure Python can find all dependencies (run from an activated virtual environment)
- Check that environment variables are properly set
Connection Problems
- Restart Claude Desktop after configuration changes
- Check Claude Desktop logs for MCP server errors
- Verify the MCP server runs independently: `python mcp_server.py`
Permission Issues
- Ensure the knowledge database directory `~/.research_learner/knowledge_db` is writable
- Verify the OpenAI API key has sufficient permissions and credits
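The writable-directory check above can be scripted as a quick sanity test. The path comes from this README; the helper name is ours.

```python
# Quick sanity check that the knowledge DB directory exists and is
# writable. Path taken from the troubleshooting note; name is ours.
import os

def check_writable(path: str) -> bool:
    expanded = os.path.expanduser(path)
    return os.path.isdir(expanded) and os.access(expanded, os.W_OK)

# e.g. check_writable("~/.research_learner/knowledge_db")
```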
The MCP server provides:
- Stateful Knowledge: Maintains persistent knowledge across Claude sessions
- Resource Access: Claude can read knowledge base and recent papers
- Full Integration: Access to all research agent capabilities through simple tool calls
- Error Handling: Graceful error handling with informative messages
This project is licensed under the MIT License - see the LICENSE file for details.