An AI-powered research tool that automatically generates comprehensive reports on any topic by intelligently searching the web and synthesizing information using large language models.
Deep Sage is a sophisticated research automation system that leverages multiple AI components to create detailed, well-structured reports. It combines web search capabilities with advanced language models to gather, analyze, and synthesize information on any given topic.
- Intelligent Planning: Automatically creates structured research plans with targeted sections
- Web Research: Performs comprehensive web searches using Tavily API for up-to-date information
- Content Generation: Uses multiple LLM providers (Groq, OpenAI, Anthropic) for content creation
- Async Processing: Concurrent processing of multiple sections for improved performance
- Source Management: Automatic deduplication and citation of sources
- Multiple Formats: Generates both Markdown and PDF outputs
- Token Tracking: Comprehensive cost tracking across all LLM operations
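Source deduplication typically keeps the first occurrence of each URL while preserving order. A minimal sketch of that idea (the function name and source-dict shape are illustrative, not Deep Sage's actual internals):

```python
def deduplicate_sources(sources: list[dict]) -> list[dict]:
    """Keep the first occurrence of each URL, preserving order."""
    seen: set[str] = set()
    unique = []
    for source in sources:
        url = source.get("url", "")
        if url and url not in seen:
            seen.add(url)
            unique.append(source)
    return unique

sources = [
    {"title": "A", "url": "https://example.com/a"},
    {"title": "A (dup)", "url": "https://example.com/a"},
    {"title": "B", "url": "https://example.com/b"},
]
print(deduplicate_sources(sources))  # two unique sources remain
```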
The system follows a modular pipeline architecture with four main components:
- Planner: Creates research sections and determines which require web research
- Sections Writer: Generates content for research-based sections using web search
- Final Writer: Creates introduction/conclusion sections and report title
- Finalizer: Merges sources and assembles the final report
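The four stages above can be sketched as an async pipeline in which the Sections Writer runs sections concurrently (stage names mirror the components; the bodies are stubs, not the real implementation):

```python
import asyncio

async def write_section(name: str) -> str:
    # Stand-in for web search + LLM content generation per section
    await asyncio.sleep(0)
    return f"## {name}\n(content)"

async def run_pipeline(topic: str) -> str:
    sections = ["Background", "Current State", "Outlook"]  # Planner
    # Sections Writer: all research sections processed concurrently
    bodies = await asyncio.gather(*(write_section(s) for s in sections))
    intro, conclusion = "## Introduction", "## Conclusion"  # Final Writer
    # Finalizer: assemble the report in order
    return "\n\n".join([f"# {topic}", intro, *bodies, conclusion])

report = asyncio.run(run_pipeline("Example Topic"))
print(report.splitlines()[0])  # "# Example Topic"
```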
- Python 3.13 or higher
- UV package manager (recommended) or pip
Using UV (recommended):

```bash
git clone https://github.com/bgunyel/deep-sage.git
cd deep-sage
uv sync
```

Using pip:

```bash
git clone https://github.com/bgunyel/deep-sage.git
cd deep-sage
pip install -e .
```

Create a `.env` file in the project root with your API keys:
```bash
GROQ_API_KEY=your_groq_api_key
OPENAI_API_KEY=your_openai_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
TAVILY_API_KEY=your_tavily_api_key
LANGSMITH_API_KEY=your_langsmith_api_key  # Optional, for tracing
LANGSMITH_TRACING=true                    # Optional, for tracing
```

```python
from deep_sage import Researcher

# Configure LLM settings
llm_config = {
    'language_model': {
        'model': 'llama-3.3-70b-versatile',
        'model_provider': 'groq',
        'api_key': 'your_groq_api_key',
        'model_args': {
            'temperature': 0,
            'max_retries': 5,
            'max_tokens': 32768,
        }
    },
    'reasoning_model': {
        'model': 'deepseek-r1-distill-llama-70b',
        'model_provider': 'groq',
        'api_key': 'your_groq_api_key',
        'model_args': {
            'temperature': 0,
            'max_retries': 5,
            'max_tokens': 32768,
        }
    }
}

# Create researcher instance
researcher = Researcher(
    llm_config=llm_config,
    web_search_api_key='your_tavily_api_key'
)

# Generate report
result = researcher.run(
    topic="Impact of artificial intelligence on healthcare",
    config={
        "configurable": {
            "max_iterations": 3,
            "max_results_per_query": 5,
            "number_of_queries": 3,
            "search_category": "general"
        }
    }
)

print(result['content'])  # Markdown report
print(f"Report title: {result['report_title']}")
```

For quick testing, use the included development script:
```bash
python src/main_dev.py
```

This will generate a report and save both Markdown and PDF versions to the `out/` directory with timestamped filenames.
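The timestamped naming can be sketched as follows (the exact prefix and timestamp format are assumptions; check `src/main_dev.py` for the actual convention):

```python
from datetime import datetime

# Assumed naming scheme for files written to out/ (illustrative only)
stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
md_name = f"report_{stamp}.md"
pdf_name = f"report_{stamp}.pdf"
print(md_name, pdf_name)
```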
| Parameter | Description | Default |
|---|---|---|
| `max_iterations` | Maximum research iterations | 3 |
| `max_results_per_query` | Results per search query | 5 |
| `max_tokens_per_source` | Token limit per source | 5000 |
| `number_of_queries` | Search queries to generate | 3 |
| `search_category` | Tavily search category | `"general"` |
| `strip_thinking_tokens` | Remove reasoning tokens | `true` |
- Groq: Fast inference with Llama and other models
- OpenAI: GPT-4 and other OpenAI models
- Anthropic: Claude models
Generated reports include:
- Title: AI-generated report title
- Introduction: Overview and context
- Research Sections: Data-driven content with sources
- Conclusion: Synthesis and key takeaways
- Citations: Automatically formatted numbered source list with clickable links
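A numbered source list with clickable links can be rendered in Markdown roughly like this (a hypothetical sketch; the formatting Deep Sage actually emits may differ):

```python
def format_citations(sources: list[dict]) -> str:
    """Render sources as a numbered Markdown list of links."""
    lines = ["## Sources"]
    for i, src in enumerate(sources, start=1):
        lines.append(f"{i}. [{src['title']}]({src['url']})")
    return "\n".join(lines)

print(format_citations([
    {"title": "Example Source", "url": "https://example.com"},
]))
# ## Sources
# 1. [Example Source](https://example.com)
```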
- `langchain`: LLM framework and integrations
- `langgraph`: Workflow orchestration
- `tavily-python`: Web search API
- `pydantic`: Data validation and settings
- `md2pdf`: PDF generation
- `ai-common`: Shared utilities (custom package)
- `summary-writer`: Content generation (custom package)
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
Bertan Günyel
- Email: bertan.gunyel@gmail.com
- GitHub: @bgunyel
- Built with LangChain and LangGraph frameworks
- Uses Tavily for web search capabilities
- Supports multiple LLM providers for flexibility