BasicDeepSearch - Autonomous Research Agent

The BasicDeepSearch application provides autonomous, multi-stage research capabilities that go beyond simple search to deliver comprehensive, synthesized research reports with citations.

Overview

BasicDeepSearch implements a state-of-the-art four-stage research pipeline:

Planning: Decomposes complex research queries into structured sub-tasks
Question Development: Generates specific, diverse search queries for each sub-task
Web Exploration: Executes parallel web searches and gathers evidence
Report Generation: Synthesizes findings into structured reports with citations

Key Features

Autonomous Research: Minimal human intervention required
Reflexive Research: Analyzes limitations and performs targeted refinement searches
Parallel Execution: Multiple web searches run simultaneously for speed
Structured Output: Professional reports with citations and analysis
Verification Layer: Optional fact-checking and validation
Multiple Formats: Structured, narrative, or executive report styles
Configurable Depth: Brief, standard, or comprehensive research modes

Installation

BasicDeepSearch relies on the built-in web tools (web_search, fetch_url).

pip install "abstractcore[tools]"
# plus any provider extras you use, e.g.:
# pip install "abstractcore[openai]"

CLI Usage

Basic Usage

# Simple research query
deepsearch "What are the latest developments in quantum computing?"

# Research with specific focus areas
deepsearch "AI impact on healthcare" --focus "diagnosis,treatment,ethics"

# Comprehensive research with custom output
deepsearch "sustainable energy 2025" --depth comprehensive --format executive --output report.json

Advanced Options

# High-volume research with custom LLM
deepsearch "blockchain technology trends" \
  --max-sources 25 \
  --provider openai \
  --model gpt-4o-mini \
  --verbose

# Fast research without verification
deepsearch "current market trends" \
  --depth brief \
  --no-verification \
  --parallel-searches 10

# Reflexive mode - analyzes gaps and refines research automatically
deepsearch "quantum computing breakthroughs" --reflexive
deepsearch "AI safety research" --reflexive --max-reflexive-iterations 3

# Full-text extraction with reflexive improvement
deepsearch "climate change solutions" --full-text --reflexive

Reflexive Research Mode

Reflexive mode (--reflexive) enables adaptive, self-improving research that learns from its own limitations and iteratively refines the results.

How It Works

Standard Research: Executes the normal 4-stage pipeline
Gap Analysis: LLM analyzes the "Methodology & Limitations" section to identify specific information gaps
Targeted Refinement: Generates focused search queries to address identified gaps
Iterative Improvement: Repeats until no significant gaps remain or max iterations reached

Gap Types Identified

Missing Perspectives: Lack of expert opinions or alternative viewpoints
Insufficient Data: Areas where more quantitative information is needed
Outdated Information: When current findings may be superseded by recent developments
Technical Details: Missing technical specifications or implementation details
Recent Developments: Gaps in coverage of latest news or research

Example Reflexive Analysis

Initial Research: "quantum computing timeline"
├── Finds general information about quantum computing progress
├── Limitations: "Limited coverage of recent commercial developments"
└── Reflexive Gap Analysis:
    ├── Gap: "Missing industry expert predictions for 2025-2030"
    ├── Searches: ["quantum computing expert predictions 2025", "industry roadmap quantum timeline"]
    └── Result: Enhanced report with expert opinions and commercial timelines

Configuration

# Enable reflexive mode with default 2 iterations
deepsearch "AI safety research" --reflexive

# Custom iteration limit
deepsearch "climate solutions" --reflexive --max-reflexive-iterations 3

# Combine with other advanced features
deepsearch "quantum breakthroughs" --reflexive --full-text --max-sources 20

Python API Usage

Basic Research

from abstractcore.processing import BasicDeepSearch

# Initialize with default settings
searcher = BasicDeepSearch()

# Conduct research
report = searcher.research("What are the latest developments in quantum computing?")

# Access results
print(f"Title: {report.title}")
print(f"Summary: {report.executive_summary}")
print(f"Sources: {len(report.sources)}")

Advanced Configuration

from abstractcore import create_llm
from abstractcore.processing import BasicDeepSearch

# Custom LLM configuration
llm = create_llm("openai", model="gpt-4o-mini", max_tokens=32000)

# Reflexive research configuration
searcher = BasicDeepSearch(
    llm=llm,
    reflexive_mode=True,
    max_reflexive_iterations=3,
    full_text_extraction=True
)

# Conduct reflexive research
report = searcher.research(
    "What are the current challenges in AI safety research?",
    focus_areas=["alignment", "robustness", "interpretability"],
    output_format="structured"
)

print(f"Methodology: {report.methodology}")
print(f"Limitations: {report.limitations}")
print(f"Sources analyzed: {len(report.sources)}")

# Initialize with custom settings
searcher = BasicDeepSearch(
    llm=llm,
    max_parallel_searches=8,
    timeout=600
)

# Comprehensive research
report = searcher.research(
    query="Impact of AI on healthcare",
    focus_areas=["medical diagnosis", "drug discovery", "patient care"],
    max_sources=20,
    search_depth="comprehensive",
    include_verification=True,
    output_format="executive"
)

Research Depths

Brief (3 sub-tasks, ~5 minutes)

Quick overview and current state
Suitable for initial exploration
10-15 sources typically

Standard (5 sub-tasks, ~10 minutes)

Balanced depth and breadth
Good for most research needs
15-20 sources typically

Comprehensive (8 sub-tasks, ~20 minutes)

Deep analysis with multiple perspectives
Includes stakeholders, economics, technical aspects
20-30 sources typically

Output Formats

Structured (Default)

Professional research report format
Clear sections: Executive Summary, Key Findings, Analysis, Conclusions
Ideal for academic or business use

Executive

Concise, business-focused format
Emphasizes strategic insights and implications
Suitable for decision-makers

Narrative

Engaging, story-driven format
Shows connections between findings
Great for presentations and communication

Report Structure

All reports include:

Title: Descriptive report title
Executive Summary: 2-3 sentence overview
Key Findings: Bullet points of main discoveries
Detailed Analysis: Comprehensive synthesis (3-4 paragraphs)
Conclusions: Implications and recommendations
Sources: Complete list with URLs and relevance scores
Methodology: Research approach description
Limitations: Caveats and constraints

Configuration Options

Research Parameters

focus_areas: Specific areas to emphasize
max_sources: Number of sources to gather (1-100)
search_depth: Research thoroughness level
include_verification: Enable fact-checking
output_format: Report style

Performance Settings

max_parallel_searches: Concurrent web searches (1-20)
timeout: HTTP request timeout
max_tokens: LLM context window
max_output_tokens: LLM output limit

Best Practices

Query Formulation

Use specific, focused questions
Avoid overly broad topics
Include time constraints when relevant
Specify domain or context

Good Examples:

"What are the latest developments in quantum computing for drug discovery?"
"How is AI transforming medical diagnosis in 2024-2025?"
"What are the main challenges facing renewable energy adoption?"

Avoid:

"Tell me about AI" (too broad)
"What is quantum computing?" (basic definition)
"Everything about healthcare" (unfocused)

Focus Areas

Provide 3-5 specific focus areas for complex topics
Use domain-specific terminology
Balance breadth and depth

Example:

deepsearch "AI in education" --focus "personalized learning,assessment automation,teacher tools,student outcomes,ethical concerns"

Performance Optimization

Use brief depth for quick overviews
Increase parallel_searches for faster execution
Use cloud providers (OpenAI, Anthropic) for reliability
Enable verbose mode for progress tracking

Output Management

Save comprehensive reports to files (--output report.json)
Use markdown format for sharing (--output report.md)
Choose appropriate format for audience

Error Handling

The system includes robust error handling:

Network Issues: Automatic retries with exponential backoff
LLM Failures: Graceful degradation with fallback responses
Parsing Errors: Fallback to simplified report generation
Source Failures: Continues with available sources

Limitations

Source Quality: Limited to publicly available web content
Real-time Data: May not capture very recent developments
Language: Primarily English-language sources
Verification: Automated fact-checking has limitations
Bias: Inherits biases from web sources and LLM training

Integration Examples

Research Pipeline

# Multi-stage research workflow
topics = [
    "quantum computing applications",
    "AI safety developments", 
    "renewable energy innovations"
]

searcher = BasicDeepSearch()
reports = []

for topic in topics:
    report = searcher.research(
        topic,
        search_depth="standard",
        max_sources=15
    )
    reports.append(report)

# Analyze across reports
all_sources = []
for report in reports:
    all_sources.extend(report.sources)

print(f"Total unique sources: {len(set(s['url'] for s in all_sources))}")

Custom Analysis

# Extract specific insights
def extract_trends(report):
    trends = []
    for finding in report.key_findings:
        if any(word in finding.lower() for word in ['trend', 'growing', 'increasing', 'emerging']):
            trends.append(finding)
    return trends

report = searcher.research("AI market trends 2025")
trends = extract_trends(report)
print("Key trends identified:")
for trend in trends:
    print(f"- {trend}")

Troubleshooting

Common Issues

"Failed to initialize default Ollama model"

Install Ollama: https://ollama.com/
Pull model: ollama pull qwen3:4b-instruct-2507-q4_K_M
Or specify custom provider: --provider openai --model gpt-4o-mini

"No search results found"

Check internet connectivity
Try broader search terms
Reduce max_sources if hitting rate limits
Install ddgs for better web search: pip install ddgs

"Report generation failed"

Increase max_output_tokens
Use more capable model (e.g., gpt-4o-mini)
Reduce max_sources to avoid context overflow

"Timeout errors"

Increase --timeout value
Reduce parallel_searches
Use faster LLM provider

Performance Tips

Use local models (Ollama) for cost-effective research
Use cloud models (OpenAI, Anthropic) for reliability
Enable verbose mode to monitor progress
Save reports to files for large research projects
Use brief depth for quick iterations

FilesExpand file tree

basic-deepsearch.md

Latest commit

History