Perspicacité — AI-Powered Scientific Literature Research Assistant

Perspicacité (French for "insight") helps scientists, researchers, and students search, understand, and organize academic literature using AI grounded in real research papers.

Features

Multi-database search — Semantic Scholar, OpenAlex, PubMed, arXiv, HAL, DBLP, and more
Unified content pipeline — Retrieves structured full text (PMC JATS XML, arXiv HTML), PDFs, or abstracts with quality-based priority routing
5 RAG modes — From fast KB retrieval to multi-cycle agentic research and systematic literature surveys
Knowledge base management — Import from BibTeX, add papers by DOI, semantic search within your collections
MCP server — 8 tools exposed via Model Context Protocol for integration with AI agents (Mimosa, SmolAgents, etc.)
REST API — Full JSON API for chat, KB management, conversations, and literature surveys
Local-first: your data stays on your machine; the only network traffic is API calls to your LLM provider and to the literature sources being searched

Quick Start

Requirements

Python 3.12+
uv (recommended) or pip

Install

git clone <repository-url>
cd perspicacite_v2
uv sync --dev

Configure

cp .env.example .env
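# Edit .env: add at least one LLM API key
cp config.example.yml config.yml
# Edit config.yml to choose providers and models (see Configuration)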

Provider	Get a key at
DeepSeek (default)	platform.deepseek.com
OpenAI	platform.openai.com
Anthropic	console.anthropic.com

Run

uv run perspicacite -c config.yml serve

Open http://localhost:8000 in your browser.

The MCP server runs on the same port at /mcp. Use --no-mcp to disable it.

How to Use Perspicacité

Using the Web Interface

1. Choose Your Knowledge Base (or Don't!)

In the left sidebar, you'll see a "Knowledge Base" section:

"No KB (web search only)": searches online academic sources instead of a local collection
Your own KBs: searches only the papers you've added

To create a new Knowledge Base:

Click "+ Create new KB"
Enter a KB name and drag-and-drop a .bib file
Click "Create from BibTeX" to import papers

2. Ask a Question

Type your research question in the chat box. Examples:

"What are the effects of green tea extract on metabolism?"
"How is feature-based molecular networking used in metabolomics?"
"Compare transformer models to CNNs for medical imaging"

3. Choose a Mode

Select a mode from the dropdown. See the RAG Modes table below for details.

4. Review the Answer

Perspicacité will:

Show its "thinking process" (click to expand)
Search relevant papers
Filter and score them for relevance
Download full texts when possible
Generate an answer with citations

5. Save Interesting Papers

At the bottom of each response, you'll see papers found during research. Click "Add to KB" to save them to your knowledge base.

Building Your Knowledge Base

Method 1: Import from BibTeX

Export your references as BibTeX from Zotero, Mendeley, or EndNote
Click "+ Create new KB", drag your .bib file, enter a name
Click "Create from BibTeX"

Method 2: Add Papers from Search Results

When Perspicacité finds papers during research, click "Add to KB" on any paper.

Method 3: Add via MCP or API

# Via MCP tool
add_papers_to_kb(kb_name="my-kb", papers=[{"title": "...", "doi": "..."}])

# Via REST API
curl -X POST http://localhost:8000/api/kb/my-kb/papers \
  -H "Content-Type: application/json" \
  -d '[{"title": "...", "doi": "..."}]'
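If your references live in a .bib file but you want to add them through MCP or the REST API instead of the BibTeX importer, the entries first need to become a list of `{"title": ..., "doi": ...}` dicts like the one passed to `add_papers_to_kb` above. The sketch below is a deliberately minimal, regex-based extractor; the helper name `bibtex_to_papers` is hypothetical and not part of Perspicacité, and it only handles simple `field = {value}` and `field = "value"` forms (no nested braces), so a dedicated BibTeX parser is safer for messy files.

```python
import re

# Matches simple `field = {value}` or `field = "value"` pairs (no nested braces)
FIELD_RE = re.compile(r'(\w+)\s*=\s*(?:\{([^{}]*)\}|"([^"]*)")')


def bibtex_to_papers(bib_text: str) -> list[dict]:
    """Hypothetical helper: build a papers payload (title + DOI) from BibTeX text."""
    papers = []
    # Each entry starts with "@type{key," at the beginning of a line
    for entry in re.split(r"^\s*@", bib_text, flags=re.MULTILINE)[1:]:
        fields = {}
        for name, braced, quoted in FIELD_RE.findall(entry):
            fields[name.lower()] = braced if braced else quoted
        # Keep only the fields the payload uses; skip entries with neither
        paper = {k: fields[k].strip() for k in ("title", "doi") if k in fields}
        if paper:
            papers.append(paper)
    return papers


bib = """
@article{smith2024,
  title = {Green tea catechins and metabolism},
  doi = {10.1000/example.1},
  year = {2024}
}
@misc{noid,
  title = "A preprint without DOI"
}
"""
print(bibtex_to_papers(bib))
```

The DOI above is a placeholder. The resulting list is the shape used as the `papers` argument of `add_papers_to_kb` and as the body of `POST /api/kb/{name}/papers`.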

Tips for Best Results

Writing good questions:

Be specific: "What are the antioxidant properties of green tea catechins?" over "Tell me about tea"
Ask research-focused questions: Perspicacité summarizes existing literature; it doesn't write original content

Managing KBs:

Keep KBs focused — create separate ones for different projects
Start with 10-20 key papers, expand as needed
Pay attention to relevance scores — high-scoring papers are most useful

When to use each mode:

Basic: You have a well-curated KB and want quick answers
Advanced: Your KB might need broader search
Profound: Complex questions needing multiple perspectives
Agentic: Questions requiring web search beyond your KB
Literature Survey: Mapping a research field with AI-identified themes

RAG Modes

Mode	Description	Speed
Basic	Single-query retrieval from your KB with hybrid vector+BM25 search	Fast
Advanced	Query expansion, WRRF fusion scoring, reranking	Medium
Profound	Multi-cycle research (up to 3 iterations) with planning and self-evaluation	Slower
Agentic	Intent-based agent with tool use (web search, PDF download), up to 5 iterations	Variable
Literature Survey	Systematic field mapping: broad search, theme clustering, AI recommendations, paper selection	Slowest
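Advanced mode's fusion step merges several ranked lists, for example the BM25 and vector rankings from hybrid search or the results of expanded queries. Assuming WRRF here means weighted reciprocal rank fusion (the usual reading of the acronym), the sketch below shows the technique in general form; the constant `k = 60` and the per-list weights are illustrative assumptions, not Perspicacité's actual settings.

```python
from collections import defaultdict


def weighted_rrf(rankings: list[list[str]], weights: list[float], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs with weighted reciprocal rank fusion.

    Each document scores sum(w_i / (k + rank_i)) over the lists it appears in,
    with ranks starting at 1. k and the weights are assumed example values.
    """
    if len(rankings) != len(weights):
        raise ValueError("need one weight per ranking")
    scores: dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties broken by doc ID for a deterministic order
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25 = ["paperA", "paperB", "paperC"]
vector = ["paperC", "paperA", "paperD"]
for doc_id, score in weighted_rrf([bm25, vector], weights=[0.4, 0.6]):
    print(f"{doc_id}: {score:.5f}")
```

Here paperA ranks first because it sits near the top of both lists, even though paperC tops the more heavily weighted vector ranking; rewarding agreement between retrievers is what makes rank fusion a good fit for hybrid search.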

Content Retrieval Pipeline

Paper content is retrieved through a unified pipeline with quality-based priority:

1. Discovery — OpenAlex + Unpaywall → learn PMCID, arXiv ID, OA status, abstract
2. Structured full text — PMC JATS XML (sections + references) or arXiv HTML
3. PDF full text — OA PDF, arXiv PDF, Unpaywall, publisher APIs (Springer, Wiley, Elsevier, etc.)
4. Abstract only — from discovery metadata
5. Discard — papers with no retrievable content

Structured content (PMC, arXiv) provides sections and references. PDF content provides raw text via PyMuPDF. Papers behind paywalls with no OA version are served as abstracts.
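The quality-based priority above amounts to a fallback chain: try each source from richest to poorest and stop at the first one that yields content. The sketch below illustrates only that control flow; the `PaperContent` type, the fetcher functions, and the tier names are hypothetical stand-ins, not the real `retrieve_paper_content()` in `pipeline/download/unified.py`.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class PaperContent:
    """Hypothetical result type: which tier succeeded and what it produced."""
    doi: str
    tier: str
    text: str
    sections: list[str] = field(default_factory=list)


# A fetcher returns (text, sections) on success, or None when its source has nothing
Fetcher = Callable[[str], Optional[tuple[str, list[str]]]]


def retrieve_with_priority(doi: str, tiers: list[tuple[str, Fetcher]]) -> Optional[PaperContent]:
    """Try each tier in priority order and return the first hit; None means discard."""
    for tier_name, fetch in tiers:
        try:
            result = fetch(doi)
        except Exception:
            result = None  # a failing source (network or parse error) counts as a miss
        if result is not None:
            text, sections = result
            return PaperContent(doi=doi, tier=tier_name, text=text, sections=sections)
    return None


# Toy fetchers standing in for the real sources, in the README's priority order
def structured_fulltext(doi):  # PMC JATS XML or arXiv HTML
    return None


def pdf_fulltext(doi):  # OA, arXiv, or publisher PDF parsed to text
    raise ConnectionError("publisher API unreachable")


def abstract_only(doi):  # abstract from discovery metadata
    return ("Abstract text from discovery metadata.", [])


tiers = [("structured", structured_fulltext), ("pdf", pdf_fulltext), ("abstract", abstract_only)]
content = retrieve_with_priority("10.1000/example.1", tiers)
print(content.tier if content else "discarded")
```

With no structured text and a failing PDF source, this placeholder DOI falls through to the abstract tier; if every tier missed, the function would return None, matching the "Discard" step.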

MCP Server

Perspicacité exposes an MCP server with 8 tools, accessible via:

MCP protocol — native tool discovery and invocation
HTTP JSON-RPC — POST /mcp with {"method": "tools/call", "params": {"name": "...", "arguments": {...}}}

Tools

Tool	Description
`search_literature`	Search academic databases with year range and article-type filters
`get_paper_content`	Fetch full text + sections by DOI through the unified pipeline
`get_paper_references`	Extract cited references from a paper
`create_knowledge_base`	Create a new KB
`add_papers_to_kb`	Add papers with auto-download and indexing
`search_knowledge_base`	Semantic search within a KB
`list_knowledge_bases`	List all KBs with stats
`generate_report`	Synthesize a research report using RAG

Example: JSON-RPC Call

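import httpx

MCP_URL = "http://localhost:8000/mcp"
HEADERS = {"Content-Type": "application/json",
           "Accept": "application/json, text/event-stream"}

# Literature searches can be slow; httpx's default 5 s timeout is too short
with httpx.Client(headers=HEADERS, timeout=120.0) as client:
    # 1. Initialize a session; the server returns its ID in a response header
    r = client.post(MCP_URL, json={
        "jsonrpc": "2.0", "id": 1, "method": "initialize",
        "params": {"protocolVersion": "2024-11-05", "capabilities": {},
                   "clientInfo": {"name": "my-agent", "version": "1.0"}}
    })
    r.raise_for_status()
    client.headers["Mcp-Session-Id"] = r.headers["mcp-session-id"]

    # 2. Confirm initialization (a notification: no "id", no result expected)
    client.post(MCP_URL, json={"jsonrpc": "2.0",
                               "method": "notifications/initialized"}).raise_for_status()

    # 3. Call a tool
    r = client.post(MCP_URL, json={
        "jsonrpc": "2.0", "id": 2, "method": "tools/call",
        "params": {"name": "search_literature", "arguments": {"query": "flash attention"}}
    })
    r.raise_for_status()
    print(r.text)  # JSON-RPC result: plain JSON or an SSE "data:" event, depending on server mode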
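Because the request advertises both `application/json` and `text/event-stream` in `Accept`, an MCP server using the Streamable HTTP transport may answer a `tools/call` either with a plain JSON body or with an SSE stream whose `data:` lines carry the JSON-RPC message. The helper below is a hypothetical convenience, not part of Perspicacité; it assumes the whole response body has already been read (as `httpx` does for non-streamed requests) and normalizes both forms.

```python
import json


def parse_jsonrpc_response(content_type: str, body: str, request_id: int):
    """Return the `result` of the JSON-RPC response with id == request_id.

    Accepts either a plain JSON body or an SSE body whose `data:` lines carry
    JSON-RPC messages. Raises RuntimeError if the server returned an error.
    """
    if content_type.startswith("application/json"):
        parsed = json.loads(body)
        messages = parsed if isinstance(parsed, list) else [parsed]
    else:
        # SSE: a blank line ends an event; its data lines are joined with "\n"
        messages, data_lines = [], []
        for line in body.splitlines() + [""]:
            if line.startswith("data:"):
                data_lines.append(line[len("data:"):].removeprefix(" "))
            elif line == "" and data_lines:
                messages.append(json.loads("\n".join(data_lines)))
                data_lines = []
    for message in messages:
        if message.get("id") == request_id:
            if "error" in message:
                raise RuntimeError(f"JSON-RPC error: {message['error']}")
            return message["result"]
    raise ValueError(f"no JSON-RPC response with id {request_id}")


# Example SSE body of the kind a Streamable HTTP server may send
sse_body = (
    "event: message\n"
    'data: {"jsonrpc": "2.0", "id": 2, "result": {"content": [{"type": "text", "text": "hits"}]}}\n'
    "\n"
)
print(parse_jsonrpc_response("text/event-stream", sse_body, request_id=2))
```

After the tool call, `parse_jsonrpc_response(r.headers.get("content-type", ""), r.text, request_id=2)` would return the tool result regardless of which response mode the server chose.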

REST API

Method	Path	Description
`POST`	`/api/chat`	Chat endpoint (streaming SSE or non-streaming JSON)
`GET`	`/api/health`	Health check
`GET`	`/api/conversations`	List conversations
`POST`	`/api/conversations`	Create conversation
`DELETE`	`/api/conversations/{id}`	Delete conversation
`GET`	`/api/kb`	List knowledge bases
`POST`	`/api/kb`	Create KB
`GET`	`/api/kb/{name}`	Get KB details
`DELETE`	`/api/kb/{name}`	Delete KB
`POST`	`/api/kb/{name}/papers`	Add papers to KB
`POST`	`/api/kb/{name}/bibtex`	Import from BibTeX
`GET`	`/api/survey/{session_id}`	Get literature survey status
`POST`	`/api/survey/{session_id}/generate`	Generate survey report

Non-streaming chat: pass "stream": false to /api/chat to get a JSON response instead of SSE.
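A minimal standard-library sketch of a non-streaming chat request follows. Only the `/api/chat` path and the `"stream": false` flag come from this README; the `message` field name for the question is an assumption for illustration, so check the request model in `web_app_full.py` before relying on it.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_chat_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a non-streaming POST /api/chat request (JSON response instead of SSE)."""
    # ASSUMPTION: the question goes in a "message" field; only "stream" is documented
    payload = {"message": question, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 300.0):
    """Send the request and decode the JSON body (requires a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


req = build_chat_request("How is feature-based molecular networking used in metabolomics?")
print(req.method, req.full_url)
```

With the server running, `send(req)` returns the answer as JSON, and `send(urllib.request.Request(f"{BASE_URL}/api/kb"))` lists knowledge bases through the documented GET endpoint. The long timeout is deliberate: a non-streaming request waits for the complete answer.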

CLI Commands

# Start the server (web + MCP)
perspicacite -c config.yml serve [--host 0.0.0.0] [--port 8000] [--no-mcp] [--reload]

# Create a KB from BibTeX
perspicacite -c config.yml create-kb my-kb --from-bibtex papers.bib

# Show version
perspicacite version

Configuration

Copy config.example.yml to config.yml and edit it. Key sections:

llm:
  default_provider: "deepseek"   # deepseek, openai, anthropic
  default_model: "deepseek-chat"

knowledge_base:
  embedding_model: "text-embedding-3-small"
  chunk_size: 1000
  chunk_overlap: 200
  chunking_method: "token"       # token, semantic, agentic

pdf_download:
  unpaywall_email: "your@email.com"
  # Optional publisher API keys:
  # elsevier_api_key: "..."
  # springer_api_key: "..."
  # wiley_tdm_token: "..."

mcp:
  enabled: true

Academic database search APIs are configured under the scilex: section; Semantic Scholar, OpenAlex, PubMed, arXiv, HAL, and DBLP are enabled by default.

Knowledge Bases

Create from BibTeX:

In the web UI: click "+ Create new KB", drag a .bib file, enter a name
Via CLI: perspicacite create-kb my-kb --from-bibtex refs.bib
Via MCP: create_knowledge_base then add_papers_to_kb

Add papers during research:

Agentic mode finds and downloads papers — click "Add to KB" to save
Literature Survey mode lets you select recommended papers and add them in bulk

Chunking strategies:

token — fixed-size token chunks (default)
semantic — splits at semantic boundaries
agentic — AI-driven chunking optimized for RAG
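Assuming `chunk_size` and `chunk_overlap` are both counted in tokens, the default `token` strategy with the configuration values above (`chunk_size: 1000`, `chunk_overlap: 200`) can be pictured as a sliding window that advances by `chunk_size - chunk_overlap` tokens. The sketch below is not Perspicacité's chunker: it operates on an already-tokenized list, and the demo uses placeholder words, whereas a real tokenizer would split text differently.

```python
def chunk_tokens(tokens: list, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[list]:
    """Split a token list into fixed-size windows that share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


# Placeholder "tokens" and small sizes keep the output readable
words = [f"w{i}" for i in range(10)]
for chunk in chunk_tokens(words, chunk_size=4, chunk_overlap=1):
    print(chunk)
```

With the defaults, a 2,500-token paper yields three chunks starting at tokens 0, 800, and 1,600; the overlap keeps a sentence that straddles a boundary intact in at least one chunk.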

Development

Run Tests

# Unit tests
uv run pytest tests/unit/ -v

# Live MCP tests (require a running server)
uv run python tests/test_mcp_live.py --all --port 8000
uv run python tests/test_mcp_live.py --test search
uv run python tests/test_mcp_live.py --test kb

Project Structure

src/perspicacite/
  cli.py                      # CLI commands (serve, create-kb, version)
  config/schema.py            # Configuration model
  mcp/server.py               # MCP server with 8 tools
  pipeline/
    download/                 # Content retrieval pipeline
      discovery.py            # OpenAlex + Unpaywall discovery
      unified.py              # Unified retrieve_paper_content()
      europepmc.py            # PMC JATS XML fetcher
      arxiv.py                # arXiv HTML + PDF
    parsers/pdf.py            # PyMuPDF-based parser
    bibtex_kb.py              # BibTeX → KB pipeline
  rag/
    engine.py                 # RAGEngine (routes to mode handlers)
    modes/                    # basic, advanced, profound, agentic, literature_survey
    tools/                    # Tool registry, KB search, LOTUS
  search/scilex_adapter.py    # Multi-database literature search
  retrieval/                  # ChromaDB vector store + hybrid search
web_app_full.py               # FastAPI web application
templates/index.html          # Single-page chat UI

Privacy & Data

Your data stays local — KBs are stored in ChromaDB and SQLite on your machine
API calls only: questions and retrieved paper excerpts are sent to your configured LLM provider, and search queries go to the enabled academic databases
No tracking — No usage data collected

Contributing

See CONTRIBUTING.md for guidelines.

License

Apache License 2.0 — see LICENSE and NOTICE.

Acknowledgments

ChromaDB for vector storage
OpenAlex and Semantic Scholar for academic search
Unpaywall for open access discovery
SciLEx (Science Literature Exploration Toolkit) for multi-database literature search

References

@inproceedings{pradi2025perspicacite,
  title     = {An AI Pipeline for Scientific Literacy and Discovery: a Demonstration of Perspicacit\'{e}-AI Integration with Knowledge Graphs},
  author    = {Pradi, Lucas and Jiang, Tao and Feraud, Matthieu and Bekbergenova, Madina and Taghzouti, Yousouf and Nothias, Louis-Felix},
  booktitle = {ISWC-C 2025},
  year      = {2025}
}

@softwareversion{scilex2026,
  title = {SciLEx, Science Literature Exploration Toolkit},
  author = {Ringwald, C\'{e}lian and Navet, Benjamin},
  url = {https://github.com/Wimmics/SciLEx},
  year = {2026}
}
