Go-Docs MCP

Install and Go. One command, single binary. Your AI reads any document — PDF, text, Markdown, DOCX, images.

MCP server for multi-format document access — read, search, extract images, OCR, and fetch documents from URLs via the Model Context Protocol. 13 tools, 6 formats, zero configuration.

go install github.com/drolosoft/go-docs-mcp@latest
# That's it. Single binary, starts in milliseconds.

For a deeper look at why an MCP server beats a direct tool, see Why MCP?

🏆 Why Go-Docs MCP?

Every other document MCP server handles one format — a PDF server for PDFs, a DOCX server for DOCX. You'd need three separate servers to read three formats.

	Go-Docs MCP	Others
Single binary, no runtime	Yes	Need Node/Python
`go install` one-liner	Yes	npm+deps or pip+venv
Multi-format (6 types)	Yes	One format each
Full-text search	Yes	Partial or none
OCR (scanned PDFs + images)	Yes	Rare
Image & table extraction	Yes	Partial
Document outline	Yes	Rare
Fetch from URL	Yes	Rare
Dir-locked, read-only	Yes	Varies
Smart caching	Yes	No
Fully offline	Yes	Yes

Go-Docs MCP reads them all from a single binary — fast, secure, and dependency-free at runtime.

📋 Features — 13 Tools

Category	Tool	Description
Discovery	`list_documents`	List all documents with metadata (format, pages, size)
Discovery	`list_formats`	List supported formats and dependency status
Reading	`read_document`	Full text, specific page, or page ranges from any format
Reading	`read_url`	Download from URL and extract text (50MB max)
Reading	`get_document_summary`	First 3 pages as a quick overview
Search	`search_document`	Case-insensitive full-text search with context
Analysis	`get_document_metadata`	Title, author, dates, version, page count
Analysis	`get_document_outline`	Table of contents / bookmarks
Analysis	`extract_tables`	Tables as structured data
Analysis	`extract_images`	Images as base64 (max 10 per call)
OCR	`ocr_document`	Force OCR on scanned/image-based PDFs
OCR	`read_image`	Extract text from PNG, JPG, TIFF via OCR
Export	`convert_to_markdown`	Convert any document to clean Markdown

Highlights:

Fast — mtime-based in-memory caching avoids redundant extraction
Multi-format — PDF, TXT, MD, CSV, DOCX, and images from one server
OCR — automatic fallback to tesseract for scanned documents
Secure — directory-locked with path traversal prevention
Portable — works on macOS and Linux

📄 Supported Formats

Format	Dependencies	Notes
PDF	poppler (`pdftotext`, `pdfinfo`, `pdfimages`, `pdftoppm`)	Full support — text, images, metadata, OCR fallback
TXT, MD, CSV	None	Native, zero dependencies
DOCX	pandoc (optional)	Word document extraction
Images (PNG, JPG, TIFF)	tesseract (optional)	OCR text extraction

📦 Prerequisites

Go 1.25+ (install)
poppler — required for PDF support
tesseract (optional) — enables OCR for scanned docs and images
pandoc (optional) — enables DOCX support

# macOS
brew install poppler
brew install tesseract        # optional: OCR
brew install pandoc           # optional: DOCX

# Debian/Ubuntu
apt install poppler-utils
apt install tesseract-ocr     # optional: OCR
apt install pandoc            # optional: DOCX

# Fedora/RHEL
dnf install poppler-utils
dnf install tesseract         # optional: OCR
dnf install pandoc            # optional: DOCX

Note: TXT, MD, and CSV work out of the box with zero dependencies. Install only what you need.

🚀 Installation

From source

go install github.com/drolosoft/go-docs-mcp@latest

Build locally

git clone https://github.com/drolosoft/go-docs-mcp.git
cd go-docs-mcp
make build      # produces ./go-docs-mcp
make install    # installs to /usr/local/bin/

⚙️ Configuration

Go-Docs MCP reads documents from a configured directory. Set DOCS_MCP_DIR to change it:

Variable	Default	Description
`DOCS_MCP_DIR`	`~/.docs-mcp/documents/`	Directory containing documents to serve
`PDF_MCP_DIR`	(legacy alias)	Backward-compatible alias for `DOCS_MCP_DIR`

Place your documents in the directory and the server finds them automatically. All supported formats are detected.

💡 Usage

With Claude Code

Add to your .claude/settings.json:

{
  "mcpServers": {
    "docs": {
      "command": "go-docs-mcp",
      "env": {
        "DOCS_MCP_DIR": "/path/to/your/documents"
      }
    }
  }
}

With Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):

{
  "mcpServers": {
    "docs": {
      "command": "/usr/local/bin/go-docs-mcp",
      "env": {
        "DOCS_MCP_DIR": "/path/to/your/documents"
      }
    }
  }
}

With any MCP client

The server communicates over stdio using JSON-RPC 2.0:

echo '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' | go-docs-mcp

📖 Tool Reference

`list_documents`

Lists all documents in the configured directory with format detection.

Parameters: None

Example output:

[
  {
    "filename": "architecture-guide.pdf",
    "format": "pdf",
    "title": "architecture-guide",
    "pages": 42,
    "size_bytes": 1048576
  },
  {
    "filename": "notes.md",
    "format": "markdown",
    "title": "notes",
    "size_bytes": 4096
  }
]

`list_formats`

Lists all supported document formats and their dependency status.

Parameters: None

`read_document`

Reads the extracted text content of a document. Automatically falls back to OCR if the document is image-based/scanned and pdftotext returns empty text.

Parameters:

Name	Type	Required	Description
`filename`	string	Yes	The document filename to read
`page`	number	No	Single page number (1-based). Omit for full text.
`pages`	string	No	Page ranges, e.g. "1-5", "10", "1-3,7,10-12". Overrides `page`.

Example input:

{
  "filename": "architecture-guide.pdf",
  "pages": "1-3,10-12"
}

`search_document`

Searches within a document for lines matching a query. Returns matches with 2 lines of context and approximate page numbers.

Parameters:

Name	Type	Required	Description
`filename`	string	Yes	The document filename to search
`query`	string	Yes	Search query (case-insensitive)

Example output:

Found 3 matches for 'microservice' in architecture-guide.pdf:

--- Match 1 (page ~2, line 45) ---
  The system is composed of several
> microservice components that communicate
  via gRPC and message queues.

`get_document_summary`

Returns the text from the first 3 pages of a document as a quick summary.

Parameters:

Name	Type	Required	Description
`filename`	string	Yes	The document filename to summarize

`get_document_metadata`

Returns full document metadata.

Parameters:

Name	Type	Required	Description
`filename`	string	Yes	The document filename to get metadata for

Example output:

{
  "title": "Architecture Guide",
  "author": "Jane Doe",
  "subject": "System Design",
  "creator": "LaTeX",
  "producer": "pdfTeX",
  "creation_date": "Thu May 15 10:30:00 2025",
  "modification_date": "Thu May 15 10:30:00 2025",
  "pages": 42,
  "file_size_bytes": 1048576,
  "pdf_version": "1.5"
}

`get_document_outline`

Extracts the document outline (table of contents / bookmarks) as a structured list.

Parameters:

Name	Type	Required	Description
`filename`	string	Yes	The document filename to extract outline from

`extract_tables`

Extracts tables from a document as structured data.

Parameters:

Name	Type	Required	Description
`filename`	string	Yes	The document filename to extract tables from
`page`	number	No	Specific page to extract from. Omit for all pages.

`extract_images`

Extracts images from a document as base64-encoded data. Returns up to 10 images per call.

Parameters:

Name	Type	Required	Description
`filename`	string	Yes	The document filename to extract images from
`page`	number	No	Specific page to extract from. Omit for all pages.

Example output:

[
  {
    "page": 1,
    "index": 0,
    "format": "jpeg",
    "width": 800,
    "height": 600,
    "data_base64": "/9j/4AAQSkZJRg..."
  }
]

`read_url`

Downloads a document from a URL and extracts its text content. Maximum file size: 50MB.

Parameters:

Name	Type	Required	Description
`url`	string	Yes	The URL of the document to download and read
`pages`	string	No	Page ranges to extract, e.g. "1-5". Omit for full text.

Example input:

{
  "url": "https://example.com/report.pdf",
  "pages": "1-3"
}

`ocr_document`

Forces OCR on a PDF document using tesseract. Useful for scanned/image-based PDFs or when pdftotext returns garbled text. Requires tesseract and pdftoppm.

Note: read_document already auto-detects image-based PDFs and falls back to OCR. Use ocr_document when you want to force OCR regardless, or need to specify a non-English language.

Parameters:

Name	Type	Required	Description
`filename`	string	Yes	The PDF filename to OCR
`page`	number	No	Specific page to OCR (1-based). Omit for all pages.
`language`	string	No	Tesseract language code (default: `eng`). Use `spa`, `fra`, etc.

Example input:

{
  "filename": "scanned-contract.pdf",
  "page": 1,
  "language": "spa"
}

`read_image`

Extracts text from an image file using OCR. Supports PNG, JPG, and TIFF. Requires tesseract.

Parameters:

Name	Type	Required	Description
`filename`	string	Yes	The image filename to read (PNG, JPG, TIFF)
`language`	string	No	Tesseract language code (default: `eng`).

Example input:

{
  "filename": "receipt.png",
  "language": "eng"
}

🔒 Security

Directory-locked — only files within DOCS_MCP_DIR are accessible
Path traversal prevention — filenames sanitized; ../ rejected
Extension filter — only supported formats served
Read-only — no write operations
URL downloads — 50MB limit, Content-Type validated, temp files cleaned immediately

🛠️ Development

make build     # Build the binary
make test      # Run tests with race detector
make clean     # Remove build artifacts

Project structure

go-docs-mcp/
  main.go              # MCP server setup, 12 tool registrations
  internal/
    pdf/
      reader.go        # Document extraction, caching, search, metadata, images, OCR
  Makefile             # Build targets
  go.mod               # Module definition

🦙 Glama Score

📄 License

💛 Support

Drolosoft — Tools we wish existed

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
assets		assets
doc		doc
internal/pdf		internal/pdf
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
Dockerfile		Dockerfile
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
glama.json		glama.json
go.mod		go.mod
go.sum		go.sum
main.go		main.go

Folders and files

Latest commit

History

Repository files navigation

Go-Docs MCP

🏆 Why Go-Docs MCP?

📋 Features — 13 Tools

📄 Supported Formats

📦 Prerequisites

🚀 Installation

From source

Build locally

⚙️ Configuration

💡 Usage

With Claude Code

With Claude Desktop

With any MCP client

📖 Tool Reference

list_documents

list_formats

read_document

search_document

get_document_summary

get_document_metadata

get_document_outline

extract_tables

extract_images

read_url

ocr_document

read_image

🔒 Security

🛠️ Development

Project structure

🦙 Glama Score

📄 License

💛 Support

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`list_documents`

`list_formats`

`read_document`

`search_document`

`get_document_summary`

`get_document_metadata`

`get_document_outline`

`extract_tables`

`extract_images`

`read_url`

`ocr_document`

`read_image`

Packages