opencode-parser

An opencode plugin that parses any file into structured text the LLM can work with.

Install

Add the package to the `plugin` array in your opencode config (`opencode.json`):

{
  "plugin": ["opencode-parser"]
}

Or install via the CLI: `opencode plugin opencode-parser -g`

Or copy `src/` into `.opencode/tools/` for a local, zero-config setup.

Supported formats

| Format | Extensions | Extracted |
| --- | --- | --- |
| PDF | `.pdf` | Text, metadata, pages |
| Word | `.docx` | Text, tables, metadata |
| Excel | `.xlsx`, `.xls`, `.csv`, `.tsv` | Text, tables, sheet names |
| PowerPoint | `.pptx`, `.ppt` | Slide text, speaker notes |
| Images | `.png`, `.jpg`, `.jpeg`, `.webp`, `.gif`, `.bmp`, `.tiff` | OCR text (opt-in via `extractImages`) |
| EPUB | `.epub` | Full text with heading structure |
| HTML | `.html`, `.htm` | Body text, headings |
| XML | `.xml` | Stripped text content |
| Markdown | `.md` | Raw text |
| Jupyter | `.ipynb` | Code, markdown, outputs |
| ZIP | `.zip` | File listing with sizes |
| Archives | `.rar`, `.7z`, `.tar`, `.gz` | Listing (extraction notes) |
| Plain text | `.txt`, `.json`, `.yaml`, `.toml`, `.ini` | Raw content |
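
Several formats above yield tables (Word documents, spreadsheets), and the README states these are handed to the LLM as Markdown. A minimal sketch of that conversion, assuming a table arrives as an array of string rows with the header row first; `toMarkdownTable` is a hypothetical helper, not the plugin's actual code:

```typescript
// Hypothetical helper (not the plugin's API): render rows of cells as a
// GitHub-flavored Markdown table. The first row is used as the header.
function toMarkdownTable(rows: string[][]): string {
  if (rows.length === 0) return "";
  // Ragged rows (common in spreadsheets) are padded to the widest row.
  const width = Math.max(...rows.map((r) => r.length));
  // Escape pipes and collapse line breaks so a cell cannot break the layout.
  const clean = (cell: string): string =>
    cell.replace(/\|/g, "\\|").replace(/\r?\n/g, " ").trim();
  const line = (cells: string[]): string => {
    const padded = [...cells, ...Array.from({ length: width - cells.length }, () => "")];
    return "| " + padded.map(clean).join(" | ") + " |";
  };
  const [header, ...body] = rows;
  const separator = "| " + Array.from({ length: width }, () => "---").join(" | ") + " |";
  return [line(header), separator, ...body.map(line)].join("\n");
}
```

Padding short rows instead of dropping them keeps every cell aligned under its header, which matters for sheets where trailing cells are empty.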

Usage

Reference a file with `@` in your prompt, for example:

Parse @report.pdf and give me a summary

Parse the spreadsheet at @data.xlsx, but only the first 3 sheets

Parse @report.pdf and save the full output

Options

| Option | Default | Description |
| --- | --- | --- |
| `filePath` | none | Path to the file (required) |
| `maxChars` | `50000` | Maximum output characters; pass `-1` for no limit (the full document) |
| `extractTables` | `true` | Extract tables from documents and spreadsheets |
| `extractImages` | `false` | Enable OCR for images |
| `ocrLang` | `"eng"` | OCR language for tesseract.js (e.g. `"eng"`, `"fra"`, `"ara"`) |
| `maxPages` | varies | Maximum number of pages, slides, or sheets to process |
| `save` | `false` | Save the full parsed output as a `.md` file alongside the original, without truncation |
| `outputPath` | none | Custom path for the Markdown export (overrides the `save` path) |
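
The options and defaults above map naturally onto a typed argument object. A minimal sketch, assuming the tool receives these fields as a plain object; `ParseArgs`, `resolveArgs`, and `exportPath` are hypothetical names, and the export file naming is a guess, since the README only says the `.md` file is written alongside the original:

```typescript
// Hypothetical argument type mirroring the Options table; the plugin's real
// schema may differ. maxPages has no single default ("varies"), so it stays optional.
interface ParseArgs {
  filePath: string;
  maxChars?: number; // -1 = unlimited
  extractTables?: boolean;
  extractImages?: boolean; // OCR is opt-in
  ocrLang?: string; // tesseract.js language code
  maxPages?: number;
  save?: boolean;
  outputPath?: string;
}

interface ResolvedArgs {
  filePath: string;
  maxChars: number;
  extractTables: boolean;
  extractImages: boolean;
  ocrLang: string;
  maxPages?: number;
  save: boolean;
  outputPath?: string;
}

// Fill in the documented defaults; `??` keeps explicit values such as false or -1.
function resolveArgs(args: ParseArgs): ResolvedArgs {
  return {
    filePath: args.filePath,
    maxChars: args.maxChars ?? 50000,
    extractTables: args.extractTables ?? true,
    extractImages: args.extractImages ?? false,
    ocrLang: args.ocrLang ?? "eng",
    maxPages: args.maxPages,
    save: args.save ?? false,
    outputPath: args.outputPath,
  };
}

// ASSUMPTION: an explicit outputPath wins and implies an export; with only
// `save`, ".md" is appended to the original name so that exporting a .md
// input never overwrites it.
function exportPath(args: ParseArgs): string | undefined {
  if (args.outputPath) return args.outputPath;
  return args.save ? `${args.filePath}.md` : undefined;
}
```

Under this sketch, a prompt like "only the first 3 sheets" would presumably reach the tool as `{ filePath: "data.xlsx", maxPages: 3 }`, and "save the full output" as `{ filePath: "report.pdf", save: true }`.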

How it works

1. The file is verified by its magic bytes, not just its extension.
2. Type detection dispatches the file to the matching parser.
3. Metadata is extracted (author, page count, sheet count, etc.).
4. Tables are converted to readable Markdown.
5. Large content is truncated gracefully, with a note telling the LLM that the output was cut.
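
The magic-byte check and dispatch described above can be sketched as follows. The byte signatures are the standard published headers for each format; `sniff`, `detectFormat`, and the returned labels are hypothetical names, not the plugin's API. Note that `.docx`, `.xlsx`, `.pptx`, and `.epub` are all ZIP containers, and legacy `.xls`/`.ppt` share the OLE2 compound-file header, so the extension is still needed to choose a parser within those families:

```typescript
// Hypothetical sniffing sketch; names and labels are illustrative only.
type Sniffed =
  | "pdf" | "zip" | "ole" | "png" | "jpeg" | "gif" | "bmp" | "tiff"
  | "webp" | "rar" | "7z" | "gzip" | "tar" | "unknown";

const SIGNATURES: Array<[Sniffed, number[]]> = [
  ["pdf", [0x25, 0x50, 0x44, 0x46]], // "%PDF"
  ["zip", [0x50, 0x4b, 0x03, 0x04]], // "PK\x03\x04": .zip, .docx, .xlsx, .pptx, .epub
  ["ole", [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1]], // legacy .xls, .ppt
  ["png", [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]],
  ["jpeg", [0xff, 0xd8, 0xff]],
  ["gif", [0x47, 0x49, 0x46, 0x38]], // "GIF8"
  ["rar", [0x52, 0x61, 0x72, 0x21, 0x1a, 0x07]], // "Rar!\x1a\x07"
  ["7z", [0x37, 0x7a, 0xbc, 0xaf, 0x27, 0x1c]],
  ["gzip", [0x1f, 0x8b]],
  ["tiff", [0x49, 0x49, 0x2a, 0x00]], // "II*\0" (little-endian)
  ["tiff", [0x4d, 0x4d, 0x00, 0x2a]], // "MM\0*" (big-endian)
  ["bmp", [0x42, 0x4d]], // "BM"
];

// True if `sig` appears in `buf` starting at `offset`.
function matchesAt(buf: Uint8Array, sig: number[], offset = 0): boolean {
  return buf.length >= offset + sig.length && sig.every((b, i) => buf[offset + i] === b);
}

function sniff(buf: Uint8Array): Sniffed {
  // POSIX tar stores "ustar" at offset 257 rather than at the start.
  if (matchesAt(buf, [0x75, 0x73, 0x74, 0x61, 0x72], 257)) return "tar";
  // WebP is a RIFF container: "RIFF" at offset 0 and "WEBP" at offset 8.
  if (matchesAt(buf, [0x52, 0x49, 0x46, 0x46]) && matchesAt(buf, [0x57, 0x45, 0x42, 0x50], 8)) return "webp";
  for (const [kind, sig] of SIGNATURES) if (matchesAt(buf, sig)) return kind;
  return "unknown";
}

const ZIP_FAMILY: Record<string, string> = { ".docx": "docx", ".xlsx": "xlsx", ".pptx": "pptx", ".epub": "epub" };
const BINARY_EXTS = new Set([
  ".pdf", ".docx", ".xlsx", ".xls", ".pptx", ".ppt", ".epub", ".zip", ".rar", ".7z", ".tar", ".gz",
  ".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp", ".tiff",
]);

// Content wins when a signature matches; the extension only picks a parser
// inside a container family or for signature-less text formats.
function detectFormat(buf: Uint8Array, ext: string): string {
  const e = ext.toLowerCase();
  const kind = sniff(buf);
  if (kind === "zip") return ZIP_FAMILY[e] ?? "zip";
  if (kind === "ole") return e === ".xls" || e === ".ppt" ? e.slice(1) : "unknown";
  if (kind !== "unknown") return kind;
  // No signature matched: a binary extension is suspect, a text extension is trusted.
  return BINARY_EXTS.has(e) ? "unknown" : e.slice(1) || "txt";
}
```

With this design, a ZIP archive renamed to `.pdf` is still handled as a ZIP, and a text file renamed to `.pdf` is rejected instead of being fed to the PDF parser.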

All 15+ format handlers return the same output structure, so the LLM gets consistent results regardless of file type.
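
Because every handler returns the same structure, limits such as `maxChars` can be applied in one place. A minimal sketch, assuming a result shape with `format`, `text`, `metadata`, and `truncated` fields; `ParseResult` and `finalize` are hypothetical names, not the plugin's actual types, and the wording of the truncation note is illustrative:

```typescript
// Hypothetical shared result shape returned by every format handler.
interface ParseResult {
  format: string;
  text: string;
  metadata: Record<string, string | number>;
  truncated: boolean;
}

// Apply the maxChars limit; -1 disables it. "Graceful" here means: prefer
// cutting at a line break near the limit, and tell the LLM how much was
// dropped and how to get the rest.
function finalize(
  format: string,
  text: string,
  metadata: Record<string, string | number>,
  maxChars = 50000,
): ParseResult {
  if (maxChars === -1 || text.length <= maxChars) {
    return { format, text, metadata, truncated: false };
  }
  const lastBreak = text.lastIndexOf("\n", maxChars);
  // Only snap back to a line break if it keeps more than 80% of the budget.
  const end = lastBreak > maxChars * 0.8 ? lastBreak : maxChars;
  const note =
    `\n\n[Truncated: showing ${end} of ${text.length} characters. ` +
    `Re-run with maxChars: -1 or save: true for the full document.]`;
  return { format, text: text.slice(0, end) + note, metadata, truncated: true };
}
```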

Development

npm install
npm run typecheck

License

MIT
