An opencode plugin that parses any file into structured text the LLM can work with.
{
"plugin": ["opencode-parser"]
}

Or install via CLI: opencode plugin opencode-parser -g
Or copy src/ into .opencode/tools/ for a local zero-config setup.
| Format | Extensions | Extracted |
|---|---|---|
| PDF | .pdf | Text, metadata, pages |
| Word | .docx | Text, tables, metadata |
| Excel | .xlsx, .xls, .csv, .tsv | Text, tables, sheet names |
| PowerPoint | .pptx, .ppt | Slide text, speaker notes |
| Images | .png, .jpg, .jpeg, .webp, .gif, .bmp, .tiff | OCR text (opt-in) |
| EPUB | .epub | Full text with heading structure |
| HTML | .html, .htm | Body text, headings |
| XML | .xml | Stripped text content |
| Markdown | .md | Raw text |
| Jupyter | .ipynb | Code, markdown, outputs |
| ZIP | .zip | File listing with sizes |
| Archives | .rar, .7z, .tar, .gz | Listing (extraction notes) |
| Plain text | .txt, .json, .yaml, .toml, .ini | Raw content |
Parse @report.pdf and give me a summary
parse the spreadsheet at @data.xlsx but only the first 3 sheets
parse @report.pdf and save the full output
| Option | Default | Description |
|---|---|---|
| filePath | — | Path to the file (required) |
| maxChars | 50000 | Maximum number of output characters. Pass -1 for no limit (the full document). |
| extractTables | true | Extract tables from docs/spreadsheets |
| extractImages | false | Enable OCR for images |
| ocrLang | "eng" | OCR language for tesseract.js (e.g. "eng", "fra", "ara") |
| maxPages | varies | Limit pages/slides/sheets processed |
| save | false | Save the full parsed output as a .md file alongside the original (no truncation) |
| outputPath | — | Custom path for the Markdown export (overrides save path) |
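To make the defaults concrete, here is a minimal sketch of how the options above could be typed and filled in. The names `ParseOptions`, `ResolvedOptions`, and `resolveOptions` are hypothetical and chosen for illustration; the plugin's real argument schema may be shaped differently.

```typescript
// Hypothetical type mirroring the option table; not the plugin's actual schema.
interface ParseOptions {
  filePath: string;        // required
  maxChars?: number;       // -1 means unlimited
  extractTables?: boolean;
  extractImages?: boolean; // enables OCR for images
  ocrLang?: string;        // tesseract.js language code
  maxPages?: number;       // default "varies" by format, so left unset here
  save?: boolean;
  outputPath?: string;
}

// Options with a fixed documented default become required after resolution.
type ResolvedOptions =
  Required<Omit<ParseOptions, "maxPages" | "outputPath">> &
  Pick<ParseOptions, "maxPages" | "outputPath">;

// Fill in the documented defaults for anything the caller left out.
function resolveOptions(opts: ParseOptions): ResolvedOptions {
  return {
    filePath: opts.filePath,
    maxChars: opts.maxChars ?? 50000,
    extractTables: opts.extractTables ?? true,
    extractImages: opts.extractImages ?? false,
    ocrLang: opts.ocrLang ?? "eng",
    save: opts.save ?? false,
    maxPages: opts.maxPages,
    outputPath: opts.outputPath,
  };
}
```

Calling `resolveOptions({ filePath: "report.pdf", maxChars: -1 })` keeps the explicit `-1` and fills every other field from the table.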
- File is verified by magic bytes, not just extension
- Type detection dispatches to the right parser
- Metadata is extracted (author, pages, sheet count, etc.)
- Tables become readable markdown
- Large content is truncated gracefully with a note to the LLM
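The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the plugin's source: `sniffFormat`, `tableToMarkdown`, and `truncateWithNote` are hypothetical helper names, the signature list covers only a few well-known formats, and the wording of the truncation note is invented.

```typescript
type Sniffed = "pdf" | "zip" | "png" | "jpeg" | "gif" | "unknown";

// Well-known leading byte signatures ("magic bytes").
const SIGNATURES: Array<{ kind: Exclude<Sniffed, "unknown">; bytes: number[] }> = [
  { kind: "pdf", bytes: [0x25, 0x50, 0x44, 0x46] },  // "%PDF"
  { kind: "zip", bytes: [0x50, 0x4b, 0x03, 0x04] },  // "PK\x03\x04"; also the container for .docx, .xlsx, .pptx, .epub
  { kind: "png", bytes: [0x89, 0x50, 0x4e, 0x47] },  // "\x89PNG"
  { kind: "jpeg", bytes: [0xff, 0xd8, 0xff] },
  { kind: "gif", bytes: [0x47, 0x49, 0x46, 0x38] },  // "GIF8"
];

// Compare the first bytes of a file against each known signature.
function sniffFormat(head: Uint8Array): Sniffed {
  for (const sig of SIGNATURES) {
    if (head.length >= sig.bytes.length && sig.bytes.every((b, i) => head[i] === b)) {
      return sig.kind;
    }
  }
  return "unknown";
}

// Render rows as a markdown table, treating the first row as the header.
function tableToMarkdown(rows: string[][]): string {
  const header = rows[0];
  if (header === undefined) return "";
  const line = (cells: string[]) => `| ${cells.join(" | ")} |`;
  const separator = `|${header.map(() => "---").join("|")}|`;
  return [line(header), separator, ...rows.slice(1).map(line)].join("\n");
}

// Cut text at maxChars and append a note; a negative limit disables truncation.
function truncateWithNote(text: string, maxChars: number): string {
  if (maxChars < 0 || text.length <= maxChars) return text;
  const omitted = text.length - maxChars;
  return `${text.slice(0, maxChars)}\n\n[Truncated: ${omitted} characters omitted. Pass maxChars: -1 for the full document.]`;
}
```

Checking the signature before trusting the extension means a renamed file, such as a PDF saved as `.txt`, is still routed to the right parser in this sketch.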
All 15+ format handlers return the same output structure, so the LLM gets consistent results regardless of file type.
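One way to picture a shared output structure is a single result type that every handler returns. The sketch below is an assumption for illustration only: `ParseResult`, its fields, `Handler`, and `parseByExtension` are hypothetical names, and the two toy handlers stand in for the real parsers.

```typescript
// Hypothetical common result shape; the plugin's real fields may differ.
interface ParseResult {
  format: string;
  text: string;
  metadata: Record<string, string | number>;
  truncated: boolean;
}

type Handler = (content: string) => ParseResult;

// Two toy handlers that return the same shape for different inputs.
const handlers: Record<string, Handler> = {
  ".md": (content) => ({
    format: "markdown",
    text: content,
    metadata: {},
    truncated: false,
  }),
  ".html": (content) => ({
    format: "html",
    // Naive tag stripping, for illustration only.
    text: content.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim(),
    metadata: {},
    truncated: false,
  }),
};

// Pick a handler by lower-cased extension and return its uniform result.
function parseByExtension(fileName: string, content: string): ParseResult {
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot).toLowerCase() : "";
  const handler = handlers[ext];
  if (!handler) throw new Error(`Unsupported extension: ${ext || "(none)"}`);
  return handler(content);
}
```

Because every handler returns the same fields, calling code can read `text` and `metadata` without branching on the file type.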
npm install
npm run typecheck
MIT