parsepdf

A multi-library PDF text extraction tool to test how Applicant Tracking Systems (ATS) parse your resume/CV.

If you've created a fancy resume in Figma, Canva, or similar design tools, there's a good chance ATS systems can't read it properly. This tool lets you see exactly what different parsing engines extract from your PDF.

Why?

Most job applications go through an ATS that extracts text from your resume. If the extraction is garbled, your resume might get rejected before a human ever sees it. This tool runs your PDF through 8 different parsing libraries used by real ATS systems, so you can spot problems before applying.

Quick Start (Docker)

# Build the image
docker build -t parsepdf .

# Parse a PDF (outputs appear next to the input file)
docker run --rm -v ~/Desktop:/data parsepdf resume.pdf

This mounts your Desktop folder to /data in the container. The 8 output .txt files appear alongside your PDF.

You can mount any folder:

docker run --rm -v /path/to/folder:/data parsepdf myfile.pdf

Installation (without Docker)

git clone https://github.com/yourusername/parsepdf.git
cd parsepdf

# Python dependencies
pip install -r requirements.txt

# JavaScript dependencies
npm install

# Go binary (already built; to rebuild it, run)
go build -o parsepdf-go parsepdf.go

# Optional: poppler for pdftotext (recommended)
brew install poppler  # macOS
# apt install poppler-utils  # Ubuntu/Debian

Usage

./parse resume.pdf

By default, it looks for PDFs in ~/Desktop and outputs there too:

./parse resume.pdf  # looks for ~/Desktop/resume.pdf

Use absolute paths for other locations:

./parse /path/to/resume.pdf
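
The lookup rule described above can be sketched in Python. This is an illustration only, not the code of the actual ./parse script: the function name resolve_pdf_path and the default_dir parameter are hypothetical, and it assumes a POSIX-style filesystem where any non-absolute argument is looked up in ~/Desktop.

```python
from pathlib import Path

# Assumption: ~/Desktop is the default folder, as the Usage section states.
DEFAULT_DIR = Path.home() / "Desktop"


def resolve_pdf_path(arg: str, default_dir: Path = DEFAULT_DIR) -> Path:
    """Return the PDF path that would be read for a given command-line argument.

    Hypothetical helper: absolute paths are used as given, anything else is
    joined onto default_dir.
    """
    path = Path(arg).expanduser()
    return path if path.is_absolute() else default_dir / path


if __name__ == "__main__":
    print(resolve_pdf_path("resume.pdf"))
    print(resolve_pdf_path("/path/to/resume.pdf"))
```

Under this assumption the output files would be written to the parent folder of the resolved path, which matches the "outputs there too" behaviour.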

Output

Running ./parse resume.pdf creates 8 text files:

Library	Output File	Notes
pdfminer.six	`resume.pdfminer.txt`	Python - common in enterprise ATS
PyMuPDF	`resume.pymupdf.txt`	Python - fast and accurate
pdfplumber	`resume.pdfplumber.txt`	Python - good for tables
pdf-parse	`resume.pdfparse.txt`	JavaScript - built on pdf.js
pdfjs-dist	`resume.pdfjs.txt`	JavaScript - Mozilla's pdf.js
pdf2json	`resume.pdf2json.txt`	JavaScript
ledongthuc/pdf	`resume.ledongthuc.txt`	Go
poppler	`resume.pdftotext.txt`	CLI - very common in enterprise ATS
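
To confirm that every engine produced a file, a short Python check can derive the expected names from the table above. This is a sketch rather than part of the tool: expected_outputs and missing_outputs are hypothetical helper names, and the suffix list is copied from the Output File column.

```python
from pathlib import Path

# Suffixes copied from the Output File column of the table above.
SUFFIXES = [
    "pdfminer",
    "pymupdf",
    "pdfplumber",
    "pdfparse",
    "pdfjs",
    "pdf2json",
    "ledongthuc",
    "pdftotext",
]


def expected_outputs(pdf_path: str) -> list[Path]:
    """List the text files expected next to the given PDF."""
    pdf = Path(pdf_path)
    return [pdf.with_name(f"{pdf.stem}.{suffix}.txt") for suffix in SUFFIXES]


def missing_outputs(pdf_path: str) -> list[Path]:
    """Return the expected output files that do not exist on disk."""
    return [p for p in expected_outputs(pdf_path) if not p.is_file()]


if __name__ == "__main__":
    for path in missing_outputs(str(Path.home() / "Desktop" / "resume.pdf")):
        print(f"missing: {path.name}")
```

A missing file most likely means that engine failed or is not installed, for example pdftotext when poppler is absent.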

What to look for

Open each output file and check:

Is the text in the right order? Multi-column layouts often get jumbled
Are there garbage characters? Custom fonts may not embed properly
Is anything missing? Text in images won't be extracted
Are words running together? Spacing issues from design tools

If pdftotext (poppler) output looks bad, most enterprise ATS systems will have the same problem.
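
Two of the checks above, garbage characters and words running together, can be roughly automated. The Python sketch below is a heuristic and not something the tool does itself: the function names and the 25-character threshold are assumptions chosen for illustration, and a manual read of each output file is still needed for ordering and missing text.

```python
import unicodedata

# Control characters that are normal in extracted text.
EXPECTED_CONTROLS = "\n\r\t\f"


def suspicious_chars(text: str) -> list[str]:
    """Return characters that often signal a font or encoding problem."""
    bad = []
    for ch in text:
        category = unicodedata.category(ch)
        if ch == "\ufffd":  # Unicode replacement character
            bad.append(ch)
        elif category in ("Co", "Cn"):  # private use or unassigned code points
            bad.append(ch)
        elif category == "Cc" and ch not in EXPECTED_CONTROLS:
            bad.append(ch)
    return bad


def long_tokens(text: str, max_len: int = 25) -> list[str]:
    """Return whitespace-separated tokens long enough to suggest lost spaces.

    Tokens containing "@" or "://" are skipped, since email addresses and
    URLs are legitimately long. max_len is an arbitrary assumption.
    """
    return [
        tok
        for tok in text.split()
        if len(tok) > max_len and "@" not in tok and "://" not in tok
    ]


if __name__ == "__main__":
    sample = "Jane Doe\nSeniorSoftwareEngineerAtExampleCorp \ufffd"
    print(len(suspicious_chars(sample)), long_tokens(sample))
```

Running both functions over each of the 8 output files gives a quick first pass before reading them in full.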

Fixing common issues

Text out of order: Flatten your design to a single-column layout, or ensure text boxes are created in reading order
Garbage characters: Use standard fonts (Arial, Helvetica, Times) or ensure fonts are properly embedded
Missing text: Don't put important info in images - use actual text
Export settings: If you really must use Figma, be aware that its PDF export outlines the text instead of keeping it as actual text. The workaround is to export as SVG with the outline text box unchecked, then reconstruct the SVG files in another application. It's a proper faff. But who really knows, right? Maybe it's you and maybe it's the CV. There's no way to know.
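
After changing the layout and re-exporting, one way to confirm the reading order is fixed is to check that a few known section headings appear in the extracted text in the expected sequence. The Python sketch below is illustrative only: out_of_order is a hypothetical helper, and the headings in the example are placeholders for whatever your own resume contains.

```python
def out_of_order(text: str, expected: list[str]) -> list[str]:
    """Report expected phrases that are missing or appear too early.

    Phrases are matched case-insensitively at their first occurrence. A
    phrase is "out of order" if it first appears before the previous
    in-order phrase.
    """
    lowered = text.lower()
    problems = []
    last_pos = -1
    for phrase in expected:
        pos = lowered.find(phrase.lower())
        if pos == -1:
            problems.append(f"missing: {phrase}")
        elif pos < last_pos:
            problems.append(f"out of order: {phrase}")
        else:
            last_pos = pos
    return problems


if __name__ == "__main__":
    extracted = "Education\nSome University\nExperience\nSome Company"
    print(out_of_order(extracted, ["Experience", "Education", "Skills"]))
```

A "missing" result points towards text trapped in an image or outlined, while "out of order" points towards a multi-column or text-box ordering problem.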

Libraries used

Python

pdfminer.six - Pure Python PDF parser
PyMuPDF - Python bindings for MuPDF
pdfplumber - PDF parsing with table extraction

JavaScript

pdf-parse - PDF parser built on pdf.js
pdfjs-dist - Mozilla's PDF.js
pdf2json - PDF to JSON converter

Go

ledongthuc/pdf - Pure Go PDF reader

CLI

poppler - PDF rendering library (pdftotext)

License

MIT License - see LICENSE
