afpp

afpp — A modern, dependency-light PDF parser for Node.js.

Built for performance, reliability, and developer sanity.

Overview

afpp (Another PDF Parser, Properly) is a Node.js library for extracting text and images from PDF files without heavyweight native dependencies, event-loop blocking, or fragile runtime assumptions.

The project was created to address recurring problems encountered with existing PDF tooling in the Node.js ecosystem:

Excessive bundle sizes and transitive dependencies
Native build steps (canvas, ImageMagick, Ghostscript)
Browser-specific assumptions (window, DOM, canvas)
Poor TypeScript support
Unreliable handling of encrypted PDFs
Performance and memory inefficiencies

afpp focuses on predictable behavior, explicit APIs, and production-ready defaults.

Key Features

Zero native build dependencies
Fully asynchronous, non-blocking architecture
First-class TypeScript support
Supports local files, buffers, and remote URLs
Handles encrypted PDFs
Configurable concurrency and rendering scale
Minimal and auditable dependency graph

Requirements

Node.js >= 22.14.0

Installation

Install using your preferred package manager:

npm install afpp
# or
yarn add afpp
# or
pnpm add afpp

Quick Start

All parsing functions accept the same input types:

string (file path)
Buffer
URL
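
As a rough illustration of the three accepted input kinds, the sketch below tells them apart before anything is handed to a parsing function. The `PdfInput` alias and `describeInput` helper are hypothetical names introduced here and are not part of afpp.

```typescript
// Hypothetical alias mirroring the three documented input types (not exported by afpp).
type PdfInput = string | Buffer | URL;

// Describe which kind of input was supplied, e.g. for logging before parsing.
function describeInput(input: PdfInput): string {
  if (typeof input === 'string') {
    return `file path: ${input}`;
  }
  if (input instanceof URL) {
    return `URL: ${input.href}`;
  }
  // The only remaining member of the union is Buffer.
  return `buffer: ${input.byteLength} bytes`;
}

console.log(describeInput('./example.pdf'));
console.log(describeInput(new URL('https://pdfobject.com/pdf/sample.pdf')));
console.log(describeInput(Buffer.from('%PDF-1.7')));
```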

Extract Text from a PDF

import { readFile } from 'fs/promises';
import path from 'path';

import { pdf2string } from 'afpp';

(async () => {
  const filePath = path.join('..', 'test', 'example.pdf');
  const buffer = await readFile(filePath);

  const pages = await pdf2string(buffer);
  console.log(pages); // ['Page 1 text', 'Page 2 text', ...]
})();

Render PDF Pages as Images

import { pdf2image } from 'afpp';

(async () => {
  const url = new URL('https://pdfobject.com/pdf/sample.pdf');
  const images = await pdf2image(url);

  console.log(images); // [Buffer, Buffer, ...]
})();

Streaming API (Large PDFs)

For large PDFs, use streaming functions to process pages incrementally without loading all results into memory:

import { writeFile } from 'fs/promises';

import { streamPdf2image, streamPdf2string } from 'afpp';

// Stream images - process each page as it's rendered
for await (const { pageNumber, pageCount, data } of streamPdf2image(
  './large.pdf',
)) {
  await writeFile(`page-${pageNumber}.png`, data);
  console.log(`Processed ${pageNumber}/${pageCount}`);
}

// Stream text - process each page as it's extracted
for await (const { pageNumber, data } of streamPdf2string('./large.pdf')) {
  console.log(`Page ${pageNumber}: ${data.substring(0, 100)}...`);
}

Benefits:

Lower peak memory usage
Faster time-to-first-result
Built-in progress tracking via pageNumber and pageCount
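
The progress-tracking idea can be sketched as a small generic consumer that works with any async iterable of pages. The `StreamedPage` interface and the `consumePages` and `fakeTextStream` helpers are assumptions made for this example. The field names follow the destructuring used above, but the types afpp actually exports may differ, and the sketch assumes both streaming functions yield `pageCount`.

```typescript
// Assumed shape of each streamed item, based on the fields destructured above.
interface StreamedPage<T> {
  pageNumber: number;
  pageCount: number;
  data: T;
}

// Consume a page stream one page at a time and report progress as a percentage.
async function consumePages<T>(
  pages: AsyncIterable<StreamedPage<T>>,
  handle: (page: StreamedPage<T>) => Promise<void> | void,
  onProgress: (percent: number) => void = () => {},
): Promise<number> {
  let processed = 0;
  for await (const page of pages) {
    await handle(page); // only the current page is held in memory
    processed += 1;
    onProgress(Math.round((page.pageNumber / page.pageCount) * 100));
  }
  return processed;
}

// Stand-in generator so the sketch runs without a PDF on disk.
async function* fakeTextStream(
  texts: string[],
): AsyncGenerator<StreamedPage<string>> {
  for (const [index, text] of texts.entries()) {
    yield { pageNumber: index + 1, pageCount: texts.length, data: text };
  }
}

consumePages(
  fakeTextStream(['first page', 'second page']),
  (page) => console.log(`Page ${page.pageNumber}: ${page.data}`),
  (percent) => console.log(`${percent}% done`),
).then((count) => console.log(`Processed ${count} pages`));
```

With afpp installed, the first argument would presumably be the result of streamPdf2image or streamPdf2string instead of the stand-in generator.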

Extract PDF Metadata

import { getPdfMetadata } from 'afpp';

const metadata = await getPdfMetadata('./document.pdf');
console.log(metadata.pageCount); // e.g. 9
console.log(metadata.isEncrypted); // false
console.log(metadata.title); // 'My Document' or undefined
console.log(metadata.creationDate); // Date object or undefined

// Encrypted PDF
const meta = await getPdfMetadata('./secure.pdf', { password: 'secret' });
console.log(meta.isEncrypted); // true

Low-Level Parsing API

For advanced use cases, parsePdf accepts a callback that receives each page's content, so you can transform pages as they are parsed.

import { parsePdf } from 'afpp';

(async () => {
  const response = await fetch('https://pdfobject.com/pdf/sample.pdf');
  const buffer = Buffer.from(await response.arrayBuffer());

  const result = await parsePdf(buffer, {}, (pageContent) => pageContent);
  console.log(result);
})();

Configuration

All public APIs accept a shared options object.

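// `buffer` holds the PDF bytes, e.g. loaded with readFile() as in Quick Start.
const result = await parsePdf(buffer, {
  concurrency: 5, // process up to 5 pages in parallel
  imageEncoding: 'jpeg', // output format for rendered images
  password: 'STRONG_PASS', // only needed for encrypted PDFs
  scale: 4, // 4 x 72 DPI = 288 DPI
});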
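
To make the effect of a concurrency limit concrete, here is a minimal sketch of bounded parallelism. It is not afpp's implementation. `resolveConcurrency` and `mapWithLimit` are hypothetical helpers, and mapping 'auto' to `availableParallelism()` from Node's `os` module is only one plausible heuristic.

```typescript
import { availableParallelism } from 'node:os';

type Concurrency = number | 'auto';

// Hypothetical resolution of the option value; afpp's real heuristic may differ.
function resolveConcurrency(value: Concurrency): number {
  if (value === 'auto') {
    return Math.max(1, availableParallelism());
  }
  if (!Number.isInteger(value) || value < 1) {
    throw new RangeError(`concurrency must be a positive integer, got ${value}`);
  }
  return value;
}

// Run `worker` over all items with at most `limit` calls in flight at once.
// Results keep the order of the input items.
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function run(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index] as T, index);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, () => run());
  await Promise.all(runners);
  return results;
}

// Simulate "rendering" five pages, two at a time.
mapWithLimit([1, 2, 3, 4, 5], resolveConcurrency(2), async (page) => {
  await new Promise((resolve) => setTimeout(resolve, 10));
  return `page-${page}`;
}).then((names) => console.log(names));
```

A higher limit shortens wall-clock time for CPU-heavy pages at the cost of higher peak memory, which is why a conservative default of 1 is a reasonable starting point.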

AfppParseOptions

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `concurrency` | `number \| 'auto'` | `1` | Number of pages processed in parallel. Use `'auto'` for CPU-based scaling. |
| `imageEncoding` | `'png' \| 'jpeg' \| 'webp' \| 'avif'` | `'png'` | Output format for rendered images |
| `password` | `string` | none | Password for encrypted PDFs |
| `scale` | `number` | `1.0` | Rendering scale (1.0 = 72 DPI, 2.0 = 144 DPI) |
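
The relationship between `scale` and DPI follows from PDF page sizes being measured in points (1 point = 1/72 inch). The sketch below converts between the two and estimates output pixel dimensions. The helper names are made up for this example, and the rounding afpp applies to pixel sizes is an assumption.

```typescript
// PDF page dimensions are expressed in points; 72 points make one inch.
const POINTS_PER_INCH = 72;

// scale 1.0 corresponds to 72 DPI, so DPI grows linearly with scale.
function scaleToDpi(scale: number): number {
  return scale * POINTS_PER_INCH;
}

// Inverse: pick the scale needed to reach a target DPI.
function dpiToScale(dpi: number): number {
  return dpi / POINTS_PER_INCH;
}

// Estimated pixel size of a rendered page (rounding behavior is an assumption).
function renderedSize(
  widthPt: number,
  heightPt: number,
  scale: number,
): { width: number; height: number } {
  return {
    width: Math.round(widthPt * scale),
    height: Math.round(heightPt * scale),
  };
}

// US Letter is 612 x 792 points.
console.log(scaleToDpi(2));
console.log(dpiToScale(300));
console.log(renderedSize(612, 792, 2));
```

Because pixel count grows with the square of `scale`, doubling the scale roughly quadruples the memory needed per rendered page.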

PdfMetadata

Returned by getPdfMetadata. All fields except pageCount and isEncrypted are optional — absent metadata fields are undefined, never empty strings.

| Field | Type | Description |
| --- | --- | --- |
| `pageCount` | `number` | Total number of pages |
| `isEncrypted` | `boolean` | Whether the document required a password to open |
| `title` | `string?` | Document title |
| `author` | `string?` | Document author |
| `subject` | `string?` | Document subject |
| `creator` | `string?` | Application that created the document |
| `producer` | `string?` | PDF producer application |
| `creationDate` | `Date?` | Document creation date |
| `modificationDate` | `Date?` | Document last modification date |
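
Since absent fields are `undefined` rather than empty strings, the nullish coalescing operator is enough to supply fallbacks. The sketch below restates the table as a local interface and formats a one-line summary. `PdfMetadataSketch` and `summarizeMetadata` are illustrative names, and the type afpp exports may be declared differently.

```typescript
// Local restatement of the fields listed above (not the type exported by afpp).
interface PdfMetadataSketch {
  pageCount: number;
  isEncrypted: boolean;
  title?: string;
  author?: string;
  subject?: string;
  creator?: string;
  producer?: string;
  creationDate?: Date;
  modificationDate?: Date;
}

// Build a one-line summary, falling back to placeholders for missing fields.
function summarizeMetadata(meta: PdfMetadataSketch): string {
  const title = meta.title ?? '(untitled)';
  const author = meta.author ?? 'unknown author';
  // Keep only the YYYY-MM-DD part of the ISO timestamp (UTC).
  const created = meta.creationDate?.toISOString().slice(0, 10) ?? 'unknown date';
  const lock = meta.isEncrypted ? ', encrypted' : '';
  return `${title} by ${author}, ${meta.pageCount} pages, created ${created}${lock}`;
}

console.log(summarizeMetadata({ pageCount: 9, isEncrypted: false }));
console.log(
  summarizeMetadata({
    pageCount: 3,
    isEncrypted: true,
    title: 'My Document',
    author: 'A. Writer',
    creationDate: new Date('2024-01-15T00:00:00Z'),
  }),
);
```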

Design Principles

Node-first: No browser globals or DOM assumptions
Explicit over implicit: No magic configuration
Fail fast: Clear errors instead of silent corruption
Production-oriented: Optimized for long-running processes
