stet-pdf-reader exposes the structural content of a PDF — metadata,
bookmarks, annotations, form fields, embedded files, page geometry,
parse diagnostics — as typed, read-only Rust data. This is in addition
to the rendering API (render_page, render_page_to_rgba); the two
are independent, and using one does not pay the cost of the other.
The stet inspect <file.pdf> CLI subcommand exercises the full API as
a human-readable summary; see the bottom of this document for sample
output.
Every public enum in this API is marked #[non_exhaustive] so
new variants (PDF 2.0 features, additional annotation subtypes,
new error kinds, additional page-element subtypes, …) can land
additively without forcing a major version bump. Cross-crate match
expressions over these enums must include a wildcard arm:
- PdfError (parser/resolver errors)
- Destination, ViewSpec, Action (link/outline targets)
- AnnotationKind, AnnotationKindData, AnnotationDate, AnnotationColor
- FieldKind, ButtonType, FieldValue
- TrappedFlag, PageLayout, PageMode, ReadingDirection, PrintScaling, Duplex, AfRelationship
- LayerIntent, UsageState, PageElementSubtype, LayerTreeNode, BaseState, ListMode, AutoStateEvent, RenderIntent
- ParsePhase, LocationHint, Severity (parse diagnostics)
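As a minimal sketch of the wildcard-arm rule, the block below defines a stand-in enum locally. `SketchSeverity` and `label` are hypothetical names used only for illustration and are not part of stet-pdf-reader. The attribute only restricts matches written in other crates, so the wildcard here stands in for what a downstream consumer would have to write.

```rust
/// Hypothetical stand-in for a library enum; not a stet-pdf-reader type.
#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SketchSeverity {
    Info,
    Warning,
    Error,
}

/// Maps a severity to a short label. Downstream crates must keep the
/// wildcard arm so that variants added later still compile.
pub fn label(sev: SketchSeverity) -> &'static str {
    match sev {
        SketchSeverity::Warning => "warn",
        SketchSeverity::Error => "error",
        // Covers Info today and any variant added in a later release.
        _ => "other",
    }
}
```

Inside the defining crate the compiler does not demand the wildcard; across a crate boundary, omitting it is a compile error.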
The structural structs (DocumentMetadata, OutlineItem,
Annotation, FormField, Layer, Configuration, LayerSet,
OcgVisibility, EmbeddedFile, PageBoxes, …) are intentionally
not #[non_exhaustive]. Marking a struct that way blocks all
cross-crate struct expression syntax, including ..Default::default()
updates — which would prevent users from constructing their own
LayerSet overrides or building synthetic documents in tests. New
fields on these structs land additively; consumers should pattern-
match with .. or read fields directly so field additions remain
non-breaking on the read path.
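The two idioms mentioned above (struct-update construction and `..` patterns) can be sketched with a local struct. `SketchBoxes`, `letter_page`, and `width` are hypothetical and only loosely modeled on PageBoxes; the real field set may differ.

```rust
/// Hypothetical struct, loosely modeled on PageBoxes; not the real type.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct SketchBoxes {
    pub media_box: [f64; 4],
    pub crop_box: Option<[f64; 4]>,
    pub rotate: u16,
}

/// Builds a synthetic US Letter page. The `..Default::default()` tail is
/// what a #[non_exhaustive] struct would forbid in downstream crates.
pub fn letter_page() -> SketchBoxes {
    SketchBoxes {
        media_box: [0.0, 0.0, 612.0, 792.0],
        ..Default::default()
    }
}

/// Reads one field through a `..` pattern, so fields added to the struct
/// later do not break this function.
pub fn width(boxes: &SketchBoxes) -> f64 {
    let SketchBoxes { media_box, .. } = boxes;
    media_box[2] - media_box[0]
}
```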
use stet_pdf_reader::PdfDocument;
let data = std::fs::read("document.pdf")?;
let doc = PdfDocument::from_bytes(&data)?;
println!("Title: {:?}", doc.metadata().title);
println!("Pages: {}", doc.page_count());
println!("Outline items: {}", doc.outline().len());
# Ok::<(), Box<dyn std::error::Error>>(())
Every accessor below parses lazily on first call and caches its result. A document that the caller only renders pays nothing for the structural API surface.
metadata() -> &DocumentMetadata
use stet_pdf_reader::{PdfDocument, TrappedFlag};
# let data = vec![];
let doc = PdfDocument::from_bytes(&data)?;
let m = doc.metadata();
println!("Title: {:?}", m.title);
println!("Author: {:?}", m.author);
println!("Subject: {:?}", m.subject);
println!("Keywords: {:?}", m.keywords);
println!("Creator: {:?}", m.creator); // source authoring app
println!("Producer: {:?}", m.producer); // PDF-writing app
if let Some(d) = &m.creation_date {
println!("Created: {}-{:02}-{:02}", d.year, d.month, d.day);
}
match m.trapped {
Some(TrappedFlag::True) => println!("Trapped for press"),
Some(TrappedFlag::False) => println!("Not trapped"),
Some(TrappedFlag::Unknown) => println!("Trap state unknown"),
Some(_) => {} // TrappedFlag is #[non_exhaustive]
None => {}
}
// XMP metadata stream (PDF 1.4+) as raw XML
if let Some(xmp) = &m.xmp_xml {
println!("XMP length: {} bytes", xmp.len());
}
// Non-standard /Info entries the document carried
for (key, value) in &m.custom {
println!("{key} = {value}");
}
# Ok::<(), Box<dyn std::error::Error>>(())
String decoding handles PDFDocEncoding (default), UTF-16BE with BOM
(FE FF), and UTF-8 with BOM (EF BB BF, PDF 2.0). Date strings come
in PDF's own format D:YYYYMMDDHHmmSSOHH'mm'; truncated forms (just
year, year+month, etc.) are accepted.
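The following is a rough sketch of the two rules just described, not the crate's implementation. `TextEncoding`, `detect_encoding`, `SketchDate`, and `parse_pdf_date` are hypothetical names. The sketch assumes that omitted month and day fields default to 1 and omitted time fields to 0, ignores the time-zone suffix, and does no range validation.

```rust
/// Hypothetical classification of a PDF text string by its leading BOM.
#[derive(Debug, PartialEq)]
pub enum TextEncoding {
    PdfDoc,
    Utf16Be,
    Utf8,
}

pub fn detect_encoding(bytes: &[u8]) -> TextEncoding {
    if bytes.starts_with(&[0xFE, 0xFF]) {
        TextEncoding::Utf16Be
    } else if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) {
        TextEncoding::Utf8
    } else {
        TextEncoding::PdfDoc // no BOM: fall back to the default encoding
    }
}

/// Hypothetical date record; the time-zone offset is dropped in this sketch.
#[derive(Debug, PartialEq)]
pub struct SketchDate {
    pub year: u16,
    pub month: u8,
    pub day: u8,
    pub hour: u8,
    pub minute: u8,
    pub second: u8,
}

/// Parses `D:YYYYMMDDHHmmSS...`, accepting truncated forms down to `D:YYYY`.
pub fn parse_pdf_date(s: &str) -> Option<SketchDate> {
    let s = s.strip_prefix("D:").unwrap_or(s);
    // Keep only the leading run of ASCII digits; the zone suffix is ignored.
    let digit_len = s.bytes().take_while(|b| b.is_ascii_digit()).count();
    let digits = &s[..digit_len];
    if digits.len() < 4 {
        return None; // the year is the only mandatory field
    }
    let year: u16 = digits[..4].parse().ok()?;
    // Reads a two-digit field at `offset`, or uses `default` when truncated.
    let field = |offset: usize, default: u8| -> Option<u8> {
        match digits.get(offset..offset + 2) {
            Some(d) => d.parse().ok(),
            None => Some(default),
        }
    };
    Some(SketchDate {
        year,
        month: field(4, 1)?,
        day: field(6, 1)?,
        hour: field(8, 0)?,
        minute: field(10, 0)?,
        second: field(12, 0)?,
    })
}
```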
viewer_preferences() -> &ViewerPreferences
use stet_pdf_reader::{PageMode, PrintScaling};
# let doc: stet_pdf_reader::PdfDocument = unimplemented!();
let prefs = doc.viewer_preferences();
if prefs.hide_toolbar { println!("hide toolbar"); }
if prefs.fit_window { println!("fit window to first page"); }
if prefs.display_doc_title { println!("show /Info /Title in title bar"); }
match prefs.page_mode {
PageMode::FullScreen => println!("opens in full-screen"),
PageMode::UseOutlines => println!("opens with outline panel"),
_ => {}
}
match prefs.print_scaling {
PrintScaling::None => println!("default: print at 100%"),
PrintScaling::AppDefault => println!("default: viewer chooses"),
_ => {} // PrintScaling is #[non_exhaustive]
}
All fields default per spec (ISO 32000-2 §12.2 Table 147 / §7.7.2 Table 28) when the corresponding entries are absent.
outline() -> &[OutlineItem]
use stet_pdf_reader::{Destination, OutlineItem};
# let doc: stet_pdf_reader::PdfDocument = unimplemented!();
fn print_outline(items: &[OutlineItem], depth: usize) {
let pad = " ".repeat(depth);
for item in items {
let target = match &item.destination {
Some(Destination::PageView { page: Some(p), .. }) => format!(" → page {}", p + 1),
Some(Destination::NamedDest(name)) => format!(" → /{name}"),
_ => String::new(),
};
println!("{pad}- {}{target}", item.title);
print_outline(&item.children, depth + 1);
}
}
print_outline(doc.outline(), 0);
Each item carries title, destination and/or action, recursive
children, optional RGB color, style (italic/bold flags), and
open (from the sign of /Count).
The walker bounds traversal at 64 levels and 100 000 nodes; cycles are
detected via a visited set and result in a ParseWarning rather than
infinite recursion.
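To illustrate how a bounded, cycle-safe walk of this kind can work, here is a self-contained sketch. It is an assumption about the general technique, not the crate's actual walker: `RawNode`, `Item`, `Walker`, and `walk_outline` are hypothetical, and warnings are plain strings rather than ParseWarning values. The depth and node caps mirror the figures quoted above.

```rust
use std::collections::{HashMap, HashSet};

const MAX_DEPTH: usize = 64;
const MAX_NODES: usize = 100_000;

/// Hypothetical raw node: links are object numbers, like /First and /Next.
pub struct RawNode {
    pub title: String,
    pub first_child: Option<u32>,
    pub next: Option<u32>,
}

/// Hypothetical parsed item, shaped loosely like OutlineItem.
#[derive(Debug)]
pub struct Item {
    pub title: String,
    pub children: Vec<Item>,
}

struct Walker<'a> {
    nodes: &'a HashMap<u32, RawNode>,
    visited: HashSet<u32>,
    warnings: Vec<String>,
}

impl Walker<'_> {
    /// Walks one sibling chain, recursing into children.
    fn siblings(&mut self, first: Option<u32>, depth: usize) -> Vec<Item> {
        let nodes = self.nodes; // copy the shared reference; `self` stays mutable
        let mut out = Vec::new();
        if depth >= MAX_DEPTH {
            if first.is_some() {
                self.warnings.push("outline depth cap reached; subtree dropped".to_string());
            }
            return out;
        }
        let mut cur = first;
        while let Some(id) = cur {
            if self.visited.len() >= MAX_NODES {
                self.warnings.push("outline node cap reached; walk truncated".to_string());
                break;
            }
            // `insert` returns false when the id was already seen: a cycle.
            if !self.visited.insert(id) {
                self.warnings.push("outline cycle detected; sibling chain truncated".to_string());
                break;
            }
            let Some(node) = nodes.get(&id) else { break }; // dangling link: stop
            let children = self.siblings(node.first_child, depth + 1);
            out.push(Item { title: node.title.clone(), children });
            cur = node.next;
        }
        out
    }
}

/// Returns the parsed items plus any warnings collected along the way.
pub fn walk_outline(nodes: &HashMap<u32, RawNode>, first: Option<u32>) -> (Vec<Item>, Vec<String>) {
    let mut walker = Walker { nodes, visited: HashSet::new(), warnings: Vec::new() };
    let items = walker.siblings(first, 0);
    (items, walker.warnings)
}
```

A single visited set shared across the whole walk catches cycles through /Next as well as through /First, and doubles as the node counter for the total cap.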
use stet_pdf_reader::{Action, Destination, ViewSpec};
# let doc: stet_pdf_reader::PdfDocument = unimplemented!();
// Every named destination in the document, merged from /Catalog /Dests
// and /Catalog /Names /Dests (legacy entries override name-tree entries
// per ISO 32000-2 §12.3.2.3).
for (name, dest) in doc.destinations() {
if let Destination::PageView { page: Some(p), view, .. } = dest {
println!("{name} → page {}", p + 1);
match view {
ViewSpec::Xyz { zoom, .. } => println!(" zoom: {:?}", zoom),
ViewSpec::Fit => println!(" fit page"),
_ => {}
}
}
}
// Direct lookup
if let Some(dest) = doc.resolve_named_destination("Chapter1") {
// ...
}
Action is an enum covering URI, GoTo, GoToR (remote), GoToE
(embedded), Launch, Named, JavaScript, SubmitForm, ResetForm, Hide,
Sound, Movie, Thread, and Other for unknown subtypes. JavaScript is
exposed as raw source; stet never executes it.
page_annotations(page) -> Result<&[Annotation], PdfError>
use stet_pdf_reader::{Annotation, AnnotationKind, AnnotationKindData, Action};
# let doc: stet_pdf_reader::PdfDocument = unimplemented!();
for page in 0..doc.page_count() {
for annot in doc.page_annotations(page)? {
match (&annot.kind, &annot.kind_data) {
(AnnotationKind::Link, AnnotationKindData::Link(link)) => {
if let Some(Action::Uri { uri, .. }) = &link.action {
println!("page {}: link → {uri}", page + 1);
}
}
(AnnotationKind::Highlight, AnnotationKindData::Markup(m)) => {
println!("page {}: highlight over {} regions", page + 1, m.quad_points.len());
}
(AnnotationKind::Text, AnnotationKindData::Text(t)) => {
println!(
"page {}: sticky note ({:?}) — {:?}",
page + 1,
t.icon,
annot.contents
);
}
_ => {}
}
}
}
# Ok::<(), Box<dyn std::error::Error>>(())
Per-page lazy parsing: each page caches independently in a
Vec<OnceCell<Vec<Annotation>>>, so a 1000-page document with
annotations only on a handful of pages doesn't pay to parse the rest.
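The per-page caching pattern can be sketched with the standard library's `OnceCell`. This is an illustration under assumptions, not the crate's code: `PageCache` is a hypothetical type, the cached payload is a `Vec<String>` standing in for `Vec<Annotation>`, and the parse counter exists only to make the laziness observable.

```rust
use std::cell::{Cell, OnceCell};

/// Hypothetical per-page cache; strings stand in for parsed annotations.
pub struct PageCache {
    pages: Vec<OnceCell<Vec<String>>>,
    parses: Cell<usize>,
}

impl PageCache {
    /// Allocates one empty cell per page; nothing is parsed yet.
    pub fn new(page_count: usize) -> Self {
        Self {
            pages: (0..page_count).map(|_| OnceCell::new()).collect(),
            parses: Cell::new(0),
        }
    }

    /// Parses `page` on first access and returns the cached slice afterwards.
    /// Returns None for an out-of-range page index.
    pub fn annotations(&self, page: usize) -> Option<&[String]> {
        let cell = self.pages.get(page)?;
        let parsed = cell.get_or_init(|| {
            // Stand-in for the real parse; runs at most once per page.
            self.parses.set(self.parses.get() + 1);
            vec![format!("annotation parsed for page {}", page + 1)]
        });
        Some(parsed.as_slice())
    }

    /// How many pages have actually been parsed so far.
    pub fn parse_count(&self) -> usize {
        self.parses.get()
    }
}
```

Because `OnceCell` hands out shared references from `&self`, callers can hold slices for several pages at once without any borrow juggling.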
Common fields (rect, contents, name, modified, title,
subject, flags, color, border, has_appearance) live on
Annotation; subtype-specific fields live in AnnotationKindData
variants:
- Link: action, destination, highlight mode, quad points
- Text: open state, icon name, review state
- FreeText: default appearance, quadding, callout line
- Markup (Highlight / Underline / Squiggly / StrikeOut): quad points
- Line: endpoints, line endings, leader length, caption
- Shape (Square / Circle): interior color, border padding
- Polygon (Polygon / PolyLine): vertices, line endings
- Ink: strokes
- Stamp: icon name
- Caret: symbol
- FileAttachment: filename, icon
- Popup: open state, parent annotation reference
- Minimal: for unhandled or rare subtypes (Screen, PrinterMark, TrapNet, Watermark, Sound, Movie, Widget, 3D, RichMedia, Other)
form() -> Option<&FormCatalog>
use stet_pdf_reader::{ButtonType, FieldKind, FieldValue};
# let doc: stet_pdf_reader::PdfDocument = unimplemented!();
let Some(form) = doc.form() else {
return; // document has no AcroForm
};
if form.sig_flags.signatures_exist {
println!("document carries digital signatures");
}
fn walk(fields: &[stet_pdf_reader::FormField]) {
for f in fields {
match &f.kind {
FieldKind::Text(tf) => println!(
"Text {} = {:?} (max {:?}, multiline={})",
f.name, f.value, tf.max_length, tf.multiline
),
FieldKind::Button(bf) => match bf.button_type {
ButtonType::Checkbox => println!("Check {} = {:?}", f.name, f.value),
ButtonType::Radio => println!("Radio {} (options: {:?})", f.name, bf.options),
ButtonType::Pushbutton => println!("Push {}", f.name),
_ => {} // ButtonType is #[non_exhaustive]
},
FieldKind::Choice(cf) => println!(
"{} {} = {:?} ({} options)",
if cf.combo { "Combo" } else { "List " },
f.name, f.value, cf.options.len()
),
FieldKind::Signature(_) => println!("Sig {}", f.name),
FieldKind::Container => {} // non-terminal namespace node
FieldKind::Other { ft } => println!("Other ({}) {}", ft, f.name),
_ => {} // FieldKind is #[non_exhaustive]
}
walk(&f.children);
}
}
walk(&form.fields);
Field names are fully qualified: /T partials joined with . from
the root, so order.shipping.street is one field nested two levels
deep under containers order then shipping.
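A small sketch of how such qualified names can be derived from partial names. `FieldNode` and `qualified_names` are hypothetical and unrelated to the crate's internals; this version collects names for terminal fields only.

```rust
/// Hypothetical field-tree node carrying only its partial (/T) name.
pub struct FieldNode {
    pub partial: String,
    pub children: Vec<FieldNode>,
}

/// Appends the fully qualified name of every terminal field under `nodes`.
/// Partial names are joined with '.' starting from the root.
pub fn qualified_names(nodes: &[FieldNode], prefix: &str, out: &mut Vec<String>) {
    for node in nodes {
        let name = if prefix.is_empty() {
            node.partial.clone()
        } else {
            format!("{prefix}.{}", node.partial)
        };
        if node.children.is_empty() {
            out.push(name); // terminal field
        } else {
            qualified_names(&node.children, &name, out); // container: descend
        }
    }
}
```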
Each terminal field also carries widget_obj_nums: Vec<u32> —
the object numbers of its widget annotations. To fetch the renderable
widget data for a field:
# use stet_pdf_reader::{AnnotationKind, FormField};
# let doc: stet_pdf_reader::PdfDocument = unimplemented!();
# let field: FormField = unimplemented!();
for page in 0..doc.page_count() {
for annot in doc.page_annotations(page).unwrap_or(&[]) {
if annot.kind == AnnotationKind::Widget {
// Match by obj_num via your own bookkeeping; the reader
// does not currently expose obj_num on Annotation directly.
// Annotations are deduplicated against the page's /Annots
// array.
}
}
}
page_boxes(page) -> Result<PageBoxes, PdfError>
# let doc: stet_pdf_reader::PdfDocument = unimplemented!();
let pb = doc.page_boxes(0)?;
println!("MediaBox: {:?}", pb.media_box);
if let Some(b) = pb.crop_box { println!("CropBox: {:?}", b); }
if let Some(b) = pb.bleed_box { println!("BleedBox: {:?}", b); }
if let Some(b) = pb.trim_box { println!("TrimBox: {:?}", b); }
if let Some(b) = pb.art_box { println!("ArtBox: {:?}", b); }
println!("Rotate: {}", pb.rotate);
println!("UserUnit: {}", pb.user_unit);
if pb.has_transition { println!("page transition declared"); }
if pb.has_additional_actions { println!("page-AA dict declared"); }
# Ok::<(), Box<dyn std::error::Error>>(())
Each box is Option<[f64; 4]> so callers can distinguish "explicitly
set" from "inherits MediaBox per spec default". Rotation is normalized
to 0/90/180/270 (negative values wrap; non-multiples of 90 coerce to 0).
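The rotation rule and the MediaBox fallback can be sketched as two small functions. Both `normalize_rotation` and `effective_crop_box` are hypothetical helpers written for illustration; the crate may implement the same rules differently.

```rust
/// Normalizes a raw /Rotate value to 0, 90, 180 or 270.
/// Negative multiples of 90 wrap around; anything else coerces to 0.
pub fn normalize_rotation(raw: i64) -> u16 {
    if raw % 90 != 0 {
        return 0;
    }
    // rem_euclid keeps the result in 0..360 even for negative input.
    raw.rem_euclid(360) as u16
}

/// Resolves the crop box a caller would actually use: the explicit
/// CropBox when present, otherwise the MediaBox.
pub fn effective_crop_box(media_box: [f64; 4], crop_box: Option<[f64; 4]>) -> [f64; 4] {
    crop_box.unwrap_or(media_box)
}
```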
# use stet_pdf_reader::{AfRelationship};
# let doc: stet_pdf_reader::PdfDocument = unimplemented!();
for (name, ef) in doc.embedded_files() {
let size = ef.size.map(|n| format!("{n} B")).unwrap_or_default();
let rel = match &ef.relationship {
Some(AfRelationship::Source) => "source",
Some(AfRelationship::Data) => "data",
Some(AfRelationship::FormData) => "form-data",
_ => "?",
};
println!("{name} ({size}, {rel}, {:?})", ef.mime_type);
}
// Read the bytes of a specific attachment on demand.
let bytes = doc.embedded_file_bytes("data.csv")?;
# Ok::<(), Box<dyn std::error::Error>>(())
Embedded files come from the catalog's /Names /EmbeddedFiles name
tree. Stream bytes are decoded on demand; the table itself only holds
metadata.
PDFs can mark slices of their content for selective visibility — CAD
layers, watermarks, multilingual annotations, print-only or
screen-only overlays. stet-pdf-reader exposes the full Optional
Content Group (OCG) model: per-layer metadata, hierarchy, alternate
configurations, runtime visibility overrides via LayerSet, OCMD
membership policies and /VE boolean expressions, and intent-driven
rendering.
# use stet_pdf_reader::{PdfDocument, RenderIntent, layers};
# let doc: PdfDocument = unimplemented!();
for layer in doc.layers() {
println!(
"{:>3} {:<32} default_visible={} locked={}",
layer.ocg_id, layer.name, layer.default_visible, layer.locked
);
}
// Render the page hiding any /AS-marked print-off layers.
let print_set = doc.layer_set_for(RenderIntent::Print);
let (rgba, w, h) = doc.render_page_to_rgba_with_layers(0, 150.0, &print_set)?;
// Or build a custom override set and toggle one layer.
let mut custom = layers::layer_set_from_document(&doc);
custom.set(/* ocg_id */ 42, false);
let (rgba2, _, _) = doc.render_page_to_rgba_with_layers(0, 150.0, &custom)?;
# let _ = (rgba, rgba2, w, h);
# Ok::<(), Box<dyn std::error::Error>>(())
Full reference (types, methods, OCMD semantics, intent-driven
rendering, /VE grammar): see
docs/PDF-LAYERS.md.
parse_warnings() -> Ref<'_, [ParseWarning]>
use stet_pdf_reader::{ParsePhase, Severity};
# let doc: stet_pdf_reader::PdfDocument = unimplemented!();
for w in doc.parse_warnings().iter() {
let sev = match w.severity {
Severity::Info => "info",
Severity::Warning => "warn",
Severity::Error => "error",
_ => "unknown", // Severity is #[non_exhaustive]
};
let phase = match &w.phase {
ParsePhase::Outline => "outline".to_string(),
ParsePhase::Annotations { page } => format!("annotations(p{})", page + 1),
ParsePhase::Form => "form".to_string(),
// ...
other => format!("{:?}", other),
};
eprintln!("[{sev}] {phase}: {}", w.message);
}
Recoverable malformations (outline cycles, annotations missing
/Rect, form-field trees exceeding the depth cap) push a
ParseWarning rather than failing the whole parse. The list grows as
accessors are called for the first time; cached subsequent calls
don't re-emit.
The returned Ref wraps a RefCell borrow of the underlying
storage. Drop it before invoking other accessors that might push more
warnings (iterating through it is fine; calling doc.outline()
mid-iteration is not).
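The borrow discipline can be demonstrated with a stand-alone sketch. `WarningLog`, `try_push`, and `snapshot` are hypothetical; in particular, the sketch uses `try_borrow_mut` so that a conflicting push reports failure instead of panicking, which is an assumption made for illustration and says nothing about how the crate itself reacts.

```rust
use std::cell::{Ref, RefCell};

/// Hypothetical warning store with the same Ref-returning shape.
pub struct WarningLog {
    warnings: RefCell<Vec<String>>,
}

impl WarningLog {
    pub fn new() -> Self {
        Self { warnings: RefCell::new(Vec::new()) }
    }

    /// Shared view of the warnings; holds a RefCell borrow while alive.
    pub fn parse_warnings(&self) -> Ref<'_, [String]> {
        Ref::map(self.warnings.borrow(), |v| v.as_slice())
    }

    /// Stand-in for an accessor that emits a warning on first use.
    /// Returns false when a live Ref blocks the mutable borrow.
    pub fn try_push(&self, message: &str) -> bool {
        match self.warnings.try_borrow_mut() {
            Ok(mut list) => {
                list.push(message.to_string());
                true
            }
            Err(_) => false,
        }
    }
}

/// Copies the warnings out so the Ref is released before returning.
pub fn snapshot(log: &WarningLog) -> Vec<String> {
    log.parse_warnings().to_vec()
}
```

Taking an owned snapshot first is one way to inspect warnings and call further accessors in the same loop, at the cost of cloning the messages.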
The stet inspect <file.pdf> subcommand exercises every accessor in
this document and pretty-prints the result. Sections appear only
when the document has the corresponding data:
$ stet inspect document.pdf
document.pdf
Metadata:
Title: Annual Report 2026
Author: Scott Bowman
Producer: stet 0.2.0
Created: 2026-04-27 12:00:00 UTC
Pages: 4
Page 1 size: 612.0 × 792.0 pt (8.50 × 11.00 in)
Outline (3 entries):
- Chapter 1 → page 1 (fit)
- Section 1.1 → page 2 (xyz)
- Chapter 2 → page 3 (fit)
Named destinations: 2
Annotations: 3
Page 1: 2 Link
Page 3: 1 Highlight
Form: 4 terminal fields (4 widgets)
By kind: Button: 1, Text: 3
NeedAppearances: true
Embedded files: 1
- data.csv (1.2 KB, text/csv)
Warnings: 1
outline: 1
[warn] outline: outline cycle detected; sibling chain truncated
Pass --password <pw> for encrypted documents.
- Single-threaded. PdfDocument<'a> is !Send (the resolver uses
  RefCell for caching). If you need to share a document across
  threads, build a wrapper that re-parses or use one document per
  thread. Cross-thread use of the structural API is not supported.
- PDF 2.0 coverage. We parse PDF 1.x and most PDF 2.0 additions
  (FormData / Schema relationship hints, UTF-8 string BOM, …). XFA
  payloads are detected but not parsed.
- No write API. This is a read-only API. Authoring PDF
  structures (writing bookmarks, annotations, form fields) is the
  separate pdfmark plan, not this one.
- Walker caps. Outline / form-field / name-tree walkers cap at
  depths in the 32–64 range and total nodes / entries in the
  100 000 – 1 000 000 range. Real-world documents never hit these;
  caps exist to make pathological / cyclic / malicious inputs
  bounded rather than fatal. Each truncation pushes a ParseWarning.
- Cross-linking widgets. A terminal FormField's widget_obj_nums
  list cross-references the widget annotations on the relevant
  pages. Look up the annotation via your own bookkeeping if you need
  both views of the same widget.