Add Reader.EmbeddedFiles with a cycle-guarded name-tree walk #5

Merged: pgundlach merged 1 commit into main from claude/embedded-files-accessor on Jun 19, 2026
fank (Collaborator) commented Jun 18, 2026

Why

Without a shared accessor, inspectors that read PDF attachments (ZUGFeRD/Factur-X invoice XML, etc.) hand-roll the EmbeddedFiles name-tree walk. A cyclic or self-referential /Kids graph then drives that recursion into a stack overflow, which is fatal and unrecoverable. (Reproduced here: without the guard, the cyclic test crashes with fatal error: stack overflow.)

What

Reader.EmbeddedFiles() []EmbeddedFile walks the catalog's EmbeddedFiles name tree once and returns the attachments in tree order:

type EmbeddedFile struct {
	Name string // name-tree key, e.g. "factur-x.xml"
	Spec *Dict  // /Filespec dict; its /EF stream holds the bytes
}
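
As a rough illustration of how a caller might consume the returned slice, the sketch below looks up one attachment by its name-tree key. The Dict type here is only a placeholder so the example compiles on its own, and findAttachment is a hypothetical helper, not part of this PR.

```go
package main

import "fmt"

// Dict is a placeholder for the library's dictionary type (assumption:
// the real definition lives in the package and is not shown in this PR).
type Dict struct{}

// EmbeddedFile mirrors the struct from the PR description.
type EmbeddedFile struct {
	Name string // name-tree key, e.g. "factur-x.xml"
	Spec *Dict  // /Filespec dict
}

// findAttachment is a hypothetical helper: it scans the ordered slice
// and returns the filespec of the first entry whose key matches name.
func findAttachment(files []EmbeddedFile, name string) (*Dict, bool) {
	for _, f := range files {
		if f.Name == name {
			return f.Spec, true
		}
	}
	return nil, false
}

func main() {
	// In real use the slice would come from Reader.EmbeddedFiles().
	files := []EmbeddedFile{
		{Name: "readme.txt", Spec: &Dict{}},
		{Name: "factur-x.xml", Spec: &Dict{}},
	}
	if _, ok := findAttachment(files, "factur-x.xml"); ok {
		fmt.Println("invoice XML attachment present")
	}
}
```

A linear scan is enough here because the accessor returns a slice in tree order rather than a map.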

The walk is guarded by a visited-set (reference cycles) and a depth cap (inline-nested /Kids), so a hostile tree can neither loop nor overflow the stack. The shape mirrors the existing DocumentInfo() accessor; an ordered slice (not a map) keeps the result deterministic.
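
The two guards can be sketched on a toy model of the tree. This is not the code merged in this PR: the node, kid, entry, and doc types, the maxDepth value of 32, and the embeddedFiles function are all assumptions made for illustration. A kid is either an indirect reference (tracked in the visited-set by object number) or an inline node (bounded only by the depth cap).

```go
package main

import "fmt"

// entry is a simplified (name, filespec) pair from a /Names array.
type entry struct {
	name string
	spec string
}

// node is a toy name-tree node: leaf entries plus child links.
type node struct {
	names []entry
	kids  []kid
}

// kid is either an indirect reference (objNum > 0) or an inline node.
type kid struct {
	objNum int
	inline *node
}

// doc resolves indirect references to nodes.
type doc struct {
	objects map[int]*node
}

// maxDepth is an assumed cap; the PR does not state the actual value.
const maxDepth = 32

// embeddedFiles walks the tree from root and collects entries in tree order.
func (d *doc) embeddedFiles(root kid) []entry {
	var out []entry
	visited := map[int]bool{}
	var walk func(k kid, depth int)
	walk = func(k kid, depth int) {
		if depth > maxDepth {
			return // depth cap: bounds inline-nested /Kids
		}
		n := k.inline
		if k.objNum > 0 {
			if visited[k.objNum] {
				return // visited-set: breaks reference cycles
			}
			visited[k.objNum] = true
			n = d.objects[k.objNum]
		}
		if n == nil {
			return
		}
		out = append(out, n.names...)
		for _, c := range n.kids {
			walk(c, depth+1)
		}
	}
	walk(root, 0)
	return out
}

// cyclicDoc builds a tree where object 2 points back at object 1 and at itself.
func cyclicDoc() *doc {
	return &doc{objects: map[int]*node{
		1: {kids: []kid{{objNum: 2}}},
		2: {
			names: []entry{{name: "factur-x.xml", spec: "filespec"}},
			kids:  []kid{{objNum: 1}, {objNum: 2}},
		},
	}}
}

func main() {
	files := cyclicDoc().embeddedFiles(kid{objNum: 1})
	fmt.Println(len(files), files[0].name)
}
```

In this sketch each referenced object is expanded at most once, so a cyclic graph yields each entry a single time, while a pathological inline chain simply stops at the cap instead of recursing without bound.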

Tests

TestEmbeddedFiles (flat), TestEmbeddedFilesNestedKids, TestEmbeddedFilesCyclicKidsTerminates (the cycle that previously overflowed), TestEmbeddedFilesNone. Full suite + go vet pass; gofmt-clean.

Inspectors that read PDF attachments (e.g. ZUGFeRD/Factur-X invoice XML)
otherwise hand-roll the EmbeddedFiles name-tree walk, where a cyclic or
self-referential /Kids graph drives the recursion into a stack overflow.

Add Reader.EmbeddedFiles() []EmbeddedFile: it walks the tree once with a
visited-set (reference cycles) and a depth cap (inline-nested /Kids) and
returns entries in tree order, mirroring DocumentInfo's accessor shape.

Tests: flat, nested /Kids, cyclic /Kids terminates, and no-attachments.
fank marked this pull request as ready for review June 18, 2026 22:53
fank requested a review from pgundlach June 18, 2026 22:53
pgundlach merged commit e2d91ac into main Jun 19, 2026
1 check passed
pgundlach deleted the claude/embedded-files-accessor branch June 19, 2026 05:17