
fix: improve EPUB image parsing and path resolution #22

Merged: Aatricks merged 1 commit into main from epub-fix-images on Jan 13, 2026


@Aatricks (Owner)

This pull request improves the EPUB chapter parsing and image handling logic in the ContentRepository. The main changes make the content extraction more robust, especially for images and nested elements, and improve the way image paths are resolved and matched within EPUB archives.

Parsing and content extraction improvements:

  • Replaced the flat iteration over the body's children with a recursive traverse function that descends into nested elements. Both text and images, including images nested inside other tags, are now extracted and added as ContentElements. Text placed directly inside non-standard containers (rather than in a paragraph or heading) is also extracted.
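
The recursive traversal described above can be sketched roughly as follows. The PR does not show the implementation or its language, so this is a minimal Python sketch under stated assumptions: the chapter is well-formed XHTML parsed with the standard-library `xml.etree.ElementTree`, and `TextElement`, `ImageElement`, `TEXT_BLOCKS`, `traverse`, and `extract_content` are hypothetical names standing in for ContentRepository's real types. Text blocks without images are emitted whole; any other element keeps its own direct text and recurses into its children, which is what lets an `<img>` buried inside `<div>`s still surface.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional, Union

# Hypothetical stand-ins for the app's ContentElement types.
@dataclass
class TextElement:
    text: str

@dataclass
class ImageElement:
    src: str

ContentElement = Union[TextElement, ImageElement]

# Assumed set of tags treated as self-contained text blocks.
TEXT_BLOCKS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "li", "blockquote"}
IMAGE_TAGS = {"img", "image"}  # <image> covers SVG-wrapped images
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

def local_name(tag: str) -> str:
    # Strip the XML namespace, e.g. "{http://www.w3.org/1999/xhtml}p" -> "p".
    return tag.rsplit("}", 1)[-1].lower()

def add_text(out: List[ContentElement], text: Optional[str]) -> None:
    # Collapse whitespace and skip empty fragments.
    cleaned = " ".join((text or "").split())
    if cleaned:
        out.append(TextElement(cleaned))

def traverse(node: ET.Element, out: List[ContentElement]) -> None:
    name = local_name(node.tag)
    if name in IMAGE_TAGS:
        src = node.get("src") or node.get(XLINK_HREF) or node.get("href")
        if src:
            out.append(ImageElement(src.strip()))
        return
    has_image = any(local_name(d.tag) in IMAGE_TAGS for d in node.iter())
    if name in TEXT_BLOCKS and not has_image:
        # Plain text block: emit all of its text (including inline tags) as one element.
        add_text(out, "".join(node.itertext()))
        return
    # Container, or a block wrapping an image: keep its direct text, then recurse.
    add_text(out, node.text)
    for child in node:
        traverse(child, out)
        add_text(out, child.tail)  # text that follows the child inside this node

def extract_content(xhtml: str) -> List[ContentElement]:
    root = ET.fromstring(xhtml)
    body = next((n for n in root.iter() if local_name(n.tag) == "body"), root)
    out: List[ContentElement] = []
    traverse(body, out)
    return out
```

For example, `<div><p>Hello <em>world</em></p><div><img src="../images/a.png"/></div>Loose text</div>` yields a text element, then the nested image, then the loose text that a flat iteration over the body's children would have missed.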

EPUB image path resolution and matching:

  • Enhanced the resolveEpubPath function to normalize relative paths by correctly handling . and .. path segments, which fixes image references that traverse directories.
  • Normalized image paths, both those extracted from URLs and the EPUB entry names, by replacing backslashes with forward slashes and removing leading slashes, so that references match regardless of path format. [1] [2]
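
The path resolution and normalization described above can be sketched as follows. This is not the PR's code; it is a minimal Python sketch in which `normalize_entry_path`, `resolve_epub_path`, and `find_entry` are hypothetical names. It assumes an image href is resolved against the directory of the chapter that references it, that an href starting with `/` is relative to the archive root, and (an assumption of this sketch) that `..` segments reaching above the root are dropped rather than raising an error.

```python
from typing import Iterable, Optional

def normalize_entry_path(path: str) -> str:
    # Use forward slashes and drop leading slashes, so that
    # "\\OEBPS\\Images\\a.png" and "/OEBPS/Images/a.png" compare equal.
    return path.replace("\\", "/").lstrip("/")

def resolve_epub_path(chapter_path: str, href: str) -> str:
    href = href.replace("\\", "/")
    chapter_path = normalize_entry_path(chapter_path)
    if href.startswith("/"):
        base_dir = ""  # root-relative reference
    else:
        base_dir = chapter_path.rsplit("/", 1)[0] if "/" in chapter_path else ""
    segments = []
    for seg in (base_dir + "/" + href).split("/"):
        if seg in ("", "."):
            continue  # empty and "." segments do not change the location
        if seg == "..":
            if segments:
                segments.pop()  # step up one directory
            continue  # ".." above the archive root is ignored in this sketch
        segments.append(seg)
    return "/".join(segments)

def find_entry(entries: Iterable[str], chapter_path: str, href: str) -> Optional[str]:
    # Match the resolved reference against normalized archive entry names,
    # returning the entry's original name so it can be read from the archive.
    target = resolve_epub_path(chapter_path, href)
    by_path = {normalize_entry_path(e): e for e in entries}
    return by_path.get(target)
```

With this approach, `resolve_epub_path("OEBPS/Text/ch1.xhtml", "../Images/a.png")` resolves to `OEBPS/Images/a.png`, and the lookup still succeeds if the archive stores that entry as `\OEBPS\Images\a.png`, because both sides go through the same normalization.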

@Aatricks Aatricks self-assigned this Jan 13, 2026
@Aatricks Aatricks merged commit 8312995 into main Jan 13, 2026
1 check passed
@Aatricks Aatricks deleted the epub-fix-images branch January 13, 2026 16:49