@srkirkland srkirkland commented Jan 16, 2026

Adds the ability to generate PDF bookmarks (outlines) for tagged PDFs. This feature is controlled by a configuration option and defaults to off.

  • Introduces a new IPdfBookmarkService to handle bookmark generation.
  • Implements a PdfBookmarkService that extracts headings from tagged PDFs to create bookmarks.
  • Adds a no-op implementation for when the feature is disabled.
  • Includes integration tests to verify bookmark generation.

Summary by CodeRabbit

  • New Features

    • Added automatic bookmark generation for tagged PDF documents to enhance navigation and usability.
    • Made PDF bookmark generation configurable through feature flags.
  • Tests

    • Added comprehensive test coverage for bookmark generation functionality across various PDF document states.

coderabbitai bot commented Jan 16, 2026

Walkthrough

This PR adds PDF bookmark generation functionality to the ingest pipeline. It introduces a new IPdfBookmarkService interface with PdfBookmarkService and NoopPdfBookmarkService implementations, adds UsePdfBookmarks configuration flags, wires the service via dependency injection, integrates it into PdfRemediationProcessor, and includes integration tests.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Configuration Options**: `server.core/Ingest/FileIngestOptions.cs`, `server.core/Ingest/PdfProcessorOptions.cs` | Adds new `UsePdfBookmarks` boolean property to both classes; `FileIngestOptions` resets the flag in the `UseNoops` method. |
| **Bookmark Service Interface & Implementations**: `server.core/Remediate/Bookmarks/IPdfBookmarkService.cs`, `server.core/Remediate/Bookmarks/PdfBookmarkService.cs`, `server.core/Remediate/Bookmarks/NoopPdfBookmarkService.cs` | Defines the `IPdfBookmarkService` interface with an `EnsureBookmarksAsync` method; `PdfBookmarkService` provides the full implementation with heading extraction, outline hierarchy building, and tag tree traversal logic; `NoopPdfBookmarkService` provides a no-op implementation. |
| **Dependency Injection & Processor Integration**: `server.core/Ingest/IngestServiceCollectionExtensions.cs`, `server.core/Remediate/PdfRemediationProcessor.cs` | Binds the `UsePdfBookmarks` configuration and registers `IPdfBookmarkService` (`PdfBookmarkService` or `NoopPdfBookmarkService`, based on the flag); `PdfRemediationProcessor` accepts an `IPdfBookmarkService` dependency and calls `EnsureBookmarksAsync` after initializing PDF structure indices. |
| **Worker Configuration**: `workers/function.ingest/Program.cs` | Exposes the `UsePdfBookmarks` configuration flag from the `Ingest:UsePdfBookmarks` or `INGEST_USE_PDF_BOOKMARKS` sources during `AddFileIngest` setup. |
| **Integration Tests & Helpers**: `tests/server.tests/Integration/Remediate/PdfBookmarkServiceTests.cs`, `tests/server.tests/Integration/Remediate/NoopPdfBookmarkService.cs`, `tests/server.tests/Integration/Remediate/PdfRemediationProcessor*Tests.cs` (3 files) | Adds a test suite for `PdfBookmarkService` covering tagged PDFs with and without bookmarks and untagged PDFs; adds the test helper `NoopPdfBookmarkService`; updates `PdfRemediationProcessor` constructor calls across three test files to include the new bookmark service parameter. |
| **Test Fixtures & Documentation**: `tests/server.tests/Fixtures/pdfs/README.md` | Documents the new `tagged-missing-bookmarks.pdf` test fixture. |
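
As a rough illustration of the flag-driven wiring summarized in the table, the sketch below resolves the flag from the two named sources and picks an implementation. It is not the code from this PR: the `EnsureBookmarksAsync` parameter list, the `BookmarkServiceSelector` helper, the precedence of the config key over the environment variable, and the use of `bool.TryParse` are all assumptions made for the example. The real registration goes through dependency injection in `IngestServiceCollectionExtensions.cs`.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the real interface; the actual signature is not shown on this page.
public interface IPdfBookmarkService
{
    Task EnsureBookmarksAsync(string pdfPath, CancellationToken cancellationToken);
}

// No-op used when the feature is off. It still honors cancellation.
public sealed class NoopPdfBookmarkService : IPdfBookmarkService
{
    public Task EnsureBookmarksAsync(string pdfPath, CancellationToken cancellationToken)
    {
        cancellationToken.ThrowIfCancellationRequested();
        return Task.CompletedTask;
    }
}

// Hypothetical helper, not part of the PR.
public static class BookmarkServiceSelector
{
    // Assumed precedence: the config key wins, then the environment variable.
    // Anything bool.TryParse rejects (missing, empty, "1", "yes") leaves the feature off.
    public static bool IsEnabled(
        IReadOnlyDictionary<string, string> config,
        Func<string, string> getEnvironmentVariable)
    {
        string raw =
            config.TryGetValue("Ingest:UsePdfBookmarks", out var fromConfig)
            && !string.IsNullOrWhiteSpace(fromConfig)
                ? fromConfig
                : getEnvironmentVariable("INGEST_USE_PDF_BOOKMARKS");

        return bool.TryParse(raw, out var enabled) && enabled;
    }

    // The real service is only constructed when the flag is on.
    public static IPdfBookmarkService Select(bool enabled, Func<IPdfBookmarkService> createReal)
        => enabled ? createReal() : new NoopPdfBookmarkService();
}
```

Passing the environment lookup as a delegate (for example `Environment.GetEnvironmentVariable`) keeps the default-off behavior easy to exercise in tests without touching process state.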

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 With whiskers twitching, I've hopped through the code,
Adding bookmarks to PDFs, a path newly trod,
Hierarchies built from tagged headings so neat,
Outlines and destinations, a complete feat,
Service injected, tests written with care—
A bookmark-filled PDF feast, beyond compare! 📚✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 6.82%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title 'Enables PDF bookmark generation' accurately summarizes the main change: introducing the ability to generate PDF bookmarks for tagged PDFs through a new feature. |

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@server.core/Remediate/Bookmarks/IPdfBookmarkService.cs`:
- Around line 7-12: The XML doc on IPdfBookmarkService currently says
implementations "must not throw", which is misleading because cancellation
should be allowed; update the <remarks> for the IPdfBookmarkService contract to
state that implementations must swallow and not propagate non-cancellation
exceptions but may allow OperationCanceledException (or other cancellation
exceptions) to propagate so callers can handle cancellations—mention that
implementers should catch Exception and rethrow only if it's a cancellation
(e.g., OperationCanceledException).
🧹 Nitpick comments (1)
tests/server.tests/Integration/Remediate/PdfBookmarkServiceTests.cs (1)

178-192: Consider centralizing repo-root discovery for reuse.

This helper appears in multiple integration tests; a shared utility could reduce duplication and drift.
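
A shared helper along these lines is one way to do that. This is a minimal sketch, not the helper from the test project: the `RepoRoot` class name and the choice of `.git` as the default marker are assumptions, and the existing tests may locate the root by a different marker such as a solution file.

```csharp
using System;
using System.IO;

// Hypothetical shared test utility; name and marker are illustrative only.
public static class RepoRoot
{
    // Walks upward from startDirectory until a directory containing the marker
    // (a file or a folder with that name) is found, and returns that directory.
    public static string Find(string startDirectory, string markerName = ".git")
    {
        var dir = new DirectoryInfo(startDirectory);
        while (dir != null)
        {
            var candidate = Path.Combine(dir.FullName, markerName);
            if (Directory.Exists(candidate) || File.Exists(candidate))
            {
                return dir.FullName;
            }

            dir = dir.Parent; // null once the filesystem root has been passed
        }

        throw new DirectoryNotFoundException(
            $"No '{markerName}' found at or above '{startDirectory}'.");
    }
}
```

Integration tests could then call `RepoRoot.Find(AppContext.BaseDirectory)` and build fixture paths from the result, so a change to the repository layout only needs to be handled in one place.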

Comment on lines +7 to +12
/// <summary>
/// Ensures the provided PDF has bookmarks (outlines) when possible.
/// </summary>
/// <remarks>
/// Implementations must not throw; failures should be treated as a no-op.
/// </remarks>
⚠️ Potential issue | 🟡 Minor

Clarify the “must not throw” contract to allow cancellation.

Both the no-op and the real implementation throw OperationCanceledException on cancellation, but the remark currently forbids all throws, which is misleading for callers and implementers. Consider narrowing the doc to “swallow non-cancellation exceptions.”

✏️ Proposed doc fix
-    /// <remarks>
-    /// Implementations must not throw; failures should be treated as a no-op.
-    /// </remarks>
+    /// <remarks>
+    /// Implementations should swallow non-cancellation exceptions and treat failures as a no-op.
+    /// <see cref="OperationCanceledException" /> may propagate when cancellation is requested.
+    /// </remarks>
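
The contract described in the proposed remark can be expressed with a C# exception filter. The sketch below is illustrative only: `BookmarkGuard` and `TryRunAsync` are hypothetical names, the bookmark work is passed in as a delegate because the real `EnsureBookmarksAsync` signature is not shown here, and the boolean return value is an addition made so the behavior is observable.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper showing "swallow everything except cancellation".
public static class BookmarkGuard
{
    // Returns true if the work completed, false if a non-cancellation failure was swallowed.
    // OperationCanceledException (including TaskCanceledException) is allowed to propagate.
    public static async Task<bool> TryRunAsync(
        Func<CancellationToken, Task> work,
        CancellationToken cancellationToken)
    {
        cancellationToken.ThrowIfCancellationRequested();
        try
        {
            await work(cancellationToken);
            return true;
        }
        catch (Exception ex) when (!(ex is OperationCanceledException))
        {
            // Failure is treated as a no-op; a real service would likely log ex here.
            return false;
        }
    }
}
```

Using a filter instead of catch-and-rethrow means cancellation exceptions are never caught at all, so their stack traces are left untouched.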
Suggested change
- /// <summary>
- /// Ensures the provided PDF has bookmarks (outlines) when possible.
- /// </summary>
- /// <remarks>
- /// Implementations must not throw; failures should be treated as a no-op.
- /// </remarks>
+ /// <summary>
+ /// Ensures the provided PDF has bookmarks (outlines) when possible.
+ /// </summary>
+ /// <remarks>
+ /// Implementations should swallow non-cancellation exceptions and treat failures as a no-op.
+ /// <see cref="OperationCanceledException" /> may propagate when cancellation is requested.
+ /// </remarks>
