Document Metadata: Support nested sectPr in tables and text boxes #51

@JSv4

Description

The GetDocumentMetadata() function currently only processes section properties (w:sectPr) at the document body's top level. Section breaks that occur inside tables or text boxes are not detected.

Current Behavior

The CollectSectionData method in WmlToHtmlConverter.cs (lines 850-910) only iterates over top-level elements:

var blockElements = body.Elements()
    .Where(e => e.Name == W.p || e.Name == W.tbl || e.Name == W.sectPr)
    .ToList();

This means:

  • ✅ sectPr inside paragraph properties (w:p/w:pPr/w:sectPr) is handled
  • ✅ Document-level sectPr at end of body (w:body/w:sectPr) is handled
  • ❌ sectPr inside tables (w:tbl/.../w:sectPr) is NOT detected
  • ❌ sectPr inside text boxes/shapes is NOT detected
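
To make the gap concrete, here is a minimal, self-contained sketch that builds a tiny body fragment and compares a top-level-only count with a full-tree count. It uses plain System.Xml.Linq rather than the project's W helper class, and the SectPrScanDemo class and its method names are hypothetical, introduced only for illustration. The table structure is simplified and is not a complete, valid document.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SectPrScanDemo
{
    // WordprocessingML main namespace (stands in for the project's W class).
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // One paragraph-level sectPr, one sectPr nested in a table cell,
    // and the final body-level sectPr.
    public static XElement BuildSampleBody() =>
        new XElement(W + "body",
            new XElement(W + "p",
                new XElement(W + "pPr", new XElement(W + "sectPr"))),
            new XElement(W + "tbl",
                new XElement(W + "tr",
                    new XElement(W + "tc",
                        new XElement(W + "p",
                            new XElement(W + "pPr", new XElement(W + "sectPr")))))),
            new XElement(W + "sectPr"));

    // Approximates the top-level-only behavior described above:
    // direct child paragraphs plus the body-level sectPr.
    public static int CountTopLevel(XElement body) =>
        body.Elements(W + "p")
            .Count(p => p.Element(W + "pPr")?.Element(W + "sectPr") != null)
        + body.Elements(W + "sectPr").Count();

    // Scans the whole subtree, so the sectPr inside the table is included.
    public static int CountAll(XElement body) =>
        body.Descendants(W + "sectPr").Count();
}
```

On the sample fragment the top-level count misses the sectPr inside the table cell, while the Descendants-based count includes it.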

Expected Behavior

The metadata extraction should scan the entire document tree for sectPr elements, not just top-level ones.

Impact

  • Documents with section breaks inside tables may report incorrect section counts
  • Paragraph/table indices per section may be inaccurate for complex documents
  • This is an edge case - most documents don't have section breaks inside tables

Suggested Implementation

  1. Use body.Descendants(W.sectPr) or a recursive scan to find all sectPr elements
  2. Determine the proper ordering of sections based on document position
  3. Associate content (paragraphs/tables) with their containing sections
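
A rough sketch of steps 1-3, assuming plain System.Xml.Linq and a hypothetical SectionBreak record (neither the record nor FindSectionBreaks exists in the codebase today). Descendants() yields elements in document order, which covers step 2. For step 3 the sketch only maps each sectPr back to the top-level block that contains it; how nested breaks should affect per-section paragraph/table indices is still an open design question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SectionScanSketch
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical result type, not part of the existing metadata API.
    public sealed record SectionBreak(int Order, int TopLevelBlockIndex, bool IsNested);

    public static List<SectionBreak> FindSectionBreaks(XElement body)
    {
        var topLevelBlocks = body.Elements().ToList();

        return body.Descendants(W + "sectPr")
            // Tracked changes keep a copy of the previous properties under
            // w:sectPrChange; that copy is not a section break of its own.
            .Where(s => s.Parent?.Name != W + "sectPrChange")
            .Select((sectPr, order) =>
            {
                // The direct child of body that contains this sectPr
                // (or the sectPr itself for the body-level one).
                var block = sectPr.AncestorsAndSelf().First(a => a.Parent == body);

                // The two placements the current top-level scan already covers.
                bool handledToday =
                    sectPr.Parent == body ||
                    (sectPr.Parent?.Name == W + "pPr"
                        && sectPr.Parent.Parent?.Parent == body);

                return new SectionBreak(order, topLevelBlocks.IndexOf(block), !handledToday);
            })
            .ToList();
    }
}
```

One caveat to check before adopting this: text box content can appear twice in the XML (once under mc:Choice and once under mc:Fallback), so a raw Descendants() scan may need de-duplication to avoid double-counting sections inside text boxes.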

Related

Labels

enhancement, metadata-api
