Open
Description
The GetDocumentMetadata() function currently only processes section properties (w:sectPr) at the document body's top level. Section breaks that occur inside tables or text boxes are not detected.
Current Behavior
The CollectSectionData method in WmlToHtmlConverter.cs (lines 850-910) only iterates over top-level elements:
    var blockElements = body.Elements()
        .Where(e => e.Name == W.p || e.Name == W.tbl || e.Name == W.sectPr)
        .ToList();

This means:
- ✅ sectPr inside paragraph properties (w:p/w:pPr/w:sectPr) is handled
- ✅ Document-level sectPr at end of body (w:body/w:sectPr) is handled
- ❌ sectPr inside tables (w:tbl/.../w:sectPr) is NOT detected
- ❌ sectPr inside text boxes/shapes is NOT detected
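The gap can be reproduced with a minimal sketch. This is not the actual CollectSectionData code: the class name, the sample XML, and both helper methods are hypothetical, and the namespace is declared locally with System.Xml.Linq rather than through the project's W class. It only contrasts a top-level scan with a full-tree scan on a body that has one section break nested in a table cell.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class TopLevelScanDemo
{
    // WordprocessingML main namespace, declared locally so the sketch is self-contained.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical sample: one break in a top-level paragraph, one in a
    // paragraph nested in a table cell, and the final body-level sectPr.
    public static XElement SampleBody() => XElement.Parse(
        @"<w:body xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>
            <w:p><w:pPr><w:sectPr/></w:pPr></w:p>
            <w:tbl><w:tr><w:tc>
              <w:p><w:pPr><w:sectPr/></w:pPr></w:p>
            </w:tc></w:tr></w:tbl>
            <w:sectPr/>
          </w:body>");

    // Stand-in for the current behavior: only w:body/w:sectPr and
    // w:body/w:p/w:pPr/w:sectPr are seen.
    public static int CountTopLevel(XElement body) =>
        body.Elements(W + "sectPr").Count()
        + body.Elements(W + "p")
              .Count(p => p.Elements(W + "pPr").Elements(W + "sectPr").Any());

    // Full-tree scan: every sectPr at any depth.
    public static int CountAll(XElement body) =>
        body.Descendants(W + "sectPr").Count();

    public static void Main()
    {
        var body = SampleBody();
        // Prints "top-level scan: 2, full scan: 3"
        Console.WriteLine(
            $"top-level scan: {CountTopLevel(body)}, full scan: {CountAll(body)}");
    }
}
```

The sectPr nested under w:tbl is the one the top-level scan misses.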
Expected Behavior
The metadata extraction should scan the entire document tree for sectPr elements, not just top-level ones.
Impact
- Documents with section breaks inside tables may report incorrect section counts
- Paragraph/table indices per section may be inaccurate for complex documents
- This is an edge case - most documents don't have section breaks inside tables
Suggested Implementation
- Use body.Descendants(W.sectPr) or a recursive scan to find all sectPr elements
- Determine the proper ordering of sections based on document position
- Associate content (paragraphs/tables) with their containing sections
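The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: SectionScanSketch and CountParagraphsPerSection are hypothetical names, the namespace is declared locally instead of using W, and only paragraphs are counted. It relies on XContainer.Descendants returning elements in document order, so a paragraph whose w:pPr holds a sectPr closes the current section wherever it is nested.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SectionScanSketch
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the number of paragraphs in each section, in document order.
    public static List<int> CountParagraphsPerSection(XElement body)
    {
        var counts = new List<int>();
        int current = 0;

        // Descendants() walks the whole tree in document order, so paragraphs
        // inside tables and text boxes are visited as well.
        foreach (var p in body.Descendants(W + "p"))
        {
            current++;
            // A sectPr in the paragraph's properties ends the section
            // that this paragraph belongs to.
            bool endsSection =
                p.Elements(W + "pPr").Elements(W + "sectPr").Any();
            if (endsSection)
            {
                counts.Add(current);
                current = 0;
            }
        }

        // The last section is closed by the body-level sectPr (or is implicit).
        if (current > 0 || body.Elements(W + "sectPr").Any())
            counts.Add(current);

        return counts;
    }

    public static void Main()
    {
        var body = XElement.Parse(
            @"<w:body xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>
                <w:p><w:pPr><w:sectPr/></w:pPr></w:p>
                <w:tbl><w:tr><w:tc>
                  <w:p><w:pPr><w:sectPr/></w:pPr></w:p>
                </w:tc></w:tr></w:tbl>
                <w:p/>
                <w:sectPr/>
              </w:body>");

        // Prints "1, 1, 1"
        Console.WriteLine(string.Join(", ", CountParagraphsPerSection(body)));
    }
}
```

Open questions this sketch leaves aside: a table that contains a break spans two sections, so table indices need a rule for which section owns it; and text box content duplicated under alternate-content markup would be counted twice by a plain Descendants scan.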
Related
- PR #50 ("feat(npm): add document metadata API for lazy loading (Issue #44 Phase 3)"): Document Metadata API for Lazy Loading
- Issue #44 ("WASM execution blocks UI thread, preventing loading states from rendering"): Performance improvements and lazy loading
Labels
enhancement, metadata-api