⚡️ Speed up method _DocxPartitioner.iter_document_elements by 1,146%
#59
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
📄 1,146% (11.46x) speedup for
_DocxPartitioner.iter_document_elementsinunstructured/partition/docx.py⏱️ Runtime :
44.1 microseconds→3.54 microseconds(best of7runs)📝 Explanation and details
The optimization replaces a ternary expression with an explicit if/else statement in the
iter_document_elementsmethod.What changed: The original code used
return (self._iter_document_elements() if self._document_contains_sections else self._iter_sectionless_document_elements())which creates a generator expression that must be evaluated and returned. The optimized version uses directif/elsewithyield fromstatements.Why it's faster: The ternary expression creates an intermediate generator object that Python must allocate, evaluate, and then return. The direct
if/elsewithyield fromeliminates this overhead by yielding directly from the appropriate method without creating an intermediate object. This is a classic Python micro-optimization where avoiding object creation in hot paths provides measurable speedups.Performance impact: The 1146% speedup (44.1μs → 3.54μs) demonstrates the significant overhead of the ternary expression in generator contexts. This optimization is particularly effective because the function is called from
partition_docx(), which converts the entire iterator to a list, meaning every element yielded goes through this path.Test case benefits: This optimization helps all document types equally since the conditional check happens once per document partition, regardless of document size or structure. Both sectioned and sectionless documents benefit from the reduced overhead in the entry point method.
✅ Correctness verification report:
⚙️ Existing Unit Tests and Runtime
To edit these changes
git checkout codeflash/optimize-_DocxPartitioner.iter_document_elements-mjdvbd39and push.