⚡️ Speed up method ObjectDetectionLayoutDumper.dump by 229%
#68
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
📄 229% (2.29x) speedup for
ObjectDetectionLayoutDumper.dumpinunstructured/partition/pdf_image/analysis/layout_dump.py⏱️ Runtime :
24.5 microseconds→7.46 microseconds(best of37runs)📝 Explanation and details
The optimization adds
@lru_cache(maxsize=8)to theobject_detection_classesfunction, which provides a 229% speedup by caching expensive model loading operations.What was optimized:
functools.lru_cachedecorator to cache the result ofobject_detection_classes()for each unique model nameWhy this creates a speedup:
The line profiler reveals that
get_model(model_name)consumes 100% of the execution time (228ms out of 228ms total). This function likely involves expensive operations like:With caching, subsequent calls with the same model name return the cached class list instantly, avoiding the expensive
get_model()call entirely.Impact on workloads:
The test results show consistent 150-350% speedups across various scenarios, particularly benefiting:
Test case performance:
The optimization is particularly effective because object detection models are typically reused across multiple document pages, making the cache hit ratio very high in real-world usage patterns.
✅ Correctness verification report:
⚙️ Existing Unit Tests and Runtime
partition/pdf_image/test_analysis.py::test_od_document_layout_dump🌀 Generated Regression Tests and Runtime
To edit these changes
git checkout codeflash/optimize-ObjectDetectionLayoutDumper.dump-mje7bjzcand push.