⚡️ Speed up function get_bbox_thickness by 1,267%
#53
📄 1,267% (12.67x) speedup for `get_bbox_thickness` in `unstructured/partition/pdf_image/analysis/bbox_visualisation.py`

⏱️ Runtime: 5.01 milliseconds → 367 microseconds (best of 250 runs)

📝 Explanation and details
The optimization replaces `np.polyfit` with direct linear interpolation, achieving a 13x speedup by eliminating unnecessary computational overhead.

Key Optimization:
- Replaced `np.polyfit`: the original code used NumPy's polynomial fitting to perform a simple linear interpolation between two points, which is computationally expensive. The new code computes the slope in closed form: `slope = (max_value - min_value) / (ratio_for_max_value - ratio_for_min_value)`

Why This is Faster:
- `np.polyfit` performs general polynomial regression using least squares, which involves matrix operations and a singular value decomposition (SVD). That is overkill for two points.
- In the line profile, the `np.polyfit` line consumed 91.7% of execution time (10.67 ms out of 11.64 ms total).

Performance Impact:
The function is called from `draw_bbox_on_image`, which processes bounding boxes for PDF image visualization. Since this appears to be part of a rendering pipeline that may process many bounding boxes per page, the 13x speedup significantly improves visualization performance. Test results show consistent improvements of roughly 12-14x across all scenarios, from single bbox calls (~25 μs → ~2 μs) to batch processing of 100 random bboxes (1.6 ms → 116 μs).

Optimization Benefits:
This optimization is particularly valuable for PDF processing workflows where many bounding boxes need thickness calculations for visualization.
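A minimal sketch of the key optimization, assuming the two endpoints `(ratio_for_min_value, min_value)` and `(ratio_for_max_value, max_value)` named in the slope formula above. The helpers `line_coeffs_polyfit`, `line_coeffs_direct`, and `interpolate_clamped` are hypothetical and are not the code in `bbox_visualisation.py`; the truncate-then-clamp step is also an assumption about how the interpolated value becomes an integer thickness. The point it illustrates is that both approaches produce the same line, and the direct version simply skips the least-squares machinery.

```python
import numpy as np


def line_coeffs_polyfit(x0, y0, x1, y1):
    # General least-squares fit of a degree-1 polynomial. np.polyfit returns
    # coefficients highest power first, so for deg=1 that is [slope, intercept].
    slope, intercept = np.polyfit([x0, x1], [y0, y1], 1)
    return float(slope), float(intercept)


def line_coeffs_direct(x0, y0, x1, y1):
    # Two distinct points determine the line exactly, so no fitting is needed.
    slope = (y1 - y0) / (x1 - x0)
    intercept = y0 - slope * x0
    return slope, intercept


def interpolate_clamped(ratio, min_value, max_value, ratio_for_min_value, ratio_for_max_value):
    # Hypothetical helper: map a bbox-to-page size ratio linearly onto
    # [min_value, max_value], truncate to int, then clamp to that range.
    slope, intercept = line_coeffs_direct(
        ratio_for_min_value, min_value, ratio_for_max_value, max_value
    )
    value = int(ratio * slope + intercept)
    return max(min_value, min(max_value, value))
```

For example, with hypothetical endpoints `ratio_for_min_value=0.0` and `ratio_for_max_value=0.5` mapping onto thicknesses 1 to 4, `interpolate_clamped(0.3, 1, 4, 0.0, 0.5)` evaluates roughly 0.3 × 6 + 1 = 2.8 and returns 2, while ratios outside the range clamp to 1 or 4.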
✅ Correctness verification report:
⚙️ Existing Unit Tests and Runtime
`partition/pdf_image/test_analysis.py::test_get_bbox_thickness`

🌀 Generated Regression Tests and Runtime
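A hedged sketch of a regression-plus-runtime check in the spirit of the report above: it compares the two interpolation styles on random inputs and then times them. All function names, input ranges, and the 100-case batch standing in for "100 random bboxes" are assumptions for illustration, not the generated tests from this PR. The comparison is made on the final clamped integers; because the two methods can differ in the last floating-point bits, an input landing exactly on an integer boundary could in principle truncate differently.

```python
import random
import timeit

import numpy as np


def thickness_polyfit(ratio, min_value, max_value, r_min, r_max):
    # Reference behaviour (hypothetical): least-squares line through two points, then clamp.
    slope, intercept = np.polyfit([r_min, r_max], [min_value, max_value], 1)
    return max(min_value, min(max_value, int(ratio * slope + intercept)))


def thickness_direct(ratio, min_value, max_value, r_min, r_max):
    # Optimized behaviour (hypothetical): closed-form two-point line, clamped the same way.
    slope = (max_value - min_value) / (r_max - r_min)
    intercept = min_value - slope * r_min
    return max(min_value, min(max_value, int(ratio * slope + intercept)))


def random_cases(n, seed=0):
    # Random (ratio, min_value, max_value, r_min, r_max) tuples; ranges are arbitrary.
    rng = random.Random(seed)
    cases = []
    for _ in range(n):
        min_value = rng.randint(1, 3)
        max_value = min_value + rng.randint(1, 5)
        cases.append((rng.uniform(0.0, 1.0), min_value, max_value,
                      rng.uniform(0.001, 0.05), rng.uniform(0.1, 0.9)))
    return cases


def test_direct_matches_polyfit():
    # Regression check: both versions must return identical clamped integers.
    for args in random_cases(500):
        assert thickness_direct(*args) == thickness_polyfit(*args), args


def best_time(func, cases, repeat=5):
    # Best-of-N seconds for one pass over all cases.
    return min(timeit.repeat(lambda: [func(*c) for c in cases], number=1, repeat=repeat))


if __name__ == "__main__":
    test_direct_matches_polyfit()
    cases = random_cases(100)  # stands in for "100 random bboxes"
    t_old = best_time(thickness_polyfit, cases)
    t_new = best_time(thickness_direct, cases)
    print(f"polyfit {t_old * 1e6:.0f} us, direct {t_new * 1e6:.0f} us, {t_old / t_new:.1f}x")
```

Timings from this sketch depend on the machine and NumPy build, so they will not reproduce the figures reported above exactly.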
To edit these changes, run `git checkout codeflash/optimize-get_bbox_thickness-mjdlipbj` and push.