perf: ⚡️ enhance read_image_as_pil read speed for better slice speed #1353
Merged
onuralpszr merged 4 commits into main on Apr 20, 2026
… handling in get_prediction Signed-off-by: Onuralp SEZER <[email protected]>
…d add _to_hwc for array format conversion Signed-off-by: Onuralp SEZER <[email protected]>
… return_arr parameters
This PR covers two major parts: first, a bug fix for the CHW-detection logic in read_image_as_pil; second, a read-speed improvement that skips the conversion step and reads frames/images directly.
Bug fix
read_image_as_pil: replaced the shape[0] < 5 CHW-detection heuristic with shape[0] in (1, 3, 4) and shape[-1] not in (1, 3, 4). The old check misidentified any image whose height happened to be ≤ 4 px as channel-first, silently transposing the array and producing garbage output.
Performance
Added return_arr: bool = False to read_image_as_pil. When True and the input is already an np.ndarray, the costly Image.fromarray() call is skipped entirely and the (possibly transposed) array is returned directly. For PIL / string inputs the PIL object is converted with np.asarray() before returning.
get_prediction updated to pass return_arr=True, eliminating the full np -> PIL -> np round-trip that occurred on every slice during get_sliced_prediction.
Bug details
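The difference between the two heuristics can be illustrated with a minimal sketch. The two conditions are quoted from the description above; the function names, the ndim == 3 guard, and the to_hwc helper are hypothetical stand-ins and not SAHI's actual code.

```python
import numpy as np

CHANNEL_COUNTS = (1, 3, 4)

def looks_chw_old(arr: np.ndarray) -> bool:
    # Old heuristic: any small leading dimension is assumed to be channels.
    # The ndim guard is an assumption added for this sketch.
    return arr.ndim == 3 and arr.shape[0] < 5

def looks_chw_new(arr: np.ndarray) -> bool:
    # New heuristic: the leading dimension must look like a channel count
    # AND the trailing dimension must not.
    return (
        arr.ndim == 3
        and arr.shape[0] in CHANNEL_COUNTS
        and arr.shape[-1] not in CHANNEL_COUNTS
    )

def to_hwc(arr: np.ndarray) -> np.ndarray:
    # Hypothetical helper: transpose CHW -> HWC only when the new check fires.
    return np.transpose(arr, (1, 2, 0)) if looks_chw_new(arr) else arr

thin_hwc = np.zeros((4, 640, 3), dtype=np.uint8)  # 4 px tall, already channel-last
chw = np.zeros((3, 480, 640), dtype=np.uint8)     # genuinely channel-first

print(looks_chw_old(thin_hwc), looks_chw_new(thin_hwc))  # True False
print(to_hwc(chw).shape, to_hwc(thin_hwc).shape)         # (480, 640, 3) (4, 640, 3)
```

The 4 px tall image trips the old check but not the new one, while a real CHW array is still transposed.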
Performance report
Benchmark: scripts/benchmark_read_image.py
300 evenly-sampled frames per video, 3-frame warm-up, time.perf_counter() wall time, CPU only (no model).
I ran 2 different videos from Pexels to test this case and saw speed-ups.

| Video | return_arr=False (ms/frame) | return_arr=True (ms/frame) |
| --- | --- | --- |
| 14845279_3840_2160_60fps.mp4 | | |
| 14845759_1920_1080_60fps.mp4 | | |

Image.fromarray() cost scales with pixel count (~8.3 MP vs ~2.1 MP). The return_arr=True path is effectively zero-cost: np.asarray on an ndarray is a no-copy view.
_to_hwc intentionally does not flip channel order. SAHI's public contract is that callers supply RGB arrays (matching what read_image() and PIL both return). Adding a blanket BGR -> RGB flip inside read_image_as_pil would silently corrupt inputs that are already in RGB, as confirmed by a test regression during development.
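The no-copy claim and the timing method can be checked with a small sketch. This is not scripts/benchmark_read_image.py: the ms_per_frame helper and the synthetic frames are assumptions made for illustration, and np.array (which copies by default) only stands in for a conversion that touches every pixel, not for Image.fromarray() itself.

```python
import time
import numpy as np

def ms_per_frame(fn, frames, warmup=3):
    """Mean wall time per frame in milliseconds, after a short warm-up."""
    for frame in frames[:warmup]:
        fn(frame)
    start = time.perf_counter()
    for frame in frames:
        fn(frame)
    return (time.perf_counter() - start) * 1000.0 / len(frames)

# Synthetic 1080p RGB frames instead of decoded video frames.
frames = [np.zeros((1080, 1920, 3), dtype=np.uint8) for _ in range(5)]

# np.asarray on an existing ndarray returns the same object, so no data moves.
same_object = np.asarray(frames[0]) is frames[0]

passthrough_ms = ms_per_frame(np.asarray, frames)  # pass-through path
copy_ms = ms_per_frame(np.array, frames)           # full copy of every frame

print(same_object)
print(f"asarray: {passthrough_ms:.4f} ms/frame, copy: {copy_ms:.4f} ms/frame")
```

Absolute numbers depend on the machine; the point is only that the pass-through path does no per-pixel work.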