perf: ⚡️ enhance read_image_as_pil read speed for better slice speed #1353

Merged
onuralpszr merged 4 commits into main from perf-image-read-cv-pil
Apr 20, 2026

onuralpszr commented Apr 20, 2026

This PR handles two things: first, a bug fix for the CHW-detection heuristic in read_image_as_pil; second, a read-speed improvement that skips the conversion step and returns frames/images directly.

  • Bug fix (read_image_as_pil): replaced the shape[0] < 5 CHW-detection heuristic with shape[0] in (1, 3, 4) and shape[-1] not in (1, 3, 4). The old check misidentified any image whose height happened to be ≤ 4 px as channel-first, silently transposing the array and producing garbage output.

  • Performance: added return_arr: bool = False to read_image_as_pil. When it is True and the input is already an np.ndarray, the costly Image.fromarray() call is skipped entirely and the (possibly transposed) array is returned directly. For PIL / string inputs, the PIL object is converted with np.asarray() before returning.

  • get_prediction updated to pass return_arr=True, eliminating the full np -> PIL -> np round-trip that occurred on every slice during get_sliced_prediction.
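
The fast path described in the bullets above can be sketched roughly as follows. This is only an illustration, not SAHI's implementation: `read_image_sketch` is a hypothetical stand-in for read_image_as_pil, it handles only ndarray and PIL inputs (no string/path loading, no CHW handling), and it assumes NumPy and Pillow are installed.

```python
import numpy as np
from PIL import Image


def read_image_sketch(image, return_arr=False):
    """Hypothetical stand-in for read_image_as_pil (ndarray and PIL inputs only)."""
    if isinstance(image, np.ndarray):
        if return_arr:
            # Fast path: hand the array back untouched, no Image.fromarray().
            return image
        return Image.fromarray(image)
    if isinstance(image, Image.Image):
        # PIL input: convert to an array only when the caller asks for one.
        return np.asarray(image) if return_arr else image
    raise TypeError(f"unsupported input type: {type(image)!r}")


frame = np.zeros((4, 6, 3), dtype=np.uint8)  # tiny RGB frame in HWC layout
same = read_image_sketch(frame, return_arr=True)  # the very same ndarray object
pil = read_image_sketch(frame)                    # default path still yields a PIL image
back = read_image_sketch(pil, return_arr=True)    # PIL input converted via np.asarray
```

With return_arr=True, a caller that already holds an array (as get_prediction does for each slice) gets it back without the array-to-PIL-to-array round trip.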

Bug details

# before — breaks for any image with height ≤ 4 px
if image.ndim == 3 and image.shape[0] < 5:   # treats height=4 as CHW
    if image.shape[2] > 4:
        image = np.transpose(image, (1, 2, 0))

# after — keyed on channel count, not pixel count
# _to_hwc() in sahi/utils/cv.py
if a.ndim == 3 and a.shape[0] in (1, 3, 4) and a.shape[-1] not in (1, 3, 4):
    return np.transpose(a, (1, 2, 0))
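
A self-contained sketch of the new check, for experimenting with different shapes. The name `to_hwc_sketch` and the constant `CHANNEL_COUNTS` are hypothetical; only the condition itself is taken from the snippet above, and the real `_to_hwc()` may do more than this.

```python
import numpy as np

CHANNEL_COUNTS = (1, 3, 4)  # grayscale, RGB, RGBA


def to_hwc_sketch(a: np.ndarray) -> np.ndarray:
    """Return `a` in HWC layout, transposing only when it looks channel-first."""
    looks_chw = (
        a.ndim == 3
        and a.shape[0] in CHANNEL_COUNTS
        and a.shape[-1] not in CHANNEL_COUNTS
    )
    return np.transpose(a, (1, 2, 0)) if looks_chw else a


chw = np.zeros((3, 480, 640), dtype=np.uint8)      # channel-first frame
short_hwc = np.zeros((4, 640, 3), dtype=np.uint8)  # 4 px tall, already HWC
gray_2d = np.zeros((480, 640), dtype=np.uint8)     # 2-D input is left alone

print(to_hwc_sketch(chw).shape)        # (480, 640, 3)
print(to_hwc_sketch(short_hwc).shape)  # (4, 640, 3)
print(to_hwc_sketch(gray_2d).shape)    # (480, 640)
```

Shapes that are ambiguous under this rule, such as (3, 5, 3), are left untouched because the last axis already looks like a channel axis.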

Performance report

Benchmark: scripts/benchmark_read_image.py, using 300 evenly sampled frames per video, a 3-frame warm-up, time.perf_counter() wall time, CPU only (no model).

I ran the benchmark on two different videos from Pexels and saw a speedup on both:

| Video | Resolution | return_arr=False (ms/frame) | return_arr=True (ms/frame) | Speedup |
|---|---|---|---|---|
| 14845279_3840_2160_60fps.mp4 | 3840 × 2160 | 9.51 | < 0.001 | ~31 000× |
| 14845759_1920_1080_60fps.mp4 | 1920 × 1080 | 1.47 | < 0.001 | ~4 400× |
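
A rough way to reproduce this kind of measurement without the video files is sketched below. It is not the contents of scripts/benchmark_read_image.py: the helper `ms_per_frame` is hypothetical, a synthetic all-zero frame stands in for decoded video frames, and the iteration count is reduced to keep the run short. It assumes NumPy and Pillow are installed.

```python
import time

import numpy as np
from PIL import Image


def ms_per_frame(fn, frame, n_frames=300, warmup=3):
    """Average wall time of fn(frame) in milliseconds, after a short warm-up."""
    for _ in range(warmup):
        fn(frame)
    start = time.perf_counter()
    for _ in range(n_frames):
        fn(frame)
    return (time.perf_counter() - start) * 1000.0 / n_frames


# Synthetic 1080p RGB frame; an assumption standing in for real video frames.
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)

via_pil_ms = ms_per_frame(Image.fromarray, frame, n_frames=30)  # array -> PIL conversion
direct_ms = ms_per_frame(np.asarray, frame, n_frames=30)        # array passed through

print(f"Image.fromarray: {via_pil_ms:.3f} ms/frame")
print(f"np.asarray:      {direct_ms:.6f} ms/frame")
```

Absolute numbers depend on the machine and the Pillow version, so they will not match the table exactly.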

Image.fromarray() cost scales with pixel count (~8.3 MP vs ~2.1 MP). The return_arr=True path is effectively zero-cost: np.asarray on an ndarray returns the same array without copying.
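
The no-copy behaviour is easy to check with NumPy alone. The snippet below is a small illustration on a toy array, not code from the PR:

```python
import numpy as np

frame = np.zeros((3, 4, 5), dtype=np.uint8)  # toy CHW array

same = np.asarray(frame)              # no dtype change requested, so no copy
hwc = np.transpose(frame, (1, 2, 0))  # a transpose is a strided view, not a copy

print(same is frame)                  # True
print(np.shares_memory(frame, hwc))   # True
print(hwc.shape)                      # (4, 5, 3)
print(hwc.flags["C_CONTIGUOUS"])      # False
```

Because the transposed result is a non-contiguous view, any downstream code that needs contiguous memory would have to call np.ascontiguousarray, which does copy.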

_to_hwc intentionally does not flip channel order. SAHI's public contract is that callers supply RGB arrays (matching what read_image() and PIL both return). Adding a blanket BGR -> RGB flip inside read_image_as_pil would silently corrupt inputs that are already in RGB, as confirmed by a test regression during development.
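
To see why an unconditional flip is unsafe, consider what it does to an array that is already RGB. This is a toy NumPy illustration, not the regression test mentioned above:

```python
import numpy as np

rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[..., 0] = 255  # pure red, channels already in RGB order

# What a blanket "BGR -> RGB" flip would do to this input:
flipped = rgb[..., ::-1]

print(rgb[0, 0].tolist())      # [255, 0, 0]
print(flipped[0, 0].tolist())  # [0, 0, 255]
```

The red image comes out blue, so a caller holding BGR data (for example, frames decoded with OpenCV) would need to convert to RGB on their side before passing the array in.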

onuralpszr merged commit 9310bec into main Apr 20, 2026
9 checks passed
onuralpszr deleted the perf-image-read-cv-pil branch April 20, 2026 17:44