Re-land remote cold start workflow headers by hansent · Pull Request #2224 · roboflow/inference

hansent · 2026-04-10T16:47:27Z

Summary

re-land remote cold start model metadata aggregation in workflow response headers
restore the dedicated request_metrics module and its focused tests
fix the stale http_api.py middleware copy by using the shared request_metrics implementation

Root Cause

PR #2209 merged into main and was then reverted in #2222 because inference/core/interfaces/http/http_api.py still contained an older local copy of GCPServerlessMiddleware after the shared request_metrics.py extraction. That stale copy referenced symbols that were no longer imported there, which triggered the CI flake8 F821 undefined name failures.

Why

Workflow responses should include remote model IDs, cold start counts, load times, and detailed load metadata when remote execution happens. Without this reland, top-level workflow headers only reflect local execution and under-report remote cold starts.

Validation

python3.10 -m py_compile inference/core/interfaces/http/http_api.py inference/core/interfaces/http/request_metrics.py inference_sdk/http/utils/executors.py inference_sdk/config.py inference/core/managers/model_load_collector.py
/Users/hansent/.venv/bin/python -m pytest tests/inference_sdk/unit_tests/test_config.py tests/inference_sdk/unit_tests/http/utils/test_remote_processing_time_collection.py tests/inference/unit_tests/core/interfaces/http/test_remote_processing_time_middleware.py tests/inference/unit_tests/core/interfaces/http/test_model_response_headers.py

Context

This PR re-lands the intended behavior from #2209 after the revert in #2222, with the missing import/middleware cleanup included.

…old-start-headers

rafel-roboflow · 2026-04-16T11:34:07Z

Hello, Claude found two possible bugs, could you confirm if they are relevant?

Bug 1 — Single-response SDK calls miss new model-load headers

_collect_processing_time_from_response (client.py L118) only reads X-Processing-Time and calls collector.add(). It does not handle X-Model-Id, X-Model-Cold-Start, X-Model-Load-Time, or X-Model-Load-Details.

The new header parsing was added in _collect_remote_processing_times (executors.py L164), which is used by the batch/parallel sync path. But these three single-response methods still call the old function:

get_clip_text_embeddings → client.py L1373
clip_compare → client.py L1478
get_perception_encoder_text_embeddings → client.py L1590

Example: The inference server cold-starts the CLIP model and returns:

X-Processing-Time: 1.5
X-Model-Cold-Start: true
X-Model-Id: clip/ViT-B-32
X-Model-Load-Time: 3.2

_collect_processing_time_from_response records processing_time=1.5 but silently drops everything else. The collector ends up with _model_ids = {}, _cold_start_count = 0, _cold_start_total_load_time = 0.0. If the same request went through execute_requests_packages → _collect_remote_processing_times, all four fields would be populated.

These methods are called from workflow blocks (clip/v1.py L216, perception_encoder/v1.py L189), so cold-start metadata from remote CLIP/Perception Encoder text embeddings never surfaces in the outer response.

Bug 2 — Async SDK path drops all headers before any collection

make_request_async (executors.py L393) returns (response.status, response_data) inside an async with block — headers are destroyed when the context manager exits. make_parallel_requests_async (executors.py L354) then strips even the status code: return [r[1] for r in responses], yielding bare dict | bytes bodies.

Unlike its sync counterpart execute_requests_packages (which calls _collect_remote_processing_times at L81), execute_requests_packages_async (executors.py L304) has no call to any header-collection function. Headers are structurally gone before any collector could see them.

Affected call sites (all receive body-only, no headers):

infer_from_api_v0_async — client.py L671
infer_from_api_v1_async — client.py L824
OCR async — client.py L1226
SAM3 3D async — client.py L2013
YOLO World async — client.py L2541
Generic async inference — client.py L2889

Example: An async object-detection call triggers a cold start. The server returns X-Processing-Time: 0.8, X-Model-Cold-Start: true, X-Model-Id: my-project/1, X-Model-Load-Time: 5.1. The RemoteProcessingTimeCollector records nothing — all headers were discarded at make_request_async and further reduced to body-only at make_parallel_requests_async. The cold-start/model-load feature is effectively sync-only.

hansent added 2 commits April 10, 2026 11:46

Re-land remote cold start workflow headers

b3aa227

Merge remote-tracking branch 'origin/main' into codex/reland-remote-c…

7110fb2

…old-start-headers

hansent marked this pull request as ready for review April 10, 2026 16:50

hansent requested review from PawelPeczek-Roboflow, dkosowski87, grzegorz-roboflow, probicheaux and yeldarby as code owners April 10, 2026 16:50

Merge branch 'main' into codex/reland-remote-cold-start-headers

7ae17f7

hansent requested a review from rafel-roboflow as a code owner April 14, 2026 14:40

hansent changed the title ~~[codex] Re-land remote cold start workflow headers~~ Re-land remote cold start workflow headers Apr 16, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Re-land remote cold start workflow headers#2224

Re-land remote cold start workflow headers#2224
hansent wants to merge 3 commits intomainfrom
codex/reland-remote-cold-start-headers

hansent commented Apr 10, 2026

Uh oh!

rafel-roboflow commented Apr 16, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

hansent commented Apr 10, 2026

Summary

Root Cause

Why

Validation

Context

Uh oh!

rafel-roboflow commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

rafel-roboflow commented Apr 16, 2026 •

edited

Loading