Skip to content

fix(embedding): scope long batch timeout to embed-jobs worker (#445)#446

Merged
cjimti merged 1 commit into
mainfrom
fix/embedding-default-timeout
May 19, 2026
Merged

fix(embedding): scope long batch timeout to embed-jobs worker (#445)#446
cjimti merged 1 commit into
mainfrom
fix/embedding-default-timeout

Conversation

@cjimti
Copy link
Copy Markdown
Member

@cjimti cjimti commented May 19, 2026

Summary

Closes #445.

v1.64.0 switched EmbedBatch to Ollama's batched /api/embed endpoint (#435) without raising the HTTP timeout that governs it. On CPU-only Ollama a 30-text batched POST routinely exceeds the 30s DefaultTimeout, so the api-gateway embed-jobs worker timeout-storms on every spec write, retries five times, fails terminally, gets re-enqueued by the reconciler, and loops. From the operator's perspective the spec badge alternates between indexing and failed and looks "stuck on queued".

Approach: scoped, not global

A first pass raised embedding.DefaultTimeout from 30s to 5m. The adversarial review caught that this also affects synchronous request-path callers (memory_recall, memory_manage, capture_insight, apigateway queryVectorFor), which would now hang an MCP tools/call for 5 minutes on a wedged Ollama instead of failing in 30s. That is a real regression for interactive tool calls.

The shipped fix scopes the longer budget to the async worker only:

  • embedding.DefaultTimeout stays at 30s. The constant's comment explicitly documents that the singular /api/embeddings path is what it's tuned for, and points at the worker's dedicated provider for the batched path.
  • New apigateway.embed_jobs.embed_timeout config (duration, default 5m). The worker constructs its own Ollama provider with this timeout via a new Platform.workerEmbedder() helper. The shared p.embeddingProv that request-path callers use is unchanged.
  • Provider kinds other than ollama (noop today, future kinds) reuse the shared instance. Today only Ollama has the long-tail batch latency profile that needs this.

New config

apigateway:
  embed_jobs:
    workers: 1
    embed_timeout: 5m
Field Default Affects
embed_jobs.embed_timeout 5m The api-gateway embed-jobs worker only. Request-path embedding calls (memory_recall, memory_manage, capture_insight, apigateway query-vector) continue to use memory.embedding.ollama.timeout (default 30s via embedding.DefaultTimeout).

Tests

Unit (pkg/platform/worker_embedder_test.go)

  • TestWorkerEmbedder_OllamaUsesEmbedTimeout proves the worker gets a fresh embedding.Provider with the configured timeout AND the shared instance keeps its 30s default. This is the regression-prevention assertion: a future change that accidentally re-globalizes the long timeout fails this test loudly.
  • TestWorkerEmbedder_OllamaDefaultsTo5Min covers the zero-value fallback.
  • TestWorkerEmbedder_NoopReturnsShared covers the non-Ollama path.

The tests reach the unexported http.Client.Timeout via reflection. A future rename of that field produces a loud failed assertion (durations are compared against specific values), not a silent skip.

Integration against a real Ollama (pkg/platform/integration_embedjobs_realollama_test.go, build tag integration)

The reason #445 shipped undetected is that every prior integration test used synthetic delays in stub Computers. None exercised actual Ollama batch latency.

This PR adds the missing coverage: spins up real Ollama in testcontainers, pulls nomic-embed-text, seeds a 30-operation spec, enqueues a job, and asserts the embedding pass completes inside the worker's timeout budget. Pre-fix behavior (30s global default) would fail this test on the very batch shape that caused the incident; post-fix it passes with margin.

The shared internal/testollama helper uses sync.Once to start the Ollama container once per test process. Provider(timeout) returns a fresh provider with the supplied timeout so each call site names the production code path it's exercising (30s for request-path, 5m for worker). The pattern is reusable by every upcoming embedding-related consumer (DataHub semantic search, prompt discovery, knowledge insights recall, portal asset search).

Verification

  • make verify clean: tools-check, gofmt, race tests, total + patch coverage above gate, lint (patch-scoped against origin/main), gosec, govulncheck, semgrep, codeql, doc-check, dead-code, mutation testing, goreleaser dry-run.
  • make test-integration clean against pgvector + real Ollama. The real-Ollama test takes ~3.5min (model pull dominates; the 30-op batched embed itself completes in ~2 seconds once warm).
  • Adversarial sub-agent review: 2 rounds. Round 1 surfaced (a) the global timeout regression for request-path callers and (b) test poll deadline equaling the HTTP timeout with no margin. Both fixed. Round 2 returned CLEAN.

Operator workaround on v1.64.0 (no upgrade required)

Add timeout: 5m under memory.embedding.ollama in the platform configmap and roll the deployment. Once v1.64.1 is rolled, the override under memory.embedding.ollama.timeout can be removed because the worker no longer shares that path; operators who want to keep the request-path timeout at 60s and the worker timeout at 5m can now set them independently.

Test plan

  • make verify passes locally.
  • make test-integration passes locally against pgvector + real Ollama.
  • Adversarial review verdict CLEAN.
  • CI green.
  • Manual: deploy to a cluster running CPU-only Ollama with a multi-op spec (30+ operations), confirm the spec indexes without timeout errors and the badge ticks indexing N/M upward to completion.
  • Manual: with memory.embedding.ollama.timeout unset, confirm a wedged Ollama still fails memory_recall in ~30s (the request-path budget) while the embed-jobs worker continues with its 5m budget.

v1.64.0 shipped the Ollama batch endpoint (#435) without raising the
HTTP timeout that governs the path. On CPU-only Ollama a 30-text batch
exceeds the 30s default `DefaultTimeout`, so the api-gateway embed-jobs
worker timeout-storms on every spec write, retries five times, fails
terminally, gets re-enqueued by the reconciler, and loops. From the
operator's perspective the spec badge alternates between "indexing"
and looks "stuck on queued".

## Fix

The fix is scoped to the async worker. The shared `embedding.Provider`
that request-path callers (`memory_recall`, `memory_manage`,
`capture_insight`, apigateway `queryVectorFor`) use keeps its 30s
default so a wedged Ollama fails an MCP `tools/call` in seconds, not
minutes. Only the embed-jobs worker gets the longer budget.

- `DefaultTimeout` stays at 30s. The constant's comment now explicitly
  documents that the singular `/api/embeddings` path is what it's tuned
  for, and points at the worker's dedicated provider for the batched
  path.
- New `apigateway.embed_jobs.embed_timeout` config (default 5m). The
  worker constructs its own Ollama provider with this timeout via
  `Platform.workerEmbedder()`. Provider != ollama (noop, future kinds)
  reuses the shared instance unchanged.

## Tests

Three new unit tests in `worker_embedder_test.go` cover the scoping:

- `TestWorkerEmbedder_OllamaUsesEmbedTimeout` proves the worker gets a
  fresh Provider with the configured timeout AND the shared instance
  keeps its 30s default (the regression-prevention assertion).
- `TestWorkerEmbedder_OllamaDefaultsTo5Min` covers the zero-value
  fallback.
- `TestWorkerEmbedder_NoopReturnsShared` covers the non-Ollama path.

The tests reach the unexported `http.Client.Timeout` via reflection;
a future rename of that field produces a loud failed assertion (the
duration compares against specific values), not a silent skip.

## Integration test against real Ollama

New `pkg/platform/integration_embedjobs_realollama_test.go` (build tag
`integration`) spins up real Ollama in testcontainers, pulls
`nomic-embed-text`, enqueues a 30-operation spec, and asserts the
embedding pass completes within the worker's timeout budget. This is
the regression test that would have caught #445 before release; prior
integration tests used synthetic delays in stub Computers and never
exercised actual Ollama batch latency.

The shared `internal/testollama` helper uses `sync.Once` to start the
Ollama container once per test process. `Provider(timeout)` returns a
fresh embedding.Provider with the supplied timeout so call sites name
the production path they're exercising (30s for request-path, 5m for
worker). testcontainers' ryuk reaper handles teardown at process exit.

## Verification

- `make verify` clean (tools-check, fmt, race tests, lint patch-scoped
  against `origin/main`, gosec, govulncheck, semgrep, codeql,
  doc-check, dead-code, mutation testing, goreleaser dry-run).
- `make test-integration` clean against pgvector + real Ollama. Real-
  Ollama test runtime: ~3.5min (model pull dominates; the 30-op batched
  embed itself completes in ~2 seconds once warm).
- Adversarial sub-agent review: 2 rounds. Round 1 surfaced (a) the
  global timeout change would affect synchronous request-path callers
  too and (b) the test poll deadline equaled the HTTP timeout with no
  margin. Both fixed. Round 2 CLEAN.

## Operator workaround on v1.64.0 (no upgrade required)

Add `timeout: 5m` under `memory.embedding.ollama` in the platform
configmap and roll the deployment.
@codecov
Copy link
Copy Markdown

codecov Bot commented May 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.19%. Comparing base (ffed470) to head (2fdc237).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #446   +/-   ##
=======================================
  Coverage   86.18%   86.19%           
=======================================
  Files         235      235           
  Lines       32241    32252   +11     
=======================================
+ Hits        27787    27798   +11     
  Misses       3239     3239           
  Partials     1215     1215           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@cjimti cjimti merged commit c91e711 into main May 19, 2026
9 checks passed
@cjimti cjimti deleted the fix/embedding-default-timeout branch May 19, 2026 23:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix(embedding): default Ollama HTTP timeout is too tight for the batched /api/embed path

1 participant