fix(embedding): scope long batch timeout to embed-jobs worker (#445)#446
Merged
Conversation
v1.64.0 shipped the Ollama batch endpoint (#435) without raising the HTTP timeout that governs the path. On CPU-only Ollama a 30-text batch exceeds the 30s default `DefaultTimeout`, so the api-gateway embed-jobs worker timeout-storms on every spec write, retries five times, fails terminally, gets re-enqueued by the reconciler, and loops. From the operator's perspective the spec badge alternates between "indexing" and looks "stuck on queued". ## Fix The fix is scoped to the async worker. The shared `embedding.Provider` that request-path callers (`memory_recall`, `memory_manage`, `capture_insight`, apigateway `queryVectorFor`) use keeps its 30s default so a wedged Ollama fails an MCP `tools/call` in seconds, not minutes. Only the embed-jobs worker gets the longer budget. - `DefaultTimeout` stays at 30s. The constant's comment now explicitly documents that the singular `/api/embeddings` path is what it's tuned for, and points at the worker's dedicated provider for the batched path. - New `apigateway.embed_jobs.embed_timeout` config (default 5m). The worker constructs its own Ollama provider with this timeout via `Platform.workerEmbedder()`. Provider != ollama (noop, future kinds) reuses the shared instance unchanged. ## Tests Three new unit tests in `worker_embedder_test.go` cover the scoping: - `TestWorkerEmbedder_OllamaUsesEmbedTimeout` proves the worker gets a fresh Provider with the configured timeout AND the shared instance keeps its 30s default (the regression-prevention assertion). - `TestWorkerEmbedder_OllamaDefaultsTo5Min` covers the zero-value fallback. - `TestWorkerEmbedder_NoopReturnsShared` covers the non-Ollama path. The tests reach the unexported `http.Client.Timeout` via reflection; a future rename of that field produces a loud failed assertion (the duration compares against specific values), not a silent skip. ## Integration test against real Ollama New `pkg/platform/integration_embedjobs_realollama_test.go` (build tag `integration`) spins up real Ollama in testcontainers, pulls `nomic-embed-text`, enqueues a 30-operation spec, and asserts the embedding pass completes within the worker's timeout budget. This is the regression test that would have caught #445 before release; prior integration tests used synthetic delays in stub Computers and never exercised actual Ollama batch latency. The shared `internal/testollama` helper uses `sync.Once` to start the Ollama container once per test process. `Provider(timeout)` returns a fresh embedding.Provider with the supplied timeout so call sites name the production path they're exercising (30s for request-path, 5m for worker). testcontainers' ryuk reaper handles teardown at process exit. ## Verification - `make verify` clean (tools-check, fmt, race tests, lint patch-scoped against `origin/main`, gosec, govulncheck, semgrep, codeql, doc-check, dead-code, mutation testing, goreleaser dry-run). - `make test-integration` clean against pgvector + real Ollama. Real- Ollama test runtime: ~3.5min (model pull dominates; the 30-op batched embed itself completes in ~2 seconds once warm). - Adversarial sub-agent review: 2 rounds. Round 1 surfaced (a) the global timeout change would affect synchronous request-path callers too and (b) the test poll deadline equaled the HTTP timeout with no margin. Both fixed. Round 2 CLEAN. ## Operator workaround on v1.64.0 (no upgrade required) Add `timeout: 5m` under `memory.embedding.ollama` in the platform configmap and roll the deployment.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #446 +/- ##
=======================================
Coverage 86.18% 86.19%
=======================================
Files 235 235
Lines 32241 32252 +11
=======================================
+ Hits 27787 27798 +11
Misses 3239 3239
Partials 1215 1215 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes #445.
v1.64.0 switched
EmbedBatchto Ollama's batched/api/embedendpoint (#435) without raising the HTTP timeout that governs it. On CPU-only Ollama a 30-text batched POST routinely exceeds the 30sDefaultTimeout, so the api-gateway embed-jobs worker timeout-storms on every spec write, retries five times, fails terminally, gets re-enqueued by the reconciler, and loops. From the operator's perspective the spec badge alternates betweenindexingandfailedand looks "stuck on queued".Approach: scoped, not global
A first pass raised
embedding.DefaultTimeoutfrom 30s to 5m. The adversarial review caught that this also affects synchronous request-path callers (memory_recall,memory_manage,capture_insight, apigatewayqueryVectorFor), which would now hang an MCPtools/callfor 5 minutes on a wedged Ollama instead of failing in 30s. That is a real regression for interactive tool calls.The shipped fix scopes the longer budget to the async worker only:
embedding.DefaultTimeoutstays at 30s. The constant's comment explicitly documents that the singular/api/embeddingspath is what it's tuned for, and points at the worker's dedicated provider for the batched path.apigateway.embed_jobs.embed_timeoutconfig (duration, default5m). The worker constructs its own Ollama provider with this timeout via a newPlatform.workerEmbedder()helper. The sharedp.embeddingProvthat request-path callers use is unchanged.ollama(noop today, future kinds) reuse the shared instance. Today only Ollama has the long-tail batch latency profile that needs this.New config
embed_jobs.embed_timeout5mmemory_recall,memory_manage,capture_insight, apigateway query-vector) continue to usememory.embedding.ollama.timeout(default 30s viaembedding.DefaultTimeout).Tests
Unit (
pkg/platform/worker_embedder_test.go)TestWorkerEmbedder_OllamaUsesEmbedTimeoutproves the worker gets a freshembedding.Providerwith the configured timeout AND the shared instance keeps its 30s default. This is the regression-prevention assertion: a future change that accidentally re-globalizes the long timeout fails this test loudly.TestWorkerEmbedder_OllamaDefaultsTo5Mincovers the zero-value fallback.TestWorkerEmbedder_NoopReturnsSharedcovers the non-Ollama path.The tests reach the unexported
http.Client.Timeoutvia reflection. A future rename of that field produces a loud failed assertion (durations are compared against specific values), not a silent skip.Integration against a real Ollama (
pkg/platform/integration_embedjobs_realollama_test.go, build tagintegration)The reason #445 shipped undetected is that every prior integration test used synthetic delays in stub Computers. None exercised actual Ollama batch latency.
This PR adds the missing coverage: spins up real Ollama in testcontainers, pulls
nomic-embed-text, seeds a 30-operation spec, enqueues a job, and asserts the embedding pass completes inside the worker's timeout budget. Pre-fix behavior (30s global default) would fail this test on the very batch shape that caused the incident; post-fix it passes with margin.The shared
internal/testollamahelper usessync.Onceto start the Ollama container once per test process.Provider(timeout)returns a fresh provider with the supplied timeout so each call site names the production code path it's exercising (30s for request-path, 5m for worker). The pattern is reusable by every upcoming embedding-related consumer (DataHub semantic search, prompt discovery, knowledge insights recall, portal asset search).Verification
make verifyclean: tools-check, gofmt, race tests, total + patch coverage above gate, lint (patch-scoped againstorigin/main), gosec, govulncheck, semgrep, codeql, doc-check, dead-code, mutation testing, goreleaser dry-run.make test-integrationclean against pgvector + real Ollama. The real-Ollama test takes ~3.5min (model pull dominates; the 30-op batched embed itself completes in ~2 seconds once warm).Operator workaround on v1.64.0 (no upgrade required)
Add
timeout: 5mundermemory.embedding.ollamain the platform configmap and roll the deployment. Once v1.64.1 is rolled, the override undermemory.embedding.ollama.timeoutcan be removed because the worker no longer shares that path; operators who want to keep the request-path timeout at 60s and the worker timeout at 5m can now set them independently.Test plan
make verifypasses locally.make test-integrationpasses locally against pgvector + real Ollama.indexing N/Mupward to completion.memory.embedding.ollama.timeoutunset, confirm a wedged Ollama still failsmemory_recallin ~30s (the request-path budget) while the embed-jobs worker continues with its 5m budget.