feat(search): fan-out KB retrieval across bound vector stores by ochanism · Pull Request #1386 · Tencent/WeKnora

ochanism · 2026-05-18T13:25:55Z

Summary

Fan out multi-KB hybrid search across the VectorStore each knowledge base is bound to. KBs that share a store (or all share the env store) still take the existing single-engine path with zero overhead. When KBs span more than one store, retrieval runs in parallel via errgroup with a bounded fan-out and a per-group timeout, and per-engine vector scores are normalized to a common [0, 1] scale before fusion.

Multi-KB search now also explicitly enforces embedding-model consistency and rejects scopes that include knowledge bases the caller is not entitled to read.

Context

Part of #993 — Phase 2 (Per-KB VectorStore Binding) of the multi-store roadmap.

Phase 2 PR sequence (this is PR 4 of 5):

#	PR	Status
1	#994 — `vector_store_id` column + migration	MERGED
2	#1310 — KB-scoped retrieve-engine factory + 25-site refactor	MERGED
3	#1372 — KB create/retrieve API + defensive logic	MERGED
4	this PR — multi-store fan-out search	here
5	KB create/detail UI	planned

Depends on #994, #1310, and #1372.

Changes at a glance

- `HybridSearch` now batch-loads every KB in scope, partitions them by `(VectorStoreID, KB.TenantID)`, resolves each group's engine through the PR2 factory using the owning tenant for ownership lookup, and fans out retrieval via `errgroup` with `SetLimit(4)` and a per-goroutine `context.WithTimeout` (default 30s, env knob `MULTI_STORE_RETRIEVE_TIMEOUT_SEC`).
- A new `ScoreNormalizer` interface (`internal/application/service/retriever/normalizer.go`) plus an `EngineAwareNormalizer` implementation rescales vector scores per engine type so that cross-engine merges produce a directly comparable ranked list. Keyword (BM25) scores pass through unchanged because the downstream RRF fusion is rank-based and immune to scale.
- Embedding-model consistency is enforced explicitly via `ResolveEmbeddingModelKeys`. Multi-KB searches whose KBs resolve to different model identities return 400 (`ErrBadRequest`) instead of silently producing meaningless cross-model scores.
- Per-KB authorization at the search boundary: same-tenant KBs are always accessible; foreign-tenant KBs (Organization-shared) must pass an explicit `kbShareService.HasKBPermission` check. Otherwise the search returns 404 (`ErrNotFound`), so the existence of a foreign KB is never leaked back to the caller.
- The `HybridSearch` handler now preserves typed AppErrors (2200 / 2201 from #1372 plus the new 400 / 404 above) end-to-end instead of downgrading them to `InternalServerError`.
- Single-group calls (every KB on the env store, or every KB on the same DB store) bypass fan-out, normalization, and the timeout wrapper entirely. This is the dominant path today, since every existing KB has `vector_store_id = NULL`.
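
The bounded fan-out described above can be sketched with the standard library alone. This is not the PR's code: it swaps `errgroup` for a channel semaphore plus `sync.WaitGroup` so the example has no external dependency, and `retrieveFunc`, `fanOut`, `staticGroup`, `slowGroup`, `demo`, and `errStoreUnavailable` are hypothetical names chosen for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// retrieveFunc is a hypothetical stand-in for one store group's retrieval call.
type retrieveFunc func(ctx context.Context) ([]string, error)

// errStoreUnavailable stands in for the single generic error returned on any failure.
var errStoreUnavailable = errors.New("vector store unavailable")

// fanOut runs the groups with at most limit in flight and a timeout per group.
// The policy is all-or-nothing: the first group error fails the whole call.
func fanOut(parent context.Context, groups []retrieveFunc, limit int, perGroup time.Duration) ([]string, error) {
	if limit < 1 {
		limit = 1
	}
	ctx, cancel := context.WithCancel(parent)
	defer cancel()

	results := make([][]string, len(groups)) // one slot per group, so no shared writes
	sem := make(chan struct{}, limit)        // counting semaphore
	var wg sync.WaitGroup
	var once sync.Once
	var firstErr error

	for i, g := range groups {
		sem <- struct{}{} // blocks while limit goroutines are already running
		wg.Add(1)
		go func(i int, g retrieveFunc) {
			defer wg.Done()
			defer func() { <-sem }()
			gctx, gcancel := context.WithTimeout(ctx, perGroup)
			defer gcancel()
			out, err := g(gctx)
			if err != nil {
				once.Do(func() {
					firstErr = errStoreUnavailable // do not leak the raw error
					cancel()                       // stop sibling groups early
				})
				return
			}
			results[i] = out
		}(i, g)
	}
	wg.Wait()
	if firstErr != nil {
		return nil, firstErr
	}
	var merged []string
	for _, r := range results {
		merged = append(merged, r...)
	}
	return merged, nil
}

// staticGroup returns a group that immediately yields one result.
func staticGroup(v string) retrieveFunc {
	return func(context.Context) ([]string, error) { return []string{v}, nil }
}

// slowGroup returns a group that blocks until its context ends.
func slowGroup() retrieveFunc {
	return func(ctx context.Context) ([]string, error) {
		<-ctx.Done()
		return nil, ctx.Err()
	}
}

// demo runs the groups with a limit of 4 and a 50ms per-group timeout.
func demo(groups ...retrieveFunc) ([]string, error) {
	return fanOut(context.Background(), groups, 4, 50*time.Millisecond)
}

func main() {
	fmt.Println(demo(staticGroup("pg-1"), staticGroup("qdrant-1")))
	fmt.Println(demo(staticGroup("pg-1"), slowGroup()))
}
```

Writing each group's output into its own slot keeps the merged order deterministic and avoids a mutex around the result slice. The second call in `main` shows a group that exceeds its timeout failing the whole search with the generic error.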

File-by-file summary

Production

File	Change
`internal/application/service/retriever/normalizer.go` (new)	`ScoreNormalizer(ctx, score, retrieverType, engineType)` interface + `EngineAwareNormalizer` implementing each documented per-engine score formula (Elasticsearch / ElasticFaiss / Milvus cosine → `(s+1)/2`; Postgres / Qdrant / Weaviate / SQLite / Infinity / TencentVectorDB / Doris identity), plus `clamp01` with NaN / ±Inf guards. The interface is invoked only on `RetrieverType == Vector`; the caller emits a deduplicated WARN for unknown engine types so `Normalize` itself stays IO-free and panic-free even on a nil ctx
`internal/application/service/knowledgebase_search_storegroup.go` (new)	`storeGroup` struct (immutable `BaseParams` + mutable `TopK`), `resolveStoreGroups` (partitions by `(storeID, kb.TenantID)`, resolves engine via PR2 factory with the owning tenant), `classifyFactoryError` (sentinel → typed AppError with sanitized structured logs, no UUID in user-facing messages), `validateSameEmbeddingModel` (uses `ResolveEmbeddingModelKeys` so cross-tenant shared KBs resolving to the same underlying model are correctly tolerated), `authorizeKBAccess` (per-KB share permission check before any fan-out or downstream retrieval)
`internal/application/service/knowledgebase_search_fanout.go` (new)	`retrieveFromStores` with single-group fast path, `errgroup.SetLimit(4)`, per-goroutine `context.WithTimeout`, mixed-engine normalization (only when results span >1 engine type, single source of truth: `RetrieveResult.RetrieverEngineType`), and `isParentCancelled` narrowed to `context.Canceled` so parent-deadline expiry surfaces as a typed 2201 instead of leaking `context.DeadlineExceeded`. Failure policy is all-or-nothing; the first group error fails the whole search with a single generic `ErrVectorStoreUnavailable` message
`internal/application/service/knowledgebase_search.go`	`HybridSearch` rewired around the new helpers. Query embedding is pre-computed once before fan-out and propagated via `params.QueryEmbedding` so per-group `buildRetrievalParams` does not re-embed the same text N times. `pickPrimary` returns `nil` on miss (no `kbs[0]` fallback) so a caller-supplied id that is not in the scope produces a clean 404 rather than silently pivoting to an unintended KB
`internal/application/service/knowledgebase_search_faq.go`	`applyFAQPostProcessing` and `iterativeRetrieveWithDeduplication` now operate on `[]storeGroup` and return `([]IndexWithScore, error)` so typed AppErrors raised inside the iterative fan-out path (per-group timeout, store binding invalid) surface to the user instead of being silently converted into a truncated chunk list. Each iteration mutates only `group.TopK`; the immutable `BaseParams` is rebuilt fresh per call by `paramsWithTopK`, leaving no aliasing for the parallel goroutines
`internal/handler/knowledgebase.go`	`HybridSearch` handler wraps the service call with `apperrors.IsAppError` so typed 2200 / 2201 / 400 / 404 reach the client unchanged. Mirrors the pattern already in `CreateKnowledgeBase`
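
As a rough illustration of the per-engine rescaling listed for `normalizer.go`, the sketch below applies `(s+1)/2` to cosine engines and passes other engines through. The engine name strings, the `normalizeVector` function, the choice to map NaN to 0, and clamping identity engines are all assumptions of this sketch; the PR's actual `EngineAwareNormalizer` may differ.

```go
package main

import (
	"fmt"
	"math"
)

// clamp01 forces a score into [0, 1].
// Mapping NaN to 0 is an assumption of this sketch.
func clamp01(s float64) float64 {
	switch {
	case math.IsNaN(s) || s < 0: // NaN and -Inf land here
		return 0
	case s > 1: // +Inf lands here
		return 1
	default:
		return s
	}
}

// normalizeVector rescales a vector score by engine type.
// The engine name strings are hypothetical labels.
func normalizeVector(score float64, engine string) float64 {
	switch engine {
	case "elasticsearch", "elastic_faiss", "milvus":
		return clamp01((score + 1) / 2) // cosine in [-1, 1] mapped to [0, 1]
	default:
		return clamp01(score) // identity engines, and unknown engines
	}
}

func main() {
	for _, engine := range []string{"elasticsearch", "postgres"} {
		fmt.Printf("%s: %.2f\n", engine, normalizeVector(0.5, engine))
	}
	fmt.Println(clamp01(math.NaN()), clamp01(math.Inf(1)), clamp01(math.Inf(-1)))
}
```

Keeping the function free of IO and context use matches the stated goal that normalization stays panic-free; logging for unknown engines would live in the caller.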

Tests

File	Change
`internal/application/service/retriever/normalizer_test.go` (new)	Table tests over the 10 documented engine types, keyword passthrough on every engine, unknown engine clamp, nil-ctx safety, NaN / ±Inf guards via `clamp01`, compile-time interface satisfaction assertion
`internal/application/service/knowledgebase_search_fanout_test.go` (new)	Fan-out helpers (paramsWithTopK rebuilds fresh per call, hasMixedEngineTypes, isKnownEngineType, storeKindLabel, multiStoreRetrieveTimeout env override) + `retrieveFromStores` integration tests built on top of the PR2 factory: empty groups, single-group fast path, multi-group parallel concat, mixed-engine score normalization (ES → `[0, 1]` while PG passthrough), keyword passthrough on mixed engine, one-group failure → typed `ErrVectorStoreUnavailable` (no raw error leak), per-group timeout (`MULTI_STORE_RETRIEVE_TIMEOUT_SEC=1`) → 2201, iterative-FAQ pattern under `-race` (BaseParams TopK invariant preserved), iterative path propagates typed AppError, `isParentCancelled` distinguishes `context.Canceled` from `context.DeadlineExceeded`. Also covers `classifyFactoryError` sentinel → 2200 / 2201 / 2200 / passthrough with message sanitization, `validateSameEmbeddingModel` (single-KB no-op, same identity / different identity / wiki-only carve-out / cross-tenant same model / log-injection sanitization), and `authorizeKBAccess` (same tenant pass / foreign with share / foreign without share → 404 / no user → 404 / share lookup error → 500 / empty kbs no-op)
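
The `context.Canceled` versus `context.DeadlineExceeded` distinction exercised by the tests above can be shown in a few lines. This is a minimal sketch of the idea, not the PR's implementation; the helper names `cancelledCtx` and `expiredCtx` are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// isParentCancelled reports true only when the parent context was cancelled
// by the caller, not when its deadline expired.
func isParentCancelled(parent context.Context) bool {
	return errors.Is(parent.Err(), context.Canceled)
}

// cancelledCtx returns a context the caller has already cancelled.
func cancelledCtx() context.Context {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return ctx
}

// expiredCtx returns a context whose deadline is already in the past.
func expiredCtx() context.Context {
	ctx, cancel := context.WithDeadline(context.Background(), time.Now().Add(-time.Second))
	cancel() // releases resources; Err() stays context.DeadlineExceeded
	return ctx
}

func main() {
	fmt.Println(isParentCancelled(cancelledCtx()))       // true
	fmt.Println(isParentCancelled(expiredCtx()))         // false
	fmt.Println(isParentCancelled(context.Background())) // false
}
```

With this narrowing, an expired parent deadline is handled like any other retrieval failure and can be mapped to a typed error, while a genuine caller cancellation can be propagated as-is.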

Backward compatibility

Surface	Compatibility analysis
Single-KB search (no `knowledge_base_ids` in body)	Same as today. Batch-load returns one row, `resolveStoreGroups` produces a single env-store group, fast path takes the single engine's `Retrieve` directly. Zero fan-out, normalization, or timeout overhead
Multi-KB search where every KB is unbound	`resolveStoreGroups` produces a single env-store group → single-group fast path. Same behavior as before this PR
Multi-KB search where every KB is bound to the same store	Single store-group → single-group fast path. Same behavior
Multi-KB search across different stores (new in this PR)	Activated only when KBs in `knowledge_base_ids` actually live on different stores. Today the only such KBs would be those bound by clients that have started using #1372's `vector_store_id` field
Embedding-model mismatch in multi-KB search	Used to silently produce incomparable scores; now an explicit 400 with a clear English message. Single-KB callers unaffected (the validator short-circuits on `len(kbs) <= 1`)
Foreign-tenant KB UUID injected into `knowledge_base_ids`	Used to be filtered out implicitly downstream (the search engine simply had no data for that scope). Now an explicit 404 — the search never reaches the retrieval layer with an unauthorized KB id, and the existence of the foreign KB is not revealed
Per-group timeout via `MULTI_STORE_RETRIEVE_TIMEOUT_SEC`	Default 30 s, applied only on the multi-group path. Single-store deployments see no behavior change
`params.QueryEmbedding` pre-computed by caller (e.g. chat pipeline)	Embedding precompute is skipped when `len(params.QueryEmbedding) > 0`, so callers that already supply their own embedding are unaffected
Migrations	None
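
For the `MULTI_STORE_RETRIEVE_TIMEOUT_SEC` knob in the table above, a reader might picture something like the following. The fallback rules (empty, non-numeric, or non-positive values revert to 30 s) and the `parseTimeout` helper are assumptions of this sketch; only the env variable name, the default, and the `multiStoreRetrieveTimeout` name come from the PR text.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

const (
	timeoutEnvKey          = "MULTI_STORE_RETRIEVE_TIMEOUT_SEC"
	defaultRetrieveTimeout = 30 * time.Second
)

// parseTimeout converts a seconds string into a duration.
// Falling back to the default on empty, non-numeric, or non-positive
// input is an assumption of this sketch.
func parseTimeout(raw string) time.Duration {
	secs, err := strconv.Atoi(raw)
	if err != nil || secs <= 0 {
		return defaultRetrieveTimeout
	}
	return time.Duration(secs) * time.Second
}

// multiStoreRetrieveTimeout reads the env knob on every call, so a test can
// override it without restarting the process.
func multiStoreRetrieveTimeout() time.Duration {
	return parseTimeout(os.Getenv(timeoutEnvKey))
}

func main() {
	fmt.Println(multiStoreRetrieveTimeout())
	os.Setenv(timeoutEnvKey, "1")
	fmt.Println(multiStoreRetrieveTimeout())
}
```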

Behavior change (intentional)

- Embedding-model mismatch is now a 400. Multi-KB search across KBs that resolve to different embedding-model identities used to silently produce a single result list with incomparable scores (the query was embedded with the primary KB's model, then compared against vectors from incompatible model spaces). This PR rejects it explicitly so the user sees the failure rather than a misleading result set.
- Foreign-tenant KB injection is now a 404. A caller could previously include another tenant's KB UUID in `params.KnowledgeBaseIDs`; downstream retrieval would find no matching data, but the response shape did not communicate that the scope was rejected. This PR runs an explicit per-KB permission check (`kbShareService.HasKBPermission`) and returns 404, the same status as if the KB did not exist, to avoid leaking foreign-tenant KB existence.
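
The authorization rule can be sketched as a single loop over the KBs in scope. Everything here is illustrative: the `KB` struct, the `shareChecker` function type (standing in for `kbShareService.HasKBPermission`, whose real signature is not shown in this PR description), and the two error values are assumptions rather than the PR's types.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-ins for the typed AppErrors (404 and 500).
var (
	errNotFound = errors.New("knowledge base not found")
	errInternal = errors.New("internal error")
)

// KB is a hypothetical, trimmed-down knowledge base record.
type KB struct {
	ID       string
	TenantID uint64
}

// shareChecker is a hypothetical stand-in for kbShareService.HasKBPermission.
type shareChecker func(userID, kbID string) (bool, error)

// authorizeKBAccess returns nil only if every KB in scope is readable.
func authorizeKBAccess(callerTenant uint64, userID string, kbs []KB, hasPermission shareChecker) error {
	for _, kb := range kbs {
		if kb.TenantID == callerTenant {
			continue // same-tenant KBs are always accessible
		}
		if userID == "" {
			return errNotFound // no user to check a share against
		}
		ok, err := hasPermission(userID, kb.ID)
		if err != nil {
			return errInternal // a lookup failure is not a "not found"
		}
		if !ok {
			return errNotFound // do not reveal that the foreign KB exists
		}
	}
	return nil
}

func main() {
	shared := func(_, kbID string) (bool, error) { return kbID == "kb-shared", nil }
	scope := []KB{{ID: "kb-own", TenantID: 1}, {ID: "kb-shared", TenantID: 2}}
	fmt.Println(authorizeKBAccess(1, "user-1", scope, shared))
	scope = append(scope, KB{ID: "kb-private", TenantID: 2})
	fmt.Println(authorizeKBAccess(1, "user-1", scope, shared))
}
```

Running the check before any grouping or retrieval means an unauthorized id never reaches a vector store, which is the property the PR describes.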

Known limitations

- `GetKnowledgeBaseByIDs` is tenant-agnostic at the repository layer. Authorization is enforced at the service layer via `authorizeKBAccess`. Pushing tenant scope into the repository (and leveraging the `idx_knowledge_bases_tenant_vector_store` composite index added by #994 for that path) is tracked as a separate hardening PR.
- No global PostgreSQL connection-pool cap. The application-level `g.SetLimit(4)` bounds per-request fan-out, but the shared gorm pool currently leaves `MaxOpenConns` at the Go default (unlimited) for PostgreSQL. Operators should set `SetMaxOpenConns` per their PostgreSQL `max_connections` budget; that change is out of scope for this PR.
- Engine-aware normalization formulas are a single central switch. Adding a new engine requires editing `normalizer.go`. A per-engine `NormalizeScore` method on `RetrieveEngineService` would distribute this cleanly but touches every engine implementation in the repository and is deferred to a follow-up.
- FAQ iterative retrieval is not wrapped in a separate Langfuse span. Trace timing for FAQ KBs that fall into iterative retrieval currently shows only the first `retrieveFromStores` call; subsequent iterations are still inside the same parent context but outside the explicit "retrieve" child span. A separate span will be added in a follow-up cleanup.
- Cross-model multi-KB search is rejected, not merged. The validator blocks the case explicitly; merging KBs across embedding models (group by model, embed per group, normalize and merge) requires retrieval-quality work and is left to a future issue.
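
To make the rejection rule concrete, here is a minimal sketch of a same-model check. It assumes each KB has already been resolved to a comparable model key (the role `ResolveEmbeddingModelKeys` plays in the PR); the `kbModel` struct and `errBadRequest` value are hypothetical, and the wiki-only carve-out mentioned in the tests is deliberately left out because its exact semantics are not described here.

```go
package main

import (
	"errors"
	"fmt"
)

// errBadRequest stands in for the typed 400 AppError.
var errBadRequest = errors.New("selected knowledge bases use different embedding models")

// kbModel pairs a KB id with its resolved model identity (hypothetical shape).
type kbModel struct {
	KBID     string
	ModelKey string
}

// validateSameEmbeddingModel rejects scopes that resolve to more than one model.
func validateSameEmbeddingModel(kbs []kbModel) error {
	if len(kbs) <= 1 {
		return nil // single-KB searches short-circuit
	}
	want := kbs[0].ModelKey
	for _, kb := range kbs[1:] {
		if kb.ModelKey != want {
			return errBadRequest
		}
	}
	return nil
}

func main() {
	same := []kbModel{{"kb-1", "model-a"}, {"kb-2", "model-a"}}
	mixed := []kbModel{{"kb-1", "model-a"}, {"kb-3", "model-b"}}
	fmt.Println(validateSameEmbeddingModel(same))
	fmt.Println(validateSameEmbeddingModel(mixed))
}
```

Comparing resolved keys rather than raw model ids is what lets two tenants' KBs pass when they point at the same underlying model.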

Test plan

- `go build ./...`
- `go vet ./...`
- `go test -count=1 -race ./internal/application/service/retriever/...` (normalizer table, NaN / Inf, nil-ctx, interface satisfaction)
- `go test -count=1 -race ./internal/application/service/...` (`resolveStoreGroups` partition matrix, `classifyFactoryError` sentinel translation, `validateSameEmbeddingModel` matrix, `authorizeKBAccess` matrix, `retrieveFromStores` fast path / multi-group / mixed-engine / one-group failure / per-group timeout / keyword passthrough / iterative pattern under `-race`, FAQ iterative typed-AppError propagation, `isParentCancelled` discrimination)
- `go test -count=1 -race ./internal/handler/...`
- `go test -count=1 -race ./internal/types/... ./internal/application/repository/...` (no regressions on prior PRs)
- Local E2E smoke against docker compose dev (PG + Qdrant + DocReader + MinIO + Redis) with an OpenAI-compatible embedding endpoint:
  - Single-KB search: 200, group count: 1
  - Multi-KB same store: 200, group count: 1
  - Multi-KB cross-store (env-PG + Qdrant): 200, group count: 2, results from both engines merged (server log shows `engine: qdrant, retriever: vector|keywords` and `engine: postgres, retriever: vector|keywords` co-firing)
  - Multi-KB embedding-model mismatch: 400 with the English message "selected knowledge bases use different embedding models"
  - Foreign-tenant KB injection (second tenant created via registration, its KB id added to `knowledge_base_ids` of a first-tenant search): 404 with "knowledge base not found", plus the structured audit log "search scope rejected: unauthorized foreign-tenant KB"

Multi-KB hybrid search now groups KBs by their bound VectorStore (partition key (storeID, owner_tenant_id)), retrieves in parallel via errgroup with a SetLimit(4) cap and a per-group timeout (MULTI_STORE_RETRIEVE_TIMEOUT_SEC, default 30s), and merges results. When the collected results span more than one engine type, an EngineAwareNormalizer rescales vector scores to [0, 1]; keyword (BM25) scores pass through to the existing RRF fusion. Single-group calls take the fast path with zero fan-out overhead, preserving today's behavior for deployments where every KB has vector_store_id = NULL.

Embedding-model consistency is now enforced explicitly via ResolveEmbeddingModelKeys. Multi-KB searches across KBs whose resolved model identities differ return BadRequest instead of silently producing incomparable scores.

Cross-tenant Organization-shared KBs are preserved by partitioning on KB.TenantID so the factory's ownership lookup runs against the source tenant. Foreign-tenant KB UUIDs injected via the request body are rejected via kbShareService.HasTenantKBPermission (Plan 3 of Tencent#1303, 3-D capped) before any retrieval; rejected scopes surface as 404 to avoid leaking foreign KB existence.

Service-layer typed AppErrors (ErrVectorStoreBindingInvalid 2200 / ErrVectorStoreUnavailable 2201) are mapped from PR2 sentinel hierarchy and preserved end-to-end: the iterative FAQ path returns them rather than swallowing, and the HybridSearch handler routes typed AppErrors to the client unchanged instead of downgrading to 500.

Part of Tencent#993 (Phase 2: Per-KB VectorStore Binding). Phase 2 roadmap item: PR 4 (Multi-store fan-out search). Depends on Tencent#994, Tencent#1310, Tencent#1372.
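
The partition key mentioned in the commit message, (store id, owning tenant), can be illustrated with a small grouping function. The `KB` struct, `groupKey`, and `partitionKBs` are hypothetical names for this sketch, and representing the env store as an empty store id is an assumption; the PR's `resolveStoreGroups` additionally resolves an engine per group, which is omitted here.

```go
package main

import "fmt"

// KB is a hypothetical, trimmed-down knowledge base record.
type KB struct {
	ID            string
	TenantID      uint64
	VectorStoreID *string // nil means unbound: use the env store
}

// groupKey is the partition key: (store id, owning tenant).
type groupKey struct {
	StoreID  string // "" stands for the env store (assumption of this sketch)
	TenantID uint64
}

// partitionKBs groups KB ids by (store, tenant), preserving first-seen order.
func partitionKBs(kbs []KB) ([]groupKey, map[groupKey][]string) {
	order := []groupKey{}
	groups := map[groupKey][]string{}
	for _, kb := range kbs {
		key := groupKey{TenantID: kb.TenantID}
		if kb.VectorStoreID != nil {
			key.StoreID = *kb.VectorStoreID
		}
		if _, seen := groups[key]; !seen {
			order = append(order, key)
		}
		groups[key] = append(groups[key], kb.ID)
	}
	return order, groups
}

func main() {
	qdrant := "store-qdrant"
	kbs := []KB{
		{ID: "kb-1", TenantID: 1},
		{ID: "kb-2", TenantID: 1},
		{ID: "kb-3", TenantID: 1, VectorStoreID: &qdrant},
	}
	order, groups := partitionKBs(kbs)
	for _, k := range order {
		fmt.Printf("store=%q tenant=%d kbs=%v\n", k.StoreID, k.TenantID, groups[k])
	}
}
```

When the returned order has length one, a caller can take the single-group fast path and skip fan-out entirely.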

ochanism force-pushed the enhancement/1206-4 branch from f5be7e9 to efc9980 Compare May 18, 2026 14:05

ochanism force-pushed the enhancement/1206-4 branch from efc9980 to 0e6f823 Compare May 18, 2026 14:10

ochanism wants to merge 1 commit into Tencent:main from ochanism:enhancement/1206-4
