feat(search): fan-out KB retrieval across bound vector stores #1386

Open
ochanism wants to merge 1 commit into Tencent:main from ochanism:enhancement/1206-4

@ochanism

Summary

Fan out multi-KB hybrid search across the VectorStore each knowledge base is bound to. KBs that share a store (or all share the env store) still take the existing single-engine path with zero overhead. When KBs span more than one store, retrieval runs in parallel via errgroup with a bounded fan-out and a per-group timeout, and per-engine vector scores are normalized to a common [0, 1] scale before fusion.

Multi-KB search now also explicitly enforces embedding-model consistency and rejects scopes that include knowledge bases the caller is not entitled to read.

Context

Part of #993 — Phase 2 (Per-KB VectorStore Binding) of the multi-store roadmap.

Phase 2 PR sequence (this is PR 4 of 5):

| # | PR | Status |
| --- | --- | --- |
| 1 | #994: vector_store_id column + migration | MERGED |
| 2 | #1310: KB-scoped retrieve-engine factory + 25-site refactor | MERGED |
| 3 | #1372: KB create/retrieve API + defensive logic | MERGED |
| 4 | this PR: multi-store fan-out search | here |
| 5 | KB create/detail UI | planned |

Depends on #994, #1310, and #1372.

Changes at a glance

  • HybridSearch now batch-loads every KB in scope, partitions them by (VectorStoreID, KB.TenantID), resolves each group's engine through the PR2 factory using the owning tenant for ownership lookup, and fans out retrieval via errgroup with SetLimit(4) and a per-goroutine context.WithTimeout (default 30s, env knob MULTI_STORE_RETRIEVE_TIMEOUT_SEC).
  • A new ScoreNormalizer interface (internal/application/service/retriever/normalizer.go) plus an EngineAwareNormalizer implementation rescales vector scores per engine type so that cross-engine merges produce a directly comparable ranked list. Keyword (BM25) scores pass through unchanged because the downstream RRF fusion is rank-based and immune to scale.
  • Embedding-model consistency is enforced explicitly via ResolveEmbeddingModelKeys. Multi-KB searches whose KBs resolve to different model identities return 400 (ErrBadRequest) instead of silently producing meaningless cross-model scores.
  • Per-KB authorization at the search boundary: same-tenant KBs are always accessible; foreign-tenant KBs (Organization-shared) must pass an explicit kbShareService.HasKBPermission check. Otherwise the search returns 404 (ErrNotFound) — the existence of a foreign KB is never leaked back to the caller.
  • The HybridSearch handler now preserves typed AppErrors (2200 / 2201 from #1372, plus the new 400 / 404 above) end-to-end instead of downgrading them to InternalServerError.
  • Single-group calls (every KB on the env store, or every KB on the same DB store) bypass fan-out, normalization, and the timeout wrapper entirely — the dominant path today since every existing KB has vector_store_id = NULL.
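
The bounded fan-out with a per-group deadline can be sketched with the standard library alone. This is an illustration, not the code in this PR: it swaps errgroup's SetLimit for a counting semaphore, `retrieveFunc`, `fanOut`, and `retrieveTimeout` are hypothetical names, and the timeout error is returned raw rather than mapped to a typed AppError.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
	"strconv"
	"sync"
	"time"
)

// retrieveFunc stands in for one store group's engine call (hypothetical).
type retrieveFunc func(ctx context.Context) ([]string, error)

// retrieveTimeout reads MULTI_STORE_RETRIEVE_TIMEOUT_SEC and falls back to 30s
// when the variable is unset, non-numeric, or not positive.
func retrieveTimeout() time.Duration {
	if n, err := strconv.Atoi(os.Getenv("MULTI_STORE_RETRIEVE_TIMEOUT_SEC")); err == nil && n > 0 {
		return time.Duration(n) * time.Second
	}
	return 30 * time.Second
}

// fanOut runs every group with at most limit calls in flight, gives each call
// its own deadline, and fails the whole search on the first error.
func fanOut(ctx context.Context, groups []retrieveFunc, limit int, timeout time.Duration) ([]string, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	sem := make(chan struct{}, limit) // counting semaphore in place of SetLimit
	results := make([][]string, len(groups))
	var (
		wg       sync.WaitGroup
		once     sync.Once
		firstErr error
	)
	for i, g := range groups {
		sem <- struct{}{} // blocks while limit calls are already running
		wg.Add(1)
		go func(i int, g retrieveFunc) {
			defer wg.Done()
			defer func() { <-sem }()
			gctx, gcancel := context.WithTimeout(ctx, timeout)
			defer gcancel()
			out, err := g(gctx)
			if err != nil {
				once.Do(func() { firstErr = err; cancel() }) // all-or-nothing
				return
			}
			results[i] = out // each goroutine writes only its own slot
		}(i, g)
	}
	wg.Wait()
	if firstErr != nil {
		return nil, firstErr
	}
	var merged []string
	for _, r := range results {
		merged = append(merged, r...)
	}
	return merged, nil
}

// demoFanOut merges two groups that both succeed.
func demoFanOut() ([]string, error) {
	ok := func(id string) retrieveFunc {
		return func(ctx context.Context) ([]string, error) { return []string{id}, nil }
	}
	return fanOut(context.Background(), []retrieveFunc{ok("pg"), ok("qdrant")}, 4, retrieveTimeout())
}

// demoTimeout reports whether a slow group trips the per-group deadline.
func demoTimeout() bool {
	slow := func(ctx context.Context) ([]string, error) {
		select {
		case <-time.After(time.Second):
			return []string{"late"}, nil
		case <-ctx.Done():
			return nil, ctx.Err()
		}
	}
	_, err := fanOut(context.Background(), []retrieveFunc{slow}, 4, 10*time.Millisecond)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	merged, err := demoFanOut()
	fmt.Println(merged, err)
	fmt.Println("slow group timed out:", demoTimeout())
}
```

Writing each group's output into its own slice slot keeps the merge order stable without a mutex; the real implementation may merge differently.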

File-by-file summary

Production

| File | Change |
| --- | --- |
| internal/application/service/retriever/normalizer.go (new) | ScoreNormalizer(ctx, score, retrieverType, engineType) interface + EngineAwareNormalizer implementing each documented per-engine score formula (Elasticsearch / ElasticFaiss / Milvus cosine → (s+1)/2; Postgres / Qdrant / Weaviate / SQLite / Infinity / TencentVectorDB / Doris identity), plus clamp01 with NaN / ±Inf guards. The interface is invoked only on RetrieverType == Vector; the caller emits a deduplicated WARN for unknown engine types so Normalize itself stays IO-free and panic-free even on a nil ctx |
| internal/application/service/knowledgebase_search_storegroup.go (new) | storeGroup struct (immutable BaseParams + mutable TopK), resolveStoreGroups (partitions by (storeID, kb.TenantID), resolves engine via PR2 factory with the owning tenant), classifyFactoryError (sentinel → typed AppError with sanitized structured logs, no UUID in user-facing messages), validateSameEmbeddingModel (uses ResolveEmbeddingModelKeys so cross-tenant shared KBs resolving to the same underlying model are correctly tolerated), authorizeKBAccess (per-KB share permission check before any fan-out or downstream retrieval) |
| internal/application/service/knowledgebase_search_fanout.go (new) | retrieveFromStores with single-group fast path, errgroup.SetLimit(4), per-goroutine context.WithTimeout, mixed-engine normalization (only when results span >1 engine type, single source of truth: RetrieveResult.RetrieverEngineType), and isParentCancelled narrowed to context.Canceled so parent-deadline expiry surfaces as a typed 2201 instead of leaking context.DeadlineExceeded. Failure policy is all-or-nothing; the first group error fails the whole search with a single generic ErrVectorStoreUnavailable message |
| internal/application/service/knowledgebase_search.go | HybridSearch rewired around the new helpers. Query embedding is pre-computed once before fan-out and propagated via params.QueryEmbedding so per-group buildRetrievalParams does not re-embed the same text N times. pickPrimary returns nil on miss (no kbs[0] fallback) so a caller-supplied id that is not in the scope produces a clean 404 rather than silently pivoting to an unintended KB |
| internal/application/service/knowledgebase_search_faq.go | applyFAQPostProcessing and iterativeRetrieveWithDeduplication now operate on []*storeGroup and return ([]*IndexWithScore, error) so typed AppErrors raised inside the iterative fan-out path (per-group timeout, store binding invalid) surface to the user instead of being silently converted into a truncated chunk list. Each iteration mutates only group.TopK; the immutable BaseParams is rebuilt fresh per call by paramsWithTopK, leaving no aliasing for the parallel goroutines |
| internal/handler/knowledgebase.go | HybridSearch handler wraps the service call with apperrors.IsAppError so typed 2200 / 2201 / 400 / 404 reach the client unchanged. Mirrors the pattern already in CreateKnowledgeBase |
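
The per-engine rescaling described for normalizer.go can be illustrated roughly as follows. This is a sketch under assumptions: the engine and retriever labels are made-up strings rather than the repository's constants, the function takes no ctx, and clamping the identity engines is my reading of "identity, plus clamp01".

```go
package main

import (
	"fmt"
	"math"
)

// Retriever labels are illustrative strings, not the repository's constants.
const (
	retrieverVector   = "vector"
	retrieverKeywords = "keywords"
)

// clamp01 forces a score into [0, 1]; NaN and -Inf map to 0, +Inf maps to 1.
func clamp01(s float64) float64 {
	switch {
	case math.IsNaN(s), s < 0:
		return 0
	case s > 1:
		return 1
	}
	return s
}

// normalize rescales a vector score according to the engine that produced it.
// Keyword scores are returned untouched because rank-based fusion ignores scale.
func normalize(score float64, retrieverType, engineType string) float64 {
	if retrieverType != retrieverVector {
		return score
	}
	switch engineType {
	case "elasticsearch", "elastic_faiss", "milvus":
		// Cosine similarity in [-1, 1] maps onto [0, 1] via (s+1)/2.
		return clamp01((score + 1) / 2)
	default:
		// Engines listed as identity, and unknown engines, are only clamped
		// (assumption: the real code may treat these two cases differently).
		return clamp01(score)
	}
}

func main() {
	fmt.Println(normalize(0.5, retrieverVector, "elasticsearch"))
	fmt.Println(normalize(0.8, retrieverVector, "postgres"))
	fmt.Println(normalize(12.3, retrieverKeywords, "elasticsearch"))
	fmt.Println(normalize(math.NaN(), retrieverVector, "qdrant"))
}
```

Keeping the function free of IO and logging matches the stated goal that Normalize stays panic-free; the WARN for unknown engines would live in the caller.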

Tests

| File | Change |
| --- | --- |
| internal/application/service/retriever/normalizer_test.go (new) | Table tests over the 10 documented engine types, keyword passthrough on every engine, unknown engine clamp, nil-ctx safety, NaN / ±Inf guards via clamp01, compile-time interface satisfaction assertion |
| internal/application/service/knowledgebase_search_fanout_test.go (new) | Fan-out helpers (paramsWithTopK rebuilds fresh per call, hasMixedEngineTypes, isKnownEngineType, storeKindLabel, multiStoreRetrieveTimeout env override) + retrieveFromStores integration tests built on top of the PR2 factory: empty groups, single-group fast path, multi-group parallel concat, mixed-engine score normalization (ES → [0, 1] while PG passthrough), keyword passthrough on mixed engine, one-group failure → typed ErrVectorStoreUnavailable (no raw error leak), per-group timeout (MULTI_STORE_RETRIEVE_TIMEOUT_SEC=1) → 2201, iterative-FAQ pattern under -race (BaseParams TopK invariant preserved), iterative path propagates typed AppError, isParentCancelled distinguishes context.Canceled from context.DeadlineExceeded. Also covers classifyFactoryError sentinel → 2200 / 2201 / 2200 / passthrough with message sanitization, validateSameEmbeddingModel (single-KB no-op, same identity / different identity / wiki-only carve-out / cross-tenant same model / log-injection sanitization), and authorizeKBAccess (same tenant pass / foreign with share / foreign without share → 404 / no user → 404 / share lookup error → 500 / empty kbs no-op) |
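
The context.Canceled versus context.DeadlineExceeded distinction that the isParentCancelled tests exercise could look like this. The signature is an assumption; only the behavior (cancellation counts, deadline expiry does not) comes from the description above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// isParentCancelled reports whether the parent context was cancelled by the
// caller (for example a dropped client connection). A parent whose deadline
// expired returns false, so that failure can be reported as a store timeout.
func isParentCancelled(parent context.Context) bool {
	return errors.Is(parent.Err(), context.Canceled)
}

// demoCancelled checks a context that was explicitly cancelled.
func demoCancelled() bool {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return isParentCancelled(ctx)
}

// demoDeadline checks a context whose deadline has already passed.
func demoDeadline() bool {
	ctx, cancel := context.WithTimeout(context.Background(), time.Millisecond)
	defer cancel()
	<-ctx.Done() // wait for the deadline to fire
	return isParentCancelled(ctx)
}

func main() {
	fmt.Println("cancelled parent:", demoCancelled())
	fmt.Println("expired parent:", demoDeadline())
	fmt.Println("live parent:", isParentCancelled(context.Background()))
}
```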

Backward compatibility

| Surface | Compatibility analysis |
| --- | --- |
| Single-KB search (no knowledge_base_ids in body) | Same as today. Batch-load returns one row, resolveStoreGroups produces a single env-store group, fast path takes the single engine's Retrieve directly. Zero fan-out, normalization, or timeout overhead |
| Multi-KB search where every KB is unbound | resolveStoreGroups produces a single env-store group → single-group fast path. Same behavior as before this PR |
| Multi-KB search where every KB is bound to the same store | Single store-group → single-group fast path. Same behavior |
| Multi-KB search across different stores (new in this PR) | Activated only when KBs in knowledge_base_ids actually live on different stores. Today the only such KBs would be those bound by clients that have started using #1372's vector_store_id field |
| Embedding-model mismatch in multi-KB search | Used to silently produce incomparable scores; now an explicit 400 with a clear English message. Single-KB callers unaffected (the validator short-circuits on len(kbs) <= 1) |
| Foreign-tenant KB UUID injected into knowledge_base_ids | Used to be filtered out implicitly downstream (the search engine simply had no data for that scope). Now an explicit 404: the search never reaches the retrieval layer with an unauthorized KB id, and the existence of the foreign KB is not revealed |
| Per-group timeout via MULTI_STORE_RETRIEVE_TIMEOUT_SEC | Default 30 s, applied only on the multi-group path. Single-store deployments see no behavior change |
| params.QueryEmbedding pre-computed by caller (e.g. chat pipeline) | Embedding precompute is skipped when len(params.QueryEmbedding) > 0, so callers that already supply their own embedding are unaffected |
| Migrations | None |
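
Why unbound and same-store scopes stay on the fast path follows from the partition key. A minimal sketch of that grouping, assuming a trimmed `kb` struct and a nil VectorStoreID meaning "env store" (field names and types here are illustrative, not the repository's):

```go
package main

import "fmt"

// kb is a trimmed, hypothetical view of a knowledge base row.
type kb struct {
	ID            string
	TenantID      uint64
	VectorStoreID *string // nil means "use the env store"
}

// groupKey is the partition key: bound store plus owning tenant.
type groupKey struct {
	StoreID  string // "" stands for the env store
	TenantID uint64
}

// partition groups KB ids by (store, owning tenant) and records first-seen
// order so the result is deterministic.
func partition(kbs []kb) (order []groupKey, groups map[groupKey][]string) {
	groups = make(map[groupKey][]string)
	for _, k := range kbs {
		key := groupKey{TenantID: k.TenantID}
		if k.VectorStoreID != nil {
			key.StoreID = *k.VectorStoreID
		}
		if _, seen := groups[key]; !seen {
			order = append(order, key)
		}
		groups[key] = append(groups[key], k.ID)
	}
	return order, groups
}

// demoGroupCount returns how many groups two same-tenant KBs produce,
// optionally binding the second KB to a separate store.
func demoGroupCount(bindSecond bool) int {
	qdrant := "store-qdrant"
	kbs := []kb{{ID: "kb-a", TenantID: 1}, {ID: "kb-b", TenantID: 1}}
	if bindSecond {
		kbs[1].VectorStoreID = &qdrant
	}
	order, _ := partition(kbs)
	return len(order)
}

func main() {
	fmt.Println("both unbound:", demoGroupCount(false))
	fmt.Println("one bound elsewhere:", demoGroupCount(true))
}
```

A group count of one corresponds to the single-group fast path; anything larger would take the fan-out path.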

Behavior change (intentional)

  • Embedding-model mismatch is now a 400. Multi-KB search across KBs that resolve to different embedding-model identities used to silently produce a single result list with incomparable scores (the query was embedded with the primary KB's model, then compared against vectors from incompatible model spaces). This PR rejects it explicitly so the user sees the failure rather than a misleading result set.
  • Foreign-tenant KB injection is now a 404. A caller could previously include another tenant's KB UUID in params.KnowledgeBaseIDs; downstream retrieval would find no matching data, but the response shape did not communicate that the scope was rejected. This PR runs an explicit per-KB permission check (kbShareService.HasKBPermission) and returns 404 — same status as if the KB did not exist, to avoid leaking foreign-tenant KB existence.
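
Both rejections can be pictured as two small pre-flight checks that run before any retrieval. The sketch below is illustrative only: `shareChecker` is a hypothetical stand-in for the share-permission lookup, the sentinel errors stand in for the typed AppErrors, `ModelKey` is an assumed field, and the wiki-only carve-out is omitted.

```go
package main

import (
	"errors"
	"fmt"
)

// Sentinels standing in for the typed 400 / 404 AppErrors.
var (
	errBadRequest = errors.New("selected knowledge bases use different embedding models")
	errNotFound   = errors.New("knowledge base not found")
)

// kb is a trimmed, hypothetical view of a knowledge base.
type kb struct {
	ID       string
	TenantID uint64
	ModelKey string // resolved embedding-model identity (assumed field)
}

// shareChecker is a hypothetical stand-in for the share-permission lookup.
type shareChecker func(userID, kbID string) (bool, error)

// authorize lets same-tenant KBs through and requires an explicit share for
// foreign-tenant KBs. A missing share is reported as "not found".
func authorize(callerTenant uint64, userID string, kbs []kb, hasShare shareChecker) error {
	for _, k := range kbs {
		if k.TenantID == callerTenant {
			continue
		}
		if userID == "" {
			return errNotFound
		}
		ok, err := hasShare(userID, k.ID)
		if err != nil {
			return fmt.Errorf("share lookup: %w", err) // would surface as 500
		}
		if !ok {
			return errNotFound
		}
	}
	return nil
}

// sameModel rejects scopes whose KBs resolve to different model identities.
// With zero or one KB the loop never runs, so single-KB searches pass.
func sameModel(kbs []kb) error {
	for i := 1; i < len(kbs); i++ {
		if kbs[i].ModelKey != kbs[0].ModelKey {
			return errBadRequest
		}
	}
	return nil
}

// demo runs both checks against a scope with one unshared foreign KB.
func demo() (foreignDenied, mismatch bool) {
	noShare := func(userID, kbID string) (bool, error) { return false, nil }
	kbs := []kb{
		{ID: "kb-own", TenantID: 1, ModelKey: "model-a"},
		{ID: "kb-foreign", TenantID: 2, ModelKey: "model-b"},
	}
	foreignDenied = errors.Is(authorize(1, "user-1", kbs, noShare), errNotFound)
	mismatch = errors.Is(sameModel(kbs), errBadRequest)
	return foreignDenied, mismatch
}

func main() {
	denied, mismatch := demo()
	fmt.Println("foreign KB rejected as not found:", denied)
	fmt.Println("model mismatch rejected:", mismatch)
}
```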

Known limitations

  • GetKnowledgeBaseByIDs is tenant-agnostic at the repository layer. Authorization is enforced at the service layer via authorizeKBAccess. Pushing tenant scope into the repository (and leveraging the idx_knowledge_bases_tenant_vector_store composite index added by #994 for that path) is tracked as a separate hardening PR.
  • No global PostgreSQL connection-pool cap. The application-level g.SetLimit(4) bounds per-request fan-out, but the shared gorm pool currently leaves MaxOpenConns at the Go default (unlimited) for PostgreSQL. Operators should set SetMaxOpenConns per their PostgreSQL max_connections budget; that change is out of scope for this PR.
  • Engine-aware normalization formulas are a single central switch. Adding a new engine requires editing normalizer.go. A per-engine NormalizeScore method on RetrieveEngineService would distribute this cleanly but touches every engine implementation in the repository and is deferred to a follow-up.
  • FAQ iterative retrieval is not wrapped in a separate Langfuse span. Trace timing for FAQ KBs that fall into iterative retrieval currently shows only the first retrieveFromStores call; subsequent iterations are still inside the same parent context but outside the explicit "retrieve" child span. A separate span will be added in a follow-up cleanup.
  • Cross-model multi-KB search is rejected, not merged. The validator blocks the case explicitly; merging KBs across embedding models (group by model, embed per group, normalize and merge) requires retrieval-quality work and is left to a future issue.

Test plan

  • go build ./...
  • go vet ./...
  • go test -count=1 -race ./internal/application/service/retriever/... (normalizer table, NaN / Inf, nil-ctx, interface satisfaction)
  • go test -count=1 -race ./internal/application/service/... (resolveStoreGroups partition matrix, classifyFactoryError sentinel translation, validateSameEmbeddingModel matrix, authorizeKBAccess matrix, retrieveFromStores fast path / multi-group / mixed-engine / one-group failure / per-group timeout / keyword passthrough / iterative pattern under -race, FAQ iterative typed-AppError propagation, isParentCancelled discrimination)
  • go test -count=1 -race ./internal/handler/...
  • go test -count=1 -race ./internal/types/... ./internal/application/repository/... (no regressions on prior PRs)
  • Local E2E smoke against docker compose dev (PG + Qdrant + DocReader + MinIO + Redis) with an OpenAI-compatible embedding endpoint:
    • Single-KB search: 200, group count: 1
    • Multi-KB same store: 200, group count: 1
    • Multi-KB cross-store (env-PG + Qdrant): 200, group count: 2, results from both engines merged (server log shows engine: qdrant, retriever: vector|keywords and engine: postgres, retriever: vector|keywords co-firing)
    • Multi-KB embedding-model mismatch: 400 with the English selected knowledge bases use different embedding models message
    • Foreign-tenant KB injection (second tenant created via registration, its KB id added to knowledge_base_ids of a first-tenant search): 404 with knowledge base not found + structured audit log search scope rejected: unauthorized foreign-tenant KB
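
The 400 and 404 responses observed in the smoke test depend on the handler passing typed errors through rather than flattening them to 500. A rough sketch of that mapping, using errors.As as a stand-in for apperrors.IsAppError and a hypothetical `appError` type:

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// appError is a hypothetical typed error carrying the HTTP status it should
// produce; the repository's AppError type may look different.
type appError struct {
	Status  int
	Message string
}

func (e *appError) Error() string { return e.Message }

// statusFor returns the typed error's status when one is present anywhere in
// the error chain, and falls back to 500 for everything else.
func statusFor(err error) int {
	var ae *appError
	if errors.As(err, &ae) {
		return ae.Status
	}
	return http.StatusInternalServerError
}

// demoTyped wraps a typed not-found error the way a service layer might.
func demoTyped() int {
	notFound := &appError{Status: http.StatusNotFound, Message: "knowledge base not found"}
	return statusFor(fmt.Errorf("hybrid search: %w", notFound))
}

// demoUntyped passes an ordinary error with no typed cause.
func demoUntyped() int {
	return statusFor(errors.New("connection reset"))
}

func main() {
	fmt.Println("typed error status:", demoTyped())
	fmt.Println("untyped error status:", demoUntyped())
}
```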

@ochanism ochanism force-pushed the enhancement/1206-4 branch from f5be7e9 to efc9980 Compare May 18, 2026 14:05
Multi-KB hybrid search now groups KBs by their bound VectorStore (partition
key (storeID, owner_tenant_id)), retrieves in parallel via errgroup with a
SetLimit(4) cap and a per-group timeout (MULTI_STORE_RETRIEVE_TIMEOUT_SEC,
default 30s), and merges results. When the collected results span more than
one engine type, an EngineAwareNormalizer rescales vector scores to [0, 1];
keyword (BM25) scores pass through to the existing RRF fusion. Single-group
calls take the fast path with zero fan-out overhead, preserving today's
behavior for deployments where every KB has vector_store_id = NULL.

Embedding-model consistency is now enforced explicitly via
ResolveEmbeddingModelKeys. Multi-KB searches across KBs whose resolved
model identities differ return BadRequest instead of silently producing
incomparable scores. Cross-tenant Organization-shared KBs are preserved by
partitioning on KB.TenantID so the factory's ownership lookup runs against
the source tenant. Foreign-tenant KB UUIDs injected via the request body
are rejected via kbShareService.HasTenantKBPermission (Plan 3 of Tencent#1303,
3-D capped) before any retrieval; rejected scopes surface as 404 to avoid
leaking foreign KB existence.

Service-layer typed AppErrors (ErrVectorStoreBindingInvalid 2200 /
ErrVectorStoreUnavailable 2201) are mapped from PR2 sentinel hierarchy and
preserved end-to-end: the iterative FAQ path returns them rather than
swallowing, and the HybridSearch handler routes typed AppErrors to the
client unchanged instead of downgrading to 500.

Part of Tencent#993 (Phase 2: Per-KB VectorStore Binding).
Phase 2 roadmap item: PR 4 (Multi-store fan-out search).
Depends on Tencent#994, Tencent#1310, Tencent#1372.
@ochanism ochanism force-pushed the enhancement/1206-4 branch from efc9980 to 0e6f823 Compare May 18, 2026 14:10