feat(search): fan-out KB retrieval across bound vector stores#1386
Open
ochanism wants to merge 1 commit into
Open
feat(search): fan-out KB retrieval across bound vector stores#1386ochanism wants to merge 1 commit into
ochanism wants to merge 1 commit into
Conversation
f5be7e9 to
efc9980
Compare
Multi-KB hybrid search now groups KBs by their bound VectorStore (partition key (storeID, owner_tenant_id)), retrieves in parallel via errgroup with a SetLimit(4) cap and a per-group timeout (MULTI_STORE_RETRIEVE_TIMEOUT_SEC, default 30s), and merges results. When the collected results span more than one engine type, an EngineAwareNormalizer rescales vector scores to [0, 1]; keyword (BM25) scores pass through to the existing RRF fusion. Single-group calls take the fast path with zero fan-out overhead, preserving today's behavior for deployments where every KB has vector_store_id = NULL. Embedding-model consistency is now enforced explicitly via ResolveEmbeddingModelKeys. Multi-KB searches across KBs whose resolved model identities differ return BadRequest instead of silently producing incomparable scores. Cross-tenant Organization-shared KBs are preserved by partitioning on KB.TenantID so the factory's ownership lookup runs against the source tenant. Foreign-tenant KB UUIDs injected via the request body are rejected via kbShareService.HasTenantKBPermission (Plan 3 of Tencent#1303, 3-D capped) before any retrieval; rejected scopes surface as 404 to avoid leaking foreign KB existence. Service-layer typed AppErrors (ErrVectorStoreBindingInvalid 2200 / ErrVectorStoreUnavailable 2201) are mapped from PR2 sentinel hierarchy and preserved end-to-end: the iterative FAQ path returns them rather than swallowing, and the HybridSearch handler routes typed AppErrors to the client unchanged instead of downgrading to 500. Part of Tencent#993 (Phase 2: Per-KB VectorStore Binding). Phase 2 roadmap item: PR 4 (Multi-store fan-out search). Depends on Tencent#994, Tencent#1310, Tencent#1372.
efc9980 to
0e6f823
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fan out multi-KB hybrid search across the VectorStore each knowledge base is bound to. KBs that share a store (or all share the env store) still take the existing single-engine path with zero overhead. When KBs span more than one store, retrieval runs in parallel via
errgroupwith a bounded fan-out and a per-group timeout, and per-engine vector scores are normalized to a common[0, 1]scale before fusion.Multi-KB search now also explicitly enforces embedding-model consistency and rejects scopes that include knowledge bases the caller is not entitled to read.
Context
Part of #993 — Phase 2 (Per-KB VectorStore Binding) of the multi-store roadmap.
Phase 2 PR sequence (this is PR 4 of 5):
vector_store_idcolumn + migrationDepends on #994, #1310, and #1372.
Changes at a glance
HybridSearchnow batch-loads every KB in scope, partitions them by(VectorStoreID, KB.TenantID), resolves each group's engine through the PR2 factory using the owning tenant for ownership lookup, and fans out retrieval viaerrgroupwithSetLimit(4)and a per-goroutinecontext.WithTimeout(default 30s, env knobMULTI_STORE_RETRIEVE_TIMEOUT_SEC).ScoreNormalizerinterface (internal/application/service/retriever/normalizer.go) plus anEngineAwareNormalizerimplementation rescales vector scores per engine type so that cross-engine merges produce a directly comparable ranked list. Keyword (BM25) scores pass through unchanged because the downstream RRF fusion is rank-based and immune to scale.ResolveEmbeddingModelKeys. Multi-KB searches whose KBs resolve to different model identities return 400 (ErrBadRequest) instead of silently producing meaningless cross-model scores.kbShareService.HasKBPermissioncheck. Otherwise the search returns 404 (ErrNotFound) — the existence of a foreign KB is never leaked back to the caller.InternalServerError.vector_store_id = NULL.File-by-file summary
Production
internal/application/service/retriever/normalizer.go(new)ScoreNormalizer(ctx, score, retrieverType, engineType)interface +EngineAwareNormalizerimplementing each documented per-engine score formula (Elasticsearch / ElasticFaiss / Milvus cosine →(s+1)/2; Postgres / Qdrant / Weaviate / SQLite / Infinity / TencentVectorDB / Doris identity), plusclamp01with NaN / ±Inf guards. The interface is invoked only onRetrieverType == Vector; the caller emits a deduplicated WARN for unknown engine types soNormalizeitself stays IO-free and panic-free even on a nil ctxinternal/application/service/knowledgebase_search_storegroup.go(new)storeGroupstruct (immutableBaseParams+ mutableTopK),resolveStoreGroups(partitions by(storeID, kb.TenantID), resolves engine via PR2 factory with the owning tenant),classifyFactoryError(sentinel → typed AppError with sanitized structured logs, no UUID in user-facing messages),validateSameEmbeddingModel(usesResolveEmbeddingModelKeysso cross-tenant shared KBs resolving to the same underlying model are correctly tolerated),authorizeKBAccess(per-KB share permission check before any fan-out or downstream retrieval)internal/application/service/knowledgebase_search_fanout.go(new)retrieveFromStoreswith single-group fast path,errgroup.SetLimit(4), per-goroutinecontext.WithTimeout, mixed-engine normalization (only when results span >1 engine type, single source of truth:RetrieveResult.RetrieverEngineType), andisParentCancellednarrowed tocontext.Canceledso parent-deadline expiry surfaces as a typed 2201 instead of leakingcontext.DeadlineExceeded. Failure policy is all-or-nothing; the first group error fails the whole search with a single genericErrVectorStoreUnavailablemessageinternal/application/service/knowledgebase_search.goHybridSearchrewired around the new helpers. Query embedding is pre-computed once before fan-out and propagated viaparams.QueryEmbeddingso per-groupbuildRetrievalParamsdoes not re-embed the same text N times.pickPrimaryreturnsnilon miss (nokbs[0]fallback) so a caller-supplied id that is not in the scope produces a clean 404 rather than silently pivoting to an unintended KBinternal/application/service/knowledgebase_search_faq.goapplyFAQPostProcessinganditerativeRetrieveWithDeduplicationnow operate on[]*storeGroupand return([]*IndexWithScore, error)so typed AppErrors raised inside the iterative fan-out path (per-group timeout, store binding invalid) surface to the user instead of being silently converted into a truncated chunk list. Each iteration mutates onlygroup.TopK; the immutableBaseParamsis rebuilt fresh per call byparamsWithTopK, leaving no aliasing for the parallel goroutinesinternal/handler/knowledgebase.goHybridSearchhandler wraps the service call withapperrors.IsAppErrorso typed 2200 / 2201 / 400 / 404 reach the client unchanged. Mirrors the pattern already inCreateKnowledgeBaseTests
internal/application/service/retriever/normalizer_test.go(new)clamp01, compile-time interface satisfaction assertioninternal/application/service/knowledgebase_search_fanout_test.go(new)retrieveFromStoresintegration tests built on top of the PR2 factory: empty groups, single-group fast path, multi-group parallel concat, mixed-engine score normalization (ES →[0, 1]while PG passthrough), keyword passthrough on mixed engine, one-group failure → typedErrVectorStoreUnavailable(no raw error leak), per-group timeout (MULTI_STORE_RETRIEVE_TIMEOUT_SEC=1) → 2201, iterative-FAQ pattern under-race(BaseParams TopK invariant preserved), iterative path propagates typed AppError,isParentCancelleddistinguishescontext.Canceledfromcontext.DeadlineExceeded. Also coversclassifyFactoryErrorsentinel → 2200 / 2201 / 2200 / passthrough with message sanitization,validateSameEmbeddingModel(single-KB no-op, same identity / different identity / wiki-only carve-out / cross-tenant same model / log-injection sanitization), andauthorizeKBAccess(same tenant pass / foreign with share / foreign without share → 404 / no user → 404 / share lookup error → 500 / empty kbs no-op)Backward compatibility
knowledge_base_idsin body)resolveStoreGroupsproduces a single env-store group, fast path takes the single engine'sRetrievedirectly. Zero fan-out, normalization, or timeout overheadresolveStoreGroupsproduces a single env-store group → single-group fast path. Same behavior as before this PRknowledge_base_idsactually live on different stores. Today the only such KBs would be those bound by clients that have started using #1372'svector_store_idfieldlen(kbs) <= 1)knowledge_base_idsMULTI_STORE_RETRIEVE_TIMEOUT_SECparams.QueryEmbeddingpre-computed by caller (e.g. chat pipeline)len(params.QueryEmbedding) > 0, so callers that already supply their own embedding are unaffectedBehavior change (intentional)
params.KnowledgeBaseIDs; downstream retrieval would find no matching data, but the response shape did not communicate that the scope was rejected. This PR runs an explicit per-KB permission check (kbShareService.HasKBPermission) and returns 404 — same status as if the KB did not exist, to avoid leaking foreign-tenant KB existence.Known limitations
GetKnowledgeBaseByIDsis tenant-agnostic at the repository layer. Authorization is enforced at the service layer viaauthorizeKBAccess. Pushing tenant scope into the repository (and leveraging theidx_knowledge_bases_tenant_vector_storecomposite index added by feat: add vector_store_id column to knowledge_bases for per-KB vector store binding #994 for that path) is tracked as a separate hardening PR.g.SetLimit(4)bounds per-request fan-out, but the shared gorm pool currently leavesMaxOpenConnsat the Go default (unlimited) for PostgreSQL. Operators should setSetMaxOpenConnsper their PostgreSQLmax_connectionsbudget; that change is out of scope for this PR.normalizer.go. A per-engineNormalizeScoremethod onRetrieveEngineServicewould distribute this cleanly but touches every engine implementation in the repository and is deferred to a follow-up.retrieveFromStorescall; subsequent iterations are still inside the same parent context but outside the explicit "retrieve" child span. A separate span will be added in a follow-up cleanup.Test plan
go build ./...go vet ./...go test -count=1 -race ./internal/application/service/retriever/...(normalizer table, NaN / Inf, nil-ctx, interface satisfaction)go test -count=1 -race ./internal/application/service/...(resolveStoreGroups partition matrix, classifyFactoryError sentinel translation, validateSameEmbeddingModel matrix, authorizeKBAccess matrix, retrieveFromStores fast path / multi-group / mixed-engine / one-group failure / per-group timeout / keyword passthrough / iterative pattern under -race, FAQ iterative typed-AppError propagation, isParentCancelled discrimination)go test -count=1 -race ./internal/handler/...go test -count=1 -race ./internal/types/... ./internal/application/repository/...(no regressions on prior PRs)group count: 1group count: 1group count: 2, results from both engines merged (server log showsengine: qdrant, retriever: vector|keywordsandengine: postgres, retriever: vector|keywordsco-firing)selected knowledge bases use different embedding modelsmessageknowledge_base_idsof a first-tenant search): 404 withknowledge base not found+ structured audit logsearch scope rejected: unauthorized foreign-tenant KB