feat(knowledge-base): validate vector store bindings on create, copy, and delete by ochanism · Pull Request #1372 · Tencent/WeKnora

ochanism · 2026-05-18T03:16:08Z

Context

Part of #993 — Phase 2 (Per-KB VectorStore Binding) of the multi-store roadmap.

Phase 2 PR sequence (this is PR 3 of 5):

#	PR	Status
1	#994 — `vector_store_id` column + migration	MERGED
2	#1310 — KB-scoped retrieve-engine factory + 25-site refactor	MERGED
3	this PR — KB create/retrieve API + defensive logic	here
4	multi-store fan-out search	planned
5	KB create/detail UI	planned

Depends on #994 and #1310.

Changes at a glance

POST /knowledge-bases accepts an optional vector_store_id and validates it against the caller's tenant scope + the engine registry before persisting.
Two new typed error codes distinguish the validation outcomes:
- 2200 ErrVectorStoreBindingInvalid — UUID is malformed, does not exist, or belongs to another tenant (single message; intentionally no enumeration oracle).
- 2201 ErrVectorStoreUnavailable — store exists in DB but is not registered in the engine registry; check connection_config.
KB responses (POST / GET /:id / PUT / PUT /:id/pin) gain four read-only fields describing the bound store: vector_store_name, vector_store_source, vector_store_engine_type, vector_store_status. ListKnowledgeBases deliberately does not include these to avoid N+1 queries.
Cross-tenant shared-KB responses receive a suppressed payload: vector_store_id is stripped and metadata fields are set to source="shared". The owner tenant's store inventory cannot be correlated or enumerated.
POST /knowledge-bases/copy rejects mismatched clones synchronously, before enqueueing the async clone task:
- 400 if source.embedding_model_id != target.embedding_model_id
- 400 if source.vector_store_id != target.vector_store_id
DELETE /vector-stores/:id refuses to remove a store that still has bound knowledge bases. The check runs inside a transaction that row-locks the store on PostgreSQL (SELECT … FOR UPDATE) and serializes via WAL on SQLite. A panic in the in-memory registry unregister is recovered into a structured warning rather than silently leaking a stale engine.
vector_store_id is immutable post-creation. The GORM <-:create tag (from feat: add vector_store_id column to knowledge_bases for per-KB vector store binding #994) blocks every ORM update path; the service-layer UpdateKnowledgeBaseRequest DTO omits the field entirely; a reflection-based test catches any future regression in either layer.
Empty-string vector_store_id is normalized to nil at the create boundary and inside SharesStoreWith, so rows persisted by callers that did not run Normalize() first (raw-SQL writes, external migrations, ops scripts) still compare correctly.

File-by-file summary

Production

File	Change
`internal/types/knowledgebase.go`	`HasVectorStore()`, `Normalize()`, `SharesStoreWith()` helpers (all nil-safe)
`internal/types/vectorstore.go`	`StoreDisplay` DTO + `StoreSource*` constants (`env` / `user` / `shared` / `unavailable`) + display-payload constructors
`internal/types/interfaces/knowledgebase.go`	`CountByVectorStoreID(ctx, *gorm.DB, tenantID, storeID)` interface — accepts a handle so tx and non-tx callers share one implementation
`internal/types/interfaces/vectorstore.go`	`ResolveStoreView` + `BatchResolveStoreView` on `VectorStoreService`
`internal/application/repository/knowledgebase.go`	`CountByVectorStoreID` implementation; uses GORM's auto soft-delete scope (no explicit `deleted_at IS NULL` predicate)
`internal/application/service/retriever/factory.go`	`VerifyBinding(ctx, registry, ownership, tenantID, storeID)` — exposes the existing ownership + registry sentinel hierarchy so the KB create path can reuse the same source of truth
`internal/application/service/knowledgebase.go`	`validateVectorStoreBinding` (UUID parse → `VerifyBinding` → typed AppError translation); embedding-model + store defenses inside `CopyKnowledgeBase`; new-target clone now inherits `VectorStoreID`; `Update` godoc documents the immutability contract
`internal/application/service/vectorstore.go`	DI extended with `kbRepo` + `*gorm.DB` + cached `envStores`; `DeleteStore` transactional binding guard with PG row-lock; `unregisterSafely` panic-recovery; `ResolveStoreView` + `BatchResolveStoreView` implementations
`internal/container/container.go`	Startup assertion that `Dialector.Name()` is `"postgres"` or `"sqlite"` — guards against silent SQLite fallback under a future driver swap
`internal/errors/errors.go`	New error codes 2200 / 2201 + matching constructors
`internal/logger/logger.go`	`WarnWithFields(ctx, fields, msg)` helper + `Fields` alias
`internal/handler/knowledgebase.go`	`buildKBResponse` map-merge helper (avoids `KnowledgeBase.MarshalJSON` swallowing the new fields); `resolveKBStoreView` (env / shared / DB / unavailable branches); response builder applied to `Create` / `Get` / `Update` / `TogglePin`; `Copy` pre-flight (same checks as the service-layer worker entry); typed-AppError unwrap on `Create` so 2200 / 2201 reach the client

API docs

File	Change
`docs/api/knowledge-base.md`	`vector_store_id` request field; new response fields + meanings table; `POST /copy` synchronous pre-flight section; new error code rows; explicit note that `GET /knowledge-bases` (list) does NOT include resolved store metadata
`docs/api/vector-store.md`	`DELETE /:id` binding-guard section with example 400 body; updated error-code table; env-store section addendum

Tests

File	Change
`internal/types/knowledgebase_test.go`	`HasVectorStore`, `Normalize`, `SharesStoreWith` matrices including the empty-string-vs-nil normalization
`internal/application/repository/knowledgebase_sqlite_test.go`	`TestCountByVectorStoreID` against in-memory sqlite — tenant scope, soft-delete auto-scope, empty-string row exclusion, shared-tx idempotency
`internal/application/service/retriever/factory_test.go`	`TestVerifyBinding` — ownership infra error, not owned, registry miss, success, cross-tenant
`internal/application/service/knowledgebase_pr3_test.go` (new)	Create validation matrix (UUID parse, normalization, sentinel wrap, error code distinction) + CopyKB defenses including round-trip assertion that the new KB inherits `VectorStoreID`
`internal/application/service/vectorstore_test.go`	DeleteStore guard against real sqlite tx (success, bound count, soft-deleted ignored, cross-tenant ignored, rollback, panic recovery) + ResolveStoreView / BatchResolveStoreView matrices
`internal/handler/knowledgebase_request_test.go` (new)	Reflection assertion that neither `UpdateKnowledgeBaseRequest` nor `KnowledgeBaseConfig` exposes `VectorStoreID` (structural enforcement of the service-layer immutability tier)
`internal/handler/knowledgebase_copy_preflight_test.go` (new)	Synchronous 400 for mismatched embedding model / vector store; user-facing message must not contain internal roadmap labels
`internal/handler/knowledgebase_pr3_response_test.go` (new)	Typed error code 2200 / 2201 preserved at the HTTP boundary; raw infra errors still fall through to 500; shared-KB response strips `vector_store_id` while owner response retains it

Backward compatibility

A pre-existing client that does not send vector_store_id sees zero behavior change.

Surface	Compatibility analysis
KB create without `vector_store_id`	JSON unmarshal → `kb.VectorStoreID = nil` → `Normalize()` is a no-op → `HasVectorStore()` is false → validation skipped → DB row stores NULL → retrieve-engine factory falls back to the tenant's effective engines (env store), same as before this PR
KB create with `vector_store_id = ""`	Normalized to `nil` and treated identically to "not sent" (matches the factory's nil/empty pre-condition from #1310)
KB update	DTO does not accept `vector_store_id`; a client that round-trips a full KB object (GET → modify → PUT) is silently ignored on this field — round-trip pattern is not broken
KB response shape	Four new string fields are added; clients that ignore unknown JSON fields (the REST norm and the WeKnora frontend's pattern) are unaffected. Strict-schema clients should treat these as additive
`POST /knowledge-bases/copy`	Same embedding model + same store → 100% identical behavior. Different embedding model used to be a silent corruption path (mixed vector spaces); now a 400. Different vector store used to land on a shared physical index implicitly; now an explicit 400 with a message pointing at the not-yet-supported migration capability
`DELETE /vector-stores/:id`	Guard is a no-op when there are no bound KBs. At merge time of this PR every KB has `vector_store_id IS NULL` (no UI yet binds a store), so the guard is effectively dormant until clients start opting in
`KBDeletePayload` / `IndexDeletePayload` in Asynq	`VectorStoreID *string` was added with `omitempty` in #1310; tasks enqueued before that upgrade decode it as `nil` and transparently fall back to the pre-serialized effective engines — same here
Migrations	None. The column and the composite index were added by #994

Behavior change (intentional)

The current CopyKnowledgeBase silently allows cloning between KBs with different embedding models, producing vectors that are incomparable across the resulting KBs. This PR turns that case into a 400 with a clear message. Cross-store clones are 400'd for the same reason — cross-store data migration is a separate piece of work tracked in the parent issue.

Known limitations

Create-Delete race — a KB can be created against a store that is simultaneously being deleted. The factory rejects subsequent searches with ErrVectorStoreForbidden / NotFound, and the response view surfaces vector_store_status: "unavailable" so the UI can guide recovery. Closing the window fully would require SELECT … FOR SHARE on the store row during every CreateKB, which is disproportionate cost for a corner case.
Multi-replica registry staleness — UnregisterByStoreID only updates the in-memory registry of the deleting process. Sibling replicas continue to serve the engine until process restart. Documented; cluster-wide invalidation is a separate hardening item.
ListKnowledgeBases does not enrich store metadata — N+1 avoidance. The detail endpoint covers the upcoming UI; a batch resolver (BatchResolveStoreView) is shipped on the service interface for the follow-up UI PR to wire up.
KBDeletePayload / IndexDeletePayload are unsigned — pre-existing risk inherited from refactor(retriever): introduce factory for KB-scoped retrieve engine resolution #1310. The factory's ownership chain (payload.TenantID ↔ store's tenant_id) prevents cross-tenant delete even with a tampered payload. Adding HMAC is a separate hardening PR.

Test plan

go build ./...
go vet ./...
go test -race ./internal/application/service/retriever/... (sentinel matrix + parallel invocation)
go test -race ./internal/types/... (nil-safe helpers + JSON shape regressions)
go test ./internal/application/repository/... (CountByVectorStoreID + GORM <-:create immutability)
go test ./internal/application/service/... (CreateKB validation matrix, CopyKB defenses, DeleteStore guard against sqlite, ResolveStoreView / BatchResolveStoreView)
go test ./internal/handler/... (typed error code preservation, shared-KB UUID suppression, copy pre-flight, reflection-based immutability)
Manual smoke against PostgreSQL: create / get / delete flow with and without vector_store_id
Manual smoke against the SQLite Lite edition

… and delete Wires KnowledgeBase.VectorStoreID and the ownership-aware retrieve factory into the user-facing knowledge-base lifecycle: - POST /knowledge-bases validates the requested vector_store_id against the caller's tenant scope and the engine registry. New error codes ErrVectorStoreBindingInvalid (2200) and ErrVectorStoreUnavailable (2201) distinguish the typed branches without echoing UUIDs to the client. - GET / POST / PUT / PUT-pin responses embed the bound store's display metadata (name, source, engine_type, status) without exposing any connection credentials. Cross-tenant shared KBs receive a suppressed payload (vector_store_id stripped, source="shared") so operator-chosen store names cannot be enumerated across tenants. - POST /knowledge-bases/copy synchronously rejects clones whose target has a different embedding model or vector store, before the async clone task is enqueued. The async clone worker re-applies the same checks for defense in depth. - DELETE /vector-stores/:id refuses to remove a store with bound KBs, inside a transaction that row-locks the store on PostgreSQL and serializes via WAL on SQLite. unregister-from-registry is wrapped in defer/recover so a panic surfaces as a structured warning instead of silently leaking a stale engine. - vector_store_id is immutable after creation. The GORM <-:create tag blocks every ORM update path; the service-layer DTO omits the field entirely; a reflection-based regression test catches any future maintainer who adds it back to either layer. - Empty-string vector_store_id is normalized to nil at both the create path and inside SharesStoreWith, so rows persisted by callers that did not run Normalize first cannot trip false same-store comparisons. Part of Tencent#993. Depends on Tencent#994 and Tencent#1310.

Multi-KB hybrid search now groups KBs by their bound VectorStore (partition key (storeID, owner_tenant_id)), retrieves in parallel via errgroup with a SetLimit(4) cap and a per-group timeout (MULTI_STORE_RETRIEVE_TIMEOUT_SEC, default 30s), and merges results. When the collected results span more than one engine type, an EngineAwareNormalizer rescales vector scores to [0, 1]; keyword (BM25) scores pass through to the existing RRF fusion. Single-group calls take the fast path with zero fan-out overhead, preserving today's behavior for deployments where every KB has vector_store_id = NULL. Embedding-model consistency is now enforced explicitly via ResolveEmbeddingModelKeys. Multi-KB searches across KBs whose resolved model identities differ return BadRequest instead of silently producing incomparable scores. Cross-tenant Organization-shared KBs are preserved by partitioning on KB.TenantID so the factory's ownership lookup runs against the source tenant. Service-layer typed AppErrors (ErrVectorStoreBindingInvalid 2200 / ErrVectorStoreUnavailable 2201) are mapped from PR2 sentinel hierarchy and preserved end-to-end: the iterative FAQ path returns them rather than swallowing, and the HybridSearch handler routes typed AppErrors to the client unchanged instead of downgrading to 500. Part of Tencent#993 (Phase 2: Per-KB VectorStore Binding). Phase 2 roadmap item: PR 4 (Multi-store fan-out search). Depends on Tencent#994, Tencent#1310, Tencent#1372.

Multi-KB hybrid search now groups KBs by their bound VectorStore (partition key (storeID, owner_tenant_id)), retrieves in parallel via errgroup with a SetLimit(4) cap and a per-group timeout (MULTI_STORE_RETRIEVE_TIMEOUT_SEC, default 30s), and merges results. When the collected results span more than one engine type, an EngineAwareNormalizer rescales vector scores to [0, 1]; keyword (BM25) scores pass through to the existing RRF fusion. Single-group calls take the fast path with zero fan-out overhead, preserving today's behavior for deployments where every KB has vector_store_id = NULL. Embedding-model consistency is now enforced explicitly via ResolveEmbeddingModelKeys. Multi-KB searches across KBs whose resolved model identities differ return BadRequest instead of silently producing incomparable scores. Cross-tenant Organization-shared KBs are preserved by partitioning on KB.TenantID so the factory's ownership lookup runs against the source tenant. Foreign-tenant KB UUIDs injected via the request body are rejected via kbShareService.HasTenantKBPermission (Plan 3 of Tencent#1303, 3-D capped) before any retrieval; rejected scopes surface as 404 to avoid leaking foreign KB existence. Service-layer typed AppErrors (ErrVectorStoreBindingInvalid 2200 / ErrVectorStoreUnavailable 2201) are mapped from PR2 sentinel hierarchy and preserved end-to-end: the iterative FAQ path returns them rather than swallowing, and the HybridSearch handler routes typed AppErrors to the client unchanged instead of downgrading to 500. Part of Tencent#993 (Phase 2: Per-KB VectorStore Binding). Phase 2 roadmap item: PR 4 (Multi-store fan-out search). Depends on Tencent#994, Tencent#1310, Tencent#1372.

ochanism force-pushed the enhancement/1206-3 branch from ce62712 to a2474ce Compare May 18, 2026 04:30

lyingbug merged commit 0e8de61 into Tencent:main May 18, 2026

ochanism mentioned this pull request May 18, 2026

feat(search): fan-out KB retrieval across bound vector stores #1386

Open

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(knowledge-base): validate vector store bindings on create, copy, and delete#1372

feat(knowledge-base): validate vector store bindings on create, copy, and delete#1372
lyingbug merged 1 commit into
Tencent:mainfrom
ochanism:enhancement/1206-3

ochanism commented May 18, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ochanism commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Context

Changes at a glance

File-by-file summary

Production

API docs

Tests

Backward compatibility

Behavior change (intentional)

Known limitations

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ochanism commented May 18, 2026 •

edited

Loading