fix(agent): resolve fabricated chunk citations #1325

Open
wolfkill wants to merge 1 commit into Tencent:main from wolfkill:fix/chunk-citation-fallback


@wolfkill
Contributor

Description

Fixes #1323.

Smart-reasoning final answers can sometimes cite fabricated chunk IDs such as <knowledge_id>_chunk_<index> instead of the UUID chunk IDs returned by knowledge_search / grep_chunks. The frontend then calls /api/v1/chunks/by-id/{chunk_id} and receives Chunk not found.

This PR keeps the normal UUID lookup path unchanged, then adds a compatibility fallback only when the direct chunk lookup misses:

  • Parse references only when they strictly match <uuid>_chunk_<index>.
  • Resolve them by knowledge_id + chunk_index, preferring text chunks when multiple chunk types share an index.
  • Preserve the existing permission flow: the handler still validates access through the resolved chunk's knowledge item before returning data.
  • Tighten the final-answer prompt so the model uses <kb doc="..." chunk_id="..."/> citations with the exact tool-returned chunk_id, and does not derive IDs from knowledge_id/chunk_index.
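
The strict parsing step above can be sketched as follows. This is a minimal illustration, not the code in this PR: the function name parseFabricatedChunkRef and the exact regular expression are assumptions, and the real implementation may validate the UUID differently.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// fabricatedRef matches exactly "<uuid>_chunk_<index>" and nothing else.
// The pattern is an assumption made for this sketch.
var fabricatedRef = regexp.MustCompile(
	`^([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})_chunk_(\d+)$`)

// parseFabricatedChunkRef (hypothetical name) splits a fabricated reference
// into its knowledge ID and chunk index. ok is false for any input that does
// not strictly match the expected shape.
func parseFabricatedChunkRef(id string) (knowledgeID string, index int, ok bool) {
	m := fabricatedRef.FindStringSubmatch(id)
	if m == nil {
		return "", 0, false
	}
	n, err := strconv.Atoi(m[2])
	if err != nil {
		// Digits only, but too large to fit in an int.
		return "", 0, false
	}
	return m[1], n, true
}

func main() {
	inputs := []string{
		"123e4567-e89b-12d3-a456-426614174000_chunk_7",
		"123e4567-e89b-12d3-a456-426614174000_chunk_x",
		"not-a-uuid_chunk_7",
	}
	for _, in := range inputs {
		kid, idx, ok := parseFabricatedChunkRef(in)
		fmt.Printf("%q -> knowledgeID=%q index=%d ok=%v\n", in, kid, idx, ok)
	}
}
```

Anchoring the pattern at both ends keeps the fallback narrow: anything that is not a UUID followed by a non-negative decimal index is rejected and falls through to the normal not-found response.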

Testing

  • CGO_LDFLAGS='-lc++' go test ./internal/application/service -run 'TestGetChunkByIDOnly(ResolvesFabricatedKnowledgeChunkReference|RejectsMalformedFabricatedReference)' -count=1
  • CGO_LDFLAGS='-lc++' go test ./internal/application/repository -run TestGetChunkByKnowledgeIDAndIndexOnlyPrefersTextChunk -count=1
  • CGO_LDFLAGS='-lc++' go test ./internal/application/repository ./internal/application/service ./internal/handler -run '^$' -count=1
  • CGO_LDFLAGS='-lc++' go test ./internal/agent -run '^$' -count=1
  • git diff --check
  • Docker-backed local HTTP validation with Postgres, Redis, DocReader, and a local go run ./cmd/server. Inserted a temporary KB/knowledge/chunk; verified that both the real UUID and <knowledge_id>_chunk_7 return the same chunk through /api/v1/chunks/by-id/{id}; verified that malformed fallback IDs still return 404; then cleaned up the temporary rows and restored the local test tenant API key.

Notes

No database migration is required. The fallback query is only used after direct UUID lookup returns not found.



Development

Successfully merging this pull request may close these issues.

Agent answer citations display "Chunk not found": the LLM fabricates chunk_id instead of using the tool-returned UUID
