[BUG] Retrieve tool incorrectly filters out relevant results when using AWS Aurora PostgreSQL as backing vector database due to cosine distance being used #417
Description
Checks
- I have updated to the latest minor and patch version of Strands
- I have checked the documentation and this is not expected behavior
- I have searched ./issues and there are no duplicates of my issue
Strands Version
1.22.0
Tools Package Version
0.2.19
Tools used
- Retrieve
Python Version
3.12.11
Operating System
Mac OS X Tahoe 26.3.1
Installation Method
other
Steps to Reproduce
- Configure an Amazon Bedrock Knowledge Base using AWS Aurora PostgreSQL as the vector store.
- Ensure the vector store is configured to use the pgvector extension so the cosine distance operator (<=>) is supported.
- Initialize the retrieve tool using strands-agents==1.22.0 and strands-agents-tools==0.2.19.
- Perform a retrieval query with a high-relevance expected match and set a score threshold (e.g., score=0.4):
    results = agent.tool.retrieve(
        text="<query-with-known-exact-matches>",
        score=0.4
    )
- Observe that the exact or highly relevant matches (which have scores < 0.4 because they are distances) are filtered out, while poor matches (distances >= 0.4) are returned.
Expected Behavior
The retrieve tool should either automatically detect the scoring metric being used by the underlying vector database, or provide a parameter (e.g., score_metric="distance" | "similarity") to determine whether it should filter using <= or >=.
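The second option could look roughly like the sketch below. It assumes a hypothetical `score_metric` parameter and results represented as dicts with a `score` key. The helper name `filter_by_score` and its signature are illustrative only and are not part of the current tool.

```python
from typing import Any, Literal


def filter_by_score(
    results: list[dict[str, Any]],
    threshold: float,
    score_metric: Literal["similarity", "distance"] = "similarity",
) -> list[dict[str, Any]]:
    """Hypothetical metric-aware filter; not the existing retrieve.py code."""
    if score_metric == "distance":
        # Smaller is better: keep results at or below the threshold.
        return [r for r in results if r.get("score", float("inf")) <= threshold]
    if score_metric == "similarity":
        # Larger is better: keep results at or above the threshold.
        return [r for r in results if r.get("score", 0.0) >= threshold]
    raise ValueError(f"unknown score_metric: {score_metric!r}")


# Made-up scores, interpreted as cosine distances.
sample = [
    {"id": "exact", "score": 0.05},
    {"id": "close", "score": 0.30},
    {"id": "poor", "score": 0.85},
]

kept_as_distance = [r["id"] for r in filter_by_score(sample, 0.4, "distance")]
kept_as_similarity = [r["id"] for r in filter_by_score(sample, 0.4, "similarity")]
```

Defaulting to `"similarity"` would preserve today's behavior for existing callers, while `"distance"` flips the comparison for stores that return distances.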
Actual Behavior
The tool strictly applies score >= min_score, incorrectly dropping highly relevant documents when the Bedrock Knowledge Base returns distance metrics instead of similarity metrics.
Additional Context
The retrieve tool incorrectly filters out the most relevant search results when using Amazon Bedrock Knowledge Bases backed by an Aurora PostgreSQL database using the <=> (cosine distance) operator.
When performing a semantic search in PostgreSQL using pgvector, the <=> operator returns the cosine distance (1 - cosine_similarity). For this metric, a smaller score indicates a closer match (0 is an exact match).
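To make the relationship concrete, here is a small pure-Python sketch of cosine similarity and cosine distance. The vectors are made up for illustration, and the code mirrors the formula above rather than pgvector's implementation.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance as described above: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


query = [1.0, 0.0]
same_direction = [2.0, 0.0]   # same direction as the query
orthogonal = [0.0, 3.0]       # unrelated direction
opposite = [-1.0, 0.0]        # opposite direction

d_same = cosine_distance(query, same_direction)
d_orthogonal = cosine_distance(query, orthogonal)
d_opposite = cosine_distance(query, opposite)
```

The best match gets the smallest value, so a threshold on this metric has to be an upper bound, not a lower bound.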
However, the filter_results_by_score function in retrieve.py assumes that a higher score indicates higher relevance. It uses a greater-than-or-equal-to comparison (>= min_score), which effectively drops the most relevant results (scores close to 0) and only returns the least relevant results (scores above the threshold).
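The effect can be illustrated with a simplified stand-in for the comparison described above. This is not the actual source of retrieve.py. The function name `filter_results_by_score_sketch`, the result shape, and the sample scores are assumptions made for the example.

```python
from typing import Any


def filter_results_by_score_sketch(
    results: list[dict[str, Any]], min_score: float
) -> list[dict[str, Any]]:
    """Stand-in for the reported behavior: keep results with score >= min_score."""
    return [r for r in results if r.get("score", 0.0) >= min_score]


# Made-up results whose scores are cosine distances (smaller = more relevant).
distance_results = [
    {"doc": "exact match", "score": 0.02},
    {"doc": "related", "score": 0.35},
    {"doc": "unrelated", "score": 0.91},
]

kept = [r["doc"] for r in filter_results_by_score_sketch(distance_results, 0.4)]
dropped = [r["doc"] for r in distance_results if r["doc"] not in kept]
```

With a threshold of 0.4, only the least relevant entry survives and both relevant entries are dropped, which matches the behavior reported in this issue.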
Possible Solution
No response
Related Issues
No response