[BUG] Retrieve tool incorrectly filters out relevant results when using AWS Aurora PostgreSQL as backing vector database due to cosine distance being used #417

@starJammer

Description

Checks

  • I have updated to the latest minor and patch version of Strands
  • I have checked the documentation and this is not expected behavior
  • I have searched ./issues and there are no duplicates of my issue

Strands Version

1.22.0

Tools Package Version

0.2.19

Tools used

  1. Retrieve

Python Version

3.12.11

Operating System

macOS Tahoe 26.3.1

Installation Method

other

Steps to Reproduce

  1. Configure an Amazon Bedrock Knowledge Base using AWS Aurora PostgreSQL as the vector store.
  2. Ensure the vector store uses the pgvector extension so that the cosine distance operator (<=>) is supported.
  3. Initialize the retrieve tool using strands-agents=1.22.0 and strands-agents-tools=0.2.19.
  4. Perform a retrieval query with a high-relevance expected match and set a score threshold (e.g., score=0.4):
results = agent.tool.retrieve(
    text="<query-with-known-exact-matches>",
    score=0.4
)
  5. Observe that exact or highly relevant matches (whose scores fall below 0.4 because they are distances) are filtered out, while poor matches (distances >= 0.4) are returned.

Expected Behavior

The retrieve tool should either automatically detect the scoring metric being used by the underlying vector database, or provide a parameter (e.g., score_metric="distance" | "similarity") to determine whether it should filter using <= or >=.
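To make the requested parameter concrete, here is a minimal sketch of a metric-aware filter. The function name `filter_results_by_metric`, the `score_metric` parameter, and the result shape (dicts with a `score` key) are assumptions for illustration only; none of them exist in the tool today.

```python
from typing import Any, Dict, List


def filter_results_by_metric(
    results: List[Dict[str, Any]],
    threshold: float,
    score_metric: str = "similarity",
) -> List[Dict[str, Any]]:
    """Hypothetical filter that respects the direction of the score.

    "similarity": higher is better, keep scores >= threshold.
    "distance":   lower is better, keep scores <= threshold.
    """
    if score_metric == "similarity":
        # A missing score defaults to 0.0, so it is dropped for any positive threshold.
        return [r for r in results if r.get("score", 0.0) >= threshold]
    if score_metric == "distance":
        # A missing score defaults to infinity, so it is always dropped.
        return [r for r in results if r.get("score", float("inf")) <= threshold]
    raise ValueError(f"Unknown score_metric: {score_metric!r}")


# Made-up scores for illustration.
sample = [{"text": "close match", "score": 0.05}, {"text": "weak match", "score": 0.7}]
kept_as_distance = filter_results_by_metric(sample, 0.4, score_metric="distance")
kept_as_similarity = filter_results_by_metric(sample, 0.4)
```

Defaulting to "similarity" would keep the current behavior for existing callers, while "distance" would be opt-in for backends that report distances.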

Actual Behavior

The tool strictly applies score >= min_score, incorrectly dropping highly relevant documents when the Bedrock Knowledge Base returns distance metrics instead of similarity metrics.

Additional Context

The retrieve tool incorrectly filters out the most relevant search results when using Amazon Bedrock Knowledge Bases backed by an Aurora PostgreSQL database using the <=> (cosine distance) operator.

When performing a semantic search in PostgreSQL using pgvector, the <=> operator returns the cosine distance (1 - cosine_similarity). For this metric, a smaller score indicates a closer match (0 is an exact match).
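As a quick illustration of the metric itself, the sketch below computes cosine distance in plain Python using the same definition (1 - cosine_similarity). The helper names and toy vectors are made up for this example and are not taken from pgvector or the retrieve tool.

```python
import math


def cosine_similarity(a, b):
    # Dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    # Same definition as above: 1 - cosine similarity.
    return 1.0 - cosine_similarity(a, b)


query = [1.0, 0.0]
same_direction = [2.0, 0.0]   # points the same way as the query
orthogonal = [0.0, 3.0]       # unrelated direction
opposite = [-1.0, 0.0]        # points the opposite way

d_same = cosine_distance(query, same_direction)   # 0.0
d_orth = cosine_distance(query, orthogonal)       # 1.0
d_opp = cosine_distance(query, opposite)          # 2.0
```

The distance ranges from 0 (same direction) to 2 (opposite direction), so the best matches sit at the low end of the scale.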

However, the filter_results_by_score function in retrieve.py assumes that a higher score indicates higher relevance. It uses a greater-than-or-equal-to comparison (>= min_score), which effectively drops the most relevant results (scores close to 0) and only returns the least relevant results (scores above the threshold).
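The effect can be reproduced without a Knowledge Base. The function below is a stand-in written from the description in this issue (keep results whose score is >= min_score); it is not copied from retrieve.py, and the result dicts and scores are made-up distance values.

```python
def filter_as_described(results, min_score):
    # Stand-in for the behavior described above, not the actual library source:
    # keep only results whose score is greater than or equal to min_score.
    return [r for r in results if r.get("score", 0.0) >= min_score]


# Hypothetical results where "score" is a cosine distance (lower is better).
results = [
    {"text": "exact match", "score": 0.02},
    {"text": "close match", "score": 0.15},
    {"text": "weak match", "score": 0.55},
    {"text": "unrelated", "score": 0.90},
]

kept = [r["text"] for r in filter_as_described(results, 0.4)]
print(kept)  # ['weak match', 'unrelated']
```

With distance scores, the two best matches are dropped and only the two worst survive the threshold.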

Possible Solution

No response

Related Issues

No response
