openai-agents tests

fb348bb
Draft

feat: Send GenAI spans as V2 envelope items #6079

@sentry/warden / warden: code-review completed Apr 17, 2026 in 39m 40s

9 issues

code-review: Found 9 issues (2 high, 7 medium)

High

Typo 'spans' instead of 'span' causes test to capture no span items - `tests/integrations/openai_agents/test_openai_agents.py:528`

On line 528, `capture_items("transaction", "spans")` passes the item type `"spans"` where `"span"` is expected. The `capture_items` fixture filters on `item.type`, which is `"span"` (singular), so no span items are captured. The filter `item.type == "span"` on line 537 therefore yields an empty list. The `next()` call on lines 538-540 then raises `StopIteration`, and the test fails.

Also found at:

  • tests/integrations/openai_agents/test_openai_agents.py:1731
  • tests/integrations/openai_agents/test_openai_agents.py:1796
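
A minimal, self-contained reproduction of the failure mode follows. It assumes the fixture keeps an item only when its type exactly equals one of the requested strings, as the finding describes. `Item` and `filter_items` are hypothetical stand-ins, not the real fixture or envelope classes.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List


@dataclass
class Item:
    # Hypothetical stand-in for a captured envelope item.
    type: str
    payload: Any


def filter_items(items: Iterable[Item], *types: str) -> List[Item]:
    # Keep only items whose type exactly equals one of the requested types,
    # mirroring how the finding describes capture_items.
    return [item for item in items if item.type in types]


envelope = [
    Item("transaction", {"transaction": "agent workflow"}),
    Item("span", {"name": "invoke_agent"}),
]

# Buggy call: "spans" never equals "span", so the span item is dropped.
captured = filter_items(envelope, "transaction", "spans")
spans = [item.payload for item in captured if item.type == "span"]
try:
    agent_span = next(s for s in spans if s["name"] == "invoke_agent")
except StopIteration:
    # In the real test nothing catches this, so StopIteration fails the test.
    agent_span = None

# Fixed call: request the singular item type.
captured_fixed = filter_items(envelope, "transaction", "span")
spans_fixed = [item.payload for item in captured_fixed if item.type == "span"]
agent_span_fixed = next(s for s in spans_fixed if s["name"] == "invoke_agent")
```
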
Test accesses wrong span format - transaction spans have 'op' not 'attributes.sentry.op' - `tests/integrations/pydantic_ai/test_pydantic_ai.py:830-838`

In `test_message_history`, spans are taken from `second_transaction["spans"]` (line 830) but filtered with `s["attributes"].get("sentry.op", "")` (line 832). Spans embedded in a transaction use the legacy format, with `s["op"]` and `s["data"]` rather than `s["attributes"]`. Because those spans have no `attributes` key, `s["attributes"]` raises `KeyError` on the first span it inspects. The filter only completes when the list is empty; in that case it matches nothing, and any assertions over the result pass vacuously.

Also found at:

  • tests/tracing/test_misc.py:628
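
The sketch below contrasts the two span shapes using hypothetical, simplified dictionaries; the field values are illustrative, not copied from the test. It shows two things about a filter written for V2 `attributes`:

- It raises `KeyError` on transaction-embedded spans.
- It silently matches nothing when given an empty list.

```python
# Hypothetical, simplified transaction-embedded (legacy) spans: "op" and "data".
legacy_spans = [
    {"op": "gen_ai.chat", "description": "chat test-model", "data": {}},
    {"op": "http.client", "description": "GET /", "data": {}},
]


def chat_spans_v2_filter(spans):
    # Written for V2 span items, which carry an "attributes" dict.
    return [s for s in spans if s["attributes"].get("sentry.op", "") == "gen_ai.chat"]


def chat_spans_legacy_filter(spans):
    # Matches the legacy format found in transaction["spans"].
    return [s for s in spans if s.get("op", "") == "gen_ai.chat"]


try:
    chat_spans_v2_filter(legacy_spans)
    v2_filter_error = None
except KeyError as exc:
    v2_filter_error = exc  # legacy spans have no "attributes" key

# On an empty list the V2 filter neither fails nor matches anything,
# so assertions looping over its result would never run.
empty_result = chat_spans_v2_filter([])

legacy_result = chat_spans_legacy_filter(legacy_spans)
```

Whether to filter on `op`, or to read the spans from V2 span items instead, depends on which format the test is meant to check.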

Medium

Wrong event variable passed to span conversion - uses original event instead of prepared event - `sentry_sdk/client.py:1134`

On line 1134, the original `event` parameter is passed to `_serialized_v1_span_to_serialized_v2_span()` instead of `event_opt`, the prepared event. `_prepare_event()` fills in `release`, `environment`, and `sdk` from the client options (lines 805-811 in client.py) and applies scope data. `_serialized_v1_span_to_serialized_v2_span()` reads these values to set span attributes such as `sentry.release`, `sentry.environment`, and `sentry.sdk.name`. Passing the original `event` can therefore leave those attributes missing or incomplete on the converted GenAI spans.
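
Here is a simplified model of the bug. `prepare_event` and `v1_span_to_v2` are hypothetical stand-ins for `_prepare_event()` and `_serialized_v1_span_to_serialized_v2_span()`, and the option values are illustrative. The sketch also assumes the prepared event is a separate object from the original, as it is here because of the deep copy.

```python
import copy
from typing import Any, Dict

Event = Dict[str, Any]

# Hypothetical client options.
OPTIONS = {"release": "my-app@1.0.0", "environment": "production"}


def prepare_event(event: Event) -> Event:
    # Stand-in for _prepare_event(): fills defaults from the options
    # on a deep copy, leaving the caller's dict untouched.
    prepared = copy.deepcopy(event)
    prepared.setdefault("release", OPTIONS["release"])
    prepared.setdefault("environment", OPTIONS["environment"])
    prepared.setdefault("sdk", {"name": "sentry.python"})
    return prepared


def v1_span_to_v2(span: Dict[str, Any], event: Event) -> Dict[str, Any]:
    # Stand-in for the V1 -> V2 converter: copies event-level metadata
    # into the span's attributes.
    attributes = dict(span.get("data", {}))
    attributes["sentry.op"] = span.get("op")
    attributes["sentry.release"] = event.get("release")
    attributes["sentry.environment"] = event.get("environment")
    attributes["sentry.sdk.name"] = event.get("sdk", {}).get("name")
    return {"name": span.get("description"), "attributes": attributes}


event = {"type": "transaction", "spans": [{"op": "gen_ai.chat", "description": "chat"}]}
event_opt = prepare_event(event)
span = event_opt["spans"][0]

from_event = v1_span_to_v2(span, event)          # what line 1134 reportedly does
from_event_opt = v1_span_to_v2(span, event_opt)  # what the finding recommends

print(from_event["attributes"]["sentry.release"])      # None
print(from_event_opt["attributes"]["sentry.release"])  # my-app@1.0.0
```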

Sort key uses 'name' twice instead of 'name' and 'description' - `tests/integrations/google_genai/test_google_genai.py:330`

The sort key lambda returns `(t.get("name", ""), t.get("name", ""))`, repeating the name, although the comment says "sort by name and description for comparison". This looks like a copy-paste error from the refactoring. The second element should be `t.get("description", "")`. That matches the stated intent and makes the result independent of input order when several tools share a name.
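
This quick check runs both keys on hypothetical tool dictionaries. Python's `sorted` is stable, so the duplicated key leaves same-named tools in their input order. As a result, two lists holding the same tools in a different order can still compare unequal after sorting.

```python
tools = [
    {"name": "search", "description": "web search"},
    {"name": "search", "description": "archive search"},
    {"name": "calculator", "description": "basic arithmetic"},
]

# Buggy key: the tuple repeats "name", so same-named tools keep input order.
buggy = sorted(tools, key=lambda t: (t.get("name", ""), t.get("name", "")))

# Fixed key: name first, description as the tie-breaker.
fixed = sorted(tools, key=lambda t: (t.get("name", ""), t.get("description", "")))

print([t["description"] for t in buggy])  # ['basic arithmetic', 'web search', 'archive search']
print([t["description"] for t in fixed])  # ['basic arithmetic', 'archive search', 'web search']
```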

Test uses incorrect key 'attributes' instead of 'data' for inline_data - `tests/integrations/google_genai/test_google_genai.py:2153`

The test now uses `attributes` instead of `data` as the key for the binary payload in `inline_data`, but the Google GenAI SDK uses `data`. `transform_google_content_part` (sentry_sdk/ai/utils.py:286) reads `inline_data.get("data", "")`. The test still passes only because the code overwrites `result["content"]` with `BLOB_DATA_SUBSTITUTE` regardless of the input. As a result, the test no longer checks how real Google GenAI `inline_data` dictionaries are handled.

Also found at:

  • tests/integrations/pydantic_ai/test_pydantic_ai.py:490-496
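
This sketch shows why the test cannot tell the two keys apart. `transform_inline_data_part` is a hypothetical simplification of the behavior the finding attributes to `transform_google_content_part`. Its output shape and the `BLOB_DATA_SUBSTITUTE` value are placeholders, not the SDK's real ones.

```python
# Placeholder value for this sketch; not the SDK's real constant.
BLOB_DATA_SUBSTITUTE = "[BLOB SUBSTITUTE]"


def transform_inline_data_part(part):
    # Hypothetical simplification: read inline_data["data"], then replace the
    # content with the substitute regardless of what was read.
    inline_data = part.get("inline_data", {})
    result = {
        "type": "blob",
        "mime_type": inline_data.get("mime_type", ""),
        "content": inline_data.get("data", ""),
    }
    result["content"] = BLOB_DATA_SUBSTITUTE
    return result


# Shape the Google GenAI SDK uses: binary payload under "data".
real_part = {"inline_data": {"mime_type": "image/png", "data": b"fake-png-bytes"}}
# Shape the refactored test uses: payload under "attributes".
test_part = {"inline_data": {"mime_type": "image/png", "attributes": b"fake-png-bytes"}}

# Both inputs produce identical output, so the test cannot detect the wrong key.
same_output = transform_inline_data_part(real_part) == transform_inline_data_part(test_part)
```
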
Hardcoded SDK version will cause test failures on version bumps - `tests/integrations/huggingface_hub/test_huggingface_hub.py:523`

The test hardcodes `"sentry.sdk.version": "2.58.0"` instead of using `mock.ANY`, as the other similar tests in this file and in other test files do. It will fail as soon as the SDK version is bumped, so it would need a manual update with every release.
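
`unittest.mock.ANY` compares equal to any value, which keeps version-dependent assertions stable across releases. The attribute dictionaries below, including the bumped version string, are illustrative.

```python
from unittest import mock

# Illustrative attributes as captured after a version bump.
captured = {"sentry.sdk.name": "sentry.python", "sentry.sdk.version": "2.59.0"}

# Brittle expectation: pinned to the release the test was written against.
pinned = {"sentry.sdk.name": "sentry.python", "sentry.sdk.version": "2.58.0"}

# Robust expectation: mock.ANY compares equal to any value.
flexible = {"sentry.sdk.name": "sentry.python", "sentry.sdk.version": mock.ANY}

print(captured == pinned)    # False
print(captured == flexible)  # True
```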

Unused list comprehension results in dead code and no test assertions - `tests/integrations/langchain/test_langchain.py:1840-1844`

The list comprehension at lines 1840-1844 is never assigned to a variable or used in an assertion. After the `ValueError` is raised, `test_langchain_embeddings_error_handling` therefore checks nothing: it verifies that the exception propagates but makes no assertions about the captured data. The comprehension also filters for `item.type == 'event'`, while the `capture_items` call at line 1821 only captures `'transaction'` and `'span'` items, so it could never match anything.
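
The buggy and fixed patterns are shown side by side below, with a hypothetical `Item` tuple standing in for the fixture's items. Which item type and fields the real test should assert on depends on its intent. Checking for a span with an error status is an assumption made here for illustration.

```python
from collections import namedtuple

Item = namedtuple("Item", ["type", "payload"])  # hypothetical captured item

# What a capture restricted to ("transaction", "span") could contain.
items = [
    Item("transaction", {"transaction": "embeddings"}),
    Item("span", {"name": "embeddings", "status": "error"}),
]

# Buggy pattern: the comprehension is evaluated and then discarded, so nothing
# is asserted. It also filters for "event", a type that was never captured.
[item.payload for item in items if item.type == "event"]

# Even if it had been bound to a name, it would be empty here.
assert [item.payload for item in items if item.type == "event"] == []

# Fixed pattern: bind the result and assert on it.
error_spans = [
    item.payload
    for item in items
    if item.type == "span" and item.payload.get("status") == "error"
]
assert len(error_spans) == 1
```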

Test assertions silently skipped due to missing 'span' in capture_items types - `tests/integrations/litellm/test_litellm.py:945`

At line 945, `capture_items("transaction")` captures only transaction items. The later assertions (lines 1020-1023, outside the hunk), however, iterate over `items` filtered for `item.type == "span"`. Since no spans are captured, the `spans` list is empty and the loop body never runs. The test therefore passes without checking any span attributes.

Also found at:

  • tests/integrations/litellm/test_litellm.py:1020-1023
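
Here is a minimal demonstration of the silent skip. `Item` and `capture` are hypothetical stand-ins for the fixture, and the attribute name is illustrative. Adding a non-empty check before the loop turns the skip into a failure.

```python
from collections import namedtuple

Item = namedtuple("Item", ["type", "payload"])  # hypothetical captured item


def capture(items, *types):
    # Stand-in for capture_items: keep only the requested item types.
    return [item for item in items if item.type in types]


envelope = [
    Item("transaction", {"transaction": "litellm completion"}),
    Item("span", {"attributes": {"gen_ai.system": "openai"}}),
]


def count_checked_spans(captured):
    # The pattern from the test: assertions live inside a loop over spans.
    spans = [item.payload for item in captured if item.type == "span"]
    checked = 0
    for span in spans:
        assert "gen_ai.system" in span["attributes"]
        checked += 1
    return checked


def check_spans_guarded(captured):
    spans = [item.payload for item in captured if item.type == "span"]
    assert spans, "no span items were captured"  # fail instead of skipping
    for span in spans:
        assert "gen_ai.system" in span["attributes"]


print(count_checked_spans(capture(envelope, "transaction")))          # 0
print(count_checked_spans(capture(envelope, "transaction", "span")))  # 1
```
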
Removed assertion weakens test coverage for concurrent transaction capture - `tests/integrations/openai_agents/test_openai_agents.py:2275`

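The original test `test_multiple_agents_asyncio` had an explicit `assert len(events) == 3` verifying that exactly three transactions were captured, and the refactoring removed it. If too few transactions are captured, unpacking now fails with a `ValueError` instead of a descriptive assertion error. Extra transactions are not silently ignored by plain unpacking: `a, b, c = iterable` raises `ValueError` for too many values as well, even when the right-hand side is a generator. They would only go unnoticed if the code took the first three by slicing or by repeated `next()` calls.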
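
This sketch of the unpacking behavior assumes the test unpacks a generator of transaction payloads directly into three names; the exact test code is not reproduced here. `transactions` is a hypothetical helper. In plain Python, iterable unpacking checks the count in both directions, so the practical difference between the two patterns is how clear the failure is.

```python
def transactions(count):
    # Hypothetical generator over captured transaction payloads.
    return ({"transaction": f"agent-{i}"} for i in range(count))


def unpack_three(count):
    # Pattern without a length check: rely on unpacking alone.
    try:
        first, second, third = transactions(count)
    except ValueError:
        return "ValueError"
    return "ok"


def unpack_three_checked(count):
    # Pattern with the explicit assertion restored.
    events = list(transactions(count))
    assert len(events) == 3, f"expected 3 transactions, got {len(events)}"
    first, second, third = events
    return first, second, third


for count in (2, 3, 4):
    print(count, unpack_three(count))
```

Restoring the explicit length assertion keeps the failure message descriptive whichever way the count is off.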


Duration: 39m 20s · Tokens: 14.5M in / 179.4k out · Cost: $20.80 (+extraction: $0.01, +merge: $0.01, +fix_gate: $0.03)

Annotations

Check failure on line 528 in tests/integrations/openai_agents/test_openai_agents.py

@sentry-warden sentry-warden / warden: code-review

Typo 'spans' instead of 'span' causes test to capture no span items

Check failure on line 1731 in tests/integrations/openai_agents/test_openai_agents.py

[YAP-E7Q] Typo 'spans' instead of 'span' causes test to capture no span items (additional location)

Check failure on line 1796 in tests/integrations/openai_agents/test_openai_agents.py

[YAP-E7Q] Typo 'spans' instead of 'span' causes test to capture no span items (additional location)

Check failure on line 838 in tests/integrations/pydantic_ai/test_pydantic_ai.py

Test accesses wrong span format - transaction spans have 'op' not 'attributes.sentry.op'

Check failure on line 628 in tests/tracing/test_misc.py

[2K8-RWQ] Test accesses wrong span format - transaction spans have 'op' not 'attributes.sentry.op' (additional location)

Check warning on line 1134 in sentry_sdk/client.py

Wrong event variable passed to span conversion - uses original event instead of prepared event

Check warning on line 330 in tests/integrations/google_genai/test_google_genai.py

Sort key uses 'name' twice instead of 'name' and 'description'

Check warning on line 2153 in tests/integrations/google_genai/test_google_genai.py

Test uses incorrect key 'attributes' instead of 'data' for inline_data

Check warning on line 496 in tests/integrations/pydantic_ai/test_pydantic_ai.py

[4J9-CBL] Test uses incorrect key 'attributes' instead of 'data' for inline_data (additional location)

Check warning on line 523 in tests/integrations/huggingface_hub/test_huggingface_hub.py

Hardcoded SDK version will cause test failures on version bumps

Check warning on line 1844 in tests/integrations/langchain/test_langchain.py

Unused list comprehension results in dead code and no test assertions

Check warning on line 945 in tests/integrations/litellm/test_litellm.py

Test assertions silently skipped due to missing 'span' in capture_items types

Check warning on line 1023 in tests/integrations/litellm/test_litellm.py

[6SA-XRM] Test assertions silently skipped due to missing 'span' in capture_items types (additional location)

Check warning on line 2275 in tests/integrations/openai_agents/test_openai_agents.py

Removed assertion weakens test coverage for concurrent transaction capture
