feat: Send GenAI spans as V2 envelope items #6079
9 issues
code-review: Found 9 issues (2 high, 7 medium)
High
Typo 'spans' instead of 'span' causes test to capture no span items - `tests/integrations/openai_agents/test_openai_agents.py:528`
On line 528, `capture_items("transaction", "spans")` uses the incorrect item type `"spans"` instead of `"span"`. The `capture_items` fixture filters items by `item.type`, which is `"span"` (singular). As a result, no span items will be captured, and line 537's filter `item.type == "span"` will return an empty list. The subsequent `next()` call on lines 538-540 will raise a `StopIteration` exception, causing the test to fail.
Also found at:
tests/integrations/openai_agents/test_openai_agents.py:1731
tests/integrations/openai_agents/test_openai_agents.py:1796
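A minimal sketch of the mismatch (the `Item` class here is a stand-in for the envelope items the real `capture_items` fixture filters):

```python
from dataclasses import dataclass

@dataclass
class Item:
    type: str

items = [Item("transaction"), Item("span"), Item("span")]

# Buggy: the fixture matches item.type exactly, and the envelope item
# type is "span" (singular), so requesting "spans" captures nothing.
buggy = [i for i in items if i.type in ("transaction", "spans")]
assert [i.type for i in buggy] == ["transaction"]

# Fixed: request the singular "span" item type.
fixed = [i for i in items if i.type in ("transaction", "span")]
assert [i.type for i in fixed] == ["transaction", "span", "span"]
```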
Test accesses wrong span format - transaction spans have 'op' not 'attributes.sentry.op' - `tests/integrations/pydantic_ai/test_pydantic_ai.py:830-838`
In `test_message_history`, spans are extracted from `second_transaction["spans"]` (line 830) but then filtered using `s["attributes"].get("sentry.op", "")` (line 832). Transaction-embedded spans use the legacy format with `s["op"]` and `s["data"]`, not `s["attributes"]`. This inconsistency will cause the filter to find zero matches since the spans don't have an `attributes` key, making the test assertions pass vacuously or fail with a `KeyError`.
Also found at:
tests/tracing/test_misc.py:628
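A sketch of the two span shapes (the span payloads are illustrative, not real SDK output; `.get` is used in the buggy filter to show the vacuous match rather than the `KeyError` path):

```python
second_transaction = {
    "spans": [
        {"op": "gen_ai.invoke_agent", "data": {"gen_ai.system": "openai"}},
        {"op": "http.client", "data": {}},
    ]
}
spans = second_transaction["spans"]

# Buggy: legacy transaction-embedded spans have no "attributes" key,
# so this filter matches nothing and later assertions pass vacuously.
buggy = [
    s for s in spans
    if s.get("attributes", {}).get("sentry.op", "") == "gen_ai.invoke_agent"
]
assert buggy == []

# Fixed: read the legacy top-level "op" field ("data" holds the attributes).
fixed = [s for s in spans if s.get("op", "") == "gen_ai.invoke_agent"]
assert len(fixed) == 1
```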
Medium
Wrong event variable passed to span conversion - uses original event instead of prepared event - `sentry_sdk/client.py:1134`
On line 1134, `event` (the original function parameter) is passed to `_serialized_v1_span_to_serialized_v2_span()` instead of `event_opt` (the prepared/processed event). The `_prepare_event()` function populates `release`, `environment`, and `sdk` fields from options (lines 805-811 in client.py), and applies scope data. Since `_serialized_v1_span_to_serialized_v2_span()` extracts these values to populate span attributes (like `sentry.release`, `sentry.environment`, `sentry.sdk.name`), using the original `event` will result in missing or incomplete attributes on the converted GenAI spans.
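A sketch of the fix, with `convert_span` as a hypothetical stub standing in for `_serialized_v1_span_to_serialized_v2_span` (the real converter reads release/environment/sdk from whichever event it is handed):

```python
def convert_span(span, event):
    # Stub: populate span attributes from the event, as the real
    # converter does for sentry.release / sentry.environment / sdk.
    return {
        "sentry.release": event.get("release"),
        "sentry.environment": event.get("environment"),
        **span,
    }

event = {"spans": [{"op": "gen_ai.chat"}]}  # original capture_event() parameter
# _prepare_event() returns a new event with options and scope data applied:
event_opt = {**event, "release": "app@1.0.0", "environment": "production"}

# Buggy: converting against the unprepared event drops the attributes.
assert convert_span(event["spans"][0], event)["sentry.release"] is None

# Fixed: pass the prepared event_opt so the attributes are populated.
v2_span = convert_span(event_opt["spans"][0], event_opt)
assert v2_span["sentry.release"] == "app@1.0.0"
assert v2_span["sentry.environment"] == "production"
```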
Sort key uses 'name' twice instead of 'name' and 'description' - `tests/integrations/google_genai/test_google_genai.py:330`
The sorting lambda uses `t.get("name", "")` twice as the sort key tuple, but the comment says "sort by name and description for comparison". This appears to be a copy-paste error during refactoring. The second key should be `t.get("description", "")` to match the stated intent and ensure deterministic ordering when multiple tools have the same name.
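A sketch of the difference (tool dicts are illustrative; `sorted()` is stable, which is why the bug goes unnoticed until two tools share a name):

```python
tools = [
    {"name": "search", "description": "web search"},
    {"name": "search", "description": "code search"},
]

# Buggy: both tuple elements use "name", so same-named tools
# keep their (nondeterministic) input order.
buggy = sorted(tools, key=lambda t: (t.get("name", ""), t.get("name", "")))
assert [t["description"] for t in buggy] == ["web search", "code search"]

# Fixed: the second key is "description", matching the comment's intent.
fixed = sorted(tools, key=lambda t: (t.get("name", ""), t.get("description", "")))
assert [t["description"] for t in fixed] == ["code search", "web search"]
```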
Test uses incorrect key 'attributes' instead of 'data' for inline_data - `tests/integrations/google_genai/test_google_genai.py:2153`
The test was changed to use `attributes` as the key for binary data in `inline_data`, but the Google GenAI SDK uses `data`. The `transform_google_content_part` function (sentry_sdk/ai/utils.py:286) accesses `inline_data.get("data", "")`, so this test now passes accidentally due to the code overwriting `result["content"]` with `BLOB_DATA_SUBSTITUTE` regardless of input. This means the test no longer validates correct handling of real Google GenAI inline_data dictionaries.
Also found at:
tests/integrations/pydantic_ai/test_pydantic_ai.py:490-496
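A simplified sketch of why the wrong key still "passes" (`transform_part` and `BLOB_DATA_SUBSTITUTE` are stand-ins for the real helper and constant):

```python
BLOB_DATA_SUBSTITUTE = "[Blob data]"  # stand-in for the SDK constant

def transform_part(part):
    # Sketch of transform_google_content_part: it reads the SDK's "data"
    # key, but the substituted output never depends on the input value.
    part.get("inline_data", {}).get("data", "")
    return {"content": BLOB_DATA_SUBSTITUTE}

# A shape matching the real Google GenAI SDK exercises the "data" path:
real_part = {"inline_data": {"mime_type": "image/png", "data": b"\x89PNG"}}
assert transform_part(real_part) == {"content": BLOB_DATA_SUBSTITUTE}

# The wrong "attributes" key also passes, because the output is the
# substitute regardless. The test with that key validates nothing.
wrong_part = {"inline_data": {"attributes": b"\x89PNG"}}
assert transform_part(wrong_part) == {"content": BLOB_DATA_SUBSTITUTE}
```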
Hardcoded SDK version will cause test failures on version bumps - `tests/integrations/huggingface_hub/test_huggingface_hub.py:523`
The test hardcodes `"sentry.sdk.version": "2.58.0"` instead of using `mock.ANY` like all other similar tests in this file and other test files. This will cause the test to fail when the SDK version is incremented, making this test brittle and requiring manual updates with each release.
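The fix in sketch form (attribute values are illustrative; `mock.ANY` compares equal to any value, so the assertion survives release bumps):

```python
from unittest import mock

expected = {
    "sentry.sdk.name": "sentry.python",
    "sentry.sdk.version": mock.ANY,  # instead of hardcoding "2.58.0"
}

# Still matches after the SDK version is incremented:
actual = {"sentry.sdk.name": "sentry.python", "sentry.sdk.version": "2.59.0"}
assert actual == expected
```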
Unused list comprehension results in dead code and no test assertions - `tests/integrations/langchain/test_langchain.py:1840-1844`
The list comprehension at lines 1840-1844 creates a list that is never assigned to a variable or used for any assertion. This makes the test `test_langchain_embeddings_error_handling` effectively test nothing after the error is raised - it only verifies that the ValueError is raised, but makes no assertions about the captured data. Additionally, the `capture_items` call at line 1821 only captures `"transaction"` and `"span"` types, but the comprehension filters for `item.type == "event"`, which would never match anyway.
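A sketch of the dead-code pattern and a fix (the `Item` class and payloads are illustrative stand-ins for what the fixture yields):

```python
from dataclasses import dataclass

@dataclass
class Item:
    type: str
    payload: dict

# What a capture_items("transaction", "span") fixture could yield:
items = [
    Item("transaction", {}),
    Item("span", {"attributes": {"sentry.op": "gen_ai.embeddings"}}),
]

# Buggy: the list is built, never bound, and "event" never matches anyway.
[item for item in items if item.type == "event"]

# Fixed: bind the result, filter a type that is actually captured, assert.
spans = [item for item in items if item.type == "span"]
assert len(spans) == 1
assert spans[0].payload["attributes"]["sentry.op"] == "gen_ai.embeddings"
```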
Test assertions silently skipped due to missing 'span' in capture_items types - `tests/integrations/litellm/test_litellm.py:945`
At line 945, `capture_items("transaction")` only captures transaction items, but later assertions (lines 1020-1023, outside the hunk) iterate over `items` filtering for `item.type == "span"`. Since spans aren't captured, the `spans` list will be empty and the for-loop never executes, causing the test to silently pass without verifying any span attributes.
Also found at:
tests/integrations/litellm/test_litellm.py:1020-1023
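A sketch of the silent skip (`capture_items` here is a stand-in for the fixture; item tuples are illustrative):

```python
all_items = [("transaction", {}), ("span", {"gen_ai.system": "litellm"})]

def capture_items(*types):
    # Stand-in for the fixture: returns only the requested item types.
    return [(t, p) for (t, p) in all_items if t in types]

# Buggy: spans are never captured, so the assertion loop body never
# runs and the test passes without checking any span attributes.
items = capture_items("transaction")
spans = [i for i in items if i[0] == "span"]
for _type, payload in spans:
    raise AssertionError("unreachable: no spans were captured")
assert spans == []

# Fixed: also request "span" so the later assertions actually execute.
items = capture_items("transaction", "span")
spans = [i for i in items if i[0] == "span"]
assert len(spans) == 1
```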
Removed assertion weakens test coverage for concurrent transaction capture - `tests/integrations/openai_agents/test_openai_agents.py:2275`
The original test `test_multiple_agents_asyncio` had an explicit `assert len(events) == 3` to verify exactly 3 transactions were captured. This assertion was removed during refactoring. If fewer transactions are captured, unpacking will fail with a ValueError (not an assertion), and if more transactions are captured, extras are silently ignored due to generator unpacking semantics.
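A sketch of restoring the count check (event payloads are illustrative):

```python
# Transactions as a hypothetical test might capture them:
events = [{"transaction": f"agent_{i}"} for i in range(3)]

# Fixed: assert the exact count before unpacking, so a wrong number of
# captured transactions fails with a clear assertion message rather
# than an incidental unpacking error (or no error at all).
assert len(events) == 3, f"expected 3 transactions, got {len(events)}"
first, second, third = events
assert first["transaction"] == "agent_0"
```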
Duration: 39m 20s · Tokens: 14.5M in / 179.4k out · Cost: $20.80 (+extraction: $0.01, +merge: $0.01, +fix_gate: $0.03)
Annotations
Check failure on line 528 in tests/integrations/openai_agents/test_openai_agents.py
sentry-warden / warden: code-review
Typo 'spans' instead of 'span' causes test to capture no span items
On line 528, `capture_items("transaction", "spans")` uses the incorrect item type `"spans"` instead of `"span"`. The `capture_items` fixture filters items by `item.type`, which is `"span"` (singular). As a result, no span items will be captured, and line 537's filter `item.type == "span"` will return an empty list. The subsequent `next()` call on lines 538-540 will raise a `StopIteration` exception, causing the test to fail.
Check failure on line 1731 in tests/integrations/openai_agents/test_openai_agents.py
sentry-warden / warden: code-review
[YAP-E7Q] Typo 'spans' instead of 'span' causes test to capture no span items (additional location)
On line 528, `capture_items("transaction", "spans")` uses the incorrect item type `"spans"` instead of `"span"`. The `capture_items` fixture filters items by `item.type`, which is `"span"` (singular). As a result, no span items will be captured, and line 537's filter `item.type == "span"` will return an empty list. The subsequent `next()` call on lines 538-540 will raise a `StopIteration` exception, causing the test to fail.
Check failure on line 1796 in tests/integrations/openai_agents/test_openai_agents.py
sentry-warden / warden: code-review
[YAP-E7Q] Typo 'spans' instead of 'span' causes test to capture no span items (additional location)
On line 528, `capture_items("transaction", "spans")` uses the incorrect item type `"spans"` instead of `"span"`. The `capture_items` fixture filters items by `item.type`, which is `"span"` (singular). As a result, no span items will be captured, and line 537's filter `item.type == "span"` will return an empty list. The subsequent `next()` call on lines 538-540 will raise a `StopIteration` exception, causing the test to fail.
Check failure on line 838 in tests/integrations/pydantic_ai/test_pydantic_ai.py
sentry-warden / warden: code-review
Test accesses wrong span format - transaction spans have 'op' not 'attributes.sentry.op'
In `test_message_history`, spans are extracted from `second_transaction["spans"]` (line 830) but then filtered using `s["attributes"].get("sentry.op", "")` (line 832). Transaction-embedded spans use the legacy format with `s["op"]` and `s["data"]`, not `s["attributes"]`. This inconsistency will cause the filter to find zero matches since the spans don't have an `attributes` key, making the test assertions pass vacuously or fail with KeyError.
Check failure on line 628 in tests/tracing/test_misc.py
sentry-warden / warden: code-review
[2K8-RWQ] Test accesses wrong span format - transaction spans have 'op' not 'attributes.sentry.op' (additional location)
In `test_message_history`, spans are extracted from `second_transaction["spans"]` (line 830) but then filtered using `s["attributes"].get("sentry.op", "")` (line 832). Transaction-embedded spans use the legacy format with `s["op"]` and `s["data"]`, not `s["attributes"]`. This inconsistency will cause the filter to find zero matches since the spans don't have an `attributes` key, making the test assertions pass vacuously or fail with KeyError.
Check warning on line 1134 in sentry_sdk/client.py
sentry-warden / warden: code-review
Wrong event variable passed to span conversion - uses original event instead of prepared event
On line 1134, `event` (the original function parameter) is passed to `_serialized_v1_span_to_serialized_v2_span()` instead of `event_opt` (the prepared/processed event). The `_prepare_event()` function populates `release`, `environment`, and `sdk` fields from options (lines 805-811 in client.py), and applies scope data. Since `_serialized_v1_span_to_serialized_v2_span()` extracts these values to populate span attributes (like `sentry.release`, `sentry.environment`, `sentry.sdk.name`), using the original `event` will result in missing or incomplete attributes on the converted GenAI spans.
Check warning on line 330 in tests/integrations/google_genai/test_google_genai.py
sentry-warden / warden: code-review
Sort key uses 'name' twice instead of 'name' and 'description'
The sorting lambda uses `t.get("name", "")` twice as the sort key tuple, but the comment says "sort by name and description for comparison". This appears to be a copy-paste error during refactoring. The second key should be `t.get("description", "")` to match the stated intent and ensure deterministic ordering when multiple tools have the same name.
Check warning on line 2153 in tests/integrations/google_genai/test_google_genai.py
sentry-warden / warden: code-review
Test uses incorrect key 'attributes' instead of 'data' for inline_data
The test was changed to use `attributes` as the key for binary data in `inline_data`, but the Google GenAI SDK uses `data`. The `transform_google_content_part` function (sentry_sdk/ai/utils.py:286) accesses `inline_data.get("data", "")`, so this test now passes accidentally due to the code overwriting `result["content"]` with `BLOB_DATA_SUBSTITUTE` regardless of input. This means the test no longer validates correct handling of real Google GenAI inline_data dictionaries.
Check warning on line 496 in tests/integrations/pydantic_ai/test_pydantic_ai.py
sentry-warden / warden: code-review
[4J9-CBL] Test uses incorrect key 'attributes' instead of 'data' for inline_data (additional location)
The test was changed to use `attributes` as the key for binary data in `inline_data`, but the Google GenAI SDK uses `data`. The `transform_google_content_part` function (sentry_sdk/ai/utils.py:286) accesses `inline_data.get("data", "")`, so this test now passes accidentally due to the code overwriting `result["content"]` with `BLOB_DATA_SUBSTITUTE` regardless of input. This means the test no longer validates correct handling of real Google GenAI inline_data dictionaries.
Check warning on line 523 in tests/integrations/huggingface_hub/test_huggingface_hub.py
sentry-warden / warden: code-review
Hardcoded SDK version will cause test failures on version bumps
The test hardcodes `"sentry.sdk.version": "2.58.0"` instead of using `mock.ANY` like all other similar tests in this file and other test files. This will cause the test to fail when the SDK version is incremented, making this test brittle and requiring manual updates with each release.
Check warning on line 1844 in tests/integrations/langchain/test_langchain.py
sentry-warden / warden: code-review
Unused list comprehension results in dead code and no test assertions
The list comprehension at lines 1840-1844 creates a list that is never assigned to a variable or used for any assertion. This makes the test `test_langchain_embeddings_error_handling` effectively test nothing after the error is raised - it only verifies that the ValueError is raised, but makes no assertions about the captured data. Additionally, the `capture_items` call at line 1821 only captures 'transaction' and 'span' types, but the comprehension filters for `item.type == 'event'`, which would never match anyway.
Check warning on line 945 in tests/integrations/litellm/test_litellm.py
sentry-warden / warden: code-review
Test assertions silently skipped due to missing 'span' in capture_items types
At line 945, `capture_items("transaction")` only captures transaction items, but later assertions (lines 1020-1023, outside the hunk) iterate over `items` filtering for `item.type == "span"`. Since spans aren't captured, the `spans` list will be empty and the for-loop never executes, causing the test to silently pass without verifying any span attributes.
Check warning on line 1023 in tests/integrations/litellm/test_litellm.py
sentry-warden / warden: code-review
[6SA-XRM] Test assertions silently skipped due to missing 'span' in capture_items types (additional location)
At line 945, `capture_items("transaction")` only captures transaction items, but later assertions (lines 1020-1023, outside the hunk) iterate over `items` filtering for `item.type == "span"`. Since spans aren't captured, the `spans` list will be empty and the for-loop never executes, causing the test to silently pass without verifying any span attributes.
Check warning on line 2275 in tests/integrations/openai_agents/test_openai_agents.py
sentry-warden / warden: code-review
Removed assertion weakens test coverage for concurrent transaction capture
The original test `test_multiple_agents_asyncio` had an explicit `assert len(events) == 3` to verify exactly 3 transactions were captured. This assertion was removed during refactoring. If fewer transactions are captured, unpacking will fail with a ValueError (not an assertion), and if more transactions are captured, extras are silently ignored due to generator unpacking semantics.