feat: Send GenAI spans as V2 envelope items #6079
20 issues
High
Typo 'spans' instead of 'span' causes test to capture no span items - `tests/integrations/openai_agents/test_openai_agents.py:528`
On line 528, capture_items("transaction", "spans") uses the incorrect item type "spans" instead of "span". The capture_items fixture filters items by item.type, which is "span" (singular). As a result, no span items will be captured, and line 537's filter item.type == "span" will return an empty list. The subsequent next() call on lines 538-540 will then raise StopIteration, causing the test to fail.
Also found at:
tests/integrations/openai_agents/test_openai_agents.py:1731
tests/integrations/openai_agents/test_openai_agents.py:1796
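A self-contained sketch of this failure mode (the items below are stand-ins for envelope items, not real SDK objects):

```python
# Self-contained illustration: filtering on the plural item type matches
# nothing, and next() over the empty result raises StopIteration.
from types import SimpleNamespace

items = [SimpleNamespace(type="transaction"), SimpleNamespace(type="span")]

assert [i for i in items if i.type == "spans"] == []     # plural: matches nothing
assert len([i for i in items if i.type == "span"]) == 1  # singular: matches

# next() over the empty filter raises StopIteration, which is what
# fails the test:
try:
    next(i for i in items if i.type == "spans")
except StopIteration:
    print("StopIteration")
```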
Test accesses wrong span format - transaction spans have 'op' not 'attributes.sentry.op' - `tests/integrations/pydantic_ai/test_pydantic_ai.py:830-838`
In test_message_history, spans are extracted from second_transaction["spans"] (line 830) but then filtered using s["attributes"].get("sentry.op", "") (line 832). Transaction-embedded spans use the legacy format with s["op"] and s["data"], not s["attributes"]. Because these spans have no attributes key, the direct s["attributes"] lookup will raise a KeyError, or, if guarded, the filter will find zero matches and the test assertions will pass vacuously.
Also found at:
tests/tracing/test_misc.py:628
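A minimal sketch of the two span shapes discussed above (field values are illustrative, not copied from the SDK):

```python
# Illustrative payloads: legacy transaction-embedded spans carry op/data,
# V2 span items carry attributes.
legacy_span = {"op": "gen_ai.chat", "data": {"gen_ai.system": "openai"}}
v2_span = {"attributes": {"sentry.op": "gen_ai.chat"}}

spans = [legacy_span]  # what second_transaction["spans"] holds

# Filtering on the V2 shape finds nothing in legacy spans (and direct
# s["attributes"] indexing would raise KeyError):
assert [s for s in spans if s.get("attributes", {}).get("sentry.op") == "gen_ai.chat"] == []

# The legacy shape must be filtered on its own keys:
assert [s for s in spans if s.get("op") == "gen_ai.chat"] == [legacy_span]
```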
Using unprocessed 'event' instead of 'event_opt' causes V2 spans to miss scope-applied attributes - `sentry_sdk/client.py:1134`
At line 1134, _serialized_v1_span_to_serialized_v2_span(span, event) uses the original event parameter instead of the processed event_opt. The _prepare_event method (line 1075) applies scope transformations that add/modify user info, release, environment, and other attributes. Since V2 spans are created from the unprocessed event, they will be missing or have stale values for user.id, user.name, user.email, sentry.release, sentry.environment, sentry.segment.name, and SDK metadata compared to the transaction itself.
Also found at:
sentry_sdk/client.py:158-160
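A minimal sketch of the hazard, using hypothetical stand-in functions (not the SDK's actual signatures): a prepare step enriches the event, so the span conversion must receive the prepared copy.

```python
# Stand-in for _prepare_event: enriches the event with scope/options data
# (release, environment, user, ...).
def prepare_event(event):
    prepared = dict(event)
    prepared.setdefault("release", "my-app@1.0.0")  # illustrative enrichment
    return prepared

# Stand-in for the span conversion: the V2 span inherits event-level
# fields as attributes.
def span_to_v2(span, event):
    return {"attributes": {"sentry.release": event.get("release")}}

event = {"spans": [{"op": "gen_ai.chat"}]}
event_opt = prepare_event(event)

# Passing the unprocessed event loses the enrichment:
assert span_to_v2(event["spans"][0], event)["attributes"]["sentry.release"] is None
# Passing event_opt carries it through:
assert span_to_v2(event["spans"][0], event_opt)["attributes"]["sentry.release"] == "my-app@1.0.0"
```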
Test uses incorrect key 'data' instead of 'attributes' for V2 span items - `tests/tracing/test_misc.py:628`
The test changed to use capture_items("span") which returns V2 span format with an attributes key, but the test still accesses spans[0]["data"]. The capture_items fixture (conftest.py lines 361-367) explicitly creates payloads with an attributes key for span items, not data. This will cause a KeyError: 'data' when the test runs.
Medium
Wrong event variable passed to span conversion - uses original event instead of prepared event - `sentry_sdk/client.py:1134`
On line 1134, event (the original function parameter) is passed to _serialized_v1_span_to_serialized_v2_span() instead of event_opt (the prepared/processed event). The _prepare_event() function populates release, environment, and sdk fields from options (lines 805-811 in client.py), and applies scope data. Since _serialized_v1_span_to_serialized_v2_span() extracts these values to populate span attributes (like sentry.release, sentry.environment, sentry.sdk.name), using the original event will result in missing or incomplete attributes on the converted GenAI spans.
Sort key uses 'name' twice instead of 'name' and 'description' - `tests/integrations/google_genai/test_google_genai.py:330`
The sorting lambda uses t.get("name", "") twice as the sort key tuple, but the comment says "sort by name and description for comparison". This appears to be a copy-paste error during refactoring. The second key should be t.get("description", "") to match the stated intent and ensure deterministic ordering when multiple tools have the same name.
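A short sketch of the intended deterministic sort (the tool dicts are illustrative):

```python
# Two tools share a name, so the tie-break key matters.
tools = [
    {"name": "search", "description": "web"},
    {"name": "search", "description": "docs"},
]

# Buggy key repeats the name, so equal-name tools keep their incoming
# (arbitrary) relative order:
buggy = sorted(tools, key=lambda t: (t.get("name", ""), t.get("name", "")))

# Fixed key tie-breaks on description, matching the test's comment:
fixed = sorted(tools, key=lambda t: (t.get("name", ""), t.get("description", "")))
assert [t["description"] for t in fixed] == ["docs", "web"]
```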
Test uses incorrect key 'attributes' instead of 'data' for inline_data - `tests/integrations/google_genai/test_google_genai.py:2153`
The test was changed to use attributes as the key for binary data in inline_data, but the Google GenAI SDK uses data. The transform_google_content_part function (sentry_sdk/ai/utils.py:286) accesses inline_data.get("data", ""), so this test now passes accidentally due to the code overwriting result["content"] with BLOB_DATA_SUBSTITUTE regardless of input. This means the test no longer validates correct handling of real Google GenAI inline_data dictionaries.
Also found at:
tests/integrations/pydantic_ai/test_pydantic_ai.py:490-496
Hardcoded SDK version will cause test failures on version bumps - `tests/integrations/huggingface_hub/test_huggingface_hub.py:523`
The test hardcodes "sentry.sdk.version": "2.58.0" instead of using mock.ANY like all other similar tests in this file and other test files. This will cause the test to fail when the SDK version is incremented, making this test brittle and requiring manual updates with each release.
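The usual pattern here: unittest.mock.ANY compares equal to any value, so the assertion stays green across version bumps (keys and values below are illustrative):

```python
# mock.ANY matches any value in an equality comparison.
from unittest import mock

attributes = {"sentry.sdk.name": "sentry.python", "sentry.sdk.version": "2.59.1"}

assert attributes == {
    "sentry.sdk.name": "sentry.python",
    "sentry.sdk.version": mock.ANY,  # stays green when the version changes
}
```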
Unused list comprehension results in dead code and no test assertions - `tests/integrations/langchain/test_langchain.py:1840-1844`
The list comprehension at lines 1840-1844 creates a list that is never assigned to a variable or used in any assertion. As a result, test_langchain_embeddings_error_handling effectively tests nothing after the error is raised: it only verifies that the ValueError is raised and makes no assertions about the captured data. Additionally, the capture_items call at line 1821 only captures 'transaction' and 'span' types, while the comprehension filters for item.type == 'event', which would never match anyway.
Test assertions silently skipped due to missing 'span' in capture_items types - `tests/integrations/litellm/test_litellm.py:945`
At line 945, capture_items("transaction") only captures transaction items, but later assertions (lines 1020-1023, outside the hunk) iterate over items filtering for item.type == "span". Since spans aren't captured, the spans list will be empty and the for-loop never executes, causing the test to silently pass without verifying any span attributes.
Also found at:
tests/integrations/litellm/test_litellm.py:1020-1023
...and 10 more
4 skills analyzed
| Skill | Findings | Duration | Cost |
|---|---|---|---|
| code-review | 9 | 39m 20s | $20.76 |
| find-bugs | 11 | 25m 8s | $28.01 |
| skill-scanner | 0 | 43m 20s | $7.01 |
| security-review | 0 | 35m 38s | $5.54 |
Duration: 143m 26s · Tokens: 39.2M in / 481.2k out · Cost: $61.42 (+extraction: $0.02, +merge: $0.01, +fix_gate: $0.05, +dedup: $0.02)