feat: Send GenAI spans as V2 envelope items #6079
alexander-alderman-webb wants to merge 36 commits into master
Semver Impact of This PR
🟡 Minor (new features)
📋 Changelog Preview
This is how your changes will appear in the changelog.
New Features ✨
Bug Fixes 🐛
Internal Changes 🔧
🤖 This preview updates automatically when you update the PR.
Codecov Results 📊
✅ 142 passed | Total: 142 | Pass Rate: 100% | Execution Time: 20.99s
📊 Comparison with Base Branch
✨ No test changes detected. All tests are passing successfully.
✅ Patch coverage is 81.40%. Project has 14170 uncovered lines.
Files with missing lines (2)
Coverage diff
@@            Coverage Diff            @@
##             main     #PR     +/-   ##
==========================================
+ Coverage   33.75%  34.06%  +0.31%
==========================================
  Files         190     190       —
  Lines       21365   21490    +125
  Branches     7068    7158     +90
==========================================
+ Hits         7211    7320    +109
- Misses      14154   14170     +16
- Partials      700     725     +25
Generated by Codecov Action
Sorting key uses 'name' twice instead of 'name' and 'description' (tests/integrations/google_genai/test_google_genai.py:330)
The sorting lambda in test_generate_content_with_tools was changed from key=lambda t: (t.get("name", ""), t.get("description", "")) to key=lambda t: (t.get("name", ""), t.get("name", "")). This looks like an accidental duplication. It does not break the test today because tool names are unique in this test, but it defeats the purpose of the secondary sort key: tools that share a name but differ in description would be ordered by their input order rather than deterministically.
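To illustrate why the secondary key matters, here is a minimal sketch using hypothetical tool dicts (not the test's actual fixtures). Python's sort is stable, so with the duplicated key two tools that share a name simply keep their input order, whereas the original key produces the same result for any input order.

```python
# Hypothetical tool definitions; two share a name but differ in description.
tools_a = [
    {"name": "search", "description": "web search"},
    {"name": "search", "description": "code search"},
]
tools_b = list(reversed(tools_a))


def duplicated_key(t):
    # Key as it appears after the change: the second element adds nothing.
    return (t.get("name", ""), t.get("name", ""))


def intended_key(t):
    # Key as it was before the change: description breaks ties on name.
    return (t.get("name", ""), t.get("description", ""))


# Ties keep input order, so the result depends on how the tools arrived.
unstable_a = sorted(tools_a, key=duplicated_key)
unstable_b = sorted(tools_b, key=duplicated_key)

# With the tiebreaker, both input orders sort to the same list.
stable_a = sorted(tools_a, key=intended_key)
stable_b = sorted(tools_b, key=intended_key)
```

Restoring the description tiebreaker would keep the assertion independent of the order in which the integration serializes tools.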
Orphaned _meta after GenAI spans are split from transaction (tests/integrations/openai/test_openai.py:3758)
In test_openai_message_truncation, the test accesses event["_meta"]["spans"]["0"] to verify truncation metadata for the GenAI span. However, with the V2 envelope changes, GenAI spans are now split out of the transaction via _split_gen_ai_spans() in client.py and sent as separate envelope items. The _meta is generated during serialization (line 848) before the span split occurs (line 1104), leaving orphaned metadata that references a span no longer present in the transaction. The test may pass but validates stale metadata that doesn't correspond to any span in the actual transaction payload.
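The index mismatch can be sketched as follows. This is an illustration under stated assumptions: split_gen_ai_spans below is a hypothetical stand-in for the split step and does not reproduce the SDK's _split_gen_ai_spans(), and the event contents are invented for the example.

```python
def split_gen_ai_spans(event):
    """Hypothetical stand-in for the split step: removes spans whose op
    starts with "gen_ai." from the event and returns them separately."""
    kept, extracted = [], []
    for span in event.get("spans", []):
        if span.get("op", "").startswith("gen_ai."):
            extracted.append(span)
        else:
            kept.append(span)
    event["spans"] = kept
    return extracted


# Illustrative event: _meta was computed while the gen_ai span sat at index 0.
event = {
    "spans": [
        {"op": "gen_ai.chat", "data": {"gen_ai.request.messages": "[...]"}},
        {"op": "http.client", "data": {}},
    ],
    "_meta": {
        "spans": {"0": {"data": {"gen_ai.request.messages": {"": {"len": 5}}}}}
    },
}

gen_ai_spans = split_gen_ai_spans(event)

# The metadata entry survives the split, but index 0 is now a different span.
meta_for_index_0 = event["_meta"]["spans"]["0"]["data"]
span_at_index_0 = event["spans"][0]
meta_is_orphaned = not set(meta_for_index_0) <= set(span_at_index_0["data"])
```

If the split has to happen after serialization, one option would be to drop or re-index the matching _meta entries in the same step; whether that fits the SDK's design is for the author to decide.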
Identified by Warden find-bugs
Test assertions check orphaned _meta data after GenAI spans are extracted (tests/integrations/openai/test_openai.py:3756)
After GenAI spans are sent as separate V2 envelope items, the transaction's spans array no longer contains them. However, the test at lines 3757-3760 still asserts against event["_meta"]["spans"]["0"] which contains stale metadata referring to a span that's no longer in the transaction. The _meta path references span index "0" but if all spans were GenAI spans, the transaction's spans array will be empty while _meta["spans"]["0"] still exists from before the split.
Identified by Warden find-bugs
Test assumes first span has error status without validation (tests/integrations/langchain/test_langchain.py:940)
In test_span_status_error, the assertion assert spans[0]["status"] == "error" assumes the first span in the list is the one with the error. However, langchain integration can produce multiple spans (agent, chat, tool execution), and the order may not be deterministic. Unlike similar tests in other integrations (e.g., anthropic tests verify GEN_AI_SYSTEM attribute, pydantic_ai tests assert len(spans) == 1), this test doesn't validate it's examining the correct span. This could lead to flaky tests.
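One way to make the assertion independent of span order is to select spans by status instead of by position. The helper name and the span list below are hypothetical and only illustrate the idea; they are not taken from the test.

```python
def find_error_spans(spans):
    """Return every span whose status is "error", regardless of position."""
    return [s for s in spans if s.get("status") == "error"]


# Illustrative captured spans; the ops and their order are assumptions,
# not real output from the langchain integration.
spans = [
    {"op": "gen_ai.invoke_agent", "status": "ok"},
    {"op": "gen_ai.chat", "status": "error"},
    {"op": "gen_ai.execute_tool", "status": "ok"},
]

error_spans = find_error_spans(spans)
```

The test could then assert on the number of error spans and on the op of the one it expects, rather than on spans[0].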
test_async_exception_handling patches wrong client (embeddings instead of completions) (tests/integrations/litellm/test_litellm.py:866)
In test_async_exception_handling, the mock patches client.embeddings._client._client, but the test calls litellm.acompletion(), which uses the completions endpoint. The mock therefore never intercepts the API call, which makes the test unreliable. The sync version, test_exception_handling, correctly patches client.completions._client._client.
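The effect of patching the wrong attribute can be reproduced with plain unittest.mock. FakeTransport and FakeClient are hypothetical stand-ins written for this sketch; they are not litellm or OpenAI classes and only mirror the embeddings/completions split described above.

```python
from unittest import mock


class FakeTransport:
    """Hypothetical stand-in for an HTTP transport."""

    def send(self, request):
        return "real network call"


class FakeClient:
    """Hypothetical client with separate embeddings and completions paths."""

    def __init__(self):
        self.embeddings = FakeTransport()
        self.completions = FakeTransport()

    def complete(self, prompt):
        # The completion path only ever touches the completions transport.
        return self.completions.send(prompt)


client = FakeClient()

# Patching the embeddings transport leaves the completion path untouched.
with mock.patch.object(client.embeddings, "send", return_value="mocked") as wrong:
    result_wrong_patch = client.complete("hi")
    wrong_patch_called = wrong.called

# Patching the completions transport intercepts the call as intended.
with mock.patch.object(client.completions, "send", return_value="mocked") as right:
    result_right_patch = client.complete("hi")
    right_patch_called = right.called
```

Asserting that the mock was called (for example with assert_called_once()) at the end of the test would surface this kind of mismatch immediately.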
Test accesses non-existent 'data' key instead of 'attributes' on capture_items span payload (tests/tracing/test_misc.py:628)
The test uses capture_items("span") which transforms span payloads to have an 'attributes' key (see conftest.py lines 361-367), but the test accesses spans[0]["data"] which doesn't exist. Other tests using capture_items("span") consistently access span["attributes"] (e.g., test_google_genai.py). This will cause the test to fail with a KeyError at runtime.
Identified by Warden find-bugs
Test accesses orphaned _meta after gen_ai span is removed from transaction (tests/integrations/openai/test_openai.py:3758)
After gen_ai spans are split from the transaction and sent as V2 envelope items, the transaction's spans list no longer contains the gen_ai span. However, the test still accesses event["_meta"]["spans"]["0"]["data"] expecting truncation metadata. Since the span at index 0 has been moved to the V2 envelope, _meta["spans"]["0"] now references metadata for a span that no longer exists in the transaction's spans array. This test will likely fail or assert against orphaned/stale metadata.
Test expects V2 span envelope for non-gen_ai op span, will fail (tests/tracing/test_misc.py:618)
The test test_conversation_id_propagates_to_span_with_gen_ai_operation_name was modified to use capture_items("span") which captures V2 envelope span items. However, the span being created has op="http.client", and _split_gen_ai_spans() in client.py only splits spans where op starts with gen_ai.. This span will NOT be sent as a V2 envelope item - it will remain in the transaction event. The test will fail because spans list will be empty or not contain the expected span.
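The routing rule described here depends on the span's op, not on its attributes. A minimal sketch, assuming the prefix check is the only criterion: is_gen_ai_span is a hypothetical predicate and the span contents are illustrative values, not the test's data.

```python
def is_gen_ai_span(span):
    """Hypothetical predicate mirroring the rule described above: only
    spans whose op starts with "gen_ai." are split into V2 items."""
    return (span.get("op") or "").startswith("gen_ai.")


# A span that carries GenAI attributes but has a non-GenAI op.
http_span = {
    "op": "http.client",
    "data": {"gen_ai.operation.name": "chat", "gen_ai.conversation.id": "conv-1"},
}
# A span whose op matches the prefix.
chat_span = {"op": "gen_ai.chat", "data": {"gen_ai.conversation.id": "conv-1"}}

v2_items = [s for s in (http_span, chat_span) if is_gen_ai_span(s)]
remaining_in_transaction = [s for s in (http_span, chat_span) if not is_gen_ai_span(s)]
```

Under this rule the http.client span stays in the transaction event, so a test that only inspects captured span items would not see it.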
Identified by Warden find-bugs
Test asserts against stale _meta path after GenAI spans are extracted to V2 envelope items (tests/integrations/langchain/test_langchain.py:1381)
Line 1381 asserts tx["_meta"]["spans"]["0"]["data"]["gen_ai.request.messages"][""]["len"] == 5 but with _experiments={"gen_ai_as_v2_spans": True} enabled (line 1313), the GenAI span is extracted from the transaction and sent as a separate envelope item. After extraction, the transaction's spans array no longer contains the GenAI span at index 0, making the _meta path invalid or pointing to a different span. This test will fail at runtime when the extracted spans leave behind mismatched _meta indices.
Identified by Warden find-bugs
Async exception handling test mocks wrong client endpoint (tests/integrations/litellm/test_litellm.py:878)
In test_async_exception_handling, the mock patches client.embeddings._client._client on lines 878-879, but the test calls litellm.acompletion(), which uses the completions endpoint, not embeddings. The sync version, test_exception_handling, correctly mocks client.completions._client._client. Because of this mismatch the mock may not intercept the request, so the test may fail or pass without exercising the behavior it is meant to cover.
Test checks wrong key 'attributes' instead of 'data' for transaction context (tests/integrations/openai_agents/test_openai_agents.py:3592)
The test test_no_conversation_id_when_not_provided checks transaction["contexts"]["trace"].get("attributes", {}) at lines 3592-3594, but all other tests in this file check transaction["contexts"]["trace"]["data"] for transaction span attributes (see lines 3389 and 3528). This inconsistency means the test could pass even if gen_ai.conversation.id is incorrectly present in the data key of the transaction context.
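The vacuous pass can be shown with a small dict. The transaction below is an invented example in which the ID has leaked into the data key; it is not output from the integration.

```python
# Illustrative trace context in which the ID is (incorrectly) present in "data".
transaction = {
    "contexts": {"trace": {"data": {"gen_ai.conversation.id": "conv-123"}}}
}
trace = transaction["contexts"]["trace"]

# Check as written in the test: "attributes" is absent, so .get() falls back
# to {} and the "not in" check succeeds no matter what "data" contains.
passes_with_attributes_key = "gen_ai.conversation.id" not in trace.get("attributes", {})

# Check against the key the other tests use: this one detects the leak.
passes_with_data_key = "gen_ai.conversation.id" not in trace.get("data", {})
```

Checking the data key, as the tests around lines 3389 and 3528 do, would make the negative assertion meaningful.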
Identified by Warden find-bugs