feat: Send GenAI spans as V2 envelope items #6079

Draft
alexander-alderman-webb wants to merge 36 commits into master from webb/gen-ai-v2

github-actions bot commented Apr 15, 2026

Semver Impact of This PR

🟡 Minor (new features)

📋 Changelog Preview

This is how your changes will appear in the changelog.
Entries from this PR are highlighted with a left border (blockquote style).


New Features ✨

  • (ci) Cancel in-progress PR workflows on new commit push by joshuarli in #5994
  • Send GenAI spans as V2 envelope items by alexander-alderman-webb in #6079
  • Add db.driver.name spans to database integrations by ericapisani in #6082

Bug Fixes 🐛

  • (google_genai) Redact binary data in inline_data and fix multi-part message extraction by ericapisani in #5977
  • (grpc) Add isolation_scope to async server interceptor by robinvd in #5940
  • (profiler) Stop nulling buffer on teardown by ericapisani in #6075

Internal Changes 🔧

  • (celery) Remove unused NoOpMgr from utils by sentrivana in #6078
  • (pydantic-ai) Remove dead Model.request patch by alexander-alderman-webb in #5956
  • (tests) Replace deprecated enable_tracing with traces_sample_rate by sentrivana in #6077
  • Set explicit base-branch for codecov action by ericapisani in #5992

🤖 This preview updates automatically when you update the PR.

github-actions bot commented Apr 15, 2026

Codecov Results 📊

142 passed | Total: 142 | Pass Rate: 100% | Execution Time: 20.99s

📊 Comparison with Base Branch

✨ No test changes detected

All tests are passing successfully.

✅ Patch coverage is 81.40%. Project has 14170 uncovered lines.
✅ Project coverage is 34.06%. Comparing base (base) to head (head).

Files with missing lines (2)
| File | Patch % | Lines |
| --- | --- | --- |
| client.py | 58.97% | ⚠️ 272 Missing and 88 partials |
| consts.py | 99.43% | ⚠️ 2 Missing |
Coverage diff
@@            Coverage Diff             @@
##          main       #PR       +/-##
==========================================
+ Coverage    33.75%    34.06%    +0.31%
==========================================
  Files          190       190         —
  Lines        21365     21490      +125
  Branches      7068      7158       +90
==========================================
+ Hits          7211      7320      +109
- Misses       14154     14170       +16
- Partials       700       725       +25

Generated by Codecov Action

Comment thread sentry_sdk/client.py Outdated
Comment thread sentry_sdk/client.py Outdated
Comment thread sentry_sdk/client.py Outdated
Comment thread sentry_sdk/client.py
Comment thread sentry_sdk/client.py Outdated
Comment thread sentry_sdk/client.py Outdated
Comment thread sentry_sdk/client.py Outdated
Comment thread tests/integrations/google_genai/test_google_genai.py
Comment thread tests/integrations/litellm/test_litellm.py

@sentry-warden sentry-warden bot left a comment

Sorting key uses 'name' twice instead of 'name' and 'description' (tests/integrations/google_genai/test_google_genai.py:330)

The sorting lambda in test_generate_content_with_tools was changed from key=lambda t: (t.get("name", ""), t.get("description", "")) to key=lambda t: (t.get("name", ""), t.get("name", "")). This appears to be an accidental duplication error. While this may not break the test currently (since tool names are unique in this test), it defeats the purpose of the secondary sort key and could cause non-deterministic test ordering if tools have the same name but different descriptions.
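
As a minimal sketch of why the secondary key matters, the snippet below uses hypothetical tool dictionaries (they are not taken from the test). Because Python's sort is stable, the duplicated key leaves same-named tools in whatever order they arrived, while the original key orders them by description regardless of input order.

```python
# Hypothetical tool definitions; two share a name but differ in description.
tools = [
    {"name": "search", "description": "web search"},
    {"name": "search", "description": "code search"},
    {"name": "calc", "description": "arithmetic"},
]


def sort_with_duplicate_key(items):
    # Key as changed in the PR: the secondary element repeats the name.
    return sorted(items, key=lambda t: (t.get("name", ""), t.get("name", "")))


def sort_with_description_key(items):
    # Key before the change: name first, then description as the tie-breaker.
    return sorted(
        items, key=lambda t: (t.get("name", ""), t.get("description", ""))
    )


# With the duplicated key, ties keep their input order, so reversing the
# input changes the result.
dup_forward = sort_with_duplicate_key(tools)
dup_reversed = sort_with_duplicate_key(list(reversed(tools)))

# With the description key, the result is the same for either input order.
desc_forward = sort_with_description_key(tools)
desc_reversed = sort_with_description_key(list(reversed(tools)))
```

Restoring the description as the second tuple element makes the sorted output independent of the order in which the tools were emitted.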

Orphaned _meta after GenAI spans are split from transaction (tests/integrations/openai/test_openai.py:3758)

In test_openai_message_truncation, the test accesses event["_meta"]["spans"]["0"] to verify truncation metadata for the GenAI span. However, with the V2 envelope changes, GenAI spans are now split out of the transaction via _split_gen_ai_spans() in client.py and sent as separate envelope items. The _meta is generated during serialization (line 848) before the span split occurs (line 1104), leaving orphaned metadata that references a span no longer present in the transaction. The test may pass but validates stale metadata that doesn't correspond to any span in the actual transaction payload.
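
One way to surface this kind of mismatch in a test is to compare the indices under `_meta["spans"]` with the length of the `spans` list. The helper below is a hypothetical sketch, not part of the SDK, and the event dictionary is an assumed shape based only on the paths quoted in this comment.

```python
def orphaned_span_meta(event):
    """Return _meta span indices that have no matching entry in event["spans"].

    Hypothetical helper for illustration; assumes _meta["spans"] is keyed by
    the stringified index of each span, as in the path quoted above.
    """
    span_count = len(event.get("spans", []))
    meta_spans = event.get("_meta", {}).get("spans", {})
    return sorted(index for index in meta_spans if int(index) >= span_count)


# Assumed event shape after the only GenAI span was moved out of the
# transaction: the spans list is empty but its metadata is still present.
event = {
    "spans": [],
    "_meta": {
        "spans": {"0": {"data": {"gen_ai.request.messages": {"": {"len": 5}}}}}
    },
}

orphans = orphaned_span_meta(event)
```

An assertion that `orphans` is empty would fail here, which is the condition the comment describes.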

Identified by Warden find-bugs

Comment thread tests/integrations/huggingface_hub/test_huggingface_hub.py

@sentry-warden sentry-warden bot left a comment

Test assertions check orphaned _meta data after GenAI spans are extracted (tests/integrations/openai/test_openai.py:3756)

After GenAI spans are sent as separate V2 envelope items, the transaction's spans array no longer contains them. However, the test at lines 3757-3760 still asserts against event["_meta"]["spans"]["0"] which contains stale metadata referring to a span that's no longer in the transaction. The _meta path references span index "0" but if all spans were GenAI spans, the transaction's spans array will be empty while _meta["spans"]["0"] still exists from before the split.

Identified by Warden find-bugs

Comment thread tests/integrations/openai_agents/test_openai_agents.py Outdated
Comment thread tests/integrations/pydantic_ai/test_pydantic_ai.py
Comment thread tests/integrations/openai_agents/test_openai_agents.py
Comment thread tests/integrations/openai_agents/test_openai_agents.py

@sentry-warden sentry-warden bot left a comment

Test assumes first span has error status without validation (tests/integrations/langchain/test_langchain.py:940)

In test_span_status_error, the assertion assert spans[0]["status"] == "error" assumes the first span in the list is the one with the error. However, langchain integration can produce multiple spans (agent, chat, tool execution), and the order may not be deterministic. Unlike similar tests in other integrations (e.g., anthropic tests verify GEN_AI_SYSTEM attribute, pydantic_ai tests assert len(spans) == 1), this test doesn't validate it's examining the correct span. This could lead to flaky tests.
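
A less order-dependent assertion could select spans by a property instead of by position. The sketch below uses a hypothetical helper and made-up span dictionaries; the `op` values are illustrative and not taken from the langchain integration.

```python
def spans_with_status(spans, status):
    # Hypothetical helper: keep only the spans whose status matches.
    return [span for span in spans if span.get("status") == status]


# Illustrative captured spans; the error span is deliberately not first.
spans = [
    {"op": "gen_ai.invoke_agent", "status": "ok"},
    {"op": "gen_ai.chat", "status": "error"},
]

error_spans = spans_with_status(spans, "error")
```

A test can then assert on `len(error_spans)` and on the selected span's attributes, which holds regardless of the order in which spans were emitted.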

test_async_exception_handling patches wrong client (embeddings instead of completions) (tests/integrations/litellm/test_litellm.py:866)

In test_async_exception_handling, the mock patches client.embeddings._client._client but the test calls litellm.acompletion() which uses the completions endpoint. This causes the mock to not actually intercept the API call, making the test unreliable. The sync version test_exception_handling correctly patches client.completions._client._client.
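
The effect of patching the wrong endpoint can be reproduced with a small stand-in client. Everything in this sketch (`FakeClient`, `FakeEndpoint`, `complete`) is hypothetical and only mimics the shape of a client with separate `embeddings` and `completions` attributes; it is not litellm's or OpenAI's API.

```python
from unittest import mock


class FakeEndpoint:
    def send(self):
        return "real response"


class FakeClient:
    # Hypothetical client with two independent endpoints.
    def __init__(self):
        self.embeddings = FakeEndpoint()
        self.completions = FakeEndpoint()


def complete(client):
    # Stand-in for a completion call: it only touches the completions endpoint.
    return client.completions.send()


client = FakeClient()

# Patching the embeddings endpoint does not intercept a completion call.
with mock.patch.object(
    client.embeddings, "send", side_effect=RuntimeError("API error")
):
    unpatched_result = complete(client)

# Patching the completions endpoint does.
with mock.patch.object(
    client.completions, "send", side_effect=RuntimeError("API error")
):
    try:
        complete(client)
        raised = False
    except RuntimeError:
        raised = True
```

In the first block the call goes through untouched, so an exception-handling test built that way would never exercise the error path.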

Test accesses non-existent 'data' key instead of 'attributes' on capture_items span payload (tests/tracing/test_misc.py:628)

The test uses capture_items("span") which transforms span payloads to have an 'attributes' key (see conftest.py lines 361-367), but the test accesses spans[0]["data"] which doesn't exist. Other tests using capture_items("span") consistently access span["attributes"] (e.g., test_google_genai.py). This will cause the test to fail with a KeyError at runtime.

Identified by Warden find-bugs

Comment thread tests/integrations/openai_agents/test_openai_agents.py
Comment thread tests/integrations/langchain/test_langchain.py
Comment thread tests/tracing/test_misc.py Outdated
Comment thread tests/integrations/langchain/test_langchain.py
Comment thread tests/integrations/langchain/test_langchain.py

@sentry-warden sentry-warden bot left a comment

Test accesses orphaned _meta after gen_ai span is removed from transaction (tests/integrations/openai/test_openai.py:3758)

After gen_ai spans are split from the transaction and sent as V2 envelope items, the transaction's spans list no longer contains the gen_ai span. However, the test still accesses event["_meta"]["spans"]["0"]["data"] expecting truncation metadata. Since the span at index 0 has been moved to the V2 envelope, _meta["spans"]["0"] now references metadata for a span that no longer exists in the transaction's spans array. This test will likely fail or assert against orphaned/stale metadata.

Test expects V2 span envelope for non-gen_ai op span, will fail (tests/tracing/test_misc.py:618)

The test test_conversation_id_propagates_to_span_with_gen_ai_operation_name was modified to use capture_items("span") which captures V2 envelope span items. However, the span being created has op="http.client", and _split_gen_ai_spans() in client.py only splits spans where op starts with gen_ai.. This span will NOT be sent as a V2 envelope item - it will remain in the transaction event. The test will fail because spans list will be empty or not contain the expected span.
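
The partitioning rule described above can be sketched as follows. This is a hypothetical stand-in written from the description in this comment (split on an `op` prefix of `gen_ai.`), not the actual `_split_gen_ai_spans()` implementation, and the span dictionaries are illustrative.

```python
def split_by_op_prefix(spans, prefix="gen_ai."):
    """Partition spans into (matching, remaining) by their op prefix.

    Hypothetical sketch; the real SDK function may differ in signature
    and behavior.
    """
    matching, remaining = [], []
    for span in spans:
        op = span.get("op") or ""
        if op.startswith(prefix):
            matching.append(span)
        else:
            remaining.append(span)
    return matching, remaining


spans = [
    {"op": "http.client", "description": "GET /models"},
    {"op": "gen_ai.chat", "description": "chat model"},
]

v2_items, transaction_spans = split_by_op_prefix(spans)
```

Under this rule the `http.client` span stays in `transaction_spans`, so a test that only captures the split-out items would not see it.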

Identified by Warden find-bugs

Comment thread sentry_sdk/client.py
Comment thread tests/integrations/pydantic_ai/test_pydantic_ai.py Outdated
Comment thread tests/integrations/pydantic_ai/test_pydantic_ai.py
Comment thread tests/integrations/pydantic_ai/test_pydantic_ai.py

@sentry-warden sentry-warden bot left a comment

Test asserts against stale _meta path after GenAI spans are extracted to V2 envelope items (tests/integrations/langchain/test_langchain.py:1381)

Line 1381 asserts tx["_meta"]["spans"]["0"]["data"]["gen_ai.request.messages"][""]["len"] == 5 but with _experiments={"gen_ai_as_v2_spans": True} enabled (line 1313), the GenAI span is extracted from the transaction and sent as a separate envelope item. After extraction, the transaction's spans array no longer contains the GenAI span at index 0, making the _meta path invalid or pointing to a different span. This test will fail at runtime when the extracted spans leave behind mismatched _meta indices.

Identified by Warden find-bugs


@sentry-warden sentry-warden bot left a comment

Async exception handling test mocks wrong client endpoint (tests/integrations/litellm/test_litellm.py:878)

In test_async_exception_handling, the mock patches client.embeddings._client._client on lines 878-879, but the test calls litellm.acompletion(), which uses the completions endpoint, not embeddings. The sync version test_exception_handling correctly mocks client.completions._client._client. Because of this mismatch, the mock may not intercept the request, so the test can fail or pass without testing what it intends.

Test checks wrong key 'attributes' instead of 'data' for transaction context (tests/integrations/openai_agents/test_openai_agents.py:3592)

The test test_no_conversation_id_when_not_provided checks transaction["contexts"]["trace"].get("attributes", {}) at lines 3592-3594, but all other tests in this file check transaction["contexts"]["trace"]["data"] for transaction span attributes (see lines 3389 and 3528). This inconsistency means the test could pass even if gen_ai.conversation.id is incorrectly present in the data key of the transaction context.
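
A short sketch of how the absence check can pass vacuously, using an assumed transaction shape in which the conversation ID is (incorrectly) present under `data`. The dictionary and the `conv-123` value are made up for illustration.

```python
GEN_AI_CONVERSATION_ID = "gen_ai.conversation.id"

# Assumed transaction shape: the ID leaked into the trace context's data.
transaction = {
    "contexts": {"trace": {"data": {GEN_AI_CONVERSATION_ID: "conv-123"}}}
}
trace = transaction["contexts"]["trace"]

# Looking under "attributes" finds nothing because the key does not exist,
# so an absence check on it succeeds even though the ID is present.
absent_in_attributes = GEN_AI_CONVERSATION_ID not in trace.get("attributes", {})

# Looking under "data" detects the unwanted ID.
absent_in_data = GEN_AI_CONVERSATION_ID not in trace.get("data", {})
```

Checking the `data` key, as the other tests in the file reportedly do, would make the absence assertion meaningful.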

Identified by Warden find-bugs
