Fix masked_scatter decomposition to resolve OOM error in gemma3 multimodal models#4315

Draft
sonalibaskaran2499 wants to merge 1 commit into main from masked_scatter_decomp_fix

Conversation

@sonalibaskaran2499
Contributor

Ticket

Link to Github Issue

Problem description

Gemma 3 multimodal variants fail with the following OOM error:
Out of Memory: Not enough space to allocate 2904555520 B DRAM buffer across 12 banks, where each bank needs to store 242049024 B, but bank size is 1071821792 B

What's changed

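Resolved the OOM error by adding a fast path to the masked_scatter decomposition for the common case where the mask is row-constant, i.e. every element along the last (feature) dimension of a token shares the same mask value. The fast path detects this case via mask.stride(-1) == 0. Instead of running cumsum over the full flattened B×S×H tensor (709k elements), it runs cumsum only over the B×S token dimension (277 elements) and then gathers entire feature rows at once.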
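
For reviewers, here is a minimal sketch of the row-constant fast path described above. It is not the code in this PR. The function name `masked_scatter_rowwise`, the (B, S, H) input layout, and the assumption that `source` is non-empty with an element count divisible by H are all illustrative choices; only `mask.stride(-1) == 0`, the cumsum over B×S, and the row gather come from the description.

```python
import torch


def masked_scatter_rowwise(inputs, mask, source):
    """Illustrative fast path: masked_scatter when the mask is constant per row.

    Assumes inputs is (B, S, H), mask is a bool tensor expanded to (B, S, H)
    with stride 0 on the last dim, and source is non-empty with
    source.numel() divisible by H (hypothetical constraints for this sketch).
    """
    B, S, H = inputs.shape
    assert mask.stride(-1) == 0, "fast path only applies to row-constant masks"

    # One mask value per token: B*S elements instead of B*S*H.
    row_mask = mask[..., 0].reshape(-1)

    # cumsum over tokens gives, for each selected token, its source row index.
    src_row = torch.cumsum(row_mask.to(torch.int64), dim=0) - 1
    src_row = src_row.clamp(min=0)  # unselected leading rows get a dummy index

    # Gather whole feature rows at once, then keep them only where selected.
    src_rows = source.reshape(-1, H)
    gathered = src_rows[src_row]
    out = torch.where(row_mask.unsqueeze(-1), gathered, inputs.reshape(-1, H))
    return out.reshape(B, S, H)
```

On a row-constant mask built with `expand`, the sketch can be checked against `torch.Tensor.masked_scatter`, which consumes `source` elements in order and should therefore fill the k-th selected token with the k-th source row.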

Checklist

  • New/Existing tests provide coverage for changes

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 32.85%. Comparing base (d1d7e77) to head (1af56ff).
⚠️ Report is 15 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4315      +/-   ##
==========================================
+ Coverage   27.16%   32.85%   +5.69%     
==========================================
  Files          33       36       +3     
  Lines        4307     4690     +383     
==========================================
+ Hits         1170     1541     +371     
- Misses       3137     3149      +12     
