Commit 37e232f
fix(zero): detach flat buffer to prevent autograd inplace error on CPU accelerator (#7948)
The on-device flatten path (introduced in #7828) passes nn.Parameter
objects with requires_grad=True to torch.cat(), creating a flat buffer
with a CatBackward0 grad_fn. Later, _unflatten_dense_tensors produces
SplitBackward0 views that are assigned to model params. An in-place
copy_() on these views during the optimizer step raises:
    RuntimeError: Output 0 of SplitBackward0 is a view and is being modified inplace.
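The failure can be reproduced with plain PyTorch. This is a minimal sketch, not DeepSpeed code. The parameter shapes are made up. torch.split with an integer chunk size stands in for _unflatten_dense_tensors, because it produces SplitBackward0 views directly. The in-place write is assumed to run with grad mode enabled.

```python
import torch

# Two equal-sized stand-in parameters (hypothetical shapes).
params = [torch.nn.Parameter(torch.randn(4)), torch.nn.Parameter(torch.randn(4))]

# On-device flatten: the inputs require grad, so the buffer records CatBackward0.
flat = torch.cat([p.view(-1) for p in params])

# Stand-in for _unflatten_dense_tensors: an int chunk size dispatches to
# torch.split, whose outputs are views sharing one SplitBackward0 node.
views = torch.split(flat, 4)

error = None
try:
    # An optimizer-style in-place write, with grad mode enabled (the default).
    views[0].copy_(torch.zeros(4))
except RuntimeError as exc:
    error = str(exc)

print(type(flat.grad_fn).__name__)
print(error)
```

Autograd rejects the write because a multi-output view function such as split cannot have its outputs modified in place while they carry gradient history.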
This particularly affects CPU training, where
CPU_Accelerator.is_available() returns True and available_memory()
returns system RAM, so the on-device path is always taken.
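The hypothetical selection rule below sketches why CPU runs end up on the on-device path. Both choose_flatten_path and its memory comparison are assumptions for illustration, not DeepSpeed's actual check. The point is only that a memory probe which reports system RAM rarely forces the fallback.

```python
GiB = 1024 ** 3

def choose_flatten_path(accelerator_available: bool, available_memory: int, required: int) -> str:
    """Hypothetical rule: flatten on the accelerator only when it is usable and
    reports enough free memory for the flat buffer; otherwise go through CPU."""
    if accelerator_available and available_memory >= required:
        return "on_accelerator"
    return "cpu_offload"

# A GPU with 2 GiB free cannot hold a 4 GiB buffer, so it falls back to offload.
gpu_path = choose_flatten_path(True, 2 * GiB, 4 * GiB)

# A CPU accelerator reports system RAM (assumed to be 256 GiB here),
# so the on-device path is chosen.
cpu_path = choose_flatten_path(True, 256 * GiB, 4 * GiB)

print(gpu_path, cpu_path)
```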
Fix: add .detach() to the flattened buffer, matching the implicit detach
behavior of the CPU-offload path (param.data.cpu() + .to(device)).
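This is a minimal sketch of the fix under assumed conditions: the shapes are made up, and torch.split stands in for the unflatten step. Detaching the concatenated buffer leaves it with no grad_fn. That matches what a .data-based CPU round trip already produces; the offload expression here is a rough analogue, not the exact DeepSpeed code.

```python
import torch

params = [torch.nn.Parameter(torch.randn(4)), torch.nn.Parameter(torch.randn(4))]

# Fixed on-device path: detach() drops the CatBackward0 history, so the buffer
# and every view split from it have requires_grad=False.
flat = torch.cat([p.view(-1) for p in params]).detach()

# Rough analogue of the CPU-offload path: .data yields tensors without autograd
# history, so the round trip is implicitly detached (.to("cpu") is a no-op here).
offload_flat = torch.cat([p.data.cpu().view(-1) for p in params]).to("cpu")

views = torch.split(flat, 4)
views[0].copy_(torch.zeros(4))  # no longer raises; the write lands in flat
```

Both buffers end up without autograd history, so in-place updates through their views are allowed.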
Also rename flatten_on_gpu -> flatten_on_accelerator and replace
GPU-specific terminology in comments/logs with accelerator-generic
equivalents.
---------
Signed-off-by: Guokai Ma <[email protected]>
Signed-off-by: Ma, Guokai <[email protected]>
1 parent bf0126b
2 files changed (+63, -18 lines) under deepspeed/runtime/zero and tests/unit/v1/zero