Report incorrect documentation
Location of incorrect documentation
media/docs/cpp/cute/02_layout_algebra.md.
https://docs.nvidia.com/cutlass/latest/media/docs/cpp/cute/02_layout_algebra.html
Describe the problems or issues found in the documentation
In the example in the Computing Composition section, the third entry in the resulting stride tuple is written as 4*x, but it should be 4*y. Each entry in the result is the residue multiplied by the corresponding mode's original stride, which is y for the third mode, since the strides of A are (w, x, y, z). This is likely just a typo.
Steps taken to verify documentation is incorrect
Re-derived the residues for (3,6,2,8) / 72 mode-by-mode using the rule described earlier in the same section. This result can be verified with:
import cutlass.cute as cute

@cute.jit
def check():
    # Pick (w,x,y,z) = (1,3,18,36) so that x and y are DIFFERENT numbers.
    A = cute.make_layout((3, 6, 2, 8), stride=(1, 3, 18, 36))
    R = cute.composition(A, cute.make_layout(4, stride=72))
    cute.printf("R = {}", R)               # prints: 4:72
    cute.printf("strides = {}", R.stride)  # prints: 72

check()

# Then the third stride lets us tell the two predictions apart:
print("old = (72, 72, 12, 72) # 4*x = 4*3 = 12")
print("fixed = (72, 72, 72, 72) # 4*y = 4*18 = 72")
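The mode-by-mode derivation can also be sketched in plain Python, without CuTe installed. This is only an illustration of how I read the rule: the helper names `divide_shape` and `scale_strides` are hypothetical (they are not CuTe APIs), and the sketch assumes the divisor divides the shape evenly at every step, as it does in this example.

```python
def divide_shape(shape, divisor):
    """Divide a flat shape by an integer, mode by mode (hypothetical helper).

    Returns (new_shape, scale), where scale[i] is the part of the divisor
    still left when mode i is reached.
    """
    new_shape, scale = [], []
    rest = divisor
    for extent in shape:
        scale.append(rest)
        if rest % extent == 0:
            # The whole mode is consumed by the divisor.
            new_shape.append(1)
            rest //= extent
        elif extent % rest == 0:
            # The divisor is used up partway through this mode.
            new_shape.append(extent // rest)
            rest = 1
        else:
            raise ValueError(f"{divisor} does not divide shape {shape} evenly")
    return tuple(new_shape), tuple(scale)


def scale_strides(scale, strides):
    """Multiply each original stride by the matching scale factor."""
    return tuple(k * d for k, d in zip(scale, strides))


new_shape, scale = divide_shape((3, 6, 2, 8), 72)
print(new_shape)  # (1, 1, 1, 4)
print(scale)      # (72, 24, 4, 2)

# Symbolic form: pair each factor with the stride name of its own mode.
print(",".join(f"{k}*{name}" for k, name in zip(scale, "wxyz")))  # 72*w,24*x,4*y,2*z

# Numeric check with (w, x, y, z) = (1, 3, 18, 36).
print(scale_strides(scale, (1, 3, 18, 36)))  # (72, 72, 72, 72)
```

The third factor, 4, is paired with the third stride, y, which is why the documented 4*x looks like a typo.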
Suggested fix for documentation.
Change:
... produces (72*w,24*x,4*x,2*z) as the strides of the strided layout.
to:
... produces (72*w,24*x,4*y,2*z) as the strides of the strided layout.