Nice walk-through, Mark! So in practice, at a high level, one would profile the code, identify the performance bottlenecks, and then replace some of the functions associated with those bottlenecks with a direct CUDA/Triton implementation?
@Graverman 4 months ago
thanks for providing this for free!
@mlock1000 3 months ago
"I believe the things I see." I'm in the right place. Thanks!!
@DiogoSanti 25 days ago
Awesome, this channel is a gem.
@zerotwo7319 4 months ago
Oh no, now I have no excuse to be a productive member of my village. Oh, I accidentally subscribed, the terror.
@JasonKuanCapillaryJ 2 days ago
Nice talk
@loabrasumente2283 4 months ago
At 30:40, where you change BLOCK_SIZE to 1024: how is it possible to reach 8000 GB/s when the max memory bandwidth of an A10G is only 600 GB/s? I think setting BLOCK_SIZE = 1024 makes Triton compute only the first 1024 columns of the matrix while ignoring the rest, so when you compute the GB/s, the "seconds" part is fixed while the "GB" part grows linearly (128 * i); that's why you're seeing the perf grow linearly. Also, the reason the little `torch.allclose` test didn't complain is that you are only testing a small matrix (1823, 781) here, whose n_cols is smaller than 1024, so no columns are actually dropped.
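The benchmark artifact described above can be sketched with purely illustrative numbers (the row count, runtime, and byte accounting below are made up, not taken from the lecture): if the kernel's runtime is capped by BLOCK_SIZE but the "bytes moved" figure is computed from the full matrix, the reported bandwidth grows without bound as n_cols grows.

```python
# Sketch of the benchmark artifact (all numbers hypothetical):
# the kernel only ever touches BLOCK_SIZE columns, so its runtime stays
# roughly flat, while the "GB moved" figure assumes the full matrix.
n_rows = 4096
bytes_per_element = 4  # float32
runtime_s = 1e-4       # roughly constant: work is capped at BLOCK_SIZE cols

for n_cols in (1024, 8192, 65536):
    # read + write of the full matrix, as the (flawed) benchmark assumes
    gb = 2 * n_rows * n_cols * bytes_per_element / 1e9
    print(f"n_cols={n_cols}: reported {gb / runtime_s:.0f} GB/s")
```

With fixed runtime, the reported GB/s scales linearly with n_cols and quickly blows past any real hardware ceiling, which matches the linear "speedup" seen on the slide.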
@CUDAMODE 4 months ago
Indeed, and after rerunning this, torch.allclose did in fact complain, so that slide is just plain wrong; will revisit what went wrong.
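The correctness failure mode discussed in this thread can be reproduced without a GPU. The sketch below uses a hypothetical NumPy stand-in for a Triton-style kernel that only loads the first BLOCK_SIZE columns of each row (the function names are illustrative, not from the lecture code): allclose passes when n_cols fits in one block and fails once it doesn't.

```python
import numpy as np

BLOCK_SIZE = 1024

def truncated_softmax(x, block_size=BLOCK_SIZE):
    # Stand-in for a kernel that only processes the first `block_size`
    # columns of each row and silently ignores the rest.
    out = np.zeros_like(x)
    cols = min(block_size, x.shape[1])
    sub = x[:, :cols]
    e = np.exp(sub - sub.max(axis=1, keepdims=True))
    out[:, :cols] = e / e.sum(axis=1, keepdims=True)
    return out

def reference_softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)

# Small test matrix: n_cols = 781 <= 1024, so no columns are dropped
small = rng.standard_normal((1823, 781))
print(np.allclose(truncated_softmax(small), reference_softmax(small)))  # True

# Wider matrix: columns beyond 1024 are silently zeroed out
wide = rng.standard_normal((4, 4096))
print(np.allclose(truncated_softmax(wide), reference_softmax(wide)))  # False
```

In the real Triton kernel, the fix is to either launch enough programs to cover all columns or loop over the row in BLOCK_SIZE-sized chunks with a mask on `tl.load`, so a test on a matrix wider than BLOCK_SIZE is what actually exercises the bug.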
@elliot6285 4 months ago
Do you have any suggestions for comprehensive resources or study materials that can help a beginner learn about CPUs and GPUs, particularly focusing on their roles and functions in Machine Learning and Deep Learning? I'm looking for in-depth yet accessible information to build a strong foundation in this area, which will enable me to understand the technical aspects discussed in certain videos related to ML/DL, especially this one :).
@mikhailkilianovski8024 4 months ago
This course could be helpful; I am going through it with pleasure: kzbin.info/www/bejne/rYXPZqqIebljebc
@CUDAMODE 2 months ago
This is a good start: github.com/cuda-mode/resource-stream