GPU Puzzles: Let's Play

7,932 views

Sasha Rush 🤗

1 day ago

Comments: 15
@eyannoronha831 · 1 year ago
Thank you so much for making these puzzles! It was a great introduction to GPU code for a beginner like myself! I am looking for more beginner-friendly GPU implementations like these. Will try to look up some material on Triton and TVM!
@wilbyyang4554 · 17 days ago
Thank you so much for the amazing education! Do you know about something like SIMD puzzles?
@SurflFilms · 7 months ago
thank you for making this, you are awesome.
@sourabmangrulkar9105 · 11 months ago
Thank you! Very insightful and easy to follow.
@ms-wk2wg · 1 year ago
Thanks for following up on my earlier comment and re-uploading so quickly!
@jonathanlee7969 · 11 months ago
Thanks for the great content. I do have a question, though: is it necessary to insert cuda.syncthreads() after the inner loop in Puzzle 14? The shared memory in a thread block gets modified repeatedly.
@srush_nlp · 11 months ago
It's possible that I missed a syncthreads. You should always add one after writing and before reading.
@jonathanlee7969 · 11 months ago
@srush_nlp Thanks for your reply. How about after reading and before writing to the same memory? I think it's also necessary to insert a syncthreads there.
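For reference, here is a minimal Numba CUDA sketch of the barrier pattern discussed in this thread. It is not the actual Puzzle 14 solution; `tile_sum`, `TPB`, and the test data are illustrative. Each loop iteration overwrites the same shared buffer, so it uses one syncthreads after writing (before anyone reads) and another after reading (before the next write clobbers the buffer), which is the second barrier asked about above.

```python
import numpy as np
import numba
from numba import cuda

TPB = 8  # threads per block; illustrative, not the puzzle's value

@cuda.jit
def tile_sum(out, a, size):
    # Sum `a` one shared-memory tile at a time within a single block.
    shared = cuda.shared.array(TPB, numba.float32)
    local_i = cuda.threadIdx.x
    total = 0.0
    for tile_start in range(0, size, TPB):
        i = tile_start + local_i
        if i < size:
            shared[local_i] = a[i]
        else:
            shared[local_i] = 0.0
        cuda.syncthreads()  # after writing, before reading the tile
        if local_i == 0:
            for j in range(TPB):
                total += shared[j]
        cuda.syncthreads()  # after reading, before the next iteration's write
    if local_i == 0:
        out[0] = total

a = np.arange(20, dtype=np.float32)
out = np.zeros(1, dtype=np.float32)
tile_sum[1, TPB](out, a, a.size)
print(out[0])  # 190.0 (sum of 0..19), assuming a CUDA device is available
```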
@rohanreddy1083 · 1 year ago
Thanks for the reupload, Sasha, the sound and video are in sync now! I just finished following along; thanks so much for the tutorial. Resources for gaining an intuitive understanding of CUDA as a beginner are sparse. I believe you had a comment on the prior upload about what next steps to take after completing these puzzles, but I can't access it anymore since that video is down. Especially for someone who may be interested in working with low-level optimizations and CUDA-type frameworks professionally, what would be some good projects to try to build? For example, implementing some recent ML papers in Numba or Triton? Or trying to make a basic version of such compilers myself?
@srush_nlp · 1 year ago
Yeah, so I think both Triton and TVM are worth learning. If you are looking for harder projects, I would say flash-attention and gptq are both interesting things to try to implement. Both of them have raw CUDA and Triton kernels that you can find online to look at and study.
@rohanreddy1083 · 1 year ago
Thank you!!
@grosucelmic · 8 months ago
The "`int` has no attribute 'inputs'" error in Puzzle 12 was due to initialising the cache using 0 instead of 0.0 (apparently Numba doesn't want to coerce that to a numba.float32 on its own).
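A small illustrative sketch of the fix described here; `load_cache` and `TPB` are made-up names, not the puzzle's scaffolding. The point is simply to seed a `numba.float32` shared cache with the float literal 0.0 rather than the int literal 0.

```python
import numpy as np
import numba
from numba import cuda

TPB = 8  # illustrative block size

@cuda.jit
def load_cache(out, a, size):
    cache = cuda.shared.array(TPB, numba.float32)
    local_i = cuda.threadIdx.x
    if local_i < size:
        cache[local_i] = a[local_i]
    else:
        cache[local_i] = 0.0  # 0.0, not 0: keep the literal a float for the float32 cache
    cuda.syncthreads()
    out[local_i] = cache[local_i]

a = np.ones(5, dtype=np.float32)
out = np.zeros(TPB, dtype=np.float32)
load_cache[1, TPB](out, a, a.size)
print(out)  # [1. 1. 1. 1. 1. 0. 0. 0.]
```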
@sotasearcher · 10 months ago
I passed all the puzzles up to 10 before hitting a wall and deciding to check my solutions so far with this. I read "adds 10 to each position of a and stores it in out" as "adds 10 to each index of a and stores it in out", and every value is its index, so I was just adding local_i and everything was passing 😅
@sotasearcher · 10 months ago
Also, I was first adding `out[local_i]` (which is 0) to deal with type issues.
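For reference, a minimal sketch of the intended behavior ("adds 10 to each position of a and stores it in out"). The kernel name and test input are illustrative; the input is shifted so values no longer equal their indices, which is exactly the case where an add-the-index version stops passing.

```python
import numpy as np
from numba import cuda

@cuda.jit
def add_ten(out, a):
    local_i = cuda.threadIdx.x
    out[local_i] = a[local_i] + 10  # add the constant 10, not local_i

a = np.arange(4, dtype=np.float32) + 1.0  # values differ from their indices
out = np.zeros(4, dtype=np.float32)
add_ten[1, 4](out, a)
print(out)  # [11. 12. 13. 14.]
```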
@yousefalnaser1751 · 6 months ago
There is one bug in the prefix sum code: if you don't run the "else" branch that sets cache[local_i] = 0, you will get incorrect results. You have to have that else branch! You got correct results because the cache had already been initialized with zeroes and the GPU was not reset in between. Just FYI :) otherwise it's really good practice.
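To make the point concrete, here is a hedged block-sum sketch in Numba CUDA (illustrative names, not the video's prefix sum code) in which the else branch zeroes the cache slots of threads that fall past the end of the input before the tree reduction reads them.

```python
import numpy as np
import numba
from numba import cuda

TPB = 8  # power of two, so the simple tree reduction below works

@cuda.jit
def block_sum(out, a, size):
    cache = cuda.shared.array(TPB, numba.float32)
    local_i = cuda.threadIdx.x
    i = cuda.blockIdx.x * cuda.blockDim.x + local_i
    if i < size:
        cache[local_i] = a[i]
    else:
        cache[local_i] = 0.0  # the "else" branch: never rely on stale shared memory
    cuda.syncthreads()
    # Tree reduction over the block's cache.
    stride = TPB // 2
    while stride > 0:
        if local_i < stride:
            cache[local_i] += cache[local_i + stride]
        cuda.syncthreads()
        stride //= 2
    if local_i == 0:
        out[cuda.blockIdx.x] = cache[0]

a = np.ones(13, dtype=np.float32)
out = np.zeros(2, dtype=np.float32)
block_sum[2, TPB](out, a, a.size)
print(out)  # [8. 5.] with the else branch; without it, stale values can leak in
```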