How CUDA Programming Works

How CUDA Programming Works | GTC 2022

Views: 28,434

Dan the Man

A day ago

Comments: 26

@sami9323 7 months ago

this is one of the clearest and most lucid presentations i have seen, on any topic

@zoriiginalx7544 A month ago

The illustration of how RAM works on a physical level really was illuminating. Really drove home why linear accesses are important and why cache lines are the way they are. Fantastic talk.

@dennisrkb 2 years ago

Great presentation on GPU architecture, performance tradeoffs and considerations.

@ypwangreg A year ago

I was always puzzled and fascinated by how those grids/blocks/threads work in parallel on the GPU, and this video explains it all in one go. Very impressive and helpful!

@SrikarDurgi 5 months ago

Dan is definitely the MAN. Great talk!

@holeo196 2 years ago

Another great presentation by Stephen Jones, fascinating

@KingDestrukto 3 months ago

Fantastic presentation, wow!

@hadiorabi692 10 months ago

Man this is amazing

@KalkiCharcha-hd5un 4 months ago

@21:17 "It's exactly the right amount of data to hit the peak bandwidth of my mem system. Even if my program reads data from all over the place, each read is exactly ONE page of my memory." I didn't understand this statement at 21:17: "Even if my program reads data from all over the place". Does it mean even if the data is read from non-consecutive memory?

@perli216 4 months ago

yes

@perli216 4 months ago

You basically get the benefits of reading contiguous memory for free, even when doing random reads

@KalkiCharcha-hd5un 4 months ago

@@perli216 OK cool, so basically we only get the advantage when the memory is contiguous, like i = tid + bid*bsize, and not like i = 2*(tid + bid*bsize)

@perli216 4 months ago

@@KalkiCharcha-hd5un I don't understand your question

@KalkiCharcha-hd5un 4 months ago

@@perli216 "Even if my program reads data from all over the place": I think I got it. Initially I read "all over the place" as any random / non-consecutive memory. It means "all over the place" as in different threads reading from the same page, because a single thread will bring in the data from the same page anyway.
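Editor's note: the two index formulas in this thread can be compared with a small host-side model. This is only a sketch, not a CUDA kernel and not anything shown in the talk. It assumes 32 threads per warp and that each thread reads one 8-byte element (the element size is an assumption, chosen because it makes one warp cover the 256 bytes quoted further down the thread). The helper names `warp_byte_span`, `contiguous` and `strided` are hypothetical.

```python
WARP_SIZE = 32   # threads per warp on NVIDIA GPUs
ELEM_BYTES = 8   # assumption: each thread loads one 8-byte value


def contiguous(tid, bid, bsize):
    # i = tid + bid*bsize: neighbouring threads read neighbouring elements
    return tid + bid * bsize


def strided(tid, bid, bsize):
    # i = 2*(tid + bid*bsize): every other element is skipped
    return 2 * (tid + bid * bsize)


def warp_byte_span(index_fn, bid=0, bsize=128, warp=0):
    """Return (bytes_requested, bytes_spanned) for the loads of one warp."""
    tids = range(warp * WARP_SIZE, (warp + 1) * WARP_SIZE)
    addrs = [index_fn(tid, bid, bsize) * ELEM_BYTES for tid in tids]
    requested = len(addrs) * ELEM_BYTES
    spanned = max(addrs) - min(addrs) + ELEM_BYTES
    return requested, spanned


print(warp_byte_span(contiguous))  # (256, 256)
print(warp_byte_span(strided))     # (256, 504)
```

Under these assumptions both warps ask for 256 bytes, but the strided warp spreads them over 504 bytes of address space, so roughly half of what the memory system fetches goes unused.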

@miramar-103 2 days ago

superb!

@chamidou2023 7 months ago

Great presentation!

@purandharb A year ago

thanks for detailed explanation. Really enjoyed it.

@kimoohuang 4 months ago

Great presentation! It is mentioned that 4 warps x 256 bytes per warp = 1024 bytes, which equals the memory page size of 1024 bytes. That only happens when the 4 warps are running adjacent threads. Are the 4 warps always running adjacent threads?

@perli216 4 months ago

@@kimoohuang Not necessarily. Depends on the warp scheduler
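Editor's note: the arithmetic in the question (4 warps x 256 bytes = 1024 bytes = one page) can be checked with a toy model. This is a rough sketch under stated assumptions: 32 threads per warp, one 8-byte element per thread with contiguous indexing, the 1024-byte page size quoted in the comment, and a simple linear mapping from address to page, which real hardware does not necessarily use. The function name `pages_touched` is hypothetical.

```python
WARP_SIZE = 32      # threads per warp on NVIDIA GPUs
ELEM_BYTES = 8      # assumption: one 8-byte element per thread
PAGE_BYTES = 1024   # page size quoted in the comment above


def pages_touched(warp_ids):
    """Return the sorted page numbers read by the given warps,
    assuming thread t reads element t (contiguous indexing)."""
    pages = set()
    for w in warp_ids:
        for lane in range(WARP_SIZE):
            addr = (w * WARP_SIZE + lane) * ELEM_BYTES
            pages.add(addr // PAGE_BYTES)
    return sorted(pages)


print(pages_touched([0, 1, 2, 3]))     # [0]
print(pages_touched([0, 5, 10, 15]))   # [0, 1, 2, 3]
```

In this model four adjacent warps fill exactly one page, while four warps picked from different parts of the block each land in a different page. Which warps actually run together is up to the warp scheduler, as the reply says.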