This is one of the clearest and most lucid presentations I have seen, on any topic.
@zoriiginalx7544 1 month ago
The illustration of how RAM works on a physical level really was illuminating. Really drove home why linear accesses are important and why cache lines are the way they are. Fantastic talk.
@dennisrkb 2 years ago
Great presentation on GPU architecture, performance tradeoffs and considerations.
@ypwangreg 1 year ago
I was always puzzled and fascinated by how grids, blocks, and threads work in parallel on the GPU, and this video explains it all in one go. Very impressive and helpful!
@SrikarDurgi 5 months ago
Dan is definitely the MAN. Great talk!
@holeo196 2 years ago
Another great presentation by Stephen Jones, fascinating
@KingDestrukto 3 months ago
Fantastic presentation, wow!
@hadiorabi692 10 months ago
Man this is amazing
@KalkiCharcha-hd5un 4 months ago
@21:17 "It's exactly the right amount of data to hit the peak bandwidth of my memory system. Even if my program reads data from all over the place, each read is exactly ONE page of my memory." I didn't understand this statement. Does "even if my program reads data from all over the place" mean even if the data is read from non-consecutive memory?
@perli216 4 months ago
yes
@perli216 4 months ago
You basically get the benefits of reading contiguous memory for free, even when doing random reads.
@KalkiCharcha-hd5un 4 months ago
@perli216 OK, cool. So basically we only get the advantage when the memory is contiguous, e.g. i = tid + bid*bsize, and not i = 2*(tid + bid*bsize)?
@perli216 4 months ago
@KalkiCharcha-hd5un I don't understand your question.
@KalkiCharcha-hd5un 4 months ago
@perli216 "Even if my program reads data from all over the place": I think I got it. Initially I read "all over the place" as any random, non-consecutive memory. It means different threads reading from the same page, because a single thread will bring in data from the same page anyway.
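A rough way to compare the two index formulas from this exchange, `i = tid + bid*bsize` and `i = 2*(tid + bid*bsize)`, is to count how many 1024-byte pages one block of threads touches. This is only a sketch in Python, not GPU code. The 8-byte element size, the 128-thread block, and the helper name `block_page_usage` are assumptions made for illustration, and treating a whole page as the unit of transfer is a simplification of real hardware. The 1024-byte page size is the figure quoted elsewhere in these comments.

```python
ELEM_BYTES = 8      # assumed size of one element (e.g. a double)
PAGE_BYTES = 1024   # page size quoted in the comments
BLOCK_SIZE = 128    # assumed threads per block (4 warps of 32)


def block_page_usage(stride, bid=0, bsize=BLOCK_SIZE):
    """Return (pages_touched, fraction_of_fetched_bytes_used) for one block.

    Each thread tid reads the element at index i = stride * (tid + bid * bsize).
    """
    pages = set()
    for tid in range(bsize):
        i = stride * (tid + bid * bsize)
        # Byte address of element i, mapped to the page that contains it.
        pages.add(i * ELEM_BYTES // PAGE_BYTES)
    useful = bsize * ELEM_BYTES          # bytes the threads actually asked for
    fetched = len(pages) * PAGE_BYTES    # bytes moved if whole pages are opened
    return len(pages), useful / fetched


contiguous = block_page_usage(stride=1)  # i = tid + bid*bsize
strided = block_page_usage(stride=2)     # i = 2*(tid + bid*bsize)
print(contiguous, strided)
```

Under these assumptions the contiguous pattern keeps a block inside a single page and uses every byte of it, while the stride-2 pattern spreads the same block over two pages and uses only half of each.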
@miramar-103 2 days ago
superb!
@chamidou2023 7 months ago
Great presentation!
@purandharb 1 year ago
Thanks for the detailed explanation. Really enjoyed it.
@kimoohuang 4 months ago
Great presentation! It is mentioned that 4 warps x 256 bytes per warp = 1024 bytes, which equals the memory page size of 1024 bytes. That only holds when the 4 warps are running adjacent threads. Are the 4 warps always running adjacent threads?
@perli216 4 months ago
@kimoohuang Not necessarily. It depends on the warp scheduler.
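The arithmetic in this question can be checked with a small Python sketch. It assumes each thread reads one 8-byte element at its global thread index, which is what makes 32 threads come to the 256 bytes per warp quoted above. The helper name `pages_for_warps` and the particular warp IDs are hypothetical, chosen only to contrast adjacent warps with non-adjacent ones. It says nothing about how a real warp scheduler picks warps.

```python
WARP_SIZE = 32      # threads per warp
ELEM_BYTES = 8      # assumed bytes read per thread
PAGE_BYTES = 1024   # page size quoted in the comment


def pages_for_warps(warp_ids):
    """Return the set of pages touched when each listed warp reads
    one element per thread at index i = global thread id."""
    pages = set()
    for w in warp_ids:
        for lane in range(WARP_SIZE):
            i = w * WARP_SIZE + lane
            pages.add(i * ELEM_BYTES // PAGE_BYTES)
    return pages


bytes_per_warp = WARP_SIZE * ELEM_BYTES      # 32 threads * 8 bytes
adjacent = pages_for_warps([0, 1, 2, 3])     # four neighbouring warps
scattered = pages_for_warps([0, 8, 16, 24])  # four warps far apart
print(bytes_per_warp, sorted(adjacent), sorted(scattered))
```

In this model four adjacent warps land in one page, while four warps that are far apart each open a different page, even though every individual warp still reads a contiguous 256 bytes.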
@mugglepower 11 months ago
Oh man, I wish my mum had fixed me up with a better brain processing unit so I could understand this.
@LetoTheSecond0 4 months ago
Looks like the link in the description is broken/truncated?
@perli216 4 months ago
@LetoTheSecond0 Yes, YouTube did this. It's just the original source for the video.