22:08 (basics of attention + the GPU memory hierarchy up to this point); the actual explanation starts here
@TheAIEpiphany · a year ago
BTW, at 28:10 the animation gets the order wrong compared to the paper's Algorithm 1: the inner loop should iterate over queries, not over values.
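For reference, a minimal sketch of that loop nesting (the paper's Algorithm 1: outer loop over key/value blocks, inner loop over query blocks). Block sizes, variable names, and the normalize-at-the-end step are illustrative, not the paper's exact pseudocode:

```python
import torch

def flash_attention_forward(Q, K, V, Bq=64, Bk=64):
    # Q, K, V: (N, d) for a single head; a sketch, not a real kernel.
    N, d = Q.shape
    O = torch.zeros(N, d)                   # output accumulator
    l = torch.zeros(N, 1)                   # running softmax denominator
    m = torch.full((N, 1), float("-inf"))   # running row-wise max
    scale = d ** -0.5
    for j in range(0, N, Bk):               # OUTER loop: key/value blocks
        Kj, Vj = K[j:j + Bk], V[j:j + Bk]
        for i in range(0, N, Bq):           # INNER loop: query blocks
            Qi = Q[i:i + Bq]
            S = (Qi @ Kj.T) * scale         # score tile, stays "on-chip"
            m_new = torch.maximum(m[i:i + Bq],
                                  S.max(dim=1, keepdim=True).values)
            P = torch.exp(S - m_new)        # unnormalized probability tile
            corr = torch.exp(m[i:i + Bq] - m_new)  # rescale old partial sums
            l[i:i + Bq] = corr * l[i:i + Bq] + P.sum(dim=1, keepdim=True)
            O[i:i + Bq] = corr * O[i:i + Bq] + P @ Vj
            m[i:i + Bq] = m_new
    return O / l                            # normalize once at the end
```

In the actual kernel, Kj/Vj are loaded into SRAM once per outer iteration and reused across all inner query blocks, which is exactly why the loop order matters.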
@for-ever-22 · 9 months ago
These videos are amazing
@denizlarson8862 · a year ago
good research and nicely explained
@rfernand2 · a year ago
Great work and presentation. Where else could this be applied?
@shuminghu · a year ago
Why does tiling reduce HBM-to-SRAM transfers? Or is it through pipelining, so that transfer time overlaps more with compute?
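A back-of-envelope sketch of one part of the answer (numbers illustrative, not from the talk): a naive implementation materializes the N×N score and probability matrices in HBM, while the fused, tiled kernel keeps those tiles in SRAM, so only Q, K, V, and O ever cross the HBM boundary:

```python
# Rough HBM traffic for one attention head, naive vs. fused/tiled.
# Illustrative accounting only; real kernels differ in exact counts
# (e.g. the tiled version may re-read inputs depending on SRAM size).
N, d, bytes_per = 4096, 64, 2            # seq len, head dim, fp16

naive = (3 * N * d                       # read Q, K, V
         + 2 * N * N                     # write S = QK^T, read it for softmax
         + 2 * N * N                     # write P = softmax(S), read it for P @ V
         + N * d) * bytes_per            # write O

tiled = (3 * N * d + N * d) * bytes_per  # read Q, K, V; write O; S/P stay in SRAM

print(f"naive ~ {naive / 2**20:.0f} MiB, tiled ~ {tiled / 2**20:.1f} MiB")
# naive ~ 130 MiB, tiled ~ 2.0 MiB
```

So it is less about overlapping transfers with compute and more about never writing the big N×N intermediates to HBM in the first place.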
@kawingchan · a year ago
I am not familiar at all with CPU or GPU architecture, so I naturally wonder how much of this also applies to Apple GPUs (MPS). It was mentioned that this is already in PyTorch, but I doubt it even gets activated on MPS. I would love to know, maybe at a high level, how it might (if possible) be ported to Apple GPUs, which have unified memory.
@xianbiaoqi7009 · a year ago
Good idea and nice talk.
@aamirmirza2806 · a year ago
Really nice, well explained.
@brandomiranda6703 · a year ago
ML for theorem proving would also benefit from longer sequences! Imagine referencing a lemma proved in 300 BC...
@brandomiranda6703 · a year ago
11:09
@sskhdsk · a year ago
simple and effective
@JazevoAudiosurf · a year ago
well explained
@deepanshusingh2527 · a year ago
Is this used in inference as well? How fast is it compared to a naive implementation?