FlashAttention - Tri Dao | Stanford MLSys #67

31,626 views

Stanford MLSys Seminars

A day ago

Comments: 14
@anishbhanushali · a year ago
22:08 (basics of attention + the GPU memory hierarchy up to here); the actual explanation starts here.
@TheAIEpiphany · a year ago
By the way, at 28:10 the animation gets the order wrong compared to the paper's Algorithm 1: the inner loop should iterate over queries, not over values.
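For readers following along, here is a minimal NumPy sketch of the loop nesting described in Algorithm 1 of the FlashAttention paper: the outer loop walks over blocks of K and V, and the inner loop walks over blocks of Q, as this comment points out. It is an illustrative reference under stated assumptions, not Tri Dao's fused CUDA kernel; the function name, block sizes, and the plain-Python online-softmax bookkeeping are choices made here for clarity.

```python
import numpy as np

def flash_attention_reference(Q, K, V, block_kv=64, block_q=64):
    """Tiled attention with online softmax; same loop nesting as Algorithm 1 (sketch)."""
    N, d = Q.shape
    O = np.zeros((N, d))
    l = np.zeros(N)             # running softmax denominators, one per query row
    m = np.full(N, -np.inf)     # running row maxima, one per query row

    for j in range(0, N, block_kv):       # outer loop: blocks of K and V
        Kj, Vj = K[j:j + block_kv], V[j:j + block_kv]
        for i in range(0, N, block_q):    # inner loop: blocks of Q (queries, not values)
            Qi = Q[i:i + block_q]
            S = Qi @ Kj.T / np.sqrt(d)    # scores for this (Q block, K block) tile
            m_tile = S.max(axis=1)
            P = np.exp(S - m_tile[:, None])
            l_tile = P.sum(axis=1)

            m_new = np.maximum(m[i:i + block_q], m_tile)
            alpha = np.exp(m[i:i + block_q] - m_new)   # rescale factor for the old statistics
            beta = np.exp(m_tile - m_new)              # rescale factor for the new tile
            l_new = alpha * l[i:i + block_q] + beta * l_tile

            # Combine the previously accumulated output with this tile's contribution.
            O[i:i + block_q] = (
                (alpha * l[i:i + block_q] / l_new)[:, None] * O[i:i + block_q]
                + (beta / l_new)[:, None] * (P @ Vj)
            )
            m[i:i + block_q], l[i:i + block_q] = m_new, l_new
    return O

# Sanity check against naive softmax(QK^T / sqrt(d)) V on a small example.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(256, 64)) for _ in range(3))
S = Q @ K.T / np.sqrt(64)
P = np.exp(S - S.max(axis=1, keepdims=True))
ref = (P / P.sum(axis=1, keepdims=True)) @ V
assert np.allclose(flash_attention_reference(Q, K, V), ref)
```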
@for-ever-22 · 9 months ago
These videos are amazing
@denizlarson8862 · a year ago
Good research, and nicely explained.
@rfernand2 · a year ago
Great work and presentation. Where else could this be applied?
@shuminghu · a year ago
Why does tiling reduce HBM-to-SRAM transfers? Or is it through pipelining, so that transfer time overlaps more with compute?
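A hedged note on the question above, based on the paper's I/O analysis rather than anything said at this point in the talk: standard attention materializes the full N×N score matrix in HBM and reads it back for the softmax and the P·V product, so its HBM traffic scales as Θ(Nd + N²); FlashAttention computes each score tile in SRAM, uses it immediately, and never writes it out, giving Θ(N²d²/M) HBM accesses for SRAM size M. The snippet below only plugs illustrative numbers into those two asymptotic counts; N, d, and M are example values, not measurements, and constant factors are dropped.

```python
# Rough element counts for HBM traffic, using the asymptotic expressions from
# the FlashAttention paper (constants dropped, so treat the output as a ratio).
N = 4096          # sequence length (example value)
d = 64            # head dimension (example value)
M = 100_000 // 4  # on-chip SRAM capacity in float32 elements (~100 KB, example value)

standard_io = N * d + N * N       # Theta(Nd + N^2): the score matrix round-trips through HBM
flash_io = N * N * d * d // M     # Theta(N^2 d^2 / M): only Q, K, V, O blocks move

print(f"standard attention ~{standard_io:,} elements of HBM traffic")
print(f"FlashAttention     ~{flash_io:,} elements of HBM traffic")
print(f"rough reduction    ~{standard_io / flash_io:.1f}x")
```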
@kawingchan · a year ago
I am not familiar at all with CPU or GPU architecture, so I naturally wonder how much of this also applies to the Apple GPU (MPS). It was mentioned that this is already in PyTorch, but I doubt it even gets activated on MPS. I would love to know, maybe at a high level, how it might (if possible) be ported to the Apple GPU, which has this unified memory thing.
@xianbiaoqi7009 · a year ago
Good idea and nice talk.
@aamirmirza2806 · a year ago
Really nice, well explained.
@brandomiranda6703 · a year ago
ML for theorem proving would also benefit from longer sequences! Reference a lemma proved in 300 BC...
@brandomiranda6703 · a year ago
11:09
@sskhdsk · a year ago
Simple and effective.
@JazevoAudiosurf · a year ago
Well explained.
@deepanshusingh2527 · a year ago
Is this utilized in inference as well? How fast is it compared to the naive implementation?