FlashAttention - Tri Dao | Stanford MLSys #67

28,590 views

Stanford MLSys Seminars

1 day ago

Episode 67 of the Stanford MLSys Seminar “Foundation Models Limited Series”!
Speaker: Tri Dao
Abstract:
Transformers are slow and memory-hungry on long sequences, since the time and memory complexity of self-attention are quadratic in sequence length. Approximate attention methods have attempted to address this problem by trading off model quality to reduce the compute complexity, but often do not achieve wall-clock speedup. We argue that a missing principle is making attention algorithms IO-aware -- accounting for reads and writes between levels of GPU memory. We propose FlashAttention, an IO-aware exact attention algorithm that uses tiling to reduce the number of memory reads/writes between GPU high bandwidth memory (HBM) and GPU on-chip SRAM. We analyze the IO complexity of FlashAttention, showing that it requires fewer HBM accesses than standard attention, and is optimal for a range of SRAM sizes. We also extend FlashAttention to block-sparse attention, yielding an approximate attention algorithm that is faster than any existing approximate attention method. FlashAttention trains Transformers faster than existing baselines: 15% end-to-end wall-clock speedup on BERT-large (seq. length 512) compared to the MLPerf 1.1 training speed record, 3× speedup on GPT-2 (seq. length 1K), and 2.4× speedup on long-range arena (seq. length 1K-4K). FlashAttention and block-sparse FlashAttention enable longer context in Transformers, yielding higher quality models (0.7 better perplexity on GPT-2 and 6.4 points of lift on long-document classification) and entirely new capabilities: the first Transformers to achieve better-than-chance performance on the Path-X challenge (seq. length 16K, 61.4% accuracy) and Path-256 (seq. length 64K, 63.1% accuracy).
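For readers who want to see the core trick from the abstract concretely (tiling plus an online softmax, so the full N×N attention matrix is never materialized), here is a minimal NumPy sketch. It is an illustration only, not the fused CUDA kernel from the paper: the function name, block sizes, and the loop nesting (queries outer, keys/values inner, unlike Algorithm 1 in the paper, whose outer loop runs over key/value blocks) are chosen for readability.

```python
import numpy as np

def flash_attention_like(Q, K, V, block_q=64, block_k=64):
    """Tiled attention with an online (streaming) softmax.

    Illustrative sketch of the idea in the talk: process K/V in blocks so only
    small tiles need to live in fast memory, and rescale running outputs as new
    blocks arrive. Block sizes and loop order are arbitrary here.
    """
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros_like(Q)

    for qs in range(0, n, block_q):
        q = Q[qs:qs + block_q] * scale              # query tile (stays "on chip")
        m = np.full(q.shape[0], -np.inf)            # running row-wise max
        l = np.zeros(q.shape[0])                    # running softmax denominator
        acc = np.zeros_like(q)                      # running unnormalized output

        for ks in range(0, n, block_k):
            k = K[ks:ks + block_k]                  # key tile
            v = V[ks:ks + block_k]                  # value tile
            s = q @ k.T                             # one tile of attention scores

            m_new = np.maximum(m, s.max(axis=1))    # update running max
            p = np.exp(s - m_new[:, None])          # unnormalized block probabilities
            correction = np.exp(m - m_new)          # rescale earlier partial results
            l = l * correction + p.sum(axis=1)
            acc = acc * correction[:, None] + p @ v
            m = m_new

        O[qs:qs + block_q] = acc / l[:, None]       # final per-row normalization
    return O

# Sanity check against standard attention that materializes the whole matrix.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 64)) for _ in range(3))
S = (Q / np.sqrt(64)) @ K.T
P = np.exp(S - S.max(axis=1, keepdims=True))
ref = (P / P.sum(axis=1, keepdims=True)) @ V
assert np.allclose(flash_attention_like(Q, K, V), ref, atol=1e-6)
```

The point of the tiling is that each block of scores is computed and consumed while it fits in fast on-chip memory, so the full attention matrix is never written to or read back from HBM; that is where the IO savings described in the abstract come from.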
This work received the Best Paper Award at the Hardware-Aware Efficient Training Workshop at ICML 2022. FlashAttention is now widely used at some of the largest research labs and companies, just six months after its release.
Paper: arxiv.org/abs/...
Github: github.com/Haz...
Bio:
Tri Dao is a PhD student in Computer Science at Stanford, co-advised by Christopher Ré and Stefano Ermon. He works at the interface of machine learning and systems, and his research interests include sequence models with long-range memory and structured matrices for compact deep learning models. His work has received the ICML 2022 Outstanding paper runner-up award.
Check out our website for the schedule: mlsys.stanford.edu
Join our mailing list to get weekly updates: groups.google....

Comments: 14
@anishbhanushali (a year ago)
22:08 (basics of attention and the GPU memory hierarchy up to this point); the actual explanation starts here.
@denizlarson8862 (10 months ago)
Good research and nicely explained.
@rfernand2 (a year ago)
Great work and presentation. Where else could this be applied?
@shuminghu (a year ago)
Why does tiling reduce HBM-to-SRAM transfer? Or is it through pipelining, so that transfer time overlaps more with compute?
@xianbiaoqi7009 (a year ago)
Good idea and nice talk.
@aamirmirza2806 (a year ago)
Really nice, well explained.
@sskhdsk (a year ago)
Simple and effective.
@JazevoAudiosurf (a year ago)
Well explained.
@deepanshusingh2527 (a year ago)
Is this utilised in inference as well? How fast is it compared to a naive implementation?
@TheAIEpiphany (a year ago)
By the way, at 28:10 the animation gets the order wrong compared to the paper's Algorithm 1: the inner loop should go over queries, not over values.
@for-ever-22 (6 months ago)
These videos are amazing.
@kawingchan (a year ago)
I am not familiar at all with CPU or GPU architecture, so I naturally wonder how much of this also applies to the Apple GPU (MPS). It was mentioned that this is already in PyTorch, but I doubt it even gets activated on MPS. I would love to know, maybe at a high level, how it might (if possible) be ported to the Apple GPU, which has unified memory.
@brandomiranda6703 (a year ago)
ML for theorem proving would also benefit from longer sequences! Referencing a lemma proved in 300 BC...
@brandomiranda6703 (a year ago)
11:09