Reformer: The Efficient Transformer

5,456 views

Connor Shorten

1 day ago

Comments: 7
@CristianGarcia 5 years ago
Thanks, recently read it, enjoyed the video! BTW: I think they claim to reduce memory from O(n^2) to O(n log n), not O(n).
@connor-shorten 5 years ago
Thank you!! With the title "O(L^2) to O(L)" I was referring to indexing the individual queries rather than materializing the entire dot-product matrix. I apologize for not making this clear; hopefully future viewers will see this comment and understand the mistake! I didn't mention the O(n log n) in the video because I wasn't really able to understand it myself, so maybe you could help clarify it. Is that the memory cost of the LSH bucketing?
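To make the per-query point above concrete, here is a minimal NumPy sketch (not the Reformer code and not trax) comparing standard attention, which materializes the full L x L score matrix, with a loop that handles one query at a time so only an O(L) score vector is ever held. Function names like attention_per_query are illustrative.

```python
import numpy as np

def attention_full(Q, K, V):
    # Standard attention: materializes the full L x L score matrix -> O(L^2) memory.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def attention_per_query(Q, K, V):
    # Same result, computed one query at a time: only an O(L) score vector is alive at once.
    out = np.empty((Q.shape[0], V.shape[1]))
    for i, q in enumerate(Q):
        scores = K @ q / np.sqrt(Q.shape[-1])
        w = np.exp(scores - scores.max())
        w /= w.sum()
        out[i] = w @ V
    return out

# Both paths give the same output on random inputs.
L, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, L, d))
assert np.allclose(attention_full(Q, K, V), attention_per_query(Q, K, V))
```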
@CristianGarcia 5 years ago
My intuition was that, because of the bucketing defined by LSH, they only have to store the dot products of these small matrices, but I really don't know the details of the implementation. Today I saw that the Google Research team has an implementation of the Reformer in trax, a JAX-based library they created (terrible documentation): github.com/google/trax/tree/master/trax/models/reformer
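For intuition about that bucketing, below is a simplified sketch of the LSH idea only, not the trax implementation: random projections hash similar vectors into the same bucket, and attention is computed only within each bucket, so each stored score block is roughly (L / n_buckets)^2 instead of L x L. It ignores details such as shared query/key normalization, multi-round hashing, chunking, and masking; names like n_buckets are illustrative.

```python
import numpy as np

def lsh_buckets(X, n_buckets, rng):
    # Angular LSH: project onto random directions and take argmax over [xR; -xR];
    # nearby vectors tend to land in the same bucket.
    R = rng.normal(size=(X.shape[-1], n_buckets // 2))
    proj = X @ R
    return np.argmax(np.concatenate([proj, -proj], axis=-1), axis=-1)

def bucketed_attention(Q, K, V, n_buckets=4, seed=0):
    # Attend only within a bucket, so score blocks stay small.
    # Simplification: hash the queries once and reuse the buckets for the keys.
    rng = np.random.default_rng(seed)
    buckets = lsh_buckets(Q, n_buckets, rng)
    out = np.zeros((Q.shape[0], V.shape[1]))
    for b in np.unique(buckets):
        idx = np.where(buckets == b)[0]
        scores = Q[idx] @ K[idx].T / np.sqrt(Q.shape[-1])
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        out[idx] = w @ V[idx]
    return out
```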
@connor-shorten 5 years ago
@CristianGarcia Yeah, it was a tough paper to read, really vague about exactly how they implemented it. I saw a lot of criticism of the paper on Reddit / Hacker News. Thanks for sharing, I'll check this out!
@jielyu4943 5 years ago
nice summary
@hihiendru 5 years ago
Great explanation, thank you.
@planktonfun1 5 years ago
I've seen this on paper, never thought they would use it