Reformer: The Efficient Transformer

5,456 views

Connor Shorten

1 day ago

Comments: 7
@CristianGarcia 5 years ago
Thanks, recently read it, enjoyed the video! BTW: I think they claim to reduce memory from O(n^2) to O(n log n), not O(n).
@connor-shorten 5 years ago
Thank you!! With the title "O(L^2) to O(L)" I was referring to indexing the individual queries rather than materializing the entire dot-product matrix. I apologize for not making this clear; hopefully future viewers will see this comment and understand the mistake! I didn't mention the O(n log n) in the video because I wasn't really able to understand it myself, so maybe you could help clarify it. Is that the memory cost of the LSH bucketing?
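To make the per-query point above concrete, here is a minimal NumPy sketch (not the Reformer code and not trax) comparing standard attention, which materializes the full L x L score matrix, with a loop that handles one query at a time so only an O(L) score vector is ever held. Function names like attention_per_query are illustrative.

```python
import numpy as np

def attention_full(Q, K, V):
    # Standard attention: materializes the full L x L score matrix -> O(L^2) memory.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def attention_per_query(Q, K, V):
    # Same result, computed one query at a time: only an O(L) score vector is alive at once.
    out = np.empty((Q.shape[0], V.shape[1]))
    for i, q in enumerate(Q):
        scores = K @ q / np.sqrt(Q.shape[-1])
        w = np.exp(scores - scores.max())
        w /= w.sum()
        out[i] = w @ V
    return out

# Both paths give the same output on random inputs.
L, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, L, d))
assert np.allclose(attention_full(Q, K, V), attention_per_query(Q, K, V))
```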
@CristianGarcia 5 years ago
My intuition was that, because of the bucketing defined by LSH, they only have to store the dot products of these small matrices, but I really don't know the details of the implementation. Today I saw that the Google Research team has an implementation of the Reformer in trax, a JAX-based library they created (terrible documentation): github.com/google/trax/tree/master/trax/models/reformer
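For intuition about that bucketing, below is a simplified sketch of the LSH idea only, not the trax implementation: random projections hash similar vectors into the same bucket, and attention is computed only within each bucket, so each stored score block is roughly (L / n_buckets)^2 instead of L x L. It ignores details such as shared query/key normalization, multi-round hashing, chunking, and masking; names like n_buckets are illustrative.

```python
import numpy as np

def lsh_buckets(X, n_buckets, rng):
    # Angular LSH: project onto random directions and take argmax over [xR; -xR];
    # nearby vectors tend to land in the same bucket.
    R = rng.normal(size=(X.shape[-1], n_buckets // 2))
    proj = X @ R
    return np.argmax(np.concatenate([proj, -proj], axis=-1), axis=-1)

def bucketed_attention(Q, K, V, n_buckets=4, seed=0):
    # Attend only within a bucket, so score blocks stay small.
    # Simplification: hash the queries once and reuse the buckets for the keys.
    rng = np.random.default_rng(seed)
    buckets = lsh_buckets(Q, n_buckets, rng)
    out = np.zeros((Q.shape[0], V.shape[1]))
    for b in np.unique(buckets):
        idx = np.where(buckets == b)[0]
        scores = Q[idx] @ K[idx].T / np.sqrt(Q.shape[-1])
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        out[idx] = w @ V[idx]
    return out
```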
@connor-shorten 5 years ago
@CristianGarcia Yeah, it was a tough paper to read, really vague about exactly how they implemented it. I saw a lot of criticism of the paper on Reddit / Hacker News. Thanks for sharing, I'll check this out!
@jielyu4943 5 years ago
nice summary
@hihiendru 5 years ago
Great explanation, thank you.
@planktonfun1 5 years ago
I've seen this on paper, never thought they would use it