Reformer: The Efficient Transformer

20,678 views

Yannic Kilcher

Comments
@autripat 4 years ago
Loved this presentation. Essential watching for the rest of us. Amazing references as well.
@michaelcarlon1831 5 years ago
My man! Great, super useful video! A true contribution to the community.
@allenhung4390 3 years ago
You explained the concepts very clearly. Keep up the good work!!
@tsupeichen693 4 years ago
Great tutorial with a great voice! Thank you!
@xaviergastaldi 5 years ago
Nice one! I think that, for the chunks illustration, the arrows only go to the left because the authors used a decoder Reformer, i.e. attention only looks at past words and not at future ones.
@YannicKilcher 5 years ago
That makes sense. In the text they say every chunk can look at itself and its neighbors; I guess in the encoder that would be to the left and right (the pattern is sketched after this thread).
@matthewtang1489 3 years ago
That makes so much more sense! Thanks!
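To make the pattern from this thread concrete, here is a minimal sketch (not the authors' code; parameters are made up) of the chunked attention mask: after sorting by bucket, each position attends within its own chunk and to the previous chunk, and the causal mask on top is what makes all the arrows point left in the decoder illustration. For an encoder one would drop the causal mask and also allow the next chunk.

```python
# Minimal sketch of the chunked attention pattern discussed above (illustrative only).
import numpy as np

def chunked_attention_mask(seq_len, chunk_size, causal=True):
    """Boolean mask: mask[i, j] == True means position i may attend to position j."""
    chunk_id = np.arange(seq_len) // chunk_size               # chunk index of each position
    same_chunk = chunk_id[:, None] == chunk_id[None, :]       # attend within own chunk
    prev_chunk = chunk_id[:, None] == chunk_id[None, :] + 1   # ... and to the previous chunk
    mask = same_chunk | prev_chunk
    if causal:                                                # decoder: no looking at future positions
        mask &= np.arange(seq_len)[:, None] >= np.arange(seq_len)[None, :]
    return mask

print(chunked_attention_mask(8, chunk_size=4, causal=True).astype(int))
```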
@rramjee1 4 years ago
Hi Yannic. Thanks for this wonderful explanation. Could you please share a practical implementation of the Reformer architecture? Thanks again.
@avihudekel4709 3 years ago
At 11:00 I thought of an alt-J song. Great video!
@manos_v163 4 years ago
It's still not clear to me how the n·log n factor comes out of LSH attention; it's quite vague how they manage this complexity. My intuition is that, through chunking, each vector is involved in a constant number of computations, which would be asymptotically smaller than a log n factor.
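One common reading (my assumption, not something stated verbatim in the video) is that the intuition above is right about the chunked attention itself: for a fixed chunk size it is linear in n, and the extra log n comes from sorting the positions by bucket id (times a constant number of hashing rounds). A rough sketch:

```python
# Rough sketch (not code from the paper) of where the n·log n shows up:
# hashing and chunked attention are linear in n for a fixed chunk size,
# so asymptotically the sort over bucket ids dominates.
import numpy as np

def lsh_buckets(x, n_buckets, rng):
    """Angular LSH via a random rotation: similar vectors tend to share a bucket."""
    proj = rng.standard_normal((x.shape[1], n_buckets // 2))
    rotated = x @ proj                                         # (n, n_buckets // 2)
    return np.argmax(np.concatenate([rotated, -rotated], axis=-1), axis=-1)

rng = np.random.default_rng(0)
x = rng.standard_normal((1024, 64))
buckets = lsh_buckets(x, n_buckets=32, rng=rng)                # O(n) hashing
order = np.argsort(buckets, kind="stable")                     # O(n log n) sort by bucket id
chunk_size = 64
cost = len(x) * 2 * chunk_size                                 # each position: own chunk + previous chunk
print(buckets[order][:16], cost)                               # sorted ids group together; attention cost is linear
```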
@RaviTeja-zk4lb 4 years ago
Both the Longformer and this Reformer try to solve the problem of long sequences, but they use different ideas: here it is LSH, there it is the sliding-window and global-attention concept. Which one works better?
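As a toy illustration of the difference (illustrative masks only, with made-up parameters): Longformer fixes in advance which positions may interact, via a sliding window plus a few global tokens, while the Reformer decides it per input through LSH buckets.

```python
# Toy comparison of the two sparsity patterns mentioned above (illustrative only).
import numpy as np

def sliding_window_mask(seq_len, window, global_idx=()):
    i, j = np.arange(seq_len)[:, None], np.arange(seq_len)[None, :]
    mask = np.abs(i - j) <= window                  # local sliding window
    for g in global_idx:                            # global tokens attend everywhere and are attended to
        mask[g, :] = True
        mask[:, g] = True
    return mask

def lsh_mask(buckets):
    return buckets[:, None] == buckets[None, :]     # attend only within the same hash bucket

buckets = np.array([0, 1, 0, 2, 1, 2, 0, 1])        # made-up bucket assignment
print(sliding_window_mask(8, window=1, global_idx=(0,)).astype(int))
print(lsh_mask(buckets).astype(int))
```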
@mohammadkhan5430 4 years ago
Great explanation!
@xiquandong1183 5 years ago
Great video. If possible, can you please explain the ALBERT paper next? Thanks.
@mathmagic9333 4 years ago
Hi Yannic, nice video and presentation! I had a question about 18:42: how do we know that a specific "bucket" does not spill over into multiple chunks (in the diagram, max bucket size …
@YannicKilcher 4 years ago
True, I guess rare cases are just left out for now :)
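A worked toy example for the bucket-spill question above (my reading of the scheme, not the authors' code): because each chunk also attends to the previous chunk, a bucket that straddles one chunk boundary is still covered; only a bucket longer than a chunk is the rare case mentioned in the reply.

```python
# Toy check: a bucket straddling a chunk boundary is still fully reachable
# once each chunk also attends to the previous chunk (causal/decoder setting).
import numpy as np

chunk_size = 4
buckets_sorted = np.array([2, 2, 2, 7, 7, 7, 9, 9])   # bucket 7 straddles the boundary at position 4
chunk_id = np.arange(len(buckets_sorted)) // chunk_size

same_bucket = buckets_sorted[:, None] == buckets_sorted[None, :]
in_reach = (chunk_id[:, None] == chunk_id[None, :]) | (chunk_id[:, None] == chunk_id[None, :] + 1)
causal = np.arange(8)[:, None] >= np.arange(8)[None, :]

# every same-bucket pair a position should (causally) attend to is reachable
print(np.all(in_reach[same_bucket & causal]))          # True for this example
```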
@tamasionut2279 4 years ago
I was wondering if you could do a presentation on the RevNets article as well.
@lucasnestler6933 4 years ago
memcnn explains it rather well, while also providing a PyTorch wrapper for it.
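For readers who just want the core idea: below is a minimal PyTorch sketch (plain torch, not memcnn's actual API) of the reversible residual coupling from RevNets that the Reformer reuses, where the inputs can be reconstructed exactly from the outputs. The actual memory saving additionally needs a custom backward pass that recomputes activations instead of storing them, which is what libraries like memcnn provide.

```python
# Minimal sketch of a RevNet-style reversible block (illustrative, not memcnn's API).
import torch
import torch.nn as nn

class ReversibleBlock(nn.Module):
    def __init__(self, f: nn.Module, g: nn.Module):
        super().__init__()
        self.f, self.g = f, g          # in the Reformer: f = attention, g = feed-forward

    def forward(self, x1, x2):
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1, y2):
        x2 = y2 - self.g(y1)           # reconstruct inputs from outputs; no stored activations needed
        x1 = y1 - self.f(x2)
        return x1, x2

# quick check that the inverse recovers the inputs
blk = ReversibleBlock(nn.Linear(16, 16), nn.Linear(16, 16))
x1, x2 = torch.randn(2, 16), torch.randn(2, 16)
with torch.no_grad():
    y1, y2 = blk(x1, x2)
    r1, r2 = blk.inverse(y1, y2)
print(torch.allclose(r1, x1, atol=1e-6), torch.allclose(r2, x2, atol=1e-6))
```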
@petroschristodoulou7987 4 years ago
Thanks a lot for your video.
@pratikkorat790 4 years ago
How can I contact you? Please answer!
@tae898 4 years ago
You are so good!
@simleek 5 years ago
Huh. This reminds me of error correction a bit; I think we should be looking into error correction more for transformers. I tried using an error-correction algorithm and a bool array for the place/number encoding in my transformer. Multiple numbers could be represented overlaid on top of each other, and space can be reduced by removing these binary neurons by layer, or just at random. I wonder if that backprop fix could apply to that system...
@YannicKilcher 5 years ago
That sounds like it could work, but I haven't thought about it deeply. Maybe worth a try.
@YannicKilcher 5 years ago
The main problem with drawing parallels to fields like security or cryptography is the following: machine learning thinks in terms of distances, ratios, proximity, and high versus low numbers, whereas those fields treat data much more strictly. Either two things are exactly equal or not at all; either two hashes match or they don't, and nobody cares whether the two are similar. So the fundamental goals of the fields are opposite.
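A tiny illustration of that contrast (my own example, not from the video): a locality-sensitive hash maps nearby vectors to the same code with high probability, while a cryptographic hash scrambles even a one-character change completely.

```python
# Locality-sensitive hashing vs. cryptographic hashing (illustrative only).
import hashlib
import numpy as np

rng = np.random.default_rng(0)
proj = rng.standard_normal((64, 8))                   # 8 random hyperplanes

def lsh_code(v):
    return tuple((v @ proj > 0).astype(int))          # sign pattern = bucket id

v = rng.standard_normal(64)
v_near = v + 0.01 * rng.standard_normal(64)           # a slightly perturbed copy
print(lsh_code(v) == lsh_code(v_near))                # usually True: the hash respects distance

print(hashlib.sha256(b"reformer").hexdigest()[:16])
print(hashlib.sha256(b"reformer!").hexdigest()[:16])  # a one-character change gives a totally different digest
```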
@jony7779 2 years ago
I'm convinced this hasn't caught on because it's so complicated to implement 🤨
@prakharthapak4229 4 years ago
5:30 Yannic be like: let's skip time complexity :p