Loved this presentation. Essential watching for the rest of us. Amazing references as well.
@michaelcarlon1831 · 5 years ago
My man! Great, super useful video! A true contribution to the community.
@allenhung4390 · 3 years ago
You explained the concepts very clearly. Keep up the good work!
@tsupeichen693 · 4 years ago
Great tutorial with a great voice! Thank you!
@xaviergastaldi · 5 years ago
Nice one! I think that, for the chunks illustration, the arrows only go to the left because the authors used a decoder Reformer, i.e. attention only looks at past words and not at future ones.
@YannicKilcher · 5 years ago
That makes sense. In the text they say every chunk can look at itself and its neighbors, I guess in the encoder that would be to the left and right.
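A minimal sketch of the chunked attention pattern discussed here, assuming a toy chunk size and single-head attention: each chunk attends to itself and the previous chunk, and in the decoder case a causal mask keeps every query from looking to its right (hence the left-pointing arrows). An encoder-style variant would also include the following chunk.

```python
import numpy as np

def chunked_attention(q, k, v, chunk_size, causal=True):
    """Toy chunked attention: each chunk attends to itself and the previous chunk.
    q, k, v: arrays of shape (seq_len, d); seq_len must be a multiple of chunk_size."""
    seq_len, d = q.shape
    out = np.zeros_like(v)
    for c in range(seq_len // chunk_size):
        q_start = c * chunk_size
        q_idx = np.arange(q_start, q_start + chunk_size)
        # keys/values come from the previous chunk and the current chunk
        k_start = max(0, q_start - chunk_size)
        k_idx = np.arange(k_start, q_start + chunk_size)
        scores = q[q_idx] @ k[k_idx].T / np.sqrt(d)
        if causal:
            # decoder case: a query may not attend to keys to its right
            mask = q_idx[:, None] < k_idx[None, :]
            scores = np.where(mask, -1e9, scores)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[q_idx] = weights @ v[k_idx]
    return out

x = np.random.randn(16, 8)
y = chunked_attention(x, x, x, chunk_size=4)
```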
@matthewtang1489 · 3 years ago
That makes so much more sense! Thanks!
@rramjee1 · 4 years ago
Hi Yannic, thanks for this wonderful explanation. Could you please share any practical implementation of the Reformer architecture? Thanks again.
@avihudekel4709 · 3 years ago
At 11:00 I thought of an alt-J song. Great video!
@manos_v163 · 4 years ago
It's still not clear to me how the n log n factor comes out of LSH attention; it's quite vague how they achieve this complexity. My intuition is that, through chunking, each vector is involved in a constant number of computations, hence asymptotically smaller than a log n factor.
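As far as I can tell from the paper, the log n factor comes from sorting positions by their hash bucket before chunking; the attention within fixed-size chunks is then linear in n, as the intuition above suggests. A rough sketch of that cost breakdown, with bucket assignment via random projections simplified to a single hash round:

```python
import numpy as np

def lsh_attention_cost(seq_len, chunk_size, n_buckets, d=64):
    """Rough cost accounting for one round of LSH attention (toy illustration)."""
    x = np.random.randn(seq_len, d)
    # 1) hashing: random projections, roughly O(n * d * n_buckets)
    projections = np.random.randn(d, n_buckets // 2)
    h = x @ projections
    buckets = np.argmax(np.concatenate([h, -h], axis=-1), axis=-1)
    # 2) sort positions by bucket so similar items end up adjacent: O(n log n)
    order = np.argsort(buckets, kind="stable")
    # 3) attention within fixed-size chunks of the sorted sequence:
    #    (n / chunk_size) chunks * O(chunk_size^2) each = O(n * chunk_size)
    attn_ops = (seq_len // chunk_size) * chunk_size ** 2
    sort_ops = seq_len * np.log2(seq_len)
    return sort_ops, attn_ops

print(lsh_attention_cost(seq_len=2**14, chunk_size=64, n_buckets=128))
```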
@RaviTeja-zk4lb · 4 years ago
Both the Longformer and this Reformer are trying to solve the problem of long sequences, but they use different ideas: here it's LSH, and there it's the sliding-window and global-attention concept. Which one works better?
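For contrast with the LSH sketch above, here is a rough sketch of a Longformer-style attention mask: a sliding window plus a few global tokens. Which approach works better presumably depends on the task and on each paper's own benchmarks; the window size and global positions below are arbitrary choices for illustration.

```python
import numpy as np

def longformer_style_mask(seq_len, window, global_positions):
    """Boolean mask: True where attention is allowed (toy sketch)."""
    i = np.arange(seq_len)
    # sliding window: each token attends to neighbors within `window` on both sides
    allowed = np.abs(i[:, None] - i[None, :]) <= window
    # global tokens attend everywhere and are attended to by everyone
    for g in global_positions:
        allowed[g, :] = True
        allowed[:, g] = True
    return allowed

mask = longformer_style_mask(seq_len=512, window=64, global_positions=[0])
print(mask.sum(), "allowed pairs instead of", 512 * 512)
```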
@mohammadkhan5430 · 4 years ago
Great explanation.
@xiquandong1183 · 5 years ago
Great video. If possible, can you please explain the ALBERT paper next? Thanks.
@mathmagic9333 · 4 years ago
Hi Yannic, nice video and presentation! I had a question about 18:42 -- how do we know that a specific "bucket" does not spill over into multiple chunks (in the diagram, max bucket size
@YannicKilcher · 4 years ago
true, I guess rare cases are just left out for now :)
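One way to get a feel for how often a bucket straddles a chunk boundary is to sort a random bucket assignment and count the split buckets; this is only a toy estimate with made-up sizes. Attending to the neighboring chunk, as mentioned earlier in the thread, would cover a bucket split across two adjacent chunks but not one that spans more than two.

```python
import numpy as np

def spillover_fraction(seq_len, n_buckets, chunk_size, seed=0):
    """Fraction of buckets that end up split across more than one chunk after sorting."""
    rng = np.random.default_rng(seed)
    buckets = rng.integers(0, n_buckets, size=seq_len)
    sorted_buckets = np.sort(buckets)            # sorting groups equal buckets together
    chunk_ids = np.arange(seq_len) // chunk_size # which chunk each sorted position lands in
    split = 0
    for b in range(n_buckets):
        if len(np.unique(chunk_ids[sorted_buckets == b])) > 1:
            split += 1
    return split / n_buckets

print(spillover_fraction(seq_len=4096, n_buckets=256, chunk_size=64))
```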
@tamasionut2279 · 4 years ago
I was wondering if you could do a presentation on the RevNets paper as well.
@lucasnestler6933 · 4 years ago
memcnn explains it rather well while also providing a PyTorch wrapper for it.
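The reversible trick that RevNets introduce, the Reformer reuses, and memcnn wraps is compact enough to sketch directly: the block's inputs can be recomputed exactly from its outputs, so activations don't need to be stored for backpropagation. A minimal numpy sketch, with placeholder functions F and G standing in for the attention and feed-forward sub-layers:

```python
import numpy as np

def rev_block_forward(x1, x2, F, G):
    """Reversible coupling: y1 = x1 + F(x2), y2 = x2 + G(y1)."""
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def rev_block_inverse(y1, y2, F, G):
    """Recover the inputs exactly from the outputs (no stored activations needed)."""
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

F = lambda x: np.tanh(x)        # stands in for the attention sub-layer
G = lambda x: np.maximum(x, 0)  # stands in for the feed-forward sub-layer
x1, x2 = np.random.randn(4, 8), np.random.randn(4, 8)
y1, y2 = rev_block_forward(x1, x2, F, G)
r1, r2 = rev_block_inverse(y1, y2, F, G)
assert np.allclose(x1, r1) and np.allclose(x2, r2)
```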
@petroschristodoulou7987 · 4 years ago
Thanks a lot for your video!
@pratikkorat790 · 4 years ago
How can I contact you? Please answer!
@tae898 · 4 years ago
You are so good!
@simleek · 5 years ago
Huh. This reminds me of error correction a bit. I think we should be looking into error correction more for transformers. I tried using an error-correction algorithm and a bool array for the place/number encoding in my transformer. Multiple numbers could be represented overlaid on top of each other, and space can be reduced by removing these binary neurons by layer, or just at random. I wonder if that backprop fix could apply to that system...
@YannicKilcher · 5 years ago
That sounds like it could work, but I haven't thought about it deeply. Maybe worth a try.
@YannicKilcher · 5 years ago
The main problem with drawing parallels to things like security or cryptography is the following: machine learning thinks in terms of distances, ratios, proximity, and high vs. low numbers, whereas these other fields think about data in a much stricter sense. Either two things are exactly equal or they are not, and either two hashes match or they don't; they don't care whether the two are similar. So the fundamental goals of the fields are opposite.
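A concrete way to see that difference: a locality-sensitive hash (here, the signs of a few random projections) sends nearby vectors to the same bucket with high probability, while a cryptographic hash such as SHA-256 sends two nearly identical inputs to completely unrelated digests. A toy comparison with an arbitrary number of hyperplanes:

```python
import hashlib
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
y = x + 1e-3 * rng.standard_normal(64)   # a tiny perturbation of x

# locality-sensitive: similar vectors usually land in the same bucket
planes = rng.standard_normal((8, 64))
bucket = lambda v: tuple((planes @ v > 0).astype(int))
print("LSH buckets equal:", bucket(x) == bucket(y))

# cryptographic: similar inputs give completely different digests
digest = lambda v: hashlib.sha256(np.round(v, 6).tobytes()).hexdigest()
print(digest(x)[:16], "vs", digest(y)[:16])
```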
@jony7779 · 2 years ago
I'm convinced this hasn't caught on because it's so complicated to implement 🤨