XLNet: Generalized Autoregressive Pretraining for Language Understanding

24,583 views

Yannic Kilcher

1 day ago

Abstract:
With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling. However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pretraining. Empirically, XLNet outperforms BERT on 20 tasks, often by a large margin, and achieves state-of-the-art results on 18 tasks including question answering, natural language inference, sentiment analysis, and document ranking.
Authors: Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le
arxiv.org/abs/...
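
Editorial aside on the abstract's "expected likelihood over all permutations of the factorization order": below is a minimal sketch of the idea, assuming a hypothetical model(token_ids, attention_mask) that returns per-position vocabulary logits. It is not the authors' implementation; the real XLNet uses two-stream attention and Transformer-XL recurrence rather than this naive re-masking per step.

```python
import torch

def permutation_lm_loss(model, token_ids):
    """Sketch of permutation language modeling for a single sequence.

    token_ids: LongTensor of shape (seq_len,).
    model(ids, attention_mask) is an assumed interface returning
    (seq_len, vocab_size) logits; only positions where attention_mask
    is True are visible as context.
    """
    seq_len = token_ids.size(0)
    order = torch.randperm(seq_len)      # sample one factorization order z
    loss = 0.0
    for step in range(1, seq_len):
        target_pos = order[step]         # position to predict at this step
        context_pos = order[:step]       # positions that precede it in z
        # Expose only the context positions of this permutation step.
        mask = torch.zeros(seq_len, dtype=torch.bool)
        mask[context_pos] = True
        logits = model(token_ids, attention_mask=mask)
        log_probs = torch.log_softmax(logits[target_pos], dim=-1)
        loss = loss - log_probs[token_ids[target_pos]]
    return loss / (seq_len - 1)
```

Because the order is resampled for every training sequence, each position is, in expectation, predicted from contexts on both sides, while every individual prediction stays autoregressive.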

Comments: 49
@clevuag 5 years ago
Please keep making these videos. Your work is amazing:))
@connor-shorten 5 years ago
Really cool! The "New York is a city" example helped a lot with my understanding of this!
@deeplearner2634 3 years ago
I didn't really understand the random permutation idea from other sources, but this video made it clear how the shuffled permutation allows combining the AR and BERT's AE ideas. Thanks!
@abcdxx1059 5 years ago
After a point, searching on the internet gives you nothing. This channel is the only place where I find explanations for very complex things in a way a newbie can understand. Please don't stop.
@nikeshnaik5516 5 years ago
I was not getting the core idea behind XLNet and you made it look like a piece of cake. Subscribed!! Thank you.
@rpcruz 5 years ago
I liked the quick digression into language modeling before getting into the meat of the paper. Awesome video!
@hemichael2111 5 years ago
So did I.
@helloadventureworld 4 years ago
You are genuinely changing the way I read and understand papers. Your work is amazing, do more NLP papers please.
@kaenovama 2 years ago
7 minutes in and I finally get the part I didn't understand! Thank you!
@limynet 3 years ago
This is a really nice rundown compared to me half reading and half sleeping over the long paper. Thank you so much.
@yuchengcho7471 5 years ago
Thanks Yannic, this explanation is super helpful!!
@vedantwalke1789 4 years ago
Great video. The explanation made it very simple to understand and was very helpful!!
@fahadqurashi7103 4 years ago
Excellent explanation, easy to understand and to the point 👌👌
@aayatrubab 5 years ago
I was eagerly waiting for it... Thanks, Yannic :)
@darkmythos4457 5 years ago
Was actually waiting for you to post this, thanks
@aleksandrbazanov3866 5 years ago
Yannic is the best guy on the internet
@thepresistence5935 2 years ago
It took me 2.20 hours to understand this, but it was worth it; I won't forget it anymore.
@venkatalv7014 5 years ago
very clear explanation, thanks for the video
@nenadsubat9489 1 year ago
This is so enlightening!!!
@BSelm05 5 years ago
Thank you for a very clear explanation. I wonder how many samples they perform for each sentence. I couldn't find it in the paper.
@neilteng4161 4 years ago
Thank you So Much!
@Rednivrug 4 years ago
Language modelling: autoregressive models predict the next word using a window of previous words, and autoencoding models predict the missing words within a window of words. Aren't these two techniques the same as the ones we used to train word embeddings for Word2Vec, where CBOW (continuous bag of words) predicts the next word from the previous window of words and the n-gram method predicts a missing word using the previous and next words? What's the difference? Am I missing something?
@YannicKilcher 4 years ago
The difference is that in autoregressive decoding you do it again and again in a sequence.
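
A rough way to picture that "again and again in a sequence" point, as a minimal sketch (predict_next is a hypothetical helper mapping a token sequence to the most likely next token, not a real API):

```python
def generate_autoregressively(predict_next, prompt_tokens, num_new_tokens):
    """Autoregressive decoding: every new prediction is appended to the
    context and conditions the next one, unlike a CBOW/skip-gram style
    model, which scores each word once from a fixed window and is never
    fed its own outputs."""
    tokens = list(prompt_tokens)
    for _ in range(num_new_tokens):
        tokens.append(predict_next(tokens))  # re-condition on everything so far
    return tokens
```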
@keerthanajaganathan 4 years ago
Thanks for the video - it is very helpful. Could you please make a video on Cross-lingual Language Model Pretraining (XLM)?
@prateethnayak8422 4 years ago
@12:40 is what the model is listening to! :D
@prabhikthapa4671 5 years ago
Hi, could you also clarify why the embeddings are multiplied with the representation produced by the network in the equation (1), (2) formulation? My understanding was that you could directly apply a softmax to the representation to train.
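
Editor's note for readers with the same question: equation (1) of the paper is the standard forward autoregressive objective, roughly of the form below (a sketch from the usual formulation, not copied verbatim from the paper). The embedding e(x) acts as the output-softmax weight, so the inner product with the hidden representation h_theta gives the logit for each candidate word; a softmax "directly on the representation" would still need a projection to vocabulary size, and here that projection is the embedding matrix.

```latex
\max_{\theta}\ \log p_{\theta}(\mathbf{x})
  = \sum_{t=1}^{T} \log p_{\theta}(x_t \mid \mathbf{x}_{<t})
  = \sum_{t=1}^{T} \log
    \frac{\exp\bigl(h_{\theta}(\mathbf{x}_{1:t-1})^{\top} e(x_t)\bigr)}
         {\sum_{x'} \exp\bigl(h_{\theta}(\mathbf{x}_{1:t-1})^{\top} e(x')\bigr)}
```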
@AlphaMoury 2 years ago
Thank you man
@narendraparmar1631 4 years ago
Thanks
@aqibfayyaz1619 3 years ago
Great effort.
@srikanthkoraveni8210 5 years ago
Thank you
@aj-tg 4 years ago
Thanks, You are doing god's work!
@supertramp_og 5 years ago
"Hmmmm " :P Great video.
@RajeshSharma-bd5zo 2 years ago
Cool video!! Thanks for it. However, the voice quality was not that great, and there is clearly scope for improvement there.
@RAZZKIRAN 4 years ago
thankq
@jingciwang587 4 years ago
Now all my mind is like New Hmm is a Hmm, New York is a Hmm Hmm and Hmm~ Hmm~ Hmm~ Hmm~~~
@robinranabhat3125 5 years ago
In this AI journey, I find that some people explain the papers but leave behind the code, and some explain the code (hopelessly, though) and leave out the theory. Can't we have a paper explanation followed by an explanation of the code in TensorFlow or PyTorch? Or maybe everyone just knows the high-level overview and ignores that part, even though it's badly needed. Please upvote, guys.
@YannicKilcher 5 years ago
If I were to also review the code, the videos would be 2+ hours 😁 but thanks for the feedback, will consider doing separate code reviews
@robinranabhat3125 5 years ago
@@YannicKilcher If you do code reviews as well, trust me, your channel will be one of its kind. Anyone sturdy enough to learn these papers would want to see the implementation details.
@abcdxx1059 5 years ago
@@YannicKilcher damn you would do that for us 🤗🤗🤗
@tanny411 5 years ago
I swear I'll sit through the 2+ hour videos. This channel is life!
@jwstolk 4 years ago
2 out of 5 words is closer to 40%
@emuccino 4 years ago
18:23 😳😂
@wongmikeho 5 years ago
Hmm..hmm...hmm...hmmm