GPT-2: Language Models are Unsupervised Multitask Learners

  30,177 views

Yannic Kilcher

A day ago

A look at OpenAI's new GPT-2 model and the surrounding controversy.
blog.openai.co...
Abstract:
Natural language processing tasks, such as question answering, machine translation, reading comprehension, and summarization, are typically approached with supervised learning on task-specific datasets. We demonstrate that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText. When conditioned on a document plus questions, the answers generated by the language model reach 55 F1 on the CoQA dataset - matching or exceeding the performance of 3 out of 4 baseline systems without using the 127,000+ training examples. The capacity of the language model is essential to the success of zero-shot task transfer and increasing it improves performance in a log-linear fashion across tasks. Our largest model, GPT-2, is a 1.5B parameter Transformer that achieves state of the art results on 7 out of 8 tested language modeling datasets in a zero-shot setting but still underfits WebText. Samples from the model reflect these improvements and contain coherent paragraphs of text. These findings suggest a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
Authors:
Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever
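
For readers who want to try the zero-shot setup the abstract describes, here is a minimal sketch using the Hugging Face transformers library (a later third-party distribution of the model, not the paper's original code; the prompt, question, and model size are illustrative):

```python
# Minimal zero-shot QA sketch: condition GPT-2 on a context plus a
# question and let it generate an answer, as the abstract describes.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

context = "The Eiffel Tower was completed in 1889 in Paris."
prompt = context + "\nQ: Where is the Eiffel Tower?\nA:"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=10,                     # generate only a short answer span
    do_sample=False,                       # greedy decoding keeps this deterministic
    pad_token_id=tokenizer.eos_token_id,   # GPT-2 has no pad token of its own
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:]))
```

Greedy decoding keeps the sketch reproducible; the paper's generated samples use sampling instead.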

Comments: 22
@jcorey333 10 months ago
It's such a shame that the field stagnated after this. Nothing bigger or better than GPT-2. Maybe someday.
@bobsalita3417 5 years ago
We need more paper talkers like Yannic. Yes, Two Minute Papers is great, but there are many papers worthy of discussion, many opinions needed, and many worthy methods of analysis.
@Xnaarkhoo 2 years ago
First ten minutes, no substance; I don't have more time to waste here.
@michaelcarlon1831 5 years ago
These paper-talks are great!
@neuron8186 3 years ago
OpenAI is more like ClosedAI.
@ben2258 4 years ago
Did you ever end up making a video that discusses byte pair encoding?
@YannicKilcher 4 years ago
Not yet :)
@ambujmittal6824 4 years ago
kzbin.info/www/bejne/Y2Gsm3ljbLR1adU&ab_channel=Rasa Here you go. :)
@eab4984 2 years ago
Would be awesome if Yannic made the video on byte pair encoding mentioned at 18:30.
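
In lieu of that video, here is a toy sketch of the core BPE merge loop that GPT-2's tokenizer is built on (character-level for clarity; GPT-2's actual tokenizer operates on raw bytes with a fixed, precomputed merge table):

```python
# Toy byte pair encoding: repeatedly merge the most frequent adjacent
# symbol pair across the corpus, growing the vocabulary one merge at a time.
from collections import Counter

def most_frequent_pair(vocab):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(pair, vocab):
    """Replace every occurrence of the pair with its concatenation."""
    a, b = pair
    return {word.replace(f"{a} {b}", a + b): freq for word, freq in vocab.items()}

# Toy corpus: words split into characters, with an end-of-word marker.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6}
for _ in range(3):
    pair = most_frequent_pair(vocab)
    vocab = merge_pair(pair, vocab)
    print(pair, vocab)
```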
@harmitchhabra989 3 years ago
I think a neural network is essentially a function that we can't express explicitly. The function is fitted using the training data; then it is passed an input whose output we want to know, and since the function was fitted to the dataset we gave it, we can expect a prediction consistent with that dataset. Essentially, a NN can be used to roughly map huge pieces of data to each other, and then use that mapping to obtain plausible outputs for inputs whose outputs are otherwise unknown to us. Also, to check whether a given input is similar to the other inputs in our dataset, we can feed it into a trained neural network and look at the network's accuracy to gauge this input's similarity to the training inputs. This could be used for a recommendation system like YouTube's.
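
A tiny illustration of that "implicitly learned function" framing (the sine target and the scikit-learn MLP are arbitrary choices for the sketch):

```python
# Fit a small MLP to noisy samples of sin(x), then query it at unseen
# inputs: the fitted network acts as an approximation of the unknown function.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))            # training inputs
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)  # noisy targets

net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000, random_state=0)
net.fit(X, y)

# Evaluate at inputs whose outputs we never observed.
print(net.predict([[0.5], [2.0]]))  # should be close to sin(0.5), sin(2.0)
```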
@xaxfixho 2 months ago
We need more of this
@kumarsubham2078 4 years ago
Great video! Btw, is the model released now, and do we have weights available?
@YannicKilcher 4 years ago
Yes, I think so
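
(For reference: OpenAI released the full 1.5B-parameter weights in November 2019.) A minimal sketch of loading them via the Hugging Face hub, one common distribution channel; "gpt2-xl" is the hub's identifier for the largest model:

```python
# Pull down the released 1.5B-parameter GPT-2 checkpoint and count parameters.
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2-xl")
print(sum(p.numel() for p in model.parameters()))  # roughly 1.5 billion
```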
@dongilseo5727 3 years ago
Thanks for sharing this video. I just found that GPT-2 models will be available soon at Ainize Teachable NLP for free fine-tuning.
@dongilseo5727 3 years ago
@Web Front-end You can just search for 'teachable nlp'! (It seems links are auto-deleted on YouTube.)
@mannacharya4088 1 year ago
7:10 Got me rolling on the floor laughing
@ambujmittal6824 4 years ago
How can we say that GPT isn't simply overfitting, given that it has seen so much data that any downstream task would already be covered in the training set?
@YannicKilcher 3 years ago
Not necessarily. They deduplicate against the downstream tasks.
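
For context, the paper measures train-test overlap using Bloom filters over 8-grams of WebText. A simplified exact-set sketch of that kind of contamination check (function names and the placeholder corpus are illustrative, not the paper's code):

```python
# Flag a test example as contaminated if it shares any 8-gram with the
# training data. The paper uses Bloom filters for scale; a plain set
# stands in for one here.
def ngrams(text, n=8):
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

train_grams = set()
for doc in ["...training documents..."]:  # placeholder corpus
    train_grams |= ngrams(doc)

def is_contaminated(test_example):
    """True if the test example shares an 8-gram with the training set."""
    return not ngrams(test_example).isdisjoint(train_grams)
```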
@user-or7ji5hv8y 5 years ago
Is there a good video that explains how transformers work?
@YannicKilcher 5 years ago
kzbin.info/www/bejne/n3XYnZulhpejqNE
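
Alongside that video, a minimal sketch of single-head scaled dot-product self-attention, the core Transformer operation (no causal mask or multi-head projections, both of which GPT-2 adds on top):

```python
# Each token attends to every token: similarity scores between queries and
# keys are softmax-normalized and used to mix the value vectors.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # scaled pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over positions
    return weights @ V                              # weighted mix of values

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                         # 4 tokens, dimension 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)          # (4, 8)
```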
@pensarfeo 5 years ago
Hilarious!