Don't Stop Pretraining!

4,672 views

Connor Shorten

1 day ago

Comments: 10
@connor-shorten 4 years ago
1:20 Motivation: do the latest pretrained models work universally?
2:32 Continued pre-training
4:05 What leads to gains in continued pre-training?
5:05 Heuristic domain similarity
6:00 Domains explored
7:15 Quick takeaways
8:27 Domain-adaptive pre-training
9:34 Task-adaptive pre-training
10:27 Computational cost
11:21 Unlabeled, task-relevant data for task-adaptive pre-training
11:58 Recommendations to dataset developers and future work
12:55 Rambling
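For reference, here is a minimal sketch of what the continued pre-training discussed above looks like in code, assuming a RoBERTa backbone and the Hugging Face Transformers/Datasets libraries; the corpus path and hyperparameters are placeholders, not the paper's actual settings.

```python
# Sketch of domain-adaptive pre-training (DAPT): continue RoBERTa's masked-LM
# objective on unlabeled in-domain text before fine-tuning on the end task.
# "domain_corpus.txt" and the hyperparameters below are placeholders.
from transformers import (
    RobertaTokenizerFast,
    RobertaForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base")

# Unlabeled, domain-relevant text (e.g. biomedical abstracts); path is hypothetical.
raw = load_dataset("text", data_files={"train": "domain_corpus.txt"})
tokenized = raw["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

# Standard 15% masking for the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(
    output_dir="roberta-dapt",
    per_device_train_batch_size=8,   # placeholder, not the paper's schedule
    num_train_epochs=1,
)
Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()
# The checkpoint in "roberta-dapt" is then fine-tuned on the labeled task data.
```

Task-adaptive pre-training (TAPT) uses the same recipe, just with the task's own unlabeled text as the corpus.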
@shaz7163 4 years ago
This concept is awesome.
@connor-shorten 4 years ago
Haha, I think it might have been better if they found that you don't need to keep pre-training. Interesting to think of what it might take to get there.
@shaz7163 4 years ago
@connor-shorten True! But most recent NLP papers (even GPT-3) highlight that it's important to work on data or model scale. So are we reaching an era where new architectural/model developments are minimal?
@connor-shorten 4 years ago
@shaz7163 Definitely hard to say at this point. I'm placing my bet that dataset construction and injecting priors into the data will be more beneficial than new model designs. There are so many areas of research, though, that it really feels like reading tea leaves to predict confidently, haha.
@shaz7163 4 years ago
@connor-shorten Yeah, agreed! In a way it might be nice, because we can solve more and more real-world problems with access to large LMs. I also think Transformers will become ubiquitous.
@maxmetz01 4 years ago
@shaz7163 I can really see this happening, despite disliking it. It's similar in other domains, such as computer vision: the design of new architectures is slowing down, and researchers are instead looking at ways of handling larger datasets (e.g., teacher-student models with unlabeled data) and larger models (EfficientNet). In the short term, that seems like the more promising path.
@OlegJakushkin 4 years ago
Do they fine-tune the data encoders in between stages?
@connor-shorten 4 years ago
Yes, I like to think of it as stepping stones in representation learning.
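A rough sketch of that stepping-stone view, assuming Hugging Face Transformers: the same encoder weights are carried from DAPT to TAPT to the final supervised fine-tuning, with only the task head initialized fresh. The checkpoint names are hypothetical.

```python
# Stepping-stone pipeline: one encoder's weights flow through every stage.
from transformers import RobertaForMaskedLM, RobertaForSequenceClassification

# Stage 1: DAPT — continue masked-LM training on broad domain text (as in the sketch above).
dapt_model = RobertaForMaskedLM.from_pretrained("roberta-base")
# ... masked-LM training on the domain corpus goes here ...
dapt_model.save_pretrained("roberta-dapt")

# Stage 2: TAPT — continue masked-LM training, now on the task's own unlabeled text.
tapt_model = RobertaForMaskedLM.from_pretrained("roberta-dapt")
# ... masked-LM training on unlabeled task text goes here ...
tapt_model.save_pretrained("roberta-dapt-tapt")

# Stage 3: supervised fine-tuning on the labeled end task;
# only the classification head is freshly initialized.
classifier = RobertaForSequenceClassification.from_pretrained(
    "roberta-dapt-tapt", num_labels=2
)
```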
Data Augmentation using Pre-trained Transformer Models
19:34
Connor Shorten
4.4K views
Exploring Simple Siamese Representation Learning
14:36
Connor Shorten
9K views
Small Language Models Are Also Few-Shot Learners
20:51
Connor Shorten
5K views
PEGASUS Explained!
24:16
Connor Shorten
10K views
Pattern-Exploiting Training for NLP!
20:31
Connor Shorten
8K views
Contrastive Clustering with SwAV
18:47
Connor Shorten
11K views
Rethinking Pre-training and Self-Training
17:53
Connor Shorten
8K views
ImageGPT (Generative Pre-training from Pixels)
20:16
Connor Shorten
8K views
RoBERTa: A Robustly Optimized BERT Pretraining Approach
19:15
Yannic Kilcher
26K views
AI Weekly Update - January 31st, 2022
36:41
Connor Shorten
1.9K views