Don't Stop Pretraining!

4,619 views

Connor Shorten


Comments: 10
@connor-shorten 4 years ago
1:20 Motivation: do the latest pretrained models work universally?
2:32 Continued pre-training
4:05 What leads to gains in continued pre-training?
5:05 Heuristic domain similarity
6:00 Domains explored
7:15 Quick takeaways
8:27 Domain-adaptive pre-training
9:34 Task-adaptive pre-training
10:27 Computational cost
11:21 Unlabeled, task-relevant data for task-adaptive pre-training
11:58 Recommendations to dataset developers and future work
12:55 Rambling
@shaz7163 4 years ago
This concept is awesome.
@connor-shorten 4 years ago
Haha, I think it might have been better if they found that you don't need to keep pre-training. Interesting to think of what it might take to get there.
@shaz7163 4 years ago
@@connor-shorten True! But most recent NLP papers (even GPT-3) highlight that what matters is working with data or model size. So are we reaching an era where new architectural/model developments are minimal?
@connor-shorten 4 years ago
@@shaz7163 Definitely hard to say at this point. I'm placing my bet that dataset construction and injecting priors into the data will be more beneficial than new model designs. There are so many areas of research, though, that predicting confidently really feels like reading tea leaves, haha.
@shaz7163 4 years ago
@@connor-shorten Yeah, agreed! In a way it might be nice, because we can solve more and more real-world problems with access to large LMs. I also think Transformers will become ubiquitous.
@maxmetz01 4 years ago
@@shaz7163 I can really see this happening, despite disliking it. It's similar in other domains, such as computer vision: the design of new architectures is slowing down, and researchers instead focus on ways of handling larger datasets (e.g. teacher-student models with unlabeled data) and larger models (EfficientNet). In the short term, that seems like the more promising path.
@OlegJakushkin 4 years ago
Do they fine-tune the data encoders in between stages?
@connor-shorten 4 years ago
Yes, I like to think of it as stepping stones in representation learning.
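For readers who want to see what this staged setup looks like in code, here is a minimal sketch (not the paper's exact configuration) using Hugging Face Transformers: continued masked-language-model training on unlabeled in-domain text, followed by fine-tuning the adapted encoder on the labeled end task. The tiny in-memory datasets, checkpoint names, and hyperparameters are illustrative placeholders.

```python
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

def tokenize(batch):
    # Fixed-length padding keeps the default collators happy in this toy example.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

# Stage 1: continued (domain-/task-adaptive) pre-training = masked-LM training on
# unlabeled in-domain text. Replace the toy dataset with your own corpus.
unlabeled = Dataset.from_dict(
    {"text": ["unlabeled in-domain sentence one.", "unlabeled in-domain sentence two."]}
).map(tokenize, remove_columns=["text"])

mlm_trainer = Trainer(
    model=AutoModelForMaskedLM.from_pretrained("roberta-base"),
    args=TrainingArguments(output_dir="adapted-lm", num_train_epochs=1),
    train_dataset=unlabeled,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
mlm_trainer.train()
mlm_trainer.save_model("adapted-lm")  # the adapted encoder is the next "stepping stone"

# Stage 2: fine-tune the adapted encoder on the labeled end task.
labeled = Dataset.from_dict(
    {"text": ["a clearly positive example.", "a clearly negative example."], "label": [1, 0]}
).map(tokenize)

clf_trainer = Trainer(
    model=AutoModelForSequenceClassification.from_pretrained("adapted-lm", num_labels=2),
    args=TrainingArguments(output_dir="task-model", num_train_epochs=1),
    train_dataset=labeled,
)
clf_trainer.train()
```

In the paper's setup, both adaptive phases reuse the same masked-LM objective as the original RoBERTa pre-training; only the data changes between stages, and the labeled task is attached last.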
Data Augmentation using Pre-trained Transformer Models
19:34
Connor Shorten
4.4K views
PEGASUS Explained!
24:16
Connor Shorten
10K views
RoBERTa: A Robustly Optimized BERT Pretraining Approach
19:15
Yannic Kilcher
25K views
Fine-tuning Large Language Models (LLMs) | w/ Example Code
28:18
Shaw Talebi
348K views
Contrastive Clustering with SwAV
18:47
Connor Shorten
11K views
Big Bird: Transformers for Longer Sequences (Paper Explained)
34:30
Yannic Kilcher
24K views
ImageGPT (Generative Pre-training from Pixels)
20:16
Connor Shorten
8K views
Open Pretrained Transformers - Susan Zhang | Stanford MLSys #77
1:00:05
Stanford MLSys Seminars
18K views
Small Language Models Are Also Few-Shot Learners
20:51
Connor Shorten
5K views