Don't Stop Pretraining!

4,619 views

Connor Shorten


Comments: 10
@connor-shorten 4 years ago
1:20 Motivation: do the latest pretrained models work universally?
2:32 Continued pre-training
4:05 What leads to gains in continued pre-training?
5:05 Heuristic domain similarity
6:00 Domains explored
7:15 Quick takeaways
8:27 Domain-adaptive pre-training
9:34 Task-adaptive pre-training
10:27 Computational cost
11:21 Unlabeled, task-relevant data for task-adaptive pre-training
11:58 Recommendations to dataset developers and future work
12:55 Rambling
@shaz7163 4 years ago
This concept is awesome.
@connor-shorten 4 years ago
Haha, I think it might have been better if they found that you don't need to keep pre-training. Interesting to think of what it might take to get there.
@shaz7163 4 years ago
@@connor-shorten True! But most recent NLP papers (even GPT-3) highlight that what matters is working with data or model size. So are we reaching an era where new architectural/model developments are minimal?
@connor-shorten 4 years ago
@@shaz7163 Definitely hard to say at this point. I'm placing my bet that dataset construction and injecting priors into the data will be more beneficial than new model designs. There are so many areas of research, though, that predicting confidently really feels like reading tea leaves, haha.
@shaz7163 4 years ago
@@connor-shorten Yeah, agreed! In a way it might be nice, because we can solve more and more real-world problems with access to large LMs. I also think Transformers will become ubiquitous.
@maxmetz01 4 years ago
@@shaz7163 I can really see this happening, despite disliking it. It's similar in other domains, such as computer vision: the design of new architectures is slowing down, and researchers instead focus on ways of handling larger datasets (e.g. teacher-student models with unlabeled data) and larger models (EfficientNet). In the short term, that seems like the more promising path.
@OlegJakushkin 4 years ago
Do they fine-tune the data encoders in between stages?
@connor-shorten 4 years ago
Yes, I like to think of it as stepping stones in representation learning.
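For readers who want to see what this staged setup looks like in code, here is a minimal sketch (not the paper's exact configuration) using Hugging Face Transformers: continued masked-language-model training on unlabeled in-domain text, followed by fine-tuning the adapted encoder on the labeled end task. The tiny in-memory datasets, checkpoint names, and hyperparameters are illustrative placeholders.

```python
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

def tokenize(batch):
    # Fixed-length padding keeps the default collators happy in this toy example.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

# Stage 1: continued (domain-/task-adaptive) pre-training = masked-LM training on
# unlabeled in-domain text. Replace the toy dataset with your own corpus.
unlabeled = Dataset.from_dict(
    {"text": ["unlabeled in-domain sentence one.", "unlabeled in-domain sentence two."]}
).map(tokenize, remove_columns=["text"])

mlm_trainer = Trainer(
    model=AutoModelForMaskedLM.from_pretrained("roberta-base"),
    args=TrainingArguments(output_dir="adapted-lm", num_train_epochs=1),
    train_dataset=unlabeled,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
mlm_trainer.train()
mlm_trainer.save_model("adapted-lm")  # the adapted encoder is the next "stepping stone"

# Stage 2: fine-tune the adapted encoder on the labeled end task.
labeled = Dataset.from_dict(
    {"text": ["a clearly positive example.", "a clearly negative example."], "label": [1, 0]}
).map(tokenize)

clf_trainer = Trainer(
    model=AutoModelForSequenceClassification.from_pretrained("adapted-lm", num_labels=2),
    args=TrainingArguments(output_dir="task-model", num_train_epochs=1),
    train_dataset=labeled,
)
clf_trainer.train()
```

In the paper's setup, both adaptive phases reuse the same masked-LM objective as the original RoBERTa pre-training; only the data changes between stages, and the labeled task is attached last.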
Data Augmentation using Pre-trained Transformer Models
19:34
Connor Shorten
4.4K views
PEGASUS Explained!
24:16
Connor Shorten
10K views
RoBERTa: A Robustly Optimized BERT Pretraining Approach
19:15
Yannic Kilcher
25K views
Fine-tuning Large Language Models (LLMs) | w/ Example Code
28:18
Shaw Talebi
348K views
Contrastive Clustering with SwAV
18:47
Connor Shorten
11K views
Big Bird: Transformers for Longer Sequences (Paper Explained)
34:30
Yannic Kilcher
24K views
ImageGPT (Generative Pre-training from Pixels)
20:16
Connor Shorten
8K views
Open Pretrained Transformers - Susan Zhang | Stanford MLSys #77
1:00:05
Stanford MLSys Seminars
18K views
Small Language Models Are Also Few-Shot Learners
20:51
Connor Shorten
5K views