1:20 Motivation: Do the latest pretrained models work universally?
2:32 Continued Pre-training
4:05 What leads to gains in continued pre-training?
5:05 Heuristic Domain Similarity
6:00 Domains Explored
7:15 Quick Takeaways
8:27 Domain-Adaptive Pre-training
9:34 Task-Adaptive Pre-training
10:27 Computational Cost
11:21 Unlabeled, task-relevant data for Task-Adaptive Pre-training
11:58 Recommendations to dataset developers and future work
12:55 Rambling
@shaz7163 · 4 years ago
This concept is awesome.
@connor-shorten · 4 years ago
Haha, I think it might have been better if they found that you don't need to keep pre-training. Interesting to think of what it might take to get there.
@shaz7163 · 4 years ago
@@connor-shorten True! But most recent NLP papers (even GPT-3) highlight that the gains come from data and model size. So are we reaching an era where new architectural/model developments are minimal?
@connor-shorten · 4 years ago
@@shaz7163 Definitely hard to say at this point. I'm placing my bet that dataset construction and injecting priors into the data will be more beneficial than new model designs. There are so many areas of research, though, that it really feels like reading tea leaves to predict confidently haha
@shaz7163 · 4 years ago
@@connor-shorten Yeah, agreed! In a way it might be nice, because we can solve more and more real-world problems with access to large LMs. I also think Transformers will become ubiquitous.
@maxmetz01 · 4 years ago
@@shaz7163 I can really see this happening, even though I dislike it. It is similar in other domains, such as computer vision: the design of new architectures is slowing down, and researchers instead look at ways of dealing with larger datasets (e.g. teacher-student models with unlabeled data) and larger models (EfficientNet). In the short term, that seems like the more promising path.
@OlegJakushkin · 4 years ago
Do they fine-tune the encoders in between the stages?
@connor-shorten4 жыл бұрын
Yes, I like to think of it as stepping stones in representation learning.
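If it helps to make that concrete, here's a rough sketch of what continued pre-training looks like with Hugging Face Transformers (not from the paper or video; the model name, corpus path, and hyperparameters are placeholder assumptions): the encoder keeps training with the masked-LM objective on unlabeled domain/task text, and only that adapted checkpoint is then fine-tuned on the labeled end task.

```python
# Minimal sketch of domain-/task-adaptive ("continued") pre-training.
# Assumptions: RoBERTa-base as the starting encoder, a plain-text in-domain
# corpus at "domain_corpus.txt" (one document per line), toy hyperparameters.
from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    RobertaForMaskedLM,
    RobertaTokenizerFast,
    Trainer,
    TrainingArguments,
)

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base")

# Unlabeled, task-relevant or in-domain text -- hypothetical file path.
raw = load_dataset("text", data_files={"train": "domain_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

# Reuse the original masked-LM objective on the new domain's text.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="roberta-dapt",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    learning_rate=1e-4,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
).train()

# The adapted checkpoint in "roberta-dapt" is the "stepping stone":
# load it as a sequence-classification model and fine-tune on the labeled task.
```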