μTransfer: Tuning GPT-3 hyperparameters on one GPU


μTransfer: Tuning GPT-3 hyperparameters on one GPU | Explained by the inventor

Views: 3,428

Edward Hu

Day ago

Comments: 22

@dimekogjia 7 months ago

Wow, wow, and wow! What were you waiting for to start creating these videos? You are a wonderful teacher, easy to follow. Thanks again, and please keep making these videos!

@athelstanrex 10 months ago

This is amazing!!! Such amazing information. Please, please, please keep making these videos! I'd love to see your distillation of the deep learning field, like what the most promising directions are that you see, etc.

@wkevwang 7 months ago

Great description! Thank you for sharing

@costahuang6568 10 months ago

Instant subscribe as well! Good to see researchers like yourself explaining topics in simpler terms 🤗

@edwardjhu 10 months ago

Thanks Costa!🦾

@Niamato_inc 5 months ago

Thank you wholeheartedly.

@iefan 10 months ago

Thanks for making this video; it's very helpful.

@chaeyoonjang9157 9 months ago

I was very impressed with your paper and video. Thank you.🥰 But I have one question. If the problem during the training phase is that a significant correlation between activations and weights arises with width, leading to blow-up, might it be enough to only use the aspect of muP that adjusts the learning rate based on width? Even if the initialization cannot be done according to muP, could this still ensure sufficient alignment? (In fact, I require mu-transfer for fine-tuning, yet I am currently dealing with a model not pre-trained using muP.)
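For readers unfamiliar with the idea the question refers to, "adjusting the learning rate based on width" can be sketched roughly as below. This is only an illustration under stated assumptions, not the author's implementation and not an answer to whether the learning-rate rule alone is sufficient: the helper name `mup_style_adam_lrs`, the group names, and the specific rule (hidden weight matrices trained with Adam get a rate divided by the width multiplier, while input weights and biases keep the tuned rate) are assumptions made for the example. The output layer is left out because its treatment depends on the formulation used.

```python
def mup_style_adam_lrs(base_lr: float, base_width: int, width: int) -> dict:
    """Illustrative width-dependent Adam learning rates (hypothetical helper)."""
    if base_width <= 0 or width <= 0:
        raise ValueError("widths must be positive")
    # How much wider the target model is than the small model that was tuned.
    width_mult = width / base_width
    return {
        # Assumption: vector-like parameters (input weights, biases) keep the tuned rate.
        "input_and_bias": base_lr,
        # Assumption: hidden weight matrices use a rate that shrinks as 1 / width_mult.
        "hidden": base_lr / width_mult,
    }


# Example: a rate tuned at width 256, reused at width 1024 (4x wider).
lrs = mup_style_adam_lrs(base_lr=1e-3, base_width=256, width=1024)
print(lrs)
```

In this sketch the hidden-layer rate is four times smaller at width 1024 than at width 256, while the input and bias rate is unchanged.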

@jimshtepa5423 11 days ago

Thank you for a great presentation. I am new to LLMs and would like to try running the code on GitHub. Can my local machine (MacBook M1) handle it? Or is it something for large enterprises with a massive compute inventory?

@chamber3593 10 months ago

Instant subscribe, keep em coming 🐱🐱🐱

@brandomiranda6703 7 months ago

In my experience, the parameter to scale for quicker fitting is the depth, not the width. Width barely helps. I think extending mu-transfer to sufficiently wide but deeper nets might be much better.

@saikalyan3966 10 months ago

Yo, big fan. Can you make a separate video on using muP for pre-training?

@edwardjhu 10 months ago

I might make another one focusing on how to implement muP!

@faridsaud6567 2 days ago

@edwardjhu that would be amazing!

@animastershorts 10 months ago

Can you please make a video demonstrating the parametrization on a real dataset? I'd love some inspiration for using this code!

@edwardjhu 10 months ago

What datasets do you have in mind?

@animastershorts 10 months ago

I'm actually new to LLMs and I don't have a large dataset right now. I was hoping you had a few large datasets from this project that you could use again, so that I could see your project working and understand it better. @edwardjhu

@jimshtepa5423 11 days ago

@edwardjhu Anything from Kaggle. Hotel booking data? In general, what is it good for?