μTransfer: Tuning GPT-3 hyperparameters on one GPU | Explained by the inventor

Views: 3,428

Edward Hu

Comments: 22
@dimekogjia 7 months ago
Wow, Wow and Wow! What were you waiting to start creating those videos? You are a wonderful teacher, easy to follow. Thanks again and please keep making those videos!
@athelstanrex 10 months ago
This is amazing!!! Such amazing information. Please, please, please keep making these videos! I'd love to see your distillation of the deep learning field-- like what are the most promising directions that you see, etc?
@wkevwang 7 months ago
Great description! Thank you for sharing
@costahuang6568 10 months ago
Instant subscribe as well! Good to see researchers like yourself explaining topics in simpler terms 🤗
@edwardjhu 10 months ago
Thanks Costa!🦾
@Niamato_inc 5 months ago
Thank you wholeheartedly.
@iefan 10 months ago
Thanks for making this video, it's very helpful.
@chaeyoonjang9157 9 months ago
I was very impressed with your paper and video. Thank you.🥰 But I have one question. If the problem during the training phase is that a significant correlation between activations and weights arises with width, leading to blow-up, might it be enough to only use the aspect of muP that adjusts the learning rate based on width? Even if the initialization cannot be done according to muP, could this still ensure sufficient alignment? (In fact, I require mu-transfer for fine-tuning, yet I am currently dealing with a model not pre-trained using muP.)
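For readers unsure what "adjusting the learning rate based on width" refers to in the question above, here is a minimal sketch. It assumes the Adam-style rule in which only hidden (matrix-like) weights have their learning rate divided by the width multiplier, while input weights, biases, and output weights keep the base learning rate. The function name, the group names, and the example widths are hypothetical, chosen for illustration only. The sketch covers the learning-rate part alone and does not show the initialization or output-multiplier parts of muP, so it does not answer whether the learning-rate part is sufficient by itself.

```python
def mup_adam_learning_rates(base_lr: float, base_width: int, width: int) -> dict:
    """Per-group Adam learning rates under an assumed muP-style width rule.

    Hypothetical helper for illustration; not taken from the video or any library.
    """
    if base_width <= 0 or width <= 0:
        raise ValueError("widths must be positive")

    # How much wider the target model is than the small model that was tuned.
    width_mult = width / base_width

    return {
        # Assumption: input weights and biases keep the tuned learning rate.
        "input_weights_and_biases": base_lr,
        # Assumption: hidden weight matrices shrink their learning rate as width grows.
        "hidden_weights": base_lr / width_mult,
        # Assumption: output weights keep the tuned learning rate
        # (the separate output multiplier is not modeled here).
        "output_weights": base_lr,
    }


if __name__ == "__main__":
    # Learning rate tuned at width 256, then reused at larger widths.
    for width in (256, 1024, 4096):
        print(width, mup_adam_learning_rates(1e-3, 256, width))
```

At the base width every group uses the tuned learning rate unchanged; only the hidden-weight group changes as the model gets wider.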
@jimshtepa5423 11 days ago
Thank you for a great presentation. I am new to LLMs and would like to try running the code on GitHub. Can my local machine (MacBook M1) handle it? Or is it something for large enterprises with massive compute inventory?
@chamber3593 10 months ago
Instant subscribe, keep em coming 🐱🐱🐱
@brandomiranda6703 7 months ago
In my experience the parameter to scale for quicker fitting is the depth not the width. Width barely helps. I think extending mu-transfer to sufficiently large width but deeper net might be much better.
@saikalyan3966 10 months ago
Yo, big fan. Can you make a separate video on using muP for pre-training?
@edwardjhu 10 months ago
I might make another one focusing on how to implement muP!
@faridsaud6567 2 days ago
@edwardjhu that would be amazing!
@animastershorts 10 months ago
Can you please make a video demonstrating the parametrization on a real dataset? I'd love some inspiration for using this code!
@edwardjhu 10 months ago
What datasets do you have in mind?
@animastershorts 10 months ago
@edwardjhu I'm actually new to LLMs and I don't have a large dataset right now. I was hoping you had a few large datasets you used in this project that you could use again, so that I could see your project working and understand it more.
@jimshtepa5423 11 days ago
@edwardjhu Anything from Kaggle. Hotel booking data? In general, what is it good for?
@muralikrishna9141 8 months ago
Please give me a guide on how to master AI from beginner to advanced. I'm so confused and need some help with direction for the future. Please reply.
@SkySinghh 9 months ago
I want to know more. Can you tell me where you would start if you were me now, please?
@messengercreator 10 months ago
and I'll challenge u AI CHAT DEEPAI only explain and create a whole game and u and GPT 4