Great content. Really appreciate the step-by-step execution. Makes it a lot easier to understand.
@jacksmith-ih9rm • 20 days ago
Great!
@tharunbhaskar6795 • a month ago
Now how to scale it? Like, how do you run it on multiple GPUs, or on multiple nodes with multiple GPUs each?
@princecanuma • a month ago
You can use Hugging Face TRL, transformers, accelerate, or Axolotl to do that.
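For example, a minimal multi-GPU sketch using TRL with accelerate, assuming a recent TRL version; the model id and dataset below are placeholders, not the ones from the video:

# pip install trl accelerate datasets
# Launch on all local GPUs:  accelerate launch sft.py
# For multiple nodes, add --num_machines and --machine_rank to the launch command.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # example dataset

trainer = SFTTrainer(
    model="meta-llama/Meta-Llama-3-8B",  # example model id
    train_dataset=dataset,
    args=SFTConfig(output_dir="./out", per_device_train_batch_size=1),
)
trainer.train()  # accelerate shards the run across the visible GPUs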
@skanderbegvictor6487 • 2 months ago
Subscribed. Been following you on Twitter; I am currently trying to write custom kernels for graph machine learning in MLX and am stuck.
@princecanuma • 2 months ago
Great to hear 👌🏽 Keep up the good work!
@AZisk • 2 months ago
Good sound, good video. Looking forward to seeing it in action.
@princecanuma • 2 months ago
Thank you very much! ❤️ I’m using a Synco A2 mic.
@MaziyarPanahi • 2 months ago
A complete walkthrough! Thank you, king!
@princecanuma • 2 months ago
My pleasure @MaziyarPanahi
@gokayfem • 2 months ago
Let's go, king!!
@princecanuma • a month ago
Let’s go 🚀
@Create-The-Imaginable • 2 months ago
What did you say in the beginning? 😃 What language was that? 🤔
@princecanuma • 2 months ago
I said “hi, my name is Prince Canuma” in Polish 🇵🇱
@liyanan2004 • 2 months ago
Could you please make a tutorial on VLMs and how they work, from scratch, like this series of videos?
@princecanuma • 2 months ago
That’s a great idea! 💡 Will do 👌🏽
@Tuscani2005GT • 3 months ago
This channel is pure gold. Keep it up!
@princecanuma • 2 months ago
Thank you very much! Glad you enjoy it :) There is a lot more coming soon 🚀
@sharjeel_mazhar • 3 months ago
So in this series, you don't use any pre-trained weights? You build and train the model from scratch on a custom dataset?
@marinepower • 3 months ago
Removing every other layer, or something along those lines, would be much more effective. If you think about it, that just means each remaining layer needs to do the work of two layers (itself plus one missing neighbor), whereas if you lop off half the network you suddenly need to learn 16 layers' worth of processing in one fell swoop. And not only that, but your old layers need to be retrained, since it is no longer sufficient for them to do just the one layer of work they were doing before. Basically, removing every other layer is a fine-tune; lopping off half the network is a cataclysmic change that (almost) requires training a brand-new model from scratch.
@marinepower • 3 months ago
The only thing that saves this technique is using the learned embeddings / the learned output layer, but you get that with strided layer removal too. Wish I had seen this video earlier, I'd have saved you $500, lol.
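For reference, a minimal sketch of the strided ("every other layer") removal described above, assuming a Hugging Face Llama-style model whose decoder blocks live in model.model.layers (attribute paths can vary across transformers versions; the model id is just an example):

import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")  # example id
kept = [layer for i, layer in enumerate(model.model.layers) if i % 2 == 0]  # drop every other block
for i, layer in enumerate(kept):
    layer.self_attn.layer_idx = i  # keep KV-cache indexing consistent after pruning
model.model.layers = nn.ModuleList(kept)
model.config.num_hidden_layers = len(kept)
# The learned embeddings and output head are reused as-is; a short fine-tune
# then lets the remaining layers adapt, instead of retraining from scratch.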
@wilfredomartel7781 • 3 months ago
😊
@wilfredomartel7781 • 3 months ago
😊🎉
@RadRebel4 • 3 months ago
Amazing video! Could you please upload the training scripts as well?
@princecanuma • a month ago
They are available in the video description. It’s an Axolotl config file.
@fliptip • 3 months ago
Such a high-quality piece of content!
@princecanuma • a month ago
Thank you very much!
@sharjeel_mazhar • 3 months ago
Can you please make sure that your future videos have higher resolution? Maybe 1440p or above? Other than that, great job! 💯
@linz4213 • 3 months ago
Well made, Prince! Learned a lot.
@maslaxali8826 • 3 months ago
CS programmers are vampires. My eeeeyyyes. Great content though!
@sergey_a • 3 months ago
Why are there only 3 likes? I put 4 on HF. :)
@spkgyk • 3 months ago
Why do you use a 32-bit paged optimizer when the model is being fine-tuned with QLoRA? Surely QLoRA stores the weights in 8-bit double-quantized form, so using a 32-bit optimizer makes no difference, and the weight updates need to be converted back to 8-bit anyway? Please help me understand this.
@princecanuma • 3 months ago
Additionally, 8-bit optimizer states are dequantized to 32-bit for the update anyway. huggingface.co/docs/bitsandbytes/main/en/explanations/optimizers
@spkgyk • 3 months ago
@@princecanuma Thank you for the quick response. With 8-bit optimizers, large models can be fine-tuned with 75% less GPU memory without losing any accuracy compared to training with standard 32-bit optimizers. The reduced memory requirements mean 8-bit optimizers are 4x faster than a standard optimizer, and no hyperparameter tuning is required. Surely this means that using 32-bit just wastes compute power? Please correct me if I'm wrong; I'm really trying to understand the benefits. Is it because training with 32-bit means that, despite converting to 8-bit for the weight update, the conversion leads to small accuracy gains?
@princecanuma • 3 months ago
There are no accuracy gains, only reduced GPU usage and potentially some extra speed. In terms of speed, I personally didn't notice any difference: I tested it yesterday, and besides the reduced GPU usage, it took just as long as the 32-bit run to complete training.
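To make the trade-off concrete, a small sketch of the two optimizer choices with bitsandbytes (paged optimizers need a CUDA GPU; the toy model and learning rate are placeholders):

import torch.nn as nn
import bitsandbytes as bnb

model = nn.Linear(16, 16).cuda()  # stand-in for the actual fine-tuned model

# 32-bit paged optimizer: full-precision states, paged out under memory pressure.
opt_32bit = bnb.optim.PagedAdamW32bit(model.parameters(), lr=2e-4)

# 8-bit paged optimizer: states are stored in 8-bit between steps and
# dequantized to 32-bit for the actual update, so memory drops while
# accuracy stays the same.
opt_8bit = bnb.optim.PagedAdamW8bit(model.parameters(), lr=2e-4)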
@PaoloTshiyole • 3 months ago
Your English is nice!
@princecanuma • 3 months ago
Thank you very much!
@leiray7465 • 3 months ago
Cool!
@princecanuma • 3 months ago
Awesome, I’m happy you liked it :)
@kishoretvk • 3 months ago
Thanks for committing to open source and educating people on cutting-edge knowledge.
@princecanuma • 3 months ago
Most welcome, it’s my pleasure!
@yoanijosias • 3 months ago
Very good, can’t wait to see updates to it.
@princecanuma • 3 months ago
You and me both!
@vivekpadman5248 • 4 months ago
Bro, how did you train Llama 3 without the paper?
@princecanuma • 4 months ago
Could you elaborate?
@vivekpadman5248 • 4 months ago
@@princecanuma As far as I know, there hasn't been an official Llama 3 paper released, and no data info either. But I could be wrong... 😅
@princecanuma • 4 months ago
@@vivekpadman5248 True, they only released a blog post detailing the data, model architecture, and performance. Here is how I did it: Llama 3 has the exact same architecture as Llama 2, which we already covered on this channel. kzbin.info/aero/PLDn_JsyofyfQp4td_ub6LfIg5vxyu6YJK&si=0Gyt9mdaA-ydiWOA Finally, if you understand how these models work, you don't need the paper; the code implementation is more than enough.
@vivekpadman5248 • 4 months ago
@@princecanuma Oh, understood. Thanks, I'll check it out, and also your video 💙
@princecanuma • 4 months ago
Most welcome :)
@ngamcode2485 • 4 months ago
This is very impressive and great content. Thank you!
@princecanuma • 4 months ago
You're very welcome!
@jihoonjung2776 • 4 months ago
Best video I've ever seen. Thanks~~!~!~!~!
@princecanuma • 4 months ago
Most welcome!
@princecanuma • 4 months ago
It’s my pleasure.
@sheikhakbar2067 • 4 months ago
Command-R is one of the best models out there for non-English / non-European languages. I tried it in Arabic; it's almost perfect, not as good as Claude (which is also perfect for Arabic). But as far as I understand, Command-R from Cohere (the community version, I guess) is free! Is that true, that it's free? (I know Command-R-Plus is not free.)
@kishoretvk • 4 months ago
Super impressive, great value. One question: how do I further train the model on my custom content instead of using LoRA? Can we do full training on it and add new memory?
@princecanuma • 4 months ago
Most welcome! You can do that, but it can be very expensive.
@AC-go1tp • 4 months ago
This is a very thoughtful and great initiative! Researchers with enough gray matter but limited means can still be in the game. Thank you, PC 🙏!
@princecanuma • 4 months ago
Most welcome! It’s my pleasure :) I lived through this so others don’t have to.
@ojasvisingh786 • 5 months ago
🥳🤩👏💐
@philgoddard8606 • 5 months ago
Thank you for the really nice entry into using Gemma locally! Could you share how to utilize the GPU on a Mac? I just got a Mac Studio and saw you had referenced some code earlier for NVIDIA. Thanks in advance :)
@princecanuma • 5 months ago
Most welcome! You can use MLX: github.com/ml-explore/mlx-examples/tree/main/llms
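For example, a minimal sketch with the mlx-lm package from that repo, which runs on the Mac's GPU via Metal (the model id is just an example):

# pip install mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/gemma-1.1-7b-it-4bit")  # example model id
print(generate(model, tokenizer, prompt="Hello, Gemma!", max_tokens=64))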
@sayantan336 • 5 months ago
Great work 🎉. It would be great if you could introduce a tutorial on coding GPT and BERT from scratch as well, using only PyTorch, and then show how to do their pre-training on custom data.
@princecanuma • 5 months ago
Thank you very much! Llama is pretty close to GPT, so I think BERT is more differentiated. What kind of data would you suggest?
@morningstar3996 • 5 months ago
Can we have the presentation, please?
@princecanuma • 5 months ago
Sure, here you go! www.canva.com/design/DAF7MlJ2Zoc/f75ryYIZnLc80NlIFZhS5A/edit?DAF7MlJ2Zoc&
@morningstar3996 • 5 months ago
@@princecanuma Appreciate it, my friend.
@girijeshthodupunuri1300 • 5 months ago
Great video! Learnt a lot.
@princecanuma • 5 months ago
Thank you very much! I’m happy you liked it :) There is so much more on the way.
@girijeshthodupunuri1300 • 5 months ago
@@princecanuma Could you go over how to implement a Parent Document retriever?
@princecanuma • 5 months ago
@user-vd7im8gc2w Why do you need position ids? You use them to map the input ids to their respective positions in the sequence. Example:
input_ids = torch.tensor([100, 20, 4, 50])
position_ids = torch.arange(input_ids.shape[-1])
print(position_ids)
>> tensor([0, 1, 2, 3])
@Frost-Head • 6 months ago
Keep up the good work!
@princecanuma • 6 months ago
Thank you!
@sayantan336 • 5 months ago
Brilliant 🎉
@princecanuma • 5 months ago
Thanks!
@Bebetter11111 • 6 months ago
First time watching your video. Keep going bro 💪, it's your friend Afzal.
@princecanuma • 6 months ago
Thank you very much, brother! It's been a long time, my friend :)
@RemekKinas • 6 months ago
Really great job!
@princecanuma • 6 months ago
Thank you very much, Remek! I’m happy you liked it :)
@dossantos4415 • 6 months ago
Hey, please continue with the Coding Llama 2 from scratch series.
@princecanuma • 6 months ago
Hey, thanks for watching and pinging me about part 3. Don’t worry, Coding Llama 2 from scratch part 3 should be up soon, potentially tomorrow :) The video has been recorded; however, it was delayed due to my first-ever graduation, which happened today, a very important moment for me. 👨🏾🎓
@tharunbhaskar6795 • 6 months ago
Waiting for the training part.
@princecanuma • 6 months ago
Working on it 👌🏽 The video should be out this week.
@banbephanboi4708 • 6 months ago
Great work! Waiting for your next videos.
@princecanuma • 6 months ago
Thank you very much! New videos dropping soon.
@CarlosAntunesZ • 7 months ago
Amazing video 🖖🏽
@princecanuma • 7 months ago
Thank you very much! I’m happy you enjoy it :)
@shihab-soft • 7 months ago
Thank you very much, this was very useful.
@princecanuma • 7 months ago
Most welcome :)
@illia_user • 7 months ago
Great job! Thank you!
@princecanuma • 7 months ago
Hi, thank you very much!
@buddhu9364 • 7 months ago
Is there a way I could go about doing the same thing on Windows and with Gemma?
@princecanuma • 7 months ago
Hi, thanks for watching! Yes, there is, and I will cover it in a future video soon. 👌🏽
@NitoKuvell • 7 months ago
Congratulations, Prince! It's a source of pride to see what you've become in the world of technology. Onward!
@princecanuma • 7 months ago
Thank you very much, brother! It means a lot coming from you :) Long time no see, let’s catch up.
@steliomoiane494 • 7 months ago
Wow, amazing, Prince! Thanks for sharing this very useful content.