Great content. Really appreciate the step-by-step execution. Makes it a lot easier to understand.
@jacksmith-ih9rm • 20 days ago
Great!
@tharunbhaskar6795 • a month ago
Now how to scale it? Like, how do you run it on multiple GPUs, or on multiple nodes with multiple GPUs each?
@princecanuma • a month ago
You can use Hugging Face TRL, transformers, accelerate, or Axolotl to do that.
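For example, a minimal multi-GPU sketch using TRL with accelerate, assuming a recent TRL version; the model id and dataset below are placeholders, not the ones from the video:

# pip install trl accelerate datasets
# Launch on all local GPUs:  accelerate launch sft.py
# For multiple nodes, add --num_machines and --machine_rank to the launch command.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # example dataset

trainer = SFTTrainer(
    model="meta-llama/Meta-Llama-3-8B",  # example model id
    train_dataset=dataset,
    args=SFTConfig(output_dir="./out", per_device_train_batch_size=1),
)
trainer.train()  # accelerate shards the run across the visible GPUs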
@skanderbegvictor6487 • 2 months ago
Subscribed. Been following you on Twitter; I am currently trying to write custom kernels for graph machine learning in MLX and am stuck.
@princecanuma • 2 months ago
Great to hear 👌🏽 Keep up the good work!
@AZisk • 2 months ago
Good sound, good video. Looking forward to seeing it in action.
@princecanuma • 2 months ago
Thank you very much! ❤️ I’m using a Synco A2 mic.
@MaziyarPanahi • 2 months ago
A complete walkthrough! Thank you, king!
@princecanuma • 2 months ago
My pleasure @MaziyarPanahi
@gokayfem • 2 months ago
Let's go, king!!
@princecanuma • a month ago
Let’s go 🚀
@Create-The-Imaginable • 2 months ago
What did you say in the beginning? 😃 What language was that? 🤔
@princecanuma • 2 months ago
I said “hi, my name is Prince Canuma” in Polish 🇵🇱
@liyanan2004 • 2 months ago
Could you please make a tutorial on VLMs and how they work, from scratch, like this series of videos?
@princecanuma • 2 months ago
That’s a great idea! 💡 Will do 👌🏽
@Tuscani2005GT • 3 months ago
This channel is pure gold. Keep it up!
@princecanuma • 2 months ago
Thank you very much! Glad you enjoy it :) There is a lot more coming soon 🚀
@sharjeel_mazhar • 3 months ago
So in this series, you don't use any pre-trained weights? You build and train the model from scratch on a custom dataset?
@marinepower • 3 months ago
Removing every other layer, or something along those lines, would be much more effective. If you think about it, that just means each remaining layer needs to do the work of two layers (itself plus one missing neighbor), whereas if you lop off half the network you suddenly need to learn 16 layers' worth of processing in one fell swoop. And not only that, but your old layers need to be retrained, since it is no longer sufficient for them to do just the one layer of work they were doing before. Basically, removing every other layer is a fine-tune; lopping off half the network is a cataclysmic change that (almost) requires training a brand-new model from scratch.
@marinepower • 3 months ago
The only thing that saves this technique is using the learned embeddings / the learned output layer, but you get that with strided layer removal too. Wish I had seen this video earlier, I'd have saved you $500, lol.
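For reference, a minimal sketch of the strided ("every other layer") removal described above, assuming a Hugging Face Llama-style model whose decoder blocks live in model.model.layers (attribute paths can vary across transformers versions; the model id is just an example):

import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")  # example id
kept = [layer for i, layer in enumerate(model.model.layers) if i % 2 == 0]  # drop every other block
for i, layer in enumerate(kept):
    layer.self_attn.layer_idx = i  # keep KV-cache indexing consistent after pruning
model.model.layers = nn.ModuleList(kept)
model.config.num_hidden_layers = len(kept)
# The learned embeddings and output head are reused as-is; a short fine-tune
# then lets the remaining layers adapt, instead of retraining from scratch.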
@wilfredomartel7781 • 3 months ago
😊
@wilfredomartel7781 • 3 months ago
😊🎉
@RadRebel4 • 3 months ago
Amazing video! Could you please upload the training scripts as well?
@princecanuma • a month ago
They are available in the video description. It’s an Axolotl config file.
@fliptip • 3 months ago
Such a high-quality piece of content!
@princecanuma • a month ago
Thank you very much!
@sharjeel_mazhar • 3 months ago
Can you please make sure that your future videos have higher resolution? Maybe 1440p or above? Other than that, great job! 💯
@linz4213 • 3 months ago
Well made, Prince! Learned a lot.
@maslaxali8826 • 3 months ago
CS programmers are vampires. My eeeeyyyes. Great content though!
@sergey_a • 3 months ago
Why are there only 3 likes? I put 4 on HF. :)
@spkgyk • 3 months ago
Why do you use a 32-bit paged optimizer when the model is being fine-tuned with QLoRA? Surely QLoRA stores the weights in 8-bit double-quantized form, so using a 32-bit optimizer makes no difference, and the weight updates need to be converted back to 8-bit anyway? Please help me understand this.
@princecanuma • 3 months ago
Additionally, 8-bit optimizer states are dequantized to 32-bit for the update anyway. huggingface.co/docs/bitsandbytes/main/en/explanations/optimizers
@spkgyk • 3 months ago
@@princecanuma Thank you for the quick response. With 8-bit optimizers, large models can be fine-tuned with 75% less GPU memory without losing any accuracy compared to training with standard 32-bit optimizers. The reduced memory requirements mean 8-bit optimizers are 4x faster than a standard optimizer, and no hyperparameter tuning is required. Surely this means that using 32-bit just wastes compute power? Please correct me if I'm wrong; I'm really trying to understand the benefits. Is it because training with 32-bit means that, despite converting to 8-bit for the weight update, the conversion leads to small accuracy gains?
@princecanuma • 3 months ago
There are no accuracy gains, only reduced GPU usage and potentially some extra speed. In terms of speed, I personally didn't notice any difference: I tested it yesterday, and besides the reduced GPU usage, it took just as long as the 32-bit run to complete training.
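To make the trade-off concrete, a small sketch of the two optimizer choices with bitsandbytes (paged optimizers need a CUDA GPU; the toy model and learning rate are placeholders):

import torch.nn as nn
import bitsandbytes as bnb

model = nn.Linear(16, 16).cuda()  # stand-in for the actual fine-tuned model

# 32-bit paged optimizer: full-precision states, paged out under memory pressure.
opt_32bit = bnb.optim.PagedAdamW32bit(model.parameters(), lr=2e-4)

# 8-bit paged optimizer: states are stored in 8-bit between steps and
# dequantized to 32-bit for the actual update, so memory drops while
# accuracy stays the same.
opt_8bit = bnb.optim.PagedAdamW8bit(model.parameters(), lr=2e-4)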
@PaoloTshiyole • 3 months ago
Your English is nice!
@princecanuma • 3 months ago
Thank you very much!
@leiray7465 • 3 months ago
Cool!
@princecanuma • 3 months ago
Awesome, I’m happy you liked it :)
@kishoretvk • 3 months ago
Thanks for committing to open source and educating people on cutting-edge knowledge.
@princecanuma • 3 months ago
Most welcome, it’s my pleasure!
@yoanijosias • 3 months ago
Very good, can’t wait to see updates to it.
@princecanuma • 3 months ago
You and me both!
@vivekpadman5248 • 4 months ago
Bro, how did you train Llama 3 without the paper?
@princecanuma • 4 months ago
Could you elaborate?
@vivekpadman5248 • 4 months ago
@@princecanuma As far as I know, there hasn't been an official Llama 3 paper released, and no data info either. But I could be wrong... 😅
@princecanuma • 4 months ago
@@vivekpadman5248 True, they only released a blog post detailing the data, model architecture, and performance. Here is how I did it: Llama 3 has the exact same architecture as Llama 2, which we already covered on this channel. kzbin.info/aero/PLDn_JsyofyfQp4td_ub6LfIg5vxyu6YJK&si=0Gyt9mdaA-ydiWOA Finally, if you understand how these models work, you don't need the paper; the code implementation is more than enough.
@vivekpadman5248 • 4 months ago
@@princecanuma Oh, understood. Thanks, I'll check it out, and also your video 💙
@princecanuma • 4 months ago
Most welcome :)
@ngamcode2485 • 4 months ago
This is very impressive and great content. Thank you!
@princecanuma • 4 months ago
You're very welcome!
@jihoonjung2776 • 4 months ago
Best video I've ever seen. Thanks~~!~!~!~!
@princecanuma • 4 months ago
Most welcome!
@princecanuma • 4 months ago
It’s my pleasure.
@sheikhakbar2067 • 4 months ago
Command-R is one of the best models out there for non-English / non-European languages. I tried it in Arabic; it's almost perfect, not as good as Claude (which is also perfect for Arabic). But as far as I understand, Command-R from Cohere (the community version, I guess) is free! Is that true, that it's free? (I know Command-R-Plus is not free.)
@kishoretvk • 4 months ago
Super impressive, great value. One question: how do I further train the model on my custom content instead of using LoRA? Can we do full training on it and add new memory?
@princecanuma • 4 months ago
Most welcome! You can do that, but it can be very expensive.
@AC-go1tp • 4 months ago
This is a very thoughtful and great initiative! Researchers with enough gray matter but limited means can still be in the game. Thank you, PC 🙏!
@princecanuma • 4 months ago
Most welcome! It’s my pleasure :) I lived through this so others don’t have to.
@ojasvisingh786 • 5 months ago
🥳🤩👏💐
@philgoddard8606 • 5 months ago
Thank you for the really nice entry into using Gemma locally! Could you share how to utilize the GPU on a Mac? I just got a Mac Studio and saw you had referenced some code earlier for NVIDIA. Thanks in advance :)
@princecanuma • 5 months ago
Most welcome! You can use MLX: github.com/ml-explore/mlx-examples/tree/main/llms
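For example, a minimal sketch with the mlx-lm package from that repo, which runs on the Mac's GPU via Metal (the model id is just an example):

# pip install mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/gemma-1.1-7b-it-4bit")  # example model id
print(generate(model, tokenizer, prompt="Hello, Gemma!", max_tokens=64))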
@sayantan336 • 5 months ago
Great work 🎉. It would be great if you could introduce a tutorial on coding GPT and BERT from scratch as well, using only PyTorch, and then show how to do their pre-training on custom data.
@princecanuma • 5 months ago
Thank you very much! Llama is pretty close to GPT, so I think BERT is more differentiated. What kind of data would you suggest?
@morningstar3996 • 5 months ago
Can we have the presentation, please?
@princecanuma • 5 months ago
Sure, here you go! www.canva.com/design/DAF7MlJ2Zoc/f75ryYIZnLc80NlIFZhS5A/edit?DAF7MlJ2Zoc&
@morningstar3996 • 5 months ago
@@princecanuma Appreciate it, my friend.
@girijeshthodupunuri1300 • 5 months ago
Great video! Learnt a lot.
@princecanuma • 5 months ago
Thank you very much! I’m happy you liked it :) There is so much more on the way.
@girijeshthodupunuri1300 • 5 months ago
@@princecanuma Could you go over how to implement a Parent Document retriever?
@princecanuma • 5 months ago
@user-vd7im8gc2w Why do you need position ids? You use them to map the input ids to their respective positions in the sequence. Example:
input_ids = torch.tensor([100, 20, 4, 50])
position_ids = torch.arange(input_ids.shape[-1])
print(position_ids)
>> tensor([0, 1, 2, 3])
@Frost-Head • 6 months ago
Keep up the good work!
@princecanuma • 6 months ago
Thank you!
@sayantan336 • 5 months ago
Brilliant 🎉
@princecanuma • 5 months ago
Thanks!
@Bebetter11111 • 6 months ago
First time watching your video. Keep going bro 💪, it's your friend Afzal.
@princecanuma • 6 months ago
Thank you very much, brother! It's been a long time, my friend :)
@RemekKinas • 6 months ago
Really great job!
@princecanuma • 6 months ago
Thank you very much, Remek! I’m happy you liked it :)
@dossantos4415 • 6 months ago
Hey, please continue with the Coding Llama 2 from scratch series.
@princecanuma • 6 months ago
Hey, thanks for watching and pinging me about part 3. Don’t worry, Coding Llama 2 from scratch part 3 should be up soon, potentially tomorrow :) The video has been recorded; however, it was delayed due to my first-ever graduation, which happened today, a very important moment for me. 👨🏾🎓
@tharunbhaskar6795 • 6 months ago
Waiting for the training part.
@princecanuma • 6 months ago
Working on it 👌🏽 The video should be out this week.
@banbephanboi4708 • 6 months ago
Great work! Waiting for your next videos.
@princecanuma • 6 months ago
Thank you very much! New videos dropping soon.
@CarlosAntunesZ • 7 months ago
Amazing video 🖖🏽
@princecanuma • 7 months ago
Thank you very much! I’m happy you enjoy it :)
@shihab-soft • 7 months ago
Thank you very much, this was very useful.
@princecanuma • 7 months ago
Most welcome :)
@illia_user • 7 months ago
Great job! Thank you!
@princecanuma • 7 months ago
Hi, thank you very much!
@buddhu9364 • 7 months ago
Is there a way I could go about doing the same thing on Windows and with Gemma?
@princecanuma • 7 months ago
Hi, thanks for watching! Yes, there is, and I will cover it in a future video soon. 👌🏽
@NitoKuvell • 7 months ago
Congratulations, Prince! It's a source of pride to see what you've become in the world of technology. Onward!
@princecanuma • 7 months ago
Thank you very much, brother! It means a lot coming from you :) Long time no see, let’s catch up.
@steliomoiane494 • 7 months ago
Wow, amazing, Prince! Thanks for sharing this very useful content.