QLoRA paper explained (Efficient Finetuning of Quantized LLMs)

12,651 views

AI Bites

1 day ago

Comments: 28
@IgorAherne
@IgorAherne 1 month ago
Thank you, that's a beautiful explanation! One thing I struggle to understand is the term "quantization blocks" at 4:30, and why we need several of them. My understanding from the video is that we consider using 3 blocks of 16 bits to describe a number, which is 48 bits and more expensive than a 32-bit float. But couldn't we just use 16*3 = 48 bits per number instead? Using 48 bits (without splitting them) would give very high precision within the [0, 1] range, thanks to the powers of two. I did ask GPT, and it responded that there is a "scale factor" and a "zero point": constants that shift and stretch the distribution at 6:02. Although I understand these might be those quantization constants, I am not entirely sure what the 64 blocks described at 6:52 are. Is it because the rank of the matrix decompositions is 1, with 64 entries in both vectors?
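For anyone else puzzled by this: in the QLoRA paper, 64 is the block size, not a bit count. The weight tensor is chopped into blocks of 64 values, and each block stores its own FP32 quantization constant (its absolute maximum), so an outlier only distorts the block it sits in. Double quantization then quantizes those per-block constants themselves. A minimal numpy sketch of the idea, assuming a simple linear 4-bit grid rather than the paper's NF4 quantile grid:

```python
# A minimal sketch of block-wise quantization (hypothetical helper, not the
# actual bitsandbytes kernel). Uses a plain linear 4-bit grid; QLoRA's NF4
# instead places the 16 levels at quantiles of a normal distribution.
import numpy as np

def blockwise_quantize(w, block_size=64):
    """Quantize a flat FP32 tensor in independent blocks of `block_size`.

    Each block keeps one FP32 quantization constant (its absmax) plus
    `block_size` 4-bit codes, so an outlier only distorts its own block.
    """
    blocks = w.reshape(-1, block_size)
    absmax = np.abs(blocks).max(axis=1, keepdims=True)  # one constant per block
    normalized = blocks / absmax                        # now in [-1, 1]
    codes = np.round(normalized * 7).astype(np.int8)    # levels in [-7, 7]
    return codes, absmax

def blockwise_dequantize(codes, absmax):
    return codes.astype(np.float32) / 7 * absmax

w = np.random.randn(4096).astype(np.float32)
codes, absmax = blockwise_quantize(w)
w_hat = blockwise_dequantize(codes, absmax).ravel()
print("max reconstruction error:", np.abs(w - w_hat).max())
```

Storage is then 4 bits per weight plus one 32-bit constant per 64 weights (0.5 extra bits per parameter); double quantization compresses those constants as well, saving roughly 0.37 bits per parameter according to the paper.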
@pierluigiurru962
@pierluigiurru962 4 months ago
Your videos on LoRA finally made the concepts click for me. It was clearly explained! Thank you for the content you make
@AIBites
@AIBites 4 months ago
Glad it helped. Welcome 😊
@haz5248
@haz5248 7 months ago
That was really well explained, with intuitive diagrams and explanations. Thanks for the video, just subscribed.
@AIBites
@AIBites 7 months ago
thank you! :)
@cacamaricano
@cacamaricano 6 months ago
Thanks for connecting the dots!
@AIBites
@AIBites 6 months ago
glad you liked it! :)
@SudarakaYasindu
@SudarakaYasindu 1 month ago
Awesome explanation! ❤
@AIBites
@AIBites 28 days ago
glad you think so and thank you indeed :)
@vuluu4942
@vuluu4942 8 months ago
Thank you for the explanation! I find that it's very helpful!
@huitangtt
@huitangtt 5 months ago
Very well explained
@AIBites
@AIBites 4 months ago
Thanks so much 😊
@JaishreeramCoder
@JaishreeramCoder 5 months ago
amazing explanation
@AIBites
@AIBites 5 months ago
Glad you think so!
@wilfredomartel7781
@wilfredomartel7781 9 months ago
Waiting to see it.
@AIBites
@AIBites 8 months ago
Sure! :)
@rahul.vpoojari6553
@rahul.vpoojari6553 2 months ago
Thank you sire
@AIBites
@AIBites 28 days ago
my pleasure Rahul! :-)
@yayasy1362
@yayasy1362 1 month ago
I don't understand why you say that LoRA is fast for inference… in any case you still have to forward through the full-rank pretrained weights plus the low-rank finetuned weights.
@AIBites
@AIBites 28 days ago
Ah yes, if only we could quantize the weights, we could do better than the pre-trained weights. You are making a fair point here. Awesome, and thank you! :)
@yayasy1362
@yayasy1362 28 days ago
@@AIBites Yeah, if only we could replace the pretrained full-rank weights with the low-rank weights... Really nice video and illustrations! Thanks a lot!
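To make the merging point concrete: with plain LoRA, once finetuning is done the low-rank update can be folded into the frozen weight, so every forward pass costs exactly one matmul, the same as the base model. A minimal numpy sketch (illustrative shapes only):

```python
# A minimal sketch (illustrative shapes only) of merging a LoRA update
# into the frozen weight after finetuning.
import numpy as np

d, r = 512, 8
W = np.random.randn(d, d).astype(np.float32)  # frozen pretrained weight
B = np.random.randn(d, r).astype(np.float32)  # trained LoRA factor (d x r)
A = np.random.randn(r, d).astype(np.float32)  # trained LoRA factor (r x d)
x = np.random.randn(d).astype(np.float32)

# During training / unmerged inference: two paths through the layer.
y_unmerged = W @ x + B @ (A @ x)

# For deployment: fold the update into a single matrix once, so inference
# is one matmul, exactly like the base model.
W_merged = W + B @ A
y_merged = W_merged @ x

print(np.allclose(y_unmerged, y_merged, atol=1e-2))  # True up to FP32 rounding
```

With QLoRA the fold is less straightforward: the base weights are stored in 4-bit NF4 while the adapters stay in 16-bit, so W must be dequantized before BA can be merged in.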
@yeduniya657
@yeduniya657 8 months ago
Hey, I need your help. I have a curated set of notes and books, and I wish to use it to finetune a model. How can this be done?
@AIBites
@AIBites 8 months ago
Would you like to see a fine-tuning video on text data? Would that be useful? Do you have any suggestions on a dataset I could show fine-tuning on?
@yeduniya657
@yeduniya657 7 months ago
@@AIBites Yes, I have a suggestion: finetuning a model on my journal, in which I have written about the truth of nonduality and the illusory nature of reality. I am also actively curating books on truth, and would love your help.
@haz5248
@haz5248 7 months ago
@@AIBites That would be very helpful. There aren't many good videos on fine-tuning out there.
@AIBites
@AIBites 6 months ago
hope the fine-tuning video was of some help
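For anyone wanting to try this on their own notes, here is a hedged sketch of a QLoRA finetuning setup using the HuggingFace stack (transformers + peft + bitsandbytes); the model name and target modules are placeholders to adapt, and argument names may shift across library versions:

```python
# A hedged sketch of a QLoRA setup, not a definitive recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store base weights in 4 bit
    bnb_4bit_quant_type="nf4",              # QLoRA's NormalFloat4 data type
    bnb_4bit_use_double_quant=True,         # quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # dequantize to bf16 for matmuls
)

base = "meta-llama/Llama-2-7b-hf"           # placeholder base model
model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(base)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # attach adapters to attention
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # only the adapters are trainable

# From here, tokenize your notes/books and train with transformers.Trainer
# (or trl's SFTTrainer); the frozen 4-bit base never receives gradients.
```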