QLoRA paper explained (Efficient Finetuning of Quantized LLMs)

14,827 views

AI Bites

Comments
@pierluigiurru962 7 months ago
Your videos on LoRA finally made the concepts click for me. They were clearly explained! Thank you for the content you make.
@AIBites 7 months ago
Glad it helped. Welcome 😊
@IgorAherne 4 months ago
Thank you, that's a beautiful explanation! One thing I struggle to understand is the term "quantization blocks" at 4:30, and why we need several of them. My understanding from the video is that we consider using 3 blocks of 16 bits to describe a number, which is 48 bits and more expensive than a 32-bit float. But couldn't we just use 16*3 = 48 bits per number instead? Using 48 bits (without splitting them) would give us very high precision within the [0, 1] range, thanks to powers of two.
I did ask GPT, and it responded that there exist a 'Scale Factor' and a 'Zero-Point', constants that shift and stretch the distribution at 6:02. Although I do understand these might be those quantization constants, I am not entirely sure what the 64 blocks described in the video at 6:52 are. Is this because the rank of the matrix decomposition is 1, with 64 entries in both vectors?
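For readers puzzling over the same point: in QLoRA a weight is not described by several 16-bit blocks. Instead, the weight tensor is chopped into blocks of 64 values, and each block stores its own quantization constant (an absmax scale) alongside its 4-bit codes, which is what the 64 at 6:52 refers to; the paper then quantizes those constants themselves (double quantization). Below is a minimal NumPy sketch of block-wise absmax quantization, assuming a plain symmetric integer grid rather than the paper's NF4 codebook; the function names are illustrative.

```python
import numpy as np

def blockwise_absmax_quantize(weights, block_size=64, n_bits=4):
    """Illustrative block-wise absmax quantization (not the exact NF4 codebook).

    Each block of `block_size` weights gets its own quantization constant
    (the absmax scale), so a single outlier only distorts its own block.
    """
    flat = weights.reshape(-1, block_size)            # split into blocks of 64 values
    scales = np.abs(flat).max(axis=1, keepdims=True)  # one constant per block
    levels = 2 ** (n_bits - 1) - 1                    # e.g. 7 for a symmetric 4-bit grid
    q = np.round(flat / scales * levels).astype(np.int8)
    return q, scales

def blockwise_dequantize(q, scales, n_bits=4):
    levels = 2 ** (n_bits - 1) - 1
    return q.astype(np.float32) / levels * scales

w = np.random.randn(4, 128).astype(np.float32)        # 512 weights -> 8 blocks of 64
q, scales = blockwise_absmax_quantize(w)
w_hat = blockwise_dequantize(q, scales).reshape(w.shape)
print("max abs reconstruction error:", np.abs(w - w_hat).max())
```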
@prathameshdinkar2966 a month ago
Nicely explained! Keep the good work going!! 🤗
@AIBites a month ago
Thank you 🙂
@AIBites a month ago
Are you interested in more theory, or in hands-on, implementation-style videos? Your input would be very valuable 👍
@prathameshdinkar2966 a month ago
@@AIBites I'm interested in more videos on concept understanding as the implementations are easily available
@robiulislamusa a month ago
@@AIBites Yes, we all do.
@haz5248 10 months ago
That was really well explained, with intuitive diagrams and explanations. Thanks for the video, just subscribed.
@AIBites 10 months ago
thank you! :)
@cacamaricano 9 months ago
Thanks for connecting the dots!
@AIBites 9 months ago
Glad you liked it! :)
@vuluu4942 11 months ago
Thank you for the explanation! I find that it's very helpful!
@SudarakaYasindu 4 months ago
Awesome explanation! ❤
@AIBites 4 months ago
Glad you think so, and thank you indeed :)
@yayasy1362 4 months ago
I don’t understand why you say that LoRA is fast for inference… in any case you need to forward through the full-rank pretrained weights plus the low-rank finetuned weights.
@AIBites 4 months ago
Ah yes. If only we could quantize the weights, we could do better than the pre-trained weights. You are making a fair point here. Awesome, and thank you! :)
@yayasy1362 4 months ago
@@AIBites Yeah, if only we could replace the pretrained full-rank weights with the low-rank weights... Really nice video and illustrations! Thanks a lot!
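A worked sketch of the point this thread settles on: with plain (unquantized) LoRA, the low-rank update can be folded back into the pretrained weight after finetuning, so inference costs a single dense matmul, the same as the original model; during QLoRA finetuning the base weights stay in 4-bit, so the adapter runs as a separate low-rank path instead. The PyTorch snippet below is only illustrative; the dimensions and variable names are made up.

```python
import torch

d_out, d_in, r, alpha = 512, 512, 8, 16
W = torch.randn(d_out, d_in)           # frozen pretrained weight
A = torch.randn(r, d_in) * 0.01        # trained low-rank factors
B = torch.randn(d_out, r) * 0.01
x = torch.randn(4, d_in)               # a batch of activations

# Unmerged (as during finetuning): base path plus low-rank adapter path.
y_unmerged = x @ W.T + (alpha / r) * (x @ A.T @ B.T)

# Merged (for deployment): fold the adapter into the base weight once,
# then inference is a single matmul, identical in cost to the pretrained model.
W_merged = W + (alpha / r) * (B @ A)
y_merged = x @ W_merged.T

print(torch.allclose(y_unmerged, y_merged, atol=1e-3))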
@huitangtt 8 months ago
Very well explained
@AIBites 7 months ago
Thanks so much 😊
@JaishreeramCoder 8 months ago
Amazing explanation!
@AIBites 8 months ago
Glad you think so!
@yeduniya657 11 months ago
Hey. I need your help. I have a curated set of notes and books and I wish to use it to finetune a model. How can it be done?
@AIBites 11 months ago
Would you like to see a fine-tuning video on text data? Would that be useful? Do you have any suggestions on a dataset I could show fine-tuning on?
@yeduniya657 10 months ago
@@AIBites Yes, I have a suggestion: finetuning a model on my journal, in which I have written about the truth of nonduality and the illusory nature of reality. I am also actively curating books on truth, and would love your help.
@haz5248 10 months ago
@@AIBites That would be very helpful. There aren't many good videos on fine-tuning out there.
@AIBites 9 months ago
Hope the fine-tuning video was of some help.
@wilfredomartel7781 a year ago
Waiting to see it.
@AIBites a year ago
Sure! :)
@rahul.vpoojari6553 5 months ago
Thank you sire
@AIBites 4 months ago
my pleasure Rahul! :-)
@bharatbhusansau2996 2 months ago
Bro, your statement from 05:22 is completely wrong and misleading. LoRA is used for finetuning LLMs when full finetuning is not possible. It does so by freezing all model weights and incorporating and training low-rank matrices (A*B) in the attention modules. LoRA speeds up training and reduces memory requirements, but it does not provide a speedup during inference. If the model is too large to be handled by LoRA due to GPU memory limitations, Quantized LoRA is used to finetune it. Overall, QLoRA is a more advanced solution for when LoRA alone cannot handle large models for finetuning.
@AIBites a month ago
Thanks for your feedback. I think we are pretty much on the same page. Can you be more specific about what I got wrong? Unfortunately I won't be able to edit the video, but I can at least pin a message to viewers pointing out the errata.
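To make the mechanics discussed in this thread concrete, here is a minimal PyTorch sketch of the pattern the comment describes: the pretrained layer is frozen and only the low-rank factors A and B are trained. This is not the bitsandbytes/PEFT implementation, and the class and parameter names are illustrative; in QLoRA the frozen base layer would additionally be stored in 4-bit NF4 and dequantized on the fly.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA wrapper: the pretrained layer is frozen; only A and B train."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                    # freeze all pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

# e.g. wrapping the query projection of an attention block
layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print("trainable parameters:", trainable)              # only A and B (2 * 8 * 768)
```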