What is LoRA? Low-Rank Adaptation for finetuning LLMs EXPLAINED

50,870 views

AI Coffee Break with Letitia

1 day ago

Comments: 80
@MikeTon · 1 year ago
Insightful: especially the comparison of LoRA to prefix tuning and adapters at the end!
@AICoffeeBreak · 1 year ago
Thank you! Glad you liked it.
@michelcusteau3184 · 11 months ago
By far the clearest explanation on YouTube.
@AICoffeeBreak · 11 months ago
Thank you very much for the visit and for leaving this heartwarming comment!
@elinetshaaf75 · 11 months ago
True!
@AnthonyGarland · 1 year ago
Thanks!
@AICoffeeBreak · 1 year ago
Wow, thanks a lot! 😁
@moeinhasani8718 · 1 month ago
Thanks!
@AICoffeeBreak · 1 month ago
Wow, thank you!
@wholenutsanddonuts5741 · 1 year ago
I've been using LoRAs for a while now but didn't have a great understanding of how they work. Thank you for the explainer!
@wholenutsanddonuts5741 · 1 year ago
I assume this works the same for diffusion models like Stable Diffusion?
@AICoffeeBreak · 1 year ago
For any neural network. You just need to figure out, based on your application, which matrices to reduce and which not.
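To make "pick which matrices" concrete, here is a hedged sketch using the Hugging Face peft library with a GPT-2 backbone (an illustration, not the video's code): module names such as "c_attn" are model-specific, and for Stable Diffusion you would typically target the UNet's attention projections instead.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")

# Choose which weight matrices get a low-rank update. For GPT-2 the fused
# attention projection is called "c_attn"; other architectures use names
# like "q_proj" / "v_proj", and diffusion UNets have their own module names.
config = LoraConfig(
    r=8,                      # rank of the update
    lora_alpha=16,            # scaling factor
    lora_dropout=0.05,
    target_modules=["c_attn"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the small A/B matrices are trainable
```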
@wholenutsanddonuts5741 · 1 year ago
@AICoffeeBreak So super easy then! 😂 Seriously though, that's awesome to know!
@rockapedra1130 · 1 year ago
Perfect. This is exactly what I wanted to know. "Bite-sized" is right!
@SoulessGinge · 10 months ago
Very clear and straightforward. The explanation of matrix rank was especially helpful. Thank you for the video.
@AICoffeeBreak · 10 months ago
Thank you for the visit! Hope to see you again soon!
@azmathmoosa4324 · 6 days ago
Good, concise explanation.
@keshavsingh489 · 1 year ago
Such a simple explanation, thank you so much!!
@deviprasadkhatua · 1 year ago
Excellent explanation. Thanks!
@AICoffeeBreak · 1 year ago
Glad you enjoyed it!
@minkijung3 · 10 months ago
Thanks, Letitia. Your explanation was very clear and helpful for understanding the paper.
@AICoffeeBreak · 10 months ago
I'm so glad it's helpful to you!
@DerPylz · 1 year ago
Yay, thanks!
@katorea · 24 days ago
Loved your explanation! Thank you very much!! :D
@kindoblue · 1 year ago
Loved the explanation. Thanks!
@Lanc840930 · 1 year ago
Very comprehensive explanation! Thank you.
@Lanc840930 · 1 year ago
Thanks a lot. And I have a question about "linear dependence": is this mentioned in the original paper?
@AICoffeeBreak · 1 year ago
The paper talks about the rank of a matrix, so about linear dependence between rows / columns.
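For readers who want the rank idea in code, a minimal NumPy illustration (mine, not from the paper or the video): a 3×3 matrix whose third row is the sum of the first two has rank 2, and a rank-2 matrix can be written as the product of two thin matrices, which is exactly the form LoRA assumes for the update ΔW.

```python
import numpy as np

# The third row equals row 0 + row 1, i.e. it is linearly dependent on the others.
W = np.array([[1., 0., 2.],
              [0., 1., 1.],
              [1., 1., 3.]])
print(np.linalg.matrix_rank(W))  # 2

# A rank-2 matrix factors into two thin matrices, the trick LoRA uses for ΔW = B @ A.
B = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])          # shape (3, r) with r = 2
A = np.array([[1., 0., 2.],
              [0., 1., 1.]])      # shape (r, 3)
print(np.allclose(B @ A, W))      # True
```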
@Lanc840930 · 1 year ago
Oh, I see! Thank you 😊
@thecodest2498 · 10 months ago
Thank you sooooo much for this video. I started reading the paper and was very terrified by it, then I thought I should watch some YouTube videos. I watched one and was asleep halfway through. I woke up again, stumbled across your video, your coffee woke me up, and now I get LoRA. Thanks for your efforts.
@AICoffeeBreak · 10 months ago
Wow, this warms my coffee heart, thanks!
@GaryGan-US · 1 month ago
Very concise; what an amazing video.
@AICoffeeBreak · 1 month ago
Thank you!
@m.rr.c.1570 · 10 months ago
Thank you for clearing up my concepts regarding LoRA.
@ambivalentrecord · 1 year ago
Great explanation, Letitia.
@AICoffeeBreak · 1 year ago
Glad you think so! 😄
@darrensapalo · 5 months ago
Great explanation. Thank you!
@outliier · 1 year ago
What a great topic!
@BogdanOfficalPage · 2 months ago
Wow! Just great ❤
@amelieschreiber6502 · 1 year ago
LoRA is awesome! It also helps with overfitting in protein language models. Cool video!
@jarj5313 · 7 months ago
Thanks, that was a great explanation!
@karndeepsingh · 1 year ago
Thanks again for the amazing video. I would also request a detailed video on Flash Attention. Thanks!
@AICoffeeBreak · 1 year ago
Noted. It's on The List. Thanks! 😄
@pranav_tushar_sg · 1 year ago
Thanks!
@AICoffeeBreak · 1 year ago
You're welcome!
@varun-h9e · 11 months ago
Firstly, thanks for the amazing video. Can you also make a video about QLoRA?
@butterkaffee910 · 1 year ago
I love LoRA ❤ even for ViTs.
@soulfuljourney22 · 6 months ago
The concept of the rank of a matrix, taught in such an effective way.
@AICoffeeBreak · 6 months ago
Cheers!
@deepak_kori · 1 year ago
You are just amazing >>> so beautiful, so elegant, just wow 😇😇
@ArunkumarMTamil · 8 months ago
How does LoRA fine-tuning track the weight changes by creating two decomposition matrices? How is ΔW determined?
@kunalnikam9112 · 8 months ago
In LoRA, W_updated = W_0 + BA, where B and A are decomposed matrices with low rank. I wanted to ask: what do the parameters of B and A represent? Are they both parameters of the pre-trained model, are both learned on the target dataset, or does one (B) represent pre-trained model parameters and the other (A) the target dataset parameters? Please answer as soon as possible.
@yacinegaci2831 · 1 year ago
Great explanation, thanks for the video! I have a lingering question about LoRA: is it necessary to approximate the low-rank matrices of the difference weights (the ΔW in the video)? Or can we reduce the size of the original weight matrices? If I understood the video correctly, at the end of LoRA training I have the full parameters of the original model + the difference weights (in reduced size). My question is: why can't I learn low-rank matrices for the original weights as well?
@AICoffeeBreak · 1 year ago
Hi, in principle you can, even though I would expect you could lose some model performance. The idea of finetuning with LoRA is that the small finetuning updates should have low-rank matrices. BUT there is work using LoRA for pretraining, called ReLoRA. Here is the paper 👉 arxiv.org/pdf/2307.05695.pdf There is also this discussion going on on Reddit: 👉 www.reddit.com/r/MachineLearning/comments/13upogz/d_lora_weight_merge_every_n_step_for_pretraining/
@yacinegaci2831 · 10 months ago
@AICoffeeBreak Oh, that's amazing. Thanks for the answer, for the links, and for your great videos :)
@alirezafarzaneh2539 · 7 months ago
Thanks for the simple and educational video! If I'm not mistaken, prefix tuning is pretty much the same as embedding vectors in diffusion models! How cool is that? 😀
@AIShipped · 1 year ago
Why use weight matrices to start with if you can use the LoRA representation? Assuming you gain space, the only downside I can think of is the additional compute to get back the weight matrix. But that should be smaller than the gain from speeding up backpropagation.
@AICoffeeBreak · 1 year ago
Thanks for this question. You do not actually start with the weight matrices: you learn A and B directly, and from them you reconstruct the ΔW matrix. Sorry this was not clear enough in the video.
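A minimal PyTorch sketch of that answer, purely illustrative: the pre-trained weight W0 stays frozen, only the small A and B receive gradients, and ΔW = BA is applied on the fly in the forward pass (initialization follows the paper's convention of a small random A and a zero B, so the update starts at zero).

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pre-trained weight W0 plus a trainable low-rank update B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)      # W0 stays frozen
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # trainable
        self.B = nn.Parameter(torch.zeros(d_out, r))         # trainable, zero so ΔW starts at 0
        self.scaling = alpha / r

    def forward(self, x):
        # y = W0 x  +  (B @ A) x * scaling, i.e. (W0 + ΔW) x
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling

layer = LoRALinear(nn.Linear(4096, 4096), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 2 * 8 * 4096 = 65,536 instead of ~16.8M
```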
@bdennyw1 · 1 year ago
Fantastic video as always. QLoRA is even better if you are GPU-poor like me.
@onomatopeia891 · 10 months ago
Thanks! But how do we determine the correct rank? Is it just trial and error with the value of r?
@AICoffeeBreak · 10 months ago
Exactly. At least so far. Maybe some theoretical understanding will come up in time.
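In practice that trial and error is a small hyperparameter sweep over r; a hedged sketch below, where train_and_eval is a hypothetical stand-in for your own fine-tuning and validation loop, not a real library call.

```python
# Hypothetical sweep over the LoRA rank r; train_and_eval stands in for
# your own fine-tuning + validation routine (hypothetical helper).
best_r, best_score = None, float("-inf")
for r in (1, 2, 4, 8, 16, 32):
    score = train_and_eval(rank=r, alpha=2 * r)   # hypothetical helper
    if score > best_score:
        best_r, best_score = r, score
print(best_r, best_score)
```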
@alislounge · 8 months ago
Which one is the most and which one is the least compute-efficient: adapters, prefix tuning, or LoRA?
@dr.mikeybee · 1 year ago
If we knew what abstractions were handled layer by layer, we could make sure that the individual layers were trained to completely learn those abstractions. Let's hope Max Tegmark's work on introspection gets us there.
@terjeoseberg990 · 1 year ago
I thought this was long-range, wide-band radio communications.
@AICoffeeBreak · 1 year ago
🤣🤣
@Micetticat · 1 year ago
LoRA: how can it be so simple? 🤯
@AICoffeeBreak · 1 year ago
Kind of tells us that fine-tuning all parameters in an LM is overkill.
@davidromero1373 · 1 year ago
Hi, a question: can we use LoRA just to reduce the size of a model and run inference, or do we always have to train it?
@AICoffeeBreak · 1 year ago
LoRA just reduces the number of trainable parameters for fine-tuning. The number of parameters of the original model stays the same.
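A quick worked example of that distinction, with illustrative sizes (my numbers, not from the video): the full weight matrix is still stored and used at inference, while only the low-rank factors are trained; merging W0 + BA afterwards adds no extra parameters.

```python
d_out, d_in, r = 4096, 4096, 8                 # illustrative sizes
full_params = d_out * d_in                      # 16,777,216 weights, frozen but still stored
lora_params = r * (d_out + d_in)                # 65,536 trainable weights in A and B
print(full_params, lora_params, full_params // lora_params)  # 256x fewer trainable params
```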
@floriankowarsch8682 · 1 year ago
As always, amazing content! 😌 It's perfect to refresh knowledge & learn something new. What I think is interesting about LoRA is how strongly it actually regularizes fine-tuning: is it possible to overfit when using a very small matrix in LoRA? Can LoRA also harm optimization?
@TheRyulord · 1 year ago
Still possible to overfit, but more resistant to overfitting compared to a full finetune. All the work I've seen on LoRAs says that it's just as good as a full finetune in terms of task performance as long as your rank is high enough for the task. What's interesting is that the necessary rank is usually quite low (around 2) even for relatively big models (LLaMA 7B) and reasonably complex tasks. At least that's all the case for language modelling. It might be different for other domains.
@mkamp · 1 year ago
Absolutely awesome explanation. Would like to get your take on LoRA vs (IA)³ as well. It seems that people still prefer LoRA over (IA)³ even though the latter has slightly higher performance?
@ryanhewitt9902 · 1 year ago
Aren't we effectively using the same kind of trick when we train the transformer encoder / self-attention block? Assuming row vectors, we can use the form W_v⋅v.T⋅k⋅W_k.T⋅W_q⋅q.T. Ignoring the *application* of attention and focusing on its calculation, we get the form k⋅W_k.T⋅W_q⋅q.T. Since W_k and W_q are projection matrices from embedding length to dimension D_k, we have the same sort of low-rank decomposition, where D_k corresponds to "r" in your video. Is that right?
@mesochild · 8 months ago
What do I have to learn to understand this? Help please.
@dineth9d · 1 year ago
Thanks!
@AICoffeeBreak · 1 year ago
Welcome! :)