QLoRA-How to Fine-tune an LLM on a Single GPU (w/ Python Code)

75,020 views

Shaw Talebi

1 day ago

Comments: 177
@ShawhinTalebi · 11 months ago
👉 More on LLMs: kzbin.info/aero/PLz-ep5RbHosU2hnz5ejezwaYpdMutMVB0

References:
[1] Fine-tuning LLMs: kzbin.info/www/bejne/m3SZeZdnnauppdU
[2] ZeRO paper: arxiv.org/abs/1910.02054
[3] QLoRA paper: arxiv.org/abs/2305.14314
[4] Phi-1 paper: arxiv.org/abs/2306.11644
[5] LoRA paper: arxiv.org/abs/2106.09685
@mouadkrikbou4596 · 11 months ago
Well done! Well explained. I am a data scientist as well and love your videos; a lot of work behind the scenes to bring the concepts in such a simple yet interactive way!! Many thanks, Shawhin!!
@ShawhinTalebi · 11 months ago
@@mouadkrikbou4596 Thanks! This one took longer than usual to put together, so glad you enjoyed it :)
@SpicyMelonYT · 7 months ago
@@ShawhinTalebi Somehow I feel like the most appropriate response to the comments in this thread would have been one written by ShawGPT, LOL. But yeah, mad respect for ya, this was great to watch!
@jinghao7708 · 2 months ago
I come from math, and a lot of the time the videos I've seen are too vague. Your videos are very clear and at the same time easy to understand. This is quite amazing. I appreciate your work!
@Ali-me4tv · 11 months ago
So far the best explanation on KZbin about this topic
@stanvanillo9831 · 8 months ago
Funny how this comment is under every LLM video :D
@soonheng1577 · 9 months ago
Wow, you are a genius at explaining super hard math concepts in layman-understandable terms with good visual representations. Keep it coming.
@manyagupta6375 · 11 months ago
Your explanations are amazing and the content is great. This is the best playlist on LLMs on KZbin.
@chris_zazzman · 10 months ago
Amazing work Shaw - complex concepts broken down to 'bit-sized bytes' for humans. Appreciate your time & efforts :)
@teslarocks · 1 month ago
Thank you so much for making this video! As pointed out by other people, your video is structured so well and so easy to understand.
@nargesmohammadi4126 · 1 month ago
Great content and concise explanation! I am really impressed with your technical knowledge and teaching skills.
@FrancescoFiamingo99 · 7 months ago
Dear Shaw, I have listened to the video many times, and aside from it being extremely well done (I learn so much from it), you should emphasize (or even make an ad hoc video on) the fact that the key to fine-tuning with "one" GPU is using the "quantized" version of Mistral. I'm sure many users would like to know more about these models; not many know how to use the most important LLMs (quantized) in their own Colab or even on their own PC... as the base of their own application :)
@ShawhinTalebi · 7 months ago
Thanks for the great suggestion! I'm organizing a fine-tuning series and will be sure to discuss quantization a bit more :)
@jazzny001 · 4 months ago
Thank you so much for the lecture. It really helped me understand the concept of QLoRA.
@africanbuffalo · 11 months ago
Thank you Shaw for yet another awesome video succinctly explaining complex topics!
@ShawhinTalebi · 11 months ago
Happy to help!
@TonyCerone · 13 days ago
Very amazing and instructive, @Shaw. Thank you for this great content!
@chifrijo5862 · 3 months ago
I believe the 264M parameters figure is because you are only targeting one type of layer ('q_proj'). I don't know what other linear layers Mistral 7B has, but they should be included for the best training performance. Correct me if I'm wrong, and apologies if this has been answered already. Great video!
@liubovnesterenko956 · 10 months ago
Thank you for this amazing video, great explanations, very clear and easy to understand!
@operitivo4635 · 8 months ago
At first I thought, omg, this video is horrible, but it's actually excellent! (I wanted a practical, fast way to get my LLM fine-tuned using my own data, but found it really isn't that easy.) After this I understood a lot better what is going on in the background.
@ShawhinTalebi · 8 months ago
Glad it was helpful :)
@MrCancerbero1983 · 11 months ago
This is the best explanation that I've ever heard, thanks for all the work!!
@bim-techs · 10 months ago
Amazing video ! You are the best, man ! Thank you so much.
@RohitJain-ls2ov · 11 months ago
Exactly what I was looking for! Thanks for the video. Keep going!
@ShawhinTalebi · 11 months ago
Great to hear :)
@ifycadeau · 11 months ago
Another fire video in the books!
@ShawhinTalebi · 11 months ago
Thanks! 🔥🔥😂
@donerbob8514 · 5 months ago
Very interesting video, even for more experienced watchers.
11 months ago
Learned a lot. Great video and very accessible. Well Done!
@ShawhinTalebi · 11 months ago
Great to hear! Glad it was helpful :)
@el_artmaga_ · 11 months ago
Great video and your slides are very well organized!
@ShawhinTalebi · 11 months ago
Glad you like them!
@BobTheZealot · 11 months ago
Great content, thank you!
@younespiro · 11 months ago
Thank you for sharing this knowledge; we need more videos like this.
@ShawhinTalebi · 11 months ago
Happy to help! More to come :)
@aldotanca9430 · 11 months ago
Loved this, very informative and clear!
@ShawhinTalebi · 11 months ago
Thanks Aldo!
@ai4sme · 11 months ago
Amazing explanation!!! Thank you Shaw!
@ShawhinTalebi · 11 months ago
Happy to help!
@telmorubioetxabe4638 · 7 months ago
Amazing work! Thanks mate :)
@madhurjindal1364 · 11 months ago
Man, you are amazing!
@wilfredomartel7781 · 11 months ago
❤ really amazing work
@BackSideStory · 20 days ago
Good tutorial 🎉
@Wonderfulworldhahaha · 9 months ago
Great content, thank you.
@guyhod3235 · 3 months ago
Thank you for the video. I was wondering about the fine-tuning process: does the model learn to predict all tokens (including the instructions), or does it learn to predict only non-instruction tokens?
@ShawhinTalebi · 3 months ago
Good question. The instruction tokens are fed into the model, so it does not predict them. However, the model does learn to recognize them via fine-tuning, so it's important to use a consistent structure for the training examples.
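To make that concrete, here is a minimal sketch of the common convention for masking instruction tokens out of the loss (assuming a Hugging Face-style tokenizer; prompt and response are hypothetical variables, and whether the notebook's data collator does exactly this is not shown in the video):

# prompt: the instruction text; response: the desired completion
prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
response_ids = tokenizer(response, add_special_tokens=False)["input_ids"]

input_ids = prompt_ids + response_ids
# positions labeled -100 are ignored by the cross-entropy loss,
# so the model is only scored on predicting the response tokens
labels = [-100] * len(prompt_ids) + response_ids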
@danishafzalkhan · 10 months ago
This was a value bomb!
@medec021 · 9 months ago
Awesome, thank you
@Holonet01 · 5 months ago
Nice tutorial, but I'm confused. At 33:35, why do we pass in the tokenized data there? If I try that, it fails because there's no "test" or "train" in the results. The docs say it expects the dataset, not the tokenized data. The tokenize function returns input_ids and attention_mask arrays.
@ShawhinTalebi · 5 months ago
These are the data we use for training the model. They need to be tokenized because the base model expects the text to be encoded in a specific way. Were you getting an error when running this example on Colab?
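For anyone hitting the same confusion, a minimal sketch (assuming the Hugging Face datasets API; the dataset name and column are hypothetical) of why the tokenized data keeps its "train"/"test" splits:

from datasets import load_dataset

data = load_dataset("username/comment-dataset")  # hypothetical dataset with train/test splits

def tokenize_function(examples):
    # returns input_ids and attention_mask for each example
    return tokenizer(examples["text"], truncation=True, max_length=512)

# map() preserves the split structure, so tokenized_data["train"] and
# tokenized_data["test"] both exist, which is what the Trainer setup expects
tokenized_data = data.map(tokenize_function, batched=True)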
@pawan3133 · 9 months ago
Beautifully explained, thanks!!! When you said, for PEFT, "we augment the model with additional parameters that are trainable", how do we add these parameters exactly? Do we add a new layer? Also, when we say "% trainable parameters out of total parameters", doesn't that mean that we are updating a certain % of the original parameters?
@ShawhinTalebi · 9 months ago
I explain how LoRA works here: kzbin.info/www/bejne/m3SZeZdnnauppdUsi=_3PK3Kj4Zxs844qg&t=6 Good question. We do not touch any of the original parameters. The percentage is just given to convey the relative computational savings of PEFT.
@CharanTeja_27 · 7 months ago
Your explanation of this topic is amazing, bro. Can you make a video on how to train a model on custom data (specifically Excel/CSV data) using an open-source model like Llama 3? Many people explain how to train on datasets that have only input and output, but no one explains how to train on Excel/CSV data. It would be helpful for my project if you could make a video on it.
@ShawhinTalebi · 7 months ago
Great suggestion! Could you share more about the use case you have in mind?
@CharanTeja_27 · 7 months ago
@@ShawhinTalebi For example, I have Excel data on different companies and their investments, yearly profit, sales growth, etc. Now I want to train my own model such that if I ask any question about the data, it gives me an answer. (e.g., What is the current price of company XYZ? Output: $123M)
@dennou2012 · 11 months ago
Great content!!
@ZixuanLiu-ld9qm · 7 months ago
Thank you so much for the video! Just one question: did you use free Colab or Colab Pro, or did you pay for the GPU? Thank you so much!
@ShawhinTalebi · 6 months ago
I have paid, but I used the T4 GPU, which is part of the free version. The only difference is that the free tier is subject to access restrictions during peak usage times.
@ZixuanLiu-ld9qm · 6 months ago
@@ShawhinTalebi Thank you so much!!
@Blooper1980 · 11 months ago
Thanks for this!!
@SmartTech-m1u · 5 months ago
At 8:30 it says 4-bit equals 0101, which is not true, because that binary number is 5 in decimal; 4 is 0100.
@ShawhinTalebi · 5 months ago
4-bit refers to the number of binary digits used to represent the number. 0101 is a random example of a number expressed in 4 bits, which is indeed 5 in the decimal system.
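A quick sanity check in Python:

print(int("0101", 2))    # 5: the decimal value of the 4-bit pattern 0101
print(format(4, "04b"))  # '0100': decimal 4 expressed in 4 bits
print(2 ** 4)            # 16: the number of distinct values 4 bits can represent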
@trsd8640 · 11 months ago
Thank you for this great video! If you find a way to get this working on Apple silicon machines, we would love to see a video about it!
@ShawhinTalebi · 11 months ago
Thanks for the suggestion! Once I get something working I'll be sure to share it.
@narendraparmar1631 · 10 months ago
Thanks Shaw
@dhirajkumarsahu999 · 9 months ago
Thank you so much! I have one doubt, please: even if we set fp16 = True, the optimization would still happen in FP32, right, like you showed at 20:22?
@ShawhinTalebi · 8 months ago
This enables mixed-precision training, which (to my understanding) uses FP16 for most operations and only stores the optimizer states in FP32.
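A minimal sketch of turning this on (assuming the Hugging Face transformers TrainingArguments API; the other values are placeholders):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="fine-tuned-model",   # placeholder path
    num_train_epochs=10,
    per_device_train_batch_size=4,
    fp16=True,  # mixed precision: most ops in FP16, master weights/optimizer states in FP32
)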
@dhirajkumarsahu999 · 8 months ago
@@ShawhinTalebi Got it, thank you!
@Etienne_O · 9 months ago
Thank you for sharing this! Have you tried fine-tuning a Mixture of Experts model like Mixtral 8x7B? Is the process very different? I want to do some testing of my own next week. Do you think this requires the same amount of VRAM as a 7B model, or more? I have a MacBook M3 Pro Max with 128GB of shared memory and a Mac Studio with 196GB of shared memory.
@ShawhinTalebi · 9 months ago
I haven't played with Mixtral 8x7B yet, so I don't have much insight. Hope to cover this in a future video :)
@ShawhinTalebi · 6 months ago
I just did a similar example on Mac: kzbin.info/www/bejne/aYGsopuah9-brqc Given your specs, you have more than enough to fine-tune Mixtral 8x7B.
@felixmuller9062 · 11 months ago
Great video!! How much GPU memory did you need in the end to fine-tune Mistral 7B?
@ShawhinTalebi · 10 months ago
Glad you liked it! It runs on Colab with 12.7GB system RAM and 15GB GPU RAM. I don't think it went above 10GB of GPU utilization.
@naehalmulazim · 10 months ago
Thank you SO much for covering this, sir! Small question: if I want to fine-tune a model to understand a new coding language whose syntax is similar to C++, any loose ideas or directions on how I would go about it?
@ShawhinTalebi · 10 months ago
There are many ways you can go about this. While I haven't done anything like that, I'd try taking an existing programming model like CodeLlama and doing self-supervised fine-tuning on example code with docstring-like comments.
@naehalmulazim · 10 months ago
@@ShawhinTalebi Thank you so much! Whether QLoRA should be used there, or whether I should skip PEFT fine-tuning and go for a full fine-tune, would depend on experiments, I guess.
@PayneMaximus · 4 months ago
Why is fine-tuning needed if we can use prompt engineering to make sure the model's response fits the conditions we are looking for?
@ShawhinTalebi · 4 months ago
Great question! If prompt engineering provides the needed performance, then fine-tuning may not be needed. However, the key benefit of fine-tuning in such cases is lower inference costs, since it can lead to shorter prompts.
@fl028 · 8 months ago
Hi everyone, I have a question about data preparation and fine-tuning of LLMs. What should the data format look like in the fine-tuning process? On the one hand, it can be pure text to add special knowledge to the LLM. On the other hand, the dataset can be structured in question-and-answer / prompt-and-response format. What do you think? Do you have any recommendations for me? Thank you and best regards!
@ShawhinTalebi · 8 months ago
Great question. This comes down to the use case. If you want to instill knowledge into the model, the former is suitable, while if you want the model to respond in a particular format, the latter might work better.
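As an illustration (made-up examples, not from the video), one record in each format might look like:

# format 1: raw text for self-supervised fine-tuning (instilling knowledge)
knowledge_example = {"text": "QLoRA combines a 4-bit quantized base model with trainable low-rank adapters."}

# format 2: prompt-response pairs for supervised fine-tuning (shaping behavior)
qa_example = {"prompt": "What does QLoRA stand for?", "response": "Quantized Low-Rank Adaptation."}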
@CRCaritas · 10 months ago
Thank you for the video! Just a small question: at the end, how would you run inference with your fine-tuned model? Do you save it first to the hub and then load it again? I'm not really sure how to correctly apply the LoRA adapter to the original model after fine-tuning.
@ShawhinTalebi · 10 months ago
Yes, that's how I do it here! There's example code for this in the Colab under "Load Fine-tuned Model": colab.research.google.com/drive/1AErkPgDderPW0dgE230OOjEysd0QV1sR?usp=sharing
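In outline, that step looks roughly like this (a sketch assuming the peft API; both repo paths are placeholders, see the Colab for the exact ones):

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_name = "path/to/quantized-base-model"  # placeholder for the quantized Mistral base
model = AutoModelForCausalLM.from_pretrained(base_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_name)

# attach the fine-tuned LoRA adapter from the hub (placeholder path)
model = PeftModel.from_pretrained(model, "username/fine-tuned-adapter")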
@pepeballesteros9488 · 9 months ago
What's the loss function for this NLP task? I mean, what is the quantitative measure that determines a good response from a bad one?
@ShawhinTalebi · 9 months ago
I believe cross entropy is used here.
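For reference, a sketch of the usual next-token cross-entropy setup (assuming logits and labels tensors from a forward pass; this is the standard causal-LM objective, not code from the video):

import torch.nn.functional as F

# logits: (batch, seq_len, vocab_size); labels: (batch, seq_len) token ids
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, logits.size(-1)),  # predictions at positions 0..N-1
    labels[:, 1:].reshape(-1),                    # targets shifted by one (next tokens)
    ignore_index=-100,                            # skip masked positions
)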
@pepeballesteros9488 · 9 months ago
@@ShawhinTalebi Cheers, I'll look into it. Amazing content, Shaw!
@eduardmart1237 · 6 months ago
Is it feasible to train a 7B QLoRA model on CPU only?
@ShawhinTalebi · 6 months ago
While I haven’t tried this, you can definitely do it. The key is to have at least 16GB of RAM (and some patience). Not sure how long it would take. This post might be helpful: www.reddit.com/r/LocalLLaMA/s/uqTTi6CLnS
@Eliot-nr7zq · 11 months ago
Thank you for sharing this fantastic video! Would it be worthwhile to explore a similar approach using unsupervised learning?
@ShawhinTalebi · 11 months ago
Glad you liked it! When it comes to fine-tuning, the closest thing would be semi-supervised learning. This could make sense if trying to further train a model on a knowledge base (e.g. sklearn documentation). However, empirically it seems fine-tuning tends to be a less effective way to endow a model with specialized knowledge compared to a RAG system.
@pisthaoct03 · 10 months ago
Thanks for sharing. I have a question: instead of the quantized model, can I load the base Mistral model and follow this process?
@ShawhinTalebi · 10 months ago
Yes, given that you have enough memory for the model.
@ahmadalhineidi6414 · 9 months ago
Great video and explanation! Thanks a lot. For the code, have you tried using:

from transformers import BitsAndBytesConfig

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)

and then passing that as the quantization config when loading the model? This would include the other aspects from the QLoRA paper, no?
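(For completeness, a sketch of how such a config is typically passed when loading, assuming the transformers API; model_name is a placeholder:)

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    model_name,                      # placeholder checkpoint name
    quantization_config=nf4_config,  # the BitsAndBytesConfig defined above
    device_map="auto",
)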
@ShawhinTalebi · 9 months ago
Thanks for sharing! I'll need to try that out. I remember running into issues when trying this on my first pass.
@thonnatigopi4962 · 8 months ago
How do we evaluate the model with BLEU score after training? Please make a video or notebook.
@ShawhinTalebi · 7 months ago
Thanks for the suggestion! I plan to do a video on LLM eval, and I'll be sure to touch on this.
@FrancescoFiamingo99 · 9 months ago
Dear Shaw, I'm a passionate old guy (I'm 54 :)) into AI. It's amazing how you can explain concepts in simple words, such that even an old mammoth like me can understand. Of course, being a total "artisan" in this field (my job is totally different), I'm facing problems that will look very simple to your eyes. Usually I ask ChatGPT-4 for support to learn, understand, and correct, but this topic and some of the Python libraries are too recent and not yet in the latest version of ChatGPT-4, so I need your help. I'm not using Colab because I already have a similar setup on my machine (16 + 16, like in your example), and I downloaded both the model and the dataset to my machine, but I'm getting this error:

ImportError: Found an incompatible version of auto-gptq. Found version 0.3.1, but only version above 0.4.99 are supported

I tried to upgrade my version, but it seems not to work:

ERROR: No matching distribution found for auto-gptq== (any higher than 0.3.1)

How can I solve the problem?
@ShawhinTalebi · 9 months ago
It seems like an issue setting up the environment. You can try manually setting the package versions when installing them on your machine based on the Google Colab code.
@FrancescoFiamingo99 · 9 months ago
@@ShawhinTalebi I will try and let you know. Thanks for the feedback!
@FrancescoFiamingo99 · 8 months ago
I did it!!! Following all your steps in Colab. Thanks a lot!!!!
@ccapp3389 · 10 months ago
Good stuff
@edsonjr6972 · 11 months ago
My guess is that q_proj has 264M parameters, and that's why it's showing only that.
@ShawhinTalebi · 11 months ago
Wouldn't that make it 264M trainable parameters then?
@itchainx4375 · 10 months ago
@@ShawhinTalebi The training is for smaller low-rank matrices.
@itchainx4375 · 10 months ago
It's not for that reason; you can try changing target_modules to see how the trainable parameter count changes.
@BenLewisE · 10 months ago
@@ShawhinTalebi I believe @edsonjr6972 is right and that the number of trainable parameters is reduced significantly because you are not _just_ targeting certain layers; you are also using LoRA decomposition into smaller low-rank matrices. So 264M is probably the number of all the parameters in the q_proj layers, and 2M is the ~1% of those parameters that you actually train due to LoRA.
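The arithmetic behind that (generic dimensions, not necessarily Mistral's): a full d_out x d_in weight has d_out*d_in parameters, while its rank-r LoRA factors have r*(d_in + d_out):

d_in, d_out, r = 4096, 4096, 8
full_params = d_in * d_out           # 16,777,216 in one full weight matrix
lora_params = r * (d_in + d_out)     # 65,536 in the rank-8 LoRA factors
print(lora_params / full_params)     # ~0.0039, i.e., about 0.4% of the full matrix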
@ShawhinTalebi · 7 months ago
@@BenLewisE Thanks for the clarification!
@nimesh.akalanka · 9 months ago
How can I fine-tune the Llama 3 8B model for free on my local hardware, specifically a ThinkStation P620 Tower Workstation with an AMD Ryzen Threadripper PRO 5945WX processor, 128 GB DDR4 RAM, and two NVIDIA RTX A4000 16GB GPUs in SLI? I am new to this and have prepared a dataset for training. Is this feasible?
@ShawhinTalebi · 9 months ago
That's a lot of firepower! You should be able to do full fine-tuning with that set up. Perhaps you can try using the example code as a jumping off point.
@jinghao7708 · 2 months ago
Is the dropout set to 0.05 or 0.5? I heard you say 0.5, but the code shows 0.05.
@ShawhinTalebi · 2 months ago
I probably misspoke. The truth is in the code.
@lalmimaj · 11 months ago
Hi, how long did it take you to fine-tune Mistral in this example?
@ShawhinTalebi · 10 months ago
It took about 10 minutes to run in Colab.
@jjen9595 · 11 months ago
Nice video, bro.
@anmolshah · 7 months ago
The video is now 3 months old; are there any updates to the code shown?
@A_Aphid · 7 months ago
I just tried it and it's working perfectly!
@anmolshah · 7 months ago
@A_Aphid There are some deprecation warnings. Plus, I was hoping for some information about any newer models to use. Could you suggest any newer models?
@A_Aphid · 7 months ago
@@anmolshah Yeah, you're right about the warnings. And no, I don't really have many suggestions on models. I'm just starting to get into AI, so I'm not the best. Sorry, man.
@ShawhinTalebi · 7 months ago
I haven't made any updates yet. However, feel free to take the code and hack it to try different models. I plan to do future videos using Llama 3 and other fine-tuning approaches.
@A_Aphid · 7 months ago
@@ShawhinTalebi Will you make any videos on how to fine-tune models without requiring an NVIDIA GPU (since auto-gptq requires CUDA), so that they can run locally? Or is that just not recommended for AI? (Also, great video, it taught me a lot 👍)
@pravingaikwad1337 · 9 months ago
Is it like the base model is stored in 4-bit, and as the data (X vector) passes through a layer, that layer is first dequantized and then the matrix multiplication is done (X*W)? And the same thing for LoRA as well? And after we get Y (by adding the outputs of the LoRA and base layers), are the W and LoRA layers quantized back to 4-bit and Y passed on to the next layer? Also, if the LoRA is at the base of the model, does that mean that to update the parameters of this LoRA we need to calculate the gradients of the loss w.r.t. all the W and LoRA matrices above it?
@ShawhinTalebi · 9 months ago
That's a great question. Honestly, I'm not entirely sure, but what you said makes sense. For inference, weights are dequantized layer by layer so that multiplication is possible with FP16 inputs, and there's no need to dequantize the LoRA weights since these are already FP16. There's no need to compute gradients for the original parameters because those are frozen, i.e., we treat them as constants.
@u04vw9 · 9 months ago
Have you solved the Mac issue? Thanks!
@HarshvardhanKanthode · 9 months ago
Let me know as well; I was pretty bummed when I found out bitsandbytes doesn't work on M2.
@ShawhinTalebi · 9 months ago
Not yet. However, now that Llama 3 is out, I have an excuse to spend more time with it. I hope to revisit this in June.
@ShawhinTalebi · 6 months ago
Yes! kzbin.info/www/bejne/aYGsopuah9-brqc
@yotubecreators47 · 10 months ago
I can't save this video; do you know why? Can you please enable saving videos to playlists?
@ShawhinTalebi · 10 months ago
That's strange. Are you still having this issue?
@yotubecreators47 · 10 months ago
No, I can save it now :). Thanks a lot!
@nikandr8685 · 10 months ago
When I tried this, I got this exception: "Cannot copy out of meta tensor; no data!" It happens in this step:

NotImplementedError Traceback (most recent call last)
Cell In[45], line 2
      1 # configure trainer
----> 2 trainer = transformers.Trainer(
      3     model=model,
      4     train_dataset=tokenized_data["train"],
      5     eval_dataset=tokenized_data["test"],
      6     args=training_args,
      7     data_collator=data_collator
      8 )
     10 # train model
     11 model.config.use_cache = False # silence the warnings. Please re-enable for inference!

Do you have any idea?
@ShawhinTalebi · 10 months ago
This link might be helpful: github.com/AUTOMATIC1111/stable-diffusion-webui/issues/13087
@nikandr8685 · 10 months ago
@@ShawhinTalebi Thank you. For others with the same problem, this solved it for me:

import sys
sys.argv.append("--disable-model-loading-ram-optimization")
@Jordano7000 · 10 months ago
Hey, would this model work if I wanted to input a DNA sequence (for example, ATCGTGC) and have the model respond with the gene name (for example, Gene X)?
@ShawhinTalebi · 10 months ago
I don't know honestly, but it's worth a try. LLMs have a funny way of surprising us.
@yangyang1412 · 9 months ago
What is the minimum VRAM spec for this tutorial?
@ShawhinTalebi · 9 months ago
Runs on Google Colab using 13GB of memory (6.5 CPU RAM + 6.5 VRAM).
@sparkledark3713 · 10 months ago
It's 264M parameters because those are the only ones that are trainable. The rest are frozen.
@ShawhinTalebi · 10 months ago
Frozen from LoRA or something else?
@sparkledark3713 · 9 months ago
@@ShawhinTalebi Like the main model parameters are frozen except the LoRA parameters. Maybe that's why.
@samyio4256 · 10 months ago
When you say "memory" do you mean RAM or VRAM?
@ShawhinTalebi · 10 months ago
Both! QLoRA specifically uses Nvidia's unified memory feature.
@jeffg4686 · 11 months ago
Any idea if GPTQ support is coming to Mac M1 at some point?
@ShawhinTalebi · 11 months ago
I doubt it. There is an alternative format that works on Mac called GGUF.
@jeffg4686 · 11 months ago
@@ShawhinTalebi - thanks!
@gk_12344 · 9 months ago
Does it work with GGUF models?
@ShawhinTalebi · 9 months ago
I didn't try it, but I'm sure there is a way to do that.
@xi8t-gk1oi · 11 months ago
fp16=True causes training to fail with the error "No inf checks were recorded for this optimizer." Setting fp16=False let training complete successfully, but loss and eval loss are the same for every epoch.
@xi8t-gk1oi · 11 months ago
I am trying to fine-tune on my own dataset of 20,000 messages in the format msg_id, sender_id, content, reply_to, interval (between this and the previous message) to generate similar messages in a similar format.
@ShawhinTalebi · 11 months ago
Are you running the provided script in Colab?
@xi8t-gk1oi · 11 months ago
@@ShawhinTalebi No, on my own machine.
@MrCancerbero1983 · 11 months ago
Same here, but training doesn't take effect; I got the same answer after training.
@MrCancerbero1983 · 11 months ago
I changed the torch version to match Colab via pip install torch==2.1.0, and it worked.
@sapandeepsandhu4410 · 10 months ago
Enlightening journey through the intricacies of Large Language Model (LLM) optimization! 🌌🖥 Your adept presentation not only demystifies the process but also serves as a beacon of inspiration for both burgeoning and seasoned developers navigating the vast seas of AI technology. The elegance with which you delineated the nuances of QLoRA and its transformative approach to fine-tuning LLMs on a singular GPU setup is nothing short of revelatory. 📘✨ It's a masterclass in making advanced AI technologies accessible and practical for a wider audience, empowering individuals to harness the full potential of LLMs without the necessity for extensive computational resources.
@PranavBaviskar · 9 months ago
Getting KeyError: Mistral.
@ShawhinTalebi · 9 months ago
I'm not able to replicate that error. Are you running the example in Colab?
@vijayakrishnak · 5 months ago
I think he didn't log in with an API key.
@Ev4Nou4 · 10 months ago
I'm so fucking lost.
@ShawhinTalebi · 10 months ago
This video goes pretty deep into the technical details. Watching some of the previous videos in the series might help give more context. I also do office hours if you have any specific questions: calendly.com/shawhintalebi/office-hours
@jjen9595 · 11 months ago
I get this error:

OSError: TheBloke/dolphin-2.6-mistral-7B-dpo-laser-GGUF does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.

What do I do? :c
@ShawhinTalebi · 10 months ago
Not sure, I haven't come across that one before
@jjen9595 · 10 months ago
@@ShawhinTalebi I solved it: you must put the correct model in the Colab, one that is similar to the one you have. I still don't know how to make a meta for Hugging Face :c
@manuelbradovent3562 · 10 months ago
Great video, thanks!
@manuelbradovent3562 · 10 months ago
Additionally, pruning was probably also performed, besides quantization, to get such a low number of trainable parameters.
@ShawhinTalebi · 10 months ago
Thanks for the tip! I'll need to dig into that.