🐐Llama 3 Fine-Tune with RLHF [Free Colab 👇🏽]

13,755 views

Whispering AI


1 day ago

In this video, I'll show you the easiest, simplest, and fastest way to fine-tune Llama 2 on your local machine with a custom dataset! You can also use this tutorial to train or fine-tune any other large language model (LLM). In this tutorial, we use reinforcement learning from human feedback (RLHF) to train our Llama, which improves its performance.
This is the technique used to train such models, and in this video we will see how to fine-tune an LLM with it.
Please subscribe and like the video to help me keep motivated to make awesome videos like this one. :)
Free Google Colab for 4-bit QLoRA fine-tuning of the llama-2-7b model (see the sketch below)
Rise and Rejoice - Fine-tuning Llama 2 made easier with this Google Colab Tutorial
✍️Learn and write the code along with me.
🙏The hand promises that if you subscribe to the channel and like this video, it will release more tutorial videos.
👐I look forward to seeing you in future videos
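For readers who want a preview before opening the Colab, here is a rough sketch of what the 4-bit QLoRA setup mentioned above typically looks like with transformers, bitsandbytes, and peft. The checkpoint name and LoRA hyperparameters are illustrative placeholders, not necessarily the notebook's exact values.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint; swap in your own

# Quantize the base weights to 4-bit NF4 so the model fits on a free Colab GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Train only small LoRA adapters on top of the frozen 4-bit weights.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```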
TIMESTAMPS:
👉0:00 Intro.
👉1:15 Training the reward model for RLHF.
👉7:09 Going deeper into the reward model code for RLHF.
👉9:30 Training the policy model with PPO (Step 3); see the sketch below.
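As a rough orientation for the Step 3 section, this is the general shape of a trl PPO loop: the policy generates a summary, the reward model scores it, and PPOTrainer updates the policy. Folder names, hyperparameters, and the toy prompt are placeholders, and the exact trl API varies by version.

```python
import torch
from datasets import Dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from trl import PPOConfig, PPOTrainer, AutoModelForCausalLMWithValueHead

policy_path = "summarization_policy_new/"    # step-1 fine-tuned policy (assumed folder name)
reward_path = "summarization_reward_model/"  # step-2 reward model (assumed folder name)

tokenizer = AutoTokenizer.from_pretrained(policy_path)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLMWithValueHead.from_pretrained(policy_path)      # trainable policy + value head
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(policy_path)  # frozen reference for the KL penalty

reward_tokenizer = AutoTokenizer.from_pretrained(reward_path)
reward_model = AutoModelForSequenceClassification.from_pretrained(reward_path)

# Toy prompt; the notebook builds these from the summarization dataset.
dataset = Dataset.from_dict({"query": ["Summarize: The cat sat on the mat all day."]})
dataset = dataset.map(lambda x: {"input_ids": tokenizer.encode(x["query"])})
dataset.set_format(type="torch")

config = PPOConfig(batch_size=1, mini_batch_size=1, learning_rate=1.4e-5)
ppo_trainer = PPOTrainer(
    config, model, ref_model, tokenizer, dataset=dataset,
    data_collator=lambda data: {k: [d[k] for d in data] for k in data[0]},
)

gen_kwargs = {"max_new_tokens": 48, "do_sample": True, "pad_token_id": tokenizer.eos_token_id}

for batch in ppo_trainer.dataloader:
    query_tensors = batch["input_ids"]

    # 1) The policy generates a response for each query.
    response_tensors = [ppo_trainer.generate(q, **gen_kwargs).squeeze()[len(q):] for q in query_tensors]
    batch["response"] = [tokenizer.decode(r, skip_special_tokens=True) for r in response_tensors]

    # 2) The reward model scores each (query, response) pair: one scalar logit per pair.
    texts = [q + r for q, r in zip(batch["query"], batch["response"])]
    enc = reward_tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        rewards = list(reward_model(**enc).logits[:, 0])

    # 3) The PPO step nudges the policy toward higher reward while staying close to ref_model.
    stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
    ppo_trainer.log_stats(stats, batch, rewards)
```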
Links:
Dataset to train: huggingface.co/datasets/Carpe...
Reward_dataset:
huggingface.co/datasets/Carpe...
Notebook:
github.com/ashishjamarkattel/...
#llama #finetune #llama2 #artificialintelligence #tutorial #stepbystep #llm #largelanguagemodels #rlhf

Comments: 61
@rabin1620 10 months ago
Excellent topic
@WhisperingAI 10 months ago
Thank you. Glad you liked it
@brainybotnlp 10 months ago
Great content.
@WhisperingAI 10 months ago
Glad you liked it
@talhaanwar2911 10 months ago
thanks
@mohamedsatti3038 10 months ago
thank you
@WhisperingAI 10 months ago
Glad you liked it
@user-rl8di9dc9l 8 months ago
This video is really helpful. Thanks for sharing 🙌 But could you please share with us which versions of the libraries you are using?
@WhisperingAI 8 months ago
All the libraries are the latest versions, but I think transformers was 4.29.
@cookiearmy2960 3 months ago
How is the reward model trained? Can anyone explain in detail? I know that we use the StarCoder model with chosen and rejected input ids, but how are these mapped to a particular score, given that the output of the reward model is not binary but logits? How is it done here?
@user-cr1sk9fq6o 10 months ago
Thanks for your insightful sharing. The fine-tuned Llama 2 model returns an incomplete last sentence. Do you have any way to solve this?
@WhisperingAI 10 months ago
Try increasing the max length during inference.
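A minimal sketch of that fix at inference time; the model path is a placeholder, and max_new_tokens is the knob to raise if answers get cut off.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "summarization_policy_new/"  # assumed path of the fine-tuned model
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir)

prompt = "Summarize: The cat sat on the mat all day and refused to move."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=256,  # raise this if the last sentence is truncated
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```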
@dibyanshuchatterjee4126 3 months ago
Great video. Just a quick question: is it possible to intercept just the reward model's output for an LLM response, before the reward produced for each response goes from the reward model back into the LLM? Meaning, is there any way to use just the reward model to see which LLM responses were good vs. bad and store those results?
@WhisperingAI 3 months ago
Yes, you can. In step 3 there is a line that takes the result from the policy model and passes it to the reward model for a score. You can print that output.
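A small sketch of using the reward model on its own to score and store (prompt, response) pairs, outside the PPO update; the paths and helper function are illustrative, not the notebook's exact code.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

reward_path = "summarization_reward_model/"  # assumed path of the trained reward model
tokenizer = AutoTokenizer.from_pretrained(reward_path)
reward_model = AutoModelForSequenceClassification.from_pretrained(reward_path)

def reward_score(prompt: str, response: str) -> float:
    """Scalar reward the model assigns to a (prompt, response) pair; higher means better."""
    inputs = tokenizer(prompt + response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return reward_model(**inputs).logits[0, 0].item()

results = []
for prompt, response in [
    ("Summarize: ...", "A concise, faithful summary."),
    ("Summarize: ...", "Unrelated rambling."),
]:
    results.append({"prompt": prompt, "response": response, "reward": reward_score(prompt, response)})
print(results)  # inspect or persist these instead of feeding them back into PPO
```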
@_SHRUTIDAYAMA 6 months ago
Hey, this video is really helpful. Can you please explain how to give input and generate output after step 3? Also, when we create a UI, how will feedback from the UI be given to the policy model? Can you please make a video on it? It would be really helpful! Thanks :)
@WhisperingAI 6 months ago
Sure, I will try creating a short video for it within a couple of days.
@_SHRUTIDAYAMA 6 months ago
That will really be helpful! Thanks :) @@WhisperingAI
@shrutidayama8193 5 months ago
Hey, it would really be helpful if you make it... please help me.
@WhisperingAI 5 months ago
@@shrutidayama8193 It will be uploaded tomorrow. Thanks
@ivanleung6034 4 months ago
I noticed the reward model structure is the same as the fine-tuned model. As someone said, we could use a smaller model with far fewer parameters and layers as the reward model; that would work too, right?
@WhisperingAI 4 months ago
That works. In the case of the reward model, it is basically a sequence classification model with one head, so it produces only a single logit; I believe this is handled internally by the trl library.
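For anyone wondering how a single logit turns into a preference signal, this is the pairwise (Bradley-Terry style) objective that reward-model training in trl follows; the two sentences are toy stand-ins for a chosen and a rejected summary.

```python
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bigcode/tiny_starcoder_py"  # small base model, as in the video's reward step
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# One regression-style head: the model emits a single scalar score per sequence.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
model.config.pad_token_id = tokenizer.pad_token_id

chosen = tokenizer("A faithful summary of the post.", return_tensors="pt")
rejected = tokenizer("Irrelevant rambling.", return_tensors="pt")

r_chosen = model(**chosen).logits      # shape (1, 1): a score, not a class probability
r_rejected = model(**rejected).logits

# Push the chosen score above the rejected score; no explicit "correct" label is needed.
loss = -F.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
print(loss.item())
```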
@talhaanwar2911 10 months ago
Can you create a tutorial on inference for this?
@WhisperingAI 10 months ago
Sure
@HaroldKouadio-gj7uw 17 days ago
What about doing a translation task with the LLMs and reinforcing it with RLHF?
@WhisperingAI 17 days ago
We can do that
@mahmoudmohamed-lr9ql 1 month ago
Does this use a reference model and KL divergence?
@WhisperingAI 1 month ago
Yes, it uses both.
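A short sketch of where both show up when configuring trl's PPO: a frozen reference copy of the policy plus a KL coefficient that penalizes drifting away from it. Names and values are illustrative, and the exact arguments vary by trl version.

```python
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, create_reference_model

policy_path = "summarization_policy_new/"  # assumed path of the step-1 policy
model = AutoModelForCausalLMWithValueHead.from_pretrained(policy_path)

# Frozen copy of the policy; each PPO step measures the KL divergence against it.
ref_model = create_reference_model(model)

config = PPOConfig(
    batch_size=8,
    mini_batch_size=2,
    init_kl_coef=0.2,   # weight of the KL penalty folded into the reward
    adap_kl_ctrl=True,  # adapt the coefficient toward a target KL
    target=6.0,         # target KL for the adaptive controller
)
```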
@user-gv1bc2kk4b 7 months ago
Can you share the Jupyter folder? I don't have any idea about the paths.
@WhisperingAI 7 months ago
I have updated the paths in the code base; let me know if you still don't understand.
@user-gv1bc2kk4b 7 months ago
@@WhisperingAI How do I train a chatbot iteratively with user feedback, i.e., train the chatbot over time from user interactions and feedback? How do I do that with pretrained models?
@WhisperingAI 7 months ago
@@user-gv1bc2kk4b The answer is simple: you must keep retraining the model if you wish to train it with user feedback. Take the model out of production and retrain it with the new data, including older versions of the data with timestamps. Evaluate the model's performance and, if it's good, move the retrained model into production. But if you don't want to do that, the RAG method can be used instead; in that case, check this video of mine if it helps: kzbin.info/www/bejne/epOWh6Cse9Orb6s
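If you take the retraining route, the data side can be as simple as concatenating the new feedback with the historical data before rerunning the training steps; the file and column names below are made up for illustration.

```python
from datasets import load_dataset, concatenate_datasets

# Hypothetical feedback exports; each record has at least a prompt, a response,
# the user's rating, and a timestamp.
old = load_dataset("json", data_files="feedback_2024_01.json")["train"]
new = load_dataset("json", data_files="feedback_2024_02.json")["train"]

combined = concatenate_datasets([old, new]).sort("timestamp")
combined.to_json("feedback_all.json")
# Retrain the policy and reward model on feedback_all.json offline, evaluate,
# and only then swap the new checkpoint into production.
```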
@denidugamage2096 7 months ago
@@WhisperingAI My idea is creating a law-providing chatbot. It's a group project, and the others are doing the chatbot part and the NLP part; my part is RLHF. The chatbot dataset is the constitution. How do I train my chatbot with user feedback? I'm asking you because I don't have any idea about it 🥲
@WhisperingAI 7 months ago
Please watch my earlier video kzbin.info/www/bejne/eGbHmZSQha-ErpI if you want to do it for feedback. I used the Amazon reviews there.
@rajeepthapa5426 14 days ago
Are you Nepali?
@user-kl8ov7dg7d 9 months ago
First, thank you for the good video. I would like to ask you two questions. 1) In the third part of the Colab code, kzbin.info/www/bejne/iGPTkqiimJiDaK8 , I am confused about which model goes into the "model_path" of "starcoder_model = AutoModelForCausalLMWithValueHead.from_pretrained(model_path)". Is "model_path" "bigcode/tiny_starcoder_py" or "summarization_policy_new/"? Which one is correct? 2) Can I think of the first part ("Creating the policy model for human Evaluation") of the Colab code in the previous video as SFT training, and can I consider the resulting policy model the SFT model?
@WhisperingAI 9 months ago
That's actually the policy model that we trained in the first step, i.e., summarization_policy_new/, since in step 3 we are refining the model from step 1 with the reward. Hope that clarifies it. If you have any questions, feel free to ask; I would love to help.
@user-kl8ov7dg7d 9 months ago
@@WhisperingAI Thank you, it's clear. Could you please answer the second question regarding SFT?
@WhisperingAI 9 months ago
For the second question: don't think of it that way. In the first step we are simply fine-tuning the model (the model can be anything, like LLaMA, GPT, or StarCoder); SFT is just the library we use so we don't have to write the PyTorch code for the dataloader and training loop. So after the first step, the resulting policy model is a fine-tuned model, not an SFT model.
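For context, a minimal sketch of that first step with trl's SFTTrainer; the dataset name and text field are assumptions (the description's dataset link is truncated), and the exact SFTTrainer arguments vary by trl version.

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Assumed summarization dataset; substitute the one linked in the description.
dataset = load_dataset("CarperAI/openai_summarize_tldr", split="train[:1%]")

args = TrainingArguments(
    output_dir="summarization_policy_new",
    per_device_train_batch_size=2,
    num_train_epochs=1,
    logging_steps=10,
)

trainer = SFTTrainer(
    model="bigcode/tiny_starcoder_py",  # small base model used in the video's example
    args=args,
    train_dataset=dataset,
    dataset_text_field="prompt",        # assumed text column name
    max_seq_length=512,
)
trainer.train()
trainer.save_model("summarization_policy_new")
```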
@sayansamanta3775 8 months ago
@@WhisperingAI Hey, can you please tell which model we are using in the second step, where we have MODEL_PATH = "model/"? Is it bigcode/tiny_starcoder_py or the policy model trained in the first step?
@sanduntharaka4256 4 months ago
Can we use the same code for Llama 2?
@WhisperingAI 4 months ago
Yes, you can, but I guess you can't run it on Google Colab unless you use LoRA or 4-bit.
@sanduntharaka4256 4 months ago
@@WhisperingAI I'm using Kaggle notebooks. I have created the policy model, but in reward training it gives "IndexError: index out of range in self". Why?
@sanduntharaka4256 4 months ago
@@WhisperingAI And I have executed your same code in a high-RAM environment, but it gives the same error: "IndexError: index out of range in self". I want to apply RLHF to Llama 2; your video is the only one I found that covers RLHF.
@WhisperingAI 4 months ago
There might be some issue while loading the dataset or tokenizing. Can you share at which step you are facing this issue?
@WhisperingAI 4 months ago
Please check your dataloader and try running each step individually.
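For debugging, "IndexError: index out of range in self" usually means a token id falls outside the model's embedding table (for example, a pad token added without resizing the embeddings) or a sequence longer than the model's position limit. A quick check on one batch, with an assumed checkpoint name:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # whichever checkpoint you are fine-tuning
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # reuse eos instead of adding a new token

model = AutoModelForCausalLM.from_pretrained(model_name)

batch = tokenizer(
    ["a sample record from your dataset"],
    return_tensors="pt",
    padding=True,
    truncation=True,
    max_length=model.config.max_position_embeddings,
)

vocab_size = model.get_input_embeddings().num_embeddings
assert int(batch["input_ids"].max()) < vocab_size, "token id exceeds the embedding table"
# If you did add new tokens, resize once: model.resize_token_embeddings(len(tokenizer))
```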
@developer_deepak_bhattarai 9 months ago
Are you Nepali?