Fine-tune Mixtral 8x7B (MoE) on Custom Data - Step by Step Guide

37,879 views

Prompt Engineering

Comments: 71
@薇季芬 5 months ago
3:37 format
4:15 follow a different format
4:26 indicate the end of user input
4:33 special token to indicate the end of the model response
4:39 you need to provide your data in this format
5:08 def create_prompt
5:31 system message
6:16 load our base model
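For anyone following these timestamps, a rough sketch of what a create_prompt helper in this style could look like, assuming the Mixtral-Instruct [INST] ... [/INST] convention and a dataset with prompt/response fields. The field names, system message, and exact template here are illustrative, not the notebook's code:

```python
# Illustrative sketch only; not the notebook's exact code.
DEFAULT_SYSTEM_MESSAGE = "Use the provided input to respond to the instruction."

def create_prompt(sample, system_message=DEFAULT_SYSTEM_MESSAGE):
    """Wrap one dataset example in a Mixtral-Instruct style template.

    [INST] ... [/INST] closes the user turn; </s> marks the end of the
    model response so the model learns where to stop generating.
    """
    user_input = sample["prompt"]
    model_output = sample["response"]
    # Note: the video reportedly swaps these two fields (the model is asked to
    # produce the instruction from the provided text), so map input/output to
    # whichever direction your task needs.
    return f"<s>[INST] {system_message}\n\n{user_input} [/INST] {model_output}</s>"
```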
@MikewasG 9 months ago
Thank you for sharing, this is very helpful! Looking forward to the next videos!
@dev_navdeep 7 months ago
Kudos, really simple and direct explanation.
@lukeskywalker7029 8 months ago
I'm sceptical that this is actually effectively training the Mixtral MoE model and not making it worse!
@AI-Makerspace 8 months ago
Thanks for the tag @Prompt Engineering! What else is your audience requesting the most these days? Would love to find ways to create some value for them together!
@engineerprompt 8 months ago
Thanks for the amazing work you guys are doing! Really appreciate it. I think deployment is a topic that will be really valuable to my audience. Let's explore how to collaborate.
@AI-Makerspace 8 months ago
@@engineerprompt Absolutely! We started delving deeper into deployment with LangServe and vLLM events in recent weeks. We'll connect to figure out next steps!
@Tiberiu255 9 months ago
Why are you using packing in the SFTTrainer if you just said that you're going to pad the examples?
@big_sock_bully3461 8 months ago
Can you explain?
@AbhishekShivkumar-ti6ru 8 months ago
Very nicely explained!
@shinygoomy2460 7 months ago
How do you format a prompt that has multiple requests and responses within the same context?
@varunnegi-v7z 9 months ago
Can you also make a video on fine-tuning multimodal models like LLaVA and CogVLM?
@kaio0777 9 months ago
Can you make this for home computer use, in terms of my personal data, and teach it to use tools on your system and online?
@divyagarh 4 months ago
Great video! Could you please consider training and deploying it in SageMaker?
@engineerprompt 4 months ago
I am going to create a video on deployment soon.
@ahmedmechergui8680 9 months ago
Thanks for the video 😃 I just have a question: is it possible to use the model through an API and also provide the source files for the data with the response?
@joaops4165 9 months ago
Could you make a tutorial teaching how to convert a model to GGML format?
@alexxx4434 9 months ago
Thanks for the guide! How do you continue the fine-tuning process in a case like this? Can you load previous work (LoRA) and carry on, or do you need to restart?
@engineerprompt 9 months ago
I think you can do that by storing different checkpoints.
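For reference, two common ways this is done in practice; a rough sketch assuming the TRL/PEFT stack used in the video, with placeholder paths and illustrative flags (check your library versions):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Option 1: resume an interrupted run from the trainer's saved checkpoints
# (optimizer and scheduler state included).
# trainer.train(resume_from_checkpoint=True)

# Option 2: load previously trained LoRA adapters back onto the base model
# and continue training them on new data.
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1", device_map="auto"
)
model = PeftModel.from_pretrained(
    base, "path/to/saved_lora_adapter", is_trainable=True
)
# `model` can now be passed to a fresh SFTTrainer run.
```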
@lostInSocialMedia. 9 months ago
Can you fine-tune uncensored models of this with Gemini Pro AI?
@PotatoMagnet 9 months ago
The base model of Mistral is uncensored, but you can't fine-tune one model with another model. They are different architectures; you can't even merge or fine-tune between the same model at different parameter counts, like 7B and 13B, so forget completely different models.
@protimaranipaul7107 7 months ago
Thank you for sharing such a wonderful video! Waiting for a video that discusses fine-tuning so that we can use more than 32k tokens. Have you or anyone else worked with the following? 0) How do we measure performance after fine-tuning? Did the models perform well? Perplexity? 1) JSON files? Creating graphs to store the context? 2) And/or large CSV/SQL files? The Llama SQL code is not working well. 3) Any image/diffusion models? Appreciate it!
@HarmeetSingh-ry6fm 8 months ago
Great video, just one question: can we use the fine-tuned model as a pickle file?
@abdeldjalilmouaz 6 months ago
Does this require Colab Pro to work?
@rishabhkumar4443 9 months ago
How can I use a generative model to manipulate the content of my website? E.g. showing a response from my site based on a prompt given by the user.
@AIEntusiast_ 7 months ago
I wish someone would make a video going from collecting data (e.g. PDFs) to converting it into a working dataset that can be used to train a model. Everyone is using Hugging Face models and just retraining another LLM.
@LakshayKumar-v2p 6 months ago
Could you please share the requirements.txt? I am having version conflicts despite using an A100 GPU!
@scortexfire 6 months ago
How do I fine-tune without prompts and instructions? I basically want the model to "know" about a thousand very recent web articles.
@engineerprompt 6 months ago
In this case, you probably want to further pretrain the base model on your dataset (you don't need the prompt & instruction format) and then fine-tune it on an instruction dataset. Or just use RAG.
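As a rough illustration of the "further pretrain on raw text" option; this is a sketch, not the video's notebook: the dataset file, column name, and the omission of quantization/LoRA settings are all simplifications, and the SFTTrainer arguments follow the TRL API used around the time of the video:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer

# Raw articles with a single text column; no prompt/response structure is
# needed for continued pretraining on a next-token objective.
dataset = load_dataset("json", data_files="articles.jsonl", split="train")

model_id = "mistralai/Mixtral-8x7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",   # plain article text
    packing=True,                # concatenate short articles into full windows
    max_seq_length=1024,
)
trainer.train()
```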
@sysadmin9396 9 months ago
Can I use this to train a model to answer questions from a list of PDFs?
@Akshatgiri 7 months ago
I've noticed that Mixtral 8x7B-Instruct (and other Mistral models) constantly repeats part of the system prompt. Have you noticed this / found a fix for it?
@researchforumonline 8 months ago
Thanks, what is the cost to do this? Server cost?
@garyhutson6270 8 months ago
What were your VM instance specs? Is it struggling with an A100?
@VerdonTrigance 7 months ago
Hi, thanks for this step-by-step guide, but in case we want the LLM to learn something new about our domain (let's say it will be the book Lord of the Rings) and we later want to ask our model open questions about this book (like "Where does Frodo get his sword?"), what should we do? We definitely cannot prepare a dataset in the form of Q&A, so it should be self-supervised training. But I have never seen examples of doing this, and I can't imagine how it is supposed to be done. Is it even possible? It looks like we should start from the base model, fine-tune it somehow with our book, and later apply instruction fine-tuning on top of it, right? But in that case someone still has to prepare the Q&A? I'm frustrated.
@xXCookieXx98 6 months ago
Your use case sounds like a classic RAG one. It's not necessary to fine-tune for that. Although a fine-tuned model + RAG would probably produce even better results, the effort here doesn't seem worth it. The video "Building Corrective RAG from scratch with open-source, local LLMs" from LangChain (kzbin.info/www/bejne/e2PWmaSpjtyrmc0) might help you. It also includes a web-search option in case the provided context isn't sufficient, which should work pretty well with things like popular books. So it's not limited to that and can be used in basically any domain. But you could also just build a RAG app without that. I would suggest a combination of a MultiQueryRetriever and a ParentDocumentRetriever for retrieving your context. Nevertheless, if you still want to fine-tune: from what I have learned so far, it is possible to create datasets using LLMs, e.g. you prompt an instruct LLM to create questions based on context chunks and then use those questions and chunks to create answers. You will find similar methods on this channel, e.g. "automate dataset creation for Llama-2 with GPT-4".
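A bare-bones sketch of that last idea (synthetic Q&A data from context chunks); the generate() function is a placeholder for whichever instruct LLM or API you use, and the prompts and file names are only illustrative:

```python
import json

def generate(prompt: str) -> str:
    """Placeholder: call your instruct LLM of choice here."""
    raise NotImplementedError

def make_qa_pairs(chunks, questions_per_chunk=2):
    """Turn raw context chunks into prompt/response records via an LLM."""
    records = []
    for chunk in chunks:
        q_prompt = (
            f"Write {questions_per_chunk} questions that can be answered "
            f"only from the following text:\n\n{chunk}"
        )
        questions = [q.strip() for q in generate(q_prompt).splitlines() if q.strip()]
        for question in questions[:questions_per_chunk]:
            a_prompt = f"Context:\n{chunk}\n\nQuestion: {question}\nAnswer:"
            records.append({"prompt": question, "response": generate(a_prompt)})
    return records

# Example usage (chunks would come from your own document splitter):
# with open("qa_dataset.jsonl", "w") as f:
#     f.writelines(json.dumps(r) + "\n" for r in make_qa_pairs(chunks))
```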
@jprobichaud 9 months ago
🎯 Key Takeaways for quick navigation:
00:00 🚀 Introduction to Fine-Tuning the Mixtral 8x7B Model
- Overview of the video's purpose: fine-tuning the Mixtral 8x7B model from Mistral AI on a custom dataset.
- Mention of the popularity and potential of Mixtral 8x7B as a mixture-of-experts model.
- Emphasis on practical considerations for fine-tuning, such as VRAM requirements and dataset details.
01:28 🛠️ Installing Required Packages and Dataset Overview
- Installation of necessary packages: transformers, TRL, accelerate, PyTorch, and bitsandbytes.
- Discussion of using the MosaicML Instruct-v3 dataset for fine-tuning.
- Overview of the dataset structure, splits, and sources.
03:45 📝 Formatting Data for Fine-Tuning Mixtral 8x7B
- Explanation of the prompt template for fine-tuning, specific to the Mixtral 8x7B Instruct version.
- Discussion of rearranging the data to make it more challenging by creating instructions from the provided text.
- Demonstration of a function to reformat the initial data into the desired prompt template.
06:28 🧩 Loading the Base Model and Configuring for Fine-Tuning
- Acknowledgment of the source of the notebook and clarification that the base version is used.
- Setting configurations, loading the model and tokenizer, and using Flash Attention.
- Explanation of the importance of setting up configurations for a smooth fine-tuning process.
08:18 🔄 Checking Base Model Responses Before Fine-Tuning
- Use of a function to check responses from the base model before any fine-tuning.
- Illustration of the base model's behavior when generating responses to a given prompt.
- Recognition that the base model tends to follow next-word prediction rather than explicit instructions.
10:06 📏 Determining Max Sequence Length for Fine-Tuning
- Explanation of the importance of max sequence length when fine-tuning Mixtral 8x7B.
- Presentation of a code snippet to analyze the distribution of sequence lengths in the dataset.
- Emphasis on selecting a max sequence length that covers the majority of examples.
12:20 🧠 Adding Adapters with LoRA for Fine-Tuning
- Overview of the Mixtral 8x7B architecture, focusing on the linear layers used for adding adapters.
- Introduction to the LoRA configuration for attaching adapters to specific layers.
- Demonstration of setting hyperparameters and using the TRL package for supervised fine-tuning.
14:36 🚥 Setting Up the Trainer and Initiating Fine-Tuning
- Verification of multiple GPUs for parallelization during model training.
- Definition of the output directory and selection of training epochs or steps.
- Importance of configuring the trainer, including considerations for max sequence length.
16:50 📈 Analyzing Fine-Tuning Results and Storing the Model
- Presentation of training and validation loss graphs, indicating a gradual decrease.
- Acknowledgment that potentially longer training is needed for better model performance.
- Demonstration of storing the fine-tuned model weights locally and pushing them to a Hugging Face repository.
17:46 🔄 Testing Fine-Tuned Model Responses
- Use of the fine-tuned model to generate responses to a given prompt.
- Comparison of responses before and after fine-tuning, showcasing improved adherence to instructions.
- Acknowledgment that further training could enhance the model's performance.
Made with HARPA AI
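To make the "load the base model" step in the summary above concrete, here is an approximate sketch of loading Mixtral in 4-bit with Flash Attention; the quantization settings and pad-token choice are typical defaults rather than the notebook's guaranteed values:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mixtral-8x7B-v0.1"  # base (non-Instruct) model

# 4-bit quantization so the weights fit on a single large GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # requires the flash-attn package
    device_map="auto",
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Mixtral ships without a pad token
```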
@MehdiMirzaeiAlavijeh 7 months ago
Please let me know how to create fixed forms with the structure below, using a special command to the LLM:
Give me a score out of 4 (based on the TOEFL rubric) without any explanation, just display the score.
General Description:
Topic Development:
Language Use:
Delivery:
Overall Score:
Identify the number of grammatical and vocabulary errors, providing a sentence-by-sentence breakdown.
'Sentence 1: Errors: Grammar: Vocabulary: Recommend effective academic vocabulary and grammar:'
'Sentence 2: Errors: Grammar: Vocabulary: Recommend effective academic vocabulary and grammar:'
.......
@LeoAr37 9 months ago
Can't we train the quantized version on a smaller GPU instead of training the full model?
@engineerprompt 9 months ago
Even training the quantized version of the full model will need a powerful GPU. That's why LoRA is used to add extra layers that are trained instead of the actual model. Hope this helps.
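For context, a typical LoRA setup for a run like this looks roughly as follows; the rank, alpha, and target module names are illustrative choices rather than the notebook's exact values, and `model` is assumed to be the quantized base model loaded earlier:

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# `model` is the quantized base model loaded as in the earlier sketch.
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                 # rank of the low-rank adapter matrices
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # Attach small trainable adapters to the attention projection layers;
    # the frozen (quantized) base weights are never updated.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
```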
@DistortedV12 9 months ago
Are you fine-tuning the Mixtral Instruct version they just released or the base model?
@engineerprompt 9 months ago
In this video, just the base version.
@kanshkansh6504 6 months ago
❤👍🏼
@pallavggupta 9 months ago
Hi, I am trying to build an organisation-level AI trained on my company data. I would like to know how I can create a dataset from my data to be trained on Mistral AI. I was unable to find any tutorial on how to create a dataset for large data.
@conscious_yogi 8 months ago
Did you find a solution for this?
@nishhaaann 8 months ago
Looking for the same thing @@conscious_yogi
@kunalr_ai 9 months ago
Where are you going to get 64 GB of VRAM from? No idea which dataset this was fine-tuned on, bro. This video is of no use to us: you already made your money from the views, but how are we supposed to make ours?
@Juan-n6k3c 6 months ago
So with two 3090s this should work? And what about using multiple different GPUs for training? For example, I have one 3090 Ti (24 GB) and one 4060 (8 GB).
@IshfaqAhmed-p6d 8 months ago
At 5:58, why is sample["response"] given as the input and sample["prompt"] given as the response?
@Zaheer-r4k 9 months ago
So can't we run this in a Colab or Kaggle notebook?
@ilianos 9 months ago
In the video description it says no (not on a T4).
@luciolrv 9 months ago
I could not run it on Colab's A100. It complains about a lack of memory, though not by much: actually less than 1 GB. Colab's "copilot" gives some suggestions such as reducing the batch size or the max_split_size_mb parameter, but that does not reduce it enough. Any ideas? Good notebook.
@jonjino 9 months ago
@@luciolrv It complains about less than 1 GB of memory, but that's because it's loading the model a bit at a time, so the error message isn't accurate. Kaggle doesn't offer better GPUs either. You'll need to set up a VM with an A100 80GB or H100. Unfortunately you'll probably just have to go through the hassle of setting up a VM with one of those GPUs via GCP or AWS.
@Ai-Marshal 7 months ago
That's a great video, thanks for sharing. After pushing the model to Hugging Face, how do you host it independently on RunPod using vLLM? When I try to do that, it gives me an error. I've tried searching a lot of videos and articles, but no luck so far.
@FunkyByteAcademy 7 months ago
Did you come right?
@electricskies1707 9 months ago
Can you clarify: 1 epoch would be one run through the full data (34,333 steps of your trimmed data). Why would you run this for 2 epochs? Does going over the data twice improve it? Also, how did you determine that 32 was a good batch size for this data size? (This is about 0.9% of the data?)
@LeoAr37 9 months ago
I think the companies that trained big LLMs usually used 2-3 epochs.
@engineerprompt 9 months ago
Batch size determines how much data is fed to your model at once. 32 is the max I could do on the available hardware; usually you will see it set much lower. Regarding the epochs, you are right: in one epoch, the model sees each example once. If you have a small amount of data, you might want to go over multiple epochs so the model can actually learn from it, but you need to be careful because the model can also overfit. For large amounts of data (billions or trillions of tokens) it is very expensive and time-consuming to run several epochs over the data; that's why you mostly see models trained for only one or two epochs. Hope this helps.
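To make those knobs concrete, a sketch of where batch size and epoch count live in a TRL-style run; the values, and the reuse of model, tokenizer, datasets, lora_config, and create_prompt from the earlier sketches, are assumptions rather than the video's exact configuration:

```python
from transformers import TrainingArguments
from trl import SFTTrainer

args = TrainingArguments(
    output_dir="mixtral-finetune",
    per_device_train_batch_size=32,   # limited by GPU memory; often much lower
    gradient_accumulation_steps=1,    # raise this to simulate a larger batch
    num_train_epochs=2,               # small datasets may benefit from >1 pass
    learning_rate=2e-4,
    logging_steps=10,
    save_strategy="epoch",
    bf16=True,
)

trainer = SFTTrainer(
    model=model,                      # quantized base model from earlier
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    peft_config=lora_config,
    formatting_func=create_prompt,    # the prompt-template helper sketched above
    packing=True,                     # packs formatted examples up to max_seq_length
    max_seq_length=1024,
    args=args,
)
trainer.train()
```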
@WelcomeToMyLife888 9 months ago
Awesome content as usual! Thanks!
@engineerprompt 9 months ago
Thank you 😊
@tomski2671 9 months ago
I think you can rent an H100 for $5/hour, so this would cost about $7.
@hemeleh8683 9 months ago
Where?
@DistortedV12 9 months ago
Awesome, man. Any idea how to get this running on a Colab GPU, or how to get the inference cost down?
@engineerprompt 9 months ago
Probably no way at the moment to run it on a Colab GPU, but you can look at the 2-bit quantized version. If you are running this model as part of a production pipeline, I would suggest looking at API providers such as Together AI. They have really good pricing on it.
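If you go the API-provider route, the call typically looks something like the sketch below, which assumes Together AI's OpenAI-compatible endpoint and the hosted Mixtral-Instruct model name; check the provider's docs for current values:

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_TOGETHER_API_KEY",               # assumption: a Together AI key
    base_url="https://api.together.xyz/v1",        # OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # hosted model name may differ
    messages=[{"role": "user", "content": "Explain LoRA fine-tuning in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```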
@caiyu538 8 months ago
Great
@bashafaris5908 9 months ago
🥹‼️ I am a student with no budget at all, but I'm interested in training one of the LLMs on my own dataset. What are the cost-effective ways to do this?
@jonjino 9 months ago
Get a 3B parameter model and play around with that. This can probably fit on the free T4 GPU in Google Colab since it's much smaller.
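As a starting point on the free tier, something along these lines should fit on a T4; the model name is just one example of a small chat model, and the 4-bit loading is optional:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # example ~1B-parameter model

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Fine-tuning lets a model", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```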
@user-jk9zr3sc5h 9 months ago
How much VRAM is necessary?
@engineerprompt 9 months ago
About 45GB
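As a very rough back-of-the-envelope for where a number in that range comes from (all figures approximate; actual usage depends on batch size, sequence length, and adapter size):

```python
# Approximate QLoRA memory budget for Mixtral 8x7B (~46.7B total parameters).
total_params = 46.7e9
weights_4bit_gb = total_params * 0.5 / 1e9    # ~23 GB of 4-bit weights
lora_params = 0.1e9                           # adapters: on the order of 100M params
adapter_training_gb = lora_params * 14 / 1e9  # adapter weights + grads + Adam states (rough)
activations_overhead_gb = 15                  # batch/sequence dependent, a guess
print(weights_4bit_gb + adapter_training_gb + activations_overhead_gb)
# roughly 40 GB, i.e. in the same ballpark as the ~45 GB quoted above
```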
@user-jk9zr3sc5h 9 months ago
@@engineerprompt Do you suggest fine-tuning on the base model first, and then further fine-tuning with Q&A instruct-format data?