Mistral Fine Tuning for Dummies (with 16k, 32k, 128k+ Context)

11,715 views

Nodematic Tutorials

A day ago

Discover how to fine-tune large language models (LLMs) with your own data in our latest tutorial video. We walk through a cost-effective and surprisingly simple process that leverages the Hugging Face and Unsloth libraries for memory-efficient, flexible model training. The walkthrough covers everything from selecting the right model on the Hugging Face Hub to preparing your data and running the fine-tuning on Colab resources, including a free-tier option. This guide is designed to demystify the fine-tuning process and make it accessible even to beginners.
Join us as we explore the Mistral 7B model and demonstrate how to maximize your fine-tuning outcomes with minimal cost.
10X Faster Cloud Architecture Diagrams (Use Promo Code KZbin24 for 25% Off):
softwaresim.com/pricing/
Demonstration Code and Diagram: github.com/nodematiclabs/mist...
If you are a cloud, DevOps, or software engineer, you'll probably find our wide range of YouTube tutorials, demonstrations, and walkthroughs useful - please consider subscribing to support the channel.
0:00 Conceptual Overview
3:02 Custom Data Preparation
8:17 Fine Tuning Notebook (T4)
16:52 Fine Tuning Notebook (A100)
19:13 Hugging Face Save and Usage

Comments: 27
@fugamante1539
@fugamante1539 2 months ago
Been using Mistral on Ollama, and it's pretty amazing how it compares to larger models like Gemini and Llama 2.
@danielhanchen
@danielhanchen 2 months ago
Super great work on the video and tutorial! Super insightful and just subscribed :) Thanks for sharing Unsloth as well! :)
@nodematic
@nodematic 2 months ago
Thanks, Daniel! Huge fan - appreciate what you guys are doing!
@fire17102
@fire17102 2 months ago
Awesome! Subscribed ❤
@alpapie
@alpapie 2 months ago
very good video.
@tmangono
@tmangono 2 months ago
Thanks!
@JavMend
@JavMend 20 days ago
Pretty cool video. I had two questions: 1) What is the difference between loading the model in 4-bit quantization but doing the tuning in 16-bit? Previously, I loaded a model in bfloat16 and didn't have to specify anything when doing tuning - maybe I am misunderstanding. 2) Do you have any video or code recommendations for where I can see fine-tuning without LoRA? I feel semi-committed to trying that first before going the LoRA route. Ty for the great video (and music hehe)
@nodematic
@nodematic 18 days ago
1) You can load the Mistral "base model" weights with 4-bit quantization, while using 16 bits for the fine-tuning adapter layers. So, for your fine-tuned model, most of the layers will be quantized, but with a few non-quantized layers "on top". This is often a sweet spot between computational demands and model accuracy (generally a tradeoff). 2) We don't currently have non-adapter fine-tuning videos (e.g., full-parameter fine-tuning), but will try to create a video on this topic. Thanks for watching and good luck.
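As a rough illustration, here's a minimal sketch of that setup using the Unsloth API from its public notebooks (the model name, sequence length, and LoRA settings below are assumptions - adjust them to your own run):

```python
from unsloth import FastLanguageModel

# Base Mistral weights loaded with 4-bit quantization (low memory footprint).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",  # assumed 4-bit base checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)

# LoRA adapter layers are added "on top" and trained in 16-bit precision.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                # adapter rank (illustrative value)
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```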
@nicolassuarez2933
@nicolassuarez2933 A month ago
Outstanding! Will this approach work for data extraction? Let's say I want all the titles of a book? Thanks!
@nodematic
@nodematic A month ago
Thanks! Yes, your fine-tuned model could definitely be focused on data extraction, like book titles.
@jrfcs18
@jrfcs18 2 months ago
Does this fine-tuning method work with a question-and-answer dataset? Do you need an instruction too? If so, what does the JSON format need to be?
@nodematic
@nodematic 2 months ago
Yes, it works with question-and-answer data. The format is flexible - the approach chosen in the video is a fairly standard format that we've seen work well with Mistral. For your dataset, change "### Story" to "### Question" and change "### Summary" to "### Answer". Then you could try leaving the instruction blank or using something generic like "Write a response that appropriately answers the question, while being as clear, factual, and helpful as possible.". I suspect that will work well for you.
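For illustration, a minimal sketch of that Q&A prompt format as a formatting function (the "question" and "answer" column names are assumptions about your dataset):

```python
qa_prompt = """{instruction}

### Question
{question}

### Answer
{answer}"""

def format_example(example, eos_token="</s>"):
    # Render one dataset row into a training prompt; the EOS token marks where
    # the answer ends so generation stops there at inference time.
    instruction = ("Write a response that appropriately answers the question, "
                   "while being as clear, factual, and helpful as possible.")
    return qa_prompt.format(
        instruction=instruction,
        question=example["question"],  # assumed column name
        answer=example["answer"],      # assumed column name
    ) + eos_token
```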
@geekyprogrammer4831
@geekyprogrammer4831 2 months ago
Can you please share the Colab notebook too?
@nodematic
@nodematic 2 months ago
Sure thing - the fine-tuning code is updated and maintained at the notebook links here: github.com/unslothai/unsloth?tab=readme-ov-file#-finetune-for-free.
@Vexxter09
@Vexxter09 2 months ago
Hi, can you make a follow-up video where you save the GGUF fine-tuned version directly? I tried importing llama.cpp using git clone but it didn't seem to work. Really looking forward to it - keep up the good work!
@nodematic
@nodematic 2 months ago
Yes, we'll try to make a video on this. Thanks for the suggestion.
@happygoblin8179
@happygoblin8179 2 months ago
Great video! How do I save a model using a saving strategy instead of taking the last one - is there an argument to add or something? I don't want to use the one at the end of training.
@nodematic
@nodematic 2 months ago
The trainer automatically saves checkpoints to the output_dir directory, so to use a specific checkpoint (e.g., with `from_pretrained`), point to that checkpoint's folder within output_dir. For example, from_pretrained might use "outputs/checkpoint-200".
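A minimal sketch of what loading such a checkpoint might look like with the Unsloth API (the checkpoint path and settings are illustrative - they depend on your output_dir and training run):

```python
from unsloth import FastLanguageModel

# Load the adapter weights saved at an intermediate checkpoint instead of the
# final ones; "outputs/checkpoint-200" is the example path from the reply above.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="outputs/checkpoint-200",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch the adapters to inference mode
```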
@happygoblin8179
@happygoblin8179 2 months ago
@nodematic Thank you so much, that was very helpful. Is the number of epochs controlled by max_steps? What should I set max_steps to for a dataset of 10k examples with about 1500 tokens each? Thank you so much - this is the best practical tutorial for fine-tuning I've found on YouTube.
@nodematic
@nodematic 2 months ago
The number of epochs is total_steps / (dataset size / effective batch size). The effective batch size is per_device_train_batch_size * gradient_accumulation_steps * number_of_devices, so 8 if you haven't changed these values in the notebook. For example, 500 steps, 10k examples, and an effective batch size of 8 would result in a little less than half an epoch through the dataset. For such a large fine-tuning dataset, maybe try 3000-5000 steps and figure out where the steps have significantly diminishing returns on the loss (you can stop the training cell when you see this, to save money). Or you could let it run through and plot the loss to help find that "sweet spot", then use the checkpoint at that training step. Also, feel free to specify the training in terms of `num_train_epochs` rather than `max_steps`, if that makes more sense for you.
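A quick worked check of that arithmetic, assuming the notebook's default batch settings (2 per-device batch size x 4 gradient-accumulation steps x 1 device = 8):

```python
# Assumed defaults from the notebook; change these to match your configuration.
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
number_of_devices = 1
effective_batch_size = (per_device_train_batch_size
                        * gradient_accumulation_steps
                        * number_of_devices)  # = 8

dataset_size = 10_000
max_steps = 500

epochs = max_steps / (dataset_size / effective_batch_size)
print(epochs)  # 0.4 -> a little less than half an epoch through the dataset
```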
@SaiyD
@SaiyD 2 months ago
Can you do a tutorial on how to make a Parquet dataset for a chat template like ChatML or others?
@nodematic
@nodematic 2 months ago
Yeah, we'll try to make a tutorial on this. Thanks for the suggestion.
@nodematic
@nodematic 2 months ago
It looks like you saw the new video (thanks for the comment). Posting the link here for others: kzbin.info/www/bejne/b3OxaoqwbsathMk.
@nimesh.akalanka
@nimesh.akalanka A month ago
How can I fine-tune the LLAMA 3 8B model for free on my local hardware, specifically a ThinkStation P620 Tower Workstation with an AMD Ryzen Threadripper PRO 5945WX processor, 128 GB DDR4 RAM, and two NVIDIA RTX A4000 16GB GPUs in SLI? I am new to this and have prepared a dataset for training. Is this feasible?
@nodematic
@nodematic A month ago
The approach highlighted in the video may work if your dataset doesn't have a very high token count. Just download the notebook and run it on your local machine. I haven't tried A4000s, but they're CUDA + Ampere cards, so they should work similarly. The fine-tuning would need to stay within 16 GB of GPU RAM, since the open-source, free Unsloth doesn't include multi-GPU support.
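If it helps, a minimal sketch of pinning the run to one of the two A4000s before any CUDA initialization (the device index is an assumption):

```python
import os

# Restrict the process to a single GPU, since the free Unsloth build is single-GPU.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # assumed: use the first A4000

import torch

# Confirm which card is visible and how much memory it has (should be ~16 GB).
print(torch.cuda.get_device_name(0))
print(f"{torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GiB")
```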
@nimesh.akalanka
@nimesh.akalanka A month ago
@@nodematic Thank you for the clarification
@Tubernameu123
@Tubernameu123 2 months ago
I like my LLMs with 2 shots of tequila and a lot less moderation.... #FineTuned