Llama 3 Fine Tuning for Dummies (with 16k, 32k,... Context)

20,854 views

Nodematic Tutorials (a day ago)

Learn how to easily fine-tune Meta's powerful new Llama 3 language model using Unsloth in this step-by-step tutorial. We cover:
* Overview of Llama 3's 8B and 70B parameter models
* Benchmarks showing Llama 3's strong performance
* How to fine-tune Llama 3 efficiently with Unsloth
* Choosing sequence length based on your data
* Configuring the model, adapter layers, and hyperparameters
* Preparing a custom fine-tuning dataset
* Training the model and monitoring results
* Testing the fine-tuned model on new data
* Saving and publishing your custom model to Hugging Face
Code, example data, and resources provided. Fine-tune Llama 3 for summarization, question answering, analysis, and more. Integrate your model into applications, chatbots, and pipelines.
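As a taste of the workflow, here's a minimal Unsloth setup sketch (the checkpoint name and hyperparameters below are illustrative placeholders, not the exact values from the video - see the linked notebook for those):

```python
from unsloth import FastLanguageModel

# Load a 4-bit quantized Llama 3 8B base model. A max_seq_length beyond
# the native 8k triggers Unsloth's automatic RoPE scaling.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=16384,
    load_in_4bit=True,
)

# Attach QLoRA adapter layers to the attention and MLP projections.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
)
```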
10X Faster Cloud Architecture Diagrams (Use Promo Code YouTube24 for 25% Off):
softwaresim.com/pricing/
Demonstration Diagram and Code: github.com/nodematiclabs/llam...
ML Engineering Themed Music via Udio Beta
If you are a cloud, DevOps, or software engineer you'll probably find our wide range of YouTube tutorials, demonstrations, and walkthroughs useful - please consider subscribing to support the channel.
0:00 Introduction
1:38 Conceptual Overview
5:31 Initial Setup
7:05 Token Counting
9:57 Model and Quantization
10:23 QLoRA Adapter Layers
11:05 Dataset Preparation
16:43 Trainer Specification
18:43 Inference Testing
21:12 Model Saving (Hugging Face)

Comments: 38
@ralvin3614 (a month ago)
Really love the playlist! Great video.
@danielhanchen (a month ago)
Oh, fantastic video as always - absolutely packed with detailed information. Great work!
@slimpbisquitte3942 (a month ago)
Really comprehensive and well-explained! Great work! I wonder if it is also possible to fine-tune not a text generator but an image generator. Does anyone have any ideas? I am super new to this field and pretty much in the dark so far; I couldn't find anything for image generation yet :/ Thanks for any suggestions!
@nodematic (a month ago)
We'll try to make a video on this. Thanks for the suggestion.
@Itsgosm (a month ago)
Amazing video! I've been curious: if I had to train on a set of code samples with indentation (Python code, for example), would the data still need to be in the standard format of 'instruction', 'output', and 'input'? With 150+ fairly complex code samples, would it be possible to train? Are there any other ways to set up the dataset? And is Llama 3 capable of being trained on unstructured data?
@nodematic (a month ago)
Yes, you could use a different, non-Alpaca-style format. For the "text" field creation via string interpolation, replace that with a text block of your code lines (including line breaks). Llama-3 does well on HumanEval, so I suspect it would work well for your described use case. Just be careful with how you create your samples - getting the model to stop after generating the right line/block of code may not be easy (although you could trim things down with post-processing).
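A rough sketch of what that could look like (the code samples and EOS token below are placeholders - in practice take the EOS token from tokenizer.eos_token, as in the video):

```python
from datasets import Dataset

EOS_TOKEN = "<|end_of_text|>"  # placeholder; use tokenizer.eos_token in practice

# Raw code samples - indentation and line breaks are kept verbatim.
code_samples = [
    "def add(a, b):\n    return a + b\n",
    "def shout(text):\n    print(text.upper())\n",
]

# Each sample becomes one plain "text" string, ending with EOS so the
# model learns where a completed snippet stops.
dataset = Dataset.from_dict({"text": [s + EOS_TOKEN for s in code_samples]})
```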
@drnicwilliams (a month ago)
LOL “We don’t need this code, so let’s put it in a text cell”
@samfisher92sc (a month ago)
Great explanation. This could be a stupid question: how do we fine-tune for triggering function calls?
@nodematic (a month ago)
Thanks for your question - it's definitely not a stupid one! In your dataset, have fields like "instruction", "prompt", and "function", and then do the string interpolation to create your text field (you could do it similar to the video, but replace "### Story" with "### Prompt" and "### Summary" with "### Function"). Make sure your training set has a consistent format for the function to trigger, and a consistent fallback value for non-triggering cases. Overall, the process should be quite similar to the video. Your model itself won't be able to actually trigger the function - only identify the right function to trigger (and possibly the arguments to supply to the function). You'll need to execute the function as a "next step" in some broader pipeline, application, service, or script. Hope I'm understanding the question correctly and that helps.
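A hypothetical sketch of that interpolation (the field names, instruction, and example function are made up for illustration):

```python
def to_text(example):
    # Mirrors the video's Alpaca-style template, with the headers renamed.
    return (
        f"{example['instruction']}\n\n"
        f"### Prompt\n{example['prompt']}\n\n"
        f"### Function\n{example['function']}"
    )

sample = {
    "instruction": "Identify the function to trigger for the user's request. "
                   "Respond with NONE if no function applies.",
    "prompt": "Turn off the living room lights.",
    "function": "lights_off(room='living_room')",
}
print(to_text(sample))
```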
@alokrajsidhaarth7130 (a month ago)
Great video! I had a question about RoPE scaling: how efficient is it, and to what extent does it help solve the LLM context window size issue? Thanks!
@nodematic (a month ago)
RoPE scaling is the standard way to solve the context window size issue with these open models. It can come at a quality cost, but it's basically the best method we have if you need to go beyond the model's default context window. Use it only if you truly need the additional tokens. In the video's example, the RoPE scaling is needed because you simply can't summarize a 16k-token story by looking only at its second half of 8k tokens.
@npip99 (a month ago)
@nodematic Is there an easy API for RoPE? I don't even need fine-tuning - I just need a chat completion API for 32k-context Llama 3.
@nodematic (a month ago)
Yes, you can use RoPE without fine-tuning (e.g., off-the-shelf Llama 3 with a 32k context). I would recommend using the Hugging Face libraries, which can be configured for RoPE scaling (for example, TGI RoPE scaling is detailed at huggingface.co/docs/text-generation-inference/en/basic_tutorials/preparing_model).
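As a rough sketch with the plain transformers library (the rope_scaling keys below match older transformers releases - newer versions use "rope_type" instead of "type", so check your version's docs):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Llama 3's native context is 8k tokens; a linear factor of 4.0
# stretches the RoPE positions to roughly 32k.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    rope_scaling={"type": "linear", "factor": 4.0},
    device_map="auto",
)
```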
@adnenbenabdelaali6016 (a month ago)
Great video and nice code. Can you do this context-length extension for the DeepSeek Coder model?
@nodematic (a month ago)
I believe it's possible, but I haven't tried yet and there isn't an existing Unsloth model for this. We'll look into it though and try to create a video. Thanks for the suggestion.
@triynizzles (14 days ago)
I have been having tremendous difficulty - can this be run locally in VS Code?
@nodematic (12 days ago)
We haven't tested this, but it should work. The biggest concerns would be not having enough GPU memory on your local machine, or not having a clean Python package and CUDA setup.
@triynizzles (10 days ago)
@nodematic I have read more about it, and it looks like Windows isn't acting too friendly - most people are running Linux. :(
@user-fv8pi9uq6n (a month ago)
Do we need to create the repo first, before the push-to-hub command?
@nodematic (a month ago)
No, just replace "hf/model" with your username (or organization name) and desired model name. Also, if you want a private repo, add a private=True argument to push_to_hub_merged.
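For example (the repo id and token below are placeholders):

```python
# The repo is created automatically under your account if it doesn't exist.
model.push_to_hub_merged(
    "your-username/llama-3-summarizer",  # placeholder repo id
    tokenizer,
    save_method="merged_16bit",
    private=True,         # omit for a public repo
    token="hf_...",       # your Hugging Face write token
)
```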
@minidraco2601 (a month ago)
What's the name of the song at 3:47? Sounds pretty cool.
@nodematic (a month ago)
That's a Udio-generated custom song; it isn't published.
@simonstrandgaard5503 (a month ago)
Great explanation. The background music was a little distracting.
@nodematic (a month ago)
Thanks for the feedback - we'll keep this in mind on future videos.
@triynizzles (14 days ago)
Hello, I don't understand how, at 11:00, I can change "yahma/alpaca-cleaned" to a local .json file on my PC.
@nodematic (12 days ago)
The Hugging Face datasets library is used in either case, to compile a dataset of training strings. The load_dataset("yahma/alpaca-cleaned") approach (or similar) is only for when your dataset is hosted on Hugging Face. The Dataset.from_dict approach used in the video should work if you read in the data from your local JSON and use it for the dictionary's "text" value. Depending on how the text is structured in your JSON, you may need to do string interpolation - the end-result "text" values for the dataset need to be pure strings.
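A minimal sketch, assuming your JSON file is a list of objects with the fields shown (adjust the file name, field names, and interpolation to your actual structure):

```python
import json
from datasets import Dataset

# Assumed layout: [{"instruction": ..., "story": ..., "summary": ...}, ...]
with open("my_data.json") as f:
    records = json.load(f)

# Interpolate each record into one pure training string.
texts = [
    f"{r['instruction']}\n\n### Story\n{r['story']}\n\n### Summary\n{r['summary']}"
    for r in records
]
dataset = Dataset.from_dict({"text": texts})
```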
@triynizzles (10 days ago)
@nodematic Thank you! I may have more questions in the future. :)
@ShdowGarden (a month ago)
Hi, I am fine-tuning the Llama 3 model but I am facing some issues. Your video was great. I was hoping to connect with you - can we connect?
@nodematic (a month ago)
Thanks. You can reach out via email at community@nodematic.com. We often do not have the staff to handle technical troubleshooting or architectural consulting, but we'll answer if we can.
@nishitp28 (a month ago)
Nice video! What should the format be if I want to extract data from a chunk? Can I include something like:
"""
{Instruction or System Prompt}
### {Context or Chunks}
### {Question}
### {Answer}
"""
@nodematic (a month ago)
The "###" lines signify headers, so I wouldn't put your content on those lines - rather, they are used to categorize the line(s) of text below each header. If you're using a chunk of content (e.g., via some sort of RAG approach), yes, you could have that as a separate categorization. Something like: """ {instruction} ### Background {chunk} ### Question {question} ### Answer {answer} """ For the best results, use the header terms in your instruction. For the example above, this could be something like "Based on the provided background, which comes from documentation, FAQs, and/or support tickets, answer the supplied question as clearly and factually as possible. If the background is insufficient to answer the question, answer "I don't know".".
@artemvakhutinskiy900 (3 days ago)
Not gonna lie, the AI song was a banger.
@user-ti7fg7gh7t (27 days ago)
I hate how everyone doing Unsloth tutorials is unable to use a multi-GPU setup.
@ChituyiDalmasWakhusama (26 days ago)
Hi, I keep getting this error: "TypeError: argument of type 'NoneType' is not iterable". It originates from /usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py. Could you please share the requirements.txt? Also, it only happens when I try to push "merge_16bit" - "merge_4bit" works just fine!