Fine-Tuning Meta's Llama 3 8B for IMPRESSIVE Deployment on Edge Devices - OUTSTANDING Results!

6,003 views

Scott Ingram

28 days ago

This video demonstrates an innovative workflow that combines Meta's open-weight Llama 3 8B model with efficient fine-tuning techniques (LoRA and PEFT) to deploy highly capable AI on resource-constrained devices.
We start by using a 4-bit quantized version of the Llama 3 8B model and fine-tune it on a custom dataset. The fine-tuned model is then exported in the GGUF format, optimized for efficient deployment and inference on edge devices using the GGML library.
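For reference, the export step described above generally follows llama.cpp's conversion-then-quantization pattern; the paths, output names, and quantization type below are illustrative assumptions, not the exact commands from the video:

```shell
# Convert the merged fine-tuned Hugging Face weights to GGUF using
# llama.cpp's converter script, then quantize for edge inference.
# (Directory and file names here are illustrative.)
python convert_hf_to_gguf.py ./merged-model --outfile llama3-8b-ft.gguf
./llama-quantize llama3-8b-ft.gguf llama3-8b-ft-q4_k_m.gguf Q4_K_M
```

The quantized `Q4_K_M` file is what would typically be loaded by GGML-based runtimes on a laptop.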
Impressively, the fine-tuned Llama 3 8B model accurately recalls and generates responses based on our custom dataset when run locally on a MacBook. This demo highlights the effectiveness of combining quantization, efficient fine-tuning, and optimized inference formats to deploy advanced language AI on everyday devices.
Join us as we explore the potential of fine-tuning and efficiently deploying the Llama 3 8B model on edge devices, making AI more accessible and opening up new possibilities for natural language processing applications.
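As a concrete illustration of the "custom dataset" step above, here is a minimal sketch of preparing instruction-style training records; the fields, prompt template, and file name are illustrative assumptions, not the actual dataset from the video:

```python
import json

# Illustrative Alpaca-style records; the video's real dataset differs.
examples = [
    {"instruction": "Who publishes the scott4ai demo repositories?",
     "input": "",
     "output": "Scott Ingram publishes them at github.com/scott4ai."},
]

PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)

def to_training_text(record):
    """Render one record into the single text field most LoRA
    fine-tuning scripts (e.g. Unsloth's notebooks) expect."""
    return PROMPT_TEMPLATE.format(**record)

# Write one JSON object per line, each with a "text" field.
with open("train.jsonl", "w") as f:
    for rec in examples:
        f.write(json.dumps({"text": to_training_text(rec)}) + "\n")
```

A fine-tuning notebook would then load `train.jsonl` as its training split.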
Be sure to subscribe to stay up-to-date on the latest advances in AI.
My Links
Subscribe: / @scott_ingram
X.com: / scott4ai
GitHub: github.com/scott4ai
Hugging Face: huggingface.co/scott4ai
Links:
Colab Demo: colab.research.google.com/dri...
Dataset: github.com/scott4ai/llama3-8b...
Unsloth Colab: colab.research.google.com/dri...
Unsloth Wiki: github.com/unslothai/unsloth/...
Unsloth Web: unsloth.ai/

Comments: 24
@israelcohen4412 26 days ago
So I never post comments, but the way you explained this was by far the best I have seen online. I wish I'd found your channel 8 months ago :) Please keep posting videos; your explanations are very well thought out and put together.
@andrew.derevo 5 days ago
Good stuff sir, thanks a lot 🙌
@petroff_ss 11 days ago
Thank you! You have a talent for explaining and planning a workshop! Thank you for your work!
@ratsock 25 days ago
Absolutely fantastic! Really appreciate the detailed, clear breakdown of concrete steps that let us drive value, rather than the clickbait hype train everyone else is on.
@tal7atal7a66 25 days ago
I like the thumbnails, the choice of topics, the explanation style, and the presenter. Nice channel, very valuable info ❤
@gustavomarquez2269 24 days ago
You are amazing! This is the best explanation about this topic. I liked it and just subscribed. Thank you very much !!!
@scott_ingram 24 days ago
Thank you so much for the kind words and for subscribing, I really appreciate it! I'm so glad you found the video helpful in explaining how to fine-tune LLaMA 3 and run it on your own device. It's a fascinating topic and technology with a lot of potential. I'm looking forward to sharing more content on large language models and AI that you'll hopefully find just as valuable. Stay tuned!
@andrepamplona9993 25 days ago
Super, hyper fantastic! Thank you.
@Danishkhan-ni5qf 15 days ago
Wow!
@EuSouAnonimoCara 18 days ago
Awesome content!
@RameshBaburbabu 26 days ago
Thank you so much for sharing that fantastic clip! It was really informative. I'm currently looking into fine-tuning a model with my ERP system, which handles some pretty complex data. Right now, I'm creating dataframes and using panda-ai for analytics. Could you guide me on how to train and make inferences with this row/column data? I really appreciate your time and help!
@scott_ingram 25 days ago
Thanks for your question and for watching the video. I'm glad you found it informative! Your approach largely depends on your use case and the kind of insights you're looking to derive from your data. Generally, you'll want to follow these steps to train a model on complex data:
1. Decide how you plan to interact with the model. For instance, maybe you're doing text generation; natural language understanding tasks like sentiment analysis, named entity recognition, and question answering; text summarization; or domain-specific queries (legal, medical, corporate).
2. Choose a model that has high benchmarks for the specific requirements of your task, the nature of your data, and the desired output format. A model is more likely to train well if the base model's capabilities are already very strong for the task you intend to use it for. Consider factors like model performance, computational resources, and the availability of pre-trained weights for your specific domain or language.
3. Prepare and preprocess your dataframes: remove or fill missing values, encode variables numerically, and normalize the data. The cleaner the data, the better the training will be.
4. Split the data into a training set and a validation set. The validation set is data you haven't trained the model on, used to see how the model performs on unseen data.
5. Fine-tune with your dataset, test the model out, then iterate: tweak the data, add more data, try different training parameters, even try different models.
Hope this helps guide you in your endeavor!
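The preprocessing and splitting steps above can be sketched in a few lines of plain Python; the ERP-style row fields, the fill-in default, and the 80/20 split ratio are illustrative assumptions, not part of any actual workflow:

```python
import random

# Illustrative ERP-style rows; a real pandas DataFrame could be exported
# the same way via DataFrame.to_dict("records").
rows = [
    {"customer": "Acme", "region": "EU", "revenue": 12500.0},
    {"customer": "Globex", "region": "US", "revenue": 9800.0},
    {"customer": "Initech", "region": "EU", "revenue": None},  # missing value
]

def clean(row, revenue_default=0.0):
    """Fill missing values so every record is complete before training."""
    fixed = dict(row)
    if fixed["revenue"] is None:
        fixed["revenue"] = revenue_default
    return fixed

def to_example(row):
    """Turn one row into an instruction/response pair for fine-tuning."""
    return {
        "instruction": f"Summarize the account for {row['customer']}.",
        "response": f"{row['customer']} ({row['region']}) "
                    f"has revenue {row['revenue']:.2f}.",
    }

examples = [to_example(clean(r)) for r in rows]

# Hold out roughly 20% (at least one example) as a validation set.
random.seed(0)
random.shuffle(examples)
split = max(1, int(0.2 * len(examples)))
val_set, train_set = examples[:split], examples[split:]
```

The held-out `val_set` is what you would evaluate against after each fine-tuning iteration.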
@15ky3 24 days ago
Amazing video, thanks for the best explanation I've ever seen on YouTube. Could you also please make a video on how to fine-tune the Phi-3 model? 🙏
@scott_ingram 24 days ago
Great suggestion! I will look into that.
@andrew.derevo 5 days ago
Do you have any experience fine-tuning this model on non-English data? Any suggestions for good multilingual open-source models? 🙏
@15ky3 23 days ago
Is the output from Ollama on your MacBook in real time, or did you speed it up in the video? On my 2014 iMac it is significantly slower. It's about time for a new one. What are the technical specifications of your Mac?
@scott_ingram 23 days ago
Except for the download, which I sped up significantly, everything in terminal was shown in real time. The demo was done on a MacBook Pro M3 Pro Max. YMMV with other hardware.
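For readers trying to reproduce a local run like the one described above, registering a fine-tuned GGUF export with Ollama follows this general pattern; the GGUF filename and model tag are illustrative assumptions, not the exact ones from the video:

```shell
# Write a minimal Modelfile pointing Ollama at the exported GGUF weights
# (the path and tag below are illustrative).
cat > Modelfile <<'EOF'
FROM ./llama3-8b-ft-q4_k_m.gguf
EOF

# Register the model under a local tag, then run it interactively
# in the terminal.
ollama create my-llama3-finetune -f Modelfile
ollama run my-llama3-finetune
```

Once registered, `ollama run` serves the model entirely on-device, which is what makes inference speed hardware-dependent.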
@azkarathore4355 14 days ago
Hi, I want to fine-tune Llama 3 for English-to-Urdu machine translation. Can you guide me on this? The dataset is OPUS-100.
@madhudson1 26 days ago
Rather than using Google Colab + compute for training, what are your thoughts on using a local machine + GPU?
@guyvandenberg9297 26 days ago
Good question. I am about to try that. I think you need an Ampere-architecture GPU (A100 or RTX 3090). Scott, thanks for a great video.
@guyvandenberg9297 26 days ago
Ampere architecture for BF16 as opposed to FP16, per Scott's explanation in the video.
@scott_ingram 25 days ago
Thanks for your question! The notebook is designed to do the training on Colab, but you can run it locally for training if you have compatible hardware; I haven't tested it locally, though. The RTX 3090 does support brain float. Install Python, then set up a virtual environment:

    python3 -m venv venv
    source venv/bin/activate

Next, install and start the Jupyter notebook service, which will connect to a Python 3 kernel:

    pip install jupyter
    jupyter notebook

Then, test GPU availability:

    import torch
    print(torch.cuda.is_available())
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f"Using device: {device}")

Here's how you would create a tensor with PyTorch on the RTX 3090 and tell it to use brain float:

    tensor = torch.randn(1024, 1024, dtype=torch.bfloat16)

Some cells in the notebook won't run correctly, such as the first cell that sets up text wrapping (not relevant for training); that's designed specifically for Colab. There may be other compatibility issues, but I haven't tested it running locally. This should get you started to see whether your GPU could potentially work. Let me know how it works out!
@PreparelikeJoseph 18 days ago
@scott_ingram I'd really like to get some AI agents running locally on a self-hosted model. I'm hoping two RTX 3090s can combine just via PCIe and load a full 70B model.