Fine-Tuning Meta's Llama 3 8B for IMPRESSIVE Deployment on Edge Devices - OUTSTANDING Results!

6,003 views

Scott Ingram

28 days ago

This video demonstrates an innovative workflow that combines Meta's open-weight Llama 3 8B model with efficient fine-tuning techniques (LoRA and PEFT) to deploy highly capable AI on resource-constrained devices.
We start by using a 4-bit quantized version of the Llama 3 8B model and fine-tune it on a custom dataset. The fine-tuned model is then exported in the GGUF format, optimized for efficient deployment and inference on edge devices using the GGML library.
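For reference, the export step described above generally follows llama.cpp's conversion-then-quantization pattern; the paths, output names, and quantization type below are illustrative assumptions, not the exact commands from the video:

```shell
# Convert the merged fine-tuned Hugging Face weights to GGUF using
# llama.cpp's converter script, then quantize for edge inference.
# (Directory and file names here are illustrative.)
python convert_hf_to_gguf.py ./merged-model --outfile llama3-8b-ft.gguf
./llama-quantize llama3-8b-ft.gguf llama3-8b-ft-q4_k_m.gguf Q4_K_M
```

The quantized `Q4_K_M` file is what would typically be loaded by GGML-based runtimes on a laptop.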
Impressively, the fine-tuned Llama 3 8B model accurately recalls and generates responses based on our custom dataset when run locally on a MacBook. This demo highlights the effectiveness of combining quantization, efficient fine-tuning, and optimized inference formats to deploy advanced language AI on everyday devices.
Join us as we explore the potential of fine-tuning and efficiently deploying the Llama 3 8B model on edge devices, making AI more accessible and opening up new possibilities for natural language processing applications.
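As a concrete illustration of the "custom dataset" step above, here is a minimal sketch of preparing instruction-style training records; the fields, prompt template, and file name are illustrative assumptions, not the actual dataset from the video:

```python
import json

# Illustrative Alpaca-style records; the video's real dataset differs.
examples = [
    {"instruction": "Who publishes the scott4ai demo repositories?",
     "input": "",
     "output": "Scott Ingram publishes them at github.com/scott4ai."},
]

PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)

def to_training_text(record):
    """Render one record into the single text field most LoRA
    fine-tuning scripts (e.g. Unsloth's notebooks) expect."""
    return PROMPT_TEMPLATE.format(**record)

# Write one JSON object per line, each with a "text" field.
with open("train.jsonl", "w") as f:
    for rec in examples:
        f.write(json.dumps({"text": to_training_text(rec)}) + "\n")
```

A fine-tuning notebook would then load `train.jsonl` as its training split.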
Be sure to subscribe to stay up-to-date on the latest advances in AI.
My Links
Subscribe: / @scott_ingram
X.com: / scott4ai
GitHub: github.com/scott4ai
Hugging Face: huggingface.co/scott4ai
Links:
Colab Demo: colab.research.google.com/dri...
Dataset: github.com/scott4ai/llama3-8b...
Unsloth Colab: colab.research.google.com/dri...
Unsloth Wiki: github.com/unslothai/unsloth/...
Unsloth Web: unsloth.ai/

Comments: 24
@israelcohen4412 26 days ago
So I never post comments, but the way you explained this was by far the best I have seen online. I wish I'd found your channel 8 months ago :) Please keep posting videos; your explanations are very well thought out and put together.
@andrew.derevo 5 days ago
Good stuff sir, thanks a lot 🙌
@petroff_ss 11 days ago
Thank you! You have a talent for explaining and planning a workshop! Thank you for your work!
@ratsock 25 days ago
Absolutely fantastic! Really appreciate the detailed, clear breakdown of concrete steps that let us drive value, rather than the clickbait hype train everyone else is on.
@tal7atal7a66 25 days ago
I like the thumbnails, the choice of topics, the explanation style, and the presenter. Nice channel, very valuable info ❤
@gustavomarquez2269 24 days ago
You are amazing! This is the best explanation about this topic. I liked it and just subscribed. Thank you very much !!!
@scott_ingram 24 days ago
Thank you so much for the kind words and for subscribing, I really appreciate it! I'm so glad you found the video helpful in explaining how to fine-tune LLaMA 3 and run it on your own device. It's a fascinating topic and technology with a lot of potential. I'm looking forward to sharing more content on large language models and AI that you'll hopefully find just as valuable. Stay tuned!
@andrepamplona9993 25 days ago
Super, hyper fantastic! Thank you.
@Danishkhan-ni5qf 15 days ago
Wow!
@EuSouAnonimoCara 18 days ago
Awesome content!
@RameshBaburbabu 26 days ago
Thank you so much for sharing that fantastic clip! It was really informative. I'm currently looking into fine-tuning a model with my ERP system, which handles some pretty complex data. Right now, I'm creating dataframes and using panda-ai for analytics. Could you guide me on how to train and make inferences with this row/column data? I really appreciate your time and help!
@scott_ingram 25 days ago
Thanks for your question and for watching the video. I'm glad you found it informative! Your approach largely depends on your use case and the kind of insights you're looking to derive from your data. Generally, you'll want to follow these steps to train a model on complex data:
1. Decide how you plan to interact with the model. For instance, maybe you're doing text generation; natural language understanding tasks like sentiment analysis, named entity recognition, and question answering; text summarization; or domain-specific queries (legal, medical, corporate).
2. Choose a model that has high benchmarks for the specific requirements of your task, the nature of your data, and the desired output format. A model is more likely to train well if the base model's capabilities are already very strong for the task you intend to use it for. Consider factors like model performance, computational resources, and the availability of pre-trained weights for your specific domain or language.
3. Prepare and preprocess your dataframes: remove or fill missing values, encode variables numerically, and normalize the data. The cleaner the data, the better the training will be.
4. Split the data into a training set and a validation set. The validation set is data you haven't trained the model on, used to see how the model performs on unseen data.
5. Fine-tune with your dataset, test the model out, then iterate: tweak the data, add more data, try different training parameters, even try different models.
Hope this helps guide you in your endeavor!
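The preprocessing and splitting steps above can be sketched in a few lines of plain Python; the ERP-style row fields, the fill-in default, and the 80/20 split ratio are illustrative assumptions, not part of any actual workflow:

```python
import random

# Illustrative ERP-style rows; a real pandas DataFrame could be exported
# the same way via DataFrame.to_dict("records").
rows = [
    {"customer": "Acme", "region": "EU", "revenue": 12500.0},
    {"customer": "Globex", "region": "US", "revenue": 9800.0},
    {"customer": "Initech", "region": "EU", "revenue": None},  # missing value
]

def clean(row, revenue_default=0.0):
    """Fill missing values so every record is complete before training."""
    fixed = dict(row)
    if fixed["revenue"] is None:
        fixed["revenue"] = revenue_default
    return fixed

def to_example(row):
    """Turn one row into an instruction/response pair for fine-tuning."""
    return {
        "instruction": f"Summarize the account for {row['customer']}.",
        "response": f"{row['customer']} ({row['region']}) "
                    f"has revenue {row['revenue']:.2f}.",
    }

examples = [to_example(clean(r)) for r in rows]

# Hold out roughly 20% (at least one example) as a validation set.
random.seed(0)
random.shuffle(examples)
split = max(1, int(0.2 * len(examples)))
val_set, train_set = examples[:split], examples[split:]
```

The held-out `val_set` is what you would evaluate against after each fine-tuning iteration.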
@15ky3 24 days ago
Amazing video, thanks for the best explanation I've ever seen on YouTube. Could you also please make a video on how to fine-tune the Phi-3 model? 🙏
@scott_ingram 24 days ago
Great suggestion! I will look into that.
@andrew.derevo 5 days ago
Do you have any experience fine-tuning this model on non-English data? Any suggestions for good multilingual open-source models? 🙏
@15ky3 23 days ago
Is the output from Ollama on your MacBook in real time, or did you speed it up in the video? On my 2014 iMac it is significantly slower. It's about time for a new one. What are the technical specifications of your Mac?
@scott_ingram 23 days ago
Except for the download, which I sped up significantly, everything in terminal was shown in real time. The demo was done on a MacBook Pro M3 Pro Max. YMMV with other hardware.
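For readers trying to reproduce a local run like the one described above, registering a fine-tuned GGUF export with Ollama follows this general pattern; the GGUF filename and model tag are illustrative assumptions, not the exact ones from the video:

```shell
# Write a minimal Modelfile pointing Ollama at the exported GGUF weights
# (the path and tag below are illustrative).
cat > Modelfile <<'EOF'
FROM ./llama3-8b-ft-q4_k_m.gguf
EOF

# Register the model under a local tag, then run it interactively
# in the terminal.
ollama create my-llama3-finetune -f Modelfile
ollama run my-llama3-finetune
```

Once registered, `ollama run` serves the model entirely on-device, which is what makes inference speed hardware-dependent.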
@azkarathore4355 14 days ago
Hi, I want to fine-tune Llama 3 for English-to-Urdu machine translation. Can you guide me on this? The dataset is OPUS-100.
@madhudson1 26 days ago
Rather than using Google Colab + compute for training, what are your thoughts on using a local machine + GPU?
@guyvandenberg9297 26 days ago
Good question. I am about to try that. I think you need an Ampere-architecture GPU (A100 or RTX 3090). Scott, thanks for a great video.
@guyvandenberg9297 26 days ago
Ampere architecture for BF16 as opposed to FP16, per Scott's explanation in the video.
@scott_ingram 25 days ago
Thanks for your question! The notebook is designed to do the training on Colab, but you can run it locally for training if you have compatible hardware; I haven't tested it locally, though. The RTX 3090 does support brain float. Install Python, then set up a virtual environment:

    python3 -m venv venv
    source venv/bin/activate

Next, install and start the Jupyter notebook service, which will connect to a Python 3 kernel:

    pip install jupyter
    jupyter notebook

Then, test GPU availability:

    import torch
    print(torch.cuda.is_available())
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f"Using device: {device}")

Here's how you would create a tensor with PyTorch on the RTX 3090 and tell it to use brain float:

    tensor = torch.randn(1024, 1024, dtype=torch.bfloat16)

Some cells in the notebook won't run correctly, such as the first cell that sets up text wrapping (not relevant for training); that's designed specifically for Colab. There may be other compatibility issues, but I haven't tested it running locally. This should get you started to see whether your GPU could potentially work. Let me know how it works out!
@PreparelikeJoseph 18 days ago
@scott_ingram I'd really like to get some AI agents running locally on a self-hosted model. I'm hoping two RTX 3090s can combine just via PCIe and load a full 70B model.