Deploying Llama3 on Amazon SageMaker

17,974 views

Julien Simon

3 months ago

In this video tutorial, I'll show you how easy it is to deploy the Meta Llama 3 8B model using Amazon SageMaker and the latest Hugging Face Text Generation Inference containers (TGI 2.0). Follow along as I guide you through the process of setting up synchronous and streaming inference, making text generation tasks a breeze!
The Meta Llama 3 8B model is a powerful tool for natural language processing, and with Amazon SageMaker's scalable infrastructure, you can leverage this model efficiently. I'll take you through the step-by-step process, from setting up the environment to running inference, ensuring you have the knowledge to implement this in your own projects.
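The deployment flow described above can be sketched in a few lines with the SageMaker Python SDK. This is a minimal sketch, not the video's exact code: the instance type, environment values, and function names are assumptions — check the linked notebook for the real thing.

```python
import json

def build_tgi_env(model_id: str, hf_token: str, num_gpus: int = 1) -> dict:
    """Container environment for a TGI deployment (values are assumptions)."""
    return {
        "HF_MODEL_ID": model_id,              # model to pull from the Hugging Face Hub
        "SM_NUM_GPUS": json.dumps(num_gpus),  # tensor parallel degree
        "HUGGING_FACE_HUB_TOKEN": hf_token,   # Llama 3 is a gated model
        "MAX_INPUT_LENGTH": "4096",
        "MAX_TOTAL_TOKENS": "8192",
    }

def deploy_llama3(role_arn: str, hf_token: str):
    # SageMaker SDK imported here so the pure helper above stays dependency-free.
    from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

    # Resolve the TGI 2.0 Deep Learning Container image for the current region.
    image_uri = get_huggingface_llm_image_uri("huggingface", version="2.0.0")
    model = HuggingFaceModel(
        role=role_arn,
        image_uri=image_uri,
        env=build_tgi_env("meta-llama/Meta-Llama-3-8B-Instruct", hf_token),
    )
    # g5 instances carry NVIDIA A10G GPUs; pick a size that fits the 8B model.
    return model.deploy(initial_instance_count=1, instance_type="ml.g5.2xlarge")
```

The returned predictor can then serve both synchronous and streaming requests, as shown in the video.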
So, whether you're a data scientist, machine learning engineer, or developer interested in text generation and NLP, this video is for you!
#MachineLearning #NLG #AmazonSageMaker #HuggingFace #TextGeneration
⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos. Follow me on Medium at / julsimon or Substack at julsimon.substack.com. ⭐️⭐️⭐️
Model:
huggingface.co/meta-llama/Met...
Notebook:
gitlab.com/juliensimon/huggin...
Deep Learning Containers:
github.com/aws/deep-learning-...

Comments: 28
@user-vo5ce6kn5t 3 months ago
Thank you for providing such an informative and important video.
@juliensimonfr 3 months ago
Glad it was helpful!
@jamesyin3220 3 months ago
Hi Julien, thanks for the video. There's only one problem with it: the Llama 3 model does not stop generating tokens. In your video, you specified a max of 256 tokens, so we see the prediction get cut off in both of your examples. However, when I deployed the model on SageMaker and gave it a large max token count, I discovered the model just never stops. Do you know of a solution to this? Or, if I'm wrong, can you please make a video with a longer max token limit to show that it does stop? Thank you.
@juliensimonfr 3 months ago
Yes, I think the stop tokens are wrong/incomplete. It seems that the model configuration file was updated a few days later. This may be fixed, I need to check.
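Until a fixed model configuration lands, a common workaround is to pass explicit stop sequences in the request. A minimal sketch, assuming a deployed TGI endpoint: `stop` and `max_new_tokens` are standard TGI generation parameters, and `<|eot_id|>` is Llama 3's end-of-turn token.

```python
def build_request(prompt: str, max_new_tokens: int = 256) -> dict:
    """TGI request payload; the explicit stop list works around early
    Llama 3 checkpoints whose config omitted <|eot_id|> from the EOS tokens."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            # Stop on the end-of-turn token as well as the end-of-text token.
            "stop": ["<|eot_id|>", "<|end_of_text|>"],
        },
    }
```

The payload is then sent to the endpoint as-is, e.g. `predictor.predict(build_request(prompt))`.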
@tanujaysaha6756 3 months ago
Hi Julien, Thanks for the insightful video. What is the latency/throughput of the Llama 3 8B model on various g5 instances?
@juliensimonfr 3 months ago
Hi, this depends on instance size, batch size, sequence length, etc. I'd recommend testing it yourself with the configuration that makes sense for your project. Using "%%time" in a Jupyter cell that runs a prediction and dividing the elapsed time by the number of generated tokens will give you a decent ballpark estimate.
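That back-of-the-envelope calculation amounts to a one-liner; the helper below is a hypothetical name, wrapping the division with a sanity check.

```python
def tokens_per_second(elapsed_seconds: float, generated_tokens: int) -> float:
    """Ballpark decoding throughput: generated tokens over wall-clock time,
    e.g. the time reported by %%time around a predict() call."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return generated_tokens / elapsed_seconds
```

For example, 256 generated tokens in 8 seconds of wall-clock time works out to 32 tokens per second.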
@Aqsahafeez-kb5ev 3 months ago
Great learning with you, Julien. Can you tell me, is it possible to do PEFT on Gemini using an image dataset? If yes, can you please make a video on how to do it?
@juliensimonfr 3 months ago
Thank you. Gemini is a closed model, so our libraries won't work. Please check the Google website for tuning information :)
@elissonsayago9777 3 months ago
Hi! Do you know if I can deploy the Llama 3 model in any AWS region? So far I have only tried JumpStart solutions and regions. Thanks a lot! Great content!
@juliensimonfr 3 months ago
Thank you! Yes, SageMaker is available in all AWS regions, so you should be able to deploy Hugging Face models everywhere :)
@agomezss 3 months ago
Hey Julien, thanks for the video, it's really insightful! Do you think a cheaper machine could handle this model on AWS SageMaker? Thank you!
@juliensimonfr 3 months ago
g5 instances are pretty cheap. I don't think the model would fit on g4, unless you quantized it. You can also try Inferentia2 (inf2.xlarge).
@larsjacobs253 3 months ago
I have tried this but the shards seem to fail downloading due to insufficient memory. Is there a way to lower the memory usage during initialization of the model on an endpoint?
@josephazhikannickel4188 29 days ago
Hi Julien, thank you for the valuable contribution. Your videos are impressive, I love them. May I ask one question: does deploying Llama 3 on SageMaker meet HIPAA compliance? The data is highly confidential. Hope you can help with an answer. Thank you!
@juliensimonfr 17 days ago
This should help :) aws.amazon.com/about-aws/whats-new/2018/05/Amazon-SageMaker-Achieves-HIPAA-Eligibility/
@divyagarh 3 months ago
Hi Julien, a small question: can we stop models deployed on g5 instances? I don't want to delete them, but stop them during the night and launch them again in the morning.
@juliensimonfr 3 months ago
Sure. Delete the endpoint and deploy it again when you need it.
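This delete-and-recreate pattern can be scripted with boto3. A sketch under assumptions (function names and the office-hours policy are hypothetical): because only the endpoint is deleted, the endpoint config and model objects survive, and one `create_endpoint` call brings it back in the morning.

```python
def should_be_running(hour_utc: int, start: int = 7, stop: int = 19) -> bool:
    """Hypothetical office-hours policy: endpoint up between start and stop (UTC)."""
    return start <= hour_utc < stop

def stop_endpoint(endpoint_name: str) -> None:
    # Deleting the endpoint stops instance billing; the endpoint config
    # and model objects remain, so nothing else needs to be rebuilt.
    import boto3
    boto3.client("sagemaker").delete_endpoint(EndpointName=endpoint_name)

def start_endpoint(endpoint_name: str, config_name: str) -> None:
    # Recreate the endpoint from the saved config the next morning.
    import boto3
    boto3.client("sagemaker").create_endpoint(
        EndpointName=endpoint_name, EndpointConfigName=config_name
    )
```

The two functions could be wired to a scheduled job (e.g. EventBridge + Lambda) that checks `should_be_running` each hour.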
@divyagarh 3 months ago
@@juliensimonfr thanks
@AkshatGupta-kw9tp 3 months ago
How am I supposed to expose the endpoints so that I can use the model to generate responses via an API? I'm new to this, please explain.
@juliensimonfr 3 months ago
If you are completely new to this, then I'd recommend Inference Endpoints instead: huggingface.co/docs/inference-endpoints/index
@divyagarh 3 months ago
Hi Julien, how do we fine-tune the same model we deployed on SageMaker?
@juliensimonfr 3 months ago
Fine-tuning is a different process, which you could also run on SageMaker. Docs: huggingface.co/docs/sagemaker/index
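As a rough illustration of what that separate process looks like, fine-tuning on SageMaker typically goes through the HuggingFace estimator rather than a deployed endpoint. A minimal sketch, assuming a `train.py` training script of your own; the framework versions and hyperparameters are placeholders — check the AWS Deep Learning Containers list for currently supported combinations.

```python
def build_hyperparameters(model_id: str, epochs: int = 1, lr: float = 2e-5) -> dict:
    """Hyperparameters forwarded to the (hypothetical) train.py entry point."""
    return {"model_id": model_id, "epochs": epochs, "lr": lr}

def launch_finetuning(role_arn: str, train_s3_uri: str):
    from sagemaker.huggingface import HuggingFace

    estimator = HuggingFace(
        entry_point="train.py",       # hypothetical training script
        source_dir="scripts",
        role=role_arn,
        instance_type="ml.g5.2xlarge",
        instance_count=1,
        transformers_version="4.36",  # placeholder: check supported DLC versions
        pytorch_version="2.1",
        py_version="py310",
        hyperparameters=build_hyperparameters("meta-llama/Meta-Llama-3-8B"),
    )
    # Starts a training job reading data from the given S3 prefix.
    estimator.fit({"train": train_s3_uri})
    return estimator
```

The trained model artifact can then be deployed with the same `HuggingFaceModel` flow shown in the video.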
@divyagarh 3 months ago
@@juliensimonfr Thanks Julien! Could you please do a video on it: fine-tuning Mistral or Llama and deploying on SageMaker?
@divyagarh 6 days ago
Hi Julien, I followed the exact steps for Meta's newer model, Llama 3.1 8B Instruct, and get this error on SageMaker: "The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. The tokenizer class you load from this checkpoint is 'PreTrainedTokenizerFast'. The class this function is called from is 'LlamaTokenizer'." Any thoughts? Please help.
@juliensimonfr 6 days ago
Is the deployment failing? Which cell gives you this error?
@divyagarh 6 days ago
@@juliensimonfr The deploy cell. I found these errors in the log. A lot of people are experiencing the same issue.