Deploying Llama3 on Amazon SageMaker

17,974 views

Julien Simon

3 months ago

In this video tutorial, I'll show you how easy it is to deploy the Meta Llama 3 8B model using Amazon SageMaker and the latest Hugging Face Text Generation Inference containers (TGI 2.0). Follow along as I guide you through the process of setting up synchronous and streaming inference, making text generation tasks a breeze!
The Meta Llama 3 8B model is a powerful tool for natural language processing, and with Amazon SageMaker's scalable infrastructure, you can leverage this model efficiently. I'll take you through the step-by-step process, from setting up the environment to running inference, ensuring you have the knowledge to implement this in your own projects.
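The deployment flow described above can be sketched in a few lines with the SageMaker Python SDK. This is a minimal sketch, not the video's exact code: the instance type, environment values, and function names are assumptions — check the linked notebook for the real thing.

```python
import json

def build_tgi_env(model_id: str, hf_token: str, num_gpus: int = 1) -> dict:
    """Container environment for a TGI deployment (values are assumptions)."""
    return {
        "HF_MODEL_ID": model_id,              # model to pull from the Hugging Face Hub
        "SM_NUM_GPUS": json.dumps(num_gpus),  # tensor parallel degree
        "HUGGING_FACE_HUB_TOKEN": hf_token,   # Llama 3 is a gated model
        "MAX_INPUT_LENGTH": "4096",
        "MAX_TOTAL_TOKENS": "8192",
    }

def deploy_llama3(role_arn: str, hf_token: str):
    # SageMaker SDK imported here so the pure helper above stays dependency-free.
    from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

    # Resolve the TGI 2.0 Deep Learning Container image for the current region.
    image_uri = get_huggingface_llm_image_uri("huggingface", version="2.0.0")
    model = HuggingFaceModel(
        role=role_arn,
        image_uri=image_uri,
        env=build_tgi_env("meta-llama/Meta-Llama-3-8B-Instruct", hf_token),
    )
    # g5 instances carry NVIDIA A10G GPUs; pick a size that fits the 8B model.
    return model.deploy(initial_instance_count=1, instance_type="ml.g5.2xlarge")
```

The returned predictor can then serve both synchronous and streaming requests, as shown in the video.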
So, whether you're a data scientist, machine learning engineer, or developer interested in text generation and NLP, this video is for you!
#MachineLearning #NLG #AmazonSageMaker #HuggingFace #TextGeneration
⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos. Follow me on Medium at / julsimon or Substack at julsimon.substack.com. ⭐️⭐️⭐️
Model:
huggingface.co/meta-llama/Met...
Notebook:
gitlab.com/juliensimon/huggin...
Deep Learning Containers:
github.com/aws/deep-learning-...

Comments: 28
@user-vo5ce6kn5t 3 months ago
Thank you for providing such an informative and important video.
@juliensimonfr 3 months ago
Glad it was helpful!
@jamesyin3220 3 months ago
Hi Julien, thanks for the video. There's only one problem with it: the Llama 3 model does not stop generating tokens. In your video, you specified a max of 256 tokens, so we see the prediction get cut off in both of your examples. However, when I deployed the model on SageMaker and gave it a large max token count, I discovered the model just never stops. Do you know of a solution to this? Or, if I'm wrong, can you please make a video with a longer max token limit to show that it does stop? Thank you.
@juliensimonfr 3 months ago
Yes, I think the stop tokens are wrong/incomplete. It seems that the model configuration file was updated a few days later. This may be fixed, I need to check.
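Until a fixed model configuration lands, a common workaround is to pass explicit stop sequences in the request. A minimal sketch, assuming a deployed TGI endpoint: `stop` and `max_new_tokens` are standard TGI generation parameters, and `<|eot_id|>` is Llama 3's end-of-turn token.

```python
def build_request(prompt: str, max_new_tokens: int = 256) -> dict:
    """TGI request payload; the explicit stop list works around early
    Llama 3 checkpoints whose config omitted <|eot_id|> from the EOS tokens."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            # Stop on the end-of-turn token as well as the end-of-text token.
            "stop": ["<|eot_id|>", "<|end_of_text|>"],
        },
    }
```

The payload is then sent to the endpoint as-is, e.g. `predictor.predict(build_request(prompt))`.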
@tanujaysaha6756 3 months ago
Hi Julien, Thanks for the insightful video. What is the latency/throughput of the Llama 3 8B model on various g5 instances?
@juliensimonfr 3 months ago
Hi, this depends on instance size, batch size, sequence length, etc. I'd recommend testing it yourself with the configuration that makes sense for your project. Using "%%time" in a Jupyter cell that runs a prediction and dividing the elapsed time by the number of generated tokens will give you a decent ballpark estimate.
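That back-of-the-envelope calculation amounts to a one-liner; the helper below is a hypothetical name, wrapping the division with a sanity check.

```python
def tokens_per_second(elapsed_seconds: float, generated_tokens: int) -> float:
    """Ballpark decoding throughput: generated tokens over wall-clock time,
    e.g. the time reported by %%time around a predict() call."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return generated_tokens / elapsed_seconds
```

For example, 256 generated tokens in 8 seconds of wall-clock time works out to 32 tokens per second.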
@Aqsahafeez-kb5ev 3 months ago
Great learning with you, Julien. Can you tell me, is it possible to do PEFT on Gemini using an image dataset? If yes, can you please make a video on how to do it?
@juliensimonfr 3 months ago
Thank you. Gemini is a closed model, so our libraries won't work. Please check the Google website for tuning information :)
@elissonsayago9777 3 months ago
Hi! Do you know if I can deploy the Llama 3 model in any AWS region? So far I have only tried JumpStart solutions and regions. Thanks a lot! Great content!
@juliensimonfr 3 months ago
Thank you! Yes, SageMaker is available in all AWS regions, so you should be able to deploy Hugging Face models everywhere :)
@agomezss 3 months ago
Hey Julien, thanks for the video, it's really insightful! Do you think a cheaper machine could handle this model on AWS SageMaker? Thank you!
@juliensimonfr 3 months ago
g5 instances are pretty cheap. I don't think the model would fit on g4, unless you quantized it. You can also try Inferentia2 (inf2.xlarge).
@larsjacobs253 3 months ago
I have tried this but the shards seem to fail downloading due to insufficient memory. Is there a way to lower the memory usage during initialization of the model on an endpoint?
@josephazhikannickel4188 29 days ago
Hi Julien, thank you for the valuable contribution. Your videos are impressive, I love them. May I ask one question: does deploying Llama 3 on SageMaker meet HIPAA compliance? The data is highly confidential. Hope you can help with an answer. Thank you!
@juliensimonfr 17 days ago
This should help :) aws.amazon.com/about-aws/whats-new/2018/05/Amazon-SageMaker-Achieves-HIPAA-Eligibility/
@divyagarh 3 months ago
Hi Julien, a small question: can we stop models deployed on g5 instances? I don't want to delete them, but stop them during the night and launch them again in the morning.
@juliensimonfr 3 months ago
Sure. Delete the endpoint and deploy it again when you need it.
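This delete-and-recreate pattern can be scripted with boto3. A sketch under assumptions (function names and the office-hours policy are hypothetical): because only the endpoint is deleted, the endpoint config and model objects survive, and one `create_endpoint` call brings it back in the morning.

```python
def should_be_running(hour_utc: int, start: int = 7, stop: int = 19) -> bool:
    """Hypothetical office-hours policy: endpoint up between start and stop (UTC)."""
    return start <= hour_utc < stop

def stop_endpoint(endpoint_name: str) -> None:
    # Deleting the endpoint stops instance billing; the endpoint config
    # and model objects remain, so nothing else needs to be rebuilt.
    import boto3
    boto3.client("sagemaker").delete_endpoint(EndpointName=endpoint_name)

def start_endpoint(endpoint_name: str, config_name: str) -> None:
    # Recreate the endpoint from the saved config the next morning.
    import boto3
    boto3.client("sagemaker").create_endpoint(
        EndpointName=endpoint_name, EndpointConfigName=config_name
    )
```

The two functions could be wired to a scheduled job (e.g. EventBridge + Lambda) that checks `should_be_running` each hour.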
@divyagarh 3 months ago
@@juliensimonfr thanks
@AkshatGupta-kw9tp 3 months ago
How am I supposed to expose the endpoints so that I can use the model to generate responses via an API? I'm new to this, please explain.
@juliensimonfr 3 months ago
If you are completely new to this, then I'd recommend Inference Endpoints instead: huggingface.co/docs/inference-endpoints/index
@divyagarh 3 months ago
Hi Julien, how do we fine-tune the same model we deployed on SageMaker?
@juliensimonfr 3 months ago
Fine-tuning is a different process, which you could also run on SageMaker. Docs: huggingface.co/docs/sagemaker/index
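As a rough illustration of what that separate process looks like, fine-tuning on SageMaker typically goes through the HuggingFace estimator rather than a deployed endpoint. A minimal sketch, assuming a `train.py` training script of your own; the framework versions and hyperparameters are placeholders — check the AWS Deep Learning Containers list for currently supported combinations.

```python
def build_hyperparameters(model_id: str, epochs: int = 1, lr: float = 2e-5) -> dict:
    """Hyperparameters forwarded to the (hypothetical) train.py entry point."""
    return {"model_id": model_id, "epochs": epochs, "lr": lr}

def launch_finetuning(role_arn: str, train_s3_uri: str):
    from sagemaker.huggingface import HuggingFace

    estimator = HuggingFace(
        entry_point="train.py",       # hypothetical training script
        source_dir="scripts",
        role=role_arn,
        instance_type="ml.g5.2xlarge",
        instance_count=1,
        transformers_version="4.36",  # placeholder: check supported DLC versions
        pytorch_version="2.1",
        py_version="py310",
        hyperparameters=build_hyperparameters("meta-llama/Meta-Llama-3-8B"),
    )
    # Starts a training job reading data from the given S3 prefix.
    estimator.fit({"train": train_s3_uri})
    return estimator
```

The trained model artifact can then be deployed with the same `HuggingFaceModel` flow shown in the video.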
@divyagarh 3 months ago
@@juliensimonfr Thanks Julien! Could you please do a video on it: fine-tuning Mistral or Llama and deploying on SageMaker?
@divyagarh 6 days ago
Hi Julien, I followed the exact steps for Meta's newer model, Llama 3.1 8B Instruct, and get this error on SageMaker: "The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. The tokenizer class you load from this checkpoint is 'PreTrainedTokenizerFast'. The class this function is called from is 'LlamaTokenizer'." Any thoughts? Please help.
@juliensimonfr 6 days ago
Is the deployment failing? Which cell gives you this error?
@divyagarh 6 days ago
@@juliensimonfr The deploy cell. I found these errors in the log. A lot of people are experiencing the same issue.