Deploying Hugging Face models with Amazon SageMaker and AWS Inferentia2

8,468 views

Julien Simon

4 months ago

In this video, I walk you through the simple process of deploying a Hugging Face large language model on AWS, with Amazon SageMaker and the AWS Inferentia2 accelerator.
⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos. Follow me on Medium at /julsimon or Substack at julsimon.substack.com. ⭐️⭐️⭐️
Notebook:
gitlab.com/juliensimon/huggin...
Deep Dive: Hugging Face models on AWS AI Accelerators
• Deep Dive: Hugging Fac...
Blog posts:
huggingface.co/blog/how-to-ge...
aws.amazon.com/blogs/machine-...
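The deployment described above boils down to picking an Inferentia2 instance type and passing the Hugging Face Neuronx (TGI) container the right environment variables. The sketch below collects those settings in one place; the model ID, instance type, shapes, and environment variable names are illustrative assumptions based on common optimum-neuron examples, not values taken from the video's notebook.

```python
# Sketch of the settings for a SageMaker endpoint on Inferentia2.
# All concrete values (model ID, shapes, variable names) are assumptions
# for illustration; check them against the notebook and container docs.

def build_deployment_config(model_id: str,
                            instance_type: str = "ml.inf2.xlarge",
                            batch_size: int = 1,
                            sequence_length: int = 2048,
                            num_cores: int = 2) -> dict:
    """Collect the container environment and endpoint settings for a
    Hugging Face LLM deployment on an Inferentia2 instance."""
    env = {
        "HF_MODEL_ID": model_id,
        # Neuron compiles for static shapes, so batch size and sequence
        # length are fixed at deployment time.
        "HF_BATCH_SIZE": str(batch_size),
        "HF_SEQUENCE_LENGTH": str(sequence_length),
        "HF_NUM_CORES": str(num_cores),   # NeuronCores to shard across
        "HF_AUTO_CAST_TYPE": "fp16",
    }
    return {
        "env": env,
        "instance_type": instance_type,
        "initial_instance_count": 1,
    }

config = build_deployment_config("meta-llama/Llama-2-7b-chat-hf")
print(config["env"]["HF_BATCH_SIZE"])  # → 1
```

In the real notebook this dictionary would feed a SageMaker `HuggingFaceModel.deploy()` call; the point here is that every shape the accelerator needs is decided before deployment.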

Comments: 11
@briangman3 2 months ago
Great video!
@juliensimonfr 2 months ago
Glad you enjoyed it
@caiyu538 4 months ago
Great
@juliensimonfr 2 months ago
Thank you
@leonardoschenkel9168 1 month ago
Hi Julien! Do you have any tips on how I can convert a ComfyUI workflow SD1.5-based model to 🤗 or run it directly on INF2?
@rileyheiman1161 3 months ago
Great video Julien, thank you! Does the model have to be pre-compiled to run on AWS (EC2 or SageMaker)?
@juliensimonfr 2 months ago
Thank you. If you're going to deploy on SageMaker, yes: at the moment, our container won't compile the model for you. On EC2, the model will be compiled on the fly if needed.
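Pre-compiling for SageMaker is typically done with the optimum-neuron export CLI. The hypothetical helper below only assembles the command line; the flag names follow the `optimum-cli export neuron` interface but should be verified against the installed optimum-neuron version, and the model ID and output path are placeholders.

```python
# Hypothetical helper that builds an `optimum-cli export neuron` command
# for pre-compiling a model before a SageMaker deployment. Flag names are
# assumptions based on the optimum-neuron CLI; verify against your version.

def neuron_export_command(model_id: str, output_dir: str,
                          batch_size: int = 1,
                          sequence_length: int = 2048,
                          num_cores: int = 2) -> list:
    """Return the argv list for compiling `model_id` to Neuron format.
    Compilation fixes batch size and sequence length, so choose the
    values you intend to serve with."""
    return [
        "optimum-cli", "export", "neuron",
        "--model", model_id,
        "--batch_size", str(batch_size),
        "--sequence_length", str(sequence_length),
        "--num_cores", str(num_cores),
        output_dir,
    ]

cmd = neuron_export_command("meta-llama/Llama-2-7b-chat-hf", "llama2-neuron/")
print(" ".join(cmd))
```

The resulting artifacts would then be uploaded to S3 and deployed, whereas on EC2 the same compilation happens on the fly at first load.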
@briangman3 2 months ago
I am going to use inf2 to run a fine-tuned Llama 3 70B, which should be great. I am curious about token generation speed on the different inf2 sizes; if you can, mention it as a side note in your next video, e.g. "this generated at x tokens/s".
@juliensimonfr 2 months ago
You'll find benchmarks in the Neuron SDK documentation: awsdocs-neuron.readthedocs-hosted.com/en/latest/general/benchmarks/index.html
@larsjacobs253 3 months ago
Great video! However, when I try to deploy Llama 2 7B on an inf2.xlarge instance, I get an out-of-memory error. Yet I have seen posts about people deploying Llama 2 7B on inf2.xlarge. How can this be?
@juliensimonfr 3 months ago
Please post details and logs at discuss.huggingface.co/c/aws-inferentia-trainium/66
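A back-of-the-envelope memory estimate hints at one common cause of such out-of-memory errors: precision. The shapes below are rough Llama-2-7B-like assumptions (32 layers, hidden size 4096), not figures from this thread, and the estimate ignores compilation buffers and runtime overhead.

```python
# Rough accelerator-memory estimate for a decoder-only model:
# weights plus KV cache. Shapes are Llama-2-7B-like assumptions;
# real usage adds compilation and runtime overhead not counted here.

def estimate_memory_gb(params_b: float, bytes_per_param: int,
                       layers: int, hidden: int,
                       batch: int, seq_len: int) -> float:
    weights = params_b * 1e9 * bytes_per_param            # model weights
    # KV cache: keys + values, per layer, per token, per batch element.
    kv_cache = 2 * layers * hidden * batch * seq_len * bytes_per_param
    return (weights + kv_cache) / 1e9

# ~7B parameters in bf16 (2 bytes/param), batch 1, 2048 tokens:
bf16 = estimate_memory_gb(7.0, 2, 32, 4096, 1, 2048)
# The same model in fp32 (4 bytes/param) roughly doubles the footprint:
fp32 = estimate_memory_gb(7.0, 4, 32, 4096, 1, 2048)
print(round(bf16, 1), round(fp32, 1))  # → 15.1 30.1
```

With about 32 GB of accelerator memory on an inf2.xlarge, a bf16 7B model fits comfortably while an fp32 load leaves almost no headroom, so the dtype and compiled shapes in the logs are the first things to check.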