
Your Own Llama 2 API on AWS SageMaker in 10 min! Complete AWS, Lambda, API Gateway Tutorial

25,145 views

Rob Shocks


A day ago

Comments: 72
@RobShocks · 1 year ago
IMPORTANT BILLING NOTE: Make sure you follow the SageMaker clean-up process so you don't incur big charges on your AWS bill. This link will help: docs.aws.amazon.com/sagemaker/latest/dg/ex1-cleanup.html. Deleting the domain and users is not enough.
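For reference, a minimal clean-up sketch using boto3; the resource names below are placeholders for whatever your own deployment created. It is the running endpoint, not the domain, that accrues the per-hour instance charges.

```python
# Minimal clean-up sketch using boto3. The resource names below are
# placeholders; substitute the names your own deployment created.
import boto3

sm = boto3.client("sagemaker")

# Deleting the endpoint is what stops the per-hour instance billing.
sm.delete_endpoint(EndpointName="my-llama2-endpoint")

# Remove the endpoint config and model objects as well.
sm.delete_endpoint_config(EndpointConfigName="my-llama2-endpoint-config")
sm.delete_model(ModelName="my-llama2-model")
```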
@ballsofsalsa01 · 10 months ago
Fkin garbage, it never loads. Two hours after clicking "create domain" and it's still loading.
@samirkumar7788 · 7 months ago
This is lame, an LLM response without streaming?
@RobShocks · 7 months ago
@samirkumar7788 You can set up streaming if you want.
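A rough sketch of what a streaming call can look like with boto3's sagemaker-runtime client. The endpoint name and payload shape are assumptions, and the serving container behind the endpoint must itself support response streaming:

```python
# Sketch of a streaming invocation with boto3's sagemaker-runtime client.
# Endpoint name and payload shape are placeholders, not from the video.
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint_with_response_stream(
    EndpointName="my-llama2-endpoint",  # placeholder
    ContentType="application/json",
    Body=json.dumps({"inputs": "Write me a tweet about conductors"}),
)

# The Body is an event stream; each event carries a chunk of bytes.
for event in response["Body"]:
    if "PayloadPart" in event:
        print(event["PayloadPart"]["Bytes"].decode("utf-8"), end="")
```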
@mamunrashid1027 · 11 months ago
Thanks, Rob. That's a very helpful tutorial. For some reason, I keep receiving "Hello from Lambda!"
@RobShocks · 11 months ago
That is the default 200 response from the Lambda function. You need to replace the default code in the Lambda function with the code I shared. Search for "Hello from Lambda" in the function in AWS Lambda and you will see where it's coming from.
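The actual code is linked in the video description; as a rough guide, a handler that proxies API Gateway requests to a SageMaker endpoint generally has this shape (a sketch, not the exact code from the video; endpoint name and payload format are placeholders):

```python
# General shape of a Lambda handler that proxies API Gateway requests to a
# SageMaker endpoint. Endpoint name and payload format are placeholders.
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

def lambda_handler(event, context):
    # API Gateway delivers the request body as a JSON string.
    payload = json.loads(event["body"])
    response = runtime.invoke_endpoint(
        EndpointName="my-llama2-endpoint",  # placeholder
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    result = json.loads(response["Body"].read())
    return {"statusCode": 200, "body": json.dumps(result)}
```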
@rowanth · 1 year ago
Great work as always Rob!
@RobShocks · 1 year ago
High praise from the Doctor. Thanks chief!! x
@wasgeht2409 · 1 month ago
Thanks!
@fadichehimi5148 · 10 months ago
Thanks a lot for this helpful demo!! Appreciate all your efforts and notes!
@theAIjurist · 1 month ago
Thank you very much for the tutorial, this was really helpful.
@yashinshekh802 · 11 months ago
This is amazing, thank you. Exactly what I was looking for.
@ricardoalmira6005 · 8 months ago
Excellent! Exactly what I was looking for. Thanks.
@datadockter852 · 4 months ago
This is the best one so far. But could you please make one showing how to run these models on Windows?
@cowabungatv4159 · 11 months ago
Great tutorial, keep up the good work!
@LuizAntonioJunior · 7 months ago
Thanks for this tutorial. One important thing we would like to know: how much is it monthly on AWS? It's kind of difficult to understand their pricing model.
@wasgeht2409 · 1 month ago
Thanks! One question: if I use the JumpStart models from AWS, for example Llama 2, and start fine-tuning, is it then my own fine-tuned model that I can use for my own use cases?
@mathematicus4701 · 11 months ago
Thx, I like your style!
@RobertSeghedi · 8 months ago
Nice tutorial!
@rofu37 · 5 months ago
That's cool!! Thanks
@XperimentalJapanese · 9 months ago
Thanks Rob
@SolidBuildersInc · 3 months ago
Thank you for your presentation. I clicked the Subscribe button, although I didn't delve into the video content. During your talk you mentioned open-source LLMs and discussed AWS pricing, which led me to prioritize a cost-effective solution that allows for scalability. Have you considered running an Ollama model locally and setting up a tunnel with a port endpoint for a public URL? I appreciate any feedback you can provide. 😊
@RobShocks · 3 months ago
Thanks for the comment! Yes, now I would go the local route as you mentioned. Unless you manage it carefully, AWS is going to be very expensive. What I have not explored yet is concurrency and scaling, but there seem to be lots of infrastructure startups, like Replicate, that are solving these issues.
@user-ph5is3hi9c · 4 months ago
Thank you Rob for the tutorial, I really appreciate how you put it, so clear and precise. I am a student and need to deploy an LLM on AWS for a uni project. I want to try it from my own AWS account on the free tier, and I'm trying to understand how much it would cost if I only need to deploy the model (Llama-2-7b-chat) and check it with a few inputs; I don't want to end up with a 400 USD bill. Do you think just the deployment process would cost much?
@RobShocks · 4 months ago
I'd avoid SageMaker if you want to do it cheaply. Take a look at an API from Replicate, or use Ollama to run it on a local machine. SageMaker might be overkill for your project, and the spending can get dangerous quickly.
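For the local route, a minimal sketch against Ollama's local REST API. It assumes Ollama is installed, running on its default port, and a model has already been pulled (e.g. with `ollama pull llama2`):

```python
# Minimal sketch against the local Ollama REST API. Assumes Ollama is
# running on its default port and the llama2 model has been pulled.
import json
import urllib.request

payload = {
    "model": "llama2",
    "prompt": "Write me a tweet about conductors",
    "stream": False,  # request one JSON response instead of a stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```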
@user-bq4ge3zc8k · 11 months ago
Hey, thank you for the video. However, Llama 2 is not available in my SageMaker, so I guess your video is obsolete.
@barasoft · 3 months ago
Thanks for this tutorial. Can we do all of this using Amazon SageMaker Studio Lab? Is there a guide for that somewhere? Thanks 🙂
@dchuguashvili · 11 months ago
Should we expect a tutorial on how to fine-tune Llama 2 on our own data and deploy it on AWS SageMaker?
@courses4611 · 11 months ago
I am interested in this as well.
@mad-mak · 11 months ago
Did you find anything about this?
@rubensdemelo · 8 months ago
Thank you so much!
@brittanymorris5326 · 5 months ago
I'm still learning, but why would I use SageMaker over Bedrock? When do I pick one over the other, specifically for deploying LLMs?
@hrvojek9110 · 9 months ago
I am new to AWS and Llamas, and this provided a great first insight. One question though: once we create a domain, does that mean we get charged immediately? I do not remember setting up an account or providing credit card info, so I am a bit confused about how this actually operates. Can you point me to some documentation that explains this? Thank you!
@BarryDooney · 1 year ago
Great content
@RobShocks · 1 year ago
Thanks a million. On for a chat some day in town if you want to grab a coffee.
@sasukeuchiha-ck4hy · 11 months ago
Can you train the AI using your own dataset, and can you integrate it into an existing website?
@rishiktiwari · 9 months ago
You can train it, but that's generally computationally intense. It's easier to use a good LLM and pair it with LangChain, or do RAG.
@lauvindra · 7 months ago
Hi, great video btw. I would like to ask whether it's possible to make concurrent requests using SageMaker. Will that drastically increase the cost, or is the cost based on hours of usage?
@parisaghanad8042 · 7 months ago
Thanks💞
@MahalingamBalasubramanian · 10 months ago
When deploying textgeneration-llama-2-7b, I am getting the error below. Any idea on this? "Something went wrong. We encountered an error while preparing to deploy your endpoint. You can get more details below. operation deployAsync failed: handler error"
@nicolassuarez2933 · 3 months ago
Outstanding! But what's the point of spending lots of money on SageMaker instead of directly using Groq's API with Llama? Thanks!
@RobShocks · 3 months ago
This video is months old. Now I would just use Groq or Replicate.
@SmeetKathiria · 1 month ago
This does not work anymore. They changed the endpoint structure.
@jyotikumari3696 · 27 days ago
It works for me. I skipped the Jupyter notebook part, as they now provide a UI to test the Llama output.
@luisramos1977 · 1 year ago
Thanks for the video, but there is a jump back at minute 4 and the image quality is very low. Could you share the code of the Lambda function? I can't read it from the video.
@RobShocks · 1 year ago
No problem. I actually put the code in the description of the video; just below the video you will see a link to code snippets. Let me know if you find it okay. You can also try setting the video resolution to 1080p to see it more clearly.
@amitvyas7905 · 6 months ago
Quick question: if I want to use it for an app used by a few hundred or a few thousand users, should I keep SageMaker up and running 24x7? Because I'll need it to get the AI-generated responses, right?
@RobShocks · 6 months ago
Yes, it would need to be on constantly, but I'd make sure to heavily optimise for high usage, otherwise your costs will be very high.
@shivz732 · 1 year ago
Amazing video. How can we make this scalable? What if we need more capacity on the server side, any ideas?
@RobShocks · 1 year ago
You can try a larger instance as the simplest way to scale, but be wary of the pricing and make sure to set some limits in your billing. Outside of that, I'm not an expert on scaling these models, so the team at AWS may be helpful.
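One way to set such a limit is an AWS Budget. A sketch with boto3 follows; the account ID, budget name, amount, and email are placeholders, and note that a budget alerts on spend rather than hard-stopping it:

```python
# Sketch: create a monthly cost budget with boto3 so runaway SageMaker
# spending at least triggers an email alert. Account ID, budget name,
# amount, and email address are placeholders.
import boto3

budgets = boto3.client("budgets")
budgets.create_budget(
    AccountId="123456789012",  # placeholder
    Budget={
        "BudgetName": "llm-experiments",
        "BudgetLimit": {"Amount": "50", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,  # percent of the budget limit
            },
            "Subscribers": [
                {"SubscriptionType": "EMAIL", "Address": "you@example.com"}
            ],
        }
    ],
)
```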
@stasoline · 1 year ago
Thanks for the tutorial! Can you scale SageMaker down to zero when it's not used?
@RobShocks · 1 year ago
Unfortunately not that I can see, but I'm not a SageMaker expert. Most documentation I found mentioned deleting the instance, which I think is a bit of a pain in the butt. I'll be investigating other options, so I'll come back with some updates soon.
@courses4611 · 11 months ago
I'm using Llama 2 to build an AI chatbot over a custom knowledge base (an approx. 100-page PDF) with AWS SageMaker as the backend. I expect around 2,000 users/month and 5-7 simultaneous interactions, with each user asking about 5 questions. What AWS instance size would be optimal for this usage? Also, can you tell me the projected monthly cost for these metrics? Any advice is welcome.
@rituhalder835 · 10 months ago
Hey, are you done with your project? I have a similar kind of thing to do.
@diederik6975 · 10 months ago
It would depend on how many parameters (i.e. Llama 7B vs 70B).
@Xplay00 · 5 months ago
Rob, I have a doubt: why use this when we can use models on demand, paying only for tokens with the base models instead of paying for an instance?
@RobShocks · 4 months ago
I agree. When I put this tutorial up there were limited ways to run LLMs locally. There are cheaper options to deploy now, or look at services like Replicate.
@Xplay00 · 4 months ago
@RobShocks Loved your response and your concern in addressing me. I am dying to see a tutorial that takes input from an S3/CloudFront-hosted website and works with Bedrock using Lambda@Edge or CloudFront Functions. I feel it's the fastest and cheapest way to do this.
@fulviobergantin4254 · 8 months ago
Thank you for your support. I have a problem when I deploy the model to an endpoint. I get this message: "Failed to deploy endpoint. The account-level service limit 'ml.inf2.xlarge for endpoint usage' is 0 Instances, with current utilization of 0 Instances and a request delta of 1 Instances. Please use AWS Service Quotas to request an increase for this quota. If AWS Service Quotas is not available, contact AWS support to request an increase for this quota." Any idea how I can solve this, please? I have a free account; is that the problem?
@shadrackdarku8613 · 7 months ago
It's not free, man. Getting the quota granted means you'll be paying for the instance.
@kennydurojaiye9389 · 11 months ago
Thanks for this great tutorial! I'd like to change the content value for "user", but I keep getting an "Internal Server Error" message whenever the content is anything other than "write me a tweet about conductors".
@takecert · 11 months ago
I'm not the author of the video, but let me help you: the reason you are getting the Internal Server Error is most likely the Lambda timeout. The default timeout is 3 seconds, and the model may take more time to generate the response. To increase the timeout, go to the Lambda function console, open the Configuration tab, go to "General configuration", click Edit, and increase the timeout (e.g. to 30 seconds).
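The same change can also be scripted; a sketch with boto3 (the function name is a placeholder):

```python
# Sketch: raise the Lambda timeout programmatically instead of through the
# console. The function name is a placeholder.
import boto3

boto3.client("lambda").update_function_configuration(
    FunctionName="my-llama2-function",  # placeholder
    Timeout=30,  # seconds; the 3-second default is often too short for LLM inference
)
```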
@kennydurojaiye9389 · 11 months ago
Thank you 👍 @takecert
@victorrippi · 9 months ago
Hello, please help me. How can I pass CustomAttributes='accept_eula=true' in the header code? Help me, brother, please!!
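For anyone with the same question: in boto3, CustomAttributes is a parameter of invoke_endpoint rather than something you set in the HTTP headers yourself. A sketch (endpoint name and payload are placeholders):

```python
# Sketch: CustomAttributes is passed as a parameter of invoke_endpoint in
# boto3's sagemaker-runtime client. Endpoint name and payload are placeholders.
import json
import boto3

runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="my-llama2-endpoint",  # placeholder
    ContentType="application/json",
    Body=json.dumps({"inputs": "Hello"}),
    CustomAttributes="accept_eula=true",  # accepts the Llama 2 license terms
)
print(json.loads(response["Body"].read()))
```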
@zareefbeyg · 12 days ago
You reply to all the auto-generated comments 😂. Real comments wait for your response 😏
@RobShocks · 11 days ago
Not sure I understand what you mean? Are you seeing auto-generated comments somewhere?
@hugotown.entertainment · 11 months ago
I've followed every step, but it still doesn't work. Even the payload for making the request is different. The only variation in my configuration is the region (us-east-1), but that should not affect the request payload.
@arnaudh2082 · 11 months ago
Why don't I see Llama in the list of SageMaker models?
@fabiomartinelli64 · 11 months ago
Change region.
@diederik6975 · 10 months ago
@fabiomartinelli64 Changed mine to California (US); it didn't work.
@hrvojek9110 · 9 months ago
@diederik6975 I ended up requesting that Amazon enable it.
@bairaboinasathwik · 11 months ago
Thanks @rob, I need a small bit of help. Can you please let me know the recommended instance (e.g. ml.g4dn.xlarge) and suggest a SageMaker configuration if I want to build a chatbot that should handle 1,000 concurrent requests? I'm already good with the serverless part, but I'm worried about the SageMaker part.
Build Anything with Llama 3 Agents, Here’s How
12:23
David Ondrej
136K views
Deploy LLMs (Large Language Models) on AWS SageMaker using DLC
57:06
LeetCode 476 - Number Complement - Java
13:20
Alpha-Code
6 views
How To Deploy Your RAG/AI App On AWS (Step by Step)
55:07
pixegami
7K views
Hugging Face LLMs with SageMaker + RAG with Pinecone
32:30
James Briggs
17K views
LLaMA 3 Tested!! Yes, It’s REALLY That GREAT
15:02
Matthew Berman
220K views
Use Your Self-Hosted LLM Anywhere with Ollama Web UI
10:03
Decoder
69K views
SageMaker JumpStart: deploy Hugging Face models in minutes!
8:23
Turn Your AI Model into a Real Product (Amazon SageMaker, API Gateway, AWS Lambda, Next.js, Python)
1:45:45