Hi Trelis, are you planning to give a Black Friday discount?
@mdrafatsiddiqui 2 days ago
Hi Ronan, can you recommend the best resources for algorithmic trading using ML, DL, and AI? Also, are you planning to offer a Black Friday sale or Christmas discount on your Trelis Advanced repo?
@fearnworks 7 hours ago
10/10
@danieldemillard9412 2 days ago
Awesome video Ronan, as always. This video is very timely, as we are optimizing our costs by moving away from RunPod serverless. I have a couple of questions:

- Can the service you have written scale to 0? It seems that with the minimum TPS being a positive number, this wouldn't work, right? Scaling to 0 is very important for us, as we have bursty traffic with long idle times, and this is the primary motivation for serverless.
- Is there any alternative to configuring the TPS scaling limits manually for each GPU/model combination? This seems a bit cumbersome. Would it be possible to scale directly based on GPU utilization? I'm thinking something like SSH'ing into the instance with paramiko and automatically running nvidia-smi (you can output results to CSV with the --format=csv and --query-gpu parameters). You can then use that output to determine whether the GPUs are at full utilization. Maybe take a sample over your time window, as this number can fluctuate a lot. Then you can use this to decide whether to add or remove instances, and you could use the current TPS to determine whether an instance is being used at all (scale to 0). Do you think this approach would work? (A rough sketch of this idea follows below.)
- Do you only support RunPod, or can other clouds like vast.ai or Shadeform be added as well? Both have APIs that allow you to create, delete, and configure specific instances. RunPod has had many GPU shortage issues lately, specifically for 48 GB GPUs (A40, L4, L40, 6000 Ada, etc.).
- Is there any configuration here for Secure Cloud vs. Community Cloud? I think if you don't specify in the RunPod API, it defaults to "ALL", which means you will get whatever is available. Community Cloud can be less stable and less secure, so many users may want to opt for Secure Cloud only.

Again, I really appreciate the content you produce. For anyone who hasn't purchased access to the Trelis git repos yet, they are quite the value. Ronan consistently keeps them up to date with the latest models and new approaches. It's a great return on your investment and the gift that keeps on giving!
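[Editor's note: a minimal sketch of the utilization-sampling idea described in the second question above, assuming paramiko for SSH; the host, credentials, sampling window, and thresholds are illustrative assumptions, not code from the Trelis repo.]

```python
# Sketch: sample GPU utilization over SSH with paramiko and nvidia-smi.
# Host/credentials/thresholds are hypothetical placeholders.
import time
import paramiko

def sample_gpu_utilization(host: str, user: str, key_path: str,
                           samples: int = 5, interval_s: float = 2.0) -> float:
    """Average GPU utilization (%) over a few nvidia-smi samples on a remote instance."""
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(hostname=host, username=user, key_filename=key_path)

    readings = []
    try:
        for _ in range(samples):
            _, stdout, _ = client.exec_command(
                "nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits"
            )
            # One line per GPU, e.g. "87"; average across all GPUs on the instance.
            values = [int(line.strip()) for line in stdout.read().decode().splitlines() if line.strip()]
            if values:
                readings.append(sum(values) / len(values))
            time.sleep(interval_s)
    finally:
        client.close()

    return sum(readings) / len(readings) if readings else 0.0

# Example scaling decision (thresholds made up for illustration):
# util = sample_gpu_utilization("pod-ip", "root", "~/.ssh/id_ed25519")
# if util > 85: add an instance
# elif util < 10: consider scaling down (or to zero if TPS is also ~0)
```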
@TrelisResearch 1 day ago
Howdy!
1. Yes, if you set the minimum instances to zero, it will scale to zero.
2. Scaling based on utilisation might work, yeah, it's a cool idea, and it may be more robust than using TPS. SSH'ing in might be needed, or maybe there's a way to get that info from vLLM directly; I'd have to dig. (A rough sketch of the vLLM angle is below.)
3. Yes, you could use other platforms by updating pod_utils.py to hit those APIs (it will require some different syntax there).
4. Secure Cloud is currently hard-coded, yeah, for the reasons you said.
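[Editor's note: a minimal sketch of pulling load signals from a vLLM server's Prometheus /metrics endpoint instead of SSH'ing in, as hinted at in point 2. The metric names are those exposed by recent vLLM versions and may vary by release; the URL and threshold logic are illustrative assumptions, not the Trelis autoscaler.]

```python
# Sketch: scrape a vLLM server's /metrics endpoint for a few load gauges.
import requests

def vllm_load(base_url: str) -> dict:
    """Return selected vLLM load gauges (queueing and KV-cache pressure)."""
    text = requests.get(f"{base_url}/metrics", timeout=5).text
    wanted = (
        "vllm:num_requests_running",
        "vllm:num_requests_waiting",
        "vllm:gpu_cache_usage_perc",
    )
    load = {}
    for line in text.splitlines():
        if line.startswith(wanted):
            name, value = line.rsplit(" ", 1)
            # Strip any Prometheus labels, e.g. vllm:num_requests_running{model_name="..."}.
            load[name.split("{")[0]] = float(value)
    return load

# Example: scale up if requests are queueing or the KV cache is nearly full.
# stats = vllm_load("http://pod-ip:8000")
# if stats.get("vllm:num_requests_waiting", 0) > 0 or stats.get("vllm:gpu_cache_usage_perc", 0) > 0.9:
#     add an instance
```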