How to Build an Inference Service

1,037 views

Trelis Research

1 day ago

Comments: 9
@ChrisSMurphy1 2 days ago
Trelis at it again..
@subhamkundu5043 1 day ago
Hi Trelis, Are you planning to give a Black Friday discount?
@mdrafatsiddiqui 2 days ago
Hi Ronan, can you point to the best resources for algorithmic trading using ML, DL, and AI? Also, are you planning a Black Friday or Christmas discount on your Trelis Advanced Repo?
@fearnworks 7 hours ago
10/10
@danieldemillard9412 2 days ago
Awesome video as always, Ronan. This one is very timely, as we are optimizing our costs by moving away from RunPod serverless. I have a couple of questions:

- Can the service you have written scale to 0? With the minimum TPS being a positive number, it seems this wouldn't work, right? Scaling to 0 is very important for us, since we have bursty traffic with long idle times; that is the primary motivation for serverless.
- Is there any alternative to configuring the TPS scaling limits manually for each GPU/model combination? This seems a bit cumbersome. Would it be possible to scale directly on GPU utilization? I am thinking of SSHing into the instance with paramiko and automatically running nvidia-smi (you can output results to a CSV with the --format=csv and --query-gpu parameters). You could then use those results to determine whether the GPUs are at full utilization, sampling over your time window since this number can fluctuate a lot. From that you could decide whether to add or remove instances, and use current TPS to determine whether an instance is being used at all (scale to 0). Do you think this approach would work?
- Do you only support RunPod, or can other clouds like vast.ai or Shadeform be added as well? Both have APIs that allow you to create, delete, and configure specific instances. RunPod has had many GPU shortage issues lately, specifically for 48 GB GPUs (A40, L4, L40, 6000 Ada, etc.).
- Is there any configuration here for Secure Cloud vs. Community Cloud? I think if you don't specify in the RunPod API, it defaults to "ALL", which means you will get whatever is available. Community Cloud can be less stable and less secure, so many users may want to opt for Secure Cloud only.

Again, I really appreciate the content you produce. For anyone who hasn't purchased access to the Trelis git repos yet, they are quite the value. Ronan consistently keeps them up to date with the latest models and new approaches. It is a great return on your investment and the gift that keeps on giving!
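The utilization-based sampling idea in the comment above could be sketched roughly as follows. This is a hedged sketch, not anything from the Trelis repo: the function names and the 90% saturation threshold are made up here, and the parsing assumes the output of `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits`, which prints one integer percentage per GPU per line.

```python
import subprocess
from statistics import mean

def query_gpu_utilization():
    """Run nvidia-smi on the host and return per-GPU utilization percentages."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_utilization(out)

def parse_utilization(csv_text):
    """Parse the noheader/nounits CSV output: one integer percent per line."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]

def gpus_saturated(samples, threshold=90):
    """True if utilization, averaged over the sampled readings, hits the threshold.

    Averaging over a window smooths out the fluctuation the comment mentions;
    the 90 default is an arbitrary example, not a recommended value.
    """
    return mean(samples) >= threshold
```

Over SSH (e.g. via paramiko, as suggested), the same `nvidia-smi` command would be run remotely and its stdout fed to `parse_utilization` instead of calling `subprocess` locally.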
@TrelisResearch 1 day ago
Howdy!
1. Yes, if you set the minimum instances to zero, it will scale to zero.
2. Scaling based on utilisation might work, yeah, it's a cool idea. That may be more robust than using TPS. SSHing might be needed, or maybe there's a way to get that info from vLLM; I'd have to dig.
3. Yes, you could use other platforms by updating pod_utils.py to hit their APIs (that will require some different syntax there).
4. Secure Cloud is currently hard-coded, yeah, for the reasons you said.
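Point 1 above (scale-to-zero with a configurable minimum) can be illustrated with a small sketch. This is not the Trelis implementation; the function, its parameters, and the capacity model (a fixed tokens-per-second rating per worker, assumed per GPU/model pair) are all hypothetical.

```python
import math

def target_instances(avg_tps, tps_per_instance, min_instances=0, max_instances=8):
    """Pick a worker count from recent throughput.

    avg_tps:          tokens/sec averaged over the scaling window.
    tps_per_instance: assumed capacity of one worker for this GPU/model pair.
    With min_instances=0 and no traffic, the pool drains to zero workers,
    which is the scale-to-zero behaviour discussed in the thread.
    """
    if avg_tps <= 0:
        return min_instances  # idle: scale down to the floor (possibly zero)
    needed = math.ceil(avg_tps / tps_per_instance)
    return max(min_instances, min(needed, max_instances))
```

The same decision function could take a GPU-utilization signal instead of TPS; only the input and the per-worker capacity estimate would change.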
@danieldemillard9412 1 day ago
@@TrelisResearch Awesome, thanks!
@MegaClockworkDoc 2 days ago
Wonderful work.
@NLPprompter 2 days ago
Oh my, another piece of cool content!