Exploring the fastest open source library for LLM inference and serving | vLLM

9,959 views

JarvisLabs AI

1 day ago

Comments: 21
@bernard2735 10 months ago
This was a nicely paced and clear tutorial. Thank you. Liked and subscribed.
@JarvislabsAI 10 months ago
Thanks for the support :)
@Akshatgiri 10 months ago
Super useful. Thanks for breaking it down.
@HermesFibonacci 2 months ago
Very interesting; I listened to the very end, and it gave me some ideas for prepping my model. Thanks for the explanation and demo. May I ask: do you think an Nvidia AGX Orin devkit (64 GB) would be a good fit for running LLMs locally for fine-tuning and training, and later deploying to a server once developed (both locally and on an Ubuntu server)?
@JarvislabsAI 2 months ago
Have not tried it. No idea.
@kaiwalya_patil 11 months ago
An excellent one! Thank you so much for sharing. Any idea about the possibility of fine-tuning my own LLM (like Llama/Mistral), uploading it back to HF, and then putting it into production using vLLM?
@JarvislabsAI 11 months ago
Yeah, definitely possible. Will make one soon.
@kaiwalya_patil 11 months ago
@@JarvislabsAI Thank you, looking forward!
@dineshgaddi1843 11 months ago
Thank you for sharing this information.
@JarvislabsAI 11 months ago
Glad it was helpful!
@fxhp1 11 months ago
Hey, I also have an AI channel. I tried Mistral's model and it didn't finish its execution, looping over the input forever; I had slightly better luck with the instruct version. Did you ever get Mistral to work?
@JarvislabsAI 11 months ago
We tried it with vLLM and remember it working. I will probably check again.
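
On the looping issue in this thread: a base (non-instruct) checkpoint often never emits an end-of-sequence token, so generation runs until the context fills. A hedged sketch of bounding generation with vLLM's sampling parameters (the model id and [INST] prompt format follow Mistral's instruct convention; the stop string and limits are illustrative assumptions):

from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.1")
# Cap the output length so generation cannot run forever, and cut off
# if the model starts echoing a new [INST] block instead of stopping
params = SamplingParams(max_tokens=512, temperature=0.7, stop=["[INST]"])
# Mistral instruct checkpoints expect the [INST] ... [/INST] wrapper
out = llm.generate(["[INST] Explain PagedAttention briefly. [/INST]"], params)
print(out[0].outputs[0].text)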
@YajuvendraSinghRawat 7 months ago
It's a wonderful video, clearly and concisely explained.
@JarvislabsAI 7 months ago
Glad you liked it
@alecd8534 11 months ago
Thanks for your video, it is interesting. I am new to LLMs and have one question: when you run JarvisLabs in your demo, does it mean you are running a server locally to provide the API endpoint? Please advise.
@JarvislabsAI 11 months ago
In the demo, I was running on a GPU-powered instance. The vLLM server in this case is running in the Jarvislabs instance. You can use the API endpoint from anywhere.
@alecd8534 11 months ago
@@JarvislabsAI thanks so much. I have Navida T500 GPU card on my laptop. But it has only 4 gb. Can it run vLLM? Do we need to install JarvislabsAI on our local machine? Does JarvisLab do? Thanks
@JarvislabsAI 11 months ago
Not sure if it will be possible to run vLLM on a T500 GPU. Jarvislabs offers GPU instances in which you can use vLLM.
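
To make the thread above concrete, a minimal sketch of the server/endpoint split, assuming a GPU instance reachable from your machine (the address below is a placeholder):

# On the GPU instance, start vLLM's OpenAI-compatible server:
#   python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-Instruct-v0.1
# Then, from any machine that can reach the instance:
import requests

SERVER = "http://<instance-address>:8000"  # placeholder address
resp = requests.post(
    f"{SERVER}/v1/completions",
    json={
        "model": "mistralai/Mistral-7B-Instruct-v0.1",
        "prompt": "[INST] What is vLLM? [/INST]",
        "max_tokens": 128,
    },
)
print(resp.json()["choices"][0]["text"])

Nothing heavy needs to run locally; the model and vLLM live on the instance, and any HTTP client can query the endpoint.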
@Ian-fo9vh 10 months ago
Thank you, it was interesting.
Related videos:
vLLM on Kubernetes in Production | 27:31 | Kubesimplify | 4K views
Fast LLM Serving with vLLM and PagedAttention | 32:07 | Anyscale | 28K views
The architecture of mixtral8x7b - What is MoE (Mixture of experts)? | 11:42
Accelerating LLM Inference with vLLM | 35:53 | Databricks | 8K views
How to pick a GPU and Inference Engine? | 1:04:22 | Trelis Research | 4.9K views
Unleash the power of Local LLM's with Ollama x AnythingLLM | 10:15 | Tim Carambat | 129K views
How I Made AI Assistants Do My Work For Me: CrewAI | 19:21 | Maya Akim | 923K views
Accelerating LLMs 10x with Pure PyTorch: No Custom Libraries | 14:42 | JarvisLabs AI | 1K views
vLLM Office Hours - FP8 Quantization Deep Dive - July 9, 2024 | 56:09 | Neural Magic | 1.6K views
vLLM - Turbo Charge your LLM Inference | 8:55 | Sam Witteveen | 17K views