How to Speed Up Inference in LM Studio

1,257 views

Fahd Mirza

A day ago

This video shares tips and tricks to speed up inference in LM Studio when talking with models locally.
🔥 Buy Me a Coffee to support the channel: ko-fi.com/fahd...
🔥 Get 50% Discount on any A6000 or A5000 GPU rental, use following link and coupon:
bit.ly/fahd-mirza
Coupon code: FahdMirza
▶ Become a Patron 🔥 - / fahdmirza
#lmstudio
PLEASE FOLLOW ME:
▶ LinkedIn: / fahdmirza
▶ KZbin: / @fahdmirza
▶ Blog: www.fahdmirza.com
RELATED VIDEOS:
▶ Resource lmstudio.ai
All rights reserved © 2021 Fahd Mirza

Comments: 16
@Ayushsingh019
@Ayushsingh019 3 months ago
Nice video. I am trying vLLM to reduce LLM inference time, and will now try ExLlama for the same.
@fahdmirza
@fahdmirza 3 months ago
Keep it up
@ZIaIqbal
@ZIaIqbal 3 months ago
What is the type and memory of the GPU? And how much RAM does your machine have?
@kironlau
@kironlau 3 months ago
As shown in the video at 9:07: RAM: 47.13 GB, VRAM: 47.4 GB; the GPU is probably 2x RTX 4090 (24 GB each).
@ZIaIqbal
@ZIaIqbal 3 months ago
@@kironlau Thank you, and you are right, the video does show the specs. Just curious, have you done any llama.cpp testing with CPU-only models to see how big a model can be successfully run from RAM?
@kironlau
@kironlau 3 months ago
@@ZIaIqbal Yes, I have run Ollama (which uses llama.cpp) on my SBC (an RK3588 board); believe me, running on CPU is extremely slow, unless you have a 32+ core CPU server (and even then it still can't beat a 4060, I think). For general use, the VRAM and RAM needed to load a model are the same: if the model size (after quantization) is 8 GB, add about 10% of its size as a buffer (that is the RAM usage for short, no-context questions). With a longer context, RAM usage can even double. Take the official GLM-4-Chat 9B test as an example (it is a GPU test, but CPU behaves similarly):

Precision  RAM usage  Prefilling  Decode speed    Remarks
INT4       8 GB       0.2 s       23.3 tokens/s   input length 1000
INT4       10 GB      0.8 s       23.4 tokens/s   input length 8000
INT4       17 GB      4.3 s       14.6 tokens/s   input length 32000
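The rule of thumb in the comment above (model file size, plus roughly 10% buffer, plus KV cache that grows with context length) can be sketched as a rough estimator. The function name and the per-token KV-cache figure below are illustrative assumptions, not measured values; real cache cost depends on the model's architecture and quantization.

```python
def estimate_ram_gb(model_size_gb: float, context_tokens: int = 0,
                    kv_cache_gb_per_1k_tokens: float = 0.25) -> float:
    """Rough RAM estimate for running a quantized model locally.

    model_size_gb: on-disk size of the quantized model file.
    context_tokens: expected prompt + generation length in tokens.
    kv_cache_gb_per_1k_tokens: assumed KV-cache cost; varies per model.
    """
    buffer = model_size_gb * 0.10  # ~10% overhead for short prompts
    kv_cache = (context_tokens / 1000) * kv_cache_gb_per_1k_tokens
    return model_size_gb + buffer + kv_cache

# An 8 GB quant with a short prompt needs roughly 8.8 GB:
print(round(estimate_ram_gb(8.0), 2))  # -> 8.8
```

With a 32,000-token context the same model lands well above the short-prompt figure, matching the "usage can even double" observation in the benchmark rows above.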
@fahdmirza
@fahdmirza 3 months ago
I have also pasted the link to the GPU site with a discount coupon, thanks.
@nexx4582
@nexx4582 3 months ago
Nice one, Ivan!
@fahdmirza
@fahdmirza 3 months ago
Thanks, good friend. Please also subscribe to the channel.
@deepaktej7781
@deepaktej7781 3 months ago
But the results are not as good as before, because the temperature value is high, which makes the model more creative.
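For context on the point above: temperature divides the model's logits before sampling, so higher values flatten the token distribution and make low-probability tokens more likely ("more creative" but less precise). A minimal sketch, with made-up logit values:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities, scaling by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
low = softmax_with_temperature(logits, temperature=0.5)   # peaked, near-greedy
high = softmax_with_temperature(logits, temperature=2.0)  # flatter, more creative
```

At temperature 0.5 the top token dominates; at 2.0 the probabilities spread out, which is why high temperatures trade accuracy for variety.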
@fahdmirza
@fahdmirza 3 months ago
ok
@Abelius
@Abelius 3 months ago
Er... the only reason you get more tokens per second is that the second config offloads the entire model to VRAM, whereas the first config offloads nothing (not even one layer). Also, at 12:00 you can see the min_p and top_p settings get reverted to acceptable values (they are in the 0-1 range).
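As background for the point about the 0-1 range: top_p and min_p are both probability thresholds, which is why values outside that range make no sense. A minimal sketch of the two filters, simplified from what llama.cpp-style samplers do (the example probabilities are made up):

```python
def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of tokens whose cumulative probability >= top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return sorted(kept)

def min_p_filter(probs, min_p=0.05):
    """Keep tokens whose probability is at least min_p times the top probability."""
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]

probs = [0.5, 0.3, 0.15, 0.05]
print(top_p_filter(probs, 0.9))   # -> [0, 1, 2]
print(min_p_filter(probs, 0.2))   # -> [0, 1, 2]
```

Setting top_p or min_p above 1 would keep every token on every step (or, in some implementations, misbehave entirely), which matches the commenter's point that the out-of-range values shown earlier in the video had to be reverted.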
@fahdmirza
@fahdmirza 2 months ago
Thanks for the feedback.
@Pauluz_The_Web_Gnome
@Pauluz_The_Web_Gnome 27 days ago
The speed has improved, but the answers are very bad... OMG!
@fahdmirza
@fahdmirza 23 days ago
that depends on the model
@Pauluz_The_Web_Gnome
@Pauluz_The_Web_Gnome 23 days ago
@@fahdmirza What do you recommend? I have an RTX 3090 Ti 24 GB OC Gaming.