vLLM Office Hours - Multimodal Models in vLLM with Roblox - August 8, 2024

535 views

Neural Magic

Comments: 2
@hari000-f6y · 2 months ago
I have a question! I'm serving a quantized multimodal model (InternVL2) with vLLM on an L4 GPU. A single request takes ~5-6 s to complete, but when several requests arrive at the same time, they take much longer (~30 s) to complete. How can I handle this so that concurrent requests also complete in ~5 s? I don't understand request batching very well.
@shumshvenhiszali · 2 months ago
You say the code is open source, but where is it?