vLLM Office Hours - Multimodal Models in vLLM with Roblox - August 8, 2024

Views: 535

Neural Magic

Days ago

Comments: 2

@hari000-f6y 2 months ago

I have a question! I'm serving a quantized multimodal model (InternVL2) with vLLM on an L4. A single request takes ~5-6 seconds to complete, but when multiple requests hit at the same time, it takes much longer, ~30 seconds, to complete them. How can I handle this so that multiple concurrent requests also complete in ~5 seconds? I have little understanding of request batching and related settings.

@shumshvenhiszali 2 months ago

You say the code is open source, but where is it?
