LoRAX: Serve 1000s of Fine-Tuned LLMs on a Single GPU - Travis Addair, Predibase, Inc.

The Linux Foundation
LoRAX (LoRA eXchange) is a new LLM inference system that lets users pack thousands of fine-tuned "LoRA" adapters onto a single GPU, dramatically reducing the cost of serving compared with dedicated deployments per fine-tuned model. LoRAX is open source, free to use commercially, and production-ready, with pre-built Docker images and Helm charts available for immediate download and use. In this talk, we'll introduce LoRAX and explore the key ideas that make it the most cost-effective and efficient way to serve fine-tuned LLMs in production, including:
- Dynamic Adapter Loading: each set of fine-tuned LoRA weights is loaded from storage just-in-time as requests arrive at runtime, without blocking concurrent requests.
- Heterogeneous Continuous Batching: an extension of continuous batching that packs requests for different adapters into the same batch, keeping latency and throughput nearly constant as the number of concurrent adapters grows.
- Adapter Exchange Scheduling: a fair scheduling policy that asynchronously prefetches and offloads adapters between GPU and CPU memory and schedules request batching to optimize the aggregate throughput of the system.
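To make the serving model concrete, here is a minimal client-side sketch of what "many adapters, one deployment" looks like from the caller's perspective. It assumes a LoRAX server is already running locally (the host/port and the adapter repository names below are illustrative placeholders, not from the talk), and uses the server's /generate endpoint with an adapter_id request parameter as described in the LoRAX documentation.

```python
# Minimal sketch of querying one LoRAX deployment with multiple LoRA adapters.
# Assumptions: a LoRAX server is reachable at localhost:8080, and the adapter
# IDs below are hypothetical repositories of LoRA weights compatible with the
# deployed base model.
import requests

LORAX_URL = "http://127.0.0.1:8080/generate"  # assumed default address


def generate(prompt: str, adapter_id: str | None = None, max_new_tokens: int = 64) -> str:
    """Send one generation request, optionally routed through a LoRA adapter."""
    parameters = {"max_new_tokens": max_new_tokens}
    if adapter_id is not None:
        # The named adapter is fetched just-in-time on first use (dynamic adapter
        # loading) and its requests are batched alongside other adapters' requests
        # (heterogeneous continuous batching).
        parameters["adapter_id"] = adapter_id
    response = requests.post(LORAX_URL, json={"inputs": prompt, "parameters": parameters})
    response.raise_for_status()
    return response.json()["generated_text"]


if __name__ == "__main__":
    # Same base-model deployment, two different (hypothetical) fine-tuned adapters.
    print(generate("Summarize this ticket: ...", adapter_id="acme/support-summarizer-lora"))
    print(generate("Translate to French: hello", adapter_id="acme/translation-lora"))
    print(generate("What is LoRAX?"))  # no adapter_id: served by the base model
```

The point of the sketch is that the client never chooses between deployments; it names an adapter per request, and the server handles loading, batching, and GPU/CPU placement of the adapters behind a single endpoint.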
