LoRAX: Serve 1000s of Fine-Tuned LLMs on a Single GPU - Travis Addair, Predibase, Inc.
LoRAX (LoRA eXchange) is a new LLM inference system that lets users pack thousands of fine-tuned "LoRA" adapters onto a single GPU, dramatically reducing the cost of serving compared with dedicated deployments per fine-tuned model. LoRAX is open source, free to use commercially, and production-ready, with pre-built Docker images and Helm charts available for immediate download and use.

In this talk, we'll introduce LoRAX and explore the key ideas that make it the most cost-effective and efficient way to serve fine-tuned LLMs in production, including:

- Dynamic Adapter Loading: each set of fine-tuned LoRA weights is loaded from storage just-in-time as requests arrive at runtime, without blocking concurrent requests.
- Heterogeneous Continuous Batching: an extension to continuous batching that packs requests for different adapters into the same batch, keeping latency and throughput nearly constant as the number of concurrent adapters grows.
- Adapter Exchange Scheduling: a fair scheduling policy that asynchronously prefetches and offloads adapters between GPU and CPU memory, and schedules request batching to optimize the aggregate throughput of the system.
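To make the interplay of these ideas concrete, here is a minimal, purely illustrative Python sketch (not LoRAX's actual implementation) of two of them: an LRU-style adapter cache that "exchanges" adapters between GPU and CPU memory, and a batcher that packs requests for different adapters into one heterogeneous batch. All class, function, and field names here are hypothetical.

```python
from collections import OrderedDict

class AdapterCache:
    """Toy stand-in for adapter exchange between GPU and CPU memory.

    GPU-resident adapters live in an LRU-ordered dict; when capacity is
    exceeded, the least-recently-used adapter is offloaded to a CPU-side
    dict instead of being reloaded from storage next time.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.gpu = OrderedDict()  # adapter_id -> weights ("on GPU")
        self.cpu = {}             # offloaded adapters ("on CPU")

    def fetch(self, adapter_id):
        if adapter_id in self.gpu:
            self.gpu.move_to_end(adapter_id)  # mark as recently used
            return self.gpu[adapter_id]
        # Restore from CPU memory if offloaded, else "load from storage".
        weights = self.cpu.pop(adapter_id, f"weights:{adapter_id}")
        if len(self.gpu) >= self.capacity:
            evicted_id, evicted = self.gpu.popitem(last=False)
            self.cpu[evicted_id] = evicted  # offload LRU adapter
        self.gpu[adapter_id] = weights
        return weights

def pack_batch(requests, cache, max_batch=8):
    """Heterogeneous batching: requests targeting *different* adapters
    share one batch; each entry is tagged with its adapter so the
    forward pass could apply the right LoRA weights per row."""
    batch = []
    for req in requests[:max_batch]:
        weights = cache.fetch(req["adapter_id"])
        batch.append((req["prompt"], req["adapter_id"], weights))
    return batch

cache = AdapterCache(capacity=2)
reqs = [
    {"prompt": "hi",  "adapter_id": "a"},
    {"prompt": "yo",  "adapter_id": "b"},
    {"prompt": "hey", "adapter_id": "a"},
    {"prompt": "sup", "adapter_id": "c"},  # triggers offload of "b"
]
batch = pack_batch(reqs, cache)
```

In this toy run, all four requests land in a single batch despite spanning three adapters, and only two adapters occupy "GPU" memory at once; the real system does the prefetching and offloading asynchronously so it never blocks concurrent requests.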