Speaker: Matt Squire, CTO, Fuzzy Labs
Open source models have made running your own LLM accessible to many people. It's fairly straightforward to set up a model like Mistral alongside a vector database and build your own RAG application.
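As a rough illustration of how little code such a setup needs, here is a minimal sketch. It assumes ChromaDB as the vector store and a local Ollama server serving Mistral; both are my assumptions, not a stack the talk prescribes.

```python
# Minimal RAG sketch: embed documents into a vector store, retrieve context,
# and ask a locally hosted Mistral model to answer using that context.
# Assumes ChromaDB for storage and an Ollama server on localhost serving "mistral".
import chromadb
import requests

client = chromadb.Client()
collection = client.create_collection(name="docs")

# Index a couple of documents (ChromaDB embeds them with its default model).
collection.add(
    ids=["1", "2"],
    documents=[
        "Mistral 7B is an open-weights large language model.",
        "RAG retrieves relevant documents and passes them to the model as context.",
    ],
)

def answer(question: str) -> str:
    # Retrieve the most relevant documents for the question.
    results = collection.query(query_texts=[question], n_results=2)
    context = "\n".join(results["documents"][0])

    # Ask the model, grounding it in the retrieved context.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "mistral", "prompt": prompt, "stream": False},
        timeout=120,
    )
    return response.json()["response"]

print(answer("What is Mistral 7B?"))
```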
But making it scale to meet high-traffic demands is another story. LLM inference itself is slow, and GPUs are expensive, so we can't simply throw hardware at the problem. Once you add things like guardrails to your application, latencies compound.
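To see how those latencies stack up, per-stage timing is often enough. The sketch below uses hypothetical check_guardrails, retrieve_context, and generate stubs (with simulated delays) standing in for the real guardrail, retrieval, and inference calls; it is an illustration of the compounding effect, not the talk's actual pipeline.

```python
# Sketch: time each stage of a guarded RAG request to show how latencies add up.
import time
from contextlib import contextmanager

@contextmanager
def timed(stage: str, timings: dict):
    start = time.perf_counter()
    yield
    timings[stage] = time.perf_counter() - start

# Stubs that simulate each stage's latency; swap in real calls in practice.
def check_guardrails(text: str) -> None:
    time.sleep(0.15)   # a guardrail model adds its own inference time

def retrieve_context(question: str) -> str:
    time.sleep(0.05)   # vector search is usually the cheap part
    return "retrieved documents"

def generate(question: str, context: str) -> str:
    time.sleep(1.2)    # LLM generation dominates
    return "model answer"

def handle_request(question: str) -> str:
    timings: dict[str, float] = {}

    with timed("input guardrails", timings):
        check_guardrails(question)             # e.g. prompt-injection checks
    with timed("retrieval", timings):
        context = retrieve_context(question)   # vector database lookup
    with timed("inference", timings):
        answer = generate(question, context)   # the LLM call itself
    with timed("output guardrails", timings):
        check_guardrails(answer)               # e.g. toxicity checks

    for stage, seconds in timings.items():
        print(f"{stage:>17}: {seconds * 1000:7.1f} ms")
    print(f"{'total':>17}: {sum(timings.values()) * 1000:7.1f} ms")
    return answer

handle_request("What is Mistral 7B?")
```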
In this talk, I'll share the lessons we've learned building and running LLMs at scale for our customers. Using real code examples, I'll cover performance profiling, getting the most out of GPUs, and interactions with guardrails.
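As a taste of the kind of measurement involved in getting the most out of GPUs, here is a small sketch that samples GPU utilisation and memory while the model serves traffic. It uses NVML via the pynvml bindings and assumes a single NVIDIA GPU; it is my own illustration, not code from the talk.

```python
# Sketch: sample GPU utilisation and memory use while the model serves requests.
# Uses NVML via the pynvml bindings; assumes one NVIDIA GPU at index 0.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

try:
    for _ in range(10):  # sample once a second for ten seconds
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(
            f"GPU {util.gpu:3d}%  "
            f"memory controller {util.memory:3d}%  "
            f"VRAM {mem.used / 2**30:.1f} / {mem.total / 2**30:.1f} GiB"
        )
        time.sleep(1)
finally:
    pynvml.nvmlShutdown()
```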