Lessons learned from scaling large language models in production

MLOps World: Machine Learning in Production

Speaker: Matt Squire, CTO, Fuzzy Labs
Open-source models have made running your own LLM accessible to many people. It's pretty straightforward to set up a model like Mistral with a vector database and build your own RAG application.
But making it scale to high traffic demands is another story. LLM inference itself is slow, and GPUs are expensive, so we can't simply throw hardware at the problem. Once you add things like guardrails to your application, latencies compound.
In this talk, I'll share the lessons we've learned from our experience building and running LLMs for our customers at scale. Using real code examples, I'll cover performance profiling, getting the most out of GPUs, and interactions with guardrails.
