Speaker:
Zain Hassan, Senior ML Developer Advocate, Weaviate
Abstract:
Everybody loves vector search, and enterprises now see its value thanks to the popularity of LLMs and RAG. The problem is that production-level deployment of vector search demands large amounts of compute: CPU for search and GPU for inference. The bottom line is that, deployed incorrectly, vector search can be prohibitively expensive compared to classical alternatives.
The solution: quantizing vectors and performing adaptive retrieval. These techniques let you scale applications into production by reliably balancing and tuning memory cost, latency, and retrieval accuracy.
I’ll talk about how you can perform real-time, billion-scale vector search on your laptop! This includes covering different quantization techniques, such as product, binary, scalar, and Matryoshka quantization, which compress vectors by trading memory requirements for accuracy. I’ll also introduce the concept of adaptive retrieval: you first perform a cheap, hardware-optimized, low-accuracy search over the compressed vectors to identify retrieval candidates, then run a slower, higher-accuracy search to rescore and correct them (see the sketch below).
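For illustration, here is a minimal NumPy sketch of that two-stage idea, not Weaviate's actual implementation: binary quantization keeps one bit per dimension, a cheap Hamming-distance pass over the compressed codes shortlists candidates, and the full-precision vectors rescore the shortlist. All names, sizes, and parameters here are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_docs = 128, 10_000

# Full-precision corpus and query (float32, 32 bits per dimension).
docs = rng.normal(size=(n_docs, dim)).astype(np.float32)
query = rng.normal(size=dim).astype(np.float32)

# Binary quantization: keep only the sign of each dimension (1 bit instead of 32).
doc_codes = np.packbits(docs > 0, axis=1)      # n_docs x (dim / 8) bytes
query_code = np.packbits(query > 0)

# Stage 1: cheap, low-accuracy search -- Hamming distance over the compressed codes.
hamming = np.unpackbits(doc_codes ^ query_code, axis=1).sum(axis=1)
candidates = np.argsort(hamming)[:100]         # shortlist of 100 candidates

# Stage 2: slower, higher-accuracy rescoring of the shortlist with full-precision vectors.
scores = docs[candidates] @ query
top_k = candidates[np.argsort(-scores)[:10]]
print(top_k)
```

In practice the candidate pool size is the knob that trades latency against recall: a larger shortlist recovers more of the full-precision ranking at the cost of more rescoring work.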
When combined with well-thought-out adaptive retrieval, these quantization techniques can lead to a 32x reduction in memory requirements at the cost of only ~5% loss in retrieval recall in your RAG stack.
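For context, the 32x figure is what binary quantization gives you arithmetically: storing 1 bit per dimension instead of a 32-bit float means, for example, that a 1024-dimensional vector shrinks from 4 KB to 128 bytes.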