Why vector search is not enough and we need BM25

  19,636 views

Diffbot


Comments: 63
@endre777 2 months ago
Thanks for the explanation, it was super clear. We are just planning to move from vector search to hybrid, and your explanation of BM25 helps a lot in understanding which edge cases it can solve. Much appreciated! I guess we will see a surge in BM25 thanks to Anthropic's Contextual Retrieval paper.
@ChocolateMilkCultLeader 2 months ago
Great video. This is why, when building a search engine, I like to use BM25 for sparse search first and vector-based search later, once most of the corpus has been filtered out. This lets me stay precise and efficient. One additional thing: people often assume that you need a vector DB for vector search, but you can do completely without one. Just store the vectors in a normal DB.
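A minimal sketch of the filter-then-rerank flow described in this comment: BM25 narrows the corpus to a few candidates, then cosine similarity over stored vectors reorders them. Everything in it is assumed for illustration and is not the commenter's actual setup: the `ROWS` table, the three-dimensional embeddings, the whitespace tokenizer, and the `k1`/`b` defaults are made up, and the IDF uses the common "+1" smoothed variant.

```python
import math
from collections import Counter

# Hypothetical rows: text plus a made-up embedding, as they might sit in an ordinary DB table.
ROWS = [
    {"id": 1, "text": "bm25 ranks documents by exact term overlap", "vec": [0.9, 0.1, 0.0]},
    {"id": 2, "text": "vector search finds semantically similar text", "vec": [0.1, 0.9, 0.1]},
    {"id": 3, "text": "hybrid search combines bm25 and vector search", "vec": [0.6, 0.6, 0.1]},
    {"id": 4, "text": "a recipe for sushi rice", "vec": [0.0, 0.1, 0.9]},
]


def tokenize(text):
    # Naive whitespace tokenizer (assumption; real systems use proper analyzers).
    return text.lower().split()


def bm25_scores(query, rows, k1=1.5, b=0.75):
    """Return one BM25 score per row for the given query."""
    docs = [tokenize(r["text"]) for r in rows]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def filter_then_rerank(query, query_vec, rows, keep=2):
    """Stage 1: keep the top BM25 matches. Stage 2: reorder them by cosine similarity."""
    scored = sorted(zip(rows, bm25_scores(query, rows)), key=lambda p: p[1], reverse=True)
    candidates = [row for row, score in scored if score > 0][:keep]
    candidates.sort(key=lambda r: cosine(query_vec, r["vec"]), reverse=True)
    return [r["id"] for r in candidates]


print(filter_then_rerank("bm25 vector search", [0.5, 0.7, 0.1], ROWS))  # → [3, 2]
```

The vectors here live in plain Python dicts, standing in for rows in an ordinary database table. Once the BM25 stage has cut the candidate set down, a brute-force cosine pass like this may be all the "vector search" that is needed.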
@notsojharedtroll23 2 months ago
I mean, at the end of the day, the embeddings are data, period.
@stxnw 2 months ago
It should be the other way around. Most prompts may not have exact matches. Use vector search first, then BM25 and rerank the results.
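Whichever stage runs first, both orderings in this thread end with two ranked lists that have to be merged. One common way to merge them without comparing raw scores is reciprocal rank fusion (RRF). It is swapped in here as an assumption, since nobody in the thread names a fusion method. The document IDs and the constant `k=60` are illustrative only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs; each list contributes 1 / (k + rank) per item."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], str(d)))


# Hypothetical results from the two retrievers for the same query.
vector_ranking = ["doc_b", "doc_a", "doc_c"]
bm25_ranking = ["doc_a", "doc_d", "doc_b"]

fused = reciprocal_rank_fusion([vector_ranking, bm25_ranking])
print(fused)  # → ['doc_a', 'doc_b', 'doc_d', 'doc_c']
```

Documents that appear in both lists rise to the top, while documents found by only one retriever are still kept, so neither exact matches nor semantic matches are dropped outright.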
@microburn 2 months ago
Nice video. I’ve been on the opposite side of the coin, but I like hearing the balanced argument to keep me educated
@shiholololo1053 1 month ago
Waiting for the next video. I enjoyed the format.
@oncedidactic 2 months ago
Nice discussion, thanks! I wish there was more structure to the video so that the “why” of the title is served as the main dish, i.e. define the terms up front, explain how each works, then do the “why” discussion and give a teaser for the hybrid approach discussion. Instead there are some gaps and jumps, which leaves it feeling incomplete or maybe not quite capturing the essence? I have a feeling this is partly a result of editing many clips, so don’t take this feedback too seriously. Cheers
@jameswigglesworth8132 2 months ago
Thank you for delving into this important topic!
@matveyshishov 2 months ago
Thanks, guys, YT recommended me this video, a very pleasant snippet of explanation. Trying to work through your website to understand what the service is.
@aproperhooligan5950 2 months ago
Excellent presentation/explanation. Very useful. Thank you!
@andydataguy 2 months ago
Great video! This is one of the most misunderstood concepts. Will def share this next time it comes up!
@roopad8742 2 months ago
This is so easy to understand, thank you!
@dougunderwood569 2 months ago
Great overview, thank you!
@NicolasEmbleton 2 months ago
Wonderful explanation. Thank you.
@badashphilosophy9533 2 months ago
this is an amazing explanation. im an instant follower
@marka5215 2 months ago
Great explanation. Thank you so much!
@weirdsciencetv4999 2 months ago
Oh man, you are amazing!! Love the channel, I subscribed. Please do a video on working with such graphs using a vector database.
@ashraf_isb 2 months ago
that's insightful, thank you so much boss
@ShadowD2C 23 days ago
Hi, I liked the video and the explanations. I would've liked it to show more visuals about the topic instead of the presenter's face tho
@theepicosityofpizza 2 months ago
BM25 doesn't do anything to address any of the issues you bring up at the beginning of the video. TF-IDF is dumber than vector search in every aspect. It's just much cheaper to run. Not saying it doesn't have value as part of the toolkit, but I'm not sure why you spend the first half setting up all these problems with vector search as if BM25 addresses any of them.
@stxnw 2 months ago
is English not your first language?
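For reference on the TF-IDF comparison in this thread, a commonly cited form of the BM25 score for a query $Q = (q_1, \dots, q_n)$ against a document $D$ is sketched below. The notation is the usual textbook one and is not taken from the video: $f(q_i, D)$ is the frequency of term $q_i$ in $D$, $|D|$ is the document length in tokens, $\mathrm{avgdl}$ is the average document length, $N$ is the number of documents, $n(q_i)$ is the number of documents containing $q_i$, and $k_1$ and $b$ are tunable parameters. The IDF shown is the "+1" smoothed variant; other variants exist.

```latex
\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1 \left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)},
\qquad
\mathrm{IDF}(q_i) = \ln\!\left(\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5} + 1\right)
```

Compared with plain TF-IDF, the $k_1$ term makes repeated occurrences of a word saturate instead of growing linearly, and $b$ controls normalization by document length. Both scores still depend on exact term matches.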
@PongsiriHuang 17 days ago
How does BM25 help with ranking the words excellent, good, decent, and numbers like 50, 100, 150 or 150-100=50? I thought the video would discuss that
@amortalbeing 2 months ago
thanks this was great!
@broccoli322 2 months ago
Thanks for the video.
@MathsSciencePhilosophy 2 months ago
The mathematics behind chatGPT is amazing
@andrewwalker8985 2 months ago
Why don’t we include semantic dimensions in vectors?
@BleachWizz 2 months ago
Oh no, this is going to make texts like I do!!! ok, drama aside, I do believe this will improve things a lot. I still see some caveats that would be left to luck, but huge amounts of data might overcome that. I do believe we already have enough with GPT and a few previous ideas; still, improving the language model itself is always a plus.
@АндрейАндреевич-з7т 2 months ago
BM25. Frequency-weighted by sponsored-definition-tag vector search. Yeah, Google Search does that too, you know. If you ever did SEO for your website or some kind of SMM, you know that it works
@MLGJuggernautgaming 2 months ago
I believe vector search is still better for RAG applications. BM25 is better for more literal matches. Also, what does this have to do with LLMs doing math?
@Howoulduknow841 2 months ago
This is something Anthropic has shared with their contextual retrieval.
@pratikerande4808 2 months ago
super
@shizheliang2679 1 month ago
wait...I think I am in love...
@bmm8213 2 months ago
Golden nugget
@Isaacmellojr 2 months ago
Great illustration of how word2vec is not the definitive solution.
@tempname-dr2bm 1 month ago
Poland mentioned
@themax2go 2 months ago
ty for the insight to "pair" numerical rep (vector) w/ BM25... can the same be achieved w/ just using a knowledge graph? i'm experimenting w/ sci/phi triplex... what do you think, do you have any preliminary ideas, or have you already tested it and found using "entities_and_triples" not as effective / not effective at all? 6 mo ago you did a vid on knowledge graphs, i haven't watched it yet, i'll check it out...
@knucker3 2 months ago
TURN YOUR VOLUME UP
@NLPprompter 2 months ago
i love this bot...
@815TypeSirius 2 months ago
But vs is enough to scam dummies and create a market bubble.
@ValidatingUsername 2 months ago
Try tokenizing engendered languages 😂
@rontheoracle 2 months ago
Excuse me, but your volume is just too low. Just saying.
@martin777xyz 2 months ago
Seems fine to me
@sladeTek 2 months ago
No it’s not, your device is the issue
@rontheoracle 2 months ago
@@sladeTek It's just this video and a few others that play with very low volume. I tried other videos on YouTube and, in general, they sound acceptably loud. Dunno why.
@rontheoracle 2 months ago
@@sladeTek Try watching the video on YouTube with this title: "The Best RAG Technique Yet? Anthropic’s Contextual Retrieval Explained!" It is significantly louder. Just my 2 cents.
@csmac3144a 2 months ago
Her audio is fine. Turn up your volume.
@Ruhgtfo 2 months ago
Contributed 3blue1brown