Straight to the point, excellent content ! Allow me to add a quick reminder for myself in case I forget and for others in case they want to look into the details, probability requires a bit of mental warm up sometimes to fully understand : A bloom filter can be seen as F = {0,1}^m For each word, hashing means setting k indexes in F to 1, such as F[ h_i(w) ] = 1 where h_i is the i-th hash function in { h_1, ... , h_k } We do this for all words in the dictionary/list to input Initially we put all values of F to 0, so F={0}^m Then we apply one hash like h_1(w) = some_index So now F[ some_index ] = 1 Then the probability of finding a bit at one, is literally one in m since there is only a single 1 in F: p1 = 1/m So that the probability of finding a bit at zero would be the inverse : p0 = 1 - p1 = 1 - 1/m (**) So after applying all k hash functions to the first word the probability of finding a bit at zero becomes : p' =(1-1/m)^k Then after hashing all words (or until the iteration i of the list of words), the probability of finding a bit at zero will be : p=(1-1/m)^(k*i) So now the probability of getting a one for a random word w, p(F[h_i(w)]=1) is the inverse of p which is : q = 1 - p = 1 - (1-1/m)^(k*i) Therefore if we want the probability of getting a false positive = an error, we want the probability of getting k times the value 1 in succession. So that F[h_1(w)] = 1 AND F[h_2(w)] = 1 AND F[h_3(w)] = 1 ....AND F[h_k(w)] =1 But we know from previous that p(F[h_i(w)]=1) = q = 1 - (1-1/m)^(k*i) Finally the probability of an error is e = p(F[h_1(w)]=1)*p(F[h_2(w)]=1)*...*p(F[h_k(w)]=1) = q*q*q...*q = q^k = ( 1 - (1-1/m)^(k*i) )^k ** In this calculation one could argue that after completing the first word, why wouldn't the probability of finding a zero be p' = 1 - k/m It's a valid remarque, we could say that after applying each h_1(w) to h_k(w) we've been adding a new 1 to F, so p1 = k/m While that is highly possible, since h is built so that each hash gives a different result, it wouldn't be mathematically correct , because that would violate the case where h_1(w) = h_2(w) for instance. (If you assume all other h_i(w) are distinct, then p' = 1 - (k-1)/m in that case) Then what if three hashes are the same, what if any random number n of hashes are the same, this formula would lead to incoherence. => My intuition to solve this question is that when you don't know how many ones are in your filter the experience of drawing a zero can be simulated as not drawing one for all possible values of h_i(w). So the probability of not drawing one (or drawing 0) for the first index h_1(w) is (1-1/m), then for h_2(w) it's the same (1-1/m) etc... until h_k(w) And so drawing a zero when you don't know exactly how many ones there are in your Filter is like not drawing one for the first hashed index, and not drawing one for the second etc... until not drawing one for the last hashed index. In fact the more I think about this the more I find it counterintuitive, I'm not 100% sure how to interpret this formula probabilistically, because when you say the probability of getting a 0 for a single hash, that's one operation, but when you say p = (1-1/m)^k that's the probability of multiple events happening in succession.
@Md_sadiq_Md7 ай бұрын
I was writing blog on bloom Filters and thanks for this man 🔥
@fiveyearclub60245 жыл бұрын
i got bloom filters in an interview. i had read a wikipedia article on it, but i just didn't remember enough to say anything about it. this video was clear! you're a rock star.
@gkcs5 жыл бұрын
Thank you 😁
@salookie80006 жыл бұрын
Excellent comment you made that it doesn't guarantee string has been searched but it does guarantee it has not been searched.
@gkcs6 жыл бұрын
It takes some time to roll of the toungue :)
@shivas99116 жыл бұрын
Are you saying that with the right number of hash functions, the "cc" search should respond as not been searched?
@PiyushSingh-vx7bx4 жыл бұрын
U r one of the best computer science KZbinr bro
@abhaypatil20004 жыл бұрын
This is the best channel for a cs student. We have many youtube channels which solve competitive programming problems. But who will make up for rest of the concepts. Especially when the IIT professors are not so great😅😅
@KshitijSharma-v7b2 ай бұрын
best video on bloom filter. Gone through multiple videos but my concept got cleared in this
@mairasultan8846 жыл бұрын
A simple and easy to understand talk. thumbs up
@ThePujjwal3 жыл бұрын
17:32 Small tip. If sum of digits of a number is divisible by 9 it will be divisible by 9.
@rajeevanand68705 жыл бұрын
Nice simple explanation of Bloom Filter. Only he got bit confused during 23:12-23:24. It's simply "1 divided by 1 is 1, and a minus 1 is 0... so on". Anyways good talk.
@rajashekarreddy0073 жыл бұрын
Yeah, I was looking in the comments section to see if anyone has caught that.
@TheGenerationGapPodcast2 жыл бұрын
Why point out the negative 0.000000001%?
@sujitkumar-xs2wy6 жыл бұрын
one of ur''s best lecture ...@gaurav
@sagnikbilltrim47164 жыл бұрын
I liked the video very much, except the fact u forgot "p" comes after "o" not "q" :P
@jairajkhushalani995 жыл бұрын
Great stuff Gaurav!!
@karanmjanthe2521Ай бұрын
really well explained
@BRAJAGOP6 жыл бұрын
Good talk on bloom filter and liked the math part, would have loved if you solved the optimal part as well, that is something for me to look up ;) .
@gkcs6 жыл бұрын
That math was too complicated to talk about here :) Thanks!
@abhishek_sengupta4 жыл бұрын
Very nicely explained!! Thanks.
@priyankagrawal93333 жыл бұрын
Amazing!! Thanks for sharing!!
@randomizing-life70165 жыл бұрын
great video, great explanation on topic that im just starting to get familiar with. but i couldnt lay my eyes off you. first YT engineer lecturer that is handsome like hell.I know purpose of the video is educational and all comments are about your lecture but damn i had to comment your good looks too :)
@gkcs5 жыл бұрын
Hahaha, thanks!
@shreyageek3 ай бұрын
no one explains a single question with bloom filter , but its actually asked in microsoft interview
@saaqibz5 жыл бұрын
Great video and clear explanations, thanks! small note: wouldn't comparing two strings be a complexity of O(min(m,n))? since you don't really need to compare past the end of one string after the end of the other is finished.
@gkcs5 жыл бұрын
Yes, that's correct. I am assuming n to be the length of the smaller one :)
@Atpugtihsrah6 жыл бұрын
1/1 is 0 :D Bdw amazing video! Thanks!
@gkcs6 жыл бұрын
😁
@ayushverma38636 жыл бұрын
Well explained Gaurav. I was curious how companies like Facebook handle the false positives while using Bloom filters.
@gkcs6 жыл бұрын
Thanks Ayush!
@muffinproject6 жыл бұрын
In the case presented here (caching a search) they probably wouldn't do anything about it. It's just used to cut down on unnecesary caching, NOT to definitively prove a string has already been searched. Let's say that implementing this generates a 'not in set' result 90% of the time, and has a 50/50 chance of being a false positive. Instead of caching 100% of cases, we now toss away 90% of them - saving equally as much storage space. Yes, we do save twice as much as is ideal, but we're saving a lot. Ideally we'd only save 5% but then we'd have to compare the actual string to a very very large array of previous searches which takes longer to compare (and, ironically, use up some storage space in the form of a larger string-array than the bloom array) Compare it a bit to the game 'guess who?' where you have a bunch of people and have to find out whoever your opponent picked. You could ask naïvely; is it Adam? Is it Bertha? Is it Cedric? Is it Diane? - but that takes a long time. You could instead ask; Do they have blue eyes? Are they male? Etcetera. Which reduces the possibilities a lot faster.
@himalayan0065 жыл бұрын
Very informative, really enjoyed.
@jeffreyqiu51283 жыл бұрын
absolute fire
@PrateekMehtaABDFAN2 жыл бұрын
Great video sir . Huge fan of your content . Quick question : Rather than Multiple bloom filter wouldn't it be better to implement count min sketch or do we have an advantage of using multiple bloom filters over count min sketch ?
@krishnapatni5 жыл бұрын
very informative and you made it simple to understand. can the bloom filter indicate that the string being searched is a substring of another string that was searched before ? for e.g. if matthew was searched before mat, then m a and t bits have already been set and this shows mat is a substring for some another string already searched..
@satviknema86297 ай бұрын
having some trouble understanding this: Probability of finding that a bit is set after 'i' iterations of 'k' hash functions is 1 - (1-1/m)^(k*i) Why is this value not simply (1/m)^(k*i)
@gudduagrawal852310 ай бұрын
when i heard 1/1 = 0😶🌫
@ruhinapatel65305 жыл бұрын
My quest is that basically from what I understood..bloom filter tells u whether the string is searched or not..n if yes then it stores it in the cache..but isn't there a lot of overhead for the string search..and performance issues..
@lakshyaagarwal40444 жыл бұрын
The video was great but if you had used Blue instead of Red it would have been better.
@Kavishkhullar11 ай бұрын
@gaurav (1- 1/m)^(k*i) is the probability of us not not setting a bit after running i string through k hash functions. subtracting that from 1 gives us the probability of us setting the bit to 1. Could you explain "the bymistake" part? how it that the probability of us by mistakly setting that bit?
@vivekmit3 жыл бұрын
I am not sure about your derived probability formula for error rate in bloom filter but not sure whether the ordering of bloom hashing bit set.... like hashing 1,2 ,3 for any key is (1,1,3 ; 3,1,1; 3 3,1; 333, 111.....any combination of 1,3) all falls under same consideration and treated as key already being searched...I don't think in practice it is a justified in any system design component.
@parvathysuresh28214 жыл бұрын
ur expressions are cute and ur explanations also good bro...
@siarez6 жыл бұрын
Thanks, that's a great explanation! What does 'n' mean @23:55? Is that the size of our vocabulary? Why do we need it here?
@gkcs6 жыл бұрын
Thanks! N is the number of words we will be searching in the bloom filter. 😁
@7301G9 ай бұрын
@11:42 I have 1 question if the string is "anas" and the other string is "sana" so sana name string is new but when we make it go through hash function we will get all its bits set to 1 but its never have been searched ?
@mattb99166 жыл бұрын
Excellent!
@gkcs6 жыл бұрын
Thank you!
@aravindm96876 жыл бұрын
great explanation! @13:10 how about incrementing the array index value from 1 to 2 to later find out whether the string is matched twice ? Instead of handling multiple arrays
@gkcs6 жыл бұрын
aravind kumar That's a good idea and while work most cases. However, the second filter could use a different hash function, making collisions even less likely.
@prarabdhgupta11174 жыл бұрын
@Gaurav Sen, That link for Cache in the Description is hitting to a dead end, If you can update it, it'll be helpful.
@gkcs4 жыл бұрын
Done. It's the wikipedia page on caching.
@rembautimes88083 жыл бұрын
Interesting idea, and perhaps can be paired with BitSet in Python
@vijayantpawar3137 Жыл бұрын
What is the need of using multiple hash functions?
@bite076_tania37 ай бұрын
to avoid probability of collision (two strings having same hash value)
@snehakavinkar22404 жыл бұрын
What if we have to check for millions of such words, do we just need to increase the size of bit array?
@gkcs4 жыл бұрын
To reduce false positives, yes.
@pratyushranjan2775 жыл бұрын
Hi gaurav bhai..i am a developer but i want to make my programming skills better...is there any algo book or programming book you would suggest..or any other way to improve my logic and programming?? Thanks
@shuvopaul30896 жыл бұрын
Expression😘😘
@gkcs6 жыл бұрын
😛
@shuvopaul30896 жыл бұрын
Sir ...you communicate really well...which makes it more interesting
@GelinLuo5 жыл бұрын
Curious to ask when do you call the `reset()` function to reset your bloom filter that determines whether a string has been searched or not in facebook
@gkcs5 жыл бұрын
A threshold on false positives is a good idea. Over time, the bits in a bloom filter are all set. Here is a thread discussing some resets: github.com/ben-manes/caffeine/issues/85
@sixfacesenthil5 жыл бұрын
when k=3 ,17%,k=4 ,1.18.how error rate is 5.4% . k=20 ,5.4 %
@anshashakkeer92775 жыл бұрын
sir Can u plz explain how to find hash of a string .. i didnt get thru dis lecture i cant clear
@gkcs5 жыл бұрын
Read up on it.
@anshashakkeer92775 жыл бұрын
@@gkcs how u get 126
@youtuberfanz7887 Жыл бұрын
Cat this guy become a teacher? It really suits him if this is what he wants.
@awadheshsingh72996 жыл бұрын
1/1 is 1 and not 0, made a blunder there lol
@gkcs6 жыл бұрын
I did? Damn 🙈
@Amol1754 жыл бұрын
Derived it with respect to what?🤔
@reactdeveloper2368 Жыл бұрын
If anyone has worked with mongoose implemeting bloom filters for full text search it will be nice! Thank you in advance
@kumarc4853 Жыл бұрын
bhakt jano, bloom filter baad mei seekh leha, , Lekin pehle check karlo ki Cache mei kahi *AMOEBA* toh nahi hai?!