Bloom filter for System Design | Bloom filter applications

Bloom filter for System Design | Bloom filter applications | learn bloom filter easily

Рет қаралды 111,348

Күн бұрын

Пікірлер: 149

@vedant9173 3 жыл бұрын

He doesn't just tell you to use bloom filters or what they are useful for, he actually explains it from scratch with such simplicity. He is an epitome of a great teacher.

@abhilashpatel6852 2 жыл бұрын

But when to use it?

@geoyws 4 жыл бұрын

Best explanation on Bloom Filter on KZbin probably

@adamhughes9938 4 жыл бұрын

How have I been programming so long and never used this. Incredibly elegant!

@josephedappully1482 5 жыл бұрын

This is a great explanation, and I love how it's complete with examples/applications. Thanks!

@richardbray 5 ай бұрын

Best explanation I've seen online 👍

@nithinchinni 5 жыл бұрын

Very interesting data structure. The biggest challenge here is to write good hash functions which is not easy. Also in order to reduce collisions, rather than increasing the size of the bloom filter, I would prefer to use multiple bloom fliters and assign some hash functions to one filter and some functions to other filters etc.

@ariellyrycs 4 жыл бұрын

@Tech Dummies. thank you for all your hard work . i have learned a lot from you.

@spock949 4 жыл бұрын

You and gaurav sen are going to make me a lot of money one day. Here’s a 👍

@ThoRicHeLLi 6 жыл бұрын

You rock, man! I'm addicted to your videos.

@hank91918 5 жыл бұрын

outstanding work! I know Bloom Filter now.

@zehahaha2899 4 жыл бұрын

This is one of the best data structures I've ever seen.

@tanugarg2491 Жыл бұрын

Awesome video. The concept is crystal clear now

@StreamerSagaYT 3 жыл бұрын

BRO I LITERALLY FORT THIS TOPIC. thank you youtube.

@VijayChallak Жыл бұрын

Awesome explanation - easy enough for kids to learn - thank you :).

@the_sweet_heaven 3 жыл бұрын

Great video & Great Explanation. I was asked this question in an Interview when i had completely no idea about bloom filter. I doubt if anyone can come up with this idea of storing usernames in an interview of 1 hour.

@harjos78 4 жыл бұрын

Hey Narendra. great stuff!... crisp and clear explanation...

@jatinpatel7862 6 жыл бұрын

Awesome Video & fantastic basic level of understanding on bloom filter. Thank You so much.

@doruwyl 5 жыл бұрын

Exceptional explanation! Very clear and well done because you explained how it works, but most importantly "why / when should someone use the bloom filter?". I think the answer of the question: "Why / When is this usefull?" is missing a lot of videos.

@gelucojocariu9344 4 жыл бұрын

Tnx for help. You just made my exam easier.

@mingr3299 3 жыл бұрын

Thank you. Now I am much clear understanding about the BF! Like to learn more the hash algorithms.

@rahulkala763 4 жыл бұрын

Very interesting concept and explained in very good way. Thank you so much

@johnyepthomi892 2 жыл бұрын

Thank you . Good content. Conscience and to the point.

@vijay.khanna 4 жыл бұрын

Your explanation style is nice.

@ShadJamal 2 жыл бұрын

It was very helpful. Well explained.

@ShaliniNegi24 4 жыл бұрын

Best Video on BloomFilter.

@dataguy7013 4 жыл бұрын

Best explanation of bloom filter

@arindamIT Жыл бұрын

Excellent explanation, sir❤

@weekendresearcher 6 жыл бұрын

A board, a marker and a great mind. Good job.

@PradeepSamuelRocks 3 жыл бұрын

Well explained !! Thanks for the video !!!

@anastasianaumko923 Жыл бұрын

Amazing work! Thank you 🌻

@tapasyayadav5148 6 жыл бұрын

great explanation.I have 2 doubts 1.What if after sometime all the values in bit array are set to 1.Then for all the searches it will be always yes. 2.If we have to remove an entry or word then how to reset values in bloom filter as the same hash value can be for some other word.

@sunilr360 6 жыл бұрын

1. Yes. Hence the need to use a bigger bucket of values, and maybe as fewer hash functions as possible, so as to avoid this scenario. (Doing this could also help prevent collisions and as a result, the number of false positives) 2. That's a pre-requisite of bloom filters. You CANNOT delete values from it once entered.

@gakhov 5 жыл бұрын

For this and other probabilistic data structures and algorithms with the explanations of these "edge" cases, take a look at my recently published book "Probabilistic Data Structures and Algorithms for Big Data Applications".

@SUMITGUPTA153 3 жыл бұрын

@@sunilr360 Limitation in point 2 can be removed with counting bloom filters. Instead of a bit array, you need to keep a byte/int array. (en.wikipedia.org/wiki/Counting_Bloom_filter). While inserting a word, you need to increment values in those array positions by 1 (which are returned by k hash functions) but it takes more space. While inserting a word, While deleting a word, you need to reduce 1 from every position that was incremented when it was entered.

@jubinchheda 3 жыл бұрын

2 can be addressed by counting bloom filters

@tinemabre 4 жыл бұрын

That was a great and amazing explanation, congratulation for this video. It was a very great job.

@大盗江南 4 жыл бұрын

Thank you! very good tutoriel! Plz keep giving us more videos!!!

@abhishek_sengupta 4 жыл бұрын

Very nicely explained!! Thanks.

@ChristianESL 5 жыл бұрын

Reallt nice explanation. thanks. Also awesomr real life examples.

@santhoshmanoharan9829 4 жыл бұрын

Mind Blowing Video !!! Thanks

@tiagoavila7818 Жыл бұрын

Loved the Linkin Park t-shirt! :D

@maxziebell4013 4 жыл бұрын

Awesome. Mind Blown!

@rajeshg3570 3 жыл бұрын

Very nice explanation..

@Atpugtihsrah 6 жыл бұрын

I think we can also use "Trie" Data Structure for the problem talked about in the first few minutes. Keep on inserting the elements in the trie and then if we want to search for 'CAR', we can directly check it in O(lengthOfWord) time. Isn't it? I'm not comparing Trie and BF but just suggesting another Data Structure which can be used. Thanks!

@TechDummiesNarendraL 6 жыл бұрын

Yes, but which one is better O(L) or O(1) ?

@Atpugtihsrah 6 жыл бұрын

@@TechDummiesNarendraL O(1) :D

@didyoustealmyfood8729 Жыл бұрын

@@TechDummiesNarendraLdepends on memory constraints

@vaib5917 6 жыл бұрын

Great explanation! Waiting for Count-min sketch and comparison with BF. Thanks

@TechDummiesNarendraL 6 жыл бұрын

Thanks and sure I will do Count-min sketch Algo.

@shivaakrish Жыл бұрын

can we use trie for the username search scenario ?

@springtest540 6 жыл бұрын

Test Sir.. Very nice explanation.. Please make more videos on system design and on other things as well.

@kingraja11 6 жыл бұрын

Can we use trie data structure?

@abhaysoni8631 3 жыл бұрын

thank you sir, very well and clearly rxplained

@mokogolo 4 жыл бұрын

Great explanation!!

@amolnagotkar3037 2 жыл бұрын

Great information. Thanks a lot

@ryklin1 2 жыл бұрын

I would like to see how you compute the probability of error, storage requirements, and K Hash functions to use. Maybe in a seperate video?

@khalidelgazzar 4 жыл бұрын

Great explanation. Thank you

@chandrashekhar9470 5 жыл бұрын

Thanks for such an insightful explanation. I am really inspired by your video .Could you please suggest ho did you find these topics or syllabus and from where do you get such detailed and precise information.Thanks

@MrSushil430 5 жыл бұрын

awesome video in simple language

@04pradeep 3 жыл бұрын

amazing explanation.

@uchepowers 6 жыл бұрын

You are a hero!!!

@puppy851226 3 жыл бұрын

great explanation!

@karthikeyasankarmuthurajan7754 5 жыл бұрын

Cool Explanation....Next time please Check the Mike Quality as voice is some what not hearable in few places...but thanks alot

@fredschneider7475 4 жыл бұрын

Great video but I have one question: Would it be better if you had one bit vector for each hash function? I don't understand why the values would be co-mingled from the different hash functions when you could have them in separate vectors.

@junlin3264 5 жыл бұрын

Great sharing, very clear

@prashanthtalla 5 жыл бұрын

Really appreciate Narendra. But why a lookup table is not considered efficient for the example provided? Based on cardinality of the column (values), we can either use bitmap or binary index.

@gakhov 5 жыл бұрын

@xyz The main problem of the lookup/hash table is the memory since its classical implementation requires to store real values indexed by, for instance, hash values. But the size of the elements could be quite big, e.g. some object in the database, or hard to produce, e.g. involved disk scan. With bloom filter we don't need to store values at, just to check if they exist or not. Consider using such bloom filter to optimize database check for objects. Before asking DB to check if object physically exists and perform the query, we can first check it in Bloom Filter and only if BF said it "may exist", then we perform the actual query. ant at all until the hash function can provide constant time to generate the value. Take a look at my recently published book "Probabilistic Data Structures and Algorithms for Big Data Applications" (pdsa.gakhov.com) for other data structures and such use case explained.

@prashanthtalla 5 жыл бұрын

@@gakhov Appreciate Andrii for clarifying. Yes, its better to know if a value exists or not before even querying the DB. Go to the DB only if you know the value may exist in the DB.

@peeyar2000 5 жыл бұрын

Thanks . This is a great topic.

@shreyasns1 3 жыл бұрын

Thanks Naren for the video. Your video did not explain How "Hen" understood that there is a collision and added probabilistic numbers? as the algorithm is checking the bits set in the bitarray. Am i missing something here?

@NitishSarin 6 жыл бұрын

So the case where Bloom Filter is used for Malicious URL detection, what happens if the Filter says that the URL is malicious? Does the browser now send a request to google with the particular URL for a confirmation, as the Bloom filter "Yes" would just just be a probability dependant answer? Or does it straight forward says that the URL is malicious?

@TechDummiesNarendraL 6 жыл бұрын

If you check the ratio of malicious vs good URLs browsed by user it will be 200:1. So its OK for browser to ask server for confirmation. Also if you use more hash functions in Bloomfilter then the probability of error decreases to less than 0.1% In that case you can make a parallel call to server if you doesn't want add latency when user visits potentially bad links.

@NitishSarin 6 жыл бұрын

@@TechDummiesNarendraL A parallel call? Well, That's interesting. I can think of two scenarios where we can utilise parallel calls. Please let me know which one were you suggesting. 1) While we are calculating the hash function values for the URL, we parallely make a call to the server, whichever is faster will be used. a) If the server response comes first, we use it. b) If Bloom Filter gives a result first and the result is a "NO", we use it. c) If Bloom Filter gives a result first and the result is a "PROBABLE YES", we wait for server confirmation. The wait time here will be less, if not zero, as the call was already made before starting with hash calculations. 2) In case Bloom Filter detects it as Malicious, we probably let the user visit the site, and meanwhile send a request to server for a confirmation. And now, if the server confirms it as Malicious, do we now show a notification to the user? Because as it was a Parallel call, the user might have already visited the site before a response came from the server.

@TechDummiesNarendraL 6 жыл бұрын

@@NitishSarin I would go for second one, may be don't page load or rendering. Before we are sure about the URL

@mathewsjose1990 8 ай бұрын

Nice explanation

@elmiguel1969 Жыл бұрын

Great video. Thanks!

@fun-with-kartik 4 жыл бұрын

Suggestion: Can you do some location based algorithm questions ? like s2 library algos.

@cyber-security8535 3 жыл бұрын

Can a bloom filter created by inbloom be read using pybloomfiltermmap or pybloomfiltermmap3 @Tech Dummies Narendra L

@nabidulalam6956 4 жыл бұрын

nicely explained .thanks.

@niloysaha3229 5 жыл бұрын

How will bloom filter work in distributed environment? Can we store the bit array in multiple nodes?

@vhiremath4 5 жыл бұрын

1. Shard the bit space across multiple nodes and do finds/queries with potentially O(K) network lookups when checking for existence. 2. Employ consistent hashing over the original entry to make sure only certain elements go to certain bloom filters hosted on each node (with only 1 network lookup when checking for existence). Although this isn't really a "distributed bloom filter" as much as it's employing a bloom filter on a given node that is guaranteed to only get a certain amount of the key space. That being said, you probably don't have a realistic need for doing something like this. Most of the advantages of using bloom filters are fast lookup and local storage without IO hops (no touching network or disk ideally).

@aashishradhanpura5295 6 жыл бұрын

awesome!! It is much helpful..

@rexyl547 5 жыл бұрын

Thank you! Great video, keep it up!

@soumyabiswas6251 5 жыл бұрын

Excellent work, thank you!

@mehranjanfeshan 6 жыл бұрын

great stuff, thanks for the video.

@SriramGopalGoli030792 4 жыл бұрын

How does this work in distributed systems?

@ChandramouliMallampalli99 6 жыл бұрын

great one thanks ! may be you can compare uses cases for count-min sketch vs bloom filters both being probabilistic ds

@gakhov 5 жыл бұрын

Take a look at my recently published book "Probabilistic Data Structures and Algorithms for Big Data Applications" (pdsa.gakhov.com) for the comparison. Simplifying, they solve different problems, the BllomFilter is designed to answer the question "Is element exist or not" (membership problem), while Count-Sketch Min answer the questions "How many time this element has been stored" (the frequency problem)

@sadigovelvin 5 жыл бұрын

Thank you Narendra.

@dhruvmehta7971 4 жыл бұрын

how to insert into hash function and index number generated

@sunilr360 6 жыл бұрын

Around 14:30, you say more hash functions = more accurate. But how? Won't the bit arrays be filled more quickly, leading to more false positives?

@wenlaizhang9017 5 жыл бұрын

Sunil R more problematic result.

@arctix7 5 жыл бұрын

You use multiple bit arrays, one per hash function. And when check for exists, you check the value output for your target string hash value from function in the corresponding bit array. That way you don’t overcrowd your bit array for false positives.

@thePankajsingh90 3 жыл бұрын

Not present probability for 'len' in your example?

@printedphysics6845 8 ай бұрын

Great, thanks 👌

@botambucollins3769 5 жыл бұрын

how did you get 2 . 6,4,10 please explain how i can use cat to generate the figures from the harsh# function

@pinkylover911 2 жыл бұрын

Thanks for great content

@amosec563 5 жыл бұрын

Thanks lot , nice explanation ...

@omprakashsharma9767 4 жыл бұрын

superb bro !!!!

@saurabhahuja6707 4 жыл бұрын

Hello sir, Very good explanation Sir can u start a series of Top 100 data structures interview question and their expalantion.

@saikatbiswas4862 Жыл бұрын

Awesome content !!!1

@pranaypatadiya 5 жыл бұрын

Thank you very much for in detailed informations. Can you please also do for hyperloglog.

@sowjanyav6570 3 жыл бұрын

Can Trie data structure be used to check if username is present? It wont take up lot of space, and lookup will be O(1) right? Of course, it requires all the usernames to be stored in Trie before lookup. What are the drawbacks of this? Pls reply your thoughts on this

@behroozghorbani7039 5 жыл бұрын

Thank you! Great video!

@PrateekMehtaABDFAN 2 жыл бұрын

Superb video sir . Quick question : Is there a solution to handle the size of bit array dynamically ? if our data increases we need to rehash this data or we will be using consistent hashing to avoid this during a rehash ?

@ZionREI_CN 4 жыл бұрын

What if we stored lots of data and the BitArray holds all "1"s?

@karankanchetty105 6 жыл бұрын

Amazing concept and great explanation.! Thank you.

@haribachala 5 жыл бұрын

what about Set data structure and its add method (java) ?

@peijunwu7354 3 жыл бұрын

Do you mean 1 error in 10 million requests or 1 error in 10 million inputs? Will the same input different queries result in same false positive?

@firaseljerdy1244 5 жыл бұрын

comparisons take O(n) or O(m+n)?

@div0007 5 жыл бұрын

Time complexity with BF would be K time O(1) instead of O(K)? Anyhow it will be constant as you mentioned.

@viks599 3 жыл бұрын

why would you read hash entry from DISK ?!!! Generally it will be in Memory ! Collision may be there but if a good hash function is used then it is not a big issue.

@johnmartin5729 5 жыл бұрын

What about the key deletion ? Suppose if we remove the DOG, so 10 needs to remove, and next time if we search for RAT, we might get a wrong answer.

@bhupinderbisht6286 5 жыл бұрын

Bloom filter is used for the use cases when there is no deletion. It's a prerequisite for bloom filter.

@vhiremath4 5 жыл бұрын

If you're ok with taking up more space, you can keep a list of numbers and increment each index by 1 each time an item is added and decrement each index by 1 each time an item is removed. This concept is called reference counting.

@nipeshsingh1235 5 жыл бұрын

A question. Let's say that I have 100 elements. I will have to create a bit array of length 100 for implementing bloom filter. Let's assume that I have 3 hash functions. Now, initially all the values in the bit array are set to zero and as I get a element to verify if its in the database, I pass the new element from 3 hash functions and thus I get 3 distinct values. In best case scenario, I will have all my indexes filled with 1 after checking 33-34 elements. After that whichever element comes, and I pass that element from 3 hash functions, there is a very high probability that the indexes are already set to 1 and thus even if the element is a non-existing one, it gets rejected. I understand that bloom filter is probabilistic, but in the above example, after 33-34 elements all the elements will be rejected as the bloom filter is completely filled. This seems to be very inefficient. Can i get some help?