How to Efficiently Check If a User Exists Among Billions?

Рет қаралды 26,180

Күн бұрын

Пікірлер: 90

@preetam.keshari Ай бұрын

Here's the video which explains how to calculate the optimal value of k and m in Bloom Filters. Please watch the video as an extension to the above video. kzbin.info/www/bejne/q6DVlYWCrbRjrbc

@GOZENIEINNOVATIONS Ай бұрын

Helpful

@ashraf_isb 9 күн бұрын

good one, thank you Preetam Keshari sir

@preetam.keshari 9 күн бұрын

Thank you 😇

@phoneix24886 19 күн бұрын

You can use two level indexing (and keep the second level index as a shard key) to make faster searches instead of using boom filters. The only catch is that you got to have a common index for almost all of your data that needs to be joined and picked up.. in that way they should belong to the same shard.

@preetam.keshari 19 күн бұрын

This might cause bottlenecks while scaling up. However you can find some practical use cases of bloom filters here redis.io/docs/latest/develop/data-types/probabilistic/bloom-filter/

@PrasunRoyPlus 14 күн бұрын

Most modern distributed databases address this issue with a Partition Key, which is nothing but a way of sharding the DB into a number of partitions that are rarely hot. Depending on your application find a suitable Partition Key, such as First letter of Last Name, DOB of users or Department of users, then you would only be searching in that small partition not the entire DB.

@preetam.keshari 13 күн бұрын

That’s true. But still it will involve db queries for both true negative and true positive cases. We would like to avoid db queries as much as possible.

@PrasunRoyPlus 13 күн бұрын

@preetam.keshari but we are not completely avoiding DB calls by using Bloom filters, infact with a user base of 1 billion it is more likely to get hits than misses with Bloom filter. This would result in DB calls more often than not. A partition key would also result in a deterministic query performance, unlike the probabilistic nature of Bloom filters.

@preetam.keshari 13 күн бұрын

Totally agree. However you can always choose a reasonable error rate and determine the values of k and m so that number of false positives are reduced. Please watch part 2 of this video on my channel for this. Also redis has mentioned some of the practical use cases of bloom filters in their website. You can check those out as well. 😇

@mahajveemahajvee8941 Ай бұрын

Redis supports bloom filter we can use redis to implement on our backend

@preetam.keshari Ай бұрын

That’s right.

@yasararafath3524 20 күн бұрын

We are using MongoDb Atlas M80 tier, So its so easy to search in this level tier, For bigger Database first you need higher tier version of DB so it can handle any queries, Not only finding user also simple find query also will be slower so API's will be more slower, Ideally if database size is more, then it will be scaled enough to handle any queries, And definitely we should have indexes for them

@preetam.keshari 20 күн бұрын

Thanks for the insights

@pritamsarkar9931 21 күн бұрын

And also if we try to use the partitioning mechanism in the DB to store the usernames, the search cost can be reduced when there is a probabilistic "YES" and direct query is made.

@preetam.keshari 21 күн бұрын

Yes, sharding you mean? It can be an effective strategy for reducing search costs.

@pritamsarkar9931 20 күн бұрын

@preetam.keshari yess sir.

@theunusual4566 26 күн бұрын

Very informative and something new. Thanks for the video.

@preetam.keshari 26 күн бұрын

Thanks, I'm glad it was helpful! 🙏

@theunusual4566 26 күн бұрын

@@preetam.keshari :)

@ashwinnema06 2 ай бұрын

pretty informative video. I am not sure why you do not have much views but I have sybscribed to your channel

@preetam.keshari 2 ай бұрын

Thank you :) It means a lot.

@thegreekgoat98 27 күн бұрын

Bhai maja asigala... Good explanation

@preetam.keshari 27 күн бұрын

Dhanyavad 😇

@IAmUsingAndroid 17 күн бұрын

I think we achieve it using database sharding along with caching.

@preetam.keshari 17 күн бұрын

Without involvement of db queries for both true negative and true positive case, we cannot achieve it using sharding + caching. You can find some of the real life use cases of bloom filters here: redis.io/docs/latest/develop/data-types/probabilistic/bloom-filter/ Hope that helps 😇

@SheetTalkHub 20 күн бұрын

Can you talk about some use-cases where it is being used or it can be used ?

@preetam.keshari 19 күн бұрын

Hi. You can find detailed use cases of bloom filters redis.io/docs/latest/develop/data-types/probabilistic/bloom-filter/

@damarajusushanth5702 15 күн бұрын

Wherr does the bloom filter value comd from

@preetam.keshari 15 күн бұрын

You mean the has values? If yes, those come from the hashing functions If you mean about the bloom filter bits array, we can implement this bits array using redis

@kesavanarayanaanaparthi6428 18 күн бұрын

Nice information 👌, thanks for your effort 👏👏, I have a query 😇. What if the user updates their name and also if the user gets removed from the system?, What is your solution in this scenarios? The condition is to maintain the some consistency across the operations such as add remove or update 🙋🙋.

@preetam.keshari 18 күн бұрын

Hi. 1. Bloom filters will be used to check let’s say an user id (unique identifier) exists in system or not. So changing names doesn’t have any effect on it 2. If user is deleted, we can pass its user id through the hash functions again to retrieve the indices of the filter and mark it as 0 Hope this explanation helps 😇

@ananychitranshi9408 Ай бұрын

Great content bro. Subscribed. Can you please make a video to determine the optimal size of m & k ?

@preetam.keshari Ай бұрын

Hi. Video preparation is under progress. Stay tuned.

@preetam.keshari Ай бұрын

Hi. Here's the video explaining your query kzbin.info/www/bejne/q6DVlYWCrbRjrbc Let me know if you found it helpful :)

@kartik9892 21 күн бұрын

If there are millions of users, I believe all indexes can become 1. It seems there is where size of m is to be considered.

@preetam.keshari 21 күн бұрын

Hi. That’s correct. I have created a separate video on how to determine the optimal size of k and m. Please go and watch it. Here’s the link: Part 2: Maths Behind Bloom Filters | Efficiently Checking an User Among Billions #systemdesign kzbin.info/www/bejne/q6DVlYWCrbRjrbc

@pranayanahak4708 Ай бұрын

Can we use bloom filter to search matching user ids?

@preetam.keshari Ай бұрын

No. It can be used only to detect the existence of a user

@jayantgupta5310 29 күн бұрын

What is the collision factor of it ??

@preetam.keshari 29 күн бұрын

Hi. By collision factor do you mean the rate of false positives (error rate)? If so, please watch the Part 2 of this video on my channel, where I have discussed the mathematics behind it. Hope that helps 😇. Let me know if your query is something else. Thank you 🙏

@mohitgupta7613 Ай бұрын

what will happen if all the elements in bit_array is set to 1 after multiple insertions... is it going to give "user_exists" for all new users?

@preetam.keshari Ай бұрын

Yes. And that’s where you need to determine the optimal size of k and m based on the requirements like number of users in your database, estimated increase in that number etc. I will create a separate video on how to determine the optimum value of k and m soon.

@AdityaRaj-rv5js Ай бұрын

Great Question mohit,I too had this after his explanation. now I have few more queries, where is this bit array came from, who created it and where is it stored for next checks to be made. Now, that's aside what is the size of k and m in general, let say for a billion users of insta and fb, what is the size. Good explanation but left many loopholes. Probably in next video he will clear these doubts.

@preetam.keshari Ай бұрын

@@AdityaRaj-rv5jsHi. We can use redis to implement bloom filter. It supports bloom filter. Now coming to the topic of finding optimal value of k and m, please watch Part 2 of bloom filters which I have released already. Thanks 😊

@dakshbhatnagar 17 күн бұрын

wont indexing combined with binary search help?

@preetam.keshari 17 күн бұрын

That's a great approach, but Bloom filters offer even faster lookup times for large datasets. However you can find some of the real life use cases of bloom filters here: redis.io/docs/latest/develop/data-types/probabilistic/bloom-filter/ Hope that helps 😇

@TouqeerShafi 22 күн бұрын

From my pov its useless, eventhough bloom filter can tell you user exists or not. At the end of the day you still have to query the database

@preetam.keshari 22 күн бұрын

Not Exactly. Bloom filters produce 100% accurate true negative. It means, if the output of bloom filter is "User doesnot exist in the system", then it's 100% accurate. So it saves us from making an extra DB query. Also, redis supports implementing bloom filters. Hope this explanation helps.

@TouqeerShafi 18 күн бұрын

@preetam.keshari redis uses the bloom filter make sense because redis has a job to check if the cache key exsits but if you apply the same bloom filter use case for username exists wont do a job because auth system can be complex specially if you have a billion of user. In my view adding bloomfilter for username or email check is just over engineering.

@whoami5955 Ай бұрын

when one fox howls all others join in, this video is a example of it

@sudheerkumar-tp1mg 25 күн бұрын

If already billions of users present then how to add all the users to Bloom filters, and where we can store this array in a database or in java memory suppose if a pod is crashed we will loose everything and again we will try to add in to bloom filters how google will use please suggest.

@preetam.keshari 24 күн бұрын

Hi. We can implement bloom filters using redis. It supports bloom filters data structures. I will be creating a video on how to implement bloom filters using redis soon. Stay tuned.

@aryabhat_online Ай бұрын

Good job, any tutorial on how to implement.

@preetam.keshari Ай бұрын

Hi. Glad that you liked the video. Redis supports bloom filters. It can be implemented using Redis. I will create a video on it soon. Stay tuned 😇

@chetankomal5016 Ай бұрын

please make a video on how to determine the optimal size of m and k and the maths behind

@preetam.keshari Ай бұрын

Sure

@rapoliit Ай бұрын

I second that. Thanks for the videos bro.

@preetam.keshari Ай бұрын

@@rapoliit Thank you 😇

@preetam.keshari Ай бұрын

Hi. Here's the video explaining your query kzbin.info/www/bejne/q6DVlYWCrbRjrbc Let me know if you found it helpful :)

@ranuchakravarti3277 Ай бұрын

Suppose return index is 0,4,6 Then it means name doesn't exists in db, then how to set index to 1 in bit arry, as 0 index is already 1 in arry????

@preetam.keshari Ай бұрын

If a bit is already set to 1 in the bit array.. you don’t need to set it again. You just need to set the 0 bits to 1.

@jeetchheda8916 Ай бұрын

I thought here you chose the value of m = k * number of usernames to store 💀 It felt correct for this example you used 😂

@preetam.keshari Ай бұрын

No.. it was just random.. there’s a separate algorithm to determine the optimal value of m and k. I will create a separate video on this topic

@RahulkrishnaKurapati 28 күн бұрын

"what is bloom filter how can it be implemented in Laravel for checking username while registering a new user" This prompt gave me more clarity than your video.

@preetam.keshari 28 күн бұрын

Thanks for checking out my video. The goal of this video is to give the BTS working of a bloom filter and not code approach to implement it. It’s practical implementation differs from tech stack to tech stack. I will be creating a video on how to implement it in nodejs with redis. Hope that will help. Thank you 😇

@RahulkrishnaKurapati 28 күн бұрын

@@preetam.keshari Yes I get it, but you are missing something. I follow bytebytego for similar tech stack explanations. I get things in one go from there. Anyway, thanks for allowing me to go a bit deeper about the bloom filter.

@riyansh3396 2 ай бұрын

Why every youtuber is talking about this suddenly?

@preetam.keshari 2 ай бұрын

It’s a quite interesting topic to learn

@sirfinsaan Ай бұрын

Every one want viewers

@xtan-io Ай бұрын

@@preetam.kesharilearn something useful, you don't even get access to a platform with a billion users.

@laxmikantsaraswat6319 Ай бұрын

You watch one video then the youtube algorithm knows what is your area of interest , that's why 😅😂

@SBamniya 24 күн бұрын

Looks like one of the MAANG company asked it in their interview

@deeptansupatro7421 2 ай бұрын

Damn😵😵

@preetam.keshari 2 ай бұрын

😁

@lastbenchers019 Ай бұрын

200th subscriber

@preetam.keshari Ай бұрын

Thank you

@Tyrus429 20 күн бұрын

Not a very convincing solution, only solves one part of the problem, still gives false positives and has to use some other approaches as well to be double sure. By the way, the information u provide was good, thanks

@preetam.keshari 20 күн бұрын

It's true that Bloom filters have limitations, but they can be a powerful tool if you compute the value of k and m smartly. Please watch the part 2 of the above video where I have discussed the mathematics behind calculating k and m. Hope that will help. kzbin.info/www/bejne/q6DVlYWCrbRjrbcsi=P4cn99Ub-zvLpEcA

@codecrash_t132 Ай бұрын

subscribed👍:349

@preetam.keshari Ай бұрын

Thank you 😇

@31shivasilmawala30 26 күн бұрын

Impressive video, @preetam.keshari! Adding some code examples would greatly benefit for newcomers by providing clear guidance. Keep up the great work!.

@preetam.keshari 26 күн бұрын

I am glad you liked the video. I will consider adding code examples in future videos. 😇