I find this more intuitive and helpful than the million-view video that was recommended to me. May your channel be blessed by the algorithm.
@core_dump 4 months ago
Thanks!
@bigpopakap a year ago
Fantastic and simple explanation of this algorithm! At first, hyperloglog sounds like magic. But now I can fully understand the core principles that make it work. It's ingenious!
@core_dump a year ago
Thanks Kapil! ❤️ Means a lot.
@dytundesu a month ago
very cool explanation bro
@SauravSahu01 2 years ago
3:46 How do we determine that we need to consider only the first 2 bits, not the first 3? Or does it not matter much for the final mean output?
@core_dump 2 years ago
So there is a tradeoff you have to make: the first n bits you take determine the number of buckets you will have, giving you multiple data points to make the average better, but that leaves fewer bits in the rest of the binary number and makes the count in each bucket less accurate. So you have to run tests and experiments on your own data to decide the balance.
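A minimal sketch of that split, assuming a 32-bit hash and 2 bucket bits (both values are illustrative, not necessarily the video's exact parameters):

```python
import hashlib

HASH_BITS = 32  # illustrative hash width
B = 2           # first B bits pick the bucket -> 2**B = 4 buckets

def bucket_and_rank(item: str):
    """Split a 32-bit hash into a bucket index and a leading-zero count."""
    # Derive a 32-bit integer hash from the item (hash choice is illustrative).
    h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:4], "big")
    bucket = h >> (HASH_BITS - B)             # first B bits pick the bucket
    rest = h & ((1 << (HASH_BITS - B)) - 1)   # remaining 30 bits
    leading_zeros = (HASH_BITS - B) - rest.bit_length()
    return bucket, leading_zeros

print(bucket_and_rank("alice"))
```

Raising B gives more buckets to average over, but leaves fewer bits for the leading-zero count, which is exactly the tradeoff described above.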
@Ali42374 6 months ago
At 03:56, shouldn't the mean be (2+0+1+2)/4 = 1.25? Why did you skip bucket 2, which has a count of 1? That would narrow the answer down to 4 unique entries.
@rahulsbytes 6 months ago
The count of leading zeros in the remaining bits for that bucket is 0.
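As a worked version of the arithmetic in this thread (the 2**mean scaling is an assumption; real HyperLogLog uses a harmonic mean plus a bias-correction constant, which this sketch omits):

```python
# Per-bucket leading-zero counts quoted in the comment above.
counts = [2, 0, 1, 2]

# Arithmetic mean over the 4 buckets, as computed in the comment.
mean = sum(counts) / len(counts)
print(mean)  # 1.25

# A crude LogLog-style estimate scales 2**mean by the bucket count;
# the video's exact estimate formula may differ.
print(len(counts) * 2 ** mean)  # ~9.51
```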
@ph6295 6 months ago
I'd like to know that too.
@jeevan999able a year ago
Superb explanation, thank you
@BernhardBB8 a year ago
I'm not fully getting it: why can I not just count the elements, instead of assigning a random number to each element and analysing that number?
@PravinDahal a year ago
Not memory efficient. To see if a new entry has been seen before, you'd have to store and check against all the entries that have come so far.
@anishkelkar6434 a year ago
I don't think it's only about storing either: when you get a new value in the stream, you have to search the set to see if it already exists, which would be log n for extremely large sets since a hash set wouldn't be feasible. So we are optimizing away that membership check by keeping this approximate data structure instead.
@GameSteals 2 months ago
We cannot simply *count* the elements because of the constraint that we only need to count *unique* elements. How do we ensure that we are not counting the same element twice? To do that, we would need to maintain some map or set, and that is where the problem lies, since storing a billion elements in a set is costly and not feasible. That is where this algorithm comes in, giving an estimate of the number of unique values. If the problem were simply counting the number of occurrences of elements, we would not have needed this; maintaining a simple count would be enough.
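A sketch of that contrast, using Python's built-in set for the exact count and a fixed register array standing in for the HyperLogLog sketch (sizes are illustrative):

```python
import sys

# Exact unique count: the set grows with the number of distinct elements.
exact = set()
for i in range(1_000_000):
    exact.add(f"user-{i}")
print(len(exact), "unique;", sys.getsizeof(exact), "bytes for the set alone")

# HyperLogLog-style sketch: memory is fixed at m small registers,
# no matter how many elements stream past.
m = 1024                   # illustrative bucket count
registers = bytearray(m)   # one byte per bucket is enough for the max count
print(m, "registers;", sys.getsizeof(registers), "bytes, independent of input size")
```

Note that sys.getsizeof reports only the set's own table, not the stored string objects, so the real memory gap is even larger than what this prints.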
@tonyennis1787 a year ago
0:26 At least three of those names are chess grandmasters. 1:40 there's another. Where is Anand, we wonder...