How do we determine a good value of 'L', i.e. the number of tables? Is there some logic to get it?
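One way to reason about L, as a rough sketch rather than the video's own method: with random-hyperplane hashing, two vectors at angle θ agree on a single hash bit with probability 1 - θ/π. If each table uses k bits, they share a bucket in at least one of L tables with probability 1 - (1 - p^k)^L. The helper names and the `target_recall` parameter below are made up for illustration, and k is assumed to be the number of hyperplanes per table.

```python
import math

def collision_probability(theta, k):
    """Chance that two vectors at angle theta (radians) match on all k bits of one table."""
    p = 1.0 - theta / math.pi  # per-bit agreement for random hyperplanes
    return p ** k

def tables_needed(theta, k, target_recall):
    """Smallest L such that 1 - (1 - p^k)^L >= target_recall (0 < target_recall < 1)."""
    pk = collision_probability(theta, k)
    if pk >= 1.0:
        return 1  # identical directions always collide
    if pk <= 0.0:
        raise ValueError("vectors this far apart never collide")
    return max(1, math.ceil(math.log(1.0 - target_recall) / math.log(1.0 - pk)))
```

More bits per table means fewer false positives per bucket but more tables for the same recall, so in practice L and k are tuned together.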
@mariusm5187 9 years ago
Finally understand LSH
@carles5601 8 years ago
Great explanation. Thanks!
@AnkitSharmaKumar 8 years ago
Great explanation... Thanks for sharing!
@gaaligadu148 7 years ago
I don't understand how we find near-duplicates inside a bucket. You can't compare using all D dimensions, because obviously they will differ, since the points are only near-duplicates. Don't we use subsections of the D dimensions, like you said before, to check for near-duplicates?
@dani0qiu0china 7 years ago
Very good complexity analysis!
@MrBertmsk 8 years ago
It's a little unclear how to eliminate duplicates. Each "table" (bucket?) contains different hash IDs for the data. Should I do comparisons within one bucket or against all buckets? How do I combine the resulting hashes then?
@renzocoppola4664 7 years ago
If you compare against all buckets, then you would be comparing against all points.
@federicomagliani1 7 years ago
Did you figure it out? I repeated the hash process and only concatenated the results.
@gauravmenghani4 7 years ago
You do comparisons within the buckets to remove false positives. You repeat the process with new random hyperplanes to consider points which were false negatives in the previous iteration.
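A minimal sketch of that procedure, assuming cosine similarity as the exact check and plain Python lists as vectors. The function names, the `threshold` value, and the default table and bit counts are illustrative choices, not taken from the video.

```python
import math
import random
from collections import defaultdict

def make_hyperplanes(num_bits, dim, rng):
    """Draw num_bits random hyperplanes (Gaussian normal vectors) in dim dimensions."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def hash_key(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(1 if sum(w * x for w, x in zip(plane, vec)) >= 0 else 0
                 for plane in planes)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def near_duplicates(points, num_tables=5, num_bits=4, threshold=0.95, seed=0):
    rng = random.Random(seed)
    dim = len(points[0])
    candidates = set()
    # Each table uses fresh hyperplanes, which gives pairs that were split
    # in an earlier table (false negatives) another chance to collide.
    for _ in range(num_tables):
        planes = make_hyperplanes(num_bits, dim, rng)
        buckets = defaultdict(list)
        for idx, vec in enumerate(points):
            buckets[hash_key(vec, planes)].append(idx)
        # Only pairs that share a bucket become candidates.
        for members in buckets.values():
            for i in range(len(members)):
                for j in range(i + 1, len(members)):
                    candidates.add((members[i], members[j]))
    # The exact check on the full vectors removes false positives.
    return sorted(pair for pair in candidates
                  if cosine(points[pair[0]], points[pair[1]]) >= threshold)
```

The exact comparison still uses all D dimensions, but only for the few pairs that landed in the same bucket, which is where the savings come from.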
@renzocoppola4664 7 years ago
I suppose you could take advantage of the fact that nearby buckets differ by 1 bit.
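That idea could look roughly like the sketch below: besides the query's own bucket, also look in every bucket whose key differs in exactly one bit. It assumes bucket keys are tuples of 0/1 bits and that a table is a dict from key to a list of items. Both function names are hypothetical.

```python
def neighboring_keys(key):
    """All bucket keys at Hamming distance exactly 1 from key."""
    return [key[:i] + (1 - key[i],) + key[i + 1:] for i in range(len(key))]

def probe(table, key):
    """Collect items from the query's bucket and from its 1-bit neighbors."""
    found = list(table.get(key, []))
    for neighbor in neighboring_keys(key):
        found.extend(table.get(neighbor, []))
    return found
```

With k bits per key this checks k + 1 buckets per table instead of one, which may let you get away with fewer tables.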
@alihusen111 9 years ago
Would you please tell me what you meant when you said we do the same comparison to eliminate d?
@harshgoyal5694 7 years ago
What do you mean by a D-dimensional document? Thanks in advance :)
@RobertoMartin1 7 years ago
D is the size of the dictionary. You represent each document with words from the dictionary, so a D-dimensional document will have at most D unique words. Usually, each document will contain fewer than D unique words, but it is still represented with D dimensions; some of the dimensions just have zero as their value.
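As a small illustration, assuming word counts as the values and whitespace tokenization (the function name and the toy dictionary are made up), a document becomes a length-D vector like this:

```python
from collections import Counter

def document_vector(text, dictionary):
    """Map a document to one count per dictionary word; absent words get 0."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in dictionary]

# Hypothetical 4-word dictionary, so D = 4.
dictionary = ["cat", "dog", "fish", "bird"]
vector = document_vector("Cat chases cat", dictionary)
```

Words outside the dictionary are simply ignored here, and most entries stay zero for a realistic dictionary size.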
@rahulat85 7 years ago
8:39
@gaaligadu148 7 years ago
Hi Harsh, each document can be represented numerically by a D-dimensional vector. For example, it can be an image whose D-dimensional vector would be all of its pixel values.