There is an error in the code at 15:24: 'signature.append(idx)' should be replaced with 'signature.append(i)'. Full code example here: gist.github.com/jamescalam/a9d5708ab84aaf92055f8a08e906efba
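For readers following along without the gist, here is a minimal sketch of a permutation-based MinHash with the corrected append. It is not the video's exact code: the names `build_permutations` and `minhash_signature` are hypothetical, and it assumes the one-hot vector contains at least one 1.

```python
import random

def build_permutations(vocab_size, n_hashes, seed=0):
    """Create n_hashes random permutations of the values 1..vocab_size."""
    rng = random.Random(seed)
    perms = []
    for _ in range(n_hashes):
        perm = list(range(1, vocab_size + 1))
        rng.shuffle(perm)
        perms.append(perm)
    return perms

def minhash_signature(one_hot, perms):
    """For each permutation, record the smallest permuted value that
    lands on a position where the one-hot vector is 1."""
    signature = []
    for perm in perms:
        for i in range(1, len(perm) + 1):
            idx = perm.index(i)      # position holding the value i
            if one_hot[idx] == 1:
                signature.append(i)  # append the value i, not the position idx
                break
    return signature
```

Each signature entry is then the minimum permuted value over the positions set to 1, so a vector of all zeros would contribute no entries.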
@MrLazini 11 months ago
This video is excellent in so many ways. Thanks James!
@az2252 1 month ago
Easy to follow video tutorial. Awesome work
@LamNguyen-hw9lq 5 months ago
You explained much better than my professor!
@skobanemusic5752 1 year ago
Thank you James for your thorough explanations in all your videos.
@Han-ve8uh 1 year ago
1. At 15:10, create_hash_func takes size as input but never uses it? 2. What is the hash function used at 21:47? It looks like nothing is ever hashed; the segmented signatures are compared directly. That doesn't match the visuals at 18:33, which show 3 hash functions shaded blue, red, and green.
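One common way to do the banding step is sketched below, without claiming it is what the video does at 21:47. Each band of the signature becomes a tuple used as a dictionary key, so Python's built-in hashing plays the role of the per-band hash function. The names `split_bands` and `candidate_pairs` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def split_bands(signature, n_bands):
    """Split a signature into n_bands equal-length tuples."""
    if len(signature) % n_bands != 0:
        raise ValueError("signature length must be divisible by n_bands")
    rows = len(signature) // n_bands
    return [tuple(signature[i:i + rows]) for i in range(0, len(signature), rows)]

def candidate_pairs(signatures, n_bands):
    """Return pairs of document ids that share a bucket in at least one band."""
    # One bucket table per band; the band tuple is the bucket key.
    buckets = [defaultdict(set) for _ in range(n_bands)]
    for doc_id, sig in signatures.items():
        for band_idx, band in enumerate(split_bands(sig, n_bands)):
            buckets[band_idx][band].add(doc_id)
    pairs = set()
    for band_buckets in buckets:
        for ids in band_buckets.values():
            pairs.update(combinations(sorted(ids), 2))
    return pairs
```

Keeping a separate bucket table per band mirrors the picture of one hash function per band: two signatures only collide when the same band matches.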
@Munk-tt6tz 6 months ago
Best explanations as always, thank you!
@anujlahoty8022 10 months ago
Thanks a lot for this amazing stuff!
@kejdilleshi134 2 years ago
Hello James, I am implementing LSH, but there is a problem in the "signature info" part. On my computer, the signature similarities between a, b and b, c are completely random, so Jaccard(a_sig, b_sig) has no connection with Jaccard(a, b), and the same goes for b, c. In my opinion, this means that the signature does not represent the sentence correctly. I tried increasing the number of MinHash functions, but nothing changed. Best, Kejdi.
@jamesbriggs 2 years ago
What happens if you try with a and b being the same sentence? Have you also tried increasing or decreasing the shingle size?
@charlesc2064 1 year ago
Just curious, what's the reason for using a list object instead of a set in the "shingle" function at 6:52? Thanks!
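For comparison, a set-based version might look like the sketch below. This is an illustration rather than the video's code, and `shingle` here is a hypothetical character-level k-shingling helper. Building a list first and converting it to a set at the end would produce the same result.

```python
def shingle(text, k):
    """Return the set of all length-k character shingles in text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}
```

Duplicated shingles collapse automatically, which is what Jaccard similarity needs since it is defined on sets.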
@tiago.engenheiro 1 year ago
Why don't you just loop over the values within func in the second loop? What do you gain by looping over (1, len(vocab)+1) to find the index?
@mihaelacostea5783 8 months ago
Does this work for semantic similarity? Meaning texts that say the same thing but with different words?
@MarsXion 1 year ago
Very helpful, Thank you!
@imlazy007 2 years ago
Hello @James Briggs: I was curious whether it makes sense to use MinHash LSH instead of more proven solutions such as Solr / Elasticsearch for searching for the same or similar text records. Do you happen to know of any pros and cons of using the LSH approach instead of Solr? Love your channel, really appreciate all the hard work.
@vaibhavkirtankar5336 2 years ago
Amazing explanation. Thanks a lot
@AymenSekhri-gw8wh 1 year ago
It was really helpful, thank you so much
@HungTran-fp3ij 6 months ago
Hello @jamesbriggs. thank you for your tutorial.
@loganfoster8681 7 months ago
Appreciate you writing this, but I really wish you had done a better explanation of how it works and focused less on building a script. The reliance on references to function calls makes this useful for people who want to build this exact script or are already very familiar with those functions in Python, but it makes it essentially useless for building an understanding of how LSH works or learning how to make a custom program using LSH.
@heetaelee7873 2 years ago
May I ask a question? At 17:53, why is the Jaccard similarity between a_sig and b_sig (or c_sig and b_sig) lower than the Jaccard similarity between the original a and b, and what does that mean?
@EmadGohari 2 years ago
I think this is actually not correct, see this kzbin.info/www/bejne/mIKkioxufrN1rsk
@ddwatcher 1 month ago
What the min hash function does is randomly pick some indices from the one-hot vector where the value is one. The one-hot vectors were just a representation of the different n-grams (substrings) in the original string. So, compared to the original string, which had all the info, we now only have some of its substrings, and intuitively the similarity should decrease. That's what I think.
@RezaJafari-hs7dq 1 year ago
I think you are doing one-hot encoding the wrong way. Could you explain more? Thanks
@maryamaziz3841 3 years ago
Great work 💯
@jamesbriggs 3 years ago
thanks Maryam!
@EmadGohari 2 years ago
Hey James, thanks for the great explanation. I think your point at 17:42 (cell 19 in the code) is not actually correct. Please check kzbin.info/www/bejne/mIKkioxufrN1rsk The expected fraction of matching elements in the signatures of A and B (expected number of matching elements divided by the signature length) equals the Jaccard similarity of A and B.
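The claim about position-wise matches can be checked empirically with a small sketch like the one below. It assumes sets are represented as sets of vocabulary indices and that MinHash uses random permutations; the helper names, the vocabulary size of 200, and the 1000 permutations are illustrative choices, not values from the video.

```python
import random

def jaccard(a, b):
    """Exact Jaccard similarity of two sets."""
    return len(a & b) / len(a | b)

def minhash(indices, perms):
    """One signature entry per permutation: the smallest permuted value
    among the vocabulary indices present in the set."""
    return [min(perm[j] for j in indices) for perm in perms]

def signature_agreement(sig_a, sig_b):
    """Fraction of positions where two signatures hold the same value."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)

rng = random.Random(42)
vocab_size = 200
perms = []
for _ in range(1000):
    perm = list(range(vocab_size))
    rng.shuffle(perm)
    perms.append(perm)

a = set(range(0, 60))
b = set(range(30, 90))
true_j = jaccard(a, b)  # 30 shared indices out of 90 in the union = 1/3
estimate = signature_agreement(minhash(a, perms), minhash(b, perms))
print(true_j, estimate)
```

The comparison is made position by position, not by taking the Jaccard similarity of the two signatures treated as sets of values, and the estimate tightens as the number of permutations grows.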