Locality Sensitive Hashing (LSH) for Search with Shingling + MinHashing (Python)

Views: 29,814

James Briggs

Comments: 30
@jamesbriggs (3 years ago)
There is an error in the code at 15:24: 'signature.append(idx)' should be replaced with 'signature.append(i)'. Full code example here: gist.github.com/jamescalam/a9d5708ab84aaf92055f8a08e906efba
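For readers following the correction, here is a minimal sketch of the signature step it refers to. It is reconstructed from the comments in this thread, not copied from the video or the gist: only `create_hash_func`, `signature`, `idx` and `i` appear in the comments, while `create_signature`, the `rng` argument and the sample vector are assumptions made for illustration.

```python
from random import Random


def create_hash_func(size, rng):
    """Return a random permutation of 1..size (one MinHash 'function')."""
    perm = list(range(1, size + 1))
    rng.shuffle(perm)
    return perm


def create_signature(vector, minhash_funcs):
    """Build a MinHash signature for a one-hot vector.

    Hypothetical helper: assumes the vector contains at least one 1.
    """
    signature = []
    for func in minhash_funcs:
        # Walk the ranks 1, 2, 3, ... and stop at the first rank whose
        # position in the one-hot vector holds a 1.
        for i in range(1, len(func) + 1):
            idx = func.index(i)      # position that this permutation ranks i-th
            if vector[idx] == 1:
                signature.append(i)  # store the rank i, not the position idx
                break
    return signature


rng = Random(0)                      # fixed seed so the run is repeatable
vocab_size = 8                       # assumed toy vocabulary size
funcs = [create_hash_func(vocab_size, rng) for _ in range(4)]
vector = [0, 1, 0, 0, 1, 0, 1, 0]    # assumed one-hot encoding of one text
sig = create_signature(vector, funcs)
```

Each entry of `sig` is the smallest rank that a permutation assigns to any position holding a 1, which is the usual definition of a MinHash value.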
@MrLazini (11 months ago)
This video is excellent in so many ways. Thanks James!
@az2252 (1 month ago)
Easy to follow video tutorial. Awesome work
@LamNguyen-hw9lq (5 months ago)
You explained much better than my professor!
@skobanemusic5752 (1 year ago)
Thank you James for your thorough explanations in all your videos.
@Han-ve8uh (1 year ago)
1. At 15:10, create_hash_func takes size as an input but never uses it? 2. What is the hash function used at 21:47? It looks like nothing was hashed and the segmented signatures were compared directly. That doesn't match the visuals at 18:33, which show 3 hash functions shaded blue, red, and green.
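On the second question, one way to see how the two views relate: if each band is hashed into a bucket keyed by its exact contents, two signatures land in the same bucket exactly when that band is identical, so collision-free hashing and direct comparison yield the same candidate pairs. The sketch below illustrates this with Python dict keys acting as the bucket hash. It is not the video's code; `split_bands`, `candidate_pairs` and the toy signatures are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def split_bands(signature, b):
    """Split a signature into b equal-length bands (as hashable tuples)."""
    if len(signature) % b != 0:
        raise ValueError("signature length must be divisible by b")
    r = len(signature) // b
    return [tuple(signature[i:i + r]) for i in range(0, len(signature), r)]


def candidate_pairs(signatures, b):
    """Return pairs of ids that share at least one identical band."""
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        for band_idx, band in enumerate(split_bands(sig, b)):
            # The dict hashes the (band index, band contents) key for us;
            # including band_idx keeps different bands in separate tables.
            buckets[(band_idx, band)].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Assumed toy signatures: "a" and "b" agree on the first band only.
signatures = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [1, 2, 9, 9, 5, 7],
    "c": [8, 8, 8, 8, 8, 8],
}
pairs = candidate_pairs(signatures, b=3)
```

Bucketing avoids comparing every signature against every other one, which is the practical reason to hash bands rather than compare them pairwise.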
@Munk-tt6tz (6 months ago)
Best explanations as always, thank you!
@anujlahoty8022 (10 months ago)
Thanks a lot for this amazing stuff!
@kejdilleshi134 (2 years ago)
Hello James, I am implementing LSH but there is a problem in the "signature info" part. On my computer the signature similarities between a, b and b, c are completely random, so Jaccard(a_sig, b_sig) has no connection with Jaccard(a, b), and the same goes for b, c. In my opinion this means that the signature does not represent the sentence correctly. I tried increasing the number of MinHash functions, but nothing changed. Best, Kejdi.
@jamesbriggs (2 years ago)
What happens if you try with a and b being the same sentence? Have you also tried increasing or decreasing the shingle size?
@charlesc2064 (1 year ago)
Just curious, what's the reason for using a list object instead of a set in the "shingle" function at 6:52? Thanks!
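The thread does not record the author's reason for using a list. For comparison, here is a minimal sketch of a set-based version, assuming character-level shingles of width k. The `shingle` name comes from the comment; the body, the `jaccard` helper and the sample sentences are assumptions.

```python
def shingle(text, k):
    """Return the set of all k-character shingles in text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(x, y):
    """Jaccard similarity of two non-empty sets."""
    return len(x & y) / len(x | y)


# Assumed sample sentences, for illustration only.
a = shingle("flying fish flew by the space station", 2)
b = shingle("flying fish flew by the station", 2)
similarity = jaccard(a, b)
```

A set drops repeated shingles automatically, which is what Jaccard similarity expects. A list that is converted with `set(...)` before comparison gives the same result.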
@tiago.engenheiro (1 year ago)
Why don't you just loop over the values within func in the second loop? What do you gain by looping over (1, len(vocab)+1) to find the index?
@mihaelacostea5783 (8 months ago)
Does this work for semantic similarity? Meaning texts that say the same thing but with different words?
@MarsXion (1 year ago)
Very helpful, Thank you!
@imlazy007 (2 years ago)
Hello @James Briggs: I was curious whether it makes sense to use MinHash LSH instead of more proven solutions such as Solr / Elasticsearch for searching for identical or similar text records. Do you happen to know of any pros and cons of using the LSH approach instead of Solr? Love your channel, and I really appreciate all the hard work.
@vaibhavkirtankar5336 (2 years ago)
Amazing explanations. Thanks a lot.
@AymenSekhri-gw8wh (1 year ago)
It was really helpful, thank you so much.
@HungTran-fp3ij (6 months ago)
Hello @jamesbriggs. Thank you for your tutorial.
@loganfoster8681 (7 months ago)
I appreciate you making this, but I really wish you had explained how it works in more depth and focused less on building a script. The reliance on references to functions makes this useful for people who want to build this exact script or are already very familiar with those functions in Python, but it makes it essentially useless for building an understanding of how LSH works or for learning how to write a custom program using LSH.
@heetaelee7873 (2 years ago)
May I ask a question? At 17:53, why is the Jaccard similarity between a_sig and b_sig (or c_sig and b_sig) lower than the Jaccard similarity between the original a and b? What does that mean?
@EmadGohari (2 years ago)
I think this is actually not correct, see this kzbin.info/www/bejne/mIKkioxufrN1rsk
@ddwatcher (1 month ago)
What the MinHash function does is randomly pick some of the indices of the one-hot vector that are one. The one-hot vectors were just a representation of the different n-grams (substrings) in the original string. So, compared to the original string, which had all the information, we now only have some of its substrings, and intuitively the similarity should decrease. That's what I think.
@RezaJafari-hs7dq (1 year ago)
I think you are doing the one-hot encoding the wrong way. Could you explain more? Thanks.
@maryamaziz3841 (3 years ago)
Great work 💯
@jamesbriggs (3 years ago)
thanks Maryam!
@EmadGohari (2 years ago)
Hey James, thanks for the great explanation. I think your point at 17:42 (cell 19 in the code) is not actually correct. Please check kzbin.info/www/bejne/mIKkioxufrN1rsk. The expected fraction of matching elements in the signatures of A and B (expected number of matching elements divided by the signature length) equals the Jaccard similarity of A and B.
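The property stated in this comment can be checked empirically. The sketch below is an illustration under stated assumptions, not the video's notebook: it uses true random permutations of a small universe, and the names `minhash_signature` and `positional_agreement` as well as the sets A and B are made up for the demo.

```python
from random import Random


def minhash_signature(items, perms):
    """One MinHash value per permutation: the smallest rank among the items."""
    return [min(perm[x] for x in items) for perm in perms]


def positional_agreement(sig_a, sig_b):
    """Fraction of signature positions where the two signatures match."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


rng = Random(42)                 # fixed seed so the run is repeatable
universe = 100                   # assumed vocabulary size
perms = []
for _ in range(2000):            # 2000 independent random permutations
    perm = list(range(universe))
    rng.shuffle(perm)            # perm[x] is the rank given to element x
    perms.append(perm)

A = set(range(0, 60))            # assumed toy sets with 30 shared elements
B = set(range(30, 90))
true_jaccard = len(A & B) / len(A | B)
estimate = positional_agreement(
    minhash_signature(A, perms),
    minhash_signature(B, perms),
)
```

With many permutations `estimate` should sit close to `true_jaccard`. The comparison is position by position. Treating the two signatures as plain sets of values and taking their Jaccard similarity is a different quantity and is not expected to match.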
@PriyanshuSingh-hm4tn (1 year ago)
Great.
@IillyMacdovers-cc6ob (1 year ago)
Idiological