How Fuzzy Text Search Works

13,346 views

Big Python


In this tutorial, we explore how to build fuzzy autocomplete in Python: first using libraries like `fuzzywuzzy`, then from scratch using Levenshtein distance and the Wagner-Fischer algorithm. Finally, we look at how to get C/C++-level performance in Python using vectorization and NumPy. For production use, definitely prefer one of the libraries mentioned at the beginning :) If you're interested in text processing, dynamic programming, and making Python programs run fast, I hope you'll find the video useful.
Let me know in the comments what tutorial you'd like to see next!
tkarabela.github.io/bigpython
/ bigpythondev
🔴 SOURCE CODE 👇
github.com/tkarabela/bigpytho...
◼️ TIMESTAMPS 🕑
00:00 Intro
00:41 Example: Fuzzy autocomplete with "rapidfuzz"
01:27 Libraries for fuzzy matching in Python
03:01 String similarity measures
04:07 Hamming distance explained
05:17 Hamming distance, implementation
05:47 Levenshtein distance explained
07:32 Levenshtein distance, implementation
08:26 Wagner-Fischer algorithm explained
13:40 Wagner-Fischer performance in Python
14:47 How to optimize it? Cython/Numba vs. vectorization
15:13 Vectorization of Wagner-Fischer algorithm
17:14 Example: Fuzzy autocomplete with vectorized Wagner-Fischer
17:53 Comparison with C implementation
18:20 Outro
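The similarity measures listed in the timestamps can be sketched in plain Python. This is a minimal illustration, not the code from the video: the function names `hamming`, `levenshtein`, and `autocomplete` are made up for this example, and the Wagner-Fischer table is kept as two rows rather than a full matrix.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions where two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance via Wagner-Fischer, keeping only two table rows."""
    # Row 0: turning "" into b[:j] takes j insertions.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        # Column 0: turning a[:i] into "" takes i deletions.
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # replace, or free match
            ))
        prev = cur
    return prev[-1]


def autocomplete(query: str, words: list[str], limit: int = 3) -> list[str]:
    """Hypothetical helper: return the words closest to the query."""
    return sorted(words, key=lambda w: levenshtein(query, w))[:limit]


print(hamming("karolin", "kathrin"))
print(levenshtein("kitten", "sitting"))
print(autocomplete("pyhton", ["python", "java", "pythonic", "rust"], limit=1))
```

Each call to `levenshtein` costs time proportional to `len(a) * len(b)`, which is why ranking a large word list this way is slow in pure Python.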
◼️ REFERENCES 📚
en.wikipedia.org/wiki/Edit_di...
en.wikipedia.org/wiki/Wagner%...
chairnerd.seatgeek.com/fuzzyw...
maxbachmann.github.io/RapidFuzz/
◼️ TOPICS 🎓
#ProgrammingTutorial #TextSearch #BigPython #Python
◼️ CREDITS 🙏
Icons made by Freepik from www.flaticon.com
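
The vectorization idea mentioned in the description can be sketched with NumPy. This is one possible approach and may differ from the one shown in the video; the name `levenshtein_numpy` is made up for this example. Each table row is computed with array operations instead of an inner Python loop. The insertion case depends on the cell to its left in the same row, so it is resolved with a running minimum (`np.minimum.accumulate`).

```python
import numpy as np


def levenshtein_numpy(a: str, b: str) -> int:
    """Levenshtein distance with each Wagner-Fischer row computed by NumPy."""
    b_codes = np.array([ord(ch) for ch in b], dtype=np.int64)
    offsets = np.arange(len(b) + 1, dtype=np.int64)
    prev = offsets.copy()  # row 0: "" -> b[:j] takes j insertions
    for i, ch in enumerate(a, start=1):
        cur = np.empty_like(prev)
        cur[0] = i  # a[:i] -> "" takes i deletions
        # Deletion (cell above) and replace/match (diagonal), all columns at once.
        cur[1:] = np.minimum(prev[1:] + 1, prev[:-1] + (b_codes != ord(ch)))
        # Insertion (cell to the left): cur[j] = min over k <= j of cur[k] + (j - k),
        # which is a running minimum of (cur - offsets), shifted back by offsets.
        cur = np.minimum.accumulate(cur - offsets) + offsets
        prev = cur
    return int(prev[-1])


print(levenshtein_numpy("kitten", "sitting"))
```

The outer loop over `a` stays in Python, so the gain comes from replacing the inner loop over `b` with array operations.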

Comments: 22
@jsp2518 1 year ago
I found this video right when I needed it, and I have been coming back to it. Would be awesome to have more tutorials, really high quality here!
@BigPython 1 year ago
Glad you found the video useful! :) I'd like to create more videos this year, so please stay tuned.
@user-cv7br5xd6i 2 years ago
Pretty nice!! Good one, and everything is explained well. Thanks very much!
@SnekCato 3 years ago
I cannot wait to be able to do this. I'm still a beginner, but I love how you explain such cool topics.
@BigPython 3 years ago
Thank you! I used to do private tutoring while I studied at university, so I guess I miss explaining this stuff :D Teaching is a good way of learning, nothing quite makes you understand a subject like having to break it down for someone else. Your videos have a nice easy-to-understand presentation and great visuals, I'd love to see your take on some more advanced topics! I'm sure you'll get there in time :)
@robimez 1 year ago
Very interesting, love how you explain things. Subbed.
@RyanSteele82 3 years ago
Thank you for this. Really good explanation!
@BigPython 3 years ago
Thanks, I'm glad it was helpful :)
@mohammadsheikh7564 3 years ago
Came across this right when I needed it, thanks for the video. Would love to see a video on finite state machines as well.
@BigPython 3 years ago
Glad it was useful! I have something in the works on Levenshtein automata and Tries, that video is coming :)
@thanhhuy5277 2 years ago
Nice video. I see it is O(n^2), and I have one question: if the input text is 500 characters long and there are 1e7 items, how long would it take to run?
@user-co7xp2bc3o 1 year ago
Is there an NLP technique or algorithm for comparing the meaning of two sentences? If there is, can you tell me which one and how it works?
@Martin.Eriksson 3 years ago
Great video
@BigPython 3 years ago
Thank you! :)
@lammelmiklos3765 2 years ago
Very nice tutorial! I was wondering: what dev environment do you use here in the video? It seems to be some kind of Jupyter Notebook with some nice customization... How did you customize it?
@BigPython 2 years ago
Thanks! It's Jupyter Notebook with github.com/dunovank/jupyter-themes and some custom CSS styles on top using github.com/openstyles/stylus
@lammelmiklos3765 2 years ago
@@BigPython Thank you very much!
@lammelmiklos3765 2 years ago
@@BigPython I played with the styles a little bit and figured out that you were using the grade3 style. However, my understanding of openstyles is that it has to be applied to the browser itself. Is that correct? Do you mind sharing your style?
@BigPython 2 years ago
@@lammelmiklos3765 Yes, it's applied at the browser level. I've added my current CSS to the git repo, feel free to have a look :) github.com/tkarabela/bigpython/blob/master/jupyter-grade3-bigpython.css
@andreaventurelli2359 2 years ago
You talk about the possibility of changing the weights of the operations (add, delete, replace). How can that be achieved?
@BigPython 2 years ago
In the Wagner-Fischer algorithm, when computing d[i, j] you take the minimum of the neighbors d[i-1, j], d[i, j-1], and d[i-1, j-1], with the cost of the corresponding operation added to each. Usually all the costs (weights) are set to 1, except for d[i-1, j-1] when a[i-1] == b[j-1] (which is a match, so it should not add to the distance).
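
The reply above can be sketched as a Wagner-Fischer variant with configurable costs. This is a minimal illustration under assumptions: the function name `weighted_edit_distance` and the parameter names `insert_cost`, `delete_cost`, and `replace_cost` are made up for this example and do not come from the video or any library.

```python
def weighted_edit_distance(
    a: str,
    b: str,
    insert_cost: int = 1,
    delete_cost: int = 1,
    replace_cost: int = 1,
) -> int:
    """Wagner-Fischer with a separate cost for each edit operation."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    # First column: delete every character of a[:i].
    for i in range(1, rows):
        d[i][0] = d[i - 1][0] + delete_cost
    # First row: insert every character of b[:j].
    for j in range(1, cols):
        d[0][j] = d[0][j - 1] + insert_cost
    for i in range(1, rows):
        for j in range(1, cols):
            # A match on the diagonal is free; a mismatch pays replace_cost.
            diagonal = d[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else replace_cost)
            d[i][j] = min(
                d[i - 1][j] + delete_cost,  # delete a[i-1]
                d[i][j - 1] + insert_cost,  # insert b[j-1]
                diagonal,
            )
    return d[rows - 1][cols - 1]


print(weighted_edit_distance("kitten", "sitting"))
print(weighted_edit_distance("kitten", "sitting", replace_cost=2))
```

With all costs set to 1, this reduces to the ordinary Levenshtein distance. Setting `replace_cost=2` makes a replacement cost the same as a delete followed by an insert.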