Advanced Data Structures: Aho-Corasick Dictionary Links

  Рет қаралды 27,230

Niema Moshiri

Niema Moshiri

Күн бұрын

Пікірлер: 24
@Zashxq
@Zashxq 2 жыл бұрын
i expressed my gratitude in your other video, but i just used your part 2 to successfully complete my first aho-corasick!!! i was using so many resources but again, your run-through helped it all click. thank you so much!!! :)
@niemasd
@niemasd 2 жыл бұрын
I'm so glad to hear it! Congrats!
@theodoros_1234
@theodoros_1234 2 жыл бұрын
Your videos are incredibly helpful, thank you!
@benish0r
@benish0r 2 жыл бұрын
Amazing explanation. Thank you!
@kabirnagrecha3745
@kabirnagrecha3745 4 жыл бұрын
I don't remember the lecture we went over failure links, I didn't see it in any of the MWT videos or Stepik? Edit: never mind, the videos were just displayed out of order for me. I got this video before the Aho-Corasick automaton video.
@niemasd
@niemasd 4 жыл бұрын
Sorry for the confusion! Glad you figured it out :-)
@VahidOnTheMove
@VahidOnTheMove 4 жыл бұрын
@@niemasd Hi Nima. Thanks for the videos. They are really helpful.
@richardharris9708
@richardharris9708 3 жыл бұрын
Thanks for the videos. I watched the previous one but KZbin worked very hard not to suggest this one afterwards, I had to hunt for it!
@ttaisinha1
@ttaisinha1 4 жыл бұрын
This helped me a lot, thanks!
@amitavamozumder73
@amitavamozumder73 3 жыл бұрын
understanding this is easy, coding this is a nightmare.
@mskiptr
@mskiptr 3 жыл бұрын
Maybe a video on the suffix automaton?
@pianopat6924
@pianopat6924 4 жыл бұрын
So, just trying to word this: Creating Dictionary Link = if the failure link of a given current node or along succeeding failure links - connects to a word node create a dictionary link between the current node and the word node ?
@niemasd
@niemasd 4 жыл бұрын
Precisely! Given a current node u, follow failure links from u to the root, and the first time you encounter a word node v along that failure link path, create a dictionary link from u to v. If you never encounter a word node from the failure link path from u to the root, u does not have any dictionary links
@riteshsaha2207
@riteshsaha2207 3 жыл бұрын
Is it possible that the dictionary link of a node also has a dictionary link?
@niemasd
@niemasd 3 жыл бұрын
Absolutely! You would keep following dictionary links until there are no more to follow. For example, consider the Aho-Corasick Automaton constructed from the following words: A, AA, AAA, AAAA, etc. There will be a dictionary link from AAAA to AAA, one from AAA to AA, and one from AA to A
@mskiptr
@mskiptr 3 жыл бұрын
Question: Could we have several outgoing dictionary links? I.e. should we continue hopping over failure links until we get to the root - even if we already found one dictionary link - or is there some reason not to?
@niemasd
@niemasd 3 жыл бұрын
If I'm not mistaken, in practice, you generally only have at most 1 dictionary link leaving any node, but when you want to traverse dictionary links to output visited nodes, you will keep following dictionary links as far as you can
@alonlevylll
@alonlevylll 2 жыл бұрын
Very good explanation! However, I'm facing an issue with algorithm edge case I would love to share with you: Keywords: ["gcaz", "cab"] Input string: "gcab" Seems like the "cab" keyword won't be found. Algorithm will search all trie from "g"->"c"->"a"->"z" fallback to root, and won't match with the "cab" keyword. Any ideas about this?
@niemasd
@niemasd 2 жыл бұрын
It will not fall back to the root: I think you may have forgotten the failure links from "gc" to "c" and from "gca" to "ca" (the latter of which is the important one in the example you provided). Let's work through it: If you have just two words "gcaz" and "cab", the Multiway Trie of the Aho-Corasick Automaton will have exactly these 2 paths, where ROOT is the root, () denotes a non-word node, and (*) denotes a word node: ROOT--g-->()--c-->()--a-->()--z-->(*) ROOT--c-->()--a-->()--b-->(*) You have the following 2 failure links: Failure link starting at ROOT--g-->()--c-->() and pointing to ROOT--c-->() Failure link starting at ROOT--g-->()--c-->()--a-->() and pointing to ROOT--c-->()--a-->() There are no dictionary links. If the input string is "gcab", you would do the following: 1. Start at ROOT, and start at index 0 of "gcab" ("g") 2. ROOT does have a child labeled by "g", so move to ROOT--g-->() and move to index 1 of "gcab" ("c") 3. ROOT--g-->() does have a child labeled by "c", so move to ROOT--g-->()--c-->() and move to index 2 of "gcab" ("a") 4. ROOT--g-->()--c--() does have a child labeled by "a", so move to ROOT--g-->()--c-->()--a-->() and move to index 3 of "gcab" ("b") 5. ROOT--g-->()--c-->()--a-->() does NOT have a child labeled by "b", so follow its failure link to ROOT--c-->()--a-->() 6. ROOT--c-->()--a-->() does have a child labeled by "b", so move to ROOT--c-->()--a-->()--b-->(*). This is a word node, so output the corresponding word ("cab")
@alonlevylll
@alonlevylll 2 жыл бұрын
@@niemasd Thanks for the detailed reply. You are right, this is works, my bad. Thank you very much!
@utkarshkathuria2931
@utkarshkathuria2931 3 жыл бұрын
At 0:44 You say, you find GC. Isn't it wrong to say because it's also substring and if you find GC then you also find C and A?
@niemasd
@niemasd 3 жыл бұрын
I'm not quite sure I understand your question; can you clarify? GC is indeed a substring of GCA, but it's a completely separate word in my database and needs to be counted separately. With only failure links (i.e., without dictionary links), when I find GC, I do *not* find C because I have no local knowledge that C is also a word, and when I find GCA, I do *not* find A because I have no local knowledge that A is also a word. That is precisely the motivation of dictionary links: they allow us to find instances in which, while traversing a valid (i.e., non-failing) path, there exist words that are substrings of the current path
@abhaypatil2000
@abhaypatil2000 4 жыл бұрын
That was sooo fast
Advanced Data Structures: Aho-Corasick Automaton
9:55
Niema Moshiri
Рет қаралды 62 М.
Как Я Брата ОБМАНУЛ (смешное видео, прикол, юмор, поржать)
00:59
BAYGUYSTAN | 1 СЕРИЯ | bayGUYS
37:51
bayGUYS
Рет қаралды 581 М.
Ctrl+F on steroids - Aho-Corasick Algorithm (pt. 1)
13:54
ComputerBread
Рет қаралды 984
Advanced Data Structures: Suffix Array Search
12:33
Niema Moshiri
Рет қаралды 16 М.
16. Strings
1:24:30
MIT OpenCourseWare
Рет қаралды 126 М.
8 Data Structures Every Programmer Should Know
17:09
ForrestKnight
Рет қаралды 198 М.
Suffix Tries
7:17
Lalitha Natraj
Рет қаралды 128 М.
АиСД S03E11. Алгоритм Ахо-Корасик
1:20:36
Pavel Mavrin
Рет қаралды 3,2 М.
Advanced Data Structures: Suffix Arrays
6:24
Niema Moshiri
Рет қаралды 27 М.
Aho-Corasick Algorithm
10:30
Jaisuraj Bantupalli
Рет қаралды 6 М.
Aho - Corasick Algorithm
16:47
Epsilon
Рет қаралды 7 М.
Understanding B-Trees: The Data Structure Behind Modern Databases
12:39
Как Я Брата ОБМАНУЛ (смешное видео, прикол, юмор, поржать)
00:59