05:00 how to choose hash table size m 07:40 grow table 11:00 table doubling 14:30 amortization 21:32 shrink table 28:48 string matching example 37:30 rolling-hash ADT 40:32 Karp-Rabin string matching algorithm 48:05 build rolling-hash ADT with division hashing method Note: table doubling and karp-rabin actually seems as two unrelated topics, karp-rabin starts at 28:48
@ShafaetAshraf8 жыл бұрын
if you are wondering why the professor wrote m>n, it's a mistake and he fixed it at 11:46. It should be n>m.
@arjanbal39727 жыл бұрын
Just left hackerrank to learn some theory, and guess who I found here!
@ShafaetAshraf7 жыл бұрын
:)
@hangchen61317 жыл бұрын
I stopped at 11:29 and scroll down to check my thought(can’t bare that m>n anymore..), and I see your post. :)
@meghna19157 жыл бұрын
same what I did immediately after pausing it :)
@rhac796 жыл бұрын
I wonder why none of the students corrected him
@taraskuzyk89854 жыл бұрын
I like how he started calculating how much rent he pays per second around 17:50, and said "one penny... that's actually pretty close, only off by a factor of two". He calculated immediately how many seconds there are in a month and divided with it 1500, and got 0.005 all within a second. And let's just disregard that it's off by a factor of 20, it's still impressive
@firehawk12933 жыл бұрын
I assume he started with the $50 a day. Then maybe he already knew there are about 100000 seconds in a day.
@free-palestine0004 жыл бұрын
thank you erik and whoever decided to share this series. 5 years later and I'm learning from this.
@IAmNigHtMaReTR3 жыл бұрын
Shouldn't you boycott Israel? Rabin is an Israeli professor
@free-palestine0003 жыл бұрын
@@IAmNigHtMaReTR aw man i wish I knew this before my test. wouldn't have learned this material if I knew he was israeli :/ thanks for looking out bud
@IAmNigHtMaReTR3 жыл бұрын
@@free-palestine000 sure thing. Make sure you never use an Intel product as they have developing and manufacturing centers in Israel. Actualy don't use any tech that's been developed based on any of Albert Einstein's work, as he was also a Jew proud of the newly created state of Israel.
@randomnessslayer3 жыл бұрын
@@IAmNigHtMaReTR *** fixed your ambiguous phrasing. "[person] is associated to [thing] by virtue of being born in [a specific geographic region]. Therefore, not with any direct additional caveats or explanation, [blanket negative stereotype], because of [litany of reasons that have more to do with the stereotype than the person being stereotyped]"
@randomnessslayer3 жыл бұрын
@@IAmNigHtMaReTR en.m.wikipedia.org/wiki/Erik_Demaine this is the instructor, a Canadian-American. 🤔
@monoswitachatterjee45174 жыл бұрын
Crystal clear concept visualization! He is indeed a great teacher!
@nackyding6 жыл бұрын
This guy is the BEST prof I've never seen! Ever.
@lockersrandom61614 жыл бұрын
This is the only place where comments are very useful.
@oneextrabit11 жыл бұрын
do not dislike free education
@RahulRahul-pi5fm2 жыл бұрын
In deletion I believe there is an error instead of m = n/2. It should have been when n = m/2 => shrink m to m/2. Same I believe is true for the next line n = m/4 then shrink m => m/2
@websoftwaredeveloperijtiha30932 жыл бұрын
It's so cool that these top schools release courses like this one online free of charge. I may not get a chance to go to MIT
@wesleylim59323 жыл бұрын
22:37 should be if n = m/2 then m=m/2
@cowboymc82303 жыл бұрын
Agreed. Honestly, I am a bit upset when the genius prof at MIT continually makes mistakes like that. (He made a mistake earlier "when m>n ....". And the funny thing is, no one of those smart MIT students pointed out. (I am sure some of them spotted but just kept silent). I am not here by any means showing off or saying that I am better than them, but to be very honest, I found those two mistakes the next second he wrote up there. Such mistakes could be really confusing..... Sigh Overall, it's a good prof and a good lecture ofc.
@SouravKumar-cw3je3 жыл бұрын
I think he did same mistake for second idea also for deletion, It should be n = m/4, else it will not satisfy initial requirement i.e. "m should be atleast n"
@bboczeng10 жыл бұрын
i have to say, this is much better than CS 170 here at Berkeley.
@projectdocx46615 жыл бұрын
@12:26 I don't understand how the complexity for n insertions would be O(1+2+...n). When for each insertion it would take THETA(n+m+m').
@nyc13213210 жыл бұрын
Looks like the rolling hash given here may be wrong: at least according to the wikipedia the base of the encoding should be a large prime number and not |u| as described in the lecture (in u -> u-c*a^(|u|-1 |u| is assumed to be 256 which seems wrong. Needs to be a prime #). Also, on a side note, the lecturer said we needed the rolling hash for searching very long substrings to make the algo O(n) rather than O(n*k) . It appears that the simplest hash of summing up the chars' ASCII values (instead of the Karp Rabin which is very hard to implement) will produce relatively few collisions (statistically speaking).
@obinnaubah90456 жыл бұрын
But what do you do if you do have collisions? Just use chaining?
@frankcastle32885 жыл бұрын
He said p is a prime number bigger than |U|
@zhegemingzigouchang7 жыл бұрын
I think on 32:13 , the python code should be "any (s==t [i: i+len(s)] for i in range(len(t) - len(s) +1))"
@grishound4 жыл бұрын
I agree
@joshliu70763 жыл бұрын
33:35, should be range(len(t) - len(s) + 1)
@aleksagordic95937 жыл бұрын
5:48 he: "pick your favorite constant" me: pi
@christopherjones70236 жыл бұрын
phi; pi works too. What flavor? :p
@xBl00dBrothersX11 жыл бұрын
Erik Demaine - former child prodigy - got his PhD at 20...
@carlosfonseca1437 жыл бұрын
Thank you, i just look up his name on google and he has his own website. In the website there is much more information that will help me with my computer programming career. Thank you so much.
@sportssarvies3077 жыл бұрын
xBl00dBrothersX FROOTNAME
@farhadab5 жыл бұрын
someone should make a movie about him and his dad. according to wikipedia, he was home-schooled by his dad, who only had a high-school diploma. now they're both at MIT.
@deletedaccount25803 жыл бұрын
@@carlosfonseca143 thanks man ,I know from you,thanks for such info
@aiproshar4 жыл бұрын
This table doubling example is totally messed up. And there is typo mistake in deletion part too. It is better understood using load factor alpha. Assume our load factor increases (alpha =n/m) as new items are inserted (n++). So when alpha passes a certain value(here 0.5 or can be 0.75 anything as bound), we double the table size so alpha is reduced to half. Same goes as deletion, we have a lower bound and when alpha falls behind that bound, we decrease m (usually m/2) so alpha is doubled :)
@israeladesanya2623 жыл бұрын
Nah. His explanation is more intuitive, and is basically identical to what you said.
@joebrady982910 жыл бұрын
Great video as always but n and m are terrible variable names, it's hard to distinguish which one he's saying
@dd-explores4 жыл бұрын
I think it should theta(n+m') at 9:51 not theta(n+m+m') bcz we have m slots in the old table and on average we a have n/m elements in each slot, so to go through each element in the old table we need m*(n/m) i.e. theta(n) time and obviously theta(m') for creating new table and initialize each element to nil. Please correct me if I'm wrong.
@shabanashaikh16133 жыл бұрын
It is worst case, so you can have one slot with n collisions and remaining m-1 so in total n + m - 1 + 1(you have to go through the slot with. N collisions)
@isbestlizard4 жыл бұрын
rehashing sounds expensive! i would store the original hash with full precision so don't need to hash it again, just look and use the extra bit :O
@sharinganuser15394 жыл бұрын
To all those who want more understanding on Rabin Karp...go check R9th lecture . Victor explained really well... u'll not forget ur whole life...
@adventurer23952 жыл бұрын
who? where?
@aayushanupam Жыл бұрын
@@adventurer2395 recitation 9. It is there in the playlist
@sushantkumar49173 жыл бұрын
At 22:19 if m is the size of table and n is no of keys, then if keys is half the size of table i.e n = m/2 so we shrik the table, but why professor has written m = n/2 can anyone explain??
@martinspage7 жыл бұрын
@ 33:42 should be range(len(t)-len(s) + 1 ) as far as I can tell.
@sanonp23246 жыл бұрын
martin yep. Or the last len(s) characters in t will never be checked
@mfkj1j17 жыл бұрын
28:47 string matching
@AnimeshPaul233 жыл бұрын
At 14:34 how can O(1 + 2 + 4 + ... + n) = O(n). This is a geometric series and the sum is 2 ^ n - 1
@raytso2 жыл бұрын
the n in the sum 2^n - 1 is not the same n in O(n). By the series equation the sum should be 2^(log n) - 1 since there's only log n "steps". Or rewrite it as O(1 + 2 + 4 + 8 +...+ 2^k) where n = 2^k and you'll understand.
Haha there's a Chinese word(爱/愛) printed on his T-shirt. It means 'love'.
@budhachandrayumkhaibam60795 жыл бұрын
its also written in Gaara's forehead (left)
@RIMJANESSOHMALOOG Жыл бұрын
I'd love to learn from a super geek like him
@innostellarvtp44844 жыл бұрын
At 24:40, do you think it should be m/4 items to increment the size again?
@HarikrishnaChikoti3 жыл бұрын
If you do that insertion Amortized time would not be constant and the function would become costly.
@deepdive15853 жыл бұрын
Love you a lot MIT!!
@davidjames16843 жыл бұрын
11:50 - choose better letters. You chose probably the 2 worst, that both look and sound alike.
@sudhanshudey7583 жыл бұрын
How does the r.skip() work? I didn't get the u-> u-c.a^... why did we do that?
@АйбатАманбайұлы4 жыл бұрын
51:09 what is the point of multiplication u*a? I do not get the algorithm of appending and skipping
@Alejito222ful4 жыл бұрын
me neither
@Victor-yn6jm5 жыл бұрын
Why is probability less than 1/s when the hash of two strings are equal and two strings are not equal?
@isbestlizard4 жыл бұрын
because a good hash function should not give a collision between two unequal objects more frequently than 1/2^length in bits and presumably the hash being used is longer than lg s
@coding99 Жыл бұрын
49:25 for choosing m
@robertekis24503 жыл бұрын
I now see the difference between software engineering and software theory. An engineer knows that multiplies and divides are slow, not instantaneous and avoid them when possible. I suppose the guy here is trying to show general solutions, which usually isn't applicable to engineering. Still interesting since most of my training was in software engineering. Thanks for the videos MIT and Erik.
@randomnessslayer3 жыл бұрын
Division by a power of two is free basically. As simple as an n-bit shift. When it comes to theory, the generalization holds for most any division (a few caveats of course), for implementation, powers of two provide the engineer with adequate mod or division operations with simple binary trickery.
@jackyjackson98865 жыл бұрын
28:48 String matching
@isbestlizard4 жыл бұрын
isn't the original algorithm also O(|t|) because you can skip if s[0] doesn't immediately match t[i] and it won't match more often than 1/|s|? You don't have to compare every character of s in every position, just the first.
@shariquemohd94499 жыл бұрын
Can anybody explain how did THETA(1+2+4+....+n) turned out to be THETA(n)? PS: I had forgotten about the approximation for large n. Got it now :D
@pjthamg8 жыл бұрын
+mohammad sharique The key there is that this is a sum of geometric series: 1+2+4+8+...+n = 2^0+2^1+2^2+2^3+...+2^(log(n)) You can see that this will be a sum of log(n) first items in the series, and if you compute it, you will get 2n-1, which is Theta(n) Hope this helps! :)
@shariquemohd94498 жыл бұрын
+Piotr Jan Yeah! I realized it later. Still thanks for your efforts :) Someone else might benefit from it.
@monffy587 жыл бұрын
thanks!
@akshitagarwal987 жыл бұрын
THANKS BRO..
@amuzak90634 жыл бұрын
Alternatively, using the gp summation formula, you get S=a((r^n) - 1)/(r-1) here a is 1, r is 2 and n is lg n so we get S=2^lgn - 1 which becomes n-1 which when you remove the constant gives a time complexity of n
@rj-nj3uk2 жыл бұрын
I can understand how is theta(1+2+3+...+n) = theta(n^2)? Should not it be theta(n)??
@junzhai17154 жыл бұрын
3:29, should the last part of second hash function formula be >> (w-r)? He wrote >> 2^(r-w) which is not the same as last lecture.
@ashutoshtiwari43984 жыл бұрын
11:55 He start talking about the complexity when m = m' +1 and he writes O(1+2+3...+n). Could anyone explain me how this happen?
@amuzak90634 жыл бұрын
Initially we'll have to move over 1 element, then 2 elements, then 3, and so on
@莊涵盛-n6u3 жыл бұрын
BUT how do we ensure that the hash functions do not go wrong, in comparing the strings?
@slavkochepasov81344 жыл бұрын
note 1) n>>m => grow=rebuild(m'); m>n => shrink = rebuild(m"); he just assumed it is obvious for anyone rebuild(m') == grow(m'). small inaccuracy but very annoying from terminology point.
@Nipunraghav2611 жыл бұрын
Rabin-Karp Algo starts @ 40:45. :) :P
@shabanashaikh16133 жыл бұрын
Can anyone explain what is m=O(n) how bigoh is compared with size of table? And also for all operations to be O(1) we need alpha as 1 but then each key will have separate index so what is the use of this,. We could have done with simple array.
@florianwicher6 жыл бұрын
Isn't that trick at 27:50 sort of "cheating"? If you make your constant big enough you can run almost every problem in what technically is constant time, no?
@israeladesanya2623 жыл бұрын
He wasn't implying that it can be a constant of any size, just "big enough" to make the algorithm work.
@nj93623 жыл бұрын
He mentions a way where inserts and deletes can be done in the worst case O(1) where you keep copying elements on the go. Can anyone share more information about that?
@OOO-wy5iw2 жыл бұрын
You didn't really watch the video
@chengweili95169 жыл бұрын
愛. Hahaha, I love Erik's T-shirts
@satadhi7 жыл бұрын
what does it say
@1988FENGFENG7 жыл бұрын
which equals `love`
@aria59355 жыл бұрын
what does rs.append(c) mean? Is it to add the c to original string s and calculate the hash value of string s+c? or is it to calculate the hash value of original string s and minus the value by the hash value of character c?
@주재빈-m2r2 жыл бұрын
what happens to r in skip operations? Is r
@panmacabre98952 жыл бұрын
1+2+4+8+16+..n is geometric series , then the total would be like 2^n-1? what am I missing here
@WeirdAlSuperFan2 жыл бұрын
N is the final term after exponentiation, meaning the final index/exponent itself would be something that, applied to some number (say, 2) results in n. Call this final exponent "b." Then n = 2^b, and b is logn. In other words the sum index i goes from 1 (or 0, depending on how you write it) to b aka logn. So there are only logn terms. But anyway the sum would then be something like (2^logn) - 1, so n - 1 or O(n)
@tusherity10 жыл бұрын
The ones who doesnt like this video: "He is Prof in MIT". I think that should be enough
@BurgerBurglar89649 жыл бұрын
+tusherity He also got bachelor degree at the age of 14, Ph.D at 20.
@slavkochepasov81344 жыл бұрын
note 2) m'=2m is "golden" and amortization pars are very confused cause he jumped over n>m'>>m assumption (for theoretical run time cost analysis). - with grow() at each +1 step until table reached finale size n. Tgrow = n grow steps * T_one_grow < n * O(n) = O(n^2). - with "golden" ratio 2 (2.7:). Tgrow = lg(n) grow steps * T_one_grow < lg(n) * O(n) = O(n*lg(n)) correct but inaccurate we can do better estimation of upper limit T = 1+2+4+8+..n = 2^i-1 (geometrical sum) where i grow steps = lg(n) => T < O(2^lg(n))= O(n) # Totality separated from that "amortized" over *different* k operations business. It is disregarding portion of runtime O(n) that is spred over big k operations. Where k >= n (final table size). He also goes into deriving T for k inserts only, etc. Hopefully that helps others to catch his derivations :) kzbin.info/www/bejne/eIOyaKCMfqunZpo
@slavkochepasov81344 жыл бұрын
from practical run time our choice of hash function based on "mode m" makes grows_rate =2 golden :)
@rrawat72888 жыл бұрын
can anybody explain me the cost of n insertion at 12.26...? why it is 1+2+3... n
@mohamedahmed-fr7es8 жыл бұрын
if you increase the array by 1 so if we start from the beginning its initial size is 1 after we insert the first element we move" one" element to another array with size of 2 again after we insert the second element we increase the size to 3 and move "two" element to this array again and again until we reach the n element we have done (1+2+3+4 ....+n) rehashing every rehash cost o(1) so the answer will be the sum of this numbers .
@miraires8 жыл бұрын
its actually like 8(1 + 2 + 3 .....) But because of Big O, we can ignore 8
@projectdocx46615 жыл бұрын
@@miraires It won't be 8 it would be THETA(n+m+m'), which you can't ignore. He clearly mentions, for each insertion, it's gonna take linear time.
@projectdocx46615 жыл бұрын
@@mohamedahmed-fr7es The prof clearly mentions, "For each insertion AFTER n > m". So you cannot start with initial size 1. you have to start with initial size m holding in all n elements. And when you do that for each insertion it's gonna take THETA(n+m+m')
@Logan1selva4 жыл бұрын
What is i ? I despirately need to clear this doubt...i apologise if i sound dumb im just a noob.would appreciate an answer.
@feiyijiang91676 жыл бұрын
It's a little frustrating about the table doubling that if " m is bigger than n and less than 4n, the amortized time will be constant time". I guess it should be "m greater than 2n and less than 4n". Besides, in deletion part is it "m = 2n and m = 4m " rather than "m = n/2, m = n/4"? Could somebody answer my questions?
@RahulRahul-pi5fm2 жыл бұрын
I believe what you are saying is correct. I checked lecture notes, over there it is correct.
@hkleo2 жыл бұрын
Nice short
@kevinryzack57126 жыл бұрын
The last two minute explanation for calculating the new hash value from the previous hash value was pretty bad. Hopefully Victor can clear it up in recitation!
@hellaren6 жыл бұрын
If you wonder how new hash calculation works, you could check this video (kzbin.info/www/bejne/p4Kbp4Zol9mmrqs) The author of this video has a perfect explanation of sliding hash calculation. Hope it will help
@JuanVargas-kw4di3 жыл бұрын
are the recitation videos available? what recitation are you talking about/ who is kevin?
@saikirannaragam63945 жыл бұрын
What is the difference between prehash and hash?
@Vflit2 жыл бұрын
愛:Surely he loves algorithm and Computer Science!
@allandogreat3 жыл бұрын
Karp-Rabin is not clear.
@suhasdotcom5 жыл бұрын
Which textbook did he refer to @25:32 ?
@Nickel805 жыл бұрын
As per the MIT course page, the book for the class was Introduction to Algorithms, 3rd edition. Reference: ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/readings/
@obinnaubah90456 жыл бұрын
This is actually confusing at the end.
@alirezaghodsipoor22395 жыл бұрын
I didn't get a single point of K-R algorithm implementation.
@muazzakaria63904 жыл бұрын
@@alirezaghodsipoor2239 imagine you wanna check a sequence of DNA pattern. you have few candidates who have whole long ass pattern,
@cagnusmarlsen40503 жыл бұрын
whats written in his t-shirt
@jianhe3 жыл бұрын
Love. Chinese character.
@cagnusmarlsen40503 жыл бұрын
@@jianhe thanks man! much love!!
@codingmaster0083 жыл бұрын
Gaara fan maybe
@rstark3 жыл бұрын
Love it so damn much!
@suzukikenta10798 жыл бұрын
Can some explain at 31:00 in recitation 9 (kzbin.info/www/bejne/rWfRpoudZad8idE ), why hash and magic are computed hash * base + new mod, and magic * base mod p in append ?
@projectdocx46615 жыл бұрын
Why does he use THETA every where?
@gompro5 жыл бұрын
Because THETA is more accurate term, it means lower/upper bound at the same time. Technically it's different from big O or omega.
@codethings2717 жыл бұрын
m=n+1
@crjacinro4 жыл бұрын
m and n confuses me
@jiejiefu6 жыл бұрын
愛[ài ] in Chinese means love
@codingmaster0083 жыл бұрын
Naruto gaara fan maybe
@lietlee92538 жыл бұрын
愛!
@pjthamg8 жыл бұрын
+Liuliet Lee What does it mean? :)
@lietlee92538 жыл бұрын
+Piotr Jan it means love
@以可-s3p4 жыл бұрын
Love Erik lol.
@twittertalks39343 жыл бұрын
did anyone else noticed youtube AI algorithm is recognising these lectures as chalk
@florianwicher6 жыл бұрын
Does anyone know if there is special notation for amortized running time? I believe I once saw it being denoted as α() but I can't find anything on the web.
@kenichimori85337 жыл бұрын
Karp-Rabin algorithm.
@BeatSyncBytes6 жыл бұрын
wow amazing
@george51203 жыл бұрын
Instructor Erik Demaine wears his hair long and unkempt to distract from the fact that he has a bald spot. His personal appearance would improve if he emphasized looking well-groomed, regardless of whether being well-groomed further exposes the bald spot.
@israeladesanya2623 жыл бұрын
If that's ungroomed, you should see me.
@WeirdAlSuperFan2 жыл бұрын
Bro why you hating on Erik? He looks cool like that and anyway is a total beast. In any case, MIT people don't need to worry about their appearance. Their output speaks for itself. Also he's tall af, so he can get away with whatever and still look cool
@gigiduru12511 жыл бұрын
lol, didn't quite understand it even with the introduction
@obinnaubah90456 жыл бұрын
Which part?
@raymondlion3146 жыл бұрын
I am totally lost in this lecture.
@roylee31968 жыл бұрын
cool stuff.
@asdfghjkl17705 жыл бұрын
2019 April ? Before End Game?
@gregmakov26805 жыл бұрын
hahah, co ke tuc toi qua choi, chem hinh tum lum am thi tum lum, chi zay, khung bo ha, so qua diii thang cho phat xit :D co nhieu len nua thang tuc toi phat xit
@kevinc99874 жыл бұрын
爱
@bioboy4519 Жыл бұрын
nice clothes
@clairelolification4 жыл бұрын
愛 愛 愛 愛不停
@yehcc78087 жыл бұрын
愛
@davidjames16845 жыл бұрын
In the real world, this bound crap is basically useless.
@israeladesanya2623 жыл бұрын
Yeah, fuck optimal implementations of built-in data structures, let's just eyeball things.
@ariassingh4628 ай бұрын
@@israeladesanya262 That's literally how it works in the real world.
@quangtran34036 жыл бұрын
Can anybody explain why it is expected constant time at 45:21 ?
@steveye98946 жыл бұрын
I think Erik means that the "check whether" function takes THETA(|s|) time, which is THETA(1).
@steveye98946 жыл бұрын
Besides, The Amortized time of "check whether" should be THETA(1). Because, it is called only when hash(rs) and hash(rt) equals, but the possibility is sooooo small. The total time will be k1*THETA(|s|) + k2*THETA(1), k1 + k2 == |t| - |s|, and k2 >>>>>> k1.
@WeirdAlSuperFan2 жыл бұрын
23:37 what are k and 2^k?
@K4moo2 жыл бұрын
It's 2^k and 2^k+1, the thresholds for shrink and grow is right adjacent to each other, and in an edge case where we insert / delete repeatly right on these thresholds, we will have to do allocation (linear time) for every operation