Lecture 9: Table Doubling, Karp-Rabin

Рет қаралды 238,820

Күн бұрын

Пікірлер: 167

@sergeykholkhunov1888 3 жыл бұрын

05:00 how to choose hash table size m 07:40 grow table 11:00 table doubling 14:30 amortization 21:32 shrink table 28:48 string matching example 37:30 rolling-hash ADT 40:32 Karp-Rabin string matching algorithm 48:05 build rolling-hash ADT with division hashing method Note: table doubling and karp-rabin actually seems as two unrelated topics, karp-rabin starts at 28:48

@ShafaetAshraf 8 жыл бұрын

if you are wondering why the professor wrote m>n, it's a mistake and he fixed it at 11:46. It should be n>m.

@arjanbal3972 7 жыл бұрын

Just left hackerrank to learn some theory, and guess who I found here!

@ShafaetAshraf 7 жыл бұрын

@hangchen6131 7 жыл бұрын

I stopped at 11:29 and scroll down to check my thought(can’t bare that m>n anymore..), and I see your post. :)

@meghna1915 7 жыл бұрын

same what I did immediately after pausing it :)

@rhac79 6 жыл бұрын

I wonder why none of the students corrected him

@taraskuzyk8985 4 жыл бұрын

I like how he started calculating how much rent he pays per second around 17:50, and said "one penny... that's actually pretty close, only off by a factor of two". He calculated immediately how many seconds there are in a month and divided with it 1500, and got 0.005 all within a second. And let's just disregard that it's off by a factor of 20, it's still impressive

@firehawk1293 3 жыл бұрын

I assume he started with the $50 a day. Then maybe he already knew there are about 100000 seconds in a day.

@free-palestine000 4 жыл бұрын

thank you erik and whoever decided to share this series. 5 years later and I'm learning from this.

@IAmNigHtMaReTR 3 жыл бұрын

Shouldn't you boycott Israel? Rabin is an Israeli professor

@free-palestine000 3 жыл бұрын

@@IAmNigHtMaReTR aw man i wish I knew this before my test. wouldn't have learned this material if I knew he was israeli :/ thanks for looking out bud

@IAmNigHtMaReTR 3 жыл бұрын

@@free-palestine000 sure thing. Make sure you never use an Intel product as they have developing and manufacturing centers in Israel. Actualy don't use any tech that's been developed based on any of Albert Einstein's work, as he was also a Jew proud of the newly created state of Israel.

@randomnessslayer 3 жыл бұрын

@@IAmNigHtMaReTR *** fixed your ambiguous phrasing. "[person] is associated to [thing] by virtue of being born in [a specific geographic region]. Therefore, not with any direct additional caveats or explanation, [blanket negative stereotype], because of [litany of reasons that have more to do with the stereotype than the person being stereotyped]"

@randomnessslayer 3 жыл бұрын

@@IAmNigHtMaReTR en.m.wikipedia.org/wiki/Erik_Demaine this is the instructor, a Canadian-American. 🤔

@monoswitachatterjee4517 4 жыл бұрын

Crystal clear concept visualization! He is indeed a great teacher!

@nackyding 6 жыл бұрын

This guy is the BEST prof I've never seen! Ever.

@lockersrandom6161 4 жыл бұрын

This is the only place where comments are very useful.

@oneextrabit 11 жыл бұрын

do not dislike free education

@RahulRahul-pi5fm 2 жыл бұрын

In deletion I believe there is an error instead of m = n/2. It should have been when n = m/2 => shrink m to m/2. Same I believe is true for the next line n = m/4 then shrink m => m/2

@websoftwaredeveloperijtiha3093 2 жыл бұрын

It's so cool that these top schools release courses like this one online free of charge. I may not get a chance to go to MIT

@wesleylim5932 3 жыл бұрын

22:37 should be if n = m/2 then m=m/2

@cowboymc8230 3 жыл бұрын

Agreed. Honestly, I am a bit upset when the genius prof at MIT continually makes mistakes like that. (He made a mistake earlier "when m>n ....". And the funny thing is, no one of those smart MIT students pointed out. (I am sure some of them spotted but just kept silent). I am not here by any means showing off or saying that I am better than them, but to be very honest, I found those two mistakes the next second he wrote up there. Such mistakes could be really confusing..... Sigh Overall, it's a good prof and a good lecture ofc.

@SouravKumar-cw3je 3 жыл бұрын

I think he did same mistake for second idea also for deletion, It should be n = m/4, else it will not satisfy initial requirement i.e. "m should be atleast n"

@bboczeng 10 жыл бұрын

i have to say, this is much better than CS 170 here at Berkeley.

@projectdocx4661 5 жыл бұрын

@12:26 I don't understand how the complexity for n insertions would be O(1+2+...n). When for each insertion it would take THETA(n+m+m').

@nyc132132 10 жыл бұрын

Looks like the rolling hash given here may be wrong: at least according to the wikipedia the base of the encoding should be a large prime number and not |u| as described in the lecture (in u -> u-c*a^(|u|-1 |u| is assumed to be 256 which seems wrong. Needs to be a prime #). Also, on a side note, the lecturer said we needed the rolling hash for searching very long substrings to make the algo O(n) rather than O(n*k) . It appears that the simplest hash of summing up the chars' ASCII values (instead of the Karp Rabin which is very hard to implement) will produce relatively few collisions (statistically speaking).

@obinnaubah9045 6 жыл бұрын

But what do you do if you do have collisions? Just use chaining?

@frankcastle3288 5 жыл бұрын

He said p is a prime number bigger than |U|

@zhegemingzigouchang 7 жыл бұрын

I think on 32:13 , the python code should be "any (s==t [i: i+len(s)] for i in range(len(t) - len(s) +1))"

@grishound 4 жыл бұрын

I agree

@joshliu7076 3 жыл бұрын

33:35, should be range(len(t) - len(s) + 1)

@aleksagordic9593 7 жыл бұрын

5:48 he: "pick your favorite constant" me: pi

@christopherjones7023 6 жыл бұрын

phi; pi works too. What flavor? :p

@xBl00dBrothersX 11 жыл бұрын

Erik Demaine - former child prodigy - got his PhD at 20...

@carlosfonseca143 7 жыл бұрын

Thank you, i just look up his name on google and he has his own website. In the website there is much more information that will help me with my computer programming career. Thank you so much.

@sportssarvies307 7 жыл бұрын

xBl00dBrothersX FROOTNAME

@farhadab 5 жыл бұрын

someone should make a movie about him and his dad. according to wikipedia, he was home-schooled by his dad, who only had a high-school diploma. now they're both at MIT.

@deletedaccount2580 3 жыл бұрын

@@carlosfonseca143 thanks man ,I know from you,thanks for such info

@aiproshar 4 жыл бұрын

This table doubling example is totally messed up. And there is typo mistake in deletion part too. It is better understood using load factor alpha. Assume our load factor increases (alpha =n/m) as new items are inserted (n++). So when alpha passes a certain value(here 0.5 or can be 0.75 anything as bound), we double the table size so alpha is reduced to half. Same goes as deletion, we have a lower bound and when alpha falls behind that bound, we decrease m (usually m/2) so alpha is doubled :)

@israeladesanya262 3 жыл бұрын

Nah. His explanation is more intuitive, and is basically identical to what you said.

@joebrady9829 10 жыл бұрын

Great video as always but n and m are terrible variable names, it's hard to distinguish which one he's saying

@dd-explores 4 жыл бұрын

I think it should theta(n+m') at 9:51 not theta(n+m+m') bcz we have m slots in the old table and on average we a have n/m elements in each slot, so to go through each element in the old table we need m*(n/m) i.e. theta(n) time and obviously theta(m') for creating new table and initialize each element to nil. Please correct me if I'm wrong.

@shabanashaikh1613 3 жыл бұрын

It is worst case, so you can have one slot with n collisions and remaining m-1 so in total n + m - 1 + 1(you have to go through the slot with. N collisions)

@isbestlizard 4 жыл бұрын

rehashing sounds expensive! i would store the original hash with full precision so don't need to hash it again, just look and use the extra bit :O

@sharinganuser1539 4 жыл бұрын

To all those who want more understanding on Rabin Karp...go check R9th lecture . Victor explained really well... u'll not forget ur whole life...

@adventurer2395 2 жыл бұрын

who? where?

@aayushanupam Жыл бұрын

@@adventurer2395 recitation 9. It is there in the playlist

@sushantkumar4917 3 жыл бұрын

At 22:19 if m is the size of table and n is no of keys, then if keys is half the size of table i.e n = m/2 so we shrik the table, but why professor has written m = n/2 can anyone explain??

@martinspage 7 жыл бұрын

@ 33:42 should be range(len(t)-len(s) + 1 ) as far as I can tell.

@sanonp2324 6 жыл бұрын

martin yep. Or the last len(s) characters in t will never be checked

@mfkj1j1 7 жыл бұрын

28:47 string matching

@AnimeshPaul23 3 жыл бұрын

At 14:34 how can O(1 + 2 + 4 + ... + n) = O(n). This is a geometric series and the sum is 2 ^ n - 1

@raytso 2 жыл бұрын

the n in the sum 2^n - 1 is not the same n in O(n). By the series equation the sum should be 2^(log n) - 1 since there's only log n "steps". Or rewrite it as O(1 + 2 + 4 + 8 +...+ 2^k) where n = 2^k and you'll understand.

@accessory2008 9 жыл бұрын

Letcture notes: ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/lecture-videos/MIT6_006F11_lec09.pdf

@yantizeng6557 6 жыл бұрын

Haha there's a Chinese word(爱/愛) printed on his T-shirt. It means 'love'.

@budhachandrayumkhaibam6079 5 жыл бұрын

its also written in Gaara's forehead (left)

@RIMJANESSOHMALOOG Жыл бұрын

I'd love to learn from a super geek like him

@innostellarvtp4484 4 жыл бұрын

At 24:40, do you think it should be m/4 items to increment the size again?

@HarikrishnaChikoti 3 жыл бұрын

If you do that insertion Amortized time would not be constant and the function would become costly.

@deepdive1585 3 жыл бұрын

Love you a lot MIT!!

@davidjames1684 3 жыл бұрын

11:50 - choose better letters. You chose probably the 2 worst, that both look and sound alike.

@sudhanshudey758 3 жыл бұрын

How does the r.skip() work? I didn't get the u-> u-c.a^... why did we do that?

@АйбатАманбайұлы 4 жыл бұрын

51:09 what is the point of multiplication u*a? I do not get the algorithm of appending and skipping

@Alejito222ful 4 жыл бұрын

me neither

@Victor-yn6jm 5 жыл бұрын

Why is probability less than 1/s when the hash of two strings are equal and two strings are not equal?

@isbestlizard 4 жыл бұрын

because a good hash function should not give a collision between two unequal objects more frequently than 1/2^length in bits and presumably the hash being used is longer than lg s

@coding99 Жыл бұрын

49:25 for choosing m

@robertekis2450 3 жыл бұрын

I now see the difference between software engineering and software theory. An engineer knows that multiplies and divides are slow, not instantaneous and avoid them when possible. I suppose the guy here is trying to show general solutions, which usually isn't applicable to engineering. Still interesting since most of my training was in software engineering. Thanks for the videos MIT and Erik.

@randomnessslayer 3 жыл бұрын

Division by a power of two is free basically. As simple as an n-bit shift. When it comes to theory, the generalization holds for most any division (a few caveats of course), for implementation, powers of two provide the engineer with adequate mod or division operations with simple binary trickery.

@jackyjackson9886 5 жыл бұрын

28:48 String matching

@isbestlizard 4 жыл бұрын

isn't the original algorithm also O(|t|) because you can skip if s[0] doesn't immediately match t[i] and it won't match more often than 1/|s|? You don't have to compare every character of s in every position, just the first.

@shariquemohd9449 9 жыл бұрын

Can anybody explain how did THETA(1+2+4+....+n) turned out to be THETA(n)? PS: I had forgotten about the approximation for large n. Got it now :D

@pjthamg 8 жыл бұрын

+mohammad sharique The key there is that this is a sum of geometric series: 1+2+4+8+...+n = 2^0+2^1+2^2+2^3+...+2^(log(n)) You can see that this will be a sum of log(n) first items in the series, and if you compute it, you will get 2n-1, which is Theta(n) Hope this helps! :)

@shariquemohd9449 8 жыл бұрын

+Piotr Jan Yeah! I realized it later. Still thanks for your efforts :) Someone else might benefit from it.

@monffy58 7 жыл бұрын

thanks!

@akshitagarwal98 7 жыл бұрын

THANKS BRO..

@amuzak9063 4 жыл бұрын

Alternatively, using the gp summation formula, you get S=a((r^n) - 1)/(r-1) here a is 1, r is 2 and n is lg n so we get S=2^lgn - 1 which becomes n-1 which when you remove the constant gives a time complexity of n

@rj-nj3uk 2 жыл бұрын

I can understand how is theta(1+2+3+...+n) = theta(n^2)? Should not it be theta(n)??

@junzhai1715 4 жыл бұрын

3:29, should the last part of second hash function formula be >> (w-r)? He wrote >> 2^(r-w) which is not the same as last lecture.

@ashutoshtiwari4398 4 жыл бұрын

11:55 He start talking about the complexity when m = m' +1 and he writes O(1+2+3...+n). Could anyone explain me how this happen?

@amuzak9063 4 жыл бұрын

Initially we'll have to move over 1 element, then 2 elements, then 3, and so on

@莊涵盛-n6u 3 жыл бұрын

BUT how do we ensure that the hash functions do not go wrong, in comparing the strings?

@slavkochepasov8134 4 жыл бұрын

note 1) n>>m => grow=rebuild(m'); m>n => shrink = rebuild(m"); he just assumed it is obvious for anyone rebuild(m') == grow(m'). small inaccuracy but very annoying from terminology point.

@Nipunraghav26 11 жыл бұрын

Rabin-Karp Algo starts @ 40:45. :) :P

@shabanashaikh1613 3 жыл бұрын

Can anyone explain what is m=O(n) how bigoh is compared with size of table? And also for all operations to be O(1) we need alpha as 1 but then each key will have separate index so what is the use of this,. We could have done with simple array.

@florianwicher 6 жыл бұрын

Isn't that trick at 27:50 sort of "cheating"? If you make your constant big enough you can run almost every problem in what technically is constant time, no?

@israeladesanya262 3 жыл бұрын

He wasn't implying that it can be a constant of any size, just "big enough" to make the algorithm work.

@nj9362 3 жыл бұрын

He mentions a way where inserts and deletes can be done in the worst case O(1) where you keep copying elements on the go. Can anyone share more information about that?

@OOO-wy5iw 2 жыл бұрын

You didn't really watch the video

@chengweili9516 9 жыл бұрын

愛. Hahaha, I love Erik's T-shirts

@satadhi 7 жыл бұрын

what does it say

@1988FENGFENG 7 жыл бұрын

which equals `love`

@aria5935 5 жыл бұрын

what does rs.append(c) mean? Is it to add the c to original string s and calculate the hash value of string s+c? or is it to calculate the hash value of original string s and minus the value by the hash value of character c?

@주재빈-m2r 2 жыл бұрын

what happens to r in skip operations? Is r

@panmacabre9895 2 жыл бұрын

1+2+4+8+16+..n is geometric series , then the total would be like 2^n-1? what am I missing here

@WeirdAlSuperFan 2 жыл бұрын

N is the final term after exponentiation, meaning the final index/exponent itself would be something that, applied to some number (say, 2) results in n. Call this final exponent "b." Then n = 2^b, and b is logn. In other words the sum index i goes from 1 (or 0, depending on how you write it) to b aka logn. So there are only logn terms. But anyway the sum would then be something like (2^logn) - 1, so n - 1 or O(n)

@tusherity 10 жыл бұрын

The ones who doesnt like this video: "He is Prof in MIT". I think that should be enough

@BurgerBurglar8964 9 жыл бұрын

+tusherity He also got bachelor degree at the age of 14, Ph.D at 20.

@slavkochepasov8134 4 жыл бұрын

note 2) m'=2m is "golden" and amortization pars are very confused cause he jumped over n>m'>>m assumption (for theoretical run time cost analysis). - with grow() at each +1 step until table reached finale size n. Tgrow = n grow steps * T_one_grow < n * O(n) = O(n^2). - with "golden" ratio 2 (2.7:). Tgrow = lg(n) grow steps * T_one_grow < lg(n) * O(n) = O(n*lg(n)) correct but inaccurate we can do better estimation of upper limit T = 1+2+4+8+..n = 2^i-1 (geometrical sum) where i grow steps = lg(n) => T < O(2^lg(n))= O(n) # Totality separated from that "amortized" over *different* k operations business. It is disregarding portion of runtime O(n) that is spred over big k operations. Where k >= n (final table size). He also goes into deriving T for k inserts only, etc. Hopefully that helps others to catch his derivations :) kzbin.info/www/bejne/eIOyaKCMfqunZpo

@slavkochepasov8134 4 жыл бұрын

from practical run time our choice of hash function based on "mode m" makes grows_rate =2 golden :)

@rrawat7288 8 жыл бұрын

can anybody explain me the cost of n insertion at 12.26...? why it is 1+2+3... n

@mohamedahmed-fr7es 8 жыл бұрын

if you increase the array by 1 so if we start from the beginning its initial size is 1 after we insert the first element we move" one" element to another array with size of 2 again after we insert the second element we increase the size to 3 and move "two" element to this array again and again until we reach the n element we have done (1+2+3+4 ....+n) rehashing every rehash cost o(1) so the answer will be the sum of this numbers .

@miraires 8 жыл бұрын

its actually like 8(1 + 2 + 3 .....) But because of Big O, we can ignore 8

@projectdocx4661 5 жыл бұрын

@@miraires It won't be 8 it would be THETA(n+m+m'), which you can't ignore. He clearly mentions, for each insertion, it's gonna take linear time.

@projectdocx4661 5 жыл бұрын

@@mohamedahmed-fr7es The prof clearly mentions, "For each insertion AFTER n > m". So you cannot start with initial size 1. you have to start with initial size m holding in all n elements. And when you do that for each insertion it's gonna take THETA(n+m+m')

@Logan1selva 4 жыл бұрын

What is i ? I despirately need to clear this doubt...i apologise if i sound dumb im just a noob.would appreciate an answer.

@feiyijiang9167 6 жыл бұрын

It's a little frustrating about the table doubling that if " m is bigger than n and less than 4n, the amortized time will be constant time". I guess it should be "m greater than 2n and less than 4n". Besides, in deletion part is it "m = 2n and m = 4m " rather than "m = n/2, m = n/4"? Could somebody answer my questions?

@RahulRahul-pi5fm 2 жыл бұрын

I believe what you are saying is correct. I checked lecture notes, over there it is correct.

@hkleo 2 жыл бұрын

Nice short

@kevinryzack5712 6 жыл бұрын

The last two minute explanation for calculating the new hash value from the previous hash value was pretty bad. Hopefully Victor can clear it up in recitation!

@hellaren 6 жыл бұрын

If you wonder how new hash calculation works, you could check this video (kzbin.info/www/bejne/p4Kbp4Zol9mmrqs) The author of this video has a perfect explanation of sliding hash calculation. Hope it will help

@JuanVargas-kw4di 3 жыл бұрын

are the recitation videos available? what recitation are you talking about/ who is kevin?

@saikirannaragam6394 5 жыл бұрын

What is the difference between prehash and hash?

@Vflit 2 жыл бұрын

愛:Surely he loves algorithm and Computer Science!

@allandogreat 3 жыл бұрын

Karp-Rabin is not clear.

@suhasdotcom 5 жыл бұрын

Which textbook did he refer to @25:32 ?

@Nickel80 5 жыл бұрын

As per the MIT course page, the book for the class was Introduction to Algorithms, 3rd edition. Reference: ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/readings/

@obinnaubah9045 6 жыл бұрын

This is actually confusing at the end.

@alirezaghodsipoor2239 5 жыл бұрын

I didn't get a single point of K-R algorithm implementation.

@muazzakaria6390 4 жыл бұрын

@@alirezaghodsipoor2239 imagine you wanna check a sequence of DNA pattern. you have few candidates who have whole long ass pattern,

@cagnusmarlsen4050 3 жыл бұрын

whats written in his t-shirt

@jianhe 3 жыл бұрын

Love. Chinese character.

@cagnusmarlsen4050 3 жыл бұрын

@@jianhe thanks man! much love!!

@codingmaster008 3 жыл бұрын

Gaara fan maybe

@rstark 3 жыл бұрын

Love it so damn much!

@suzukikenta1079 8 жыл бұрын

Can some explain at 31:00 in recitation 9 (kzbin.info/www/bejne/rWfRpoudZad8idE ), why hash and magic are computed hash * base + new mod, and magic * base mod p in append ?

@projectdocx4661 5 жыл бұрын

Why does he use THETA every where?

@gompro 5 жыл бұрын

Because THETA is more accurate term, it means lower/upper bound at the same time. Technically it's different from big O or omega.

@codethings271 7 жыл бұрын

m=n+1

@crjacinro 4 жыл бұрын

m and n confuses me

@jiejiefu 6 жыл бұрын

愛[ài ] in Chinese means love

@codingmaster008 3 жыл бұрын

Naruto gaara fan maybe

@lietlee9253 8 жыл бұрын

愛！

@pjthamg 8 жыл бұрын

+Liuliet Lee What does it mean? :)

@lietlee9253 8 жыл бұрын

+Piotr Jan it means love

@以可-s3p 4 жыл бұрын

Love Erik lol.

@twittertalks3934 3 жыл бұрын

did anyone else noticed youtube AI algorithm is recognising these lectures as chalk

@florianwicher 6 жыл бұрын

Does anyone know if there is special notation for amortized running time? I believe I once saw it being denoted as α() but I can't find anything on the web.

@kenichimori8533 7 жыл бұрын

Karp-Rabin algorithm.

@BeatSyncBytes 6 жыл бұрын

wow amazing

@george5120 3 жыл бұрын

Instructor Erik Demaine wears his hair long and unkempt to distract from the fact that he has a bald spot. His personal appearance would improve if he emphasized looking well-groomed, regardless of whether being well-groomed further exposes the bald spot.

@israeladesanya262 3 жыл бұрын

If that's ungroomed, you should see me.

@WeirdAlSuperFan 2 жыл бұрын

Bro why you hating on Erik? He looks cool like that and anyway is a total beast. In any case, MIT people don't need to worry about their appearance. Their output speaks for itself. Also he's tall af, so he can get away with whatever and still look cool

@gigiduru125 11 жыл бұрын

lol, didn't quite understand it even with the introduction

@obinnaubah9045 6 жыл бұрын

Which part?

@raymondlion314 6 жыл бұрын

I am totally lost in this lecture.

@roylee3196 8 жыл бұрын

cool stuff.

@asdfghjkl1770 5 жыл бұрын

2019 April ? Before End Game?

@gregmakov2680 5 жыл бұрын

hahah, co ke tuc toi qua choi, chem hinh tum lum am thi tum lum, chi zay, khung bo ha, so qua diii thang cho phat xit :D co nhieu len nua thang tuc toi phat xit

@kevinc9987 4 жыл бұрын

爱

@bioboy4519 Жыл бұрын

nice clothes

@clairelolification 4 жыл бұрын

愛愛愛愛不停

@yehcc7808 7 жыл бұрын

愛

@davidjames1684 5 жыл бұрын

In the real world, this bound crap is basically useless.

@israeladesanya262 3 жыл бұрын

Yeah, fuck optimal implementations of built-in data structures, let's just eyeball things.

@ariassingh462 8 ай бұрын

@@israeladesanya262 That's literally how it works in the real world.

@quangtran3403 6 жыл бұрын

Can anybody explain why it is expected constant time at 45:21 ?

@steveye9894 6 жыл бұрын

I think Erik means that the "check whether" function takes THETA(|s|) time, which is THETA(1).

@steveye9894 6 жыл бұрын

Besides, The Amortized time of "check whether" should be THETA(1). Because, it is called only when hash(rs) and hash(rt) equals, but the possibility is sooooo small. The total time will be k1*THETA(|s|) + k2*THETA(1), k1 + k2 == |t| - |s|, and k2 >>>>>> k1.

@WeirdAlSuperFan 2 жыл бұрын

23:37 what are k and 2^k?

@K4moo 2 жыл бұрын

It's 2^k and 2^k+1, the thresholds for shrink and grow is right adjacent to each other, and in an edge case where we insert / delete repeatly right on these thresholds, we will have to do allocation (linear time) for every operation