0:15 DJ Drop Tables 0:56 Administrivia 1:45 Course status 3:04 Data structures 4:47 Design decisions (data organization, concurrency) 6:55 Hash tables 8:38 Constant factors matter a lot 9:28 Static hash table 10:54 Static hash table: assumptions 12:20 Static hash table: perfect hash function 13:24 Hash table design decisions: hash function, hashing scheme 16:12 Today's agenda: hash functions, static/dynamic hashing schemes 16:43 Hash functions 18:35 Hash functions: CRC-64, {Murmur,City,XX,Farm}Hash 20:30 Hash function benchmark (XXHash3 was (still is?) the best) 22:42 A question on hash collision attack 24:45 Static hashing schemes (linear probe, robin hood, cuckoo) 26:23 Linear probe hashing (open addressing) 29:28 Linear probe hashing - deletes (tombstones) 35:22 Non-unique keys (separate linked list, redundant keys) 37:30 Robin Hood hashing 43:01 Cuckoo hashing 47:22 Observation: resizing, dynamic hash tables 48:49 Chained hashing 50:35 Extendible hashing 56:59 What is the relationship between a hash table and a buffer pool? 1:00:10 Linear hashing 1:07:28 Linear hashing - deletes 1:10:40 Conclusion
@luisponce35804 жыл бұрын
beast
@marcoq71604 жыл бұрын
@@luisponce3580 haha, man, you're doing great! Thanks for these comments, they make me smile :) This is the penultimate timestamp comment I made. But you motivate me to get back to it again!
@luisponce35804 жыл бұрын
@@marcoq7160 lol, appreciated!
@ashishacharya40015 жыл бұрын
No "Hit it" at the end? I'm disappointed.
@garfieldnate4 жыл бұрын
Java checks if the collection's keys implement Comparable, and if so it uses TreeMaps in the buckets instead of simple LinkedLists. In my undergrad data structures class, the first thing we learned about hashes is "this linear probing thing sucks, use linked lists in the buckets instead" (I never heard the term "chained hashing" before this lecture). So it was really eye-opening to hear the lecturer say, "in practice everyone uses linear probing"! I suspect the JVM default is a balanced generic implementation for many disparate use-cases, which has quite different requirements from what we would use in an RDBMS. In general I feel that this lecture filled lots of holes in my knowledge. Thanks for another great lecture!
@NaderHGhanbari3 жыл бұрын
I believe his statement about linear probing is in the context of databases, not for general purpose programming.
@willw25963 жыл бұрын
Nice lecture. For extendible hashing, when the directory needs doubling due to a bucket split, you don't need to re-map all the the directory entries if using the suffix hash bits as the directory index. Just bulk copy the first half of the directory content to the second half of the new directory would work. Basically the two halves of the directory entries both pointing to the same buckets after the doubling.
@cacheman5 жыл бұрын
12:33 I will admit to some confusion here; this section makes it sound like perfect hashing is only of theoretical interest, but of course there's plenty of software to generate even a minimal mapping (like gperf, or CMPH) that is used in production software. 19:03 Using CRC for hash tables, yes probably a bad idea. That said, modern CPUs (both AMD64/SSE4.2 and some ARM variants) include CRC32 instructions which should in some sense be "fast", for their intended use-case. 22:42 Is he saying he's _never_ heard about this before, or that he's heard of it before? Okay kids, having a seed is not enough; see Aumasson & Bernstein on breaking murmur. More generally, look up 'algorithmic complexity attacks'. It's cool to ignore it, but only when you know what you're doing.
@AndersonSilva-dg4mg5 жыл бұрын
thank you Andy
@420_gunna4 жыл бұрын
feeling blessed to hear my professor say the crc hash "sucks ass" in 2020 year of our lord
@xuejianpan30203 жыл бұрын
Great lecture! But I have some questions: 1)is chained hashing the most popular one? 2)Extendible hashing and linear hashing might be used when there are a lot of collisions, but do we need that if we have a good hash function? In practice are they used widely? 3)are static hash tables actually used in some real systems? If so why are they better than chained hashing? Thanks!
@alfin36445 жыл бұрын
SHA-256 is not reversible
@andypavlo5 жыл бұрын
Ah you are correct! I got this very wrong. I will correct this in the next lecture. Thanks! My point about that we don't care about leaking information is the more important part though.
@anarionzuo14254 жыл бұрын
@@andypavlo It seems some chinese guy has made this reversible...
@abdelrhmanahmed13783 жыл бұрын
In Robin hode what if we want to move the richer one down by one but the place is already taken ?
@abdelrhmanahmed13783 жыл бұрын
i have example where we will have the number of jumps = 3 , if we have A[0],B[0],C[1],D[2] , and now we want to insert M and its location is where A at . so M[0] == A[0] , so next B[0]
@paulchiang47524 жыл бұрын
Thanks, andy. So the tombstone will be considered when we calculate the fill factor(load factor) since it is a kind of entry?
@theUnorthodoxxx4 жыл бұрын
I do not think so. Because we have used it for our easiness only.It is just a special empty entry.
@paulchiang47524 жыл бұрын
I understand what you mean, but in this video, from 30:15 to 31:00, Professor said that "the tombstone is wasting space and contributing to the fill factor". I am not quite sure and thus want to confirm.
@AshishNegi16184 жыл бұрын
@@paulchiang4752 Think what happens if you add N keys and delete N keys where hash table size is N. Should you consider hash table completely filled ?
@zhenghanghu24302 жыл бұрын
Yes, tombstones are considered for the fill factor.
@monitoringtsdb38843 жыл бұрын
such fire beats
@zeevtarantov5 жыл бұрын
Why is the a lesson on basics of hash functions in a "how to build a RDBMS" course?
@AshishNegi16184 жыл бұрын
First 5 mins of this lecture talks about this.
@zeevtarantov5 жыл бұрын
SHA-256 is not an asymmetric cryptographic primitive, and it is not reversible, and the author should bleep out the audio of that part if they don't want to take down the whole video.