11 - Join Algorithms (CMU Databases Systems / Fall 2019)

Views: 21,253

CMU Database Group

Comments: 11

@firstlast7086 3 years ago

45:11 Excellent explanation of Bloom Filters

@AndersonSilva-dg4mg 5 years ago

really cool

@nikunjbhartia2222 4 years ago

Can we just add a Bloom filter to the metadata of every page?
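To make the idea concrete, here is a minimal sketch of a Bloom filter kept per page. The class name, the bit-array size, the number of hashes, and the `pages` layout are all assumptions for illustration, not something taken from the lecture or from any particular DBMS.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: k salted hash probes into an m-bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions by salting one hash with the probe index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


# Hypothetical per-page metadata: one filter per page of join keys.
pages = [[1, 5, 9], [20, 21, 22]]
page_filters = []
for page in pages:
    bf = BloomFilter()
    for key in page:
        bf.add(key)
    page_filters.append(bf)
```

A probe would call `page_filters[i].might_contain(key)` and skip page `i` on `False`. The trade-off is the extra space in every page header and the chance of false positives, which still cost a page read.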

@andrew5407 2 years ago

Oh, that's where a Bloom filter comes into play!

@chenzhanyi9455 4 years ago

Is the sort phase in sort-merge join the same as the whole external merge sort (both phases included)?

@HarshKapadia 4 years ago

Yes, and it has to be performed independently on both tables participating in the join.
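A rough in-memory sketch of the two steps, assuming tuples whose first field is the join key. Python's `sorted` is only a stand-in for the external merge sort, and the function name is made up for this example.

```python
def sort_merge_join(outer, inner, key=lambda t: t[0]):
    """Join two lists of tuples on `key`; each input is sorted on its own first."""
    r = sorted(outer, key=key)  # stand-in for external merge sort on R
    s = sorted(inner, key=key)  # stand-in for external merge sort on S
    out = []
    i = j = 0
    while i < len(r) and j < len(s):
        kr, ks = key(r[i]), key(s[j])
        if kr < ks:
            i += 1
        elif kr > ks:
            j += 1
        else:
            # Find the run of equal keys on each side, then emit all pairs.
            i_end = i
            while i_end < len(r) and key(r[i_end]) == kr:
                i_end += 1
            j_end = j
            while j_end < len(s) and key(s[j_end]) == kr:
                j_end += 1
            for a in r[i:i_end]:
                for b in s[j:j_end]:
                    out.append(a + b)
            i, j = i_end, j_end
    return out


R = [(2, "b"), (1, "a"), (2, "c")]
S = [(2, "x"), (3, "y"), (2, "z")]
joined = sort_merge_join(R, S)
```

Handling runs of equal keys on both sides is what plays the role of backtracking on the inner table when there are duplicates.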

@abdelrhmanahmed1378 2 years ago

22:40 Do we really scan the inner relation only once? Or do we scan the whole inner relation for each page of the outer relation?
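One way to check this is to count page reads in a toy block nested loop join. The sketch below assumes one buffer page per relation and represents each page as a list of join keys; the function name and the sample data are made up for illustration.

```python
def block_nested_loop_join(outer_pages, inner_pages):
    """Return (matching keys, total page reads) for a page-at-a-time join."""
    results = []
    page_reads = 0
    for r_page in outer_pages:
        page_reads += 1  # read one outer page
        for s_page in inner_pages:
            page_reads += 1  # re-read every inner page for this outer page
            for r in r_page:
                for s in s_page:
                    if r == s:
                        results.append(r)
    return results, page_reads


outer = [[1, 2], [3, 4], [5, 6]]  # M = 3 pages
inner = [[2, 3], [6, 7]]          # N = 2 pages
matches, reads = block_nested_loop_join(outer, inner)
# reads == M + M * N == 3 + 3 * 2 == 9
```

Under these assumptions the inner relation is scanned in full once per outer page, which is where the M + (M · N) cost comes from. With more buffer pages for the outer relation, the number of inner scans drops accordingly.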

@greneroom 4 years ago

At 1:03:23, does this account for the disk I/O costs of building the ephemeral hash tables? Assuming the hash table for R does not fit in memory, then for each of the M pages in R we may have to do multiple disk I/Os to update the hash table, right, since the keys are uniformly distributed?

@angus10292 4 years ago

In the nested loop join, is (M + m · N) the worst-case complexity when considering a column store? Assume each column is stored in its own page, and to reconstruct a tuple of the inner table we fetch every page of the inner table.

@AshishNegi1618 4 years ago

The math for a column store will be different. One reason is that column stores pack information more densely. E.g., if there is a column for eye color that takes at most 5 distinct values, one page could hold a million records with run-length encoding. So it may be possible to keep the complete DB, or all relevant columns, in memory. In a column store, to reconstruct a tuple of the inner table we only need to fetch the pages of the relevant columns for that tuple. E.g., if the inner table has 1 million pages and 10 columns, and I need all columns for key `A`, we need to fetch 10 pages (one per column). The good thing is that if the query deals with only 3 columns, we only need to fetch 3 pages. We can also defer fetching pages for columns that appear only in the output until the last stage.
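For anyone who has not seen run-length encoding, here is a minimal sketch of why a low-cardinality column compresses so well. The helper names and the sample column are assumptions for illustration; real column stores use more elaborate encodings.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


eye_color = ["brown"] * 4 + ["blue"] * 2 + ["brown"]
encoded = rle_encode(eye_color)
# 7 values are stored as 3 runs
```

The encoding only pays off when equal values sit next to each other, so it works best when the column is sorted or naturally clustered.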