Can we just add a Bloom filter to the metadata of every page?
@andrew5407 2 years ago
Oh, that's where a Bloom filter comes into play!
@chenzhanyi9455 4 years ago
Is the sort phase in sort-merge join the same as the whole external merge sort (both phases included)?
@HarshKapadia 4 years ago
Yes, and it has to be performed independently on both tables participating in the join.
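To make that concrete, here is a minimal sketch of a sort-merge join in Python. It is only an illustration: the in-memory `sorted()` call stands in for a full external merge sort, and the names `sort_merge_join`, `outer`, `inner`, and `key` are assumptions made for this example, not anything from the lecture.

```python
def sort_merge_join(outer, inner, key=lambda row: row[0]):
    """Equi-join two lists of tuples on key(row)."""
    # Sort phase: each input is sorted independently on the join key.
    # sorted() is an in-memory stand-in for an external merge sort.
    r = sorted(outer, key=key)
    s = sorted(inner, key=key)

    # Merge phase: advance two cursors over the sorted inputs.
    out = []
    i = j = 0
    while i < len(r) and j < len(s):
        kr, ks = key(r[i]), key(s[j])
        if kr < ks:
            i += 1
        elif kr > ks:
            j += 1
        else:
            # Find the run of inner rows that share this key.
            j_end = j
            while j_end < len(s) and key(s[j_end]) == kr:
                j_end += 1
            # Pair every outer row with this key against that run.
            while i < len(r) and key(r[i]) == kr:
                for t in s[j:j_end]:
                    out.append((r[i], t))
                i += 1
            j = j_end
    return out


# Hypothetical sample data: (join_key, payload)
matches = sort_merge_join(
    [(2, "b"), (1, "a"), (2, "c")],
    [(2, "x"), (3, "y"), (2, "z")],
)
```

Both inputs go through their own sort before the merge cursors start moving, which is the point made in the reply above.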
@abdelrhmanahmed1378 2 years ago
22:40 Do we really scan the inner relation once? Or, for each page in the outer relation, are we going to scan the whole inner relation?
@greneroom 4 years ago
At 1:03:23, does this account for the disk I/O costs of building the ephemeral hash tables? Assuming the hash table for R does not fit in memory, then for each page M in R, we may have to do multiple disk I/Os to update the hash table, right, since the keys are uniformly distributed?
@angus10292 4 years ago
In the nested loop join, is the cost (M + m · N) the worst-case complexity for a column store? We assume each column is stored in its own page, and to reconstruct a tuple of the inner table, we fetch every page of the inner table.
@AshishNegi1618 4 years ago
The math for a column store will be different. One reason is that column stores hold data more densely (compressed). E.g. if there is a column for eye color that takes at most 5 distinct values, a single page can hold a million records with run-length encoding. So it may be possible to keep the complete DB, or all relevant columns, in memory. In a column store, to reconstruct a tuple of the inner table, we only need to fetch the pages of that tuple's relevant columns. E.g. if the inner table has 1 million pages and 10 columns, and I need all columns for key `A`, we need to fetch 10 pages (1 per column). The good thing is that if the query touches only 3 columns, we need to fetch only 3 pages. We can also defer fetching the pages of columns that appear only in the output until the last stage.
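A rough sketch of the run-length encoding idea, in Python. It assumes the column's equal values sit next to each other (sorted or clustered), since that is when RLE collapses a million rows into a handful of runs. The function names and the eye-color data are made up for illustration.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(v, sum(1 for _ in group)) for v, group in groupby(values)]


def rle_value_at(runs, row):
    """Return the value stored at a 0-based row position."""
    for value, length in runs:
        if row < length:
            return value
        row -= length
    raise IndexError("row out of range")


# Hypothetical column: one million rows, clustered by value.
eye_color = ["blue"] * 400_000 + ["brown"] * 500_000 + ["green"] * 100_000
runs = rle_encode(eye_color)
```

Here `runs` holds three pairs instead of a million strings, and `rle_value_at(runs, row)` recovers the value for any row position, which is what reconstructing one column of a tuple amounts to.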