This was a great talk, thank you for publishing it!
@munikarmanish4 жыл бұрын
I'm studying LSM-trees and this is indeed a very helpful video!
@jaigohil49634 жыл бұрын
Is someone db fanatic. Really recommend. Great content!
@AnkitGaur93 жыл бұрын
Good presentation on LSM tree and it's application.
@dn54267 ай бұрын
surprised there wasn't any mention of WiscKey
@DirtyPio3 жыл бұрын
I'm trying to internalize why probing any run would cost the same and not sure if I'm missingsomething. The paper states that: "Our insight is that the worst-case point lookup cost over the whole LSM-tree is proportional to the sum of the false positive rates of all filters. Assigning equal false positive rates to all filters, however, does not minimize this sum. The reason is that maintaining the same false positive rate across all runs means that larger runs have proportionally larger Bloom filters, whereas the I/O cost of probing any run is the same regardless of its size (due to each run having fence pointers in main memory that allow direct access to the relevant disk page)." Let's say fence pointers cover P pages, then the statement is true as long as each level consists of only P pages, but when higher levels start to have N * P pages you need to find your right page, so you need to search in your array of fence pointers on each level qualified by the bloom filter. So shouldn't the I/O cost be log(N) where a level has N fence pointers? It's going to be very cheap, because the array of fence pointers are in memory and cost of in-memory binary search vs. disk I/O has orders of magnitude difference.
@ozgurakkurt9770 Жыл бұрын
He means every single page has a pointer and you will read only one page if you are doing a point lookup. For example in RocksDB you set a single block size which applies to all levels. So this means every single block of that size will have a pointer. So for a point lookup you will read NumFalsePositives+1 pages from disk or something like that.
@kinglygaurav4 жыл бұрын
What is a run?
@nivdayan97174 жыл бұрын
A run is a sorted array of data
@yunuskoning75844 жыл бұрын
@@nivdayan9717 So there can be multiple runs per level right? A run is just lsm specific nomenclature for any member of the set of sorted arrays associated with a single level ready to be merged with runs of the underlying level. Also the buffer itself might be called a level, e.g. level 0 or something.
@nivdayan97174 жыл бұрын
@@yunuskoning7584 All correct!
@utkarshgupta29093 жыл бұрын
SSTable
@parkatip5 жыл бұрын
Please put the speaker's name in video titles
@josephoun42184 жыл бұрын
His name is Niv Dayan
@Cbon-xh3ry Жыл бұрын
It can be found on the first slide
@parkatip Жыл бұрын
@@Cbon-xh3ry i know that. nothing to do with what i asked
@Cbon-xh3ry Жыл бұрын
@@parkatip why would you want his name in the title ? It could be added to the description but the title doesn’t make sense 🤷♂️
@parkatip Жыл бұрын
@@Cbon-xh3ry because it's a crucial bit of information that i don't want to fish out of a video or its description. putting it in the title makes perfect sense to me so i can see it before clicking on these videos and spending a minute to get to that slide. it also makes the name searchable and visible in search results.