Scaling Write-Intensive Key-Value Stores

Рет қаралды 22,943

Microsoft Research

Күн бұрын

Пікірлер: 19

@williamclausen3122 11 ай бұрын

This was a great talk, thank you for publishing it!

@munikarmanish 4 жыл бұрын

I'm studying LSM-trees and this is indeed a very helpful video!

@jaigohil4963 4 жыл бұрын

Is someone db fanatic. Really recommend. Great content!

@AnkitGaur9 3 жыл бұрын

Good presentation on LSM tree and it's application.

@dn5426 7 ай бұрын

surprised there wasn't any mention of WiscKey

@DirtyPio 3 жыл бұрын

I'm trying to internalize why probing any run would cost the same and not sure if I'm missingsomething. The paper states that: "Our insight is that the worst-case point lookup cost over the whole LSM-tree is proportional to the sum of the false positive rates of all filters. Assigning equal false positive rates to all filters, however, does not minimize this sum. The reason is that maintaining the same false positive rate across all runs means that larger runs have proportionally larger Bloom filters, whereas the I/O cost of probing any run is the same regardless of its size (due to each run having fence pointers in main memory that allow direct access to the relevant disk page)." Let's say fence pointers cover P pages, then the statement is true as long as each level consists of only P pages, but when higher levels start to have N * P pages you need to find your right page, so you need to search in your array of fence pointers on each level qualified by the bloom filter. So shouldn't the I/O cost be log(N) where a level has N fence pointers? It's going to be very cheap, because the array of fence pointers are in memory and cost of in-memory binary search vs. disk I/O has orders of magnitude difference.

@ozgurakkurt9770 Жыл бұрын

He means every single page has a pointer and you will read only one page if you are doing a point lookup. For example in RocksDB you set a single block size which applies to all levels. So this means every single block of that size will have a pointer. So for a point lookup you will read NumFalsePositives+1 pages from disk or something like that.

@kinglygaurav 4 жыл бұрын

What is a run?

@nivdayan9717 4 жыл бұрын

A run is a sorted array of data

@yunuskoning7584 4 жыл бұрын

@@nivdayan9717 So there can be multiple runs per level right? A run is just lsm specific nomenclature for any member of the set of sorted arrays associated with a single level ready to be merged with runs of the underlying level. Also the buffer itself might be called a level, e.g. level 0 or something.

@nivdayan9717 4 жыл бұрын

@@yunuskoning7584 All correct!

@utkarshgupta2909 3 жыл бұрын

SSTable

@parkatip 5 жыл бұрын

Please put the speaker's name in video titles

@josephoun4218 4 жыл бұрын

His name is Niv Dayan

@Cbon-xh3ry Жыл бұрын

It can be found on the first slide

@parkatip Жыл бұрын

@@Cbon-xh3ry i know that. nothing to do with what i asked

@Cbon-xh3ry Жыл бұрын

@@parkatip why would you want his name in the title ? It could be added to the description but the title doesn’t make sense 🤷‍♂️

@parkatip Жыл бұрын

@@Cbon-xh3ry because it's a crucial bit of information that i don't want to fish out of a video or its description. putting it in the title makes perfect sense to me so i can see it before clicking on these videos and spending a minute to get to that slide. it also makes the name searchable and visible in search results.