The Secret Sauce Behind NoSQL: LSM Tree

192,934 views

ByteByteGo

1 year ago

Subscribe to our weekly system design newsletter: bit.ly/3tfAlYD
Check out our bestselling System Design Interview books:
Volume 1: amzn.to/3Ou7gkd
Volume 2: amzn.to/3HqGozy
Digital version of System Design Interview books: bit.ly/3mlDSk9
Animation tools: Illustrator and After Effects
ABOUT US:
Covering topics and trends in large-scale system design, from the authors of the best-selling System Design Interview series.

Comments: 183
@Kxneki2433 4 months ago
IMPORTANT: Don't forget the Memtable is stored in memory, so if the system crashes, that data will be lost. To avoid losing data, we can maintain a separate log file on disk. Every time we write to the Memtable, we'll also append that write to the log file (no need to sort it as we just use it to restore after a crash). Then once the Memtable contents get written out to a SSTable file, we can erase the log file. That way the log helps us avoid losing writes stuck in memory when a crash happens.
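The log-then-memtable idea described in this comment can be sketched roughly as follows. This is a toy illustration, not how any particular database implements it: the class name `MemtableWithLog`, the JSON-lines log format, and the file names are all assumptions made for the example.

```python
import json
import os
import tempfile


class MemtableWithLog:
    """Toy memtable backed by an append-only log (all names are hypothetical)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.memtable = {}
        self._replay()  # rebuild whatever a previous run left in the log
        self.log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                key, value = json.loads(line)
                self.memtable[key] = value

    def put(self, key, value):
        # Append to the log first (unsorted), then update the in-memory table.
        self.log.write(json.dumps([key, value]) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        self.memtable[key] = value

    def flush_to_sstable(self, sstable_path):
        # Write the keys out in sorted order and make them durable ...
        with open(sstable_path, "w", encoding="utf-8") as f:
            for key in sorted(self.memtable):
                f.write(json.dumps([key, self.memtable[key]]) + "\n")
            f.flush()
            os.fsync(f.fileno())
        # ... and only then erase the log and the memtable.
        self.memtable.clear()
        self.log.close()
        self.log = open(self.log_path, "w", encoding="utf-8")

    def close(self):
        self.log.close()


def demo_crash_recovery():
    workdir = tempfile.mkdtemp()
    log_path = os.path.join(workdir, "wal.log")
    db = MemtableWithLog(log_path)
    db.put("b", "2")
    db.put("a", "1")
    db.close()  # pretend the process died here: memory is gone, the log is not
    recovered = MemtableWithLog(log_path)
    contents = dict(recovered.memtable)
    recovered.close()
    return contents
```

Calling `demo_crash_recovery()` rebuilds the two writes from the log alone. The sketch ignores concerns a real engine has to handle, such as partially written log records and batching of `fsync` calls.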
@jerichaux9219 22 days ago
This seems like a fairly important detail.
@sealuke2724 1 year ago
Bruh, this is just awesome... keep going
@JoelBrubaker 1 year ago
This is the perfect amount of depth and overview I’m looking for. Great videos and visuals!!
@akash-kumar737 1 year ago
Yes, awesome video.
@chrisodillman3355 1 year ago
+1
@phanphong3533 1 year ago
This guy is better than my team lead in terms of explaining a NoSQL concept. Thank you, you made my day.
@CommandantNOVA 1 year ago
LSM trees are actually used a lot in modern SQL databases as well. the idea is to represent a relational table as a series of packed records - [k: primary key v: field 1, field 2]. Indices can be created on other fields by creating additional k:v pairs - [k: field 1 v: primary key] so an index scan just becomes two NoSQL lookups for each result.
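A minimal sketch of the layout described in this comment, assuming a plain Python dict stands in for the LSM-backed key-value store. The class `KVTable`, the `users`-style columns, and the composite index key `("idx_city", city, user_id)` are illustrative choices; the composite key is used so that several rows can share the same indexed value.

```python
class KVTable:
    """Toy relational table stored as key-value pairs (names are hypothetical)."""

    def __init__(self):
        # A dict stands in for the sorted key-value store.
        self.store = {}

    def insert(self, user_id, name, city):
        # Primary record: key = primary key, value = the remaining fields.
        self.store[("row", user_id)] = (name, city)
        # Secondary index entry: key = (indexed field, primary key), empty value.
        self.store[("idx_city", city, user_id)] = None

    def get(self, user_id):
        return self.store.get(("row", user_id))

    def find_by_city(self, city):
        # Lookup 1: find index entries for the city. A real store would do a
        # sorted prefix scan here instead of filtering every key.
        index_keys = sorted(
            k for k in self.store if k[:2] == ("idx_city", city)
        )
        # Lookup 2: fetch each row by the primary key stored in the index entry.
        return [(k[2],) + self.store[("row", k[2])] for k in index_keys]
```

With rows `(1, "Ann", "Oslo")`, `(2, "Bob", "Paris")`, and `(3, "Cy", "Oslo")` inserted, `find_by_city("Oslo")` returns the rows for ids 1 and 3. Updates and deletes, which would also have to maintain the index entries, are left out.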
@romannasuti25 1 year ago
Spot on. One thing to keep in mind is that all databases, SQL or noSQL, are based on the principles of key value operations and the transaction consistency algorithms that allow for data integrity. The core calculus and proofs behind these systems are all done in the perspective of only two types of operation: read and write, each to some concept of a key. These algorithms vary wildly and come with varied pros and cons, some with very weird limitations that actually DO prevent SQL semantics (the FaunaDB and FoundationDB distributed algorithms require the W/R set to be known before starting the transaction, and SQL requires conditional and unbounded W/R sets to fully abide by the spec).
@greyowl3787 1 year ago
@romannasuti25 What do you mean by “W/R set”?
@romannasuti25 1 year ago
@greyowl3787 write/read. Basically, Fauna and Foundation use a distributed transaction system that can only provide full ACID guarantees with exact read and write keys known before the transaction even starts. This means dependent querying is usually not possible, like "SELECT name FROM users WHERE id = (SELECT friend FROM users WHERE name = x);" simply because the result of the inner query changes the write set of the outer query based on its result. In some cases a good query engine can narrow down to a fixed "maybe W/R" set if the results are well bounded (either single element or known countable range), but generally they don't bother with SQL support for this reason.
@ismailmo4 1 year ago
@greyowl3787 write/read, I assume
@maxil122 1 year ago
That's the best system design content I have ever seen on YouTube! This channel is absolutely amazing. It must be tough to squeeze all that valuable knowledge into a video under 10 minutes. Keep up the excellent work!
@mr_nature 1 year ago
I appreciate your efforts. Thanks for making system design more palatable than ever.
@bardhan.abhirup 1 year ago
These videos are incredible! Very well paced and presented!
@AminAramoon 1 year ago
These videos are superb man, keep up the good work
@nishantgoel769 4 months ago
One of the most concise videos for understanding how Elastic/Lucene uses these things for fast writes and reads. Great work man!
@danielgospodinow 4 months ago
I can't recall the last time I stumbled upon such great material! Fascinating work!
@BRBearUSA 1 year ago
VERY informative without going into too much complexity. THANKS and congrats for a great video. I'm an MS SQL Server DBA, and the high level explanation you provided was awesome. Thanks again. Best, R.
@susingh2 1 year ago
I've never experienced such a communicative video with such a simple and easy explanation. Thanks, Alex. Please keep it up and upload more such videos. I've bought both volumes of the System Design Interview book anyway. Thank you so much!!!
@bazoo513 1 year ago
This is the most information-dense video on what's so special about NoSQL databases I've ever seen. Not a word was superfluous, but the key concepts were clearly transmitted.
@arthursoares610 1 year ago
The dark mode was awesome. I think it could be the default from now on
@guptaanmol184 1 year ago
I came here to say this! +1 for dark mode, finally!
@rafaelacioly3252 1 year ago
This channel is by far the best channel that I've found on yt about tech
@tomtomsiesie5436 1 year ago
Another amazing video! This format is so valuable.
@bobdinitto 1 year ago
I've often wondered how NoSQL databases can achieve higher write throughput than relational DBs. Thank you for sharing the techniques involved. Your explanation is clear and the graphics are excellent!
@balaclava351 1 year ago
Great video. I'm a junior dev that has to implement a chat feature in the next few weeks. This really helped me understand NoSQL. Thanks.
@LeoFuso 1 year ago
Great work! I wish I had watched this video before trying to learn the architecture behind RocksDB by only looking at their documentation, hahaha! Awesome work!
@Konservator69 1 year ago
Good video. To develop the topic further, it'd be interesting to see an LSM tree vs. Redis RDB/AOF persistence comparison.
@sampleshawn5380 1 year ago
Man this video is awesome, so much information, loved it!
@richsadowsky8580 1 year ago
Really awesome overview of how SQL and NoSQL differ. Agree with Joel below me: just the right level of detail to provide value.
@jiamingliu192 1 year ago
Absolutely amazing. Thank you for making the video!
@Andrew-rc3vh 1 year ago
That's a cool trick. So what it is essentially doing is spreading out the computer's resources across time, where with a traditional database you will get lots of spikes in processing on the timeline when doing reads and writes. Mind you the attraction of SQL was that you could have multiple indexes and create custom views on the data in a highly relational way, but granted that was very expensive on resources. I think traditional databases were mostly optimised for the deficiencies of the mechanical hard disk drive. I think it may end up as redundant in future as we store our data more on chips. Thanks for the video. I like videos that don't waste your time with long BS intros.
@BRBearUSA 1 year ago
You read my mind when it comes to "spreading the computer's resources across time"... Which for some use cases makes perfect sense, but not all use cases. Not sure I agree with the "deficiencies of mechanical HDDs" part of your comment though. But great comment overall.
@cexploreful 1 year ago
LOVE THE DARK BACKGROUND! :D:D:D:D (Also the video of course!!)
@alamelu85 1 year ago
Alex - Your lectures and content are great assets to the software engineering community.
@frederickbarbarossa2746 1 year ago
Besides the Bloom filter, a sparse index helps find a key quickly, so we only look into a small number of SSTables.
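As a rough illustration of the sparse-index idea, the sketch below keeps only every fourth key of a sorted run in memory and uses binary search to decide which small block to scan. The function names, the in-memory list standing in for an on-disk SSTable, and the block size of 4 are assumptions made for this example.

```python
import bisect


def build_sparse_index(sorted_records, every=4):
    """Keep every `every`-th key together with its position in the sorted run."""
    return [(sorted_records[i][0], i) for i in range(0, len(sorted_records), every)]


def lookup(sorted_records, sparse_index, key, every=4):
    """Find `key` by scanning only the one block the sparse index points at."""
    index_keys = [k for k, _ in sparse_index]
    # Last indexed key that is <= the key we are looking for.
    slot = bisect.bisect_right(index_keys, key) - 1
    if slot < 0:
        return None  # key sorts before the first record in this run
    start = sparse_index[slot][1]
    for k, v in sorted_records[start:start + every]:
        if k == key:
            return v
    return None
```

For a run of ten records `k00` to `k09`, the index holds just `k00`, `k04`, and `k08`, and a lookup for `k06` scans only the block starting at `k04`. In a real engine the positions would be byte offsets into a file rather than list indices.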
@modolief 1 year ago
Sublime, superb, excellent ... again.
@arno.claude 1 year ago
This channel is such a gold mine!
@lechprotean 1 year ago
great, you make it sound so simple, I'm writing my own nosql db this weekend ;)
@chaoluncai4300 1 year ago
did you start writing?
@EduardAlexandru 1 year ago
well, did you?
@eliyahubasa9401 1 year ago
Great content, thank you very much. I'm already waiting for Tuesday for a new video, as much as I wait for a new One Piece episode.
@TheOnlyRealBreadIntheWorld 2 months ago
amazing content, thank you for all the hard work you do!
@roct07 9 months ago
This is so high quality. Thank you :)
@obedgennius1401 1 year ago
I really appreciate these videos!! Thank you very much. I would like to know what software you are using to produce such presentations 🙏
@MostafaZeinali 1 year ago
Thank you for the great video. Keep up the good work. ❤
@ibgib 1 year ago
Like many have said, this is a great dense video at a good beginner depth for people like me. I hope there will be a relational db video that complements this if at all possible, since the quality level is so high 🤞
@KristjanKask 1 year ago
Awesome video! I'd love to know how densely ordered vectors for database storage work next :)
@anilkaliya3375 6 months ago
Nicely explained. I read all this information in the Designing Data-Intensive Applications book, but a few topics like memtables and SSTables were still a bit unclear to me. Got the whole idea now. Great stuff. Keep going
@shakedrosenblat1925 6 months ago
Thank you. Great video, as always. I'd like it if you guys could go into more detail.
@aaronprindle385 1 year ago
Awesome work, thanks for this!
@BiggRanger 1 year ago
Excellent presentation!
@javisartdesign 1 year ago
very well explained. Thanks
@AllOfMyWat 1 year ago
I can't wait for my next systems design interview!
@sanzharsuleimenov6380 1 year ago
Great explanation!
@peepeepoopoo2243 1 year ago
Great video!
@pashachechehov3483 1 year ago
Great visualization of the "Designing Data-Intensive Applications" book
@ethanmye-rs 1 year ago
Thanks! One of the things I find difficult to find good information on is structuring data. Given I want these x properties, how do I arrange the information to get them, and what technologies are required to do it?
@quirkyquerty 1 month ago
In their book, Bloom filters are stored on disk, but here they're shown to be in memory. Hopefully we'll get some eventual consistency
@dhpark7509 1 year ago
great explanation!
@mnchester 1 year ago
amazing video!
@jrabelo_ 1 year ago
perfect explanation, thanks
@carolinegr 1 year ago
One additional thing to mention (around kzbin.info/www/bejne/f2fNc2Okgp6Ggbc perhaps) is that the writes are written to memory AND a transaction log to ensure durability. Otherwise whatever was in the MemTable will not persist after a crash. The transaction log can be replayed to rebuild the MemTable.
@ByteByteGo 1 year ago
Thank you for the feedback. We did have that initially, but decided to take it out to focus on the LSM tree itself. We knew someone would bring it up, but didn't think it would take this long. 😂 We are glad you did.
@ibgib 1 year ago
I was actually wondering if this were a difference with what I understand of relational dbs. Thanks for pointing it out.
@faris_id_music 1 year ago
one of the best videos so far
@AndyThomasStaff 1 year ago
Great explanation, thank you, this helped reinforce my learnings about LSM-trees from reading. The graphics were especially helpful
@aus10d 1 year ago
Very interesting! Loved this video
@galeop 1 year ago
Amazing video. At 1:15, what is that "object key"? The row key that is being edited/added to the keyspace?
@caseyspaulding 1 year ago
Thank you!
@axa993 1 year ago
Awesome overview.
@raviv5109 1 year ago
Thank you so much!
@chaoluncai4300 1 year ago
this is brilliant! I'm also wondering whether this amount/level of knowledge of an advanced DS is enough for a tech/system design interview? Obviously I think the interviewer won't ask for an implementation so... ig I'm trying to find out how much deeper we need to go than e.g. this channel's few-minute videos?
@StephenGillie 1 year ago
Very cool. Could have:
- one process just taking DB writes and putting them in memory
- another that writes too-big variables to files on disk
- a next one that goes through files and flattens them (like a continuous truncate/shrink on a transaction log)
- a last one that takes DB reads and goes through the memory, Bloom filters, and file structure to find and return the requested data.
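The read side of that pipeline (memory first, then Bloom filters, then the files) could look something like the sketch below. It is a simplified illustration under several assumptions: `BloomFilter`, `SSTable`, and `get` are hypothetical names, a dict stands in for each on-disk file, and deletes (tombstones) are not modelled.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive several bit positions per key by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SSTable:
    """Stand-in for an immutable sorted file plus its Bloom filter."""

    def __init__(self, records):
        self.records = dict(records)
        self.bloom = BloomFilter()
        for key in self.records:
            self.bloom.add(key)


def get(key, memtable, sstables_newest_first):
    # 1. The memtable holds the most recent writes.
    if key in memtable:
        return memtable[key]
    # 2. Check tables from newest to oldest, skipping any table whose
    #    Bloom filter says the key is definitely absent.
    for table in sstables_newest_first:
        if not table.bloom.might_contain(key):
            continue
        if key in table.records:
            return table.records[key]
    return None
```

Because a Bloom filter can only err on the side of "maybe present", skipping a table on a negative answer never loses data; a false positive just costs one unnecessary table lookup.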
@rbelatamas 1 year ago
great explanation ❤
@allisonmachado 1 year ago
awesome video indeed! thank you
@arunkutube 4 months ago
great explanation for beginners
@adamaiken00 1 year ago
This is a great video. If I want to go further, is there any good reference for NoSQL?
@dmytrosolovei6025 1 year ago
Love your videos!
@shoobidyboop8634 1 year ago
Stuff like this is the future of many forms of education.
@DK-ox7ze 1 year ago
Great explanation. Resolved all my doubts on how NoSql DBs work. However, I wanted to understand 1) Whether the balanced tree and keys in sorted set is only the object key with pointer to data value or it also contains the actual data? 2) Can a NoSql DB index multiple keys? 3) Why can't SQL DB also implement flushing mechanism in order to speed up writes? I know that they are highly consistent so they need to persist data to disk, but they can simply append the entry in a log file just like NoSql DBs do, and in case of a network partition, first check the log file to sync data in actual database?
@bojandolinar1535 1 year ago
Re 3 afaik that's what they already do. But sooner or later it has to write them to b-tree, which I guess is the real bottleneck.
@GyroCannon 1 year ago
Not at all a question I had, but glad I watched because I extensively use Mongo for my own app
@LuccasMaso 1 year ago
Amazing!
@plussin2760 3 months ago
I was able to get an understanding of the LSM tree and an overview of how it works. Thank you.
@DarknessGu1deMe 1 year ago
What would be a good example of an application benefiting from "fast write slow read" property of NoSQL DB? Based on what's presented, I'd say most user-facing application, like a typical service a startup would build (e.g. personal calendar organization, etc) doesn't sound like a good fit given reads are pretty important in user-facing traffic.
@wrondonparticual5113 1 year ago
What is the animation software used? It is beautiful
@willl0014 1 year ago
So much knowledge!!!
@noahgsolomon 7 months ago
awesome explanation
@thomasrobin 1 year ago
Amazing Content! Does anyone know how these animations are made?
@doxologist 1 year ago
Perhaps the best educational systems content on the whole of YouTube right now. Great stuff
@pdteach 1 year ago
Simply the best
@Chauhannitin 1 year ago
Very good animation
@paddyd7642 1 year ago
Thank you! When you say sorted, is it by some object id or time?
@big0bad0brad 1 year ago
Object ID, unless you design the system to use a high precision timestamp as the id, which could maybe be an interesting idea. If the Object ID is a timestamp, then object creations are already sorted which could further boost write performance, though there is probably no advantage in the case of updates or deletes.
@vikasgupta1828 1 year ago
Thanks
@ThangNguyen-je8kv 1 year ago
The next step after watching this video is to … watch it again 😂. Awesome as always.
@sophiiisticated 1 year ago
I'd recommend leaving 30 seconds instead of 10 for the end credits, because the suggested next video (bottom left) covers the summary.
@Garvitatri 1 year ago
thx and subscribed
@singhsaubhik 1 year ago
This is an awesome overview of the LSM tree. If someone wants to dig deeper, read "Designing Data-Intensive Applications".
@venkataramaraoemmadi378 1 year ago
It is an awesome book.
@akash-kumar737 1 year ago
thanks
@VirgoBoy2991 1 year ago
Hi, I just wonder why the second volume isn't available on Kindle yet :( I'm from Vietnam and shipping is expensive
@badrbellaj 1 year ago
brilliant
@thisisnotok2100 1 year ago
yeah I freakin love this channel
@geck1204 1 year ago
Wow this was great
@lesmatheson6001 1 year ago
Very interesting
@AmrishPandey 1 year ago
This is an amazing video
@paramvirsingh5640 1 year ago
Looking at how beautifully my tired brain understood this, this video deserves a Nobel Prize.
@adatalearner8683 1 year ago
Is the concept of SSTables specific to one type of NoSQL database (Cassandra), or is it generic?
@wave9303 1 year ago
Hi, can you do a video on how Splunk is used for DevOps and how it stores its data?
@thelonearchitect 1 year ago
Thanks for the video. Your explanation raises a concern for me: since the memtable is in memory, what happens if the server crashes before flushing? Is that memtable distributed or replicated?
@lwfeagan 1 year ago
Cassandra, for example, still has a write-ahead log.
@nitinagrawalbst 1 year ago
Generally, a write-ahead log is maintained for the memory table. Once the memory table has been written out as an SSTable, the write-ahead log is deleted. In case of a crash, the write-ahead log can be used to restore the memory table.
@anvogel99 1 year ago
Impressive
@BuyHighSellLo 1 year ago
How does MongoDB compare to the two types explained in the video?
@ckirkyg 1 year ago
This was my question as well. Does it also use an LSM tree or some other approach?
@mohawkgwai 1 year ago
Cassandra also has Leveled Compaction Strategy so that slide comparing it to RocksDB is a little misleading
@raphaelcarvalho4288 1 year ago
Cassandra initially had size tiered only and later borrowed leveled from RocksDB to solve the space amplification problem, so it's not completely misleading.
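Whichever strategy picks the tables, the compaction step itself comes down to merging sorted runs and keeping the newest value for each key. The sketch below shows only that merge step; it does not model size-tiered or leveled table selection, and the function name `compact` and the list-of-tuples representation are assumptions made for the example.

```python
import heapq


def compact(runs_newest_first):
    """Merge sorted (key, value) runs; the newest run wins on duplicate keys."""
    # Tag each record with the age of its run (0 = newest) so that, for equal
    # keys, the merge yields the newest version first.
    tagged = [
        [(key, age, value) for key, value in run]
        for age, run in enumerate(runs_newest_first)
    ]
    merged = []
    for key, age, value in heapq.merge(*tagged):
        if merged and merged[-1][0] == key:
            continue  # a newer version of this key was already emitted
        merged.append((key, value))
    return merged
```

Merging a newer run `[("a", "2"), ("c", "9")]` with an older run `[("a", "1"), ("b", "5")]` keeps `"2"` for key `a` and drops the stale `"1"`, which is where the space savings come from. Tombstones for deleted keys are not handled here.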