B-tree vs B+ tree in Database Systems

  Рет қаралды 49,680

Hussein Nasser

Hussein Nasser

Күн бұрын

In this episode of the backend engineering show I'll discuss the difference between b-tree and b+tree why they were invented, what problems do they solve, and the advantages and disadvantages of both. I'll also discuss the limitation of implementing b-tree over b+tree and how Discord ran into a memory limitation using b-tree Mongo.
Check out my udemy Introduction to Database Engineering course
husseinnasser.com/courses
Learn the fundamentals of database systems to understand and build performant backend apps
Chapters
0:00 Data structure and algorithms
1:30 Working with large datasets
6:00 Binary Tree
8:30 B-tree
19:30 B+ tree
22:00 B-tree vs B+ tree benefits
25:00 MongoDB Btree Indexes Trouble
30:00 Summary
working with a billion-row table (Members only)
• Best Practices Working...
indexing video
• Database Indexing Expl...
Discord moving from MongoDB to Cassandra
• Why Discord Moved from...
/ discord
MongoDB Indexes
docs.mongodb.com/manual/indexes/
Postgres Indexes
www.postgresql.org/docs/13/bt...
b-tree code from ‪@usfcalifornia‬
www.cs.usfca.edu/~galles/visu...
b+tree
www.cs.usfca.edu/~galles/visu...
Support my work on PayPal
bit.ly/33ENps4
Become a Member on KZbin
/ @hnasr
🧑‍🏫 Courses I Teach
husseinnasser.com/courses
🏭 Backend Engineering Videos in Order
backend.husseinnasser.com
💾 Database Engineering Videos
• Database Engineering
🎙️Listen to the Backend Engineering Podcast
husseinnasser.com/podcast
Gears and tools used on the Channel (affiliates)
🖼️ Slides and Thumbnail Design
Canva
partner.canva.com/c/2766475/6...
🎙️ Mic Gear
Shure SM7B Cardioid Dynamic Microphone
amzn.to/3o1NiBi
Cloudlifter
amzn.to/2RAeyLo
XLR cables
amzn.to/3tvMJRu
Focusrite Audio Interface
amzn.to/3f2vjGY
📷 Camera Gear
Canon M50 Mark II
amzn.to/3o2ed0c
Micro HDMI to HDMI
amzn.to/3uwCxK3
Video capture card
amzn.to/3f34pyD
AC Wall for constant power
amzn.to/3eueoxP
Stay Awesome,
Hussein

Пікірлер: 86
@hnasr
@hnasr 3 жыл бұрын
Check out my udemy Introduction to Database Engineering course db3 database.husseinnasser.com Learn the fundamentals of database systems to understand and build performant backend apps
@neiliwael5536
@neiliwael5536 2 жыл бұрын
hey hussein , can u tell me how to join the channel to get access to all the content ? :D
@zklevsha
@zklevsha 3 жыл бұрын
A+ explanation of B+ tree topic Thank you Hussein
@jetzemeilink
@jetzemeilink Жыл бұрын
Man, hussein the first 40 seconds are so on point. If you really want to learn something new out of curiosity or to solve a problem you're currently facing. You will be driven to understand the concept and way more likely to retain the information and when to apply it.
@gaofan2856
@gaofan2856 Жыл бұрын
Abdul Bari, Tushar Roy, Hussein Nasser, Alex Xu, Andrew Ng - those people are simply legends in explaining convoluted concepts in simple language.
@tyrodev5281
@tyrodev5281 3 жыл бұрын
I'd love if you could find the time to talk about different data structures, their usage, why they were created in the first place... the in-depth stuff because I haven't seen anyone analyze things as critically as you. Thanks!!
@santoshbhatnagar2155
@santoshbhatnagar2155 8 ай бұрын
Really loved how you correlate things. The mongodb example was very eye opening
@Miggleness
@Miggleness 3 жыл бұрын
Database Internals is an excellent book that goes through various algorithm and implementations for storing data.
@wassimboussebha2561
@wassimboussebha2561 2 жыл бұрын
Please I am beginner , I have only a basic knowledge in data structures like trees etc.., I want to learn more about databases, do you suggest any good book before reading databases internals book?
@Miggleness
@Miggleness 2 жыл бұрын
@@wassimboussebha2561 you can just right into Database Internals.
@gelidvoum1207
@gelidvoum1207 3 жыл бұрын
MongoDB tends to have index bloat problems, even with only 2-3 indexes they take up 10%-20% of the original dataset size.
@SheikhYourbutty
@SheikhYourbutty 3 жыл бұрын
Hussein, thank you very much for your time for creating such educative content for advanced topics. I would like to suggest additional topics to cover regarding b-trees. 1) What regulates b-tree depth? 2) And how b-tree behave in case of composite index?
@hnasr
@hnasr 3 жыл бұрын
Great suggestion!
@germanreynaga7256
@germanreynaga7256 2 жыл бұрын
I'm looking all your videos, great job man and thanks for all the information again
@tusharrao6316
@tusharrao6316 Жыл бұрын
People like you motivate me to study hard and develop some skills to be proud of.
@miguelgarciadasilva
@miguelgarciadasilva 3 жыл бұрын
Great content guy, thanks for sharing. Would be great another video discusing LSM Tree
@endchoice1073
@endchoice1073 3 жыл бұрын
i agree with exactly with you on data structers and algorithms learning them for interviews is a burden if you understand how data is stored and memory things it is really good
@vaideshshankar9899
@vaideshshankar9899 2 жыл бұрын
A million dollar advice given by him
@chandeeparora.7165
@chandeeparora.7165 3 жыл бұрын
Thanks for this Hussein!
@jujijiju6929
@jujijiju6929 3 жыл бұрын
Is that a sword in the background? Damn... That looks lifesize lmao
@ayenewyihune
@ayenewyihune 8 ай бұрын
Very clear and interesting, thanks
@SunnyGuptaTech
@SunnyGuptaTech 3 жыл бұрын
Really Awesome Nasser
@Telenisme
@Telenisme 2 жыл бұрын
You are awesome man.. thank you so much for this incredible channel.
@animatedzombie64
@animatedzombie64 3 жыл бұрын
thanks hussein
@AhmedMohamed-zr9kw
@AhmedMohamed-zr9kw 2 жыл бұрын
you're great Hussien
@amalalsaeh4894
@amalalsaeh4894 4 ай бұрын
thank you so much very valuable information
@0xc0ffee_
@0xc0ffee_ Жыл бұрын
MongoDB uses B+ by WiredTiger default storage engine.
@rocklife1802
@rocklife1802 2 жыл бұрын
Really helpful video
@tjalferes
@tjalferes 3 жыл бұрын
Thank you!
@nitindevatraj
@nitindevatraj 3 жыл бұрын
Found your channel through codedamn , I like the backend focused content, any suggestions, resources for learning backend with node, there is lot of content on front end but not much content with depth on backend
@moazmohamedhassanbayoumi4722
@moazmohamedhassanbayoumi4722 Жыл бұрын
Thank you.
@ravisemwal5363
@ravisemwal5363 2 жыл бұрын
16:49, that's what she said. Informative video!
@jhguygih
@jhguygih 12 күн бұрын
Great content. I have one question, where can I find material explaining the pros and cons of each tree? You mentioned range querie on B tree is not as efficient. Why is that? Why we need a find for every node and not collect results once we find the end of the range
@ibrahimbayramov3510
@ibrahimbayramov3510 7 ай бұрын
B stands for "ubiquitous". Reference: Douglas Comer: " The ubiquitous b-tree", ACM Computing surveys, volume 11, number 2, pages 121-137 June 1979 8:36
@krozaine
@krozaine 3 жыл бұрын
Does a B-Tree with value as the pointer to the actual data tuple (and not the whole value) count as B-Tree? It should be small in size but work well enough. The differentiating factor in B+ Trees (IMO) is the sequential pointer to the next node and storage of data in the leaves
@pjcollazo8318
@pjcollazo8318 3 жыл бұрын
1:45 Impeccable Japanese
@kenvat6344
@kenvat6344 3 жыл бұрын
here after completing the DBE course in udemy!
@nitindevatraj
@nitindevatraj 3 жыл бұрын
How was it , any suggestions for learning backend with node
@Miggleness
@Miggleness 3 жыл бұрын
B-trees is a generic term and doesn't imply that content is stored on the middle nodes and not the leaf node. MySQLs InnoDB does use B+tree but refers to it as B-tree. WiredTiger is likely similar
@stormShadow64
@stormShadow64 2 жыл бұрын
Awesome wideo
@Varun2799
@Varun2799 2 жыл бұрын
MongoDB does not infact use B-Trees. They use B+ Trees. I dont know why the docs say it is B tree but if you dig deep enough and read the implementation then you'll find that it is in fact a B+ tree lol. I spent the last 2hours going down this rabbit hole because I am preparing a presentation on database storage systems. But yeah. Hope people reading this in the future will find it helpful :)
@Varun2799
@Varun2799 2 жыл бұрын
And as to why Discord went with Cassandra, My guess is because Cassandra uses LSM trees as their data structure for storage which is a whole different thing from B trees
@fadhilimamkurnia4566
@fadhilimamkurnia4566 3 жыл бұрын
I think mongodb can't easily use B+ tree since the key is relatively unpredictable, compared to postgresdb who usually has ordered primary index (1,2,3,4,....). The insert operation on postgresdb (or other relational db) usually just need to increase the primary index hence we just need to append it in the memory, but when its come to mongodb we can have 1,2,8,9 as our keys then inserting 5 will require us to move many nodes in the tree, because we are inserting in the middle of our "array". Cmiiw
@dimitriborgers9800
@dimitriborgers9800 2 ай бұрын
At 15:37, you mention that in Postgres, the index node is the actual page that holds thousands of elements. Is this mentioned in their docs? I've only heard people say it holds the record pointer, not the actual pages itself
@lakshminarasimmanv
@lakshminarasimmanv 2 жыл бұрын
b+ tree is used in Kubernetes(etcd database)
@engineeranonymous
@engineeranonymous 3 жыл бұрын
Since MongoDB 3.2 Mongodb uses Wiredtiger as default storage engine which can use LSM Tree.
@hnasr
@hnasr 3 жыл бұрын
Thank you! I didn’t know that. More info here source.wiredtiger.com/mongodb-3.4/lsm.html
@danielskrypnik5181
@danielskrypnik5181 3 жыл бұрын
Hey, Hussein, could you please explain one thing? Why b+ three is more efficient for RAM? I guess we don’t store data in leaf nodes in RAM. We store only intermediary nodes, isn’t? Didn’t get your explanation of problem with Discord’s problem of Mongo db.
@hnasr
@hnasr 3 жыл бұрын
B-trees store keys and values in all nodes, which means node take more space. B+trees only store values in leave nodes while it only stores keys in intermediate nodes. This means that nodes with only keys are much more smaller than leave nodes. Which means you can easily fit the intermediate nodes in memory easier while keeping the leave nodes in disk, this makes traversal much more efficient. A luxury that b-tree does have.
@Keerthiprincess2061
@Keerthiprincess2061 Жыл бұрын
If we store indexes in RAM,isn't that volatile?
@NotYourAveragePirate
@NotYourAveragePirate Жыл бұрын
@@Keerthiprincess2061 You only store a copy of the index file in RAM. The entire index file + content will be on the disk
@digvijaysingh6882
@digvijaysingh6882 25 күн бұрын
​@@hnasr but isn't the data you're mentioning here is just references to some tuple... Is it of so large size that it could severely affect performance?
@utsabbanerjee9672
@utsabbanerjee9672 3 жыл бұрын
I have a silly question Hussein. I understood that the InnoDB secondary indexes don't store the tuple ID on the heap but just stores the associated PK index values. The question is, why doesn't it simply store the disk address of the PK index node (i mean the address of the node containing the associated PK values in the btree structure created for the PK index) and from there just fetch the on-disk tuple ID? This way the choice of the PK won't bump up the memory requirement for secondary indexes, since the data type of disk address is fixed length. Please help me clear this doubt.
@hnasr
@hnasr 3 жыл бұрын
This is a basic tradeoff that DBMS systems do. Postgres does as you describe, it stores the direct pointer to the tuple in all indexes, while InnoDB stores the primary key as a pointer in all secondary indexes. There are pros and cons in both approaches for different use cases. While true in MySQL based InnoDB, you have to do to index searches to get to the row, (one in the secondary to find the primary key and another b-tree search on the primary to find the row pointer), the write amplification in InnoDB is significant. Compare this to postgres where when you do a secondary index search you immediately jump to the row tuple, but updating any indexed column in the row must update ALL indexes to point to the new row id. I talked about this in detail in my write-amplification private discord (members only) which you have access to as a member, check it out kzbin.info/www/bejne/oKS6qHmClM6kjK8
@utsabbanerjee9672
@utsabbanerjee9672 3 жыл бұрын
@@hnasr The link you shared and a few other docs, helped me get a better understanding. Thanks.
@saralightbourne
@saralightbourne Жыл бұрын
1:46 my friend, did you really say "thank you ソウーマッチョーですか"😁
@rohitmundada703
@rohitmundada703 2 жыл бұрын
Being a noob at databases here, I just had a question: When the index reaches the size of RAM, who decides to put the other part of index into disk? And how do they decide which part of index should go to disk?
@supersu6138
@supersu6138 2 жыл бұрын
I think memory swapping happens once you are out of memory and OS is responsible for resource management
@bijayjungkarki7369
@bijayjungkarki7369 3 жыл бұрын
👌
@abdulmlikba9452
@abdulmlikba9452 3 жыл бұрын
what is the difference between b+tree and b*(star)-tree
@Music_song_Musurmonov_Mehroj
@Music_song_Musurmonov_Mehroj 3 жыл бұрын
Hussein, can you recommend some KZbin channels about database engineering?
@nurmukhammad_30k
@nurmukhammad_30k Жыл бұрын
kzbin.info/www/bejne/l4vNiqVubNdkmJo
@pushpak6523
@pushpak6523 3 жыл бұрын
is that a samurai? behind
@gunnvant
@gunnvant 2 жыл бұрын
learn by doing
@henrmota
@henrmota 3 ай бұрын
Maybe mongodb uses B-Tree because of the nature of the keys UUID instead of sequential? Rebalancing trees will cost more with UUID? Just asking here, not making any affirmation.
@gunnvant
@gunnvant 2 жыл бұрын
Learn by doing
@abdulmoizsheikh8031
@abdulmoizsheikh8031 2 жыл бұрын
Quick question. Why cant you load parts of Btree in memory, but load parts of a b+tree in memory?
@hnasr
@hnasr 2 жыл бұрын
You can load both data structures in memory. Its just you can load more of the b+tree internal nodes compared to b-tree. Just because of the sheer size
@abdulmoizsheikh8031
@abdulmoizsheikh8031 2 жыл бұрын
@@hnasr ah right. Thanks!
@sumedhasaxena6702
@sumedhasaxena6702 3 жыл бұрын
For a second I got confused on why is CarryMinati talking about B+ trees lol
@HungryEagle2610
@HungryEagle2610 5 ай бұрын
Why is there a katana on the shelf?
@gunnvant
@gunnvant 2 жыл бұрын
Posted from terminal
@lionkiddo
@lionkiddo 3 жыл бұрын
When you are so early that there's no pinned comment.
@numtostr
@numtostr 3 жыл бұрын
If MongoDB using Btree and PgSQL using B+tree. So, this means pgsql reads are faster than mongodb?
@hnasr
@hnasr 3 жыл бұрын
Depends on the size of the index and the nature of queries. if the index is small enough to fit in memory it’s similar performance and its a single lookup than btree/b+tree are similar in performance. If the index is large that it doesn’t fit in memory and queries are range based (eg where date range ) b+tree (postgres) wins for sure. discord ran into severe performance issues with mongo when their btree index couldn’t fit in memory, and switched to Cassandra instead which uses an LSM tree instead.
@lostfrequency89
@lostfrequency89 5 ай бұрын
Wolverine
@IBITZEE
@IBITZEE 2 жыл бұрын
Huummm... the B+Tree looks o me like a "Linked-List" where you mantain some metadata (the paths to the nodes) nevertheless it seems a good strategy for +1M leafs... ;-)
@geghamayvazyan5637
@geghamayvazyan5637 3 жыл бұрын
why DBs are not using hash tables?
@MrYokyScape
@MrYokyScape 3 жыл бұрын
Doesn't scale well when you have millions or billions of rows. Also hard for range queries.
@privacyvalued4134
@privacyvalued4134 2 жыл бұрын
This video was rather useless. The main problems are that it skips from B-trees straight into a complete B+ tree (too much edited out) and also doesn't show how the data would be stored on disk. Hand-waving over the two critical parts means anyone wanting details will just waste their time.
@nurmukhammad_30k
@nurmukhammad_30k Жыл бұрын
kzbin.info/www/bejne/l4vNiqVubNdkmJo Watch the video of this guy. Very well explained how b-tree and b+tree work.
@user-xedwsg
@user-xedwsg 2 жыл бұрын
why do you speak like this?! hahahaha, you're not american my brother
@user-xedwsg
@user-xedwsg 2 жыл бұрын
Nonsense accent
Understanding B-Trees: The Data Structure Behind Modern Databases
12:39
🌊Насколько Глубокий Океан ? #shorts
00:42
Haha😂 Power💪 #trending #funny #viral #shorts
00:18
Reaction Station TV
Рет қаралды 14 МЛН
Database Indexing Explained (with PostgreSQL)
18:19
Hussein Nasser
Рет қаралды 294 М.
The cost of Hash Tables | The Backend Engineering Show
25:26
Hussein Nasser
Рет қаралды 34 М.
Why do databases store data in B+ trees?
29:43
Arpit Bhayani
Рет қаралды 29 М.
10.2  B Trees and B+ Trees. How they are useful in Databases
39:41
Abdul Bari
Рет қаралды 1,1 МЛН
Consistent Hashing | The Backend Engineering Show
23:54
Hussein Nasser
Рет қаралды 39 М.
B-Tree Tutorial - An Introduction to B-Trees
12:20
Fullstack Academy
Рет қаралды 320 М.
Can Redis be used as a Primary database?
12:18
Hussein Nasser
Рет қаралды 50 М.
Low Price Best 👌 China Mobile 📱
0:42
Tech Official
Рет қаралды 717 М.
В России ускорили интернет в 1000 раз
0:18
Короче, новости
Рет қаралды 728 М.
Игровой Комп с Авито за 4500р
1:00
ЖЕЛЕЗНЫЙ КОРОЛЬ
Рет қаралды 1,4 МЛН
💅🏻Айфон vs Андроид🤮
0:20
Бутылочка
Рет қаралды 740 М.