Clustered Collections makes Mongo faster but there is a cost

  22,175 views

Hussein Nasser


A day ago

Fundamentals of Database Engineering udemy course (link redirects to udemy with coupon)
database.husseinnasser.com
In version 5.3, MongoDB introduced a feature called clustered collections, which stores documents in the _id index itself as opposed to the hidden WiredTiger index. This eliminates an entire B+tree seek for reads that use the _id index and also removes the additional write to the hidden index, speeding up both reads and writes.
However, as we know in software engineering, everything has a cost, and this feature comes with a few. In this video I discuss the following:
0:00 Intro
1:00 How Original Collections Work
6:50 How Clustered Collections Work
10:00 Benefits
17:00 Limitations
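
The difference described above can be pictured with a small toy model. This is only a sketch under stated assumptions, not MongoDB's or WiredTiger's implementation: plain dictionaries stand in for B+trees, and the names `ToyTree`, `RegularCollection`, and `ClusteredCollection` are hypothetical. It counts one seek per tree lookup and one write per tree insert.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTree:
    """Stand-in for a B+tree: a dict plus a counter of seeks performed."""
    entries: dict = field(default_factory=dict)
    seeks: int = 0

    def put(self, key, value):
        self.entries[key] = value

    def seek(self, key):
        self.seeks += 1
        return self.entries[key]


class RegularCollection:
    """_id index maps _id -> recordId; a hidden tree maps recordId -> document."""

    def __init__(self):
        self.id_index = ToyTree()
        self.record_store = ToyTree()
        self.next_record_id = 1
        self.writes = 0

    def insert(self, doc):
        record_id = self.next_record_id
        self.next_record_id += 1
        self.record_store.put(record_id, doc)      # write 1: the document
        self.id_index.put(doc["_id"], record_id)   # write 2: the _id entry
        self.writes += 2

    def find_by_id(self, doc_id):
        record_id = self.id_index.seek(doc_id)     # seek 1: _id -> recordId
        return self.record_store.seek(record_id)   # seek 2: recordId -> document

    def total_seeks(self):
        return self.id_index.seeks + self.record_store.seeks


class ClusteredCollection:
    """A single tree keyed by _id holds the documents themselves."""

    def __init__(self):
        self.clustered_index = ToyTree()
        self.writes = 0

    def insert(self, doc):
        self.clustered_index.put(doc["_id"], doc)  # the only write
        self.writes += 1

    def find_by_id(self, doc_id):
        return self.clustered_index.seek(doc_id)   # the only seek

    def total_seeks(self):
        return self.clustered_index.seeks


regular = RegularCollection()
clustered = ClusteredCollection()
for collection in (regular, clustered):
    collection.insert({"_id": 42, "name": "example"})
    collection.find_by_id(42)
```

In this model a lookup by _id costs two seeks in the regular layout and one in the clustered layout, and an insert costs two writes versus one. Real costs depend on tree depth, caching, and page layout, which the sketch ignores.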
Stay Awesome,
Hussein

Comments: 29
@hnasr 1 year ago
Fundamentals of Database Engineering course: database.husseinnasser.com
@ke_mbo 1 year ago
Day 1 of waiting for Hussein to make a video on consensus algorithms
@hnasr 1 year ago
I tried to read into them a few months ago and haven't picked up the pace.
@joshcho96 1 year ago
Thank you so much for your insight every time :) I am learning so much from your videos.
@mohammedabdulbary1577 1 year ago
Another amazing video, love you man ❤
@Ghost_1823 6 months ago
We use clustered indexes heavily in our app. One drawback was our use of UUIDs and creating our own clustered index. Thanks, this video helped us avoid a bottleneck.
@husreason 1 year ago
Can we please get a video on secrets management? Love the breadth of topics you have covered on your channel (thank you so much!), but this topic seems to be missing, so I'd love to learn it from you!
@EddyCaffrey 1 year ago
Great video. It is a great addition to the database.
@tesla1772 1 year ago
Since B-trees are also stored in files and pages, does the DB fetch the entire B-tree when an index scan/seek has to be done?
@adarshk7 1 year ago
About the secondary index being preferred: I could imagine a composite index being more selective, where the extra I/O (more than 2) would cost less than the lost selectivity, maybe more so in range queries. So I guess it depends on your query in the end (and if you wanted custom behaviour you could even go for $hint). What do you think?
@marsha363 1 year ago
Awesome talk as always! Regarding 18:00, why would you want to query with the _id and another filter when the _id is unique? For a kind of "exists" query?
@hnasr 1 year ago
One example is a range query: give me all documents with an _id between 10 and 50 where a certain field has a particular value. If that field is indexed, its index will be preferred over _id.
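
The query described in the reply above might be written as a filter like the one below. This is a minimal sketch: the field name `status`, the sample documents, and the inclusive bounds are assumptions for illustration, and `matches` is a tiny hypothetical matcher for this one filter shape, not a query engine. With PyMongo, a filter of this shape would be passed to `Collection.find()`.

```python
def build_filter(low, high, field_name, value):
    """Build a MongoDB-style filter: _id in [low, high] AND field == value."""
    return {"_id": {"$gte": low, "$lte": high}, field_name: value}


def matches(doc, query):
    """Evaluate only the filter shape produced by build_filter."""
    for key, condition in query.items():
        if isinstance(condition, dict):
            # Range condition on the key (here: _id).
            if not (condition["$gte"] <= doc[key] <= condition["$lte"]):
                return False
        elif doc.get(key) != condition:
            # Equality condition on the extra field.
            return False
    return True


# Hypothetical sample documents.
docs = [
    {"_id": 5, "status": "shipped"},
    {"_id": 12, "status": "shipped"},
    {"_id": 30, "status": "pending"},
    {"_id": 50, "status": "shipped"},
]

query = build_filter(10, 50, "status", "shipped")
result_ids = [doc["_id"] for doc in docs if matches(doc, query)]
```

Both conditions narrow the result, so the planner has a choice between scanning the _id range and using an index on the other field, which is the trade-off discussed in the thread.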
@juliussakalys4684 1 year ago
Whenever possible UUID strings should be converted to binary and stored as binary in the DB itself. This way it takes 16 bytes, compared to "string-stored" 36 bytes.
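
The size difference in the comment above can be checked with Python's standard `uuid` module. This is a small sketch with an arbitrary example UUID; how the 16 bytes are then handed to the database driver is left out.

```python
import uuid

# An arbitrary example value.
example = uuid.UUID("12345678-1234-5678-1234-567812345678")

as_text = str(example)     # canonical hyphenated string form
as_binary = example.bytes  # raw 16-byte form

text_size = len(as_text.encode("ascii"))
binary_size = len(as_binary)

# The binary form converts back to the same UUID without loss.
round_tripped = uuid.UUID(bytes=as_binary)
```

Here `text_size` is 36 and `binary_size` is 16, so the binary form takes less than half the space per key, and that saving applies to every index entry that carries the key.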
@bashardlaleh2110 1 year ago
IDK if my question is valid, but at minute 9:00 it's not clear why you assume that reading a range of IDs from the visible index would be faster than from the hidden index. Why are the chances of those IDs being in one page higher than in the hidden index? Doesn't this depend on how we are writing records? Why would writes to the visible index land next to each other while writes to the hidden index are random?
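
One way to picture the question above is a toy page model. It is a sketch resting on assumptions: the hidden index is taken to be ordered by recordId (arrival order), the clustered layout by _id, `PAGE_SIZE` is an arbitrary value, and the _id values are made to arrive in a scattered order, as random UUIDs would. If _id values arrive in increasing order, the two layouts coincide and the difference disappears, which is the commenter's point about write patterns.

```python
PAGE_SIZE = 4  # documents per page; an arbitrary toy value


def pages_touched(storage_order, low, high):
    """Count distinct pages holding documents whose _id is in [low, high]."""
    pages = {
        position // PAGE_SIZE
        for position, doc_id in enumerate(storage_order)
        if low <= doc_id <= high
    }
    return len(pages)


# _id values arrive in a scattered order (a fixed permutation of 0..19).
insert_order = [(i * 7) % 20 for i in range(20)]

regular_layout = insert_order            # ordered by recordId = arrival order
clustered_layout = sorted(insert_order)  # ordered by _id

regular_pages = pages_touched(regular_layout, 0, 3)
clustered_pages = pages_touched(clustered_layout, 0, 3)
```

In this model the _id range 0 to 3 is spread over 3 pages in the arrival-ordered layout and sits in 1 page in the _id-ordered layout.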
@user-he4bl4tb4h 4 months ago
Can you shard a clustered collection?
@pemessh 1 year ago
Quick question: why did they go with the recordId approach in the first place?
@hrmeet0509 1 year ago
+1 on the same question
@hnasr 1 year ago
If I had to guess, it's technical debt from their original model. When they first shipped MMAPv1, they had a single B-tree with a DiskLoc pointer directly to disk. That model is simple but had a lot of problems, mainly the use of mmap and the lack of full ACID support and MVCC. In 2014 they bought WiredTiger, which had the B-tree with the recordId, so the easiest way to integrate it was to replace the DiskLoc pointer with a recordId and keep the rest of the architecture the same. Otherwise it would have required a major rewrite. It seems they made this big change in 5.3 with clustered collections.
@pemessh 1 year ago
@hnasr I see. That's interesting. Thank you for the answer.
@burunkul 1 year ago
Why won't the MongoDB team make the clustered index the default?
@hnasr 1 year ago
I envision it being the default in a few years, once they iron out the bugs and limitations, which will make it close to MySQL InnoDB.
@oddym5788 1 year ago
Where did your books and sword go :(
@hnasr 1 year ago
I moved office, they are on my side now 😄
@JinKee 1 year ago
Why is SQL so much faster than NoSQL?
@stevefox7418 1 year ago
Indexing, structured data, etc.
@Aditya24234 1 year ago
That depends a lot on your workload. MongoDB can certainly outperform SQL by a huge margin, provided you have designed a schema that suits NoSQL, and similarly there will be certain workloads where SQL runs faster. A big chunk of that performance also depends on the configuration and the type of deployment you are running.
@tonyhart2744 1 year ago
You mean the other way around? The most scalable databases on the planet use NoSQL: Vitess, Cassandra, ScyllaDB, etc.
@jenkins9202 1 year ago
In general it's the opposite, unless you're abusing NoSQL they should outperform any SQL database due to having relaxed ACID guarantees. You'll find most big tech companies had to eventually migrate to a NoSQL database because of SQL being a performance bottleneck when you're at a massive scale, e.g. Twitter, Facebook, Instagram etc. Of course it all depends on your domain, some use-cases require strong consistency guarantees with relational data which doesn't leave you with much choice but to use an RDBMS.