Clustered Collections makes Mongo faster but there is a cost

Views: 23,106

Hussein Nasser

Day ago

Comments: 30

@hnasr 1 year ago

fundamentals of database engineering course database.husseinnasser.com

@Ghost_1823 1 year ago

We use clustered indexes heavily in our app. One drawback was using UUIDs and creating our own clustered index. Thanks, this video helped us avoid a bottleneck.

@ketembo 1 year ago

Day 1 of waiting for Hussein to make a video on consensus algorithms

@hnasr 1 year ago

I tried to read into them a few months ago and haven't picked up the pace.

@husreason 1 year ago

Can we please get a video on secrets management? Love the breadth of topics you have covered on your channel (thank you so much!), but this topic seems to be missing, so I'd love to learn it from you!

@joshcho96 1 year ago

Thank you so much for your insight every time :) I am learning so much from your videos.

@EddyCaffrey 1 year ago

Great video. It is a great addition to the database.

@juliussakalys4684 1 year ago

Whenever possible UUID strings should be converted to binary and stored as binary in the DB itself. This way it takes 16 bytes, compared to "string-stored" 36 bytes.
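
The size difference mentioned above can be checked with a minimal sketch using only Python's standard `uuid` module. The variable names are illustrative, and how the 16 bytes are handed to the database (for example as a BSON binary value) depends on the driver and is not shown here.

```python
import uuid

u = uuid.uuid4()

as_text = str(u)      # canonical hyphenated form, e.g. "xxxxxxxx-xxxx-..."
as_binary = u.bytes   # the raw 128-bit value

text_size = len(as_text.encode("ascii"))   # bytes needed for the string form
binary_size = len(as_binary)               # bytes needed for the binary form

# The binary form loses nothing: the original UUID can be rebuilt from it.
round_trip = uuid.UUID(bytes=as_binary)

print(text_size, binary_size, round_trip == u)
```

BSON defines a dedicated binary subtype (4) for UUIDs, so the 16-byte form is a natural fit for MongoDB when the driver is configured for it.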

@mohammedabdulbary1577 1 year ago

another amazing video, love you man ❤

@tesla1772 1 year ago

Since B-trees are also stored in files and pages, does the DB fetch the entire B-tree when an index scan/seek has to be done?

@ВоробійВіталій 2 months ago

great, thx

@adarshk7 1 year ago

About the secondary index being preferred, I could imagine a composite index being more selective, where the > 2 IO would be less of a cost than the lost selectiveness. Maybe more so in range queries. So I guess it depends on your query in the end (where if you wanted custom behaviour you could even go for $hint). What do you think?

@pemessh 1 year ago

Quick question, why did they go with the recordid way in the first place?

@hrmeet0509 1 year ago

+1 on the same question

@hnasr 1 year ago

If I had to guess, it's technical debt from their original model when they first shipped MMAPv1. They had a single B-tree with a DiskLoc pointer directly to disk. That model was simple but had a lot of problems, mainly the use of mmap and the lack of full ACID support and MVCC. In 2014 they bought WiredTiger, which had the B-tree with the recordId, so it was easier to integrate by replacing the DiskLoc pointer with a recordId and keeping the rest of the architecture the same. Otherwise it would have required a major rewrite. It seems they made this big change in 5.3 as clustered collections.
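
A toy sketch of the two layouts discussed in this thread, not MongoDB's or WiredTiger's actual code. Plain Python dicts stand in for B-trees, and the class and attribute names (`RecordIdCollection`, `ClusteredCollection`, `id_index`, `records`) are made up for illustration. It only shows why a lookup by `_id` needs two steps with a recordId and one step when documents are stored under the `_id` itself.

```python
class RecordIdCollection:
    """Regular collection model: _id index -> recordId -> document."""

    def __init__(self):
        self.id_index = {}   # stands in for the _id B-tree: _id -> recordId
        self.records = {}    # stands in for the recordId-keyed B-tree
        self._next_rid = 1

    def insert(self, doc):
        rid = self._next_rid
        self._next_rid += 1
        self.id_index[doc["_id"]] = rid
        self.records[rid] = doc

    def find_by_id(self, _id):
        rid = self.id_index[_id]       # lookup 1: _id index
        return self.records[rid], 2    # lookup 2: recordId -> document


class ClusteredCollection:
    """Clustered collection model: documents stored directly under _id."""

    def __init__(self):
        self.documents = {}  # stands in for the single clustered B-tree

    def insert(self, doc):
        self.documents[doc["_id"]] = doc

    def find_by_id(self, _id):
        return self.documents[_id], 1  # single lookup


regular = RecordIdCollection()
clustered = ClusteredCollection()
for coll in (regular, clustered):
    coll.insert({"_id": "a1", "name": "example"})

doc_r, lookups_r = regular.find_by_id("a1")
doc_c, lookups_c = clustered.find_by_id("a1")
print(lookups_r, lookups_c)
```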

@pemessh 1 year ago

@@hnasr I see. That's interesting. Thank you for the answer.

@marsha363 1 year ago

Awesome talk as always! Regarding 18:00, why would you want to query with the _id and another filter when the _id is unique? For a kind of "exists" check?

@hnasr 1 year ago

One example is a range query: give me all documents with an id between 10 and 50 where a certain field has a particular value. If that field is indexed, it will be preferred over the id.
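
The query shape described here could look like the filter document below. This is a sketch under assumptions: the field name `status`, its values, and inclusive bounds are invented for illustration, and the small `matches` helper is a stand-in for the server that only understands `$gte`, `$lte`, and plain equality. It says nothing about which index the planner would pick.

```python
# Hypothetical data: 100 documents, every third one marked "active".
docs = [
    {"_id": i, "status": "active" if i % 3 == 0 else "archived"}
    for i in range(1, 101)
]

# Range on _id combined with an equality filter on another field.
query = {"_id": {"$gte": 10, "$lte": 50}, "status": "active"}


def matches(doc, query):
    """Tiny stand-in for query evaluation ($gte, $lte, equality only)."""
    for field, cond in query.items():
        value = doc.get(field)
        if isinstance(cond, dict):
            if "$gte" in cond and not value >= cond["$gte"]:
                return False
            if "$lte" in cond and not value <= cond["$lte"]:
                return False
        elif value != cond:
            return False
    return True


result = [d["_id"] for d in docs if matches(d, query)]
print(result)
```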

@EddyWilson-k3c 9 months ago

Can you shard a clustered collection?

@bashardlaleh2110 1 year ago

IDK if my question is valid, but at minute 9:00 it's not clear why you assume that reading a range of IDs from the visible index would be faster than from the hidden index. Why are the chances of those IDs being in one page higher than in the hidden index? Doesn't this depend on how we are writing records? Why are writes to the visible index next to each other but random in the hidden one?!
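
One rough way to picture this question is a small simulation, under loudly stated assumptions: `_id` values arrive in random order (as random UUIDs would), the recordId is assigned in arrival order, and a page holds a fixed number of documents. The page size, document count, and function names are all made up. If ids arrive in ascending order instead, the two layouts look much the same, which is the point the question raises.

```python
import random

PAGE_SIZE = 50   # documents per page (assumed)
N = 10_000       # total documents (assumed)

random.seed(42)
arrival_order = list(range(N))
random.shuffle(arrival_order)  # _id values arrive in random order

# Regular collection: recordId follows arrival order, and documents
# are laid out by recordId.
record_id_of = {_id: rid for rid, _id in enumerate(arrival_order)}


def pages_touched_record_id(lo, hi):
    """Distinct document pages read for _id in [lo, hi] via recordId."""
    return len({record_id_of[_id] // PAGE_SIZE for _id in range(lo, hi + 1)})


def pages_touched_clustered(lo, hi):
    """Distinct pages read when documents are laid out in _id order."""
    return len({_id // PAGE_SIZE for _id in range(lo, hi + 1)})


hidden_pages = pages_touched_record_id(1000, 1099)
clustered_pages = pages_touched_clustered(1000, 1099)
print(hidden_pages, clustered_pages)
```

In this model the 100-id range sits on two adjacent pages in the clustered layout, while the same ids are scattered across many pages when they are stored by arrival order.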

@burunkul 1 year ago

Why won't the MongoDB team make a clustered index the default?

@hnasr 1 year ago

I envision it being the default in a few years once they iron out the bugs and limitations, which will make it close to MySQL InnoDB.

@oddym5788 1 year ago

Where did your books and sword go :(

@hnasr 1 year ago

I moved office, they are on my side now 😄

@JinKee 1 year ago

Why is SQL so much faster than NoSQL?

@stevefox7418 1 year ago

Indexing, structured data etc.

@Aditya24234 1 year ago

That depends a lot on your workload. MongoDB can certainly outperform SQL by a wide margin provided that you have designed a schema that fits NoSQL, and similarly there will be certain workloads where SQL runs faster. A big chunk of that performance also depends on the configuration and the type of deployment you are running.

@tonyhart2744 1 year ago

You mean the other way around? The most scalable databases on the planet use NoSQL: Vitess, Cassandra, ScyllaDB, etc.

@jenkins9202 1 year ago

In general it's the opposite, unless you're abusing NoSQL they should outperform any SQL database due to having relaxed ACID guarantees. You'll find most big tech companies had to eventually migrate to a NoSQL database because of SQL being a performance bottleneck when you're at a massive scale, e.g. Twitter, Facebook, Instagram etc. Of course it all depends on your domain, some use-cases require strong consistency guarantees with relational data which doesn't leave you with much choice but to use an RDBMS.