Avoid premature Database Sharding

Рет қаралды 16,089

Күн бұрын

Пікірлер: 61

@hnasr 3 жыл бұрын

Learn the fundamentals of database systems to understand and build performant backend apps Check out my udemy Introduction to Database Engineering husseinnasser.com/courses

@eslama-elwafa2 3 жыл бұрын

summary: "only solve problems that you have, don't solve the problem you don't have"

@hnasr 3 жыл бұрын

🍑 Hussein 🍑 Hussein 🍑 Hussein 🍑 Hussein

@hardikp5192 3 жыл бұрын

My answer to "what's going on guys" is, It's my first day on the job, I got an Internship in a startup company, BIG THANKS to you sir, because of you I became passionate about Backend Engineering, so, I'm doing great and I'm grateful. Bless you and your family

@sathyajithps013 3 жыл бұрын

Thanks for doing this video Hussein. As a new backend dev, I've always wondered about this and no one has answered it in detail as you have.

@hazemabdelalim5189 3 жыл бұрын

Hello Hussein , I am working with Couchbase Database which can update the indexes asynchronously , so even if you have indexes insertion will be still O(1) , however it will take time for the index tree to be re-built but it will be re-balanced eventually .

@hnasr 3 жыл бұрын

Interesting Hazem! Thanks for sharing

@AleksandarT10 3 жыл бұрын

You got a good point Hussein. Partitioning/Sharding does not make sense for most of the use cases. Where it makes sense as you said is when you have Billions of Rows and Terabytes of data (which only Big Enterprises do). I work for a Big ECommerce Company Headquartered in Silicon Valley and before migrating to Cloud, we were using Oracle DB with Sharding. We were using several physical servers where we had the UserId going through a "Hashing" Function which gave us the number of the shard. Based on our query "patterns" analysis, we decided that we can go to "DynamoDB" (NoSQL) and we moved the data there in a single table. DynamoDB Internally uses partitioning/sharding itself, but this is not visible to the end user, which means that the complexity of that gets offloaded to AWS.

@hnasr 3 жыл бұрын

Thanks for sharing ! Great use case

@TrippMotil 5 күн бұрын

Great video as always! 👍 I’ve got a question: 🤨 I have a set of words 🤷‍♂️. (behave today finger ski upon boy assault summer exhaust beauty stereo over). What should I do with this? 🤷‍♂️

@LearnToCodeAcademy 3 жыл бұрын

Just purchased the course... I love your contents honestly... They are beautiful

@hnasr 3 жыл бұрын

Thank you so much ❤️

@FISS007 3 жыл бұрын

KZbin, just give me the possibility to give more than one like !!! This guy deserves tons of likes :D

@isaacfrost9798 3 жыл бұрын

I think using hash Partitioning is a good idea if your table has the potential of adding up more rows on the client growth for example a URL shorter table.

@javisartdesign 3 жыл бұрын

Thanks for the video. The first statement is not totally right, inserts with lot of constraints and foreign keys with hundred or millions can be also slow.

@Flankymanga 3 жыл бұрын

16:03 YAGNI priniple and KISS... implement the advanced stuff only when you know you are going to need it....

@hnasr 3 жыл бұрын

Yes! I talked about YAGNI here kzbin.info/www/bejne/sHm2k4CbZpqWaLM

@mihailnesterov 3 жыл бұрын

Hussein, thanks for your videos!

@colinmaharaj 3 жыл бұрын

I want to write my own database engine. Only update the index for crying out loud, if and only iff you do a select query. Or you can have a sub index as a temporary thing then you merge after a while

@hnasr 3 жыл бұрын

I see where your going but I don’t know if that will work. Your select will be slower and more importantly it will have to acquire row exclusive locks on the parts of the index that needs to get updated. This will slow down things even more because of waiting events.

@colinmaharaj 3 жыл бұрын

@@hnasr Hmm, I hear you. Ill figure it out one day.

@romantsyupryk3009 3 жыл бұрын

Thanks very much for this video.

@cse2335 3 жыл бұрын

Hey Nasser, Which one we should prefer ORM OR Non-ORM(Query) can you light on performances and complexities! Thank you🙏

@hnasr 3 жыл бұрын

Spoke about that before, I prefer raw SQL where I have full control, ORM can introduce extra layer of complexity and invisibility to what really is happening..

@cse2335 3 жыл бұрын

@@hnasr Okay Thanks!

@nishantkamboj5973 2 жыл бұрын

Hey Hussein. I have a question. If we have a clustered index for email column or any other non auto incremented column in MySQL then the write operation will need to find the correct place to insert the row. Is it correct? Is so then I think that's another reason to use auto incremented primary keys for the tables.

@hnasr 2 жыл бұрын

That is correct, the moment you cluster on a "random" field like email or UUID, you risk slow down inserts because the db need to find where to insert this new row so it fits the clustered page. You also risk something called page splits which drastically decrease performance. unclustered tables don't have this problem, everything goes in the end unsorted.

@javaprogramming7235 7 ай бұрын

I want to ask following this question, does it mean to create PK based on some information from the record would help in performance of querying For example Inserting a new subscription record in database we can create pk something like this SUB_02022024_HASH does this will help in querying at the cost time the server will take to create this PK?

@mudrRock 3 жыл бұрын

Great video, also, not related to the video but still wanted to ask it. Do you have some sort of list with recommended books?

@dibyaranjanmishra4272 3 жыл бұрын

Can you make a video on Rocks DB?

@hnasr 3 жыл бұрын

Good idea

@bisakhmondal8371 3 жыл бұрын

If possible, HyPer as well in future :)

@whotookmyign 3 жыл бұрын

Hussein bro please make a video on ,everything about live video streaming app. how can we implement it and what would be the best approach and which data base would be the best. for both live video streaming and normal video streaming like twitch and netflix . please explain about the complete backend architecture with example thank you. i am pretty sure it would help a lot of people. theres not much insightful video out there about this topic on youtube

@hnasr 3 жыл бұрын

This should help kzbin.info/www/bejne/Z16unn-ZZchlqc0 kzbin.info/www/bejne/pqmTmmyfncidq68

@whotookmyign 3 жыл бұрын

@@hnasr omg how did i not see it 😳 tysmmm ☺☺ absolutely love your content

@ThinkSoftware 3 жыл бұрын

Amazing video :)

@saiavinashduddupudi8975 3 жыл бұрын

@Hussein at 00:58, what do you mean by a VANILLA INSERT?

@hnasr 3 жыл бұрын

Inserting a single row where all values are in the statement is simple doesn’t have performance impact (vanilla insert) However you can do an insert that queries the table (insert into select) that has huge impact on the size of the table

@saiavinashduddupudi8975 3 жыл бұрын

@@hnasr got it...it means a simple insert at end of table

@0xfaizan 3 жыл бұрын

Plz make short videos of about 5 to 8 minutes

@mugiwaranoadi5932 3 жыл бұрын

Why should he? What are the benefits for him and his viewers? Give him context, otherwise it only seems like a selfish wish.

@teunohooijer6788 3 жыл бұрын

is this the same lsm tree as the one on Wikipedia? that one was invented in 1991 an published in a paper in 1996. seems like it can't be a google invention then. love your videos.

@hnasr 3 жыл бұрын

Correct the original paper is 1996, but google pushed the tech with Leveldb and made it practical

@teunohooijer6788 3 жыл бұрын

@@hnasr that explains why you say google, a database in use is of course practical. thanks for the reply.

@sherbineyb7166 3 жыл бұрын

hussein , is it possible to update indexes asynchronously or it is set by default synchronous and cant be changed ? i understand the drawbacks of this , but is this even possible ?

@hnasr 3 жыл бұрын

Technically it’s possible of course but I am not sure if any database engine exposes such feature. Its even more efficient because you can batch those index updated and do them at once. As long as the engineer understands the consequences its a complex thing to implement though

@Nexjsdeveloper 3 жыл бұрын

Its amaizing

@tikz.-3738 3 жыл бұрын

Hey Hussein I have a query. Say I'm building a app. There are many topics and those topics contain different content . The user subscribe to a topic now I only want to fetch the stuff that user doesn't have means newly added content how can that be done? The way I think it is to compare array of content ids(on user and inside topic table) by length for each topic and then fetch the remaining content but this sounds very bad due to multiple query per every user. Im using mongoDB which does support look ups to different tables should that be needed

@hnasr 3 жыл бұрын

I would suggest using a queue or a pub/sub system such as RabbitMQ or Kafka. This feature is there by default

@tikz.-3738 3 жыл бұрын

@@hnasr so basically I'm new to this stuff so I was thinking this whole method itself is slow but as u say I need to use this only right I tried for days but I couldn't find a better solution, even tried aggregation pipelines but all require me to do multiple queries and finding diffrence of arrays

@tikz.-3738 3 жыл бұрын

@@hnasr I have seen ur videos on all those systems but as u say I wanted to keep it as simple as I can so I thought to myself the method to query itself is soo bad I must try redesigning but I can't get a better idea on model I thought of doing some courses on data modelling 😂

@tikz.-3738 3 жыл бұрын

Also one more follow up question the apps supposed to inform users about new contents as notification websockets would be expensive for my app(I think) so I thought about short polling every hr(set of users come and ask in 1hr and then another set comes) or something by doing some distribution of users with latitude and time zone on client (no idea how to do it) just a thought yet Thanks for ur replies and time ❤️

@willi1978 3 жыл бұрын

@@tikz.-3738 queuing system (kafka, rabbitmq) or periodic polling by clients are possible. I guess if you already have an existing app and want to add near time functionality then adding polling is easier. but this will mean you have periodic requests for each online user. you have to weigh the pros and cons of the two approaches.

@hnasr 3 жыл бұрын

If you enjoyed this video consider checking out my Introduction to Database Engineering course. Its a bestseller on udemy with over 14 hours worth of content and students love it ! husseinnasser.com/courses