Database Sharding and Partitioning

60,996 views

Arpit Bhayani

1 day ago

System Design for SDE-2 and above: arpitbhayani.me/masterclass
System Design for Beginners: arpitbhayani.me/sys-design
Redis Internals: arpitbhayani.me/redis
Build Your Own Redis / DNS / BitTorrent / SQLite - with CodeCrafters.
Sign up and get 40% off - app.codecrafters.io/join?via=...
In the video, I discussed the importance of sharding and partitioning in scaling systems. Sharding distributes data across multiple machines for improved throughput and availability. We explored how databases evolve through stages, the differences between sharding and partitioning, and when to introduce these concepts. I also highlighted the benefits of a collaborative system design course I offer. Scaling databases vertically involves increasing resources, while horizontal scaling adds more servers for higher throughput. Sharding splits data across shards, while partitioning divides data within a shard. Strategic partitioning is crucial for efficient data management.
Recommended videos and playlists
If you liked this video, you will find the following videos and playlists helpful
System Design: • PostgreSQL connection ...
Designing Microservices: • Advantages of adopting...
Database Engineering: • How nested loop, hash,...
Concurrency In-depth: • How to write efficient...
Research paper dissections: • The Google File System...
Outage Dissections: • Dissecting GitHub Outa...
Hash Table Internals: • Internal Structure of ...
Bittorrent Internals: • Introduction to BitTor...
Things you will find amusing
Knowledge Base: arpitbhayani.me/knowledge-base
Bookshelf: arpitbhayani.me/bookshelf
Papershelf: arpitbhayani.me/papershelf
Other socials
I keep writing and sharing my practical experience and learnings every day, so if you resonate then follow along. I keep it no fluff.
LinkedIn: / arpitbhayani
Twitter: / arpit_bhayani
Weekly Newsletter: arpit.substack.com
Thank you for watching and supporting! It means a ton.
I am on a mission to bring out the best engineering stories from around the world and make you all fall in love with engineering. If you resonate with this, then follow along; I always keep it no-fluff.

Comments: 135
@visheshjindal1073 4 days ago
I have just started learning System Design and I am not from a backend background, yet I am still able to understand the concepts. Thanks for creating such content.
@vipulsharma5140 1 day ago
This is such a great and simple explanation of partitioning and sharding of a database. Would love to watch the video on partitioning strategies when it is uploaded.
@shishirchaurasiya7374 11 months ago
I was literally confused and struggling to gain clarity until you reached the point where you turned the theory into understanding through tables and references to SQL queries. Thanks a lot for your effort and for this lovely, beautiful explanation, Arpit sir.
@ranjithpals 2 years ago
Thanks a lot! That was well explained, clear and concise. Looking forward to enrolling in your complete system design course.
@nuclearniraj 9 months ago
One video and all the clutter around sharding and partitioning is cleared. Thank you so much, Arpit.
@jaskiratwalia 3 months ago
Wonderfully explained! Cleared all my doubts. Please keep making such videos. They are also well timed, neither too short nor too long.
@AlokMehta24 9 months ago
Excellent video, Arpit. Coming from a non-software, non-systems-engineering background, this was the best video explaining data sharding and partitioning. I am a Tech PM for AWS Supply Chain, and data partitioning and sharding is a real deal for us. Thanks for making this extremely easy-to-understand video.
@neerajdixit7102 1 year ago
Awesome, Arpit. Thanks, I truly admire your way of teaching.
@nimitkanani1691 1 year ago
Very beautifully and simply explained. The content of the video flowed so smoothly. Thank you @ArpitBhayani
@___vandanagupta___ 1 year ago
The amount of knowledge in this video is tremendous!!! Extremely helpful 👍👍👍 Thank you, sir!!
@chaitanyawaikar382 2 years ago
One of the best videos explaining the nuances between partitioning and sharding. Thank you @ArpitBhayani
@AqibJavaid-zl7vc 2 months ago
Excellent video ❤. Finally, I got a good grasp of the whole concept.
@jithinb7047 10 months ago
Awesome content, Arpit! Thanks a lot, and please do continue to post more on concepts such as these, as well as analysis of real use cases.
@Jamsessions0 1 month ago
One of the best explanations on the internet, well done sir.
@kritibindra4232 1 year ago
Wow, this was really, really helpful! Thank you for posting this.✨
@timamet 1 year ago
Amazing explanations, thank you.
@codecspy3479 6 months ago
Two important points which I felt could be discussed more: 1) When you said the choice of partitioning depends on the load, use case, and access patterns, can you please give an example of each case? 2) When you were talking about the advantages and disadvantages of sharding, did you write those points considering only sharding with no partitioning, or considering both sharding and partitioning?
@aditijalaj5036 9 months ago
This is an amazing video and your explanations are very clear.
@mohitkumartoshniwal 2 years ago
A very clear and detailed explanation. ♥️
@sameer1571 5 months ago
Bro, your diagram example made my day. Such a clear and concise explanation of this topic. Bro, love you from the heart ❤❤ for making this video.
@iMakeYoutubeConfused 3 months ago
Very clear explanation, thanks!
@anandahs6078 2 months ago
Very good explanation with the right examples. Hats off to you. Thanks for the great content. I always thought shards and partitions were the same, but you clarified it very well.
@nikhilrajput8696 2 months ago
Wow... really nice. Nowadays a lot of people are selling and talking about system design, and they always jump straight to some optimistic solution without going into the internals; in fact, many have not even worked on a lot of systems. I strongly feel your way of explaining is very nice, and I am going to buy your system design plan to improve mine.
@AsliEngineering 2 months ago
Thanks. Looking forward to having you enrolled 🙌
@vamsidharvemuluri3817 2 months ago
Best explanation so far. Thanks, brother.
@varshard0 4 months ago
Thank you. I always assumed that they were the same thing. This cleared things up for me.
@pixiedustdreams 1 month ago
I think I'm in love with this guy. 😢
@KishoreThatavarthi 4 months ago
Thanks a lot, Arpit sir. I really enjoyed it and got full clarity.
@zeyuli53 1 year ago
Well explained, thank you.
@Sharmasurajlive 1 year ago
Simple and efficient explanation 👍🏻
@jasper5016 3 months ago
Thanks so much, Arpit!!
@DEEPAKKUMAR-wk5pk 1 year ago
Wow, great explanation.
@KriszSch 2 months ago
Great explanation!
@akshayrahangdale8511 6 months ago
Very nice video, I just loved the explanation.
@kaal_bhairav_23 2 months ago
Thanks a lot, Arpit, for an awesome explanation as always.
@TechSpot56 2 months ago
Nice explanation, Arpit.
@kalinduabeysinghe8917 10 months ago
Such a clean explanation🙌
@pramodpatil-ue8sm 7 months ago
Great explanation, as always. Please post a link if you have recorded any video on partitioning strategies.
@dhaanaanjaay 1 year ago
One question: at 21:00 the matrix shows what it looks like when we have both sharding and partitioning. How is that different from having two databases on two different EC2 instances for two applications?
@hanzalasiddique6313 1 year ago
Mind-blowing ❤
@vijaymunavalli335 1 year ago
It's a very practical explanation... cool one.
@anshujaiswal5622 1 month ago
Simple and to-the-point explanation. Thanks, Arpit. Liked & subscribed :)
@letsexplorewithanika2642 1 year ago
Very clear explanation.
@prashantkamble898 10 months ago
Greatly explained.
@lazry1773 1 year ago
Dude, this was amazing.
@aneksingh4496 9 months ago
Super video, Arpit.
@shintojoseph9166 1 year ago
Clear explanation.
@ryan-bo2xi 11 months ago
Very good, brother... outstanding.
@PoojaDurgi 8 months ago
Amazing!!
@heykalyan 1 year ago
Kudos to you❤
@shreyanshsinha37 1 year ago
When we say Shard1 or Shard2, do we mean the SQL server and the EC2 instance it is hosted on, taken together as a shard?
@amananurag07 1 month ago
@arpit Thanks for packing such dense information into such a short and simple video. However, I have a query about a corner case: how can we have replicas when there are multiple shards with partitioning? In this case, is replication local to the shard, or can data also be replicated onto other shards for high availability across availability zones or for DR (like the Kafka architecture)?
@ankitmaheshwari2341 11 months ago
Do we use sharding when we have better options available, like Oracle RAC, where the database can be scaled horizontally?
@jivanmainali1742 2 years ago
Arpit sir, I need your help clarifying a few doubts. In an e-commerce platform like Shopify, each merchant could be given their own collections for order, cart, and account, differentiated by some merchant identifier (projectId-order), versus a single order table indexed by the merchant identifier, i.e. projectId. So we can't apply sharding in the first case. Also, is it a wise idea to deploy each merchant application separately, given that we would have to maintain each merchant app separately? What do you suggest in those cases?
@sarthaknarayan2159 1 year ago
Awesome!!!!
@sumeetsingh1729 3 months ago
How is it decided which shard is hit by a request? Is there a router in front that routes the requests?
@ranjithpals 2 years ago
Thanks!
@pranjalchoudhury1670 4 months ago
Nicely explained. :)
@hemsagarpatel8992 1 year ago
If we have horizontal partitioning and one partition is getting a lot of traffic in real time, how can we load balance the traffic? Is it possible?
@vikasbhutra9400 2 years ago
Thanks a lot, Arpit, for explaining it in such a simple way. One request: can you please make a video on sharding strategies and also on how composite indexes are stored on disk?
@AsliEngineering 2 years ago
Soon.
@hc90919 1 year ago
@asli engineering - Brother, any update on the sharding strategies? Also, one more request: examples of scenarios to explain shard key selection. And how is the data replicated behind the scenes, please?
@tawseefbhat977 1 year ago
How do we know which partition or shard our data is located in when we make a query? Any detailed explanation?
@user-dq8sg4ik5k 10 months ago
Literally one of the best videos I have ever seen on this topic.
@rahulpanjwani1887 1 year ago
Beautiful
@rahulpanjwani1887 1 year ago
It makes you understand the value of a unified data platform team when scale increases.
@ohmygosh6176 1 year ago
Cross-shard queries are very, very expensive. It's best to use tools to find out how the database is being used before making these decisions. I use the PG Analizer tool for PostgreSQL.
@aditigupta6870 4 months ago
Hello Arpit, at 5:49, why did you mention that the new resources are allocated to the EC2 machine? I think they should be allocated to the DB server running on the EC2 machine, right?
@AsliEngineering 4 months ago
I meant the server running the database. The database is eventually running on some VM.
@aditigupta6870 4 months ago
@AsliEngineering Thanks, Arpit.
@GaneshSrivatsavaGottipati 1 month ago
What if we have read replicas and still have partitioning?
@aditiagarwal7081 1 month ago
When running two databases on the same machine, are we not still sharing the same underlying resources such as CPU, memory, and disk I/O?
@likith1337 6 days ago
How can you run two SQL daemons on the same machine?
@kritibindra4232 1 year ago
Also, which software did you use in this video to create pictures and write content?
@AsliEngineering 1 year ago
GoodNotes
@imperfecto7734 9 months ago
@arpit What's the benefit of partitioning the data but not sharding it? Can you give me a use case, please?
@AsliEngineering 9 months ago
Partitioning allows your database to read/access/move the required subset of data easily and efficiently.
1. Imagine you partition data by time and create one partition for every hour. If someone queries how many events happened in the last 10 hours, you would just need to access the last 10 partitions to fulfil this query. The others do not even need to be read.
2. In a distributed setup, instead of moving individual rows/elements, we can easily and efficiently move partitions across the cluster to balance the load.
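The hourly-partition example in the reply above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `HourlyPartitionedEvents`, `count_recent`, and the sample timestamps are hypothetical names and data, the partitions are plain in-memory lists, and "last 10 hours" is approximated as the 10 most recent hourly buckets. A real database would do this pruning inside its query planner.

```python
from collections import defaultdict
from datetime import datetime, timedelta


class HourlyPartitionedEvents:
    """Toy event store that keeps one in-memory partition per hour (hypothetical)."""

    def __init__(self):
        # Partition key is the timestamp truncated to the start of its hour.
        self.partitions = defaultdict(list)

    @staticmethod
    def _hour_bucket(ts: datetime) -> datetime:
        return ts.replace(minute=0, second=0, microsecond=0)

    def insert(self, ts: datetime, payload: str) -> None:
        self.partitions[self._hour_bucket(ts)].append((ts, payload))

    def count_recent(self, now: datetime, hours: int):
        """Count events in the `hours` most recent hourly partitions.

        Returns (event_count, partitions_read); older partitions are never read.
        """
        current = self._hour_bucket(now)
        event_count = 0
        partitions_read = 0
        for i in range(hours):
            key = current - timedelta(hours=i)
            if key in self.partitions:  # membership test does not create a partition
                partitions_read += 1
                event_count += len(self.partitions[key])
        return event_count, partitions_read


store = HourlyPartitionedEvents()
now = datetime(2024, 1, 1, 12, 30)
sample_timestamps = (
    datetime(2024, 1, 1, 12, 10),
    datetime(2024, 1, 1, 11, 5),
    datetime(2024, 1, 1, 3, 59),
    datetime(2024, 1, 1, 2, 15),    # falls outside the 10 most recent buckets
    datetime(2023, 12, 31, 12, 0),  # a day old, also outside
)
for ts in sample_timestamps:
    store.insert(ts, "event")

recent_count, partitions_read = store.count_recent(now, hours=10)
print(recent_count, partitions_read)  # 3 3
```

Five partitions exist in this toy store, but the query reads only the three that fall inside the window, which is the efficiency the reply describes.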
@imperfecto7734 9 months ago
Understood! Thanks 🙏
@aditigupta6870 4 months ago
Each shard must also have replicas, right? I mean, if a shard is handling the first 2 partitions, then all data from those partitions goes to this shard, but what if the shard is down?
@AsliEngineering 4 months ago
A shard can have replicas to scale reads. If the shard goes down, you either auto-promote a replica to take over or take the downtime.
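The two outcomes mentioned in the reply above (promote a replica, or accept downtime) can be modelled roughly as follows. This is a minimal sketch, not how any particular database implements failover: the `Shard` class, the node names, and the `fail_primary` method are all hypothetical, and real systems also have to deal with replication lag, fencing of the old primary, and leader election.

```python
class Shard:
    """Toy model of one shard with a primary node and optional read replicas."""

    def __init__(self, name, primary, replicas=()):
        self.name = name
        self.primary = primary
        self.replicas = list(replicas)

    def is_writable(self) -> bool:
        return self.primary is not None

    def fail_primary(self, auto_promote: bool = True):
        """Simulate the primary going down.

        With auto_promote and at least one replica, the first replica takes over.
        Otherwise the shard stays down (the "take the downtime" option).
        Returns the new primary, or None if the shard is unavailable.
        """
        self.primary = None
        if auto_promote and self.replicas:
            self.primary = self.replicas.pop(0)
        return self.primary


# Hypothetical node names, for illustration only.
with_replica = Shard("shard-1", primary="node-a", replicas=["node-b", "node-c"])
without_replica = Shard("shard-2", primary="node-d")

print(with_replica.fail_primary())     # node-b
print(without_replica.fail_primary())  # None
```

After the simulated failure, `shard-1` keeps accepting writes on the promoted node while `shard-2` is down until someone restores it.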
@sachinjindal4921 2 years ago
Awesome, can you give some practical examples?
@AsliEngineering 2 years ago
These are as practical as they can get while keeping it generic and not touching upon the SRE side of things :) Every database comes with its own partitioning and sharding strategy, and we need to go through its documentation to apply it. I talked about using a database proxy to bifurcate requests in one of the earlier videos, in case you are looking for that. I would recommend picking a database and seeing how you can actually create shards and manage them. ElasticSearch can be a great start.
@abhigujjar7439 11 months ago
Can you please share the notes?
@Bluesky-rn1mc 2 years ago
How are foreign key constraints managed when two tables are in different shards?
@AsliEngineering 2 years ago
Foreign keys are dropped when you adopt sharding. You cannot maintain FK when data is partitioned across multiple shards.
@Bluesky-rn1mc 2 years ago
@AsliEngineering Thanks.
@GaganJain2508 10 months ago
Does it mean sharding and replication are the same? 22:16
@arbazadam3407 1 year ago
When you say we can have these partitions on the same server, that confuses me. On my Linux server I installed MySQL, which runs on port 3306. I have one MySQL process in this situation, so how can I spread the partitions on this server?
@AsliEngineering 1 year ago
Multiple databases within the same MySQL server.
@shrad6611 7 months ago
Finally I understand what sharding is, thanks a ton.
@geekmuralin 9 months ago
Wow
@gigachad400 1 year ago
One of the biggest disadvantages of sharding a SQL server is that you lose ACIDity, so you have to be careful when doing it with SQL databases.
@ankitmaheshwari2341 11 months ago
I think that's not true.
@iHariPatel 7 months ago
In my view, partitioning is more complex because you have to work with the partition key! A wrong query can accidentally scan all partitions.
@sachthecool 1 year ago
Hi Arpit... You have nice videos. I like the interviews with people involved in growing high-scale systems. However, in this video the concept explained is wrong. Partitions and shards are the same (the terms are used interchangeably). What you are referring to as a shard is a node (or host container). You may want to correct that. Hope this helps.
@AsliEngineering 1 year ago
I agree the terms are used interchangeably, but overall what I explained is correct, and I clarified the same in the video as well.
@bikramjeetsarmah9995 4 days ago
How can the API server know which database server should be given this load?
@AsliEngineering 4 days ago
That is your routing strategy: range-based, hash-based, or static routing. For example, all requests for a user go to a particular database, and the ownership is determined by taking a hash of the user ID, i.e. f(user_id) % num_databases.
@bikramjeetsarmah9995 4 days ago
@AsliEngineering How can this be implemented? Also, if you can, please make a system design video with an implementation of microservices in something like Node.js, as I have understood the theoretical part, but the implementation is where I am getting stuck and not understanding how to do it.
@AsliEngineering 4 days ago
@bikramjeetsarmah9995 You can get the user ID from your auth token. Have an array of database connections in your API code and apply the function mentioned above.
@bikramjeetsarmah9995 4 days ago
@AsliEngineering Oh okay, thank you.
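One way the hash-based routing from this thread could look in API code is sketched below. The names `SHARD_CONNECTIONS`, `shard_index`, and `connection_for` are hypothetical, and the strings stand in for real database connection objects. The choice of SHA-256 for f is an assumption; the thread only says "hash of user id". A stable hash is used here because Python's built-in `hash()` for strings varies between processes.

```python
import hashlib

# Stand-ins for real database connections; in an API server these would be
# connection or pool objects created at startup (hypothetical names).
SHARD_CONNECTIONS = ["db-conn-0", "db-conn-1", "db-conn-2"]


def shard_index(user_id: str, num_databases: int) -> int:
    """Compute f(user_id) % num_databases with a process-independent hash."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_databases


def connection_for(user_id: str) -> str:
    """Pick the connection that owns all data for this user."""
    return SHARD_CONNECTIONS[shard_index(user_id, len(SHARD_CONNECTIONS))]


# The user ID would typically come from the auth token, as suggested above.
print(connection_for("user-42") == connection_for("user-42"))  # True
```

A known limitation of plain modulo routing is that changing the number of databases remaps most users to a different shard, so adding a shard implies moving data.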
@user-nu5nn7by6t 17 days ago
How do we know which shard our data resides in?
@AsliEngineering 17 days ago
That depends on your routing strategy: range, hash, or static. In any case, you pick a partitioning key and, depending on the approach, you deduce which shard to go to.
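For the range and static strategies named in the reply above, a rough sketch could look like this. The range boundaries, shard names, and the pinned ID in `STATIC_ROUTES` are made-up values for illustration, and the partitioning key is assumed to be a non-negative integer user ID.

```python
import bisect

# Hypothetical layout: each shard owns user IDs below its (exclusive) upper bound
# and at or above the previous bound.
RANGE_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]
SHARD_NAMES = ["shard-a", "shard-b", "shard-c"]

# Static routing: explicit overrides, e.g. a large tenant pinned to one shard.
STATIC_ROUTES = {42: "shard-c"}


def route(user_id: int) -> str:
    """Resolve the shard for a key: static overrides first, then range lookup."""
    if user_id in STATIC_ROUTES:
        return STATIC_ROUTES[user_id]
    if user_id < 0:
        raise KeyError(f"no shard owns user_id {user_id}")
    idx = bisect.bisect_right(RANGE_UPPER_BOUNDS, user_id)
    if idx == len(SHARD_NAMES):
        raise KeyError(f"no shard owns user_id {user_id}")
    return SHARD_NAMES[idx]


print(route(999_999))    # shard-a
print(route(1_000_000))  # shard-b
print(route(42))         # shard-c (static override)
```

Range routing keeps neighbouring keys together, which helps range scans but can concentrate load on one shard; that trade-off is one reason the choice depends on access patterns.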
@ManojYadav-ls6wo 2 months ago
12:10 20:12 👍👍
@dbads 2 years ago
💯
@anupkut 5 months ago
I think we should not consider read replicas alone as a sharding concept.
@jineshbagrecha6278 1 year ago
When should we use master-master and master/candidate-master replication?
@AsliEngineering 1 year ago
Master-master: scaling writes beyond one machine. Master-replica: scaling reads.
@aadimanchekar1032 1 year ago
How do we know which partition the data lies in?
@AsliEngineering 1 year ago
That's the partitioning strategy.
@mudassarh4268 2 years ago
Sharding strategies such as range-based and hash-based sharding could have been covered, along with their use cases.
@AsliEngineering 2 years ago
Sir. Video would have been too long. No one would have watched it. But definitely planning it for the next one.
@mudassarh4268 2 years ago
Definitely, sirji, that could have added another 30 minutes of content. Awesome content as always, and looking forward to further stuff 👍
@abhishekdhillon7110 6 months ago
Dude, the way you have explained higher availability as an advantage of sharding is not right. When you have a sharded DB and various shards live on different servers, if one of the shards goes down, availability is not an advantage, since you can't perform any operations on the specific shard that is unavailable. For example, if you have two shards named A and B and shard A is down or not available, you can't read anything from that shard, so all of the queries that are expected to read from shard A would fail unless you have a read replica of that shard. I feel that there is a better way to explain it. However, thanks for all your efforts; your content is helpful to a large extent.
@AsliEngineering 6 months ago
Yes, we cannot perform operations on that shard, but we can still serve requests that can be served from the other shards. Hence the system still remains partially available.
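The "partially available" point in the reply above can be illustrated with a toy two-shard store. Everything here is an assumption made for the sketch: `ShardedStore`, `ShardUnavailable`, the in-memory dicts standing in for database servers, and the simple `key % num_shards` placement.

```python
class ShardUnavailable(Exception):
    """Raised when the shard that owns a key is down."""


class ShardedStore:
    """Toy key-value store split across in-memory shards (hypothetical)."""

    def __init__(self, num_shards: int):
        self.shards = [{} for _ in range(num_shards)]
        self.is_up = [True] * num_shards

    def _shard_for(self, key: int) -> int:
        return key % len(self.shards)

    def put(self, key: int, value: str) -> None:
        idx = self._shard_for(key)
        if not self.is_up[idx]:
            raise ShardUnavailable(f"shard {idx} is down")
        self.shards[idx][key] = value

    def get(self, key: int) -> str:
        idx = self._shard_for(key)
        if not self.is_up[idx]:
            raise ShardUnavailable(f"shard {idx} is down")
        return self.shards[idx][key]


store = ShardedStore(num_shards=2)
for key in range(4):
    store.put(key, f"value-{key}")

store.is_up[0] = False  # simulate shard 0 going down

served, failed = [], []
for key in range(4):
    try:
        store.get(key)
        served.append(key)
    except ShardUnavailable:
        failed.append(key)

print(served, failed)  # [1, 3] [0, 2]
```

Requests for keys owned by the failed shard error out, as the commenter says, while keys on the surviving shard are still served. With a single unsharded database, the same failure would fail every request.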
@pranavnadimpalli4929 1 year ago
22:34 cross-shard queries are expensive
@kumarshubham4640 2 months ago
Why did the course price increase by 20k in 1 year?
@AsliEngineering 2 months ago
In 2 years, not one. The course has changed completely; I go much more in-depth and the sessions run for 4 hours each. Earlier they used to be 2.5.
@akshatreddy9870 3 months ago
Hi
@arun10071990 6 months ago
I think sharding has specific use cases; not every solution requires sharding. The way he arrives at the sharding solution is totally absurd. If one really wants to scale writes, one can also upscale the master DB servers. Why shard then?
@AsliEngineering 6 months ago
When did I not consider vertical scaling?
@arun10071990 6 months ago
@AsliEngineering It's not about vertical scaling. It's that we can scale the database horizontally without using sharding, for example with multiple master servers for writes and multiple slave servers to handle reads.
@sharoonaustin551 1 year ago
Small suggestion: please don't put ads in the middle, bro; it breaks concentration.
@AsliEngineering 1 year ago
KZbin inserts them. I just enable them. It is up to their algorithm to decide where to place them.
@AsliEngineering 1 year ago
And I totally understand your frustration with ads, but the world runs on them. Can't do much without them.
@jose000 2 years ago
Iio
@luisdanielmesa 9 months ago
We both worked for Amazon and you know nobody there would have taken this course... So you're either lying or... nah, you're lying.
@AsliEngineering 9 months ago
15 SDE-2s, 3 SDE-3s, 1 PE, and 1 HoE took my course. Whether you believe it is up to you.
@AsliEngineering 9 months ago
Fun fact: after I replied to your comment, I went on a 1:1 call and it was with an SDE-2 at Amazon working in the CCF org :D
@akshatreddy9870 3 months ago
Very bad. Hindus never shave off the moustache and keep the beard. Do you intend to become a Muslim? Please understand that you are Sanatani.
@akshatreddy9870 3 months ago
Either shave both beard and moustache or keep both moustache and beard. Don't shave only the moustache and keep the beard.
@iMakeYoutubeConfused 3 months ago
He's put so much effort into the content of this video and this is all you've got to say?
@eatajerkpal99 2 months ago
Hey Arpit, can you drop a link to the notes that you presented in this video? Thanks!
@eatajerkpal99 2 months ago
Found them on your GitHub, I won't spam anymore. Thanks!!
@amogu_07 3 months ago
Thank you so much, clearly understood!!