How JunoDB is designed to scale horizontally

  Рет қаралды 3,458

Arpit Bhayani

Arpit Bhayani

Күн бұрын

System Design for SDE-2 and above: arpitbhayani.m...
System Design for Beginners: arpitbhayani.m...
Redis Internals: arpitbhayani.m...
Build Your Own Redis / DNS / BitTorrent / SQLite - with CodeCrafters.
Sign up and get 40% off - app.codecrafte...
In my recent video, I delved into Juno DB, PayPal's key value database, exploring its scalability and design decisions. Juno DB plays a crucial role in PayPal's microservices architecture, necessitating scalable solutions for incoming connections, data storage, and query loads. By maintaining stateless and equal Juno proxies behind load balancers, handling large numbers of connections becomes seamless. Additionally, Juno DB's fixed number of shards and consistent hashing ensure efficient data distribution across storage servers, enabling seamless scalability without overcomplicating the architecture. Join me in the next video as we dive into availability discussions.
Recommended videos and playlists
If you liked this video, you will find the following videos and playlists helpful
System Design: • PostgreSQL connection ...
Designing Microservices: • Advantages of adopting...
Database Engineering: • How nested loop, hash,...
Concurrency In-depth: • How to write efficient...
Research paper dissections: • The Google File System...
Outage Dissections: • Dissecting GitHub Outa...
Hash Table Internals: • Internal Structure of ...
Bittorrent Internals: • Introduction to BitTor...
Things you will find amusing
Knowledge Base: arpitbhayani.m...
Bookshelf: arpitbhayani.m...
Papershelf: arpitbhayani.m...
Other socials
I keep writing and sharing my practical experience and learnings every day, so if you resonate then follow along. I keep it no fluff.
LinkedIn: / arpitbhayani
Twitter: / arpit_bhayani
Weekly Newsletter: arpit.substack...
Thank you for watching and supporting! it means a ton.
I am on a mission to bring out the best engineering stories from around the world and make you all fall in
love with engineering. If you resonate with this then follow along, I always keep it no-fluff.

Пікірлер: 14
@sounishnath513
@sounishnath513 Жыл бұрын
After watching this detailed explanation I feel like it's like a child's play 😅, Thanks Arpit for making us understand the juno elasticity is actually easy 😊.
@user-stillExploring
@user-stillExploring 7 ай бұрын
@AsliEngineering Question: (Related to time:13:51 key storing diagram) While deciding a key(K) will go to which shard, first we are hashing using murmurHash(call it H) then taking mod by shard count(call it SC), so let's assume that H%SC = n and most of the time it's 'n' then nth shard will get full quickly, how to handle that without vertically scaling the shards? As most of the shards have empty space we should not increase the capacity of all shards capacity from cost perspective. Simple mod may not suffice the large K-V store use case, eg. even at Paypal since they have lots of data to save it might be possible the simple modulo function may give you tons of collision key and your particular shard may get full. Thoughts? 🤔
@the10xdev
@the10xdev Жыл бұрын
A small inference : the number of shards is the max number of machines we can add. Adding more logically doesn’t make sense since all shards are mapped to server by 1:1
@the10xdev
@the10xdev Жыл бұрын
I understand we move the shards, but I’m unable to visualise how we do? Let’s say a third node is added and we identify that X shards need to move Now, these can belong to multiple nodes. So I’m thinking the proxy facilitates the movement, but what about the requests that come to the DB during this movement. Although consistent hashing reduces the amount of data moved, it’s still expensive enough that it takes a decent amount of time. Does Juno reject these requests? It might serve GET, but logically it should lock the shard for writes until it’s completely moved, otherwise if writes are allowed and propagated, the movement process might just never end.
@sriharshakokkanti
@sriharshakokkanti Жыл бұрын
As you said we have only 1024 shards (each a rock db instance) , so will it be like initially all the 1024 shards are running on the initial servers and as the load increases the number of new servers get added then shard movement starts ? just curious to know if initial number of servers are 5, then will the 1024 instances of rock db ran on the 5 servers ? or any other technique is used ?
@prabhath14411
@prabhath14411 Жыл бұрын
Does etcd run within Junodb proxy or in another cluster ? There is a limit on number of etcd cluster members, 5 is good and 7 is maximum recommended by etcd , the writes become too slow for clusters with more members. This would have been sufficient for them to achieve the scale that was necessary for their use cases.
@ashuvyas45
@ashuvyas45 Жыл бұрын
Question: The client's first interaction is with the load-balancer. We discussed how the increase in incoming connections are handledled at junoProxy level and downwards, but what about load balancer. It will have limit of incoming concurrent connections it can handle. Wouldn't that be a limiting factor.
@AsliEngineering
@AsliEngineering Жыл бұрын
LB is never a single machine. Internally it scaled horizontally.
@hsnice16
@hsnice16 Жыл бұрын
I've few questions here. When we added a new storage, then what initial that storage will have? Will it have any data? And when we moved a shard in the new storage, then are we moving the keys that shard stored in the new storage? And when new data will come, how that will we handled here?
@rajivsarkar277
@rajivsarkar277 Жыл бұрын
Thats what taken care by Consistant hashing which shard owned by which storage server . And we move shards means we move the data present inside the shard.
@srawat1212
@srawat1212 Жыл бұрын
1 question: Since, each Juno proxy has to maintain atleast one persistent connection with each of storage servers, won't this be a bottleneck on how many Juno proxy machines we can run at a time? Also, adding a lot of persistent connections to storage servers may deteriorate their throughput since the system will have to coordinate between all the connections.
@AsliEngineering
@AsliEngineering Жыл бұрын
This is not as limiting as you think. Paypal is supporting 100B Requests every day with 200 servers.
@srawat1212
@srawat1212 Жыл бұрын
@@AsliEngineering Agreed but then I assume it could be because the juno proxy are huge machines(resource wise) which cannot be in 1000s. Maybe the largest number they can have is few 100s only. Let me know if I am missing something here.
@AsliEngineering
@AsliEngineering Жыл бұрын
Exactly. You have vertically scaled up machines. Nothing wrong with it. Even storage servers can also be bulked up.
High level architecture and System Design of JunoDB
13:40
Arpit Bhayani
Рет қаралды 7 М.
Overview of JunoDB - an open source KV store by PayPal
17:38
Arpit Bhayani
Рет қаралды 25 М.
Which One Is The Best - From Small To Giant #katebrush #shorts
00:17
Running With Bigger And Bigger Lunchlys
00:18
MrBeast
Рет қаралды 119 МЛН
规则,在门里生存,出来~死亡
00:33
落魄的王子
Рет қаралды 26 МЛН
What is Database Sharding?
26:56
Be A Better Dev
Рет қаралды 156 М.
How Gojek masks and keeps users' phone numbers secure at scale?
16:21
Do you know Distributed transactions?
31:10
Tech Dummies Narendra L
Рет қаралды 230 М.
How do indexes make databases read faster?
23:25
Arpit Bhayani
Рет қаралды 67 М.
Amazon DynamoDB - Paper Explained
1:33:01
Arpit Bhayani
Рет қаралды 18 М.
What is DATABASE SHARDING?
8:56
Gaurav Sen
Рет қаралды 930 М.
Which One Is The Best - From Small To Giant #katebrush #shorts
00:17