How Reddit designed their metadata store to serve 100k req/sec at p99 of 17ms

25,930 views

Arpit Bhayani

1 day ago

System Design for SDE-2 and above: arpitbhayani.me/masterclass
System Design for Beginners: arpitbhayani.me/sys-design
Redis Internals: arpitbhayani.me/redis
Build Your Own Redis / DNS / BitTorrent / SQLite - with CodeCrafters.
Sign up and get 40% off - app.codecrafters.io/join?via=...
Recommended videos and playlists
If you liked this video, you will find the following videos and playlists helpful
System Design: • PostgreSQL connection ...
Designing Microservices: • Advantages of adopting...
Database Engineering: • How nested loop, hash,...
Concurrency In-depth: • How to write efficient...
Research paper dissections: • The Google File System...
Outage Dissections: • Dissecting GitHub Outa...
Hash Table Internals: • Internal Structure of ...
Bittorrent Internals: • Introduction to BitTor...
Things you will find amusing
Knowledge Base: arpitbhayani.me/knowledge-base
Bookshelf: arpitbhayani.me/bookshelf
Papershelf: arpitbhayani.me/papershelf
Other socials
I keep writing and sharing my practical experience and learnings every day, so if you resonate then follow along. I keep it no fluff.
LinkedIn: / arpitbhayani
Twitter: / arpit_bhayani
Weekly Newsletter: arpit.substack.com
Thank you for watching and supporting! It means a ton.
I am on a mission to bring out the best engineering stories from around the world and make you all fall in love with engineering. If you resonate with this then follow along; I always keep it no-fluff.

Comments: 59
@richi12345678910 · 1 month ago
The Kafka CDC can solve the problem of synchronous write inconsistencies, but not the backfill overwriting. I suspect they might do some kind of business logic or SHA/checksum validation to ensure that they are not overwriting the data during backfilling. Correct me if I'm missing something bro.
@vinayak_98 · 1 month ago
Hey Arpit, thanks a lot for putting this up. Your writing skills are next level, crisp and crystal clear. Could you please tell what's the setup you use for taking these notes? Thanks in advance.
@ankitpandey3724 · 1 month ago
Weekend party ❌ Asli Engineering ✅
@nextgodlevel4056 · 1 month ago
Successfully ruined my upcoming weekend. Have to view all of your videos now 😢
@GSTGST-dw5rf · 12 days ago
Great that there are such useful videos on KZbin. Thank you, Minister!
@prtk2329 · 1 month ago
Nice, very informative 👍
@raj_kundalia · 19 days ago
Thank you for doing this!
@AayushThokchom · 1 month ago
Large Data Migration -> Event-Driven Architecture. Also, interesting to learn about Postgres's extensions, which are not required if going with a serverless database solution like DynamoDB.
@user-ob1zi3jc1r · 1 month ago
How many shards were used to hold those partitions to achieve that much throughput?
@JardaniJovonovich192 · 1 month ago
I guess we don't need both the CDC setup and dual writes; just the CDC setup would suffice to insert the data into the new DB, correct?
@mehulgupta8218 · 1 month ago
As usual great stuff🙌🏻
@ramannanda · 1 day ago
Postgres is still the king :) with extensions it is all you need...
@guytonedhai · 1 month ago
Asli Engineering!
@karanhotwani5179 · 19 days ago
Nice. The Kafka part seemed like over-engineering. They could just verify the hash before writing to the new metadata DB in the syncing phase.
@dreamerchawla · 1 month ago
Hey Arpit… thanks for the video. I liked the idea of partitioning as a policy that runs on a cron. But wouldn't moving data around in partitions also warrant a change in the backend (reads)? Or are you saying the backend has been written in a way that takes partitioning into account while reading the data?
@AsliEngineering · 1 month ago
They use range-based partitioning, so no repartitioning is required. A new range every 90 million IDs.
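For illustration, a minimal sketch of what range partitioning on a monotonically increasing post ID could look like in Postgres; the table and column names (media_metadata, post_id), the psycopg2 usage, and the 90-million-wide ranges are assumptions drawn from this thread, not Reddit's actual schema or code.

```python
# Hypothetical sketch: range partitioning on a monotonically increasing post_id.
# Table/column names and the range width are illustrative assumptions.
import psycopg2

RANGE_WIDTH = 90_000_000  # a new partition every ~90 million IDs

PARENT_DDL = """
CREATE TABLE IF NOT EXISTS media_metadata (
    post_id    BIGINT NOT NULL,
    metadata   JSONB  NOT NULL,
    created_at TIMESTAMPTZ DEFAULT now()
) PARTITION BY RANGE (post_id);
"""

def ensure_partition(cur, max_post_id: int) -> None:
    """Create the partition covering max_post_id if it does not exist yet.

    A periodic cron job (as discussed in the comments above) could call this
    whenever the current ID range is close to being exhausted.
    """
    lo = (max_post_id // RANGE_WIDTH) * RANGE_WIDTH
    hi = lo + RANGE_WIDTH
    cur.execute(
        f"CREATE TABLE IF NOT EXISTS media_metadata_{lo} "
        f"PARTITION OF media_metadata FOR VALUES FROM ({lo}) TO ({hi});"
    )

with psycopg2.connect("dbname=media user=app") as conn, conn.cursor() as cur:
    cur.execute(PARENT_DDL)
    ensure_partition(cur, max_post_id=182_345_678)  # lands in [180M, 270M)
```

Because the IDs are monotonic, a batch read for recent posts falls almost entirely inside the newest partition, which is what keeps the lookups cheap.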
@anurag-vishwakarma · 1 month ago
Can you share the notes, please?
@namanjoshi5089 · 1 month ago
Amajeeng
@poojanpatel2437 · 1 month ago
Good video. Would appreciate it a lot if you could attach any resources you used in the video, like the Reddit blog mentioned in the description. Would be great if the link were attached there too.
@AsliEngineering · 1 month ago
Already added in the description. You are just a scroll away from finding that out.
@sachinmalik5837 · 1 month ago
Hi Arpit, I think you could have gone a bit more in depth, like they have in their blog: a bit about how they are using an incrementing post_id, which allows them to serve most queries from one partition only. Not complaining at all. Thanks for being awesome as always. TL;DR: 7 minutes seems a bit short.
@AsliEngineering · 1 month ago
I deliberately skipped it because it would have taken 4 more minutes of explanation. I experimented in this video by keeping it very surface-level and around the 8-minute mark, and the retention for this one is through the roof 😅 In the last 15 videos I saw a massive drop in retention numbers when I started explaining implementation nuances or when the video length went beyond 8 minutes. So I wanted to test out that hypothesis in this one video. Hence you see I did not even inject the ad for my courses or the intro; I jumped right into the topic. But yes, given their IDs are monotonic, their batch gets for media metadata would almost always hit a single partition if they partition the data by ranges.
@SOMEWORKS456 · 1 month ago
Would appreciate it if you could make another 8-minute video for the details. I am here for the meat. Surface-level stuff is OK. But meat. No complaints, just stating the opinion of a random lurker.
@amoghyermalkar5502 · 1 month ago
@AsliEngineering so, in short, your focus is more on viewers than on better video quality, right?
@AsliEngineering · 1 month ago
@amoghyermalkar5502 if you really feel that way then please check out the other videos on my channel. I go to a depth that other people cannot even think about or comprehend. Remember, it hurts to put 2 days of effort into a video only for it to be seen by just 2000 people in 7 days.
@sachinmalik5837 · 1 month ago
@AsliEngineering Absolutely, I can understand that. Without naming names, there are so many "tech" creators who are getting 10x the views we get here, but they just never seem to talk about substance. I just want you to know we do appreciate it a lot. I am still trying to read more blogs on my own, so I don't think I am being spoon-fed by watching a video.
@code-master · 1 month ago
How will you handle search, given the relevant data might be in partitions several days older? Even if they're using a secondary data store, the date/time range-based partitioning or even sharding will not suffice. What do you think?
@AsliEngineering · 1 month ago
Why would it be several days older? The migration was a one-time activity. Post-switch, the writes of media metadata always go to the unified media metadata store.
@code-master · 1 month ago
Thank you for your response. Apologies, my question was not clear. My question was more related to searching through such a data store where the data is partitioned daily, i.e. partitioned on created_at. Let's say you search for a post with term 'X'; the result will ideally require a lookup across several partitions. For example, if there is a relevant post from a year back, we are looking at many partitions. To build the search result, the DB has to load each daily partition. Daily partitioning will work well only if the lookup is limited to a couple of days back. That's my understanding.
@user-vv7ph2xr1o · 1 month ago
Arpit, using CDC and Kafka still does not solve the problem of data from the old source overriding data in the new Aurora Postgres during the migration, right? What am I missing? You will still need a bulk batch job that picks up all the archival data from the multiple sources and ingests it into the new Aurora. Using CDC does not solve that backfill, correct?
@2020goro · 1 month ago
CDC can transfer the historical data using snapshots of the existing databases if/when the transaction log is not available for old data, and then the consumers report any write conflicts into a separate table which the devs can remediate later on. Hope that answers your question.
@AsliEngineering · 1 month ago
The consumers of Kafka have this responsibility. It is not that just adding Kafka solved the problem; the core conflict management is written in the Kafka consumer, which does a check-and-set in the database.
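To make the check-and-set concrete, here is a hedged sketch of a conditional upsert such a CDC/backfill consumer could run so that a stale backfill row never overwrites a newer write; the last_modified column, the unique key on post_id, and the table name are illustrative assumptions, not Reddit's implementation.

```python
# Illustrative check-and-set upsert for a CDC/backfill consumer.
# Assumes a unique key on post_id and a last_modified timestamp column.
from psycopg2.extras import Json

UPSERT = """
INSERT INTO media_metadata (post_id, metadata, last_modified)
VALUES (%(post_id)s, %(metadata)s, %(last_modified)s)
ON CONFLICT (post_id) DO UPDATE
SET metadata      = EXCLUDED.metadata,
    last_modified = EXCLUDED.last_modified
WHERE media_metadata.last_modified < EXCLUDED.last_modified;
"""

def apply_event(cur, event: dict) -> None:
    """Apply one change event; older (stale) events silently lose the conflict."""
    cur.execute(UPSERT, {
        "post_id": event["post_id"],
        "metadata": Json(event["metadata"]),
        "last_modified": event["last_modified"],
    })
    if cur.rowcount == 0:
        # The row already held newer data; record it for later remediation.
        print(f"skipped stale event for post {event['post_id']}")
```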
@keshavb2896 · 1 month ago
Why didn't Reddit go for a document DB for this storage, given the structure and access pattern? What do you think about it, @arpit?
@AsliEngineering · 1 month ago
According to me, familiarity with the stack could be the biggest reason. Apart from this, the query pattern here is that most requests hit a single partition (given IDs are monotonically increasing and partitions are created on a range basis). Most KV stores do hash-based partitioning, because of which lookups need to be fanned out across the shards, which is quite expensive. One database that does support range-based lookups per shard is DDB, and that managed offering becomes very expensive at scale. These are some of the pointers I could think of. But again, this is a pure guess.
@nextgodlevel4056 · 1 month ago
How does PgBouncer minimize the cost of creating a new process for each request? Maybe I am wrong; can you tell me how the cost is reduced here?
@niteshlohchab9219 · 1 month ago
Let's say each connection spawns a new process, and killing a connection kills the process. What would you do, logically? Think before reading the next line... Simply reuse the connections. That's what every database proxy in front usually does, in simple words: the connections are reused and managed accordingly.
@nextgodlevel4056 · 1 month ago
@niteshlohchab9219 Thanks, bro, for the easy and simple explanation; I appreciate it. What I was thinking was that the term "cost" is used for money, but I was wrong. Here, "cost" means scalability and performance, ensuring that each client gets a response as quickly as possible. So, in terms of money, we increase the cost, and in terms of scalability and performance, we decrease the cost. If we look at it for the long term in enterprise applications, having a scalable product also increases revenue. Let me know if I am correct or not 🙃
@AsliEngineering · 1 month ago
Because it does connection pooling, so connections are reused.
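For intuition, a minimal sketch of connection reuse with a client-side pool; PgBouncer plays the same role as a proxy sitting in front of Postgres. The DSN, pool sizes, and table name are placeholders.

```python
# Minimal illustration of connection pooling: connections are opened once
# and reused across requests instead of spawning a new backend per request.
from psycopg2.pool import SimpleConnectionPool

pool = SimpleConnectionPool(minconn=2, maxconn=20, dsn="dbname=media user=app")

def fetch_metadata(post_id: int):
    conn = pool.getconn()   # borrow an already-open connection
    try:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT metadata FROM media_metadata WHERE post_id = %s",
                (post_id,),
            )
            return cur.fetchone()
    finally:
        pool.putconn(conn)  # return it to the pool instead of closing it
```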
@LeoLeo-nx5gi · 1 month ago
Thanks Arpit!! Also, what are your thoughts on using Pandas as a metadata DB? Dropbox had a post about them using Pandas wherein they explained in depth why other DBs were not better for them. (Would like to know your views on it too.)
@AsliEngineering · 1 month ago
I am not aware of this. Let me take a look.
@calvindsouza1305 · 1 month ago
What is used over here to write down the notes?
@AsliEngineering · 1 month ago
GoodNotes.
@calvindsouza1305 · 1 month ago
Thanks, looks very clean.
@suhanijain5026 · 1 month ago
Why are they using Postgres if they are storing it as JSON?
@AsliEngineering · 1 month ago
Stack familiarity, plus range-based partitioning support.
@atanusikder4217 · 1 month ago
What is the CDC mentioned here? Please suggest some pointers.
@gauravsrivastava3884 · 1 month ago
Change Data Capture
@myjourney7713 · 1 month ago
How did they check whether the reads from the old vs. new database are the same?
@AsliEngineering · 1 month ago
A simple diff would work, given that the final JSON has to be the same since no changes were made to the client.
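A hedged sketch of what such a shadow-read diff could look like: serve from the old store, read the same key from the new store, and compare the JSON. The read_old / read_new callables and the logging sink are illustrative assumptions, not Reddit's code.

```python
# Illustrative shadow-read comparison during the migration.
# read_old / read_new are placeholders for the old and new data paths.
import json
import logging

log = logging.getLogger("shadow_diff")

def get_metadata(post_id: int, read_old, read_new) -> dict:
    old = read_old(post_id)  # the old store stays the source of truth
    try:
        new = read_new(post_id)
        if json.dumps(old, sort_keys=True) != json.dumps(new, sort_keys=True):
            log.warning("mismatch for post %s: old=%s new=%s", post_id, old, new)
    except Exception:
        log.exception("shadow read failed for post %s", post_id)
    return old               # clients keep receiving the old response
```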
@myjourney7713 · 1 month ago
@AsliEngineering if there is any issue at scale, wouldn't it be very hard to debug?
@bhumit070 · 1 month ago
Dayumm
@anand.garlapati · 1 month ago
Are you saying that Reddit has a unified database per region?
@AsliEngineering · 1 month ago
Unified database implies that the data that was split across multiple services has been moved to one database. Now this unified one can be replicated across regions to improve client-side response times.
@anand.garlapati · 1 month ago
What was Reddit's motivation to initially go for a dedicated database per service? Could you please tell how many such services they have? Regarding your second point, the Reddit team allowed only reads from the replicated databases and not writes, correct?
@LeftBoot · 1 month ago
How can I use AI to make this sound like my native language?
@shalinikandwal7845 · 1 month ago
What is CDC?
@AbhisarMohapatra · 1 month ago
Change Data Capture ... it means streaming the database's binlog files.
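For reference, a rough illustration of the shape of a CDC change event (Debezium-style, derived from the database's WAL/binlog) that a downstream consumer would see; all field names and values here are illustrative.

```python
# Rough, illustrative shape of a CDC (Change Data Capture) event as emitted
# from a database's WAL/binlog by tools such as Debezium.
change_event = {
    "op": "u",  # c = create, u = update, d = delete
    "source": {"db": "posts", "table": "media_metadata", "lsn": 123456789},
    "before": {"post_id": 42, "metadata": {"width": 640}},
    "after":  {"post_id": 42, "metadata": {"width": 1280}},
    "ts_ms": 1717000000000,
}
```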