System Design: Reddit

  33,358 views

System Design Fight Club

A day ago

Comments: 28
@John-nhoJ 2 years ago
In my interviews, I've had interviewers suggest the subreddit as the shard key over the postId. Otherwise, you'd have to query all the shards to answer a query for a specific subreddit. Or is it better to shard on the postId and put an LRU cache in front of the PostDB, since the hot and top posts are almost always from the last 24 hours?
@SDFC 2 years ago
That’s a phenomenal point! I was somewhat aware of the issue, in that I knew deep pagination performs particularly horrendously in this scenario, which is why it was important to assume that users typically stick to the first couple of pages. However, I had not considered that, because this is a key-range query, we would end up simply hosing *all* the partitions evenly under a hash partitioning approach for that attribute. This is really fascinating. I think we simply can’t truly alleviate hot partition issues for key-range queries in general, because that would always lead to this issue of evenly hosing ALL the partitions with ALL the key-range queries.
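To make the trade-off in this exchange concrete, here is a minimal Python sketch, not the design from the video. The shard count, the `shard_for` helper, and the sample post IDs are all assumptions made up for illustration. It compares how many shards hold one subreddit's posts under each shard key.

```python
import hashlib

NUM_SHARDS = 8  # assumed cluster size, for illustration only


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard with a stable hash (built-in hash() is salted per process)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def shards_touched_by_post_id(post_ids: list[str]) -> set[int]:
    """Sharded on postId: one subreddit's posts are scattered across shards."""
    return {shard_for(post_id) for post_id in post_ids}


def shards_touched_by_subreddit(subreddit: str) -> set[int]:
    """Sharded on subreddit: every post of that subreddit lives on one shard."""
    return {shard_for(subreddit)}


# Hypothetical data: 1,000 posts that all belong to one subreddit.
posts = [f"post-{i}" for i in range(1000)]
by_post = shards_touched_by_post_id(posts)
by_sub = shards_touched_by_subreddit("r/example")
print(len(by_post), "shards hold the posts when sharding on postId")
print(len(by_sub), "shard holds the posts when sharding on subreddit")
```

In practice a listing query under postId sharding does not know in advance which shards hold matching posts, so it has to ask all of them. Under subreddit sharding the same query goes to one shard, which is also why a very popular subreddit can become a hot shard.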
@John-nhoJ 2 years ago
@SDFC Would it be better to have a two-part system where new posts are put into a DB cluster sharded by the subreddit, then after a few days, they get migrated to a DB cluster sharded by postId for cold storage?
@jamesreader5955 1 year ago
Sharding on subreddit and date range (24h) worked pretty well for Reddit until recently, when wallstreetbets almost took down Reddit due to hot shards.
@locksmith6096 1 year ago
@John-nhoJ If you use subreddit as the shard key, you would have the exact same problem when querying all posts from a user (deep pagination on all shards). So how would one handle two different deep pagination queries?
@Gerald-iz7mv 1 year ago
@locksmith6096 You mean if the user is interested in 50 subreddits?
@ryandcouto 1 year ago
Where does the process that updates the “hot” posts cache run?
@umakemebored 1 year ago
Can you explain the deep pagination issue further please? Why would a deep page cause more scatter-gather than an early page?
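One way to look at this question is a rough Python sketch, under assumptions: the in-memory `shards` lists and the `fetch_page` helper are hypothetical stand-ins for real database partitions. A deep page contacts the same number of shards as an early page, but every shard has to return `offset + limit` rows because the coordinator cannot know in advance which shard holds the rows for that page.

```python
import heapq
from itertools import islice


def fetch_page(shards, offset, limit):
    """Scatter-gather one page; return (page, rows_transferred).

    Each shard holds its own scores sorted in descending order.
    """
    need = offset + limit
    # Any shard might hold all of the top rows, so each must return `need` rows.
    partials = [shard[:need] for shard in shards]
    rows_transferred = sum(len(partial) for partial in partials)
    # Merge the already-sorted partial results, then skip to the requested page.
    merged = heapq.merge(*partials, reverse=True)
    page = list(islice(merged, offset, offset + limit))
    return page, rows_transferred


# Hypothetical data: scores 0..399 spread round-robin over 4 shards.
shards = [sorted((s for s in range(400) if s % 4 == i), reverse=True) for i in range(4)]

page_1, cost_1 = fetch_page(shards, offset=0, limit=10)
page_10, cost_10 = fetch_page(shards, offset=90, limit=10)
print(cost_1, cost_10)
```

With this sample data the first page moves 40 rows to the coordinator and the tenth page moves 400, even though both pages show only 10 posts.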
@ostenloo1981 1 year ago
Thanks for these explanations. I'm a student trying to learn more about system design.
@tanayakarmakar2407 1 year ago
1. Upvote functionality. Questions:
a. Upvotes are normally tied to posts, so when a user creates a post, does it also flow to the Upvote DB via a DB trigger or CDC?
b. Can't we use the same Posts DB for that?
2. Posts data store, used for multiple features, including top "Hot" posts (a top-K problem). Questions:
a. The search functionality here is limited, so can we use Cassandra for it? Maybe we could also have an auxiliary Cassandra DB (or S3, with a folder created for every hour) to reduce load on the main DB.
b. Some MR job should read the Posts data store at a fixed interval (maybe every hour) and store the result in a key-value store (like DynamoDB); then we can keep an in-memory priority queue of the top "Hot"/trending posts, since this data does not need to be that close to real time.
c. To view "New" posts we can have a table user_posts (primary key: postID; clustering key: timestamp DESC).
d. If a user wants to see their own posts, we can have a table posts_by_user (primary key: userId; clustering key: (postID, timestamp DESC)).
Alternatively, we could follow the CQRS pattern, segregating the read and write paths:
1. Posts, Upvotes (command): we can use one DB for that.
2. Viewing "New" posts and top "Hot"/trending posts (read): we can use a separate DB for that (maybe Cassandra or, to reduce cost, S3), and on top of it run MR jobs to compute the top-K posts for users.
Please let me know if anything is not right.
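The periodic top-K step described in this comment could be sketched in Python as below. This is only an illustration under assumptions: the `Post` fields, the `hot_score` formula, and its `gravity` parameter are hypothetical and are not the ranking used by Reddit or by the video. A real batch job would read from the Posts data store rather than from an in-memory list.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    post_id: str
    upvotes: int
    age_hours: float


def hot_score(post: Post, gravity: float = 1.5) -> float:
    """Hypothetical time-decayed score: newer posts need fewer upvotes to rank."""
    return post.upvotes / (post.age_hours + 2) ** gravity


def top_k_hot(posts: list[Post], k: int) -> list[Post]:
    """Return the k highest-scoring posts; heapq.nlargest keeps only k items in its heap."""
    return heapq.nlargest(k, posts, key=hot_score)


# Hypothetical snapshot that a batch job might read once per interval.
posts = [
    Post("a", upvotes=500, age_hours=30.0),
    Post("b", upvotes=120, age_hours=1.0),
    Post("c", upvotes=40, age_hours=0.5),
    Post("d", upvotes=900, age_hours=72.0),
]
top_2 = [post.post_id for post in top_k_hot(posts, 2)]
print(top_2)
```

The result of such a job is small, so it can be written to a key-value store or cache and served from there until the next run.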
@banecrow3458 1 year ago
I would like to get access to the diagrams, but the Discord is not accepting any new users. How can I get those diagrams?
@anilkaliya3375 1 year ago
I would definitely put a cache in front of the view-posts service. As soon as a user asks for top/hot posts for the first time, we could store the result in Redis or another DynamoDB table. We could also talk about the read-through and write-through cache patterns; here the access pattern is better suited to a read-through cache, since these responses can be invalidated at a fixed interval (1 hour, 2 hours). We can show the same hot posts to users for some time, probably depending on the trend pattern, and invalidate using the TTL feature of the tool (Redis and DynamoDB both have TTL). For faster access to images and lower latency, CloudFront could be a great idea. Invalidation might be an extra step there as well when a post gets updated.
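A read-through cache with TTL expiry, as suggested here, might look roughly like the following in-process Python sketch. It is an assumption-laden stand-in: the `ReadThroughTTLCache` class, the `load_hot_posts` loader, and the injected clock are hypothetical, and a real deployment would rely on the TTL support of Redis or DynamoDB instead of a local dictionary.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ReadThroughTTLCache:
    """On a miss or an expired entry, load the value and cache it for ttl_seconds."""

    def __init__(self, loader: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: the backing store is not touched
        value = self._loader(key)  # miss or expired: read through to the store
        self._entries[key] = (now + self._ttl, value)
        return value


calls = []  # records every trip to the (hypothetical) backing store


def load_hot_posts(subreddit: str) -> list:
    calls.append(subreddit)
    return [f"{subreddit}-post-{i}" for i in range(3)]


fake_now = [0.0]  # controllable clock so the example does not need to sleep
cache = ReadThroughTTLCache(load_hot_posts, ttl_seconds=3600, clock=lambda: fake_now[0])

first = cache.get("r/example")   # miss: loads from the store
second = cache.get("r/example")  # hit: served from the cache
fake_now[0] = 3601.0
third = cache.get("r/example")   # expired after one hour: loads again
print(len(calls))
```

The TTL bounds how stale the hot list can get, which fits the comment's point that the same hot posts can be shown for a while.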
@yizhenqiu8450 1 year ago
Stupid question (I'm non-tech): it sounds like DDB is used to store the metadata of the post and the actual post itself is stored in S3? Or does it depend on post size (small posts stored in DDB and larger posts stored in S3)?
@jamesreader5955 1 year ago
Haven't watched all of it yet, but I'm guessing the post is stored in S3 and also in a cache (memcache/Redis/etc.), which allows for speed and the ability to rebuild.
@komalsinha3769 10 months ago
Which database would you recommend for storing nested comments, as Reddit has multiple levels of nesting?
@vivshan 1 year ago
Thank you so much for your videos. They are really informative. I have a general question about all system design videos/articles on the web, including this one: what should someone who doesn't have a lot of experience with NoSQL databases do in situations where we need to design with NoSQL databases? I have extensive experience with Oracle and MySQL but not much with NoSQL except Mongo. When I see system designs involving NoSQL stores such as Cassandra/HBase, first of all it is very theoretical for me, and second, I feel I will not be able to defend the intricacies of the databases during the actual interview. Yes, I can read about NoSQL, how these databases do sharding, and why we need NoSQL versus SQL, but since I don't have professional experience, I feel it will get exposed for sure and hurt my chances. I would love your insight on this, as I am really struggling with this aspect.
@thefitnessinstructor8937 1 year ago
jeezus that's a trekkers. i work in cybersec and am always amazed at how software eng types don't get bored of it all after 10 mins
@Gerald-iz7mv 1 year ago
So a screen view is for mobile and a page view is for HTTP?
@kleshwong8321 1 year ago
Why would you UPDATE a HISTORICAL record? If someone upvotes and then un-upvotes a post, shouldn't there be two records in the log?
@Karthik-oi9gn 1 year ago
Is there any advantage to maintaining two records? Updating would reduce the complexity and space usage, though. And limiting ourselves to the problem as stated, it is sufficient to update rather than maintain a history of events. If we were to do some analytics, such as sentiment analysis on users, this doesn't help much but adds complexity.
@mangeshshikrodkar6192 2 years ago
TPS is transactions per second?
@SDFC 2 years ago
Yes, but for full clarity, not “transactions” in the sense of full-blown DB transactions. It’s a commonly used acronym within Amazon that actually refers to any kind of reads/writes per second. I should possibly learn to start saying QPS instead.
@Gerald-iz7mv 1 year ago
@SDFC better say QPS :)
@Karthik-oi9gn 1 year ago
Please increase the volume of the recordings. They are really hard to hear at such a low volume.
@maherj351 2 years ago
Why don't you build a reactive actor system with an event store? Posts/upvotes/downvotes are events.
@SDFC 2 years ago
An “actor system” is a type of design pattern for dealing with concurrency effectively and safely. That’s definitely a way that the code could be written within the backend services. By “event store”, you are probably referring to “event sourcing” which is also a design pattern but specific to a way of designing databases. The upvote history data store is already set up in a way that follows event sourcing. The initial creation of posts is basically handled like event sourcing as well. However, it’s very inefficient to be doing key-range queries all the time for determining the total upvote count of posts on the homepage, and we’d prefer to keep that as a bit more of a “materialized view” with a counter field. This is where we depart from the event sourcing pattern, because a counter field is inherently going to be mutable.
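The split described in this reply, an append-only vote history plus a mutable counter kept as a kind of materialized view, can be sketched in Python as follows. The `VoteEvent` and `VoteStore` names and the in-memory storage are assumptions for illustration only, not the schema from the video.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VoteEvent:
    user_id: str
    post_id: str
    delta: int  # +1 for an upvote, -1 for removing it


class VoteStore:
    """Keeps an immutable event log and a mutable per-post counter side by side."""

    def __init__(self) -> None:
        self.log: list[VoteEvent] = []               # event-sourced history, append only
        self.counters: dict[str, int] = defaultdict(int)  # materialized view, updated in place

    def record(self, event: VoteEvent) -> None:
        self.log.append(event)
        self.counters[event.post_id] += event.delta

    def count_from_log(self, post_id: str) -> int:
        """Recompute the total by scanning the history: correct, but it reads every event."""
        return sum(e.delta for e in self.log if e.post_id == post_id)

    def count_from_counter(self, post_id: str) -> int:
        """Read the precomputed total: a single lookup."""
        return self.counters[post_id]


store = VoteStore()
store.record(VoteEvent("u1", "p1", +1))
store.record(VoteEvent("u2", "p1", +1))
store.record(VoteEvent("u1", "p1", -1))  # u1 removes the upvote: a second record, not an update
print(store.count_from_log("p1"), store.count_from_counter("p1"))
```

Both reads agree, but the counter avoids a range scan over the history every time the homepage needs a total, at the cost of holding mutable state.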
@javiponch8093 1 year ago
@SDFC pretty sure he meant an event store: en.m.wikipedia.org/wiki/Event_store It's clear you're not a developer 🙃