Systems Design in an Hour
1:11:00
time for a change
8:48
4 months ago
Comments
@vetiarvind 18 minutes ago
Hey Jordan this is awesome content and you're a great teacher. Listening to it in the gym from India and I can follow everything.
@rashminpatel3716 2 hours ago
Hey Jordan, thanks for the amazing system design video as usual!! I have one doubt about the usage of Flink. Whenever at least one Flink compute node goes down or restarts, the Flink job fails and it has to restore the entire state across all the nodes from S3. So this whole restoration process can take a few minutes, and our message delivery will be delayed for that many minutes, affecting the entire user base. Is that understanding correct?
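For intuition on why a full restore can take minutes, here's a rough back-of-envelope sketch (the state size, parallelism, and per-task S3 throughput are made-up numbers, not figures from the video):

```python
def estimated_restore_seconds(total_state_gb: float,
                              parallelism: int,
                              s3_read_gb_per_sec_per_task: float) -> float:
    """Rough estimate: each task restores its share of the checkpoint from S3 in parallel."""
    state_per_task_gb = total_state_gb / parallelism
    return state_per_task_gb / s3_read_gb_per_sec_per_task

# Hypothetical numbers: 500 GB of keyed state, 50 task slots, ~0.1 GB/s per task from S3.
print(estimated_restore_seconds(500, 50, 0.1))  # ~100 seconds before the job resumes
```

Incremental (RocksDB) checkpoints and task-local recovery shrink this window, but the basic concern in the question stands: while the job restarts from a checkpoint, delivery is delayed for everyone served by that job.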
@shashankpratapwar-wj7xl 2 hours ago
I had a hard time reading Alex Petrov's Database Internals book, so I quit after a few days. But this is informative and engaging at the same time. Looking forward to the entire series.
@cdgtopnp 11 hours ago
Chapters please 🙏
@VibhorKumar-uo9dd 16 hours ago
One question regarding the fan-out approach. While pushing posts to each follower, we push them to a particular news feed cache corresponding to that user. My doubt is whether these news feed caches are just another caching layer sharded on user id (say, 10 caching servers sharded on user id for 100 users), or whether they are specific to each user (100 users, 100 caches in that case)?
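The usual reading is the former: a fixed pool of cache servers sharded by user id, not one cache per user. A minimal sketch of that routing, with the server pool and naming invented for illustration:

```python
import hashlib

CACHE_SERVERS = ["feed-cache-0", "feed-cache-1", "feed-cache-2"]  # hypothetical pool

def cache_server_for(user_id: str) -> str:
    # Stable hash so the same user always lands on the same cache shard.
    h = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return CACHE_SERVERS[h % len(CACHE_SERVERS)]

def fan_out_post(post_id: str, follower_ids: list[str]) -> None:
    for follower in follower_ids:
        server = cache_server_for(follower)
        # append post_id to that follower's feed list on `server` (e.g. a Redis list push)
        print(f"push {post_id} -> {follower}'s feed on {server}")

fan_out_post("post-9", ["u1", "u2", "u3"])
```

In practice you'd use consistent hashing rather than a plain modulo so that adding a cache server doesn't reshuffle every user's feed.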
@yuxuanche2552 1 day ago
Hi Jordan, just wondering if the mutual connection databases at 14:14 are the same as the mutual cache table?
@jaeorgjaehpaejh 1 day ago
Jordan sat alone in his dimly lit room, eyes fixed on the screen. Kafka Streams flowed before him like a slow, seductive dance. His fingers moved over the keyboard, sending commands that made the data bend to his will-smooth, precise, and totally in his control. “Processing this much data feels like... processing my love life,” he chuckled, leaning in closer. “A little messy at first, but once I get my hands on it, everything falls into place... perfectly.” The logs rolled in real time, a rhythm that matched his heartbeat. Who needed real life when the streams responded to him this way? Controlled. Obedient. Alive. Jordan didn’t just process data-he made it swoon.
@DimitarVandev 1 day ago
cool
@vinaybabu2635 1 day ago
Hey Jordan, do you have any course on Udemy? If not, I highly recommend you make one.
@srishti2k22-iw5dh 2 days ago
great
@rish.i 2 days ago
Thank you Jordan for this series. In the paper, when comparing the three partitioning approaches, it's written that for the fixed-size third strategy, the membership information stored at each node is reduced by three orders of magnitude. But in the earlier paragraphs it's mentioned that the third strategy stores not only each node's tokens/hashes (as in the first strategy) but also which partitions are stored on each node. Isn't that contradictory, or am I misunderstanding something? Ideally the third partitioning scheme should contain more membership information per node, assuming they haven't changed request forwarding from O(1) hops to O(log n) DHT-style routing (as in Chord or Pastry), where each node stores only a limited number of other nodes' information at the cost of direct hops.
@ankitgomkale11 2 days ago
Hey Jordan, I just wanted to take a moment to express my deep gratitude. I recently received 5 offers from tier-1 companies, including an offer from Google for a Staff Engineer role. Your videos have been an absolute game-changer for me throughout this journey. I can't thank you enough for the insights and guidance you've shared-it's made a world of difference. Please keep up the amazing work, you're truly making an impact!
@skullTT 2 days ago
Some other videos talked about recommendation engines from the ML side: content-based filtering and collaborative filtering. What is the relationship between those and this embedding approach?
@skullTT 2 days ago
Besides the vector DB, which databases should we choose for the other data, including the entity history DB and the neighbor index? Can we use Cassandra for the history DB, because it is write-heavy and append-only, and a KV store for the neighbor index?
@mostinho7 2 days ago
11:00 Good summary. You can use a hash index when you want fast reads and writes, but you can't do efficient range queries. The hash index is also kept in memory, not on disk.
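A minimal Bitcask-style sketch of that idea (my own illustration, not code from the video): the hash index is an in-memory dict from key to byte offset in an append-only log, so point reads and writes are fast, but a range query would have to scan every key.

```python
class HashIndexedLog:
    """Append-only log on disk + in-memory hash index of key -> byte offset."""
    def __init__(self, path: str):
        self.path = path
        self.index: dict[str, int] = {}      # the hash index lives in memory
        open(path, "ab").close()             # make sure the log file exists

    def put(self, key: str, value: str) -> None:
        record = f"{key},{value}\n".encode()
        with open(self.path, "ab") as f:
            offset = f.tell()                # where this record will be appended
            f.write(record)                  # append; never overwrite in place
        self.index[key] = offset             # latest offset for the key wins

    def get(self, key: str) -> str | None:
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)
            line = f.readline().decode().rstrip("\n")
            _, value = line.split(",", 1)
            return value

db = HashIndexedLog("/tmp/log.db")
db.put("user:1", "jordan")
print(db.get("user:1"))  # "jordan"; a range scan over user:1..user:9 has no fast path
```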
@chaitanyatanwar8151 2 days ago
Thank You!
@chaitanyatanwar8151 2 days ago
Thank you! The videos and the discussions in the comments make this channel the best source for system design.
@XoPlanetI 2 days ago
14:02 Isn't it the same UUID and a different timestamp for replacing the message?
@RS7-123 2 days ago
Another comment on this awesome video after rewatching it a bunch of times. So, to conclude, how did you say we achieve idempotency, since there seems to be no best option?
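Not putting words in Jordan's mouth, but the option that tends to win in practice is consumer-side dedup with an idempotency key: the producer attaches a UUID to each message, and the consumer records seen UUIDs in the same transaction as the side effect. A sketch with invented table and message names:

```python
import sqlite3, uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE balances (user TEXT PRIMARY KEY, amount INTEGER)")
conn.execute("INSERT INTO balances VALUES ('alice', 100)")

def apply_once(message_id: str, user: str, delta: int) -> bool:
    """Apply the side effect at most once; redeliveries become no-ops."""
    try:
        with conn:  # one transaction: dedup row and side effect commit together
            conn.execute("INSERT INTO processed VALUES (?)", (message_id,))
            conn.execute("UPDATE balances SET amount = amount + ? WHERE user = ?",
                         (delta, user))
        return True
    except sqlite3.IntegrityError:
        return False  # already processed this message id

msg_id = str(uuid.uuid4())
print(apply_once(msg_id, "alice", -30))  # True, applied
print(apply_once(msg_id, "alice", -30))  # False, duplicate delivery ignored
```

The price is an extra write and a growing dedup table (usually pruned by TTL), which is why none of the options is strictly "best".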
@chaitanyatanwar8151 3 days ago
Thank you!
@abcxyz9637 3 days ago
Assigning N seats in a section to a group of k people, each requesting {s_1, ..., s_k} seats, is a bin-packing problem. Simply allocating seats in FIFO order may lead to unfairness and sub-optimal allocation (people at the end of the list and those requesting a higher number of seats will be most impacted, and we want to sell as many tickets as possible). Although it's not practically possible to solve bin-packing in real time, a simple optimization would be to maintain, for every section, the running sum of total seats requested. In real time, while bookings are ongoing, that sum must not exceed N. The actual seat allocation can be done offline, e.g. seats can be finalized after booking closes, and users can be sent their tickets with actual seat numbers via mail/phone [AWS SNS].
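A minimal sketch of that running-sum guard (capacity and request sizes are illustrative); the exact seat assignment would still be computed offline after booking closes:

```python
class SectionCounter:
    """Accept a booking only if the section's running total stays within capacity."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.reserved = 0

    def try_reserve(self, seats_requested: int) -> bool:
        if self.reserved + seats_requested > self.capacity:
            return False          # would oversell the section
        self.reserved += seats_requested
        return True               # exact seat numbers assigned later, offline

section_a = SectionCounter(capacity=100)
print(section_a.try_reserve(4))   # True
print(section_a.try_reserve(97))  # False, only 96 seats left
```

In a distributed booking service this counter would be a conditional/atomic increment so concurrent requests can't oversell the section.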
@guitarMartial 3 days ago
Jordan, would HBase be a better alternative to MySQL here? You get write-ahead-log-style indexing, which can then be leveraged to build our heap as well and to expire elements as they are consumed.
@chaitanyatanwar8151 3 days ago
Thank you!
@mayankchhabra3070 3 days ago
If we take the example of building search on top of chats and we partition by chat_id, won't that lead to an uneven distribution of data? Elasticsearch has shards and tries to distribute data evenly across all of them, but if we explicitly route our data to a specific shard (using chat_id in our example), it can lead to uneven distribution across shards, where one chat might be active and another might be dormant. Just thinking out loud about how we would solve this :P (Probably distribute it evenly by using some composite key, but that would defeat the purpose of searching a single chat's messages from one partition.)
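One common mitigation for exactly this hot-chat problem is the composite key the comment speculates about: route on chat_id plus a small bucket number, so a huge chat spreads over a few shards and a per-chat search only fans out to those buckets instead of the whole cluster. A sketch, with the bucket count and key format as assumptions:

```python
import hashlib

BUCKETS_PER_CHAT = 4  # hypothetical: a hot chat is spread over at most 4 routing keys

def routing_key(chat_id: str, message_id: str) -> str:
    # Deterministic bucket from the message id keeps writes spread but reproducible.
    bucket = int(hashlib.md5(message_id.encode()).hexdigest(), 16) % BUCKETS_PER_CHAT
    return f"{chat_id}#{bucket}"

def routing_keys_for_search(chat_id: str) -> list[str]:
    # A search within one chat only has to query its few buckets, not every shard.
    return [f"{chat_id}#{b}" for b in range(BUCKETS_PER_CHAT)]

print(routing_key("chat42", "msg-991"))
print(routing_keys_for_search("chat42"))
```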
@hazemabdelalim5432 3 days ago
Why hash on the user id instead of just storing the session state in separate storage like Redis? Sticky sessions will not guarantee an even distribution of traffic, and they might become a bottleneck at a very high amount of traffic.
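For comparison, the stateless alternative described here looks roughly like this with redis-py (key naming, TTL, and a Redis instance at localhost are assumptions): any app server can handle any request because the session lives in Redis rather than in server memory.

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)  # shared session store
SESSION_TTL_SECONDS = 30 * 60

def save_session(session_id: str, data: dict) -> None:
    # Any app server can write the session; no stickiness required.
    r.setex(f"session:{session_id}", SESSION_TTL_SECONDS, json.dumps(data))

def load_session(session_id: str) -> dict | None:
    raw = r.get(f"session:{session_id}")
    return json.loads(raw) if raw else None

save_session("abc123", {"user_id": 42, "cart": ["sku-1"]})
print(load_session("abc123"))
```

The counterpoint is that if the "session" is really a long-lived WebSocket connection, the open socket already pins the user to one server, which is usually why these designs hash on user id in the first place.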
@rashminpatel3716 4 days ago
All those channels are more redundant than a replicated database!! True that 😂
@Jayvil773 4 days ago
Thank you for the series, finally finished it! You're a literal chad, I can only hope to grow up to be like you one day.
@jordanhasnolife5163 3 days ago
You already are brother
@adishgangwal7105 4 days ago
Hi Jordan - in the order service, how does the second Flink job, which is sharded by product ID, know that it has received all the products in the cart before sending the email to the user?
@jordanhasnolife5163 3 days ago
I hadn't really made that my intention, and in the diagram we're willing to send multiple emails. If we wanted to do this, we'd probably have to split the products like we do here, include the original order id and the number of products in each message, and then send to one final Kafka queue + consumer to aggregate on order id.
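A rough sketch of that final aggregation consumer (pure logic with made-up message fields): each product-level message carries the order id and the total product count, and the confirmation email goes out only once every product for that order has been seen.

```python
from collections import defaultdict

seen_products = defaultdict(set)  # order_id -> product ids observed so far

def send_confirmation_email(order_id: str) -> None:
    print(f"email sent for {order_id}")

def on_message(msg: dict) -> None:
    """msg example: {"order_id": "o1", "product_id": "p2", "num_products": 2}"""
    order_id = msg["order_id"]
    seen_products[order_id].add(msg["product_id"])
    if len(seen_products[order_id]) == msg["num_products"]:
        send_confirmation_email(order_id)   # fires once the last product arrives
        del seen_products[order_id]         # clean up completed orders

for m in [{"order_id": "o1", "product_id": "p1", "num_products": 2},
          {"order_id": "o1", "product_id": "p2", "num_products": 2}]:
    on_message(m)  # email fires on the second message
```

In Flink or Kafka Streams, seen_products would live in keyed state partitioned by order_id so it survives restarts.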
@RS7-123 4 days ago
Great video, thanks for the incredible contribution. Two questions: 1) How do you notify users who aren't online about unpopular posts? I would assume you need some sort of user-specific queue after the fan-out; the notifications table is sharded by topic id, so it possibly won't be queried directly for unpopular posts. 2) How do you notify users who are online about popular posts, since I don't see that connected to the WebSocket flow? I assume you expect them to poll periodically to see if they have any popular posts?
@jordanhasnolife5163 3 days ago
1) When they come back online, they'll basically hit a cache of all notifications meant for them specifically (the cache is partitioned by hash of user id), and combine this with the popular notifications they were subscribed to. 2) Yep, just polling!
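A tiny sketch of that reconnect path, with all the data structures invented for illustration: read the user's personal notification cache, pull popular notifications for the topics they follow, and merge by timestamp.

```python
def notifications_on_reconnect(user_id, personal_cache, popular_by_topic, subscriptions):
    """Merge user-specific notifications with popular ones from subscribed topics, newest first."""
    personal = personal_cache.get(user_id, [])
    popular = [n for topic in subscriptions.get(user_id, [])
                 for n in popular_by_topic.get(topic, [])]
    return sorted(personal + popular, key=lambda n: n["ts"], reverse=True)

personal_cache = {"u1": [{"ts": 5, "text": "reply to your comment"}]}
popular_by_topic = {"nba": [{"ts": 7, "text": "finals start tonight"}]}
subscriptions = {"u1": ["nba"]}
print(notifications_on_reconnect("u1", personal_cache, popular_by_topic, subscriptions))
```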
@chaitanyatanwar8151 4 days ago
Thank you!
@Dychord 4 days ago
Cool!
@user-sx4wm5ls5q 5 days ago
Thanks for the great video. One quick question: how can a Kafka queue have multiple consumers, as at 28:36 for the "top-bid Kafka queue"? I heard one Kafka partition can have only one consumer at a time. I guess we need to post the record into multiple partitions (partitioned by the server that will be subscribing?), or, realistically, just change the last bit to an in-memory queue like pub/sub?
@jordanhasnolife5163 4 days ago
You can have many consumers on the same partition, you just can't do round robin within a partition
@user-sx4wm5ls5q 4 days ago
@@jordanhasnolife5163 Ahh thanks. I had a misconception about Kafka: I thought it was one consumer per partition. After seeing your reply, I found out that it's one consumer per partition within a consumer group, where each consumer group keeps its own offset. Thanks!!
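To make the consumer-group point concrete, a hedged kafka-python sketch (topic name, broker address, and group id are placeholders): run it twice with different group ids and both copies receive every record from the same partition, because offsets are tracked per consumer group, not per partition.

```python
from kafka import KafkaConsumer

# Run this script twice with different group_id values: both copies see every
# top-bid record. Two consumers sharing a group_id would instead split partitions.
consumer = KafkaConsumer(
    "top-bid",                      # placeholder topic name
    bootstrap_servers="localhost:9092",
    group_id="auction-server-7",    # each subscribing server uses its own group id
    auto_offset_reset="latest",
)

for record in consumer:
    print(record.partition, record.offset, record.value)
```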
@hbhavsi 5 days ago
Congrats on now 55K. Your videos are among the best out there. I love how you segue from one video to another, building upon one, ending at challenges, and talking about solutions in the next video.
@jordanhasnolife5163 4 days ago
Thanks man!!
@ged9925 5 days ago
Can you do one on Palantir Foundry?
@jordanhasnolife5163 4 days ago
I'll probably have to do quite a bit of googling here, but if it's open source then hopefully! I'm surprised, though; I assumed anything out of Palantir would be very closed source.
@asning97 5 days ago
Good video. Snowflake uses FoundationDB for its cloud services/metadata layer which might be interesting for you to look at. FDB provides serializable isolation and ACID transactions while also being massively horizontally scalable which makes it fantastic for building distributed systems on top of, however it requires a lot of customization and ops work to run at that scale and there aren’t managed services for it afaict. CockroachDB has some similar value propositions. I think those kinds of new SQL DBs would be a great topic for your viewers
@jordanhasnolife5163 4 days ago
Thanks for the insight there! I imagine it's another one of these DBs with consensus built in
@cunningham.s_law 5 days ago
The Arrow network format, the Parquet file format, and DuckDB are all interesting pieces of engineering.
@jordanhasnolife5163 4 days ago
Hoping to do all of those for sure!
@LoneStarExplorerTX 5 days ago
Thank you so much for the video, God bless you.
@ShreyasGaneshs 5 days ago
Do a video on TigerBeetle - they are doing some really cool stuff with Viewstamped Replication that you might enjoy looking into.
@jordanhasnolife5163 4 days ago
Haven't heard of it, thanks for the suggestion!
@NeyazShafi 5 days ago
For the record, 2 minutes is a really long time. I'm with you Jordan.
@jordanhasnolife5163 4 days ago
We're not together anymore
@belikeamitesh 6 days ago
Bro, the first 20 seconds 😂. What a start - jokes apart, amazing info, thanks for sharing.
@storiesaudio-x5c 6 days ago
Along with user ID and IP, you can also add in device data for a more precise system: MAC address (random MACs on new phones are a troublesome thing), phone make, model, device ID, screen resolution, processor, storage, and other device specs. A combination of these three signals can make it more precise, especially on CGNAT networks (the same IP for a whole neighbourhood - common with ISPs in Asia).
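A toy sketch of folding those signals into a single fingerprint (field names and choice of attributes are illustrative; real systems fuzzy-match rather than exact-match, since attributes drift):

```python
import hashlib

def device_fingerprint(user_id: str, ip: str, device: dict) -> str:
    """Hash a stable subset of device attributes together with user id and IP."""
    parts = [
        user_id,
        ip,
        device.get("make", ""),
        device.get("model", ""),
        device.get("screen_resolution", ""),
        device.get("os_version", ""),
    ]
    return hashlib.sha256("|".join(parts).encode()).hexdigest()

fp = device_fingerprint("u42", "203.0.113.9",
                        {"make": "Pixel", "model": "8", "screen_resolution": "1080x2400"})
print(fp[:16])  # same account + network + device attributes -> same fingerprint
```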
@jordanhasnolife5163 4 days ago
Nice insight, thank you!
@storiesaudio-x5c 3 days ago
@@jordanhasnolife5163 I guess Twitter does this to detect bots using VMs to create new accounts for spamming.
@ibrahimmalik4155 6 days ago
Dude you're simply awesome. Keep up the great work!
@smkhan007 7 days ago
Is it a good idea to connect the client directly to the server's queue?
@jordanhasnolife5163 4 days ago
I think it really depends on how many of them there are. I agree with you - in this case no; it's revised in the 2.0 design.
@hijonk9510 7 days ago
man told me to crack open a red bull for a 12 minute video
@jordanhasnolife5163 4 days ago
Ok I guess you can pop a Viagra next time
@chaitanyatanwar8151 7 days ago
Thank you!
@lucasparzych5195 7 days ago
MySQL was chosen over Mongo in this video because (paraphrasing) "it's simpler and a document data model is not necessary". I'm confused about one point, though, and wondering if anyone could elaborate on it.

It's been my understanding that most relational databases (MySQL included) do not offer native sharding capabilities, because you can't really provide ACID transactional guarantees when your data is distributed across multiple machines. Yet MySQL is chosen here on the basis that sharding is supported. I assume there are popular extensions for MySQL that enable sharding (I think there are for Postgres at least). Is the assumption that we'd be using one of those extensions?

I also understand that we could implement "sharding" entirely in the application layer with a (somewhat) simple function like `determineWhichDatabaseToConnectToBasedOnShortUrl(shortUrl)`. Then we've got all the operational complexity of provisioning and managing the separate databases to deal with, though.

So I go back to the question: why not just pick MongoDB, which uses single-leader replication, supports sharding natively, and has unique indexes for our short_url column? Postgres is normally my go-to database, but I've never needed sharding. My experience with Mongo is pretty limited, so I may just be giving it too much credit. Maybe both present operational challenges.
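For what it's worth, the application-layer version is usually just a stable hash of the short URL modulo the shard count, mapped to a connection string; the real cost is the ops work of running N databases. A sketch of the hypothetical routing function mentioned above:

```python
import hashlib

SHARD_DSNS = [  # hypothetical connection strings, one per MySQL shard
    "mysql://shard0.internal/tinyurl",
    "mysql://shard1.internal/tinyurl",
    "mysql://shard2.internal/tinyurl",
]

def determine_shard_dsn(short_url: str) -> str:
    # Stable hash (not Python's built-in hash(), which is salted per process).
    h = int(hashlib.sha1(short_url.encode()).hexdigest(), 16)
    return SHARD_DSNS[h % len(SHARD_DSNS)]

print(determine_shard_dsn("abc123"))  # every app server routes "abc123" to the same shard
```

This is also the niche that middleware like Vitess (for MySQL) or the Citus extension (for Postgres) fills; either way, each short-URL lookup touches a single shard, which is what makes sharding a relational store workable for this access pattern.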
@abcxyz9637 7 days ago
For the selection sort, the last line should be `swap(lst[i], lst[minIndex])`
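For reference, a corrected selection sort with that swap as the last line of the outer loop (a sketch of the fix, not the exact code from the video):

```python
def selection_sort(lst: list) -> list:
    for i in range(len(lst)):
        min_index = i
        for j in range(i + 1, len(lst)):
            if lst[j] < lst[min_index]:
                min_index = j
        lst[i], lst[min_index] = lst[min_index], lst[i]  # swap(lst[i], lst[minIndex])
    return lst

print(selection_sort([5, 2, 4, 1, 3]))  # [1, 2, 3, 4, 5]
```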
@jordanhasnolife5163 4 days ago
Thanks!
@adw6579 8 days ago
I watch all these videos and then I go to my sh*tty job where I create full-stack CRUD apps, using none of my knowledge.
@jordanhasnolife5163 4 days ago
Too real
@VishalBhartivisu 8 days ago
Just awesome dude.. keep it coming...
@user-sx4wm5ls5q 8 days ago
Thanks for the video. One question: what's the line between the "topic subscription DB" and the "notification polling service" in the final diagram?
@jordanhasnolife5163 4 days ago
If we're gonna poll for notifications, we need to know which topics a user is subscribed to.