Hey Jordan this is awesome content and you're a great teacher. Listening to it in the gym from India and I can follow everything.
@rashminpatel3716 2 hours ago
Hey Jordan, thanks for the amazing system design video as usual!! I have one doubt about the usage of Flink. Whenever at least one Flink compute node goes down or restarts, the Flink job fails and it has to restore the entire state across all the nodes from S3. This whole restoration process can take a few minutes, so our message delivery will be delayed by that many minutes, affecting the entire user base. Is that understanding correct?
@shashankpratapwar-wj7xl 2 hours ago
I had a hard time reading Alex Petrov's Database Internals book, so I quit after a few days. But this is informative and engaging at the same time. Looking forward to the entire series.
@cdgtopnp 11 hours ago
Chapters please 🙏
@VibhorKumar-uo9dd 16 hours ago
One question regarding the fan-out approach. While pushing posts to each follower, we are pushing them to the particular news feed cache corresponding to that user. My doubt is whether these news feed caches are just another caching layer sharded on user id (let's say 10 caching servers sharded on user id for 100 users), or whether they are specific to the user (100 users, 100 caches in that case)?
@yuxuanche2552 1 day ago
Hi Jordan, just wondering if the mutual connection databases at 14:14 are the same as the mutual cache table?
@jaeorgjaehpaejh 1 day ago
Jordan sat alone in his dimly lit room, eyes fixed on the screen. Kafka Streams flowed before him like a slow, seductive dance. His fingers moved over the keyboard, sending commands that made the data bend to his will: smooth, precise, and totally in his control. “Processing this much data feels like... processing my love life,” he chuckled, leaning in closer. “A little messy at first, but once I get my hands on it, everything falls into place... perfectly.” The logs rolled in real time, a rhythm that matched his heartbeat. Who needed real life when the streams responded to him this way? Controlled. Obedient. Alive. Jordan didn’t just process data; he made it swoon.
@DimitarVandev 1 day ago
cool
@vinaybabu2635 1 day ago
Hey Jordan, do you have any course on Udemy? If not, I highly recommend you do that.
@srishti2k22-iw5dh 2 days ago
great
@rish.i 2 days ago
Thank you Jordan for this series. In the paper, when comparing the 3 partitioning approaches, it's written that for a fixed size, with the 3rd strategy the membership information stored at each node is reduced by three orders of magnitude. Whereas in previous paragraphs, it's mentioned that the 3rd strategy stores not only the hashes of the tokens (servers) as in the 1st, but also which partitions are stored on each node. Isn't that contradictory, or am I misunderstanding something? Ideally the third partitioning scheme should contain more membership information per node, assuming they have not changed request forwarding from O(1) hops to O(log n) DHT routing like Chord or Pastry, where each node stores information about only a limited number of nodes, sacrificing direct hops.
@ankitgomkale11 2 days ago
Hey Jordan, I just wanted to take a moment to express my deep gratitude. I recently received 5 offers from tier-1 companies, including an offer from Google for a Staff Engineer role. Your videos have been an absolute game-changer for me throughout this journey. I can't thank you enough for the insights and guidance you've shared; it's made a world of difference. Please keep up the amazing work, you're truly making an impact!
@skullTT 2 days ago
Some other videos talked about recommendation engines from the ML aspect: content filtering, collaborative filtering. What is the relationship between them and this embedding approach?
@skullTT 2 days ago
Besides the vector DB, which database should we choose for the other data, including the entity history DB and neighbor index? Can we use Cassandra for the history DB, since it is write-heavy and append-only, and a KV store for the neighbor index?
@mostinho7 2 days ago
11:00 good summary. Can use a hash index when you want fast reads and writes, but you can’t do efficient range queries. The hash index is also kept in memory, not on disk.
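A rough sketch of that trade-off, assuming a plain Python dict stands in for the in-memory hash index (the names `put`, `get` and `range_query` are made up for illustration, not taken from the video):

```python
# Toy in-memory hash index. A real storage engine might map each key to a
# disk offset instead of the value itself; that detail is omitted here.
index = {}

def put(key, value):
    """Write: a single hash-table insert, no ordering maintained."""
    index[key] = value

def get(key):
    """Point read: a single hash-table lookup."""
    return index.get(key)

def range_query(lo, hi):
    """Range read: keys are unordered, so every entry has to be checked."""
    return sorted((k, v) for k, v in index.items() if lo <= k <= hi)

put("b", 2)
put("a", 1)
put("d", 4)
in_range = range_query("a", "c")
```

Point lookups touch one bucket, while the range query has to scan the whole index, which is the inefficiency the summary refers to.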
@chaitanyatanwar8151 2 days ago
Thank You!
@chaitanyatanwar8151 2 days ago
Thank you! The videos and the discussions in the comments make this channel the best source for system design.
@XoPlanetI 2 days ago
14:02 Isn't it the same UUID and a different timestamp for replacing the message?
@RS7-123 2 days ago
Another comment on this awesome video after rewatching it a bunch of times. So to conclude, how did you say we achieve idempotency, since there seems to be no best option?
@chaitanyatanwar8151 3 days ago
Thank you!
@abcxyz9637 3 days ago
Assigning N seats in a section to a group of k people, each requesting {s_1, ..., s_k} seats, is a bin-packing problem. Simply allocating seats in FIFO order may lead to unfairness and sub-optimal allocation (people at the end of the list requesting a higher number of seats will be most impacted, and we want to sell as many tickets as possible). Although it's not practically possible to solve bin-packing in real time, a simple optimization would be to maintain the sum of total seats requested by people for every section. In real time, while the bookings are ongoing, that sum must not exceed N. The actual seat allocation may be done offline, e.g., seats can be finalized after the booking closes, and the users can be sent their tickets via mail/phone [AWS SNS] with actual seat numbers.
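The running-sum check described above could look roughly like this. It is a minimal single-process sketch: the class `SectionCapacity` and its method names are hypothetical, and a real booking service would need the check-and-increment to be atomic (for example inside a database transaction).

```python
class SectionCapacity:
    """Tracks the total seats requested in one section; assigns no specific seats."""

    def __init__(self, capacity):
        self.capacity = capacity  # N seats in the section
        self.reserved = 0         # running sum of accepted requests

    def try_reserve(self, seats):
        """Accept the request only if the running sum stays within N."""
        if seats <= 0:
            raise ValueError("seats must be positive")
        if self.reserved + seats > self.capacity:
            return False
        self.reserved += seats
        return True


section = SectionCapacity(capacity=10)
# Requests for 4, 5, 3 and 1 seats arrive in that order.
results = [section.try_reserve(s) for s in (4, 5, 3, 1)]
```

Here the request for 3 seats is rejected because 9 are already taken, while the later request for 1 seat still fits. Which concrete seats each accepted group gets is left to the offline step.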
@guitarMartial 3 days ago
Jordan, would HBase be a better alternative to MySQL here? You get write-ahead-log-style indexing, which can then be leveraged to build our heap as well and expire elements as they are consumed.
@chaitanyatanwar8151 3 days ago
Thank you!
@mayankchhabra3070 3 days ago
If we take the example of building search on top of chats and we partition by chat_id, won't that lead to an uneven distribution of data? Elasticsearch has these shards and tries to distribute the data evenly across all of them, but if we explicitly route our data to a specific shard (using chat_id in our example), it can lead to uneven distribution of data across shards, where one chat might be active and another dormant. Just thinking out loud about how we would solve this :P (Probably distribute it evenly by using some composite key, but that would defeat the purpose of searching a chat from just one partition.)
@hazemabdelalim5432 3 days ago
Why do you hash on the user id instead of just storing the session state in separate storage like Redis? Sticky sessions will not guarantee an even distribution of traffic, and they might be a bottleneck under a very high amount of traffic.
@rashminpatel3716 4 days ago
All those channels are more redundant than a replicated database!! True that 😂
@Jayvil773 4 days ago
Thank you for the series, finally finished it! You're a literal chad, I can only hope to grow up to be like you one day.
@jordanhasnolife5163 3 days ago
You already are brother
@adishgangwal7105 4 days ago
Hi Jordan - in the order service, the second Flink, which is sharded by product ID: how does it know that it has received all the products of the cart before sending the email to the user?
@jordanhasnolife5163 3 days ago
I hadn't really made that my intention, and in the diagram we're willing to send multiple emails. If we wanted to do this, we'd probably have to split the products like we do here, include the original order id and the number of products in each message, and then send to one final Kafka queue + consumer to aggregate on order id
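One way the final aggregation step might look, as a sketch only: the `OrderAggregator` class and the message fields (`order_id`, `product_id`, `total_products`) are assumptions for illustration, and state is kept in process memory rather than in Kafka or Flink.

```python
from collections import defaultdict


class OrderAggregator:
    """Collects per-product messages until every product of an order has arrived."""

    def __init__(self):
        # order_id -> set of product ids seen so far
        self.seen = defaultdict(set)

    def on_message(self, order_id, product_id, total_products):
        """Return True exactly when the order becomes complete."""
        self.seen[order_id].add(product_id)  # a set ignores duplicate deliveries
        if len(self.seen[order_id]) == total_products:
            del self.seen[order_id]          # free state once the email can be sent
            return True
        return False


agg = OrderAggregator()
first = agg.on_message("order-1", "product-a", total_products=2)
duplicate = agg.on_message("order-1", "product-a", total_products=2)
last = agg.on_message("order-1", "product-b", total_products=2)
```

The single email would be sent when `on_message` returns True. Because state is dropped at that point, a message redelivered after completion would start a fresh entry, so a production version would need some record of already completed orders.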
@RS7-123 4 days ago
Great video. Thanks for the incredible contribution. 2 questions: 1) How do you notify users who aren’t online about unpopular posts? I would assume you need some sort of user-specific queue after you fan out. The notifications table is sharded by topic id, so it possibly won’t be queried directly for unpopular posts. 2) How do you notify users who are online about popular posts, since I don’t see it connected to the web socket flow? I assume you expect them to poll periodically to see if they have any popular posts?
@jordanhasnolife5163 3 days ago
1) When they come back online, they'll basically hit a cache of all notifications meant for them specifically (the cache is partitioned by hash of user id), and combine this with popular notifications they were subscribed to. 2) Yep, just polling!
@chaitanyatanwar8151 4 days ago
Thank you!
@Dychord 4 days ago
Cool!
@user-sx4wm5ls5q 5 days ago
Thanks for the great video. One quick question: how can a Kafka queue have multiple consumers, as at 28:36 for the "top-bid kafka queue"? I heard one partition of Kafka can have only one consumer at a time. I guess we need to post the record into multiple partitions? (Which partitions, by the server it will be subscribing to..?) Or realistically, just change the last bit to an in-memory queue like pub/sub?
@jordanhasnolife5163 4 days ago
You can have many consumers on the same partition, you just can't do round robin within a partition
@user-sx4wm5ls5q 4 days ago
@jordanhasnolife5163 Ahh thanks. I had a misconception about Kafka: I thought it was one consumer per partition. After seeing your reply, I found out that it's one consumer per partition within a consumer group, where each consumer group keeps its own offset. Thanks!!
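A toy model of that behaviour, for intuition only. This is not the Kafka client API; the `Partition` class and the group names are invented, and it models just the idea that each consumer group tracks its own offset into the same partition log.

```python
class Partition:
    """Append-only log with one read offset per consumer group."""

    def __init__(self):
        self.log = []
        self.offsets = {}  # group id -> index of the next record to read

    def append(self, record):
        self.log.append(record)

    def poll(self, group):
        """Return records this group has not seen yet and advance its offset."""
        start = self.offsets.get(group, 0)
        records = self.log[start:]
        self.offsets[group] = len(self.log)
        return records


partition = Partition()
partition.append("bid-1")
partition.append("bid-2")

server_a_first = partition.poll("server-a")   # group "server-a" reads both records
server_b_first = partition.poll("server-b")   # group "server-b" also reads both
partition.append("bid-3")
server_a_second = partition.poll("server-a")  # only the new record for "server-a"
```

Giving each server its own consumer group is what lets every server see every record in the top-bid partition, since the groups do not share offsets.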
@hbhavsi 5 days ago
Congrats on 55K now. Your videos are among the best out there. I love how you segue from one video to another, building upon one, ending at challenges, and talking about solutions in the next video.
@jordanhasnolife5163 4 days ago
Thanks man!!
@ged9925 5 days ago
Can you do one on Palantir Foundry?
@jordanhasnolife5163 4 days ago
I'll probably have to do quite a bit of googling here, but if it's open source then hopefully! I'm surprised though, I assumed anything out of Palantir would be very closed source
@asning97 5 days ago
Good video. Snowflake uses FoundationDB for its cloud services/metadata layer, which might be interesting for you to look at. FDB provides serializable isolation and ACID transactions while also being massively horizontally scalable, which makes it fantastic for building distributed systems on top of. However, it requires a lot of customization and ops work to run at that scale and there aren’t managed services for it afaict. CockroachDB has some similar value propositions. I think those kinds of NewSQL DBs would be a great topic for your viewers
@jordanhasnolife5163 4 days ago
Thanks for the insight there! I imagine it's another one of these DBs with consensus built in
@cunningham.s_law 5 days ago
Arrow network format, Parquet file format and DuckDB are all interesting pieces of engineering
@jordanhasnolife5163 4 days ago
Hoping to do all of those for sure!
@LoneStarExplorerTX 5 days ago
Thank you so much for the video, God bless you.
@ShreyasGaneshs 5 days ago
Do a video on TigerBeetle, they are doing some really cool stuff with Viewstamped Replication that you might enjoy looking into
@jordanhasnolife5163 4 days ago
Haven't heard of it, thanks for the suggestion!
@NeyazShafi 5 days ago
For the record, 2 minutes is a really long time. I'm with you Jordan.
@jordanhasnolife5163 4 days ago
We're not together anymore
@belikeamitesh 6 days ago
Bro, 1st 20 sec 😂 What a start. Jokes apart, amazing info, thanks for sharing
@storiesaudio-x5c 6 days ago
Along with user ID and IP you can also add user system data for a more precise system (MAC address (random MAC on new phones is troublesome), phone make, model, device ID, screen resolution, processor, storage, device specs, etc.). A combination of these 3 can make it more precise, especially with CGNAT networks (same IP for a whole neighbourhood, common among ISPs in Asia).
@jordanhasnolife5163 4 days ago
Nice insight, thank you!
@storiesaudio-x5c 3 days ago
@jordanhasnolife5163 I guess Twitter does this to detect bots using VMs to create new accounts for spamming.
@ibrahimmalik4155 6 days ago
Dude you're simply awesome. Keep up the great work!
@smkhan007 7 days ago
Is it a good idea to connect the client directly to the server queue?
@jordanhasnolife5163 4 days ago
I think it really depends how many of them there are. I agree with you, in this case no, revised in 2.0 design
@hijonk9510 7 days ago
man told me to crack open a red bull for a 12 minute video
@jordanhasnolife5163 4 days ago
Ok I guess you can pop a Viagra next time
@chaitanyatanwar8151 7 days ago
Thank you!
@lucasparzych5195 7 days ago
MySQL was chosen over Mongo in this video because (paraphrasing) "it's simpler and a document data model is not necessary". I'm confused about one point, though, and wondering if anyone could elaborate on it. It's been my understanding that most relational databases (MySQL included) do not offer native sharding capabilities because you can't really facilitate ACID transactional guarantees when your data is distributed across multiple computers. Yet MySQL is chosen here on the basis that sharding is supported. I assume that there are popular extensions for MySQL that enable sharding (I think there are for Postgres at least). Is the assumption that we'd be using one of these extensions? I also understand that we could implement "sharding" entirely in the application layer with a (somewhat) simple function like `determineWhichDatabaseToConnectToBasedOnShortUrl(shortUrl)`. Then we've got all of the operational complexity of provisioning and managing the separate databases to deal with, though. So I go back to the question: why not just pick MongoDB instead, which uses single-leader replication, supports sharding natively and has unique indexes for our short_url column? Postgres is actually normally my go-to database but I've never needed sharding. My experience with Mongo is pretty limited so I may be just giving it too much credit. Maybe both present operational challenges.
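For reference, an application-layer routing function of the kind mentioned above might be sketched like this. The host names and the function name `shard_for_short_url` are placeholders, and hash-modulo routing is just one assumed scheme, not something the video specifies.

```python
import hashlib

# Hypothetical shard hosts, for illustration only.
SHARD_HOSTS = [
    "mysql-shard-0.internal",
    "mysql-shard-1.internal",
    "mysql-shard-2.internal",
    "mysql-shard-3.internal",
]


def shard_for_short_url(short_url, hosts=SHARD_HOSTS):
    """Pick a database host from a stable hash of the short URL."""
    digest = hashlib.sha256(short_url.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer, then map it onto the host list.
    shard = int.from_bytes(digest[:8], "big") % len(hosts)
    return hosts[shard]


host = shard_for_short_url("abc123")
```

The function itself is small; as the comment says, the cost is operational. With plain modulo routing, changing the number of shards remaps most keys, which is one reason consistent hashing or a native sharding layer is often preferred.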
@abcxyz9637 7 days ago
For the selection sort, the last line should be `swap(lst[i], lst[minIndex])`
@jordanhasnolife5163 4 days ago
Thanks!
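For anyone following along, here is a sketch of selection sort with the swap placed as the correction above suggests. The video's original code is not shown in this thread, so this is a generic Python version, with the tuple swap standing in for `swap(lst[i], lst[minIndex])`.

```python
def selection_sort(lst):
    """Sort lst in place in ascending order and return it."""
    for i in range(len(lst) - 1):
        min_index = i
        # Find the smallest element in the unsorted suffix lst[i:].
        for j in range(i + 1, len(lst)):
            if lst[j] < lst[min_index]:
                min_index = j
        # Move that element to position i; this is the corrected swap.
        lst[i], lst[min_index] = lst[min_index], lst[i]
    return lst


sorted_example = selection_sort([3, 1, 2])
```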
@adw6579 8 days ago
I watch all these videos and then I go to my shtty job where I create full-stack CRUD apps, using none of my knowledge.
@jordanhasnolife5163 4 days ago
Too real
@VishalBhartivisu 8 days ago
Just awesome dude.. keep it coming...
@user-sx4wm5ls5q 8 days ago
Thanks for the video. One question: what's the line between the "topic subscription DB" and the "Notification polling service" in the end diagram?
@jordanhasnolife5163 4 days ago
If we're gonna poll for notifications we need to know which topic a user is subscribed to