Congrats on 55K now. Your videos are among the best out there. I love how you segue from one video to another, building on the last, ending with challenges, and covering the solutions in the next video.
@jordanhasnolife5163 2 months ago
Thanks man!!
@art4eigen93 1 year ago
Congratulations on 10K, Jordan!
@vankram1552 1 year ago
Congrats on your data-intensive applications YouTube channel
@jordanhasnolife5163 1 year ago
Thank you! Martin Kleppmann has no idea
@rebirthless1 5 months ago
9:25 I'm guessing the result of the table-table join is used within the consumer itself instead of being externalized, and the consumer consumes other streams in addition to these 2 CDC streams that need the joined result, except they're not drawn in the diagram?
@jordanhasnolife5163 5 months ago
Not necessarily! You may just sink the result elsewhere. That being said, it's certainly possible this join could in turn be used with a third stream.
@sushantsrivastav 1 year ago
I noticed that a majority of your designs lean heavily towards the Kafka + Flink combination. Is this a personal choice, or do you run your designs by senior engineers? Don't get me wrong, these designs are not textbook-y; they are real and heavily tend towards what is "in" now (as against Alex Xu's designs, which seem, for lack of a better word, a bit dated). I have 17 years of experience in the industry (I am a fossil), but I get to learn many things from your discussions. This is incredibly rare for someone with 2 years of experience. Thanks for everything that you do!
@jordanhasnolife5163 1 year ago
Mainly a personal choice, but I experience the pitfalls of using non-replayable message brokers every day at work, so I'm definitely pro log-based MQs and stateful consumers whenever I notice the opportunity to use them. That being said, I understand that in practice this may have a high cost to implement due to storage and/or latency, and many people might compromise and opt for fewer partitions/simpler solutions. My designs are generally pretty idealized and not at all optimized for cost, which is why you don't see stuff like this too much IRL.
@sushantsrivastav 1 year ago
@@jordanhasnolife5163 On the contrary, I honestly believe these *are* real world and not textbook-y and cookie-cutter like "Grokking". If someone were to build these systems in 2023, they would choose this tech, as against, say, 2017-18, when "system design" questions became mainstream.
@bokistotel 6 months ago
I watched this video again and I don't understand how frequently the consumer should fetch/process elements from the demographics queue. If I got this right, each time the consumer processes an element from the demographics queue, a new in-memory copy of the database is created. So, for example, if the current state is sent to the queue every time the DB is updated, the in-memory database in the consumer would change frequently. My question is: when a new "search term" arrives at the consumer, how should we handle the case where the in-memory database is mid-update? The first thing that pops into my head is to give events from the demographics queue higher priority on the consumer, so that if the consumer has something from the demographics queue, it processes that first, and only then processes the potential join. And if anything from the demographics queue takes precedence over the search-term queue, we would have to wait until the demographics queue is empty; once it's empty, the database state is consistent, and then we perform the join?
@jordanhasnolife5163 6 months ago
There needs to be some sort of locking so that they aren't both grabbing the demographic table at the same time. That being said, there's not really any concept of determinism here. We just process events as they come in, no need for a "priority" or anything like that.
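The locking described above could be sketched roughly like this: a minimal, illustrative stateful consumer whose in-memory demographics table is guarded by a lock, with events from both streams handled simply in arrival order (every name and schema here is hypothetical, not from the video):

```python
import threading

class EnrichmentConsumer:
    """Minimal sketch of a consumer joining a search-term stream against an
    in-memory demographics table fed by CDC. All names are illustrative."""

    def __init__(self):
        self.demographics = {}        # user_id -> demographic row
        self.lock = threading.Lock()  # serializes table reads and writes

    def on_demographic_event(self, event):
        # A CDC event updates the in-memory copy of the demographics table.
        with self.lock:
            self.demographics[event["user_id"]] = event["row"]

    def on_search_term(self, event, emit):
        # No priority between streams: events are processed as they come in.
        # The lock only prevents a read racing a concurrent table update.
        with self.lock:
            row = self.demographics.get(event["user_id"])
        emit({"term": event["term"], "demographics": row})
```

Usage would just be wiring each queue's events into the matching handler; if the demographic row hasn't arrived yet, the join output simply carries `None` for it.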
@LeoLeo-nx5gi 1 year ago
Hi, thanks for covering the issues in depth at the end; I was about to post a comment asking how the in-memory state in all the consumers would work, and more. Just as a side note: if this problem were to be solved without Flink or any such tool, is there any other approach?
@jordanhasnolife5163 1 year ago
I think you'd probably end up reinventing the wheel. Taking distributed snapshots is really expensive, so Flink found a great way to do it without hugely impacting performance.
@LeoLeo-nx5gi 1 year ago
@@jordanhasnolife5163 Makes sense
@shibhamalik1274 10 months ago
Hi @jordan, awesome video! Is there a CDC deep-dive video? Is it a queue push after the db save? And if yes, then it is similar to 2-phase commit, isn't it?
@jordanhasnolife5163 10 months ago
Similar, however keep in mind that the push to the queue is *from* the db! And we don't need that push to happen for the write to be committed to the database. If the db goes down, or can't communicate with the queue, it's ok if we place those writes there later. That's the main difference.
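The ordering described in the reply above can be sketched as a toy simulation (everything here is illustrative, not a real CDC connector): the write commits to the database's log first, and a separate tailer pushes log entries to the queue afterwards. Unlike 2-phase commit, the commit never waits on the queue, and a downed queue just means the tailer catches up later.

```python
db_log = []      # stands in for the database's write-ahead log
queue = []       # stands in for the message queue
tail_offset = 0  # how far the CDC tailer has read into the log

def commit(write):
    # Committing a write depends only on the database, never on the queue.
    db_log.append(write)

def tail_log(queue_available=True):
    # The CDC tailer runs asynchronously after commit.
    global tail_offset
    if not queue_available:
        return  # fine: the log is durable, we catch up on the next pass
    while tail_offset < len(db_log):
        queue.append(db_log[tail_offset])
        tail_offset += 1

commit({"id": 1})
tail_log(queue_available=False)  # queue is down; the commit still succeeded
commit({"id": 2})
tail_log()                       # tailer catches up: both entries pushed
```

In 2PC, by contrast, `commit` would block until the queue also voted to commit, coupling database availability to queue availability.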
@indraneelghosh6607 1 year ago
Would it make sense to query the db as a fallback for data that is not in memory and has no CDC event in the queue for that ID? If you have several TBs of data in the DB, maintaining such a large number of consumers may be rather costly, right (as RAM is expensive)? Is there a more cost-effective solution?
@jordanhasnolife5163 1 year ago
You could theoretically store the Flink state on disk, I believe, but yeah, if latency isn't a main concern you could always just query a db.
@zachlandes5718 1 year ago
Can you review some cases where we'd do table-table joins with streams? Is it mainly to offload work from the db (separation of consumers and db)? Or to improve performance?
@jordanhasnolife5163 1 year ago
If you want realtime joins that actually update for you, it would be useful. Think of a books table and an authors table. They're both big, so you don't want to query them both over and over. Maybe a new book gets added, and it turns out many authors had contributed to it, some of whom were already in the table. Let's do a join. Maybe a new author gets added, and it turns out they contributed to many books in the table. Now we only fetch the rows we need without having to redo a full join.
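The incremental books/authors join from the reply above can be sketched like this (a toy sketch: the schemas, a single-key author-book link, and all names are assumptions for illustration). Each incoming event joins only the new row against the other table, rather than redoing the full join:

```python
from collections import defaultdict

books = {}                            # book_id -> title
authors_by_book = defaultdict(list)   # book_id -> list of author names
joined = []                           # materialized join output

def on_book(book_id, title):
    books[book_id] = title
    # Join just this new book against authors we've already seen.
    for name in authors_by_book[book_id]:
        joined.append((title, name))

def on_author(book_id, name):
    authors_by_book[book_id].append(name)
    # Join just this new author against the book, if we've seen it.
    if book_id in books:
        joined.append((books[book_id], name))

on_author(1, "Kleppmann")       # no matching book yet, nothing emitted
on_book(1, "DDIA")              # book arrives: emits ("DDIA", "Kleppmann")
on_author(1, "Hypothetical Coauthor")  # emits one new row, no full rescan
```

Each event touches only its own key, which is what keeps the join realtime: the work per event is proportional to its matches, not to the size of either table.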
@Rahul-pr1zr 10 months ago
So if the in-memory tables are huge, you mentioned we can partition the incoming data into multiple queues. Does this mean we need to maintain multiple consumers, each consuming from a specific queue?
@jordanhasnolife5163 10 months ago
That's correct!
@bokistotel 6 months ago
@8:05 Are you talking about 2 tables in separate databases?
@jordanhasnolife5163 6 months ago
Yeah
@markelmorehome 5 months ago
I love Martin Kleppmann almost as much as I love you. He'd be a NYT best seller if he added tissue jokes.
@jordanhasnolife5163 5 months ago
lmao imagine
@krish000back 6 months ago
Can you please provide a few real-world examples where these 3 enrichment types are used?
@jordanhasnolife5163 6 months ago
Is that not what I did in the video? You're welcome to check out DDIA if you're looking for something more concrete, it's in the streaming chapter.
@krish000back 6 months ago
@@jordanhasnolife5163 I thought that was just made up (the Google search example), and that in general people pull from the DB itself.
@tamarapensiero8048 1 year ago
congrats on 10k hottie
@jordanhasnolife5163 1 year ago
I literally spontaneously combusted
@pushpendrasingh1819 1 year ago
do you make videos after waking up and smoking one joint?
@jordanhasnolife5163 1 year ago
No I do them right between when I smoke crack and go to bed
@yrfvnihfcvhjikfjn 1 year ago
where do I buy the foot pics?
@jordanhasnolife5163 1 year ago
You don't actually buy them, I just give them out in exchange for job referrals
@salmanrizwan9730 1 year ago
video from jordan finaaaaaaaaaaaaaaaaaaalllllllllyyyyyyyyyy😍😍😍😍😍