Thanks for the great video! Question: would it make sense to use a time-series DB instead of Cassandra for the Product Price database? If not, why not? Thanks!
@Kevin-jt4oz · 1 year ago
Why do you use CDC so much? In what scenarios won't CDC work? Is it scalable?
@SDFC · 1 year ago
I used to have the same concerns about CDC/DB triggers, but a principal engineer from Amazon (Steve Huynh from A Life Engineered) told me they're perfectly fine in the way DynamoDB handles them. One of my viewers also recently pointed out that Debezium (a popular CDC tool) is built on top of Kafka (its connectors run on Kafka Connect), so it's definitely going to scale just fine as well.

*"Why do you use CDC so much?"* Because it handles partial-failure scenarios very cleanly. The alternative is for the upstream service to make a pair of API calls itself, and unless it's set up for perpetual retries (a particularly bad idea if it's handling calls from end users), you'd end up in an invalid state whenever the second call gets dropped. You could instead put a message into Kafka and have task runners at the other end perpetually retry the pair of API calls, but that adds an extra component or two beyond what you'd need with CDC/DB triggers.

*"What are some scenarios where CDC won't work?"* Anything that needs the upstream and downstream databases to be strongly consistent with each other. CDC/DB triggers basically implement what you'd call a "saga", a form of distributed transaction that relies on eventual consistency. Accomplishing a distributed transaction without relaxing the consistency level is a tremendous challenge. My current favorite approach is to avoid the "distributed" part entirely by partitioning the two tables on the same partition key, so that the matching records in both tables can be updated in a transaction that involves only a single DB node. (You can see this in my coverage of the flash sale problem; it's a trick I learned from a comment on my hotel booking video.)
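A minimal Python sketch of the co-partitioning trick described above, under stated assumptions: a hypothetical 4-node cluster that routes rows by a stable hash (`zlib.crc32`) of the partition key, and hypothetical `inventory` and `orders` tables that are both partitioned by `item_id`. Because the same node owns both rows, the stock decrement and the order insert can happen in one local transaction instead of a saga. It only models the routing in a single-threaded process; a real database would supply the actual atomicity and isolation.

```python
import zlib
from dataclasses import dataclass, field

NUM_NODES = 4  # hypothetical cluster size


def node_for(partition_key: str) -> int:
    """Route a key to a node with a stable hash, like a hash-partitioned DB."""
    return zlib.crc32(partition_key.encode("utf-8")) % NUM_NODES


@dataclass
class Node:
    # Two logical tables that live on the same physical node.
    inventory: dict = field(default_factory=dict)  # item_id -> stock
    orders: dict = field(default_factory=dict)     # order_id -> item_id

    def local_transaction(self, item_id: str, order_id: str) -> bool:
        """Decrement stock and record the order on ONE node.

        In this single-threaded toy either both writes happen or neither does;
        no cross-node coordination (2PC or a saga) is involved.
        """
        stock = self.inventory.get(item_id, 0)
        if stock <= 0:
            return False  # sold out: nothing written to either table
        self.inventory[item_id] = stock - 1
        self.orders[order_id] = item_id
        return True


cluster = [Node() for _ in range(NUM_NODES)]


def place_order(item_id: str, order_id: str) -> bool:
    # Both tables are partitioned by item_id, so one node owns both rows.
    return cluster[node_for(item_id)].local_transaction(item_id, order_id)
```

For example, after setting `cluster[node_for("item-42")].inventory["item-42"] = 1`, the first `place_order("item-42", ...)` call succeeds and a second one returns `False` without writing an order. Partitioning orders by `item_id` rather than `order_id` is the design choice that makes this work; looking up an order by `order_id` alone would then need a secondary index or a scatter-gather query.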
@machsagel4701 · 10 months ago
@SDFC It might be interesting to cover the trade-offs of going event-driven (i.e., emitting a "price updated" event and using a message broker to propagate it downstream) vs. CDC.
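To make the partial-failure argument from the earlier reply concrete, here is a toy, in-memory Python simulation (all class and function names are hypothetical). `dual_write` models the upstream service making both calls itself and giving up when the second one fails; `cdc_consume` models a consumer that tails the primary store's change log and retries each entry until the downstream accepts it. The change log stands in for a real CDC source such as a database's write-ahead log or a DynamoDB stream; this is a sketch of the idea, not how Debezium or DynamoDB Streams are implemented.

```python
from dataclasses import dataclass, field


@dataclass
class PriceDB:
    """Toy primary store whose writes also append to a change log (the CDC source)."""
    prices: dict = field(default_factory=dict)
    change_log: list = field(default_factory=list)

    def write(self, product_id: str, price: float) -> None:
        # In a real DB the row and its log entry are committed together;
        # here both happen inside one method call.
        self.prices[product_id] = price
        self.change_log.append((product_id, price))


class FlakyAlertService:
    """Hypothetical downstream service that fails its first `failures` calls."""

    def __init__(self, failures: int):
        self.failures = failures
        self.seen: dict = {}

    def update(self, product_id: str, price: float) -> None:
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("downstream unavailable")
        self.seen[product_id] = price  # idempotent upsert, safe to replay


def dual_write(db: PriceDB, svc: FlakyAlertService, product_id: str, price: float) -> None:
    """Upstream makes both calls itself; if the second fails, it is simply lost."""
    db.write(product_id, price)
    try:
        svc.update(product_id, price)
    except ConnectionError:
        pass  # no perpetual retry in a user-facing request path


def cdc_consume(db: PriceDB, svc: FlakyAlertService, offset: int = 0, max_attempts: int = 10) -> int:
    """Replay the change log from `offset`, retrying each entry until it lands.

    Returns the new offset; a real consumer would persist it durably.
    """
    while offset < len(db.change_log):
        product_id, price = db.change_log[offset]
        for _ in range(max_attempts):
            try:
                svc.update(product_id, price)
                break
            except ConnectionError:
                continue  # a real consumer would back off before retrying
        else:
            return offset  # give up for now; resume from this same entry later
        offset += 1
    return offset
```

With a downstream that fails once, `dual_write` leaves the new price in the database but never delivers it downstream, while `cdc_consume` delivers every logged change once the downstream recovers. The cost, which is part of the trade-off asked about above, is that the downstream is only eventually consistent and must tolerate replays (hence the idempotent upsert). A broker-based event design gets the same guarantee only if the event is published reliably together with the DB write, for example via a transactional outbox.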
@donotreportmebro · 11 months ago
- Scraping needs work, IMHO: how do you discover new products (tons of new products appear on Amazon daily)? And how do you front-load products before starting regular scraping?
- The product update TPS is incorrect, IMHO: it's 1 million products updated once per day, but you want to update them quickly rather than spreading/scheduling the updates lazily over 24 hours, so it could be 1 million QPS. (As a user, I don't want to wait 24 hours to get an alert; the product might be out of stock because other customers got alerts 24 hours earlier from other price trackers.)
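The required scrape rate depends entirely on the refresh window, which is worth making explicit: spreading 1,000,000 products evenly over a day needs about 11.6 requests per second, finishing within an hour needs about 278/s, within a minute about 16,667/s, and 1 million QPS corresponds to refreshing every product within one second. A tiny helper (hypothetical name `required_qps`) for that arithmetic:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def required_qps(num_products: int, window_seconds: float) -> float:
    """Average scrape rate needed to refresh every product once within the window."""
    return num_products / window_seconds


# Same 1M products, different freshness targets.
for label, window in [("1 day", SECONDS_PER_DAY), ("1 hour", 3600), ("1 minute", 60), ("1 second", 1)]:
    print(f"{label:>8}: {required_qps(1_000_000, window):,.1f} req/s")
```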
@Gerald-iz7mv · 1 year ago
Why do you scrape all products? If there are only a few products users are interested in (the products they've subscribed to for updates), I would rather scrape only those instead of all Amazon products... OK, it makes sense: we need a price history for all products so the user can set the right target price; otherwise the user wouldn't know what a good target price is.
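A minimal sketch of why the full price history is useful, assuming each product's scraped prices are kept oldest to newest in a plain list (the function name `target_price_hints` is hypothetical): summarizing the history gives the user reference points such as the all-time low and the typical (median) price when choosing a target.

```python
from statistics import median


def target_price_hints(price_history: list[float]) -> dict:
    """Summarize a product's scraped price history (oldest first) for target-setting."""
    if not price_history:
        raise ValueError("no price history for this product yet")
    return {
        "all_time_low": min(price_history),
        "median": median(price_history),
        "current": price_history[-1],  # most recent scrape
    }
```

For a history of `[20.0, 18.0, 25.0, 19.0]` this returns an all-time low of 18.0, a median of 19.5, and a current price of 19.0. Without history (for a product nobody had subscribed to yet), none of these reference points would exist.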
@robbieskonieczny9464 · 11 months ago
Agreed, this makes sense. Maybe you could have 2 different scrapers: highly subscribed URLs are revisited more frequently (maybe within hours of a price drop), while low-subscribed URLs are visited less often, every XYZ hours. I also assume you could probably scrape Amazon to figure out how frequently items sell and factor that in.
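A minimal sketch of the tiered-scraping idea, using a min-heap keyed on each URL's next due time. The tier thresholds (1,000 and 10 subscribers) and intervals (1, 6, and 24 hours) are made-up placeholders, as are all names; the sales-frequency signal mentioned above could be folded into `scrape_interval` the same way.

```python
import heapq
from dataclasses import dataclass, field

HOUR = 3600


def scrape_interval(subscribers: int) -> int:
    """Hypothetical tiers: more subscribers -> shorter revisit interval (seconds)."""
    if subscribers >= 1000:
        return 1 * HOUR
    if subscribers >= 10:
        return 6 * HOUR
    return 24 * HOUR


@dataclass(order=True)
class ScrapeTask:
    next_run: float  # the only field used for heap ordering
    url: str = field(compare=False)
    subscribers: int = field(compare=False)


class ScrapeScheduler:
    """Min-heap of URLs ordered by when each one is next due."""

    def __init__(self) -> None:
        self._heap: list[ScrapeTask] = []

    def add(self, url: str, subscribers: int, now: float = 0.0) -> None:
        heapq.heappush(self._heap, ScrapeTask(now, url, subscribers))

    def due(self, now: float) -> list[str]:
        """Return every URL due at `now` and reschedule each by its tier."""
        ready: list[ScrapeTask] = []
        while self._heap and self._heap[0].next_run <= now:
            ready.append(heapq.heappop(self._heap))
        for task in ready:
            task.next_run = now + scrape_interval(task.subscribers)
            heapq.heappush(self._heap, task)
        return [task.url for task in ready]
```

With one URL at 5,000 subscribers and one at 1, both are due at time 0; after that the popular one comes due every hour and the unpopular one only once a day. Rescheduling from `now` (rather than from the previous due time) keeps a scraper that fell behind from bursting through a backlog of missed runs.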