A Crucial Topic for Sr SWE Interviews - Partial Failures

Views: 4,159

1 day ago

Comments: 7

@igorrybalka2611 11 months ago

Thanks for the video! Can you also elaborate on what happens when the coordination service itself fails? Is there a reconciliation process running in the background that compensates for this?

@pchandu1995 2 years ago

Where and how are you learning all of this information? Which books have you read and worked through?

@SDFC 2 years ago

There's a list of books that I use for learning about system design in the #read-me channel of the discord group (discord.gg/YvDx9BdZtv). Here's a handful of the book recommendations in there:

- Designing Data Intensive Applications by Martin Kleppmann
- System Design Interview (Volume 1) by Alex Xu
- System Design Interview (Volume 2) by Alex Xu
- Site Reliability Engineering: How Google Runs Production Systems
- The Site Reliability Workbook: Practical Ways to Implement SRE

I recommend that first book the most. Also, note that these are the same books that are in the pictures and banner of my channel; those are real pictures from my apartment 🙂

As for the source for this being a crucial topic for Sr SWE interviews, I'm relying a fair amount on my own experience. While at Amazon for over 3 years, I frequently heard a saying that "junior engineers design for the happy path while senior engineers design for the end of the world" or something like that. I also noticed in my own system design interviews that the interviewer would tend to ask "what happens if this piece here fails?" if I didn't deep dive into such scenarios proactively (and you should be discussing them proactively, because then you are "leading" the interview, which is something that some companies like Facebook look for, from what I've heard).

Feel free to contribute any sources of your own if you have more comprehensive or conflicting info from somewhere! I'm always looking to expand my own list of resources 🙂

@John-nhoJ 1 year ago

Could you please explain how the Kafka offset will allow you to determine which events were removed from the topic AND successfully processed vs. which events were removed from the topic BUT POTENTIALLY lost by the consumer?

@SDFC 1 year ago

Kafka doesn’t “remove” messages from the topic when they are consumed; that’s what message queues do. I imagine this is related to the concern of implementing “exactly once processing” when “exactly once delivery” is impossible. The offset is maintained by the consumer, and there’s no means of differentiating a message that was picked up and failed before the processing work from one that was picked up and processed prior to a failure before updating the offset. The solution is to make the write operations idempotent, so that when the consumer retries messages that were possibly lost, it’ll de-dupe off the primary key.
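To make the retry-and-de-dupe idea above concrete, here is a minimal sketch rather than real Kafka client code. It assumes an in-memory list as a stand-in for a topic partition and a SQLite table as the downstream store; `LOG`, `consume`, `crash_before_commit_at`, and the `payments` table are all hypothetical names made up for illustration.

```python
import sqlite3

# Hypothetical stand-in for a topic partition: messages stay in the log after
# being read; the consumer only tracks how far it has gotten.
LOG = [
    {"event_id": "e1", "amount": 10},
    {"event_id": "e2", "amount": 20},
    {"event_id": "e3", "amount": 30},
]


def make_db():
    """Create the downstream store, keyed by the event's unique ID."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE payments (event_id TEXT PRIMARY KEY, amount INTEGER)")
    return db


def consume(db, committed_offset, crash_before_commit_at=None):
    """Process messages from committed_offset onward; return the new committed offset.

    If crash_before_commit_at is set, simulate a crash that happens after the
    write for that offset but before the offset itself is committed.
    """
    for offset in range(committed_offset, len(LOG)):
        msg = LOG[offset]
        # Idempotent write: a replayed event_id collides with the primary key
        # and is silently skipped instead of creating a duplicate row.
        db.execute(
            "INSERT OR IGNORE INTO payments (event_id, amount) VALUES (?, ?)",
            (msg["event_id"], msg["amount"]),
        )
        db.commit()
        if offset == crash_before_commit_at:
            return committed_offset  # the write landed, the offset update did not
        committed_offset = offset + 1
    return committed_offset


db = make_db()
# First run: e1 is fully handled; e2 is written, then the consumer "crashes".
offset = consume(db, 0, crash_before_commit_at=1)
# Restart from the last committed offset: e2 is processed again, then e3.
offset = consume(db, offset)
rows = db.execute("SELECT event_id, amount FROM payments ORDER BY event_id").fetchall()
```

After the restart, `e2` is processed a second time, but the table still holds one row per event, so the consumer never needs to know whether the first attempt succeeded.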

@shmubledore 1 year ago

I'm a little confused by ZooKeeper calling the databases on behalf of the write capture service. Is that a correct understanding? Could you please give a hint about how it works technically? I thought ZooKeeper is just a strongly consistent KV store, so it looks like some parts are missing from the diagram.

@donotreportmebro 1 year ago

It's a similar diagram to the one in the hotel booking video. I think it was actually supposed to be a coordination / booking service that uses ZooKeeper as a distributed lock to isolate transactions.
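A rough sketch of that reading, where the booking service is the one that talks to the databases and the lock service only hands out locks. This is not ZooKeeper client code: `LockService` is a hypothetical in-process stand-in built on `threading.Lock`, the lock path and `BookingService` are made up for illustration, and a real ZooKeeper lock (typically built on ephemeral sequential znodes) also has to deal with session expiry, which this ignores.

```python
import threading
from contextlib import contextmanager


class LockService:
    """Hypothetical in-process stand-in for a ZooKeeper-backed lock."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    @contextmanager
    def lock(self, key):
        # Look up (or create) the lock for this key, then hold it for the block.
        with self._guard:
            key_lock = self._locks.setdefault(key, threading.Lock())
        with key_lock:
            yield


class BookingService:
    """The coordination service: it performs the database writes itself."""

    def __init__(self, locks, rooms_available):
        self.locks = locks
        self.inventory = dict(rooms_available)  # stand-in for an inventory DB
        self.bookings = []                      # stand-in for a bookings DB

    def book(self, room_type, guest):
        # Only one caller at a time may check and update a given room type.
        with self.locks.lock(f"/locks/rooms/{room_type}"):
            if self.inventory.get(room_type, 0) <= 0:
                return False
            self.inventory[room_type] -= 1
            self.bookings.append((room_type, guest))
            return True


service = BookingService(LockService(), {"king": 3})
results = []


def worker(i):
    results.append(service.book("king", f"guest-{i}"))


# Ten concurrent booking attempts compete for three rooms.
threads = [threading.Thread(target=worker, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the availability check and the decrement happen under the same lock, exactly three of the ten attempts succeed and the room count never goes negative.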