Distributed Systems 5.1: Replication

Views: 46,836

Martin Kleppmann

A day ago

Comments: 37

@TruongHoang-du9if 4 years ago

This lecture cleared up the vagueness I had while reading Designing Data-Intensive Applications.

@_Mindfield 15 hours ago

@martin at 22:30 these are two different clients. In the lecture on logical clocks, the counter/vector was managed on the server side. How would this be done on the client side?

@Tatiana-zs3dc 10 months ago

What a great lecturer! THANK YOU for explaining everything so clearly !!!!

@zhou7yuan 3 years ago

Replication [0:06]
vs RAID [3:11]
Retrying state updates [4:25]
(twitter negative count example) [6:54]
Idempotence [8:08]
Choice of retry semantics [9:25]
Adding and then removing again [10:50]
f(g(f(x))) ≠ g(f(x))
Another problem with adding and removing [13:01]
Timestamps and tombstones [15:07]
x → "false" invisible (tombstone) [17:13]
record has logical timestamp of last write [17:43]
Reconciling replicas [17:59]
Concurrent writes by different clients [19:51]
2 common approaches [21:28]
Last writer wins (LWW): Lamport clock. data loss
Multi-value register [23:22]: vector clock. preserve both {v1,v2} if t1||t2
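To make the "timestamps and tombstones" and "reconciling replicas" items concrete, here is a minimal sketch. It assumes each replica is a plain dict and that timestamps are integers; the function names `write`, `remove` and `reconcile` are made up for illustration and are not from the lecture.

```python
# Each replica maps key -> (timestamp, value, visible).
# A removal writes a tombstone (visible=False) instead of deleting the record.

def write(replica, key, ts, value):
    replica[key] = (ts, value, True)

def remove(replica, key, ts):
    replica[key] = (ts, None, False)  # tombstone, kept so the removal can propagate

def reconcile(a, b):
    """For each key, both replicas keep the record with the larger timestamp."""
    for key in set(a) | set(b):
        candidates = [r[key] for r in (a, b) if key in r]
        newest = max(candidates, key=lambda rec: rec[0])
        a[key] = b[key] = newest

A, B = {}, {}
write(A, "x", 1, "v1")   # the write reaches both replicas
write(B, "x", 1, "v1")
remove(A, "x", 2)        # the removal only reaches A
reconcile(A, B)
print(B["x"])  # (2, None, False)
```

Without the tombstone, A would simply have no record for x, and reconciliation could not tell "x was removed on A" apart from "x never arrived at A".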

@yangchen392 3 years ago

I just read the Amazon Dynamo paper, and its Section 4.4 gives a real example of using the multi-value register approach to resolve inconsistencies. Very interesting!
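A rough sketch of the multi-value register idea, for readers who want to see it in code. This is not Dynamo's implementation; it assumes vector timestamps are dicts keyed by hypothetical client IDs ("c1", "c2"), and `happens_before` and `write` are names invented here.

```python
def happens_before(t1, t2):
    """True if vector timestamp t1 is strictly less than t2."""
    nodes = set(t1) | set(t2)
    le = all(t1.get(n, 0) <= t2.get(n, 0) for n in nodes)
    lt = any(t1.get(n, 0) < t2.get(n, 0) for n in nodes)
    return le and lt

def write(register, ts, value):
    """Drop stored versions that the new write supersedes; keep concurrent ones."""
    survivors = [(t, v) for (t, v) in register if not happens_before(t, ts)]
    # Ignore the new write if a stored version is already at least as new.
    if not any(happens_before(ts, t) or t == ts for (t, _) in survivors):
        survivors.append((ts, value))
    return survivors

reg = write([], {"c1": 1}, "v1")
both = write(reg, {"c2": 1}, "v2")              # concurrent with v1
merged = write(both, {"c1": 2, "c2": 1}, "v3")  # written after seeing both

print([v for _, v in both])    # ['v1', 'v2']
print([v for _, v in merged])  # ['v3']
```

The register holds both v1 and v2 while their timestamps are concurrent, and collapses back to a single value once a later write causally follows both.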

@mostinho7 11 months ago

8:20 example of an idempotent operation: instead of incrementing the like count, you add the user to the set of likers. That way, if the response to the request is lost and the user tries again, the like is not counted twice.
9:40 retry semantics (at most once, at least once, exactly once; the last requires a deduplication cache or an idempotent operation like adding a user to the set of likers for a post)
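A small sketch of the difference, assuming a hypothetical in-memory `Post` class (the class and method names are illustrative only):

```python
class Post:
    def __init__(self):
        self.like_count = 0   # counter representation (not idempotent)
        self.likers = set()   # set representation (idempotent)

    def increment_likes(self):
        # Applying this twice, e.g. after a retry, double-counts.
        self.like_count += 1

    def add_liker(self, user_id):
        # Adding the same user again leaves the set unchanged.
        self.likers.add(user_id)


post = Post()
for _ in range(2):  # original request plus one retry after a lost response
    post.increment_likes()
    post.add_liker("alice")

print(post.like_count)   # 2
print(len(post.likers))  # 1
```

With the set representation, at-least-once delivery plus the idempotent update gives the effect of exactly-once.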

@baay81 3 years ago

I have a question with regard to the example at 11:47. Are client1 and client2 two separate client processes of the same user? Otherwise why would g:unlike cause the removal of client1's user id from the set?

@hongminhle5429 3 years ago

I have the same question! How can client2 remove client1's like if they are different users? Hope to see his answer.

@alomarrajawat8264 2 years ago

@hongminhle5429 They are for the same user, i.e. {userId}. The same user can use multiple clients (phone, computer), and that's where idempotence fails.
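The failure can be reproduced in a few lines. This sketch assumes the set of likers is a frozenset and uses a single hypothetical user ID "u1"; f is the like from one client and g is the unlike from the same user's other client.

```python
def f(likers, user_id="u1"):
    """Like: add the user to the set (idempotent on its own)."""
    return likers | {user_id}

def g(likers, user_id="u1"):
    """Unlike: remove the user from the set."""
    return likers - {user_id}

x = frozenset()
intended = g(f(x))       # like, then unlike
with_retry = f(g(f(x)))  # the like is retried after the unlike was applied

print(intended)    # frozenset()
print(with_retry)  # frozenset({'u1'})
```

The retried f is harmless when nothing happens in between, but once g has been applied the retry silently undoes it, so f(g(f(x))) ≠ g(f(x)).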

@satinek 2 years ago

Thank you very much for this great lecture. I had difficulties with this topic, but thanks to your videos it is going much better now.

@_Mindfield 15 hours ago

At 17:53 the client is generating logical timestamps, let's say vector ones. What if there are multiple clients, e.g. 2 instances of a service connected to the database? How is that managed?

@_Mindfield 15 hours ago

On second thought, the above approach makes sense when I have a single master and multiple replicas, and that master manages the logical clock, so all updates are issued by a single client. Does this slide cover that scenario?

@2tce 2 years ago

In Lamport timestamps, t1 < t2 does not imply that the event at t1 happened before the event at t2. Vector timestamps would have been better here, since they capture causal ordering and let you detect concurrent events. Atomic clocks, as in Google's Spanner, can achieve this as well.

@rvender 10 months ago

CMIIW, but the image shows that the client sends the message with t1 before the one with t2, so Lamport timestamps can still be used here because they come from the same node? Enlighten me if I am wrong.

@BharCode09 A year ago

Hi Martin. Thanks for uploading the lecture series. A GOLD MINE! I have read your book DDIA as well. For me, both your lectures and the book make the complexities of distributed systems much easier to understand than any of the other so-called simplified books written for practicing developers/engineers.
I have a query @21:32 about concurrent writes by clients and using logical clocks to reconcile them. Here we are considering the clients' logical clocks (like request message times (events)), and these will differ between 2 different clients. When using the clients' logical clocks, can those clocks be compared at all, whether or not the writes are concurrent? Unless the replicas themselves have some way of tracking their own logical time (events within their node), there is no way of reconciling the state by comparing the clients' logical times, right? t1 and t2 are incomparable irrespective of their values (i.e. even if t1 > t2 or t2 > t1), so we can't say one happens before the other.
A better way would be for the nodes to have their own logical clocks and perhaps increment them before writing their state upon client requests. Would that work? Or, better, sync clocks via NTP across nodes A and B, and then maybe some comparison makes sense, ignoring the skew and propagation delay?
Also, with the vector clock approach, don't we need an entry for each client in the vector, because we are considering the clients' logical time? Please correct me if my understanding is wrong.
Again, thanks a ton for this lecture series. By far the best distributed systems videos I have ever encountered!

@tianwenchu1663 A year ago

Where does the client initially get T1 and T2 as vector clocks? In the previous lecture, these vectors are derived on each node. So for every write request, the client would actually read first to get the vector clock and then use that vector clock in the write. If using a quorum read, the client can pick the largest vector clock, or pick one at random if several incomparable/concurrent winners coexist.

@meamea5127 3 years ago

Thank you merci gracias Grazie obrigada 谢谢 شكرا لك ありがとうございました 감사합니다 Спасибо Danke schön

@austinoquinn815 A year ago

In the case of LWW using Lamport clocks, what if two commits happen concurrently and have the same clock value? How is this resolved, or do we assume this is unlikely to ever happen?

@abcdef-fo1tf A year ago

So I see replicas can reconcile at 18:30, but what if users read from the faulty node before reconciling? They can issue further commands based on wrong data from node B, for example, leaving both node A and node B in bad states. This can lead to very bad outcomes, like a person withdrawing more money from a bank than they have. I guess my question is: how do we deal with the race conditions that we normally get in concurrent systems?

@SemihSahinCS A year ago

The consistency/quorum/consensus videos should cover this.

@qianyaohao596 2 years ago

This really helped me understand how they resolve inconsistencies! Thank you so much for sharing this video.

@algorithmimplementer415 4 years ago

Hi Martin, thanks for your videos. Very much enjoyed. Can we get the slides of your lectures? Are they uploaded somewhere?

@revaluacion 3 years ago

See the description of the video. There you'll find the links.

@prabhatsharma284 3 years ago

One question for everyone: where can I apply all this knowledge of distributed systems, given that I'm simply a software developer? I hope to build something as cool as my own database, but that's the goal, not the start.

@arnv4487 A month ago

Amazing video!

@blocksmithbrothers 3 years ago

First, thanks for the great lecture, but I have a question about concurrent write conflict resolution. You mentioned that for LWW we can use a Lamport clock, but the issue is that the Lamport clocks of client 1 and client 2 are both initialized to 0 and there is no explicit communication between them, hence t1 = t2. How will the conflict be resolved then?

@SemihSahinCS A year ago

The idea here is to avoid inconsistencies between the two DBs. Since update1 and update2 are concurrent, we can't really know which one was first, and we actually do not care. As long as the DBs end up consistent when there are concurrent updates, we are fine. You can think of the logic that the DBs implement to set the final value as: value = t1 > t2 ? v1 : v2, which gives us the consistency.
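As a sketch of that rule, assuming each record is a (timestamp, value) pair and `lww_merge` is a made-up helper name:

```python
def lww_merge(rec_a, rec_b):
    """Keep the record with the larger timestamp (last writer wins)."""
    return rec_a if rec_a[0] > rec_b[0] else rec_b

w1, w2 = (1, "v1"), (2, "v2")

# The two replicas receive the concurrent writes in opposite orders.
node_a = lww_merge(w1, w2)
node_b = lww_merge(w2, w1)

print(node_a)  # (2, 'v2')
print(node_b)  # (2, 'v2')
```

Both replicas converge on the same value regardless of arrival order, and v1 is silently discarded, which is the data loss mentioned for LWW. Note that the convergence relies on the timestamps being distinct: with equal timestamps this particular sketch would depend on argument order.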

@nikolay6700 4 years ago

Can we have a case where t1 == t2 with Lamport timestamps? What should we do in this case?

@jcdyer3 4 years ago

If I understand correctly, that can only happen if the events are not causally related, and if that's the case either you don't care, or if you do, you have to disambiguate, often by secondarily sorting on the client ID (because all the events that happen on a given client are causally related).

@TruongHoang-du9if 4 years ago

If you have seen the previous lecture about logical clocks, Martin said that in such a case we would compare the names of the nodes.
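One way to picture that tie-break is to treat the timestamp as a (counter, node name) pair and compare it lexicographically. The class below is only an illustration; the field names and the node names "client1"/"client2" are assumptions.

```python
from typing import NamedTuple

class LamportTimestamp(NamedTuple):
    counter: int
    node: str  # hypothetical node/client name, used only to break ties

t1 = LamportTimestamp(3, "client1")
t2 = LamportTimestamp(3, "client2")

# Tuples compare field by field: the counter first, then the node name.
print(t1 == t2)     # False
print(max(t1, t2))  # LamportTimestamp(counter=3, node='client2')
```

Because node names are unique, two timestamps from different nodes are never equal, so every replica picks the same winner even when the counters collide.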

@2tce 2 years ago

@jcdyer3 Well, this still sounds like there could be intentional data loss. I am not sure why we can't combine logical clocks with atomic clocks. Then, in the sort of cases where we are not able to determine the total ordering of concurrent messages, we could fall back on the timestamp from the atomic clock. Google's Spanner just uses atomic clocks.