"Testing Distributed Systems w/ Deterministic Simulation" by Will Wilson

43,797 views

Comments: 8

@jonwatte4293 6 years ago

I remember the SPIN network protocol simulator from Bell Labs in the early '90s. It was used at then-AT&T to predict and avoid bugs in telecom equipment systems. Sounds very similar in approach: simulate first, detect protocol problems (deadlocks, livelocks, bad states, etc.), and then implement once you get it right.

@outwithrealitytoo 10 days ago

It does not entirely check for "thread safety", but testing for unexpectedly out-of-order events is certainly a starter for ten. In the olden days, working on radio comms software, state/event matrices forced devs to think about all eventualities, but those were simpler systems. When you have multiple processes speaking to each other, each with its own state machine, the number of states of the system as a whole grows exponentially as you add processes, and bad things can happen. Part of the argument for distributed systems is that each small part can be easily understood and tested fully. However, by splitting a complex system you often create a system with far, far more states, many of which are never considered, used, or tested. Deterministic simulation will test some of these situations, but not all of them. It will be helpful for debugging, but it is best not to have a bug in the first place. And all this is before people start inaccurately and incompatibly reflecting the same state in multiple processes. For this reason my advice has always been "do not create another process or another thread unless absolutely necessary. It may be complicated, but splitting it up unnecessarily will only make that worse; it just won't be obvious until it is too late".
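To put a rough number on the growth described above, here is a minimal sketch. It assumes every process has the same number of local states and that any combination of local states is reachable, which makes it an upper bound rather than a measurement of any real system. The function name and the figure of 5 states per process are made up for illustration.

```python
def global_state_count(states_per_process: int, processes: int) -> int:
    """Upper bound on combined system states, assuming independent processes."""
    return states_per_process ** processes


# Hypothetical example: each process has a 5-state state machine.
for n in (1, 2, 4, 8):
    print(f"{n} processes -> up to {global_state_count(5, n)} combined states")
```

Going from one process to eight takes the bound from 5 states to 390,625, which is why many of them end up never being considered or tested.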

@LewisCampbellTech 1 year ago

5 minutes into this talk and my mind is already blown.

@syzer3921 10 years ago

can u fix the sound?

@waynemokane 3 years ago

What if the invariant you want to test isn't so invariant after all, but actually depends on what happened during the simulation? Ex: if some message gets dropped, then I wouldn't expect the final state to have key X. Is this feasible? You would need to make the simulator dynamically update the expectation based on what random thing it breaks. It seems like doing that would likely require reimplementing a lot of the state management logic of the system under test itself.
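One possible shape for this, as a sketch only and not something taken from the talk: the simulator keeps a small reference model and updates it only for messages it actually delivered, so the expectation follows the injected faults. Everything here (`KVStore`, `run_simulation`, the drop rate) is hypothetical and written for illustration.

```python
import random


class KVStore:
    """Hypothetical stand-in for the system under test."""

    def __init__(self):
        self.data = {}

    def handle_put(self, key, value):
        self.data[key] = value


def run_simulation(seed, num_messages=20, drop_rate=0.3):
    rng = random.Random(seed)  # same seed -> same sequence of drops
    system = KVStore()
    expected = {}  # reference model: what the final state should be
    dropped = []   # log of the faults the simulator injected

    for i in range(num_messages):
        key, value = f"k{i}", i
        if rng.random() < drop_rate:
            dropped.append(key)  # fault injected: message never arrives
            continue
        system.handle_put(key, value)
        expected[key] = value  # expectation updated only for delivered messages

    return system.data, expected, dropped


actual, expected, dropped = run_simulation(seed=42)
assert actual == expected                         # final state matches the model
assert all(key not in actual for key in dropped)  # dropped keys never appear
```

Even in this toy the model duplicates the store's logic line for line, which is the cost the comment is pointing at. For a real system the model would need to be much simpler than the implementation to be worth maintaining.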

@mortenbrodersen8664 10 years ago

Also, tools like TLA+ are made to solve these problems. The difference being that TLA+ uses logic to verify all traces *exhaustively*, without taking millions of years to do so. Running a sim will only explore a tiny percentage of possible event traces.
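For a sense of scale behind "a tiny percentage", here is a rough counting sketch. It assumes each process emits a fixed number of events in a fixed local order and that events from different processes can interleave freely. That is a simplification, and the function name is made up for illustration.

```python
from math import factorial


def interleavings(processes: int, events_per_process: int) -> int:
    """Count the global orderings that preserve each process's local event order."""
    total = processes * events_per_process
    # Multinomial coefficient: total! / (events_per_process!) ** processes
    return factorial(total) // factorial(events_per_process) ** processes


print(interleavings(2, 2))   # two processes, two events each
print(interleavings(3, 10))  # three processes, ten events each
```

Under these assumptions the count grows factorially with the number of events, so any fixed number of simulation runs covers a shrinking fraction of the possible orderings.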

@bgianf 10 years ago

TLA+ will only get you proof of your algorithms, you must still verify your implementation...

@09goral 4 years ago

It’s also not verifying everything exhaustively. It picks some subset of all the possibilities.