Thank you for the summary. Funnily, I just recently started getting deep into AWS, with availability/redundancy getting repeatedly hammered into my head. And then this. Fabulous timing.
@Mattias9876543210 3 years ago
You're very welcome! It's very enlightening to experience things like this, directly, isn't it? A key thing I think you should take away is that architecting resilient systems is not just about blindly following some pattern but rather about understanding all sides of the complicated issues and making intentional tradeoffs.
@xammocoloniax 3 years ago
@@Mattias9876543210 Well put, sir. Similarly, my last place's motto is 'everything is insecure', and they don't rely on canned checklist certifications of anything; those are only starting points. Get in there, test and verify. Anytime a vendor claims something is 'proprietary' and untestable and says to just trust their internal operating standards, alerts are raised. (Yes, they are anally open-source.)
@garysutcliffe9770 3 years ago
From this and previous AWS outages, I see the need for better manual processes and override procedures for when automation fails, or can't deal with the unforeseen event. But as has been said, still better off with AWS than without.
@Mattias9876543210 3 years ago
Hello, Gary! I certainly see what you mean. One of the complicating factors with this outage, though, is that because the AWS *control plane* was impacted, it was *not* possible in the moment to make *manual* changes to systems; things needed to have been set up *in advance* to react *automatically.* That said, it is also possible to devise certain *external* control systems for an automatic system to key off of, and maybe this is the type of emergency escape hatch that you mean we need to prioritize. 👍 See this tweet from AWS Distinguished Engineer / VP Colm MacCárthaigh: bit.ly/33P4rZb
@pritamsingh8950 3 years ago
There have been frequent public events reported on the AWS status page over the past couple of months, but this one really pulled the pin. Hopefully something better will come out of it in the long term. As you said, we are far better off with AWS than without it.
@alvinvaughn6531 3 years ago
Outstanding analysis!
@Mattias9876543210 3 years ago
Thank you! 😁
@himanshuamodwala 3 years ago
Can you please share the list of books I see on the table?
@Mattias9876543210 3 years ago
Glad to! Here's a clearer image of the books: imgur.com/a/WbSUC9K Let us know which ones interest you the most!
@jerilnadar1938 3 years ago
@@Mattias9876543210 I bet you didn't finish Cormen yet :)
@Mattias9876543210 3 years ago
@@jerilnadar1938 Hahaha! Cover to cover? Goodness no! 😝 But it has been an absolutely wonderful reference. :) By the way, they've said that they will be releasing a 4th edition in March, 2022--with updates including machine learning and online algorithms!
@jerilnadar1938 3 years ago
@@Mattias9876543210 That would be something to look forward to. Always loved the book.
@thegrumpydeveloper 3 years ago
Lol I came here for the us-west-2 but stayed for the us-east-1. Good coverage!
@Mattias9876543210 3 years ago
😂🔥 And I think we're now officially allowed to blame *Scott* for what happened to the us-west regions: kzbin.info/www/bejne/nKXPZqSZnLh8i5Y
@robertcorbin2749 3 years ago
Hmmm. Is this a canary in the coal mine? Is AWS cheaper than on prem until it’s not???
@kellymoses8566 3 years ago
When AWS has a major outage, they can call hundreds of people to fix it. Most companies can't do that.
@gordonfung149 3 years ago
Let's face it. Multi-region failed. That's why we need multi-cloud. Expensive? Depends on how important your apps and customers are.
@JAM4111 3 years ago
My T-Mobile phone would not make calls during the outage. I learned that T-Mobile is one of AWS's biggest customers and runs their everything. Until it doesn't. I read that Kronos, a large payroll and HR services company, was also down. Did paychecks not go out? Some pretty serious consequences unless you have a Plan B.
@Mattias9876543210 3 years ago
That's interesting that your phone had an issue! I wonder if it was *caused* by the outage or just a coincidence that it happened at the same time? And did Kronos get impacted by the AWS outage, too? I know that they were more recently hit by a ransomware attack that has taken them offline for reportedly weeks, but I didn't know more than that. Regardless, you are absolutely right that a plan B is critical! This outage has taught a lot more people that Everything Fails, All The Time! 😝
@2112jonr 3 years ago
It's mostly DNS.
@Mattias9876543210 3 years ago
I mean, it usually *is,* yes! 😝 I still plan to get a nice printout and frame that DNS haiku!
@shyammohabir8283 3 years ago
Speaking of outages, AWS actually demonstrated why "Go global in minutes" is really bad for you, and that their selling points of Reliability, Elasticity, High Availability, Increased speed and agility are just that... Marketing!
@Mattias9876543210 3 years ago
Nah. They are marketing points because they are valid. Think about it this way... You could choose to either: 1) Use the cloud and ship quickly and bring value to your customers right away. Or 2) Spend a year or three extra to do it on your own and delay all that customer (and business) value for the hope (definitely not guaranteed) of achieving better reliability than the cloud offers. If you look at it another way, all the extra time you spend trying to build a non-cloud system is effectively additional downtime--likely *years* of downtime before you even launch. And that's not even talking about the cost to build. There's no question that there are tradeoffs to be made--because nothing is perfect--but I still stand by my comments at 9:17 and on. :-) Oh, and also don't miss the Bare Metalsson video! kzbin.info/www/bejne/d4nbo2OmnM2KiZI
@JustinH95 3 years ago
Have fun making another video today lol
@jeetsg36 3 years ago
This is an inside job by AWS... Why? Because now AWS will again ask clients to build redundancy across multiple regions, so clients have to pay more.
@amjds1341 3 years ago
Another one happened today
@SharadTalekar 3 years ago
Looks like it's down again.
@LaVidaEnUnaGota 3 years ago
Multicloud :)
@2112jonr 3 years ago
Twice as many people. Twice as costly. And Microsoft have WAY more outages than AWS. And then there's Microsoft's security incident history. AWS still has much less downtime than on-premise.