Incident Management (class SRE implements DevOps)

  Views: 44,742

Google Cloud Tech

6 years ago

In the previous video, Liz and Seth discussed how to make systems observable and how observability helps diagnose failing systems, but they didn't cover what to do when an incident grows beyond what one person can handle. In this video, you learn about the most important part of the incident management process: humans.
In the stressful moments of a system failure, it is important to define clear, concise roles for everyone involved in an incident. With too few people, responders quickly become overloaded with work; with too many, work may be duplicated (i.e. too many hands on the keyboard). Learn how SREs effectively manage incidents with clearly defined roles and responsibilities such as the operations lead, planning lead, communications lead, logistics lead, and more. Seth and Liz also discuss techniques for managing long-running and exponentially complex incidents.
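To make the idea of one clear owner per role concrete, here is a minimal sketch in Python. It is not taken from the video: the `Role` enum, the `Incident` class, and the `assign`, `hand_off`, and `unfilled_roles` methods are hypothetical names chosen for illustration, and only the four role names come from the description above. Treating an explicit hand-off as the way to rotate people during a long-running incident is likewise an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    # Role names come from the description; the enum itself is illustrative.
    OPERATIONS_LEAD = "operations lead"
    PLANNING_LEAD = "planning lead"
    COMMUNICATIONS_LEAD = "communications lead"
    LOGISTICS_LEAD = "logistics lead"


@dataclass
class Incident:
    title: str
    # Maps each Role to the single person who currently owns it.
    assignments: dict = field(default_factory=dict)

    def assign(self, role, person):
        """Give a role to one person; refuse to silently replace an owner."""
        current = self.assignments.get(role)
        if current is not None and current != person:
            raise ValueError(f"{role.value} is already held by {current}")
        self.assignments[role] = person

    def hand_off(self, role, new_person):
        """Explicitly transfer a role that is already assigned (assumed practice)."""
        if role not in self.assignments:
            raise KeyError(f"{role.value} is not assigned yet")
        self.assignments[role] = new_person

    def unfilled_roles(self):
        """Return the roles nobody owns yet, in declaration order."""
        return [role for role in Role if role not in self.assignments]
```

Assigning the same role to a second person raises an error instead of quietly producing two owners, which is one way to guard against duplicated work; `unfilled_roles()` shows at a glance where the response is understaffed.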
Reach out to Liz and Seth:
lizthegrey
sethvargo
Watch more episodes from the playlist here → bit.ly/2PPL6f0
Subscribe to the Google Cloud Platform channel for more Cloud content → bit.ly/GCloudPlatform