Dissecting GitHub Outage - Master failover failed

1,562 views

Arpit Bhayani

A day ago

Comments: 18
@AkshayKumar-yc2ls 2 years ago
This is just a brilliant dissection! I've seen countless creators making content about software engineering, but you outdo them all. The passion you show while explaining is contagious. Looking forward to learning more from you 😊
@karthikdinne1078 2 years ago
Thanks a lot, Arpit, for all the effort you are putting in. Hope you continue doing this. A small token of appreciation from my side - Karthik Dinne :)
@AsliEngineering 2 years ago
Thank you so so so much Karthik. Means a ton :)
@d4devotion 2 years ago
I am laughing, because during a beer party my friend and I suddenly started discussing master failover, and we had no idea how to make sure you don't lose data if the new master also crashes while serving write requests. You explained it so simply that a five-year-old could follow it. Great job, dude.
@AsliEngineering 2 years ago
You folks talk about Master failover when drunk 🤣🤣 insane!!
@d4devotion 2 years ago
@AsliEngineering :D I'm laughing again because I saw you posted this on LinkedIn. Now how do I comment there and tell the world that we are those folks :D :D Great to see that, bro.
@shinnosukenohara1201 2 years ago
Saw a similar incident last year on one of the products I was working on. You really explained it very well in simple words.
@AbhijeetSachdev 4 months ago
Thank you! Very good video. [Question] Some critical information is missing: what happens if an update lands in the 6-second window on the new master, and then, as soon as we switch back to the old master, another update happens to the same row on the old master? How do we handle this scenario? The WAL entry on the new master is no longer valid because it is stale, so we cannot blindly replay the new master's WAL. One possible approach I can think of: apply the WAL first, then switch back to the old master.
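To make the scenario in this question concrete, here is a minimal, hypothetical sketch (plain Python, with made-up row values and timestamps) of why the new master's WAL from the 6-second window cannot be replayed blindly once the same row has been updated again on the old master:

```python
# Hypothetical illustration of the conflict described above.
# During the ~6-second window the new master accepted a write; after failback
# the old master accepted a newer write to the same row.

wal_from_new_master = [
    {"row_id": 42, "value": "written-on-new-master", "ts": 100.0},  # inside the 6-sec window
]

old_master_rows = {
    42: {"value": "written-on-old-master-after-failback", "ts": 106.5},  # newer write
}

# Blind replay overwrites the newer value with the stale one:
for entry in wal_from_new_master:
    old_master_rows[entry["row_id"]] = {"value": entry["value"], "ts": entry["ts"]}

print(old_master_rows[42]["value"])  # "written-on-new-master" -- the newer update is lost
```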
@kushalkamra3803 2 years ago
Awesome! Thanks for sharing
@sritejaparimi6605 2 years ago
Hi Arpit, when we switch back from the new master to the old one after the new master crashes, do we first sync the binlog and then start serving traffic on the old master? Until the binlog is synced, the old master shouldn't be serving traffic, right?
@AsliEngineering 2 years ago
Great point. But during an outage you do whatever it takes to accept new writes, which is why a common strategy is to switch first and then sync. In an ideal world you would sync first and then switch.
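A minimal sketch of the two orderings described in this reply, using hypothetical helpers (route_writes_to, replay_binlog) that stand in for whatever the real failover tooling does; this is an illustration of the trade-off, not GitHub's actual procedure:

```python
# Hypothetical stand-ins for real failover tooling (orchestrator, proxy config, etc.).
def route_writes_to(host):
    print(f"writes now routed to {host}")

def replay_binlog(target, binlog_entries):
    print(f"replaying {len(binlog_entries)} binlog entries onto {target}")

def sync_then_switch(old_master, missed_binlog):
    """Ideal-world ordering: no writes are lost, but the write outage lasts longer."""
    replay_binlog(old_master, missed_binlog)  # catch the old master up first
    route_writes_to(old_master)               # only then accept new writes

def switch_then_sync(old_master, missed_binlog):
    """Outage ordering: accept writes immediately, reconcile the missed window later."""
    route_writes_to(old_master)               # stop the bleeding: writes flow again
    replay_binlog(old_master, missed_binlog)  # then carefully sync the missed window
```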
@sritejaparimi6605 2 years ago
Got it, thank you so much Arpit! It would be really great if you could expand on how switch-and-sync works internally. What if, after switching, we get updates to the rows that were written during the 6-second window on the (crashed) new master? When we sync, we have to be careful not to overwrite those updates, right? Would we have to check a timestamp or something else to avoid that?
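One hedged way to realise the "check a timestamp" idea from this comment is a last-write-wins guard during replay: skip any stale entry whose row has since been updated more recently on the serving master. This is only a sketch, assuming every row carries a reliable updated-at timestamp (and that clocks are trustworthy enough to compare), which is itself a non-trivial assumption:

```python
# Hypothetical last-write-wins replay: apply an entry from the crashed master's
# binlog/WAL only if the serving master has not updated that row more recently.

def replay_with_timestamp_guard(serving_rows, stale_entries):
    """serving_rows: {row_id: {"value": ..., "ts": ...}} on the master now taking writes.
    stale_entries: entries captured on the crashed master during the missed window."""
    for entry in stale_entries:
        current = serving_rows.get(entry["row_id"])
        if current is not None and current["ts"] >= entry["ts"]:
            continue  # the serving master already has a newer write; keep it
        serving_rows[entry["row_id"]] = {"value": entry["value"], "ts": entry["ts"]}
    return serving_rows

# Example: row 42 was updated again after failback, so its stale entry is skipped.
rows = {42: {"value": "newer-write-on-old-master", "ts": 106.5}}
stale = [{"row_id": 42, "value": "write-from-6-sec-window", "ts": 100.0},
         {"row_id": 7,  "value": "write-from-6-sec-window", "ts": 101.0}]
print(replay_with_timestamp_guard(rows, stale))
```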
@dhruvilshah9098 2 years ago
Really loving your content, want to see more and more. If possible, make a video on the Cloudflare outage and Uber's database change from PostgreSQL to MySQL. Thanks...
@AsliEngineering 2 years ago
Thanks. The Cloudflare outage video is coming soon. I am trying to wrap my head around it, diving really deep to understand what exactly happened so that I can explain it in simpler language :)
@dhruvilshah9098 2 years ago
@AsliEngineering Thanks a lot
@LeoLeo-nx5gi 2 years ago
Hi Arpit bhaiya, I wanted to know: are all these things handled only by the SRE team engineers? Sorry, I am a fresher so I don't know much. Can someone from another team also get to know about this or contribute to it?
@AsliEngineering 2 years ago
Not really. Even backend engineers do this. It depends on the company though.
@LeoLeo-nx5gi 2 years ago
@AsliEngineering Ohh, great