Finding the Capacity to Grieve Once More
Alexandros Kosiaris, Wikimedia Foundation
At Wikipedia, we handle unpredictable traffic spikes, especially after notable deaths, which can cause severe outages. Although we believed we had mitigated this issue years ago, a major outage in 2020, caused by a notable death and a DDoS attack, made us realize that our platform needed further improvement. Over the years, we conducted investigations, implemented numerous fixes, and educated new SREs about our platform's unique constraints. Two years ago, following the death of Elizabeth II, our system handled unprecedented traffic without an outage, demonstrating our platform's resilience. This story highlights the infrastructure improvements that allowed us to manage traffic surges, and the emotional journey of regaining the capacity to properly grieve significant losses.
We rely heavily on open source, and our code is public, so our solutions are accessible to everyone.
View the full SREcon24 Europe/Middle East/Africa program at www.usenix.org...