Don't miss out! Join us at our next Flagship Conference: KubeCon + CloudNativeCon North America in Salt Lake City from November 12 - 15, 2024. Connect with our current graduated, incubating, and sandbox projects as the community gathers to further the education and advancement of cloud native computing. Learn more at kubecon.io
Thanos Receiver Deep Dive - Joel Verezhak, Open Systems
One of the key strengths of Thanos is its ability to ingest metrics via remote write from many different sources simultaneously. However, tuning these metric receivers for stability is notoriously tricky, as it is in any system where one must juggle hashrings. Given the critical role these components play, it is essential to understand how to run them in a stable manner. In this talk, we will candidly describe some of our past incidents and explain how each one shaped our current approach to running the metric receivers in Kubernetes. We will explain how to achieve a setup that is stable under scheduled rollouts and node restarts, and present our attempts to make the receivers self-healing. As a bonus, we will also describe a new and surprising failure mode that knocked out almost all of our supposedly hard tenants in one go.