The Party Must Go on - Resume Pods After Spot Instance Shut Down - Muvaffak Onuş, QA Wolf

  1,510 views

CNCF [Cloud Native Computing Foundation]


Spot instances are about 60% cheaper, but they shut down frequently, and not every application is resilient enough to handle a shutdown without data loss, especially long-running jobs like automated QA tests or data processing pipelines. What if you could migrate your container to another node with near-zero downtime when a shutdown signal is received? At QA Wolf, we rely heavily on spot instances for their cost-effectiveness, but the failures caused by shutdowns were significant enough for our customers to notice. We built a Kubernetes controller that orchestrates the snapshot and recovery of containers from failing nodes onto another node, where they can resume from the same state. In this talk, we will start with a demo, dive deep into the underlying mechanisms, and see how much one can save in which scenario.
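To make the "how much one can save" question concrete, here is a minimal sketch of a first-order cost model. It is not taken from the talk: only the 60% discount comes from the description above, while the hourly price, the interruption rate, and the rework hours per interruption are made-up assumptions. The model also ignores interruptions that land during rework, so treat it as a rough estimate only.

```python
from dataclasses import dataclass


@dataclass
class JobCostModel:
    """First-order cost estimate for one long-running job (all inputs hypothetical)."""

    on_demand_hourly: float        # assumed on-demand price per hour
    spot_discount: float           # 0.60 mirrors the "about 60% cheaper" figure
    job_hours: float               # length of one uninterrupted run
    interruptions_per_hour: float  # assumed average spot interruption rate

    @property
    def spot_hourly(self) -> float:
        return self.on_demand_hourly * (1.0 - self.spot_discount)

    def on_demand_cost(self) -> float:
        return self.on_demand_hourly * self.job_hours

    def spot_cost(self, rework_hours_per_interruption: float) -> float:
        # Expected number of interruptions over the nominal run time.
        expected_interruptions = self.interruptions_per_hour * self.job_hours
        # Every interruption adds the hours of work that must be redone.
        billed_hours = self.job_hours + expected_interruptions * rework_hours_per_interruption
        return self.spot_hourly * billed_hours


# Hypothetical scenario: a 10-hour job and one expected interruption per run.
model = JobCostModel(
    on_demand_hourly=1.00,
    spot_discount=0.60,
    job_hours=10.0,
    interruptions_per_hour=0.1,
)

# Without snapshots, assume an interruption costs half the job on average.
restart_cost = model.spot_cost(rework_hours_per_interruption=5.0)
# With snapshot and resume, assume only a few minutes are lost.
resume_cost = model.spot_cost(rework_hours_per_interruption=0.05)

if __name__ == "__main__":
    print(f"on-demand:             {model.on_demand_cost():.2f}")
    print(f"spot, restart on loss: {restart_cost:.2f}")
    print(f"spot, resume on loss:  {resume_cost:.2f}")
```

Under these assumed numbers, the gap between the restart and resume cases is the share of the spot discount that resumable pods would recover; with short jobs or rare interruptions the gap shrinks accordingly.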

Comments: 1
@tomaszsuchorowski9325, 6 months ago
Great talk! It made me wonder why cloud providers expose all those Spot instance interruptions to end users. The nodes are actually VMs, and today's virtualization technology can move an entire VM from one hardware server to another. It's called live migration, and when I first saw it executed in a data center long ago it felt like magic: my ssh session wasn't interrupted, nor were the processes I had started in the VM, and I basically didn't even notice I had been moved to other hardware. It looks like live migration at the process level is a lot harder to achieve, which is why we don't have live migration for Pods yet?