The Party Must Go on - Resume Pods After Spot Instance Shut Down - Muvaffak Onuş, QA Wolf

  1,510 views

CNCF [Cloud Native Computing Foundation]

1 day ago

Spot instances are about 60% cheaper, but they shut down frequently, and not every application can handle a shutdown without data loss, especially long-running jobs such as automated QA tests or data processing pipelines. What if you could migrate your container to another node with near-zero downtime when a shutdown signal is received? At QA Wolf, we rely heavily on spot instances for their cost-effectiveness, but the failures caused by shutdowns were significant enough for our customers to notice. We built a Kubernetes controller that orchestrates the snapshot and recovery of containers from failing nodes onto another node, where they can resume from the same state. In this talk, we will start with a demo, dive deep into the underlying mechanisms, and see how much one can save in which scenario.
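
To make the "how much one can save in which scenario" question concrete, here is a minimal back-of-the-envelope sketch. It is not the model used in the talk. The 60% discount comes from the description above; the function names, the interruption counts, and the "hours lost per interruption" figures are hypothetical inputs chosen for illustration. The idea is that every interruption forces some work to be redone, and resuming from a snapshot shrinks that redone work.

```python
def spot_job_cost(on_demand_hourly, job_hours, spot_discount=0.60,
                  interruptions=0.0, lost_hours_per_interruption=0.0):
    """Expected cost of one job on spot capacity (illustrative model only).

    interruptions: expected number of spot shutdowns during the job (assumed).
    lost_hours_per_interruption: work redone after each shutdown (assumed);
    large when the job restarts from scratch, small when it resumes
    from a snapshot on another node.
    """
    spot_hourly = on_demand_hourly * (1.0 - spot_discount)
    billed_hours = job_hours + interruptions * lost_hours_per_interruption
    return spot_hourly * billed_hours


def savings_fraction(on_demand_hourly, job_hours, **spot_kwargs):
    """Fraction saved relative to running the same job on demand."""
    on_demand_cost = on_demand_hourly * job_hours
    spot_cost = spot_job_cost(on_demand_hourly, job_hours, **spot_kwargs)
    return 1.0 - spot_cost / on_demand_cost


if __name__ == "__main__":
    # Hypothetical 10-hour job at 1.00 per on-demand hour, interrupted twice.
    restart = savings_fraction(1.0, 10.0, interruptions=2,
                               lost_hours_per_interruption=5.0)
    resume = savings_fraction(1.0, 10.0, interruptions=2,
                              lost_hours_per_interruption=0.1)
    print(f"restart from scratch: {restart:.1%} saved")
    print(f"resume from snapshot: {resume:.1%} saved")
```

Under these assumed numbers, redoing five hours of work after each interruption erodes most of the discount, while losing only a few minutes per interruption keeps the savings close to the nominal 60%. The model ignores customer-visible failures, which the description names as the main motivation.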

Comments: 1
@tomaszsuchorowski9325 6 months ago
Great talk! It made me wonder why cloud providers expose all those spot instance interruptions to end users. The nodes are actually VMs, and today's virtualization technology can move an entire VM from one hardware server to another. It's called live migration, and when I first saw it executed in a data center long ago it felt like magic: my SSH session wasn't interrupted, nor were the processes I had started in the VM. I basically didn't even notice I had been moved to other hardware. It looks like live migration at the process level is a lot harder to achieve, hence we don't have live migration for Pods yet?