101 Ways to Crash Your Cluster [I] - Marius Grigoriu & Emmanuel Gomez, Nordstrom

  6,704 views

CNCF [Cloud Native Computing Foundation]

Running a Kubernetes cluster requires operating many components. One must be good at running and scaling etcd, multiple control plane components, a monitoring system, a logging pipeline, Docker, rkt, and Linux itself. And this list isn't even close to being complete. With such a long list of technologies comes the potential to make a mistake that brings the whole cluster down. Come hear war stories from Nordstrom's Kubernetes cluster admins. Each is a true story of how the cluster melted down, how they recovered, and what they did to prevent it from happening again. Don't let any of these happen to you...
About Emmanuel Gomez
Emmanuel initiated and served as tech lead on the Kubernetes platform efforts at Nordstrom for the last three years. He was working with and advocating for containers before the Kubernetes 1.0 release and has continuously (and tirelessly) developed, operated, educated, and led containerization efforts there.
This work has forced him to grapple with many of the challenges that come along with the opportunities of containers and container scheduling, both technical (e.g., complex distributed systems, microservices observability) and organizational (e.g., inertia, fragmentation, training). Despite these experiences, he wouldn't trade the new problems back for the old.
About Marius Grigoriu
Marius Grigoriu leads the teams responsible for all of the major tools along the software delivery pipeline: issue tracking, version control, continuous integration and deployment, and production through the use of Kubernetes. His focus is to help teams ship high-quality systems on time, on budget, and with a smile.
Off the job, Marius can still be found at the keyboard, whether writing Golang or playing classical piano.
Join us for KubeCon + CloudNativeCon in Barcelona May 20 - 23, Shanghai June 24 - 26, and San Diego November 18 - 21! Learn more at kubecon.io. The conference features presentations from developers and end users of Kubernetes, Prometheus, Envoy and all of the other CNCF-hosted projects.

Comments: 3
@vrush151, 2 years ago
Awesome stuff
@vernetto, 6 years ago
Awesome presentation. All IT managers should watch this to understand how important it is to have a team of well-trained, specialized professionals to operate the cluster.
@gauravkodmalwar2109, 4 years ago
Nice presentation, very useful for me. I have a question about the Flink services. Were they in autoscale mode? I guess some Flink properties are fixed, like the number of task managers and the number of task slots, and the job manager needs to know them. So any change in the number of task managers or task slots due to scaling down or up, without updating the corresponding properties in the job manager, would cause problems when the Flink job manager allocates resources.