Zero downtime with Kubernetes was harder than I expected

7,522 views

Web Dev Cody

Day ago

Comments: 32
@SeibertSwirl 6 months ago
Aight babe, we see you 😮‍💨 love ya!
@webstuffzak 6 months ago
Would love a complete beginner's guide to k8s. I can follow all your videos but in this one, I didn't get a single thing you talked about. Love how you explain concepts!
@WebDevCody 6 months ago
At this point I'm just trying to learn as much as I can about k8s so that I can eventually make a longer form crash course for it.
@dandogamer 6 months ago
One thing to keep in mind with k8s is that it's all eventually consistent. Definitely important if you ever want to get into operators
@abhishekmehandiratta4241 6 months ago
One way you can stop traffic in-flow to your old pods as you deploy is to have separate endpoints for readiness and liveness probes, and the moment you receive a SIGTERM, your readiness probe should return a 5XX status code. I think you won't need the time.sleep then. Also, you can implement this check by using a global atomic.Bool and check its value in the readiness probe.
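A minimal Go sketch of the readiness flag described in this comment, assuming a hypothetical `/readyz` endpoint and a `draining` flag (neither name comes from the video). The handler is exercised in-process with `httptest` so the sketch runs without a cluster; in a real service the flag would be set from a SIGTERM handler, and a separate liveness endpoint would keep returning 200.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// draining is a hypothetical global flag; a SIGTERM handler would set it to true.
var draining atomic.Bool

// readyz is a hypothetical readiness endpoint: 200 while serving, 503 once draining.
func readyz(w http.ResponseWriter, _ *http.Request) {
	if draining.Load() {
		http.Error(w, "draining", http.StatusServiceUnavailable)
		return
	}
	fmt.Fprintln(w, "ok")
}

// probe calls the handler in-process and returns the HTTP status code.
func probe() int {
	rec := httptest.NewRecorder()
	readyz(rec, httptest.NewRequest(http.MethodGet, "/readyz", nil))
	return rec.Code
}

func main() {
	fmt.Println(probe()) // 200: pod is ready
	draining.Store(true) // stands in for what the SIGTERM handler would do
	fmt.Println(probe()) // 503: pod should be taken out of rotation
}
```

On its own the flag only tells Kubernetes to stop routing; whether a pause is still needed before stopping the server is what the replies below discuss.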
@WebDevCody 6 months ago
I'm not too sure. I think the issue is when you have a LOT of traffic, the moment you start shutting down your go server it'll no longer accept traffic, BUT k8s might already be routing traffic to the pod while you're turning off its ability to accept requests. I think you'd have to use the SIGTERM to first make the readiness probe fail, but then you'd need to continue listening for all incoming requests on your server until they hit zero for enough elapsed time (this is when you'd know the k8s load balancer updated), and THEN gracefully shut down the server (stop accepting requests and wait for all current requests to finish)?
@abhishekmehandiratta4241 6 months ago
@@WebDevCody Exactly - you need to first make the readiness probe fail so that no more requests are routed to this pod, and then shut down your server via application code so it takes no more requests. If you've configured the readiness probe interval to be 5s, then have a 5s delay between these two steps.
@ultimathule9841 6 months ago
Great video. I saw the same problem happening at my previous company as well. They solved it using a preStop hook that sleeps for 10 seconds. Also just started watching your videos and love your backend/system design content. Hope to see more, especially w/ Go😆
@WebDevCody 6 months ago
oh yeah a preStop might actually be the recommended approach from that blog post I linked. I just forgot to mention it.
@taquanminhlong 6 months ago
Thanks for sharing, appreciate it 🎉
@yunyang6267 6 months ago
Would you use Kubernetes for your personal projects? Could you also set up Go with the Serverless framework and compare it with Node? That would be very interesting to see
@mksybr 3 months ago
I wish you would've said how hard you expected it to be, because when I watched this and the answer was just "you have to handle SIGTERM" I thought, what else could it be? It's not even Kubernetes specific; for instance, if you used systemd as an orchestrator without any container, wouldn't your program have to do the same? Also, I think the time it takes for your routing to update will scale with the size of your cluster (number of services?), so it would be better, if possible, to check that it's done. Otherwise your hard-coded sleep amount might be either overkill (fine) or not long enough (downtime -- bad)
@tonychia2227 3 months ago
What about using a blue-green deployment instead?
@NightstalkerKK 6 months ago
Did you come across rollout restart yet? I believe that achieves zero downtime when making changes to a replicaset.
@WebDevCody 6 months ago
not sure it's possible. a block storage device can only be assigned to one pod, so I don't think a rollout is possible without downtime switching the persistent storage
@ac130kz 6 months ago
maybe it's possible to use a kubernetes client library to wait for full deletion?
@kazwalker764 6 months ago
I'm not sure why you need to do any waiting after getting term, you'll always want to do graceful handling of term and hangup signals, but waiting shouldn't be needed. I just found your channel, so I'm not sure what you're using to manage your pods or your k8s setup in general. Using deployments set to rolling updates with maxUnavailable, and a service with an ingress pointing to it should be enough to avoid seeing any 500s and not need any waiting.
@WebDevCody 6 months ago
I'm using a statefulset and I'm not sure maxUnavailable works for those from what I've read
@kazwalker764 6 months ago
@@WebDevCody Oh, I see, is there any reason you need to use a StatefulSet specifically? If you're running something that needs storage (PV/PVC) tied to specific instances, like a database, or have some particular need for consistent pod names, a StatefulSet is a good choice. But if you're running a normal web app, you probably want to use a Deployment since it's a better choice for stateless workloads. Cheers!
@MiiDosvid 6 months ago
as it turns out, there is an issue in how k8s handles routing during a rollout
@MiiDosvid 6 months ago
that's incorrect behaviour. You get problems because you are running only one instance of the container in your pod. Add a second one and k8s will do a rolling deployment, closing them one by one. That way there are always 2 instances available across the new and old pods, so you don't have problems.
@WebDevCody 6 months ago
I have replicas more than 1, and I’m using a stateful set. Try it yourself, there is downtime.
@MiiDosvid 6 months ago
@@WebDevCody if so, then you are getting errors because your requests got broken in the middle of the call. You should stop accepting new connections after you get SIGINT
@MiiDosvid 6 months ago
@@WebDevCody I will try to make an example tomorrow if I have time
@WebDevCody 6 months ago
@@MiiDosvid isn’t that what I’m explaining in this video? I talk about how your service needs to gracefully shut down and stop accepting connections or else you’ll get dropped connections
@MiiDosvid 6 months ago
@@WebDevCody you shouldn't set a timeout; you should call Shutdown ASAP, as it blocks any new connections made to the server. In your example the server has 10 more seconds to accept new connections, and judging by the video you are running tests that finish in less than 10 sec.
@csmithDevCove 6 months ago
May want to check out Argo Rollouts.
@WebDevCody 6 months ago
sounds interesting! thanks for the suggestion