Would you switch from Kubernetes Cluster Autoscaler to Karpenter (if you can)? IMPORTANT: For reasons I do not comprehend (and Google support could not figure out), YouTube tends to delete comments that contain links. Please do not use them in your comments.
@marymekins3546 3 years ago
Seems like severe vendor lock-in with Karpenter. Can it be used with other cloud providers, for example, Hetzner Cloud? How does it compare to keda.sh and Knative autoscaling? Thank you for sharing.
@DevOpsToolkit 3 years ago
@@marymekins3546 It's not really about vendor lock-in. It is open source, and the major question is whether other providers will extend it or not. So, today it's only for EKS, and tomorrow... we do not yet know. KEDA, Knative, and similar are about horizontal scaling of applications. Karpenter is about scaling clusters/nodes. Those are very different goals, even though app scaling often results in cluster scaling.
@marymekins3546 3 years ago
@@DevOpsToolkit Thanks for the clarification. Also, do you consider Crossplane's and Gardener's autoscaling components more relevant for node/cluster autoscaling? Thank you.
@DevOpsToolkit 3 years ago
@@marymekins3546 Neither of those (Crossplane and Gardener) has its own cluster autoscaler, so they rely on those that are baked into managed Kubernetes offerings (e.g., GKE, AKS, etc.) or can apply cluster scalers (e.g., Kubernetes Cluster Autoscaler, Karpenter, etc.). What I'm trying to say is that Crossplane and, to an extent, Gardener, are orchestrating infra services rather than providing specific implementations of those services, including cluster scalers.
@ChrisShort 3 years ago
Thanks!
@envueltoenplastico 3 years ago
More good news! This looks great. Cluster Autoscaler was wrecking my head. Trying Karpenter out now. Thanks for the video :) Also, I'm using the latest build (0.80-dev) of eksctl, which allows you to define a `karpenter` configuration value in `ClusterConfig`, so hopefully that takes most of the legwork out of the process. I believe all that's necessary after that is to create `Provisioner` resources as required.
@DevOpsToolkit 3 years ago
That's not a surprise. Weaveworks is the company that is heavily involved in all AWS Kubernetes OSS projects, so it was to be expected that they'd extend eksctl (they made the most contributions to it).
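For readers who want to try the eksctl route described above, here is a minimal sketch of such a `ClusterConfig`; it assumes the `karpenter` section available in eksctl builds of that era, and the cluster name, region, and versions are hypothetical:

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster                        # hypothetical cluster name
  region: us-east-1                       # hypothetical region
  version: "1.21"
  tags:
    karpenter.sh/discovery: my-cluster    # tag Karpenter uses to discover subnets/security groups
iam:
  withOIDC: true                          # Karpenter authenticates via IRSA, which requires OIDC
karpenter:
  version: "0.8.0"                        # Karpenter release to install (hypothetical)
  createServiceAccount: true
managedNodeGroups:
  - name: initial                         # one small group to host Karpenter itself
    instanceType: t3.medium
    desiredCapacity: 1
```

After the cluster is up, the remaining work is creating the `Provisioner` resources mentioned above.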
@j0Nt4Mbi 2 years ago
Awesome explanation, Viktor. Thanks again for such valuable information.
@stormrage8872 3 years ago
I saw Karpenter, I think, 30 minutes after it went GA. The reason other providers will probably not contribute is the bin-packing technique they used, which is tied to AWS (and is a horrible limitation). Then I wanted to use it: I can't utilize my launch templates because of custom pods per node, and can't utilize Crossplane because of node groups (or this might also come down to not knowing how), so for now it will be a no-go, but a project that steps in the right direction nonetheless. Thanks for the video.
@agun21st a year ago
Very detailed explanations about Karpenter. Thank you so much, sir.
@DevOpsToolkit a year ago
You are most welcome
@felipeozoski a year ago
Thanks for another awesome video Viktor :)
@JohnNguyen-x1w 9 months ago
You're fricking funny 👍. Thank you so much for such a great demo. To the point.
@bmutziu 3 years ago
Thank you!
@DevOpsToolkit 3 years ago
Thanks a ton, Bogdan.
@bmutziu 3 years ago
@@DevOpsToolkit It's nothing, Viktor.
@DaniVendettaXII 2 years ago
I'm trying Karpenter, and I found a con; I'm still investigating, but... when the workload decreases, the nodes are not changed. For example, you scale to 10 replicas, and Karpenter decides to provision one c5n.2xlarge instance. Some time later you scale down your pods from 10 to 6, and your instance could change to a t3.medium (for instance), but I've observed that Karpenter is not adjusting the instance to the current workload. I have to do more tests and experiments with Karpenter, but so far that's what I've seen. Thanks for the video and the channel, Viktor/DevOps Toolkit. Kind regards.
@DevOpsToolkit 2 years ago
Scaling down is a problem with all cluster autoscalers including Karpenter :(
@DaniVendettaXII 2 years ago
@@DevOpsToolkit Hi Viktor. But with Cluster Autoscaler, at least in my configuration, if replicas go down and the remaining pods can fit on other workers, the autoscaler evicts those pods, taints the node selected for deletion, and the pods are re-scheduled on an existing worker. After that, the worker is deleted; it's slower and less accurate. With Cluster Autoscaler it's easy to end up with more resources than you need, but with Karpenter I can see we have the same problem with scaling down. Maybe it will be resolved in the future, but I see some scopes where Karpenter can be more useful than Cluster Autoscaler, and vice versa. Another point to take into consideration is how to configure the provisioners. Since Karpenter tries to put all new pods on a worker, there's a chance it puts all the pods in the same AZ. Today I'll probably try combining provisioners with node affinity and pod anti-affinity to see if I can put pods in all my AZs. Again, thanks for the nice work, video, and channel; I really appreciate your answer.
@DevOpsToolkit 2 years ago
You're right. Karpenter solves some of the problems well while others are far from being solved. It's a new project, so we're yet to see whether it will mature. My main concern right now, before other issues are solved, is whether other providers will pick it up and even whether AWS will include it in EKS. If neither of those happens, it's a sign that vendors do not trust it.
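To illustrate the AZ-spreading idea from the comment above, here is a hedged sketch using standard `topologySpreadConstraints`, which Karpenter is designed to take into account when choosing zones for new nodes; the app name and image are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                                    # hypothetical app
spec:
  replicas: 6
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                              # allow at most 1 pod of difference between zones
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule        # keep pods pending rather than piling into one AZ
          labelSelector:
            matchLabels:
              app: my-app
      containers:
        - name: app
          image: nginx                            # placeholder image
```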
@sophiak4286 a year ago
Can we use Karpenter for patching nodes in an existing node group, that is, nodes not managed by Karpenter?
@DevOpsToolkit a year ago
As far as I know, that is not possible.
@maxmustermann9858 a year ago
As I understand it, it only scales new nodes. Is there also a way, when I have a pod which gets utilized very heavily, for a new node to be created and the pod moved to this node? For example, when apps in pods can't scale horizontally by just adding more pods.
@DevOpsToolkit a year ago
Why would you move a pod to a new node? If you specified memory, CPU, and other constraints, it should be irrelevant where that pod runs as long as those constraints are met.
@maxmustermann9858 a year ago
@@DevOpsToolkit Ah, I get it, so the resources are statically assigned and cannot grow dynamically with the pod's load. My assumption was this: there are no resource limits defined, and let's say the pod normally runs with 2G of RAM, but now the load gets quite high and the pod needs 4G of RAM, yet the host it's currently running on can't provide more. Maybe there is a way for that pod to get restarted on another host which has enough resources, so that the pod won't get "throttled" and the application won't get slow.
@DevOpsToolkit a year ago
@maxmustermann9858 When resource requests are not specified, pods can use any amount of memory and CPU available on the nodes they are running on. However, when the collection of all the pods on a node consumes more memory and CPU than the node has, pods without requests are kicked out first to leave resources for pods that do have them specified. So, pods without resource requests are considered less important, and Kubernetes will sacrifice them before others. Check out the Quality of Service concept in Kubernetes. Also, Kubernetes will soon release the feature of dynamic resource allocation so that resource requests can change without restarting pods. That will be especially useful with vertical pod scalers.
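As a sketch of the Quality of Service classes referenced above (standard Kubernetes behavior, not Karpenter-specific): requests equal to limits yield the Guaranteed class, requests alone yield Burstable, and neither yields BestEffort, which is evicted first under node pressure.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qos-example          # hypothetical pod
spec:
  containers:
    - name: app
      image: nginx           # placeholder image
      resources:
        requests:            # requests == limits => Guaranteed QoS class
          memory: "2Gi"
          cpu: "500m"
        limits:
          memory: "2Gi"
          cpu: "500m"
# Requests without limits => Burstable; no requests or limits => BestEffort,
# which is the first to be evicted when a node runs out of resources.
```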
@miletacekovic 3 years ago
One question: is Karpenter capable of vertical auto-scaling down? Typical example: consider a new project started as a monolith, where one big pod is required for the initial deployment. Karpenter allocates one big node to fit it. Now, as the project continues and grows, it is decomposed into microservices, and 10 small pods are used for the full system. Is Karpenter capable of replacing the big node with, say, two much smaller nodes, as that might be cheaper than the one big node?
@DevOpsToolkit 3 years ago
Yes. It's doing that fairly well. Its main strength is that it creates nodes that are just the right size for the pending workload.
@miletacekovic 3 years ago
@@DevOpsToolkit Wow, thanks for the answer!
@umeshranasinghe 6 months ago
Great video. Thank you very much!
@amitmantha7662 a year ago
So, I installed Karpenter on an EKS cluster, and I just want to stop Karpenter from automatically spinning up nodes on weekends. How can I do that?
@DevOpsToolkit a year ago
I never had such a requirement, so I never tried something like that. Why not weekends? Does that mean that you prefer having pods in the pending state instead?
@barefeg 3 years ago
How would one have both Cluster Autoscaler and Karpenter running in the same cluster? Is it just using the special nodeSelector for Karpenter to schedule those? I would like to try it out, but without committing to it the whole way.
@DevOpsToolkit 3 years ago
I haven't tried using both so I'm not sure how it would work and what would need to be done to make that happen. I would rather experiment with it in a new temporary cluster.
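One low-commitment approach, sketched under the assumption of Karpenter's v1alpha5 `Provisioner` API (current when this thread was written): taint the Provisioner so that only workloads explicitly tolerating the taint ever land on Karpenter-managed nodes, while everything else keeps flowing through Cluster-Autoscaler-managed node groups. The taint key and discovery tag are hypothetical:

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: experiment
spec:
  taints:
    - key: example.com/karpenter            # hypothetical taint key
      effect: NoSchedule                    # only pods tolerating this schedule onto these nodes
  provider:
    subnetSelector:
      karpenter.sh/discovery: my-cluster    # hypothetical cluster name
    securityGroupSelector:
      karpenter.sh/discovery: my-cluster
```

Pods opting in would add a matching `tolerations` entry; pending pods without it would still be handled by Cluster Autoscaler.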
@snygg-johan9958 3 years ago
Very nice! Does it also work with HPA during high loads?
@DevOpsToolkit 3 years ago
It does. HPA scales your apps and, if some of the pods end up in the pending state, Karpenter will scale up the cluster :)
@snygg-johan9958 3 years ago
@@DevOpsToolkit Thanks for the response! Then I'm going to check it out :-)
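For reference, a minimal HPA manifest of the kind discussed above; when it scales the Deployment beyond the cluster's current capacity, the resulting pending pods are what trigger Karpenter. Names and thresholds are placeholders:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                    # hypothetical deployment
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80    # scale up when average CPU exceeds 80%
```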
@bled_2033 2 years ago
Very well explained!
@kavilofi 10 months ago
Awesome explanation....🤩
@kavilofi 10 months ago
Sir, please do an explanation of k8s KEDA.
@DevOpsToolkit 10 months ago
@devopsguy- here it goes... KEDA: Kubernetes Event-Driven Autoscaling kzbin.info/www/bejne/aZ3GkpStgKapbNU
@sarvanvik1835 2 years ago
Hi sir, if we use Karpenter and we want to upgrade worker nodes to a new version, both those in a node group and newly scaled groupless worker nodes, what would happen? Can you clear up my doubt?
@DevOpsToolkit 2 years ago
That's a bit "clunky" right now. You'd need to set a TTL on the nodes so that they "expire" and are replaced by new nodes, which will follow the version of your cluster. The good news is that improvements for that are coming. You might want to follow github.com/aws/karpenter/issues/1738. You'll see over there that some additional options have already been added while others are in progress.
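A sketch of the TTL approach mentioned above, assuming Karpenter's v1alpha5 `Provisioner` API: `ttlSecondsUntilExpired` makes nodes expire and get replaced, which rolls them onto whatever version the cluster is currently on. The value and discovery tag are arbitrary:

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  ttlSecondsUntilExpired: 604800            # replace nodes after 7 days (arbitrary value)
  provider:
    subnetSelector:
      karpenter.sh/discovery: my-cluster    # hypothetical cluster name
    securityGroupSelector:
      karpenter.sh/discovery: my-cluster
```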
@barefeg 3 years ago
Do we need to have eksctl configurations for node groups at all?
@DevOpsToolkit 3 years ago
You do need a node group for the cluster so that you get the initial node where you'll install Karpenter. You do not have to use eksctl to create that group, but you do have to have it, even if it's for Karpenter alone. That's why I complained in the video that it should run on control plane nodes.
@javisartdesign 3 years ago
Awesome! I wanted to see a working example of its use. Thanks.
@igorluizdesousasantos4965 a year ago
Amazing content 🎉🎉
@gdevelek a year ago
You didn't explain that "limits:" thing at all. Why set a limit on total request CPU? And what if it's exceeded? No autoscaling???
@DevOpsToolkit a year ago
You're right. CPU limits are arguably useless except for QoS.
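For context on the question above, assuming Karpenter's v1alpha5 API: the `limits` block on a `Provisioner` caps the total resources of the nodes that Provisioner may create. Once the cap is reached, Karpenter stops launching nodes and any additional pods simply stay pending, which acts as a cost guardrail. A sketch with arbitrary values:

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  limits:
    resources:
      cpu: "1000"      # total vCPUs across all nodes this Provisioner may create
      memory: 1000Gi   # once reached, no new nodes; extra pods remain pending
```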
@georgeanastasiou2680 3 years ago
Hello Viktor, thank you for your video. Does it also consider multizone workloads, i.e., instantiate nodes in multiple zones per region? As far as I know, that is currently accommodated by the upstream cluster-autoscaler project. Thank you.
@DevOpsToolkit 3 years ago
Yes. It does that :) The main advantage of Karpenter is that you have much more control over the relation between pending pods and the nodes that should be created to run them.
@georgeanastasiou2680 3 years ago
@@DevOpsToolkit thank you :)
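A hedged sketch of that zone behavior, assuming the v1alpha5 API: the `Provisioner`'s `requirements` can restrict (or explicitly list) the zones Karpenter provisions into, and pending pods' own topology constraints are taken into account. Zone names are examples:

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: topology.kubernetes.io/zone
      operator: In
      values: ["us-east-1a", "us-east-1b", "us-east-1c"]  # example zones
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["on-demand"]
```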
@guangguang1984 2 years ago
Very nice video, thanks! Got one question: as there is no autoscaling group, how can I conveniently scale in nodes manually?
@DevOpsToolkit 2 years ago
Your cluster is still created with a node group, and you can always add additional node groups. It's just that the nodes managed automatically by Karpenter are without a node group.
@bartekr5372 3 years ago
Nice. Let us consider a cluster running HPA and cluster-autoscaler outside of peak hours. If you have a good distribution of pods and HPA starts to decrease the number of replicas, you may end up having some nodes underutilized. Released capacity will occur on some of the worker nodes. In such conditions I always find cluster-autoscaler slow. Can we expect Karpenter to be more active or even do some optimization? By optimization I mean compaction of unused capacity (something that deschedulers try to achieve) or optimizing worker node sizes.
@DevOpsToolkit 3 years ago
So far, I think that Karpenter is only marginally better at scaling down nodes that are underutilized. The part that works fairly well is when it scales up for a single pending pod and when that pod is removed, it removes the node almost instantly. That part looks very similar to what GKE Autopilot is doing. The project is still young so we'll see. It's better than Cluster Autoscaler in EKS but we're yet to see whether it will go beyond that (as it should).
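The near-instant removal described above is driven by `ttlSecondsAfterEmpty` in the v1alpha5 API: once a node holds no non-daemon pods, Karpenter waits that long and then terminates it. A sketch with an arbitrary value:

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  ttlSecondsAfterEmpty: 30    # terminate a node 30 seconds after it becomes empty
```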
@srikrishnachaitanyagadde926 2 years ago
Are EKS IPv6 clusters supported with Karpenter?
@DevOpsToolkit 2 years ago
There are issues with it (e.g., github.com/aws/karpenter/issues/1241).
@luisrodriguezgarcia1282 3 years ago
Have I understood correctly? This is just for EKS, and not just any EKS... just EKS created with eksctl... What about EKS clusters created with Terraform? Can they not be managed with Karpenter? Great video as usual, Viktor, by the way.
@DevOpsToolkit 3 years ago
You're partly right. Currently, Karpenter works only with EKS. The initial examples used eksctl, and Terraform examples were added recently. That, however, does not mean that it does not work with other tools. You should be able to use it with EKS clusters no matter which tool you're using to manage them. A bigger problem is with other providers (e.g., GCP, Azure, etc.). The Karpenter project is hoping to attract contributions from others (currently it's mostly AWS folks), but that is yet to be seen.
@leonardo_oliveira241 3 years ago
@@DevOpsToolkit What about Fargate? The documentation mentions that it works with Fargate.
@DevOpsToolkit 3 years ago
@@leonardo_oliveira241 Fargate is EKS with a layer on top so it does work with it.
@barefeg 3 years ago
How do you track all of these new solutions that come up?
@DevOpsToolkit 3 years ago
In some cases, I search for specific solutions that complement those I'm already using. In others, I hear about a tool and put it to my TODO list. In any case, I tend to spend a lot of time (including weekends and nights) on learning.
@shuc1935 3 years ago
Quick question out of curiosity: since Karpenter's autoscaling offering is groupless, can we spin up an EKS cluster without a node group definition, i.e., with zero worker nodes, and, based on the deployments' resource requests, have Karpenter provision groupless nodes with appropriate capacity to run the requested applications?
@DevOpsToolkit 3 years ago
That would be possible if Karpenter were running on control plane nodes (like most other cluster scalers do). As it is now, it needs to run on worker nodes, and that means the cluster needs to have at least one where Karpenter will be running before it starts scaling up (and down).
@shuc1935 3 years ago
Never mind, you indeed mentioned that Karpenter can't be deployed on control plane nodes, so in order to implement cluster autoscaling we must have at least one node in a node group, which is kind of a waste from a node group standpoint, but it's better than the regular CA on EKS. I was curious to see if Karpenter could have been the solution for a truly fully managed serverless k8s solution on AWS.
@DevOpsToolkit 3 years ago
@@shuc1935 Managed Kubernetes services like EKS, GKE, AKS, etc. do not allow users to access control planes. That means that AWS would need to bake Karpenter into EKS itself. I hope they'll do that. Ideally, it should be a single checkbox asking people to enable Autoscaling which, currently, does not exist in EKS in any form without using Fargate.
@shuc1935 3 years ago
@@DevOpsToolkit Yep, like GKE Autopilot with --enable-autoscaling.
@shuc1935 3 years ago
Also, EKS with a Fargate profile is only partially fully managed, since it's based on speculating about namespaces ahead of time.
@sparshagarwal1877 a year ago
How does one run Karpenter on the control plane?
@DevOpsToolkit a year ago
Not sure I understood the question. Are you asking how to run Karpenter pods on control plane nodes? If that's the case, you can't, at least when using managed Kubernetes such as EKS. You do not have write access to control plane nodes.
@herbertpurpora9452 2 years ago
Question: I'm new to Kubernetes and AWS, but based on my understanding, using Karpenter will make our EKS cluster cost change dynamically, right?
@DevOpsToolkit 2 years ago
Karpenter and similar cluster autoscaler solutions are adding servers when you need them and shutting them down when you don't. AWS, on the other hand, charges for the things you use. The more optimized the usage is, the less you pay.
@gvoden 3 years ago
Can I use Karpenter with my clusters that are leveraging managed node groups or do I have to get rid of the node groups first? How would the cluster upgrade process change if I use Karpenter? (I assume I can still do rolling updates regardless). And finally, should I be deploying Karpenter as a DaemonSet?
@DevOpsToolkit 3 years ago
Karpenter does not use managed node groups, which are essentially based on AWS Auto Scaling groups (ASGs). It intentionally avoids ASGs because they are slow and because they manage instances based on the same instance types and sizes. Karpenter avoids them so that the process is (much) faster and so that it can create VMs with sizes that fit the pending load. In other words, it's a good thing that it does not use ASGs. That being said, there is nothing preventing you from having a cluster based on a managed node group. It's just that the nodes created by Karpenter will not be using it (they will NOT use ASGs associated with managed node groups). There should be no difference in the upgrade process. New nodes will be created based on the new version and the old nodes will be shut down (rolling updates). There's no need to run Karpenter as a DaemonSet. It's not the type of service that needs to run on each node of the cluster.
@jdiegosf 2 years ago
Excellent!!!
@srivathsaharishvenk 2 years ago
legend!
@reddinghiphop1 2 years ago
fantastic
@aswinkumar3396 2 years ago
Question: when using Karpenter with EKS, the image is not pulling from a private repository like Sonatype Nexus.
@DevOpsToolkit 2 years ago
Which image are you referring to? The image of Karpenter itself, or...?
@aswinkumar3396 2 years ago
The Docker image of our Python project, which we have stored in Sonatype Nexus.
@aswinkumar3396 2 years ago
`network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized` ... `Error: ErrImagePull`
@DevOpsToolkit 2 years ago
@@aswinkumar3396 That's not related to scaling of the cluster. Karpenter will increase (or decrease) the nodes of the cluster allowing Kubernetes to schedule pending pods in the same way those would be scheduled without Karpenter.
@DevOpsToolkit 2 years ago
@@aswinkumar3396 I think you might be facing the same issue as github.com/aws/karpenter/issues/1391
@RakeshKumar-eb9re 2 years ago
To the point 👌
@TheApeMachine 3 years ago
Make Karpenter and ArgoCD fight it out :p
@DevOpsToolkit 3 years ago
Those are very different tools that serve different objectives, so the fight would not be fair. Karpenter could be compared to Cluster Autoscaler or, even better, EKS with Karpenter could be compared with GKE Autopilot.
@TheApeMachine 3 years ago
@@DevOpsToolkit Not compare, fight. Karpenter tries to change the cluster, ArgoCD fights for consistency with state in git :)
@MichaelBushey 3 years ago
@@TheApeMachine They won't fight at all. If the cluster does not have the resources, the pods applied via ArgoCD will stay pending. ArgoCD will do its job; the cluster just won't be able to run it all if it's not big enough.
@sebastiansMcuProjekte 3 years ago
Didn't even include a link, and my prior comment got removed. Check out Spot; maybe it's worth a video?
@DevOpsToolkit 3 years ago
YouTube has a nasty tendency to remove comments without any obvious reason. Can you please send me the idea over Twitter (@vfarcic) or LinkedIn (www.linkedin.com/in/viktorfarcic/)?