GPUs in Kubernetes for AI Workloads

5,998 views

DevOps Toolkit


1 day ago

Comments: 36
@thibdub2752 3 months ago
Hello Viktor, thanks for the video. I would like to see a video with Knative and KubeVirt.
@dariusjuodokas9458 3 months ago
Yes, yes, please make a video about models. It would also be nice to have a concise "intro" to our options (tools) to train and/or fine-tune our own self-hosted models (or managed ones?). I want to dip my fingers into ML, but the technology is growing and changing so fast that it's quite difficult to get three consistent search results (videos/blogs) explaining what is what and how to DIY. And ML seems to be the way forward, as it seems to be rather good at automation without the need to write custom software, so we DO have to master it.
@IvanRizzante 3 months ago
Thank you for another great video! I definitely enjoyed your Ollama demo! I would really like to see a video with Knative too.
@mgroth3973 3 months ago
Great content as always! As someone who works a lot with KServe, I would of course like to see a video about your preferred approach to scaling InferenceServices in prod.
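For readers wondering what scaling an InferenceService looks like in practice, here is a minimal sketch of a KServe manifest with request-based autoscaling. The name, storageUri, and target values are hypothetical, and it assumes a serving runtime for the chosen model format is installed in the cluster.

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llm-demo                  # hypothetical name
spec:
  predictor:
    minReplicas: 1                # keep one replica warm to avoid cold starts
    maxReplicas: 5                # cap scale-out
    scaleMetric: concurrency      # scale on in-flight requests, not CPU
    scaleTarget: 4                # target concurrent requests per replica
    model:
      modelFormat:
        name: huggingface         # assumes a matching serving runtime is installed
      storageUri: s3://models/llm # hypothetical location of the model artifacts
      resources:
        limits:
          nvidia.com/gpu: "1"     # one GPU per replica
```

Scaling on concurrency rather than CPU is usually a better proxy for load on GPU-bound inference.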
@maaft 3 months ago
Hi Viktor, glad that you mentioned KubeVirt. I have not yet found a solution where one can have on-premise (or cloud-provider A) GPU servers cover the base load while scaling into another cloud provider B when there is demand. I couldn't get it to work yet and tried different solutions (Admiral, KubeVirt, Karmada), but there was always one or more roadblocks. Most of the time, the scheduler would not even try to schedule my workload since all GPUs were already in use. But IF the scheduler would just go ahead and schedule, auto-scaling would have kicked in and spawned a new GPU node. This topic could also be expanded to the general case of how to do multi-cluster workload distribution (with auto-scaling). As always, thanks so much for your valuable content!
@RajaPriya-m1u 2 months ago
Thank you for your video, I got some ideas. I'm eagerly waiting for the Ollama with GPU in Kubernetes video.
@DryBones111 3 months ago
I am particularly excited about the new alpha feature of OCI images as read-only mounts. That'll take k8s to the next level for running ML algorithms.
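For context, the alpha feature referenced here is image volumes (Kubernetes v1.31+, behind the ImageVolume feature gate), which lets a pod mount an OCI artifact, such as packaged model weights, as a read-only volume. A minimal sketch, with a hypothetical image reference:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: model-server          # hypothetical name
spec:
  containers:
  - name: server
    image: ollama/ollama
    volumeMounts:
    - name: weights
      mountPath: /models      # model files from the OCI artifact appear here
      readOnly: true
  volumes:
  - name: weights
    image:                    # OCI image used as a read-only volume (alpha)
      reference: registry.example.com/models/llama3:latest  # hypothetical artifact
      pullPolicy: IfNotPresent
```

This keeps large model weights out of the container image itself, so the server image and the models can be versioned and pulled independently.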
@Sebastian-or4xw 3 months ago
Just FYI: a single Ollama installation and instance can run multiple models. By default it runs them one after another, unloading a model when it is no longer used, but if you have enough VRAM it can now also keep them loaded at the same time.
@DevOpsToolkit 3 months ago
That's true. I installed it twice only to demonstrate how GPU sharing works.
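As a side note, if you do want a single Ollama instance to keep several models in VRAM, its behavior can be tuned through environment variables. A minimal Deployment sketch, assuming the documented OLLAMA_MAX_LOADED_MODELS and OLLAMA_NUM_PARALLEL variables and illustrative values:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama
        env:
        - name: OLLAMA_MAX_LOADED_MODELS   # how many models may stay in memory at once
          value: "2"
        - name: OLLAMA_NUM_PARALLEL        # parallel requests served per model
          value: "4"
        resources:
          limits:
            nvidia.com/gpu: "1"            # single GPU shared by the loaded models
```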
@ovsepavakian6109 3 months ago
You can also use the Slurm Workload Manager or Volcano.
@slim5782 3 months ago
I was looking into using Knative to scale Copilot-like models to aid development, and instead of partitioning the GPU, using time-sharing, since the lack of a security boundary is not a problem.
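For reference, time-sharing without partitioning is what the NVIDIA GPU Operator calls time-slicing. A minimal sketch of the ConfigMap it consumes, with an illustrative replica count; the operator's ClusterPolicy then has to reference this config by name:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config   # hypothetical name, referenced from ClusterPolicy
  namespace: gpu-operator
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 4         # advertise each physical GPU as 4 schedulable GPUs
```

Each physical GPU is then advertised as four nvidia.com/gpu resources, with no memory or fault isolation between the pods sharing it, which matches the "no security boundary needed" trade-off described above.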
@nemethk 3 months ago
Greetings! A video on Ollama AI models in your presentation would be valuable content.
@DevOpsToolkit 3 months ago
Great. I'll add it to my to-do list.
@Fayaz-Rehman 3 months ago
Thank you for sharing. Would it be possible to apply the same locally on my homelab?
@DevOpsToolkit 3 months ago
If you have a GPU in your homelab, yes you can. The setup will be more complicated though.
@kashengy1017 3 months ago
Thanks for the video. Can you make a video on how to do the same with an on-prem kubeadm cluster?
@DevOpsToolkit 3 months ago
Unfortunately, I no longer have access to on-prem clusters, so I would not have the means to try it out and write the instructions.
@jovision30 29 days ago
Hey Viktor, your content is absolutely valuable. Please make a video about LLMs! Thanks 😊
@DevOpsToolkit 29 days ago
I have a few on this channel already. I'll make sure to make more.
@mrgdevops 3 months ago
Not just Ollama... differentiating between a few LLMs would be helpful in the DevOps space.
@theindependentradio 3 months ago
Yes, please show the Knative option.
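For those curious, a Knative Service that scales a GPU-backed model server to zero when idle could look roughly like this; the name, image, and scale bounds are illustrative:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: ollama                                  # hypothetical name
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/min-scale: "0"  # scale to zero when idle
        autoscaling.knative.dev/max-scale: "3"  # cap GPU consumption
    spec:
      containers:
      - image: ollama/ollama
        ports:
        - containerPort: 11434                  # Ollama's default API port
        resources:
          limits:
            nvidia.com/gpu: "1"                 # one GPU per instance
```

Scale-to-zero frees the GPU node entirely between requests, at the cost of a cold start (and possibly node provisioning) on the first request after idling.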
@TiggyProph 3 months ago
What are your recommended tools to manage GPU workloads on Kubernetes? At my org, we've already configured the basics that you have here, and now the teams are looking into AI frameworks. (We use Argo, Karpenter, and EKS to manage all the configurations discussed in your video.) Applications like Kubeflow are being discussed to help those teams move more swiftly, and I'm curious about your take on them, or whether you have content coming soon related to that.
@DevOpsToolkit 3 months ago
I explored inference. Kubeflow is focused more on generating models. It is great, but sometimes overwhelming. I'll do my best to explore it in one of the upcoming videos.
@Anselm_0 3 months ago
Yes, an orchestrator definitely makes your life easier, and there are pretty good open-source ones. Airflow, Prefect, or Flyte are pretty good. I've heard mixed experiences with Kubeflow.
@chandup 3 months ago
ZenML could help make it easy to use many ML tools.
@Sebastian-or4xw 3 months ago
Kubeflow is the ML everything-tool (collection), but in my experience it is not very easy to deploy and maintain. We used deployKF, which made it a bit easier, but it does not include everything.
@caro-n3x 3 months ago
Have you tried Kaito? A video on it would be great.
@DevOpsToolkit 3 months ago
Kaito seems to be focused on marketing, and that's not an area I tend to work in.
@zuduni 3 months ago
KubeVirt with GPU would be cool, thanks.
@renanmonteirobarbosa8129 3 months ago
What if my model requires 32 GPUs to perform inference? 😜 Let's see your K8s do that.
@DevOpsToolkit 3 months ago
What would you use instead?
@renanmonteirobarbosa8129 3 months ago
@DevOpsToolkit SLURM, as recommended by NVIDIA. They have lots of educational material on the topic and recently updated DLI labs.
@deepakbaliga 2 months ago
Cloud Run is a good serverless platform for running GPU workloads, instead of setting up Knative and managing it!
@DevOpsToolkit 2 months ago
Oh yeah. Cloud Run is managed Knative and it's great.