Scale Your Batch / Big Data / AI Workloads Beyond the K8s...



305 views

CNCF [Cloud Native Computing Foundation]

A day ago

Scale Your Batch / Big Data / AI Workloads Beyond the Kubernetes Scheduler - Antonin Stefanutti & Anish Asthana, Red Hat
Whether you want to run distributed AI model training or big data processing on Kubernetes, chances are you’ll face challenges when scaling your workloads, such as resource fragmentation, lack of all-or-nothing semantics, low throughput, and limited priority, quota, and preemption management. The Kubernetes scheduler was historically designed to orchestrate containers of (micro-)services rather than workloads made up of heterogeneous, highly coupled, and resource-intensive processes. There has recently been a Cambrian explosion of projects in the Kubernetes ecosystem that have innovated to solve these challenges, such as Koordinator, Kueue, MCAD, Volcano, and YuniKorn. In this session, we’ll compare these projects, review their design choices, and discuss their pros and cons, so you’ll have a better understanding of the landscape and be able to decide which one best suits your needs when it comes to achieving better utilization of your Kubernetes clusters for your batch workloads.