vLLM is a fast, easy-to-use library for LLM inference and serving. In this video, we cover the basics of vLLM, how to run it locally, and how to deploy it in production on Kubernetes with GPU-attached nodes via a DaemonSet, including a hands-on demo of the deployment.
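As a rough sketch of the DaemonSet approach described above: a DaemonSet schedules one vLLM pod on every matching GPU node. The image tag, model name, node label, and resource requests below are illustrative assumptions, not the exact manifest from the video:

```yaml
# Hypothetical DaemonSet running vLLM's OpenAI-compatible server on each GPU node.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: vllm
spec:
  selector:
    matchLabels:
      app: vllm
  template:
    metadata:
      labels:
        app: vllm
    spec:
      nodeSelector:
        # assumed label marking GPU-attached nodes
        nvidia.com/gpu.present: "true"
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          args: ["--model", "facebook/opt-125m"]  # small model for illustration
          ports:
            - containerPort: 8000
          resources:
            limits:
              nvidia.com/gpu: 1  # reserve one GPU per pod
```

Because a DaemonSet places exactly one pod per eligible node, each GPU node serves its own vLLM instance; a Service in front can then load-balance inference requests across them.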
Blog post: opensauced.pizza/blog/how-we-...
John McBride(@JohnCodes)
►►►Connect with me ►►►
► Kubesimplify: kubesimplify.com/newsletter
► Newsletter: saiyampathak.com/newsletter
► Discord: saiyampathak.com/discord
► Twitch: saiyampathak.com/twitch
► YouTube: saiyampathak.com/youtube
► GitHub: github.com/saiyam1814
► LinkedIn: / saiyampathak
► Website: / saiyampathak
► Instagram: / saiyampathak