Ollama with GPU on Kubernetes: 70 Tokens/sec!

1,843 views

Mathis Van Eetvelde

1 day ago

Comments
@JoeIrizarry88 (1 month ago)
Nice. The time slicing piece was the cherry on the sundae.
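The time slicing mentioned here is a feature of NVIDIA's Kubernetes device plugin, which lets one physical GPU advertise multiple schedulable `nvidia.com/gpu` resources. A minimal sketch of the kind of config it accepts; the ConfigMap name and replica count are illustrative, not from the video:

```yaml
# Illustrative time-slicing config for the NVIDIA device plugin.
# With replicas: 4, one physical GPU appears as 4 allocatable GPUs.
apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-device-plugin-config   # hypothetical name
  namespace: kube-system
data:
  config.yaml: |
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4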
@farzadmf (1 month ago)
Great video, thank you!
@mathisve (1 month ago)
Thank you for the feedback! Any topics you'd like me to cover?
@farzadmf (1 month ago)
@mathisve Nothing comes to mind; just enjoying your videos 🙂
@MsSteganos (1 month ago)
Great video, it's original content I haven't seen elsewhere on YouTube. Congratulations! Is it possible to run Ollama across multiple nodes and multiple GPUs in a Kubernetes cluster? Is performance better running Ollama with 2 GPUs on 1 node, or on 2 nodes with 1 GPU per node?
@satyarsh665 (1 month ago)
Very helpful, thank you very much!
@mathisve (1 month ago)
Thanks for the feedback! Are there any other topics you would like to see me cover?
@satyarsh665 (1 month ago)
@mathisve Well, I'm very new to the whole k8s thing and got this video recommended :) Maybe more DevOps stuff, perhaps?
@jorgearagon8053 (1 month ago)
Hi, this video and the series in general are great. As someone interested in using AWS and other cloud providers, I'd like to know how much this exercise and the others actually cost. Could you please indicate the total cost of the activity so I can try to replicate it without any unexpected hidden costs? Thanks in advance.
@mathisve (1 month ago)
Thanks! The instance I used (g4dn.2xlarge) costs about $0.75 an hour. Meaning I probably spent around $2-3 making this video. For reference, running this instance for an entire month would cost about $550.
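The arithmetic in the reply can be sanity-checked with a few lines; the hourly rate is the approximate figure quoted above, and real g4dn.2xlarge pricing varies by region:

```python
# Sanity check for the cost figures quoted in the reply above.
HOURLY_RATE = 0.75     # USD per hour, approximate on-demand rate
HOURS_PER_MONTH = 730  # average hours in a month

video_cost = HOURLY_RATE * 3  # roughly 3 hours of recording/testing
monthly_cost = HOURLY_RATE * HOURS_PER_MONTH  # 547.5, i.e. "about $550"

print(f"~${video_cost:.2f} for the video, ~${monthly_cost:.0f}/month")
```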
@Earthvssuna (1 month ago)
Hey, another question I had: I found a video showing that Ollama, even in its simplest installation, can already be opened in several terminals and respond to all of them. Concurrency wasn't possible until a few months ago, but now it is. So I wonder why go through all this hassle of making virtual GPUs for better availability? I'm sure you know why; it's just what I'm struggling to understand right now. Hope you find time to answer again. Thanks very much 😊
@Earthvssuna (1 month ago)
kzbin.info/www/bejne/bqPCaXaQptlor80si=PzyzG4KSBiM371e8
@Earthvssuna (1 month ago)
What are the benefits of running Ollama with k8s in the cloud instead of an Ollama container in the cloud without k8s? Thanks very much for this video!
@mathisve (1 month ago)
If you are running a single instance of Ollama, there aren't many benefits. If you need lots of Ollama instances (for a public API, text generation, etc.), using Kubernetes will help you simplify operations.
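For the multi-instance case this reply describes, a hypothetical sketch of what an Ollama Deployment requesting GPUs might look like; the name, replica count, and image tag are assumptions, not from the video:

```yaml
# Illustrative Deployment: several Ollama replicas, each requesting
# one (possibly time-sliced) GPU. Port 11434 is Ollama's default.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          ports:
            - containerPort: 11434
          resources:
            limits:
              nvidia.com/gpu: 1
```

Fronting these replicas with a Service gives a single endpoint that load-balances across instances, which is the operational simplification the reply refers to.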
@Earthvssuna (1 month ago)
@mathisve We're about 1000 people in the company. How would you approach this for using Ollama? I want some way of tracking how many tokens each user is consuming; not the content, just how much each department is using it.