Ollama with GPU on Kubernetes: 70 Tokens/sec!

1,843 views

Mathis Van Eetvelde

1 day ago

Comments
@JoeIrizarry88 (1 month ago)
Nice. The time slicing piece was the cherry on the sundae.
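The time slicing mentioned here is a feature of NVIDIA's Kubernetes device plugin, which lets one physical GPU advertise multiple schedulable `nvidia.com/gpu` resources. A minimal sketch of the kind of config it accepts; the ConfigMap name and replica count are illustrative, not from the video:

```yaml
# Illustrative time-slicing config for the NVIDIA device plugin.
# With replicas: 4, one physical GPU appears as 4 allocatable GPUs.
apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-device-plugin-config   # hypothetical name
  namespace: kube-system
data:
  config.yaml: |
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4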
@farzadmf (1 month ago)
Great video, thank you!
@mathisve (1 month ago)
Thank you for the feedback! Any topics you'd like me to cover?
@farzadmf (1 month ago)
@mathisve Nothing comes to mind; just enjoying your videos 🙂
@MsSteganos (1 month ago)
Great video, it's original content I haven't seen elsewhere on YouTube. Congratulations! Is it possible to run Ollama across multiple nodes and multiple GPUs in a Kubernetes cluster? Is performance better running Ollama with 2 GPUs on 1 node, or on 2 nodes with 1 GPU per node?
@satyarsh665 (1 month ago)
Very helpful, thank you very much!
@mathisve (1 month ago)
Thanks for the feedback! Are there any other topics you would like to see me cover?
@satyarsh665 (1 month ago)
@mathisve Well, I'm very new to the whole k8s thing and got this video recommended :) Maybe more DevOps stuff, perhaps?
@jorgearagon8053 (1 month ago)
Hi, this video and the series in general are great. As someone interested in using AWS and other cloud providers, I'd like to know how much this exercise and the others actually cost. Could you please indicate the total cost of the activity so I can try to replicate it without any unexpected hidden costs? Thanks in advance.
@mathisve (1 month ago)
Thanks! The instance I used (g4dn.2xlarge) costs about $0.75 an hour. Meaning I probably spent around $2-3 making this video. For reference, running this instance for an entire month would cost about $550.
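The arithmetic in the reply can be sanity-checked with a few lines; the hourly rate is the approximate figure quoted above, and real g4dn.2xlarge pricing varies by region:

```python
# Sanity check for the cost figures quoted in the reply above.
HOURLY_RATE = 0.75     # USD per hour, approximate on-demand rate
HOURS_PER_MONTH = 730  # average hours in a month

video_cost = HOURLY_RATE * 3  # roughly 3 hours of recording/testing
monthly_cost = HOURLY_RATE * HOURS_PER_MONTH  # 547.5, i.e. "about $550"

print(f"~${video_cost:.2f} for the video, ~${monthly_cost:.0f}/month")
```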
@Earthvssuna (1 month ago)
Hey, another question I had: I found a video showing that Ollama, even in its simplest installation, can already be opened in several terminals and respond to all of them. Concurrency wasn't possible until a few months ago, but now it is. So I wonder why go through all this hassle of making virtual GPUs for better availability? I'm sure you know why; it's just what I'm struggling to understand right now. Hope you find time to answer again. Thanks very much 😊
@Earthvssuna (1 month ago)
kzbin.info/www/bejne/bqPCaXaQptlor80si=PzyzG4KSBiM371e8
@Earthvssuna (1 month ago)
What are the benefits of running Ollama with k8s in the cloud instead of an Ollama container in the cloud without k8s? Thanks very much for this video!
@mathisve (1 month ago)
If you are running a single instance of Ollama, there aren't many benefits. If you need lots of Ollama instances (for a public API, text generation, etc.), using Kubernetes will help you simplify operations.
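For the multi-instance case this reply describes, a hypothetical sketch of what an Ollama Deployment requesting GPUs might look like; the name, replica count, and image tag are assumptions, not from the video:

```yaml
# Illustrative Deployment: several Ollama replicas, each requesting
# one (possibly time-sliced) GPU. Port 11434 is Ollama's default.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          ports:
            - containerPort: 11434
          resources:
            limits:
              nvidia.com/gpu: 1
```

Fronting these replicas with a Service gives a single endpoint that load-balances across instances, which is the operational simplification the reply refers to.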
@Earthvssuna (1 month ago)
@mathisve We're about 1000 people in the company. How would you approach this for using Ollama? I want some way of tracking how many tokens each user is consuming; not the content, just how much each department is using it.