That's awesome performance for the P2000! I have a single Tesla P4 used for Plex, passed through to an LXC. It would be cool to add a second Tesla P4 and use the GPUs for Ollama when not Plex transcoding. Most of the time, transcoding only happens for remote streams. 16GB of total VRAM should handle Llama 3.1 8B fairly well.
@leemunson8419 (2 months ago)
I'm using a pair of 3090s for serious AI work but have also tried out the P4000 I have in a server. As you can imagine from these results, it's certainly capable of running smaller LLMs at a decent speed but the accuracy lets it down vs the larger models. Where it does work for me is in creating batches of images with Stable Diffusion/ComfyUI at low power use when I'm busy with other things and don't need them in a hurry.
@DigitalSpaceport (2 months ago)
Yeah, an always-on second rig does seem like a decent idea, but these smaller models are a bit of a letdown on accuracy. Didn't think about an aux use for image gen, but great idea.
@johnkost2514 (2 months ago)
Anything above 10 T/s is amazing when you consider that's 600 T/min, or roughly 500 words/min (as if someone could type that fast). Small language models are fun and useful for experimenting locally and privately.
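As a back-of-the-envelope check (the 0.75 words-per-token ratio below is just a common rule of thumb for English text, not a measurement from the video):

# Rough conversion from token rate to a reading/typing speed.
def tokens_per_sec_to_words_per_min(tps: float, words_per_token: float = 0.75) -> float:
    return tps * 60 * words_per_token

if __name__ == "__main__":
    for tps in (10, 20, 40):
        print(f"{tps} tok/s ~ {tokens_per_sec_to_words_per_min(tps):.0f} words/min")
    # 10 tok/s -> 600 tok/min, about 450 words/min, in line with the ~500 quoted above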
@DigitalSpaceport (2 months ago)
I was talking to my wife yesterday and she said... you kinda sound like an LLM outputting next words. She was right.
@Alex-rg1rz (2 months ago)
Interesting benchmark video! Any plans on testing AMD GPUs, like the 7600 XT?
@DigitalSpaceport (2 months ago)
If I can borrow one I will; I have a friend with a few I'll ask. Ollama does work with AMD GPUs, they have a blog post on it.
@InstaKane (2 months ago)
I would like to see more analysis on the quality of the response.
@alx8439 (1 month ago)
As for P2000 inference speed: I have a mini PC with an AMD Ryzen 9 7940 and 32 GB of DDR5. Seeing your numbers here, I went and downloaded the same quants of the same models, and I'm getting marginally better results for everything, running purely on CPU. So I don't think it's worth investing in these ancient cards; modern CPUs can give you better performance.
@drumstyx (1 month ago)
It is, however, still a very useful card for realtime transcoding on Plex. The codecs aren't as much of an issue these days (HEVC was a huge step forward), but 4K (I know....) or even high-bitrate 1080p can be tricky just for the client's connection sometimes.
@markaphillips14 (2 months ago)
Finally someone talking about the P2000 with LLMs. I've had my P2000 working in my server since 2020. Bad time to buy it, but it has been rock solid. Could you look into the A2000 12GB? Thanks for the content.
@DigitalSpaceport (1 month ago)
I'm honestly shocked at how good it is. The little card that could! I doubt I'll buy an A2000 12GB to test, sorry.
@alexo7431 (2 months ago)
good job
@drumstyx (1 month ago)
It's all about context. Each test with a different subject should really be done with a fresh chat -- these smaller models get really influenced by context.
@alx8439 (1 month ago)
Regarding context size: Ollama sets it to 2k tokens for all models by default. You can override it via Open WebUI if you're running a recent version. But don't be fooled by the ultra-long responses Phi gives you; if you haven't changed the context size (n_ctx), they're irrelevant.
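If you'd rather set it outside Open WebUI, here's a minimal sketch against Ollama's REST API; it assumes Ollama is listening on its default port 11434, and the model tag is only an example:

# Override Ollama's default 2k context window (num_ctx) for a single request.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "phi3:mini",           # example model tag; use whatever you have pulled
        "prompt": "Summarize the plot of Hamlet in three sentences.",
        "options": {"num_ctx": 8192},   # raise the context window from the 2048 default
        "stream": False,
    },
    timeout=300,
)
print(resp.json()["response"])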
@moozoo2589 (1 month ago)
The "Results Table" from the description is not relevant to P2000. Could you please prepare a benchmark DB with all your results from various cards?
@sharplcdtv198 (2 months ago)
How do you run nvtop in PowerShell? Do you have an installation guide anywhere in your videos?
@DigitalSpaceport (1 month ago)
I'm not sure nvtop runs in PowerShell directly; it likely would in WSL, maybe? You will always run into weird issues on Windows with GPUs and high-performance stuff, so I just don't bother with that anymore.
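If you're stuck on Windows, one rough stand-in is polling nvidia-smi, which ships with the NVIDIA driver on both Windows and Linux. A minimal sketch, assuming nvidia-smi is on the PATH (it's not a replacement for nvtop's UI, just the numbers):

# Poll GPU utilization and VRAM once a second via nvidia-smi's query flags.
import subprocess
import time

QUERY = ["nvidia-smi",
         "--query-gpu=name,utilization.gpu,memory.used,memory.total",
         "--format=csv,noheader,nounits"]

while True:
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    for line in out.strip().splitlines():
        name, util, used, total = [f.strip() for f in line.split(",")]
        print(f"{name}: {util}% GPU, {used}/{total} MiB VRAM")
    time.sleep(1)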
@MM-vl8ic (2 months ago)
I'm curious about using retro motherboards: an ASUS X99 with a PLX switch and an overclocked Xeon E5-1660 v3 (8 cores, easily 4 GHz). That board can potentially run 6 x8 slots plus 1 x16, or 4 x16. Start plugging in "cheap" GPUs and see how well the memory stacks up. The expensive version: four 3060 12GB cards (or more, running at x8).
@DigitalSpaceport (2 months ago)
It works, but it's hard to predict the layout it will choose, and it's a bit slow of course. A video on the remaining lower-end GPUs is out in a few days, and I did test that. Fermi and some Kepler cards are the cutoff; they don't work.
@piero957 (1 month ago)
How was CPU-only performance on the same machine?
@DigitalSpaceport (1 month ago)
Depends on the model you are running. The new Llama 3.2 models run very nicely on modern-ish CPUs since they are super small. You will, of course, see thread speed impact performance then.
@rmeta3391 (2 months ago)
Seems like you'd get more views if you installed ComfyUI and gave us some iterations-per-second details on each GPU you test.
@DigitalSpaceport (2 months ago)
Okay, I'll do that.
@hobohippy1616 (1 month ago)
Please do the Nvidia M6000 24GB.
@DigitalSpaceport (1 month ago)
I won't be able to do that one, sorry. I don't have one and it's too old to buy at this point, but if you already have one, why not try? I would be cautious about going older than Pascal right now.
@johnc2k2k (2 months ago)
I'm using an Nvidia T1000 8GB.
@DigitalSpaceport (1 month ago)
Oh, those are kinda rare. I'd bet it has very solid performance. I have a P4 that will be in a video sometime soonish, and it does really well.
@alx8439 (1 month ago)
Also, when Phi started giving you gibberish you should have started a new chat instead of trying to fix it, as Open WebUI sends the whole chat history (including the gibberish) each time with your new message.
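A minimal sketch of the difference, talking to Ollama's /api/chat directly (the model tag is only an example): resending the accumulated history keeps the gibberish in context, while a fresh chat sends only the new message.

# Show why a fresh chat helps: frontends resend the full message list every turn.
import requests

URL = "http://localhost:11434/api/chat"

def ask(messages):
    r = requests.post(URL, json={"model": "phi3:mini", "messages": messages,
                                 "stream": False}, timeout=300)
    return r.json()["message"]["content"]

history = [{"role": "user", "content": "Write a limerick about GPUs."}]
history.append({"role": "assistant", "content": ask(history)})  # this reply may be gibberish
# Continuing with `history` would feed that gibberish back in on every turn.

# Fresh chat: drop the polluted history instead of appending to it.
fresh = [{"role": "user", "content": "List three Pascal-era NVIDIA cards."}]
print(ask(fresh))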
@djayjp (1 month ago)
2:29 "It's got 5 Gibibytes" 🤔😂
@DigitalSpaceport (1 month ago)
GiB is the designation for gibibyte, which is what nvtop displays for some reason, but it confuses me every time I see it in nvtop.
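For the curious, the unit difference in plain numbers (just arithmetic, nothing nvtop-specific):

# A gibibyte is 2**30 bytes, a gigabyte is 10**9 bytes, so the same memory
# reads about 7% smaller when expressed in GiB.
vram_gib = 5
vram_bytes = vram_gib * 1024**3
print(f"{vram_gib} GiB = {vram_bytes:,} bytes = about {vram_bytes / 1e9:.2f} GB")
# 5 GiB = 5,368,709,120 bytes, about 5.37 GB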