That's awesome performance for the P2000! I have a single Tesla P4 used for Plex, passed through to an LXC. It would be cool to add a second Tesla P4 and use the GPUs for Ollama when not Plex transcoding. Most of the time, transcoding only happens for remote streams. 16GB of total VRAM should handle Llama 3.1 8B fairly well.
@leemunson8419 (2 months ago)
I'm using a pair of 3090s for serious AI work but have also tried out the P4000 I have in a server. As you can imagine from these results, it's certainly capable of running smaller LLMs at a decent speed but the accuracy lets it down vs the larger models. Where it does work for me is in creating batches of images with Stable Diffusion/ComfyUI at low power use when I'm busy with other things and don't need them in a hurry.
@DigitalSpaceport (2 months ago)
Yeah, an always-on second rig does seem like a decent idea, but these smaller models are a bit of a letdown on accuracy. Didn't think about an aux use for image gen, but great idea.
@johnkost2514 (2 months ago)
Anything above 10 T/s is amazing when you consider that's 600 T/min, or roughly 500 words/min (as if someone could type that fast). Small language models are fun and useful for experimenting locally and privately.
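As a back-of-the-envelope check (the 0.75 words-per-token ratio below is just a common rule of thumb for English text, not a measurement from the video):

# Rough conversion from token rate to a reading/typing speed.
def tokens_per_sec_to_words_per_min(tps: float, words_per_token: float = 0.75) -> float:
    return tps * 60 * words_per_token

if __name__ == "__main__":
    for tps in (10, 20, 40):
        print(f"{tps} tok/s ~ {tokens_per_sec_to_words_per_min(tps):.0f} words/min")
    # 10 tok/s -> 600 tok/min, about 450 words/min, in line with the ~500 quoted above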
@DigitalSpaceport (2 months ago)
I was talking to my wife yesterday and she said... you kinda sound like an LLM outputting next words. She was right.
@Alex-rg1rz (2 months ago)
Interesting benchmark video! Any plans on testing AMD GPUs, like the 7600 XT?
@DigitalSpaceport (2 months ago)
If I can borrow one I will; I have a friend with a few I'll ask. Ollama does work with AMD GPUs, they have a blog post on it.
@InstaKane (2 months ago)
I would like to see more analysis on the quality of the response.
@alx8439 (1 month ago)
As for P2000 inference speed: I have a mini PC with an AMD Ryzen 9 7940 and 32 GB of DDR5. Seeing your numbers here, I went and downloaded the same quants of the same models, and I'm getting marginally better results for everything, running purely on CPU. So I don't think it's worth investing in these ancient cards; modern CPUs can give you better performance.
@drumstyx (1 month ago)
It is, however, still a very useful card for realtime transcoding on Plex. The codecs aren't as much of an issue these days (HEVC was a huge step forward), but 4K (I know....) or even high-bitrate 1080p can be tricky just for the client's connection sometimes.
@markaphillips14 (2 months ago)
Finally someone talking about the P2000 with LLMs. I've had my P2000 working in my server since 2020. Bad time to buy it, but it has been rock solid. Could you look into the A2000 12GB? Thanks for the content.
@DigitalSpaceport (1 month ago)
I'm honestly shocked at how good it is. The little card that could! I doubt I'll buy an A2000 12GB to test, sorry.
@alexo7431 (2 months ago)
good job
@drumstyx (1 month ago)
It's all about context. Each test with a different subject should really be done with a fresh chat -- these smaller models get really influenced by context.
@alx8439 (1 month ago)
Regarding context size: Ollama sets it to 2k tokens for all models by default. You can override it via Open WebUI if you're running a recent version. But don't be fooled by the ultra-long responses Phi gives you; if you haven't changed the context size (n_ctx), they're irrelevant.
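If you'd rather set it outside Open WebUI, here's a minimal sketch against Ollama's REST API; it assumes Ollama is listening on its default port 11434, and the model tag is only an example:

# Override Ollama's default 2k context window (num_ctx) for a single request.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "phi3:mini",           # example model tag; use whatever you have pulled
        "prompt": "Summarize the plot of Hamlet in three sentences.",
        "options": {"num_ctx": 8192},   # raise the context window from the 2048 default
        "stream": False,
    },
    timeout=300,
)
print(resp.json()["response"])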
@moozoo2589 (1 month ago)
The "Results Table" from the description is not relevant to P2000. Could you please prepare a benchmark DB with all your results from various cards?
@sharplcdtv198 (2 months ago)
How do you run nvtop in PowerShell? Do you have an installation guide anywhere in your videos?
@DigitalSpaceport (1 month ago)
I'm not sure nvtop runs in PowerShell directly; it likely would in WSL, maybe? You will always run into weird issues on Windows with GPUs and high-performance stuff, so I just don't bother with that anymore.
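If you're stuck on Windows, one rough stand-in is polling nvidia-smi, which ships with the NVIDIA driver on both Windows and Linux. A minimal sketch, assuming nvidia-smi is on the PATH (it's not a replacement for nvtop's UI, just the numbers):

# Poll GPU utilization and VRAM once a second via nvidia-smi's query flags.
import subprocess
import time

QUERY = ["nvidia-smi",
         "--query-gpu=name,utilization.gpu,memory.used,memory.total",
         "--format=csv,noheader,nounits"]

while True:
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    for line in out.strip().splitlines():
        name, util, used, total = [f.strip() for f in line.split(",")]
        print(f"{name}: {util}% GPU, {used}/{total} MiB VRAM")
    time.sleep(1)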
@MM-vl8ic (2 months ago)
I'm curious about using retro motherboards: an ASUS X99 with a PLX switch and an overclocked Xeon E5-1660 v3 (8 cores, easily 4 GHz). That board can potentially run 6 x8 slots plus 1 x16, or 4 x16. Start plugging in "cheap" GPUs and see how well the memory stacks up. The expensive version: four 3060 12GB cards (or more, running at x8).
@DigitalSpaceport (2 months ago)
It works, but it's hard to predict the layout it will choose, and it's a bit slow of course. A video on the remaining lower-end GPUs is out in a few days, and I did test that. Fermi and some Kepler cards are the cutoff; they don't work.
@piero957 (1 month ago)
How was CPU-only performance on the same machine?
@DigitalSpaceport (1 month ago)
Depends on the model you are running. The new Llama 3.2 models run very nicely on modern-ish CPUs since they are super small. You will, of course, see thread speed impact performance then.
@rmeta3391 (2 months ago)
Seems like you'd get more views if you installed ComfyUI and gave us some iterations-per-second details on each GPU you test.
@DigitalSpaceport (2 months ago)
Okay, I'll do that.
@hobohippy1616 (1 month ago)
Please do the Nvidia M6000 24GB.
@DigitalSpaceport (1 month ago)
I won't be able to do that one, sorry. I don't have one and it's too old to buy at this point, but if you already have one, why not try? I would be cautious about going older than Pascal right now.
@johnc2k2k (2 months ago)
I'm using an Nvidia T1000 8GB.
@DigitalSpaceport (1 month ago)
Oh, those are kinda rare. I'd bet it has very solid performance. I have a P4 that will be in a video sometime soonish, and it does really well.
@alx8439 (1 month ago)
Also, when Phi started giving you gibberish you should have started a new chat instead of trying to fix it, as Open WebUI sends the whole chat history (including the gibberish) each time with your new message.
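A minimal sketch of the difference, talking to Ollama's /api/chat directly (the model tag is only an example): resending the accumulated history keeps the gibberish in context, while a fresh chat sends only the new message.

# Show why a fresh chat helps: frontends resend the full message list every turn.
import requests

URL = "http://localhost:11434/api/chat"

def ask(messages):
    r = requests.post(URL, json={"model": "phi3:mini", "messages": messages,
                                 "stream": False}, timeout=300)
    return r.json()["message"]["content"]

history = [{"role": "user", "content": "Write a limerick about GPUs."}]
history.append({"role": "assistant", "content": ask(history)})  # this reply may be gibberish
# Continuing with `history` would feed that gibberish back in on every turn.

# Fresh chat: drop the polluted history instead of appending to it.
fresh = [{"role": "user", "content": "List three Pascal-era NVIDIA cards."}]
print(ask(fresh))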
@djayjp (1 month ago)
2:29 "It's got 5 Gibibytes" 🤔😂
@DigitalSpaceport (1 month ago)
GiB is the designation for gibibyte, which is what nvtop displays for some reason, but it confuses me every time I see it in nvtop.
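For the curious, the unit difference in plain numbers (just arithmetic, nothing nvtop-specific):

# A gibibyte is 2**30 bytes, a gigabyte is 10**9 bytes, so the same memory
# reads about 7% smaller when expressed in GiB.
vram_gib = 5
vram_bytes = vram_gib * 1024**3
print(f"{vram_gib} GiB = {vram_bytes:,} bytes = about {vram_bytes / 1e9:.2f} GB")
# 5 GiB = 5,368,709,120 bytes, about 5.37 GB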