Looking at the price: 2x P2000 will cost you around $200 (+shipping) while a new RTX 3060 12GB will cost you $284 from Amazon (+shipping), so for around $84 more, why should someone buy the 2 P2000 cards? I'm pretty sure the RTX 3060 will smoke the dual P2000.
@christender3614 · 10 days ago
I guess bc it’s 4 GB more of VRAM so you should be able to use slightly larger models. That being said, I think I’d go with the 4060 as well.
@DigitalSpaceport · 10 days ago
I think I mentioned that in the video, but yes, the 3060 12GB is an all-around better card vs 2 P2000. The script was out the door when I pivoted to testing other cards, so it was likely muddled as a point. That always happens when I write a script before testing.... but the M2000 will be the stand-in for the current cheapest rig I could figure out. It's for sure worth it to go with the $350 rig and 3060 12GB if someone can.
@tomoprime217 · 8 days ago
@@DigitalSpaceport Damn, did you see Intel's new Battlemage GPU? It drops in stores in a couple of weeks. The Arc B580 has 12GB of VRAM at $250! It improves efficiency on that front, using tricks like transforming the vector engines from two slices into a single structure, supporting native SIMD16 instructions, and beefing up the capabilities of the Xe core's ray tracing and XMX AI instructions. I don't know where the previous A770 16GB graphics card stands, but it may get a price drop soon as a result. It's already $259 at Micro Center.
@DigitalSpaceport · 6 days ago
Yes, I wish it was a 16GB card, but I will prolly snag one to test. I hope they have fixed their idle wattage issues also; my A750 is a power eater!
@Jason-ju7df · 5 days ago
@@DigitalSpaceport REALLY WANT to see you test 2x Arc B580 for 24GB of vram
@RoyFox-t1i · 2 hours ago
Love these videos! Keep up the great work! I currently have a gaming PC with a 4090 I'm using for AI inference, but I will be building your setups, starting with this one and then moving to the midsize before the monster quad-GPU one!
@Choooncey · 9 days ago
Now that Intel Battlemage is out, I bet they will be more price competitive with dedicated AI cores.
@DigitalSpaceport · 8 days ago
Just watched the GN breakdown and it looks like an interesting option at a good price point.
@rhadiem · 3 days ago
10GB and 12GB of VRAM is not worth it. Get a $100 M40 24GB GPU if you want cheap AI. They're slow but work fine. VRAM is king IMHO. Lots of stuff is made for 24GB of VRAM.
@sebastianpodesta · 10 days ago
Great video! That was fun
@DigitalSpaceport · 10 days ago
It was a hard pivot mid-video mentally for me to buy into, but rolling the dice worked. It came out decent. Thanks!
@clomok · 8 days ago
I have been playing with Ollama on an AMD Ryzen 5900HX with 32GB of DDR4-3200 RAM. I ran the same models (with my RAM already over 65% taken by other stuff) and got 8-9 tokens/s with minicpm-v:8b, and I have been happy with the 17-19 tokens/s I can get with llama3.2:3b.
@JoshuaBoyd · 9 days ago
I was excited that you went from K2200 to M2000 to P2000. If you had stopped at the K2200, I would have been really disappointed.
@DigitalSpaceport · 9 days ago
The K2200 is disappointing, but I was surprised by the M2000. I meandered a bit in some explanations, as this all popped up and evolved outside of my bullet points, but yeah, if I get curious I will track it down if I can. I feel like I chased performance pretty well on this one, but I still want to know the why behind the K2200 to M2000 differences. I need to learn more.
@JoshuaBoyd · 9 days ago
@@DigitalSpaceport The M2000 does have better FP32 performance and about 25% faster memory performance. There is also the CUDA Compute Capability 5.0 versus 5.2 difference. I haven't seen anything explaining what instruction-level differences there are between the two. It would be cool to really locate all the causes for the performance difference though.
@DigitalSpaceport · 9 days ago
There is a tool for benching models, if you have their shape, that I've yet to try, but it looks good for comparisons like this. It may be a real rabbit hole, but I'm interested in the raw perf numbers. Maybe of interest: github.com/stas00/ml-engineering/tree/master/compute/accelerator/benchmarks
@andAgainFPV · 9 days ago
Stoked I found your channel! I'm considering using Exo to distribute an LLM across my family's fleet of gaming PCs; however, I'm not sure about the overall power draw. Thoughts?
@mitzusantiago · 8 days ago
Hi! I really enjoyed your video. I'm trying to do some experimental work (research) with local AI models (I'm a teacher). What is your opinion on using Xeon processors (like the ones sold on AliExpress) plus a graphics card like the ones you presented? Is the Xeon processor necessary, or can I choose any other processor (like a Ryzen plus an NVIDIA card)? Greetings from Mexico.
@ebswv8 · 5 days ago
Thanks for the videos. I am looking to build a home AI server for ~$1000 or less. Would love to see a video on what you could build for around that price range.
@DigitalSpaceport · 5 days ago
Good news, I'm working on that video already, and I think it's a price that gets a very capable setup. Out in days.
@ebswv8 · 4 days ago
@@DigitalSpaceport Looking forward to it. I am a software developer by trade and have been working to learn more about the hardware side of things. Thanks again for the videos. You have gained a subscriber.
@ChrisCebelenski · 6 days ago
My 16GB 4060 Ti clocks in around 31 tps on this model (single card used). I've seen these for around $400 USD, so the price/performance ratio is on par, but the overall system price is higher. And you get 16GB of VRAM, which is going to be the limiting factor with the cheaper cards even if the performance is OK for you.
@DigitalSpaceport · 6 days ago
Hey, can you see if your 4060 Ti can fit the new Llama 3.3, and at what context? It is a great model; excited for you to try it.
@ChrisCebelenski · 6 days ago
@@DigitalSpaceport Just started playing with it - at default settings I'm getting about 6 tps. I'll try upping the context, but for some reason I'm getting flaky malfunctions with multiple models lately when playing with the settings. I hope that settles down with some updates. Also, my models never unload, which is minor-level annoying. (Yes, I think I have the flags set correctly...)
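A minimal sketch of one way to raise the context in Ollama, assuming the standard num_ctx parameter and Modelfile workflow from Ollama's docs; the model tag and the 16384 value are placeholders to tune against your VRAM:

    # bake a larger context window into a named variant
    printf 'FROM llama3.3\nPARAMETER num_ctx 16384\n' > Modelfile
    ollama create llama3.3-16k -f Modelfile
    # --verbose prints the same tokens/s stats quoted in this thread
    ollama run llama3.3-16k --verbose

Inside an interactive ollama run session, /set parameter num_ctx 16384 should do the same thing for just that session.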
@jensodotnet · 4 days ago
I tested minicpm-v:8b on a GTX 1070 at ~37 t/s, and on an RTX 3090 at ~92 t/s, using this prompt: "Help me study vocabulary: write a sentence for me to fill in the blank, and I'll try to pick the correct option." ~5.5GB of VRAM, default values. I also tested with an image and the prompt "explain the meme" and got ~34 t/s (GTX 1070) and 97 t/s (RTX 3090); the image was resized to 1344x1344.
@jk-mm5to · 10 days ago
I have two Titan Xps languishing. They may have a new purpose now.
@elliotthanford1328 · 8 days ago
I would be interested to see how a Tesla P4 or two does, especially as they are around $100, when compared to a 3060.
@patricklogan6089 · 9 days ago
Great stuff. Tnx
@UCs6ktlulE5BEeb3vBBOu6DQ · 10 days ago
My dual P40 + A2000 use 550W at idle lol. Keeps me warm.
@DigitalSpaceport · 10 days ago
It's "free" heat if a workload's running 😉
@Nettlebed7 · 4 days ago
20-80 watts? This means live 24/7 classification of persons on your Ring is not only technically feasible but also financially acceptable.
@pxrposewithnopurpose5801 · 5 days ago
this guy is built different
@tomoprime217 · 9 days ago
What about AMD GPUs? Haven't they made progress on AI and CUDA alternatives?
@DigitalSpaceport · 9 days ago
Yes, I have read they are doing better on the software front, but they still have some stability issues. I do plan to snag some AMD cards for testing when I can; I just don't have the money to buy one of everything, really. It will happen.
@SunnyCurwenator · 8 days ago
Sorry, a really basic question from me; puns unintended. What are you using to collect reliable stats on power consumption (watts)? We have Threadrippers and we're considering a couple of 4090s, but one question relates to having good metrics on power usage at idle and at peak. Then we can begin to track and compare power costs. What have you found that works? Thanks in advance. Sunny
@DigitalSpaceport · 7 days ago
In the videos I'm peeking at a Kill A Watt. If you're gathering metrics, you can use NVIDIA tooling to drop that out to InfluxDB. I forget the name of it, but it's fairly searchable. That would be useful to check around GitHub for.
@Dundell2 · 2 days ago
For most people, I believe, if you're just trying to track GPU wattage, you can create a script or job to track nvidia-smi, and set the power levels of your RTX 3090s down to an acceptable wattage with some performance loss until you hit an efficient rate. Something like nvidia-smi -pm 1 and nvidia-smi -pl 250. I set my RTX 3060s to 100W max for all 4 cards. It's a decrease from their usual spikes of around 145W during inference, with around 10% speed loss, but 45W of spike savings during inference, and they never got past 70°C.
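A minimal sketch of that nvidia-smi approach (the 250W cap, the GPU index, and the 5-second interval are assumptions to tune per card; note the power-limit flag is -pl):

    # persistence mode so the limit sticks between jobs (Linux)
    sudo nvidia-smi -pm 1
    # cap GPU 0 at 250W
    sudo nvidia-smi -i 0 -pl 250
    # log timestamp, power draw, and utilization every 5 seconds to a CSV
    nvidia-smi --query-gpu=timestamp,power.draw,utilization.gpu --format=csv -l 5 >> gpu_power.csv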
@TazzSmk · 10 days ago
19:53 - so would you recommend 3060s over 1080 Tis, or what kind of price would make 11GB Pascals an interesting value?
@UCs6ktlulE5BEeb3vBBOu6DQ · 10 days ago
Stay away from Pascal: most models use FP16, and 90% of Pascal's power is in FP32 instead of FP16.
@DigitalSpaceport · 10 days ago
I do like the 3060's 12GB of VRAM. That extra 1GB really does matter. I'd sell the 1080 Ti while you can and move on up.
@UCs6ktlulE5BEeb3vBBOu6DQ · 10 days ago
@@DigitalSpaceport Those that can (have another GPU for the desktop) get a modest benefit from setting the NVIDIA GPU to TCC mode instead of WDDM mode. You get to use 95+% of the VRAM for compute instead of 80+%, because of OS-reserved memory. It can be the difference between 16K context and 32K, or a Q4 quant and a Q5.
@DigitalSpaceport · 10 days ago
Hey, now that's news to me 😀 I'm looking into this ASAP, thx for sharing!
@UCs6ktlulE5BEeb3vBBOu6DQ · 10 days ago
@@DigitalSpaceport Once you set that GPU to TCC mode, it can't display an image until you set it to WDDM again (a reboot resets it to WDDM unless you make the change persistent).
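For reference, a sketch of the switch being described, run from an elevated PowerShell prompt on Windows (nvidia-smi's -dm flag; GPU index 1 is an assumption for the compute-only card, and the driver has to allow TCC on that model):

    # 1 = TCC (compute only, no display output), 0 = WDDM (normal desktop mode)
    nvidia-smi -i 1 -dm 1
    # switch back when the card needs to drive a display again
    nvidia-smi -i 1 -dm 0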
@Act1veSp1n · 9 days ago
I'm running Ollama UI on Proxmox with a 1070 - it's not bad. The 1070s are in the low-$100 USD range. But you will probably do much better with a 3060 12GB or 4060 Ti 16GB.
@Act1veSp1n · 9 days ago
If anybody is wondering - the 1070 runs at 36 tokens per second. The wattage pulled while idle = 36W (Intel 13500).
@DigitalSpaceport · 9 days ago
Oh yeah, I did test a 1070 Ti out in an older video, which unfortunately had bad audio. It's also a card a lot of ppl have sitting around, and it can still perform really decently for a power-pin-capable setup. kzbin.info/www/bejne/d6DGo6mcpJqBldUsi=YhmtIDi5C0JGyRL9&t=569
@C-141B_FE · 9 days ago
There is the 3DFX card?
@FSK1138 · 10 days ago
I am having a good time with a Ryzen mini PC - the 5th and 6th gen are CHEAP. You can add an M.2-to-PCIe adapter for an eGPU, and you can max out the RAM of the iGPU in the BIOS.
@DigitalSpaceport · 9 days ago
Does the M.2-to-PCIe adapter need an external power supply? I might buy one here for my Unraid NAS. It could use a proper CUDA card.
@jelliott3604 · 9 days ago
I have a Ryzen 7 (5800H) APU with 64GB of RAM (48 dedicated to the GPU) and it works surprisingly well. Recently bought an HP Victus E16 motherboard (only) with the same APU plus a 3060 on the board (really it's half a 3060 - it has 6GB of VRAM) that I have just gotten powered up and am hoping will be interesting - or at least cost-effective for a £140 outlay (as I already have the RAM, SSD, etc.).
@FSK1138 · 1 day ago
@@DigitalSpaceport Yes, I use an ATX power supply; the eGPU base has an on/off switch. Amazing boost in quality!! And in all AI tasks - I was sharing my built-in GPU with system RAM.
@Dundell2 · 2 days ago
P102-100 10GB mining cards - I think you can still get them sub-$45? Two of these together can probably push IQ3 QwQ 32B with a decent amount of context in llama.cpp, and might run around $90-140 total in GPUs, plus basically any host system, since I believe they run PCIe 3.0 at 4 lanes each. They hit a decent inference level for Pascal, around GTX 1080 inference speeds.
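A hedged sketch of the kind of llama.cpp invocation that implies (flag names from recent llama.cpp builds; the model filename and context size are placeholders):

    # -ngl 99 offloads all layers to GPU, --split-mode layer spreads them
    # across both P102-100s, -c sets context to tune against the ~20GB total
    ./llama-cli -m qwq-32b-iq3_m.gguf -ngl 99 --split-mode layer -c 8192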
@ARfourbanger2000 · 5 days ago
Does the Dell 7050 have power connectors to support a 3060? Also, what would the difference be in power consumption? Just curious, thanks!
@DigitalSpaceport · 5 days ago
No, unfortunately the 7050 doesn't. The wattages are nearly identical at idle, however; the peak during use is higher on a 3060, but the work is done faster. I've seen the 3060 in the 3620 peak at 130 watts, while the 7050 only hit near 100 watts.
@lovebutnosoilder · 7 days ago
Could I use a 4x x4 bifurcated PCIe slot adapter and squeeze 5 GPUs into the PC?
@HunterDishner · 10 days ago
It'd be cool to look at a K5200 8GB card. I'm seeing those used at like $70
@DigitalSpaceport · 9 days ago
I feel like Kepler, especially after this video, is a bridge too far on the performance side at this point. It's also at the bottom of the supported list for llama.cpp/Ollama, so I can't think it hangs on for a lot longer on the software support side.
@beprivatecdblind7831 · 9 days ago
How hard is it to get InvokeAI to use dual GPUs? Could you use an RTX 4060 8GB and an RTX 3060 12GB to get 20GB of VRAM, or would it be better to use two 4060s?
@MBStudiosArt · 19 hours ago
I have a budget of $1,200.00. I'm not sure if I should be looking at doing a server or a PC build... I'm just running Ollama models at the moment... would like to be able to run LM Studio... That said, can you point to a build that would work? Any help would be appreciated.
@DigitalSpaceport · 19 hours ago
Do you want to run video gen or image gen locally?
@myna2mac · 1 day ago
Really a basic question - can I mix and match an Intel CPU with NVIDIA GPUs, or an AMD CPU with the new Intel GPU?
@michaelgleason4791 · 3 days ago
If I only need a language model when I'm using my gaming/main PC, is there a point in having a dedicated LLM server? Is VRAM the end-all, be-all? I have a 10GB 3080.
@andrewcameron4172 · 9 days ago
Try the Tesla P4 GPUs
@DigitalSpaceport · 9 days ago
Okay I have one of those here. Gotta toss a fan on it but good call.
@andrewcameron4172 · 9 days ago
@DigitalSpaceport I 3D printed the fan housing for mine
@DigitalSpaceport · 9 days ago
I had some printed but failed to find a good and cheap fan option for them. Did you happen to get fans that are not coil whine prone?
@andrewcameron4172 · 9 days ago
@@DigitalSpaceport My fan is very loud and noisy but it does not bother me as it's in a room that is not occupied.
@lvutodeath · 9 days ago
What about AMD GPUs and APUs? Can I make a video request?
@pauljones9150 · 8 days ago
Love this video script
@mejesster · 9 days ago
What cards are most efficient in terms of tokens per watt in your experience?
@DigitalSpaceport · 8 days ago
I think base idle has to be considered also, so Intel's cards are out on that alone. The 3000 series and 4000 series all have great idles that, oddly, look in my analysis like they scale with the amount of VRAM. I strongly recommend a 24GB card if a person can afford it, as the experience is unmatched - specifically a 3090, unless you want image generation at max speeds; inference is close to the same as the 4090. That said, the 3060 12GB is very fast, and I recommend avoiding all 8GB cards unless you already have them. The 16GB 4060 Ti is likely to be a strong contender as well.
@joelv4495 · 7 days ago
IMO Apple Silicon Macs are the best for power efficiency. Not the best for capital cost or outright speed though.
@@DigitalSpaceport thanks for that. I really appreciate it ❤️
@CatalystReaction · 9 days ago
Now try TPUs on a 4x4 carrier card
@patrickweggler · 10 days ago
Can you mix and match different cards?
@DigitalSpaceport · 10 days ago
You can, to gain VRAM for model storage, but your performance is always that of the slowest single card. So if you mixed a K2200 and a P2000, the tps would be that of the K2200.
@patrickweggler · 10 days ago
@DigitalSpaceport Thx. I have a 1080 Ti and two 1030s, so would it be better to ignore the two small ones and just use the 1080?
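If you did want to pool mismatched cards anyway, a hedged sketch of how llama.cpp weights them (the --tensor-split ratio is an assumption matched to 11GB on the 1080 Ti vs 2GB per 1030, with all three cards visible to CUDA; per the answer above, generation still paces to the slowest card):

    # put ~11/15 of the layers on the 1080 Ti and ~2/15 on each 1030
    ./llama-cli -m model.gguf -ngl 99 --tensor-split 11,2,2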
@DIYKolka · 1 day ago
Guys, what can I use the AI for if I run it locally? I don't see any use case.
@notaras1985 · 8 days ago
What's a Proxmox server?
@ChrisCebelenski · 6 days ago
A hypervisor server - for running virtual machines, as opposed to a desktop that is one physical machine. Proxmox is good for sharing a machine among many tasks.
@CARTUNE. · 8 days ago
Thank you for this. I've been looking for ideas for a viable $200-$400 ultra-budget rig to get my feet wet. This is right in that range. lol
@rbwheels · 9 days ago
I'm running mine with an RTX 4060 8GB
@DigitalSpaceport · 9 days ago
I need to get a 16GB one of those in the mix for testing!
@adamlois5574 · 5 days ago
This video is pretty pointless, because 8GB of VRAM is nothing at all when it comes to running AI. Like, sure, if you build your PC from outdated and nearly unusable parts, then you can make it cheap. What I'd like to see is a video showing how to cheaply make a PC using 2x M10 or 2x M40 Tesla GPUs.
@DigitalSpaceport · 5 days ago
Small models are pretty good now; however, P40s would be a safer longevity bet, as they are supported on CUDA 12.
@Dundell2 · 2 days ago
There are some uses for an 8GB Pascal GPU if you've got one: smaller 7-8B models that can still hit 20+ t/s, small fine-tunes, roleplay, vision support models, SDXL generators.
@MM-vl8ic · 10 days ago
Thanks again for the "Poof" of concept.....
@JesusLd935 · 9 days ago
So cheap and low power, but fucking slow :( I have a 4060 and it is fucking fast
@StefRush · 10 days ago
4x Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz (1 socket), RAM usage 71.93% (11.22 GiB of 15.60 GiB) DDR3, proxmox-ve 8.3.0 (running kernel 6.8.12-4-pve), NVIDIA GeForce GTX 960 at PCIe Gen 1 @ 16x, 4GiB.
"write python code to access this LLM" - response 24.43 tokens/s
"create the snake game to run in python" - response 21.38 tokens/s
This is way faster than the P2000, with just one GTX 960 card.
@andrewcameron4172 · 1 day ago
On my Tesla P4 I get the following with MiniCPM-V and Ollama:
total duration: 14.034112022s
load duration: 76.407734ms
prompt eval count: 383 token(s)
prompt eval duration: 102ms
prompt eval rate: 3754.90 tokens/s
eval count: 311 token(s)
eval duration: 13.819s
eval rate: 22.51 tokens/s