Looking at the price: 2x P2000 will cost you around $200 (+shipping) while a new RTX 3060 12GB will cost you $284 from Amazon (+shipping), so for around $84 more, why should someone buy the 2 P2000 cards? I'm pretty sure the RTX 3060 will smoke the dual P2000.
@christender3614 · 10 days ago
I guess bc it’s 4 GB more of VRAM so you should be able to use slightly larger models. That being said, I think I’d go with the 4060 as well.
@DigitalSpaceport · 10 days ago
I think I mentioned that in the video, but yes, the 3060 12GB is an all-around better card vs 2 P2000. The script was out the door when I pivoted to testing other cards, so it was likely muddled as a point. That always happens when I write a script before testing.... but the M2000 will be the stand-in for the current cheapest rig I could figure out. It's for sure worth it to go with the $350 rig and 3060 12GB if someone can.
@tomoprime217 · 8 days ago
@@DigitalSpaceport Damn, did you see Intel's new Battlemage GPU? It drops in stores in a couple of weeks. The Arc B580 has 12GB of VRAM at $250! It improves efficiency on that front, using tricks like transforming the vector engines from two slices into a single structure, supporting native SIMD16 instructions, and beefing up the capabilities of the Xe core's ray tracing and XMX AI instructions. I don't know where the previous A770 16GB graphics card stands, but it may get a price drop soon as a result. It's already $259 at Micro Center.
@DigitalSpaceport · 6 days ago
Yes, I wish it was a 16GB card, but I will prolly snag one to test. I hope they have fixed their idle wattage issues also; my A750 is a power eater!
@Jason-ju7df · 5 days ago
@@DigitalSpaceport REALLY WANT to see you test 2x Arc B580 for 24GB of vram
@RoyFox-t1i · 2 hours ago
Love these videos! Keep up the great work! I currently have a gaming PC with a 4090 I'm using for AI inference, but I will be building your setups, starting with this one and then moving to the midsize before the monster quad-GPU one!
@Choooncey · 9 days ago
Now that Intel Battlemage is out, I bet they will be more price competitive with dedicated AI cores.
@DigitalSpaceport · 8 days ago
Just watched the GN breakdown and it looks like an interesting option at a good price point.
@rhadiem · 3 days ago
10GB and 12GB of VRAM is not worth it. Get a $100 M40 24GB GPU if you want cheap AI. They're slow but work fine. VRAM is king IMHO. Lots of stuff is made for 24GB of VRAM.
@sebastianpodesta · 10 days ago
Great video! That was fun
@DigitalSpaceport · 10 days ago
It was a hard pivot mid-video mentally for me to buy into, but rolling the dice worked. It came out decent. Thanks!
@clomok · 8 days ago
I have been playing with Ollama on an AMD Ryzen 5900HX with 32GB of DDR4-3200 RAM. I ran the same models (with my RAM already over 65% taken by other stuff) and got 8-9 tokens/s with minicpm-v:8b, and I have been happy with the 17-19 tokens/s I can get with llama3.2:3b.
@JoshuaBoyd · 9 days ago
I was excited that you went from K2200 to M2000 to P2000. If you had stopped at the K2200, I would have been really disappointed.
@DigitalSpaceport · 9 days ago
The K2200 is disappointing, but I was surprised by the M2000. I meandered a bit in some explanations, as this all popped up and evolved outside of my bullet points, but yeah, if I get curious I will track it down if I can. I feel like I chased performance pretty well on this one, but I still want to know the why behind the K2200 to M2000 differences. I need to learn more.
@JoshuaBoyd · 9 days ago
@@DigitalSpaceport The M2000 does have better FP32 performance and about 25% faster memory performance. There is also the CUDA Compute Capability 5.0 versus 5.2 difference. I haven't seen anything explaining what instruction-level differences there are between the two. It would be cool to really locate all the causes for the performance difference though.
@DigitalSpaceport · 9 days ago
There is a tool for benching models, if you have their shape, that I've yet to try, but it looks good for comparisons like this. It may be a real rabbit hole, but I'm interested in the raw perf numbers. Maybe of interest: github.com/stas00/ml-engineering/tree/master/compute/accelerator/benchmarks
@andAgainFPV · 9 days ago
Stoked I found your channel! I'm considering using Exo to distribute an LLM across my family's fleet of gaming PCs; however, I'm not sure about the overall power draw. Thoughts?
@mitzusantiago · 8 days ago
Hi! I really enjoyed your video. I'm trying to do some experimental work (research) with local AI models (I'm a teacher). What is your opinion on using Xeon processors (like the ones sold on AliExpress) plus a graphics card like the ones you presented? Is the Xeon processor necessary, or can I choose any other processor (like a Ryzen plus an NVIDIA card)? Greetings from Mexico.
@ebswv8 · 5 days ago
Thanks for the videos. I am looking to build a home AI server for ~$1000 or less. Would love to see a video on what you could build for around that price range.
@DigitalSpaceport · 5 days ago
Good news, I'm working on that video already, and I think it's a price that gets a very capable setup. Out in days.
@ebswv8 · 4 days ago
@@DigitalSpaceport Looking forward to it. I am a software developer by trade and have been working to learn more about the hardware side of things. Thanks again for the videos. You have gained a subscriber.
@ChrisCebelenski · 6 days ago
My 16GB 4060 Ti clocks in around 31 tps on this model (single card used). I've seen these for around $400 USD, so the price/performance ratio is on par, but the overall system price is higher. And you get 16GB of VRAM, which is going to be the limiting factor with the cheaper cards even if the performance is OK for you.
@DigitalSpaceport · 6 days ago
Hey, can you see if your 4060 Ti can fit the new Llama 3.3, and at what context? It is a great model; excited for you to try it.
@ChrisCebelenski · 6 days ago
@@DigitalSpaceport Just started playing with it - at default settings I'm getting about 6 tps. I'll try upping the context, but for some reason I'm getting flaky malfunctions with multiple models lately when playing with the settings. I hope that settles down with some updates. Also, my models never unload, which is minor-level annoying. (Yes, I think I have the flags set correctly...)
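A minimal sketch of one way to raise the context in Ollama, assuming the standard num_ctx parameter and Modelfile workflow from Ollama's docs; the model tag and the 16384 value are placeholders to tune against your VRAM:

    # bake a larger context window into a named variant
    printf 'FROM llama3.3\nPARAMETER num_ctx 16384\n' > Modelfile
    ollama create llama3.3-16k -f Modelfile
    # --verbose prints the same tokens/s stats quoted in this thread
    ollama run llama3.3-16k --verbose

Inside an interactive ollama run session, /set parameter num_ctx 16384 should do the same thing for just that session.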
@jensodotnet · 4 days ago
I tested minicpm-v:8b on a GTX 1070 at ~37 t/s, and on an RTX 3090 at ~92 t/s, using this prompt: "Help me study vocabulary: write a sentence for me to fill in the blank, and I'll try to pick the correct option." ~5.5GB of VRAM, default values. I also tested with an image and the prompt "explain the meme" and got ~34 t/s (GTX 1070) and 97 t/s (RTX 3090); the image was resized to 1344x1344.
@jk-mm5to · 10 days ago
I have two Titan Xps languishing. They may have a new purpose now.
@elliotthanford1328 · 8 days ago
I would be interested to see how a Tesla P4 or two does, especially as they are around $100, when compared to a 3060.
@patricklogan6089 · 9 days ago
Great stuff. Tnx
@UCs6ktlulE5BEeb3vBBOu6DQ · 10 days ago
My dual P40 + A2000 use 550W at idle lol. Keeps me warm.
@DigitalSpaceport · 10 days ago
It's "free" heat if a workload's running 😉
@Nettlebed7 · 4 days ago
20-80 watts? This means live 24/7 classification of persons on your Ring is not only technically feasible but also financially acceptable.
@pxrposewithnopurpose5801 · 5 days ago
this guy is built different
@tomoprime217 · 9 days ago
What about AMD GPUs? Haven't they made progress on AI and CUDA alternatives?
@DigitalSpaceport · 9 days ago
Yes, I have read they are doing better on the software front, but they still have some stability issues. I do plan to snag some AMD cards for testing when I can; I just don't have the money to buy one of everything, really. It will happen.
@SunnyCurwenator · 8 days ago
Sorry, a really basic question from me; puns unintended. What are you using to collect reliable stats on power consumption (watts)? We have Threadrippers and we're considering a couple of 4090s, but one question relates to having good metrics on power usage at idle and at peak. Then we can begin to track and compare power costs. What have you found that works? Thanks in advance. Sunny
@DigitalSpaceport · 7 days ago
In the videos I'm peeking at a Kill A Watt. If you're gathering metrics, you can use NVIDIA tooling to drop that out to InfluxDB. I forget the name of it, but it's fairly searchable. That would be useful to check around GitHub for.
@Dundell2 · 2 days ago
For most people, I believe, if you're just trying to track GPU wattage, you can create a script or job to track nvidia-smi, and set the power levels of your RTX 3090s down to an acceptable wattage with some performance loss until you hit an efficient rate. Something like nvidia-smi -pm 1 and nvidia-smi -pl 250. I set my RTX 3060s to 100W max for all 4 cards. It's a decrease from their usual spikes of around 145W during inference, with around 10% speed loss, but 45W of spike savings during inference, and they never got past 70°C.
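A minimal sketch of that nvidia-smi approach (the 250W cap, the GPU index, and the 5-second interval are assumptions to tune per card; note the power-limit flag is -pl):

    # persistence mode so the limit sticks between jobs (Linux)
    sudo nvidia-smi -pm 1
    # cap GPU 0 at 250W
    sudo nvidia-smi -i 0 -pl 250
    # log timestamp, power draw, and utilization every 5 seconds to a CSV
    nvidia-smi --query-gpu=timestamp,power.draw,utilization.gpu --format=csv -l 5 >> gpu_power.csv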
@TazzSmk · 10 days ago
19:53 - so would you recommend 3060s over 1080 Tis, or what kind of price would make 11GB Pascals an interesting value?
@UCs6ktlulE5BEeb3vBBOu6DQ · 10 days ago
Stay away from Pascal: most models use FP16, and 90% of Pascal's power is in FP32 instead of FP16.
@DigitalSpaceport · 10 days ago
I do like the 3060's 12GB of VRAM. That extra 1GB really does matter. I'd sell the 1080 Ti while you can and move on up.
@UCs6ktlulE5BEeb3vBBOu6DQ · 10 days ago
@@DigitalSpaceport Those that can (have another GPU for the desktop) get a modest benefit from setting the NVIDIA GPU to TCC mode instead of WDDM mode. You get to use 95+% of the VRAM for compute instead of 80+%, because of OS-reserved memory. It can be the difference between 16K context and 32K, or a Q4 quant and a Q5.
@DigitalSpaceport · 10 days ago
Hey, now that's news to me 😀 I'm looking into this ASAP, thx for sharing!
@UCs6ktlulE5BEeb3vBBOu6DQ · 10 days ago
@@DigitalSpaceport Once you set that GPU to TCC mode, it can't display an image until you set it to WDDM again (a reboot resets it to WDDM unless you make the change persistent).
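For reference, a sketch of the switch being described, run from an elevated PowerShell prompt on Windows (nvidia-smi's -dm flag; GPU index 1 is an assumption for the compute-only card, and the driver has to allow TCC on that model):

    # 1 = TCC (compute only, no display output), 0 = WDDM (normal desktop mode)
    nvidia-smi -i 1 -dm 1
    # switch back when the card needs to drive a display again
    nvidia-smi -i 1 -dm 0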
@Act1veSp1n · 9 days ago
I'm running Ollama UI on Proxmox with a 1070 - it's not bad. The 1070s are in the low-$100 USD range. But you will probably do much better with a 3060 12GB or 4060 Ti 16GB.
@Act1veSp1n · 9 days ago
If anybody is wondering - the 1070 runs at 36 tokens per second. The wattage pulled while idle = 36W (Intel 13500).
@DigitalSpaceport · 9 days ago
Oh yeah, I did test a 1070 Ti out in an older video, which unfortunately had bad audio. It's also a card a lot of ppl have sitting around, and it can still perform really decently for a power-pin-capable setup. kzbin.info/www/bejne/d6DGo6mcpJqBldUsi=YhmtIDi5C0JGyRL9&t=569
@C-141B_FE · 9 days ago
There is the 3DFX card?
@FSK1138 · 10 days ago
I am having a good time with a Ryzen mini PC - the 5th and 6th gen are CHEAP. You can add an M.2-to-PCIe adapter for an eGPU, and you can max out the RAM of the iGPU in the BIOS.
@DigitalSpaceport · 9 days ago
Does the M.2-to-PCIe adapter need an external power supply? I might buy one here for my Unraid NAS. It could use a proper CUDA card.
@jelliott3604 · 9 days ago
I have a Ryzen 7 (5800H) APU with 64GB of RAM (48 dedicated to the GPU) and it works surprisingly well. Recently bought an HP Victus E16 motherboard (only) with the same APU plus a 3060 on the board (really it's half a 3060 - it has 6GB of VRAM) that I have just gotten powered up and am hoping will be interesting - or at least cost-effective for a £140 outlay (as I already have the RAM, SSD, etc.).
@FSK1138 · 1 day ago
@@DigitalSpaceport Yes, I use an ATX power supply; the eGPU base has an on/off switch. Amazing boost in quality!! And in all AI tasks - I was sharing my built-in GPU with system RAM.
@Dundell2 · 2 days ago
P102-100 10GB mining cards - I think you can still get them sub-$45? Two of these together can probably push IQ3 QwQ 32B with a decent amount of context in llama.cpp, and might run around $90-140 total in GPUs, plus basically any host system, since I believe they run PCIe 3.0 at 4 lanes each. They hit a decent inference level for Pascal, around GTX 1080 inference speeds.
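A hedged sketch of the kind of llama.cpp invocation that implies (flag names from recent llama.cpp builds; the model filename and context size are placeholders):

    # -ngl 99 offloads all layers to GPU, --split-mode layer spreads them
    # across both P102-100s, -c sets context to tune against the ~20GB total
    ./llama-cli -m qwq-32b-iq3_m.gguf -ngl 99 --split-mode layer -c 8192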
@ARfourbanger2000 · 5 days ago
Does the Dell 7050 have power connectors to support a 3060? Also, what would the difference be in power consumption? Just curious, thanks!
@DigitalSpaceport · 5 days ago
No, unfortunately the 7050 doesn't. The wattages are nearly identical at idle, however; the peak during use is higher on a 3060, but the work is done faster. I've seen the 3060 in the 3620 peak at 130 watts, while the 7050 only hit near 100 watts.
@lovebutnosoilder · 7 days ago
Could I use a 4x x4 bifurcated PCIe slot adapter and squeeze 5 GPUs into the PC?
@HunterDishner · 10 days ago
It'd be cool to look at a K5200 8GB card. I'm seeing those used at like $70
@DigitalSpaceport · 9 days ago
I feel like Kepler, especially after this video, is a bridge too far on the performance side at this point. It's also at the bottom of the supported list for llama.cpp/Ollama, so I can't think it hangs on for a lot longer on the software support side.
@beprivatecdblind7831 · 9 days ago
How hard is it to get InvokeAI to use dual GPUs? Could you use an RTX 4060 8GB and an RTX 3060 12GB to get 20GB of VRAM, or would it be better to use two 4060s?
@MBStudiosArt · 19 hours ago
I have a budget of $1,200.00. I'm not sure if I should be looking at doing a server or a PC build... I'm just running Ollama models at the moment... would like to be able to run LM Studio... That said, can you point to a build that would work? Any help would be appreciated.
@DigitalSpaceport · 19 hours ago
Do you want to run video gen or image gen locally?
@myna2mac · 1 day ago
Really a basic question - can I mix and match an Intel CPU with NVIDIA GPUs, or an AMD CPU with the new Intel GPU?
@michaelgleason4791 · 3 days ago
If I only need a language model when I'm using my gaming/main PC, is there a point in having a dedicated LLM server? Is VRAM the end-all, be-all? I have a 10GB 3080.
@andrewcameron4172 · 9 days ago
Try the Tesla P4 GPUs
@DigitalSpaceport · 9 days ago
Okay I have one of those here. Gotta toss a fan on it but good call.
@andrewcameron4172 · 9 days ago
@DigitalSpaceport I 3D printed the fan housing for mine
@DigitalSpaceport · 9 days ago
I had some printed but failed to find a good and cheap fan option for them. Did you happen to get fans that are not coil whine prone?
@andrewcameron4172 · 9 days ago
@@DigitalSpaceport My fan is very loud and noisy but it does not bother me as it's in a room that is not occupied.
@lvutodeath · 9 days ago
What about AMD GPUs and APUs? Can I make a video request?
@pauljones9150 · 8 days ago
Love this video script
@mejesster · 9 days ago
What cards are most efficient in terms of tokens per watt in your experience?
@DigitalSpaceport · 8 days ago
I think base idle has to be considered also, so Intel's cards are out on that alone. The 3000 series and 4000 series all have great idles that, oddly, look in my analysis like they scale with the amount of VRAM. I strongly recommend a 24GB card if a person can afford it, as the experience is unmatched - specifically a 3090, unless you want image generation at max speeds; inference is close to the same as the 4090. That said, the 3060 12GB is very fast, and I recommend avoiding all 8GB cards unless you already have them. The 16GB 4060 Ti is likely to be a strong contender as well.
@joelv4495 · 7 days ago
IMO Apple Silicon Macs are the best for power efficiency. Not the best for capital cost or outright speed though.
@@DigitalSpaceport thanks for that. I really appreciate it ❤️
@CatalystReaction · 9 days ago
Now try TPUs on a 4x4 carrier card
@patrickweggler · 10 days ago
Can you mix and match different cards?
@DigitalSpaceport · 10 days ago
You can, to gain VRAM for model storage, but your performance is always that of the slowest single card. So if you mixed a K2200 and a P2000, the tps would be that of the K2200.
@patrickweggler · 10 days ago
@DigitalSpaceport Thx. I have a 1080 Ti and two 1030s, so would it be better to ignore the two small ones and just use the 1080?
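If you did want to pool mismatched cards anyway, a hedged sketch of how llama.cpp weights them (the --tensor-split ratio is an assumption matched to 11GB on the 1080 Ti vs 2GB per 1030, with all three cards visible to CUDA; per the answer above, generation still paces to the slowest card):

    # put ~11/15 of the layers on the 1080 Ti and ~2/15 on each 1030
    ./llama-cli -m model.gguf -ngl 99 --tensor-split 11,2,2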
@DIYKolka · 1 day ago
Guys, what can I use the AI for if I run it locally? I don't see any use case.
@notaras1985 · 8 days ago
What's a Proxmox server?
@ChrisCebelenski · 6 days ago
A hypervisor server - for running virtual machines, as opposed to a desktop that is one physical machine. Proxmox is good for sharing a machine among many tasks.
@CARTUNE. · 8 days ago
Thank you for this. I've been looking for ideas for a viable $200-$400 ultra-budget rig to get my feet wet. This is right in that range. lol
@rbwheels · 9 days ago
I'm running mine with an RTX 4060 8GB
@DigitalSpaceport · 9 days ago
I need to get a 16GB one of those in the mix for testing!
@adamlois5574 · 5 days ago
This video is pretty pointless, because 8GB of VRAM is nothing at all when it comes to running AI. Like, sure, if you build your PC from outdated and nearly unusable parts, then you can make it cheap. What I'd like to see is a video showing how to cheaply make a PC using 2x M10 or 2x M40 Tesla GPUs.
@DigitalSpaceport · 5 days ago
Small models are pretty good now; however, P40s would be a safer longevity bet, as they are supported on CUDA 12.
@Dundell2 · 2 days ago
There are some uses for an 8GB Pascal GPU if you've got one: smaller 7-8B models that can still hit 20+ t/s, small fine-tunes, roleplay, vision support models, SDXL generators.
@MM-vl8ic · 10 days ago
Thanks again for the "Poof" of concept.....
@JesusLd935 · 9 days ago
So cheap and low power, but fucking slow :( I have a 4060 and it is fucking fast
@StefRush · 10 days ago
4x Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz (1 socket), RAM usage 71.93% (11.22 GiB of 15.60 GiB) DDR3, proxmox-ve 8.3.0 (running kernel 6.8.12-4-pve), NVIDIA GeForce GTX 960 at PCIe Gen 1 @ 16x, 4GiB.
"write python code to access this LLM" - response 24.43 tokens/s
"create the snake game to run in python" - response 21.38 tokens/s
This is way faster than the P2000, with just one GTX 960 card.
@andrewcameron4172 · 1 day ago
On my Tesla P4 I get the following with MiniCPM-V and Ollama:
total duration: 14.034112022s
load duration: 76.407734ms
prompt eval count: 383 token(s)
prompt eval duration: 102ms
prompt eval rate: 3754.90 tokens/s
eval count: 311 token(s)
eval duration: 13.819s
eval rate: 22.51 tokens/s