Llama 3.2 Vision 11B LOCAL Cheap AI Server Dell 3620 and 3060 12GB GPU

20,285 views

Digital Spaceport

1 day ago

Comments: 61
@FaithMediaChannel
@FaithMediaChannel 1 day ago
Thank you for your video. I will share it with other people and other organizations, and put you on our list of preferred content providers for those who want to do it themselves. Thank you again for your video. It is so easy, and you're very detailed in your explanation, not only of the application deployment but also the hardware configuration.
@CoolWolf69
@CoolWolf69 4 days ago
After seeing this video I had to download and try this model myself (also running Open WebUI in Dockge, with Ollama in a separate LXC container on Proxmox with a 20GB Nvidia RTX 4000 Ada passed through). I was blown away by the accuracy of the picture recognition! Even the numbers shown on my electricity meter's display were identified correctly. Wow, that is and will be fun to use more over the weekend ;-) Keep up the good work with these videos!
@docrx1857
@docrx1857 4 days ago
Hi. This is an awesome video showcasing Ollama on a 12GB GPU. I am currently using a 12GB 6750 XT. I still find the speed very usable with models in the 18-24 GB range.
@DigitalSpaceport
@DigitalSpaceport 4 days ago
Oh hey, a data point for AMD! Nice. Can I ask what tokens/s you hit on the 6750 XT? Any issues with Ollama, or does it "just work" out of the box?
@docrx1857
@docrx1857 4 days ago
@@DigitalSpaceport I had to add a couple of lines to the ollama.service file because the 6750 XT is not supported by ROCm, but other than that it works great. I have not measured the token rate; I will get back to you when I do. But I can say that with a 10600K and 32GB of DDR4-3600, it generates responses at a very comfortable reading pace even when offloading a decent percentage to the CPU.
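For readers wanting to replicate this: the commenter doesn't show their exact lines, but the commonly used workaround for ROCm-unsupported RDNA2 cards like the 6750 XT (gfx1031) is a systemd drop-in that overrides the GFX version Ollama reports to ROCm, so the card is treated as the supported gfx1030. A hedged sketch:

```shell
# Assumed workaround, not the commenter's verified config:
sudo systemctl edit ollama.service
# In the drop-in editor that opens, add:
#   [Service]
#   Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
sudo systemctl daemon-reload
sudo systemctl restart ollama
```

The `10.3.0` value maps gfx1031 onto gfx1030; other unsupported AMD cards need a different override matching their nearest supported architecture.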
@AndyBerman
@AndyBerman 4 days ago
Great video! Love anything AI related.
@firefon326
@firefon326 4 days ago
Sounds like maybe you'll be doing a compilation video soon, but if not, or if it's going to be a while, maybe you should add the guide videos to a playlist. You have so much great content out there that it's hard to figure out which videos to watch if you're starting from scratch.
@DigitalSpaceport
@DigitalSpaceport 4 days ago
I hear this feedback, and it's tough, as the critical things change fairly fast. I like the idea of segmenting the playlists by skill set and content type. Then during the intro I can point new folks to that playlist and update those videos. Thanks, soon. And yes, there is a new software guide video up soon that I am working on right now.
@mariozulmin
@mariozulmin 3 days ago
Thanks, nearly my setup! Did you go with PCI passthrough to a VM or to an LXC? The card is pretty good for daily tasks and low power consumption. Also, 3.2 Vision is really good at the moment for what I use it for; mine draws about 170W at full load though 😅
@DigitalSpaceport
@DigitalSpaceport 1 day ago
So in this demo I went with the VM and passthrough, as it "just works" with no cgroups funkiness, but on a stable system I always go with LXC. Plus you can use it for other tasks, but if it crashes out of VRAM with a lot of tasks, it doesn't recover gracefully. I need to figure that out, but yeah, 3.2 Vision is wild stuff.
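For anyone chasing the LXC route mentioned above, the "cgroups funkiness" usually comes down to a few lines in the container config. A hypothetical sketch for an Nvidia card on Proxmox (the container ID 101 and the device major numbers are assumptions; check yours with `ls -l /dev/nvidia*` on the host):

```
# /etc/pve/lxc/101.conf (hypothetical container ID)
# Allow the Nvidia character devices through cgroup v2, then bind-mount them:
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 510:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
```

The `nvidia-uvm` major number (510 here) is allocated dynamically, so it may differ per host; the container also needs the same Nvidia driver version as the host, without the kernel module.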
@MM-vl8ic
@MM-vl8ic 4 days ago
I like the way you are "testing" various combos. I'm an old guy progressively having hand issues after years of physical work/abuse, and I'm really interested in using "AI" as a solution for disabilities, as well as a Blue Iris/Home Assistant tie-in. I'm "researching" voice-to-text (conversational) as well as image recognition servers. It would be interesting to see speech-to-text asking/inputting the questions. I have a 3060 12GB and a 4000A to play with; if you have the time/desire, I would be interested in seeing a dual-GPU setup with the above GPUs (so I don't have to). Also curious how they would perform in x8 (electrical) slots, and whether multiple models, voice included, can run simultaneously.
@DigitalSpaceport
@DigitalSpaceport 4 days ago
They will perform inference just as well in an x8 slot as an x16; it's a low-bandwidth workload. For training that wouldn't hold true, however. Agreed, I need to do the voice video. It's pretty awesome and I use it often on my cellphone.
@fatherfoxstrongpaw8968
@fatherfoxstrongpaw8968 4 days ago
I'm a disabled vet myself. I just started working on an agentic framework I quit on back in 2004, but now it's being refactored for vets and the disabled. The problem is I'm on a fixed income, and the software is failing from cascading failures caused by heat on my laptop. Wish I had the money for new hardware. I have all the modules working long enough to run the first couple of tests, but not long enough to put all the pieces together. All the pieces of the puzzle are available, but hardware will determine whether you get a working product or not. #1 lesson? All the Llamas are neutered and lobotomized, and thus a waste of time. Quants only make it worse: cascading failures and hallucinations. Open Interpreter for tool use, Agent Zero for memory, and the OpenAI API/GPT-4o for best results until a decent local LLM comes out.
@computersales
@computersales 4 days ago
Interesting build. Funny you make this video not long after I recycled a bunch of them. It would be nice if people found more uses for stuff older than 8th gen. These older machines are still perfectly usable.
@DigitalSpaceport
@DigitalSpaceport 4 days ago
I'm testing out a Maxwell card this weekend, an M2000. I bet it's going to surprise me!
@computersales
@computersales 4 days ago
@DigitalSpaceport It would be interesting to see a functional ultra-budget build. Curious how much cheaper than this setup you could get. Dell T3600s with the 635W PSU are really cheap now.
@DigitalSpaceport
@DigitalSpaceport 4 days ago
The GPU power pin tends to be the deciding factor, I have found, and is a must for getting enough VRAM cheaply. A strong contender that is even cheaper could be based on an HP workstation-class machine, but wow, I do not like their BIOS at all. I have a note that says so taped to my second monitor in case I forget, but it could bring costs down. I think 7th gen Intel is the desirable cutoff, as that iGPU handles the vast majority of the offload needed for a decent all-in-one media center box as well. Does a T3600 have a 6-pin power connector?
@computersales
@computersales 4 days ago
@@DigitalSpaceport The T3600 has two 6-pin connectors if it is the 635W config. The 425W config doesn't support powered GPUs, though. There can also be some clearance issues depending on the GPU design. It looks like they bring the same price as the 3620, though, so it might not be worth pursuing.
@jamesgordon3434
@jamesgordon3434 3 days ago
I would guess that, just as the LLM processes multiple questions all at once if you ask multiple things, the vision side is the same: it doesn't read left to right or right to left, but processes the entire sentence all at once. 29:14
@DigitalSpaceport
@DigitalSpaceport 3 days ago
Okay, but check this out. It says -80 at first, but that is what the screen looks like if read right-to-left. The "-" is a lowercase w for watts; it's 08 watts on the screen. I'm testing the big one today, so I will investigate further.
@DIYKolka
@DIYKolka 1 day ago
I don't understand what you all use these models for. Could someone explain to me what the benefit is?
@klr1523
@klr1523 4 days ago
18:02 I thought it might be referring to the F-connector and not registering the white Cat 6 cable at all. Maybe try again using a Cat 6 in a contrasting color...
@DigitalSpaceport
@DigitalSpaceport 4 days ago
Good point! I am also now convinced it is reading RTL and not LTR on LCD screens, which is weird.
@ToddWBucy-lf8yz
@ToddWBucy-lf8yz 3 days ago
30:07 If you have the RAM, you can always throw up a RAM disk and swap models out of CPU RAM and into VRAM much quicker than off a drive. A more advanced setup would use Memcached or Redis, but for something quick and dirty, a RAM disk all day.
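A minimal sketch of the RAM disk idea, assuming Ollama's default Linux model path and enough free system RAM (the mount point, the 48G size, and the path are placeholders, not the commenter's setup):

```shell
# Stage the Ollama model store on a tmpfs RAM disk, so loading a model into
# VRAM reads from RAM instead of disk. Root required; tmpfs contents vanish
# on reboot, so this only pays off for models you swap frequently.
sudo mkdir -p /mnt/ramdisk
sudo mount -t tmpfs -o size=48G tmpfs /mnt/ramdisk
sudo cp -r /usr/share/ollama/.ollama/models /mnt/ramdisk/models
# Then point Ollama at the copy, e.g. in a systemd drop-in for ollama.service:
#   Environment="OLLAMA_MODELS=/mnt/ramdisk/models"
```

Size the tmpfs to the models you actually rotate; tmpfs pages can be swapped out under memory pressure, which would defeat the purpose.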
@MitchelDirks
@MitchelDirks 3 days ago
Dude, genius! I didn't think about this. I personally have a server with 192GB or so and might use this method lol
@DigitalSpaceport
@DigitalSpaceport 1 day ago
Redis/Valkey sounds like a great option for this!
@ToddWBucy-lf8yz
@ToddWBucy-lf8yz 1 day ago
@@DigitalSpaceport Yup. I use a similar approach for swapping datasets in and out of VRAM during fine-tuning, and I have even put my whole RAG in VRAM via lsync (it works, but there is no way I would put it in production professionally), and that definitely helped speed things up quite a lot.
@xlr555usa
@xlr555usa 3 days ago
I have an old Dell i7-4760 that I could try pairing with a 3060 12GB. I have run Llama 3 on just an i5-13600K, and it was usable but a little slow.
@DigitalSpaceport
@DigitalSpaceport 1 day ago
Was it the new llama3.2-vision 11b? What tokens/s did you get?
@thanadeehong921
@thanadeehong921 3 days ago
I would love to see the same test at fp16 or fp32. Not sure if it would give more accurate responses.
@DigitalSpaceport
@DigitalSpaceport 1 day ago
I do plan to test the 90b-instruct-q8_0, which is 95GB (4x 3090s are gonna be close), and the 11b-instruct-fp16 is only 21GB, so I might give that a roll as well. I think the Meta Llama series of models caps out at fp16, or am I overlooking something?
@meisterblack9806
@meisterblack9806 3 days ago
Hi, will you try llamafile on a Threadripper CPU (not GPU)? They say it's really fast.
@NLPprompter
@NLPprompter 2 days ago
Could you please test this build with localGPT vision on GitHub? That repo has several vision models to test with. Seeing how each model performs on RAG with such a build might be really interesting, because this kind of RAG is really different: instead of image-to-text-to-vector, this system does image-to-vector. A different architecture.
@DigitalSpaceport
@DigitalSpaceport 1 day ago
I'm looking at this now, and I like the idea of fewer steps in RAG. Img2txt getting the boot would be awesome.
@NLPprompter
@NLPprompter 23 hours ago
@DigitalSpaceport Awesome, glad to know you are into the concept of "image to vector" instead of "image to text to vector". I believe that in the future, having a model that can handle both without losing speed on consumer hardware would be game-changing, since both architectures have their pros and cons. Thanks for your videos, mate.
@alcohonis
@alcohonis 4 days ago
Can you do an AMD test with a 7900 variant? I feel that's more affordable and realistic when it comes down to the dollars-to-VRAM ratio.
@DigitalSpaceport
@DigitalSpaceport 4 days ago
The number of requests I am getting for testing AMD GPUs has me strongly considering buying one used to see. I had a friend who was going to lend me one, but then they sold it. Possibly testing this out soon.
@alcohonis
@alcohonis 4 days ago
I completely understand. I would love to see an AMD build so that we don't have to offer our kidneys to the Nvidia gods.
@JoeVSvolcano
@JoeVSvolcano 4 days ago
LoL, now you're speaking my language! Until 48GB VRAM cards under $1000 become a thing, anyway 😀
@DigitalSpaceport
@DigitalSpaceport 4 days ago
Yeah, this 3060 is pretty sweet. I wish there was a cheap 12 or 16GB VRAM slot-powered card, but maybe in a few years. 20 t/s is totally passable, and the base model this is strapped to is pretty decent.
@mariozulmin
@mariozulmin 3 days ago
@@DigitalSpaceport Yes, and affordable too. It's sad there is no 16GB version for just a little more. The price gap between 12 and 24GB is just insane if the card is used only for AI.
@tungstentaco495
@tungstentaco495 4 days ago
I wonder how the Q8 version would do in these tests. *Should* be better.
@DigitalSpaceport
@DigitalSpaceport 4 days ago
I do plan on testing the q8 in the 90b, so we should get a decent hi-lo gradient. If the difference is significant, I will revisit for sure.
@alx8439
@alx8439 4 days ago
Next time, try asking a new question in a new chat. Ollama by default uses a context size of 2k; you are most probably exhausting it too quickly with pictures. And the GPU VRAM is too low to accommodate a higher context size without flash attention or smaller quants than the default 4-bit you downloaded.
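For anyone wanting to try raising the limit this comment describes, here is a hypothetical Modelfile that bumps Ollama's 2k default context (the tag and the 8192 value are placeholders; a larger context costs more VRAM, so a 12GB card limits how far you can push it):

```
# Hypothetical Modelfile: same weights, larger context window.
FROM llama3.2-vision
PARAMETER num_ctx 8192
```

Build and run it with `ollama create llama3.2-vision-8k -f Modelfile` followed by `ollama run llama3.2-vision-8k`, or set it for a single session in the REPL with `/set parameter num_ctx 8192`.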
@i34g5jj5ssx
@i34g5jj5ssx 3 days ago
I understand the appeal of the 3060, but why does everyone ignore the 4060 Ti 16GB?
@DigitalSpaceport
@DigitalSpaceport 3 days ago
I'm not ignoring it myself; at MSRP it's a rather good card. I just can't afford to buy one of everything, so that's why it's not reviewed here.
@milutinke
@milutinke 3 days ago
It's a shame Pixtral is not on Ollama; it's also a bigger model.
@DigitalSpaceport
@DigitalSpaceport 1 day ago
I agree, but I think there is a way to make it work with the new Ollama Hugging Face model support. You would need to kick that off manually, but I think it could work.
@FSK1138
@FSK1138 3 days ago
10th gen i5/i7 or 5th gen Ryzen 5/6: better price per watt.
@nhtdmr
@nhtdmr 4 days ago
Nobody should give their AI data or research to the big providers. Keep your data local.
@DigitalSpaceport
@DigitalSpaceport 4 days ago
Fully agree! The data they collect on us, on top of the paid-for services, is absolutely silly.
@NLPprompter
@NLPprompter 2 days ago
Fully local, yes, but be careful with APIs too; some models still send data.
@genkidama7385
@genkidama7385 3 days ago
These "vision" models are so bad and unreliable for anything. They need to be way more specialized and fed many more samples to be of any value. Spatial relationships are completely wrong, and blob classification/recognition is weak. I don't see any use for this beyond very, very basic tasks. I don't even know if any of this can be put into production, due to the unreliability.
@DigitalSpaceport
@DigitalSpaceport 3 days ago
I am about to start testing the big one here and hope for a lot of improvement. I just want to be able to read an LCD that's very clear, which seems like it should be a small hurdle.
@zxljmvvmmf3024
@zxljmvvmmf3024 4 days ago
Yeah, $350 + GPU lol. Stupid clickbait.
@DigitalSpaceport
@DigitalSpaceport 4 days ago
No. It is $350 with the 3060 12GB GPU, and I don't clickbait like that. I do clickbait, of course, but not with an outright lie like you are stating.
@andreas1989
@andreas1989 4 days ago
❤❤ Have a happy weekend, brother and followers!!!
@andreas1989
@andreas1989 4 days ago
❤❤❤ First comment. Love your videos, man. Love from Sweden.