AI Hardware Writeup digitalspaceport.com/homelab-ai-server-rig-tips-tricks-gotchas-and-takeaways
@enermaxstephens1051 1 month ago
There are things like GPT4All, which lets you run models on the CPU and system RAM instead of a GPU, for people who don't have a big bad video card.
@jasonabc 2 months ago
I did one test: uploaded a PDF and asked it to summarize the document. It spit out gibberish that wasn't even on the same topic as the paper. So obviously, how could I begin to trust anything from this if it fails something that simple?
@DigitalSpaceport 2 months ago
It does seem to be pretty bad, frankly.
@nathantruitt2740 1 month ago
Honestly, with arbitrary number arrays it took an interesting approach to an otherwise confusing question. 'A' is 65 in ASCII decimal format :)
@shuntera 2 months ago
What are you running in the PowerShell window in the top right that shows your GPU status?
@DigitalSpaceport 2 months ago
I'm actually SSH'd into the host computer (guide here: kzbin.info/www/bejne/ip6xhHehn6mHhdU) and running nvtop.
@marhensa 17 days ago
Why is there no Llama 3.2 8B?
@vfxvision723 1 month ago
Thanks for the video. What do you use to run AI models locally?
@DigitalSpaceport 1 month ago
The software I use? I'm not clear on what you're asking, sorry.
@Ikhsaints 1 month ago
@@DigitalSpaceport Maybe he means something like LM Studio.
@IlllIlllIlllIlll 2 months ago
16:00 Synthetic as in synthetic data?
@DigitalSpaceport 2 months ago
Synthetic benchmarks like MMLU.
@ugurinanc5177 2 months ago
How can you quantize that model? And can we fine-tune a model downloaded from Ollama?
@DigitalSpaceport 2 months ago
This is a good jumping-off point for fine-tuning, and it answers your questions. In short, you can do both: the -q flag in ollama can create the specific quant you want, and Hugging Face's fine-tune adapter framework with safetensors adapters exists for Llama 3.1, so I would assume it does for 3.2 as well. A good starting point: github.com/ollama/ollama/blob/main/docs/import.md
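For reference, the import doc linked above describes a Modelfile-based flow for bringing a fine-tuned adapter into Ollama. A minimal sketch might look like the following; both paths are placeholders, and the exact directives should be checked against the current Ollama docs:

```
# Hypothetical Modelfile: import a base model plus a fine-tuned LoRA adapter.
# Both paths below are placeholders, not real files.
FROM /path/to/base/safetensors
ADAPTER /path/to/lora/adapter
```

Then something like `ollama create my-model -f Modelfile` builds the model, and passing the -q flag mentioned above (e.g. `-q Q4_K_M`) requests a specific quantization; flag spelling may vary across Ollama versions.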
@omarnahdi3380 1 month ago
Hey, I don't get Markdown formatting of the code in the output. Can you explain how you got it?
@DigitalSpaceport 1 month ago
Llama 3.2 seems to have trouble producing valid Markdown; I've seen it over repeated attempts. Try Qwen 2.5 or another model. I'm not sure how to fix bad outputs otherwise.
@ElementX32 2 months ago
Do you think I can run this on the desktop I just bought, a Digital Storm Velox with one 4090 GPU, 128 GB of memory, and an AMD Ryzen 9 9950X (16-core, 5.7 GHz turbo, Zen 5)? I know multiple GPUs are preferable, but that's beyond my budget at the moment.
@BrennanMetzelaar 2 months ago
You should be able to run 3B easily. There's a website that shows recommended specs for Llama 3.2, and you surpass them easily (for 3B). Search for something like 'Llama 3.2 requirements'. I thought it was Meta's website, but it looks like an enthusiast site. I have a Ryzen 5800X and an RTX 3080, a weaker setup than yours, and it runs just fine.
@DigitalSpaceport 2 months ago
Yes, you can run this on Windows. Windows does hurt performance a bit, but you should easily be able to run a 22B or smaller model on that card at a decent quant level. I wouldn't go below 4-bit myself.
@hookflash699 2 months ago
Nice vid! What UI are you using?
@jakohema8429 2 months ago
@@hookflash699 Open WebUI
@thepulpfloyd 2 months ago
He's using Open WebUI 👍🏻
@DavidVincentSSM 2 months ago
I agree that the results don't seem to match the benchmarks in real-world performance. Maybe it's something everyone else is missing?
@DigitalSpaceport 2 months ago
@@DavidVincentSSM I'm not trying to LARP as a pro or anything, but I am interested in what makes for a good product, and I'm increasingly convinced that benchmarks don't make one.
@zurgmuckerberg 2 months ago
Will you test the 11B one?
@DigitalSpaceport 2 months ago
Okay
@junkzencka 2 months ago
The model's answer: "Here's how some letters relate to numbers using ASCII: M = 77, S = 83, Z = 90." The answer is correct when using the ASCII table, and it even tells you what it's doing! Look up the ASCII table.
@fork407 1 month ago
No, because it was given A=0 in the context.
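For reference, the ASCII reading the model gave can be checked in a couple of lines of Python (my illustration, not from the video):

```python
# Uppercase ASCII codes start at A = 65, so M, S, Z land at 77, 83, 90.
for ch in "MSZ":
    print(ch, "=", ord(ch))
# M = 77, S = 83, Z = 90
```

Both commenters are right about the arithmetic; the disagreement is only over which mapping the prompt asked for.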
@michaelwpannekoek 2 months ago
What *top program are you running under WSL2?
@DigitalSpaceport 2 months ago
htop and nvtop, though I'm not sure nvtop runs under WSL2.
@DeepThinker193 2 months ago
This is the result of AI inbreeding, a.k.a. training on synthetic data. I have a prompt that gets counting and the like consistently accurate on Llama 3.1 8B. On the 3.2 models, however, they get things wrong all the time.
@DigitalSpaceport 2 months ago
Oh, that's a great term! AI inbreeding 😅
@mayankmaurya8631 2 months ago
Are you sure it wasn't because of 3B vs 8B?
@DeepThinker193 2 months ago
@@mayankmaurya8631 Nope, I tried my prompts on the 3.2 11B and 3.2 90B as well. They're just inferior and keep getting things wrong. I get consistently correct responses from Llama 3.1 8B using my special prompts.
@bbertram2 2 months ago
It got it right for me... (or so I thought, lol.) I asked: "If A is equal to number 0, what is the number of M, S and z?" llama3.2:latest answered: "Based on the standard ordering of the alphabet, where A is indeed equal to 0: M = 13, S = 19, Z = 26. Let me know if you have any further questions!"
@DigitalSpaceport 2 months ago
If A=0, then B=1 ... M=12 ... S=18 ... z=25.
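The A=0 mapping in the reply above is just each letter's zero-based position in the alphabet, which is easy to sanity-check in Python (my sketch, not from the video):

```python
def letter_index(ch: str) -> int:
    """Zero-based alphabet position: A=0, B=1, ..., Z=25 (case-insensitive)."""
    return ord(ch.upper()) - ord("A")

for ch in "MSz":
    print(ch, "=", letter_index(ch))
# M = 12, S = 18, z = 25
```

The model's answer of 13/19/26 is the one-based position, i.e. it silently treated A as 1 despite being told A=0.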
@bbertram2 2 months ago
@@DigitalSpaceport Hahahaha... yeah. Oh well.
@DigitalSpaceport 2 months ago
I was surprised that a few other models did get this right, and I've noticed it tracks closely with the letter-counting question too.
@bbertram2 2 months ago
I went back and asked it again; same answer. Tried 3.1, which couldn't get it either. But when I gave it one clue, A=0 and B=1, boom, it got it. Probably too easy a clue, but I'm surprised it couldn't answer without it. I asked Claude and ChatGPT, and they couldn't get it either. Very odd. Good question!
@DigitalSpaceport 2 months ago
Qwen 2.5 gets it here: kzbin.info/www/bejne/moDVmHykZ9-nrLssi=mb33EAbMjXk55YC3&t=555
@_s.i.s.u. 2 months ago
You're prompting the model wrong. The "strawberry" tests fail because of the tokenization method of the given model. Prompt it as if you wanted to place the sentence into an array, then ask for the second letter of the third word. It won't fail.
@DigitalSpaceport 2 months ago
@@_s.i.s.u. Other models I've tested, including Qwen 2.5, can and do nail that exact question, copy-pasted. If a question has to be asked in a specific way to elicit a correct response, that's a failure.
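The array-style workaround described above (split the sentence first, then index) amounts to the following; the example sentence is made up for illustration:

```python
# Split the sentence into a word array first, then index into it,
# instead of asking the model to inspect characters across tokens.
sentence = "the quick brown fox jumps"  # made-up example
words = sentence.split()
print(words[2])      # third word -> "brown"
print(words[2][1])   # its second letter -> "r"

# The classic letter-counting test the thread alludes to:
print("strawberry".count("r"))  # -> 3
```

The idea is that explicit decomposition sidesteps tokenization, which groups characters into multi-character tokens the model can't easily count across.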
@mcunumberone613 2 months ago
Is there any possibility of earning money with this model?