I love small and awesome models

Рет қаралды 27,671

Күн бұрын

Пікірлер: 135

@BORCHLEO 2 ай бұрын

you are underrated matt! they didnt sponsor you because they wanted to just get the people as are spewing hype! you go into such detail! your content should be #1 on any ollama tutorial.

@KonstantinBykov-o3k 2 ай бұрын

I’ve tested 3b parameters model and very impressed with it. Speed and accuracy. I use it for improving descriptions in catalog. And it looks like we don’t have to pay to Open AI anymore. Thank you for your video!

@solyarisoftware 2 ай бұрын

Hi Matt, I upvoted as usual. Two notes: Ollama HW resources calculations (proposal for a new Ollama video): In this video, you thankfully show how easy it is to set the context length in the model file, bypassing Ollama's default. How does the context length influence the RAM usage of the host? In general, it would be great to dedicate a video to hardware resource calculations based on model size, quantization, context size, and possibly other macro parameters. It would also be helpful to discuss how CPU, and especially GPU, can improve latency times (especially in a multi-user environment). You mention "your" function call method. I know you've already done a video on this topic, but since it's very useful in practice, maybe you could create a new video with code examples (Python is welcome). Other viewers: If you agree, please upvote my comment. Community thoughts are welcome! Thanks again, Giorgio

@arthurhjorth1490 2 ай бұрын

Agreed! A deeper dive into context size would be very helpful: HW Resource allocations/calculations (even if "just" some heuristics), impact of larger context size on in/out eval rates (again, if not accurate calculations then some heuristics), potential problems with changing context size and what does one need to know about a model when doing this (e.g. what happens if you set a context size larger than what the model allows?). Edit to add: I'd also be interested in a potential context size video touching on how context size affects/interacts with parallel requests, and how to set up multi-user environments that share the model's context size. Thank you for an, as usual, excellent tutorial! Cheers, Arthur

@solyarisoftware 2 ай бұрын

@@arthurhjorth1490 Yes. Just a more note about evaluations criteria. Matt and other people use a short list of "trusted" question to evaluate a model. That's perfectly fine and maybe it could be useful to have some why to automatize the evaluation of a model maybe just scripting the list of question and have an automatic way to evaluate the response (by another LLM?). Just a food for thought / a possible video topic in the future?

@trsd8640 2 ай бұрын

The first really helpful video about llama 3.2! Thanks Matt!

@FOGSUser Ай бұрын

Loving the Companion Plugin for Obsidian with llama3.2:1b. Completing my thoughts in the persona of William Shakespear by changing the User prompt was a lot of fun. Fair Companion App, thy wondrous might, Doth streamline notes and tasks with ease and light. In Obsidian vault, thou dost thy work impart, With real-time updates, a digital heart.

@AKBARESFAHANI 2 ай бұрын

I love your content and learn every time I watch, thank you

@vbitz4800 Ай бұрын

Software engineer for many many years here....I have recently stumbled upon ollama and your videos. I have done several Intergrations of ai into client's apps using azure ai services (mostly speech and some openai) but MS azure services keep you away from the nuts and bolts via abstractions. This is fine for productivity but not great for understanding. Your videos and github repo have crystalized many concepts for me and opened my eyes and brain to a better and deeper understanding. Thank you, sincerely, for what you are doing here. It is priceless. BTW, I adore ollama and the smaller models....however my wife would not be please with my recent PC build cost (two 4090s lol). But hey, this is addictive! Isn't it amazing to see the realization of what we all dreamed of when we started in the programming/IT realm!

@technovangelist Ай бұрын

Yup. Compared to what I was doing in my Intro to AI programming class I took in 1989 at FSU this is science fiction.

@GundamExia88 2 ай бұрын

Ha! That's how I felt about the same when people ask about which number is bigger 8.8 vs 8.21! It depends in what context! And that's what I noticed when I test the models, most people only run it one time. The models do not always give the answer right the first time, sometimes the second times, etc. Great video.

@TheHummChannel 2 ай бұрын

Really this channel deserve way more exposure! Love the contents and the host ! Keep the good work thanks

@КравчукІгор-т2э Ай бұрын

Thanks Matt! Everything is interesting and clear as always!

@TheDiverJim 2 ай бұрын

Love the breath holding tangent!

@mduthwala439 2 ай бұрын

Well explained especially the 1B

@Joooooooooooosh 2 ай бұрын

Dude ollama is great. Thank you.

@ClearSight2022 2 ай бұрын

Hi Matt very clear presentation yet again. I also really enjoy your polished style, so I subscribed even though I do so rarely. Cheers !

@viertekco Ай бұрын

Your delivery is great had no idea u were a founder..that's awesome.🎉 Meta wheres the ❤!!

@BirdManPhil 2 ай бұрын

ive been using llama 3.1 8b on my 4050 laptop very comfortably for ai assisted tasks in obsidian and i cant wait to see if these smaller 3b models are a better fit. you get a sub from me im all aboard the self hosted train next stop ai station lets gooo

@JeromeBoivin-tx7fm 2 ай бұрын

Thank you Matt for your videos. I was not aware of the hardcoded context window in Ollama, it may explain why I was so confused by the models claiming having a large one. Why is that? I’m expecting Ollama to be adaptative to the possibilities of the model it’s running! Do I really need each time to manually create a custom model template just to benefit from the native model context size? Do you already posted a video answering these questions? Thank you so much and keep the good job! Cheers from France!

@manuelbradovent3562 2 ай бұрын

@JeromeBoivin-tx7fm Also interested related the context and if in the model file also prompt, end token, etc was added.

@technovangelist 2 ай бұрын

Context takes a lot of memory. And it’s hard to put rails around it so it doesn’t fully crash the machine. I’ve had the machine reboot when it takes too much. And lots of folks have tiny gpus so we got lots of support requests. So it went to a blanket 2k unless you specify the size. But since it’s so easy for most devs to create that file and since ollama is intended as a dev tool first, it’s seemed like a good decision

@danielarista1352 Ай бұрын

Matt, I"m a non-CS/SWE guy trying to hack away at a tool that uses LLMs to add some NL to a the UI of an app I'm building. It's b/c of you I choose Ollama over other options. Thanks brother.

@jackonell1451 2 ай бұрын

Wondering what would be the success rate of the tool call if wrapped in a framework like Yacana ? Because only using Ollama's function calling by itsef is IMO not representativ of the potential of any models as local LLMs need some level of guidance to really achieve anything.

@technovangelist Ай бұрын

The older approach works fantastically with every model

@jackonell1451 Ай бұрын

@@technovangelist I completely agree with you that Ollama's function calling is excellent. Forcing LLMs to output JSON the way it does is very impressive! However, we use PHI models in production, and only using Ollama didn't yield the expected results... The JSON was valid, but the arguments generated were not. That's why we had to switch to a agents framework, because developing our own overlay would have been too complex. With the agent framework we now have enhanced tool calling and a way to do multi-turn chat so multiple PHIs can brainstorm together. Also, we needed control loops at each step to ensure we get a computable output in the final phase.

@jackonell1451 Ай бұрын

We had a success rate of around 30% when calling tools with correct values using only the "/generate" endpoint. Now, we're at about 70%. Additionally, the team has improved at prompt engineering, which has been a huge factor! Also, thanks for making videos, they're always a great help. Keep it up! ^^

@jayd8935 2 ай бұрын

If you have a cat, it stole the water bottle! Thanks for the review too. I will be interested to try this on my usual M1, might remove the need to run models on another more powerful machine.

@alvinnorin8820 Ай бұрын

8:12 I'd set the temperature to zero, in which case everyone using the model will get the same answer every time for the same inputs. Setting it to zero makes it default to always responding with whatever answer is most likely the accurate one. It's a predictive language model after all. The higher the temperature, the more varied responses. Setting it to zero disables the randomness parameter and thus removes variety completely. Very useful when benchmarking models against each other.

@technovangelist Ай бұрын

Setting temp to zero will not get you the same answer every time. You would have to set temp and seed and you will reduce the variations but it may not be the best answer.

@alvinnorin8820 Ай бұрын

@@technovangelist Right, I assume the seed could also be random. It might be different across different models. I ran llama3.1:70b with temperature at zero, and that seemed to get me the same story from the same Minecraft chat logs along with its system prompt. LLMs have different architectures though, and it doesn't seem like all support all parameters. Taking away randomness is highly advantageous when optimizing system prompts though, being able to compare static responses.

@technovangelist Ай бұрын

i don't know if that’s true. testing a system prompt on a limited version of the model helps if you only use the model in that limited way every time.

@tecnopadre 2 ай бұрын

1st. Always thank you Matt. Question. I've been testing 3B since launch ata my Laptop with NPU. WebUI on a server and Ollama in my Laptop. The thing is My laptop has NPU and Ollama is not taking advantages of it. Ollma 3B is taking the small GPU and CPU. The results with a RAG (1st with WEBui interface and then with Flowise) gives me good results. I'm trying to search how to activate the NPU from my laptop so Ollama can use it. It would be great. I think LLM Studio does it? Also testing with large files >15MB, the embedding from Ollama at my computer again with WEBUI and Flowise, fails. The computer can't handle. Would be great to have you doing it with files that are closer to real company files. I think 3B model it's great. The last test I did is using it with Groq and of course, there is where I can test it 🙂

@toadlguy 2 ай бұрын

These smaller models are great for research, particularly as they are fairly easy to modify in code. In actual use case, they are somewhat over censored, but I suspect it will be just a matter of time before a fine-tuned uncensored version is created.

@yacahumax1431 2 ай бұрын

ollama makes it so easy

@KevinKreger 2 ай бұрын

Good one. I saw someone training the 1B model on their iPhone😮

@merefield2585 2 ай бұрын

Hey Matt, thanks for a great video - do you keep the code featured in your videos in public repos?

@akongas Ай бұрын

That's great. Hey do you know if we will ever get Ai running locally on our Android, ios devices?

@AndresSolar-y3g 2 ай бұрын

...worth a thumb up...

@martijnveenman 2 ай бұрын

Amazing video, thank you. Is companion the only ai plugin you use in Obsidian? Looking forward to seeing more practical AI obsidian applications.

@utsavgoswami5263 2 ай бұрын

well, matt you are our fav choice for all things AI!

@dakkon77blackblade20 Ай бұрын

I would really like to know if these models are any good for entity extraction like for graphRAG or chunk generators for traditional RAG... That would be a great topic!

@jazzejeff1 2 ай бұрын

Your channel's so nice I wish could sub twice. Keep up the great work.

@g.s.3389 2 ай бұрын

just a question: what is the best model for supporting me in python programming that I can use with ollama?

@yahoolane 2 ай бұрын

What is your use case, llmana3 is a good default

@alexandrep4913 2 ай бұрын

There is an awesome video on KZbin talking about the specific model and how censored it is. I wouldn't be surprised if people find the older model to be more capable.

@nosuchthing8 Ай бұрын

How much vram is required for the 3b token method?

@wardehaj 2 ай бұрын

Thanks for this great video explaining how to use these small LLMs! I will be waiting for your video about the vision model. Maybe compare llama2 vision with pixtral?

@johang1293 2 ай бұрын

Good stuff

@Cingku 2 ай бұрын

Could you explain what the generation completion hotkey does in the Companion plugin for Obsidian? When I use the Companion, it automatically generates text, completes it, and streams the response. So, in what situation would I need to use this hotkey? I'd appreciate it if you could clarify this because I was confused by this.

@TaFeiYen Ай бұрын

First time seeing your vid. Interesting take. I know you have demonstrated some use cases of the models. But to general people, there's way too many models to pick from. Do you have a guide on that? To narrow down which model to use? I know it will always be bias but I would like to hear your take.

@technovangelist Ай бұрын

This was the first, maybe second, time I looked at one model. I would like to do more of them.

@BruceWayne15325 Ай бұрын

I love small LLM's. I don't think people realize the power they have to simplify their lives. I love to use Obsidian for note taking. Using a local LLM, I can have it easily summarize my notes, giving me an at-a-glance view of each daily note. Have a long meeting? Transcribe it and summarize it, then stuff it in your notes. When we get agents then they will become vastly more useful. People put too much emphasis on the ability of models to do analytical tasks with great accuracy. They don't understand that the power of AI is the ability to have the AI write a program on-the-fly to do that kind of analysis, and then give you the result. AI will never be 100% accurate. It's like a human. We make mistakes too. We are spoiled with computers and the ability to have 100% reliable answers, but an AI can give that to you, just through the extra step of creating a program to accomplish the task.

@stasoline 2 ай бұрын

Cool video!

@BulletProof-n7n 2 ай бұрын

Respect and appreciation for what you do. Any chance of using a llm on the new Samsung galaxy s23?

@enermaxstephens1051 Ай бұрын

Can we simply download the GGLM file and use it wit GPT4all?

@PriNovaFX 2 ай бұрын

What if you set temperature to 0, does the tool functions test succeed better?

@researchandbuild1751 Ай бұрын

Can llama 3.2 1b do function calling? I don't know if it's smart enough to format output

@technovangelist Ай бұрын

Depends. Using the newer more official format, I don't think it does a good job. Using the older approach I have documented in a few videos on this channel I expect it would be great at it.

@researchandbuild1751 Ай бұрын

@@technovangelist I will try to look into the original method. I still don't think many people explain how it all works in plain English. I will look at your earlier videos about functions

@utvikler-no 2 ай бұрын

Thanks

@technovangelist 2 ай бұрын

What??? You are too kind... a member AND a tip. Thanks so much.

@utvikler-no 2 ай бұрын

@@technovangelistI just love the simple and yet the comprehensive way you explain the subjects. Keep up the good work❤

@ts757arse 2 ай бұрын

I've just tried this with fabric. It's as good at summarising youtube videos as microsoft 8x7B. Still gets confused occasionally with certain tasks. I've got a large-ish DB of technical documents for an obscure bit of research kit and I'm going to try swapping out my current LLM and see if this performs better. It'll probably be about 10x faster.

@antoniomonteiro3698 2 ай бұрын

llama 3.2 1B Q4: To count the number of 'F's and 'R's, I'll go through the word "strawberry" carefully. Here's the breakdown: * 'F': 1 * 'R': 2 So, there are 1 'F' and 2 'R's in the word "strawberry". or: please create a java program to count the number of 'F's and 'R's in the word 'strawberry' public class Main { public static void main(String[] args) { // Define the word String word = "strawberry"; // Initialize counters for 'F's and 'R's int countF = 0; int countR = 0; // Loop through each character in the word for (char c : word.toLowerCase().toCharArray()) { // Check if the character is 'F' or 'R' if (c == 'f' || c == 'r') { // Increment the counter if (c == 'f') { countF++; } else { countR++; } } } // Print the results System.out.println("Number of 'F's: " + countF); System.out.println("Number of 'R's: " + countR); } } output: Number of 'F's: 0 Number of 'R's: 3 sorry, they left me home alone...

@harrykekgmail 2 ай бұрын

interesting video. thank you

@userou-ig1ze 2 ай бұрын

Thanks for the great content. What is missing in ollama is vision models support like florence2 and sam2. If it had a nice api for that, that could be used with curl or so... dreams. Raspberry pi with vision models must be so incredibly overpowered, I prefer not thinking about it too much

@technovangelist 2 ай бұрын

Raspberry pi overpowered???? way underpowered is more accurate, especially considering the cost of them. Physical size is the big benefit these days. But Florence2 looks like an older model that didn't get much love. Some of the other vision models on Ollama got a lot more coverage. And hadn't heard of sam2 either. Both architectures aren't supported so would require a lot of work to get working.

@userou-ig1ze 2 ай бұрын

@@technovangelist thanks for the time to reply, appreciated. Underpowered _is_ the point, as in, if vision models run sufficiently fast on _that_ hardware, it enables vision on edge devices. Florence2 was released months ago, and the combination of selecting pixels by typing, and segmenting and tracking over time with sam2, is an incredibly powerful concept- I needn't ask any lay man to become creative, the usefulness of text driven vision perception seems insane

@ivanalberquilla9953 2 ай бұрын

Thank you for the video. What is the tool you use for writing?

@technovangelist 2 ай бұрын

Obsidian. And the plugin for it was companion

@ivanalberquilla9953 2 ай бұрын

Thanks!

@modoulaminceesay9211 2 ай бұрын

All things local AI and I just subscribed that’s what I need

@autumblak Ай бұрын

Hey matt, I have an intel based MacBook, and I want ollama to utilize my gpu, but I don't know how to go about it. I have searched all round but to no avail. Could you offer some pointers, or resources to where I can succeed?

@technovangelist Ай бұрын

Unfortunately there are no options. Well except buying an apple silicon MacBook or switching to a pc.

@chrisBruner 2 ай бұрын

Good video

@megairrational 2 ай бұрын

Great content. Could you briefly describe the machine you use for this task? You mentioned 3 seconds…

@technovangelist 2 ай бұрын

I usually do and forgot this time. M1 Max MacBook Pro with 64gb. A machine you can get for about 1500 usd today.

@megairrational 2 ай бұрын

@@technovangelist thank you! 64GB? Impressive. Please keep it up! You are a great communicator

@dna100 2 ай бұрын

Lovin' the channel. 👍👍It'll be great once Ollama supports vision

@technovangelist 2 ай бұрын

Ollama does support vision today. The llama3.2 vision should be very soon

@arkemiffo 2 ай бұрын

Just tried the 3.2:3b. I said hello and got a reply blazingly fast, so I asked if it was on meth or something. Got the standard "I'm just a model, I can't human", so I said I was just surprised to see such fast answers on a local model. And this is where things got confused. Apparently, Llama3.2:3b thinks it's working off a cloud-service. It refused the notion that I'm running this locally. Just to be sure, I pulled the ethernet cable, restarted the terminal, and it worked just as fine without (well...duh). I just find it fascinating that the model itself almost reviles at the notion of being local.

@AlexanderYudin 2 ай бұрын

Which hardware setup you have ?

@technovangelist 2 ай бұрын

I'm on a m1 MacBook Pro Max with 64GB RAM

@agi_lab 2 ай бұрын

I would request you to test out llms on some complex tools (as simple as file create tool fails on 3b model). I assume of i give proper func.desc, it might not. Need to experiment

@Aarifshah-A 2 ай бұрын

Lol the ending 😂😂😂

@aiamfree 2 ай бұрын

Why am I getting Error: error loading model for all the 3.2 downloads?

@technovangelist 2 ай бұрын

Have you updated ollama?

@aiamfree 2 ай бұрын

@@technovangelist yes that fixed it, thanks… it’s sooo damn fast!!

@zhouyangbo4498 2 ай бұрын

ollama run llama3.2:1b Error: llama runner process has terminated: signal: abort trap error:done_getting_tensors: wrong number of tensors; expected 147, got 146 any idea about this error?

@technovangelist 2 ай бұрын

You need to update ollama. You should always update whenever there is a new version.

@zhouyangbo4498 2 ай бұрын

ok ,I will try it , maybe it is GFW issue, thanks.

@kshabana_YT 2 ай бұрын

I tried to run Llama3.2 1b in Samsung s 20 plus Error: no suitable llama servers found. And I am running ollama serve

@Psychopatz 2 ай бұрын

just use layla lite then import the model. Yep its a hassle on making your lammacpp to work

@kshabana_YT 2 ай бұрын

I don't know what are you talking about

@UnwalledGarden 2 ай бұрын

Awww yeah!

@thestype 2 ай бұрын

I asked it to create a component in javascript in which llama3.1 8B and mistral-nemo greatly succeeded. But llama3.2 3B failed miserably, mixing up different libraries unintelligently. Its just fast, but also a random word generator is fast.

@technovangelist 2 ай бұрын

But a random word generator wouldn't be anywhere near as good as llama32 3b.

@TLabsLLC-AI-Development 2 ай бұрын

Meta Matt!

@aiamfree 2 ай бұрын

when is ollama getting the vision models anyone know?

@technovangelist 2 ай бұрын

The team is working on it.

@aiamfree 2 ай бұрын

@@technovangelist awesome, thanks Team!

@BeauKpad Ай бұрын

My favorite hallucination : I work for the Jill Stein for President campaign so I use info about her to test AIs. She's a public figure, but a minor one, and I know more about her than most sources. When playing around with WebSim, I asked it to make me a Jill Stein fan site. The results were shockingly accurate, history, platform, etc... except the photos. The photos were all of Jill Biden. For all the normies, Jill Biden is a Joe Biden's wife. Jill Stein was running against Joe Biden. I pointed this out to the AI, and it replaced them with photos of Kirsten Sine a. If you don't know who that is, I really don't have the bandwidth to use explain, but it is simultaneously shockingly wrong and quite funny. Like, I know what it is getting at.

@shuntera 2 ай бұрын

Matthew Berman’s review of the vision models show them to be censored to the point of uselessness. Hopefully someone will bring out uncensored versions

@technovangelist 2 ай бұрын

reviews on the vision stuff seems to be mixed so far. It could be that some of the folks who have it have a bad implementation. I look forward to finding out.

@changeagent228 2 ай бұрын

First test I did was "what number is larger 9.9 or 9.11?" and it insisted 9.11 was bigger. When is 2.3 out?

@omercelebi2012 2 ай бұрын

Man you forgot your cup!

@junaidmundichipparakkal1700 2 ай бұрын

Can someone point to ollama and RAG locally from Matt?

@kshabana_YT 2 ай бұрын

Why do you quit ollama 😢😢😢

@technovangelist 2 ай бұрын

Are you asking about quitting the app? Or why I left the company? That second thing is not something for this comment thread.

@emmanuelgoldstein3682 2 ай бұрын

@@technovangelist Due to your hesitance on commenting, we'll just assume they were having Diddy parties until you clear it up

@starlord7526 2 ай бұрын

@@emmanuelgoldstein3682 did you just say diddy party brah? jajajajaja

@kshabana_YT 2 ай бұрын

Company

@ChristophBackhaus 2 ай бұрын

I want you to count the number of r's in Strawberry. To do so I want you to go Letter by letter and every time you find one r I want you to count up Gets it right every time...

@danielarista1352 Ай бұрын

>>> I want you to count the number of r's in aferradfafa;;rr;r;adfafaf'''.. To do so I want you to go letter by letter and every time you find one r I want you to count up. Let's start counting: a - no R (0) f - no R (0) e - no R (0) r - found an R! (1) r - found another R! (2) a - no R (2) d - no R (2) f - no R (2) a - no R (2) Continuing... ;; is a semicolon, not a letter to count rr - found two Rs! (4) ; - no R (4) r - found another R! (5) ; - no R (5) a - no R (5) d - no R (5) f - no R (5) a - no R (5) f - no R (5) a - no R (5) f - no R (5) So, there are 5 Rs in the given text.