You are underrated, Matt! They didn't sponsor you because they just wanted the people who are spewing hype! You go into such detail! Your content should be #1 for any Ollama tutorial.
@KonstantinBykov-o3k 2 months ago
I've tested the 3B-parameter model and am very impressed with its speed and accuracy. I use it for improving descriptions in a catalog. And it looks like we don't have to pay OpenAI anymore. Thank you for your video!
@solyarisoftware 2 months ago
Hi Matt, I upvoted as usual. Two notes: (1) Ollama HW resource calculations (proposal for a new Ollama video): In this video, you thankfully show how easy it is to set the context length in the model file, bypassing Ollama's default. How does the context length influence the RAM usage of the host? In general, it would be great to dedicate a video to hardware resource calculations based on model size, quantization, context size, and possibly other macro parameters. It would also be helpful to discuss how the CPU, and especially the GPU, can improve latency (especially in a multi-user environment). (2) You mention "your" function call method. I know you've already done a video on this topic, but since it's very useful in practice, maybe you could create a new video with code examples (Python is welcome). Other viewers: If you agree, please upvote my comment. Community thoughts are welcome! Thanks again, Giorgio
@arthurhjorth1490 2 months ago
Agreed! A deeper dive into context size would be very helpful: HW resource allocations/calculations (even if "just" some heuristics), the impact of larger context size on in/out eval rates (again, if not accurate calculations then some heuristics), potential problems with changing context size, and what one needs to know about a model when doing this (e.g. what happens if you set a context size larger than what the model allows?). Edit to add: I'd also be interested in a potential context size video touching on how context size affects/interacts with parallel requests, and how to set up multi-user environments that share the model's context size. Thank you for an, as usual, excellent tutorial! Cheers, Arthur
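For the "even just some heuristics" request above, here is a rough back-of-the-envelope sketch in Python. It assumes a standard transformer KV cache (one key and one value vector per layer, per KV head, per token) stored at 2 bytes per value. The model shape is a hypothetical example picked for illustration, not a figure from the video, and the estimate ignores model weights, compute buffers, and any cache quantization, so real memory use in Ollama will differ.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_value=2):
    """Rough KV-cache size: one key and one value vector per layer and per token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value


# Hypothetical model shape, chosen only for illustration.
SHAPE = dict(n_layers=28, n_kv_heads=8, head_dim=128)

for ctx in (2048, 8192, 32768):
    mib = kv_cache_bytes(context_len=ctx, **SHAPE) / 2**20
    print(f"context {ctx:>6}: ~{mib:,.0f} MiB of KV cache")
```

The useful takeaway is the shape of the relationship: under these assumptions the cache grows linearly with context length, so going from 2k to 32k tokens multiplies that part of the memory budget by 16.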
@solyarisoftware 2 months ago
@@arthurhjorth1490 Yes. Just one more note about evaluation criteria. Matt and other people use a short list of "trusted" questions to evaluate a model. That's perfectly fine, but it could be useful to have some way to automate the evaluation of a model, maybe just by scripting the list of questions and having an automatic way to evaluate the responses (by another LLM?). Just food for thought / a possible video topic in the future?
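The scripted-evaluation idea above could look something like this minimal Python sketch. Everything in it is an assumption for illustration: `fake_model` is a hypothetical stand-in for a real model call, and the checks are simple string tests rather than a second LLM acting as judge. Each question is asked several times because a single run says little about how reliable a model is.

```python
from typing import Callable


def pass_rate(ask: Callable[[str], str], question: str,
              check: Callable[[str], bool], runs: int = 5) -> float:
    """Ask the same question several times; return the fraction of passing answers."""
    passed = sum(1 for _ in range(runs) if check(ask(question)))
    return passed / runs


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for a real model call; it always answers the same way.
    return "There are 3 r's in strawberry."


# Each entry pairs a "trusted" question with a function that grades the answer.
QUESTIONS = [
    ("How many r's are in 'strawberry'?", lambda answer: "3" in answer),
    ("Which is larger, 9.9 or 9.11?", lambda answer: "9.9" in answer),
]

for question, check in QUESTIONS:
    print(f"{pass_rate(fake_model, question, check):.0%}  {question}")
```

Swapping `fake_model` for a function that calls a real model, and a `check` for a call to a judging model, would keep the same structure.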
@trsd8640 2 months ago
The first really helpful video about llama 3.2! Thanks Matt!
@FOGSUser 1 month ago
Loving the Companion Plugin for Obsidian with llama3.2:1b. Completing my thoughts in the persona of William Shakespeare by changing the User prompt was a lot of fun. Fair Companion App, thy wondrous might, Doth streamline notes and tasks with ease and light. In Obsidian vault, thou dost thy work impart, With real-time updates, a digital heart.
@AKBARESFAHANI 2 months ago
I love your content and learn every time I watch, thank you
@vbitz4800 1 month ago
Software engineer for many, many years here... I recently stumbled upon Ollama and your videos. I have done several integrations of AI into clients' apps using Azure AI services (mostly speech and some OpenAI), but MS Azure services keep you away from the nuts and bolts via abstractions. This is fine for productivity but not great for understanding. Your videos and GitHub repo have crystallized many concepts for me and opened my eyes and brain to a better and deeper understanding. Thank you, sincerely, for what you are doing here. It is priceless. BTW, I adore Ollama and the smaller models... however, my wife would not be pleased with my recent PC build cost (two 4090s lol). But hey, this is addictive! Isn't it amazing to see the realization of what we all dreamed of when we started in the programming/IT realm!
@technovangelist 1 month ago
Yup. Compared to what I was doing in my Intro to AI programming class I took in 1989 at FSU this is science fiction.
@GundamExia88 2 months ago
Ha! That's about how I felt when people ask which number is bigger, 8.8 or 8.21! It depends on the context! And that's what I noticed when I test the models: most people only run a test one time. The models do not always give the right answer the first time; sometimes it's the second time, etc. Great video.
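To make the "it depends on the context" point concrete, here is a small Python sketch. The two helper names are made up for this example; they compare the same pair of strings once as decimal numbers and once as dotted version numbers.

```python
from decimal import Decimal


def larger_as_decimal(a: str, b: str) -> str:
    """Compare the two strings as ordinary decimal numbers."""
    return a if Decimal(a) > Decimal(b) else b


def larger_as_version(a: str, b: str) -> str:
    """Compare the two strings as dotted version numbers (major.minor)."""
    def key(s: str) -> tuple:
        return tuple(int(part) for part in s.split("."))
    return a if key(a) > key(b) else b


print(larger_as_decimal("8.8", "8.21"))  # 8.8  (8.80 > 8.21)
print(larger_as_version("8.8", "8.21"))  # 8.21 (minor version 21 > 8)
```

Both answers are defensible, which is why a prompt that does not say which reading is meant can be graded either way.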
@TheHummChannel 2 months ago
Really, this channel deserves way more exposure! Love the content and the host! Keep up the good work, thanks.
@КравчукІгор-т2э 1 month ago
Thanks Matt! Everything is interesting and clear as always!
@TheDiverJim 2 months ago
Love the breath holding tangent!
@mduthwala439 2 months ago
Well explained, especially the 1B.
@Joooooooooooosh 2 months ago
Dude ollama is great. Thank you.
@ClearSight2022 2 months ago
Hi Matt, very clear presentation yet again. I also really enjoy your polished style, so I subscribed even though I do so rarely. Cheers!
@viertekco 1 month ago
Your delivery is great. I had no idea you were a founder... that's awesome. 🎉 Meta, where's the ❤!!
@BirdManPhil 2 months ago
I've been using llama 3.1 8b on my 4050 laptop very comfortably for AI-assisted tasks in Obsidian, and I can't wait to see if these smaller 3b models are a better fit. You get a sub from me. I'm all aboard the self-hosted train, next stop AI station, let's gooo.
@JeromeBoivin-tx7fm 2 months ago
Thank you, Matt, for your videos. I was not aware of the hardcoded context window in Ollama; it may explain why I was so confused by models claiming to have a large one. Why is that? I'd expect Ollama to adapt to the capabilities of the model it's running! Do I really need to manually create a custom model file each time just to benefit from the native model context size? Have you already posted a video answering these questions? Thank you so much, and keep up the good work! Cheers from France!
@manuelbradovent3562 2 months ago
@JeromeBoivin-tx7fm I'm also interested in the context size, and in whether the prompt template, end token, etc. were also added to the model file.
@technovangelist 2 months ago
Context takes a lot of memory. And it's hard to put rails around it so it doesn't fully crash the machine. I've had the machine reboot when it takes too much. And lots of folks have tiny GPUs, so we got lots of support requests. So it went to a blanket 2k unless you specify the size. But since it's so easy for most devs to create that file, and since Ollama is intended as a dev tool first, it seemed like a good decision.
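For readers who would rather not create a model file each time: besides a `PARAMETER num_ctx` line in a Modelfile, Ollama's REST API accepts `num_ctx` per request in the `options` object. The Python sketch below builds such a request with only the standard library. The model tag, prompt, and context size are placeholder values, and `generate` assumes an Ollama server listening on the default local port, so it is defined here but not called.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def build_request(model: str, prompt: str, num_ctx: int) -> dict:
    """Build a non-streaming generate request that overrides the context size."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }


def generate(payload: dict, url: str = OLLAMA_URL) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Placeholder values for illustration only.
payload = build_request("llama3.2:3b", "Summarize my notes.", num_ctx=8192)
print(json.dumps(payload, indent=2))
```

As the comment above warns, a larger `num_ctx` costs memory, so it is worth raising it gradually rather than jumping straight to the model's advertised maximum.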
@danielarista1352 1 month ago
Matt, I'm a non-CS/SWE guy trying to hack away at a tool that uses LLMs to add some NL to the UI of an app I'm building. It's because of you that I chose Ollama over other options. Thanks, brother.
@jackonell1451 2 months ago
Wondering what the success rate of the tool call would be if wrapped in a framework like Yacana? Because only using Ollama's function calling by itself is IMO not representative of the potential of any model, as local LLMs need some level of guidance to really achieve anything.
@technovangelist 1 month ago
The older approach works fantastically with every model
@jackonell1451 1 month ago
@@technovangelist I completely agree with you that Ollama's function calling is excellent. Forcing LLMs to output JSON the way it does is very impressive! However, we use Phi models in production, and only using Ollama didn't yield the expected results... The JSON was valid, but the arguments generated were not. That's why we had to switch to an agent framework, because developing our own overlay would have been too complex. With the agent framework we now have enhanced tool calling and a way to do multi-turn chat so multiple Phi models can brainstorm together. Also, we needed control loops at each step to ensure we get a computable output in the final phase.
@jackonell1451 1 month ago
We had a success rate of around 30% when calling tools with correct values using only the "/generate" endpoint. Now, we're at about 70%. Additionally, the team has improved at prompt engineering, which has been a huge factor! Also, thanks for making videos, they're always a great help. Keep it up! ^^
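The "valid JSON, invalid arguments" failure described in this thread is the kind of thing a small control step can catch. The Python sketch below is only an illustration of that idea, not how Yacana or any other framework works: the tool definition and the `name`/`arguments` call format are hypothetical, and a real control loop would feed the reported problems back to the model and retry.

```python
import json

# Hypothetical tool definition: argument name -> expected Python type.
WEATHER_TOOL = {"name": "get_weather", "args": {"city": str, "days": int}}


def validate_tool_call(raw: str, tool: dict):
    """Return (ok, problems) for a model's JSON tool call."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, [f"invalid JSON: {exc}"]
    if not isinstance(call, dict):
        return False, ["tool call must be a JSON object"]

    problems = []
    if call.get("name") != tool["name"]:
        problems.append(f"unexpected tool name: {call.get('name')!r}")

    args = call.get("arguments")
    if not isinstance(args, dict):
        problems.append("'arguments' must be a JSON object")
        return False, problems

    for key, expected in tool["args"].items():
        if key not in args:
            problems.append(f"missing argument: {key}")
        elif type(args[key]) is not expected:
            problems.append(f"argument {key!r} should be {expected.__name__}")
    for key in args:
        if key not in tool["args"]:
            problems.append(f"unknown argument: {key}")
    return not problems, problems


good = '{"name": "get_weather", "arguments": {"city": "Paris", "days": 3}}'
bad = '{"name": "get_weather", "arguments": {"city": "Paris", "days": "three"}}'
print(validate_tool_call(good, WEATHER_TOOL))
print(validate_tool_call(bad, WEATHER_TOOL))
```

The second call is well-formed JSON but fails the type check, which mirrors the case where the JSON parses but the arguments are unusable.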
@jayd8935 2 months ago
If you have a cat, it stole the water bottle! Thanks for the review too. I will be interested to try this on my usual M1, might remove the need to run models on another more powerful machine.
@alvinnorin8820 1 month ago
8:12 I'd set the temperature to zero, in which case everyone using the model will get the same answer every time for the same inputs. Setting it to zero makes it default to always responding with whatever answer is most likely the accurate one. It's a predictive language model after all. The higher the temperature, the more varied responses. Setting it to zero disables the randomness parameter and thus removes variety completely. Very useful when benchmarking models against each other.
@technovangelist 1 month ago
Setting temp to zero will not get you the same answer every time. You would have to set temp and seed and you will reduce the variations but it may not be the best answer.
@alvinnorin8820 1 month ago
@@technovangelist Right, I assume the seed could also be random. It might be different across different models. I ran llama3.1:70b with temperature at zero, and that seemed to get me the same story from the same Minecraft chat logs along with its system prompt. LLMs have different architectures though, and it doesn't seem like all support all parameters. Taking away randomness is highly advantageous when optimizing system prompts though, being able to compare static responses.
@technovangelist 1 month ago
I don't know if that's true. Testing a system prompt on a limited version of the model helps if you only use the model in that limited way every time.
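The temperature and seed discussion above can be illustrated with a toy sampler in Python. This is a simplified sketch, not how Ollama or any particular runtime samples: the logits are made-up numbers, temperature 0 is treated as plain greedy selection, and the seed only controls Python's own random number generator. Real inference stacks can have other sources of variation that this toy does not model.

```python
import math
import random


def sample_token(logits: dict, temperature: float, rng: random.Random) -> str:
    """Pick a token from {token: logit}; temperature 0 means greedy (argmax)."""
    if temperature <= 0:
        return max(logits, key=logits.get)
    scaled = {token: value / temperature for token, value in logits.items()}
    top = max(scaled.values())
    # Softmax weights, shifted by the maximum for numerical stability.
    weights = [math.exp(value - top) for value in scaled.values()]
    return rng.choices(list(scaled), weights=weights, k=1)[0]


def sample_run(seed: int, temperature: float, n: int = 5) -> list:
    """Draw n tokens using a freshly seeded random number generator."""
    rng = random.Random(seed)
    return [sample_token(LOGITS, temperature, rng) for _ in range(n)]


LOGITS = {"blue": 2.0, "grey": 1.5, "green": 0.5}  # made-up next-token scores

greedy = [sample_token(LOGITS, 0, random.Random()) for _ in range(5)]
print("temperature 0:", greedy)
print("same seed twice:", sample_run(42, 1.0) == sample_run(42, 1.0))
```

In this toy, temperature 0 always returns the highest-scoring token and a fixed seed makes the random draws repeatable, but the most likely token is not automatically the best answer, which is the caveat raised in the replies above.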
@tecnopadre 2 months ago
1st: thank you as always, Matt. Question: I've been testing the 3B model since launch on my laptop, which has an NPU, with the WebUI on a server and Ollama on my laptop. The thing is, my laptop has an NPU and Ollama is not taking advantage of it; with the 3B model, Ollama uses the small GPU and the CPU. RAG (first with the WebUI interface and then with Flowise) gives me good results. I'm trying to find out how to activate the NPU on my laptop so Ollama can use it. It would be great. I think LM Studio does it? Also, when testing with large files (>15 MB), the embedding from Ollama on my computer, again with WebUI and Flowise, fails. The computer can't handle it. It would be great to have you try it with files that are closer to real company files. I think the 3B model is great. The last test I did was using it with Groq and, of course, that is where I can test it 🙂
@toadlguy 2 months ago
These smaller models are great for research, particularly as they are fairly easy to modify in code. In actual use cases, they are somewhat over-censored, but I suspect it will be just a matter of time before a fine-tuned uncensored version is created.
@yacahumax1431 2 months ago
ollama makes it so easy
@KevinKreger 2 months ago
Good one. I saw someone training the 1B model on their iPhone😮
@merefield2585 2 months ago
Hey Matt, thanks for a great video - do you keep the code featured in your videos in public repos?
@akongas 1 month ago
That's great. Hey, do you know if we will ever get AI running locally on our Android and iOS devices?
@AndresSolar-y3g 2 months ago
...worth a thumb up...
@martijnveenman 2 months ago
Amazing video, thank you. Is Companion the only AI plugin you use in Obsidian? Looking forward to seeing more practical AI applications in Obsidian.
@utsavgoswami5263 2 months ago
Well, Matt, you are our fav choice for all things AI!
@dakkon77blackblade20 1 month ago
I would really like to know if these models are any good for entity extraction like for graphRAG or chunk generators for traditional RAG... That would be a great topic!
@jazzejeff1 2 months ago
Your channel's so nice I wish I could sub twice. Keep up the great work.
@g.s.3389 2 months ago
just a question: what is the best model for supporting me in python programming that I can use with ollama?
@yahoolane 2 months ago
What is your use case? llama3 is a good default.
@alexandrep4913 2 months ago
There is an awesome video on YouTube talking about this specific model and how censored it is. I wouldn't be surprised if people find the older model to be more capable.
@nosuchthing8 1 month ago
How much VRAM is required for the 3b token method?
@wardehaj 2 months ago
Thanks for this great video explaining how to use these small LLMs! I will be waiting for your video about the vision model. Maybe compare llama2 vision with pixtral?
@johang1293 2 months ago
Good stuff
@Cingku 2 months ago
Could you explain what the generation completion hotkey does in the Companion plugin for Obsidian? When I use the Companion, it automatically generates text, completes it, and streams the response. So, in what situation would I need to use this hotkey? I'd appreciate it if you could clarify this because I was confused by this.
@TaFeiYen 1 month ago
First time seeing your vid. Interesting take. I know you have demonstrated some use cases of the models. But for most people, there are way too many models to pick from. Do you have a guide on that, to narrow down which model to use? I know it will always be biased, but I would like to hear your take.
@technovangelist 1 month ago
This was the first, maybe second, time I looked at one model. I would like to do more of them.
@BruceWayne15325 1 month ago
I love small LLMs. I don't think people realize the power they have to simplify their lives. I love to use Obsidian for note taking. Using a local LLM, I can have it easily summarize my notes, giving me an at-a-glance view of each daily note. Have a long meeting? Transcribe it and summarize it, then stuff it in your notes. When we get agents, they will become vastly more useful. People put too much emphasis on the ability of models to do analytical tasks with great accuracy. They don't understand that the power of AI is the ability to have the AI write a program on the fly to do that kind of analysis, and then give you the result. AI will never be 100% accurate. It's like a human. We make mistakes too. We are spoiled by computers and the ability to have 100% reliable answers, but an AI can give that to you, just through the extra step of creating a program to accomplish the task.
@stasoline 2 months ago
Cool video!
@BulletProof-n7n 2 months ago
Respect and appreciation for what you do. Any chance of using an LLM on the new Samsung Galaxy S23?
@enermaxstephens1051 1 month ago
Can we simply download the GGLM file and use it with GPT4All?
@PriNovaFX 2 months ago
What if you set the temperature to 0? Do the tool function tests succeed more often?
@researchandbuild1751 1 month ago
Can llama 3.2 1b do function calling? I don't know if it's smart enough to format output
@technovangelist 1 month ago
Depends. Using the newer more official format, I don't think it does a good job. Using the older approach I have documented in a few videos on this channel I expect it would be great at it.
@researchandbuild1751 1 month ago
@@technovangelist I will try to look into the original method. I still don't think many people explain how it all works in plain English. I will look at your earlier videos about functions
@utvikler-no 2 months ago
Thanks
@technovangelist 2 months ago
What??? You are too kind... a member AND a tip. Thanks so much.
@utvikler-no 2 months ago
@@technovangelist I just love the simple yet comprehensive way you explain the subjects. Keep up the good work ❤
@ts757arse 2 months ago
I've just tried this with fabric. It's as good at summarising youtube videos as microsoft 8x7B. Still gets confused occasionally with certain tasks. I've got a large-ish DB of technical documents for an obscure bit of research kit and I'm going to try swapping out my current LLM and see if this performs better. It'll probably be about 10x faster.
@antoniomonteiro3698 2 months ago
llama 3.2 1B Q4:
To count the number of 'F's and 'R's, I'll go through the word "strawberry" carefully. Here's the breakdown:
* 'F': 1
* 'R': 2
So, there are 1 'F' and 2 'R's in the word "strawberry".
or: please create a java program to count the number of 'F's and 'R's in the word 'strawberry'
    public class Main {
        public static void main(String[] args) {
            // Define the word
            String word = "strawberry";
            // Initialize counters for 'F's and 'R's
            int countF = 0;
            int countR = 0;
            // Loop through each character in the word
            for (char c : word.toLowerCase().toCharArray()) {
                // Check if the character is 'F' or 'R'
                if (c == 'f' || c == 'r') {
                    // Increment the counter
                    if (c == 'f') {
                        countF++;
                    } else {
                        countR++;
                    }
                }
            }
            // Print the results
            System.out.println("Number of 'F's: " + countF);
            System.out.println("Number of 'R's: " + countR);
        }
    }
output:
Number of 'F's: 0
Number of 'R's: 3
Sorry, they left me home alone...
@harrykekgmail 2 months ago
interesting video. thank you
@userou-ig1ze 2 months ago
Thanks for the great content. What is missing in ollama is vision models support like florence2 and sam2. If it had a nice api for that, that could be used with curl or so... dreams. Raspberry pi with vision models must be so incredibly overpowered, I prefer not thinking about it too much
@technovangelist 2 months ago
Raspberry pi overpowered???? way underpowered is more accurate, especially considering the cost of them. Physical size is the big benefit these days. But Florence2 looks like an older model that didn't get much love. Some of the other vision models on Ollama got a lot more coverage. And hadn't heard of sam2 either. Both architectures aren't supported so would require a lot of work to get working.
@userou-ig1ze 2 months ago
@@technovangelist thanks for the time to reply, appreciated. Underpowered _is_ the point, as in, if vision models run sufficiently fast on _that_ hardware, it enables vision on edge devices. Florence2 was released months ago, and the combination of selecting pixels by typing, and segmenting and tracking over time with sam2, is an incredibly powerful concept- I needn't ask any lay man to become creative, the usefulness of text driven vision perception seems insane
@ivanalberquilla9953 2 months ago
Thank you for the video. What is the tool you use for writing?
@technovangelist 2 months ago
Obsidian. And the plugin for it was companion
@ivanalberquilla9953 2 months ago
Thanks!
@modoulaminceesay9211 2 months ago
All things local AI and I just subscribed that’s what I need
@autumblak 1 month ago
Hey Matt, I have an Intel-based MacBook, and I want Ollama to utilize my GPU, but I don't know how to go about it. I have searched all around, but to no avail. Could you offer some pointers or resources so I can succeed?
@technovangelist 1 month ago
Unfortunately there are no options. Well except buying an apple silicon MacBook or switching to a pc.
@chrisBruner 2 months ago
Good video
@megairrational 2 months ago
Great content. Could you briefly describe the machine you use for this task? You mentioned 3 seconds…
@technovangelist 2 months ago
I usually do and forgot this time. M1 Max MacBook Pro with 64 GB. A machine you can get for about 1500 USD today.
@megairrational 2 months ago
@@technovangelist thank you! 64GB? Impressive. Please keep it up! You are a great communicator
@dna100 2 months ago
Lovin' the channel. 👍👍It'll be great once Ollama supports vision
@technovangelist 2 months ago
Ollama does support vision today. The llama3.2 vision should be very soon
@arkemiffo 2 months ago
Just tried the 3.2:3b. I said hello and got a reply blazingly fast, so I asked if it was on meth or something. Got the standard "I'm just a model, I can't human", so I said I was just surprised to see such fast answers on a local model. And this is where things got confused. Apparently, Llama3.2:3b thinks it's working off a cloud service. It refused the notion that I'm running this locally. Just to be sure, I pulled the ethernet cable, restarted the terminal, and it worked just as well without it (well... duh). I just find it fascinating that the model itself almost recoils at the notion of being local.
@AlexanderYudin 2 months ago
Which hardware setup do you have?
@technovangelist 2 months ago
I'm on an M1 Max MacBook Pro with 64 GB RAM.
@agi_lab 2 months ago
I would request that you test LLMs on some complex tools (something as simple as a file-create tool fails on the 3b model). I assume that if I give a proper function description, it might not fail. Need to experiment.
@Aarifshah-A 2 months ago
Lol the ending 😂😂😂
@aiamfree 2 months ago
Why am I getting Error: error loading model for all the 3.2 downloads?
@technovangelist 2 months ago
Have you updated ollama?
@aiamfree 2 months ago
@@technovangelist yes that fixed it, thanks… it’s sooo damn fast!!
@zhouyangbo4498 2 months ago
ollama run llama3.2:1b
Error: llama runner process has terminated: signal: abort trap error:done_getting_tensors: wrong number of tensors; expected 147, got 146
Any idea about this error?
@technovangelist 2 months ago
You need to update ollama. You should always update whenever there is a new version.
@zhouyangbo4498 2 months ago
ok ,I will try it , maybe it is GFW issue, thanks.
@kshabana_YT 2 months ago
I tried to run Llama3.2 1b on a Samsung S20 Plus and got "Error: no suitable llama servers found". And I am running ollama serve.
@Psychopatz 2 months ago
Just use Layla Lite, then import the model. Yep, it's a hassle getting llama.cpp to work.
@kshabana_YT 2 months ago
I don't know what you are talking about.
@UnwalledGarden 2 months ago
Awww yeah!
@thestype 2 months ago
I asked it to create a component in JavaScript, a task at which llama3.1 8B and mistral-nemo greatly succeeded. But llama3.2 3B failed miserably, mixing up different libraries unintelligently. It's just fast, but a random word generator is fast too.
@technovangelist 2 months ago
But a random word generator wouldn't be anywhere near as good as llama3.2 3b.
@TLabsLLC-AI-Development 2 months ago
Meta Matt!
@aiamfree 2 months ago
when is ollama getting the vision models anyone know?
@technovangelist 2 months ago
The team is working on it.
@aiamfree 2 months ago
@@technovangelist awesome, thanks Team!
@BeauKpad 1 month ago
My favorite hallucination: I work for the Jill Stein for President campaign, so I use info about her to test AIs. She's a public figure, but a minor one, and I know more about her than most sources. When playing around with WebSim, I asked it to make me a Jill Stein fan site. The results were shockingly accurate: history, platform, etc... except the photos. The photos were all of Jill Biden. For all the normies, Jill Biden is Joe Biden's wife. Jill Stein was running against Joe Biden. I pointed this out to the AI, and it replaced them with photos of Kyrsten Sinema. If you don't know who that is, I really don't have the bandwidth to explain, but it is simultaneously shockingly wrong and quite funny. Like, I know what it is getting at.
@shuntera 2 months ago
Matthew Berman's review of the vision models shows them to be censored to the point of uselessness. Hopefully someone will bring out uncensored versions.
@technovangelist 2 months ago
Reviews of the vision stuff seem to be mixed so far. It could be that some of the folks who have it have a bad implementation. I look forward to finding out.
@changeagent228 2 months ago
First test I did was "what number is larger 9.9 or 9.11?" and it insisted 9.11 was bigger. When is 2.3 out?
@omercelebi2012 2 months ago
Man you forgot your cup!
@junaidmundichipparakkal1700 2 months ago
Can someone point to ollama and RAG locally from Matt?
@kshabana_YT 2 months ago
Why did you quit Ollama? 😢😢😢
@technovangelist 2 months ago
Are you asking about quitting the app? Or why I left the company? That second thing is not something for this comment thread.
@emmanuelgoldstein3682 2 months ago
@@technovangelist Due to your hesitance on commenting, we'll just assume they were having Diddy parties until you clear it up
@starlord7526 2 months ago
@@emmanuelgoldstein3682 did you just say diddy party brah? jajajajaja
@kshabana_YT 2 months ago
Company
@ChristophBackhaus 2 months ago
I want you to count the number of r's in Strawberry. To do so I want you to go Letter by letter and every time you find one r I want you to count up Gets it right every time...
@danielarista1352 1 month ago
>>> I want you to count the number of r's in aferradfafa;;rr;r;adfafaf'''.. To do so I want you to go letter by letter and every time you find one r I want you to count up.
Let's start counting:
a - no R (0)
f - no R (0)
e - no R (0)
r - found an R! (1)
r - found another R! (2)
a - no R (2)
d - no R (2)
f - no R (2)
a - no R (2)
Continuing...
;; is a semicolon, not a letter to count
rr - found two Rs! (4)
; - no R (4)
r - found another R! (5)
; - no R (5)
a - no R (5)
d - no R (5)
f - no R (5)
a - no R (5)
f - no R (5)
a - no R (5)
f - no R (5)
So, there are 5 Rs in the given text.
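For comparison with the model's letter-by-letter walk above, a deterministic count takes a few lines of Python. This is just an illustrative sketch; `count_letters` is a made-up helper name, and the counting is case-insensitive.

```python
from collections import Counter


def count_letters(text: str, letters: str) -> dict:
    """Count how often each of the given letters occurs in text, ignoring case."""
    counts = Counter(text.lower())
    return {letter: counts[letter.lower()] for letter in letters}


print(count_letters("strawberry", "fr"))                      # {'f': 0, 'r': 3}
print(count_letters("aferradfafa;;rr;r;adfafaf'''..", "r"))   # {'r': 5}
```

The second result agrees with the total of 5 that the model reported for the test string.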
@iamarto 2 months ago
Whoever took the sponsorship from Meta, I don't think they asked for it. But in case you haven't noticed, they have more subscribers than you.
@technovangelist 2 months ago
Some have 1/3 the number of subs compared to me. So that’s not it.
@PeterHagen 2 months ago
Llama 3.1 & 3.2 are unfortunately very poor in Dutch language usage
@sskohli79 2 months ago
Hey Matt, nice video. But I don’t think it’s as impressive as you put it. I am sure the llama3.1’s performance was comparable
@technovangelist 2 months ago
It wasn’t available in a 1 and 3 b model.
@protovici1476 2 months ago
The vision portion isn't too great.
@SlykeThePhoxenix 2 months ago
There's 4 killers in the room. Since when does dying make you not a killer?
@technovangelist 2 months ago
Good point.
@Jason-ju7df 2 months ago
Microsoft GRIN MoE: A Gradient-Informed Mixture of Experts MoE Model 6.6b Ranks better
@technovangelist 2 months ago
In benchmarks? Or in real tests. One is useful the other has zero real value.
@HitsInSandbox 2 months ago
I tested it, and the vision abilities suck and are way overrated.
@xevil21 2 months ago
It's amazing how such a small model is smarter than you?