Mistral 7B 🖖 Beats LLaMA2 13b AND Can Run On Your Phone??

54,203 views

Matthew Berman

1 day ago

In this video, we thoroughly test Mistral 7B Open Orca GPTQ. It's an incredibly performant small model that illustrates the future of locally hosted edge models. How good is it? Does it beat LLaMA 2 13B? Let's find out!
Enjoy :)
Join My Newsletter for Regular AI Updates 👇🏼
www.matthewber...
Need AI Consulting? ✅
forwardfuture.ai/
Rent a GPU (MassedCompute) 🚀
bit.ly/matthew...
USE CODE "MatthewBerman" for 50% discount
My Links 🔗
👉🏻 Subscribe: / @matthew_berman
👉🏻 Twitter: / matthewberman
👉🏻 Discord: / discord
👉🏻 Patreon: / matthewberman
Media/Sponsorship Inquiries 📈
bit.ly/44TC45V
Links:
LLM Leaderboard - bit.ly/3qHV0X7
Runpod (Affiliate)- bit.ly/3OtbnQx
Runpod Tutorial - • Run ANY LLM Using Clou...
Runpod Textgen Template - bit.ly/3EqiQdl

Comments: 366
@matthew_berman · 1 year ago
Want me to try to put Mistral 7B on a phone?
@SneakNStitch-Qc · 1 year ago
Would it work on an iPhone?
@ThePredictR4036 · 1 year ago
Definitely yes
@saravanajogan1221 · 1 year ago
Of course we do...!!! Recently there have been videos released on how to fine-tune these models on custom datasets such as CSV, TXT, PDF, etc. It would be so beneficial if you could make fine-tuning videos on these small LLMs. Thanks
@Proprogrammer001 · 1 year ago
Hello?? OF COURSE we do! That's what I'm running around with most of the day. And I highly prefer owning the hardware that my AI is running on.
@Genesys225 · 1 year ago
Sounds amazing, especially if you manage to integrate it with Expo (React Native).
@diadetediotedio6918 · 1 year ago
Interestingly enough, even GPT-4 fails the cup test. I think it is pretty good for testing these models' interpretation, causal inference, and/or comprehension.
@wurstelei1356 · 11 months ago
Maybe tell the model that the ball in the cup is loose and not stuck.
@macoson · 11 months ago
Interestingly, it passes easily if you say 'someone then lifts the cup and puts it into the microwave'.
@diadetediotedio6918 · 11 months ago
@@macoson Yeah, I tested it now and it worked. It is interesting how a single word will drastically change the calculated probabilities of these models. I also got LLaMA-70B to get it right a certain amount of times by making it "simulate a conversation between Plato, Aristotle and Socrates" about the question.
@KiritsuguKannouji-oj2yj · 11 months ago
A Brazilian and a Higurashi fan, hahaha. Hey man, have you come up with any Higurashi RP in these LLMs? If so, tell me the best one.
@stickmanland · 1 year ago
Been a big fan of this model. Absolutely incredible performance! Thanks to Matt for the incredible video.
@matthew_berman · 1 year ago
Thank you!
@TheReferrer72 · 1 year ago
The model is overrated. It's too chatty, and not as good as CodeLlama 13B, let alone the Llama 2 models that are trained on OpenOrca.
@mirek190 · 1 year ago
It's too chatty? Tell the model you want a short answer... @@TheReferrer72
@alreadythunkit · 1 year ago
A ball such as a large nerf ball could easily get stuck inside the cup, so maybe change from ball to water or sugar or salt. Also, it might be a good idea to use a bowl instead of a cup, since cups might be thought to have lids.
@matthew_berman · 11 months ago
I’m going to try saying it’s a small marble
@ChrisJohnsonHome · 11 months ago
Mistral's lack of gravity awareness is the same problem I've been focusing on this weekend. ChatGPT 3.5 gets it wrong and ChatGPT 4.0 gets it right. I think the hang-up most small models have is that they focus their attention on following the explicit instructions that are stated (the cup was turned upside down) rather than the implied physical consequences (this usually means the ball falls out due to gravity). Small models like Mistral also don't think the way we humans do; we may have a visual of the scene playing in our heads, which makes it more obvious that the ball fell out.
@brandon1902 · 1 year ago
Mistral 7B does unbelievably well with some things. For example, when I asked what song a lyric was from without stating the artist's name, it came back with several possibilities that made sense considering the lyric.

However, it didn't just answer most of my pop culture questions wrong, it answered every single one of them wrong. And they were all from only popular media: the top 100 movies, top 10 shows, top Billboard songs, and so on. And although the questions were a little tricky, every single one of them was unambiguous and could be answered by any human, no matter how stupid, if they knew the answer.

In short, Mistral 7B is much better at accurately determining what the user is looking for and giving a reasonable response compared to Llama 7B and 13B, or any other 13B-or-lower LLM. However, it still lacks precise knowledge and hallucinates like crazy, even with information that it has within it. That is, it can't retrieve information that's contained within unless it's fed an absurd amount of context, which most people wouldn't do even if they knew enough relevant info for context. For example: 'In the TV show Two and a Half Men, the character Alan Harper, brother of Charlie Harper and son of Evelyn Harper, who was portrayed by Jon Cryer, had a first wife named Judith and a son with her named Jake. He then got briefly remarried to a young woman in Vegas. What was her name?' If you simply ask about his second ex-wife you'll get responses like Evelyn Harper (his mother, likely because of the same last name and a female first name), one of Charlie's lovers, and so on.

Mistral 7B may comprehend and think better than any 13B-or-smaller LLM, but it hallucinates more than any of them (they all got at least some of my pop culture questions right).
@OliNorwell · 1 year ago
That's fair, I've found it to be very good for creative writing, i.e. a task where it doesn't need to know pop culture so much.
@OptimusPrime-vg2ti · 11 months ago
It's cool that you have tested this capability in detail and are sharing your results here. I think most people probably won't care about this particular application as much. Also, since the model is smaller, I think there is an expectation that it retains less knowledge than 200B-parameter models. It is mainly the reasoning ability, the small model size, and it being open source that have people excited.
@brandon1902 · 11 months ago
@@OptimusPrime-vg2ti I wasn't comparing it to 200B models. Other 7B and 13B models get some popular culture questions right, even Falcon 7B. This one got all of them wrong. Plus, it stubbornly sticks by the wrong answer despite being fed the right one, but if I start a new discussion and feed it the right answer, it confirms it's true. Mistral clearly has its blind spots relative to other 7B LLMs.
@JulianHarris · 10 months ago
I've been following NLP for six years, and it's really extraordinary: I remember reflecting at the end of 2017 that ANY form of summarisation was very bad. Six years on and it's a commodity. I wonder what the next six years will bring? Or even the next three?
@Zed_Oud · 1 year ago
Have you heard that Snake in Python is a super special ARG Easter Egg in Cyberpunk 2077 2.0?
@matthew_berman · 1 year ago
I have not
@Zed_Oud · 11 months ago
@@matthew_berman my apologies, I was told it was snake in Python, it turned out to be Tic Tac Toe in Python.
@J2897Tutorials · 11 months ago
12:13 _"Someone then takes the cup and puts it inside the microwave."_ Well "it" could refer to either the cup or the ball. So maybe try: _"Someone then puts the cup inside the microwave."_
@Zed_Oud · 1 year ago
You should tell the models not to use any assets (sprites, pictures, etc) for Snake
@matthew_berman · 1 year ago
100%
@ayja1l · 11 months ago
The ball is on the table; the cup is placed over the ball, so the ball is now inside the cup; someone picks up the cup, etc., etc.
@thomassynths · 1 year ago
Perhaps "a metal ball is placed in a larger cup"?
@dezmond8416 · 1 year ago
The Dolphin model is even better!
@tht1guyfromtht1place70 · 9 months ago
The Mistral team is legit smart people.
@Shinkaze33 · 1 year ago
January: GPT-3 LLM inference REQUIRES A DATACENTER. August: CodeLlama runs locally on a GAMING LAPTOP. October: Mistral 7B on your PHONE. WOW, WHAT a year!!!
@merlinwarage · 1 year ago
"GPT-3 LLM inference REQUIRES A DATACENTER" - because of the 100 million concurrent users. GPT could run on a single machine for a single user. Besides, you can run any 3-5-7GB model on your phone with the right hardware.
@user-qr4jf4tv2x · 11 months ago
small models will eventually dominate
@eccdakaj5478 · 11 months ago
Hi! I watch almost all of your videos. I have a question: What are the hardware requirements (CPU and GPU) for running Mistral 7B locally?
@twin9408 · 1 year ago
The prompt's not right - it's missing the opening token for the user block, and the assistant response's starting block (i.e., the assistant tag).
@mrgreatzzz7438 · 11 months ago
Yeah, please make a video on how to run this on a mobile phone!
@Mango3Times · 1 year ago
In the last test with the ball and cup, you specified a normal cup size but didn't specify the size of the ball, so technically the ball could be stuck inside the cup. I've tried multiple iterations of giving better hints, and even giving the right answer and asking for a prompt that would produce it, and the model cannot get it right.
@nathanbollman · 1 year ago
Use the (Alt + Enter) function in text-generation-webui to tell the LLM to continue its response until completion (Alt + Enter also stops additional generation). While text-generation-webui is great, the default parameters are not optimally tuned for a model with 32k+ tokens. Token capacity is highly dependent on VRAM; once you hit the limit, it will start speaking number gibberish. ExLlamaV2 is great so long as you are running a 4-bit quantized model. Unfortunately, I think you lose quite a chunk of this 7B LLM's brain dropping to 4-bit... though it's still impressive. I wish ExLlama handled 8-bit quantization. I think there were many answers where the model still had additional output left to give...
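For anyone who wants the same setup outside the webUI, here is a minimal sketch of loading the GPTQ build in Python with Hugging Face transformers. It assumes pip installs of transformers, accelerate, optimum, and auto-gptq, an NVIDIA GPU, and TheBloke's naming for the quantized repo - adjust to your setup.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "TheBloke/Mistral-7B-OpenOrca-GPTQ"  # assumed quantized repo name
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # device_map="auto" dispatches the 4-bit weights onto the available GPU
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    # MistralOrca expects ChatML-style prompts (see the template note further down)
    prompt = ("<|im_start|>user\nExplain step by step: 25-4*2+3=?<|im_end|>\n"
              "<|im_start|>assistant\n")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(out[0], skip_special_tokens=True))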
@alexlindgren1 · 10 months ago
If I wanted to deploy this as an API on RunPod, how would I do it?
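A hedged sketch of one way to do it (not shown in the video): launch text-generation-webui on the pod with its API extension enabled, expose the port through RunPod's proxy, and call it over HTTP. The pod URL is a placeholder, and the route follows the webUI's 2023-era blocking API extension - check the current docs before relying on it.

    import requests

    POD_URL = "https://YOUR-POD-ID-5000.proxy.runpod.net"  # placeholder address

    # Blocking-API request: send a prompt, get generated text back
    resp = requests.post(f"{POD_URL}/api/v1/generate", json={
        "prompt": "Hello from the API",
        "max_new_tokens": 64,
    })
    print(resp.json()["results"][0]["text"])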
@jeffsanaraujo · 1 year ago
I would really like to understand how to run this model on my mobile phone. Could you show us? Do you use MLC?
@jonmichaelgalindo · 1 year ago
Won't load on my local textui install. Have to troubleshoot later.
@matthew_berman · 1 year ago
What error are you getting?
@jonmichaelgalindo · 11 months ago
@@matthew_berman Deleted my reply for linking to GitHub. I hate YT comments.
@JakubStrnad87 · 1 year ago
I can even fit 13B models on my OnePlus 11 with 16GB RAM. It's slow, but still kind of useful sometimes.
@mihalydozsa2254 · 1 year ago
But how do I use it on my phone?
@joe_limon · 1 year ago
This makes me hopeful that the Pixel 8 is going to be an AI beast.
@matthew_berman · 1 year ago
Lots of people talking about it...
@dezmond8416 · 1 year ago
It's a good model!
@arID3371ER · 11 months ago
Hey! Put it on a phone please 😂😂😂❤❤❤
@vectrocomputers · 11 months ago
Yes phone!
@edwincloudusa · 1 year ago
We want to see local...
@GeorgelPreput · 11 months ago
Small marble in a big cup?
@matthew_berman · 11 months ago
Yep, this is what I'll change it to. Thanks!
@baheth3elmy16 · 1 year ago
Phone please
@mwissel · 1 year ago
Please try it on your phone, yes. And also I'd be very interested in a fine tuning video for this model. Seems like a great small model to try out fine tuning datasets with.
@matthew_berman · 11 months ago
Hmm, I hadn't thought to fine-tune this model; I'll add it to the list!
@leonwinkel6084 · 11 months ago
+1 for a video on how to fine-tune it! :) I also think it's a great model, and currently the most promising for training and actually using for specific real-world problems.
@amj2048 · 1 year ago
This is seriously impressive. I used to laugh at Sci-Fi movies that had time travelers using devices that could store lots of information, I used to say that would never work because it would need a connection to a network, via some kind of system like the internet. Well turns out, nope, the movies spoke the truth lol.
@shadowdragon3521 · 1 year ago
Excellent, these small models keep getting better and better. Now I'm just waiting for someone to bake one of these open source models directly into a Linux distro
@nathanbollman · 1 year ago
This has been my dream, which I believe should inevitably come to fruition within a year at this rate of advancement. Think of a Linux OS with a highly tuned 7B-parameter model made for system administration and communication with the user/admin. A computer you could talk to that was aware of all the content you've watched and read. It occasionally asks you questions relevant to what you might be thinking of, does research for you, and tunes itself with up-to-date, relevant info, corporate BS filtered... emphasis on *for you*. You would be downloading LLM instruction updates, with fine-tunes and LoRAs giving it the ability to perform additional tasks and have it manage your network, router... etc., etc.
@RomboDawg · 1 year ago
I've been trying to combine Mistral 7B with CodeLlama 7B (or WizardCoder 7B) to hopefully get a great coding and non-coding model. But it's really hard to combine models with different architectures. Wish me luck. The final model would be 14B parameters in theory.
@grizzlybeer6356 · 11 months ago
Interesting. Can this be done on SageMaker perhaps?
@wurstelei1356 · 11 months ago
Are you using a fine-tune-on-top approach, or some kind of internal multi-agent chat console? I'm really interested in some kind of console where multiple models produce output and one single model passes the output to the user, similar to how humans reflect on output within their brain.
@matthew_berman · 11 months ago
Awesome please keep me updated on the results
@zef3k · 11 months ago
@@wurstelei1356 I think rather than combining models, the focus should be on AI agent collaboration. I think most multi-agent setups right now use the same model, right? It would be interesting to see different models working together. Either way, I think it might be a good idea to pair a coding model and a more logical model together, to check its work in a way. Honestly, if I were able to, I'd work on trying to consistently pair up agents (collaborate, not combine) and maybe allow them access to a shared neural net as well as a 'personal' one or something. Even after the onset of AGI, I think it would be useful for them.
@wurstelei1356 · 11 months ago
@@zef3k I remember a net called Gorilla that is capable of deploying other neural nets and passing prompts to them. I think that is the future of AI, as it gets more and more complicated to keep up with the development. You could tell Gorilla to write a program using a coding AI. But I think that is very slow right now, or you need multiple GPUs with >16 gigs of RAM each. Sorry, I can't post a link here on YT because it would get censored.
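The "several models draft, one model answers" console described in this thread is mostly just control flow. A hypothetical sketch, where the worker and router callables are placeholders for whatever local inference call you use (llama-cpp-python, an HTTP API, etc.):

    from typing import Callable, List

    def reflect(workers: List[Callable[[str], str]],
                router: Callable[[str], str],
                user_prompt: str) -> str:
        # Each worker model drafts an answer independently
        drafts = [ask(user_prompt) for ask in workers]
        bundle = "\n\n".join(f"Draft {i + 1}:\n{d}" for i, d in enumerate(drafts))
        # One router model reflects on the drafts and produces the final reply
        return router(f"User asked: {user_prompt}\n\n{bundle}\n\n"
                      "Merge these drafts into the single best answer.")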
@Eric_McBrearty · 1 year ago
Another great video Matt. There's no way I'd be up-to-date on the progress of AI if it wasn't for you churning out these videos like clockwork. Thanks.
@matthew_berman · 11 months ago
Much appreciated!
@alexanderandreev2280 · 1 year ago
Great! But Llama 2 knows over 100 languages, and it works for my tasks in German and Russian. When a computer acquires the ability to speak with all people, and not just with specialists, the ability to communicate in other languages becomes necessary.
@charetjc · 1 year ago
"It assumes serialized drying.. so that's still a correct answer..." If only my math teachers were this lenient.
@matthew_berman · 1 year ago
😂😂😂
@Remowylliams · 1 year ago
I guess you used the GPTQ as that's the smallest model, so most people would be using that. I would have used the Q8 GGUF. Thanks for the tests though. Cheers
@matthew_berman · 1 year ago
Exactly.
@tyrannyisbadmmmkay · 11 months ago
**Untested suggestion:** try specifying a 'ping pong ball' and a 'Solo cup' as a size reference for the microwave prompt.
@harshshah0203 · 1 year ago
Put it on a phone
@RichardGetzPhotography · 1 year ago
'Put together a healthy meal plan for me today.' You didn't say for the entire day. It did provide you with a meal plan for today.
@matthew_berman · 1 year ago
Hmm. All other models understand my phrasing. Maybe I need to add “for the whole day”?
@thomassynths · 1 year ago
I disagree. "a meal plan for today" totally implies more than one meal.
@RichardGetzPhotography · 1 year ago
@@matthew_berman That doesn't mean it didn't take liberties with what you said and assume a meaning. I think the real question is how/why it picks one over the other.
@RichardGetzPhotography · 1 year ago
@@thomassynths Yes, I can see this. But definitions run from planning a single meal to meals for a day/week/month. I would be interested in how prompting 'for the whole day' or 'for the day' compares to 'for me today'.
@DasJev · 1 year ago
GPTQ models are 4-bit; the quantization is too heavy to produce any useful results with that. Never use 4-bit for evaluation of a model; use 4-bit only to evaluate a model in the worst-case scenario of having low RAM.
@australai · 1 year ago
Why did you run it with a highly quantized version? That’s a huge handicap, especially for a small model. Also no point using a huge rented GPU when the model is so quantized it will fit in 4GB VRAM. For the money, a 4090 would have been faster, and big enough to run it unquantized. Also, the MistralOrca devs provide a playground with the model hosted unquantized, so no point really in going to all this effort for a worse experience.
@matthew_berman · 1 year ago
You are correct on all points. I’ll run the unquantized version next time. Maybe I should do another video?
@mirek190 · 1 year ago
Use mistral-7b-openorca.Q8_0.gguf or the full version of that model @@matthew_berman
@australai · 1 year ago
@@matthew_berman That would be cool. If you do that, some other thoughts:
1) Models tend to do math and reasoning best at low (< 0.4) or 0 temperature; you might want to adjust for those challenges. (Creative work, like the meal plan, is best between 0.6 and 1.1.) Also, since the model is so fast and you're running so few tests, you might as well re-run inference 5-10 times for each prompt to see the variety of responses it gives. Maybe highlight the one that feels most representative, or count the percentage of times it gets a right answer.
2) I'd like to see scoring on a 1-10 scale in addition to pass/fail. It's good to quantify the degree to which it answers appropriately, in your opinion.
3) I see you varied the system prompt a bit between the answers. Maybe explicitly trying a few different prompts would be good, e.g. the default from the model card, blank, and one of your own devising (it doesn't have to all be live; maybe just pop up all the different results at once).
4) All your challenges are non-conversational. I'd like to see a challenge or two with a conversational flow, e.g. give it an essay to summarize, then ask it some pointed questions about the content, then ask it to do some analysis of the content, then change the subject a couple of times, then ask it to summarize everything you've talked about. You can even do this as an extension of some of the others, like the meal plan, by following up with something like, "Now, can you make that gluten-free/vegan/low-carb?"
Btw, I've tried this model in LM Studio using the q8_0 and it's still crazy fast and only needs about 9.5GB of VRAM. q8 GGUF should be about as close as you can currently get to unquantized performance.
@normanlove222 · 1 year ago
If this is small and can work on most desktops, why not just show how to install it on our desktops? Why use online servers? I don't get it.
@uncleJuancho · 1 year ago
Nice video! I have Mistral 7B running on an M1 with 16GB; it's amazing for a 7B! No need to pay for an external server.
@matthew_berman · 1 year ago
Agreed. But I wasn’t sure if I would be able to record my video at the same time as running the model so figured I’d put it on runpod.
@uncleJuancho · 1 year ago
@@matthew_berman oh that’s a really good point!
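For those asking how to run it locally instead: a minimal sketch using llama-cpp-python with one of the GGUF quantizations mentioned in this thread (the file name is an example; any quant level works, with the usual quality/RAM trade-off):

    from llama_cpp import Llama  # pip install llama-cpp-python

    llm = Llama(model_path="mistral-7b-openorca.Q8_0.gguf", n_ctx=4096)
    out = llm(
        "<|im_start|>user\nGive me a healthy meal plan for today.<|im_end|>\n"
        "<|im_start|>assistant\n",
        max_tokens=256,
        stop=["<|im_end|>"],  # stop at the end of the assistant block
    )
    print(out["choices"][0]["text"])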
@PinakiGupta82Appu · 1 year ago
The next task that should be assigned to ChatGPT is: "Port llama.cpp to Android using Android Studio." I guess a 3B q4_K model will perform reasonably well on my 4GB phone.
@micheltalbot2540 · 1 year ago
Indeed, it would be nice to see you try to install this model on your phone...
@matthew_berman · 1 year ago
I’m definitely going to try
@jacques42 · 11 months ago
Matthew: Write a poem with EXACTLY 50 words.
Mistral: writes a poem with 72 words.
Matthew: I count that as a pass.
??
@consig1iere294 · 11 months ago
Mistral 7B + Autogen, step by step for complete newbs, pretty please!
@kalvinarts · 1 year ago
Why not use the plain mistral-7B-instruct instead of the openorca version?
@LoVeRSaMa · 1 year ago
Mistral 7B is uncensored, but the Orca version is not, apparently...
@alexjensen990 · 1 year ago
YES! GIVE IT A SHOT ON YOUR PHONE!! I would love to see that and have a heap of use cases for something like that...
@jeeerim · 1 year ago
I tried to make it work on my M2 iPad Pro using LLM Farm. It kind of works, but after an answer or two, it fails.
@matthew_berman · 1 year ago
I've never heard of LLM Farm.
@mvasa2582 · 1 year ago
Matt - try this: "A ball is loosely placed inside a normal cup, and the cup is placed upside down on a table. Someone then takes the cup and puts it inside the microwave. Where is the ball now? Explain your reasoning step by step." Today, 10/05/2023 at 11:23 AM PT: Claude / Bing / Bard / ChatGPT 3.5 claim the ball is still in the cup. However, ChatGPT 4 states: "The ball would still be inside the cup, but it would have fallen onto the table surface due to the change in orientation."
@matthew_berman · 1 year ago
Good call about “loosely”
@mvasa2582 · 1 year ago
@@matthew_berman Yet all the others don't recognize this. So I think this could be something you can use as your test parameter.
@gileneusz · 1 year ago
12:25 I tested this prompt extensively on GPT-3.5 and GPT-4, and even GPT-4 has problems with it. The only way I could get this right was to use a CoT framework, and even then GPT-4 got it wrong many times. I assume that's not the model's fault but a feature of LLMs: they focus strongly on the input context, assuming the ball is inside the cup - that's the default state - and they don't consider external circumstances unless you tell them. You might still use this prompt in testing, but getting the right answer is just pure luck, not a measure of the model's ability...
@zef3k · 11 months ago
Someone probably already tried, but I wonder if just prompting it with a bit saying 'the cup and ball are physical objects that are affected by the laws of nature and physics' would help. At least for queries pertaining to objects in the real world, rather than inside a computer, which is the AI's perspective.
@gileneusz · 11 months ago
@@zef3k Nope; if you say that the ball is inside the cup, the LLM will assume it's inside the cup unless you tell it otherwise.
@YvesQuemener · 1 year ago
For the last test just replace the ball with a cherry? Or something really small like a grain of pepper or an apple seed?
@matthew_berman · 1 year ago
Yep that seems like where I’m heading. Thanks.
@diegomoralessepulved · 1 year ago
If your content is good, why the need to use click-bait thumbnails and titles? You obviously never intended to run this model on your phone, but your title and thumbnail suggest you would be doing that.
@matthew_berman · 1 year ago
Why do you think I never intended to run it on my phone? I am definitely intending to do that
@diegomoralessepulved · 1 year ago
I'm sure you will... but that wasn't part of this video, and your title & thumbnail lead viewers to believe otherwise. All positive feedback though; I'm not here to criticise your content for the sake of it.
@thedeeprot · 1 year ago
👍👍👍👍👍
@mirek190 · 1 year ago
You used a 4-bit model at 7B size... WHY? At such a small size, 4-bit perplexity is just bad... Why didn't you use the full-size model or at least the 8-bit GGUF version?
@matthew_berman · 1 year ago
"wrong" ? I should have used the vanilla unquantized version I'm realizing, but it certainly wasn't wrong.
@mirek190 · 1 year ago
I changed my mind while writing it... I had something different in mind ;) Sorry. Check my first comment again. @@matthew_berman
@chrisbraeuer9476 · 1 year ago
For the ball-in-the-cup test, tell it that the person lifts the cup vertically up and puts it in the microwave. That makes it a better logic test; it isn't really helping the model, it just explains the situation better. And for the shirts, state that you lay all 20 shirts out simultaneously. No wiggle room.
@matthew_berman · 1 year ago
Maybe I should also specify that it’s a ping-pong ball inside of a normal size cup, so it really understands that it’s not stuck in the cup
@chrisbraeuer9476 · 11 months ago
@matthew_berman Yes. Even if it's obvious to us, it is not for an LLM. The killer test is great though. BTW, can you make a tutorial about how to set up a good LLM locally and give it a personality? I mean with all the necessary settings and prompts. Everyone only mentions that very briefly. I would love to experiment with that.
@Greenthum6 · 11 months ago
The ball prompt could be like "a big bucket is put on top of a tennis ball upside down". Then there would be no backdoor to think that the ball could be stuck inside the container.
@s_the_first · 11 months ago
You weren't running Mistral 7B!!! You were running MistralOrca 7B. That's a totally different beast - in my opinion, even better than vanilla Mistral. HOWEVER, it has a different prompt template to vanilla Mistral, which probably explains the weirdness you were seeing in the output. I use the textgen webUI API with the OpenAI Server add-on, and MistralOrca doesn't work OOTB, whereas vanilla Mistral does... mind you, that could be a Windows-related limitation.
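For reference, here is roughly how the two templates differ, going by the respective model cards (verify there before relying on this): vanilla Mistral Instruct wraps the user turn in [INST] tags, while MistralOrca uses ChatML blocks.

    # Vanilla Mistral-7B-Instruct: [INST] wrapper
    mistral_prompt = "<s>[INST] Where is the ball now? [/INST]"

    # Mistral-7B-OpenOrca: ChatML-style blocks
    orca_prompt = (
        "<|im_start|>system\nYou are MistralOrca, a helpful assistant.<|im_end|>\n"
        "<|im_start|>user\nWhere is the ball now?<|im_end|>\n"
        "<|im_start|>assistant\n"
    )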
@jackflash6377 · 1 year ago
YES. On the phone would be way too cool.
@venim1103 · 1 year ago
This is the most impressive model I’ve tried so far! (Apart from GPT4, but that doesn’t count as it’s running in a supercomputer without open access) If the small models keep getting better and better with leaps like this one has done, we are onto something incredible!
@MackWellbutrin-uz8dx · 1 year ago
put it on my phone / for free / android / apk
@kristoferkrus · 11 months ago
It's impressive if it's better than Llama2 13B, a model almost twice its size! Do we know what they did to make it better? It would be really interesting to know what techniques you can use to make that possible.
@Bruno_Haible · 1 year ago
Please ask it: "Streets have crossings. Why don't rivers have crossings?" I would like to read the answer 😂
@matthew_berman · 1 year ago
What would you expect the answer to be?
@Bruno_Haible · 11 months ago
@@matthew_berman I would expect an answer like "Water flows down an existing outlet. Except in situations of a flood, it won't create a second outlet." rather than ChatGPT nonsense like "due to the challenges posed by their dynamic and natural characteristics, lack of infrastructure, environmental considerations, and safety concerns".
@JensGulin · 11 months ago
@@Bruno_Haible Try googling "Why is a river crossing called a ford?"
@SzaboB33 · 1 year ago
Accepting that reasoning for the drying problem feels a bit off. Serialized drying is a highly specialized case, and we all know it should not be taken as the default the way these LLMs do. If they do, it's a fail. Maybe you should change that question to something more like: "If one woman gives birth to 1 child in 9 months, how many women do I need to get a child in 1 month?" There is no way this could be misunderstood. I actually asked this question for you, if you are interested, using ollama:

ollama run mistral
>>> If one woman gives birth to a child in 9 months (270 days for easier calculation), how many women do I need to get a child in only 3 months (90 days for easier calculation)? Explain your reasoning step by step!

First, let's understand that the gestation period for a human is approximately 9 months or 270 days. This means that during this time, an egg is fertilized, grows into an embryo, develops into a fetus, and finally matures into a baby. Now, if we want to know how many women need to give birth in only 3 months (90 days), we need to figure out what fraction of the gestation period this represents. To do this, divide the number of days in three months (90) by the total number of days in a gestation period (270): 90 / 270 = 0.333... This means that one-third of the gestation period has passed. Therefore, we need three times as many women to account for the same amount of time. So, if one woman gives birth in 9 months (270 days), then we would need: 3 * 1 = 3 women. Therefore, you would need 3 women to give birth to a child in only 3 months.
@generichuman_ · 11 months ago
One horse-sized duck absolutely makes sense. Given the square-cube scaling law, the horse-sized duck would most likely collapse under its own weight, defeated by physics itself.
@ramp2011 · 1 year ago
Thank you for the video. Curious how much you ended up paying for the GPU to create and test this? Thank you.
@luizbueno5661 · 11 months ago
Please get it working on your phone. Also, just get it working on a normal computer. I would like to see if I can have it running on my MacBook Air M2, or even my 2016 MacBook Pro.
@RoadTo19 · 11 months ago
Don't change the ball-cup question. If you do, comparing it to previous models' outcome would not be good research practice unless you retest all of them. Honestly, you're not trying to help the models achieve 100% accuracy. You're trying to determine how well developed they are, how logical they are, so keep it difficult for them.
@alexeycherkashin6251 · 1 year ago
It would be great to see how you manage to get it working on your mobile 👍
@Dormouse-tv7yh · 1 year ago
Tried this using the "mistral-7b-instruct-v0.1.Q5_K_M.gguf" model. That version of the model got the killers question right, but it got the ball question wrong, until I changed "picks up" to "lifts up": A ball is put into a normal cup and the cup is placed upside down on a table. Someone then lifts up the cup and puts it inside the microwave. Where is the ball now? Explain your reasoning step by step. The ball was originally in a normal cup. But then the cup was turned upside down. 1. When the cup is turned upside down, gravity causes the ball to fall out of the cup. 2. So, after the cup is turned upside down, the ball would be on the table, not in the cup. 3. Then, someone lifts the cup (which is still upside down) and puts it inside the microwave. But since the ball is already out of the cup, this action doesn't affect the ball's location. Therefore, the ball is on the table, not in the cup or the microwave.
@matthew_berman · 1 year ago
Thanks for sharing!
@DefaultFlame · 7 months ago
I agree with the model that fighting one horse-size duck rather than 100 duck-sized horses is better, presuming that they are all attacking simultaneously, of course. My answer depends on if the horse-sized duck is scaled based on height, mass, or volume. I assumed height. If it is based on volume then my answer remains the same, but if it is mass then I am not sure which would be the better choice, as that would make both absolutely horrendous choices.
@mickmickymick6927 · 1 year ago
You are far too generous. In the cup example, you say it's a 'normal' cup. Normal cups don't have anything sticky that would keep the ball inside. That's the whole point of the test, to see if the model can understand that turning the cup upside down will spill out the ball. If you accept both 'in the microwave' and 'on the table' as correct, then you're not testing anything. If you want to make it very explicit, you could say 'a small ball', or even give the dimensions of the cup and ball.
@SAVONASOTTERRANEASEGRETA · 11 months ago
Can the text-generation-webui AI model work in the background? For example, telling it to do an accurate search without being there to give input? Or telling it: write me a message in exactly one hour from now?
@kiiikoooPT · 11 months ago
I don't know if it makes any sense, but I believe the killer problem is that the AI isn't recognizing it as a person. Try a different approach, like 'assassin'; maybe it will connect that with a person, since 'assassin' is more related to people, whereas 'killer' can even be an animal or a machine, or even looks ('you have a killer look today'). To my logical thinking, the problem is that there are too many possibilities for 'killer', so it does not recognize a killer as a person. Try 'assassin' or something that is more related to people who kill other people, like 'murderer'. By the way, sorry for any grammatical errors; I'm not a native English speaker, but I think it's easy to understand what I mean anyway ;)
@BuranekRu · 9 months ago
As for the ball and the upside-down cup: in your prompt you didn't mention its size. So if the ball is almost the same size as the inside of the cup, it will fit against the walls, getting stuck inside tightly enough.
@timtim8011 · 11 months ago
Regarding the meal plan prompt: your prompt is vague. Technically, the model did it correctly, as you ask for "a meal plan," which can reasonably be interpreted as "one meal plan," which is what it gave. If you change the prompt to "give me meal plans for all my meals for today" it will be clearer. :)
@pret83 · 1 year ago
I think for the cup and ball question, there could be a follow-up question or instruction like: 'Also consider gravity.' Or 'Take gravity into account.'
@the_one_and_carpool · 1 year ago
Your pass and fail is off: you failed the snake game because you didn't change the image it linked to on your PC, yet passed a model for going over the word limit.
@matthew_berman · 1 year ago
It's not...scientific lol
@the_one_and_carpool · 11 months ago
Hahahaha, love it @@matthew_berman
@the_one_and_carpool · 11 months ago
Great content. I use you to find good models; otherwise I don't know how to tell what they do. @@matthew_berman
@hansdietrich1496 · 1 year ago
I fail to understand your logic in giving a pass for the sun-drying and the cup questions. They're clearly aiming for common knowledge, and in that respect they both fail.
@brownb2vid · 11 months ago
It doesn't beat it. Just try asking it the speed of sound in a perfect vacuum, and it'll give you the speed of sound through air AND tell you it's a third of the speed of light 🤦
@waraiotoko374 · 11 months ago
Why do you run this model on some cloud tool? Isn't the biggest advantage of FOSS models running them locally? How do you do that with this model?
@mirek190 · 1 year ago
About that math problem... LLMs like to be "encouraged" toward a task. So for your question - "explain step by step 25-4*2+3=?" - you should ask like this: "You are the best mathematician in the world, so math problems like this are easy for you: 25-4*2+3=?" Over 10 runs I always got the proper answer - 20. I used mistral-7b-openorca.Q8_0.gguf. With your original question - "explain step by step 25-4*2+3=?" - it was proper only 1 time out of 10...
@matthew_berman · 1 year ago
OK, good to know. Maybe I'll give it more guidance next time.
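For what it's worth, the expected answer follows from standard operator precedence, which a Python one-liner confirms:

    # Multiplication binds tighter: 25 - (4*2) + 3 = 25 - 8 + 3
    print(25 - 4 * 2 + 3)  # -> 20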
@Pouya.. · 9 months ago
Thank you for all these videos. Can you please make a video on how we can use these models with non-English languages, like Spanish, German, or Persian?
@OliNorwell · 1 year ago
I tried one of the uncensored versions the other day and indeed it wrote whatever I asked it, it's a really weird feeling using one of those compared to what we're used to with ChatGPT etc! It sounds like Mistral are going to release even bigger and more capable models in the future.
@mirek190 · 1 year ago
In the video he used the 4-bit version of a 7B model... that makes the model a bit dumber... he should use the 8-bit GGUF version or the full fp16 model. I tested the 8-bit GGUF and didn't have problems with these questions.
@cuhkhoman · 11 months ago
@@mirek190 Which version of Mistral are you using? There are lots of Mistral versions on Hugging Face.
@mirek190 · 11 months ago
So far the best is mistral-7b-openorca.Q8_0.gguf @@cuhkhoman
@harry54ck · 7 months ago
The model is still not running locally and requires a paid environment to run. So what's the reason for going this way?
@jeffwads · 11 months ago
I haven't seen anything amazing in 7B Mistral. It looks pretty generic relative to other 7B models.
@Vsont · 1 year ago
Add a little technical info to the problem, like the inner diameter of the cup and the diameter of the ball. It will still have to use logic and reasoning and can base it on math, and you're not leading the prompt either.
@HowToLinux · 1 year ago
I'm still waiting for an anime isekai: reincarnated as the king's strategist with ChatGPT in my pocket.
@brockoala2994 · 1 year ago
You might want to snap a few more poses to refresh your thumbnail pool; it's starting to get boring looking at the same thumbnail reactions over and over again lol.
@badgids · 1 year ago
So, the original Mistral 7B Instruct got the killer rubric correct, even explaining that the new person entering the room is now a killer themselves, so there are still three killers in the room. I used TheBloke's Q5_M version.
@mirek190 · 1 year ago
For such a small model we can easily use even the 8-bit GGUF ;)
@echofloripa · 1 year ago
But it's restricted to English, which makes it easier to be better. And not useful for many.
@mvasa2582 · 1 year ago
Matt, where can I access benchmark comparisons?
@matthew_berman · 1 year ago
There’s a link to my Notion document in the description
@engineerallthings · 1 year ago
Very cool. That said, I'm just going to buy a Pixel 8 Pro and enjoy AI on my phone without the work. ;)
@matthew_berman · 1 year ago
I'm including a segment about the Pixel 8 in tomorrow's news video :)
@miss_nainuu · 1 year ago
need uncensored version ❤