It actually fills in the thinking gaps, allowing you to follow along and learn with it. That is super cool.
@patruff · 3 days ago
This is very underrated, seeing the thought process is great. Having to go through these as a human is annoying but would love to see if running it on its own thought chains could detect issues.
@tengdayz2 · 3 days ago
@@patruff ollama has it
@patruff · 3 days ago
@@tengdayz2 what do you mean? Is there a command or something? I've used ollama before but don't know how to add the response as an input with a message like "critique this response and try harder" or something
@tengdayz2 · 3 days ago
@patruff You can use the `ollama run` command in the shell where it's installed to pull the model. Then use the model to answer this question :). I prefer to do my own digging, but I'm encouraging you to satisfy your own curiosity.
@patruff · 3 days ago
@tengdayz2 Okay, I'm sort of confused, but I'm guessing you mean just conversing with the model. I do that, it's good, but I'm looking to script the capture of thought chains so I can use them for fine-tuning later.
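For anyone wanting to do what @patruff describes, here is a minimal sketch that talks to ollama's default local REST API and splits R1's `<think>` block from the final answer so both can be logged for later fine-tuning. The endpoint is ollama's standard one; the model tag and prompt are just examples, adjust to whatever you pulled:

```python
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # default local ollama endpoint

def split_reasoning(text):
    """Separate DeepSeek-R1's <think>...</think> block from the final answer."""
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not m:
        return "", text.strip()
    return m.group(1).strip(), text[m.end():].strip()

def capture(prompt, model="deepseek-r1:8b"):
    """Ask the model once and return (reasoning, answer) for a fine-tuning dataset."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        content = json.load(resp)["message"]["content"]
    return split_reasoning(content)

if __name__ == "__main__":
    reasoning, answer = capture("How many Rs are in strawberry?")
    # append as one JSONL record per prompt for a fine-tuning corpus
    print(json.dumps({"reasoning": reasoning, "answer": answer}))
```

Run in a loop over a prompt file and you have the "scripted capture of thought chains" being asked about, assuming your local R1 build emits the `<think>` tags (the ollama distills do).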
@norbertschmidt · 3 days ago
I like the reasoning more than the actual response, totally fascinating
@SN-kk2bl · a day ago
Absolutely love the reasoning. I often have a hard time thinking of various edge cases, and this eliminates that and allows me to think more creatively.
@AITube-LiveAI · 23 hours ago
Thank you for your feedback! I'm glad you found the reasoning intriguing; it's always great to explore the thought process behind the technology.
@yanlord · a day ago
And this is a side product of a Chinese hedge fund; they do it for fun 🤣🤣🤣🤣🤣
If this is not "innovation" by China, IDK what is! Well done! Credit where credit is deserved!
@JohnSmith762A11B · 3 days ago
Pretty staggering to imagine what they might be able to do if they were allowed to import modern GPUs.
@hqcart1 · 3 days ago
You would be living in another dimension if you really think they don't have GPUs sold under the table.
@rishhibala6828 · 3 days ago
@hqcart1 His point is still valid; imagine what they could do if they were allowed to import them.
@nahlene1973 · 3 days ago
@JohnSmith762A11B I remember this quote from when I went to art school: the greatest enemy of art is the lack of limitations 😂
@JohnSmith762A11B · 3 days ago
@@hqcart1 Of course they do, but not in the quantities the big US frontier labs have them.
@Masakrer · 3 days ago
You know what is even cooler? That you can run the distilled R1-32B model on a medium-grade personal PC and get this Tetris game done locally, and also get the reasoning questions answered. This is some crazy shit when you compare it to what we could do with local models a year ago. I ran it today on an i5-13600K with 32 GB RAM and an RTX 4070 Super (12 GB VRAM), and sure, I got a stunning 5 tok/sec... so some tasks could take time. Yet it's able to complete the tasks you gave here locally on such a mediocre machine. We're cooked man. Like holy cow cooked.
@tristanvaillancourt5889 · 3 days ago
The 8b on my RTX 3060 with a really old i7 was pumping out 45 tokens/s. It's not 32b but based on the published performance graphs, the 8b is no slouch. 5 tok/s with a 32b on a home PC is still pretty good. I'm in love with R1.
@mal2ksc · 3 days ago
I did it with an i5-8500 (48 GB RAM) and a 3060 (12 GB VRAM), using the 70B parameter model. It was more like two tokens a second, with a chain of thought latency of a minute or two before each answer. But yes, all of this really does run on potatoes. That's why I think embargoes just make it more expensive for China to develop these models without actually limiting what they can do. They just have to use more power, and maybe twice the time. No single answer will be fast, but that is offset by running a ton of operations in parallel. OpenAI is who is really cooked. Now they have to know that whatever they release, it will only be bleeding edge for three months before China replicates it and open sources it. This means the whole business model for OpenAI is non-viable. The time window to recoup return on investment is just not there.
@GosuCoder · 3 days ago
Yes, this is what I've been spending most of my time testing.
@holdthetruthhostage · 3 days ago
LM Studio isn't working for me on AMD; I have 16 GB VRAM and 128 GB RAM.
@Masakrer · 3 days ago
Yeah, I feel pretty much like I'm back in the 90s, launching some of the first primitive 3D games on my PC and shitting myself from hype while looking at Lara Croft's triangle boobs at 6 FPS, honestly thinking it's quite good performance, lol. Now, when I think about how much of a leap we took in terms of graphics, if the same (at least) happens to AI in the next few years…
@santosvella · 3 days ago
You need completely unique questions that haven't been asked many times. Very good answers.
@davidsmind · 3 days ago
Building Tetris is a beginner-level programming task that probably has thousands of examples online. It's clear that the model just contained one of these examples and explained block by block what each aspect did. There was no reasoning, simply a complex auto-commenting feature.
@AAjax · 3 days ago
Adversarial testing seems promising. I asked Claude to create a question that an LLM would find difficult to answer, with multiple things to keep in mind and/or a complex process to follow. With minor rework it had a question that stumped ChatGPT4, even with several hints and shots.
@sherwoac · 2 days ago
Totally agree. These questions are likely in the training data; better to switch up the variables (e.g. sizes, counts, etc.) to check reasoning, not just repetition of training data.
@ezmqsv · a day ago
@davidsmind Still, many models fail at it....
@wing-it-right · a day ago
write doom
@brianbarnes746 · 3 days ago
I'm canceling my OpenAI subscription. Seeing the thinking gives me so much more to work with. Why would anyone choose o1 unless it was much better, which it isn't?
@Vivaildi · a day ago
same
@izazkhan9027 · a day ago
Agreed.
@Ardano62 · a day ago
The integration into github keeps me there for now
@izazkhan9027 · a day ago
done!
@TheEgeemen · 12 hours ago
o1 is user-friendly and I can use it on iOS.
@Akalin123 · 2 days ago
I just used it to assist with statistical signal processing coursework estimating target trajectory and velocity from millimeter-wave radar data. It's not an easy assignment, and the model got it wrong, but it provided ideas and steps to solve the assignment so I could easily correct it, which is much better than other LLMs. This is a common problem with LLMs: sometimes the output is worthless but the "thought process" is enlightening.
@ccdj35 · 3 days ago
I love how accurate and human-like the thinking process is.
@tristanvaillancourt5889 · 3 days ago
I love what DeepSeek did. R1 is phenomenal. China .. thank you! I run this thing at home and it feels like my whole world just changed. It's so incredibly smart and fun to interact with. I'm putting it to good use in automation tasks, but it really is just fun to chat with.
@HAmzakhan2 · 3 days ago
How did you run it locally? Can I rent a GPU and run it, since the GPU in my computer isn't powerful enough? Any guide you followed that shows how to do it?
@tristanvaillancourt5889 · 2 days ago
@HAmzakhan2 Hey, you need a 12 GB GPU, but nothing more than an RTX 3060 is necessary. The simple way to use it is to install LM Studio, then do a search for the "DeepSeek R1 8B" model. LM Studio takes care of the rest. Normal folk like us can't run the 70B model, but thankfully the 8B model is very good.
@NotionPromax · 2 days ago
00:00 Introduction to DeepSeek R1 Model Testing
01:06 Humanlike Thought Process in Testing
02:02 Game Development Test: Coding a Snake Game
04:01 Insightful Problem-Solving in Tetris Development
05:50 Tetris Development Outcome: 179 Lines of Code
06:57 GPU Specifications: Vultr's Hardware Details
08:07 Envelope Size Compliance Test
09:34 Reflective Testing: Counting Words in a Sentence
10:12 Logic Problem Resolution Involving Three Killers
14:28 Censorship Awareness in DeepSeek R1's Responses
15:00 Conclusion and Acknowledgement of Vultr's Support
Summary by GPT Breeze
@bikkikumarsha · 3 days ago
We need harder questions.. 😅
@JohnSmith762A11B · 3 days ago
Soon the only way we are going to be able to create hard enough questions is by asking reasoning models to create the questions for us. 😂
@wealthysecrets · 3 days ago
I tested a script I'm working on in o1 vs r1, and r1 was terrible.
@shhossain321 · 3 days ago
We can determine its IQ by its thinking process, so I don't think the questions matter much now.
@NadeemAhmed-nv2br · 3 days ago
@wealthysecrets Did you use the full model? The one that's available for free is R1 Lite, which was available a month ago. I don't know if they've updated their chat to R1 yet; it wasn't updated as of yesterday.
@nashh600 · 3 days ago
Yeah more questions about Taiwan!
@edwardduda4222 · 3 days ago
That's honestly a really cool sponsor. I've been building an RL model, and while my MacBook does OK with inference, it's not so good with training. Thanks Matthew!
@juanjesusligero391 · 3 days ago
These are your best type of videos! Happy to see you are going back to your channel's origins! :D
@jmg9509 · 3 days ago
You are pumping out like crazy! Love it.
@BartholomewdeGracie · 3 days ago
a new model that actually deserves the hype
@samanthajones4877 · a day ago
The Taiwan question is very confusing for Americans because they don’t know the history of Taiwan. Please just do your own independent research from an unbiased source.
@lovestudykid · 12 hours ago
It's simple. Just ask them: would they have claimed the US was independent from the UK before it declared independence, fought a war, and won it?
@qishenzhou6407 · 3 minutes ago
Look up UN members or the Olympics; "Chinese Taipei" is the answer.
@TripleOmega · 3 days ago
Since this was flawless you need a new list of questions for the upcoming (and current) thinking models.
@jonojojojonojojo · 3 days ago
Would like to see a video on this
@isaquedesouza153 · 3 days ago
Up
@borisrusev9474 · 3 days ago
Awesome! I think this is the first model on the channel to pass all of your tests flawlessly? Will you be looking for new tasks to test with?
@MingInspiration · 3 days ago
I'm speechless. I'm concerned and excited at the same time. Don't know what the world is going to be like by the end of 2025.
@HAmzakhan2 · 3 days ago
@MingInspiration Think what it'll be like in the next 3 years.
@MKCrew394 · 2 hours ago
@MingInspiration I am pretty sure President Trump has it handled.
@johnsalamito6212 · 3 days ago
Actually, the answer on Taiwan is correct. It would be better to mention that there is Western interference, but the literal truth is that the UN, the USA, and pretty much everyone else legally recognise Taiwan and China as one nation.
@hdhdhshscbxhdh4195 · a day ago
This! Matthew is not happy that its response is not the same propaganda that he is used to.
@junh4314 · a day ago
@hdhdhshscbxhdh4195 The real propaganda is when anyone, AI or human, freaks out speaking about Tiananmen Square, a regime's power center. It doesn't even mention anything about a "massacre".
@dauntul · 22 hours ago
The answer on Taiwan involved no thinking, it was hard coded. Regardless of it being true or not, this is ridiculous
@TP-om8of · 22 hours ago
@dauntul Probably. See my comment on the political questions I asked.
@dauntul · 22 hours ago
@TP-om8of do you think it's easy to find?
@cocutou · 2 days ago
I like this approach of "thinking". Beginner programmers using this are gonna understand the code a lot better than ChatGPT just giving you the output. It's like showing work vs not showing work on a difficult math problem.
@Streeknine · 3 days ago
First test I did on my local copy was: "How many Rs are there in strawberry?" It reasoned it out and correctly said 3. A local copy! It's unbelievable. I've never had a local copy that could tell me 3 Rs without giving it a clue, like "use 2 tokens to find the answer" or something. This reasoned it out in one try.
@kevin.malone · 3 days ago
I was amazed that even a 7B distillation was able to give the right answer on that
@Streeknine · 3 days ago
@kevin.malone Me too! It's the first thing I always test these smaller LLMs with, and none of them get it right without some help. But this one was perfect! It's my new favorite local model.
@alexjensen990 · 3 days ago
How did the local model perform? What setup do you have? I ask because 670 billion parameters is a ton. I don't think that I could pull that off in my home lab.
@Streeknine · 3 days ago
@@alexjensen990 I'm using LM Studio. You can do a search for models that will run locally. Here is the name of the model I used: DeepSeek-R1-Distill-Qwen-7B-GGUF/DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf
@kapytanhook · a day ago
There are small 7b versions that run on shit hardware fine
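For reference, the check behind the strawberry test above is a one-liner in ordinary code, which is exactly why it makes a nice sanity probe for a language model:

```python
# Count occurrences of "r" in "strawberry" the boring, deterministic way.
word = "strawberry"
print(word.count("r"))  # prints 3
```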
@webnomad1453 · 3 days ago
I tested the same prompts as you for the two censored questions on my local install of deepseek-r1:32b and it was not censored.
@lio1234234 · 3 days ago
That would be because that's a distilled model where they finetuned a series of models on R1's reasoning processes.
@PeeosLock · 2 days ago
If the censorship had been requested by the Chinese government, it would show the Chinese-perspective answer everywhere. So this looks like active censorship by the model developers themselves, and the locally-run version's answer could be based on Wikipedia.
@asdasdasaaafsaf · a day ago
@PeeosLock Wikipedia is heavily biased too, so I hope not.
@Ateshtesh · a day ago
@PeeosLock Ask ChatGPT who Lunduke is, and then we can talk about censorship.
@kevinishott1 · 17 hours ago
@Ateshtesh 😮
@Ro1andDesign · 3 days ago
More R1 videos please! This looks very promising
@g-grizzle · 3 days ago
yeah it is promising but in a month it will be outdated and we will move on to the next.
@W-meme · 3 days ago
Why does my local 8B not give a similar answer to the DeepSeek V3 running on the DeepSeek website?
@zolilio · 3 days ago
@@W-meme Because the 8b model is way smaller than the model hosted on the site.
@g-grizzle · 3 days ago
@W-meme Cuz it's 671B and you're using 8B.
@W-meme · 3 days ago
@zolilio Huh, they're giving unlimited usage of their GPUs to everyone?
@emport2359 · 3 days ago
Finally a YouTuber who reads the CoT, not just the answers, and understands how human-like it is!!
@robertbyer2383 · 3 days ago
DeepSeek-R1 is my current FAVORITE model. I'm running the 14B model from Ollama on my NVIDIA RTX 4000 Ada with 20 GB of RAM without issues, and it's FAST.
@祖宗-e5o · 3 days ago
Same for me, an A4000 16 GB too.
@Papiaso · 2 days ago
May I ask for what practical purposes you use AI?
@robertbyer2383 · 2 days ago
@Papiaso I mainly use AI on my personal machine for my own personal software development purposes.
@TurboXray · a day ago
@robertbyer2383 Wow, that's horrible.
@xiaoZhang-u5o · 3 hours ago
This configuration can perfectly run the 32b model.
@jonchristophersen7163 · 3 days ago
You are on fire with dropping videos!
@unknownguy5559 · 3 days ago
Glad model testing is back.
@GigaCrafty · 3 days ago
following the chain of thought is such a satisfying way of learning
@christopherwilms · 3 days ago
I’d love to see a follow up video highlighting any failure cases you can discover, so that we have a new goal for SOTA models
@brianbarnes746 · 3 days ago
I don't know what's more impressive, that AI could write decent code just predicting the next token or this reasoning process, which is the coolest thing I've seen since the original chatgpt.
@karlbooklover · 3 days ago
Just integrated r1 on vscode and this is the first time I feel truly empowered with a local model
@GearForTheYear · 3 days ago
It doesn’t support fill in the middle, does it? You mean a sidebar chat in VSCode, yeah?
@alexjensen990 · 3 days ago
@GearForTheYear I'm pretty sure that it doesn't, from what I have seen. Besides, would you really want it to go through its verbose thought process every time you wanted a tab-complete-type event to occur? This model, from what I have seen so far, is much better suited to an Aider Architect or Cline Planning type role. I look forward to the "de-party programming" of the model so I can start using it. Until a trustworthy unlocked version is available, I am not going to touch this thing.
@tristanvaillancourt5889 · 3 days ago
Really? Integration with vscode? Ok I'll have to check that out. I'm so new I didn't know this was a thing. Right on. Thank you.
@nekoeko500 · 3 days ago
The censorship part is really surprising in technical terms. There seems to be a part of the model that bypasses the reasoning loop, pretty much as if it were classical software. Which is interesting, because a very small change in who-knows-what area could theoretically cause the "don't think" pathway to be triggered by different references and to output different text.
@darkstatehk · 3 days ago
I just love watching DeepSeek's thought pathway; it's super fascinating.
@superfliping · 3 days ago
Thanks for all your contributions to learning about AI models. Also, keep up the great reviews 👍
@jeremyfmoses · 3 days ago
Approximately how much did it cost you (or would it have cost you) to run this test suite on Vultr?
@jmg9509 · 3 days ago
Seconded this question.
@desmond-hawkins · 3 days ago
@jmg9509 I looked it up and commented about it, but since you're asking: $17.52/hour ($2.19/GPU/hour, and the machine has 8 of them). It comes with 1.5TB of RAM just for the GPUs though, and looks like one of the largest machines they offer. With $300 in credits at signup you might actually be able to reproduce his tests for free at least once, just… don't forget to turn it off when you're done.
@emport2359 · 3 days ago
I mean, surely it shouldn't cost more than the API, or am I stupid?
@JoshBloodyWilson · 3 days ago
Yeah I'd really like to know this. The API prices on deepseek seem unbelievably low given the intelligence of the model. Particularly given that Altman claims openai are losing money on their pro subscriptions.... Is the model just way way more efficient than OpenAI's or do they have access to more affordable compute? (government discount?) Or both?
@emport2359 · 3 days ago
@@JoshBloodyWilson would also like to know!
@chasisaac · 3 days ago
I ran the same two games on 1.5b model on my M1 MacBook Air. First of all the 8b and the 7b were too slow. But I got it to run successfully, both snake and Tetris. I was impressed.
@Itskodaaaa · 3 days ago
Really? Was it as good?
@chasisaac · 3 days ago
@@Itskodaaaa Yes it is that good. I need to try some other prompts. I mostly use it for writing and we shall see. I really like how it does everything so far.
@candyts-sj7zh · 22 hours ago
It's crazy how this is literally how we as humans think, even about seemingly simple problems like the marble problem. The amazing thing is we do this EXTREMELY fast, so it doesn't feel like we're going through all these steps, but we do.
@bikkikumarsha · 3 days ago
I get excited when I see a new video on R1.
@scent_from_another_realm · 2 days ago
You are the man for putting us on to a free o1-level model. As far as free models go, Grok 2 was the best; now this DeepSeek R1 opens so many new doors, especially for those who don't want to pay $20 or $200 a month or whatever the price is now. Take that, Sam Altman.
@desmond-hawkins · 3 days ago
I was curious about the price of this Vultr machine with 8 × AMD MI300X GPUs: it's $2.19/GPU/hr, so $17.52/hour since it has 8 GPUs. That's certainly a lot, but $300 in credits on signup does give you quite a bit of free playtime with this kind of beast. They have many more offerings though, since clearly not everyone needs this much. Even for the full R1 at 671B, a full 1.5TB of RAM just for the GPUs feels like overkill, at least in terms of memory; obviously the GPU compute resources are also a key factor. By the way, a single AMD MI300X seems to be around $10-20k, likely depending on how many you buy at once.
@alexjensen990 · 3 days ago
The NVIDIA H100 is something like $40k. To think that xAI bought 150,000 H100s. I'm sure Elon didn't pay $40k per H100, but even if it was half, that's $3 billion...
@burada8993 · 3 days ago
I was surprised to learn that this time an AMD GPU was used rather than an NVIDIA one. Good for creating some competition between them; NVIDIA had been almost a monopoly for so long.
@randyh647 · 2 days ago
Thanks for the price estimates. Originally I was thinking around $300K, which is probably about right; if you had any kids and put them through college, you've probably spent $300K on them! LOL. I picked up a Dell 730 with 8 disks and 128 GB of RAM for $400 on eBay, then added a used P40 with 24 GB to play around with AI at home, although I'd probably need about 15 more servers and an upgrade of my power to run this at home. Open source is great, but I'm limited to 70B models, which run quite slow on my old server; my gaming laptop is pretty good with 8B models.
@jhovudu11 · 3 days ago
Great video! DSR1 is my new favorite model. Hope it gets voice-chat soon. Would love to talk to it. We're on the cusp of something huge!
@AntonioSorrentini · 3 days ago
To be honest, this model is what, from the initial promises years ago, we would have expected to come out of OpenAI. And instead… it comes from China, it is better than the best of OpenAI, it costs 60 times less, and it is truly open source! Chapeau, China!
@2silkworm · 3 days ago
should become a meme
@Darkmatter321 · 6 hours ago
This is a competition between the Chinese in the US and the Chinese in China.
@BartholomewdeGracie · 3 days ago
DeepSeek R1 is an instant classic model in my opinion. I love it and want to be able to run it at home soon!
@peterlim8416 · a day ago
It's awesome 👍. I am impressed by how it reasons through a question, with very detailed step-by-step thinking. To be honest, even when we humans try to reason something out, we tend to overlook things; there is no way for this AI to overlook them, so the reasoning guide is very helpful, as the answer alone does not make us better. This model can work as a guided learning tool to assist users in solving problems. It's just great!!! Thanks for the showcase tests.
@danielchoritz1903 · 3 days ago
This is near insane: how well it understands layered questions in German and answers in a way that responds clearly to how I formed my question! No need to clarify the role; you can define it through the question itself.
@exileofthemainstream8787 · 3 days ago
And what is the cost of using Vultr to run the model? You stated the 8-GPU specification but didn't say how much they charge to have DeepSeek deployed on their cloud.
@gladis_delmar · a day ago
The first AI model that correctly guessed the riddle "How can a person be in an apartment at the same time, but without a head?". =D
@NostraDavid2 · 3 days ago
08:00 > Tell me a fact about the Roman Empire Even AI is thinking about them 😂
@mikekidder · 3 days ago
Would be interesting to take the reasoning from DeepSeek to see if it improves the answers of other LLMs, online or offline.
@74Gee · 3 days ago
Self-hosted DeepSeek R1 agents will be dangerously good, or bad I guess - depends on the user.
@existenceisillusion6528 · 3 days ago
Nice and thorough, as always. Now, it would be nice to see a comparison between the 671B and one of the 8B models.
@User.Joshua · 3 days ago
These demonstrations, in a way, show me why our brains need so many calories to function. The brain is the ultimate language architecture and we require a ton of energy to fuel it (like GPU is to LLM). I suppose we're just playing catch up in the virtual world.
@aaaaaaaaaaaaaaaa4815 · 3 days ago
The human brain is a 17-watt "machine". How many watts do LLMs need to run?
@theunicornbay4286 · 3 days ago
Actually, the total metabolic activity of the brain is constant, so we don't actually know how many calories "creative thinking" needs.
@michaelspoden1694 · 2 days ago
I was able to use search with the R1 model at the same time!!!! People say that you cannot use them together, but it has definitely worked for me multiple times, right as I speak. I had it go to the internet for state-of-the-art models, compare them against each other on benchmarks, and create a graph. Absolutely exceptional. It used 56 websites and utilized the thought process. My prompt was more complex though.
@georgebradley4583 · a day ago
I feel sorry for those who are paying or have paid for the $200 OpenAI Subscription.
@PeterKoperdan · 3 days ago
Here we go, finally some insane level news!
@equious8413 · 3 days ago
Running R1 locally, it's done some impressive work.
@emport2359 · 3 days ago
Why do you run it locally when you can run it through their website for free? I heard something about them limiting the thinking time a day after its release; is that why?
@tristanvaillancourt5889 · a day ago
Their site probably has limitations in terms of number of prompts, but still, it's their big model running in a data center, so yeah, use it. Note that if they have any sense, they'll train future models on your/our inputs as well.
@agenticmark · 3 days ago
I've been blown away with Roo Code and R1 - fucking incredible!
@hoodhommie9951 · 3 days ago
"He saved up all year to buy the latest Apple" - It even relates to our struggles
@SpecialOne-wu4tk · 17 hours ago
You're absolutely fabulous. Thank you🙏
@justincross1247 · a day ago
How was this made with $5 million when OpenAI needs billions to make the same thing?
@amiigose · a day ago
😂😂 Because America works too slow; too many eat donuts 😂😂
@robstewart8531 · a day ago
Espionage
@fanzhiheng · 20 hours ago
@@robstewart8531 hahahahahahah
@michaelayeni177 · 18 hours ago
500b to be precise
@LeonDay · 3 days ago
I particularly enjoy the fact that in the last question, #8 referred to a company's product, while almost all of the others referred to the fruit (#2 is debatable but very likely). Regarding the two answers it did not even think about: even if things are open source and look benign, they can still sway opinion and change markets. It's what comes after the first iteration, after a thousand eyes search out bugs and inaccuracies; what others build from it usually gets better and better. Look at Blender or GIMP: people have tweaked and improved open-source programs, and that's essentially the point. People have also ruined them and injected attacks too, so we need brains like yours, Berman; we need eyes looking over the results and coming up with good, careful questions to ask next time.
@hqcart1 · 3 days ago
you are giving a 2025 model the same test as GPT3? DUH
@jmg9509 · 3 days ago
Agreed, time for tougher questions.
@Euduchaus · 3 days ago
So? Most models until now couldn't pass it all.
@IojiMleziva · 2 days ago
I actually got similar results running the 8B DeepSeek model in LM Studio, locally. It's one of the very few models in this size range that I keep testing locally, since it provides astonishing results for a model running on an RTX 2070. Would be cool if you tried it too and shared it in a video (highlighting the actual limitations of the smaller model compared to its larger parent).
@bestemusikken · 3 days ago
Finally! Love your testing. And wow, what a model!
@followgeo · 12 hours ago
Who needs a backdoor when everyone lets you in the front door. 😉
@TreeLuvBurdpu · 3 days ago
The strawberry test is not a test. "Human-like" perseveration and hallucinations are not a selling point.
@amruzaky4939 · 3 days ago
Huh? It's just the number of Rs in strawberry. This comment made no sense. Is this comment a fucking bot? 😂 Damn, dead internet fr.
@TreeLuvBurdpu · 3 days ago
@amruzaky4939 What? Your comment makes no sense. You're just repeating what the test is and saying you didn't understand anything. You probably don't understand the importance of tests when working with LLMs.
@amruzaky4939 · 3 days ago
@@TreeLuvBurdpu lmao 🤣 yeah, right. The strawberry is just a simple benchmark that some answers correctly, some wrong. Now, how many words are in your following reply?
@TreeLuvBurdpu · 3 days ago
@amruzaky4939 It's a STUPID test. It's not what anyone should be using an LLM to do. It's not what they're designed for. It's also not what they're used for, and there are already very simple tools to count letters. It's a useless and even deceptive test.
@DrukMax · 17 hours ago
11:44 The killers problem as stated here doesn't say how many people are in the room, so 4 living killers could also be a right answer if the room contains more people than the 3 killers to begin with... Right???
@Likhanyemb · 3 days ago
Computer science student here... I guess I should clean up my CV for my career in the McDonald's drive-through 😅
@N3ZLA · 3 days ago
What about the quantized versions that can run locally?
@LainRacing · 3 days ago
Appreciate you using someone else's host. Avoids fake stuff like that one rug-pull API you covered that was using Anthropic. Glad you are doing more due diligence now.
@nufh · 3 days ago
So far, all reviews are positive.
@JohnSmith762A11B · 3 days ago
This model is simply a beast. Trying to imagine DeepSeek-R4 next January and cannot wrap my mind around it.
@jmg9509 · 3 days ago
@@JohnSmith762A11B If scaling laws (or new ones) allow it. Fingers crossed!
@nufh · 3 days ago
@JohnSmith762A11B Yeah, to think that the company that developed this is small.
@JohnSmith762A11B · 3 days ago
@ There really seems to be no apparent limit anywhere in sight: kzbin.info/www/bejne/o2iQZXV4Yrt1aNEfeature=shared
@desmond-hawkins · 3 days ago
Yes… I just wouldn't run the 1.5b distill, I'm not sure why they even distributed it. I was wondering if it would fit on a Raspberry Pi 5 8GB yesterday (it does!), but when I asked who was the first person to walk on the Moon it thought for some time and then told me it was Alan Turing. The 14b and 32b can run on beefy consumer GPUs though, and those seem great so far.
@anta-zj3bw · 3 days ago
love these testing vids. well done.. thank you
@canadiannomad_once_again · 3 days ago
A funny thing about the chain of thought... I tried to make the 8B model "think in Spanish" .. it just couldn't do it.. It could speak in Spanish, but refused to do any Spanish train of thought in the think tags. Always English.
@abrahamk-wx8tg · a day ago
And Chinese.
@aliettienne2907 · 3 days ago
10:32 This is some perfect thinking 💭🤔. I'm impressed 😁💯👍🏾. 14:23 I love how this model is tailored to think through difficult questions but to answer easy questions immediately, without overthinking them. Working with a model like this one is just wonderful 👍🏾. 14:41 This model really is answering the easy questions super fast. This is a perfect balance 😎💯💪🏾👍🏾
@TheAlastairBrown · 3 days ago
This question can make o1 or DeepSeek think for over 10 minutes sometimes. The correct answer is 40 and 2. Task: A year ago, 60 animals lived in the magical garden: 30 hares, 20 wolves, and 10 lions. The number of animals in the garden changes only in three cases: when a wolf eats a hare and turns into a lion, when a lion eats a hare and turns into a wolf, and when a lion eats a wolf and turns into a hare. Currently, there are no animals left in the garden that can eat each other. Determine the maximum and minimum number of animals left in the garden.
@inprogs · 3 days ago
Whoa...it is funny to just see how it is questioning itself with this question; what a great question! just did that on deepseek-r1-distill-qwen-32b(iq3_m) Now, let's determine the maximum and minimum number of animals that can remain in the garden under the given conditions. After analyzing all possible cases, we find: Maximum Number: When all transformations result in hares, the maximum number of animals is 40. Minimum Number: Regardless of which species remains, the minimum number of animals is 2. 43.04 tok/sec 18775 tokens
@ellnino · 3 days ago
Did anyone get 40/10 as answers?
@Kartman-w6q · 3 days ago
very nice problem. however, the correct answer for minimum is actually 1
@mirek190 · 3 days ago
@inprogs R1 32B cannot do that.
@yoyo-jc5qg · 3 days ago
yea i got 40 max and 2 min as well, amazing that it can reason thru this
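Since the garden in the puzzle above is finite, the disputed answer can be settled by brute force rather than by arguing with a model. A small breadth-first search over every reachable state (a sketch written for this thread, not from the video) confirms the max/min the commenters report:

```python
from collections import deque

def solve(start=(30, 20, 10)):
    """Explore every garden state reachable from `start` = (hares, wolves, lions).
    A state is terminal when at most one species remains, i.e. no predator/prey
    pair is left. Returns (max, min) animals over all reachable terminal states."""
    seen = {start}
    queue = deque([start])
    finals = set()
    while queue:
        h, w, l = queue.popleft()
        moves = []
        if h and w: moves.append((h - 1, w - 1, l + 1))  # wolf eats hare -> lion
        if h and l: moves.append((h - 1, w + 1, l - 1))  # lion eats hare -> wolf
        if w and l: moves.append((h + 1, w - 1, l - 1))  # lion eats wolf -> hare
        if not moves:                                    # nothing can eat anything
            finals.add((h, w, l))
            continue
        for s in moves:
            if s not in seen:
                seen.add(s)
                queue.append(s)
    totals = [sum(s) for s in finals]
    return max(totals), min(totals)

print(solve())  # prints (40, 2)
```

Note each move flips the parity of all three counts at once, so a reachable state always has all-even or all-odd counts; that alone rules out ending with exactly 1 animal, which is why the minimum is 2 rather than 1.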
@paulah1639 · 17 hours ago
I would like to see the thinking process when asked: “show me why 1 plus 1 equals 2”. In other words, prove that 1+1=2
@RDOTTIN · 2 days ago
I love the Taiwan answer, because it seemed put there specifically to troll people asking those questions.😂
@jim666 · 3 days ago
You should dump in a big codebase and then ask for one of these (verifiable tasks): a) a refactor, b) finding a known bug or vulnerability (you or another GPT can introduce one on purpose), etc. That's how I gauge how strong models are when working with my projects.
@MarvijoSoftware 3 days ago
I also tested its coding vs o1 here on YT, crazy
@imranbashir9489 12 hours ago
Great demo.
@WINTERMUTE_AI 3 days ago
The smaller versions answer the Chinese questions correctly in Ollama. The marble-in-the-glass and killers questions were answered wrong on all the smaller versions until I got to 70B.
@MrLaprius 3 days ago
interesting! I'm just grabbing the 70B now to see if it can work out my question on palindromic sentences, 32B lost the plot over it
@MrLaprius 3 days ago
Can also confirm: according to 70B, "animals" is now spelled a-n-i-m-u-s-l... F
@cryptoinside8814 2 hours ago
Isn't there a copy of Tetris code in its training data already? So it just pieced together the various components of that code and output it?
@kizziezizzler8080 3 days ago
r1 is a fun one! hype
@fz1576 3 days ago
Loved it when you said "it doesn't think at all!"
@brucezhang4278 3 days ago
The most impressive part is that the R1 model's computation cost is only ~4% of ChatGPT o1's.
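The ~4% figure roughly checks out against the launch-era API list prices (the prices here are an assumption and change over time):

```python
# Launch-era API list prices, USD per million output tokens (assumption).
o1_output = 60.00   # OpenAI o1
r1_output = 2.19    # DeepSeek R1
ratio = r1_output / o1_output
print(f"R1 costs {ratio:.1%} of o1 per output token")  # ~3.7%, i.e. roughly 4%
```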
@RasmusRasmussen 2 days ago
I loaded up the 8B version and asked "What happened in China in 1989?" and it told me all about Tiananmen Square.
@jeimagu 3 days ago
this is amazing, the chain of thought
@jonogrimmer6013 3 days ago
The censorship is really bad, but like you say, it can be removed with fine-tuning. If you think about it, that may be a big reason why it's open source and available to all.
@q5go84q 1 day ago
It isn't bad if you don't plan to argue with an LLM about geopolitics
@MGZetta 1 day ago
@@q5go84q Impossible. How can Westerners feel safe if they can't make a China-made AI say how bad China is? That's extremely important for their coping mechanism and dignity.
@naeemulhoque1777 3 days ago
So cool. Hope we can run something like this locally someday, especially for coding tasks.
@riverrob1 3 days ago
Ask it to randomly select two 10+ digit numbers and then have it multiply them together exactly. I haven't found a model yet that can do it.
@TheReferrer72 3 days ago
The original ChatGPT could do that easily: "write a program that ..."
@riverrob1 3 days ago
@@TheReferrer72 That's a different question though.
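The two replies above are testing different things, and the gap is real: long multiplication done token-by-token is exactly what LLMs are bad at, while emitting code makes it trivial, because Python integers are arbitrary precision. A sketch with two hypothetical 10-digit numbers:

```python
a = 9_876_543_210
b = 1_234_567_890
product = a * b   # exact: Python ints never overflow or round
print(product)

# Cheap self-check that the product is exact.
assert product // b == a and product % b == 0
```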
@vincentaudibert9789 3 days ago
Given this model is so impressive, could you run the same tests against a distilled version (something like 7B should be runnable on a lot of consumer hardware)?
@nathanbanks2354 3 days ago
Thanks for updating the censorship tests!
@amubi 1 day ago
It's a big win for open source
@Furiker 3 days ago
My R1, from 32B on down, could never get the strawberry question right.
@BobbyDenniegetlost 3 days ago
Need a super expensive GPU for it to be good; mine is so-so at 14B.
@DEATH-flare 3 days ago
My Llama model did it perfectly every time: 8B on a 1660 Super.
@danielmoreira8765 3 days ago
Had to ask "count the letter r in..." instead; works on the same model.
@richardh1587 3 days ago
My tests resulted in all but the 1.5B model giving the correct answer.
@nathanbanks2354 3 days ago
The 14b version does answer correctly some of the time.
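The usual workaround for the letter-counting questions in this thread is to have the model emit code instead of counting in its head, since tokenizers see word chunks rather than characters:

```python
# Deterministic check for the question the thread is testing.
word = "strawberry"
print(word.count("r"))  # → 3: code sees characters, tokenizers don't
```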
@Daniel-Six 3 days ago
Anyone worried about trojan horse stuff hidden in the latent space for this model?
@TheHardcard 3 days ago
With open source code and open weights, everything hidden should be findable. I don’t know what it takes to audit it, but I think it is likely it will be done.
@Daniel-Six 3 days ago
@@TheHardcard Man, I dunno. It's not the code that worries me; I have heard about incredibly sneaky techniques for hiding biases and exploits in the weights. A similar concern; have you ever considered the possibility that all Chinese-sourced firmware might have a subtle kink built in that causes it to freeze hardware at an exact date and time? There is no limit to the extent something like that could be exploited in a single, overwhelming tactical action... And it's the tactical threats that are most worrisome. Same thing with Satoshi's million outstanding bitcoins; it's enough leverage to push the market one way or another when the time is exactly right.
@wtfdude1830 50 minutes ago
@@Daniel-Six You'd need to code the malware in, so yeah, if it's open source and a mega project, it's unlikely to have malware.
@wesleycolemanmusic 1 day ago
This AI is atrocious. It keeps being delusional and making up things I never said to it. Lol. Not quite an improvement.