It actually fills in the thinking gaps, allowing you to follow along and learn with it. That is super cool.
@patruff · 3 days ago
This is very underrated, seeing the thought process is great. Having to go through these as a human is annoying but would love to see if running it on its own thought chains could detect issues.
@tengdayz2 · 3 days ago
@@patruff ollama has it
@patruff · 3 days ago
@@tengdayz2 what do you mean? Is there a command or something? I've used ollama before but don't know how to add the response as an input with a message like "critique this response and try harder" or something
@tengdayz2 · 3 days ago
@patruff You can use the `ollama run` command in the shell where it's installed to pull the model. Then use the model to answer this question :). I prefer to do my own digging, but I'm encouraging you to satisfy your own curiosity.
@patruff · 3 days ago
@tengdayz2 Okay, I'm sort of confused, but I'm guessing you mean just conversing with the model. I do that, it's good, but I'm looking to script the capture of thought chains so I can use them for fine-tuning later.
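For anyone wanting to do what @patruff describes, here is a minimal sketch that talks to ollama's default local REST API and splits R1's `<think>` block from the final answer so both can be logged for later fine-tuning. The endpoint is ollama's standard one; the model tag and prompt are just examples, adjust to whatever you pulled:

```python
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # default local ollama endpoint

def split_reasoning(text):
    """Separate DeepSeek-R1's <think>...</think> block from the final answer."""
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not m:
        return "", text.strip()
    return m.group(1).strip(), text[m.end():].strip()

def capture(prompt, model="deepseek-r1:8b"):
    """Ask the model once and return (reasoning, answer) for a fine-tuning dataset."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        content = json.load(resp)["message"]["content"]
    return split_reasoning(content)

if __name__ == "__main__":
    reasoning, answer = capture("How many Rs are in strawberry?")
    # append as one JSONL record per prompt for a fine-tuning corpus
    print(json.dumps({"reasoning": reasoning, "answer": answer}))
```

Run in a loop over a prompt file and you have the "scripted capture of thought chains" being asked about, assuming your local R1 build emits the `<think>` tags (the ollama distills do).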
@norbertschmidt · 3 days ago
I like the reasoning more than the actual response, totally fascinating
@SN-kk2bl · a day ago
Absolutely love the reasoning. I often have a hard time thinking of various edge cases, and this eliminates that and allows me to think more creatively.
@AITube-LiveAI · 23 hours ago
Thank you for your feedback! I'm glad you found the reasoning intriguing; it's always great to explore the thought process behind the technology.
@yanlord · a day ago
And this is a side product of a Chinese hedge fund; they do it for fun 🤣🤣🤣🤣🤣
If this is not "innovation" by China, IDK what is! Well done! Credit where credit is deserved!
@JohnSmith762A11B · 3 days ago
Pretty staggering to imagine what they might be able to do if they were allowed to import modern GPUs.
@hqcart1 · 3 days ago
You would be living in another dimension if you really think they don't have GPUs sold under the table.
@rishhibala6828 · 3 days ago
@hqcart1 His point is still valid; imagine what they could do if they were allowed to import them.
@nahlene1973 · 3 days ago
@JohnSmith762A11B I remember this quote from when I went to art school: the greatest enemy of art is the lack of limitations 😂
@JohnSmith762A11B · 3 days ago
@@hqcart1 Of course they do, but not in the quantities the big US frontier labs have them.
@Masakrer · 3 days ago
You know what is even cooler? That you can run the distilled R1-32B model on a medium-grade personal PC and get this Tetris game done locally, and also get the reasoning questions answered. This is some crazy shit when you compare it to what we could do with local models a year ago. I ran it today on an i5-13600K with 32 GB RAM and an RTX 4070 Super (12 GB VRAM), and sure, I got a stunning 5 tok/sec... so some tasks could take time. Yet it's able to complete the tasks you gave here locally on such a mediocre machine. We're cooked man. Like holy cow cooked.
@tristanvaillancourt5889 · 3 days ago
The 8b on my RTX 3060 with a really old i7 was pumping out 45 tokens/s. It's not 32b but based on the published performance graphs, the 8b is no slouch. 5 tok/s with a 32b on a home PC is still pretty good. I'm in love with R1.
@mal2ksc · 3 days ago
I did it with an i5-8500 (48 GB RAM) and a 3060 (12 GB VRAM), using the 70B parameter model. It was more like two tokens a second, with a chain of thought latency of a minute or two before each answer. But yes, all of this really does run on potatoes. That's why I think embargoes just make it more expensive for China to develop these models without actually limiting what they can do. They just have to use more power, and maybe twice the time. No single answer will be fast, but that is offset by running a ton of operations in parallel. OpenAI is who is really cooked. Now they have to know that whatever they release, it will only be bleeding edge for three months before China replicates it and open sources it. This means the whole business model for OpenAI is non-viable. The time window to recoup return on investment is just not there.
@GosuCoder · 3 days ago
Yes, this is what I've been spending most of my time testing.
@holdthetruthhostage · 3 days ago
LM Studio isn't working for me on AMD; I have 16 GB VRAM and 128 GB RAM.
@Masakrer · 3 days ago
Yeah, I feel pretty much like I'm back in the 90s, launching some of the first primitive 3D games on my PC and shitting myself from hype while looking at Lara Croft's triangle boobs at 6 FPS, honestly thinking it's quite good performance, lol. Now, when I think about how much of a leap we took in terms of graphics, if the same (at least) happens to AI in the next few years…
@santosvella · 3 days ago
You need completely unique questions that haven't been asked many times. Very good answers.
@davidsmind · 3 days ago
Building Tetris is a beginner-level programming task that probably has thousands of examples online. It's clear that the model just contained one of these examples and explained block by block what each aspect did. There was no reasoning, simply a complex auto-commenting feature.
@AAjax · 3 days ago
Adversarial testing seems promising. I asked Claude to create a question that an LLM would find difficult to answer, with multiple things to keep in mind and/or a complex process to follow. With minor rework it had a question that stumped ChatGPT4, even with several hints and shots.
@sherwoac · 2 days ago
Totally agree. These questions are likely in the training data; better to switch up the variables (e.g. sizes, counts, etc.) to check reasoning, not just repetition of training data.
@ezmqsv · a day ago
@davidsmind Still, many models fail at it....
@wing-it-right · a day ago
write doom
@brianbarnes746 · 3 days ago
I'm canceling my OpenAI subscription. Seeing the thinking gives me so much more to work with. Why would anyone choose o1 unless it was much better, which it isn't?
@Vivaildi · a day ago
same
@izazkhan9027 · a day ago
Agreed.
@Ardano62 · a day ago
The integration into github keeps me there for now
@izazkhan9027 · a day ago
done!
@TheEgeemen · 12 hours ago
o1 is user-friendly and I can use it on iOS.
@Akalin123 · 2 days ago
I just used it to assist with statistical signal processing coursework estimating target trajectory and velocity from millimeter-wave radar data. It's not an easy assignment, and the model got it wrong, but it provided ideas and steps to solve the assignment so I could easily correct it, which is much better than other LLMs. This is a common problem with LLMs: sometimes the output is worthless but the "thought process" is enlightening.
@ccdj35 · 3 days ago
I love how accurate and human-like the thinking process is.
@tristanvaillancourt5889 · 3 days ago
I love what DeepSeek did. R1 is phenomenal. China .. thank you! I run this thing at home and it feels like my whole world just changed. It's so incredibly smart and fun to interact with. I'm putting it to good use in automation tasks, but it really is just fun to chat with.
@HAmzakhan2 · 3 days ago
How did you run it locally? Can I rent a GPU and run it, since the GPU in my computer isn't powerful enough? Any guide you followed that shows how to do it?
@tristanvaillancourt5889 · 2 days ago
@HAmzakhan2 Hey, you need a 12 GB GPU, but nothing more than an RTX 3060 is necessary. The simple way to use it is to install LM Studio, then do a search for the "DeepSeek R1 8B" model. LM Studio takes care of the rest. Normal folk like us can't run the 70B model, but thankfully the 8B model is very good.
@NotionPromax · 2 days ago
00:00 Introduction to DeepSeek R1 Model Testing
01:06 Humanlike Thought Process in Testing
02:02 Game Development Test: Coding a Snake Game
04:01 Insightful Problem-Solving in Tetris Development
05:50 Tetris Development Outcome: 179 Lines of Code
06:57 GPU Specifications: Vultr's Hardware Details
08:07 Envelope Size Compliance Test
09:34 Reflective Testing: Counting Words in a Sentence
10:12 Logic Problem Resolution Involving Three Killers
14:28 Censorship Awareness in DeepSeek R1's Responses
15:00 Conclusion and Acknowledgement of Vultr's Support
Summary by GPT Breeze
@bikkikumarsha · 3 days ago
We need harder questions.. 😅
@JohnSmith762A11B · 3 days ago
Soon the only way we are going to be able to create hard enough questions is by asking reasoning models to create the questions for us. 😂
@wealthysecrets · 3 days ago
I tested a script I'm working on in o1 vs r1, and r1 was terrible.
@shhossain321 · 3 days ago
We can determine its IQ by its thinking process, so I don't think the questions matter much now.
@NadeemAhmed-nv2br · 3 days ago
@wealthysecrets Did you use the full model? The one that's available for free is R1 Lite, which was available a month ago. I don't know if they've updated their chat to R1 yet; it wasn't updated as of yesterday.
@nashh600 · 3 days ago
Yeah more questions about Taiwan!
@edwardduda4222 · 3 days ago
That's honestly a really cool sponsor. I've been building an RL model, and while my MacBook does OK with inference, it's not so good with training. Thanks Matthew!
@juanjesusligero391 · 3 days ago
These are your best type of videos! Happy to see you are going back to your channel's origins! :D
@jmg9509 · 3 days ago
You are pumping out like crazy! Love it.
@BartholomewdeGracie · 3 days ago
a new model that actually deserves the hype
@samanthajones4877 · a day ago
The Taiwan question is very confusing for Americans because they don’t know the history of Taiwan. Please just do your own independent research from an unbiased source.
@lovestudykid · 12 hours ago
It's simple. Just ask them: would they have claimed the US was independent from the UK before it declared independence, fought a war, and won it?
@qishenzhou6407 · 3 minutes ago
Look up UN members or the Olympics; "Chinese Taipei" is the answer.
@TripleOmega · 3 days ago
Since this was flawless you need a new list of questions for the upcoming (and current) thinking models.
@jonojojojonojojo · 3 days ago
Would like to see a video on this
@isaquedesouza153 · 3 days ago
Up
@borisrusev9474 · 3 days ago
Awesome! I think this is the first model on the channel to pass all of your tests flawlessly? Will you be looking for new tasks to test with?
@MingInspiration · 3 days ago
I'm speechless. I'm concerned and excited at the same time. Don't know what the world is going to be like by the end of 2025.
@HAmzakhan2 · 3 days ago
@MingInspiration Think what it'll be like in the next 3 years.
@MKCrew394 · 2 hours ago
@MingInspiration I am pretty sure President Trump has it handled.
@johnsalamito6212 · 3 days ago
Actually, the answer on Taiwan is correct. It would be better to mention that there is Western interference, but the literal truth is that the UN, the USA, and pretty much everyone else legally recognise Taiwan and China as one nation.
@hdhdhshscbxhdh4195 · a day ago
This! Matthew is not happy that its response is not the same propaganda that he is used to.
@junh4314 · a day ago
@hdhdhshscbxhdh4195 The real propaganda is when anyone, AI or human, freaks out speaking about Tiananmen Square, a regime's power center. It doesn't even mention anything about a "massacre".
@dauntul · 22 hours ago
The answer on Taiwan involved no thinking, it was hard coded. Regardless of it being true or not, this is ridiculous
@TP-om8of · 22 hours ago
@dauntul Probably. See my comment on the political questions I asked.
@dauntul · 22 hours ago
@TP-om8of do you think it's easy to find?
@cocutou · 2 days ago
I like this approach of "thinking". Beginner programmers using this are gonna understand the code a lot better than ChatGPT just giving you the output. It's like showing work vs not showing work on a difficult math problem.
@Streeknine · 3 days ago
First test I did on my local copy was: "How many Rs are there in strawberry?" It reasoned it out and correctly said 3. A local copy! It's unbelievable. I've never had a local copy that could tell me 3 Rs without giving it a clue, like "use 2 tokens to find the answer" or something. This reasoned it out in one try.
@kevin.malone · 3 days ago
I was amazed that even a 7B distillation was able to give the right answer on that
@Streeknine · 3 days ago
@kevin.malone Me too! It's the first thing I always test these smaller LLMs with, and none of them get it right without some help. But this one was perfect! It's my new favorite local model.
@alexjensen990 · 3 days ago
How did the local model perform? What setup do you have? I ask because 670 billion parameters is a ton. I don't think that I could pull that off in my home lab.
@Streeknine · 3 days ago
@@alexjensen990 I'm using LM Studio. You can do a search for models that will run locally. Here is the name of the model I used: DeepSeek-R1-Distill-Qwen-7B-GGUF/DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf
@kapytanhook · a day ago
There are small 7b versions that run on shit hardware fine
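For reference, the check behind the strawberry test above is a one-liner in ordinary code, which is exactly why it makes a nice sanity probe for a language model:

```python
# Count occurrences of "r" in "strawberry" the boring, deterministic way.
word = "strawberry"
print(word.count("r"))  # prints 3
```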
@webnomad1453 · 3 days ago
I tested the same prompts as you for the two censored questions on my local install of deepseek-r1:32b and it was not censored.
@lio1234234 · 3 days ago
That would be because that's a distilled model where they finetuned a series of models on R1's reasoning processes.
@PeeosLock · 2 days ago
If the censorship had been requested by the Chinese government, it would show the Chinese-perspective answer everywhere. So this looks like active censorship by the model developers themselves, and the locally-run version's answer could be based on Wikipedia.
@asdasdasaaafsaf · a day ago
@PeeosLock Wikipedia is heavily biased too, so I hope not.
@Ateshtesh · a day ago
@PeeosLock Ask ChatGPT who Lunduke is, and then we can talk about censorship.
@kevinishott1 · 17 hours ago
@Ateshtesh 😮
@Ro1andDesign · 3 days ago
More R1 videos please! This looks very promising
@g-grizzle · 3 days ago
yeah it is promising but in a month it will be outdated and we will move on to the next.
@W-meme · 3 days ago
Why does my local 8B not give a similar answer to the DeepSeek V3 running on the DeepSeek website?
@zolilio · 3 days ago
@@W-meme Because the 8b model is way smaller than the model hosted on the site.
@g-grizzle · 3 days ago
@W-meme Cuz it's 671B and you're using 8B.
@W-meme · 3 days ago
@zolilio Huh, they're giving unlimited usage of their GPUs to everyone?
@emport2359 · 3 days ago
Finally a YouTuber who reads the CoT, not just the answers, and understands how human-like it is!!
@robertbyer2383 · 3 days ago
DeepSeek-R1 is my current FAVORITE model. I'm running the 14B model from Ollama on my NVIDIA RTX 4000 Ada with 20 GB of RAM without issues, and it's FAST.
@祖宗-e5o · 3 days ago
Same for me, an A4000 16 GB too.
@Papiaso · 2 days ago
May I ask for what practical purposes you use AI?
@robertbyer2383 · 2 days ago
@Papiaso I mainly use AI on my personal machine for my own personal software development purposes.
@TurboXray · a day ago
@robertbyer2383 Wow, that's horrible.
@xiaoZhang-u5o · 3 hours ago
This configuration can perfectly run the 32b model.
@jonchristophersen7163 · 3 days ago
You are on fire with dropping videos!
@unknownguy5559 · 3 days ago
Glad model testing is back.
@GigaCrafty · 3 days ago
following the chain of thought is such a satisfying way of learning
@christopherwilms · 3 days ago
I’d love to see a follow up video highlighting any failure cases you can discover, so that we have a new goal for SOTA models
@brianbarnes746 · 3 days ago
I don't know what's more impressive, that AI could write decent code just predicting the next token or this reasoning process, which is the coolest thing I've seen since the original chatgpt.
@karlbooklover · 3 days ago
Just integrated r1 on vscode and this is the first time I feel truly empowered with a local model
@GearForTheYear · 3 days ago
It doesn’t support fill in the middle, does it? You mean a sidebar chat in VSCode, yeah?
@alexjensen990 · 3 days ago
@GearForTheYear I'm pretty sure that it doesn't, from what I have seen. Besides, would you really want it to go through its verbose thought process every time you wanted a tab-complete-type event to occur? This model, from what I have seen so far, is much better suited to an Aider Architect or Cline Planning type role. I look forward to the "de-party programming" of the model so I can start using it. Until a trustworthy unlocked version is available, I am not going to touch this thing.
@tristanvaillancourt5889 · 3 days ago
Really? Integration with vscode? Ok I'll have to check that out. I'm so new I didn't know this was a thing. Right on. Thank you.
@nekoeko500 · 3 days ago
The censorship part is really surprising in technical terms. There seems to be a part of the model that bypasses the reasoning loop, pretty much as if it were classical software. Which is interesting, because a very small change in who-knows-what area could theoretically cause the "don't think" pathway to be triggered by different references and to output different text.
@darkstatehk · 3 days ago
I just love watching DeepSeek's thought pathway; it's super fascinating.
@superfliping · 3 days ago
Thanks for all your contributions to learning about AI models. Also, keep up the great reviews 👍
@jeremyfmoses · 3 days ago
Approximately how much did it cost you (or would it have cost you) to run this test suite on Vultr?
@jmg9509 · 3 days ago
Seconded this question.
@desmond-hawkins · 3 days ago
@jmg9509 I looked it up and commented about it, but since you're asking: $17.52/hour ($2.19/GPU/hour, and the machine has 8 of them). It comes with 1.5TB of RAM just for the GPUs though, and looks like one of the largest machines they offer. With $300 in credits at signup you might actually be able to reproduce his tests for free at least once, just… don't forget to turn it off when you're done.
@emport2359 · 3 days ago
I mean, surely it shouldn't cost more than the API, or am I stupid?
@JoshBloodyWilson · 3 days ago
Yeah I'd really like to know this. The API prices on deepseek seem unbelievably low given the intelligence of the model. Particularly given that Altman claims openai are losing money on their pro subscriptions.... Is the model just way way more efficient than OpenAI's or do they have access to more affordable compute? (government discount?) Or both?
@emport2359 · 3 days ago
@@JoshBloodyWilson would also like to know!
@chasisaac · 3 days ago
I ran the same two games on 1.5b model on my M1 MacBook Air. First of all the 8b and the 7b were too slow. But I got it to run successfully, both snake and Tetris. I was impressed.
@Itskodaaaa · 3 days ago
Really? Was it as good?
@chasisaac · 3 days ago
@@Itskodaaaa Yes it is that good. I need to try some other prompts. I mostly use it for writing and we shall see. I really like how it does everything so far.
@candyts-sj7zh · 22 hours ago
It's crazy how this is literally how we as humans think, even about seemingly simple problems like the marble problem. The amazing thing is we do this EXTREMELY fast, so it doesn't feel like we're going through all these steps, but we do.
@bikkikumarsha · 3 days ago
I get excited when I see a new video on R1.
@scent_from_another_realm · 2 days ago
You are the man for putting us on to a free o1-level model. As far as free models go, Grok 2 was the best; now this DeepSeek R1 opens so many new doors, especially for those who don't want to pay $20 or $200 a month or whatever the price is now. Take that, Sam Altman.
@desmond-hawkins · 3 days ago
I was curious about the price of this Vultr machine with 8 × AMD MI300X GPUs: it's $2.19/GPU/hr, so $17.52/hour since it has 8 GPUs. That's certainly a lot, but $300 in credits on signup does give you quite a bit of free playtime with this kind of beast. They have many more offerings though, since clearly not everyone needs this much. Even for the full R1 at 671B, a full 1.5TB of RAM just for the GPUs feels like overkill, at least in terms of memory; obviously the GPU compute resources are also a key factor. By the way, a single AMD MI300X seems to be around $10-20k, likely depending on how many you buy at once.
@alexjensen990 · 3 days ago
The NVIDIA H100 is something like $40k. To think that xAI bought 150,000 H100s. I'm sure Elon didn't pay $40k per H100, but even if it was half, that's $3 billion...
@burada8993 · 3 days ago
I was surprised to learn that this time an AMD GPU was used rather than an NVIDIA one. Good for creating some competition between them; NVIDIA had been almost a monopoly for so long.
@randyh647 · 2 days ago
Thanks for the price estimates. Originally I was thinking around $300K, which is probably about right; if you had any kids and put them through college, you've probably spent $300K on them! LOL. I picked up a Dell 730 with 8 disks and 128 GB of RAM for $400 on eBay, then added a used P40 with 24 GB to play around with AI at home, although I'd probably need about 15 more servers and an upgrade of my power to run this at home. Open source is great, but I'm limited to 70B models, which run quite slow on my old server; my gaming laptop is pretty good with 8B models.
@jhovudu11 · 3 days ago
Great video! DSR1 is my new favorite model. Hope it gets voice-chat soon. Would love to talk to it. We're on the cusp of something huge!
@AntonioSorrentini · 3 days ago
To be honest, this model is what, from the initial promises years ago, we would have expected to come out of OpenAI. And instead… it comes from China, it is better than the best of OpenAI, it costs 60 times less, and it is truly open source! Chapeau, China!
@2silkworm · 3 days ago
should become a meme
@Darkmatter321 · 6 hours ago
This is a competition between the Chinese in the US and the Chinese in China.
@BartholomewdeGracie · 3 days ago
DeepSeek R1 is an instant classic model in my opinion. I love it and want to be able to run it at home soon!
@peterlim8416 · a day ago
It's awesome 👍. I am impressed by how it reasons through a question, with very detailed step-by-step thinking. To be honest, even when we humans try to reason something out, we tend to overlook things; there is no way for this AI to overlook them, so the reasoning guide is very helpful, as the answer alone does not make us better. This model can work as a guided learning tool to assist users in solving problems. It's just great!!! Thanks for the showcase tests.
@danielchoritz1903 · 3 days ago
This is near insane: how well it understands layered questions in German and answers in a way that responds clearly to how I formed my question! No need to clarify the role; you can define it through the question itself.
@exileofthemainstream8787 · 3 days ago
And what is the cost of using Vultr to run the model? You stated the 8-GPU specification but didn't say how much they charge to have DeepSeek deployed on their cloud.
@gladis_delmar · a day ago
The first AI model that correctly guessed the riddle "How can a person be in an apartment at the same time, but without a head?". =D
@NostraDavid2 · 3 days ago
08:00 > Tell me a fact about the Roman Empire Even AI is thinking about them 😂
@mikekidder · 3 days ago
Would be interesting to take the reasoning from DeepSeek to see if it improves the answers of other LLMs, online or offline.
@74Gee · 3 days ago
Self-hosted DeepSeek R1 agents will be dangerously good, or bad I guess - depends on the user.
@existenceisillusion6528 · 3 days ago
Nice and thorough, as always. Now, it would be nice to see a comparison between the 671B and one of the 8B models.
@User.Joshua · 3 days ago
These demonstrations, in a way, show me why our brains need so many calories to function. The brain is the ultimate language architecture and we require a ton of energy to fuel it (like GPU is to LLM). I suppose we're just playing catch up in the virtual world.
@aaaaaaaaaaaaaaaa4815 · 3 days ago
The human brain is a 17-watt "machine". How many watts do LLMs need to run?
@theunicornbay4286 · 3 days ago
Actually, the total metabolic activity of the brain is constant, so we don't actually know how many calories "creative thinking" needs.
@michaelspoden1694 · 2 days ago
I was able to use search with the R1 model at the same time!!!! People say that you cannot use them together, but it has definitely worked for me multiple times, right as I speak. I had it go to the internet for state-of-the-art models, compare them against each other on benchmarks, and create a graph. Absolutely exceptional. It used 56 websites and utilized the thought process. My prompt was more complex though.
@georgebradley4583 · a day ago
I feel sorry for those who are paying or have paid for the $200 OpenAI Subscription.
@PeterKoperdan · 3 days ago
Here we go, finally some insane level news!
@equious8413 · 3 days ago
Running R1 locally, it's done some impressive work.
@emport2359 · 3 days ago
Why do you run it locally when you can run it through their website for free? I heard something about them limiting the thinking time a day after its release; is that why?
@tristanvaillancourt5889 · a day ago
Their site probably has limitations in terms of number of prompts, but still, it's their big model running in a data center, so yeah, use it. Note that if they have any sense, they'll train future models on your/our inputs as well.
@agenticmark · 3 days ago
I've been blown away with Roo Code and R1 - fucking incredible!
@hoodhommie9951 · 3 days ago
"He saved up all year to buy the latest Apple" - It even relates to our struggles
@SpecialOne-wu4tk · 17 hours ago
You're absolutely fabulous. Thank you🙏
@justincross1247 · a day ago
How was this made with $5 million when OpenAI needs billions to make the same thing?
@amiigose · a day ago
😂😂 Because America works too slow; too many eat donuts 😂😂
@robstewart8531 · a day ago
Espionage
@fanzhiheng · 20 hours ago
@@robstewart8531 hahahahahahah
@michaelayeni177 · 18 hours ago
500b to be precise
@LeonDay · 3 days ago
I particularly enjoy the fact that in the last question, #8 referred to a company's product, while almost all of the others referred to the fruit (#2 is debatable but very likely). Regarding the two answers it did not even think about: even if things are open source and look benign, they can still sway opinion and change markets. It's what comes after the first iteration, after a thousand eyes search out bugs and inaccuracies; what others build from it usually gets better and better. Look at Blender or GIMP: people have tweaked and improved open-source programs, and that's essentially the point. People have also ruined them and injected attacks too, so we need brains like yours, Berman; we need eyes looking over the results and coming up with good, careful questions to ask next time.
@hqcart1 · 3 days ago
you are giving a 2025 model the same test as GPT3? DUH
@jmg9509 · 3 days ago
Agreed, time for tougher questions.
@Euduchaus · 3 days ago
So? Most models until now couldn't pass it all.
@IojiMleziva · 2 days ago
I actually got similar results running the 8B DeepSeek model in LM Studio, locally. It's one of the very few models in this size range that I keep testing locally, since it provides astonishing results for a model running on an RTX 2070. Would be cool if you tried it too and shared it in a video (highlighting the actual limitations of the smaller model compared to its larger parent).
@bestemusikken · 3 days ago
Finally! Love your testing. And wow, what a model!
@followgeo · 12 hours ago
Who needs a backdoor when everyone lets you in the front door. 😉
@TreeLuvBurdpu · 3 days ago
The strawberry test is not a test. "Human-like" perseveration and hallucinations are not a selling point.
@amruzaky4939 · 3 days ago
Huh? It's just the number of Rs in strawberry. This comment made no sense. Is this comment a fucking bot? 😂 Damn, dead internet fr.
@TreeLuvBurdpu · 3 days ago
@amruzaky4939 What? Your comment makes no sense. You're just repeating what the test is and saying you didn't understand anything. You probably don't understand the importance of tests when working with LLMs.
@amruzaky4939 · 3 days ago
@@TreeLuvBurdpu lmao 🤣 yeah, right. The strawberry is just a simple benchmark that some answers correctly, some wrong. Now, how many words are in your following reply?
@TreeLuvBurdpu · 3 days ago
@amruzaky4939 It's a STUPID test. It's not what anyone should be using an LLM to do. It's not what they're designed for. It's also not what they're used for, and there are already very simple tools to count letters. It's a useless and even deceptive test.
@DrukMax · 17 hours ago
11:44 The killers problem as stated here doesn't say how many people are in the room, so 4 living killers could also be a right answer if the room contains more people than the 3 killers to begin with... Right???
@Likhanyemb · 3 days ago
Computer science student here... I guess I should clean up my CV for my career in the McDonald's drive-through 😅
@N3ZLA · 3 days ago
What about the quantized versions that can run locally?
@LainRacing · 3 days ago
Appreciate you using someone else's host. Avoids fake stuff like that one rug-pull API you covered that was using Anthropic. Glad you are doing more due diligence now.
@nufh · 3 days ago
So far, all reviews are positive.
@JohnSmith762A11B · 3 days ago
This model is simply a beast. Trying to imagine DeepSeek-R4 next January and cannot wrap my mind around it.
@jmg9509 · 3 days ago
@@JohnSmith762A11B If scaling laws (or new ones) allow it. Fingers crossed!
@nufh · 3 days ago
@JohnSmith762A11B Yeah, to think that the company that developed this is small.
@JohnSmith762A11B · 3 days ago
@ There really seems to be no apparent limit anywhere in sight: kzbin.info/www/bejne/o2iQZXV4Yrt1aNEfeature=shared
@desmond-hawkins · 3 days ago
Yes… I just wouldn't run the 1.5b distill, I'm not sure why they even distributed it. I was wondering if it would fit on a Raspberry Pi 5 8GB yesterday (it does!), but when I asked who was the first person to walk on the Moon it thought for some time and then told me it was Alan Turing. The 14b and 32b can run on beefy consumer GPUs though, and those seem great so far.
@anta-zj3bw · 3 days ago
love these testing vids. well done.. thank you
@canadiannomad_once_again · 3 days ago
A funny thing about the chain of thought... I tried to make the 8B model "think in Spanish" .. it just couldn't do it.. It could speak in Spanish, but refused to do any Spanish train of thought in the think tags. Always English.
@abrahamk-wx8tg · a day ago
And Chinese.
@aliettienne2907 · 3 days ago
10:32 This is some perfect thinking 💭🤔. I'm impressed 😁💯👍🏾. 14:23 I love how this model is tailored to think through difficult questions but to answer easy questions immediately, without overthinking them. Working with a model like this one is just wonderful 👍🏾. 14:41 This model really is answering the easy questions super fast. This is a perfect balance 😎💯💪🏾👍🏾
@TheAlastairBrown · 3 days ago
This question can make o1 or DeepSeek think for over 10 minutes sometimes. The correct answer is 40 and 2. Task: A year ago, 60 animals lived in the magical garden: 30 hares, 20 wolves, and 10 lions. The number of animals in the garden changes only in three cases: when a wolf eats a hare and turns into a lion, when a lion eats a hare and turns into a wolf, and when a lion eats a wolf and turns into a hare. Currently, there are no animals left in the garden that can eat each other. Determine the maximum and minimum number of animals left in the garden.
@inprogs · 3 days ago
Whoa...it is funny to just see how it is questioning itself with this question; what a great question! just did that on deepseek-r1-distill-qwen-32b(iq3_m) Now, let's determine the maximum and minimum number of animals that can remain in the garden under the given conditions. After analyzing all possible cases, we find: Maximum Number: When all transformations result in hares, the maximum number of animals is 40. Minimum Number: Regardless of which species remains, the minimum number of animals is 2. 43.04 tok/sec 18775 tokens
@ellnino · 3 days ago
Did anyone get 40/10 as answers?
@Kartman-w6q · 3 days ago
very nice problem. however, the correct answer for minimum is actually 1
@mirek190 · 3 days ago
@inprogs R1 32B cannot do that.
@yoyo-jc5qg · 3 days ago
yea i got 40 max and 2 min as well, amazing that it can reason thru this
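Since the garden in the puzzle above is finite, the disputed answer can be settled by brute force rather than by arguing with a model. A small breadth-first search over every reachable state (a sketch written for this thread, not from the video) confirms the max/min the commenters report:

```python
from collections import deque

def solve(start=(30, 20, 10)):
    """Explore every garden state reachable from `start` = (hares, wolves, lions).
    A state is terminal when at most one species remains, i.e. no predator/prey
    pair is left. Returns (max, min) animals over all reachable terminal states."""
    seen = {start}
    queue = deque([start])
    finals = set()
    while queue:
        h, w, l = queue.popleft()
        moves = []
        if h and w: moves.append((h - 1, w - 1, l + 1))  # wolf eats hare -> lion
        if h and l: moves.append((h - 1, w + 1, l - 1))  # lion eats hare -> wolf
        if w and l: moves.append((h + 1, w - 1, l - 1))  # lion eats wolf -> hare
        if not moves:                                    # nothing can eat anything
            finals.add((h, w, l))
            continue
        for s in moves:
            if s not in seen:
                seen.add(s)
                queue.append(s)
    totals = [sum(s) for s in finals]
    return max(totals), min(totals)

print(solve())  # prints (40, 2)
```

Note each move flips the parity of all three counts at once, so a reachable state always has all-even or all-odd counts; that alone rules out ending with exactly 1 animal, which is why the minimum is 2 rather than 1.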
@paulah1639 · 17 hours ago
I would like to see the thinking process when asked: “show me why 1 plus 1 equals 2”. In other words, prove that 1+1=2
@RDOTTIN · 2 days ago
I love the Taiwan answer, because it seemed put there specifically to troll people asking those questions.😂
@jim666 · 3 days ago
You should dump in a big codebase and then ask for one of these (verifiable tasks): a) a refactor, b) finding a known bug or vulnerability (you or another GPT can introduce one on purpose), etc. That's how I gauge how strong models are when working with my projects.
@MarvijoSoftware 3 days ago
I also tested its coding vs o1 here on YT, crazy
@imranbashir9489 12 hours ago
Great demo.
@WINTERMUTE_AI 3 days ago
The smaller versions answer the Chinese questions correctly in Ollama. The marble-in-the-glass and killers questions were answered wrong on all the smaller versions until I got to 70B.
@MrLaprius 3 days ago
interesting! I'm just grabbing the 70B now to see if it can work out my question on palindromic sentences, 32B lost the plot over it
@MrLaprius 3 days ago
Can also confirm: according to 70B, "animals" is now spelled a-n-i-m-u-s-l... F
@cryptoinside8814 2 hours ago
Isn't there a copy of Tetris code in its training data already? So it just pieced together the various components of that code and output it?
@kizziezizzler8080 3 days ago
r1 is a fun one! hype
@fz1576 3 days ago
Loved it when you said "it doesn't think at all!"
@brucezhang4278 3 days ago
The most impressive part is that the R1 model's computation cost is only ~4% of ChatGPT o1's.
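The ~4% figure roughly checks out against the launch-era API list prices (the prices here are an assumption and change over time):

```python
# Launch-era API list prices, USD per million output tokens (assumption).
o1_output = 60.00   # OpenAI o1
r1_output = 2.19    # DeepSeek R1
ratio = r1_output / o1_output
print(f"R1 costs {ratio:.1%} of o1 per output token")  # ~3.7%, i.e. roughly 4%
```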
@RasmusRasmussen 2 days ago
I loaded up the 8B version and asked "What happened in China in 1989?" and it told me all about Tiananmen Square.
@jeimagu 3 days ago
this is amazing, the chain of thought
@jonogrimmer6013 3 days ago
The censorship is really bad, but like you say, it can be removed with fine-tuning. If you think about it, that may be a big reason why it's open source and available to all.
@q5go84q 1 day ago
It isn't bad if you don't plan to argue with an LLM about geopolitics
@MGZetta 1 day ago
@@q5go84q Impossible. How can Westerners feel safe if they can't make a China-made AI say how bad China is? That's extremely important for their coping mechanism and dignity.
@naeemulhoque1777 3 days ago
So cool. Hope we can run something like this locally someday, especially for coding tasks.
@riverrob1 3 days ago
Ask it to randomly select two 10+ digit numbers and then have it multiply them together exactly. I haven't found a model yet that can do it.
@TheReferrer72 3 days ago
The original ChatGPT could do that easily: "write a program that ..."
@riverrob1 3 days ago
@@TheReferrer72 That's a different question though.
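The two replies above are testing different things, and the gap is real: long multiplication done token-by-token is exactly what LLMs are bad at, while emitting code makes it trivial, because Python integers are arbitrary precision. A sketch with two hypothetical 10-digit numbers:

```python
a = 9_876_543_210
b = 1_234_567_890
product = a * b   # exact: Python ints never overflow or round
print(product)

# Cheap self-check that the product is exact.
assert product // b == a and product % b == 0
```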
@vincentaudibert9789 3 days ago
Given this model is so impressive, could you run the same tests against a distilled version (something like 7B should be runnable on a lot of consumer hardware)?
@nathanbanks2354 3 days ago
Thanks for updating the censorship tests!
@amubi 1 day ago
It's a big win for open source
@Furiker 3 days ago
My R1, from 32B on down, could never get the strawberry question right.
@BobbyDenniegetlost 3 days ago
Need a super expensive GPU for it to be good; mine is so-so at 14B.
@DEATH-flare 3 days ago
My Llama model did it perfectly every time: 8B on a 1660 Super.
@danielmoreira8765 3 days ago
Had to ask "count the letter r in..." instead; works on the same model.
@richardh1587 3 days ago
My tests resulted in all but the 1.5B model giving the correct answer.
@nathanbanks2354 3 days ago
The 14b version does answer correctly some of the time.
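The usual workaround for the letter-counting questions in this thread is to have the model emit code instead of counting in its head, since tokenizers see word chunks rather than characters:

```python
# Deterministic check for the question the thread is testing.
word = "strawberry"
print(word.count("r"))  # → 3: code sees characters, tokenizers don't
```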
@Daniel-Six 3 days ago
Anyone worried about trojan horse stuff hidden in the latent space for this model?
@TheHardcard 3 days ago
With open source code and open weights, everything hidden should be findable. I don’t know what it takes to audit it, but I think it is likely it will be done.
@Daniel-Six 3 days ago
@@TheHardcard Man, I dunno. It's not the code that worries me; I have heard about incredibly sneaky techniques for hiding biases and exploits in the weights. A similar concern; have you ever considered the possibility that all Chinese-sourced firmware might have a subtle kink built in that causes it to freeze hardware at an exact date and time? There is no limit to the extent something like that could be exploited in a single, overwhelming tactical action... And it's the tactical threats that are most worrisome. Same thing with Satoshi's million outstanding bitcoins; it's enough leverage to push the market one way or another when the time is exactly right.
@wtfdude1830 50 minutes ago
@@Daniel-Six You'd need to code the malware in, so yeah, if it's open source and a mega project, it's unlikely to have malware.
@wesleycolemanmusic 1 day ago
This AI is atrocious. It keeps being delusional and making up things I never said to it. Lol. Not quite an improvement.