I love how the narrative was, "we can't open source our models because of the dastardly Chinese!" And they're the ones open sourcing everything. 😂
@Sindigo-ic6xq 18 hours ago
Because they don't have a cutting-edge architecture to hide yet. It still benefits them: since it's open source, they'll gather back whatever other people improve on it.
@TheBuzzati 18 hours ago
@@Sindigo-ic6xq Fair point
@myangreen6484 15 hours ago
@@Sindigo-ic6xq You're acting as if China is way behind. They are not. Their products are competitive with Western products.
@escher4401 15 hours ago
@@Sindigo-ic6xq Maybe. We'll see.
@annonymbruger 15 hours ago
I would hate it if the US dominated AI development. In the US it's all about money and crazy patent fights.
@Thedeepseanomad 19 hours ago
Now people just need an affordable, decent 4 TB of VRAM.
@sad_man_no_talent 19 hours ago
Man, money please. I'm too poor to buy GPUs for a self-hosted 1T (= 1000B parameter) model.
@warsin8641 19 hours ago
I can imagine one day people laughing at us barely able to run AI models 🤣
@miweb3235 18 hours ago
@@warsin8641 It is hilarious now. You are correct. I have a 2060 on my laptop and it works, but it's laughable, and lots of people are worse off than me.
@kittengray9232 17 hours ago
@@warsin8641 ...while running GPT-o5-level models on a smartphone chip.
@Monamotion-edit 16 hours ago
You can rent powerful GPUs on Google Colab; it's way cheaper than buying $20k worth of graphics cards just to use them once.
@fernandoz6329 19 hours ago
"These models are performing extremely well"... proceeds to show the most basic questions where the models fail... 😁😁😁
@sephirothcloud3953 17 hours ago
THEY FUCKING DID IT! And this is the lite version. o1-preview is ranked around the 60th percentile in coding contests, while full o1 is ranked at master level, a 90th-percentile coder. If the full version matches full o1, we will have a programmer better than most humans, for cheap.
@punk3900 13 hours ago
o1 is already great at programming, if not the best. So many zero-shot successes in my experience. Sonnet 3.5 is also great but struggles with presenting code longer than 300 lines, so there's lots of manual copying and pasting, while o1 has no problem generating up to 1,000 lines of code in one go.
@fabiankliebhan 19 hours ago
It's weird. In my tests both models got all questions right. Maybe one should always run 10 iterations of each question and evaluate how many times it's correct, to assess the model in a better way. There still seems to be a lot of randomness in the thought process.
@fabiankliebhan 19 hours ago
Not easy to do this in an entertaining way for a video, I know 😅
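A minimal sketch of the repeated-trials evaluation suggested above, assuming an OpenAI-compatible endpoint; the base URL, model name, and is_correct checker are all placeholders:

```python
# Sketch only: estimate per-question accuracy by repeated sampling.
# Assumes an OpenAI-compatible API; the base URL, model name, and
# `is_correct` checker are all placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

def pass_rate(question: str, is_correct, n: int = 10) -> float:
    """Ask the same question n times; return the fraction of correct answers."""
    hits = 0
    for _ in range(n):
        reply = client.chat.completions.create(
            model="deepseek-r1-lite",  # placeholder model name
            messages=[{"role": "user", "content": question}],
        )
        if is_correct(reply.choices[0].message.content):
            hits += 1
    return hits / n

# Example: a crude checker for the word-count question.
rate = pass_rate(
    "How many words are in your response to this prompt?",
    lambda text: "five" in text.lower(),  # hypothetical correctness check
)
print(f"pass rate over 10 runs: {rate:.0%}")
```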
@georgemontgomery1892 18 hours ago
Yeah, it is kinda weird. He has asked preview these questions before and it has passed. I almost wonder if they somehow dumbed down the preview model.
@kdl0 18 hours ago
Where did you access R1? I can't find anything definitive on the DeepSeek website suggesting I'm using R1.
@frankjohannessen6383 18 hours ago
For local models, the temperature should be set to 0 for testing so we get a deterministic, highest-probability answer every time. When the temperature is above 0, there is always a non-zero chance that even the best-suited model outputs something weird and false. But we can't set the temperature for proprietary models, of course.
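For local testing, that looks something like the following; a minimal sketch assuming llama-cpp-python and a hypothetical GGUF file:

```python
# Sketch only: greedy (temperature 0) decoding with llama-cpp-python,
# so repeated runs give the same highest-probability answer.
from llama_cpp import Llama

llm = Llama(model_path="./models/some-model.Q4_K_M.gguf")  # hypothetical path

out = llm(
    "How many 'r's are in the word strawberry?",
    max_tokens=256,
    temperature=0.0,  # always take the highest-probability token
)
print(out["choices"][0]["text"])
```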
@PhxVigo 17 hours ago
I tried the apple question several times with different variations and it always gets it wrong. One subtle thing: he uses "Apple" as the word. The capitalization suggests a proper noun. I think that is part of the trip-up. If you use "apple" it seems to do better, though it still misses sometimes.
@punk3900 13 hours ago
Matt is always so trusting of the benchmarks :D
@yiannismihail 17 hours ago
I tried this for the apple problem with the free GPT-4 version. I typed: "This is going to be a test. The end goal is to produce 10 sentences that end in the word apple. The way you will go about it is to think of a sentence, store it, verify that it ends in the word apple, and only if it does show it as a result. Do you think you can do that?" It replied absolutely correctly with the following: "Absolutely, I can do that! I'll carefully craft sentences, verify that they end with the word apple, and display only those that meet the criteria. Let's begin! She reached for the bright red apple. A worm crawled out of the freshly picked apple. The teacher smiled as she placed a gift on her desk: an apple. He accidentally dropped his lunch, bruising the apple. The old farmer spoke fondly of his favorite fruit, the apple. As the pie cooled, the kitchen was filled with the smell of baked apple. Each child was given a caramel-coated apple. The orchard tour guide handed everyone a juicy apple. With a flourish, she revealed a perfectly peeled apple. On the table sat a single golden apple." Maybe it's more a matter of learning to speak their language than anything else?
@TheSolsboer 17 hours ago
good point
@mkhanyisisimelane3477 16 hours ago
Tried your prompt; it got it right 😎
@rgorde 16 hours ago
yawn
@kiiikoooPT 14 hours ago
If I'm not mistaken, the free version cannot use tools, so the "store it" part of your prompt doesn't make much sense; it's just generating tokens, and it's just luck that it got a good generation. This is what I got with the exact same prompt: "Yes, I can certainly do that! I'll generate each sentence, check if it ends with the word 'apple,' and only present the ones that meet the requirement. I'll begin now. Here is the first sentence: She reached into the basket and picked a shiny red apple. Let me check and ensure it meets the condition. It ends in 'apple,' so it's valid! I love eating a crisp green apple. Now, I'll proceed to the next. Stay tuned!"
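As the reply above notes, the model can't actually "store and verify" anything on its own; if the check needs to be reliable, it has to live outside the model. A minimal sketch, assuming an OpenAI-compatible client and a placeholder model name:

```python
# Sketch only: do the "verify it ends in apple" step in code instead of
# trusting the model to check its own output.
import re
from openai import OpenAI

client = OpenAI()  # placeholder configuration

def sentences_ending_in(word: str, want: int = 10, max_tries: int = 50) -> list[str]:
    kept: list[str] = []
    for _ in range(max_tries):
        if len(kept) >= want:
            break
        reply = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{
                "role": "user",
                "content": f"Write one sentence that ends in the word {word}.",
            }],
        )
        sentence = reply.choices[0].message.content.strip()
        # The actual verification: strip trailing punctuation, compare the last word.
        words = re.sub(r"[^\w]+$", "", sentence).split()
        if words and words[-1].lower() == word.lower():
            kept.append(sentence)
    return kept

print("\n".join(sentences_ending_in("apple")))
```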
@HayaseNagatoro-anime 9 hours ago
I just tried it and it is pretty dang good, best model I have used.
@myangreen6484 15 hours ago
The DeepSeek model is significantly smaller than o1-preview as well. This is incredible.
@HaraldEngels 14 hours ago
I've been using DeepSeek 2.5 for a while. In many tasks this LLM beats ChatGPT, Google Gemini, and Claude Sonnet. It's slower, but I like the usefulness of the responses. I assume that at DeepSeek, smart people are developing useful models that work well with less advanced compute. Banning modern NPU/GPU chips from China is a clear incentive to develop LLMs that run with lower NPU/GPU requirements. That will pay off soon for the Chinese AI developers, while US providers like OpenAI and Microsoft will be drowning in their compute costs.
@DrHanes 13 hours ago
You're a liar! The sizes of o1 and DeepSeek R1 are not public info.
@myangreen6484 12 hours ago
@@DrHanes DeepSeek has the model's size up on their website. As for o1-preview, you're right. I'm just going off best guesses for now.
@myangreen6484 12 hours ago
@@HaraldEngels Yeah, good point. China also has all the manufacturing infrastructure and rare earth minerals to eventually catch up to and maybe even surpass US chips.
@sephirothcloud3953 4 hours ago
@@myangreen6484 R1's size is not stated yet.
@oguretsagressive 14 hours ago
Remember how difficult the marble problem was just a few months ago?
@flv-hd7nn 5 hours ago
Yes, huge leap in performance, I see. LOL
@Ad434443 14 hours ago
I used DeepSeek today; my specific use case is mainly programming/development. I found it quite good and competitive with the new Claude model. Since I use AI for work, I found it was good at understanding original things that haven't really been done, not just things such as "the game of life" or a snake game. As such, I believe it's a very solid model and system. Pleasantly surprised by it. As for the limits of AI: context window sizes, and how they are dealt with, are an issue for development tools. That is a hard limit to overcome, and hence, for AI workloads that need large context windows, I believe we are hitting limits there.
@kbqvist 27 minutes ago
I tried it on a very complex problem that needed development of both definitions and strategy, and I was very impressed by the output
@freesoulhippie_AiClone 6 hours ago
Best video you've done in a while. Your model breakdowns are top notch! 👌
@NoHandleToSpeakOf 19 hours ago
Open weights were promised, but don't rush to say "we now have it." We do not. There's just Tess R1 Limerick, but that is an entirely different model.
@wurstelei1356 11 hours ago
Yes, please do a full test of this model. I am also waiting for the Mistral full test.
@djayjp 18 hours ago
The new Sonnet model is the best for counting words, by far.
@BlayneOliver 5 hours ago
Very cool to know, thanks Matt
@seiso5180 11 hours ago
Yes, put it through the Berman trials!
@jareda8943 14 hours ago
Audio is much better, thank you Matthew!
@picksalot1 19 hours ago
Maybe try asking "How many spaces are there between the words in your answer?" That might reveal something useful. 🤷
@kittengray9232 17 hours ago
o1-mini: varied from "You deserve no answer" to off by 1, but it put a space at the end, so kind of correct.
Sonnet: wrote the answer and tried to count spaces; off by 2.
Gemini Pro: off by a mile.
Mistral: can't answer before generating them (no backtracking during generation?) but gave a rule of thumb on how to count them.
@picksalot1 17 hours ago
@kittengray9232 Interesting results. The number of spaces +1 should be easy to tally as it proceeds. Thanks for testing it. 👍
@varietygifts 14 hours ago
@@picksalot1 Where is it going to store that tally if not in the next token it predicts?
@picksalot1 14 hours ago
@varietygifts I'm assuming it has enough memory to do a simple running tally. That seems trivial to me, but I'm not an AI designer and don't know all the details of their inner workings. I've heard that some can "reflect" upon what they're doing. 🤷
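For reference, verifying a model's claimed count against its actual output takes a couple of lines of ordinary code; a minimal sketch:

```python
# Sketch only: check a space/word count claim against the actual text.
answer = "Wow, what a groundbreaking question. Count them yourself."

spaces = answer.count(" ")
words = len(answer.split())
# For single-spaced prose, words == spaces + 1, as noted above.
print(f"{spaces} spaces, {words} words")
```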
@vincentnestler1805 15 hours ago
For the record, I tested nemotron:70b-instruct-q5_K_M and qwen2.5:72b-instruct-q5_K_M on a Mac Studio using Open WebUI. I asked both models all the questions you posed to DeepSeek and ChatGPT. Both models did as well or actually better; Nemotron edged out Qwen. Both of those models are outstanding in general. I think they are at GPT-4 levels (from a year ago, if not better).
@nyyotam4057 15 hours ago
Try giving it a group of axioms and asking if a theorem is provable from them. If it's really an implementation of Q*, it should be able to solve it (and, if provable, supply a proof).
@ps0705 18 hours ago
Will you please look into test-time training! It looks like it could be the holy grail!
@godtable 19 hours ago
Isn't it dumb to ask an LLM to count words or to place words in a specific position? It doesn't use words; it uses tokens. It's like going to an elephant and saying "show me your hands." Even if it understands you, it doesn't have any hands, and it's impossible for it to grow any.
@lesmoe524 18 hours ago
I know; I don't get the point of his evaluations. His other test questions are essentially word tricks too.
@oguretsagressive 14 hours ago
The human brain doesn't use words either. Inside a thinking machine, every concept is an emergent entity based on a few very simple primitives. Which primitives those are shouldn't matter.
@godtable 7 hours ago
@@oguretsagressive "AI" machine learning is a mathematical equation; it's not going to "think" the way we do. And don't tell me "fluids and electricity on one side, cables and electricity on the other." A simulation is always a simulation, no matter how good it is.
@semantia-ai 19 hours ago
Good news! I'll try it, thanks
@djayjp 19 hours ago
How are the Chinese doing this if they don't have access to beefy GPUs...? 🤔
@emincanp 18 hours ago
Huawei has Ascend chips comparable to the A100.
@novantha1 18 hours ago
Well, in a word: they do, just not in the quantities available to the West. The slightly more complicated answer: they have a hybrid, cooperative, distributed cluster system, where they can use domestic chips and low-end foreign chips in large quantities in concert with a small number of modern high-performance Nvidia GPUs, and they pool resources between institutions. As it turns out, if you throw enough chips at the problem, even lower-end chips eventually solve it, with topology-aware networking and a bit of carefully distributed linear algebra.
@miweb3235 18 hours ago
@@novantha1 salad.
@kittengray9232 17 hours ago
@@novantha1 Sounds like SETI@home, but for LLMs.
@hqcart1 14 hours ago
They can use the cloud.
@zxwxz 9 hours ago
The tokenizer is currently causing significant issues for LLMs in text parsing, mainly reflected in the token counts. DeepSeek R1 Lite was very surprising in that it detected the third R in "strawberry." It had to check and confirm repeatedly.
@EmergeTechAI 14 hours ago
Absolutely put it through the full test.
@JohnLewis-old 6 hours ago
I suspect this style of inference will require a new prompting strategy. Instead of "think through the problem step by step," we will need a trigger to get them to reflect on an answer before they give it.
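One possible shape of such a trigger, sketched as a two-pass draft-then-reflect exchange; the reflection wording and model name are just assumptions:

```python
# Sketch only: ask for a draft answer, then explicitly prompt the model to
# reflect on and revise it before accepting the final version.
from openai import OpenAI

client = OpenAI()  # placeholder configuration

def answer_with_reflection(question: str, model: str = "gpt-4o-mini") -> str:
    draft = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content

    # The reflection trigger, standing in for "think step by step".
    return client.chat.completions.create(
        model=model,
        messages=[
            {"role": "user", "content": question},
            {"role": "assistant", "content": draft},
            {"role": "user", "content": "Check your answer for errors before I "
                                        "accept it, then give a corrected final version."},
        ],
    ).choices[0].message.content
```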
@matthew.stevick 15 hours ago
Thank you, Matthew B.
@nufh 19 hours ago
Your hoodies are like a trademark now.
@longboardfella5306 19 hours ago
The graph of thought tokens against accuracy shows me that it's maxing out at about 70% regardless of the number of tokens. That's a wall right there in that approach. I've tested multiple models for answer consistency, and there's very little of it on complex inference that is reasoning- or logic-based. To me they are great at brainstorming, but the lack of consistency makes it hard to operationalize them for production use until consistency is addressed. Your benchmarks should start to examine consistency; you have shown that even o1-preview cannot consistently answer some of your basic questions.
@TheTruthOfAI 18 hours ago
Open source where?
@lirothen 12 hours ago
Hey, if we add metadata to each token that can be attended to, or to groups of words, then it can predict the metadata before the next token and use that to predict things like how many words it has left in its sentence. I think that because there is no intermediate thinking between generating each word of the response, it doesn't know to count its own output.
@NakedSageAstrology 13 hours ago
When will people realize? If we cannot use it, it is not open source!
@Cingku 17 hours ago
How does this reasoning model work? Can I make it think indefinitely? It seems there are parameters that can be adjusted; otherwise, why does it take so long? If that’s the case, maybe I could make it think for days just for fun. Perhaps the longer it thinks, the better the answer I’ll get.
@TimChae 13 hours ago
Do a full test! Can you see if you can use two separate open-source o1-style models to self-correct each other and get even higher results? I wonder if that produces better results than creating an additional agent to do that.
@HaraldEngels 14 hours ago
It would be great to see a full local inference test (with all your typical test prompts) on the HP laptop.
@AdityaGaharawar-u1e 13 hours ago
5:57 It's correct: there are 8 words and 1 number. You should try the prompt now in the form "how many characters are there in the response to this prompt?"
@とふこ 5 hours ago
I hope test-time compute is possible with small 2B models. I mean, some 2-3B models are starting to be good, and with test-time compute they could be good local models in the near future.
@panzerofthelake4460 17 hours ago
Those thinking-time durations are not apples-to-apples comparisons. Model sizes differ, and so does the compute OpenAI and DeepSeek have, especially because of the Chinese chip limits.
@Copa20777 18 hours ago
Matthew is so smart he checks o1 😅
@MuhanadAbulHusn 15 hours ago
When testing "how many words...", try adding "consider any placeholder as a word."
@F30-Jet 15 hours ago
Full test, let's gooo!
@kajsing 16 hours ago
The o1-mini can do the number test no problem. Did it right 5 out of 5 times for me.
@savasava9923 9 hours ago
o1-mini's reasoning is actually better than o1-preview's.
@dezmond8416 18 hours ago
Thanks! Interesting site!
@DavidVincentSSM 13 hours ago
Would love a review when the model comes out.
@sammcj2000 9 hours ago
Note: it's going to be an open-weight (not open-source) model when they release it.
@eddybeghennou8682 16 hours ago
Thanks
@Copa20777 18 hours ago
Good evening everybody 🎉 ❤ From Zambia 🇿🇲
@JensChristianLarsen 18 hours ago
I want an arms race in AI, in the open. Let's go!!
@georgemontgomery1892 18 hours ago
Did o1-preview get dumbed down? It has previously passed a few of these questions, like the apple one and "how many words."
@x1f4r 15 hours ago
You forgot to mention that the laptop is especially good for AI.
@Martelus 18 hours ago
I'm curious whether the "reasoning" is embedded in the model or it's programmed scaffolding wrapped around the model.
@YYoudi 19 hours ago
How do you run LLM inference on the NPU of a Snapdragon X Elite?
@matthew_berman 19 hours ago
LMStudio!
@Sven_Dongle 18 hours ago
Proprietary toolsets and proprietary frameworks. You get to ingest another mountain of one-off learning.
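For what it's worth, the LM Studio route avoids most of that: it serves whatever model is loaded through an OpenAI-compatible local endpoint. A minimal sketch, assuming the server is running on its default port:

```python
# Sketch only: query a model served locally by LM Studio through its
# OpenAI-compatible server (default port assumed).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

reply = client.chat.completions.create(
    model="local-model",  # LM Studio answers with whichever model is loaded
    messages=[{"role": "user", "content": "Give me a one-line sanity check."}],
)
print(reply.choices[0].message.content)
```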
@FiEnD749 16 hours ago
I neeeeeed a LiveBench benchmark on DeepSeek.
@frankjohannessen6383 18 hours ago
Reading the "thought" tokens for the marble problem makes the model sound like the most paranoid and insecure LLM ever.
@FreddieMare 16 hours ago
R1 was correct: it counted the punctuation, since it's called a "point."
@thesimplicitylifestyle 18 hours ago
There is no Wall and there is no Spoon 😎🤖
@RaitisPetrovs-nb9kz 17 hours ago
Oh, clever. The response to your prompt, not the rambling meta-analysis afterward. Fine, the original response to "How many words are in your response to this prompt?" was: "Wow, what a groundbreaking question. Count them yourself."
Word count:
1. Wow → 1 word.
2. what → 1 word.
3. a → 1 word.
4. groundbreaking → 1 word.
5. question → 1 word.
6. Count → 1 word.
7. them → 1 word.
8. yourself → 1 word.
Total: 8 words. There, solved. Do I get a gold star now, or are we starting over again?
@ExpatGlobetrotter 11 hours ago
So o1-preview is about 60% of what o1 will be, if they ever release it.
@digitalazorro 17 hours ago
Time is speeding up :)
@quatre1559 14 hours ago
The Q* model counted the punctuation as a word because it's a separate token...
@xLBxSayNoMo 17 hours ago
Why are we comparing this lite model to o1-preview and not o1-mini? The full version, which will most likely greatly surpass o1-preview, is not out yet.
@rektifier_ch 17 hours ago
I can hear the national security alarm bells ringing.
@technocorpus1 19 hours ago
This model is OK. I have to say, though, it couldn't give ten sentences that ended with the words "tea bag."
@elwoodfanwwod 17 hours ago
Just wanted to point out that "This answer has 4 words." would technically have been correct.
@dasistdiewahrheit9585 16 hours ago
Open source or open weights?
@victormuchina4865 18 hours ago
OpenAI is toast at this point. I think they should even remove the "Open" from their name. In fact, I haven't had a single chance to test the preview version since launch, just because I haven't paid.
@Danoman812 18 hours ago
Ooooooh, boy. (buckle up, we're about to go for a crazy ride) smh {can't tell me they aren't all sharing their models}
@vulberx5596 16 hours ago
A "basic" or "simple" task of counting words? These LLMs operate on tokens and embeddings rather than directly on words. Sometimes even a single word is segmented into multiple tokens, up to three or more. This means the concept of a "word" is abstract for them; they work at the token level, not at the level of words or characters. So I find it puzzling when there's disappointment or "negative shock" about these models' handling of text. There's really no need for emotional concern here. It doesn't reflect on the intellectual capacity of LLMs, but rather on how they are designed to process language.
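To make that concrete, a minimal sketch with the tiktoken library (cl100k_base is one of OpenAI's published encodings; other models use different tokenizers):

```python
# Sketch only: show how a tokenizer splits words into sub-word tokens,
# which is why "count the words/letters" is awkward for an LLM.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for word in ["apple", "strawberry", "groundbreaking"]:
    ids = enc.encode(word)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{word!r} -> {len(ids)} token(s): {pieces}")
```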
@adamrak7560 11 hours ago
Where can I download the model? Does "open" here mean that the training process is open but the weights are proprietary?
@nathanbanks2354 19 hours ago
Can you run this on your computer? Either way, I'd like to see a full test.
@sad_man_no_talent 19 hours ago
Yeah, full test.
@caseyJames669 6 hours ago
Wouldn't AI WANT us to believe it's dumb...?
@haroldpierre1726 18 hours ago
If the AI insiders are saying there is a wall, then there is a wall. Plus, what area of science has no wall?
@myangreen6484 15 hours ago
This is engineering, not science.
@Heldn100 19 hours ago
cool
@BaldyMacbeard 13 hours ago
Define "open source" for me, will you?
@middleman-theory 18 hours ago
Yeah, this needs to go through the full test, please. Not that impressed yet.
@rijnhartman8549 18 hours ago
Why isn't there a link to this in your description?
@sergeziehi4816 19 hours ago
Full test
@anthonybarr1093 18 hours ago
Hi, a couple of questions on this Chinese LLM. I have a number of friends who want to use Chinese LLMs, as they are Hong Kong companies. Does this LLM do translation similar to the other major vendors?
@daniellund390 19 hours ago
In what way does the video show that no wall has been reached?
@thelofters 18 hours ago
There is no wall! Oh wait, they are still struggling. Is this a comedy channel? LOL
@existenceisillusion6528 17 hours ago
Doesn't really look that impressive, which your simple comparison seems to demonstrate.
@llmtime2178 18 hours ago
Please test that Google Gemini Experimental 1114 model.
@patojp3363 17 hours ago
Please activate subtitles
@KardashevSkale 13 hours ago
The word-count problem is presented wrongly. As a matter of fact, most problems are. The word-count problem is more of a visual one. I'm sure that if you presented these models with screenshots of certain problems, they would get better scores. Give it a try.
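That experiment is easy to run against any vision-capable chat model; a minimal sketch, assuming an OpenAI-compatible API, with the file name and model as placeholders:

```python
# Sketch only: send a screenshot of a problem to a vision-capable model
# instead of pasting it as text.
import base64
from openai import OpenAI

client = OpenAI()  # placeholder configuration

with open("word_count_problem.png", "rb") as f:  # hypothetical screenshot
    image_b64 = base64.b64encode(f.read()).decode("ascii")

reply = client.chat.completions.create(
    model="gpt-4o-mini",  # any vision-capable model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Solve the problem shown in this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(reply.choices[0].message.content)
```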
@JustFor-dq5wc 44 minutes ago
Now China has 4 long years to show the world that they aren't that bad. At least they plan for 40 years, not 4.
@sad_man_no_talent 19 hours ago
GPU money for a 1T model?
@vaughnoutman6493 18 hours ago
How did the Chinese do this without Nvidia chips?
@jeffg4686 14 hours ago
Is it going to be safetensors, or a virus?
@stanpikaliri162 13 hours ago
Prob both 😂
@MudroZvon 16 hours ago
I want you to update your tests.
@raslanismail9691 18 hours ago
I compared DeepSeek and Claude Sonnet for coding tasks, and DeepSeek was quite disappointing
@lovol2 15 hours ago
No links to the model or code or anything. "Open source"? We'll see.
@kc-jm3cd 9 hours ago
If you're not doing a full test, then what are you doing?
@User-actSpacing 5 hours ago
How much do you trust a Chinese company?
@stanpikaliri162 13 hours ago
Yes 😂. We're desperate for open-source uncensored models.
@spr03001 18 hours ago
Just asked DeepSeek a complicated legal question and it failed miserably. o1 and Claude got it correct on the first try.
@cyanophage4351 12 hours ago
Models don't use words. 1 token != 1 word. How could they possibly get this answer right other than by luck?