OpenAI o3-mini vs DeepSeek R1 - First TESTS and Impressions

20,449 views

All About AI

1 day ago

Comments: 125

@Samuelkings 10 hours ago

So o3 is a nothing burger? Well, I guess we wait for DeepSeek R2 or R3.

@naj29 6 hours ago

Gonna be a long wait because DeepSeek cannot handle the number of Americans asking R1 about Tiananmen Square a million times...

@kxttd6870 5 hours ago

@@naj29 Why not put that question to the CIA, CNN, and the BBC? Because the West cannot ask how many 'real genocides' it conducted itself?

@Samuelkings 4 hours ago

@@naj29

@savllya6049 3 hours ago

@@naj29 🤣

@AI_Robotics 7 hours ago

At the 20 minute mark, R1 nailed a reasoning scenario. o3 failed. Almost human reasoning, AMAZING!

@noneofyourbusiness73 5 hours ago

omg what a creep

@savllya6049 3 hours ago

Yes, that particular test was the most amazing: it was able to read between the lines. Bet some people may even have a problem figuring that one out.

@Lallafef 6 hours ago

From my personal experience, DeepSeek is still much better in terms of machine learning research coding tasks.

@kxttd6870 5 hours ago

I am using it for deep learning regression methods to derive the SOC. In other words, it acts like a personal tutor and research partner in my studies.

@savllya6049 3 hours ago

@@kxttd6870 Seeing R1's actual chain of thought is very helpful... AI training humans.

@xavierf2229 12 hours ago

Bro, it says in the DeepSeek documentation that R1 has no support for function/tool calling; they are working on that...

@KCM25NJL 11 hours ago

Yeah this was my immediate thought. The fact that it reasons at all about the agents required is impressive, given that it doesn't have a function calling fine tune.

@krozareq 10 hours ago

Yeah would have to run it locally with a whole lot of memory to run the big boy. Made some agents that use R1 and they can call the tools. Pretty good results with automation there.

@USMiner 12 minutes ago

Always good to see you have a video out on something I'm curious about. I like your testing and thoughts. Thanks!

@Utoko 10 hours ago

Not getting the reasoning tokens is the biggest downside. You can never see where it went wrong/what you need to change. "debugging" your prompt is such a qol improvement

@annonymbruger 10 hours ago

Totally agree. The self-correcting behavior of workflows is dead. Tried doing .NET and Python projects using Roo dev and the missing reasoning made it such a pain, and an expensive one :D LangChain agent flows blew up my bank :D The reasoning steps are so important for any self-correcting behavior.

@imrnp 6 hours ago

sam said that’s coming soon

@李小白-x4v 5 hours ago

@imrnp Of course, give him time to copy from deepseek, deepseek is open source after all

@smilingfrogCA 4 hours ago

I think the R1 model they open sourced the inferencing for is already outdated. Seeing how quickly they innovated from version to version, v2 (May 2024), v2.5 (September 2024), v3 (December 2024), R1 (January 2025). Can't wait to see R2 or next version soon!

@technopremium91 4 hours ago

Great video like always. Thanks for the content

@hqcart1 5 hours ago

1. Which o3-mini are you using: low, medium, or high? 2. What is the total execution time for both? 3. How many tokens were billed for both?

@savllya6049 3 hours ago

High @ 02:46

@tjmcdonough20 1 minute ago

Did you not watch it?

@MetaMeta-ic1wr 9 hours ago

I don't like OpenAI. They are so far away from the original mission that DeepSeek needs to do their job. Sad. Also, for an open-source AI model made so cheaply, R1 is amazing. I also don't like that everyone is so focused on Anthropic and OpenAI when you can clearly see that DeepSeek V3 and R1 are just great for coding, and they are super cheap.

@jeffwads 6 hours ago

R1 wasn't made for cheap. It cost upwards of 500M. They put that in the paper as well, but no one pays attention. I am glad as well that it is completely free.

@kxttd6870 4 hours ago

@@jeffwads 500M is only the training cost. There were numerous human resource costs behind it. Those 149 guys graduated from top universities in China. And the tuition is less than 10,000 RMB per year, which is less than $2,000. The question is straightforward, brilliant brains are much more potent than tons of chips. China invests its future in its people, for the people and by the people. Problems solved. I wonder if CNN and BBC will call it slavery work like all the fake news they put on China.

@TheGoldenrun 3 hours ago

Lol

@chriscotton4207 3 hours ago

I've been using DeepThink and just using Claude when it doesn't work. Saves about 80% of the cost lol.

@CryptoCoin-tc5lc 3 hours ago

Dude, the Chinese stole their technology and made little tweaks to it. DeepSeek is ChatGPT.

@annieorben 4 hours ago

I am surprised at how many questions o3 missed. It's definitely a very capable model but is likely specialized to certain use cases. Good job on finding boundaries on both models! Keep up the great work! Doesn't DeepSeek use a sentence-based token scope? I think this approach is similar to Meta's LCM. I love the realized efficiency gains!

@abelabebe3241 10 hours ago

DEEPSEEK R1 100% bravo 6 million

@brotherbig4651 4 hours ago

That’s a huge understatement of their actual cost. Probably just one percent of the actual cost.😂

@oyshikaadi9631 1 hour ago

@@brotherbig4651 While that's true, it's still 27 times cheaper than whatever ClosedAI was charging.

@Kithsiri-lc9wc 6 hours ago

You are talking about ClosedAI vs DeepSeek

@syberkitten1 5 hours ago

good work and great benchmarks, thanks

@youtubecommenter4069 12 hours ago

Those hand drawn logos at the top of his video frame though..

@24-7gpts 11 hours ago

Haha I love it! LOL

@rachkaification 11 hours ago

Everything seems to be about code. What about the following: 1. lesser-known languages like Greek, Bulgarian, or Hungarian, 2. creative writing, 3. translating from one language to another? Nobody talks about these; it's only about coding.

@dynodyno6970 11 hours ago

bc ai first use case is things that are going to aid in software development and innovation. Creative writing is probably low on the list of important things ai are used for.

@alexleo4863 10 hours ago

Use ChatGPT 3.5 for creative writing. AGI will not be used to write a break-up message. That would be a waste of energy.

@pneumonoultramicroscopicsi4065 10 hours ago

It's because most people who test these models are in computer science, so the most useful thing for them is coding. Plus these are language models first and foremost, obviously they are good at language, I bet all of them are very good at creative writing.

@DefaultFlame 9 hours ago

I find that Gemini is at least better for song lyrics than any other LLM, not sure about other creative uses. Haven't tested o3-mini for anything like that since the GPT line has consistently been poor at creative tasks IMHO. I know Claude 3.5 Sonnet is good at languages other than English, but I'm unsure as to how good since it was only by accident that a conversation drifted to the subject of language and I've never tried with another model. However, most models seem pretty decent at translation.

@Toxicflu 9 hours ago

Go ahead and test them! I've always liked ChatGPT for creative writing and translations (but I only did Korean, Thai, French, and Spanish).

@johnkintree763 5 hours ago

If language models are used to automate the construction of databases of knowledge and sentiment from unstructured input such as conversations, there need to be benchmarks for accuracy, coherence, and consistency of the constructed databases; and, measures of the latency and speed of output. The models should also be tested after being compiled to optimize the performance for specific hardware platforms such as smartphones.

@micbab-vg2mu 10 hours ago

I was not aware of the 100K output of o3-mini. I will test it on my medical AI workflows. So far I have mainly used the new Gemini models (2.0 Flash and Thinking) with good results. Thanks for the video :)

@pneumonoultramicroscopicsi4065 10 hours ago

I'm curious: what do you use AI for in the medical field? I'm a clinician and I have no use case for it. Can you enlighten me?

@DefaultFlame 9 hours ago

@@pneumonoultramicroscopicsi4065 Quick summarization maybe? IDK, a specifically trained medical AI to do differential diagnosis to catch anything rare that might have gotten missed as an extra layer of safety I could see being used, but with a general AI I'm drawing blanks. That said, I'm a manual laborer, not a medical expert of any kind so what do I know.

@seye46 7 hours ago

Deepseek is the best

@VangBong1Thoi 5 hours ago

Which model is best for medical AI?

@thomasgilson6206 8 hours ago

They both got confused by "renovented"

@hotlineoperator 11 hours ago

In the ChatGPT web service, a long response is interrupted and you have to press "continue". Is there a corresponding way in the API to say "continue" when the output has been interrupted?

@dejavue3013 6 hours ago

DS has a much better sense of humour and irony than OA

@manwheat5150 7 hours ago

let the models be cheaper and more open, thanks to DS

@Toxicflu 9 hours ago

The renovation problem didn't have to be about renovating for an upcoming baby. It could have easily been an accident too. If your wife went into labor, you'd go back to the house to get her and bring her to the hospital. You don't meet her there ;)

@thisisneti 4 hours ago

R1 worked better for me, with more human-like responses in text generation as well.

@gruvhagen 7 hours ago

Those 100,000 tokens include the tokens for "thinking"

@eposnix5223 3 hours ago

I really wish you'd at least see what console errors are preventing the windtunnel scripts from running. Last time you messed up by pasting the code wrong.

@dezmond8416 5 hours ago

o3-mini is not open!

@kxttd6870 5 hours ago

It is still a black box to the users, though it is free and with the reasoning function now. Honestly, I do not think OpenAI will ever open its source code under the MIT licence like DS.

@PaulHigginbothamSr 8 hours ago

What I see here is that DeepSeek seems more oriented toward returning code than o3-mini. Just my thoughts as to ease of use.

@theoriginalrecycler 11 hours ago

Assignment misspelled

@peterlim8416 5 hours ago

I think neither model is perfect. Performance-wise, I believe o3 should have performed better given the huge compute cluster backing it. Anyhow, I like DS's reasoning output, which lets us see the whole analytical process, which is great. Overall, the improvement of o3 over o1 is insignificant on the questions users commonly ask. DS gets big plus points for being (1) free or cheap and (2) open source and transparent. When it comes to speed, I think 1-2 minutes to get a reasonable solution, compared to 5-6 seconds, is acceptable to me. I would still side with DS in this comparison.

@云间-v9x 3 hours ago

Sadly, the speed issue is largely due to the fact that DeepSeek is currently under an unprecedented hacker attack that is still ongoing!

@holdenmcgroin8917 10 hours ago

CloseAI's o3-mini thumb down

@scotter 5 hours ago

I'd like to see these 2 against Google's best and Anthropic's best.

@alexj2869 4 hours ago

copy attempt of r1?

@tomaszzielinski4521 6 hours ago

We need some AI tools to design better slides.

@gocybertruck8189 7 hours ago

OpenAI all the way. Is that o3-mini-high? Appreciate DeepSeek being open source.

@SenselessTalk 12 hours ago

Did you by any chance forget to set reasoning_effort to high for o3-mini? Because that's kinda very important.😅 *Edit: He did.*

@Nordine1977 12 hours ago

No, he set it to high, just watch

@엠케이-p3p 8 hours ago

DeepSeek R1's context window and max output tokens differ by provider. On OpenRouter there are DeepSeek R1 models with a context window of 128k and also a max output of 128k. Some providers even give 164k for both. And Together AI, the provider you are using, actually has a max output of 16k tokens.

@jeffwads 6 hours ago

The official model is 128K max context as can be seen on the model card in LM Studio.

@haunter90 10 hours ago

OpenAI is desperate.

@83marktwain 12 hours ago

But is the 4o model still the best for writing content and blog posts??

@sherpya 12 hours ago

I think claude is better at this

@rachkaification 12 hours ago

Yep, Claude is still outstanding in this regard, which shouldn't be the case now that o3 is supposedly the best. BS.

@sherpya 11 hours ago

@rachkaification o1/o3 are "reasoning" models, they will waste tokens if you ask to produce text

@amzpro5734 6 hours ago

Tested o3-mini-high in the OpenAI dashboard using a bunch of single prompts for retro video games. The outputs were noticeably superior to both DeepSeek R1 and o1. Not tested using just the API yet.

@nexuhs. 6 hours ago

The issue is that 99% of people aren't paying for AI. And playing fair, with the free versions from both sides, R1 is way better than free o3, on top of having unlimited prompts.

@amzpro5734 4 hours ago

@@nexuhs. tbf I cancelled my ChatGPT subscription after trying DeepSeek. Open AI will prob lose many paid subscribers to DS, but if they keep the API solid - we'll still need it. Prob be caught in that ecosystem forever! Lol

@nexuhs. 3 hours ago

@@amzpro5734 that's fully understandable, why'd you pay to have something 1% better really

@bramlilipory4116 12 hours ago

I have asked DeepSeek: do you know ChatGPT o3? Repeatedly. DeepSeek: oh, you mean ChatGPT 3.5? Yes, of course I know ChatGPT 3.5

@khanra17 12 hours ago

So what? Using an LLM for the first time?

@とふこ 11 hours ago

Turn on web search.

@autohmae 10 hours ago

I guess they are old buddies from high school ? 🙂

@krozareq 10 hours ago

OAI's numbering scheme even confuses the most advanced AIs. 3.5, 4, o4, o1, o3-mini, o3-full soon? Or will that be t6zz?

@MTN1601 4 hours ago

Likely because of knowledge cutoff

@pabloescobar2738 5 hours ago

But why don't you compare Qwen Max vs o3? Maybe o3 would cry 😂😂😂

@Rami_Zaki-k2b 11 hours ago

You realize that people are watching this video to "know which model is better" ... Not to "know every small detail about how they work" right ?

@theoriginalrecycler 11 hours ago

Ahh , but none of them are “better” just different

@DORNORFineJewelry 10 hours ago

Did you know that nobody is forcing you to watch this video? Keep up the good work, you are one of the best on YT!

@float32 10 hours ago

Oops, I guess I’m watching it wrong. :(

@Rami_Zaki-k2b 10 hours ago

@@DORNORFineJewelry I am entitled to express my opinion in a respectful manner, which I did. Also, no one is forcing you to read my comments, but you did, and disagreed with them. You cannot blame me for doing the same thing you do 😘

@autohmae 10 hours ago

Just ask the video maker for a summary at the end and timestamps on the video.

@bramlilipory4116 12 hours ago

First