So 03 is nothing burger? Well I guess we wait for Deepseek R2 or R3
@naj296 сағат бұрын
Gonna be a long wait because Deepseek cannot handle the amount of americans asking R1 about the Tiananmen Square a million times...
@kxttd68705 сағат бұрын
@@naj29 Why not ask CIA, CNN and BBC for that question? Because the West can not ask how many 'Real Genocide' themselves conducted?
@Samuelkings4 сағат бұрын
@@naj29
@savllya60493 сағат бұрын
@@naj29 🤣
@AI_Robotics7 сағат бұрын
At the 20 minute mark, R1 nailed a reasoning scenario. o3 failed. Almost human reasoning, AMAZING!
@noneofyourbusiness735 сағат бұрын
omg what a creep
@savllya60493 сағат бұрын
Yes that particular test was the most amazing, able to read between the lines. Bet some people may even have problem figuring that one out.
@Lallafef6 сағат бұрын
From my personal experience, DeepSeek is still much better in terms of machine learning research coding tasks.
@kxttd68705 сағат бұрын
I am using it for the deep learning methods on regression to derive the SOC. A.K.A, it looks like a personal tutor and research partner in my study.
@savllya60493 сағат бұрын
@@kxttd6870 Seeing R1 actual chain of thought is very helpful... AI training humans.
@xavierf222912 сағат бұрын
Bro it is on the deepseek documentation that R1 has no support for function/tool calling, they are working on that...
@KCM25NJL11 сағат бұрын
Yeah this was my immediate thought. The fact that it reasons at all about the agents required is impressive, given that it doesn't have a function calling fine tune.
@krozareq10 сағат бұрын
Yeah would have to run it locally with a whole lot of memory to run the big boy. Made some agents that use R1 and they can call the tools. Pretty good results with automation there.
@USMiner12 минут бұрын
Always good to see you have a video out on something I'm curious about. I like your testing and thoughts. Thanks!
@Utoko10 сағат бұрын
Not getting the reasoning tokens is the biggest downside. You can never see where it went wrong/what you need to change. "debugging" your prompt is such a qol improvement
@annonymbruger10 сағат бұрын
Totally agree. The self correcting behavior of workflows is dead. Tried doing .Net and Python projects using Roo dev and the missing reasoning made it such a pain, and an expensive one :D LangChain agent flows blew up my bank :D The reasoning steps are so important for any self correcting behavior.
@imrnp6 сағат бұрын
sam said that’s coming soon
@李小白-x4v5 сағат бұрын
@imrnp Of course, give him time to copy from deepseek, deepseek is open source after all
@smilingfrogCA4 сағат бұрын
I think the R1 model they open sourced the inferencing for is already outdated. Seeing how quickly they innovated from version to version, v2 (May 2024), v2.5 (September 2024), v3 (December 2024), R1 (January 2025). Can't wait to see R2 or next version soon!
@technopremium914 сағат бұрын
Great video like always. Thanks for the content
@hqcart15 сағат бұрын
1. which o mini are you using? low mid or high? 2. What is the total excustion time for both? 3. how much tokens were billed for both?
@savllya60493 сағат бұрын
High @ 02:46
@tjmcdonough20Минут бұрын
Did you not watch it?
@MetaMeta-ic1wr9 сағат бұрын
I don't like openAi. They are so far away from the original mission, that deepseek needs to do their job.. Sad. Also for an open source Ai model made so cheap, R1 I amazing. Also I don't like that everyone is focused on anthropic and openAi so much when you can clearly see that deepseek v3 and R1 are just create for coding and they are super cheap.
@jeffwads6 сағат бұрын
R1 wasn't made for cheap. It cost upwards of 500M. They put that in the paper as well, but no one pays attention. I am glad as well that it is completely free.
@kxttd68704 сағат бұрын
@@jeffwads 500M is only the training cost. There were numerous human resource costs behind it. Those 149 guys graduated from top universities in China. And the tuition is less than 10,000 RMB per year, which is less than $2,000. The question is straightforward, brilliant brains are much more potent than tons of chips. China invests its future in its people, for the people and by the people. Problems solved. I wonder if CNN and BBC will call it slavery work like all the fake news they put on China.
@TheGoldenrun3 сағат бұрын
Lol
@chriscotton42073 сағат бұрын
Ive been using DeepThink and just using Claude when it doesn't work. Saves about 80% of the cost lol.
@CryptoCoin-tc5lc3 сағат бұрын
Dude the cheenese stole their technology and made little twicks on it.. Deepseek is chatgpt
@annieorben4 сағат бұрын
I am surprised at how many questions 03 missed. It's definitely a very capable model but is likely specialized to certain use cases. Good job on finding boundaries on both models! Keep up the great work! Doesn't Deepseek use a sentence based token scope? I think this approach is similar to Meta's LCM. I love the realized efficiency gains!
@abelabebe324110 сағат бұрын
DEEPSEEK R1 100% bravo 6 million
@brotherbig46514 сағат бұрын
That’s a huge understatement of their actual cost. Probably just one percent of the actual cost.😂
@oyshikaadi9631Сағат бұрын
@@brotherbig4651 While that's true, its still 27 times cheaper than whatever the ClosedAI was charging.
@Kithsiri-lc9wc6 сағат бұрын
You are taking about ClosedAI vs Deepseek
@syberkitten15 сағат бұрын
good work and great benchmarks, thanks
@youtubecommenter406912 сағат бұрын
Those hand drawn logos at the top of his video frame though..
@24-7gpts11 сағат бұрын
Haha I love it! LOL
@rachkaification11 сағат бұрын
Everything seems to be about code. What about the following: 1.lesser known languages like greek, bulgarian or hungarian etc. 2.what about creative writing 3.what about translating from one language to another Nobody talks about these, it's only about coding.
@dynodyno697011 сағат бұрын
bc ai first use case is things that are going to aid in software development and innovation. Creative writing is probably low on the list of important things ai are used for.
@alexleo486310 сағат бұрын
Use ChatGPT 3.5 for creative writing, AGI will not be used to write the break up message. That will be waste of energy
@pneumonoultramicroscopicsi406510 сағат бұрын
It's because most people who test these models are in computer science, so the most useful thing for them is coding. Plus these are language models first and foremost, obviously they are good at language, I bet all of them are very good at creative writing.
@DefaultFlame9 сағат бұрын
I find that Gemini is at least better for song lyrics than any other LLM, not sure about other creative uses. Haven't tested o3-mini for anything like that since the GPT line has consistently been poor at creative tasks IMHO. I know Claude 3.5 sonnet is good at languages other than English, but I'm unsure as to how good since ti was only by accident that a conversation drifted to the subject of language and I've never tried with another model. However, most models seem pretty decent at translation.
@Toxicflu9 сағат бұрын
Go ahead and test them ! I've always liked ChatGPT for creative writing, and translations. (but did korean, thai, french, and spanish only)
@johnkintree7635 сағат бұрын
If language models are used to automate the construction of databases of knowledge and sentiment from unstructured input such as conversations, there need to be benchmarks for accuracy, coherence, and consistency of the constructed databases; and, measures of the latency and speed of output. The models should also be tested after being compiled to optimize the performance for specific hardware platforms such as smartphones.
@micbab-vg2mu10 сағат бұрын
I was not aware about 100K output of o3-mini - I will test it - on my medical AI workflows - so far I used mainly new Gemini models (2.0 flash and thinking) with good results. Thanks for the video:)
@pneumonoultramicroscopicsi406510 сағат бұрын
I'm curious, what do you use the AI for in the medical field, I'm a clinician and i have no use case for it, can you enlighten me
@DefaultFlame9 сағат бұрын
@@pneumonoultramicroscopicsi4065 Quick summarization maybe? IDK, a specifically trained medical AI to do differential diagnosis to catch anything rare that might have gotten missed as an extra layer of safety I could see being used, but with a general AI I'm drawing blanks. That said, I'm a manual laborer, not a medical expert of any kind so what do I know.
@seye467 сағат бұрын
Deepseek is the best
@VangBong1Thoi5 сағат бұрын
Which model is best for medical AI?
@thomasgilson62068 сағат бұрын
They both got confused by "renovented"
@hotlineoperator11 сағат бұрын
In the ChatGPT web service, a long response is interrupted and you have to press "continue". Is there a corresponding API in the interface that can say continue when it shows that the output has been interrupted?
@dejavue30136 сағат бұрын
DS has much better sense of humour and irony than OA
@manwheat51507 сағат бұрын
let the models be cheaper and more open, thanks to DS
@Toxicflu9 сағат бұрын
The renovation problem didn't have to be about renovating for an upcoming baby. It could of easily been an accident too. If your wife entered labor, you go back to the house to get your wife to bring her to the hospital. You don't meet her there ;)
@thisisneti4 сағат бұрын
R1 for me worked better, and more human like responses with txt generation as well.
@gruvhagen7 сағат бұрын
That 100.000 tokens include the tokens for "thinking"
@eposnix52233 сағат бұрын
I really wish you'd at least see what console errors are preventing the windtunnel scripts from running. Last time you messed up by pasting the code wrong.
@dezmond84165 сағат бұрын
o3-mini is not open !
@kxttd68705 сағат бұрын
It is still a black box to the users, though it is free and with the reasoning function now. Honestly, I do not think OpenAI will ever open its source code under the MIT licence like DS.
@PaulHigginbothamSr8 сағат бұрын
What I see here is that deepseek seems more oriented to code return than o3mini just my thoughts as to ease of use.
@theoriginalrecycler11 сағат бұрын
Assignment misspelled
@peterlim84165 сағат бұрын
I think none of the model is perfect. Performance wise, i believe o3 should have performed better with their huge compute cluster backing. Anyhow, i like DS the reasoning output, which we can visualise the whole analytics process which is great. Overall, the improvement of o3 is insignificant if compared to o1 based on the commonly questions user asked. DS got big plus point on (1) free or cheap (2) open source, transparency. When comes to advantages of speed, i think 1-2 minutes to get a reasonable solution compared to 5-6 seconds is acceptable to me. I will still fall to DS on this comparison
@云间-v9x3 сағат бұрын
Sadly, the speed issue is largely due to the fact that DeepSeek is currently under an unprecedented hacker attack that is still ongoing!
@holdenmcgroin891710 сағат бұрын
CloseAI's o3-mini thumb down
@scotter5 сағат бұрын
I'd like to see these 2 against Goofle's best and Anthropic's best.
@alexj28694 сағат бұрын
copy attempt of r1?
@tomaszzielinski45216 сағат бұрын
We need some AI tools to design better slides.
@gocybertruck81897 сағат бұрын
OpenAI all the way. Is that 03-mini HiGH? Appreciate DeepSeek being open source.
@SenselessTalk12 сағат бұрын
Did you by any chance forgot to set reasoning_effort to high for o3-mini? Because, that kinda is very important.😅 *Edit: He did.*
@Nordine197712 сағат бұрын
No he put on high, just watch
@엠케이-p3p8 сағат бұрын
deepseek r1 context window and max output tokens differ from the provider. in openrouter there are deepseek r1 models with context window of 128k and also max output of 128k. some provider even give 164k of both. and together ai, the provider you are using, actually has 16k of max output tokens.
@jeffwads6 сағат бұрын
The official model is 128K max context as can be seen on the model card in LM Studio.
@haunter9010 сағат бұрын
OpenAI is desperate.
@83marktwain12 сағат бұрын
But 4o medel. Is still the best for writing content and blog posts??
@sherpya12 сағат бұрын
I think claude is better at this
@rachkaification12 сағат бұрын
Yep, Claude is still outstanding in this regard which shouldnt be the case now that o3 is supposedly the best. BS.
@sherpya11 сағат бұрын
@rachkaification o1/o3 are "reasoning" models, they will waste tokens if you ask to produce text
@amzpro57346 сағат бұрын
Tested 03-mini-high in the OpenAI dashboard using a bunch of single prompts for retro video games. The outputs were noticeably superior to both DeepSeek R1 and o1. Not tested using just the API yet.
@nexuhs.6 сағат бұрын
Issue is 99% of the people aren't paying for AI And playing fair, with the free versions from both sides, R1 is way better than o3 free, on top of being unlimited prompts
@amzpro57344 сағат бұрын
@@nexuhs. tbf I cancelled my ChatGPT subscription after trying DeepSeek. Open AI will prob lose many paid subscribers to DS, but if they keep the API solid - we'll still need it. Prob be caught in that ecosystem forever! Lol
@nexuhs.3 сағат бұрын
@@amzpro5734 that's fully understandable, why'd you pay to have something 1% better really
@bramlilipory411612 сағат бұрын
I have asked DeepSeek: do you know ChatGPT o3? Repeatedly. DeepSeek: oh, you mean ChatGPT 3.5? Yes, of course I know ChatGPT 3.5
@khanra1712 сағат бұрын
So what ? Using llm for first time ?
@とふこ11 сағат бұрын
Turn on web search.
@autohmae10 сағат бұрын
I guess they are old buddies from high school ? 🙂
@krozareq10 сағат бұрын
OAI's numbering scheme even confuses the most advanced AIs. 3.5, 4, o4, o1, o3-mini, o3-full soon? Or will that be t6zz?
@MTN16014 сағат бұрын
Likely because of knowledge cutoff
@pabloescobar27385 сағат бұрын
But why you dont campare qwen max vs 03, its maybe 03 cry😂😂😂
@Rami_Zaki-k2b11 сағат бұрын
You realize that people are watching this video to "know which model is better" ... Not to "know every small detail about how they work" right ?
@theoriginalrecycler11 сағат бұрын
Ahh , but none of them are “better” just different
@DORNORFineJewelry10 сағат бұрын
Did you know; that nobody are forcing you to see this video. Keep up the good work you are one of the best on YT!
@float3210 сағат бұрын
Oops, I guess I’m watching it wrong. :(
@Rami_Zaki-k2b10 сағат бұрын
@@DORNORFineJewelry I am entitled to express my opinion in a respectful manner, which I did. Also, no one forcing you to read my comments, but you did, and disagreed with them. You cannot blame me for doing the same thing you do 😘
@autohmae10 сағат бұрын
Just ask the video maker for for a summary at the end and timestamps on the video.
@bramlilipory411612 сағат бұрын
First
@garfield5849 сағат бұрын
I think the DeepSeek R1 is overly hyped; I tried it and it's simply unusable. To me, it represents the same quality as GPT-3.5.
@raidsmith51074 сағат бұрын
cool story bro
@blackpiller37774 минут бұрын
liar
@ramkumarpandey486510 сағат бұрын
Please do a video of comparing with the GOAT Sonnet 3.5 New
@LindaPopes12 сағат бұрын
Your channel is a source of entertainment as well as education. Keep leading us into the world of knowledge and discovery!🧿🛬🦏