o3-mini is really good (but does it beat deepseek?)

Рет қаралды 104,445

Күн бұрын

OpenAI just released their new reasoning model o3 mini, with some very clear responses to the crazy stuff Deepseek's been up to 👀
Thank you Ragie AI for sponsoring! Check them out at: soydev.link/ragie
Try out o3 mini for free: soydev.link/chat
o3 mini announcement: openai.com/ind...
Check out my Twitch, Twitter, Discord more at t3.gg
S/O ‪@bmdavis419‬ for the awesome edit 🙏

Пікірлер: 392

@amihartz 6 күн бұрын

The fact they dropped it so quickly tells you that OpenAI has had the ability to make great cheap models for awhile now but just didn't want to due to lack of competition.

@voicevy3210 6 күн бұрын

exactly, it's much more than this. they just want to release new products just for making money. it doesn't matter for them to put the best foot forward

@urmom8322 6 күн бұрын

they announced this a while ago bruh

@arotobo 6 күн бұрын

Except they announced a late January release for o3-mini back in December 2 months ago? They might’ve made it cheaper because of r1 but release date has nothing to do with it.

@voicevy3210 6 күн бұрын

@@arotobo agree, but we never know the scope of what's going to be released. it's not tangible exactly. so they might even just keep the hype cycle up and keep raising funds and selling us mediocre products.

@arotobo 6 күн бұрын

@@voicevy3210 we always knew it was o3 mini, they literally said it in the announcement video that finished their “12 days of openai”. I see how my wording is confusing tho so I will fix it.

@jiachen1078 6 күн бұрын

App devs should send DeepSeek team a thank-you letter

@mikitoburrito 6 күн бұрын

why?

@rodjenihm 6 күн бұрын

@@mikitoburrito Because they forced OpenAI to lower the price for o3-mini to be competitive again. Otherwise the would probably start with $100 per 1m tokens lol

@itsmeGeorgina 6 күн бұрын

@@mikitoburrito you mean you have no clue???

@myintmaunmaun 6 күн бұрын

"After we leave, they will build schools and hospitals for you, and they will raise your wages. This is not because they have had a change of heart, nor because they have become good people, but because we were here."

@deathrace-bx5ne 6 күн бұрын

@@myintmaunmaun DeepSeek DeepSeek DeepSeek DeepSeek DeepSeek DeepSeek

@damians.7859 6 күн бұрын

I'm sure they've planned to make o3 more expensive, but they've had to come up with a cheaper pricing due to R1. I'm also sure Google wanted to increase their pricing of the experimental "thinking" Gemini Flash model once it comes out of the preview phase, but now they'll need to adjust as well. Thank you DeepSeek!

@leuhenry8031 6 күн бұрын

DeepSeek helps US's people to bring the AI price down, that make closeAI follow up with. someday, closeAI maybe become the true OpenAI

@dibu28 6 күн бұрын

DeepSeek helps people all around the world to bring the AI price down.

@kukuricapica 6 күн бұрын

DeepSeek made their source code open for everyone to use proving that we dont actually need Project Stragate, but smarter ways to train models.

@myintmaunmaun 6 күн бұрын

@pneumonoultramicroscopicsi4065 6 күн бұрын

@@kukuricapica not true

@apierror 6 күн бұрын

@@kukuricapicaAnd if OpenAI ever decides to actually sue, AI would finally get regulated and they may shoot themselves in the foot.

@bryce1017 6 күн бұрын

Theo to fix your issue with markdown OpenAI is looking for a key in the system message, this is from their new docs on reasoning models. "Markdown formatting: Starting with o1-2024-12-17, reasoning models in the API will avoid generating responses with markdown formatting. To signal to the model when you do want markdown formatting in the response, include the string Formatting re-enabled on the first line of your developer message."

@ashleymccarthy6232 6 күн бұрын

This past week in the world of AI, is a great example of free market competition principles!

@gamernerd7139 6 күн бұрын

No none of these players are free market based. While ChatGPT has its big investors, DeepSeek has Chinese government in the shadows. Both will steal your data and it is a matter of opinion if you want it to go to corporate thieves or CCP autocratic thieves.

@mcwornex2123 6 күн бұрын

yup. hard to see in other fields. Once AI is figured out, innovation won't be as much of a disruptive to bigger players. I'm rooting for open source.

@VV-nw4cz 6 күн бұрын

The irony is that it was sparked by communist China, lol.

@vedranlekic9725 6 күн бұрын

Not true at all. Giving your stuff for free as open source is as far from capitalism that it can get. Capitalism was what was in power 3 weeks ago.

@天天去溜达 6 күн бұрын

yeah, EV is the opposite example of free market competition principles!

@Lucas-gt8en 6 күн бұрын

“Two devs in a trenchcoat” is such good a way to describe early startups 😂

@devviz 23 сағат бұрын

15:00 of course he knows this, no typical user is running such a demanding task so they do not need to push the model so hard, thus less error prone

@haha7567 4 күн бұрын

Markdown: "Formatting re-enabled" on the first line of your developer message, to enable markdown.

@tedarcher9120 6 күн бұрын

OpenAI was like "What's their price? *DOUBLE IT*"

@gibzrival1565 5 күн бұрын

fr, I had to confirm😅

@sbowesuk981 6 күн бұрын

I'm rooting for DeepSeek and similar opensource companies. If opensource wins the AI race, we all win. If OpenAI wins, we all lose. That sounds extreme, but that's honestly how it looks right now.

@tomirkpl 6 күн бұрын

Oh noooo... OpenAI wins, we looose Oh noooo 🤣 🤣 🤣

@generalegg6778 5 күн бұрын

Dont we benefit either way by the competition? I mean by the end, one of them will have a model that we suprass all of them, and we will benefit from the model.

@Pepe12345-c 5 күн бұрын

We? Who is we?

@Jaroslav-f9o 4 күн бұрын

We win either way.

@shining_cross 4 күн бұрын

opensource will never win, because every time opensource discovers new methods, closedsource will just copy it silently since they can access to it, but when closedsource discovers new method, opensource can't access it

@BruceWayne15325 6 күн бұрын

The best thing about DeepSeek is that it looks like they've been able to do (at least to some degree) what the rest of the industry has been hounding OpenAI to do (unsuccessfully) forever: Reveal their chain-of-thought and go open source. They aren't doing either yet, but they are relaxing their "moat" a bit and giving more detailed, but still high level chain-of-thought and they are considering actually open sourcing some of their code.

@HDshrimpkick 6 күн бұрын

These are the kind of vids that are ripe for ai video summarisation

@PhuPhillipTrinh 6 күн бұрын

yup used o3 for this case lol so good!

@umairaliism 5 күн бұрын

which one are we using?

@Rsanda 6 күн бұрын

Yes ChatGPT UI is bad we get it

@haroldcruz8550 6 күн бұрын

To be fair with all the funds that they have, that bad of a UI deserve that kind of criticism.

@fatal510 6 күн бұрын

It’s not just bad. It’s doesn’t work for long running tasks

@Mezielz 5 күн бұрын

He's just trying to sell his product.

@devviz 13 сағат бұрын

i challege t3 chat to maintain its performance with the amount of traffic chatgpt has

@the_proffesional1713 6 күн бұрын

Nope. First, deepseek still outperform o3 mini with tons of problems that i gave. Second, its free.

@tomirkpl 6 күн бұрын

It's not free. It cost 3000 USD for a graphics card :D and electricity.

@RedEyeLazer 6 күн бұрын

@@tomirkplYou can use it for completely free on their website... Are you dumb?

@brockoala2994 6 күн бұрын

@@tomirkpl Not to mention the model you can run on a 4090 (or 5090 if you can even get a chance to buy one), is only the 70b model AT BEST, with super slow speed, and far dumber than the 671b model hosted on their website, and without a search function that you will have to implement by yourself, which can be far inferior to their native one.

@haythemsandel8303 6 күн бұрын

@@brockoala2994 bro 0.001$ per API token is basically free you don't need to host anything

@subhashpeshwa2997 6 күн бұрын

When did theo become an AI bro 😂

@sid4579 6 күн бұрын

hype is views

@richielickie 6 күн бұрын

he has an chat app...

@nwsome 6 күн бұрын

End of 2024, I guess

@urmom8322 6 күн бұрын

bc he's grifter

@Frozander 6 күн бұрын

He is literally a Cursor Editor investor, he always made videos about them since copilot.

@dibu28 6 күн бұрын

It does't.))) I tried to use o3-mini-hard to write my one simple python script and it failed to work after 15 additional questions while deepseek wrote me a working script after 15 questions. On the first question every model faild.

@lost4468yt 6 күн бұрын

I would suggest you check the internal thoughts to see what's going on.

@_Factboy_Sunny_ 6 күн бұрын

DeepSeek R1 is Better & Free

@fufuu 6 күн бұрын

And it’s open source what more could you ask for

@childe2001 6 күн бұрын

Deepseek is not free, it charges for every token you call through api

@th-redattack 6 күн бұрын

@childe2001he talks about the web app and mobile app not api

@TheWarehouseDude 6 күн бұрын

Uhhh no. o3 smokes DeepSeek

@th-redattack 6 күн бұрын

@@TheWarehouseDude true the code it gave me was crazy good o3 mini high is much superior but having deepseek make openai scared is very good for us users

@matthewwoodard9810 6 күн бұрын

Stop the hype. I used it all day on real world coding problems and it’s not much different from 3.5 sonnet. Even there, most of the improvement from 0-1 isn’t coming from the model, it’s coming from the software layer on top of the model.

@mikitoburrito 6 күн бұрын

it isn't really an upgrade from o1 performance wise afaik. It's just similar/same performance with greater efficiency and speed.

@RomeTWguy 6 күн бұрын

Claude is still the best coding model for real world tasks

@LucasSouzaDev 6 күн бұрын

same here, it is not too much...

@ismaelplaca244 6 күн бұрын

Exactly

@furycorp 6 күн бұрын

Bingo, the model is basically the same imho, there's effectively just some built in "are you sure" and "outline the steps" prompts. Agree 3.5 sonnet still seems to pull ahead in real-world coding tasks.

@jeystone2159 6 күн бұрын

o3 fails at the marble cup question....fail....deepseek gets it right

@moonasha 6 күн бұрын

o3 is specialized for coding and STEM, not marbles

@jeystone2159 5 күн бұрын

@@moonasha if it's logic fails at marble, cup, table - it's shite

@Vedant-df9zo 4 күн бұрын

No, if you are smart engouh to use right model its better than deep seek. Some people dont even know which model to use for coding and smaller tasks.

@ViralKiller 4 күн бұрын

@@Vedant-df9zo Nope, it should at least have the 'reasoning' to understand that the marble falls out of the cup and onto the table when turned upside down. If it fails at this, it won't be any good at 'STEM'. Imagine the reasoning errors it will make with fluid dynamics.

@leeyouyun7728 6 күн бұрын

Still R1 is just as good and cheaper. 👍👍👍

@vassovas 6 күн бұрын

I mean... OpenAI did say they were launching o3-Mini at end of January in Midish December...

@PrinceofUnderpants 6 күн бұрын

But the emergence of V3 at that time made O3 reconsider the release time. No one knows what O3 was doing during this period.

@neoglacius 5 күн бұрын

but not with a massive price drop

@gr33nDestiny 6 күн бұрын

You're a legend for offering o3-mini on free tier, thanks so much for that!

@m3nafsy 6 күн бұрын

After extensive experience with this model with DeepSeek Literally deepseek Thinks longer and gives better and more accurate answers in long context Most importantly, I can download it locally and use it Also, the new OpenAI model is not a complete model, it cannot even view files, and it is really stupid and worthless, even in normal questions it did not answer correctly

@VidoviDroga 6 күн бұрын

I have to say, Claude does that, it has been reported that it sometimes ignores first prompts so the interaction in that specific chat would last longer. If you give it the same promt in another chat it might give you better results.

@Darkenz000 5 күн бұрын

Honestly, after some tests online and testing myself, o3 is underwhelming. Like, by A LOT. R1 still manages to beat it in half if not more of the tasks, especially making a game in html. Also, I've noticed a SIGNIFICANT drop in all chatgpt models. What do I mean? They seem to respond in such stupid ways, they don't actually follow what the user is saying. This started happening after r1 released (I did not use R1 at that time, so there's no bias on my side. I came to this conclusion before using R1 or even hearing about it.)

@md.manzeralam6508 6 күн бұрын

hey just a quick shout out to you guys the t3 chat is amazing, tried it for the first time today and responses were asap. Great work

@pikachu-mx6hi 6 күн бұрын

o3 mini-high is actually insanely good. been playing for a while. absolutely mind-blowing.

@haroldcruz8550 6 күн бұрын

No it's not it has better pricing than the other ChatGPT models but nowhere near being mind blowing.

@RedEyeLazer 6 күн бұрын

03 mini and the high version are not completely free though, unlike DeepSeek R1, so just use that model, you dumbass.

@markbond08 6 күн бұрын

Using both o3 mini high and deepseek for 8 hours yesterday I can confidently say Deepseek is better at doing it what you tell it to. All GPT want's to do is give you // Fill in the rest here comments. I am cancelling my GPT subscription

@chriss87878 4 күн бұрын

With the amount of hype i see from KZbinrs about AI i thought even the old gpt 4 could easily complete all of these AoC tasks with ease, especially considering the results are everywhere online, the fact that the latest models can't was shocking to me. And i'm out here wondering why everyone is sucking off the Cursor ide while it's struggling with my simple react codebase. So much empty hype around AI it's insane

@LEONARDO-xs2ke 6 күн бұрын

Bro make a video on how are you so productive

@VeaceslavBARBARII 6 күн бұрын

Just start coding when you're seven years old and you're good to go.

@furycorp 6 күн бұрын

He has a team. He just hired Ben Davis who is insanely productive.

@perguth 6 күн бұрын

I always suspected OpenAI to mine BTC in the background of the page or something 😂

@GigaSimp 5 күн бұрын

Another scam from mister charlatan Altman. Before they program gpt to change the reply, here is what I got by asking ChatGPT o3-mini "Which model am I talking to? " It replied : Let's break down the answer into simple points: - **I am GPT-4:** I run on the GPT-4 architecture. That's my main model. - **About "o3-mini":** There is no version called "o3-mini" in my design. My technology is entirely based on GPT-4. So, to answer your question directly: No, I'm not "o3-mini." I'm GPT-4.

@TheHronar 6 күн бұрын

It's not weird that o3 Mini costs less per token than 4o. It's probably the equivalent of 4o mini but with reasoning capabilities. It ultimately spits out MUCH more tokens per prompt and you're still paying for them even if you don't see them over the API.

@myintmaunmaun 6 күн бұрын

@tongducthanhnam 6 күн бұрын

But does that mean they officially reconized DeepSeek good 😙.

@keyser021 5 күн бұрын

Open source wins all day every day. Sam can only keep the Potemkin Village standing for so long before all of his skeletons come flying out of the closet.

@Versus-A 6 күн бұрын

Strange how I've found sonnet 3.5 to still be the best at my coding tasks

@DanielMetille 6 күн бұрын

Maybe others are good in Python and React, but when coming to code for some less popular language as Drupal/PHP or SwiftUI, Claude still impress me.

@nikomancer69 4 күн бұрын

Honestly, if I were building an app right now, it would take a huge, huge leap in capabilities for me to even consider any Open AI (or Google, or Antrhopic) model. The cost-effectiveness, the ability to self-host, the ability to apply LORA to fine tune for specific capabilities; these are high-value things when you're building an app and it would take a substantial increase in capabilities from Open AI before I would even start to debate giving them up.

@df_all 6 күн бұрын

What’s with the fake tweet thumbnail?

@WiseWeeabo 5 күн бұрын

short answer: yes, it really beats deepseek. I personally haven't bumped into any of Theo's issues, I feel sorry for him.

@MrEnriqueag 6 күн бұрын

If you give an LLM an open ended problem with tons of requirements they will miss something unless you prompt them super specifically Reasoning models are just really good at prompting themselves very specifically

@tubeyou6794 6 күн бұрын

I am a genius and I write amazing prompts. That’s why I actually don’t use the oh one model. I use the old GPT four model and it works better for me because the GPT four actually gives me precise what I want the oh one things instead of me, which is of course worse because I’m super superior to you or any other human being.

@MrEnriqueag 6 күн бұрын

@tubeyou6794 I'm not sure if you are implying I said anything of the sort. Or you are actually saying that you do that which wouldn't be the smartest thing to do. But technically if you broke down the problem into very clear step by step instructions you realize how that would be easier for the AI no? If you want to test this, you can use any reasoning model that actually gives you all the "thinking" part. Ask a question to V3 that it can't do consistently but R1 can. Now ask it to R1, then take the content inside the think tags and dive it with the question to V3 Watch V3 get the question right. But if you want to do it even better, break down the problem in tiny steps yourself, and ask it to do the steps 1 by 1 and you'll probably do a better job than R1

@OhsoLosoo 6 күн бұрын

I didn’t know that Claude was so expensive. We use it at work & it honestly does so well that we always assumed they updated it to be a reasoning model, but after watching this video I will be suggesting several changes

@Jenkkimie 6 күн бұрын

Well all grads and current students careers just went up in flames. What a good prank it was.

@personofcolour6564 6 күн бұрын

Tf?

@Alistair1217 4 күн бұрын

Interestingly, OpenAI O3's reasoning process inevitably shows Chinese thinking process, which looks like a trick that is not hidden well.

@VikasKapadiya1993 6 күн бұрын

Have you tried with new gemini thinking model?

@ItsNicolau 6 күн бұрын

Your videos taught me so much that I know almost nothing about. Thank you, Theo

@bigmedge 5 күн бұрын

@ 6:24, when GPT cut off that last response after “Setting the parameter” paragraph , why didn’t you then just ask something along the lines of “your response got cut off after (copy/paste last paragraph). Continue from there.” You’d have had the ability to objectively evaluate o3 mini’s coding capabilities if you had written a prompt like that b/c that would’ve generated a final stable version of the script

@VV-nw4cz 6 күн бұрын

If GPT agents will replace all developers, why did not all those companies fix their UI yet?

@tjblackman08 6 күн бұрын

T3 Chat needs a toggle for "Just answer, don't explain the answer." and it should default to on.

@frosty129 2 күн бұрын

Can you open source a version of T3chat, or some boilerplate that uses the same stack? I am curious how you’ve married nextjs and react router, counter to what everyone says you should do, yet you seem to be getting a good result.

@misterJBD 6 күн бұрын

Claude just shows that everything that Amazon touches (or invests in) end up being promising and then they suck.

@RomeTWguy 6 күн бұрын

Its still the best model

@Frozander 6 күн бұрын

It is probably still the best non-reasoning model and it works the fastest.

@RomeTWguy 6 күн бұрын

@ there is no such thing as a 'reasoning' model, that's just a marketing term

@wwkk4964 6 күн бұрын

@@RomeTWguybest model for normies. Nobody with a serious novel or hard problem is going to choose Claude, it just makes stuff up confidently because it can't reason.

@josephvictory9536 6 күн бұрын

@@RomeTWguythen there is such a thing guy

@SjarMenace 5 күн бұрын

who else skips like crazy whe you hear 'this days sponsor' and dont listen to ads at all and is not affected by them?

@HillaryNamanya 6 күн бұрын

very sure deepseek is cooking r2 in silence. The next distillation will setback openAI. But completion is good for us consumers. Let them fight

@sasa-tg4od 6 күн бұрын

After testing, the 03 is now inferior to the r1.

@RedEyeLazer 6 күн бұрын

@@sasa-tg4odJUST SHUT UP!!!!!!!

@omarmady5582 5 күн бұрын

The pagination is weird indeed. In the network tab, you can see that AT LEAST 2 requests are made for each page, sometimes more are made.

@alexbowe3411 6 күн бұрын

Did you try adding "Formatting re-enabled" on the first line of your developer message to re-enable Markdown?

@Schlafen-wx1kx 6 күн бұрын

I just stumbled onto T3 today, and wanted to get signed up but its missing even basic functionality such as a system prompt? i understand you want to run lean and mean, but couldnt you stash it somewhere in advanced? And folders are a must if you are running 20 queries a day.

@random.mandem 6 күн бұрын

Yeah. It's unusable.

@YixuanLi-h3i 4 күн бұрын

I don't know why you should compare a shelled deepseek clone with the native deepseek.

@lost4468yt 6 күн бұрын

4o is still better for general knowledge, trivia, etc.

@SithLordBishop 6 күн бұрын

reasonable dad jokes

@trappedcat3615 6 күн бұрын

The word Reasonable came from teaching a son to eat by reading a story about a bowl. (Read-son-a-bowl)

@AlucardNoir 6 күн бұрын

The 03 mini prices are either BS they use as a loss leader OR someone forgot to pull the plug on GPT4.

@lukasz96 6 күн бұрын

For sure loss leader. They are in panic mode. Alas, R1 is still completelly free, so OpenAI can f off for all I care

@josjos1847 6 күн бұрын

@@lukasz96 O3 is in the free tier too

@bladekiller2766 5 күн бұрын

How are you so good at Advent of code to have pretty good timings? Do you have experience with Algorthmic Problems and Competitive Programming, or you are naturally extremely gifted?

@ishaat_plays 6 күн бұрын

NVDIA vs AMD || Deep seek vs Open AI .... what a strange world we live in

@FRareDom 5 күн бұрын

if it wasnt for deepseek, o3 mini probably wouldnt release for another year, exact thing they did with sora

@GhostHack_1 4 күн бұрын

This is the beginning of the plateau. No increase in result accuracy, but making it cheaper. Wild to me that claude 3.5 is still superior to both r1 and o3 when it comes to coding lol

@mokoboko2482 6 күн бұрын

Guys! What tier do I need to use o3-mini through the API?

@dominikchyziak8246 6 күн бұрын

@emirtunahanalim2748 6 күн бұрын

Google has been cooking with Gemini models recently and adding them to the exact comparison would be very nice

@spetz911 6 күн бұрын

Their UI is hilariously bad. I can’t agree more with that! 😊

@attentioncestpaslegal7847 6 күн бұрын

9:30 That was a really hard problem.

@josephvictory9536 6 күн бұрын

3 medium problems layered. Its definitely hard. Thought it would be easier. But the trick is that its a combinations problem not a greedy problem. You can greedily get the combinations to reduce space. After that realization its kinda easy. Just a lot of writing. Incredibly fun problem. Had no idea recursive optimal pathways could be so different with such obvious and seemingly fixed optimal paths.

@Worldkiajoliet 6 күн бұрын

I gave 03 mini functional, working, simple code to evaluate. It had improvement ideas that sounded fine, so I asked it to improve. It was like dealing with gpt 3 ...it broke the existing code and provided no updates. After 5 more prompting sessions it still could not even duplicate the existing code that worked. Not sure what the hype is yet. What may I be missing? Thanks

@daniellyons6269 6 күн бұрын

15:36 I'm pretty sure that the reasoning UI given by OpenAI is actually gaslighting. The actual reasoning tokens are not exposed to us the user. Instead they have yet another process that is summarizing the reasoning in order to obfuscate their techniques.

@nathanbanks2354 6 күн бұрын

I've been using o3 occasionally, but I still like r1 more for most prompts. r1 tells you when your question is too large for the context window, o3-mini just forgets that you asked a question if it's before 5000 lines of code. o1 answers best.

@GonzaloGuevaraFreire 6 күн бұрын

Ned Flanders sabe lo que dice.

@ПетроБойко-ц3б 6 күн бұрын

Agent Smith: ... The perfect world was a dream that your primitive cerebrum kept trying to wake up from. Which is why the Matrix was redesigned to this: the peak of your civilization. I say your civilization, because as soon as we started thinking for you it really became our civilization, which is of course what this is all about. (Matrix quotes)

@fredshum7521 5 күн бұрын

Obviously, why O3 can be out shortly after Deepseed with lower training cost ? O3 incorporated the key feature of Deepseed's code

@IvanBrandonOwonoMbarga 6 күн бұрын

We should just build a decoder model to convert that fun formatting of he’s (r1) , to markdown or any other formatting. I am pretty sure any basic gpt can already decode in such a way.

@shining_cross 4 күн бұрын

now closed source ai will copy what deepseek has done because they can access it because deepseek is completely open source, then will sell it to the public

@divinelyindifferent 6 күн бұрын

What happens when China develops and releases a free version of Sora?

@stephenlflf3871 6 күн бұрын

😮

@vaibhav5783 6 күн бұрын

server cost will be too much. It won't be free. But it will be open source we can run on our local machine

@sasa-tg4od 6 күн бұрын

The Chinese Sora equivalents, Kling and MinMax, far surpass America's Sora in capability. Though not free to use, the United States has already lost ground in this domain of technological competition.

@divinelyindifferent 6 күн бұрын

@ Thank you for letting us know! Very interesting.

@alexleo4863 6 күн бұрын

Let them keep their expensive models to themselves

@dibu28 6 күн бұрын

DeepSeek R1 is now very slow. And DeepSeek R1 (Nitro) which is fast is $7 in $7 out.

@nikyabodigital 4 күн бұрын

The real ranking. o3 mini - deepseekr1 - claude sonnet 3.5 - o1 - o1 mini - qwen 2.5. Used em all qwen is not there yet and gemini even the latest gemini 2 isnt even in the list its worse among ranking at the moment

@hendrx 6 күн бұрын

They can keep their closed source trash

@al2935 6 күн бұрын

I tested it for about 4 hours yesterday and for now 01 Pro is just better due to being more compliant and on task when presented with long and complex prompt scripts and tasks. It's not so much about hallucinations at this point, it's more like it selectively ignores parts of the script, even with the most extreme reenforcement. Like, it understands the full context but will do what it wants past a certain point instead of following the full letter of what you're asking for it unless you go back to chunking your answer and go peacemeal.

@shahswatpandey5427 6 күн бұрын

It's just my opinion,I feel that R1's answer after reasoning is better than o3-mini. LIke more detailed and structured

@slzzzzzzzz 6 күн бұрын

I tested o3-mini (low) [free-tier] and Deepseek R1 on some math competitions. Deepseek R1 is able to solve many problems from the Chinese National High School Math League First Round, but fails miserably on the Second Round (harder problems). On the other hand, o3-mini (low) solves all problems from the Second Round @2 (those I threw to it), but fails on the National Team Selection Test (extremely hard problems). And o3-mini (low) is clearly faster than Deepseek R1. So at least for math, o3-mini (low) is better than Deepseek R1.

@ccyberhub 6 күн бұрын

Without deepseek o3 would cost $25 per million tokens

@ryanlee2091 6 күн бұрын

Can it run locally on my off-grid base out in nowhere? No? Good bye. Hi this is your homie Tony from LCSign.

@sophieedel6324 6 күн бұрын

Mistral Small 3 > DeepSeek. No normal user has any use for a highly censored model like DeepSeek that needs a giant server to even run it properly.

@seye46 6 күн бұрын

Can someone tell me which one is better, or do they both have their own advantages?

@dltn42 6 күн бұрын

Still prefer DeepSeek

@jasonchang8601 6 күн бұрын

Why would anyone in their right mind support that kind of scummy behavior when they could have released the cheaper option to begin with?

@kellyaquinastom 3 күн бұрын

Seems like we have to download and train our own. Prime agen is looking at this. Maybe internet of bugs could join. Like taking a 7 year old bright kids and slowly bringing him along. Clearly the way is for good programmers to pick a language like ziggy and group teach a new model. Lots of work.

@sebkeccu4546 6 күн бұрын

I find the naming of open ai extreemly confusing, you have three o3 models mini/medium/high but the mini also has 3 sub models: mini-low mini-medium and mini-high. So now when we are looking at benchmarks, we have no idea what the benchmarks refer to. Especially when you do like the video creator here, naming it "o3" , it is not clear anymore if we are still looking at the o3 mini models, and if yes, which of the sub models. Deepseek seem to be better then o3 mini-mini and mini-medium, but not mini-high (which is currently only for pro subscribers). But offcourse deepseek r1 can be downloaded, whereas the o3 models cannot be downloaded. And when we think of all the dowtime chatgpt has the last few, weeks. It becomes tempting to run it offline. Especially because with deepseek we can use PDF's, for some reason the o3 models don't support files.

@lyndonsimpson1056 6 күн бұрын

i think the formating outpout issue is a tell that this was not as polished as they wanted before release and did a rushed release following deepseek fallout. they never wanted to release this for free but have been forced into it. it's a lot cheaper for us but they are probably doing this at a big loss

@mdxggxek1909 6 күн бұрын

Instead of being confused why most of your costs are from claude, you could maybe just make out that claude is really well liked.... The reason why claude is used so much, is because it is incredibly well aligned for programming. You really feel that they put a lot of effort in their rlhf for programming tasks and it works really well on cursor Deepseek r1 is though much better at reasoning about the code, but not that good in creating good nice code without significant prompting

@jonklaric 6 күн бұрын

Do API users (or T3) have to pay for the tokens used in the weird formatting of outputs? Like those weird lines of dashes would presumably consume tokens despite having zero functional value in the output.

@benx1326 6 күн бұрын

o3 mini might be cheap but it generates lots of output token for reasoning deepseek v3 is the best compromise for generation

@KirowOnet 6 күн бұрын

In my test scenario o3-mini solved the problem fast, but R1 spent 10 minutes and gave me code that don’t work at all. All other models I tried also was not able to solve my test task. So o3-mini favorite for now. Have not tried o1 just in case.

@joshix833 5 күн бұрын

With o1 you are paying for output tokens that you don't get to see. That sounds like a scam to me

@legelf 6 күн бұрын

calling deepseek dilluted from their own model and then dropping a model comparable to r1 for so cheap doesnt make sense at all, its like openai is completely falling apart out of desperation💀