The full OpenAI o1 is MUCH better than o1-preview

3,976 views

The Feature Crew

Days ago

Comments: 28

@gabrielsandstedt 15 days ago

Great video! Thank you very much! :)

@TheFeatureCrew 15 days ago

Glad you liked it! Let us know if you want anything else covered in the future! -Dylan

@henrismith7472 15 days ago

How AGI does AGI need to be before we call it AGI?

@jimj2683 15 days ago

It has to be able to plan, reason, fantasize, act and learn continuously. And it has to be adaptable so it can be put in a robot of any type and figure out how to move and do what it wants.

@TheFeatureCrew 14 days ago

Asking the real questions 🤔 - Jakob

@sangeetanarendrasingh5416 15 days ago

In MLE-Bench (probably the most important benchmark), o1 (24%) performs significantly worse than o1-preview (37%) and only slightly better than GPT-4o (18%)

@clarityhandle 14 days ago

Hey, where'd you get your source for the full o1 scores on MLE-Bench? I can't find it anywhere.

@sangeetanarendrasingh5416 14 days ago

@clarityhandle it was in the system card paper

@clarityhandle 14 days ago

@sangeetanarendrasingh5416 thanks got it

@OctavChelaru 15 days ago

Odd view, but I can't help but find people's general reaction to the 200 bucks price point strange. Sure, you might not be used to paying that much for AI, but if you can make or save at least 400 then it's a no-brainer. In reality, people looking to get Pro are likely making serious bank with AI and can't feel the $180 difference compared to the upside.

@TheFeatureCrew 14 days ago

Will do a video looking into pro soon 👀 - Jakob

@OctavChelaru 14 days ago

@ Brilliant, looking forward to it. Will run it through its paces today myself, got some heavy stuff to throw at it.

@hamandchees3 15 days ago

It's quite good at NYT Connections

@calvingrondahl1011 15 days ago

Thinking… Try not to spook the humans… more thinking.

@torch_boy 17 days ago

First, if the first comment didn’t exist

@TheFeatureCrew 16 days ago

😂

@unimposings 14 days ago

OPENAI = RICHKIDSAI

@TheFeatureCrew 13 days ago

😂

@IdPreferNot1 14 days ago

ToL is so weak... It is referencing the classic training-corpus scenario with different-sized rings... Trying to work a memorized problem that isn't there rather than understanding the situation... Classic LLM fail. How can LLM devs not know this?? Dumb model with TTS will surely solve.

@mohammedhanifa8259 17 days ago

first view

@TheFeatureCrew 17 days ago

Thanks for watching, Mohammed! - Jakob

@edmilinski1295 15 days ago

Lot of empty chatting. "It seems like", "it looks like"...

@jaysonp9426 14 days ago

No chapters no view

@Kiki-qh7xk 15 days ago

Claude.ai Sonnet 3.5 still better for coding

@ain3sh 15 days ago

True, but on the web interface it has consistently been finicky about instruction adherence, with a sharp intelligence falloff even on medium-length chats and horrendously small chat size limits (especially with files, with no option for RAG or removing files from context to save tokens). So I still find myself going to ChatGPT but never want to. I can see those glimpses of almost genuine brilliance in 3.6 Sonnet, but the inconsistency seriously damages my overall sentiment.

@TheFeatureCrew 15 days ago

This is on our list to test 👀 - Jakob

@yoagcur 14 days ago

In my experience, it is superior to Claude for creating a website I am working on

@DistortedV12 13 days ago

No it’s not