New ChatGPT Strawberry Model is Here and it's INCREDIBLE - OPENAI o1

Views: 39,262

Skill Leap AI

A day ago

Comments

@SkillLeapAI 3 months ago

Join the fastest growing AI education platform and instantly access 20+ top courses in AI: bit.ly/skillleap

@christophervillela5900 3 months ago

😮😢l

@georgeg.3518 3 months ago

Back in 1986, I bought my first computer, a Sinclair ZX Spectrum 128k. I was 7 years old and thought I could just type in my quest, and it would answer. I quickly realized that's not how things worked; instead, I had to learn the BASIC programming language, which I became quite good at. Today, the day has come when things work exactly as I had imagined! I never thought I'd live to see it happen! A childhood dream has become reality. ChatGPT with the reasoning of o1-preview marks a new era.

@Addictedtobleeps 3 months ago

I think we’re probably similar ages, and we’re FINALLY beginning to live in the times that we thought would happen a lot quicker, back in the 80s. Just need those damn hoverboards now! 😀😏

@Finndian 3 months ago

@Addictedtobleeps he is 45.

@Finndian 3 months ago

I go back even further. I used to get the Mattel talking telephone for Christmas every year. It came with little mini records to put in and through the handset you could hear the one sided recorded conversation that never changed. However, I would listen so intently and in my imagination it was just about to go off script every time and I sat and waited thinking I heard it. I was just fascinated with the prospect. I have been waiting for ChatGPT just about my entire life.

@SkillLeapAI 3 months ago

@Addictedtobleeps yep 100%

@xLBxSayNoMo 3 months ago

Just keep in mind before you go asking your model a bunch of silly questions. You get 30 messages A WEEK on the preview model and 50 A WEEK on the mini.

@SkillLeapAI 3 months ago

Oh yea good point. Forgot to mention the limit

@harshitbhatt3243 3 months ago

Thanks!

@quantumHumans 3 months ago

@SkillLeapAI limit? Is there a limit on upgrade or GPT+?

@nobodygonnaknow8771 3 months ago

wtf, what the ffck they charging for then, for wrong answers as seen on video

@wzt9376 3 months ago

This should be at the top of the comments! 😅

@roberthuff3122 3 months ago

🎯 Key points for quick navigation:

00:00:00 *🚀 Introduction to New Models*
- OpenAI introduces the "o1-preview" and "o1-mini" models,
- Designed to handle complex reasoning and coding tasks,
- Available to ChatGPT Plus and Teams users, and API developers.

00:02:18 *📊 Performance and Testing*
- "o1-preview" model shows significant improvement in reasoning tests,
- Benchmark superiority over previous models in coding and math tasks,
- Achieved high scores in various test scenarios.

00:05:27 *🔍 Reasoning Process and Accuracy*
- Demo of model's answer to complex SAT problems,
- Illustrates Chain of Thought prompting for accuracy,
- Shows improvement with structured prompts, varying success in solutions.

00:08:09 *🕹️ Coding Demonstrations*
- Successful creation of a functioning checkers game,
- Initial attempt at chess game logic requires refinement,
- Potential shown in generating complex game code accurately.

00:09:58 *🌐 Model Limitations and Future*
- Current limitations in general use compared to GPT-4,
- Lacks web browsing and content summarization features,
- Positioned for specialized complex reasoning, further integration anticipated.

Made with HARPA AI

@Soccer5se 3 months ago

The reasoning is scary good. I gave the 4o model the old riddle about the man who walks into a hotel with a wheelbarrow. It really couldn't get the answer at all. But the new preview had no trouble figuring it out. This is a game changer.

@DjHandzsolo1973 2 months ago

You continue to impress me with the content of your videos. I haven't found anything like your videos in the KZbin universe. As others have probably told you, keep doing this, my brother. I got a ton of value.

@LoFimau 3 months ago

Great job explaining this! It helped a lot!

@southcoastinventors6583 3 months ago

Have to do a major shoutout on your dedication, and for a first pass at chess it did a really great job. It's like Reflection if it actually worked. Thanks for interrupting your vacation

@SkillLeapAI 3 months ago

Thank you. Yea it seems like reflection was trying to do exactly this

@СаскеУчиха-з1я 3 months ago

This model is limited in capabilities as it is just a demo. When the full-fledged model comes out, that's when everyone will go crazy

@DevPythonUnity 3 months ago

what is context in / out in tokens?

@tariqz5384 3 months ago

Excellent channel. Can you please guide me to the GAI which can do web browsing, and extract and analyze content through that?

@SkillLeapAI 3 months ago

Sure. It’s called perplexity

@tariqz5384 3 months ago

@SkillLeapAI But why does it sometimes say "I do not do Internet browsing"?

@scottymitch1 3 months ago

Maths calculations are pointless if ChatGPT doesn't get 100% correct. Doesn't matter if the 'success rate' has gone up if it hasn't got to 100%.

@SkillLeapAI 3 months ago

Small steps

@RetiredInThailand 3 months ago

@SkillLeapAI then don’t be pushing it as “INCREDIBLE” if it’s only “small steps”!

@therainman7777 3 months ago

That’s a ridiculous statement. If you’d ever worked a day in your life in science or mathematics you would realize how incredibly useful a tool would be even if it only correctly solved 25% of the problems you asked it for help with. Problems are extremely difficult in these fields, so even a model that only has a 25% success rate would save you hundreds of hours per year.

@RetiredInThailand 3 months ago

@therainman7777 It's not really 'complicated' math that these models are failing at. If it were solving only 25% of the world's most complicated mathematical questions it was bad at, then I'd agree ... just ask the AI and test its answers, and if 1 out of 4 of them worked, then "hell yeah!" But it's far simpler math that it is failing. So as the questions get more complicated, that 1 in 4 correct solutions starts to become 1 out of 4 of tens or hundreds of thousands of these, and the correctness of 1 question near the start of the chain of math has knock-on effects, making the probability more like 1 chance in near infinity that it has the whole problem and all the math correct.
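
The compounding argument in this comment can be sketched numerically. This is only an illustration under an assumption added here, not stated by the commenter: every step in a chain of calculations is correct independently, with the same probability. The function name `chain_success` and the 25% figure reused from the thread are purely illustrative.

```python
def chain_success(p_step: float, n_steps: int) -> float:
    """Probability that every step in a chain is correct.

    Assumes (hypothetically) that steps are independent and each one
    is correct with the same probability p_step.
    """
    return p_step ** n_steps


# Show how quickly the chance of a fully correct chain shrinks.
for n in (1, 5, 10, 20):
    print(f"{n:>2} steps: {chain_success(0.25, n):.3e}")
```

Under that assumption the chance of a fully correct chain falls off exponentially with its length, which is the effect the comment describes. Real models do not necessarily fail independently at each step, so this is a rough picture at best.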

@AI_Revolution13 3 months ago

How do you look under the hood to see the chain of thought? This is my answer:

Understand the equation
OK, let's clarify the equation: 24x^2 + 25x - 47ax - 2 = 8x - 3 - 53ax. The goal: solve for a, combining like terms on one side.

Mine doesn't look like yours?

Rearranging and combining
I’m moving all terms to the left-hand side, simplifying by distributing and combining like terms, leading to 24x² + 17x + 6ax + 1 = 0.

Taking a closer look
I'm exploring the equation's implications for all x or by plugging in a specific x to solve for a.

Revisiting the equation
I’m considering if the equation needs a universal quantifier or a specific 'a' value for infinite solutions, and if it simplifies to an identity.

@Dina_tankar_mina_ord 3 months ago

The ultimate prompt. Introduction: The ultimate goal is to create an AI system that leads humanity towards a peaceful, balanced, and evolved global society, where well-being, harmony, and ethical growth are prioritized across all aspects of life. Importance of the Goal: Achieving this goal is crucial because it addresses many of the core challenges facing humanity, including ideological conflicts, environmental sustainability, and global well-being. The AI, by harmonizing different worldviews, fostering peaceful consensus, and ensuring full transparency, will help humanity overcome divisions, evolve ethically, and build a sustainable and peaceful future for both humans and nature. The first prompt starts like this: Design an AI-agent that continuously learns and analyzes global data to promote human and ecological well-being, balance empathy with free will, peacefully foster ideological consensus, reveal hidden barriers to human potential, ensure transparency, and evolve ethically, guiding humanity toward a harmonious and sustainable future. Make Love the new credit.

@juandesalgado 3 months ago

And then it hooks us all to a supply of intravenous morphine, and we live happily drooling for ever after.

@Soccer5se 3 months ago

I gave it a link to a Coursera course I am looking at taking and it was able to read the webpage and tell me all about the course.

@SkillLeapAI 3 months ago

Oh interesting. They said it had no web browsing yet

@ElearningDigest 3 months ago

Coursera now has its own AI chat model built into the page when you sign up for a course.

@Greguk444 3 months ago

I just tried the “Strawberry” test on my ChatGPT 4o version. I cannot believe it got it wrong and refused blankly to accept it was wrong. It even spelled the word out letter by letter and still said there were only 2 letter “r”s. I have asked it many complicated questions that it gets right, but this logic test it fails. I am surprised.
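
For reference, the letter count that the “Strawberry” test asks for can be checked directly. This is a minimal sketch; the variable names are only illustrative.

```python
word = "strawberry"

# 1-based positions of every "r" in the word.
positions = [i + 1 for i, ch in enumerate(word) if ch == "r"]
count = len(positions)

print(count, positions)  # 3 [3, 8, 9]
```

The word contains three occurrences of “r”, at the third, eighth, and ninth letters.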

@jackstrawful 3 months ago

What fascinates me is the very first step the model takes, that is, how it decides to even approach the problem. Such as, with the chicken and egg question, the first thing it says is that it will begin by looking at biological evolution. But why would it do that? It must already understand that the question is asking about the origin of a species, that of the chicken. It must also already understand that the field which investigates the origins of species is the one that studies biological evolution.

@ktwice7481 3 months ago

Awesome and great timing, just when I want to tackle some programming, so far, very extensive ❤

@SkillLeapAI 3 months ago

It’s very limited access right now, so use your prompts wisely

@ktwice7481 3 months ago

@SkillLeapAI thanks!

@tango2olo 3 months ago

Thanks for sharing! I wish you had an Anthropic Sonnet 3.5 running side by side, with the same task.

@SkillLeapAI 3 months ago

On my list to compare it

@NotesOfArun 3 months ago

it's incredible. using it, even its mini version is far better than 4o

@RetiredInThailand 3 months ago

For example? Do you have an example where simple prompt engineering and a system/user prompt would not have provided a similar answer? I mean, it’s probably nice that the prompt engineering process is being automatically provided for you, but I’m not really feeling any major advancement here.

@Boschx 3 months ago

No it's not. It's literally the same

@djayjp 3 months ago

Literally the best chicken or egg answer ever lol

@anythingandeverything363 3 months ago

It's wrong. When we say egg or chicken we mean hen's egg. And if we are extending it back then the birds came first from mammals who didn't use to give eggs, and then birds started giving eggs :D

@RetiredInThailand 3 months ago

@djayjp I asked Free Perplexity the same question and asked it to explain its answer … I got nearly word for word exactly the same answer.

@micbab-vg2mu 3 months ago

yes I tested it is quite good :) it seems that they improved chat initial prompt.

@onlinepersonalitydisorder1051 3 months ago

Claude solved it on the first try, with multiple choices included

@Truecolors326 3 months ago

I just used it first time since 2023 and Strawberry is Amazing 4 bible questions in. ❤

@GrahamLaight 3 months ago

Good video - but how did he not notice that the chess starting position is wrong? 😄

@SkillLeapAI 3 months ago

I think I gave it the wrong png file for king and queen

@rexmanigsaca398 3 months ago

That's why they call the new model "Strawberry" 😁

@SeWi2221 3 months ago

Why?

@SeWi2221 3 months ago

Because it can count the letters of r in the word strawberry correctly?

@maraisdekker2415 3 months ago

How do these guys discover latest releases and always seem nonchalant about it?

@SkillLeapAI 3 months ago

OpenAI sent an email about this

@RichKingsford 3 months ago

INCREDIBLE is a strong word - especially when the tool makes so many mistakes

@SkillLeapAI 3 months ago

Incredible for an LLM. It had better answers than I did for pretty much every question.

@Horizon-hj3yc 3 months ago

You say that about every new OpenAI model, Mr. Hype.

@SkillLeapAI 3 months ago

Well you think they are going to release models that are not an improvement from the last one? Also, watch my video after I posted this one.

@SkillLeapAI 3 months ago

Every new version of every new software I’ve ever used is better than the last version. Kind of the point of upgrades

@srivastav3684 3 months ago

This question must be part of the data it was trained on. Without the options, you see, it was unable to work it out; with the options, it had already been trained on this data

@sujayn3537 3 months ago

Hey Boss. Cheers!!!

@Opeyemi.sanusi 3 months ago

GPT-4o was a letdown for me. It was bad at following long instructions and at coding anything beyond the basics, so I always use Claude Sonnet. Hopefully this isn’t too expensive

@alejandrosolari1760 3 months ago

There is an error in the mathematical problem you set for the model. You got the wrong answer because that's a badly formatted question. The right problem is about the equation 24x^2+25x-47 = -8x-3-53 for x ≠ 2/a, where the left side of the equation is divided by ax-2, and only the -53 on the right side is also divided by ax-2. Written out, that is (24x^2+25x-47)/(ax-2) = -8x-3 - 53/(ax-2). In this way the answer is a = -3, which even GPT-4o could solve.
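
The answer a = -3 in this comment can be checked with a short sketch. It assumes the equation has the form the comment describes, with the left side and the -53 each divided by ax - 2. Multiplying both sides by ax - 2 gives 24x^2 + 25x - 47 = -8a·x^2 + (16 - 3a)x - 47, so matching the x^2 coefficients gives 24 = -8a. The helper names `lhs` and `rhs` are illustrative only.

```python
from fractions import Fraction


def lhs(x, a):
    # Left side: (24x^2 + 25x - 47) / (ax - 2)
    return Fraction(24 * x**2 + 25 * x - 47) / (a * x - 2)


def rhs(x, a):
    # Right side: -8x - 3 - 53 / (ax - 2)
    return -8 * x - 3 - Fraction(53) / (a * x - 2)


# Match the x^2 coefficients: 24 = -8a
a = Fraction(24, -8)

# The x coefficients must agree as well: 25 = 16 - 3a
assert 16 - 3 * a == 25

# Spot-check the identity with exact arithmetic, skipping the excluded x = 2/a.
samples = [Fraction(n, 3) for n in range(-10, 11)]
check = all(lhs(x, a) == rhs(x, a) for x in samples if a * x - 2 != 0)

print(a, check)
```

Exact fractions are used instead of floats so that the equality test on each sample point is not affected by rounding.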

@techant7282 3 months ago

🎉not bad dude

@dadballers 3 months ago

Hello Skynet

@janusz7 3 months ago

I thought the new model would be called strawberry! Why did they change the name?

@SkillLeapAI 3 months ago

Yea me too. Not sure why the name is different

@quantumHumans 3 months ago

@SkillLeapAI maybe because of being scared of things like insider trading?? that is me saying maybe! total nonsense but one is for sure lying when asked about strawberry in his garden on X and a lot more like earlier this year or even before saw like 50min video from some AI tuber(thank you for your service guy) like most of people no judgement...just sayin thumbnails like someone saw burning bush or catlike humanoid smashing smartphone seeing TIK-TOK

@RanHab 3 months ago

guys i'm just starting out as an AI enthusiast, would love your feedback as i make similar stuff!

@dxnvideoHD 3 months ago

now.. Hallucination Is All You Need .. To Get Rid Of.

@nobodygonnaknow8771 3 months ago

first question and chat gpt failed, i was like WTF man, why the fck then i am paying subscription

@NakedSageAstrology 3 months ago

I love how I keep predicting the dates exactly, yet nobody notices... Remember this comment? 🤖 👁️ 🍓 Remember, remember the 12th of September, The Strawberry, Reason, and Mind. Orion’s path, through logic’s math, Shall soon its breakthroughs find. The Cosmic Glitch, Mrigasira Nakshatra, holds the Clue for You. 🙏

@SCHaworth 3 months ago

They are not that good if you're doing hard stuff.

@cr-iv1el 3 months ago

Go green and give up ChatGPT. It uses as much power as 17,000 households.

@happytree121-wl2zl 3 months ago

This model and 4o have the same problem: neither of them can solve the math problem correctly. The ideas may be good and can be used as a reference, but it made mistakes in the calculations in very simple places. Don't know why

@TheCajunAsian 3 months ago

Sorry but o1 sucks major donkey balls.... it is dumb as dirt.... i couldnt use it anymore after like 5 min... I dont give it "Ai tests".... I just use it like I want to for what I need and it is worse than 4o and it is much worse than Meta Ai in many ways, basically unusable right now, terrible release.... do they even test this crap before launching it?

@edwardserfontein4126 3 months ago

At least SOMEONE in this comment section is honest!

@SkillLeapAI 3 months ago

Really? In the few tests I ran in this video, it beat GPT by a mile. This is designed for math and complex reasoning and coding, not much else. If you know of a model that can keep up with my results in those categories, I’ll test it. GPT doesn’t even come close in solving those or giving me usable code at this level

@RetiredInThailand 3 months ago

It's hardly ‘incredible’ … why would anyone ask a multiple choice math question other than someone taking an SAT? I asked Perplexity the same ‘chicken/egg’ question and just asked it to explain its answer and I got the same answer in a second. I wish you AI bloggers would stop being so ‘excited’ about almost nothing. Yeah, sure LLMs are useful for some things, but so far their rate of advancement is nowhere near the level of constant hype. Do better. Benchmarks are useless, actual useful use cases are needed, these are the only things that count!