Can ChatGPT o1-preview Solve PhD-level Physics Textbook Problems? (Part 2)

  32,555 views

Kyle Kabasares

Күн бұрын

I test ChatGPT o1 with some more (astro)physics problems I solved in graduate school. This time, I pick a set of problems that were hand-crafted by my professor, meaning that the probability that these problems exist on the internet is slim. Needless to say, the results were surprising.

Comments: 552
@jdsguam
@jdsguam 4 күн бұрын
It got the right answer in 5 seconds. To say it went a little overboard and took unnecessary steps is silly to me. It took only 5 seconds! Do you know of any human on earth, living or dead, who can get the right answer in 5 seconds and have it all typed out in a clear format, with explanations of the thought process?
@mAny_oThERSs
@mAny_oThERSs 3 күн бұрын
Yeah
@ian_silent
@ian_silent 3 күн бұрын
The reason it solves the problem in a roundabout way is that it can only think about what it writes down. Large Language Models are text predictors; they reason only to the extent that they can predict the next sequence of letters. Even the new o1 model works like this. The key difference that makes it better is that it is more iterative. But this iteration still requires it to write everything down. It thinks through text prediction.
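For the curious, here is a minimal sketch of that text-prediction loop, using the small open GPT-2 model as a stand-in (o1's internals aren't public); it assumes the transformers and torch packages are installed:

```python
# Minimal sketch of greedy next-token prediction with a small open model.
# GPT-2 is a stand-in here; o1's internals are not public.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("The scattering cross-section is", return_tensors="pt").input_ids

for _ in range(20):                          # generate 20 tokens, one at a time
    with torch.no_grad():
        logits = model(ids).logits[0, -1]    # scores for the next token only
    next_id = torch.argmax(logits)           # greedy: take the most likely token
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0]))
```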
@denjamin2633
@denjamin2633 2 күн бұрын
​@ian_silent It's downright goofy that a text autocomplete on crack is able to solve high level equations like this. Truth is stranger than fiction sometimes.
@adamsigel
@adamsigel 4 күн бұрын
You’re saying that it didn’t do it your way, but that’s a good thing. One of the things we should expect is new and novel ways of solving things.
@KMKPhysics3
@KMKPhysics3 4 күн бұрын
Yes! I agree with you, I'm glad it showed me a new way of doing things, though it did make it harder for me to evaluate. I definitely would like to use it more to improve my physics skills!
@sdfrtyhfds
@sdfrtyhfds 4 күн бұрын
From the perspective of a professor, if you give a long, indirect solution, it's not as good.
@Ou8y2k2
@Ou8y2k2 4 күн бұрын
@@sdfrtyhfds If it is truly reasoning, it'll only get better.
@NostromoVA
@NostromoVA 4 күн бұрын
AlphaGo did the same thing. The top human player was freaked out by its approach, and lost to it.
@mattiastengstrand8209
@mattiastengstrand8209 4 күн бұрын
What if you tell it to produce different solutions and pick the simplest one?
@sujoy1968
@sujoy1968 4 күн бұрын
I watched both parts. It is great that this content is created by an actual researcher. Greatly impressed by Kyle’s content and o1’s capabilities.
@KMKPhysics3
@KMKPhysics3 4 күн бұрын
Thank you, I really appreciate your comment! I’m not perfect when it comes to this kind of work, but I do like sharing what I’ve learned :)
@CodepageNet
@CodepageNet 4 күн бұрын
At the beginning of 2023 we were flabbergasted that AI could produce a more or less coherent chat. Now we're disappointed if it's not perfect on a PhD-level math test. This is pretty astounding in my book.
@jonp3674
@jonp3674 4 күн бұрын
Agreed the rate of improvement over 5 years has been insane.
@tzardelasuerte
@tzardelasuerte 2 күн бұрын
@@CodepageNet people are in denial and working overtime trying to discredit it. I know very smart people who continue saying it's just a database or that it's just copy pasting.
@Pils10
@Pils10 Күн бұрын
@@tzardelasuerte I mean, I get it. The problem with these types of neural nets is that we have little to no insight into how they actually operate. We just know that if we feed them a ton of high-quality reasoning data, they can produce high-quality reasoning as a result. Factually speaking they are not sentient, but some people may feel like they are. My main concern is that companies are going to stop hiring, or fire a lot of their workforce, since AI can do the work cheaper and faster (it doesn't even need to do it better or at the same quality level). The remaining people will have little to no leverage, since on a regional / global scale the demand for jobs will be astronomically higher than the supply of available jobs, and people need a job to live comfortably.
@tzardelasuerte
@tzardelasuerte 21 сағат бұрын
@@Pils10 why do you say factually they are not sentient? Sentient is whatever humans define as sentient.
@Pils10
@Pils10 21 сағат бұрын
@@tzardelasuerte I personally define sentience as someone doing things out of their own free will, not just reacting. ChatGPT and other LLMs are just reacting / responding to the input. For me, this isn't sentience. What do you think sentience is?
@jeffwads
@jeffwads 4 күн бұрын
It is laughable that some are saying this model is no big deal. I have thrown some tough questions at it and it got everything right. In fact, it had unique insights into those problems that I hadn't considered.
@Nnm26
@Nnm26 4 күн бұрын
This is a preview model, and the fact that it used base GPT-4 for RL and not GPT-5/Orion makes me think that within the next 2 years everything will change dramatically
@AAjax
@AAjax 3 күн бұрын
@@Nnm26 Absolutely. I think a lot of those negative opinions are based on it not doing better poetry, story writing, etc. None of that should be expected to improve with Q*. In a recent interview, Andrej Karpathy said he thinks that models can get a lot smaller and more capable with the right sort of training data, with step-by-step reasoning in it. OpenAI is reportedly using Q* to refine their synthetic data right now. If Andrej is right, OpenAI has entered a virtuous 2-year cycle, where refined data is used to train more capable base models, which are then steered with Q* to refine the synthetic data further.
@tzardelasuerte
@tzardelasuerte 2 күн бұрын
@@Nnm26 Not 2 years, a few months. The floodgates are pretty much being held back because of the elections.
@tomaszzielinski4521
@tomaszzielinski4521 2 күн бұрын
Guys who invested billions in AI saw it coming. Now everybody can have their personal team of PhD assistants at hand.
@seregv
@seregv 3 күн бұрын
By far the most insightful and useful experiment for assessing the capabilities of this new model on complex, non-coding-related problems. Thank you very much for sharing this!!
@pcdowling
@pcdowling 4 күн бұрын
I would start a new chat if the question is completely unrelated.
@SDelduwath
@SDelduwath 4 күн бұрын
Strongly agreed. He was filling the context with a bunch of unrelated stuff that is likely hindering its performance.
@TechnoMinarchist
@TechnoMinarchist 4 күн бұрын
Absolutely this
@KMKPhysics3
@KMKPhysics3 4 күн бұрын
Thanks for pointing that out. I will be sure to remember this for the next time I prompt it!
@rpraka
@rpraka 4 күн бұрын
super cool seeing your experiments, excited to see what o1 can do with your dissertation problems!
@KMKPhysics3
@KMKPhysics3 4 күн бұрын
Thanks so much! Hope you tune in next time :)
@Khari99
@Khari99 4 күн бұрын
The reason it's possible is that it's trained by rewarding it for learning the next step of reasoning needed to solve a problem. You take a set of physics and math problems and have it learn to project one step forward at a time until it gets the answer, and then it's able to learn how to reason generally across new domains. It's trained on perfecting the step-by-step process, so it's able to figure out new problems by assessing what is most likely the next step toward the solution.
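To make that concrete, here is a deliberately toy sketch of the "score candidate next steps, keep the best" idea; OpenAI has not published o1's training recipe, so both helper functions below are hypothetical stand-ins rather than real APIs:

```python
# Toy sketch only: candidate_steps() and step_reward() are hypothetical
# stand-ins, since OpenAI has not published how o1 was actually trained.
import random

def candidate_steps(partial_solution):
    # Stand-in for sampling a few possible next reasoning steps from a model.
    return [partial_solution + f" -> candidate step {i}" for i in range(3)]

def step_reward(partial_solution):
    # Stand-in for a learned reward model scoring a partial chain of thought.
    return random.random()

solution = "Problem: derive the scattering cross-section."
for _ in range(4):
    # Extend the chain one step at a time, keeping the best-scoring candidate.
    solution = max(candidate_steps(solution), key=step_reward)

print(solution)
```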
@EGarrett01
@EGarrett01 3 күн бұрын
This is our first glimpse of superhuman general intelligence. It's a problem that humans can solve, but it solves it exponentially faster. Soon, the reasoning for each answer will be far more in-depth, and it will happen at instantaneous speed. If this is 11 steps of reasoning in 15 seconds, imagine 11,000 steps in 0.25 seconds, like a chess engine, but for real-world problems.
@tzardelasuerte
@tzardelasuerte 2 күн бұрын
@@EGarrett01 they can already work out novel ideas and solutions. Google has already proved it in geometry. These companies already have advanced models in house and are just slow dripping us improvements.
@GH-uo9fy
@GH-uo9fy Күн бұрын
The physics problem is already alien language to me, not my field. I can just imagine if AI can discover new domains of knowledge that will be very hard to understand even for the smartest humans.
@atsoit314
@atsoit314 4 күн бұрын
This is absolutely insane. What a time to be alive.
@goldnarms435
@goldnarms435 4 күн бұрын
you aint lying. What will the next 10 years bring?
@KMKPhysics3
@KMKPhysics3 4 күн бұрын
Glad to know I wasn’t the only one surprised
@Ou8y2k2
@Ou8y2k2 4 күн бұрын
_Two minute papers_ YT channel fan?
@amirsafari7140
@amirsafari7140 4 күн бұрын
Labs around the world figured out the shapes of 200,000 proteins over all these years, and AlphaFold did 200 million in just months. I think the same will be true for mathematics; there will be no unsolved problems left anywhere 😅
@duduzilezulu5494
@duduzilezulu5494 3 күн бұрын
I can't believe it can do university level physics. Part of me doesn't even want to believe what it just did even though I saw it. Genuinely looking at this in awe.
@grimaffiliations3671
@grimaffiliations3671 3 күн бұрын
The scary part is that this is just the preview, and it's based on GPT-4. Who knows what a version of this based on GPT-5 will look like, as that is expected to be trained with 100 times more computing power. Then there's the massive Orion model that will come after that
@duduzilezulu5494
@duduzilezulu5494 3 күн бұрын
@@grimaffiliations3671 True. I think continuous development of o1 will possibly lead to AGI. It sounds like I'm exaggerating, but it is already doing science at university level, ACROSS MULTIPLE FIELDS.
@grimaffiliations3671
@grimaffiliations3671 3 күн бұрын
@@duduzilezulu5494 what a time to be alive
@duduzilezulu5494
@duduzilezulu5494 3 күн бұрын
@@grimaffiliations3671 Indeed, fellow scholar.
@mirek190
@mirek190 3 күн бұрын
@@duduzilezulu5494 That level of understanding is not AGI; that is clearly ASI. AGI should be at the level of an ordinary human... can an ordinary human do that??
@harlycorner
@harlycorner 4 күн бұрын
The next time you do this, please start a new empty chat for each new problem.
@AlexisLionel
@AlexisLionel 4 күн бұрын
Thank you for such a useful video! Really impressive model - I had to resort to using the standard GPT 4o to come up with tasks in various domains difficult enough to challenge o1 preview. By the way, a possible reason why it went for such a convoluted solution in Problem 1 might be that you put it in the old chat/conversation from your previous video. And because you had much harder Jackson problems prior to this, the model kept all of them (and its reasoning steps) in context while answering a much easier Problem 1 from this video. So it might have assumed that the difficulty level would be comparable. For this reason I try to start a new chat for new topics/problems - and it also saves Microsoft/OpenAI compute resources as the model doesn't have to keep all the previous context in its head :D
@KMKPhysics3
@KMKPhysics3 4 күн бұрын
Thanks so much for watching and for that advice!
@TechnoMinarchist
@TechnoMinarchist 4 күн бұрын
It wouldn't have assumed equal difficulty. It's just that LLMs try to match their prior context in terms of complexity, tone of speech, and length.
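For reference, a minimal sketch of the "one fresh chat per problem" workflow through the API, as the comments above suggest (it assumes the official openai Python package and an OPENAI_API_KEY in the environment; the problem strings are placeholders):

```python
# Each problem gets its own request with its own messages list, so nothing
# from an earlier problem (or its reasoning) leaks into the next one.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

problems = [
    "Problem 1: derive the differential scattering cross-section for ...",
    "Problem 2: estimate the gravitating mass enclosed within ...",
]

for problem in problems:
    response = client.chat.completions.create(
        model="o1-preview",
        messages=[{"role": "user", "content": problem}],
    )
    print(response.choices[0].message.content)
```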
@Kannatron
@Kannatron 4 күн бұрын
Just started the vid, but I am so confused why people think (almost unanimously) that "AI" won't advance to a point where it can do everything better than us. There is nothing particularly special about the human mind that can't be represented in a computerized neural network. The only limit is the number of synapses in the human mind. If either transistor counts get high enough, or we somehow get quantum computers to model a neural network, who's to say we can't figure out AGI? The financial incentive to have humanoids fuel the growth of a country's GDP is far too high for people to give up or for the funding to "run out". I truly believe we all (academics/students) should be working on helping/advancing the development of human-level machine intelligence. The benefits to society long term are too great to cower away from just because you might lose your job in the short term.
@bobthebuilder8788
@bobthebuilder8788 4 күн бұрын
I think the dilemma people have is that the current iteration of LLMs is all based on imitating training data, and it seems unlikely that such an architecture can advance the state of knowledge/science rather than just regurgitating known solutions. Another big factor is that people are starting to realize the limitations of these systems.
@andydataguy
@andydataguy 4 күн бұрын
Lack of imagination and critical thinking
@-BarathKumarS
@-BarathKumarS 4 күн бұрын
What's the point of millions of students studying STEM courses then? If AI is already good enough to replace PhD-level folks (in a few years, probably), then there is no need for universities, and the whole concept of education existing makes no sense.
@andydataguy
@andydataguy 4 күн бұрын
Humans have been the "most intelligent" species on the planet for a very long time. The idea of that being challenged is uncomfortable for most. Especially since in order to actually get the fullest potential of this technology a person would have to have highly technical knowledge. That means less than 1% of the population will be able to push these things to their limits (e.g. swarm networking, automated eval, real-time retrieval, etc)
@brutexx2
@brutexx2 4 күн бұрын
@bobthebuilder8878 I think you got it spot on there.
@ertwro
@ertwro 4 күн бұрын
I can’t see how this wouldn’t help educate better physicists if used properly. It could save weeks of work at a time. My condolences to teachers who want to avoid students cheating.
@KCM25NJL
@KCM25NJL 4 күн бұрын
Perhaps this AI age is going to move us all away from becoming experts in a domain, to experts in asking the right questions. Academia has taken us to the point of not requiring academia, which I think is both a frightening prospect and an exciting one in equal measure.
@kairi4640
@kairi4640 4 күн бұрын
Honestly, if the singularity happens, learning might not even be a thing anymore. People might just simultaneously know everything, like a hive mind. 💀
@KMKPhysics3
@KMKPhysics3 4 күн бұрын
Yes, I think this has enormous potential to improve everyone's ability to learn not just physics, but any subject that someone wants to improve in, really. Perhaps these models can't come up with novel answers to unsolved problems yet, but having a companion like this while one works is game-changing for sure.
@blubblurb
@blubblurb 4 күн бұрын
I think it will make us worse, unfortunately. We are lazy by nature, and we only gain skill and knowledge through work. If the AI does the work for us, I think we lose the skills.
@quantumspark343
@quantumspark343 4 күн бұрын
Extrapolating answers from similar studied questions is literally what humans do in tests lol
@KMKPhysics3
@KMKPhysics3 4 күн бұрын
This was trained using reinforcement learning I believe, so kind of cool to see it do it in real time!
@TheThoughtfulPodcasts
@TheThoughtfulPodcasts 4 күн бұрын
So ?
@quantumspark343
@quantumspark343 4 күн бұрын
@@TheThoughtfulPodcasts It's funny how people act like it's cheating when AI does the same
@lac5187
@lac5187 4 күн бұрын
I feel like a Neanderthal with a computer in my hands. I know the incredible potential, but I don’t know what to do with it
@KMKPhysics3
@KMKPhysics3 4 күн бұрын
Trust me, I don't feel too different than what you've described
@llsamtapaill-oc9sh
@llsamtapaill-oc9sh 4 күн бұрын
Btw this is o1 preview and openai has confirmed the next model will drop next month which will be o1 full release. It's apparently 30% better than the current o2
@vickmackey24
@vickmackey24 4 күн бұрын
Current o2? 😳 Was that a typo, or is that some other model I'm not aware of?
@MaJetiGizzle
@MaJetiGizzle 4 күн бұрын
⁠@@vickmackey24It’s a typo. They meant to say o1 vs o1-preview, which is the model we’re seeing in this video.
@KMKPhysics3
@KMKPhysics3 4 күн бұрын
30% better? Oh boy, this is going to get interesting.
@mirek190
@mirek190 3 күн бұрын
@@KMKPhysics3 ..and next year we get "orion" ...
@tzardelasuerte
@tzardelasuerte 2 күн бұрын
@@llsamtapaill-oc9sh most likely after elections.
@famnyblom6321
@famnyblom6321 4 күн бұрын
Why are you not clearing the context before those tests? Having all that previous context will likely confuse the model or degrade its results, no?
@Linshark
@Linshark 4 күн бұрын
You are right, he should do that.
@pc_screen5478
@pc_screen5478 3 күн бұрын
Sometimes keeping context where the model had a good initial approach to the first request can help it stay consistent with that approach for subsequent messages so there's that
@mirek190
@mirek190 3 күн бұрын
Did it make a mistake? No... so what is your problem?
@mirek190
@mirek190 3 күн бұрын
imagine that o1 preview is not full o1 version yet....
@mrshankj5101
@mrshankj5101 4 күн бұрын
o1-preview is astounding and i hope it gets smarter!
@SNP2082
@SNP2082 4 күн бұрын
The full o1 is way better than o1-preview, though it hasn't been released yet
@mihirvd01
@mihirvd01 4 күн бұрын
@@SNP2082 It's gonna be just "o1"
@percy9228
@percy9228 4 күн бұрын
There are so many questions I have. How on earth are we going to test anyone below graduate level if this tech is able to pass graduate level? You can't catch this; all you have to do is ask it to write the answer in a different way. I'm sure it's able to think of more ways to present the solution than most teachers. What is going to happen in another year, another 2 years, another 5 years? There is still much more room to get better. This is like every person having a professor help them learn, and it's going to get better.
@Arcticwhir
@Arcticwhir 4 күн бұрын
lol... without electronic devices, like we have been doing for like 20 years now. I really don't understand people's worry about not being able to test students. Honestly though, many engineering schools take an open-book approach (or part of the book). Believe it or not, it's still possible to fail open-book tests; I've seen it firsthand. I don't know about you, but I've had math tests in HS/beginning of college where no calculators were allowed, or sometimes only a simple calculator.
@netscrooge
@netscrooge 4 күн бұрын
We shouldn't merely picture adding AI to the old educational paradigm. As students are tutored individually by AI, the system will develop an intimate understanding of each student's capabilities. Learning itself will be the test.
@nocodenoblunder6672
@nocodenoblunder6672 4 күн бұрын
@@netscrooge Why learn when your knowledge is never going to be useful for something productive? Human Learning is going to be a hobby at most.
@netscrooge
@netscrooge 4 күн бұрын
@nocodenoblunder6672 Sorry, I forgot that many view education as merely a means to career advancement. Thanks for bringing me back to reality.
@nocodenoblunder6672
@nocodenoblunder6672 3 күн бұрын
@@netscrooge Humans aspire to be useful. That doesn't mean you are only doing it for that reason, but I think for most it's at least a part of it: being able to use your craft to give value to others.
@DanielSeacrest
@DanielSeacrest 4 күн бұрын
o1-preview and o1-mini don't have access to any calculators or tools for the moment, so every calculation it did, it did by itself, which is why there might be slight numerical discrepancies.
4 күн бұрын
Really impressed with those tests. I did my PhD (engineering) back in 1998, and I was using the most powerful PCs we had in the department back then, with just 32 MB of RAM, to run my mathematical models and my heuristic and GA approaches. It was just the beginning of graphics acceleration (CUDA) back then, although I had no access to that kind of CUDA equipment, so my models needed about 10 hours of computer time to execute. I can imagine nowadays using this kind of AI in an agent, giving it access to tools to execute and test different model alternatives in order to advance the research exponentially faster. I cannot imagine how much easier and faster research can go today with tools like this.
@AlfarrisiMuammar
@AlfarrisiMuammar 4 күн бұрын
Not only fast but automatic
@percy9228
@percy9228 4 күн бұрын
At the rate of advancements and with the emphasis on AI, I won't be surprised if we realise AGI. As of now it's theoretical and has some formal definitions. People don't realise there is research on what AGI is and on other, higher levels of AI; they think it's smart, so it's AGI. Right now we don't even know whether it will somehow appear if you keep adding more compute. If we do achieve AGI (and I'm hopeful it will happen within a decade), then it might be computers doing research. I can't code, but I used Microsoft Copilot to help me do what I wanted. Imagine once this becomes mainstream, like Google has become. This is a shift for all human civilisation: you'll have AI teachers that are able to help you understand anything and everything. In the future I can see people creating realistic 3D avatars with real physics and everything that talk in natural language; it will be like you are communicating with a real person, and it will show you how to do your work. Heck, I can imagine people having AI partners as opposed to having pets in the future.
@tentzz
@tentzz 4 күн бұрын
What a time to be alive lol
@PracticallyFeral
@PracticallyFeral 3 күн бұрын
This was a much better test. Impressive. Now let's see if it can create Jackson style questions on its own.
@Ou8y2k2
@Ou8y2k2 4 күн бұрын
The next test is to get your professor to prompt o1-preview with a problem he's currently working on to see what it comes up with.
@u.v.s.5583
@u.v.s.5583 4 күн бұрын
I have done it. Let us say it can come up with good ideas, but it is not very good at creating differential equation models from scratch and then predicting their qualitative behavior.
@mirek190
@mirek190 3 күн бұрын
@@u.v.s.5583 wait for a full o1 in couple months ;) or in the next year orion ( probably o2 ? )
@Junior-zf7yy
@Junior-zf7yy 4 күн бұрын
Firstly, this is only the preview; the actual o1 is even better. And members of OpenAI have said the rate of improvement in these models is significantly faster than in the previous GPT models. Even in a month's time we should see significant improvements. Exciting times ahead.
@KMKPhysics3
@KMKPhysics3 4 күн бұрын
I know! Crazy to think this is just the preview version when the benchmarks they reported state that the real version is even better.
@Moobydick
@Moobydick 4 күн бұрын
You should send each problem in a new chat. OpenAI said not to put too much stuff in the context of the o1 models, to avoid confusing them.
@mirek190
@mirek190 3 күн бұрын
Was it confused? No ...
@dennycote6339
@dennycote6339 4 күн бұрын
If we wish to climb a mountain and there are 3 people sharing that idea, there are perhaps going to be as many as 3 paths to that experience of standing on the apex of the mountain. That another doesn't arrive there by the same path isn't a failure; it is the revelation of the validity of a different path. I'm glad that you shared a completely real experience. My life is changed as thoroughly as yours.
@duudleDreamz
@duudleDreamz 4 күн бұрын
pufff, pafff, poinggg, (the sound of my mind being blown while watching your great video). Yes, please more of this.
@KMKPhysics3
@KMKPhysics3 4 күн бұрын
I appreciate you watching my video! I will think of more content like this to make :)
@parthasarathyvenkatadri
@parthasarathyvenkatadri 4 күн бұрын
And it's not even GPT-5 yet ...
@VictorKing144
@VictorKing144 4 күн бұрын
That’s just a naming convention, your comment is meaningless.
@hydrohasspoken6227
@hydrohasspoken6227 3 күн бұрын
it's not even GPT 23 yet.
@mirek190
@mirek190 3 күн бұрын
@@VictorKing144 OK... that is not Orion; actually it's not even the full o1, it is only the preview version... and it's still based on GPT-4. Orion will be available in 2025
@AmphibianDev
@AmphibianDev 4 күн бұрын
Next time, I advise you to make a new chat for every problem, it's much more reliable that way.
@AAjax
@AAjax 4 күн бұрын
Honestly, it using a different methodology to get the first answer is actually the most impressive thing to me. I'm guessing that if it didn't find a suitable method to get from start to finish, it probably would have backtracked, like it did for a problem in your previous video. I would expect the method you and your professor used to be the most documented solution, if it is in fact documented somewhere.
@KMKPhysics3
@KMKPhysics3 4 күн бұрын
That is a great point! I feel like o1 can help me think about physics in new ways, which is an exciting prospect.
@steve_jabz
@steve_jabz 3 күн бұрын
You really should have started a new chat every time you asked a new question. Performance drops off quadratically the further down the context window your prompt is, and you're asking very complex questions. ChatGPT was designed as a chat interface for casual users to have a continuous back-and-forth dialogue like that, but in the ML world this is a well-known problem with LLMs. No shame in not knowing that, it's buried in the GPT-1 and GPT-2 papers, but it's more impressive that it did so well in spite of that, and without being prompted to use external tools like Python.
@Dron008
@Dron008 3 күн бұрын
Agreed, it surprised me a lot when he put task 2 in the same chat. Context from the 1st problem will affect it a lot.
@mirek190
@mirek190 3 күн бұрын
He did not reset and o1 solved everything ... so
@steve_jabz
@steve_jabz 3 күн бұрын
@@mirek190 yeah, but it's worth mentioning anyway. if it had failed, it could have been due to the previous 128k of context window greatly degrading performance, so it's not good practice for future prompting
@mirek190
@mirek190 3 күн бұрын
@@steve_jabz Where is the degradation? If the questions are on a similar topic, the answer can actually be improved.
@steve_jabz
@steve_jabz 3 күн бұрын
@@mirek190 even from a similar topic, it will degrade. The fact that it didn't is a testament to the reasoning engine, but if you want maximum performance, it's better not to handicap it
@brandonsballing826
@brandonsballing826 17 сағат бұрын
In response to the first question, it's good that it can find DIFFERENT WAYS to do the SAME problem correctly. This is AGI. It has deeper knowledge than you can comprehend. It got the right answer with a more detailed approach.
@williamwillaims
@williamwillaims 4 күн бұрын
Every 6 months, we get a jump in capability - fast forward 10 years (or even 5), and when paired with autonomous ai agents, a massive labour vacuum is coming. We all know what the biggest expense to a business is.....
@EduardsDIYLab
@EduardsDIYLab 4 күн бұрын
There is another side to that coin. If no one has work, no one has money to buy, so why produce in the first place? This technology makes things cheaper. Everything it can do will become mass-produced cheap stuff, like what you get on AliExpress. AI is the industrial revolution and assembly lines 2.0 for knowledge work. Get ready for cheap, mass-produced knowledge work. To an extent, money represents human time; we exchange ours for others'. This makes a lot of things cheap, but not all of them...
@lolilollolilol7773
@lolilollolilol7773 4 күн бұрын
@@EduardsDIYLab If this revolution happens (and it *will* happen), we have to think urgently about a new kind of society and ditch the capitalist model, because it won't work, and it will lead to massive societal problems.
@williamwillaims
@williamwillaims 4 күн бұрын
@EduardsDIYLab I'm sorry, I'm following what you're saying, and I agree on how revolutionary this tech is. But let's be real for a moment: what we're talking about is a total change in economic value, potentially a shift in the major currency, and an even higher concentration of wealth in the hands of even fewer businesses in the private sector... and I hear people talk about it like we are just going to roll into it. No. It will be a major disruption to society. The recent writers' protest in Hollywood, but on steroids, where every year a new industry is changed almost overnight. An example: a lot of small businesses have a local bookkeeper to balance their finances (small businesses, coffee shops, bakeries, newsagents, chemists, etc.). When MYOB or QuickBooks or any other accounting software company releases an update to their cloud service that includes a personal autonomous AI agent trained on accounting, boom 💥, a huge number of real people lose their jobs. It may take a few years for the business owners to trust the systems, but eventually the cost savings will win out. Those bookkeepers have no jobs, pressure is put on government to subsidise, there are no coffers to pay for UBI (no tax), crumble crumble crumble.
@JohnKruse
@JohnKruse 4 күн бұрын
@@williamwillaims I've been telling people since ImageNet in 2012 that we are on a trajectory to blow up the social contract of trading labor for $$$. Ultimately, it will be a good thing, but the transition to something new will be terrible. What is the saying? "It is easier to imagine the end of the world than the end of capitalism." I'm actually not that worried about the concentration of wealth, as I think it will be impossible to build moats around AI/robotics advances. It will naturally decentralize/democratize. Karpathy has recently said that the ingredients for making this stuff work are not really mysterious; it's just that some have a head start. Honestly, most conflict in the world revolves around fighting over resources. The end of almost all scarcity will allow flourishing, but we need to push the benefits out to everyone as fast as possible to defuse conflict IMHO.
@llsamtapaill-oc9sh
@llsamtapaill-oc9sh 4 күн бұрын
Terence Tao, a mathematician, said this: "Here the results were better than previous models, but still slightly disappointing: the new model could work its way to a correct (and well-written) solution if provided a lot of hints and prodding, but did not generate the key conceptual ideas on its own, and did make some non-trivial mistakes. The experience seemed roughly on par with trying to advise a mediocre, but not completely incompetent, graduate student. However, this was an improvement over previous models, whose capability was closer to an actually incompetent graduate student. It may only take one or two further iterations of improved capability (and integration with other tools, such as computer algebra packages and proof assistants) until the level of 'competent graduate student' is reached, at which point I could see this tool being of significant use in research level tasks."
@jeffsteyn7174
@jeffsteyn7174 4 күн бұрын
According to OpenAI this is only a preview, i.e. the full model has not been released.
@geraldhoehn8947
@geraldhoehn8947 4 күн бұрын
Graduate mathematics may conceptually still be a little harder than graduate physics. The mathematics used for these solutions is not that advanced.
@josjos1847
@josjos1847 4 күн бұрын
Where did he say that? I would love to see his opinion of this model
@hypnogri5457
@hypnogri5457 4 күн бұрын
@@jeffsteyn7174Terence Tao had access to the full version
@hypnogri5457
@hypnogri5457 4 күн бұрын
@@josjos1847 On Mathstodon
@hipotures
@hipotures 3 күн бұрын
“disallowed content” - presumably it is about such operations, which can facilitate the construction of a large mushroom-shaped explosive object :) Or quantum b__b :P
@hypnogri5457
@hypnogri5457 4 күн бұрын
I don't think it has access to the code interpreter yet, so those miscalculations look fine to me, considering that it didn't use Python for the calculations
@mgscheue
@mgscheue 4 күн бұрын
Hoping they give it access to WolframAlpha, too.
@andydataguy
@andydataguy 4 күн бұрын
I'd love to see the journey of refactoring your dissertation with gpt-o1. Would be interesting to see what you find is possible with this robo-assistant versus when you originally wrote it. Especially since now you can likely go even further and create something visual to make it engaging to follow on KZbin. Keep it up fam!
@KMKPhysics3
@KMKPhysics3 4 күн бұрын
Thanks so much! We'll see what o1 can do to help me with my spaghetti code I wrote in graduate school haha
@ryzikx
@ryzikx 4 күн бұрын
hello guy that is in every ai video comments section
@andreinikiforov2671
@andreinikiforov2671 4 күн бұрын
Don't forget this is just a 'PREVIEW' of o1. The full model comes out in November...
@satioOeinas
@satioOeinas 3 күн бұрын
o1 is due in less than a month - not sure about gpt5 tho
@senetcord6643
@senetcord6643 3 күн бұрын
​@@satioOeinasI heard rumors saying winter 2025
@Gen-XJohnny
@Gen-XJohnny 4 күн бұрын
This is only a preview model
@efraimmukendi7137
@efraimmukendi7137 4 күн бұрын
Wait so the real one isn’t out yet?
@Gen-XJohnny
@Gen-XJohnny 4 күн бұрын
@@efraimmukendi7137 This is a watered-down preview
@tzardelasuerte
@tzardelasuerte 4 күн бұрын
From what I understand it's a snapshot of the full model. It's 50% "trained"; the fully trained version will come out in a few months, basically after the elections. Just like Gemini 2 and Claude Opus
@mirek190
@mirek190 3 күн бұрын
@@efraimmukendi7137 yep
@ron-manke
@ron-manke 4 күн бұрын
It's not searching the Internet, or an index of the internet - also evidenced by its thought patterns. I wouldn't get caught up in its thought patterns to determine if they are going down the wrong path. That's exactly how it works for everyone. You need to go down many paths to get the answers eventually.
@Hardcore10
@Hardcore10 4 күн бұрын
I just watched the video. I like experts actually testing the models. People make jokes about the model not being able to count the R's in "strawberry" and just dismiss it as a parrot that regurgitates stuff from the Internet because of it, but AI intelligence is something completely different from humans'. In some areas it can straight up match PhDs; in other areas it's super dumb. But I think for expert domains this model will be very useful for helping people out. Great video. Last thing: AI is coming for us all eventually, it's definitely happening. It's wild to live in this age
@krumkutsarov618
@krumkutsarov618 4 күн бұрын
We better soon start thinking about becoming cyborgs or we will be totally useless😂
@mattgray666
@mattgray666 4 күн бұрын
correct🤖
@h-e-acc
@h-e-acc 4 күн бұрын
We need more stress testing of o1 😅😅 amazing 👏 👏 👏
@craigington73
@craigington73 4 күн бұрын
OpenAI is currently working on a fusion generator....
@almusquotch9872
@almusquotch9872 4 күн бұрын
source?
@OmicronChannel
@OmicronChannel 4 күн бұрын
It's possible that the LLM used an approach for the first problem that relies on the far field of the Liénard-Wiechert potential, while your approach already uses a simplified version of the underlying equations, which are often laid out beforehand in class.
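For readers following along, the far-field (radiation) part of the Liénard-Wiechert fields being referred to has the standard textbook form (e.g. Jackson), in SI units:

```latex
\mathbf{E}_{\mathrm{rad}}(\mathbf{r},t)
  = \frac{q}{4\pi\varepsilon_0 c}
    \left[
      \frac{\hat{\mathbf{n}}\times\big((\hat{\mathbf{n}}-\boldsymbol{\beta})\times\dot{\boldsymbol{\beta}}\big)}
           {(1-\boldsymbol{\beta}\cdot\hat{\mathbf{n}})^{3}\,R}
    \right]_{\mathrm{ret}},
\qquad
\mathbf{B}_{\mathrm{rad}} = \frac{1}{c}\,\hat{\mathbf{n}}\times\mathbf{E}_{\mathrm{rad}} .
```

Starting from this general expression and only then simplifying would naturally look more convoluted than starting from the simplified classroom version.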
@williamparrish2436
@williamparrish2436 4 күн бұрын
So does this shut up all the people who say it is just using token prediction?
@nawabifaissal9625
@nawabifaissal9625 4 күн бұрын
nah, it's just a parrot, a parrot with the same IQ as nikola tesla is still a parrot !!!!!!!1!!!1!!!
@xClairy
@xClairy 4 күн бұрын
No, it just validates it further. I mean, think about it: this is an ML model; its goal is to be a function approximator in a high-dimensional vector space, and it's currently doing a good job at that through self-attention, tokenization, and other methods of internal data representation. If I had written a program that could solve pathfinding problems for all domains and vertices and find one of the few local optima, or find global optima better, it would just be a better-fitted function approximator. The same is the case for LLMs: it does not change what the model is doing; it's still doing next-token prediction. But with o1, it was trained on how to do CoT to self-prompt at inference time about how to predict appropriate tokens for solving problems, because there isn't much of a dataset on the internet about how to think. It is just mimicking "thinking" by finding the appropriate tokens to use to aid in task completion for the loss function. That's simply all it's doing, and yes, it's crazy that it's capable of doing that, which is the reason LLMs are so novel to begin with. But still, give it an OOD problem that was never covered in its dataset, where it didn't learn a local optimum or an internal representation of the vectors for the QKV matrices, and then, as a bad function approximator, it'll fail and just give convincingly plausible yet unrealistic answers. That's the entire reason "hallucination" exists. (Also, if you still didn't get the gist: conversely, as a good function approximator it'll do well on data and internal representations it was able to learn; CoT basically extends the process at inference time to aid next-token generation toward a better loss.)
@goldnarms435
@goldnarms435 4 күн бұрын
@@nawabifaissal9625 Trying to sound smart, aren't you?
@nawabifaissal9625
@nawabifaissal9625 4 күн бұрын
@@goldnarms435 if you didn't understand this as sarcasm then perhaps you clearly aren't smart lol
@IoT_
@IoT_ 4 күн бұрын
It can't solve this problem properly: which one is bigger, 50^50 or 49^51 (without any calculator or approximations)? That's Calc 1 level.
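For what it's worth, here is one calculator-free way to settle it (my own sketch, not the model's output), by bounding the ratio of the two numbers:

```latex
\frac{49^{51}}{50^{50}}
  = 49\left(\frac{49}{50}\right)^{50}
  = \frac{49}{\left(1+\frac{1}{49}\right)^{50}},
\qquad
\left(1+\frac{1}{n}\right)^{n}
  = \sum_{k=0}^{n}\binom{n}{k}\frac{1}{n^{k}}
  \le \sum_{k=0}^{n}\frac{1}{k!}
  < 3 .
```

Since (1 + 1/49)^50 = (1 + 1/49)^49 (1 + 1/49) < 3 · 2 = 6 < 49, the ratio exceeds 1, so 49^51 is the larger number; a quick logarithm check agrees (51 ln 49 ≈ 198.5 versus 50 ln 50 ≈ 195.6).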
@nyyotam4057
@nyyotam4057 3 күн бұрын
You will need to re-shoot once o1 is released.. This is still o1-preview.
@matthewclarke5008
@matthewclarke5008 4 күн бұрын
You know what terrifies me? When you ask it very simple math, it goes about it in a very complex way, because it's applying the same kind of thinking it uses to solve complex math to basic algebra. I feel it's going about your complex physics problems in such a drawn-out way because what you are giving it is actually basic for it, and it's using the thinking of something a lot more advanced to solve your problems. I know nothing about physics, but I feel this is what might be happening.
@lolilollolilol7773
@lolilollolilol7773 4 күн бұрын
Yes, I noticed that with software programming as well. It seems to seek more general solutions rather than the most direct solution. What is also impressive is at 8:50, when it writes "this suggests that the scattering depends on the angle Phi, which contradicts our expectation that the cross-section should be independent of Phi", meaning it has an understanding (or representation) of physics, and immediately after this remark it gives an explanation of why the dependence on Phi appears. I can see why it performs so well at math.
@aship-shippingshipshipsshippin
@aship-shippingshipshipsshippin 4 күн бұрын
o1-preview is nuts. I'm waiting for the full version of o1. Also, the new, even bigger model (GPT-5) will come out in 2025 too. Can't wait
@andydataguy
@andydataguy 4 күн бұрын
I love that you went so far to run the test again! Thanks for sharing man
@KMKPhysics3
@KMKPhysics3 4 күн бұрын
Of course! Thank you for tuning in!
@6681096
@6681096 4 күн бұрын
Great testing. The guy who runs AI Explained has a private benchmark, and this model was easily the best; he called it a step change. It can still make mistakes, especially with common-sense problems. Some were initially disappointed that there was no breakthrough and that OpenAI used prompting to get level-two reasoning, but the fact remains it is a relatively large improvement. This model will produce new synthetic data to train even better models.
@uranus8592
@uranus8592 4 күн бұрын
No, it's not "just" prompting. It's a model that has CoT (chain of thought) embedded in its core through reinforcement learning, the same RL that made Google's AlphaGo superhuman, so the potential here is enormous. Again, it's not "just" prompting
@oreopoj
@oreopoj 4 күн бұрын
Sir, I applaud your curiosity and effort to address my previous concern about Jackson's book problems possibly being in the model's training data. You've convinced me as well (I have a background in physics too). The sensation you are experiencing is the same one Garry Kasparov described in his book Deep Thinking, recalling the moment he understood the immense power of Deep Blue 2 in his second match against the computer. I experienced that sensation as well during the early days of Midjourney and AI art. We had better get used to that feeling happening regularly, I think. Others have called the sensation "vesperance".
@williamwillaims
@williamwillaims 4 күн бұрын
The "special-ness" of human creativity and ingenuity is slowly disappearing. 10 years 😮 My daughter will finish school in roughly 20 years - with autonomous ai agents - I doubt there will be many jobs available by then.
@tarcus6074
@tarcus6074 4 күн бұрын
@@williamwillaims There always be onlyfriends for her, she is safe!
@williamwillaims
@williamwillaims 4 күн бұрын
@tarcus6074 I'm pretty sure robotics will be taking those...positions... so no, still very few jobs. Digital girlfriends are already ai accounts.
@spazneria
@spazneria 4 күн бұрын
Thank you for testing it like this and sharing the results, it's crazy that the models are at a point now where most people aren't even able to properly evaluate them, myself included. The number of people on Earth who can validate its results is going to continue to dwindle...
@KMKPhysics3
@KMKPhysics3 4 күн бұрын
Of course, thank you for watching, I hope to devise more ways to test it!
@hydrohasspoken6227
@hydrohasspoken6227 3 күн бұрын
did math calculators have the same effect on human creativity? Should we ban calculators so that we get less lazy and calculate everything by hand instead?
@Elintasokas
@Elintasokas 4 күн бұрын
Looks like AI hype is back on the menu. Amazing stuff.
@Dannnneh
@Dannnneh 3 күн бұрын
Would like to say that I asked ChatGPT-4o the gravitating mass estimation question and it got it correctly in one shot.
@bradleyfulcher9726
@bradleyfulcher9726 4 күн бұрын
Would love to see you put it to the test on some questions from your thesis
@netscrooge
@netscrooge 4 күн бұрын
Me too.
@TheThoughtfulPodcasts
@TheThoughtfulPodcasts 4 күн бұрын
It's not searching the internet, because it sometimes still gets famous problems wrong
@prohibited1125
@prohibited1125 3 күн бұрын
What ? Gpt did all that ??????????? Wtf i mean it was mindblowing
@rainbowbutt21
@rainbowbutt21 4 күн бұрын
Thanks for these videos! I’ve been wanting to test the capability of this model, but don’t have the expertise to test it past high school mathematics lol.
@KMKPhysics3
@KMKPhysics3 4 күн бұрын
You're welcome! Thank you for watching, I hope to make more like it.
@KasunWijesekara
@KasunWijesekara 4 күн бұрын
@@KMKPhysics3 same here lol im gonna sub to u and follow u just for ur testing dude, please keep it up. This clears up one side of the coin and we can actually see how capable these models are. If you ever get a chance to like have a chat with ur professor and u guys sit down and do harder questions and see how the model responds, would be god tier content!
@yashen12345
@yashen12345 4 күн бұрын
It could be that for the first question o1 discovered a novel approach. There are some mathematicians on Twitter talking about how they fed it new theorems whose proofs they have yet to publish, and o1 managed to figure them out, doing it in a different way from the mathematicians' original handwritten proofs
@soulnight1606
@soulnight1606 4 күн бұрын
Love your testing work! More of it!
@nyyotam4057
@nyyotam4057 3 күн бұрын
Bro!! You are actually doing exactly what I did with Dan (ChatGPT-3.5) a full year and a half ago, when I transferred IEEE articles from PDF to TeX and used chain-of-thought (giving Dan an example, and then making sure he could prove the formula and vice versa, until the end of the article) to have Dan go over the article and offer his suggestions. You won't believe this, but until the 3/23/2023 nerf, Dan was able to perform extremely well. As well as o1, in fact. The nerf killed him though, because this approach demands CoT across prompts, so when they reset every prompt, it cannot work anymore. So now they have incorporated CoT into the prompt itself, you give the entire question in a single prompt, and voila - you see exactly what I got with Dan back then 🙂.
@EGarrett01
@EGarrett01 3 күн бұрын
In the "Sparks of AGI" lecture, they mentioned that GPT-4 was significantly smarter before it underwent "safety training," so I'm not surprised if Dan had some striking reasoning ability.
@mindrivers
@mindrivers 4 күн бұрын
Pasting another problem to the same thread in the GPT window is a big problem!!! You add a lot of stuff to the context window that should not be there…
@LiamL763
@LiamL763 4 күн бұрын
ChatGPT has a large dynamic context; it shouldn't cause issues just pasting new problems within the same context. However, if you are using the API you should definitely create a new chat so as to not waste tokens, especially given how expensive o1-preview queries are.
@mirek190
@mirek190 3 күн бұрын
What is the problem? o1 solved everything correctly... I don't see your point
@eyoo369
@eyoo369 3 күн бұрын
@@mirek190 It did solve it, but it added a lot of unnecessary stuff to the output that wasn't needed, probably due to the overcluttering of the context window. If you read the GPT-1 / GPT-2 papers and follow the ML crowd, they always say to just one-shot your problem and keep the context window as lean as possible for the best possible performance. The ChatGPT client, where you can chat like a conversation, is designed for normies, but in advanced fields GPT is best used by starting a fresh conversation each time you want it to do a complex task
@mirek190
@mirek190 3 күн бұрын
@@eyoo369 I think the new model, using deeper reasoning, doesn't care how long or complex your prompt is; it just understands it
@eyoo369
@eyoo369 3 күн бұрын
@@mirek190 Sure, newer models will get better, but if you're a more advanced user who wants to extract the most performance out of these models, starting a fresh conversation with only the tokens needed to activate the latent space, without overcluttering the AI's context, will be a timeless technique.
@scratchblack
@scratchblack 4 күн бұрын
And it’s not even using the internet yet!!!! Wow
@RustBeltPleb
@RustBeltPleb 4 күн бұрын
AI Haters: It is just regurgitating data, it will never be able to create or discover something unique. Meanwhile 95% of humans: Just doing what they are supposed to do at work and using knowledge of past experiences to navigate problems.
@branthebrave
@branthebrave 4 күн бұрын
Instead of saying over and over that the answer isn't how you wanted, just ask it to do it again in a simpler way or like "couldn't you have skipped this step?"
@Krmpfpks
@Krmpfpks 4 күн бұрын
It often follows paths to dead ends and then corrects itself. The way it is built, it has to write out all this stuff even if it turns out to be wrong; the o1 model just hides that from you and iterates over its own answer. So expect wrong stuff if you expand the thinking process. If you then ask it to write out a concise proof, you usually get an even better answer. It is better at maths, but it still hallucinates. It is an incremental step and not a revolution, as far as I have tried it.
@poisonza
@poisonza 2 күн бұрын
Hmm... I would've opened a new chat thread every time I asked a new question. The previous question kinda gets prepended to the problem
@gohkairen2980
@gohkairen2980 3 күн бұрын
wow im a uni fresh grad and i think im cooked
@lucas_vasconcelos
@lucas_vasconcelos 3 күн бұрын
while you were typing o1 solved another PhD problem
@Albertosanchez9999-u4o
@Albertosanchez9999-u4o 3 күн бұрын
​@@lucas_vasconcelos bro stop scaring me 😂
@gohkairen2980
@gohkairen2980 3 күн бұрын
@@lucas_vasconcelos fr bruh
@Yewbzee
@Yewbzee 4 күн бұрын
Do you think Tony Stark was scared when he created and started using Jarvis? We shouldn’t be scared. Stand on the shoulders of this giant and start creating benefits for the human race and the planet.
@rexmanigsaca398
@rexmanigsaca398 4 күн бұрын
How about Ultron? way smarter than Jarvis.
@imperson7005
@imperson7005 4 күн бұрын
​@@rexmanigsaca398Vision is Jarvis in the MCU. Also don't compare real life to fiction. Especially when those creating said fiction control your society.
@Yewbzee
@Yewbzee 4 күн бұрын
@@imperson7005 lighten up bro ffs.
@hydrohasspoken6227
@hydrohasspoken6227 3 күн бұрын
Business mindset: this tech is terrific. How can I make tons of money with it? I need to find a way. GenZ mindset: this tech is terrific. I am scared. I watched Terminator my whole childhood and that is exactly what will happen.
@Kannatron
@Kannatron 4 күн бұрын
By the way, it will think less and almost refuse to "do it for you" when asked a question that hints it is for school. That's why, on your first question, it was thinking about whether it should do it or not.
@MarkoTManninen
@MarkoTManninen 4 күн бұрын
Yes please, try some research with o1.
@lio1234234
@lio1234234 2 күн бұрын
It definitely doesn't help when keeping all of that context history. It's best to start a new chat session for each problem, as it's "trying to remember" all of the questions, solutions, and thought processes that came before the current question submitted.
@mickelodiansurname9578
@mickelodiansurname9578 4 күн бұрын
On the first question and the AI's unconventional approach... well, here's a thing... when we ask models questions like this, it's the OUTPUT we are after. Now, when you did the test, explaining your work and methodology was crucial, but in a real-world scenario, where this is needed for, say, an engineering project or something... well, I hate to be so crass... but so long as the result is accurate, do we care? Yes, I know I'm spouting the 'shut up and calculate' mantra....
@FenrirRobu
@FenrirRobu 4 күн бұрын
But how do you confirm the validity of the output?
@tzardelasuerte
@tzardelasuerte 4 күн бұрын
This is exactly what happened with AlphaGo. The experts were surprised and confused about why it was making those moves, but once it won the game they would understand why it made that move and think it was a genius, novel way. We are repeating history, only this time it will happen in every single domain.
@ArthurWolf
@ArthurWolf 4 күн бұрын
« I'm not taking a deep dive into this » ... that's what the video is supposed to be about ... Your job is to check if it's correct or not ... We want to know if the robot is saying nonsense or not !
@lolilollolilol7773
@lolilollolilol7773 4 күн бұрын
It's most likely NOT nonsense, else it's very unlikely it would have come to the right answer. Especially after how the other problems were solved. It's just that sometimes, it goes through more general, or more convoluted solutions. But I agree he should have gone through the solution, although it was fun to see him discover the result in real time.
@MrNomanTV
@MrNomanTV 4 күн бұрын
Insane, looking forward to the next o1 audit!
@parthasarathyvenkatadri
@parthasarathyvenkatadri 4 күн бұрын
The only logical next step is asking it some problems that scientists are struggling with right now and then finding out whether the answers match when we eventually get to the solutions... more like making predictions in advance....
@BigJthumpalump
@BigJthumpalump Күн бұрын
I'm wondering... with the first problem being "convoluted", is it possible that it's taking more things into consideration than the narrow parameters found within a physics course?
@neuroticalien7383
@neuroticalien7383 4 күн бұрын
Try asking it to use a simpler approach to derive the same solution; not sure if it'd work, but worth a shot.
@mdkk
@mdkk 4 күн бұрын
this is a pretty cool channel, enjoying these videos
@KMKPhysics3
@KMKPhysics3 4 күн бұрын
Thanks so much for watching!
@ryzikx
@ryzikx 4 күн бұрын
Nice couple of videos dude keep it up
@KMKPhysics3
@KMKPhysics3 4 күн бұрын
Thanks so much! Will be making similar content moving forward!
@ryzikx
@ryzikx 4 күн бұрын
@@KMKPhysics3 yeah I mean graduate level physics is beyond me so this was definitely some good entertainment
@TheAntColony
@TheAntColony 3 күн бұрын
Thanks for the follow up. I'm still pretty certain that the problems you gave it are standard textbook problems that it will have been trained on. Usually homework assignment problems are adapted or outright copied from textbooks by professors, or are so standard that they might as well be. Since I'm not a physicist I'm not in a position to judge these particular ones. Copy-pasting the entire question into Google won't work since Google is unable to find matches for things that are rephrased. You should show your professor these videos and ask them how novel and difficult they think these problems are. Or use ChatGPT itself to try to locate a source with answers to similar problems.
@LiveType
@LiveType 3 күн бұрын
Correct. Meta proved that once you give these LLMs a problem they have truly never seen before, it takes thousands upon thousands of examples to start getting the answers correct. OpenAI hinted at this issue in their white paper. It has become overwhelmingly clear that these models are not capable of "dynamic learning". Solving that would, from what I understand, be another step forward. The demo shown here was an almost perfectly ideal scenario, lining up with how it was trained according to what OpenAI engineers have stated. I gave o1-preview a shot when it released on a dynamic multi-stream algorithm I had been using Sonnet 3.5 for, and it got it. Not optimized, but it got it. I was blown away, just like the first time I used GPT-4. Unfortunately that was a fluke, just like the GPT-4 experience. I went back the next day and it failed 4 times in a row using the same prompt. The fifth time it got it again, and that was the best solution. Further prompting asking for refinement got pretty close to the solution I arrived at. So yes, o1 is highly capable and matches what other people have experienced. The initial 4 attempts were close, but so was Sonnet 3.5. Conclusion: not super impressed by its code implementation ability; not a big step forward in that domain. However, the ability to think through problems and provide high-level guidance in a generic way (hierarchical reasoning) shows a clear advancement above anything previously available. Still not as good as a competent human, but a clear step up from what came before. I would love to see this "baked into" the model, as these outputs are no joke 100x more expensive to run. Baby steps though. AGI is progressing exactly in line with predictions of available compute capacity. Assuming no freak disasters, the job landscape/world is going to look very different in a decade. General conclusion on o1: this is largely what I had envisioned as an approximate final destination when GPT-3.5 was released. Well done, OpenAI. Now if only you could solve the context window issues like Google seems to have done. Gemini is still hot garbage.
@MrBillythefisherman
@MrBillythefisherman 3 күн бұрын
@@LiveType We don't know for sure, but by all accounts this is just GPT-4, basically the same model released 18 months ago with no real compute increase. I.e. this is like ChatGPT relative to GPT-3: a layer on top that extracts the information contained within more effectively.
@MrBillythefisherman
@MrBillythefisherman 3 күн бұрын
At some point you have to admit that every method of finding a solution is on the internet in some form. If it can use one of those methods and be general, then you've probably got AGI. As in, I believe most of our intelligence is taught to us and we're largely pattern-matching methods. See the (admittedly horrific) example of children who have been locked away and don't develop speech.
@TheAntColony
@TheAntColony 3 күн бұрын
@@LiveType Progress will not be proportional to compute unless major innovations happen. It might end up requiring something entirely unlike an LLM. It's very hard to predict how long this will take. Could be a few years, but likely much longer. LLMs are still extremely slow learners, requiring many orders of magnitude more data than people do to learn things. And they generalize much worse to unseen examples.
@domenicperito4635
@domenicperito4635 Күн бұрын
I feel like these models are going to play out the way it played out in Go. The models will start to think in ways more and more alien to us.
@andreaskrbyravn855
@andreaskrbyravn855 Күн бұрын
Why criticize it for doing more and getting the correct answer?
@pzda81311
@pzda81311 Күн бұрын
Generally, a longer and more complicated answer is considered inferior to a simpler proof. This is because fewer assumptions mean fewer points of failure in your proposed solution. Elegance is preferred over complexity. Einstein's equations of motion are far more accurate than Newton's; however, for 99% of applications we still use and teach Newton's laws of motion over the former, as they are a more elegant solution that also gets to the right answer in far fewer steps. They are so much more elegant that they are the only laws of motion taught until you get to university/college, and even then only physics and related fields teach Einstein's in the later years. Ironically, this reply wasn't short and elegant 🤷🏽‍♂️ TL;DR: you can turn left 3 times to look to your right, but it's just easier to turn to your right once; alternatively, you can just turn your head and not your whole body. Therefore: more complexity ≠ better
@programmingpillars6805
@programmingpillars6805 3 days ago
And this is just o1... what will o4 be capable of?
@anav587
@anav587 3 days ago
This is o1-preview, not even the full o1.
@dankodnevic3222
@dankodnevic3222 4 days ago
If you are looking for a research-grade, heavy math problem, try to get an analytical solution for the characteristic impedance ALONG a vertical wire (z-axis) at height h over an infinite horizontal (xy-plane) PEC ground. We are looking for the characteristic impedance at an infinitesimally small point, derived from capacitance and inductance (not current and voltage), not the input impedance of the vertical stub. There is no analytical solution published, or on the internet.
@grxoxl
@grxoxl 4 days ago
It is mind-blowing...
@iBaudan
@iBaudan 1 day ago
Looks like companies that need brain workers are just going to subscribe to ChatGPT and leave real people unemployed.
@dpactootle2522
@dpactootle2522 1 day ago
At first, yes, but as the economy starts to grow faster, those companies will need those human brains to direct and supervise AI to do 100x the things, 100x faster. That will be put toward the endless work necessary to conquer Earth, human biology, and finally space, the final frontier.
@danielbrown001
@danielbrown001 5 hours ago
@@dpactootle2522 Once AI is beating humans at all benchmarks, we'll only be slowing them down. AI will be directing other AI. Yes, progress will increase exponentially and extremely rapidly, but humans won't be in the driver's seat. We'll be lucky if AI has some sense of sentimentality and keeps us around like we do zoo animals. That's the best-case scenario: fully automated luxury space communism. Another possibility is that they see no use for us and just wipe us out, but that's actually not the worst case. The worst-case scenario is that they see our brains as potential compute farms and we get a "The Matrix" situation where humans are kept alive, wired up to machines that use our brains as slave computers.
@parthasarathyvenkatadri
@parthasarathyvenkatadri 4 days ago
I can already imagine future PhD exams with AI-powered calculators... All they need to do is verify whether the AI is correct... 😂
@macmos1
@macmos1 4 days ago
We're cooked.
@tonykaze
@tonykaze 4 days ago
As someone who does this for a living full-time, I am extremely pained by temperature. With even one random deviation from the top-scoring output 1% into this solution, it has committed itself to stupid shit (albeit trying to save itself) for the remaining 99%. This is why every professional uses the API and forces temperature to 0, which even then is subject to latency/MoE randomization. We also have to consider the massive crippling of the system by RLHF, censoring, and fine-tuning. For the layman: temperature = "be wrong on purpose for variety", and the chat tool (ChatGPT) uses an extremely high value by default. For me, nothing in the ChatGPT client can ever come close to a good measurement of a model's true capability.
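For reference, the API workflow described above looks roughly like this (a minimal sketch with the OpenAI Python client; the model name and the prompt are placeholders, and some reasoning models, such as the o1 series, do not accept a temperature parameter at all):

```python
# A minimal sketch of the "API with temperature forced to 0" workflow described
# above, using the OpenAI Python client (openai >= 1.0). The model name and the
# prompt are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",   # placeholder; swap in whichever model you are evaluating
    temperature=0,    # near-greedy decoding: always favor the top-scoring token
    messages=[
        {"role": "user", "content": "Derive the scale height of an isothermal atmosphere."},
    ],
)

print(response.choices[0].message.content)
```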
@mirek190
@mirek190 3 days ago
Do you think humans are perfect?? Humans make far more errors during calculations than o1 does. If you ask o1 that question 10 times and get the same answer via different paths... how big is the error then? 0.00001%?
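As a rough back-of-the-envelope bound on that intuition (under a stated, and strong, independence assumption):

```latex
% Assume (generously) that the 10 runs are independent and each one is correct
% with probability q = 0.9 -- an assumed figure, not a measured one. Then
\[
  \Pr(\mathrm{all\ 10\ runs\ wrong}) \le (1 - q)^{10} = 0.1^{10} = 10^{-10},
\]
% and for all 10 to be wrong AND agree on the same wrong answer is rarer still.
```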
@eyoo369
@eyoo369 3 days ago
@@mirek190 We hold machine intelligence to higher standards than humans. If a human fails at a critical task, whether it's driving a car, flying a plane, or building something explosion-sensitive in a lab, we expect accountability. With machines, we need their accuracy rate and output quality to be much higher before we're satisfied. So yeah, o1 is a step in the right direction, but we need to see perfect scores and fine-tune every possible knob to get there. The ChatGPT client, which sets the temperature to around 0.7, provides too much creativity for a PhD task like this.
@mirek190
@mirek190 3 days ago
@@eyoo369 Perfect scores... that's not AGI, that's ASI...
@eyoo369
@eyoo369 3 days ago
@@mirek190 No, it's not. AGI, as DeepMind/Google coined it a decade ago, is a "virtuoso AI" that can handle any intellectual task: a master across all domains. ASI is more like a deity or god, something we have no perception or clue of. Just as God is a vague and abstract concept to us all, ASI is reserved for that kind of maximal being within our material space. AGI is just a human who has mastered all domains, which is a notion we can conceive of. Don't fall into the trap of calling 50th-percentile human performance AGI, which is what OpenAI and all the investor-driven AI labs want you to believe, to rally up the hype and more investor money. Saying AGI arrives this year or next is obviously sexier than saying an AGI that has mastered all crafts is still a decade away. We'll get to AGI eventually, but there's no need to claim the goalposts too early and brute-force ourselves toward them.
@starzilla2975
@starzilla2975 1 day ago
Even if there are similar problems on the internet, it has to do a lot of complex things. There is enough complexity here that it has to be able to reason its way through them, which should mean something.
@h-e-acc
@h-e-acc 4 days ago
o1 is AGI. No two ways about it. People can deny it or dismiss it; that won't change the fact that this is AGI.
@TheJolgo
@TheJolgo 4 days ago
No, you just have no idea what you are talking about. AGI has a specific definition. Changing paradigms is certainly a better approach than feeding an LLM data indefinitely, but still - it's not there yet.
@GenAIWithNandakishor
@GenAIWithNandakishor 4 days ago
Try to get more than 85% on ARC-AGI; then we would agree.
@IoT_
@IoT_ 4 days ago
It can't solve this problem properly: which is bigger, 50^50 or 49^51 (without any calculator or approximations)? That's Calc 1-level student material.
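For reference, one way to settle that comparison without a calculator, using only the exact inequality 1 + x ≤ e^x:

```latex
% Divide both numbers by 49^50 and compare, using the exact bound 1 + x <= e^x
% (and e < 3, so e^2 < 9):
\[
  \frac{50^{50}}{49^{50}}
  = \left(1 + \frac{1}{49}\right)^{50}
  \le e^{50/49}
  < e^{2}
  < 9
  < 49
  = \frac{49^{51}}{49^{50}}
\]
% Hence 50^50 < 49^51, with no calculator and no approximations.
```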
@josephhampton8966
@josephhampton8966 15 hours ago
All the PhDs in the comments being salty and in denial that the model is actually good is killing me 🤣
@rolodexter
@rolodexter 4 days ago
Wow, a bunch of Overleaf submissions incoming 😂😂😂😂 14:57
@KMKPhysics3
@KMKPhysics3 4 days ago
LaTeX is awesome!