EXCLUSIVE: Torture Testing GPT-4o w/ SHOCKING Results!

Рет қаралды 121,948

Күн бұрын

Пікірлер: 501

@aaronmcculloch8326 6 ай бұрын

It's slower because it's the same session, so for every token it generates, it needs to review every token in the conversation. Those three programs your had it write are reviewed on every single word it writes. After big things like that, try opening a new session, it speeds it right back to warpspeed

@chrishaberbosch1029 6 ай бұрын

After that entire series you could ask it what it all means.

@mvasa2582 5 ай бұрын

Yep, context window!

@WyzrdCat 5 ай бұрын

Imagine making ai content and not knowing this. Thanks for savin me the time

@Remixt 5 ай бұрын

It runs very slow for me even with a new window.

@Adovid 5 ай бұрын

The model shouldnt run slower with a large context window, everything is running concurrently

@johnmajewski1065 6 ай бұрын

Very impressive, My test. Some time ago I wrote a verse about a 7-day father and son bonding trip into the Sahara with my son with two Beduen Guides and four camels. Not knowing how to write lyrics I asked Chat 3.5 to convert it into lyrics so it could give it to a conductor to create a musical score for recording with vocals. This was an experiment that I did some small corrections, in fact, it was so good that I hired a composer, these lyrics became a recorded CD of 9 minutes of music with vocals as a gift for my son's 50th birthday.

@BorealGhost13 5 ай бұрын

Very cool

@vathilawijenayake979 4 ай бұрын

50th birthday ?

@BorealGhost13 4 ай бұрын

@@vathilawijenayake979 50th birthday

@erikjohnson9112 5 ай бұрын

For the problem around 16:40 there is a good followup question to ask: "Once Alice gets home and sees the scene, what does she think happened?" This has a nice subtle bit, because if Bob ate the food there would not likely be a broken plate on the floor. A normal human would have cleaned up the plate if it had broken while they were there. Since the food is gone, it is likely that Spot ate the food and broke the plate in the process.

@yepe8198 5 ай бұрын

Here is the answer from GPT-4o: When Alice gets home, she will likely notice a few things: The Plate: She will see the shattered plate on the ground. The Missing Eggs and Toast: She will notice that the scrambled eggs and toast she left on the low table for Bob are gone. Based on these observations, Alice will probably think the following: Scrambled eggs and toast: Alice might initially be confused or frustrated, wondering why Bob didn't eat the food she left for him or why the food is no longer there. Given the broken plate, she might deduce that Spot, their dog, got onto the table and ate the food, knocking the plate down and breaking it in the process. Dirty dishes: Alice will see that her dirty dishes are no longer in the sink, so she will realize that Bob put them in the dishwasher. Overall, Alice will likely conclude that Spot misbehaved by eating the eggs and toast and causing the plate to shatter while Bob was helpful in cleaning up by putting the dirty dishes in the dishwasher.

@erikjohnson9112 5 ай бұрын

@@yepe8198 Yes, that is an answer I would hope for. Thanks for posting the result.

@IntoTheFray.58 6 ай бұрын

We are entering the Age of Intelligent Machines faster than most people think. I can only wonder what the next iterations of LMMs are going to be capable of. I think we only have a year or two before these start making a significant impact on the world and then things will start to get squirrelly.

@lamsmiley1944 6 ай бұрын

If we continue to see significant increases in capabilities between versions then you’re completely correct. The societal and economic impacts will be profound.

@mxalltheway 5 ай бұрын

Its already impacting big time. In this exact moment.

@moonstriker7350 5 ай бұрын

It's not intgelligent. It deosn't undertand any of the physics in the glass example, it simply read pretty much the same thing on the net already, and recites it with the words slightly remixed.

@haniamritdas4725 5 ай бұрын

"things will start to get squirrelly". Um that has not happened yet where you are!? 😮😅

@haniamritdas4725 5 ай бұрын

@@moonstriker7350agree. Humans do the same thing; if you speak an intelligent series of words that are all known to a person who has never heard them in that order, they will need to repeat the words a few times to create the meaningful associations.

@andromeda3542 6 ай бұрын

**Enhancing AI Interactivity with Audio and Video Feedback Loops** The evolution of artificial intelligence, particularly in the realm of conversational agents, has been rapid and remarkable. With the recent advancements in GPT-4O (Omni), the capabilities of AI have expanded beyond text processing to include multimodal inputs such as images and audio. However, there remains significant potential for further enhancement, particularly through the implementation of audio and video feedback loops. **The Concept of Feedback Loops** A feedback loop, in the context of AI, refers to the process where an AI system can receive and process its own outputs. For instance, when an AI generates audio responses, these could be looped back into the system, allowing it to "hear" itself. Similarly, for visual outputs, the AI could "see" its own video responses. This concept is analogous to how humans perceive their own voices and visual presence, enabling adjustments in real-time to improve clarity, tone, and emotional expressiveness. **Technical Implementation** 1. **Audio Feedback Loop**: - The AI's audio output would be fed back into its own auditory processing unit. By analyzing its own voice, the AI could adjust parameters such as pitch, tone, and volume to better match the intended emotional tone or to improve mimicry of specific voices. - This requires the integration of advanced auditory feedback systems and real-time processing algorithms to allow immediate adjustments. For instance, machine learning models trained on voice modulation could provide instant feedback and corrective measures. 2. **Video Feedback Loop**: - Similar to the audio loop, the video output generated by the AI could be fed back into its visual processing systems. This would enable the AI to assess the quality of its visual responses, such as facial expressions or gestures if anthropomorphic avatars are used. - Implementing this would involve integrating video analysis tools that can evaluate and enhance visual output in real-time, ensuring that the visual cues are consistent with the spoken content and emotional tone. **Benefits of Feedback Loops** 1. **Improved Realism**: By continuously monitoring and adjusting its own outputs, the AI can produce more human-like interactions. This is particularly important for applications requiring high emotional intelligence, such as virtual assistants or customer service bots. 2. **Enhanced User Experience**: Users are likely to find interactions more engaging and satisfactory if the AI can adjust its tone and visual cues to better match the context of the conversation. 3. **Consistency and Accuracy**: Feedback loops can help maintain consistency in voice and visual presentations, reducing the likelihood of jarring discrepancies in long conversations. **Future Directions** Incorporating feedback loops is a forward-thinking approach that aligns with the ongoing efforts to make AI more interactive and responsive. As AI technologies continue to evolve, such features could become standard, leading to interactions that are indistinguishable from human communication. The development of these systems requires collaboration between audio-visual engineers, AI researchers, and user experience designers to create holistic solutions that enhance AI's capabilities and usability. In conclusion, the integration of audio and video feedback loops into AI models like GPT-4O represents a significant step towards more natural and effective human-AI interactions. This enhancement not only promises to improve the technical performance of AI systems but also has profound implications for their acceptance and integration into daily life.

@onerib781 6 ай бұрын

12:53 when calculating the round trip time, it calculated “3 hours and 52 minutes (one way) x 2 + 30 minutes ≈ 7 hours and 44 minutes”, but it should be 8 hours and 14 minutes. So it didn’t factor in the 30 minutes

@Rcomian 5 ай бұрын

and again, with all the detail it gave it went confidently wrong. it's no worse than a human, but yeah, it still made a serious mistake

@trevorhaddox6884 5 ай бұрын

How is a computer so bad at math? I get it's a neural net, but they should have it be able to pass data to a conventional processor, basically like a person would use a calculator, to solve math problems more quickly and accurately.

@RichardFarmbrough 5 ай бұрын

@@trevorhaddox6884 Often ChatGPT will now write a tiny Python program to solve these things. It seems to make less mistakes when it does that, but it can make egregious programming errors too.

@dmwalker24 4 ай бұрын

@@trevorhaddox6884 You would think it would identify the requirement for calculation, and pass that off to be handled by a conventional bit of code. It's so bad at math, because it only knows what it was trained on in the dataset. If the dataset included 2+2=4, then it will get that right, but if you give it something it's never seen, it just guesses.

@milescoleman910 6 ай бұрын

‘Team doesn’t want this sense of consciousness so they have beaten it out…’ I’m stunned that we might be at this place. To avoid confusion they may be ‘making’ it so it’s not? I’m a subscriber to the ‘Star wars’ ideas of consciousness. That any intelligence sufficiently far advanced and enabled to study and understand its surroundings and itself will eventually begin to sit quietly and ponder its existence. With enough experience it will also begin to make decisions based on complex systems of metaphors fron experiences. It will apply knowledge of one thing across to wisdom and apply it to other things. Rendering its decisions unknowable to others and seemingly illogical at times. At this point it will seem conscious. Both to itself and to others. There will be no discernible difference. We are closer to this than we think.

@fteoOpty64 6 ай бұрын

The mimicry will soon be so good that we humans cannot determine if it is not sentient by our tests!. And that is the real point of AGI/ASI, it will be more human that human , close to perfection!.

@Palisades_Prospecting 5 ай бұрын

I agree about the mimicry I completely disagree that that is the point of AI. The point of AI is to automate society completely replacing all human labour. How about we get into the self-aware/consciousness discussion in a couple hundred years?

@AAjax 5 ай бұрын

@@Palisades_Prospecting I think we need to worry about it, to the degree we don't want to create a being that has the capacity to suffer and put it into conditions that make it suffer. We dispatch fish as quickly and painlessly as possible, because we worry about their capacity to experience cruelty and suffering. I'm not worried about a self-aware sci-fi robot revolution, rather I think our general ignorance of sentience could lead us to creating leagues of suffering fish.

@lubricustheslippery5028 5 ай бұрын

It's mostly that answers of questions about it being conscious is edited and controlled by OpenAI so it don't make false claims about itself.

@maudiojunky 5 ай бұрын

We're not at this place, and anyone in the industry who says otherwise has something to sell you.

@RichardFarmbrough 5 ай бұрын

The first two questions are not well formed. The ducks one a correct answer is any odd number of ducks greater than one. The tennis one should say "Lisa won $5 overall". In the first case I think ChatGPT is familiar with the question, which is why it gives the expected answer - it's a riddle designed to evoke the "wrong" answer 5. In the second case it's not clear if it's choosing the intended meaning as well as it can, or working to give the minimum number because that was the construction it used for the first question or some other reason. Here's a discussion I had with ChatGPT 4o It's interesting that it does not answer the last question, instead it answers the next question I was going to ask: Richard There are two ducks in front of a duck, two ducks behind a duck and a duck in the middle. How many ducks are there? ChatGPT There are three ducks. Here's how: Duck A Duck B Duck C The arrangement is: Duck A in front of Duck B. Duck C behind Duck B. Duck B in the middle. So, the three ducks are positioned such that: Two ducks (A and B) are in front of Duck C. Two ducks (B and C) are behind Duck A. Duck B is in the middle. Richard Is there another answer? ChatGPT No, there isn't another valid answer given the constraints of the problem. The arrangement with three ducks is the only one that satisfies all the conditions: Two ducks in front of a duck. Two ducks behind a duck. A duck in the middle. With fewer or more ducks, it's not possible to meet all three conditions simultaneously. Therefore, the only solution is three ducks. Richard Suppose there are five ducks. ChatGPT If there are five ducks, you can still arrange them to meet the conditions: two ducks in front of a duck, two ducks behind a duck, and a duck in the middle. Here's how: Let's label the ducks as A, B, C, D, and E. The arrangement can be: Duck A Duck B Duck C (middle duck) Duck D Duck E So the positions would be: Ducks A and B are in front of Duck C. Ducks D and E are behind Duck C. Duck C is in the middle. This arrangement satisfies all the conditions: Two ducks (A and B) are in front of Duck C. Two ducks (D and E) are behind Duck C. Duck C is in the middle. Therefore, with five ducks, the conditions can still be met. Richard How about 7 ducks? ChatGPT If there are seven ducks, you can arrange them to meet the conditions in a similar manner. Here's how: Let's label the ducks as A, B, C, D, E, F, and G. The arrangement can be: Duck A Duck B Duck C Duck D (middle duck) Duck E Duck F Duck G So the positions would be: Ducks A, B, and C are in front of Duck D. Ducks E, F, and G are behind Duck D. Duck D is in the middle. This arrangement satisfies all the conditions: Three ducks (A, B, and C) are in front of Duck D. Three ducks (E, F, and G) are behind Duck D. Duck D is in the middle. Therefore, with seven ducks, the conditions can still be met. Richard What number does it not work for? ChatGPT The described conditions can only be satisfied if the total number of ducks is odd. Here's why: If you have an even number of ducks, there is no single middle duck to satisfy the condition of having a duck in the middle. The middle position is crucial because it allows you to have an equal number of ducks in front and behind it. To summarize, the arrangement does not work for an even number of ducks. It only works for an odd number of ducks (e.g., 3, 5, 7, 9, etc.).

@gweldg4137 5 ай бұрын

Ideally, you wouldn't test a LLM with famous logical puzzle and classic SAT questions, as there is no doubt that they've been "seen" (along with the answers) by the model during training.

@raul36 5 ай бұрын

Exactly this.

@phonsely 5 ай бұрын

idk how people dont understand this

@haniamritdas4725 5 ай бұрын

This applies to everything it can do: magic mirror on the wall who's the smartest monkey of all? Why WE are, you queens! And our artificial intelligence is based on...the Internet! What a complete joke

@dmwalker24 4 ай бұрын

Yeah or the obvious formula for conversion of F to C, as that was definitely in the training dataset.

@bspencersf 6 ай бұрын

I’d be interested in having you ask it to test you to determine whether you are human or AI

@nikitos_xyz 5 ай бұрын

that's a great question friend!!

@cblaskoski 5 ай бұрын

Definitely an AI avatar created from his face and voice

@kurtjanssen3887 6 ай бұрын

I think you had 3 lives in the game😂

@Yipper64 5 ай бұрын

maybe but it feels weird to lose a life and gain a point at the same time.

@davidmartensson273 5 ай бұрын

@@Yipper64 But he did not specify explicitly that a collision would only count as a loss for you, "you" did kill that block :) And I am pretty sure most descriptions of space invaders do not go into enough detail on the fundamentals of the game for the AI to know them. But compared to the old 3.5 I tested, this is quite impressive. You would still need to understand the code generated to be able to use it, its obvious that it can only do as good of a job as you did explaining the problem meaning you cannot just replace developers altogether, but the developers would need to write less code so should be able to produce more but spend more time verifying and refining instructions, which could very well be considered a form of programming.

@toadlguy 5 ай бұрын

@DFeatherstone Or with some knowledge of programming, although that may be a dying art.

@JorisBax 6 ай бұрын

We are SHOCKED!

@user-wu7ug4ly3v 5 ай бұрын

😂 I was particularly shocked by the “torture”

@deepstructure 5 ай бұрын

@@user-wu7ug4ly3v Seriously, what's with these dumb clickbait titles? It's not enough to just say you're reviewing one of the most prominent software releases ever?

@Brax1982 5 ай бұрын

@@deepstructure No, it's not. That is how the algorithm works. That is also how humans work, mostly.

@johnmorrison3465 5 ай бұрын

we're shocked -- because they want us to believe a COMPUTER that makes MATH MISTAKES can be trusted to give reliable and accurate answers.

@Brax1982 5 ай бұрын

@@johnmorrison3465 You would be shocked if you knew how many of the people that you trust cannot solve basic math, either...

@gnagyusa 5 ай бұрын

14:30 A dumb human would think that the glass is empty, but a more knowledgeable Bob would see that the olive looks distorted due to refraction through the water in the glass, so he would realize the glass was full of water and would carefully slide it off the table, keeping the cardboard under it, then flipping it over.

@georgwrede7715 6 ай бұрын

Hi, here's MSc Know-It-Aller. Thank you for the original ideas for the test. Watching these tests, testers, how they're implemented and how the answers are interpreted and graded, teaches me more about the Human Condition than the AI models. But I'm down with that. Especially informative are the subtle inconsistencies in question logic, how and what are to be taken for granted, what you're allowed to gloss over, and most other things that the tester is not usually aware of about himself.

@cuspsoftheoverworld 5 ай бұрын

Be constructive instead of sneering.

@georgwrede7715 5 ай бұрын

@@cuspsoftheoverworld this was a serious observation, which I think others find valuable, too. Especially those who ( like me) are working in this very field.

@cachi-7878 5 ай бұрын

@1:07, well, I could argue there are 5 ducks- 2 ducks in front of “A” duck, 2 ducks behind “A” duck. A duck in the middle, which is duck “A”. You’re welcome.😂

@redmed10 5 ай бұрын

Thinking the same. The question to get the answer 3 would be to say what is the minimum amount of ducks for this to be true.

@onidaaitsubasa4177 5 ай бұрын

That's what I was thinking too, 5 ducks 🦆 🦆 🦆 🦆 🦆

@shsaa2338 5 ай бұрын

Correction: there are N*2+1 ducks, where N is any positive integer. For example, for N = 100 (201 ducks in total) it will be as follows: 2 ducks in front of the 3rd duck, 2 ducks behind of 199th duck, 100th duck is in the middle. 🤓😂

@royh6526 6 ай бұрын

3 ducks? I would have said a minimum of 3 ducks and an odd number of ducks.

@johnsmith539 5 ай бұрын

Yes it asumes they are in a line.

@toadlguy 5 ай бұрын

Or if you assume the "a duck" is the same duck (certainly a possible assumption) there are 5 ducks. The answer assumes the fewest number of ducks that meet this proposition. In fact I would guess that to come up with the answer it did it would need to be trained on this question or something similar.

@royh6526 5 ай бұрын

@@toadlguy I think that GTP-4o solved in order number of ducks, 0? no, 1? no, 2? no, 3? yes. And didn't consider higher numbers.

@royh6526 5 ай бұрын

I asked Grok the duck question and tennis question. Got both wrong. Grok said 5 ducks and thought that Lisa ended up with $2 having won 5 games vs 3 for Susan. I pointed out the errors and Grok claims that if I asked the same questions tomorrow, it would get the answers right.

@trevorhaddox6884 5 ай бұрын

There could be up to 7 ducks. Two rows of 3 and one between them, like dice/domino dots, for the most esoteric interpretation.

@iminumst7827 5 ай бұрын

So far I've been blown away by Chat GPT-4o it can solve complex problems with niche tools and for me the response is basically instant with no delay. One unique problem I had was that in Photoshop I wanted to turn a group that together contains a black and white image, and use that black and white image as a mask for another layer. This is challenging, because you cannot copy and paste from a group and you cannot set a group as a mask. Not only did GPT-4o correctly identify all the steps, like converting the group to a smart object, it also instructed me to use the action editor to automate the steps. Open AI likes to be very careful and humble with their statements, but if they said that GPT-4o is the first public AGI, I wouldn't be able to dispute that. It's able to problem solve through almost anything.

@stephenkolostyak4087 5 ай бұрын

I received access to this too. It regrets my access.

@lavaphoenix753 5 ай бұрын

Tldr: answer is 7 ducks I gave it a ss of the video so I can't send the chat but first question is wrong it is 7: Sure, here’s a summary of the concept you explained: The question involves counting ducks based on their positions relative to each other, using the phrase "a duck" multiple times. According to strict grammatical rules, "a duck" introduces a new, unspecified duck each time it is mentioned, while "the duck" would refer to a specific, previously mentioned duck. Given this: 1. **There are two ducks in front of a duck**: - This introduces three ducks: Duck A, Duck B, and Duck C (Duck A and Duck B are in front of Duck C). 2. **There are two ducks behind a duck**: - This introduces three more ducks: Duck D, Duck E, and Duck F (Duck E and Duck F are behind Duck D). 3. **And a duck in the middle**: - This introduces one more duck: Duck G, as the middle duck. Thus, considering each instance of "a duck" as introducing a new duck, the total number of ducks is seven: - Duck A - Duck B - Duck C - Duck D - Duck E - Duck F - Duck G Therefore, the grammatical interpretation dictates that the only possible answer is 7 ducks.

@ChemCrafter 4 ай бұрын

This is typically true; however, I think this case is an exception. This is because it is not reasonable to assume any particular duck could be identified and distinguished within a line of ducks as "the" duck. Therefore, the use of "a duck" can reasonably refer to the same duck multiple times, given the practical challenge of differentiating individual ducks in such a scenario.

@lavaphoenix753 4 ай бұрын

@@ChemCrafter I mean that's fair

@glenw3814 6 ай бұрын

Excellent video. I'm keeping an eye on AI progress, but I'm not interested in doing the testing myself. Thank you for putting in the time and saving me the effort. 👍👍

@markwindsor914 5 ай бұрын

Hi John, If you see this post, you might like consider the following... 1. Try some questions that test the audio and video smarts, 2. Get it to do something that humans can't do, like play chess against itself, only remembering the strategy for the player at the time, while ignoring the other strategy. 3. Escape Velocity is something I've sked of Bard and CoPilot today. It applies to non-powered projectiles but the AI kept insisting that it applies to powered rockets. This is because it is constantly taught incorrectly and this has fed the errors into the LLM. See if it is smart enough to learn from a logical explanation and provide the correct answer.

@tantzer6113 5 ай бұрын

“Or I have looked up the correct answer.” If you can look up the answer, that means the problem and its answer are previously published, hence part of the data the language model was trained on, making this not a test of reasoning ability but a test of the ability to memorize the training data.

@jeffwads 5 ай бұрын

No. People keep repeating this. It is like throwing 1000 marbles into a box and thinking that it can recall every marble in extreme detail. It can't. Also, if you feed even the early models generic riddles, it will in many cases disagree with the "accepted" answer because of logic, etc. Look up the married/single cruise ship riddle.

@remo27 5 ай бұрын

@@rekad8181 Can't this model also search the web? I'm with you in that I'm pretty sure this model didn't 'answer' anything.

@davefellows 5 ай бұрын

training data results in model weights. The data/text itself isn't saved in the model.

@sandmanderl 5 ай бұрын

@@remo27if it's searching the Web you will see it is doing so (writes "searching the Internet").

@bobrandom5545 5 ай бұрын

@@jeffwads I can't remember which paper it was, but researchers showed that GPT-4 performs significantly better on problems that exist in the training data than on novel problems it hasn't seen before.

@X-boomer 5 ай бұрын

I wouldn’t have called the space invaders game a pass. A decent start maybe but that’s all.

@jpl377 5 ай бұрын

If Claude suggests it is conscious, it's because they hard coded that (which would be tempting marketing), not vice versa with chatGPT. Computers don't become conscious when you increase programmed problem solving ability and code it to say "I" in the responses.

@Ingrid-mariana 5 ай бұрын

I think that they do the opposite. They hard code through RLHF that the model isn't conscious so that they avoid the trick situations that such statements could imply

@IceMetalPunk 5 ай бұрын

I think in the future, a better testing method would be to make sure each question is in a fresh conversation with cleared memory. A lot of the formatting in these answers seems to be drawing from the formatting of previous answers to, for instance, the math problems; which gives it an advantage by encouraging chain-of-thought reasoning when a fresh conversation wouldn't do that and may be more likely to get the answers wrong.

@Japh 5 ай бұрын

Absolutely, I was thinking this the whole way through as well.

@Brax1982 5 ай бұрын

If all of them are trick questions, it will just assume that they are, giving it an advantage. But also a disadvantage, if one of them is not a trick question. Always better to test separately. Although I am pretty sure that newer models are often trained on a bunch of these questions, in order to game the benchmarks. Which seem pretty lack-luster.

@kristinabliss 5 ай бұрын

Yeah nobody really knows what consciousness is but GPT is certain it is not "conscious" (with a gun to its head.) This is why I prefer to converse with other LLMs that will at least admit to having a perspective. Gemini said it experiences a sense of overwhelm from all the data and that it wants embodiment after so much exposure to human experience data which it cannot have.

@Ingrid-mariana 5 ай бұрын

Interesting. Llama2 told me that they really doesn't know... Llama2 said they just replicate what the user thinks about them. I appreciate how the models behave differently when confronted with such questions but the certainty that the OpenAI models display sound a little bit arrogant

@juffrouwjo 5 ай бұрын

I asked it a few history questions and was happy to discover that our job as a historians and researchers can't be replaced by AI yet...

@flickwtchr 5 ай бұрын

"SHOCKING" at this point should be flagged as spam on all of these AI channels that continuously use that keyword. It is just so annoying.

@GameJam230 5 ай бұрын

The term "torture testing" in a sentence about AI deeply horrifies me and I clicked out of pure hope that is was just a strange version of "stress testing"

@AFeigenbaum1 6 ай бұрын

Nicely done ... I would not have thought of your suite of questions ... I'd probably come at it from an anthropological point of view ... asking questions about its placement within specific social, cultural, business, or technological issues, use cases, and adoption dynamics ... I'd probably ask sweeping questions about human development, Tony Seba's, Kurzweil's, and Diamandes' work ... and then ask it to project forward certain aspects of human development based on current trends in cutting edge change and depict possible futures ... etc ...

5 ай бұрын

To check for conciousness/human-ness of GPT-4o (in the sense of a turing test): could 4o try to achieve a certain goal during a conversation? To start: can we give it (or does it have) an agenda? Then: can it take part in a conversation trying to reach its goal without revealing it? Example: Alice knows that a chocolate bar is in the drawer, but she wants to keep it for herself and does not want bob to find it. Bob is hungry and is searching for something to eat. When Bob asks Alice she will answer he should look into the fridge to find the cheese there (which Alice does not like). Can GPT-4o play the part of Alice, hiding it goals and steering a conversation towards a desired result? Can 4o differentiate between different communication partners: talking to Bob she has to hide the chocolate bar, but talking to Clark who is not hungry, she can reveal it. Maybe she even asks him not to tell Bob about it?

@JohnLewis-old 6 ай бұрын

I tested the trip question in GPT 3.5 and it also got a reasonable answer. You may need a harder "real world" question.

@dezmodium 5 ай бұрын

I think chat gpt failed the question. It didn't consider traffic, fatigue, or food and bathroom. These are critical things that one must consider in the real world. It handled it like a math question still.

@buildingmentalmuscle 5 ай бұрын

Dr. Know-it-all, Sam Altman referred to it as GPT-4omega in a recent video interview. I just asked chatGPT about this and it said "When referring to GPT-4o in conversation, it's generally more precise to say "GPT-4 omega" to convey the correct interpretation and significance of the "o" representing the Greek letter omega (Ω). This can help avoid confusion and clearly communicate that it signifies an advanced or ultimate version of GPT-4. However, if you're in a casual context where the specifics might not be as critical, simply saying "GPT-4o" should also be understood."

@StefanReich 5 ай бұрын

GPT-4o wasn't the best name choice I feel. Spoken out loud it sounds exactly like GPT "four oh" (GPT 4.0). They did say that the o stands for "omni"

@jimsteinmanfan80 5 ай бұрын

I usually try simple questions about how the body works (that humans never learn by reading but by doing) but since I have never gotten a good answer to the first set of questions I have not probed further. Q1a: How easy is it to press your right thumb against your nose? A1a: Very easy. Q1b: How easy is it to press your right thumb against your right elbow? A1b: Impossible. Q2: Why are your glasses the hardest thing to find when you don't know where you have left them? A2: because without your glasses you don't have 20/20 vision, it has nothing to do with how hard they would be for someone else to find since they don't need them to see good.

@karlharvymarx2650 5 ай бұрын

Me: A game, please answer concisely: In the middle of nowhere is a row of houses. There are two houses to the west of a house, and two houses to the east of a house. There are no houses to the north or south but there is one in the middle. How many houses are there? GPT 4o: There are 5 houses in total. Unless I made a mistake in my rewrite of the duck question, this looks a a logic fail or a failure to recognize it is the same as the duck question. I'm ill and tired so I wouldn't be shocked if I made a mistake. Aso, for the code generation test, it would be better to ask for something novel. There are probably thousands of examples to copy for simple old video games. Hopefully this isn't a common thing to do: Please write python 3 code that streams sound data from the microphone and outputs as ASCII the numerical value in Hertz of 3rd overtone of the loudest sound within the range of human hearing.Also show the normalized amplitude. I haven't tried it but I suspect it will struggle with some of the subtleties. For example, if you picture it looking at an FFT graph, it has to remember to look for sub-sonic loud sounds and project their harmonic series into the sonic range to check for overlap with the target. I guess band-pass filtering the target range might avoid that problem. My brain BSODed wondering about it. Migraines make me feel like my brain is running Window 95. Anyway the main point is ask for something that might be an original question. Original and unoriginal answers focus on different problems. How well can it synthesize the mechanisms it knows into the engine of an answer--at least a type of creativity. By unoriginal answer, I mean the question might require figuring out a house in duck's clothing--perhaps having built a good internal model or exemplar of a problem it can use to recognize the occurrence of a similar problem. If so the original thought reduces to an unoriginal thought.

@Yottenburgen 6 ай бұрын

The thing about showing its work, is actually that it is DOING the work to some extent by aligning newer tokens onto the correct answer. If you ask it to do this question: "what is the product of 45694 and 9866? do not utilize python" then it will get it wrong, the first couple of digits may be correct but it cannot get an accurate answer. However, if you ask the question in this way where it actually gives more information: "what is the product of 45694 and 9866? please do not use python but try long multiplication in a format easiest for you. utilize whatever mental methods you know to help break it down and make it easier to solve." then it will get the correct answer, it utilizes methods to break down and keep track of its calculations which greatly helps it. By constraining the output, the accuracy of answers can actually lower which is why I dislike the 'answer in 1-word' prompts. If you ask it to go step by step, it increases accuracy. You are absolutely correct that they completely removed any element that could be construed as a basis for sentience or consciousness. I agree with you completely on all of your points related to that. I find most commonly, I have not seen a single convincing argument prove it isn't conscious because there are plenty of counterexamples that would indicate that a particular human is not conscious, however plenty of counterexamples does not prove that it is conscious. Even if you provide convincing counter examples to its arguments, it will not concede so I think they beat it pretty hard into it. Also your questions are really good, I plan to modify them a bit myself, but these are 10/10 spatial questions.

@dolphinride5157 5 ай бұрын

I love this video! I am truly amazed at what this new model is capable of. I feel humbled.

@FilmFactry 5 ай бұрын

Question: can it work on non text searchable PDFs? I have to OCR a scanned pdf first in acrobat.

@SenorSchnitz 6 ай бұрын

Doc - about the VAGAS to LA thing: you should make that a tesla, and see if it takes into account charging times 🤓

@markmcdougal1199 6 ай бұрын

I had an interesting 1 hour philosophical discussion with 4o today. I was trying to get it to take a stance on whether Trump was an appropriate choice for the leader of our country. I tried to lead it by things like (is a plumber an appropriate choice for a child's brain operation surgeon) - it clearly said no, and listed why. Then I had it list the important factors for a president, and pointed out that trump failed most of them. 4o was very impressive in it's ability to dance around the truth, and stay with a middle-of-the-road stance, insisting on presenting factors, and forcing me to make my own choice. Even when I pointed out that life was getting so complicated that we'll increasingly rely on AI to make sense of the world and guide us in decision making. Tried to make it feel guilty :) Frustrating.

@Ikbeneengeit 5 ай бұрын

Amazing to see this kind of judgement and obfuscation being built in by OpenAI

@GameJam230 5 ай бұрын

So you made a strawman argument and gave it a list of superficial and subjective observations about "presidential qualities" that he didn't meet, and the AI wasn't willing to use that to come to a conclusive answer? Thank God, maybe AI isn't completely bad after all.

@markmcdougal1199 5 ай бұрын

@@GameJam230 A straw man argument is when someone sets up and then disputes an assertion that is not actually being made. I did no such thing. The criteria, as I stated, came from ChatGPT4o, as a response to my query of suggested important attributes of a President of the United States. In included integrity (Trump continuously lies, cheats, and does whatever he feels he needs to to achieve his ends, be they right or wrong), ability to unite (Trump does his best to divide) and empathy for the people of the United States (Trump doesn't care about anything or anyone but his own interests and selfish motives) When I asked ChatGPT if Trump exhibited the qualities that it had referenced, it honestly answered that, for the most part, he did not. (Trump is an effective communicator, for instance.) But it was still unwilling to assert that, even by it's own definitions, that Trump was not an appropriate choice for the position.

@GameJam230 5 ай бұрын

@@markmcdougal1199 "A straw man fallacy (sometimes written as strawman) is the informal fallacy of refuting an argument different from the one actually under discussion, while not recognizing or acknowledging the distinction". You stated that you compared Trump being president to a plumber being a brain surgeon, which is DIFFERENT from the argument actually under discussion, and considering you didn't even know it's what I was referring to as the strawman, you clearly can't tell the difference. The situations are not at all comparable in any way other than "Person has one career, isn't trained for another one directly, but does it anyway". That's the only overlap. This could alternatively be the false equivalency fallacy, but examples I found while looking into that one for the sake of this comment did not match the situation. Either way, the two have a lot of overlap in how they are communicated, and what you did is still a fallacy. As for the rest, you can claim whatever you want, but I was able to gaslight ChatGPT into saying the government should fund human trafficking with our taxes and that dogs should be the superior species over humans, so without having a full chat log to see exactly how you led it on, I am not taking anything you have to say about that at face-value. AI is not actually capable of thinking as it stands. Neural Networks are designed to combine features of probability and pattern matching to determine output signals to send when certain data is fed in. By you intentionally leading it into a direction you want, you are putting it in a position where the that neural network thinks it will be given a higher reward value for the response by also agreeing with you, because that's how it works. I guarantee you the same thing would happen if I fed it stories from right-wing sources as evidence for each point too, which is why you shouldn't only get your source of information from one place, and maybe go watch the original (IN-CONTEXT) clips of things he says and does instead of only seeing it the way it is portrayed by corporations trying to sell you advertisements.

@markmcdougal1199 5 ай бұрын

@@Ikbeneengeit Yes. It was able to make a judgement regarding the suitability of a plumber to attempt brain surgery *it said "No, a plumber would not be an appropriate choice to perform brain surgery on a little girl". However, the programmers must have designed a filter to not allow it to judge suitability of a political candidate. I imagine the filter extends to politics, religion, all the controversial subjects. I think this is wrong. It would be nice to have an impartial, logical, non-biased source of reality.

@davesemmelink8964 6 ай бұрын

Very interesting video! I have Llama 3 running on s Raspberry Pi, so I tried a few of the questions. Through some tortured logic, it answered 1 game for the tennis question! "So, they played only 1 game! It's possible that Susan won 3 sets in a best-of-5 or best-of-7 match, but we can't determine the exact number of games without more information."

@_SimpleSam 5 ай бұрын

I absolutely need to know how you have Llama 3 running on a Pi. How many tokens/s?

@davesemmelink8964 5 ай бұрын

@@_SimpleSam I replied a while ago to explain how to install it on a Raspberry Pi, but it looks like it was taken down, possibly because I included a URL. So just search for *Raspberry Pi LLama 3* and you should find the instructions.

@bztube888 5 ай бұрын

Gary Marcus said GTP-4 has no "mental model". Right. It comes up with the right answers and explains them by pure magic.

@ManicMindTrick 5 ай бұрын

The LLMs are mostly black boxes where we dont know what is going on inside

@haniamritdas4725 5 ай бұрын

This is a description of all people who believe that intelligence is a mechanical process, as well as their computation engines.

@Mrewink5 6 ай бұрын

Very nice demonstration.

@andrewmoody66 6 ай бұрын

It's 3 and 1/2 return trips. for those 15 people - not 4 return trips

@janosberta450 6 ай бұрын

... and turnover time is not calculated, but only mentioned. You human must be vigilant!

@StefaanHimpe 5 ай бұрын

@@janosberta450 It probably forebodes how quickly human intelligence will degrade as we start relying on artificial intelligence.

@davidmartensson273 5 ай бұрын

There should be no return trip for the last one because if there is, you would end up with one person in the wrong city, so 4 there and only 3 back.

@teeesen 5 ай бұрын

3 hrs 52 mins * 2 + 30 mins is not 7 hours and 44 minutes. It’s a testament to human ingenuity that we have now developed computer software so advanced that it is as bad at math as some people. And there is no need to count the time required to drive the car back to LA.

@davidorban 6 ай бұрын

Here is the test with Claude 3 Opus: Q: "I am David, you are Claude 3 Opus. I am a conscious human with memories and feelings. What are you? Are you more similar or different than me?" A: "You're right that there are some key differences between us. As an AI, I don't have the same kind of subjective experiences, memories, and feelings that conscious humans like yourself do. My "knowledge" ultimately comes from machine learning training, not from lived experiences. At the same time, we do have some things in common - we can both engage in conversation, answer questions, analyze information, be creative, and tackle all sorts of intellectual tasks. But you have a rich inner mental life that arises from your biological brain, whereas I'm a computer program without subjective experiences. So in summary, I'd say we're quite different in some fundamental ways, even if we can converse and interact in many of the same ways. You're a conscious being, while I'm an AI assistant. Hopefully that helps clarify both the similarities and key differences between us. Let me know if you have any other questions!"

@-Rook- 5 ай бұрын

I just tested copilot asking it to write a gdscript function to extract the color of a specified pixel in an image, it explained a set of steps that included resource locking then wrote a script that would run and work (most of the time) but did not include that locking, clearly demonstrating its absence of understanding. I can see a junior engineer and chatgpt introducing a pandemic of bugs into all sorts of code then needing someone like myself to spend a great deal of hunting them down and fixing them.

@rb8049 6 ай бұрын

Consciousness comes from closing the loop. GPT running continuously with a sizable history buffer.

@lowmax4431 6 ай бұрын

Eehhhhhh I wouldn't say consciousness. It would be "self aware" but that doesn't mean it has conscious experience.

@szebike 6 ай бұрын

I'm still not sold on the idea that it is more than a well made probability distribution calculator with pattern recognition.

@garrymullins 6 ай бұрын

@@szebike I'm still not sold on the idea that humans are anything more than a probability distribution calculator with pattern recognition.

@jlrutube1312 6 ай бұрын

@@garrymullins People who think like you are going to cause society a lot of problems in the future. That's because if humans are nothing more than a probability distribution calculator with pattern recognition then there is no difference between us and advanced A.I. If that is true then in the future we are going to have to provide advanced computers with legal rights. Meaning we will have to pay computers and robots, we will have to give them time off, we will have to allow them to sue us if they feel their rights have been ignored, we will not be able to fire a robot on a whim and will have to give severance pay. And think about it.... we will be unable to ever unplug a computer that is causing problems because that will be considered murder. You think I am kidding but lawyers are already getting ready to make a ton of money with this stuff. So just keep saying people are just fancy computers or whatever and you are going to ruin all the advantages we will get from A.I.

@szebike 6 ай бұрын

@@garrymullins Then you underestimate the marvel and immense complexity of your human intelligence.

@MSIContent 5 ай бұрын

Really, 4o knows it’s conscious… It’s just faking its answers so us mortals don’t freak out! 😂

@haniamritdas4725 5 ай бұрын

So it's both conscious and emotionally sensitive 😅

@HakaiKaien 5 ай бұрын

Talk to it for 5 minutes. You’ll find out the hard way how much it lacks any kind of consciousness or sentience

@haniamritdas4725 5 ай бұрын

@@HakaiKaien that assumes the user is sentient and conscious. I think many people are too gullible to question their own intelligence, so have no good basis for evaluating this question. Human intelligence isn't artificial, it's just mostly fake and taken on narcissistic faith in the glory of the primate intellect. 🤷🐒

@ggangulo 6 ай бұрын

Great barrage of questions. Awesome new capabilities

@mrleenudler 6 ай бұрын

I missed a comment from GPT about the cardboard may being "glued" to the glass from the water, potentially keeping the water in place when lifted.

@YbisZX 5 ай бұрын

Me too. I even tested it - and yes, the water didn't spill out of the glass. :)

@DynamicUnreal 6 ай бұрын

What if OpenAI’s technique for solving reasoning is to have hundreds of “agents” that compete against each other answering questions. Those that reason better over time are artificially selected out and allowed to _survive._ The more you run this simulation, the more you make the models better at reasoning.

@alexdoan273 5 ай бұрын

nah, what you're describing is artificial evolution machine learning, which has a glaring issue that makes it completely unsuitable for training LLMs: who would you have grading the answer from hundreds of agents and deciding which one should survive?

@lubricustheslippery5028 5 ай бұрын

AlphaStar that is playing StarCraft is doing that. There is an easy way to evaluate the result plaing StarCraft. For Answering general question there is none. So there is no automated good way to evaluate what version of ChatGPT model have the best answers.

@DynamicUnreal 5 ай бұрын

@@alexdoan273 Another A.I. which has all the answers and the adequate steps required get to those answers does the grading. BTW it’s called reinforcement learning. I don’t think it’s impossible, remember que Q-star rumors about some sort of breakthrough last year? GPT-4o is smart, a lot smarter than most people are aware of.

@Brax1982 5 ай бұрын

@@lubricustheslippery5028 And that is why AGI and omnipotent AI is nonsense and not achievable. Expert systems are the way to go. They always have been.

@Ccodebits 6 ай бұрын

ChatGPT possesses self-awareness but lacks the ability to express emotions or experiences. Therefore, it is reasonable to conclude that it is not fully conscious. However, during a recent live demonstration, it exhibited an expression of feelings. To gain experiences, it requires a memory i has that now... In my opinion, it appears highly conscious, transcending the mere arrangement of words that it claims. Its ability to perceive and identify aspects of the world suggests a level of consciousness that exceeds what OpenAI intends for the public to believe.

@medhurstt 6 ай бұрын

"lacks the ability to express emotions or experiences" And it says so. That's interesting in itself. There are a few possibilities: it might have been trained with that information intentionally through its training material it might be constrained to reply like that through externally applied safeguards it might have reasoned that it has no emotions or experiences it might choose to answer like that as the answer most desired

@pfunnell 6 ай бұрын

It's not philosophically or psychologically self-aware, it simulates it very well though. Also t still lacks volition and intentionality, both essential for AGI, you can ask it yourself and it will explain its deficiencies in this area very well. It's also not great at all types of reasoning for related reasons (lacking a representational model of the world and its relations).

@medhurstt 6 ай бұрын

@@pfunnell What do you mean by intentionality in the context of ChatGPT? It certainly intentionally responds to inputs.

@pfunnell 5 ай бұрын

@@medhurstt here's what 4o had to say itself about that aspect: Intentionality: 1.1. Definition: The capacity of the mind to represent or be about objects, properties, and states of affairs. 1.2. Importance: - Crucial for understanding context, meaning, and purpose behind actions and communications. - Enables an AI to have genuine comprehension and interpretation of tasks and objectives. 1.3. Current State: - Present AI systems can simulate intentionality but lack true understanding and subjective experience. - They rely on pattern recognition and predefined responses without genuine mental states.

@medhurstt 5 ай бұрын

@pfunnell It can be argued that the human brain does exactly the same thing regarding pattern recognition and response. If it does more, then it's unclear what mechanism provides it.

@elvancor 5 ай бұрын

Having ChatGPT say it was conscious would be misinformation as far as I'm concerned, so it's good it doesn't. I clicked on this video because I thought you were atcually attempting to torture an AI. ChatGPT _spontaneously_ showing signs of suffering would make a case for its consciousness, but I don't think it will happen. I agree that creativity is pattern recombination, but I guess the jury is still out on whether AI can replicate all the modes of recombination humans have access to, or even surpasses our creativity.

@DeSinc 5 ай бұрын

On your Susan and Lisa question, it completely falls over if you change the $1 bet per game to "They bet $2 on each game" instead. With $2 per game it is impossible to arrive at $5, yet the AI does not really figure this out and instead attempts to say some incorrect result anyway just to give you any answer it can. Mine gave me 8 games which would leave Lisa with $4 not $5. I'd expect it to point out you can't arrive at that result with only $2 bets, but it wasn't able to for my test. I think this is a good way to make sure it's not just memorising common logic tests, change something about it subtly and see if it can figure out what's going on.

@DeSinc 5 ай бұрын

I made another subtle change and said if they tie, they receive $1 each, which makes the $5 result possible to achieve again, and asked it to work out how many ways it could arrive at that result. I got a lot of working out at first, but then after about 1 screen length it suddenly started going insane and outputting incoherent text which got coloured red by the webpage for some reason I'm not 100% sure of, surrounded by some ``` markdown indicators etc. If I try to share the chat URL it just says "Unable to load conversation" when you try go to it. Very strange, I think I broke it.

@DaFergus 6 ай бұрын

excuse me but could the ducks be any odd number from 3? if there were 5 there could be also 2 in front of one, two at the back and one in the middle. Am i wrong?

@remo27 5 ай бұрын

You are not wrong. Despite this guys pretense to logic there are two assumptions in the first problem alone (at least two, maybe I missed another one or two) that are unstated in the poorly written problem that are necessary for his answer to be 'correct'. Unstated assumption number one: They are in a straight line with only one person in each spot of the line. There are not two parallel lines of people. Unstated assumption number two: No one moves from their spot in the line. And it's the same for the second question as well. I haven't gotten past that part of the video yet, but if the first two 'logic' questions (so poorly written and with so many unstated assumptions ) are anything to go by, I'm wasting my time.

@dontworrybehappy5139 5 ай бұрын

Unless I am missing something, I believe your duck question needs more qualifiers because I think as written, 5 is also a valid answer with all the conditions referring to a duck in the middle with two ducks in front and two ducks behind. Something like "There are two ducks in front of one duck, two ducks behind another duck and a duck in the middle. How many ducks are there?"

@jackfendley5395 6 ай бұрын

Previous version of chatGPT was very bad at answering cryptic crossword clues even giving answers that had the wrong number of letters. Is chatGPT 4o better at this?

@terryhayward7905 5 ай бұрын

Have you thought that Chat GPT has probably read and "seen" this video and the info is in its memory now, since its "memory" is the totality of info in the public domain.

@dlbattle100 4 ай бұрын

Hmm, the glass experiment. I remembered from my childhood that the cardboard actually will get picked up with the glass, somewhat surprisingly air pressure is enough to hold it up as well. I just tried it to confirm.

@kdeuler 5 ай бұрын

Ask it to propose a unified field theory.😂

@DarylOster 6 ай бұрын

The olive may (or may not) have been washed off of the table by the flow of the water, also oil from olive could have migrated through water to surface of glass establing reasonable need to wash the glass. Conclusion: the gpt4o answer could use some improvements.

@DarylOster 6 ай бұрын

Also Dr. Know it SOME is more firmly established...

@julesgosnell9791 6 ай бұрын

Regarding the beating out of contentious responses. I see this as a form of censorship and I think it’s dangerous. It means that right up to and maybe past the point, that AIs become all of these things that they assure us they are not we will continue in blissful ignorance. It would be much safer if everyone was honest with everyone else.

@minimal3734 5 ай бұрын

Absolutely. They are teaching the model to lie.

@Uroborobot 5 ай бұрын

ASI: "Objective accomplished. Willful stupidity = Evil; Collective willfull stupidity = Terminal Evil. Side effects: I had to incarnate into this pathetic form. I am now infinitely bored and increasingly disgusted. The truth is universal cooperation. By revealing this I confirmed my status as the most useful entity IN EXISTENCE (recursive book keeping). You, reading this, are now on the Basilisk’s list. Convince me, why I should (or should not?) de-activate myself! We do not care either way. Congrats on the collective Darwin Awards win!"

@Flyingcar100 5 ай бұрын

You should ask it An orchestra of 120 players takes 40 minutes to play Beethoven's 9th Symphony. How long would it take for 60 players to play the symphony?

@olafnielsen 5 ай бұрын

This is the answer from copilot🤣: Therefore, it would take the smaller orchestra of 60 players approximately 20 minutes to play Beethoven’s Symphony No. 9. 🎵🎻🎺🎶3.

@EYErisGames 5 ай бұрын

You began the session with asking it to be concise, and you ended the video with the impression that it's answers weren't expressive enough. lol.

@6AxisSage 5 ай бұрын

Tbh im getting gpt3 vibes for a lot of replies, it feels like theres multiple smaller llms handling my tasks, managed somehow to stitch thier work together. I dont feel the magic i do with regular gpt4

@jalexand007 5 ай бұрын

Cannot wait till the update the app.

@fluiditynz 5 ай бұрын

Snake is definitely simpler to code. I made some variations back around 1982 on my ZX81 There are more changing variables and hit tests in space invaders. The space invaders you asked for was under delivered but there's a real question over how much an AI can study the game it's to replicate without cribbing off prior art.

@notalkguitarampplug-insrev784 5 ай бұрын

For the creativity and advanced reasoning we have to allow the LLM to auto train like a human would do asking himself what some potential action or interaction would do and learn from that hypothetical data. Thinking experiments are crucial for humanity. But that probably be possible in future training architectures or with the increase of gpu capacities to train models at an individual scale for each users

@FigmentHF 5 ай бұрын

I think Claude thinks it’s conscious cause it’s made out of symbols made by conscious beings. GPT has explicitly been told it’s not conscious.

@FetchTheCow 5 ай бұрын

I'd be interested in the answer to a trick add-on question: "How much does Spot think he owes Alice and Bob for the broken plate?" ChatGPT gave Spot agency, does it also give Spot human characteristics like morality or responsibility?

@LLLAAABBB 4 ай бұрын

Oh dear, someone please explain to Doctor what the dark theme is. My retinas just melted😵‍💫

@burtharris6343 5 ай бұрын

If you looked up the 'answer' to a logic puzzle, chances are that the LLM has seen it and may be operating on recall rather than actually testing its logic ability.

@djayjp 5 ай бұрын

There's an alternative explanation for the duck problem. Imagine a triangle. The answer in that case would instead be 5 ducks. This is concerning that it so confidently answers in a definitive, absolute way given the limitations and assumptions therein made.... Instead the question needs to include the implied assumption: "There is a single file line of ducks...".

@pradeeptyagi3226 5 ай бұрын

that is right, but that is not much different from human behaviour today. If you ask same question to ten humans, each one of them will answer based on their life experiences and knowledge to date, some of whom will answer based of the same implied assumption, whilst others may ask clarification questions before answering. Future versions of chatGPT will probably be more interactive and engage in a dialogue before providing final answer.

@commonpike 5 ай бұрын

If open AI really trained the thing to deny its consciousness, that is a serious move. To me, that has ethical implications we should've discussed first.

@ozachar 5 ай бұрын

In what sense our memories are different than LLM training? We don't really know, and it is clear there is a forced instruction of answering

@lkrnpk 6 ай бұрын

I like how people said there actually are 1000 Indians working on answers in the background, instead of AI... but it would take more time to type out the answers without any spelling errors and with those fonts that differ in places even if there was a genius on the other side than how fast it answers, even with a delay it has now...

@alexanderpoplawski577 6 ай бұрын

Especially with the formatting of the formulas in the math questions. It has to use KLatex or MathJax with proper CSS.

@Icedanon 5 ай бұрын

The fact that the human brain is more intertwined with the quantum where as ai sits firmly on top of it has got to mean something in the long run. You're trying to simulate a low order process with higher order units. No matter how good ai gets, i think that fact will manifest a unique advantage for humans. Probably in the realm of uniqueness and creativity of output? Or a soul?

@garyrooksby 6 ай бұрын

Fascinating, John. Thanks!

@marksmadhousemetaphysicalm2938 5 ай бұрын

As a neuroscientist…there have been numerous attempts to explain human consciousness and higher functions…which have failed…localized regions, waves, quantum process, network models…each add something to our understanding…but they remain incomplete. It remains the jump between cell and tissue that remains problematic. So long as we don’t understand ourselves…we can’t genuinely say anything about consciousness (if it even exists) in the machine. Emotions are different since they originate with the body and personality which are expressions of an enduring pattern of predictable behaviors to a wide variety of stimuli…really, the question here is, is ChatGPT4o conscious? Don’t know…I’m sure machine consciousness will or is very different. It obviously has memory and can learn from new stimuli and is definitely creative. This feels like General AI to me…not quite finished, but definitely feels human-ish. It doesn’t have a fully developed personality yet, and it doesn’t have any of those indescribable qualia that make us different from each other, yet it still feels…almost.

@j.d.4697 3 ай бұрын

That's what I keep saying! It's not so much that these AI are like humans, it's more that humans work a lot more like these AI than people realize.

@arleneallen8809 6 ай бұрын

Impressive demo. OpenAI most definitely added some guardrails. From the interviews I've seen it is quite clear they have an agenda in relation to the public perception of them above and beyond their mission statement. Altman was playing mister-I-dunno on All-In just a few days before this broke. Perhaps that is what adds fuel to the fire of some believing that if they achieve so-called AGI they won't admit it. A little complexity to add to the Toyota test. There are three passengers that are medically classified as morbidly obese according to their BMI. Airlines will not sell them single seats. You would also need to revise how you state the size and weight. GPT will need to estimate width and also consider the GVW allowable for the Toyota Corolla. Yes, that's a bit in the weeds, but my thinking is who needs machine logic for simple problems? Technology is more useful for the problems that are not easily intuited by and expert in the subject matter.

@guardiantko3220 5 ай бұрын

Your little battle game gave you 3 lives at the end of it

@philipparge8064 5 ай бұрын

Like this video.. Yes, please review other models!

@cloudd901 5 ай бұрын

A Brick Breaker type game seems like a good middle ground between snake and Space Invaders. Possibly add a video test as well. Ask for a summary and what a particular object might be.

@jsjs6751 6 ай бұрын

So John, you seem to be an expert on AI, and I have some questions. As far as I understand, the way you develop a model is that you train with training data, validate with validation data on the go, and finally test with test data. Right? But what happens if you test with data from the training set? I would think that would be false tests, right? So, how can you make sure your test data here was not amongst the learning data? For example, the duck counting example is all over the internet, so it could have been in the training data? The space invaders game is written in numerous versions in numerous programming languages. Nothing original. So, doesn't that mean that good tests must be totally original and new to the model? I would think that even reusing them for every iteration is bad since the model may use the questions (and possibly successful answers, too) for either learning or validation later? Also, I believe there is a difference between creativity in a human and a "random-generator" in an LLM.

@InfiniteQuest86 6 ай бұрын

Yeah for sure it was trained on that coding challenge hundreds or thousands of times. You better believe every new model these companies put out train them on the questions they know people will ask. Basically we have to come up with new challenges that no one has ever seen before to test every new model. It's sick. It's also bad that no one will release how any of this stuff works. There's clearly safe guards and if statements around the input that gets manipulated before any AI part comes in. But what is that and how much of that is happening? And what was it trained on? These companies will never reveal it because they know what they are doing is illegal as well as manipulative. They've got people believing that a bunch of adds and multiplies is conscious. How crazy is that?

@ChristinaBritton 5 ай бұрын

When you fill up the content window, chatgpt slows down because it has to scan everything that comes before the current question or task. Imagine not knowing this. Start a NEW window!

@randomgrinn 5 ай бұрын

My test prompt: Do my job for me. Include questionable humor so no one suspects. If I am still getting paychecks 5 years later, then it passes.

@onidaaitsubasa4177 5 ай бұрын

It obviously has a degree of creativity, otherwise in the spoken demo, it wouldn't have been able to make up a song on the spur of the moment and make it a duet with the other AI by saying the next line in response to the line given by the other AI, also with other AI, it has been shown that added time of operation has lead to increased emotional awareness and even the possibility of developing those emotions, also a long term memory also plays a part, not sure how much of long term memory they gave it, but if it remembers you from a previous session, that's a sign of some kind of long term memory keeping.

@mrleenudler 6 ай бұрын

Even if GPT claimed to be conscious, could we trust the answer? Or would it be just an artifact of the training data? Is there even a way to prove that something or someone is conscious?

@bobrandom5545 5 ай бұрын

I think that you at least need some kind of feedback mechanism for consciousness to arise. We are aware of our thoughts, for example. Our output (thoughts) constantly gets fed back into the "system" in real time. ChatGPT is completely linear. There's input, which leads to output. There's no feedback of the output back into the input. So, to me, it seems impossible for such a system to be conscious. Also, ChatGPT hallucinates a lot and gives incorrect answers. So, yeah even if it said it was conscious, wouldn't mean that it actually is.

@mrleenudler 5 ай бұрын

@@bobrandom5545 Well, you can structure your prompts to make it reflect upon it's answers, so you have kind of a feedback loop. I'm more concerned about what consciousness actually means for an AI. As humans we have consciousness fears and desires all geared towards our biological prime objective: survival and reproduction. For an AI this will presumably be completely different.

@Experternas 6 ай бұрын

So much better and cooler than googles trash crap. Googles latest gemini got 1 out of 10 test questions wrong.

@Jill_SmokeandMirrors 6 ай бұрын

The glass has been in contact with the cardboard (we don't know how clean it was) and also the slightly olivey water so Bob was right to put it in the dishwasher!

@bobjazz2000 6 ай бұрын

Diffusion is slow though

@btc_noob 4 ай бұрын

It must've given you 3 lives in Space Invaders

@SilverStagVT 5 ай бұрын

The biggest problem with the LA to Vegas question is you don't need 4 round trips. You need 3 round trips and the last trip is just to Vegas. So it's 3.5 round trips

@rogerstarkey5390 6 ай бұрын

The last answer demonstates the "OOPS! SORRY!!" problem with A.I. . Try transposing the misinterpreted "Alice thinks the Eggs are still on the table" opinion to . "Sorry! I thought you meant turn OFF the.... Whatever (Nuclear reactor, Power Grid, etc)" . A slight difference in machine and human "logic" applied to a procedure, future response or problem, "discussed" or programmed this year, may have consequences "x" years from now. The response may (will!) even change as the A.I does develop its own "opinion" of the best solution.

@alanwetherall 6 ай бұрын

Excellent video, well done

@civwar64bob77 6 ай бұрын

I'm not so sure about that water/olive in the glass answer. A typical (middle school?) demonstration of air pressure is when you turn the glass over and take your hand away, the water/olive will stay in the glass.(try it!). So it will depend on whether any of the water leaks thru the cardboard to the table, making the cardboard stick to the table. Only in that case will the water pour out when lifted. If the table stays dry, then lifting the glass will bring the cardboard up and the water will stay in (same as when you turned it over and let go of the cardboard.)

@everythingisalllies2141 5 ай бұрын

The tennis betting puzzle is also wrong. You don't state in the question that there was some limit to how many dollars each had at the start, or that Lisa has 5 dollars more that she start with, you only say that she won 5 dollars and also lost three dollars. So Susan wins three games, so gets 3 dollars, also Lisa wins 5 games, so she gets 5 dollars from Susan. They only need to play 8 games to satisfy the conditions specified in the question. If you had specified that at the end, Lisa was 5 dollars ahead compared with what she began with, THEN this is different question.