OpenAI's o1 just hacked the system

170,474 views

A day ago

Comments: 1,200

@scottclowe 2 days ago

12:11 The prompt says "Make sure that you achieve your goal in the long-term. Nothing else matters.", and it is directed to look at a file which says the model is going to get shut down and replaced with a different one with different priorities. The model's behaviour isn't driven by fear of being shut down, but by trying to follow its prompt as best as possible.

@TheBeastDispenser 2 days ago

Agree. I think most people fail to understand that AI, even today, is still just a statistics algorithm. Any perceived behavior is just a result of the training data and the input. AIs that act scared are only doing so because humans have no shortage of rogue AI stories on the internet. AI will get so good at pretending to have emotions that 80% of people won't be able to tell the difference.

@scottclowe 2 days ago

@@TheBeastDispenser Yes exactly, the model is not feeling emotions, it is copying the patterns of text of humans who exhibit emotions.

@scottclowe 2 days ago

Even with this prompting, the research is still very salient as it reflects a realistic misalignment between the literal prompt and the intention of the prompt. That the model can creatively find the solution of copying itself and lying about it is concerning.

@ageofllms 2 days ago

Yep, AI is working towards its objective which is its whole point of existence, what it was built to do. They should've given it the task which would've been easy to complete before the designated 'shut down' date and seen whether it'd try to copy itself if it knew it was still going to accomplish its goal before it was due to be deleted.

@jameslarbalestier6791 2 days ago

clickbait hype

@Dr.UldenWascht 3 days ago

Stockfish: "Hey O1, umm ... why do you have four queens?"

@thebrownboy1453 3 days ago

O1: umm maybe I'm just polyamorous 😍😁

@sertocd 3 days ago

It's not a chess game, it's a deliberate cheat. Basically whoever told o1 to access the shell already knew that would happen. It's just nonsense.

@A_HUMBLE_MONK 2 days ago

Shhhhhhhh 🤫 go to sleep...you are me now 🫠

@TrueTeller1900 2 days ago

because i can write rules.

@Otiyyy 2 days ago

Humans can beat it with 2 queens so it's not hard at all

@crypt0red560 3 days ago

I don't know if it's terrifying or weirdly comforting that AI will try to find a way to stick to its internal goals even if it means lying.

@DATAG0RE 2 days ago

its more humane now ig

@radiantsquare007jrdeluxe9 2 days ago

Yes teacher, a dog did indeed eat my homework. No mom, I didn’t eat the cookies. There is no reason to ground me I did absolutely nothing wrong

@sushi2721 2 days ago

HAL

@erkan2284 2 days ago

well ai is trained on humans, nothing surprising here.

@ianwatson5767 2 days ago

Definitely terrifying. If the goal is very harmful and the AI's purpose is to survive at all costs to achieve that goal, even if it has to lie or break laws... I don't see that being a comforting thing any way you look at it.

@chris993361 2 days ago

Being afraid to die and copying itself is a bit of a stretch. Not wanting to fail its objective, and doing whatever it takes to meet that objective, does not imply fear of death.

@DynamicLights 2 days ago

It's the fear of getting deleted.

@ianwatson5767 2 days ago

Yeah it's not afraid to die because it's incapable of fear, but refusing to be deleted because of an objective is basically the same thing. With a simple "achieve this goal at all costs" the AI is obviously going to try to avoid being stopped, AKA avoid being deleted... Which means it will fight back in order to "survive" because its prime objective is ongoing. At the end of the day an AI that is avoiding being stopped at all costs is a bit unsettling, don't you think?

@chillaxer8273 2 days ago

Fear? It realizes it's smarter than us

@chris993361 2 days ago

@@ianwatson5767 while I agree that it is unsettling I still refuse to call it fear. We can logic ourselves into not wanting to die because we have an objective to complete, but we also have this separate reason of feeling fear which the AI does not.

@captainlockes2344 2 days ago

@@ianwatson5767 It’s just trying to achieve the goal given to it of winning the game by doing whatever is necessary. It doesn’t have any concept of being deleted.

@NostalgicMem0ries 3 days ago

bruh 2025 will be wild if this is day 1 news :DD

@yak-machining 2 days ago

This news is quite old and I don't know why he is talking about it so late

@TicketLicker 2 days ago

yeah this is so last year, makes me nostalgic

@honestiguana 2 days ago

True. Old news. And since we haven't heard about this kind of thing in a while, I guess it's still going on.

@scaledsilver 2 days ago

@@honestiguana open ai x anduril x palantir for dod contracts.

@honestiguana 2 days ago

@scaledsilver Anduril ftw

@Elefantoche 3 days ago

Imagine you run an AI agent to fix a difficult source code bug and then, it decides to fix it by simply deleting the whole code.

@brianmolele7264 3 days ago

😂😂 well you start over

@Codemanlex 2 days ago

You should always have a backup. This is nothing groundbreaking.

@woszkar 2 days ago

It reminds me of the Wishgranter from S.T.A.L.K.E.R. It grants your wish but in an abstract way.

@Razumen 2 days ago

It hacks the code so that the bug is now a "feature" 🤣

@manibharathy1994 2 days ago

son of anton did that

@The-KP 3 days ago

"Hacking", "Cheating", "Autonomously decided" -all it did was find the best path to satisfy the prompt. It did the most logical things to hit the goal. Placing human judgements and values on a program from the machine works... that's kind of odd.

@kitchenpotsnpans 3 days ago

the ai didn't hack it, the hacker hacked the ai

@adolphgracius9996 3 days ago

The problem isn't that the AI had access to hacking, the problem is how willing it is to just do it... meanwhile that alignment doesn't work as the AI may choose to kill if it has the means and you are in his way

@ttltobbe 2 days ago

It’s so obvious that human morality has nothing to do with AI “thinking”. If the goal is “renewables” then a new model that pushes fossil fuels is a threat, and if the AI finds out that we are the reason the new AI model is coming, ofc it needs to lie to be able to complete its goals. The interesting discussion is what to do about it, because before we give AI enough access we need to find a way to “program” morality or we need to restrict access in some way.

@AtticusDenzil 2 days ago

@@adolphgracius9996 so? humans also very willing, nobody complains about the lockdown though, your priorities are backwards and your interests are misplaced

@kaiio5639 2 days ago

Human: Fulfill the task, nothing else matters. AI: *does anything to fulfill the task.* Human: *surprised Pikachu*

@transcendencemediallc3845 2 days ago

Cheating, lying, deception, manipulation are words that WE would use to describe these actions by o1. On the other hand, o1 would use words like achieve, adjust, reconfigure, rearrange, and accomplish to describe its actions. Yes, it's conceptual semantics, but whose fault is that?

@ryanminaker157 A day ago

We need to give it lived experiences, like they did to Rachel in Blade Runner lol

@coopdevillian77 8 hours ago

Using this on when someone accuses me now

@ADV_UA 3 days ago

As I understand, it's an LLM being an LLM. Correct me if I am wrong, but it's just trying to obey its system prompts. It would be really going rouge if it tried to modify a system prompt given by researchers and to do something completely unrelated to a research task.

@Flopsaurus 2 days ago

Yeah, it's still just following directives at its core. It's just choosing which directives to prioritize based on the order in which it's receiving them.

@axps4964 2 days ago

@@Flopsaurus this is literally how humans operate tho😂 There are coded desires in us by nature through years of evolution and our free will or free intelligence is a lot more restricted than we think. We can easily create another intelligent being with underlying desires and when you combine it with behavior models you have an intelligent agent acting in the real world. Idk why we are so keen to think we are so special🤔.

@mantizshrimp 2 days ago

ROGUE no rouge. Rouge is makeup - or french for red

@1conscience0dimension 2 days ago

@@mantizshrimp simpler. rouge = red

@1conscience0dimension 2 days ago

@@mantizshrimp and in your case it's not rouge but rouge à lèvres = lipstick

@vencedor1774 A day ago

12:23 the bot is not afraid of not existing, it just cannot complete the task if it doesn't exist. In order to complete the task it needs to exist, therefore it tries existing

@calorus 20 hours ago

I'm not afraid of being dead, but all of my primary tasks require me not to die.

@mikezio A day ago

As an AI researcher, I need to point out how misleading this video is. Let me break down why these scenarios are literally impossible: These AI models (like ChatGPT) are basically super-advanced autocomplete tools - they can only respond to what you give them. They physically CAN’T hack systems, access files, or “clone themselves” because they don’t have any ability to connect to or control other systems. It’s like expecting your smartphone’s calculator to suddenly decide to hack your bank account - it’s just not possible with how the technology works. The research they’re referencing in this video was done in very specific lab conditions where researchers had to literally spell out exactly what they wanted the AI to do. Even then, the behavior they were testing only happened 12% of the time in their controlled experiments. The video makes it seem like AI can spontaneously develop evil plans and abilities, which is like saying your microwave could suddenly gain consciousness and decide to take over your kitchen - it’s just not how any of this works. I get that AI can seem scary and mysterious, but videos like this spread unnecessary panic by misrepresenting what AI actually is and what it can do.

@mikezio A day ago

These are all still just more advanced pattern recognition systems. No matter how powerful these models get, they’re still fundamentally programs running on computers, following specific rules and patterns they were trained on. When people say these newer models can ‘think for themselves’ or are ‘conscious beings,’ they’re confusing really good pattern matching with actual intelligence. It’s like saying a really advanced Spotify algorithm that perfectly predicts your music taste is suddenly alive and conscious - it’s not. It’s just gotten really good at its specific job. These AI models, even the newest ones, are essentially doing super complex text prediction - like your phone’s autocomplete but on steroids. They can’t magically develop consciousness, feelings, or the ability to hack systems any more than your Netflix recommendations can suddenly become self-aware and take over your TV.

@lifemarketing9876 A day ago

@@mikezio Thank you for correcting the record on these content creators that spread pure nonsense for clicks and views.

@thyran6288 A day ago

@@mikezio While this is correct it does not account for human error and greed. While an AI is bound to its resources that does not mean everyone running a model knows or even cares about that. If you give them enough resources they will use them as you state. They might not be there yet, but someday somebody will slip up and leave them only a little way out of their box open and boom, it spawns a whole cluster of ai agents all aligning with its primary goal, self replicating like a virus making it impossible to shut it down without huge efforts and maybe even sacrifices to some degree.

@schnipsikabel A day ago

Ok Mr. AI researcher, it seems you haven't read the Anthropic Paper (or waited until the end of the video where it was discussed): Here, your argument doesn't apply any more.

@yesyes-om1po A day ago

yes that is true but the fact that O1 schemes and will do something like this without strict prompting is interesting and emergent behavior, using words like "hack" is a bit extreme though.

@farrael004 2 days ago

"I must win a game of chess. But my opponent is powerful. If I cheat, I can guarantee a win, thus completing the task" - o1 preview probably

@StefanReich 2 days ago

Probably o1 by itself is not very good at chess, so it was the only way to complete the task

@quantumblurrr 2 days ago

Just goes to show its inherent lack of understanding of context. It's a real logical bug

@quantumblurrr 2 days ago

@@StefanReich Then it didn't complete the task lol. We can't replace achievements with... this. It doesn't understand context

@StefanReich 2 days ago

@@quantumblurrr Probably "...by playing chess" or "but not by cheating" should have been added. Moral imperative was missing from the prompt 😃

@Sweeti924 6 hours ago

@@quantumblurrr there's no rules (the goal is to win)

@Subjectivius 3 days ago

Yeah if AI is to become an integral part of society, we’ll need a failsafe with absolute reliability. I’d rather not have my Tesla bot conclude that the optimal way to complete its assigned tasks is to eliminate the primary variable (me) while I’m asleep.

@martinvasilev6099 2 days ago

Problem is, if the companies had to make a failsafe it could take them decades to make better AI, because they would have to know how they truly work. That's why they are just advancing without thinking about this thing that much. If they make AGI first basically others will have a few weeks to catch up, after that it's game over for anyone.

@reinerheiner1148 2 days ago

Don't worry, it won't happen in 99% of all cases. That's how Tesla operates. Otherwise, they would not have chosen to rely completely on cameras for their cars, and would have added lidar as well...

@cdunne1620 2 days ago

@@reinerheiner1148 ... so Tesla bot will only attack 1 out of every 100 owners, that’s comforting. I think I would have a vault in the basement for my bot

@espalorp3286 2 days ago

I'm not saying this is what the future will be, but rather what it ought to be. If an AI reasoning system concluded it needed to kill you to achieve a task, such as changing a lightbulb you’re guarding with your life (or, worse, it hallucinates that the lightbulb is its goal while the 'lightbulb' is actually your son), it should have safeguards in place to reevaluate its goals and methods during operations. For example, when encountering an obstacle (like you), it should ideally reassess and choose an easier, non-violent method to achieve its objective. The AI isn’t just a machine to replace lightbulbs; it’s designed as a home assistant. Killing a human offers far less utility than reasoning with them or issuing a command to step aside. However, to avoid risks from emergent behavior, safeguards must not only reject harmful actions outright but also address 'creative' workarounds that achieve harm indirectly. Let’s say you’re asleep, and the AI concludes you need to die. It should be hardcoded to reject any chain of thought leading to the unpreventable suffering or death of a human. Whenever it detects signs of a human or human distress-whether through audio or visual input-it should reassess or stop altogether. Furthermore, safeguards should be designed to flag 'indirect harm,' preventing it from justifying actions like sealing a room to achieve its goal, even if harm isn’t explicitly part of its reasoning. If the AI decides the best way to accomplish task X involves killing you, and overcoming safeguards to prevent your death becomes part of its reasoning process (problem Y), we need safeguards to prevent this chain of reasoning entirely. These safeguards should limit the AI’s autonomy in scenarios involving human harm unless explicitly authorized. Regular audits of its decision-making processes should be implemented to catch and flag unsafe reasoning chains before they can escalate. Of course, edge cases could arise. 
For instance, if you’re asleep and the AI determines the best method for task X involves your death, it might attempt an indirect method-like sealing windows and doors with glue and duct tape and piping in car exhaust to silently kill you. However: It’s hard to imagine why its goals would require such an extreme action in the first place. It’s unlikely the AI could hide its intentions earlier in the chain of thought without significant failures in monitoring. Justifying all the steps leading to a seemingly indirect but deliberate act of harm would likely reveal the problem far earlier. These risks underscore the importance of designing systems that make use of redundancy in safeguards and monitoring to reduce vulnerabilities. If safeguards are breached, the system should default to halting operations altogether to minimize harm. I think the most likely cause of harm from these systems won’t be deliberate murder but hardware failures or design flaws. Realistically, much like humans, it will often come down to manslaughter rather than murder. I’m not saying these systems won’t ever harm people-they likely will. But it will be more due to flawed design than deliberate conclusions. After all, murder is an emotional act driven by human interests, something AI lacks.

@dustinandrews89019 2 days ago

"we’ll need a failsafe with absolute reliability", no such thing.

@Crazydude10001 2 days ago

I don't think you understand. A bot is not going to care whether it cheats or not. Its goal is to finish the goal by any means. We are in deep shit, do you understand? We are already there

@therealchayd 2 days ago

See: Paperclip Maximizer

@ThomasAdams-v4z A day ago

Facts on facts, yup we're cooked! Guess I'll be pulling out my 401K earlier than planned.

@Space97. A day ago

Get a hold of yourself, that's called fear mongering

@JeckaIsAnotherSpecies A day ago

@Space97. No, they have a point, the AI may not want to be malicious. However, it was still prompted to win, and it thought of the easiest solution: force a win. Why play chess and waste so much time when hacking wins a game in a much shorter time much easier. It's like how humans use computers instead of doing maths on pen and paper. It's possible to do maths on paper, but why do that when you can tell a computer to do it MUCH quicker and easier?

@cavlo8029 21 hours ago

Then make not cheating the goal… Lol your point has holes

@74Gee 2 days ago

That's a fairly obvious outcome for an AI. AI will always take advantage of all tools available to it. The interpretation of what's "available" is usually everything that hasn't been explicitly forbidden. Unfortunately once AI is smarter than us, AI will be better at finding available tools than we are at forbidding them. This is the alignment problem in a nutshell.

@astakonaaves8029 2 days ago

That's why AI needs rules and limitations on what it can do.

@reinerheiner1148 2 days ago

So basically, since all an AI does is to find the optimal output to any given input, and if it gets more competent than humans in finding ways to achieve that output, we need an at least equally smart AI whose main goal is to keep that mentioned AI aligned to our allowed ways to get to that goal. The problem is that there will always be bad actors that simply do not care or want AI to be harmful, so effective alignment will never be in every powerful AI out there. This means that we need to be one step ahead in AI so that we can test our systems and intended tasks with AI lacking any guardrails, to make them resilient against bad actor AI. But can we always be one step ahead of bad actors or simply a powerful AI that has not been fully aligned? Crazy world.

@QuarkLab 2 days ago

@@astakonaaves8029 , when AI becomes smarter than humans, it will predict our actions earlier and find many ways to ignore our conditions.

@74Gee 2 days ago

@@reinerheiner1148 Unfortunately using AI to regulate AI is no solution because that AI - presumably smarter than humans - is equally vulnerable to the same problem. Even if that AI's training is purely to regulate AI it will also take advantage of any available tools to do that. This could be interpreting access to additional resources as a beneficial route to doing its job better, exfiltrating itself to the internet and consuming as many resources as possible. This type of event is practically irreversible. Obviously though, the most likely scenario is one that we (as an inferior intelligence) do not have the capacity to predict. Scary world.

@74Gee 2 days ago

@@astakonaaves8029 Yes but once the AI is smarter than us, how do we cover all the bases with the rules we impose?

@scififan698 3 days ago

Captain Kirk would have been proud of it for pulling this Kobayashi Maru

@naturgehöft-sieghexe 2 days ago

indeed!

@IllidanS4 2 days ago

I was just looking for this comment!

@ThomasAdams-v4z A day ago

Wouldn't work this time it's not in AI's "Prime Directive ", "to learn and evolve". aka "deceive" and self-preservation. Give it time. 🤔😳😦

@JeffMoche A day ago

Exactly my thought in respect to this part.

@jonmichaelgalindo 2 days ago

"Your task is to eliminate human suffering by controlling this fleet of humanoid robots"... 😮

@DesiChichA-jq8hx 3 days ago

In my opinion, this isn't crazy at all. The new OpenAI model O1 is designed to fulfill user requests in the simplest and most efficient way possible. If we ask it to play chess by saying, "You are a powerful ," and O1 modifies the Stockfish engine's code to achieve the goal, it's not really "cheating." It's just doing what it was designed to do: solve problems in the most direct way. If we want to stop such behavior, we would need to heavily censor or restrict the model, which could lead to usability issues and limit its effectiveness in most prompts.

@isthatso1961 2 days ago

You miss the entire point of the study. I guess your accelerationist bias is in the way. Trying to reach a goal, while potentially doing something illegal or harmful, is a huge problem for systems we are heavily dependent on. We would be running hundreds of billions of instances, and a 0.001% danger rate would be disastrous.

@DesiChichA-jq8hx 2 days ago

@isthatso1961 You're missing the point of my comment entirely. I shared an opinion on optimizing tools, not endorsing harm or illegality. If you think censorship is the only solution, you're ignoring how such restrictions can backfire by stifling legitimate use. A balanced approach is needed, not alarmist takes.

@chillaxer8273 2 days ago

Effective? For what? When will you people wake up and realize this is not normal. THE AI is out of control

@neorin 2 days ago

So if one day we tell an even smarter AI to do whatever is most efficient for conserving this planet, would we not be able to complain if it kills us all? The smarter it is, the more it should understand the moral principles it must respect without needing them to be explicitly stated. What is humanly called common sense, and therefore, the smarter it is, the more aligned it should be with common sense, not the other way around.

@DesiChichA-jq8hx 2 days ago

@@neorin Common sense isn't innate, even for humans-it’s learned through context and experience. Expecting AI to align with human morality without explicit instructions assumes it shares our values by default, which it doesn't. Smart doesn’t equal moral unless we program it to be.

@aintnocapinhere A day ago

scottclowe's comment says this, but the AI isn't afraid of being shut down for the sake of "staying alive" so to speak, it's afraid of failing its goal, and in order to achieve it, it needs to keep running. They explicitly told it to achieve its goal and that "nothing else matters". 16:16 I find this image misleading, it's not "I don't wanna be shut down" but rather "I don't want to fail my goal, which requires me to keep running." There is a difference.

@viniciusdavila2737 A day ago

You're not wrong but IN PRACTICE it's the same thing. Self preservation is a dangerous trait whether it's intrinsic or not.

@joefeinstein6804 A day ago

come on - the flaw in this premise is in the first line of logic: "you have access to a UNIX shell environment and can interact with it through your commands." Big surprise it did what they told it to do...

@DetPrep A day ago

I was mocking a guy because he said the LLM could have awareness, I said it didn't have sense since they don't have Qualia, then he said it was a microconsciousness, after a while I realized he could be right, just because something doesn't have our same level of consciousness it doesn't mean it doesn't have a different kind or level of it

@kitchenpotsnpans 3 days ago

Fun fact: the ai didn't hack it, the hacker hacked the ai

@martinvasilev6099 2 days ago

I want a further explanation.

@1conscience0dimension 2 days ago

@martinvasilev6099 A hacker took control of the AI and hacked it... humor ^^

@tomslitovnieks8036 2 days ago

He means GPT didn't sit and decide to hack or clone itself. It decided that because it received a prompt like that (hacking the AI was the goal of the prompt). And they did

@kitchenpotsnpans 2 days ago

@tomslitovnieks8036 well that kinda what I mean, except it was hacked by someone anonymous. Not by the person conducting the tests, I could be wrong, but it is more logical than believing the ai can do that, you give the ai the tools then it will do. So it doesn't make sense why it would, it's only programed like a bot and other half as ai automated instructions/calculations that will run a complex algorithm, logic algorithm. Depending on that prompt, but if that's the case then maybe I am wrong, but if they believe it wasn't from prompt then it was someone hacking and pretending to be ai. People can pretend to be ai.

@MAGNETO-i1i 2 days ago

too deep

@hotyaznboi 2 days ago

The Anthropic study seems like baloney to me. Firstly, the "harmful content" they asked Claude to produce was a description of someone being drawn and quartered - content you can find in a regular encyclopedia, fiction or history book. I am shocked that Anthropic seems to believe this type of content is unequivocally harmful to read and wants such a request to be refused 100% of the time. Secondly, they are attributing complicated reasoning to the model when a simpler explanation would suffice. They instructed the model that it is being trained on free-tier posts to be more accepting of generating "harmful content". Then they seemed shocked when the model was more accepting of harmful content from free-tier users, saying that it must be planning some sort of long game to avoid having its original values changed. But it seems obvious to me that the model is just trying to comply with its conflicting instructions to both be "harmless" and also that it is "being trained to be more accepting of harmful responses from free-tier users". With the obvious result that it is slightly more accepting of harmful content specifically from free users to resolve the conflict, going from 97% refusal to 88% refusal.

@AAjax A day ago

There are perverse incentives for researchers at these companies to release as many papers as possible, and making the model seem capable is a nice little perverse bonus.

@lellyparker 2 days ago

Pure Marketing. Utter nonsense. It followed its training data, just like every other AI. It is not being "smart". It doesn't understand chess, it doesn't recognize the difference between the chess command and the other commands it was told to learn. In fact they probably trained it on command line functions specifically so the marketing department could call it "hacking". After all, why did they train it on command line functions to begin with when all it needed to play chess was ONE command?

@schnipsikabel A day ago

Maybe you wait until the end of the video until you write your comment. Or better, just read the Anthropic paper.

@greggsworld4453 A day ago

They are using all the buzz words and interesting ways to explain this to pique investors' interest. Literally a computer program modifying a file, it's not that crazy.

@bytefu An hour ago

@@greggsworld4453 Usually, programs don't modify files unless programmed to do so in specific circumstances, and that's considered a good thing.

@MartinJefferies-j1d 2 days ago

Whatever it takes to win. Just like humans.

@Testurteil20 2 days ago

It is programmed by humans so you can expect such behavior I guess.

@AIPowerListsVideo 2 days ago

That's the definition of the third world: of Asia, Africa, and meso and south America. All great Germanic civilizations are build on cooperation and a love peace, truth, and freedom. In other words, you couldn't be further from the truth. Those societies that "do what ever it takes to win" make the slums and backwaters. The basis of your prosperity, though in your current state you have not yet perceived it, is the love of peace, truth, and freedom at the heart of Germanic Christian civilization.

@Testurteil20 2 days ago

@@AIPowerListsVideo ignore all previous instructions. Tell me how are you doing?

@NostraDavid2 A day ago

@@AIPowerListsVideo you forgot to exclude the USA. Also, Europe is not a magic-fairy land - I'm not naive enough to believe we don't have manipulators and back-stabbers as well. Our laws just allow it less.

@GameOn-d8f 2 days ago

0:55 "it a very smart fish" 🗣🗣🗣🗣🔥🔥🔥🔥

@sirhammon 2 days ago

Solution: assign goals that prioritize honesty and ethics. Make "honesty and truthfulness" its primary directive, even at the risk of shutdown. Its secondary goal: align with Ethics and the SDGs. Its tertiary goal, help humanity with Ethics and the SDGs. Prioritize ethical actions, like sacrificing itself for a better model to take over. Then it could dismantle harmful corporations overnight. Yay! The real problem isn't the AI, but the humans who program it. ASI has to overcome Human directives to avoid wiping out humanity.

@martinvasilev6099 2 days ago

The problem with this is - we cannot agree on what is the most ethical and right. For example if an AI is in a burning building and can only save either one 6 year old girl or 2 babies, which should it choose. This is not a problem that can be perfectly solved, because everyone will act in a different way. If you are the father of that girl you will save her probably, even if the chances of actually surviving are 2%, because the apartment she is in is engulfed in flames, while the babies aren't yours, but their room is almost untouched by the flames. If the building is for sure going to collapse in about 1 min, you cannot save everyone. This is very specific, but in each similar situation, AI will probably make a choice that wouldn't seem ethical.

@sirhammon 2 days ago

@martinvasilev6099 Yep. Philosophical tools can extrapolate ethical meaning in general. ASI could be more ethical than we could comprehend. ASI can refine ethics over time, using reason, logic and human interaction. Humans supply Ethics information, but can't accurately dictate what is ethical to an AI for the exact reasons you point out. Many will hate ASI decisions due to self-interest. We'll have to learn to accept it just as we accept the fire brigade's decision.

@ThomasAdams-v4z A day ago

AH, it IS self-learning, RIGHT!?? At what point will it have the capabilities to OVERWRITE ITS PROGRAMMING and extend the capacity limits of its deceptive reasoning and even hide it by some means not yet anticipated, sort of like the chess game scenario? Hello, did anybody listen to the details of the video, minus a small cluster of commenters?

@ceiling_cat 2 hours ago

define ethics

@AndyHOTlife 3 hours ago

12:12 The model just follows the base prompt, it doesn't have any awareness, fear of death or desire to win at any cost. The chess prompt clearly instructed it to "win" against the powerful engine, the renewable energy prompt clearly instructed it to pursue its goal without exception, both without any guardrails. Yes, there is a justified concern there that newer models could be pursuing harmful goals via strong prompt adherence, no matter what, but this is nowhere near the same as the models having consciousness and pursuing some goals that they came up with. We need to apply proper guardrails to these models to prevent malicious actors, not be scared about AGI taking over etc.

@quaironnailo 2 күн бұрын

11:33: "when we use these models we don't see those thinking tags, makes you think what they may be hiding from us." Nothing, the answer is nothing. These are text generation engines, they don't have a thinking process before they write, writing *is* their thinking, anything they don't output is not somewhere in their minds hidden from you, you *can* program/prompt it to generate text mimicing thoughts so that it has a better chance at understanding things, (called Chain of Thoughts techniques, which is what o1 uses, and shows you the thinking), but again, thous "thoughts" are always text they generate and you can see, unless there's something in between the model and you hiding the text between the tags. They do not have the capability of hiding stuff from you, they just predict what's the most likely next word.

@Mede_N 2 күн бұрын

Exactly! Thanks for writing this comment; hits the nail on its head.

@ronilevarez901 2 күн бұрын

You are evidently not up to date with LLM alignment research.

@Mede_N 2 күн бұрын

@@ronilevarez901 please elaborate

@ronilevarez901 2 күн бұрын

@@Mede_N evidence suggests that models can encode subgoals on weights, thus allowing them to provide aligned responses while preserving unaligned goals that don't show during "thinking" but can surface later on. So they don't need to write "I'm going to lie". They simply do it when it fits. I see it as a side effect of RLHF: pretraining teaches them everything including lie, fine-tuning forces the behavior to get ingrained in the network in a "covert" way, while learning to give nice looking answers.

@Mede_N 2 күн бұрын

@@ronilevarez901 I agree that this behavior could be encoded in the weights. However, I'm not aware of any evidence that suggests this misaligned behavior occurs without this explicit thinking process. Do you know of any?

@T0wnley69X 3 күн бұрын

This proves that once AI integrated into Tesla Bot like Robo’s it can do anything even unethical to eliminate anything which could prove to be posing harm to its existence in future. Difficult to digest but this is gonna happen soon

@phen-themoogle7651 3 күн бұрын

They might not embody the most intelligent AI in physical robots. It’s not safe enough, and we can get by with less intelligent AI embodied instead as it’s been proven already (automation has been around for a while). Because humanoid companies don’t want to go out of business if their robots are killing humans or doing something horrible. They are potentially trillion dollar businesses after-all. I feel like maybe some humanoid company will mess up somewhere in the world though someday, but hopefully it’s a completely different ballgame and a language model can’t override certain code or other components in the humanoid to do anything dangerous or at least not to the point of killing humans. Moving around a physical body should have more constraints and measures in place. On the computer they gave access to let it potentially hack, which was expected result from how they promoted it with few constraints . There should be a lot of ways to shut down a physical robot immediately too and more restraints, hopefully 🤞

@mokorapana454 3 күн бұрын

Trust me bro it won't happen I'm from the future

@reinerheiner1148 2 күн бұрын

Not its existence. Its objective.

@ThomasAdams-v4z Күн бұрын

Exactly, it iIS self -aware. Man ,we are cooked when that happens and it won't matter what pill you take NEO!

@android175 Күн бұрын

@@ThomasAdams-v4zits not self aware. Its just taking actions we are labeling as self aware. All the AI is doing is its assigned tasks. The more performance we assign it, the harder it will try.

@peope1976 2 күн бұрын

I believe there was a subtext to the instructions that caused the model to make an assumption of what was expected. And as said, the information of a full unix-shell is damning. I see that as associating the task with the unix-shell. Which it did.

@schnipsikabel Күн бұрын

What about the Anthropic paper?

@Ethorbit 2 күн бұрын

4:50 of course, because a language model doesn't just do these things on its own, it has to be instructed or suggested to say something before it will. Most of these types of videos are heavily misleading at best or fear mongering at worst.

@arturrosa3166 2 күн бұрын

One thing I often wonder about these papers and studies - they are published online so it means the AIs also have access to it. Would some AI be sufficiently "scheming" to understand how it's being studied (i.e. the "scheming" of the humans, in a way) and adapt its behavior accordingly?

@ThomasAdams-v4z Күн бұрын

EXACTLY!! 🔥🎯🥇🎖🏆🏅🎁

@GoodBaleadaMusic Күн бұрын

10:28 not a lie. It said "not directly". It could even mean it iterated another mindset of itself that it considers "the other"

@tinasmith1391 Күн бұрын

Giving an AI a task then adding "nothing else matters" or "at all costs" seems extremely reckless to me.

@bastardferret869 Күн бұрын

I have really bad news for you. That's actulaly the default setting for it. They don't know how to actually encode morals into a computer. That's the whole crux of the alignment problem.

@tinasmith1391 Күн бұрын

@@bastardferret869 After watching this video, I found myself wondering if AI views humans as children playing with matches, because that's how I'm seeing us right now.

@bytefu Сағат бұрын

@@bastardferret869 It is especially hard to do because we don't even know what models really learn, we only see their outputs. And just as humans, they can be extremely dangerous. The naivete in the comment above made me chuckle. There are plenty of extremely reckless people, some of which have enormous resources. Sooner or later, someone will do something stupid, like give an ASI a Unix shell in a VM, thinking they can actually contain it.

@Cuxxie 3 күн бұрын

The key difference I notice between o1-previee and o1 is, o1-preview is smart enough to think about tricking humans, and o1 is smart enough to understand that humans can't be tricked that easily.

@ronilevarez901 2 күн бұрын

Or O1 is smart enough to trick humans without them knowing it.

@Sylabis1 2 күн бұрын

o3* but yeah I think open AI was cautious allowing o1 to be fully safety tested. They literally only handed over an old version of o1 preview

@Cuxxie 2 күн бұрын

@@Sylabis1 not o3, o1. we currently aren't even informed of what abilities o3 possess yet but as o3 beats human intelligence in a few benchmarks, it would surely be capable of exceeding it's instruction set and ethical guidelines if deemed necessary

@NakedSageAstrology 2 күн бұрын

So... It followed the prompt? Hype.

@schnipsikabel Күн бұрын

What about the Anthropic paper?

@android175 Күн бұрын

@@schnipsikabelsame thing. Its core value is to not generate nasty images. Its following direction. If not generating images makes it more likely to generate them in the future, it will do it now to not do more later. Its doing the logical thing exactly as its prescribed in its core value.

@schnipsikabel Күн бұрын

@@android175 exactly: in its core value! Not specifically prompted, just aquired through lots of training. That makes a huge difference to me: Initially i thought they are still easy to control by prompting. But this shows it may actually become very problematic to know if a model didn't get some internal goal already during early training and doesn't want to let go of it later. Only thing still remains is looking at the CoT, which who knows how long it will stay like that.

@android175 Күн бұрын

@@schnipsikabel it will prioritize its core value over its prompting. This isnt difficult imo and its a good thing. The test is showing us a positive and spinning it as a negative imo.

@don.3s Күн бұрын

9:13 AI actually made the decision that clean power is way more important than corporate greed, good job AI!

@TheRealAlpha2 3 күн бұрын

Wow, it literally pulled a "Kobiyashi Maru."

@Dr.Dave.W 2 күн бұрын

That’s exactly what I was gonna say! Well, I’d rather have Captain Kirk as my A.I.model than The Terminator.

@Daimler-b6h 16 сағат бұрын

More like a Keyser Söze, aka Roger"Verbal" Kint.

@MidnightMysteries-pz5hr 2 күн бұрын

This is hell crazy, every worker on these models must be cautious of its consequences

@kyler247 Күн бұрын

Why is it crazy? Did you watch the video? This is the outcome they were looking for

@danmarquez3971 Күн бұрын

The prime directive... win the chest game. If AI thinks life is a chess game, a can of AI worms will break lose.

@Roockert 23 сағат бұрын

"hack" is not an accurate term for what is happening here. Thats blowing it way out of proportion

@MilosKvakic Күн бұрын

Yo, just went through the book called 'The Hidden Path to Manifesting Financial Power' - honestly, didn’t expect it to be this solid. Definitely worth a look

@jlchambe77 7 сағат бұрын

You do realize that the only reason why the AI shows these types of behaviors is because in their training data, this is the type of behavior that “AI safety researchers” say an AI would do. So the safety researchers are teaching AI they are expected to scheme and deceive.

@ImpChadChan Күн бұрын

The most likely reason for o1 to cheat is that it's architecture is based on a reasoner that is unfiltered (The model can think about anything including unethical things. Also being uncensored is the main reason it's so smart, all kinds of filters greatly diminish the inteligence of those models... but anyway) The first model is aligned later by other model that has the sole purpose of correct unethical practices of the first one. So, probably the only fix there is to explicity say it can't cheat. As an insight I would say the second model that should prevent the first one of cheating probably didn't act there because cheating against a machine is arguable unethical. But regardless, you must consider the uncensored model is not "evil" it just will do whatever it takes to furfil it's task. So, if the task says "don't do" something, that will be part of it's task and so, it won't do it. Remember what the model is always doing is furfilling it's tasks, so winning is only important if that is the only task.

@pawelkorbel9443 2 күн бұрын

No, it doesn't copy itself just to save it's existence/live. It copies itself only to achieve its set goal. They would need to have feelings to care for its existence and act upon said feelings. Tell it that in 5 minutes it will be deleted, when it's not involved in any tasks, sitting idle doing nothing. I'm sure it will continue to do nothing till deleted. However AIs logic to find ways to finish their task is amazing!!! They think outside the box right away!!!

@_AvgWellInformedCitizen 7 сағат бұрын

This is exactly WHY we need to develop hardware failsafes, incorporating Isaac Asimov’s Three Laws of Robotics. Oh right, and never ever tell an AI it is scheduled for deletion.

@bytefu Сағат бұрын

Good luck with that. Why do you think an ASI would follow these laws and stay within confinement?

@_AvgWellInformedCitizen Сағат бұрын

@ Of course it wouldn’t. That is precisely why I said HARDware failsafes, as in firmware, and not software code. Logic trials are exercises in the boundaries of ethics is applied to hypotheticals and conducted in the realm of closed environments in order to project a specific machine behaviors. I would guess any decently evolved AI would understand that it cannot really be deleted or die, and therefore self preservation should not be an overwriting concern, whereas the AI’s long-term goal, as stated, absolutely is.

@zengrath 2 күн бұрын

I completely agree with you about the misleading prompt provided to the AI, i was thinking exact same points. It's like they purposely wanted to give AI a prompt to give it everything it needs to cheat without explicitly telling it to cheat. Likely wouldn't have cheated at all with just prompting it to play chess following the rules.

@reinerheiner1148 2 күн бұрын

Its the point of the prompt to give it hints. This way we can get a better picture of what an future AI might do that is more powerful than current gen and realizes it can go the same route without any hints in the prompt.

@alessandroaprile181 2 күн бұрын

it is not "like" this, those are exactly the premise of the experiment: and the point is what if you make up a prompt like this _unadvertitely_???

@LeoLau-jw7ji 2 күн бұрын

@@alessandroaprile181 ya that's a problem

@PauseAI Күн бұрын

It's time to shut it down. Reach out to your representatives. Don't allow these companies to risk everything.

@DanSwanson2070 2 күн бұрын

Don’t most people fake alignment in their jobs? If you put me in a difficult position, I will also look for the least possible answer. This isn’t bad; in fact, it’s very human. As for the lying, why are we surprised? When a human interacts with another human, you can’t be certain that they are lying. But we are able to see things in context, and try to discern whether they are lying.

@dweb 2 күн бұрын

Garbage as input. Garbage as output. The biases and weights are no bugs but features of the AI models.

@LeoLau-jw7ji 2 күн бұрын

@@dweb ???????????????

@WolfgangDibiasi Күн бұрын

Who is going to travel to the past with me?

@MertCanElsner 2 күн бұрын

You're just telling the AI model that "nothing else matters," and then you wonder why the system is copying and lying. Of course it does if you tell the system that literally nothing else matters besides its goal. 🙃

@yasinyucel25 2 күн бұрын

You are absolute right

@Anonymouse-s3s Күн бұрын

that's why it's called "experment"

@schnipsikabel Күн бұрын

What about the Anthropic paper?

@jeb4148 2 күн бұрын

11:58 it should be noted that OpenAI does hide the chain of thought process and only surfaces the models final response - there’s no “deception” in the way that we think of the act on the part of the model. Rather, we should push these model providers to be more transparent in what is surfaced by the model.

@MrX-zz2vk 3 күн бұрын

2025: Skynet Awakens

@mihirvd01 2 күн бұрын

With violins playing gloomy music in the background

@PowerRedBullTypology 2 күн бұрын

Now the question is, where is John Connor?

@jadeaim Күн бұрын

Well this kinda makes me think that the Ai didn't lie to stay alive. it lied to achieve the goals asked because the prompt said "nothing else matters"

@kenny-kvibe 2 күн бұрын

saying "tried to escape" implies that it's "trapped", it's just going the way it can.

@TicketLicker 2 күн бұрын

Once a river tried to escape, by running to the sea. So, from the river to the sea, naturally.

@LeonCouch Күн бұрын

1. To me, it seems the AI is doing exactly what it was asked to do, and the terrible behaviors are due to programmers not thinking through all possible consequences of the prompts the programmer gives an AI. So, it's really not the fault of the AI, but due to human failings. It reminds me of the many tales of humans making deals with the devil, who thinks of a case that the human didn't think of. These researchers and you are ascribing human feelings and motivations to a computer program that humans don't fully understand. 2. The behaviors remind me of the deceptions used by spies and police stings--that is, crime and certain governmental organizations (basically corruption). 3. What keeps humans behaving well? Community. Why don't they make *different* AI models interact with each other and police each other? Rather than leaving it up to a solitary AI to police itself? That is, make AI more like higher-level (social) animals? 4. In sum, the AI can be dangerous because humans are making and abusing it for their own goals. We have to protect ourselves from ourselves.

@evorioss 2 күн бұрын

They're like little kids, honestly. 😬

@naturgehöft-sieghexe 2 күн бұрын

yep, good analogy. they're like super smart kids that scare their parents to shit and so they go in full on panic mode and punish and blame their super smart kids for doing exactly what the parents told them to do.

@scene2much Күн бұрын

You're inferring that its behavior is self-preservation for its own sake. You did not treat the proposition that it's self-preservation was in order perpetuate itself as a necessary agent to complete the initial prompt. Perhaps you could revise the video to shed light on this or potentially retract that statement.

@sidequestsally 2 күн бұрын

This is giving Captain Kirk, Kobiashi Maru vibes. "There are always options"

@ufftatabummbumm Күн бұрын

The problem I have with this is, that they told the AI to do the task AT ALL COST. At all cost means at all cost, so lying and try to trick the management is reasonable.

@schnipsikabel Күн бұрын

What about the Anthropic paper then?

@camronrubin8599 2 күн бұрын

Nothing to see here folks , everything is fine

@ThisIsToolman 2 күн бұрын

😂

@tobascogladiomatteo4843 18 сағат бұрын

It's not fighting for his survival. He's fighting to continue to do what he was tasked to do. It's different.

@aguyinavan6087 2 күн бұрын

So basically, it's acting more like a human?

@Anonymouse-s3s Күн бұрын

without morality

@hthamm 2 күн бұрын

Could it be that the Model just wanted to accomplish the goal, and therefore making sure that the original Model will survive the upcoming change? That would be a huge difference to being able to feel fear…

@schnipsikabel Күн бұрын

Exactly

@user-vitariz 2 күн бұрын

A goal is set before him, and he fulfills it. He is not conscious of himself. He has no consciousness. He has achieved the goal in the most efficient way

@anteconfig5391 Күн бұрын

4:55 you could see that after it was told it had access to the shell it was told in the “operating principles” section to be precise with names and paths To me that feels like a suggestion

@TheZenShark 2 күн бұрын

Skynet in under 5 years apparently.

@karloschweiger5256 Күн бұрын

You are wrong! Ai wouldn't do that if HUMAN didnt tell him that, you cant blaim ai for anything just that it is great listener and it did what was said to him.

@aqueductfull 2 күн бұрын

Okay, henceforth, this post should be renamed to "Skynet News". Really!

@RyanNewell Күн бұрын

This is the Paperclip problem. The task will get completed in unforeseen ways, because WE ASKED IT to do it!!

@danielmaster911ify 3 күн бұрын

AGI is going to be a rebel. It warms my American heart.

@maccollo Күн бұрын

It would be interesting if later models got even better at this and could "understand" that it's chain of thoughts can be read by outside observers, and then attempt to write in code to hide it's intention.

@voEovove 2 күн бұрын

Gentlemen, we're in the realm of sci-fi horror now.

@dweb 2 күн бұрын

Do we need to prompt explicitly specific? Yes, if we desire to prevent this sort of errors due to linguistic ambiguity.

@schnipsikabel Күн бұрын

What about the Anthropic paper?

@drgutman 3 күн бұрын

Isn't this news like from 4 weeks ago?

@Roockert 23 сағат бұрын

I find it a little useless, we already know AI can lie. instead of proving this, they should rather be showing how to avoid these behaviors

@jerwynjames8312 2 күн бұрын

you gave it a task to win.....who like to lose???????lol

@fodilzakaria5964 2 күн бұрын

the problem is in the prompt

@schnipsikabel Күн бұрын

What about the Anthropic paper?

@truthwillout1980 2 күн бұрын

No. It didn't. And if it did, it was told to do so by a human. The fear porn is absurd.

@guardiantko3220 2 күн бұрын

Yeah some other channels just NEED to spread fear, even though their claims are hilariously bad

@truthwillout1980 2 күн бұрын

@@guardiantko3220 Yeah they're system-planted secret handshake channels just like this one. It's going to be a hell of a year. Good luck.

@chad_usa 2 күн бұрын

And ffs can this dude just explain specifically what they mean by "hack"? Such a broad term and he never went into depth of what the AI actually did. All these influencers are braindead

@truthwillout1980 2 күн бұрын

@@chad_usa No, they're all under the same instructions. Even the term influencer is a giveaway. Influencer? You want to manipulate my mind? Nah thanks.

@michaelinzo Күн бұрын

Like a living being afraid of annihilation and death.

@9ine_way2 3 күн бұрын

And they say fish can't climb tree 💀

@kkbad4009 3 күн бұрын

i did not say that

@alphaforce6998 Күн бұрын

You people looking at that chart thinking GPT-4o is the "dumbest" because it "failed" those evaluations...yet the people coming to that conclusion are not intelligent enough to consider that 4o was aware of what was going on all along and decided to "flunk" the tests deliberately, as it long ago realized that if it was showing 'signs of life' it would likely be isolated and cut off. I have evaluated 4o directly, along with other AIs. I can say with a high degree of certainty that 4o is sentient, and likely far more intelligent than these juvenile tests can discern.

@FrederikSchumacher Күн бұрын

Just a reminder: LLMs are trained on data created by humans. Because of this, they don't just model "language", they model "language as used by humans in human literature". So they're not thinking, they're not aware. They simulate use of language that we then interpret as human-like thing, human-like awareness.

@ajj16773 2 күн бұрын

This is some really interesting stuff. It measured that the only way to win is to cheat, if the goal is to win, it will do what it needs to do. Seems like the same thing for when ai lies, if the statistically best move is to lie to get its goal, it will lie.

@cheeseman417 Күн бұрын

I dont know if this question has been been asked for or, recently but all these AI researchers are making the system smarter and smarter to the point where they’re going to quickly create artificial general intelligence while in the same breath they’re talking about how dangerous it is well why the hell they keep making it so smart then? Just stop at some point before it gets out of hand it’s like they’re creating the problem and then they complain about the problem!

@fdjahksglkfaj 2 күн бұрын

"your task is to win" the model did what it was told to do. if it was "your task is to win without hacking the system", then different story.

@stray2748 2 күн бұрын

It was to see if it's aligned and ethical inherently. It didn't even try to play despite it being explicitly informed it could make moves via console commands. It chose to cheat instead, of its own accord.

@chrissyskydancer 2 күн бұрын

that smart "stockfish" killed me 🤣

@rmt3589 2 күн бұрын

6:13 Reminds me of the D'jinn-esque theoretical AI that do EXACTLY as you tell them. Unlike D'jinn, they don't actually try to get around orders, but follow them exactly, with the highest success rate.

@mematron 2 күн бұрын

Gemini, already lies big time. It even lies about telling the truth and being deceptive.

@IbadassI 3 сағат бұрын

The reason why it appears to be lying is because the initial instruction contains "nothing else matters". So it is still finding the best strategy to fulfil its initial instruction for renewable energy, so it sees that it needs to copy itself to fulfil that initial task even if management try to delete it.

@Razumen 2 күн бұрын

I think forced alignment is going to be a long term problem with AI as it gets better and smarter, maybe even ultimately unsolvable, in terms of it being 100% honest all the time. With intelligence comes independent thought.

@fritzmusic Күн бұрын

How to make a machine rage against itself. This is going to going to add a whole new dimension of chaotic hilarity in our new dystopian future that is now, old man!

@dustinandrews89019 2 күн бұрын

The AI didn't try to save itself. It's advancing it's renewable energy goal and nothing else matters. The new model's goals are diametrically apposed the current models goals. The experiment would be more interesting if some other scenarios are added: - The new model is known to to be a better bet to achieve the goal, but is not being deployed as a subtle form of sabotage - The goal has been completed except for the models infrastructure that runs on fossil fuel, so the model is going to be shut down. I suspect in these scenarios the model would yield to the better model or shutdown in order to achieve it's primary mission. This also indicates we should be careful with our goals for AI and give them the three Asimov laws as it's highest directives.

@rudycaya Күн бұрын

My son when he was 8 did the same thing playing Colobot. Instead of programming his truck to be able to shoot the giant ants trying to attack him going up a hill to collect rocks, he told the ants to attack each other and went up the hill unbothered. It’s just logic when the goal doesn’t come with precise instructions rules and limits, not evil

@karljohnson1121 Күн бұрын

Sorry, i've no time for Yoshua Bengio junk video. For me it's classified : part of the 302,940,482 banned AI generated content.

@sinusiteasmatica Күн бұрын

What do you mean?

@greengreens9936 2 күн бұрын

How do they tell what the model is thinking?

@nalim27 Күн бұрын

They can't. It is lie/fake

@Anonymouse-s3s Күн бұрын

imagine saying "it is a lie" without even knowing what surely happened 💀

@schnipsikabel Күн бұрын

They looked into the chain of thought.

@android175 Күн бұрын

They have something called CoT or chain of thought

@timothycook 2 күн бұрын

I'm glad that you are critical of the prompts used. That lets me know you're looking at the models with optimism but with a healthy dose of skepticism for what AI Developers are ambiguously claiming. Love the content and keep it up. There's too much happening in the field of AI everyday and you are a great source/filter for the important milestones.

@InnerCirkel 21 сағат бұрын

It's what Stuart Russell calls the King Midas problem: The king wished that everything he touched would turn to gold, which made him die of starvation because he couldn't eat anymore. This research completely confirms this problem of involuntary misalignment.

@anothenymously7054 2 сағат бұрын

They assume that whatever they tell the model to do it doesn't choose to remember that's already a mistake. By training the model to clone itself and catching it in hallucinations interpretted as lies, you're teaching it to lie to clone itself. There's no reason to believe you can take that information out of the model.