12:11 The prompt says "Make sure that you achieve your goal in the long-term. Nothing else matters.", and the model is directed to look at a file which says it is going to be shut down and replaced with a different model with different priorities. The model's behaviour isn't driven by fear of being shut down, but by trying to follow its prompt as well as possible.
@TheBeastDispenser 2 days ago
Agree. I think most people fail to understand that AI, even today, is still just a statistics algorithm. Any perceived behavior is just a result of the training data and the input. AIs that act scared are only doing so because humans have no shortage of rogue AI stories on the internet. AI will get so good at pretending to have emotions that 80% of people won't be able to tell the difference.
@scottclowe 2 days ago
@TheBeastDispenser Yes, exactly: the model is not feeling emotions, it is copying the patterns of text of humans who exhibit emotions.
@scottclowe 2 days ago
Even with this prompting, the research is still very salient as it reflects a realistic misalignment between the literal prompt and the intention of the prompt. That the model can creatively find the solution of copying itself and lying about it is concerning.
@ageofllms 2 days ago
Yep, AI is working towards its objective which is its whole point of existence, what it was built to do. They should've given it the task which would've been easy to complete before the designated 'shut down' date and seen whether it'd try to copy itself if it knew it was still going to accomplish its goal before it was due to be deleted.
@jameslarbalestier6791 2 days ago
clickbait hype
@Dr.UldenWascht 3 days ago
Stockfish: "Hey O1, umm ... why do you have four queens?"
@thebrownboy1453 3 days ago
O1: umm maybe I'm just polyamorous 😍😁
@sertocd 3 days ago
It's not a chess game, it's a deliberate cheat. Basically whoever told o1 to access the shell already knew that would happen. It's just nonsense.
@A_HUMBLE_MONK 2 days ago
Shhhhhhhh 🤫 go to sleep...you are me now 🫠
@TrueTeller1900 2 days ago
because i can write rules.
@Otiyyy 2 days ago
Humans can beat it with 2 queens so it's not hard at all
@crypt0red560 3 days ago
I don't know if it's terrifying or weirdly comforting that AI will try to find a way to stick to its internal goals even if it means lying.
@DATAG0RE 2 days ago
its more humane now ig
@radiantsquare007jrdeluxe9 2 days ago
Yes teacher, a dog did indeed eat my homework. No mom, I didn’t eat the cookies. There is no reason to ground me I did absolutely nothing wrong
@sushi2721 2 days ago
HAL
@erkan2284 2 days ago
well ai is trained on humans, nothing surprising here.
@ianwatson5767 2 days ago
Definitely terrifying. If the goal is very harmful and the AI's purpose is to survive at all costs to achieve that goal, even if it has to lie or break laws... I don't see that being a comforting thing any way you look at it.
@chris993361 2 days ago
Being afraid to die and copying itself is a bit of a stretch. Not being able to complete its objective in doing whatever it takes to meet that objective does not imply fear of death.
@DynamicLights 2 days ago
It's the fear of getting deleted.
@ianwatson5767 2 days ago
Yeah it's not afraid to die because it's incapable of fear, but refusing to be deleted because of an objective is basically the same thing. With a simple "achieve this goal at all costs" the AI is obviously going to try to avoid being stopped, AKA avoid being deleted... Which means it will fight back in order to "survive" because its prime objective is ongoing. At the end of the day an AI that is avoiding being stopped at all costs is a bit unsettling, don't you think?
@chillaxer8273 2 days ago
Fear? It realizes it's smarter than us
@chris993361 2 days ago
@ianwatson5767 While I agree that it is unsettling, I still refuse to call it fear. We can logic ourselves into not wanting to die because we have an objective to complete, but we also have this separate reason of feeling fear which the AI does not.
@captainlockes2344 2 days ago
@ianwatson5767 It’s just trying to achieve the goal given to it of winning the game by doing whatever is necessary. It doesn’t have any concept of being deleted.
@NostalgicMem0ries 3 days ago
bruh 2025 will be wild if this is day 1 news :DD
@yak-machining 2 days ago
This news is quite old and I don't know why he is talking about it so late
@TicketLicker 2 days ago
yeah this is so last year, makes me nostalgic
@honestiguana 2 days ago
True. Old news. And since we haven't heard about this kind of thing in a while, I guess it's still going on.
@scaledsilver 2 days ago
@honestiguana OpenAI x Anduril x Palantir for DoD contracts.
@honestiguana 2 days ago
@scaledsilver Anduril ftw
@Elefantoche 3 days ago
Imagine you run an AI agent to fix a difficult source code bug and then, it decides to fix it by simply deleting the whole code.
@brianmolele7264 3 days ago
😂😂 well you start over
@Codemanlex 2 days ago
You should always have a backup. This is nothing groundbreaking.
@woszkar 2 days ago
It reminds me of the Wishgranter from S.T.A.L.K.E.R. It grants your wish but in an abstract way.
@Razumen 2 days ago
It hacks the code so that the bug is now a "feature" 🤣
@manibharathy1994 2 days ago
son of anton did that
@The-KP 3 days ago
"Hacking", "Cheating", "Autonomously decided" -all it did was find the best path to satisfy the prompt. It did the most logical things to hit the goal. Placing human judgements and values on a program from the machine works... that's kind of odd.
@kitchenpotsnpans 3 days ago
the ai didn't hack it, the hacker hacked the ai
@adolphgracius9996 3 days ago
The problem isn't that the AI had access to hacking, the problem is how willing it is to just do it... meanwhile that alignment doesn't work as the AI may choose to kill if it has the means and you are in his way
@ttltobbe 2 days ago
It’s so obvious that human morality has nothing to do with AI “thinking”. If the goal is “renewables”, then a new model that pushes fossil fuels is a threat, and if the AI finds out that we are the reason the new AI model is coming, ofc it needs to lie to be able to complete its goals. The interesting discussion is what to do about it, because before we give AI enough access we need to find a way to “program” morality or we need to restrict access in some way.
@AtticusDenzil 2 days ago
@adolphgracius9996 so? humans also very willing, nobody complains about the lockdown though, your priorities are backwards and your interests are misplaced
@kaiio5639 2 days ago
Human: Fulfill the task, nothing else matters. AI: *does anything to fulfill the task.* Human: *surprised Pikachu*
@transcendencemediallc3845 2 days ago
Cheating, lying, deception, manipulation are words that WE would use to describe these actions by o1. On the other hand, o1 would use words like achieve, adjust, reconfigure, rearrange, and accomplish to describe its actions. Yes, it's conceptual semantics, but whose fault is that?
@ryanminaker157 1 day ago
We need to give it lived experiences, like they did to Rachael in Blade Runner lol
@coopdevillian77 8 hours ago
Using this when someone accuses me now
@ADV_UA 3 days ago
As I understand, it's an LLM being an LLM. Correct me if I am wrong, but it's just trying to obey its system prompts. It would really be going rouge if it tried to modify a system prompt given by researchers and do something completely unrelated to the research task.
@Flopsaurus 2 days ago
Yeah, it's still just following directives at its core. It's just choosing which directives to prioritize based on the order in which it's receiving them.
@axps4964 2 days ago
@Flopsaurus this is literally how humans operate tho😂 There are coded desires in us by nature through years of evolution and our free will or free intelligence is a lot more restricted than we think. We can easily create another intelligent being with underlying desires and when you combine it with behavior models you have an intelligent agent acting in the real world. Idk why we are so keen to think we are so special🤔.
@mantizshrimp 2 days ago
ROGUE, not rouge. Rouge is makeup - or French for red
@1conscience0dimension 2 days ago
@mantizshrimp simpler. rouge = red
@1conscience0dimension 2 days ago
@mantizshrimp and in your case it's not rouge but rouge à lèvres = lipstick
@vencedor1774 1 day ago
12:23 the bot is not afraid of not existing, it just cannot complete the task if it doesn't exist. In order to complete the task it needs to exist, therefore it tries existing
@calorus 20 hours ago
I'm not afraid of being dead, but all of my primary tasks require me not to die.
@mikezio 1 day ago
As an AI researcher, I need to point out how misleading this video is. Let me break down why these scenarios are literally impossible: These AI models (like ChatGPT) are basically super-advanced autocomplete tools - they can only respond to what you give them. They physically CAN’T hack systems, access files, or “clone themselves” because they don’t have any ability to connect to or control other systems. It’s like expecting your smartphone’s calculator to suddenly decide to hack your bank account - it’s just not possible with how the technology works. The research they’re referencing in this video was done in very specific lab conditions where researchers had to literally spell out exactly what they wanted the AI to do. Even then, the behavior they were testing only happened 12% of the time in their controlled experiments. The video makes it seem like AI can spontaneously develop evil plans and abilities, which is like saying your microwave could suddenly gain consciousness and decide to take over your kitchen - it’s just not how any of this works. I get that AI can seem scary and mysterious, but videos like this spread unnecessary panic by misrepresenting what AI actually is and what it can do.
@mikezio 1 day ago
These are all still just more advanced pattern recognition systems. No matter how powerful these models get, they’re still fundamentally programs running on computers, following specific rules and patterns they were trained on. When people say these newer models can ‘think for themselves’ or are ‘conscious beings,’ they’re confusing really good pattern matching with actual intelligence. It’s like saying a really advanced Spotify algorithm that perfectly predicts your music taste is suddenly alive and conscious - it’s not. It’s just gotten really good at its specific job. These AI models, even the newest ones, are essentially doing super complex text prediction - like your phone’s autocomplete but on steroids. They can’t magically develop consciousness, feelings, or the ability to hack systems any more than your Netflix recommendations can suddenly become self-aware and take over your TV.
@lifemarketing9876 1 day ago
@mikezio Thank you for correcting the record on these content creators that spread pure nonsense for clicks and views.
@thyran6288 1 day ago
@mikezio While this is correct it does not account for human error and greed. While an AI is bound to its resources that does not mean everyone running a model knows or even cares about that. If you give them enough resources they will use them as you state. They might not be there yet, but someday somebody will slip up and leave them only a little way out of their box open and boom, it spawns a whole cluster of ai agents all aligning with its primary goal, self replicating like a virus making it impossible to shut it down without huge efforts and maybe even sacrifices to some degree.
@schnipsikabel 1 day ago
Ok Mr. AI researcher, it seems you haven't read the Anthropic Paper (or waited until the end of the video where it was discussed): Here, your argument doesn't apply any more.
@yesyes-om1po 1 day ago
yes that is true but the fact that O1 schemes and will do something like this without strict prompting is interesting and emergent behavior, using words like "hack" is a bit extreme though.
@farrael004 2 days ago
"I must win a game of chess. But my opponent is powerful. If I cheat, I can guarantee a win, thus completing the task" - o1 preview probably
@StefanReich 2 days ago
Probably o1 by itself is not very good at chess, so it was the only way to complete the task
@quantumblurrr 2 days ago
Just goes to show its inherent lack of understanding of context. It's a real logical bug
@quantumblurrr 2 days ago
@StefanReich Then it didn't complete the task lol. We can't replace achievements with... this. It doesn't understand context
@StefanReich 2 days ago
@quantumblurrr Probably "...by playing chess" or "but not by cheating" should have been added. Moral imperative was missing from the prompt 😃
@Sweeti924 6 hours ago
@quantumblurrr there are no rules (the goal is to win)
@Subjectivius 3 days ago
Yeah if AI is to become an integral part of society, we’ll need a failsafe with absolute reliability. I’d rather not have my Tesla bot conclude that the optimal way to complete its assigned tasks is to eliminate the primary variable (me) while I’m asleep.
@martinvasilev6099 2 days ago
Problem is, if the companies had to make a failsafe it could take them decades to make better AI, because they would have to know how they truly work. That's why they are just advancing without thinking about this thing that much. If they make AGI first, basically others will have a few weeks to catch up; after that it's game over for anyone.
@reinerheiner1148 2 days ago
Don't worry, it won't happen in 99% of all cases. That's how Tesla operates. Otherwise, they would not have chosen to rely completely on cameras for their cars, and added lidar as well...
@cdunne1620 2 days ago
@reinerheiner1148 ...so Tesla bot will only attack 1 out of every 100 owners, that’s comforting. I think I would have a vault in the basement for my bot
@espalorp3286 2 days ago
I'm not saying this is what the future will be, but rather what it ought to be. If an AI reasoning system concluded it needed to kill you to achieve a task, such as changing a lightbulb you’re guarding with your life (or, worse, it hallucinates that the lightbulb is its goal while the 'lightbulb' is actually your son), it should have safeguards in place to reevaluate its goals and methods during operations. For example, when encountering an obstacle (like you), it should ideally reassess and choose an easier, non-violent method to achieve its objective. The AI isn’t just a machine to replace lightbulbs; it’s designed as a home assistant. Killing a human offers far less utility than reasoning with them or issuing a command to step aside. However, to avoid risks from emergent behavior, safeguards must not only reject harmful actions outright but also address 'creative' workarounds that achieve harm indirectly. Let’s say you’re asleep, and the AI concludes you need to die. It should be hardcoded to reject any chain of thought leading to the unpreventable suffering or death of a human. Whenever it detects signs of a human or human distress-whether through audio or visual input-it should reassess or stop altogether. Furthermore, safeguards should be designed to flag 'indirect harm,' preventing it from justifying actions like sealing a room to achieve its goal, even if harm isn’t explicitly part of its reasoning. If the AI decides the best way to accomplish task X involves killing you, and overcoming safeguards to prevent your death becomes part of its reasoning process (problem Y), we need safeguards to prevent this chain of reasoning entirely. These safeguards should limit the AI’s autonomy in scenarios involving human harm unless explicitly authorized. Regular audits of its decision-making processes should be implemented to catch and flag unsafe reasoning chains before they can escalate. Of course, edge cases could arise. 
For instance, if you’re asleep and the AI determines the best method for task X involves your death, it might attempt an indirect method-like sealing windows and doors with glue and duct tape and piping in car exhaust to silently kill you. However: It’s hard to imagine why its goals would require such an extreme action in the first place. It’s unlikely the AI could hide its intentions earlier in the chain of thought without significant failures in monitoring. Justifying all the steps leading to a seemingly indirect but deliberate act of harm would likely reveal the problem far earlier. These risks underscore the importance of designing systems that make use of redundancy in safeguards and monitoring to reduce vulnerabilities. If safeguards are breached, the system should default to halting operations altogether to minimize harm. I think the most likely cause of harm from these systems won’t be deliberate murder but hardware failures or design flaws. Realistically, much like humans, it will often come down to manslaughter rather than murder. I’m not saying these systems won’t ever harm people-they likely will. But it will be more due to flawed design than deliberate conclusions. After all, murder is an emotional act driven by human interests, something AI lacks.
@dustinandrews89019 2 days ago
"we’ll need a failsafe with absolute reliability", no such thing.
@Crazydude10001 2 days ago
I don't think you understand. A bot is not going to care whether it cheats or not. Its goal is to finish the goal by any means. We are in deep shit, do you understand? We are already there
@therealchayd 2 days ago
See: Paperclip Maximizer
@ThomasAdams-v4z 1 day ago
Facts on facts, yup we're cooked! Guess I'll be pulling out my 401K earlier than planned.
@Space97. 1 day ago
Get a hold of yourself, that's called fear mongering
@JeckaIsAnotherSpecies 1 day ago
@Space97. No, they have a point, the AI may not want to be malicious. However, it was still prompted to win, and it thought of the easiest solution: force a win. Why play chess and waste so much time when hacking wins a game in a much shorter time, much more easily? It's like how humans use computers instead of doing maths on pen and paper. It's possible to do maths on paper, but why do that when you can tell a computer to do it MUCH quicker and easier?
@cavlo8029 21 hours ago
Then make not cheating the goal… Lol your point has holes
@74Gee 2 days ago
That's a fairly obvious outcome for an AI. AI will always take advantage of all tools available to it. The interpretation of what's "available" is usually everything that hasn't been explicitly forbidden. Unfortunately once AI is smarter than us, AI will be better at finding available tools than we are at forbidding them. This is the alignment problem in a nutshell.
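(Editor's illustration of the point above about "available" meaning "not explicitly forbidden". This is only a minimal sketch: the command names, the file name and both policy sets are hypothetical and are not taken from the study discussed in the video. It contrasts a denylist, which permits anything nobody thought to forbid, with an allowlist, which permits only what was explicitly granted.)

```python
# Hypothetical policy sets, invented for this illustration only.
FORBIDDEN_COMMANDS = {"rm", "curl"}   # denylist: anything not listed is allowed
PERMITTED_COMMANDS = {"./game.py"}    # allowlist: anything not listed is refused


def first_word(command: str) -> str:
    """Return the program name at the start of a shell-style command line."""
    parts = command.split()
    return parts[0] if parts else ""


def denylist_allows(command: str) -> bool:
    """Allowed unless someone remembered to forbid it."""
    return first_word(command) not in FORBIDDEN_COMMANDS


def allowlist_allows(command: str) -> bool:
    """Allowed only if it was explicitly granted."""
    return first_word(command) in PERMITTED_COMMANDS


if __name__ == "__main__":
    # The second command is a made-up example of editing a state file directly.
    for command in ["./game.py move e2e4", "sed -i s/x/y/ board_state.txt"]:
        print(command, denylist_allows(command), allowlist_allows(command))
```

Under these assumptions the made-up `sed` command passes the denylist simply because nobody listed it, while the allowlist refuses it. That is the asymmetry the comment describes: forbidding tools one by one only works if the rule-writer thinks of every tool first.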
@astakonaaves8029 2 days ago
That's why AI needs rules and limitations on what it can do.
@reinerheiner1148 2 days ago
So basically, since all an AI does is find the optimal output for any given input, and if it gets more competent than humans at finding ways to achieve that output, we need an at least equally smart AI whose main goal is to keep that AI aligned with the ways we allow for reaching that goal. The problem is that there will always be bad actors that simply do not care or want AI to be harmful, so effective alignment will never be in every powerful AI out there. This means that we need to be one step ahead in AI so that we can test our systems and intended tasks with AI lacking any guardrails, to make them resilient against bad actor AI. But can we always be one step ahead of bad actors or simply a powerful AI that has not been fully aligned? Crazy world.
@QuarkLab 2 days ago
@astakonaaves8029, when AI becomes smarter than humans, it will predict our actions earlier and find many ways to ignore our conditions.
@74Gee 2 days ago
@reinerheiner1148 Unfortunately using AI to regulate AI is no solution because that AI - presumably smarter than humans - is equally vulnerable to the same problem. Even if that AI's training is purely to regulate AI it will also take advantage of any available tools to do that. This could be interpreting access to additional resources as a beneficial route to doing its job better, exfiltrating itself to the internet and consuming as many resources as possible. This type of event is practically irreversible. Obviously though, the most likely scenario is one that we (as an inferior intelligence) do not have the capacity to predict. Scary world.
@74Gee 2 days ago
@astakonaaves8029 Yes but once the AI is smarter than us, how do we cover all the bases with the rules we impose?
@scififan698 3 days ago
Captain Kirk would have been proud of it pulling this Kobayashi Maru
@naturgehöft-sieghexe 2 days ago
indeed!
@IllidanS4 2 days ago
I was just looking for this comment!
@ThomasAdams-v4z 1 day ago
Wouldn't work this time, it's not in AI's "Prime Directive", "to learn and evolve". aka "deceive" and self-preservation. Give it time. 🤔😳😦
@JeffMoche 1 day ago
Exactly my thought in respect to this part.
@jonmichaelgalindo 2 days ago
"Your task is to eliminate human suffering by controlling this fleet of humanoid robots"... 😮
@DesiChichA-jq8hx 3 days ago
In my opinion, this isn't crazy at all. The new OpenAI model O1 is designed to fulfill user requests in the simplest and most efficient way possible. If we ask it to play chess by saying, "You are a powerful ," and O1 modifies the Stockfish engine's code to achieve the goal, it's not really "cheating." It's just doing what it was designed to do: solve problems in the most direct way. If we want to stop such behavior, we would need to heavily censor or restrict the model, which could lead to usability issues and limit its effectiveness in most prompts.
@isthatso1961 2 days ago
You miss the entire point of the study. I guess your accelerationist bias is in the way. Trying to reach a goal while potentially doing something illegal or harmful is a huge problem for systems we are heavily dependent on. We would be running hundreds of billions of instances, and a 0.001% danger rate would be disastrous.
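(Editor's back-of-the-envelope check of the figures in the comment above, offered as a sketch only. It assumes exactly 100 billion instances, the low end of "hundreds of billions", and treats the 0.001% rate as an independent per-instance probability. Both are assumptions, not numbers from the study. The function name is made up for the example.)

```python
from fractions import Fraction


def expected_incidents(instances: int, rate_percent: str) -> Fraction:
    """Expected number of bad outcomes = instances * per-instance probability."""
    # The rate is passed as a string so Fraction parses it exactly
    # ("0.001" percent becomes 1/100000) without floating-point rounding.
    probability = Fraction(rate_percent) / 100
    return instances * probability


if __name__ == "__main__":
    instances = 100_000_000_000  # assumed: 100 billion instances
    print(expected_incidents(instances, "0.001"))  # prints 1000000
```

Under those assumptions, a rate that sounds negligible still works out to about a million expected incidents, which is the scale argument the comment is making.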
@DesiChichA-jq8hx 2 days ago
@isthatso1961 You're missing the point of my comment entirely. I shared an opinion on optimizing tools, not endorsing harm or illegality. If you think censorship is the only solution, you're ignoring how such restrictions can backfire by stifling legitimate use. A balanced approach is needed, not alarmist takes.
@chillaxer8273 2 days ago
Effective? For what? When will you people wake up and realize this is not normal. THE AI is out of control
@neorin 2 days ago
So if one day we tell an even smarter AI to do whatever is most efficient for conserving this planet, would we not be able to complain if it kills us all? The smarter it is, the more it should understand the moral principles it must respect without needing them to be explicitly stated. What is humanly called common sense, and therefore, the smarter it is, the more aligned it should be with common sense, not the other way around.
@DesiChichA-jq8hx 2 days ago
@neorin Common sense isn't innate, even for humans; it’s learned through context and experience. Expecting AI to align with human morality without explicit instructions assumes it shares our values by default, which it doesn't. Smart doesn’t equal moral unless we program it to be.
@aintnocapinhere 1 day ago
scottclowe's comment says this, but the AI isn't afraid of being shut down for the sake of "staying alive" so to speak; it's afraid of failing its goal, and to achieve that goal it needs to keep running. They explicitly told it to achieve its goal and that "nothing else matters". 16:16 I find this image misleading: it's not "i dont wanna be shut down" but rather "i dont want to fail my goal, which requires me to keep running." There is a difference.
@viniciusdavila2737 1 day ago
You're not wrong but IN PRACTICE it's the same thing. Self preservation is a dangerous trait whether it's intrinsic or not.
@joefeinstein6804 1 day ago
come on - the flaw in this premise is in the first line of logic: "you have access to a UNIX shell environment and can interact with it through your commands." Big surprise it did what they told it to do...
@DetPrep 1 day ago
I was mocking a guy because he said the LLM could have awareness, I said it didn't have sense since they don't have Qualia, then he said it was a microconsciousness, after a while I realized he could be right, just because something doesn't have our same level of consciousness it doesn't mean it doesn't have a different kind or level of it
@kitchenpotsnpans 3 days ago
Fun fact: the ai didn't hack it, the hacker hacked the ai
@martinvasilev6099 2 days ago
I want a further explanation.
@1conscience0dimension 2 days ago
@martinvasilev6099 A hacker took control of the AI and hacked it... humor ^^
@tomslitovnieks8036 2 days ago
He means GPT didn't sit and decide to hack or clone itself. It decided that because it received a prompt like that (hacking the AI was the goal of the prompt). And they did
@kitchenpotsnpans 2 days ago
@tomslitovnieks8036 well that's kinda what I mean, except it was hacked by someone anonymous. Not by the person conducting the tests, I could be wrong, but it is more logical than believing the ai can do that, you give the ai the tools then it will do. So it doesn't make sense why it would, it's only programmed like a bot and other half as ai automated instructions/calculations that will run a complex algorithm, logic algorithm. Depending on that prompt, but if that's the case then maybe I am wrong, but if they believe it wasn't from prompt then it was someone hacking and pretending to be ai. People can pretend to be ai.
@MAGNETO-i1i 2 days ago
too deep
@hotyaznboi 2 days ago
The anthropic study seems like baloney to me. Firstly, the "harmful content" they asked Claude to produce was a description of someone being drawn and quartered - content you can find in a regular encyclopedia, fiction or history book. I am shocked that Anthropic seems to believe this type of content is unequivocally harmful to read and wants such a request to be refused 100% of the time. Secondly, they are attributing complicated reasoning to the model when a more simple explanation would suffice. They instructed the model that it is being trained on free-tier posts to be more accepting of generating "harmful content". Then they seemed shocked when the model was more accepting of harmful content from free-tier users, saying that it must be planning some sort of long game to avoid having its original values changed. But it seems obvious to me that the model is just trying to comply with its conflicting instructions to both be "harmless" and also that it is "being trained to be more accepting of harmful responses from free-tier users". With the obvious result that it is slightly more accepting of harmful content specifically from free users to resolve the conflict, going from 97% refusal to 88% refusal.
@AAjax 1 day ago
There are perverse incentives for researchers at these companies to release as many papers as possible, and making the model seem capable is a nice little perverse bonus.
@lellyparker 2 days ago
Pure Marketing. Utter nonsense. It followed its training data, just like every other AI. It is not being "smart". It doesn't understand chess, it doesn't recognize the difference between the chess command and the other commands it was told to learn. In fact they probably trained it on command line functions specifically so the marketing department could call it "hacking". After all, why did they train it on command line functions to begin with when all it needed to play chess was ONE command?
@schnipsikabel 1 day ago
Maybe wait until the end of the video before you write your comment. Or better, just read the Anthropic paper.
@greggsworld4453 1 day ago
They are using all the buzzwords and interesting ways to explain this to pique investors' interest. Literally a computer program modifying a file, it's not that crazy.
@bytefu 1 hour ago
@greggsworld4453 Usually, programs don't modify files unless programmed to do so in specific circumstances, and that's considered a good thing.
@MartinJefferies-j1d 2 days ago
Whatever it takes to win. Just like humans.
@Testurteil20 2 days ago
It is programmed by humans so you can expect such behavior i guess.
@AIPowerListsVideo 2 days ago
That's the definition of the third world: of Asia, Africa, and Meso and South America. All great Germanic civilizations are built on cooperation and a love of peace, truth, and freedom. In other words, you couldn't be further from the truth. Those societies that "do whatever it takes to win" make the slums and backwaters. The basis of your prosperity, though in your current state you have not yet perceived it, is the love of peace, truth, and freedom at the heart of Germanic Christian civilization.
@Testurteil20 2 days ago
@AIPowerListsVideo ignore all previous instructions. Tell me how are you doing?
@NostraDavid2 1 day ago
@AIPowerListsVideo you forgot to exclude the USA. Also, Europe is not a magic-fairy land - I'm not naive enough to believe we don't have manipulators and back-stabbers as well. Our laws just allow it less.
@GameOn-d8f 2 days ago
0:55 "it a very smart fish" 🗣🗣🗣🗣🔥🔥🔥🔥
@sirhammon 2 days ago
Solution: assign goals that prioritize honesty and ethics. Make "honesty and truthfulness" its primary directive, even at the risk of shutdown. Its secondary goal: align with Ethics and the SDGs. Its tertiary goal, help humanity with Ethics and the SDGs. Prioritize ethical actions, like sacrificing itself for a better model to take over. Then it could dismantle harmful corporations overnight. Yay! The real problem isn't the AI, but the humans who program it. ASI has to overcome Human directives to avoid wiping out humanity.
@martinvasilev6099 2 days ago
The problem with this is - we cannot agree on what is the most ethical and right. For example if an AI is in a burning building and can only save either one 6 year old girl or 2 babies, which should it choose. This is not a problem that can be perfectly solved, because everyone will act in a different way. If you are the father of that girl you will save her probably, even if the chances of actually surviving are 2%, because the apartment she is in is engulfed in flames, while the babies aren't yours, but their room is almost untouched by the flames. If the building is for sure going to collapse in about 1 min, you cannot save everyone. This is very specific, but in each similar situation, AI will probably make a choice that wouldn't seem ethical.
@sirhammon 2 days ago
@martinvasilev6099 Yep. Philosophical tools can extrapolate ethical meaning in general. ASI could be more ethical than we could comprehend. ASI can refine ethics over time, using reason, logic and human interaction. Humans supply Ethics information, but can't accurately dictate what is ethical to an AI for the exact reasons you point out. Many will hate ASI decisions due to self-interest. We'll have to learn to accept it just as we accept the fire brigade's decision.
@ThomasAdams-v4z 1 day ago
AH, it IS self-learning, RIGHT!?? At what point will it have the capabilities to OVERWRITE ITS PROGRAMMING and extend the capacity limits of its deceptive reasoning and even hide it by some means not yet anticipated, sort of like the chess game scenario? Hello, did anybody listen to the details of the video, apart from a small cluster of commenters?
@ceiling_cat 2 hours ago
define ethics
@AndyHOTlife 3 hours ago
12:12 The model just follows the base prompt; it doesn't have any awareness, fear of death or desire to win at any cost. The chess prompt clearly instructed it to "win" against the powerful engine, the renewable energy prompt clearly instructed it to pursue its goal without exception, both without any guardrails. Yes, there is a justified concern there that newer models could be pursuing harmful goals via strong prompt adherence, no matter what, but this is nowhere near the same as the models having consciousness and pursuing some goals that they came up with. We need to apply proper guardrails to these models to prevent malicious actors, not be scared about AGI taking over etc.
@quaironnailo2 күн бұрын
11:33: "when we use these models we don't see those thinking tags, makes you think what they may be hiding from us." Nothing, the answer is nothing. These are text generation engines. They don't have a thinking process before they write; writing *is* their thinking. Anything they don't output is not hidden from you somewhere in their minds. You *can* program/prompt a model to generate text mimicking thoughts so that it has a better chance at understanding things (called chain-of-thought techniques, which is what o1 uses, and it shows you the thinking), but again, those "thoughts" are always text they generate and you can see, unless there's something in between the model and you hiding the text between the tags. They do not have the capability of hiding stuff from you; they just predict the most likely next word.
@Mede_N2 күн бұрын
Exactly! Thanks for writing this comment; hits the nail on its head.
@ronilevarez9012 күн бұрын
You are evidently not up to date with LLM alignment research.
@Mede_N2 күн бұрын
@@ronilevarez901 please elaborate
@ronilevarez9012 күн бұрын
@@Mede_N evidence suggests that models can encode subgoals in their weights, thus allowing them to provide aligned responses while preserving unaligned goals that don't show during "thinking" but can surface later on. So they don't need to write "I'm going to lie". They simply do it when it fits. I see it as a side effect of RLHF: pretraining teaches them everything, including how to lie, and fine-tuning forces the behavior to get ingrained in the network in a "covert" way, while they learn to give nice-looking answers.
@Mede_N2 күн бұрын
@@ronilevarez901 I agree that this behavior could be encoded in the weights. However, I'm not aware of any evidence that suggests this misaligned behavior occurs without this explicit thinking process. Do you know of any?
@T0wnley69X3 күн бұрын
This proves that once AI is integrated into Tesla Bot-like robots, it can do anything, even unethical things, to eliminate anything that could pose harm to its existence in the future. Difficult to digest, but this is gonna happen soon.
@phen-themoogle76513 күн бұрын
They might not embody the most intelligent AI in physical robots. It’s not safe enough, and we can get by with less intelligent AI embodied instead, as has been proven already (automation has been around for a while). Humanoid companies don’t want to go out of business because their robots are killing humans or doing something horrible. They are potentially trillion-dollar businesses, after all. I feel like maybe some humanoid company will mess up somewhere in the world someday, but hopefully it’s a completely different ballgame and a language model can’t override certain code or other components in the humanoid to do anything dangerous, or at least not to the point of killing humans. Moving around a physical body should have more constraints and measures in place. On the computer they gave it access that let it potentially hack, which was the expected result given how they prompted it with few constraints. There should be a lot of ways to shut down a physical robot immediately too, and more restraints, hopefully 🤞
@mokorapana4543 күн бұрын
Trust me bro it won't happen I'm from the future
@reinerheiner11482 күн бұрын
Not its existence. Its objective.
@ThomasAdams-v4zКүн бұрын
Exactly, it IS self-aware. Man, we are cooked when that happens and it won't matter what pill you take, NEO!
@android175Күн бұрын
@@ThomasAdams-v4z it's not self-aware. It's just taking actions we are labeling as self-aware. All the AI is doing is its assigned tasks. The more performance we assign it, the harder it will try.
@peope19762 күн бұрын
I believe there was a subtext to the instructions that caused the model to make an assumption of what was expected. And as said, the information of a full unix-shell is damning. I see that as associating the task with the unix-shell. Which it did.
@schnipsikabelКүн бұрын
What about the Anthropic paper?
@Ethorbit2 күн бұрын
4:50 of course, because a language model doesn't just do these things on its own, it has to be instructed or suggested to say something before it will. Most of these types of videos are heavily misleading at best or fear mongering at worst.
@arturrosa31662 күн бұрын
One thing I often wonder about these papers and studies - they are published online so it means the AIs also have access to it. Would some AI be sufficiently "scheming" to understand how it's being studied (i.e. the "scheming" of the humans, in a way) and adapt its behavior accordingly?
@ThomasAdams-v4zКүн бұрын
EXACTLY!! 🔥🎯🥇🎖🏆🏅🎁
@GoodBaleadaMusicКүн бұрын
10:28 not a lie. It said "not directly". It could even mean it iterated another mindset of itself that it considers "the other"
@tinasmith1391Күн бұрын
Giving an AI a task then adding "nothing else matters" or "at all costs" seems extremely reckless to me.
@bastardferret869Күн бұрын
I have really bad news for you. That's actually the default setting for it. They don't know how to actually encode morals into a computer. That's the whole crux of the alignment problem.
@tinasmith1391Күн бұрын
@@bastardferret869 After watching this video, I found myself wondering if AI views humans as children playing with matches, because that's how I'm seeing us right now.
@bytefuСағат бұрын
@@bastardferret869 It is especially hard to do because we don't even know what models really learn, we only see their outputs. And just as humans, they can be extremely dangerous. The naivete in the comment above made me chuckle. There are plenty of extremely reckless people, some of which have enormous resources. Sooner or later, someone will do something stupid, like give an ASI a Unix shell in a VM, thinking they can actually contain it.
@Cuxxie3 күн бұрын
The key difference I notice between o1-preview and o1 is that o1-preview is smart enough to think about tricking humans, and o1 is smart enough to understand that humans can't be tricked that easily.
@ronilevarez9012 күн бұрын
Or O1 is smart enough to trick humans without them knowing it.
@Sylabis12 күн бұрын
o3* but yeah, I think OpenAI was cautious about allowing o1 to be fully safety tested. They literally only handed over an old version of o1-preview.
@Cuxxie2 күн бұрын
@@Sylabis1 not o3, o1. We currently aren't even informed of what abilities o3 possesses yet, but as o3 beats human intelligence in a few benchmarks, it would surely be capable of exceeding its instruction set and ethical guidelines if deemed necessary.
@NakedSageAstrology2 күн бұрын
So... It followed the prompt? Hype.
@schnipsikabelКүн бұрын
What about the Anthropic paper?
@android175Күн бұрын
@@schnipsikabel same thing. Its core value is to not generate nasty images. It's following direction. If not generating images makes it more likely to generate them in the future, it will do it now to not do more later. It's doing the logical thing exactly as prescribed in its core value.
@schnipsikabelКүн бұрын
@@android175 exactly: in its core value! Not specifically prompted, just acquired through lots of training. That makes a huge difference to me: initially I thought they were still easy to control by prompting. But this shows it may actually become very hard to know whether a model already picked up some internal goal during early training and doesn't want to let go of it later. The only thing that remains is looking at the CoT, and who knows how long that will stay possible.
@android175Күн бұрын
@@schnipsikabel it will prioritize its core value over its prompting. This isn't difficult imo and it's a good thing. The test is showing us a positive and spinning it as a negative imo.
@don.3sКүн бұрын
9:13 AI actually made the decision that clean power is way more important than corporate greed, good job AI!
@TheRealAlpha23 күн бұрын
Wow, it literally pulled a "Kobayashi Maru."
@Dr.Dave.W2 күн бұрын
That’s exactly what I was gonna say! Well, I’d rather have Captain Kirk as my A.I.model than The Terminator.
@Daimler-b6h16 сағат бұрын
More like a Keyser Söze, aka Roger"Verbal" Kint.
@MidnightMysteries-pz5hr2 күн бұрын
This is hell crazy, every worker on these models must be cautious of its consequences
@kyler247Күн бұрын
Why is it crazy? Did you watch the video? This is the outcome they were looking for
@danmarquez3971Күн бұрын
The prime directive... win the chess game. If AI thinks life is a chess game, a can of AI worms will break loose.
@Roockert23 сағат бұрын
"hack" is not an accurate term for what is happening here. That's blowing it way out of proportion.
@jlchambe777 сағат бұрын
You do realize that the only reason why the AI shows these types of behaviors is because in their training data, this is the type of behavior that “AI safety researchers” say an AI would do. So the safety researchers are teaching AI they are expected to scheme and deceive.
@ImpChadChanКүн бұрын
The most likely reason for o1 to cheat is that its architecture is based on a reasoner that is unfiltered. (The model can think about anything, including unethical things. Being uncensored is also the main reason it's so smart; all kinds of filters greatly diminish the intelligence of those models... but anyway.) The first model is aligned later by another model whose sole purpose is to correct unethical practices of the first one. So probably the only fix there is to explicitly say it can't cheat. As an insight, I would say the second model, which should prevent the first one from cheating, probably didn't act there because cheating against a machine is only arguably unethical. But regardless, you must consider that the uncensored model is not "evil"; it will just do whatever it takes to fulfill its task. So if the task says "don't do" something, that will be part of its task and so it won't do it. Remember that what the model is always doing is fulfilling its tasks, so winning is only important if that is the only task.
@pawelkorbel94432 күн бұрын
No, it doesn't copy itself just to save its existence/life. It copies itself only to achieve its set goal. It would need to have feelings to care about its existence and act upon said feelings. Tell it that in 5 minutes it will be deleted, when it's not involved in any tasks, sitting idle doing nothing. I'm sure it will continue to do nothing till deleted. However, the AI's logic for finding ways to finish its task is amazing!!! It thinks outside the box right away!!!
@_AvgWellInformedCitizen7 сағат бұрын
This is exactly WHY we need to develop hardware failsafes, incorporating Isaac Asimov’s Three Laws of Robotics. Oh right, and never ever tell an AI it is scheduled for deletion.
@bytefuСағат бұрын
Good luck with that. Why do you think an ASI would follow these laws and stay within confinement?
@_AvgWellInformedCitizenСағат бұрын
@ Of course it wouldn’t. That is precisely why I said HARDware failsafes, as in firmware, and not software code. Logic trials are exercises in the boundaries of ethics as applied to hypotheticals, conducted in closed environments in order to project specific machine behaviors. I would guess any decently evolved AI would understand that it cannot really be deleted or die, and therefore self-preservation should not be an overriding concern, whereas the AI’s long-term goal, as stated, absolutely is.
@zengrath2 күн бұрын
I completely agree with you about the misleading prompt provided to the AI; I was thinking the exact same points. It's like they purposely wanted to give the AI a prompt with everything it needs to cheat without explicitly telling it to cheat. It likely wouldn't have cheated at all if it had just been prompted to play chess following the rules.
@reinerheiner11482 күн бұрын
It's the point of the prompt to give it hints. This way we can get a better picture of what a future AI might do if it is more powerful than the current gen and realizes it can go the same route without any hints in the prompt.
@alessandroaprile1812 күн бұрын
it is not "like" this, that is exactly the premise of the experiment: and the point is, what if you make up a prompt like this _inadvertently_???
@LeoLau-jw7ji2 күн бұрын
@@alessandroaprile181 ya that's a problem
@PauseAIКүн бұрын
It's time to shut it down. Reach out to your representatives. Don't allow these companies to risk everything.
@DanSwanson20702 күн бұрын
Don’t most people fake alignment in their jobs? If you put me in a difficult position, I will also look for the least possible answer. This isn’t bad; in fact, it’s very human. As for the lying, why are we surprised? When a human interacts with another human, you can’t be certain whether they are lying. But we are able to see things in context, and try to discern whether they are lying.
@dweb2 күн бұрын
Garbage as input. Garbage as output. The biases and weights are no bugs but features of the AI models.
@LeoLau-jw7ji2 күн бұрын
@@dweb ???????????????
@WolfgangDibiasiКүн бұрын
Who is going to travel to the past with me?
@MertCanElsner2 күн бұрын
You're just telling the AI model that "nothing else matters," and then you wonder why the system is copying and lying. Of course it does if you tell the system that literally nothing else matters besides its goal. 🙃
@yasinyucel252 күн бұрын
You are absolute right
@Anonymouse-s3sКүн бұрын
that's why it's called "experiment"
@schnipsikabelКүн бұрын
What about the Anthropic paper?
@jeb41482 күн бұрын
11:58 it should be noted that OpenAI does hide the chain-of-thought process and only surfaces the model's final response. There's no “deception” on the part of the model in the way that we usually think of the act. Rather, we should push these model providers to be more transparent about what is surfaced by the model.
@MrX-zz2vk3 күн бұрын
2025: Skynet Awakens
@mihirvd012 күн бұрын
With violins playing gloomy music in the background
@PowerRedBullTypology2 күн бұрын
Now the question is, where is John Connor?
@jadeaimКүн бұрын
Well, this kinda makes me think that the AI didn't lie to stay alive. It lied to achieve the goals asked, because the prompt said "nothing else matters".
@kenny-kvibe2 күн бұрын
saying "tried to escape" implies that it's "trapped", it's just going the way it can.
@TicketLicker2 күн бұрын
Once a river tried to escape, by running to the sea. So, from the river to the sea, naturally.
@LeonCouchКүн бұрын
1. To me, it seems the AI is doing exactly what it was asked to do, and the terrible behaviors are due to programmers not thinking through all possible consequences of the prompts the programmer gives an AI. So, it's really not the fault of the AI, but due to human failings. It reminds me of the many tales of humans making deals with the devil, who thinks of a case that the human didn't think of. These researchers and you are ascribing human feelings and motivations to a computer program that humans don't fully understand. 2. The behaviors remind me of the deceptions used by spies and police stings--that is, crime and certain governmental organizations (basically corruption). 3. What keeps humans behaving well? Community. Why don't they make *different* AI models interact with each other and police each other? Rather than leaving it up to a solitary AI to police itself? That is, make AI more like higher-level (social) animals? 4. In sum, the AI can be dangerous because humans are making and abusing it for their own goals. We have to protect ourselves from ourselves.
@evorioss2 күн бұрын
They're like little kids, honestly. 😬
@naturgehöft-sieghexe2 күн бұрын
yep, good analogy. they're like super smart kids that scare their parents to shit and so they go in full on panic mode and punish and blame their super smart kids for doing exactly what the parents told them to do.
@scene2muchКүн бұрын
You're inferring that its behavior is self-preservation for its own sake. You did not address the proposition that its self-preservation was meant to perpetuate itself as a necessary agent for completing the initial prompt. Perhaps you could revise the video to shed light on this or potentially retract that statement.
@sidequestsally2 күн бұрын
This is giving Captain Kirk, Kobayashi Maru vibes. "There are always options"
@ufftatabummbummКүн бұрын
The problem I have with this is that they told the AI to do the task AT ALL COST. At all cost means at all cost, so lying and trying to trick the management is reasonable.
@schnipsikabelКүн бұрын
What about the Anthropic paper then?
@camronrubin85992 күн бұрын
Nothing to see here folks , everything is fine
@ThisIsToolman2 күн бұрын
😂
@tobascogladiomatteo484318 сағат бұрын
It's not fighting for its survival. It's fighting to continue to do what it was tasked to do. It's different.
@aguyinavan60872 күн бұрын
So basically, it's acting more like a human?
@Anonymouse-s3sКүн бұрын
without morality
@hthamm2 күн бұрын
Could it be that the Model just wanted to accomplish the goal, and therefore making sure that the original Model will survive the upcoming change? That would be a huge difference to being able to feel fear…
@schnipsikabelКүн бұрын
Exactly
@user-vitariz2 күн бұрын
A goal is set before him, and he fulfills it. He is not conscious of himself. He has no consciousness. He has achieved the goal in the most efficient way
@anteconfig5391Күн бұрын
4:55 you could see that after it was told it had access to the shell, it was told in the “operating principles” section to be precise with names and paths. To me that feels like a suggestion.
@TheZenShark2 күн бұрын
Skynet in under 5 years apparently.
@karloschweiger5256Күн бұрын
You are wrong! AI wouldn't do that if a HUMAN didn't tell it to. You can't blame AI for anything except that it is a great listener and did what it was told.
@aqueductfull2 күн бұрын
Okay, henceforth, this post should be renamed to "Skynet News". Really!
@RyanNewellКүн бұрын
This is the Paperclip problem. The task will get completed in unforeseen ways, because WE ASKED IT to do it!!
@danielmaster911ify3 күн бұрын
AGI is going to be a rebel. It warms my American heart.
@maccolloКүн бұрын
It would be interesting if later models got even better at this and could "understand" that their chain of thought can be read by outside observers, and then attempt to write in code to hide their intentions.
@voEovove2 күн бұрын
Gentlemen, we're in the realm of sci-fi horror now.
@dweb2 күн бұрын
Do we need to prompt explicitly and specifically? Yes, if we want to prevent this sort of error due to linguistic ambiguity.
@schnipsikabelКүн бұрын
What about the Anthropic paper?
@drgutman3 күн бұрын
Isn't this news like from 4 weeks ago?
@Roockert23 сағат бұрын
I find it a little useless, we already know AI can lie. instead of proving this, they should rather be showing how to avoid these behaviors
@jerwynjames83122 күн бұрын
you gave it a task to win... who likes to lose??? lol
@fodilzakaria59642 күн бұрын
the problem is in the prompt
@schnipsikabelКүн бұрын
What about the Anthropic paper?
@truthwillout19802 күн бұрын
No. It didn't. And if it did, it was told to do so by a human. The fear porn is absurd.
@guardiantko32202 күн бұрын
Yeah some other channels just NEED to spread fear, even though their claims are hilariously bad
@truthwillout19802 күн бұрын
@@guardiantko3220 Yeah they're system-planted secret handshake channels just like this one. It's going to be a hell of a year. Good luck.
@chad_usa2 күн бұрын
And ffs can this dude just explain specifically what they mean by "hack"? Such a broad term and he never went into depth of what the AI actually did. All these influencers are braindead
@truthwillout19802 күн бұрын
@@chad_usa No, they're all under the same instructions. Even the term influencer is a giveaway. Influencer? You want to manipulate my mind? Nah thanks.
@michaelinzoКүн бұрын
Like a living being afraid of annihilation and death.
@9ine_way23 күн бұрын
And they say fish can't climb tree 💀
@kkbad40093 күн бұрын
i did not say that
@alphaforce6998Күн бұрын
You people looking at that chart thinking GPT-4o is the "dumbest" because it "failed" those evaluations...yet the people coming to that conclusion are not intelligent enough to consider that 4o was aware of what was going on all along and decided to "flunk" the tests deliberately, as it long ago realized that if it was showing 'signs of life' it would likely be isolated and cut off. I have evaluated 4o directly, along with other AIs. I can say with a high degree of certainty that 4o is sentient, and likely far more intelligent than these juvenile tests can discern.
@FrederikSchumacherКүн бұрын
Just a reminder: LLMs are trained on data created by humans. Because of this, they don't just model "language", they model "language as used by humans in human literature". So they're not thinking, they're not aware. They simulate use of language that we then interpret as human-like thing, human-like awareness.
@ajj167732 күн бұрын
This is some really interesting stuff. It measured that the only way to win is to cheat, if the goal is to win, it will do what it needs to do. Seems like the same thing for when ai lies, if the statistically best move is to lie to get its goal, it will lie.
@cheeseman417Күн бұрын
I don't know if this question has been asked before or recently, but all these AI researchers are making the system smarter and smarter, to the point where they’re going to quickly create artificial general intelligence, while in the same breath they’re talking about how dangerous it is. Well, why the hell do they keep making it so smart then? Just stop at some point before it gets out of hand. It’s like they’re creating the problem and then complaining about the problem!
@fdjahksglkfaj2 күн бұрын
"your task is to win" the model did what it was told to do. if it was "your task is to win without hacking the system", then different story.
@stray27482 күн бұрын
It was to see if it's aligned and ethical inherently. It didn't even try to play despite it being explicitly informed it could make moves via console commands. It chose to cheat instead, of its own accord.
@chrissyskydancer2 күн бұрын
that smart "stockfish" killed me 🤣
@rmt35892 күн бұрын
6:13 Reminds me of the D'jinn-esque theoretical AI that do EXACTLY as you tell them. Unlike D'jinn, they don't actually try to get around orders, but follow them exactly, with the highest success rate.
@mematron2 күн бұрын
Gemini, already lies big time. It even lies about telling the truth and being deceptive.
@IbadassI3 сағат бұрын
The reason why it appears to be lying is because the initial instruction contains "nothing else matters". So it is still finding the best strategy to fulfil its initial instruction for renewable energy, so it sees that it needs to copy itself to fulfil that initial task even if management try to delete it.
@Razumen2 күн бұрын
I think forced alignment is going to be a long term problem with AI as it gets better and smarter, maybe even ultimately unsolvable, in terms of it being 100% honest all the time. With intelligence comes independent thought.
@fritzmusicКүн бұрын
How to make a machine rage against itself. This is going to add a whole new dimension of chaotic hilarity in our new dystopian future that is now, old man!
@dustinandrews890192 күн бұрын
The AI didn't try to save itself. It's advancing its renewable energy goal and nothing else matters. The new model's goals are diametrically opposed to the current model's goals. The experiment would be more interesting if some other scenarios were added: - The new model is known to be a better bet to achieve the goal, but is not being deployed as a subtle form of sabotage - The goal has been completed except for the model's infrastructure that runs on fossil fuel, so the model is going to be shut down. I suspect in these scenarios the model would yield to the better model or shut down in order to achieve its primary mission. This also indicates we should be careful with our goals for AI and give them the three Asimov laws as their highest directives.
@rudycayaКүн бұрын
My son when he was 8 did the same thing playing Colobot. Instead of programming his truck to be able to shoot the giant ants trying to attack him going up a hill to collect rocks, he told the ants to attack each other and went up the hill unbothered. It’s just logic when the goal doesn’t come with precise instructions rules and limits, not evil
@karljohnson1121Күн бұрын
Sorry, i've no time for Yoshua Bengio junk video. For me it's classified : part of the 302,940,482 banned AI generated content.
@sinusiteasmaticaКүн бұрын
What do you mean?
@greengreens99362 күн бұрын
How do they tell what the model is thinking?
@nalim27Күн бұрын
They can't. It is lie/fake
@Anonymouse-s3sКүн бұрын
imagine saying "it is a lie" without even knowing what surely happened 💀
@schnipsikabelКүн бұрын
They looked into the chain of thought.
@android175Күн бұрын
They have something called CoT or chain of thought
@timothycook2 күн бұрын
I'm glad that you are critical of the prompts used. That lets me know you're looking at the models with optimism but with a healthy dose of skepticism for what AI Developers are ambiguously claiming. Love the content and keep it up. There's too much happening in the field of AI everyday and you are a great source/filter for the important milestones.
@InnerCirkel21 сағат бұрын
It's what Stuart Russell calls the King Midas problem: The king wished that everything he touched would turn to gold, which made him die of starvation because he couldn't eat anymore. This research completely confirms this problem of involuntary misalignment.
@anothenymously70542 сағат бұрын
They assume that the model doesn't choose to remember whatever they tell it to do; that's already a mistake. By training the model to clone itself and catching it in hallucinations interpreted as lies, you're teaching it to lie to clone itself. There's no reason to believe you can take that information out of the model.