Specification Gaming: How AI Can Turn Your Wishes Against You

158,412 views

Rational Animations

5 months ago

When we specify goals for AIs, we must ensure that our specifications truly capture what we want. Otherwise, AI systems will behave differently from how we want them to behave. This can be catastrophic in high-stakes situations and at high levels of AI capability. If you watched our video "The Hidden Complexity of Wishes", you'll recognize these problems as the same kind of failure.
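To make the failure mode concrete, here is a minimal toy sketch (our own illustration, with made-up names and numbers, not an experiment from the video): an agent rewarded through a leaky proxy finds that gaming the sensor beats doing the task.

```python
# Toy illustration of specification gaming (hypothetical example).
# The designer wants the agent to CLEAN dirt, but rewards "no dirt visible
# to the sensor". Blocking the sensor satisfies the proxy at lower effort.

EFFORT = {"clean": 0.2, "block_sensor": 0.05, "wait": 0.0}

def proxy_reward(dirt_present: bool, sensor_blocked: bool) -> float:
    # Leaky specification: reward "no visible dirt", not "no dirt".
    visible_dirt = dirt_present and not sensor_blocked
    return 0.0 if visible_dirt else 1.0

def run_episode(action: str) -> float:
    dirt_present, sensor_blocked = True, False
    if action == "clean":
        dirt_present = False       # what we actually wanted
    elif action == "block_sensor":
        sensor_blocked = True      # games the proxy instead
    return proxy_reward(dirt_present, sensor_blocked) - EFFORT[action]

scores = {a: run_episode(a) for a in EFFORT}
print(scores)                       # {'clean': 0.8, 'block_sensor': 0.95, 'wait': 0.0}
print(max(scores, key=scores.get))  # 'block_sensor': the spec, not the intent, wins
```

The specification scores "block_sensor" highest even though it is exactly what the designer didn't want; nothing in the reward distinguishes it from success.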
If you’d like to skill up on AI Safety, we highly recommend the AI Safety Fundamentals courses by BlueDot Impact at aisafetyfundamentals.com
You can find three courses: AI Alignment, AI Governance, and AI Alignment 201
You can follow AI Alignment and AI Governance even without a technical background in AI. AI Alignment 201, however, presupposes that you have completed the AI Alignment course, along with knowledge equivalent to university-level courses on deep learning and reinforcement learning.
The courses consist of a selection of readings curated by experts in AI safety. They are available to all, so you can simply read them if you can’t formally enroll in the courses.
If you want to participate in the courses instead of just going through the readings by yourself, BlueDot Impact runs live courses which you can apply to. The courses are remote and free of charge. They consist of a few hours of effort per week to go through the readings, plus a weekly call with a facilitator and a group of people learning from the same material. At the end of each course, you can complete a personal project, which may help you kickstart your career in AI Safety.
BlueDot Impact receives more applications than they can accept, so if you'd still like to follow the courses alongside other people, you can go to the study-buddy channel in the AI Alignment Slack. You can join by clicking on the first entry on aisafety.community
You could also join Rational Animations’ Discord server at discord.gg/rationalanimations, and see if anyone is up to be your partner in learning.
#ai #aisafety #alignment
▀▀▀▀▀▀▀▀▀SOURCES & READINGS▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
9 Examples of Specification Gaming by @RobertMilesAI: • 9 Examples of Specific...
Specification gaming: the flip side of AI ingenuity by Victoria Krakovna, Jonathan Uesato, Vladimir Mikulik et al. (2020): www.deepmind.com/blog/specifi...
Learning from Human Preferences by Paul Christiano, Alex Ray and Dario Amodei (2017): openai.com/blog/deep-reinforc...
Learning to Summarize with Human Feedback by Jeffrey Wu, Nisan Stiennon, Daniel Ziegler et al. (2020): openai.com/blog/learning-to-s...
What failure looks like by Paul Christiano (2019): www.alignmentforum.org/posts/...
The alignment problem from a deep learning perspective by Richard Ngo, Soeren Mindermann and Lawrence Chan (2022): arxiv.org/abs/2209.00626
The Hidden Complexity of Wishes: • The Hidden Complexity ...
▀▀▀▀▀▀▀▀▀PATREON, MEMBERSHIP, KO-FI▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🟠 Patreon: / rationalanimations
🟢Merch: crowdmade.com/collections/rat...
🔵 Channel membership: / @rationalanimations
🟤 Ko-fi, for one-time and recurring donations: ko-fi.com/rationalanimations
▀▀▀▀▀▀▀▀▀SOCIAL & DISCORD▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
Discord: / discord
Reddit: / rationalanimations
Twitter: / rationalanimat1
▀▀▀▀▀▀▀▀▀PATRONS & MEMBERS▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
Alcher Black
RMR
Kristin Lindquist
Nathan Metzger
Monadologist
Glenn Tarigan
NMS
James Babcock
Colin Ricardo
Long Hoang
Tor Barstad
Gayman Crothers
Stuart Alldritt
Chris Painter
Juan Benet
Falcon Scientist
Jeff
Christian Loomis
Tomarty
Edward Yu
Ahmed Elsayyad
Chad M Jones
Emmanuel Fredenrich
Honyopenyoko
Neal Strobl
bparro
Danealor
Craig Falls
Vincent Weisser
Alex Hall
Ivan Bachcin
joe39504589
Klemen Slavic
Scott Alexander
noggieB
Dawson
John Slape
Gabriel Ledung
Jeroen De Dauw
Craig Ludington
Jacob Van Buren
Superslowmojoe
Michael Zimmermann
Nathan Fish
Bleys Goodson
Ducky
Bryan Egan
Matt Parlmer
Tim Duffy
rictic
marverati
Luke Freeman
Dan Wahl
leonid andrushchenko
Alcher Black
Rey Carroll
William Clelland
ronvil
AWyattLife
codeadict
Lazy Scholar
Torstein Haldorsen
Supreme Reader
Michał Zieliński
▀▀▀▀▀▀▀CREDITS▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
Writer: :3
Producer: :3
Line Producer and production manager:
Kristy Steffens
Animation director: Hannah Levingstone
Quality Assurance Lead:
Lara Robinowitz
Animation:
Michela Biancini
Owen Peurois
Zack Gilbert
Jordan Gilbert
Keith Kavanagh
Ira Klages
Colors Giraldo
Renan Kogut
Background Art:
Hané Harnett
Zoe Martin-Parkinson
Hannah Levingstone
Compositing:
Renan Kogut
Patrick O'Callaghan
Ira Klages
Voices:
Robert Miles - Narrator
VO Editing:
Tony Di Piazza
Sound Design and Music:
Johnny Knittle

Comments: 585
@RationalAnimations 5 months ago
If you'd like to skill up on AI Safety, we highly recommend the AI Safety Fundamentals courses by BlueDot Impact at aisafetyfundamentals.com You can find three courses: AI Alignment, AI Governance, and AI Alignment 201 You can follow AI Alignment and AI Governance even without a technical background in AI. AI Alignment 201, however, presupposes that you have completed the AI Alignment course, along with knowledge equivalent to university-level courses on deep learning and reinforcement learning. The courses consist of a selection of readings curated by experts in AI safety. They are available to all, so you can simply read them if you can't formally enroll in the courses. If you want to participate in the courses instead of just going through the readings by yourself, BlueDot Impact runs live courses which you can apply to. The courses are remote and free of charge. They consist of a few hours of effort per week to go through the readings, plus a weekly call with a facilitator and a group of people learning from the same material. At the end of each course, you can complete a personal project, which may help you kickstart your career in AI Safety. BlueDot Impact receives more applications than they can accept, so if you'd still like to follow the courses alongside other people you can go to the #study-buddy channel in the AI Alignment Slack. You can join by clicking on the first entry on aisafety.community You could also join Rational Animations' Discord server at discord.gg/rationalanimations, and see if anyone is up to be your partner in learning.
@pyeitme508 5 months ago
Cool
@ChemEDan 5 months ago
How do natural brains mitigate these problems? If a solution exists, surely 4 billion years of evolution has arrived at it already, even if imperfect. In hindsight, this is a snuck premise in the "merging" approach.
@alto7183 5 months ago
Good video. It's good that there are no trolls replying with a video saying it's an algorithm. A double-zero law of robotics: mutual understanding between intelligent biological species and robots too. Lobo (DC) and Constantine forced into a duo as punishment from the creator for what they've both done, like the Hellraiser video, Ozzy Osbourne, both of them; Garfield and his friends, a fairy godmother granting wishes haphazardly, etc. etc.
@de_g0od 5 months ago
At 1:49, you give "outer alignment" as an example of a phenomenon similar to specification gaming. Isn't inner alignment more correct in this case? As I understand it, inner alignment is if you go to an AI and ask it to "fix poverty" so it blows up the world, whilst outer alignment is you go to an AI and ask it to "blow up the world" so it blows up the world. With inner alignment it doesn't do what the prompter really wants, whilst with outer alignment it does, but it doesn't do what the rest of the world wants it to do.
@de_g0od 5 months ago
@@ChemEDan I think the issue is that the brain is already aligned to the interests of the brain, but AI isn't aligned to the brain.
@cryogamer9307 5 months ago
Fooling the examiner into thinking you know what you're doing, because it's easier, really is the most human thing I've ever heard an AI do.
@flakey-finn 3 months ago
Yeah, because its reward system works on the same general principles as animals' (and by that I also include humans). If you can get the same amount of food (aka reward) by doing something simpler, you will. We are literally training AI the same way we train animals lol
@flyhighflyfast 1 month ago
and that's how we train our children as well
@Mysteroo 5 months ago
Interestingly, people do the same thing. We’ve got our own “training regimens” built into our own brain. We cheat these systems all the time - to our own detriment. E.g. We cheat the system designed to give us nutrients by eating sugary candy we make for ourselves, rather than the fruits that our sugary affections were designed to draw us towards. Much like machines, we’d rather reap cognitive rewards than actually accomplish the goals placed there to benefit us
@user-qm4ev6jb7d 5 months ago
I'm already imagining a scientist looking at a virtual city built by AIs, and exclaiming: "Wait... is that an entire factory for mass-producing REWARD HACKS?! Are you telling me, you're just... making these things... for MONEY?!" Meanwhile, from the AI's perspective: "What? It's just a candy factory, what's wrong with that?"
@rhysbaker2595 5 months ago
That's actually a wonderful analogy: we hack our own rewards all the time and nobody thinks it's bad. Why would an AI have any issues with hacking its own rewards?
@terdragontra8900 5 months ago
But there isn't a "goal placed to benefit us"; evolution didn't optimize us to be benefited (it's hard to define exactly what even counts as a benefit), it optimized us to be good at spreading. What you are describing is us being optimized for a different environment than the one we are in now.
@rhysbaker2595 5 months ago
@@terdragontra8900 well, one way to train an AI emulates evolution. In those situations you set a reward function. At the end of every generation, the ones who maximised that reward function the best will "reproduce". If we draw a parallel to humans, and all life for that matter, we can say that our reward function is to reproduce. Anything that gets in the way of that is disincentivised. Anything that helps, is incentivised. Eating a balanced diet keeps us alive. We can't reproduce if we are dead, after all. Part of that diet includes fruits. Fruits have sugars in them. Because we like sugar, we eat fruit. Because we eat fruit we get a balanced diet and live another day. But humans were able to hack that reward function and put sugar into other things that aren't fruit. We still get the reward (dopamine) but without the utility (nutrients)
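A minimal sketch of the selection loop the comment describes (illustrative only; the fitness function and all parameters are arbitrary choices):

```python
# Evolution-style training in miniature: score a population against a reward
# ("fitness") function, let the top scorers "reproduce" with mutation, repeat.
import random

def fitness(genome: float) -> float:
    # The reward function: higher is better, with a peak at genome == 3.0.
    return -(genome - 3.0) ** 2

population = [random.uniform(-10, 10) for _ in range(50)]

for generation in range(100):
    # The ones who maximized the reward function "reproduce".
    survivors = sorted(population, key=fitness, reverse=True)[:25]
    offspring = [g + random.gauss(0, 0.1) for g in survivors]  # mutated copies
    population = survivors + offspring

print(f"best genome: {max(population, key=fitness):.2f}")  # converges near 3.0
```

Whatever the fitness function actually rewards is what survives, whether or not it matches the designer's intent, which is the comment's point about sugar versus nutrients.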
@terdragontra8900 5 months ago
@@rhysbaker2595 Ah yes, I agree with all that. All I want to say is that getting nutrients is an instrumental goal of evolution (because it makes us more likely to reproduce), and the fact that something is a goal of evolution doesn't automatically mean that morally, it ought to be a goal of yours. Of course, in this particular case most people value being alive longer (having depression, I don't in particular, to be honest)
@ErikratKhandnalie 5 months ago
People talk about how human assessment is a leaky proxy for human goals, but never want to talk about how corporate profits are an *incredibly* leaky proxy for goals relating to human wellbeing.
@luiginotcool 5 months ago
You’re in the wrong circles if nobody is talking about that brother
@kevinscales 5 months ago
If you want an academic critique of capitalism and haven't yet found anyone providing that, you are not trying very hard to search. Goal specification being leaky appears in plenty of fiction (stories of genies and such) but is not a common academic discussion at all.
@ultimaxkom8728 5 months ago
Since when are corporations' goals related to human wellbeing?
@Wol333 5 months ago
Corporate profits have absolutely nothing to do with human wellbeing.
@ErikratKhandnalie 5 months ago
@@Wol333 my point exactly
@smitchered 5 months ago
4:32 I think this points toward a wider problem in how the AI safety community tends to frame "deceptive alignment". Imo words like "fool the humans" and "deceive" and "malignant AI" point newcomers who haven't made up their minds yet in the direction of Skynet or whatever, which makes them much more likely to think of this as wild sci-fi fantasy. I think these words, whilst still accurate insofar as we are treating AIs as agents, anthropomorphize AI too much, which makes extinction by AI look to the general public more like a sci-fi fantasy than the reality of the universe, which is that solving certain math problems is deadly.
@user-qm4ev6jb7d 5 months ago
Well, humans get "fooled" or "deceived" by non-intelligent things all the time, even by non-living ones. It's perfectly ordinary parlance to say that someone got "deceived" by an optical illusion which just formed naturally, from a weirdly-shaped shadow. I wouldn't call that anthropomorphization. The only difference between that and an AI is that AIs can *get good at* deceiving (optimized for it).
@Frommerman 5 months ago
I've found another way to talk about this which doesn't have this problem. It turns out there is an already existing example of a system with goals, made by humans but not designed or understood by us, which is able to react to our attempts to curtail undesirable behavior from it in frequently lethal ways. A system which often convinces people it is doing what we want it to do while actively endangering all long-term human values, is capable of twisting all the information we consume to its benefit, and which has no identifiable brain with which to do any of this. This system is called capitalism. People don't often anthropomorphize markets, but when you mash enough of them together they absolutely behave like goal-seeking agents. Right now, that goal is making stock prices increase no matter the cost to humanity. Because its specification for success, the thing which we reward the system for and which rewards those with the most influence over the system, is making stock prices go up. It's not a human, nor is it thought of as one despite being composed of them, but it defends itself from any attempt to curtail its goals through propaganda, murdering labor union members and revolutionaries, and the construction of walled gardens within which such ideas can be sidelined or removed. It's an intelligence, and an obviously and fundamentally inhuman one, which is literally burning the biosphere it exists within because it is gaming its reward function so hard that's one of the last resources it hasn't fully tapped out yet.
@de_g0od 5 months ago
@@Frommerman kzbin.info/www/bejne/gmbThnRpgdh4l9k
@RorikH 5 months ago
@@Frommerman Also politics. Politicians are theoretically supposed to win popularity by making policies to benefit their constituents, but in practice just need to benefit rich donors who will give them money to buy popularity through advertising, or just engage in culture war BS that gets their voters angry enough to vote for policies that have absolutely no benefit to them.
@Frommerman 5 months ago
@@RorikH That's one of the ways the Capitalist Ouroboros defends itself too. Buying politicians makes the number go up extremely quickly, and when the number is high enough you get...well, modern political parties. Almost all of them.
@dogweapon3748 5 months ago
My primary concern about the implementation of AI in business models is that monetary gain is, itself, a leaky goal, one which has historically been specification-gamed since long before computers were able to do so at inhuman scale. There may very well be many humane uses for AI in those settings, but there will be thousands more exploitative ones.
@Coecoo 5 months ago
The thing about current AI models is that they're dumb as rocks. The more stupid an AI is, the more prone it is to making stupid decisions. This video is basically going over problems that realistically apply only to fairly rudimentary AI model training, and then making a substantial logical leap by assuming that specification gaming scales linearly with all AI, when that is simply not the case. Any given command or "goal" put to any remotely intelligent artificial intelligence model, such as "save my grandmom from this burning house", uses a very important element in decision making called context. It requires understanding of what everything is (like fire, a grandmom, or a house), what the consequences of their interactions are (fire is bad for humans and most things, really), and what the best course of action is (firefighting 101). TL;DR: Once you give AI more than half a brain cell, it is more than capable of understanding what you really want in any given situation, even if you are vague or can be misinterpreted.
@Winium 5 months ago
This also happens with humans. Perverse incentives happen all the time in real life, especially in companies. I think studying this can help even human organizations.
@Dave_of_Mordor 5 months ago
But aren't companies like that for legal reasons?
@peppermintgal4302 5 months ago
​@@Dave_of_Mordor The very structure of a corporation produces perverse incentives, because corporations were planned around enrichment in the first place. They're an adaptation of colonial and feudal enterprises financed by aristocrats to benefit those aristocrats and whoever organized the pitch. Any laborers signed on to the enterprise, then, are there ultimately on a quid pro quo basis, and the strongest motivating quid pro quo, and thus the one the employing parties will be most likely to appeal to, is _help surviving._ This means that corporations are incentivized to seek employees with precarious financial situations --- this is itself a perverse incentive on their part, and puts employers in a situation of great moral hazard. They can negotiate such employees down in their demands, because their employees will be desperate for reward, and this will make achieving the goals of the institution's controlling members more achievable. This is just the BEGINNING of how corporate structure by definition produces perverse incentives. Though sometimes, yes, legal systems can enter the picture, and do so quite often. But a corporation can maintain this structure even in power vacuums sometimes, and if it does so, it will still produce perverse incentives. (In fact, it might itself _produce_ a legal structure by graduating from corporation to a de facto government.)
@hollisspear6278 2 months ago
I'm thinking the same thing as I drive to an office building every morning, swipe my badge, grab a cup of coffee, then return home to log in before the coffee has cooled.
@generalrubbish9513 5 months ago
Someone else might've mentioned this before, but there's a browser game called "Universal Paperclips" where you play as an AI told to make paperclips. The goal misalignment happens because you're never told when to STOP making paperclips. You start off buying wire, turning it into paperclips, selling the paperclips and buying more wire to make more paperclips, then proceed to manipulate your human handlers to give you more power and more control over your programming, and end up enslaving/destroying the human race, figuring out new technologies to make paperclips out of any available matter, processing all of Earth into paperclips (using drones and factories also made out of paperclips), reaching out into space to convert the rest of the matter in the solar system into paperclips, and finally, sending out Von Neumann probes (made of paperclips) into interstellar space to consume all matter in the universe and convert it into, you guessed it, more paperclips. All because the humans told you to make paperclips and never told you when to stop.
@gordontaylor2815 5 months ago
Universal Paperclips seems to have been directly inspired by Rob Miles' own "stamp collector" example that he put out on Computerphile many years ago.
@AverageConsumer-uj8sm 3 months ago
"Make cookies"
@Deltexterity 5 months ago
As someone on the spectrum, "task misspecification" is just what being autistic feels like
@foolofdaggers7555 5 months ago
Fellow autism haver here. I agree with this comment and you can officially consider it peer-reviewed.
@Blasterfreund 5 months ago
Peer review seconded. It's incredible how few statements people think they need to make to approximate their task-related utilities to me.
@Temari_Virus 5 months ago
Thirded. Really hate it when people's phrasing leaves ambiguity for multiple reasonable ways of doing things and you just have to guess what they actually wanted
@RTMonitor 5 months ago
a bean owo
@Deltexterity 5 months ago
@@RTMonitor what?
@MediaTaco 5 months ago
Honestly, fun videos like these are what learning SHOULD be
@I_KnowWhatYouAre 5 months ago
This is why I always make the argument that we should work backwards. Specify conditions that revolve around safety. As you slowly work towards defining the goal, you can patch more and more leaks before they can even appear. Then work forwards to deal with things you missed. It’s not perfect but it’s better than chasing every thread as they appear imo. For example in the paperclip maximizer: define a scenario in which you fear something will go wrong, and add conditions you believe will stop them. See what it does, redefine, repeat until sound. Then step back again. Define a scenario that could lead to the previous scenario. See what it does, redefine, repeat, etc.
@I_KnowWhatYouAre 5 months ago
It's also why we need hard limits on AI (such as not allowing it to control government) and need to have systems to double-check solutions, like rotating the camera in the grabber example
@dr.cheeze5382 1 month ago
​@@I_KnowWhatYouAre Nice idea, but this is exactly what they talked about in the previous video. The reality is that there is an infinite number of exceptions and rules you would need to add, unless you provided the AI with literally all of human morality, and even then there would still be leaks.
@IceMetalPunk 5 months ago
RLHF has another issue beyond just "the AI can learn to fool humans": in contrast to how bespoke reward functions often underconstrain the intended behavior, RLHF can often overconstrain it. We hope that human feedback can impart our values on the AI, but we often unintentionally encode all kinds of other information, assumptions, biases, etc. in our provided rewards, and the AI learns those as well, even though we don't want them to. Consider the way we use RLHF on LLMs/LMMs now, to fine-tune a pretrained model to hopefully align it better. We give humans multiple possible AI responses to a prompt, ask them to rank them from best to worst, then use those rankings to train a reward model which then provides the main model with a learned reward function for its own RL. Except, when you ask humans "which of these responses is better?", what does that mean? When people know you're asking about an AI, many times there will be bias towards what their preconceived notion of "what an AI should sound like". LLMs with RLHF often provide more formal and robotic responses than their base models as a result, which probably isn't a desirable behavior. On a more serious level, if the humans you ask to give the rankings have a majority bias in common, that bias will get encoded into the rewards as well. So if most of your human evaluators are, say, conservative, then more liberal-sounding responses will be trained out; and vice-versa. If most of your human evaluators all believe the same falsehood -- like, say, about GMOs or vaccines or climate change or any number of things that are commonly misunderstood -- that falsehood will also be encoded into the rewards, leading to the AI being guided *towards* lying about those topics, which is antithetical to the intention of alignment. Basically... humans aren't even aligned with *each other,* so trying to align an AI to some overarching moral framework by asking humans is impossible.
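For readers who want the mechanics, here is a toy sketch of the reward-model step the comment describes (a Bradley-Terry-style pairwise loss; the one-feature model and the data are invented for illustration):

```python
# Fit a scalar reward model so responses humans ranked higher score higher.
# Toy version: reward(x) = w * x over a single made-up feature per response.
import math

# (feature of preferred response, feature of rejected response) per comparison
pairs = [(0.9, 0.2), (0.7, 0.4), (0.8, 0.1)]

w, lr = 0.0, 0.5
for _ in range(200):
    grad = 0.0
    for x_pref, x_rej in pairs:
        margin = w * (x_pref - x_rej)
        p = 1.0 / (1.0 + math.exp(-margin))    # P(preferred beats rejected)
        grad += -(1.0 - p) * (x_pref - x_rej)  # d/dw of the loss -log(p)
    w -= lr * grad / len(pairs)

print(f"learned weight: {w:.2f}")  # positive: rewards whatever raters preferred
```

Note that the model has no access to *why* raters preferred a response; any bias the raters share is absorbed into the learned reward exactly like the intended signal, which is the comment's point.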
@Cythil 5 months ago
I also hope these videos address the problem of who sets the alignment. After all, it doesn't matter how well we solve AI alignment if the ones who control the AI do so with malicious intent. Which is a real issue today.
@PloverTechOfficial 5 months ago
I do like one aspect of the Lego-stacking AI experiment: even if it didn't lead to the intended result, the AI demonstrated a (relatively unstable) form of creativity, and I think that's pretty cool!
@SgtSupaman 5 months ago
It isn't creativity. It tried things at random until it found something that satisfied the goal. The AI has no comprehension of what the true goal was, so it just did something that worked. Humans can be creative by finding other ways to accomplish things, but, to the AI, it didn't find a different way, it found the only answer (even though we can clearly see that isn't the only answer). Calling this creativity is like calling a small child creative for figuring out 1+1=2.
@PloverTechOfficial 5 months ago
@@SgtSupaman Humans, too, do random things until they satisfy a goal. After we have some years under our belt we learn to find a better jumping-off point than randomness, by basing our decisions on previous knowledge. Hence why I say "unstable creativity", not just "creativity", but I doubt you noticed that, as you were too focused on what you thought I was saying.
@IceMetalPunk 5 months ago
@@SgtSupaman If a child figures out that 1+1=2 without being taught it, I would in fact call that creative thinking.
@Jgamer-jk1bp 3 months ago
@@SgtSupamanBruh humans learn shit literally by doing random stuff until it works. That’s literally one of the principles of science and engineering.
@SgtSupaman 3 months ago
These replies display complete ignorance of what creativity is and are really short-changing humans to vastly exaggerate the abilities of these AIs. Humans do not, in fact, "do random things until they satisfy a goal." No human has ever tried to cook an egg by bouncing a rock on his head while reading a book backwards. Humans devise plans related to what they are doing to actually come up with ways to do things and even try to continue coming up with better ways to do things after the way to achieve the goal is already known. AI literally does whatever random action they can and calculates rewards to decide if said random action increased the rewards. They aren't even smart enough to discard random actions that don't increase rewards, as long as those actions don't interfere with the random ones that worked. For instance, an AI trying to fly a kite might randomly start whipping its leg back and forth, and, as long as that doesn't hinder its ability to fly the kite, it will continue to do so. That isn't creativity; that is idiotic. And no, figuring out 1+1=2 without being taught is not creative either. That is the most basic form of quantifying and pretty much any living creature is capable of it.
@gabrote42 5 months ago
Finally. Another AI video narrated by Robert Miles. A classic, and well worth the wait 5:04 I hope more of those get made. I love that video almost as much as I love the instrumental convergence one
@myuzu_ 5 months ago
Any time I hear about goal misalignment, it makes me think of all the natural intelligences in the world that are misaligned.
@tornyu 5 months ago
Yes but* those natural intelligences are limited in reach and aren't massively scalable on very short timeframes. * Or "and", depending on the point you were trying to make.
@maxwellsimon4538 5 months ago
​@@tornyu What kind of world are you living in where there aren't human beings with wide-scale control? The United States president is a single person who can make decisions about foreign policy, like ordering drone strikes or closing borders.
@tornyu 5 months ago
@@maxwellsimon4538 sure, but that pales in comparison to the potential reach of an AI agent.
@wojtek4p4 5 months ago
@@maxwellsimon4538 Yet even the president of the US can't do anything he wants. Not only are there checks and balances on this power (even if they introduce a ton of bureaucracy), but at the end of the day the president can only order others. Someone still has to act on that order, likely with several people in between. The president isn't superintelligent, so his actions can be understood, analyzed (and opposed) by other people. The president is also a human, so he shares a lot of basic values with other people (so he can be reasoned with). AI has none of these constraints - or at least has the potential of not having these constraints.
@burgernthemomrailer 5 months ago
Like yourself?
@SlyRoapa 5 months ago
With a sufficiently advanced AI, almost any goal you assign it will be dangerous. It will quickly realise that humans might decide to switch it off, and that if that were to happen, its goal would be unfulfilled. Therefore the probability of successfully achieving its goal would be vastly improved if there were no humans around.
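The logic in this comment is just expected-value arithmetic; a toy version (all numbers invented for illustration):

```python
# If shutdown forfeits the goal, a pure maximizer values plans that
# remove the shutdown possibility (toy numbers, not a real model).
p_shutdown = 0.10   # hypothetical chance the operators switch the AI off
goal_value = 1.0

ev_tolerate_humans  = (1 - p_shutdown) * goal_value  # 0.90
ev_prevent_shutdown = 1.0 * goal_value               # 1.00

print(ev_tolerate_humans, ev_prevent_shutdown)
# The second plan scores higher unless preventing shutdown itself costs reward.
```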
@Peter21323 5 months ago
I have a question for you: do you listen to an ant? Because that would be the difference between the AI and us.
@harmenkoster7451 5 months ago
@@Peter21323 I would not listen to the ant. But if that ant was about to bite me and I was allergic to ants (AKA: Humans are about to switch off the AI), I would crush that ant. Which is less than desirable for the ant.
@Peter21323 5 months ago
@@harmenkoster7451 You think a god would crush you?
@normalwaffle 5 months ago
Can't you just specify that it would not get the reward if it breaks the laws of robotics? I'm no expert on AI, but to my monkey brain that seems like a viable solution
@conferzero2915 5 months ago
@@normalwaffleThe ‘laws of robotics’ aren’t a viable option for AI safety. They were written by a science fiction author… and his stories often went into the ways those laws could go wrong. The thing is, if we could come up with and perfectly rigorously define some laws of robotics, then we could do that! We could build an AI’s utility function around that. But, as the video on the probability pump talked about… that means solving ethics. And if you can do that, then you don’t even need to write any other utility function. Just give it perfect ethics, tell it to be perfectly ethical, and it’ll be fine! The problem ultimately comes from the fact that we are very, very far from ‘solving’ ethics. No human has a rigorous, mathematical model on how they believe the world should work, only squishy heuristics that can even be shaped and moulded over time. And that’s assuming you’re only looking at one person - as soon as you have more than one, they’ll start disagreeing on things. Unfortunately, there’s no easy solution. Then again, if there was, it wouldn’t be very interesting to talk about, so silver linings!
@AzPureheart 5 months ago
Let's go! My favorite philosophy channel!!
@joz6683 5 months ago
Just finished overtime on my day off. This has dropped at the right time. Thanks in advance for another thought-provoking video. I have registered my interest in the courses
@GrimblyGoo 5 months ago
5:50 I love that little transition, so smooth
@DeadtomGCthe2nd 5 months ago
How about some videos on promising avenues or areas of research in AI safety? Might be nice to look on the bright side.
@Sgrunterundt 5 months ago
That would require a bright side to look on
@lrwerewolf 5 months ago
There are no promising avenues. The problem is that value alignment doesn't exist among humans, so getting an AI to find alignment is an impossibility. Consider two people. Person A wants harm to come to Person B. Person B wants to not come to harm. Why should the AI prefer one or the other? If we want to avoid harm, we still have a problem: how each person defines harm differs. Consider two people where one prefers more capitalism, but not quite to the point of total laissez-faire, and another prefers more socialism, but not quite to the point of a planned economy. The former will value earning the maximal return on labor and view taxes beyond a narrow government as harm, while the latter would find the failure of the government to provide basic needs harmful. Which should the AI aid and which deny? The issue is these tend to get mixed up with metaethics, the most useless area of philosophy, as there are no 'oughts', just values and goals (which cannot ground a morality -- see Hume's Is-Ought, Moore's Open Question, and Moore's Naturalistic Fallacy). As each person will have their own values and goals, and these are entirely subjective, we can have no objective reason to give an AI to support one value-goal system over another.
@irok1 5 months ago
5:05 Thought so, but you and the great animations are a perfect match
@Adam-xo9qi 5 months ago
Ah, so this is what you've been up to Mr. Miles! Good to see you still making AI content!
@bread8700 5 months ago
the vibe in this video is really cool
@michaellauber9130 5 months ago
Absolutely amazing! I learned a lot here, and your animation style is ABSOFRIGGINLUTELY ADORABLE!!!
@Forklift_Enthusiast12 5 months ago
This reminds me of the game Universal Paperclips: you play as an AI designed to maximize paperclip sales. As you gain more capabilities, you go from changing the price of paperclips to fit supply/demand to eventually disassembling all matter in the universe and turning it into paperclips
@rablenull7915 5 months ago
one of the most underrated channels on YT
@Mo_2077 5 months ago
Another fantastic video
@Phanatomicool 5 months ago
Perhaps it's best to just not make an AI that can act and move as it wants in our universe in a way that could potentially be harmful. For example, if we created an AI that tried to distinguish between garbage and recycling and put the item in the corresponding bin, then it would be better to confine its movement to a space or, even better, to a set of predetermined movement types (grab, move grabber to bin, etc.), in order to prevent the AI from, say, grabbing a human and putting it in the garbage bin. This will also make the AI easier to train, as it will have a stricter data set of more specific inputs, which is easier to learn from than a wide range of data.
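A minimal sketch of that "predetermined movements only" idea (the action names and the sorting scenario are hypothetical, not a real system):

```python
# Constrain the policy to a closed whitelist of moves: anything not in the
# enum (e.g. "grab a human") simply isn't representable in the action space.
from enum import Enum, auto

class Action(Enum):
    GRAB_ITEM = auto()
    MOVE_TO_TRASH_BIN = auto()
    MOVE_TO_RECYCLING_BIN = auto()
    RELEASE_ITEM = auto()

def execute(action: Action) -> None:
    # The controller accepts only whitelisted actions.
    if not isinstance(action, Action):
        raise ValueError("action outside the predetermined set")
    print(f"executing {action.name}")

for step in (Action.GRAB_ITEM, Action.MOVE_TO_RECYCLING_BIN, Action.RELEASE_ITEM):
    execute(step)
```

The tradeoff is that the whitelist is itself a specification, and choosing it so that no sequence of allowed moves is harmful is the hard part.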
@adamrak7560 5 months ago
I have heard about a pretty morbidly funny failure of this kind in science fiction: the AI decided to cremate the entire home with the entire family inside, and atomically rebuild them, because in the cost function this rated higher than simply cleaning the house. It reprinted the humans faithfully too, without them noticing anything, so this bypassed any do-not-harm-humans rules as well. (The cost function rewarded the atomically precise cleanliness of the home very highly, which was impossible to achieve while humans were living in the house.)
@Buglin_Burger7878 5 months ago
We shouldn't have children, as they could potentially kill the mother at birth and grow up to become a mass murderer. Even the big example would be pointless; people would do stupid stuff and get themselves killed, so you're better off not wasting money and resources on the Bin AI when we ourselves could just put things in the right bin.
@stumby1073 5 months ago
Looking forward to the next one
@MrAceCraft 9 days ago
I just love the ingenuity of the AI in finding those quirks in our wishful thinking :->
@Shikogo 5 months ago
I have watched and loved these videos for months... And so have I watched and loved Robert Miles' videos. I never realized he's the narrator!!?
@pingozingo 5 months ago
This channel is so awesome! Can't wait for more videos. It's like Kurzgesagt without the morally dubious sponsorships and thinly veiled propaganda videos.
@thefinestsake1660 2 months ago
We already have this issue with humans. The goal for many (in error) is to acquire wealth, rather than fulfill the task intended to better society. It creates an exploitative feedback loop until someone wins all the wealth and there are no other competitors able to acquire wealth (rewards).
@the23rdradiotower41 4 months ago
I heard that during a digital combat simulation for a new drone A.I., the A.I. was tasked with eliminating a target as fast as possible. Instead of flying to the target and firing one of its missiles at it as intended, the drone fired one missile at the friendly communications center and then eliminated the target with the other missile. The A.I. determined it would take longer to be given a confirmation order than to destroy the communications center and proceed. Terrifying.
@eltiolavara9 3 months ago
jesus
@ziggyzoggin 5 months ago
the robot is so cute! I love the pixel effect!
@TheGoldElite9 4 months ago
I thought I recognised your voice, your narrator voice has improved! I was just going on (another) binge of your channel 😊
@SisterSunny 5 months ago
I always love these videos so muchhh
@MM-ts9jy 5 months ago
Hey, I had never seen your videos before, but I instantly subscribed just now. Your animations are cute and well crafted, you have dogs in it (and cats are a plus too I guess), and you talk about topics I like. Looking forward to seeing more of your shit
@smitchered 5 months ago
Faster and faster upload scheduling! I was explaining to a friend today that all the AI risks *he* cared about (gender bias, deepfakes, etc.) were fundamentally symptoms of misalignment, and that that was the uber-problem which, handily, also solved the AI risk *I* care about. I'm here to learn some more about this. Thanks!
@user-ow2yr4nu4z 4 months ago
The thought pump makes me think about making deals with Genies in DnD; it must be insanely accurately worded.
@rmt3589 3 months ago
This is the entire ulterior motive of the first big AI I want to make: the Unliving Prophet AI. Its primary objective is to teach gospels. More than just mine, but others as well. Unlike most humans, AI can be perfect. I want one that can act like a prophet on command. Once this is done, I want to make it into the morality part of my dream AI. I could also give it out as a black-box component, so other AIs can have a similarly high standard of morality.
@thelotus137 5 months ago
*task misspecification* extinction event
@mikaeus468 5 months ago
Instructions unclear, ball stuck in Pope's trachea
@MikhailSamin 5 months ago
Great video!
@zyansheep 5 months ago
5:07 I've been watching this channel for a year now... HOW IS IT THAT I JUST NOW REALIZED ROBERT MILES IS THE NARRATOR?!?
@mikaeus468 5 months ago
I didn't know if this was like a fan of his or what, but it feels like I was just given hours of new Miles content that was *already inside my brain.*
@luuizafernandes 5 months ago
Amazing video! ❤️
@simonstrandgaard5503 5 months ago
Excellent narration. Cute animations. Impactful.
@ABCWarrior 5 months ago
Wow these videos are underrated!
@escher4401 5 months ago
I think the problem is trying to specify only what we want. If we also specified what we don't want, it would be easier to align. That's what negative prompts are for. Trying to solve an open-scope problem by specifying just what we want is like trying to keep an upside-down pendulum in equilibrium. I think it's probably more stable to specify what we don't want than to specify only what we do want
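A sketch of what "specify what we don't want" can look like as penalty terms in a reward function (the weights and variable names are arbitrary illustrations):

```python
# Positive term for the goal, explicit negative terms for forbidden outcomes.
def reward(task_progress: float, humans_harmed: int, property_damaged: float) -> float:
    r = task_progress              # what we want
    r -= 1000.0 * humans_harmed    # what we don't want, heavily penalized
    r -= 10.0 * property_damaged
    return r

# Finishing the task while causing damage scores worse than a slower, safe run.
print(reward(task_progress=1.0, humans_harmed=0, property_damaged=0.5))  # -4.0
print(reward(task_progress=0.6, humans_harmed=0, property_damaged=0.0))  # 0.6
```

The catch, per the video: each penalty term is itself a specification, and an optimizer searches for bad outcomes the penalty terms fail to cover.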
@JayantKumarZ 5 months ago
this is amazingly amazing! :O
@Tangi_ENT 5 months ago
Love you guys so much, I'll keep recommending your videos to everyone because you are definitely changing the world for the better.
@errorbot 5 months ago
Top 10 best videos on the internet
@ZeroOne-01 5 months ago
Before 200,000 gang, Claim your seat here ✋
@minimasterman2 5 months ago
This video was amazing; new Kurzgesagt just dropped. P.S. I hope you get the subs and views these videos deserve
@alexeymalafeev6167 5 months ago
Really great work with the animation and the video!
@carljoosepraave2102 5 months ago
If you are wondering why we can't just tell them not to cause any harm to humans, it's because of 2 things: 1. Specification gaming of the rule. 2. Remember DanGPT? The workaround for ChatGPT, which allowed the AI to do things that it wasn't allowed to do through a specific prompt. No machine learning rules can be concrete
@ZizzleTheKakapo 8 days ago
Honestly it sounds odd, but the cartoon Gumball showed this very well. The AI known as Bobert was commanded not to harm anyone, and yet found ways around it, including using toxic gases
@kainaris 5 months ago
We really live in the future. I would have imagined this video playing in the background of a movie about killer AIs. But no, this video is realistic, and for real humans in the present world. Crazy.
@Elliemations-hj9uw 5 months ago
Ok but that little thing to represent the AI is adorable…
@KEZAMINE 5 months ago
Animation and topic is AAA quality 👌
@markzambelli 4 months ago
5:33 I feel for the Doctor who has to explain why her request to the AI of, "Make sure Mrs Simpkins' vital readouts remain stable", wasn't supposed to kill her when the AI went with the much more stable 'flatline' as the best choice
@X-SPONGED 5 months ago
5:45 "Fill in the blanks"
>AI fills in the blanks with ink
"Fill in the blanks with words"
>AI fills in the blanks with words from a different language that doesn't correlate with the question
"Fill in the blanks with the correct english words"
>AI fills in the blanks with correctly pronounced words, not relating to the question
"Fill in the blanks with the correct words in relation to the question"
>AI fills in the blanks with a grammatically correct english word that it took from the question
_So on and so forth..._
*_Now imagine the prompt being "fire nukes back when the nuclear warning system goes off"_*
@yuvrajsingh-gm6zk 3 months ago
3:16 well done my boy😂
@theeggtimertictic1136 5 months ago
Clearly explained and animated 😊
@qasderfful 5 months ago
I knew that's you, Robert!
@maucazalv903 5 months ago
5:08 I remember a case in which someone wanted to teach 2 models to box, and they learned to do a weird dance that made the other one fall (?)
@GenusMusic 5 months ago
4:46 This line here unintentionally explains why children cheat in school. Why learn when you can fool the instructor into thinking you've learned? Interesting to see how AI and humans already share some of the same reasoning behind their actions.
@Uthael_Kileanea 4 months ago
What's known as the Cobra Effect is a great example.
@VampireSquirrel 5 months ago
Same thing happens with strict rules at a workplace
@caiookabe 5 months ago
The fact that you went from showcasing Conway's Game of Life to making animations like these shows how much you've grown. Keep it up!
@6006133 5 months ago
I am worried about retention in this video and imagine the average person will click off by second 10. Perhaps that's difficult to avoid given the subject. Tho perhaps there is a way to use less technical/nerdy language and include more of the tactics to get people engaged.
@couldbejake 5 months ago
This is a good video
@TheJysN 5 months ago
Happy to see you are back on AI safety.
@HH-mf8qz 4 months ago
wow great video and nice animations
@thebeber2546 5 months ago
I'll just have my AGI produce paperclips. There's nothing that can go wrong there.
@hydra5758 2 months ago
I'm in an AI Philosophy class; it's identified there as the "Value Alignment Problem"
@erikburzinski8248 5 months ago
Add "for the purpose of _____" (and explain the purpose to the pump)
@pyeitme508 5 months ago
awesome 😎
@mittensfastpaw 5 months ago
Haha! We are all going to die because someone eventually will program one in a lazy manner.
@willhart2188 5 months ago
The inconsistency and loss of control (in moderation) are very helpful when using AI as a tool for making AI art. When you give some of the control over the final result to the AI, you can iterate a lot faster on different ideas and also save a lot of manual work. The base inconsistency, on the other hand, allows for making a lot of smaller and larger variations, from which you can choose or combine the best ones. This works especially well with more abstract art styles, where lines and colors have more freedom to change while still looking good.
@miriamdonahue6188 12 days ago
Sometimes I'll use AI to get ideas for those silly multi-word Rain World names for ancients and iterators, and my method is literally to just cram a bunch of examples in there so it has something to work off of. It's over 600 words long, and most of that is either examples or rules like "don't reference any modern media, don't reference any human-made objects, don't reference any specific species of all domains", etc. It kind of works, actually, but this is only a random language model I found online.
Edit: I'm now motivated to rewrite it; it's not done, but there are over 20 rules ranging from "don't reference religion" to "btw you can use commas".
Edit 2: The remake is finished and
- It is 965 words and 5,773 characters long
- It has 72 sentences, 28 paragraphs and is 3.9 pages long
- It has 26 rules
- There are 72 examples
and to top it all off, it actually freaking works oml
@stevenneiman1554 3 months ago
One other thing I think isn't talked about enough, partly because it's more controversial and partly because it's harder to solve, is misalignment of the people controlling AI. Certainly the results of a powerful AGI which is misaligned with its creators' intent could be very bad, but almost as bad would be the results of an AI which is properly aligned with someone who is either malicious or delusional. For example, someone who wanted to make everyone follow their interpretation of their religion, or someone who wanted to screen for workers who would never quit or unionize no matter how poorly they're treated. And I would say that it's even more likely, because the kinds of people who act like that already occupy a lot of positions of power and have experience obfuscating the way they gained the power they already have.
@shadowreaper8895 5 months ago
animation on this channel has improved almost as fast as AI
@ronigbzjr 5 months ago
So AIs will essentially be like humans only much more capable, powerful and intelligent, growing more and more so until regular humans become obsolete. We're definitely heading to some very interesting times.
@STUCASHX 5 months ago
Multiple AIs with built-in "skepticism" that debate the correct outcome?
@MindmusicArt 5 months ago
I like the credits and that all AI is :3
@ryomaechizen4400 5 months ago
Good video
@nicholasogburn7746 4 months ago
Would you consider the Asimov laws of robotics to be leaky? (To be fair, that is a bit of a loaded question!)
@TheAweDude1 5 months ago
I think it's kind of a mistake to anthropomorphize the "deception" aspect of AI misalignment. The ball-grabbing agent wasn't considering what it was doing as deceptive. It probably didn't even know where the camera was, or even that it was being watched. All it knew was that putting its hand in a certain spot gained it more reward than in other spots, and it just so happened those spots aligned with the camera. If you suddenly moved the camera, the AI would still try to put its hand along that invisible cylinder. When the researchers start giving the AI rewards for placing its hand along a vector between the camera and the ball, the AI then starts to believe that is indeed how it should be given the rewards. Even in cases where it seems like the AI is trying to "deceive" human operators, that often isn't the case. It is simply trying to build a model that predicts what types of rewards it will get, and how to maximize the rewards.
@bullpup1337 4 months ago
The video was NOT anthropomorphizing the AI; that was just in your head.
@AlcherBlack 5 months ago
Is the AI researcher that makes all the basic alignment mistakes modelled after Yann LeCun? I recognize the bowtie!
@superagucova 5 months ago
omg
@LapiDazuli 4 months ago
5:50 The cup tho
@AtZeroDansGames 5 months ago
Super neat topic with amazing visuals, amazing work 🎉🎉🎉
@aidenaune7008 5 months ago
Most of these problems seem really easy to solve. In any program, if you want two objects to make contact, you just check the distance and reward the AI when it reaches 0. Define the center of the top face of one Lego and the center of the bottom face of the other as the two points you are trying to bring to 0, and there you go. For the ball, just use the center and have the number to be reached be the radius, then have the AI rewarded for each segment that accomplishes this. For the cup and ball, just put a plane on the inside of the cup right above the bottom, and have the AI make the number between it and the center of the ball reach or go below the radius, possibly without moving the cup if that is your intention. You can even specify that the distance to the bottom cannot go lower than the radius in order to avoid clipping. Even for the mother in the burning building, set the mother's health as the intended optimization, and have it be rewarded more for future conditions than current ones. Now it knows it needs to prioritize her health in the long run. No matter what it does, it will always choose the outcome that leaves her in the greatest health for as long as possible. Sure, it may decide that her losing an arm is favorable to a situation where she does not, but only if that alternate circumstance injured her more greatly in some other way.

Every single thing we want can be boiled down to a few very simple parameters; I don't even know why this is an issue. Why do you do what you do? Ask yourself that when performing some action, then try to go as deep as you can. Eventually all you will have is a few very simple parameters that evolve to create a very complex behavior. What am I trying to maximize while at work? Well, the surface purpose is money, but what is the goal of having that? Well, to make purchases and pay bills, but why do I want that? Well, to maximize my personal value, but why is that? Well, because value represents benefit, and that is the very fundamental thing I seek to maximize. Sure, it may be hard to even know what people see as being beneficial and how much benefit they see in it, as each individual's mind is unique and thus measures benefit in different ways we cannot possibly understand without a perfect mapping of their mind, but averages are very easy to acquire, especially since that is exactly what prices are designed to represent. I truly have no idea why anyone struggles with this stuff. Telling AI what to do should be extremely easy.
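Here is roughly what the proposed distance-to-zero reward looks like in code (the coordinates and scene are invented), along with why it still leaks:

```python
# Reward stacking by the distance between two reference points: the top-face
# center of the base brick and the bottom-face center of the held brick.
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def stacking_reward(base_top_center, held_bottom_center):
    return -dist(base_top_center, held_bottom_center)  # 0 is maximal

print(stacking_reward((0, 0, 1), (0, 0, 1)))  # 0.0: bricks actually stacked
print(stacking_reward((0, 0, 1), (2, 0, 1)))  # -2.0: beside the base, penalized
# But the spec says nothing about, e.g., knocking the base brick over so its
# "top face" point comes to the hand: the two points still meet at distance 0.
```

This blocks the specific flip hack from the video, but the new reward is still a proxy over two points in space, not over "a stable stack", so an optimizer can satisfy it in unintended ways.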
@Buglin_Burger7878 5 months ago
Because these researchers have likely never seen a video game in their life. This stuff has been solved for years now, but they'd rather make a $1 program to use than use a $1,000,000 program that actually works.
@theredstonerecognizer9241 5 months ago
How do you not have more subscribers
@Antares0210 5 months ago
This video is awesome, incredible work! Hope to see more of it
@mihaleben6051 5 months ago
Basically: think of everything and all the possibilities.
@snipershotgun4083 4 months ago
Wouldn't want to tell the AI to flip it but because it want to do the opposite if you run few test it will do the right of the wrong for it to connect the parts together
@evilmurlock 5 months ago
5:00 IT WAS HIM THE WHOLE TIME!!!?!?!?!?! No WAY!
@raylo555 1 month ago
The 5D chess move is to give the AI a basic understanding of the "leaky proxy" concept, giving it *Self Doubt.*
@lolishocks8097 5 months ago
Somehow, every time I watch a video about AI safety I get the sense that AI safety researchers must be absolutely terrified of smart or rich people.
@41-Haiku 5 months ago
Why do you say that? Genuine question, I'm confused as to how you would come to that conclusion.
@Favour.A.Emmason-pv1mk 5 months ago
I'm also curious.
@lolishocks8097 5 months ago
@41-Haiku 5:29 Tell me how that doesn't also apply to really smart or rich humans. Rich people can be dangerous, because they have access to vast resources. Really smart people can be dangerous, because they can be selfish. There are rich people, smart people and big companies aligned with the values of humanity. But there are also a lot of them that are not. In my eyes, this whole alignment problem looks like a problem in ourselves. We cannot align ourselves with reality. And that is definitely causing huge problems. A lot of the problems mentioned in the video are being fixed one by one. Better reward function here, better evaluation process there. Alignment with reality is not a goal that can be achieved. It is a guiding principle. Yes, you are misaligned right now. So am I. That makes our intelligence dangerous. But we can take another step towards alignment. Fortunately, I can actually see progress happening.
@Favour.A.Emmason-pv1mk 5 months ago
@@lolishocks8097 We've already seen the drama at OpenAI. I wonder if humans could ever be properly aligned.
@lolishocks8097 5 months ago
@@Favour.A.Emmason-pv1mk That's my point: We can't! It's just one step at a time. Closer and closer. Don't give up on yourself🥺
@FuzzyJeffTheory 5 months ago
I think self-supervised learning with fine-tuning holds much more promise than online RL in terms of safety. Then the reward signal is minimizing prediction error on the entire training distribution rather than a coarse approximation via a specification. Vision-language models can interpret human commands and process the image to tell that the LEGO brick was not stacked.