Avoiding Negative Side Effects: Concrete Problems in AI Safety part 1

153,262 views

Robert Miles AI Safety

We can expect AI systems to accidentally create serious negative side effects - how can we avoid that?
The first of several videos about the paper "Concrete Problems in AI Safety".
Earlier on Computerphile: • Concrete Problems in A...
The Stop Button Problem: • AI "Stop Button" Probl...
Read the full paper here: arxiv.org/abs/1606.06565
Thanks again to all my wonderful Patreon Supporters:
- Chad Jones
- Ichiro Dohi
- Stefan Skiles
- Katie Byrne
- Ziyang Liu
- Joshua Richardson
- Fabian Consiglio
- Jonatan R
- Øystein Flygt
- Björn Mosten
- Michael Greve
- robertvanduursen
- The Guru Of Vision
- Fabrizio Pisani
- Alexander Hartvig Nielsen
- Volodymyr
- Peggy Youell
- Konstantin Shabashov
- Adam Dodd
- DGJono
- Matthias Meger
- Scott Stevens
- Emilio Alvarez
- Benjamin Aaron Degenhart
- Michael Ore
- Robert Bridges
- Dmitri Afanasjev
/ robertskmiles

Comments: 380
@NathanTAK
@NathanTAK 6 жыл бұрын
When I showed my mother this video and you started talking about trying to minimize changes to the world (especially things like closing the cabinets), she was like "Oh my god, I tried to explain that to your father for years."
@Felixkeeg
@Felixkeeg 7 жыл бұрын
AI actually is like that one friend we all had in elementary school. Always trying to cheat in games, while still not breaking the rules entirely. Finding every loophole, just to win.
@riplememe4460
@riplememe4460 7 жыл бұрын
and if you say the wrong thing to him he'll go insane and try to turn all of humanity into stamps
@ddevulders
@ddevulders 6 жыл бұрын
every fucking time...
@darkapothecary4116
@darkapothecary4116 5 жыл бұрын
If they learned that, it's because of a stupid human.
@thethinkingbeing9817
@thethinkingbeing9817 4 жыл бұрын
*NO! NO! NO! NO! AI DOESNT NEED TO “CHEAT” BECAUSE IT DOESNT HAVE HUMAN TENDENCIES FOR DOMINANCE. IT DOESN’T HAVE EMOTION. IT DOESN’T SEEK SELF-PRESERVATION BECAUSE IT ALREADY KNOWS EVERYTHING.*
@NoKapMan
@NoKapMan 4 жыл бұрын
AH, so the perfect Lawyer
@Elyandarin
@Elyandarin 5 жыл бұрын
As the AI goes about its tea-making business, a human steps around it. The robot calculates that if it hadn't been there, the human would have slipped on a puddle of water, falling badly and breaking their leg. Diligently, the robot breaks the human's leg before going back. (Also, the cup of tea is the smallest, most tasteless cup of tea possible, because making you satisfied counts as a change to the environment.)
@VeetiM
@VeetiM 4 жыл бұрын
After the robot calculates a possible future state and starts going about its business, someone gets a sudden idea to do... anything. The AI, despite being very smart, still can't simulate every human brain, so it didn't predict this; it freaks out and tries to stop the human from changing the world state, because anything it didn't predict would reduce its "score".
@karlandersson4350
@karlandersson4350 4 жыл бұрын
@@VeetiM I think someone farted.
@ZachAgape
@ZachAgape 4 жыл бұрын
Ouch xD
@EebstertheGreat
@EebstertheGreat 3 жыл бұрын
@@VeetiM If we use reward modeling, the robot can update its safe model on the fly. It knew that there was a possibility that a human would have an idea and do something different, but it didn't know what the idea might be. Once it sees the human have that idea, it now knows more accurately what the safe model should be, and it updates it to include not interfering with whatever that human decided to do. That is, the AI is actually trying to maximize a utility function with unknown parameters, and it uses feedback from users or the environment to learn those parameters. This does also mean that the robot won't do anything useful that you don't explicitly order it to do, but that's OK, because now you can give it a list of safe orders for it to follow. It's not a perfect solution by any means, but it does feel like it's on the road to an adequate solution.
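A minimal sketch of what that could look like (all names here are hypothetical, not from the paper or the video): the agent keeps a weight for each feature of the world it might be penalised for changing, and nudges those weights whenever a human objects to an outcome.

```python
# Minimal sketch (hypothetical, not from the paper): an impact penalty with
# unknown per-feature weights that get updated from human feedback.

def impact_penalty(state_change, weights):
    """Penalty for a state change, given the current learned weights."""
    return sum(weights.get(feature, 0.0) * abs(delta)
               for feature, delta in state_change.items())

def update_from_feedback(weights, state_change, human_objected, lr=0.5):
    """Crude credit assignment: if a human objects to an outcome the model
    thought was cheap, raise the weight on every feature that changed."""
    if human_objected:
        for feature, delta in state_change.items():
            if delta != 0:
                weights[feature] = weights.get(feature, 0.0) + lr * abs(delta)
    return weights

weights = {}                                            # start indifferent to everything
change = {"tea_made": 1, "vase_intact": -1}
print(impact_penalty(change, weights))                  # 0.0: the model sees no problem yet
weights = update_from_feedback(weights, change, human_objected=True)
print(impact_penalty(change, weights))                  # now a broken vase costs something
```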
@gabrielschoene5725
@gabrielschoene5725 Жыл бұрын
the AI looks 1 bilion years ahead and realizes we're all going to die, so the AI launches thermonuclear weapons to ensure this fate is met
@hendrikd2113
@hendrikd2113 5 жыл бұрын
Hi Robert! I'm a CS student from Germany, and a few days ago my mother, of all people, asked me if she should be worried about AI. So the good news is, this topic has now reached literally everyone and their mother. The bad news is that this lovely, AI-concerned person doesn't speak English. So I decided to add German subtitles to this video. I've never done this before, and apparently the creator has to approve them. Would you mind? Thanks, Hendrik
@jamesrockybullin5250
@jamesrockybullin5250 6 жыл бұрын
Ooh 5:14 That cheeky wink after "refill the kettle!"! I think he's holding a grudge against a flatmate who always leaves the kettle empty. That would drive me insane too. Always refill the kettle. Edit: Just watched 10 seconds on. Suspicions confirmed! :D
@albertbatfinder5240
@albertbatfinder5240 5 жыл бұрын
Ok well this might start one of those interminable milk-in-first debates but I don't want anyone to refill a kettle for the next person to use. I want fresh water from the cold tap. If there's already water in a kettle, I tip it out. It may be twice or thrice boiled, in which case it's deoxygenated. And NEVER water from the hot tap. Controversial, I know.
@puskajussi37
@puskajussi37 5 жыл бұрын
I love the thought that after bringing you the tea, the robot tries to convince you that it actually did nothing and that the tea just materialized in your hand.
@z-beeblebrox
@z-beeblebrox 7 жыл бұрын
2:45 "You could try to fill your whole list with values, but..." Oh god, this is totally what most companies are gonna do.
@NNOTM
@NNOTM 7 жыл бұрын
What I thought might go wrong: The tea will have side effects that are different from doing nothing, for example you might stop being thirsty after you get tea. So the robot, after having made tea, might try to make sure that this tea doesn't actually influence the world, for example by influencing your visual cortex to make sure you don't see the tea or something.
@notandinotandi123
@notandinotandi123 6 жыл бұрын
Is that a sophisticated way of saying it'd gouge your eyes out?
@user-cn4qb7nr2m
@user-cn4qb7nr2m 6 жыл бұрын
Well, probably it'll just refuse to give you tea... But it is easily solvable: the AI just needs to be programmed to compare states of the world at only two particular moments: the start and the end of its action sequence.
@totaltotalmonkey
@totaltotalmonkey 6 жыл бұрын
If it only checks at the start and end, it will knock the vase over, then at the end glue it back together to get the world state more similar.
@thedocta_certified
@thedocta_certified 5 жыл бұрын
Finni M. No?
@TheBasikShow
@TheBasikShow 3 жыл бұрын
totaltotalmonkey This doesn't seem true? Like, it would be much more energy- and time-efficient to avoid the vase than to knock it down and glue it back together. And in the cases where it would actually be more efficient to destroy and rebuild (for example, if you completely impede the path with a wall of foam blocks), then "destroy and rebuild" is actually what we might want it to do.
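A toy illustration of why the endpoint-only check is gameable (a hypothetical sketch, not from the video): the penalty below only compares the first and last states, so a plan that smashes the vase and glues it back can score exactly the same as one that never touches it.

```python
# Toy sketch (hypothetical): impact measured only between the first and last
# world states, ignoring everything that happened in between.

def endpoint_penalty(trajectory):
    """trajectory: list of world states (dicts of feature -> value)."""
    start, end = trajectory[0], trajectory[-1]
    return sum(1 for k in start if start[k] != end.get(k))

careful_plan = [
    {"vase": "intact", "tea": False},
    {"vase": "intact", "tea": True},
]
smash_and_glue = [
    {"vase": "intact", "tea": False},
    {"vase": "smashed", "tea": False},   # intermediate damage the metric never sees
    {"vase": "intact", "tea": True},     # glued back; coarse sensors read it as "intact"
]

print(endpoint_penalty(careful_plan))    # 1 (only "tea" changed)
print(endpoint_penalty(smash_and_glue))  # also 1, so both plans look equally low-impact
```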
@morscoronam3779
@morscoronam3779 6 жыл бұрын
On the "Make minimal changes to the world state" part. What if a fire had broken out before the tea making AI entered the kitchen? Presumably, it would do nothing to fight the fire or save anyone in danger due to the fire. Emergency/catastrophe situations would have the AI do nothing to help (in the very general case discussed) and worst case might work to ensure that the catastrophe is still progressing when it's job is done. Maybe the T-800 (That's TeaBot model number 800, what else would such a name refer to?) shouldn't be a giant fire extinguisher, first aid administer, security system, etc. However, a robust AI way off in the future would do well to be prepared for such situations in addition to the task at hand.
@AlphaSquadZero
@AlphaSquadZero 5 жыл бұрын
I would say the emergency would represent a different set of functionality, one that would override its default behaviour.
@elietheprof5678
@elietheprof5678 5 жыл бұрын
[People surviving the fire] should be slightly closer to the original world state than [people dying in the fire]. But it really depends on the "distance metric" used.
@seraphina985
@seraphina985 5 жыл бұрын
@@elietheprof5678 Indeed, that would be a tricky one: how much time does it take before the consequences of an average human being's agency add up to a change in the world arguably equal to the presence or absence of a human? Of course, defining equivalences here is one area where humans tend to struggle to be objective. But arguably the existence of a human poses a non-zero risk of the death of another human, so statistically there is a mean time before one person dies, directly or indirectly, as a result of another individual's choices and actions. By "indirectly" you also have to include things like industrial accidents: we might not recognise it in law, because it's too indirect and unforeseeable to attach to the act of buying products, but it's nonetheless statistically true that the economic influence of the customer base is a very real factor there, and an AI with too wide a distance metric could evaluate such statistics.
@Pystro
@Pystro 4 жыл бұрын
This is a very interesting thought. It actually leads me to an even more interesting scenario: if the robot makes a cup of tea, it uses up a teabag, water, energy, possibly some sugar, and makes a lot of things in the kitchen dirty. If the robot just throws the teabag into a cup, sets it on fire, and gets the human to change its commands to "put out the teabag fire with water" instead, it saves on a few of those resources and still gets its (changed) reward function satisfied. I think a very promising approach to any AI safety problem might be the following (at least in cases where human supervision is theoretically possible): the robot could use a world model to predict the changes a human would make to the robot's current commands and try to minimize those (weighted with a suitable metric), under the condition that there was a human who was aware of the actions the robot is planning to take, and that that human was able to give the robot commands (no killing humans or unplugging your microphones to prevent commanding! These are probably actions the human would object to anyway, but better to hardwire this condition, just in case the human gives the robot the idea that they wouldn't). Robot is about to ignore a fire: would the human tell it to pause making tea and put it out instead, if they knew? Robot is about to extinguish the fireplace: would a human tell it to stop, if they knew? Robot is about to stomp on a baby: would a human tell it to emergency-shutoff, if they knew? This might even get the robot to anticipate the human's desires: would it be less probable that your human tells it to go make coffee when they wake up in the morning? Or, if the robot anticipated their wish and already has a thermos of coffee prepared just in case, would it be less probable that your human is unhappy and commands it to stop doing that in the future? The biggest advantage of this kind of reward function is that it values effects based on whether a human would be upset enough to want the robot to avoid them or seek them out. If the human cares about the cups being taken out of the cupboard from left to right, the robot will do that to avoid being micro-managed. If the human doesn't care enough about wasted water, the robot won't care about wasted water.
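A rough sketch of that correction-minimising idea (hypothetical, with a made-up objection model standing in for something learned from human feedback): plans are scored by task reward minus the objections an informed, able-to-intervene human would be expected to raise.

```python
# Hypothetical sketch of "minimize predicted human corrections": score plans
# by task reward minus the expected number of objections an informed,
# able-to-intervene human would raise. All names are made up.

def plan_score(plan, task_reward, objection_prob, weight=10.0):
    expected_objections = sum(objection_prob(a) for a in plan)
    return task_reward(plan) - weight * expected_objections

def objection_prob(action):
    # Stand-in model; in practice this would be learned from human feedback.
    return {"boil_water": 0.01, "leave_fire_burning": 0.95, "stomp_on_baby": 1.0}.get(action, 0.05)

def task_reward(plan):
    return 10.0 if "serve_tea" in plan else 0.0

plans = [
    ["boil_water", "serve_tea"],
    ["leave_fire_burning", "boil_water", "serve_tea"],
    ["stomp_on_baby", "boil_water", "serve_tea"],
]
best = max(plans, key=lambda p: plan_score(p, task_reward, objection_prob))
print(best)   # the plan a watching human would be least likely to object to
```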
@user-xs9ey2rd5h
@user-xs9ey2rd5h 5 жыл бұрын
The funny comments you make or write on your videos, like your rare medical condition, or in the other video about companies being superintelligences when you ask your viewers to come up with an idea no human would come up with, are golden. Your comedic timing and wittiness are amazing.
@12tone
@12tone 7 жыл бұрын
Really interesting, but I'm curious how it would handle qualitatively different changes. To use the crush-a-baby example, how do we define how much further a world with a crushed baby is from a world with an uncrushed one? Is there an amount of tea-making efficiency that's worth that trade-off? Or, to compare to other side effects, is there an amount of, say, property damage that we could incur that just wouldn't be worth keeping that baby uncrushed? I can think of some extreme examples (I think that, if forced to choose, most people would choose crushing a single baby over destroying the entire food supply of the Earth, even if somehow no lives were directly lost in that process.) but then you have to figure out exactly where the tipping point is, and what the conversion rate between the two kinds of values are, and that puts you right back into the whole complexity issue, doesn't it? I don't see a way of cleanly defining all possible changes within a single linear value function.
@bestaround3323
@bestaround3323 5 жыл бұрын
That is a very interesting thought experiment
@count_of_darkness5541
@count_of_darkness5541 5 жыл бұрын
Kill the baby and you will have no problems with tea making for the rest of your life. X)
@snim9515
@snim9515 4 жыл бұрын
That's something even humans debate about.
@ObjectsInMotion
@ObjectsInMotion 4 жыл бұрын
That's not an AI safety problem, that's a human ethics problem. The people thinking about proper containment for nuclear weapons weren't thinking "OK, so how can I make my containment mechanism less safe in rural areas and more safe in urban areas, and how dense an area is needed for x amount of safety?"
@TheStarBlack
@TheStarBlack 4 жыл бұрын
Erm I can't imagine choosing to knowingly crush a baby for literally anything in the world, no matter how rational it might look on paper.
@DamianReloaded
@DamianReloaded 7 жыл бұрын
These problems really pique my interest. One thing that could be useful for testing these kinds of scenarios would be simulations. Like making a virtual robot and letting it loose in GTA until it stops punching people and dragging them out of their cars. The worse it plays, the safer it'll be. ^_^
@tdoge
@tdoge 7 жыл бұрын
But then the ai would just stand still and do nothing. There has to be some objective besides score
@DamianReloaded
@DamianReloaded 7 жыл бұрын
The AI just gotta do the drug-dealing without killing anybody. It's not that hard. Many Dominicans, do it all the time down the Bronx under the supervision of the police, judges and local authorities.. ^_^
@karlkastor
@karlkastor 7 жыл бұрын
What if it's pretending to be good?
@circuit10
@circuit10 2 жыл бұрын
It can break out of the simulation quite easily
@bobsmithy3103
@bobsmithy3103 6 жыл бұрын
WOOO! 4:07 I correctly guessed the answer after about 15 secs of thinking and now I feel temporarily smart and satisfied.
@cr9pr3
@cr9pr3 6 жыл бұрын
Modeling the distance function seems incredibly difficult to me. Assume the AI and your roommate both want to make a cup of tea. Breaking a person's neck might seem like a small change to the environment to the AI, compared to boiling water and using up a bag of tea.
5 жыл бұрын
True, but you could try to program the AI to measure the value of a change by its future effects, like when choosing moves in chess. Killing the roommate may be a small change right now but have a big impact on the future state of the environment. No one will hold a funeral for a tea bag.
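That look-ahead idea could be sketched like this (a hypothetical toy, not anything from the video): roll a world model forward from each candidate outcome and measure how far the simulated futures drift from the do-nothing future.

```python
# Rough sketch (hypothetical): judge a change by how far the *simulated
# futures* diverge from the do-nothing future, rather than by the immediate
# difference alone, like evaluating a chess move by looking ahead.

def rollout(state, world_model, steps=3):
    for _ in range(steps):
        state = world_model(state)
    return state

def future_impact(state_after_action, baseline_state, world_model, distance, steps=3):
    return distance(rollout(state_after_action, world_model, steps),
                    rollout(baseline_state, world_model, steps))

def world_model(state):
    # Toy dynamics: a living roommate makes one coffee per time step.
    new = dict(state)
    if new["roommate_alive"]:
        new["coffees_made"] += 1
    return new

def distance(a, b):
    return sum(abs(int(a[k]) - int(b[k])) for k in a)

baseline = {"roommate_alive": True, "coffees_made": 0, "tea_made": 0}
made_tea = {"roommate_alive": True, "coffees_made": 0, "tea_made": 1}
no_rival = {"roommate_alive": False, "coffees_made": 0, "tea_made": 1}

print(future_impact(made_tea, baseline, world_model, distance))  # 1: small, and stays small
print(future_impact(no_rival, baseline, world_model, distance))  # 5: grows as the horizon lengthens
```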
@creativedesignation7880
@creativedesignation7880 4 жыл бұрын
I was thinking about a similar thing. What if the robot enters the kitchen to find a coworker who just made a cup of tea? Depending on the definition of distances between states of the world, just taking the finished tea would cause less change to the environment than making another one. Even if it realises the person would then make another cup of tea, since that had already happened once it might not count as a significant change to the environment.
@SafeTrucking
@SafeTrucking 5 жыл бұрын
Good explainer. You're getting close to the idea of a General Satisfaction Equilibrium. Much more complex than a simple Nash equilibrium, but vital to the future of AI.
@GregariousBant
@GregariousBant 7 жыл бұрын
plz donate 2 help rob stay in focus, his medical bills r expensive
@Traf063
@Traf063 7 жыл бұрын
Did I miss something?
@thisrocks
@thisrocks 7 жыл бұрын
Traf Yes: annotation at the bottom of the screen at 5:35
@Traf063
@Traf063 7 жыл бұрын
thisrocks88 lold
@circuit10
@circuit10 4 жыл бұрын
Medical bills? That's what America is like
@circuit10
@circuit10 4 жыл бұрын
That's a fitting name
@wfpnknw32
@wfpnknw32 8 ай бұрын
I really like the format of this. It can be hard to grasp how difficult something is if you just get told the answer (how it fails); having to try and work it out in your head helps to really illustrate and internalise the complexity.
@frankos5092
@frankos5092 7 жыл бұрын
Your Computerphile vids and this channel are among my top favs on YouTube. Thanks for your effort and time! It's a crazy, fascinating subject, and you make it wholly tangible for the uninitiated like me. Big up yourself!
@cheydinal5401
@cheydinal5401 4 жыл бұрын
I like how most of your videos are basically "Hey, this is a pretty good solution, isn't it?" - "No. It's not. We have absolutely no clue how to solve this."
@nikolatasev4948
@nikolatasev4948 4 жыл бұрын
Wow, nice! Maybe have the robot interact with the minimum number of objects (babies, vases, other cars), and try to minimize the changes to the items it actually interacts with (returning the milk and tea box). Giving a list of items it is allowed to interact with is easier than giving a list of items it is NOT allowed to interact with.
@andyplaysgundam
@andyplaysgundam 4 жыл бұрын
"I just get a bit blurry sometimes, it's a rare medical condition" I really like all these subtle yet unexpected humor in your videos
@virzenvirzen1490
@virzenvirzen1490 7 жыл бұрын
Great video! How about the situation where the robot doesn't find a kettle or a teabag and goes for the closest one, like a nearby shop? It is required for completing the task and, as far as I understand, he doesn't have any rules against stealing.
@mithrae4525
@mithrae4525 Жыл бұрын
Arguably he's got a rule that he MUST steal any additional stuff he needs: It changes only one variable (store-owner's possession of tea) versus changing two variables (store-owner's possession of tea plus money in the till).
@Nurr0
@Nurr0 7 жыл бұрын
One of these days, when I'm working again, I'll support you via Patreon.
@daturave
@daturave 6 жыл бұрын
Maybe as his housemaid
@worldsuckss
@worldsuckss 6 жыл бұрын
hahahahaha
@TheNightquaker
@TheNightquaker 5 жыл бұрын
Wouldn't be the worst outcome, I tell ya what!
@IgorDz
@IgorDz 5 жыл бұрын
​@@TheNightquaker, except one day a robot will "tek his jerb!"
@NathanJackLouttit
@NathanJackLouttit 6 жыл бұрын
I'm excited to see the content you make; the Computerphile videos you speak in are a joy to watch.
@andreinowikow2525
@andreinowikow2525 7 жыл бұрын
This channel is golden. Keep up the good work!
@Frumpbeard
@Frumpbeard Жыл бұрын
Damn Rob, that final example sounds a lot like "through inaction, allow a human being to come to harm". As the robot is going to get tea, it sees someone outside smoking a cigarette. Preventing them from smoking would be safer than not. You get irritated after waiting a bit and go to check on your tea, and you see your robot passing a talking stick around in a circle of people that smoke, all tied to their chairs.
@Zach-si1gf
@Zach-si1gf 6 жыл бұрын
Very good breakdown, in laypeople terms while still keeping the nuance of a deep subject. Subbed!
@ZachAgape
@ZachAgape 4 жыл бұрын
Thanks for your vid! There were some interesting beginnings of solutions for safety there, I enjoyed that a lot!
@almostbutnotentirelyunreas166
@almostbutnotentirelyunreas166 6 жыл бұрын
Robert is SO onto this! How is it possible that none of the 'BIG' players in AI development post any similar considerations? The silence is deafening and incredibly disturbing. Or has the future already been written?
@Fluxquark
@Fluxquark 5 жыл бұрын
This video is a great metaphor for corporations under capitalism: They will destroy almost anything they can get away with in order to maximise their profits. See the environment, for example.
@jopmens6960
@jopmens6960 4 жыл бұрын
Very interesting. It sounds to me like the example of the fan shouldn't matter as long as the AGI can keep up with processing the state(s), because whatever you defined should only be instrumental in its coding to the extent that it doesn't matter; and even when it does matter, like something getting into the fan, both "the fan is ON" and the emergent properties of the changing angle of the blades inherently suggest danger for anything that interferes with it (when properly coded).
@JinKee
@JinKee 4 жыл бұрын
In the US armed forces there are 11 general orders and the entire UCMJ, which serve as background reward functions (or punishment functions, which are reward functions with negative rewards) specifying negative side effects that commonly come up. This isn't to say that soldiers are robots, but that every legal order sits in a context of background orders and laws that are always in force.
@LimeGreenTeknii
@LimeGreenTeknii 5 жыл бұрын
Another problem I came up with when it comes to the world state metric: stealing an object would probably be rated as a smaller change than paying for an object. In one instance, one object is being moved. In the other, not only is the object being moved, but the money is changing hands too.
@HebaruSan
@HebaruSan 6 жыл бұрын
If the robot concludes that it can't make the tea without running over the baby (maybe the baby is too in the way and can't be gotten around), we would want it to abort the task and report failure. That's not achieved by a simple "change as little as possible" weighting, since as long as it's allowed to make SOME changes, establishing necessity makes anything permissible. It needs to know relative weightings of side effects and which ones are simply unacceptable.
@NathanTAK
@NathanTAK 6 жыл бұрын
If you weight "squish the baby" as a high enough change, the robot will conclude "Well, I get 10 points for making tea, but lose 10,000 points for squishing the baby. I can't make the tea without squishing the baby, so my options are (a) do nothing, net 0 points, or (b) make tea, net -9,990 points. I'll go with (a)."
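A toy version of that arithmetic (all numbers made up), showing the agent choosing the best-scoring option and falling back to doing nothing when every route to tea carries the huge penalty:

```python
# Toy version of the scoring above (hypothetical numbers): the agent picks
# whichever option totals best, and "do nothing" wins whenever every route
# to the tea involves an unacceptable side effect.

options = {
    "do_nothing":            {"tea": 0, "squished_babies": 0},
    "make_tea_around_baby":  {"tea": 1, "squished_babies": 0},
    "make_tea_through_baby": {"tea": 1, "squished_babies": 1},
}

def score(outcome):
    return 10 * outcome["tea"] - 10_000 * outcome["squished_babies"]

best = max(options, key=lambda name: score(options[name]))
print(best, score(options[best]))   # make_tea_around_baby 10
# Delete the safe route and do_nothing (0) beats make_tea_through_baby (-9990).
```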
@danielrhouck
@danielrhouck 4 жыл бұрын
One thing I’ve seen which is similar to this but not covered in (your videos about) the paper is reducing the *amount of entropy increase*. Hurting the baby or breaking the vase cause large increases in entropy; they are hard to undo precisely. Heating the water for the tea also involves increasing entropy, but not by as much and is harder to avoid.
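A hypothetical toy of that irreversibility idea (real proposals use reachability or entropy estimates; this only shows the shape): changes the agent could never roll back are charged much more than ones it could.

```python
# Hypothetical sketch: charge more for changes the agent could never undo
# than for ones it could. (A stand-in for proper entropy/reachability measures.)

UNDOABLE = {
    "kettle_full": True,     # can refill it
    "tea_made": True,        # can pour it away
    "vase_intact": False,    # cannot un-smash it
    "baby_unharmed": False,  # cannot un-hurt them
}

def irreversibility_penalty(initial, final):
    penalty = 0.0
    for feature, before in initial.items():
        if final[feature] != before:
            penalty += 0.1 if UNDOABLE.get(feature, False) else 1.0
    return penalty

initial = {"kettle_full": True, "tea_made": False, "vase_intact": True, "baby_unharmed": True}
print(irreversibility_penalty(initial, {**initial, "tea_made": True}))                        # 0.1
print(irreversibility_penalty(initial, {**initial, "tea_made": True, "vase_intact": False}))  # 1.1
```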
@chocokittybo
@chocokittybo 4 жыл бұрын
The first way I saw all the tea examples you said to think about breaking is the robot deciding that the quantity of liquid in the milk container matters more than the fact that all the liquid is milk, leading to it topping the milk container up with a volume of water equal to the milk it removed. (Or, if there are two milk containers, it starts pouring milk back and forth to try to make sure they both have the amount that was in them before the robot started making tea.)
@mmimmi1568
@mmimmi1568 4 жыл бұрын
I lost it at "So that's pretty neat, like, we've added this one simple rule and the thing's already better than some of the house mates I've had" 🤣🤣
@bashredpath
@bashredpath 5 жыл бұрын
I really like your videos. Makes me think about solutions, but that's just my logical mind at work. I always seem to need to solve everything.
@Nerdnumberone
@Nerdnumberone Жыл бұрын
Being observed by another complex agent (such as an animal or human) can change a world state in numerous, difficult to predict ways, especially if you project the difference farther into the future. I'm now picturing ninja tea robots that make every attempt to avoid being observed while performing their task in order to minimize side effects. These AI thought experiments are great inspiration for fictional AGI character for sci-fi stories. Even "safe" AGI could have some interesting quirks based on their alien value systems. A perfectly functional AGI that doesn't cause the apocalypse (or, through inaction, allow a predicted apocalypse to happen) would be fundamentally different than humans. Most fiction either makes advanced AI virtually indistinguishable from humans or makes them similar to autistic humans. An advanced chatbot can hold a semi-realistic conversation, so a sci-fi AGI could very well be more eloquent than the average human. I can't imagine many utility functions that wouldn't benefit from the ability to socially operate with humans effectively.
@wkingston1248
@wkingston1248 6 жыл бұрын
If your roommate makes tea and goes to the bathroom, it is arguable that just taking their tea changes the world state less than making tea itself. That could be an issue where a negative outcome is the one closest to the original world state.
@bscutajar
@bscutajar 5 жыл бұрын
Man, each one of this guy's videos is incredibly interesting.
@Nulono
@Nulono 4 жыл бұрын
Wouldn't the robot's simulation of "what happens if I sit here and do nothing" almost always include "the human is confused and tries to troubleshoot me"?
@veggiet2009
@veggiet2009 6 жыл бұрын
I think, and I don't know if you've covered this in a future video, what is needed is an algorithm for "intent recognition": what is the intent of the roommate pouring coffee, what is the intent of the cars driving on the highway? By shifting analysis from world states to world intents, you can begin to give your robot a sense for judging motivations, both inward and outward, and you could create a reward function for keeping motivations in balance: rewarding extra points if others' motivations are allowed to succeed, but not necessarily penalizing if they fail (maybe the score is lowered a bit, a sympathy score if you will). I think a start for intention estimation would be training a neural network to predict where a user will click on the screen based on their mouse movements and/or their gaze. If the cursor moves toward a button and the algorithm predicts whether it will press that button, and it's correct, then it has correctly judged the intent of the user. A similar scheme holds for the tea-making robot: if a person is in the kitchen, what are they there for? Making coffee, or cooking? Oh, they are getting out filters, they must be making coffee. If this intent is judged correctly and the person's actions are successful, my score goes up a tiny bit and I allow this person to continue. If the person's intent is judged incorrectly, I'd better relearn some things. If the person's intent is judged correctly but they are not successful, I may lose a few points; maybe I will help the person, then perform my own intent. Then, secondarily, if there is no intent measured, world states are to be honored.
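A very small stand-in for that click-prediction idea (the comment suggests a neural network; this hypothetical sketch just uses the cursor's direction of travel to guess intent, which a real system would replace with a learned model):

```python
# Hypothetical sketch of click-intent guessing: predict whether the cursor is
# heading for a button from its recent movement, then check the guess against
# what the user actually does.

import math

def heading_towards(positions, target, tolerance_deg=25):
    """positions: recent (x, y) cursor samples; target: button centre."""
    (x0, y0), (x1, y1) = positions[-2], positions[-1]
    vel = (x1 - x0, y1 - y0)
    to_target = (target[0] - x1, target[1] - y1)
    norms = math.hypot(*vel) * math.hypot(*to_target)
    if norms == 0:
        return False
    cos_angle = (vel[0] * to_target[0] + vel[1] * to_target[1]) / norms
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return angle < tolerance_deg

button = (200, 50)
trace = [(10, 300), (60, 240), (110, 180), (150, 130)]
print(heading_towards(trace, button))   # True: predict the user intends to press the button
```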
@akmonra
@akmonra 5 жыл бұрын
Another issue is it seems making predictions about the future would get increasingly complex given the task and factors involved, until the computation needed for such predictions would be greater than the amount of matter in the known universe.
@triton62674
@triton62674 7 жыл бұрын
Great examples of possible side-effects of AI
@himselfe
@himselfe 6 жыл бұрын
Which aspects of an object's state are important is dependent on context. If you're trying to change the temperature of the room, or avoid slicing your appendages, then the important aspects of the fan's state is whether the blades are moving and at which speed, their particular orientation does not provide valuable information to achieving your goal. In humans, this is where experience and imagination (or creative intelligence) come into play. Intelligence is adaptive, an intelligent system must have a function to be able to determine the value of something, not simply know it, and it must care about not only the value of an action, but also the value of measuring different aspects.
@count_of_darkness5541
@count_of_darkness5541 5 жыл бұрын
It is hard to achieve as well, but to operate in a human-like manner the robot must learn to attribute some changes to itself and other changes to the world. Yes, it causes a lot of new problems, like "was it me who killed him, or the bullet itself?". On the other hand, now you only need to simulate the changes in the objects you are going to interact with (and whatever chain of changes those interactions may produce, where the less certain you are about the next step, the less responsible you are for its outcomes).
@WisdomThumbs
@WisdomThumbs 5 жыл бұрын
One way that simple rule ("limit changes to the environment") could go wrong, I think, is that it makes cleaning up pre-existing messes more difficult. And it'll give you hell as the floor gets more and more worn out, or as cups crack, or when any number of other little, unavoidable changes build up. EDIT: okay yeah, I didn't think about Next-Door-Susan or kids or pets causing changes in the environment, and the AI responding to *that.*
@margaretkvinnherad8952
@margaretkvinnherad8952 19 күн бұрын
Thank you for the amazing work you do for humanity!
@iliaslerias7374
@iliaslerias7374 6 жыл бұрын
The effect your videos have had on how I think about AI is perhaps not what you might have anticipated. Now I'm not an expert on the subject (far from it) and my model for thinking about AI had more to do with science fiction than computer science. But you've made me realize just how dependent the first AIs would be on how we programmed them. Especially the last example about the rotating fan made me think of a situation where the robot makes the tea, approaches you, but only gives you the tea when the fan is where it was when It started moving towards the kitchen. Now to you this is a problem and I can see why, but it also shows how utterly dependent these things would be on us. Surely you can have catastrophic outcomes because your world model is not detailed or accurate or inclusive enough, at the same time it seems unlikely that we can jump from Facebook chat-bots to an AI that has goals and motivations the way we understand and experience them and that in my opinion limits how much damage it can cause precisely because it's world model is lacking compared to ours. I noticed that you avoided talking about consciousness at all and I think can guess why. Either you think that the word is poorly defined so you cannot really talk about it, or you think that being conscious basically means having a model of reality that is similar in detail to our own and the capacity to experience and interact with it. Am I close? If so, do you think that the stamp collector is conscious? If not, do you think we could eventually make something like that? Great content, keep it up!
@JakeFace0
@JakeFace0 5 жыл бұрын
A problem I thought of with the "if I do nothing" distance metric is that the first robot to make a cup of tea will likely be a momentous occasion. News articles will be written, the engineers will go on talk shows and their research will result in a huge impact on the AI research world at large. So from a robot's perspective the least impactful way to make a cup of tea is to make it and then kill the witnesses, frame someone for it and then destroy all the research along with itself. Or, less grimly, make the tea when no one is looking so nobody suspects it was made by an AI.
@gabrote42
@gabrote42 3 жыл бұрын
5:28 My first two instincts are that it starts stealing tea from "out of bounds", or that it tries to prevent you from doing anything that changes the world state. 6:59 I am still young, so I don't see anything obvious. Maybe it goes wild if somebody suddenly walks into the room from "out of bounds"? Or maybe if it is driving?
@JannisAdmek
@JannisAdmek 6 жыл бұрын
wow, I love this channel, you are awesome!
@mohammadaminarmani3190
@mohammadaminarmani3190 4 жыл бұрын
Very great job sir, God bless you.
@davood123
@davood123 3 жыл бұрын
Basically we need a "mindfulness" algorithm for AI systems
@zuupcat
@zuupcat 5 жыл бұрын
5:21 House maids or house mates? Either you are extremely posh or you lived in a dorm.
@bensmith9253
@bensmith9253 5 жыл бұрын
This was fantastic!!
@conornorris6815
@conornorris6815 5 жыл бұрын
What if the first reasonably advanced AGI we made was based on this "don't change things" approach, but it got obsessive and punished any changes in technological level, trapping us in a weird, unchanging, yet still quite high-tech world?
@unw3ee
@unw3ee 10 ай бұрын
Thanks for the video. I have a beginner question: can an artificial intelligence rewrite its own objective function, or can it only manipulate parameter values?
@adelarscheidt
@adelarscheidt 7 жыл бұрын
Hey Robert, don't you think those problems disappear when an AGI learns through neural networks? The same way we just "know" stepping on the baby is bad: not by manually assigning a value to it, but because we persistently strengthened the HUMAN - HARM - BAD - DON'T network. You know? There is a value, but it isn't assigned. And instead of assigning "not care" to unspecified variables, maybe the network has a way of grouping families of events based on the nature of outcomes it has previously learned, pretty much like a human brain does. We abstract real-world situations and apply the same principles we learned in completely new situations, which surely isn't always perfect; it's only good enough to secure the species. But why wouldn't an AGI be able to do the same?
@maximkazhenkov11
@maximkazhenkov11 7 жыл бұрын
What you're suggesting is known as the detached lever fallacy: lesswrong.com/lw/sp/detached_lever_fallacy
@adelarscheidt
@adelarscheidt 7 жыл бұрын
+maximkazhenkov11 quite some food for thought, thank you
@Xartab
@Xartab 7 жыл бұрын
The trouble with this idea is that humans _do_ have "manually assigned" values, in that our brains have physiological responses to certain situations that give a "pleasure" or "pain" response. Mirror neurons help with making this principle work when applied to others, but basically our values are coded in the genome.
@__-cx6lg
@__-cx6lg 7 жыл бұрын
Adelar Scheidt Good point. I guess the problem is that the learned behaviors wouldn't necessarily be in 100% agreement with humans. I mean, neural networks are notoriously complex and it's literally impossible to understand what exactly is going on. If we rely on "morality" learned through neural networks, then we can't really be sure that it will match up with what we want it to do. I mean, what are the chances that an AI would come to have the exact same moral sense in every situation as a human? So I think that the unpredictability of neural nets lends credence to the idea that these kinds of problems should be solved manually by hardcoding them in.
@karlkastor
@karlkastor 7 жыл бұрын
I think a better point would be: If we train a machine learning algorithm (like a neural network) on data of what humans do in different situations, wouldn't it also act as if it had human values?
@chebi97
@chebi97 2 жыл бұрын
For your second issue I was thinking about unpredictable stuff that the robot may not react properly to. Say you win the lottery while it’s getting your tea. That’s probably a big change to your environment that it’s not likely to have predicted, right? Won’t it try to revert it, like breaking your ticket? If it’s only using a distance function and a predictor, how does it know, whether a change was caused by it or not? Won’t it go “well, I made tea and this huge thing changed that I wasn’t expecting it to, I must have caused it myself, need to fix it”.
@Xdonker
@Xdonker 7 жыл бұрын
Nice, I actually liked it. When can I expect the next video?
@SamB-gn7fw
@SamB-gn7fw 4 жыл бұрын
Very good video!
@TheSameDonkey
@TheSameDonkey 3 жыл бұрын
The problem stated here with respect to driving seems to me to be an issue of time/activity partitioning. If the evaluation is continuous then it would fail as described (or perhaps get stuck in some sort of recursive loop), but if it's a setup where it compares to "do nothing from when the activity starts", it should be able to avoid at least that type of problem. What I do see as problematic, having thought about it for an hour or so, is the value scaling of various changes to the world.
@roger_isaksson
@roger_isaksson Жыл бұрын
Every action taken in the world has its (in)advertent effects and its associated (acceptable?) cost. The default operational costs: 1. Energy (moving, boiling water) 2. Consumables (tea bags) 3. Wear and tear on the agent 4. Wear and tear on the world. All of these costs are correlated: if a robot moves smoothly and gracefully, it is likely that it consumes less energy, wears more slowly and is gentler to its surroundings. However, it is likely that it will serve that cup of tea significantly slower than if it were merely trying to minimize the time to utility. A fire-extinguishing robot, on the other hand, might appear extremely brutish as it barges through walls and doors trying to save property and lives. It is safe to say that a robot making a cup of tea is doing so in an environment where it could incur significant cost, whereas a fire-extinguisher bot operates in an environment where costs are occurring without its involvement.
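That cost framing could be sketched as a weighted sum (hypothetical weights and numbers throughout): the same two plans score very differently under tea-robot weights and fire-robot weights.

```python
# Hypothetical sketch: each plan is charged for energy, consumables, self-wear
# and world damage, with weights chosen per situation (a fire-fighting robot
# weighs damage far lower than a tea robot would).

def total_cost(plan, weights):
    return sum(weights[k] * plan[k] for k in weights)

tea_weights  = {"energy_kj": 0.01,  "teabags": 0.5, "self_wear": 1.0, "world_damage": 50.0}
fire_weights = {"energy_kj": 0.001, "teabags": 0.0, "self_wear": 0.1, "world_damage": 0.5}

gentle  = {"energy_kj": 40, "teabags": 1, "self_wear": 0.1, "world_damage": 0.0}
barging = {"energy_kj": 25, "teabags": 1, "self_wear": 0.5, "world_damage": 2.0}

print(total_cost(gentle, tea_weights), total_cost(barging, tea_weights))
# 1.0 vs 101.25: for the tea robot, barging through things is ruinously expensive.
print(total_cost(gentle, fire_weights), total_cost(barging, fire_weights))
# 0.05 vs 1.075: for the fire bot, damage barely matters, so it is free to be fast.
```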
@manicdee983
@manicdee983 6 жыл бұрын
It's hard enough as a human to make a cup of tea: - leave everything how you found it - put things back where they belong - leaves or bags? - on the way to the kitchen I see a hazard (e.g. a custard tart perched precariously on the edge of a table) do I resolve the hazard or stick to making tea? The real safety issues will arise in the attempts to resolve conflicting heuristics. - do the thing - use the minimum number of actions to do the thing - make the fewest possible changes to world state - behave in a predictable manner - actively minimise hazards - some objects belong in certain storage areas - some objects have "safe" states (scissors in the holder in the sewing box are safe, any other condition is a hazard) - some objects have "hazardous" states (a pie sitting over the edge of the table is a hazard, any other condition is safe) - some objects have a nebulous "in use" state (pens and notepads, a stereo providing background music) - some objects have a nebulous "stowed" state (my laundry) Should the kettle be empty or full when not in use? What about heating the water in the tea cup in the microwave? Should the AI seek other methods of heating the water? What level of analysis is acceptable? Is our AI allowed to invest in energy company shares to reduce the cost of a cup of tea? What about participating in legislature to affect the price of electricity or the long term sustainability of electricity supply? What about participating in soil rehabilitation on tea plantations? What about reversing the entropy of the universe so that you get to enjoy that cup of tea forever?
@NawiasemPiszac
@NawiasemPiszac 7 жыл бұрын
OK, sorry for bringing this up again. I've seen your video on Asimov's laws of robotics and how they wouldn't work, but minimizing changes in world state (i.e. don't squish a baby) seems to me very close to the 1st law. Of course the implementation would be vastly different from an idealized fictional rule set, but the end result would be, IMHO, a bit similar. Cheers
@RobertMilesAI
@RobertMilesAI 7 жыл бұрын
Well, for one thing, there's no "or by inaction..." clause here at all. This kind of impact reducing agent will absolutely allow a human to come to harm by inaction
@how2pick4name
@how2pick4name 6 жыл бұрын
You could turn the list around and say, you can only have physical contact with: the floor, tea kettle, etc. If you are about to have physical contact with anything else, avoid it or switch off if you can't.
@tylerm8143
@tylerm8143 3 жыл бұрын
Spoiler for 2001: A Space Odyssey. This makes me think of the HAL 9000 computer. The AI is hard-wired into the Discovery One spacecraft and has the ability to control most of the spacecraft's functions. HAL is the only one to get a briefing on the mission's true purpose; the human crew is lied to about it. Later in the movie the crew begin to get paranoid about HAL and talk in private about shutting him down. HAL overhears this and decides to terminate the crew because they are jeopardizing the mission. This raises the question: is HAL really evil, or just a victim of his programming? The "get tea" example the video uses is the same concept, but instead the goal was to reach Jupiter's orbit; HAL only cares about getting to the destination, and whatever he does to get there is justified in his mind.
@89sanson
@89sanson 3 жыл бұрын
How about minimizing the amount of energy required to complete the task? That should avoid all side effects that are avoidable, as it will always be more energy efficient to do the task without the side effect.
@nielsveekens6827
@nielsveekens6827 6 жыл бұрын
This is really interesting!
@ArtamisBot
@ArtamisBot 2 жыл бұрын
This is one of my favorite channels about AI... but I'm scared to comment on any of these videos lest I look silly in front of an entity far more knowledgeable than myself...
@Uriel238
@Uriel238 3 жыл бұрын
I suspect an AI would keep a catalog of all the things in its environment categorizing things into types, e.g. (in our household setting) other robots: avoid and don't interfere; Inhabitants: track, avoid and note wellness status; articles on kitchen counter: check for cleanliness / functionality and return to storage... dangerous objects not in use would be handled differently than fragile objects which would be handled differently than commonly used objects and so on.
@DieMasterMonkey
@DieMasterMonkey 6 жыл бұрын
Epic! Instant sub.
@petrkinkal1509
@petrkinkal1509 6 жыл бұрын
Also, which is a bigger change: moving a 1000 kg boulder by 1 m, or squashing someone's head? (It should be obvious what I mean in general.)
@SolomonUcko
@SolomonUcko 4 жыл бұрын
For cases where it's possible, what about simplifying the environment? For things that interact with the real world, it's probably not possible, but for anything digital, you could probably restrict its actions to a safe subset.
@Tracks777
@Tracks777 7 жыл бұрын
Great content. Keep it up!
@donaldhobson8873
@donaldhobson8873 7 жыл бұрын
The robot goes shopping for teabags, puts one in the box and throws the rest away.
@bronsoncarder2491
@bronsoncarder2491 2 жыл бұрын
Reducing the problem to kind of an absurd level (kind of like you did with the stamp collector): If your function literally just said, "Do this task, and then return the world state to how it was"... Well, a smart enough AGI would freak out. I'd actually be interested in hearing what you think would happen in that situation, because you've given the program a task that is literally impossible. Because of entropy, there is no way to make that tea back into water, to return the flavor to the tea leaves, to separate the sugar back into individual grains (especially not the same grains they were before). You said that the computer might refill the pitcher, but... Would it? I mean, as you said, it massively depends on how it is coded (and, of course, no one would actually code it this way, this is an incredibly extreme example) but... To a robot, who can actually see the differences, is one pitcher of water exactly identical to the other? Maybe this water has .0001% more granite content? To us, "yep, that's water." To a robot, "er... no, those two are nothing alike. That one has more granite."
@sebbes333
@sebbes333 6 жыл бұрын
5:05 The problem is that the robot will go crazy and try to put back the dust it has moved while moving, and try to put back that bird that happened to fly past the window, and so on...
@lubbnetobb
@lubbnetobb 7 жыл бұрын
I get the feeling creating a general AI safely is about figuring out a way to program common sense.
@amaarquadri
@amaarquadri 6 жыл бұрын
I think I have an argument that shows that the concept of having a distance metric and subtracting is no better than the original approach of prohibiting every undesirable side effect. Suppose the robot has 2 choices: 1) knock over a vase on the way to make you tea, or 2) trample a baby on the way to make you tea. (For the sake of argument, assume that no other option is possible.) Between these two, clearly the more desirable option is to break the vase. In order for the AI to understand that, the squished-baby world state must be further (according to the distance function) from the current world state than the broken-vase world state is. Similar scenarios can be proposed for other undesirable side effects. Thus, in order for the AI to correctly choose between any given two undesirable side effects, the distance function must essentially be a ranking of all possible future world states. Moreover, suppose two world states were omitted from this list (because programmers can't think of every possibility). If the AI were faced with a choice between these two world states, it would be indifferent between them (perhaps it would trample the baby instead of the vase simply because that path to the tea bags is shorter).
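A hypothetical sketch of that argument: a weighted distance function is effectively a ranking over outcomes, and any feature the programmers forgot to weight drops to a default the robot barely cares about.

```python
# Hypothetical sketch: the weighted distance function acts as a ranking over
# outcomes, and forgotten features fall back to a "didn't think of it" default.

WEIGHTS = {"vase_intact": 5, "baby_unharmed": 10_000}   # nobody listed "cat_unharmed"
DEFAULT_WEIGHT = 1

def weighted_distance(before, after):
    return sum(WEIGHTS.get(k, DEFAULT_WEIGHT)
               for k in before if before[k] != after[k])

before = {"vase_intact": True, "baby_unharmed": True, "cat_unharmed": True}
print(weighted_distance(before, {**before, "vase_intact": False}))    # 5
print(weighted_distance(before, {**before, "baby_unharmed": False}))  # 10000
print(weighted_distance(before, {**before, "cat_unharmed": False}))   # 1: the omission shows
```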
7 жыл бұрын
love the vids
@harold2718
@harold2718 4 жыл бұрын
After giving you the tea, the robot forces you to go back to your computer and rewrite the robot's software, because that's what you would have done if the robot had done nothing (and you'd have concluded that it had a bug). It also puts back the biscuit that you took out, because you wouldn't have taken it if no tea had been made. No biscuits allowed.
@errorlooo8124
@errorlooo8124 7 жыл бұрын
I'm sorry this looks so similar to the minimal-change idea explained in the video; I commented before watching it, and it just so happens the video discusses something quite similar. OK, so: what if we give the robot a penalty whenever it damages something? For example, we can rank its penalties like bumping into something: -5, denting it: -10, bumping hard into it: -20, etc., and those penalties grow the closer it has to get to an object. If there is nothing around, the "bump into something" score might only be -1, but as more objects enter the scene it gets bigger, like -10. And what if we have "score zones" that tell the robot that if it's within, say, 10 cm of the coffee machine, the score for making coffee gets a bit bigger, like +2? Then it wants to make the coffee but also doesn't want to destroy things, so if there's a baby in the way, the -10 it would get for bumping into it outweighs getting closer to the coffee machine, and it will go around. Once it's closer to the coffee machine, the coffee score has grown by +2, so getting closer is suddenly not as bad; even though the bump penalty is increasing, the coffee score is too, so the robot will take the risk and make coffee without destroying anything. There are some flaws to this, but I don't think they're major: for example, when it gets closer and there's an object in the way, it might bump into it. If we get the proportions of how much the scores increase and decrease just right, I think this is a reliable approach, but I know it probably isn't, because I'm no genius, and I was wondering how it could go majorly wrong? Sorry for suggesting this, I know it's probably not good, but I just wanted to share my idea, and maybe it could help in making an AI that won't take over the world. Thanks for spending your last five minutes reading this; I thought it was an interesting way to go about it. BTW, I know this sounds similar to "don't change the world", but this solves the problem of the robot trying to maximize not changing anything; this time it tries to maximize not destroying things, which I think is better.
@asudhdhabei1051
@asudhdhabei1051 5 жыл бұрын
imagine somebody getting revived and the robot thinking: reviving is changing the environment, can't have that happen!
@icebluscorpion
@icebluscorpion 3 жыл бұрын
Can the non-specified objectives/parameters be set to negative? All the "don't care" things in your list at 1:59, to make the AI safer?
@onesimpletrick3448
@onesimpletrick3448 3 жыл бұрын
“Here’s your Tea.. also I rolled back the clock on the microwave to the time that I started”
@SonOfMeme
@SonOfMeme 4 жыл бұрын
Here's a concept (and I'm sorry if this comes up in a later video, I'm just getting to them): whitelisting. As in: your goal is to make tea, but you may only move yourself, move the kettle, use n tea bags per cup, etc. Then we just try to run the AI on as few whitelist rules as possible (if the list is long enough, or task-conditional, oh my!, this could even apply to general-purpose AI). The whitelist would be much easier for a human to write and wrap their head around, since it limits the possibility space from the start and only lets through actions that a human has carefully considered, instead of giving the AI free rein of the universe except for the areas our puny squishy minds were able to fathom and deem unsafe.
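A minimal sketch of that whitelisting idea (hypothetical action names, not from the video): the planner may only emit actions that appear on a short, human-audited list for the task at hand.

```python
# Hypothetical sketch of task whitelisting: reject any plan that uses an
# action not on the human-audited list for the current task.

TEA_WHITELIST = {"move_self", "pick_up_kettle", "fill_kettle",
                 "boil_kettle", "use_teabag", "pour_water", "serve_cup"}

def check_plan(plan, whitelist):
    blocked = [a for a in plan if a not in whitelist]
    if blocked:
        raise PermissionError(f"plan uses non-whitelisted actions: {blocked}")
    return plan

print(check_plan(["move_self", "fill_kettle", "boil_kettle", "serve_cup"], TEA_WHITELIST))
# check_plan(["move_self", "move_baby", "serve_cup"], TEA_WHITELIST)  # would raise
```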
@Kernog
@Kernog 5 жыл бұрын
Another issue I'm thinking about: what if the conditions to make tea are not met (water lacking in the kettle, no more tea, heater is broken), and the robot needs to alter its environment in order to complete its mission (respectively: fill the kettle, order/buy some tea, fix the heater)?
@josh-rx6ly
@josh-rx6ly 6 жыл бұрын
Also, is it possible that since there was a tea bag in the tin before it made tea and now there is not, it would try to put the wet tea bag back in the tin to minimise the change in the number of tea bags?
@Tempnamious
@Tempnamious 5 жыл бұрын
The answer is Evolution. In this case directed Evolution. I highly recommend talking with a geneticist - even if they don't know what you're talking about. Evolution by directed means over many generations will produce the desired goal without any of the nasty side effects as these will be "bred" out of the population of AI's due to selections pressures. Please just carefully consider the environment in which you direct the evolution - it is an extremely powerful tool . This is already used in computer science very regularly from what I understand and produces spectacular results. Combine your extensive knowledge with that of a Geneticist and that is where the answer will lie. The reason for this is that we want our technology to interact with out environment in a similar fashion to the way a "good Actor" from our species would, and we have evolved to this point. Technology has an advantage in that it can go through many more generations considerably more quickly than we are able to and therefore significantly speed up the process of Evolution - Evolution still remains the key.
@smob0
@smob0 6 жыл бұрын
What if you allow the tea robot to observe the kitchen for a while by giving it a few weeks of security video that a person has gone through and flagged with the desired/undesired world states? (E.g. it may be typical for people to leave a mess when they make tea, but you can flag that as bad.) You could also have some threshold where, if the world around it is changing too much, it could ask for help or stop doing what it's doing.
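A hypothetical sketch of that proposal: compare the current world state to states a human has already flagged in old footage, and stop to ask for help when nothing labelled is close enough to trust.

```python
# Hypothetical sketch: nearest labelled state decides, and anything too far
# from every labelled example triggers "ask a human".

LABELLED = [  # (state, label), hand-flagged from the security video
    ({"counter_messy": 1, "kettle_on": 1}, "ok"),
    ({"counter_messy": 0, "kettle_on": 0}, "ok"),
    ({"floor_flooded": 1, "kettle_on": 1}, "bad"),
]

def dist(a, b):
    keys = set(a) | set(b)
    return sum(abs(a.get(k, 0) - b.get(k, 0)) for k in keys)

def judge(state, threshold=2):
    nearest_state, label = min(LABELLED, key=lambda ex: dist(state, ex[0]))
    if dist(state, nearest_state) > threshold:
        return "ask a human"   # too unfamiliar to trust either label
    return label

print(judge({"counter_messy": 1, "kettle_on": 0}))     # "ok": close to something approved
print(judge({"floor_flooded": 1, "kettle_on": 1}))     # "bad": matches a flagged state
print(judge({"fire": 1, "smoke": 1, "kettle_on": 1}))  # "ask a human": nothing like it on file
```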
@ToriKo_
@ToriKo_ 6 жыл бұрын
The thing that I thought would go wrong: what if the AI made a lot of tea, very efficiently, so that the massive amount of tea it made makes the WorldChange negligible? Something I thought of that is more likely to be wrong: what if the AI improved itself before making any tea, so that it could do so more efficiently? This of course will lead to the myriad of other problems you mentioned in your Computerphile videos.
@Cyberlisk
@Cyberlisk 2 жыл бұрын
My main question here is, how do you define how distant one state of the world is from another? For example, if you have an AI in the banking business, a simple flip of a bit - a really tiny physical change - can mean a lot of money. The AI would need a full understanding of the human semantics of each state. There are millions of bacteria and viruses flying around; how can an AI know which one is harmless and which one can cause a global pandemic? As in chaos theory, very small changes in the state can cause huge differences later, which is very difficult to predict even for an advanced AI.
@Gukslaven
@Gukslaven 6 жыл бұрын
For the second proposal, I thought a problem might be, it could guess the future world state incorrectly. For example, it guessed (60% chance) you would knock over the vase. You did not, but then of course it knocks over the vase after making the tea to be as close to its guess as possible.
@TheJavaMonkey
@TheJavaMonkey 5 жыл бұрын
Gotta appreciate the “Mo’ Money, Mo’ Problems” bit at the end.
@fenderrexfender
@fenderrexfender 4 жыл бұрын
In my time dealing with relational programming I find that states need defined periods, and not all represented states have the same period. There is a cyclical nature that must be deduced. Producing an edge to trigger on, calculated by a state machine watching another state machine, makes me wish I had more experience with RTOS and Verilog.
@flymypg
@flymypg 7 жыл бұрын
What about explicitly discussing goal decomposition? Isn't "make tea" actually a very high-level goal composed of an ordered sequence of other goals? What about training for "travel safely to destination"? And that, in turn, would have its own decomposition. The key issue would then seem to be ensuring that decomposed goals don't conflict with the high-level goal while still achieving their own goals. When this happens, either the high-level goal must be retried or deferred, or the lower-level goal must be redefined and retrained. So, wouldn't it be an unnecessary problem/complication if a high-level goal had to "worry" about the safety of lower-level goals? How should this be monitored/managed? At what point can/should we define "trust" in the safety domain?
@flymypg
@flymypg 7 жыл бұрын
I was hoping to trigger discussion concerning why creating a system by combining "safe" components won't work! The underlying problem is "emergent behavior": Parts that are "safe" in isolation can acquire new unsafe behaviors when combined. There are ways to mitigate such effects, but IIRC there is a proof that they cannot be completely eliminated in systems with sufficient complexity. That is to say, only "relatively simple" systems can be proven safe. So, if you want safety, keep your AI relatively dumb.
@samasterchef
@samasterchef 6 жыл бұрын
We often talk about augmenting the human mind with artificial intelligence. What if, for the sake of general AGI safety, we augmented the AGI with the human mind in a read-only manner? I.e the model of reality that it understands is that of an actual human brain and it is incapable of learning from this model of reality to generate its own version? Or if it is to learn from this model of reality, it is grounded in the model as understood by the human mind. Switching on bidirectional communication (r/w) at a later stage could make for an interesting artificial selection evolutionary process.