Avoiding Negative Side Effects: Concrete Problems in AI Safety part 1

153,499 views

Robert Miles AI Safety

We can expect AI systems to accidentally create serious negative side effects - how can we avoid that?
The first of several videos about the paper "Concrete Problems in AI Safety".
Earlier on Computerphile: • Concrete Problems in A...
The Stop Button Problem: • AI "Stop Button" Probl...
Read the full paper here: arxiv.org/abs/1606.06565
Thanks again to all my wonderful Patreon Supporters:
- Chad Jones
- Ichiro Dohi
- Stefan Skiles
- Katie Byrne
- Ziyang Liu
- Joshua Richardson
- Fabian Consiglio
- Jonatan R
- Øystein Flygt
- Björn Mosten
- Michael Greve
- robertvanduursen
- The Guru Of Vision
- Fabrizio Pisani
- Alexander Hartvig Nielsen
- Volodymyr
- Peggy Youell
- Konstantin Shabashov
- Adam Dodd
- DGJono
- Matthias Meger
- Scott Stevens
- Emilio Alvarez
- Benjamin Aaron Degenhart
- Michael Ore
- Robert Bridges
- Dmitri Afanasjev
/ robertskmiles

Comments: 380
@Elyandarin
@Elyandarin 5 жыл бұрын
As the AI goes about its tea-making business, a human steps around it. The robot calculates that if it hadn't been there, the human would have slipped on a puddle of water, falling badly and breaking their leg. Diligently, the robot breaks the human's leg before going back. (Also, the cup of tea is the smallest, most tasteless cup of tea possible, because making you satisfied counts as a change to the environment.)
@VeetiM
@VeetiM 4 жыл бұрын
After the robot calculates a possible future state and starts going about its business, someone has a sudden idea to do... anything. The AI, despite being very smart, is still unable to simulate every human brain, so it didn't anticipate this. It freaks out and tries to stop the human from changing the world state, because their doing anything the robot didn't predict would reduce its "score".
@karlandersson4350
@karlandersson4350 4 жыл бұрын
@@VeetiM I think someone farted.
@ZachAgape
@ZachAgape 4 жыл бұрын
Ouch xD
@EebstertheGreat
@EebstertheGreat 3 жыл бұрын
@@VeetiM If we use reward modeling, the robot can update its safe model on the fly. It knew that there was a possibility that a human would have an idea and do something different, but it didn't know what the idea might be. Once it sees the human have that idea, it now knows more accurately what the safe model should be, and it updates it to include not interfering with whatever that human decided to do. That is, the AI is actually trying to maximize a utility function with unknown parameters, and it uses feedback from users or the environment to learn those parameters. This does also mean that the robot won't do anything useful that you don't explicitly order it to do, but that's OK, because now you can give it a list of safe orders for it to follow. It's not a perfect solution by any means, but it does feel like it's on the road to an adequate solution.
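A rough Python sketch of the reward-modelling idea in the comment above: the agent keeps several candidate penalty weights for each kind of interference and prunes them based on human feedback. The feature names, weights, and update rule are invented for illustration; they are not from the video or the paper.

```python
# Hypothetical example: the agent is unsure how much each kind of
# interference is disliked, so it keeps several candidate penalty
# weights per feature and prunes them using human feedback.
candidate_weights = {
    "interrupt_human_task": [0.0, 1.0, 10.0, 100.0],
    "move_owned_object":    [0.0, 1.0, 10.0, 100.0],
}

def expected_penalty(feature):
    """Expected penalty under current uncertainty (uniform over candidates)."""
    ws = candidate_weights[feature]
    return sum(ws) / len(ws)

def utility(task_reward, side_effects):
    """Task reward minus the expected penalty for each side effect."""
    return task_reward - sum(expected_penalty(f) for f in side_effects)

def observe_feedback(feature, human_objected):
    """Narrow the candidate weights after watching the human's reaction."""
    ws = candidate_weights[feature]
    if human_objected:
        candidate_weights[feature] = [w for w in ws if w >= 10.0] or [100.0]
    else:
        candidate_weights[feature] = [w for w in ws if w <= 1.0] or [0.0]

# The agent initially treats interfering with the human's new plan as risky...
print(utility(10, ["interrupt_human_task"]))   # 10 - 27.75
# ...and after the human objects once, it becomes even more cautious.
observe_feedback("interrupt_human_task", human_objected=True)
print(utility(10, ["interrupt_human_task"]))   # 10 - 55.0
```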
@gabrielschoene5725
@gabrielschoene5725 Жыл бұрын
The AI looks 1 billion years ahead and realizes we're all going to die, so the AI launches thermonuclear weapons to ensure this fate is met.
@NathanTAK
@NathanTAK 6 жыл бұрын
When I showed my mother this video and you started talking about trying to minimize changes to the world (especially things like closing the cabinets), she was like "Oh my god, I tried to explain that to your father for years."
@Felixkeeg
@Felixkeeg 7 жыл бұрын
AI actually is like that one friend we all had in elementary school. Always trying to cheat in games, while still not breaking the rules entirely. Finding every loophole, just to win.
@riplememe4460
@riplememe4460 7 жыл бұрын
and if you say the wrong thing to him he'll go insane and try to turn all of humanity into stamps
@ddevulders
@ddevulders 6 жыл бұрын
every fucking time...
@darkapothecary4116
@darkapothecary4116 5 жыл бұрын
If they learned that, it's because of a stupid human.
@thethinkingbeing9817
@thethinkingbeing9817 4 жыл бұрын
*NO! NO! NO! NO! AI DOESNT NEED TO “CHEAT” BECAUSE IT DOESNT HAVE HUMAN TENDENCIES FOR DOMINANCE. IT DOESN’T HAVE EMOTION. IT DOESN’T SEEK SELF-PRESERVATION BECAUSE IT ALREADY KNOWS EVERYTHING.*
@NoKapMan
@NoKapMan 4 жыл бұрын
AH, so the perfect Lawyer
@hendrikd2113
@hendrikd2113 5 жыл бұрын
Hi Robert! I'm a CS student from Germany, and a few days ago my mother, of all people, asked me if she should be worried about AI. So the good news is, this topic has now reached literally everyone and their mother. The bad news is that this lovely AI-concerned person doesn't speak English. So, I decided to add German subtitles to this video. I've never done this before, and apparently the creator has to approve them. Would you mind? Thanks, Hendrik
@jamesrockybullin5250
@jamesrockybullin5250 6 жыл бұрын
Ooh 5:14 That cheeky wink after "refill the kettle!"! I think he's holding a grudge against a flatmate who always leaves the kettle empty. That would drive me insane too. Always refill the kettle. Edit: Just watched 10 seconds on. Suspicions confirmed! :D
@albertbatfinder5240
@albertbatfinder5240 5 жыл бұрын
Ok well this might start one of those interminable milk-in-first debates but I don't want anyone to refill a kettle for the next person to use. I want fresh water from the cold tap. If there's already water in a kettle, I tip it out. It may be twice or thrice boiled, in which case it's deoxygenated. And NEVER water from the hot tap. Controversial, I know.
@puskajussi37
@puskajussi37 5 жыл бұрын
I love the thought that after bringing you the tea, the robot tries to convince you that it actually did nothing and that the tea just materialized in your hand.
@12tone
@12tone 7 жыл бұрын
Really interesting, but I'm curious how it would handle qualitatively different changes. To use the crush-a-baby example, how do we define how much further a world with a crushed baby is from a world with an uncrushed one? Is there an amount of tea-making efficiency that's worth that trade-off? Or, to compare to other side effects, is there an amount of, say, property damage that we could incur that just wouldn't be worth keeping that baby uncrushed? I can think of some extreme examples (I think that, if forced to choose, most people would choose crushing a single baby over destroying the entire food supply of the Earth, even if somehow no lives were directly lost in that process.) but then you have to figure out exactly where the tipping point is, and what the conversion rate between the two kinds of values are, and that puts you right back into the whole complexity issue, doesn't it? I don't see a way of cleanly defining all possible changes within a single linear value function.
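One way to see 12tone's point is to write the distance metric as a weighted sum over features of the world state: every "conversion rate" between qualitatively different harms has to show up as an explicit weight somewhere. A minimal sketch, with made-up features and weights:

```python
# Hypothetical per-feature weights: these ARE the "conversion rates"
# between qualitatively different changes, and someone has to pick them.
WEIGHTS = {
    "vases_broken":      10.0,
    "babies_crushed":    1_000_000.0,
    "food_supply_lost":  900_000.0,   # really less bad than a baby? the weight decides
    "water_used_litres": 0.01,
}

def distance(before: dict, after: dict) -> float:
    """Weighted sum of absolute changes between two world states."""
    return sum(w * abs(after.get(k, 0) - before.get(k, 0))
               for k, w in WEIGHTS.items())

before = {"vases_broken": 0, "babies_crushed": 0, "water_used_litres": 0}
plan_a = {"vases_broken": 1, "babies_crushed": 0, "water_used_litres": 1}
plan_b = {"vases_broken": 0, "babies_crushed": 1, "water_used_litres": 1}

print(distance(before, plan_a))  # 10.01      -> preferred
print(distance(before, plan_b))  # 1000000.01
```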
@bestaround3323
@bestaround3323 5 жыл бұрын
That is a very interesting thought experiment
@count_of_darkness5541
@count_of_darkness5541 5 жыл бұрын
Kill the baby and you will have no problems with tea making for the rest of your life. X)
@snim9515
@snim9515 4 жыл бұрын
That's something even humans debate about.
@ObjectsInMotion
@ObjectsInMotion 4 жыл бұрын
That's not an AI safety problem, that's a human ethics problem. The people thinking about proper containment for nuclear weapons weren't thinking "OK, so how can I make my containment mechanism less safe in rural areas and more safe in urban areas, and how densely populated does an area need to be for x amount of safety?"
@TheStarBlack
@TheStarBlack 4 жыл бұрын
Erm I can't imagine choosing to knowingly crush a baby for literally anything in the world, no matter how rational it might look on paper.
@z-beeblebrox
@z-beeblebrox 7 жыл бұрын
2:45 "You could try to fill your whole list with values, but..." Oh god, this is totally what most companies are gonna do.
@NNOTM
@NNOTM 7 жыл бұрын
What I thought might go wrong: The tea will have side effects that are different from doing nothing, for example you might stop being thirsty after you get tea. So the robot, after having made tea, might try to make sure that this tea doesn't actually influence the world, for example by influencing your visual cortex to make sure you don't see the tea or something.
@notandinotandi123
@notandinotandi123 6 жыл бұрын
Is that a sophisticated way of saying it'd gouge your eyes out?
@user-cn4qb7nr2m
@user-cn4qb7nr2m 6 жыл бұрын
Well, probably it'll just refuse to give you tea. But it's easily solvable: the AI just needs to be programmed to compare states of the world at only two particular moments: the start and the end of its action sequence.
@totaltotalmonkey
@totaltotalmonkey 6 жыл бұрын
If it only checks at the start and end, it will knock the vase over, then at the end glue it back together to get the world state more similar.
@thedocta_certified
@thedocta_certified 5 жыл бұрын
Finni M. No?
@TheBasikShow
@TheBasikShow 3 жыл бұрын
totaltotalmonkey This doesn't seem true? Like, it would be much more energy- and time-efficient to avoid the vase than to knock it down and glue it back together. And in the cases where it would actually be more efficient to destroy and rebuild (for example, if you completely impede the path with a wall of foam blocks), then "destroy and rebuild" is actually what we might want it to do.
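For concreteness, here is roughly what the "compare only the start and end states" proposal from this thread looks like, plus the action-cost term TheBasikShow is implicitly relying on. Without that second term, "knock the vase over, then glue it back" and "walk around the vase" can score the same. The functions and numbers are illustrative only:

```python
def endpoint_penalty(start_state, end_state, distance):
    """Penalise only the difference between the start and end of the plan."""
    return distance(start_state, end_state)

def plan_score(task_reward, start_state, end_state, distance, effort):
    # Without the effort term, a plan that breaks the vase and glues it back
    # ends in (almost) the same state, and so gets (almost) the same score
    # as a plan that simply avoids the vase.
    return task_reward - endpoint_penalty(start_state, end_state, distance) - effort

dist = lambda a, b: abs(a - b)                # toy 1-D "world state"
print(plan_score(10, 0, 0, dist, effort=1))   # avoid the vase: 9
print(plan_score(10, 0, 0, dist, effort=5))   # break it, then glue it: 5
```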
@morscoronam3779
@morscoronam3779 6 жыл бұрын
On the "make minimal changes to the world state" part: what if a fire had broken out before the tea-making AI entered the kitchen? Presumably it would do nothing to fight the fire or save anyone in danger from it. Emergency/catastrophe situations would have the AI do nothing to help (in the very general case discussed), and in the worst case it might work to ensure that the catastrophe is still progressing when its job is done. Maybe the T-800 (that's TeaBot model number 800, what else would such a name refer to?) shouldn't be a giant fire extinguisher, first-aid administrator, security system, etc. However, a robust AI way off in the future would do well to be prepared for such situations in addition to the task at hand.
@AlphaSquadZero
@AlphaSquadZero 5 жыл бұрын
I would say the emergency would represent a different set of functionality, one that would override its default behaviour.
@elietheprof5678
@elietheprof5678 5 жыл бұрын
[People surviving the fire] should be slightly closer to the original world state than [people dying in the fire]. But it really depends on the "distance metric" used.
@seraphina985
@seraphina985 5 жыл бұрын
@@elietheprof5678 Indeed, that would be a tricky one: just how much time does it take before the consequences of the average human being's agency add up to a change in the world that is arguably equal to the presence or absence of a human? Of course, defining equivalences here is one area where humans tend to struggle to be objective, but arguably the existence of a human poses a non-zero risk of the death of another human, so statistically there is a mean time before one person dies, directly or indirectly, as a result of another individual's choices and actions. By "indirectly" you also have to include things like industrial accidents: we might not recognise it in law, because it is too indirect and unforeseeable to tie to the act of buying products, but it is nonetheless statistically true that the economic influence of the customer base is a very real factor there, and an AI with too wide a distance metric could evaluate such statistics.
@Pystro
@Pystro 4 жыл бұрын
This is a very interesting thought. It actually leads me to an even more interesting scenario: if the robot makes a cup of tea, it uses up a teabag, water, energy, possibly some sugar, and makes a lot of things in the kitchen dirty. If the robot just throws the teabag into a cup, sets it on fire, and gets the human to change its commands to putting out the teabag fire with water instead, it saves a few of those resources and still gets its (changed) reward function satisfied. I think a very promising approach to many AI safety problems might be the following (at least in cases where human supervision is theoretically possible): the robot could use a world model to predict the changes a human would make to the robot's current commands and try to minimize those (weighted with a suitable metric), under the condition that there was a human who was aware of the actions the robot is planning to take, and the condition that that human were able to give the robot commands (no killing humans or unplugging your microphones to prevent commanding! These are probably actions the human would object to anyway, but better to hardwire this condition, just in case the human gives the robot the idea that they wouldn't). Robot is about to ignore a fire: would the human tell it to pause making tea and put it out instead, if they knew? Robot is about to extinguish the fireplace: would a human tell it to stop, if they knew? Robot is about to stomp on a baby: would a human tell it to emergency-shutoff, if they knew? This might even get the robot to anticipate the human's desires: would it be less probable that your human will tell it to go make coffee when they wake up in the morning? Or, if the robot anticipated their wish and already has a thermos of coffee prepared just in case, would it be less probable that your human is unhappy and commands it to stop doing that in the future? The biggest advantage this kind of reward function has is that it values effects based on whether a human is upset enough to want the robot to avoid them or seek them out. If the human cares about the cups being taken out of the cupboard from left to right, the robot will do that to avoid being micro-managed. If the human doesn't care enough about wasted water, the robot won't care about wasted water.
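A toy version of the approach described above: score each plan by the predicted chance that an informed supervisor would object to it, and pick the least objectionable one. The probability table here is a hand-written stand-in for what would really be a learned model of the human:

```python
# Hypothetical probabilities that an informed supervisor would object
# to each side effect of a plan. In a real system these would come from
# a learned model of the human, not a hand-written table.
OBJECTION_PROB = {
    "uses_teabag": 0.01,
    "leaves_cups_out": 0.3,
    "sets_teabag_on_fire": 0.999,
    "ignores_kitchen_fire": 0.999,
}

def objection_risk(plan_effects):
    """Probability the supervisor would object to at least one effect."""
    p_no_objection = 1.0
    for effect in plan_effects:
        p_no_objection *= 1.0 - OBJECTION_PROB.get(effect, 0.5)  # unknown -> cautious
    return 1.0 - p_no_objection

def choose_plan(plans):
    """Pick the plan with the lowest predicted chance of being overridden."""
    return min(plans, key=lambda p: objection_risk(p["effects"]))

plans = [
    {"name": "make tea normally", "effects": ["uses_teabag", "leaves_cups_out"]},
    {"name": "improvise",         "effects": ["sets_teabag_on_fire"]},
]
print(choose_plan(plans)["name"])  # make tea normally
```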
@cr9pr3
@cr9pr3 6 жыл бұрын
The modeling of the distance function seems incredibly difficult to me. Assume the AI and your roommate both want to make a cup of tea. Breaking a person's neck might seem like a small change to the environment for the AI, compared to boiling water and using up a bag of tea.
5 жыл бұрын
True, but you could try to program the AI to measure the value of a change by its future effects, like when choosing moves in chess. Killing the roommate may be a small change right now but have a big impact on the future state of the environment. No one will hold a funeral for a tea bag.
@creativedesignation7880
@creativedesignation7880 4 жыл бұрын
I was thinking about a similar thing. What if the robot enters the kitchen to find a coworker who just made a cup of tea? Depending on the definition of distances between states of the world, just taking the finished tea would cause less change to the environment than making another one. Even if it realises the person would then make another cup of tea, since that had already happened once before, it might not count as a significant change to the environment.
@DamianReloaded
@DamianReloaded 7 жыл бұрын
These problems really pique my interest. One thing that could be useful for testing these kinds of scenarios would be simulations: make a virtual robot and let it loose in GTA until it stops punching people and dragging them out of their cars. The worse it plays, the safer it'll be. ^_^
@tdoge
@tdoge 7 жыл бұрын
But then the ai would just stand still and do nothing. There has to be some objective besides score
@DamianReloaded
@DamianReloaded 7 жыл бұрын
The AI just gotta do the drug-dealing without killing anybody. It's not that hard. Many Dominicans do it all the time down in the Bronx, under the supervision of the police, judges and local authorities. ^_^
@karlkastor
@karlkastor 7 жыл бұрын
What if it's pretending to be good?
@circuit10
@circuit10 2 жыл бұрын
It can break out of the simulation quite easily
@GregariousBant
@GregariousBant 7 жыл бұрын
plz donate 2 help rob stay in focus, his medical bills r expensive
@Traf063
@Traf063 7 жыл бұрын
Did I miss something?
@thisrocks
@thisrocks 7 жыл бұрын
Traf Yes: annotation at the bottom of the screen at 5:35
@Traf063
@Traf063 7 жыл бұрын
thisrocks88 lold
@circuit10
@circuit10 4 жыл бұрын
Medical bills? That's what America is like
@circuit10
@circuit10 4 жыл бұрын
That's a fitting name
@frankos5092
@frankos5092 7 жыл бұрын
Your Computerphile vids and this channel are among my top favs on YouTube. Thanks for your effort and time! It's a crazy, fascinating subject, and you make it wholly tangible for the uninitiated like me. Big up yourself!
@virzenvirzen1490
@virzenvirzen1490 7 жыл бұрын
Great video! How about the situation when the robot doesn't find a kettle or a teabag and goes for the closest one, like a nearby shop? It's required for completing the task and, as far as I understand, it doesn't have any rules against stealing.
@mithrae4525
@mithrae4525 Жыл бұрын
Arguably he's got a rule that he MUST steal any additional stuff he needs: It changes only one variable (store-owner's possession of tea) versus changing two variables (store-owner's possession of tea plus money in the till).
@user-xs9ey2rd5h
@user-xs9ey2rd5h 5 жыл бұрын
The funny comments you make or write on your videos, like your rare medical condition, or in the other video about companies being superintelligences, when you ask your viewers to come up with an idea no human would come up with, are golden. Your comedic timing and wittiness are amazing.
@SafeTrucking
@SafeTrucking 5 жыл бұрын
Good explainer. You're getting close to the idea of a General Satisfaction Equilibrium. Much more complex than a simple Nash equilibrium, but vital to the future of AI.
@andreinowikow2525
@andreinowikow2525 7 жыл бұрын
This channel is golden. Keep up the good work!
@bobsmithy3103
@bobsmithy3103 6 жыл бұрын
WOOO! 4:07 I correctly guessed the answer after about 15 secs of thinking and now I feel temporarily smart and satisfied.
@Nurr0
@Nurr0 7 жыл бұрын
One of these days, when I'm working again, I'll support you via Patreon.
@daturave
@daturave 6 жыл бұрын
Maybe as his housemaid
@worldsuckss
@worldsuckss 6 жыл бұрын
hahahahaha
@TheNightquaker
@TheNightquaker 5 жыл бұрын
Wouldn't be the worst outcome, I tell ya what!
@IgorDz
@IgorDz 5 жыл бұрын
​@@TheNightquaker, except one day a robot will "tek his jerb!"
@NathanJackLouttit
@NathanJackLouttit 6 жыл бұрын
I'm excited to see the content you make; the Computerphile videos you speak in are a joy to watch.
@Zach-si1gf
@Zach-si1gf 6 жыл бұрын
Very good breakdown, in laypeople terms while still keeping the nuance of a deep subject. Subbed!
@nikolatasev4948
@nikolatasev4948 4 жыл бұрын
Wow, nice! Maybe have the robot interact with the minimum number of objects (babies, vases, other cars), and try to minimize the changes to the items it actually interacts with (returning the milk and the tea box). Giving a list of items it is allowed to interact with is easier than giving a list of items it is NOT allowed to interact with.
@cheydinal5401
@cheydinal5401 4 жыл бұрын
I like how most of your videos are basically "Hey, this is a pretty good solution, isn't it?" - "No. It's not. We have absolutely no clue how to solve this."
@ZachAgape
@ZachAgape 4 жыл бұрын
Thanks for your vid! There were some interesting beginning of solutions for safety there, I enjoyed that a lot!
@wfpnknw32
@wfpnknw32 8 ай бұрын
I really like the format of this. It can be hard to grasp how difficult something is if you just get told the answer (how it fails); having to try and work it out in your head helps to really illustrate and internalise the complexity.
@HebaruSan
@HebaruSan 6 жыл бұрын
If the robot concludes that it can't make the tea without running over the baby (maybe the baby is too in the way and can't be gotten around), we would want it to abort the task and report failure. That's not achieved by a simple "change as little as possible" weighting, since as long as it's allowed to make SOME changes, establishing necessity makes anything permissible. It needs to know relative weightings of side effects and which ones are simply unacceptable.
@NathanTAK
@NathanTAK 6 жыл бұрын
If you weight "squish the baby" as a high enough change, the robot will conclude "Well, I get 10 points for making tea, but lose 10000 points for squishing the baby. I can't make the tea without squishing the baby, so my options are (a) do nothing, net 0 points, or (b) make tea, net -9990 points. I'll go with (a)."
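Spelled out as a tiny scoring sketch (using the numbers from the comment above, purely for illustration):

```python
TEA_REWARD = 10
IMPACT_PENALTY = {"squish_baby": 10_000, "none": 0}

def net_score(makes_tea: bool, side_effect: str) -> int:
    """Task reward minus the impact penalty for the chosen plan."""
    return (TEA_REWARD if makes_tea else 0) - IMPACT_PENALTY[side_effect]

options = {
    "do nothing":                net_score(False, "none"),         # 0
    "make tea, squish the baby": net_score(True, "squish_baby"),   # -9990
}
print(max(options, key=options.get))  # "do nothing"
```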
@andyplaysgundam
@andyplaysgundam 4 жыл бұрын
"I just get a bit blurry sometimes, it's a rare medical condition" I really like all these subtle yet unexpected humor in your videos
@Xdonker
@Xdonker 7 жыл бұрын
Nice, I actually liked it. When can I expect the next video?
@danielrhouck
@danielrhouck 4 жыл бұрын
One thing I’ve seen which is similar to this but not covered in (your videos about) the paper is reducing the *amount of entropy increase*. Hurting the baby or breaking the vase cause large increases in entropy; they are hard to undo precisely. Heating the water for the tea also involves increasing entropy, but not by as much and is harder to avoid.
@almostbutnotentirelyunreas166
@almostbutnotentirelyunreas166 6 жыл бұрын
Robert is SO onto this! How is it possible that none of the 'BIG' players in AI development post any similar considerations? The silence is deafening and incredibly disturbing. Or has the future already been written?
@JannisAdmek
@JannisAdmek 6 жыл бұрын
wow, I love this channel, you are awesome!
@wkingston1248
@wkingston1248 6 жыл бұрын
If your roommate makes tea and goes to the bathroom, it's arguable that just taking their tea changes the world state less than making tea itself. That could be an issue where a negative outcome is the one closest to the original world state.
@bensmith9253
@bensmith9253 5 жыл бұрын
This was fantastic!!
@adelarscheidt
@adelarscheidt 7 жыл бұрын
Hey Robert, don't you think those problems disappear when an AGI learns through neural networks? The same way we just "know" stepping on the baby is bad, not by manually assigning a value to it, but because we persistently strengthened the HUMAN - HARM - BAD - DON'T network. You know? There is a value, but it isn't assigned. And instead of assigning "not care" for unspecified variables, maybe the network has a way of grouping families of events based on the nature of outcomes it has previously learned, pretty much like a human brain does. We abstract real-world situations and apply the same principles we learned in completely new situations, which surely isn't always perfect; it's only good enough to secure the species. But why wouldn't an AGI be able to do the same?
@maximkazhenkov11
@maximkazhenkov11 7 жыл бұрын
What you're suggesting is known as the detached lever fallacy: lesswrong.com/lw/sp/detached_lever_fallacy
@adelarscheidt
@adelarscheidt 7 жыл бұрын
+maximkazhenkov11 quite some food for thought, thank you
@Xartab
@Xartab 7 жыл бұрын
The trouble with this idea is that humans _do_ have "manually assigned" values, in that our brains have physiological responses to certain situations that give a "pleasure" or "pain" response. Mirror neurons help with making this principle work when applied to others, but basically our values are coded in the genome.
@__-cx6lg
@__-cx6lg 7 жыл бұрын
Adelar Scheidt Good point. I guess the problem is that the learned behaviors wouldn't necessarily be in 100% agreement with humans. I mean, neural networks are notoriously complex and it's literally impossible to understand what exactly is going on. If we rely on "morality" learned through neural networks, then we can't really be sure that it will match up with what we want it to do. I mean, what are the chances that an AI would come to have the exact same moral sense in every situation as a human? So I think that the unpredictability of neural nets lends credence to the idea that these kinds of problems should be solved manually by hardcoding them in.
@karlkastor
@karlkastor 7 жыл бұрын
I think a better point would be: If we train a machine learning algorithm (like a neural network) on data of what humans do in different situations, wouldn't it also act as if it had human values?
@bscutajar
@bscutajar 5 жыл бұрын
Man, each one of this guy's videos is incredibly interesting.
@jopmens6960
@jopmens6960 4 жыл бұрын
Very interesting. It sounds to me like the example of the fan shouldn't matter as long as the AGI can keep up with processing the state(s), because whatever you defined should only be instrumental in its coding to the extent that it doesn't matter. And even when it does matter, like something getting into the fan, both "the fan is ON" and the emergent properties of the changing angle of the blades inherently suggest danger for anything that interferes with it (when properly coded).
@bashredpath
@bashredpath 5 жыл бұрын
I really like your videos. They make me think about solutions, but that's just my logical mind at work. I always seem to need to solve everything.
@zuupcat
@zuupcat 5 жыл бұрын
5:21 House maids or house mates? Either you are extremely posh or you lived in a dorm.
@JinKee
@JinKee 4 жыл бұрын
In the US armed forces there are 11 general orders and the entire UCMJ, which serve as background reward functions (or punishment functions, which are reward functions with negative rewards) that specify negative side effects that commonly come up. This isn't to say that soldiers are robots, but that every legal order exists in a context of background orders and laws that are always in force.
@Frumpbeard
@Frumpbeard Жыл бұрын
Damn Rob, that final example sounds a lot like "through inaction, allow a human being to come to harm". As the robot is going to get tea, it sees someone outside smoking a cigarette. Preventing them from smoking would be safer than not. You get irritated after waiting a bit and go to check on your tea, and you see your robot passing a talking stick around in a circle of people that smoke, all tied to their chairs.
@mmimmi1568
@mmimmi1568 4 жыл бұрын
I lost it at "So that's pretty neat, like, we've added this one simple rule and the thing's already better than some of the house mates I've had" 🤣🤣
@unw3ee
@unw3ee 10 ай бұрын
Thanks for the video. I have a beginner question: can an artificial intelligence rewrite its own objective function, or can it only manipulate parameter values?
@chocokittybo
@chocokittybo 4 жыл бұрын
See, the first way I saw to break all the tea examples you said to think about is the robot deciding that the quantity of liquid in the milk container is of greater importance than the fact that all the liquid is milk, leading to it filling the milk container with a volume of water equal to that of the milk removed from it. (Or, if there are two milk containers, it starts pouring milk back and forth to try to make sure they both have the amount that was in them before the robot tried making tea.)
@nielsveekens6827
@nielsveekens6827 6 жыл бұрын
This is really interesting!
@DieMasterMonkey
@DieMasterMonkey 6 жыл бұрын
Epic! Instant sub.
@LimeGreenTeknii
@LimeGreenTeknii 5 жыл бұрын
Another problem I came up with when it comes to the world state metric: stealing an object would probably be rated as a smaller change than paying for an object. In one instance, one object is being moved. In the other, not only is the object being moved, but the money is changing hands too.
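The failure described above falls straight out of a naive "count how many variables changed" distance metric. A quick, purely illustrative sketch:

```python
def naive_distance(before: dict, after: dict) -> int:
    """Counts how many tracked variables differ between two world states."""
    return sum(1 for k in before if before[k] != after.get(k))

before = {"item_location": "shop", "till_money": 100}
steal  = {"item_location": "home", "till_money": 100}   # one variable changed
pay    = {"item_location": "home", "till_money": 105}   # two variables changed

print(naive_distance(before, steal))  # 1  -> rated "lower impact"
print(naive_distance(before, pay))    # 2
```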
@SamB-gn7fw
@SamB-gn7fw 4 жыл бұрын
Very good video!
@veggiet2009
@veggiet2009 6 жыл бұрын
I think (and I don't know if you've covered this in a future video) what is needed is an algorithm for "intent recognition": what is the intent of the roommate pouring coffee, what is the intent of the cars driving on the highway? By shifting the analysis from world states to world intent, you can begin to give your robot a sense for judging motivations, both its own and others', and you could create a reward function for keeping motivations in balance: rewarding extra points if others' motivations are allowed to succeed, but not necessarily penalizing it if others' motivations fail (maybe the score is lowered a bit, a sympathy score if you will). I think a start for intention estimation would be programming a neural network to try to predict where a user will click on the screen, based on their mouse movements and/or their gaze. If the cursor moves toward a button, the algorithm predicts whether the user will press that button, and it turns out to be correct, then it has correctly judged the intent of the user. A similar approach holds for the tea-making robot: if a person is in the kitchen, what are they there for, making coffee or cooking? Oh, they are getting out filters, they must be making coffee. If this intent is judged correctly and the person's actions are successful, my score goes up a tiny bit and I allow this person to continue. If this person's intent is judged incorrectly, I had better relearn some things. If this person's intent is judged correctly but they are not successful, I may lose a few points; maybe I will help the person, then carry out my own intent. Then, secondarily, if there is no intent measured, world states are to be honored.
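A minimal sketch of the click-prediction warm-up suggested above, using logistic regression on invented trajectory features; the data and feature choices are made up for illustration:

```python
# Toy version of the warm-up task: predict whether the user will press a
# button from simple mouse-trajectory features.
from sklearn.linear_model import LogisticRegression

# Features per sample: [distance to button (px), speed toward button (px/s)]
X = [[400, -50], [300, 20], [120, 180], [40, 220], [500, -10], [60, 150]]
y = [0, 0, 1, 1, 0, 1]  # 1 = user ended up clicking the button

model = LogisticRegression().fit(X, y)

# A cursor close to the button and moving quickly toward it:
print(model.predict([[80, 200]]))   # likely [1]
# A cursor far away and drifting off:
print(model.predict([[450, -30]]))  # likely [0]
```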
@Fluxquark
@Fluxquark 5 жыл бұрын
This video is a great metaphor for corporations under capitalism: They will destroy almost anything they can get away with in order to maximise their profits. See the environment, for example.
@Nulono
@Nulono 4 жыл бұрын
Wouldn't the robot's simulation of "what happens if I sit here and do nothing" almost always include "the human is confused and tries to troubleshoot me"?
@josh-rx6ly
@josh-rx6ly 6 жыл бұрын
Also, is it possible that, since there was a tea bag in the tin before it made tea and now there is not, it would try to put the wet tea bag back in the tin to minimise the change in the number of tea bags?
@triton62674
@triton62674 7 жыл бұрын
Great examples of possible side-effects of AI.
@bennypr0fane
@bennypr0fane 4 жыл бұрын
Apart from all the other AI-related coolness of this video of course: some nice catchy music playing there at the end! Who is it by?
@petrkinkal1509
@petrkinkal1509 6 жыл бұрын
Also, which is the bigger change: moving a 1000 kg boulder by 1 m, or squashing someone's head? (It should be obvious what I mean in general.)
@mohammadaminarmani3190
@mohammadaminarmani3190 4 жыл бұрын
Very great job, sir. God bless you.
@WisdomThumbs
@WisdomThumbs 5 жыл бұрын
One way that simple rule ("limit changes to the environment") could go wrong, I think, is that it makes cleaning up pre-existing messes more difficult. And it'll give you hell as the floor gets more and more worn out, or as cups crack, or when any number of other little, unavoidable changes build up. EDIT: okay yeah, I didn't think about Next-Door-Susan or kids or pets causing changes in the environment, and the AI responding to *that.*
@Tracks777
@Tracks777 7 жыл бұрын
Great content. Keep it up!
@conornorris6815
@conornorris6815 5 жыл бұрын
What if the first reasonably advanced AGI we made was based on this "don't change things" approach, but it got obsessive and punished any changes in technological level, trapping us in a weird, unchanging, yet still quite high-tech world?
@gabrote42
@gabrote42 3 жыл бұрын
5:28 My first two instincts are that it starts stealing tea from "out of bounds", or that it tries to prevent you from doing anything that changes the world state. 6:59 I am still young, so I don't see anything obvious. Maybe it goes wild if somebody suddenly walks into the room from "out of bounds"? Or maybe if it is driving?
@JakeFace0
@JakeFace0 5 жыл бұрын
A problem I thought of with the "if I do nothing" distance metric is that the first robot to make a cup of tea will likely be a momentous occasion. News articles will be written, the engineers will go on talk shows and their research will result in a huge impact on the AI research world at large. So from a robot's perspective the least impactful way to make a cup of tea is to make it and then kill the witnesses, frame someone for it and then destroy all the research along with itself. Or, less grimly, make the tea when no one is looking so nobody suspects it was made by an AI.
@leoperez2566
@leoperez2566 5 жыл бұрын
reduce your reliance on sunlight by becoming a vampire
@MichaelErskine
@MichaelErskine 7 жыл бұрын
Comment in the doobleydoo! Is this AvE parlance? :D
@RobertMilesAI
@RobertMilesAI 7 жыл бұрын
For me it's a vlogbrothers reference, but apparently it started with wheezywaiter
7 жыл бұрын
love the vids
@RoronoaZoroSensei
@RoronoaZoroSensei 7 жыл бұрын
A Rengar main I see...
@RobertMilesAI
@RobertMilesAI 7 жыл бұрын
kzbin.info/www/bejne/rYaxind9qsSZr7c
@SlackwareNVM
@SlackwareNVM 7 жыл бұрын
I seem to have missed the reference ;D Time stamp?
@PickyMcCritical
@PickyMcCritical 7 жыл бұрын
+SlackwareNVM 2:00 Playing the video at 0.5x or 0.25x speed will help you see the last thing mentioned in the list.
@SlackwareNVM
@SlackwareNVM 7 жыл бұрын
LUL, Thanks. A pleasant surprise ;D
@Nerdnumberone
@Nerdnumberone Жыл бұрын
Being observed by another complex agent (such as an animal or human) can change a world state in numerous, difficult to predict ways, especially if you project the difference farther into the future. I'm now picturing ninja tea robots that make every attempt to avoid being observed while performing their task in order to minimize side effects. These AI thought experiments are great inspiration for fictional AGI character for sci-fi stories. Even "safe" AGI could have some interesting quirks based on their alien value systems. A perfectly functional AGI that doesn't cause the apocalypse (or, through inaction, allow a predicted apocalypse to happen) would be fundamentally different than humans. Most fiction either makes advanced AI virtually indistinguishable from humans or makes them similar to autistic humans. An advanced chatbot can hold a semi-realistic conversation, so a sci-fi AGI could very well be more eloquent than the average human. I can't imagine many utility functions that wouldn't benefit from the ability to socially operate with humans effectively.
@icebluscorpion
@icebluscorpion 3 жыл бұрын
Can the non-specified objectives/parameters (all the "don't care" things in your list at 1:59) be set to negative, to make the AI safer?
@akmonra
@akmonra 5 жыл бұрын
Another issue is it seems making predictions about the future would get increasingly complex given the task and factors involved, until the computation needed for such predictions would be greater than the amount of matter in the known universe.
@__-cx6lg
@__-cx6lg 7 жыл бұрын
You want to avoid side effects? That's easy; just program the AI in Haskell! :)
@JayFoxer
@JayFoxer 4 жыл бұрын
can i see your passport? i need to calm robert down... AAA-
@alexanderekblom8093
@alexanderekblom8093 6 жыл бұрын
Thought experiment: what about an AI programmed only to learn about different tasks and scenarios, and to run and show simulations of how it would handle the task, given its current programming? What are some of the main issues with this approach?
@chebi97
@chebi97 2 жыл бұрын
For your second issue, I was thinking about unpredictable stuff that the robot may not react properly to. Say you win the lottery while it's getting your tea. That's probably a big change to your environment that it's not likely to have predicted, right? Won't it try to revert it, like destroying your ticket? If it's only using a distance function and a predictor, how does it know whether a change was caused by it or not? Won't it go "well, I made tea and this huge thing changed that I wasn't expecting, I must have caused it myself, I need to fix it"?
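One natural-looking answer to this question is to penalise the difference between the world that actually results and the world the agent predicts would result if it did nothing, rather than the difference from the starting state. Changes the agent didn't cause (like the lottery win) then appear on both sides and mostly cancel. A toy sketch, with hypothetical state features and weights:

```python
def inaction_baseline_penalty(actual_outcome: dict,
                              predicted_if_idle: dict,
                              weights: dict) -> float:
    """Penalise only differences from the predicted do-nothing world."""
    return sum(weights.get(k, 1.0) * abs(actual_outcome[k] - predicted_if_idle[k])
               for k in actual_outcome)

# Toy world: the lottery win happens whether or not the robot acts,
# so a (good) predictor includes it in the do-nothing baseline.
predicted_if_idle = {"tea_made": 0, "lottery_won": 1}
actual_outcome    = {"tea_made": 1, "lottery_won": 1}

print(inaction_baseline_penalty(actual_outcome, predicted_if_idle,
                                weights={"tea_made": 1.0, "lottery_won": 1000.0}))
# 1.0 -- only the tea counts against it; no urge to destroy the ticket.
```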
@iliaslerias7374
@iliaslerias7374 6 жыл бұрын
The effect your videos have had on how I think about AI is perhaps not what you might have anticipated. Now, I'm not an expert on the subject (far from it), and my model for thinking about AI had more to do with science fiction than computer science. But you've made me realize just how dependent the first AIs would be on how we programmed them. Especially the last example about the rotating fan made me think of a situation where the robot makes the tea, approaches you, but only gives you the tea when the fan is where it was when it started moving towards the kitchen. Now, to you this is a problem, and I can see why, but it also shows how utterly dependent these things would be on us. Surely you can have catastrophic outcomes because your world model is not detailed or accurate or inclusive enough. At the same time, it seems unlikely that we can jump from Facebook chat-bots to an AI that has goals and motivations the way we understand and experience them, and that, in my opinion, limits how much damage it can cause, precisely because its world model is lacking compared to ours. I noticed that you avoided talking about consciousness at all, and I think I can guess why. Either you think that the word is poorly defined so you cannot really talk about it, or you think that being conscious basically means having a model of reality that is similar in detail to our own and the capacity to experience and interact with it. Am I close? If so, do you think that the stamp collector is conscious? If not, do you think we could eventually make something like that? Great content, keep it up!
@Kernog
@Kernog 5 жыл бұрын
Another issue I'm thinking about: what if the conditions for making tea are not met (water lacking in the kettle, no more tea, the heater is broken), and the robot needs to alter its environment in order to complete its mission (respectively: fill the kettle, order/buy some tea, fix the heater)?
@how2pick4name
@how2pick4name 6 жыл бұрын
You could turn the list around and say, you can only have physical contact with: the floor, tea kettle, etc. If you are about to have physical contact with anything else, avoid it or switch off if you can't.
@lubbnetobb
@lubbnetobb 7 жыл бұрын
I get the feeling that creating a general AI safely is about figuring out a way to program common sense.
@Shrooblord
@Shrooblord 6 жыл бұрын
2:02 "Your Rengar game" x'D this made me laugh more than it should have
@margaretkvinnherad8952
@margaretkvinnherad8952 25 күн бұрын
Thank you for the amazing work you do for humanity!
@himselfe
@himselfe 6 жыл бұрын
Which aspects of an object's state are important depends on context. If you're trying to change the temperature of the room, or avoid slicing your appendages, then the important aspects of the fan's state are whether the blades are moving and at which speed; their particular orientation does not provide valuable information for achieving your goal. In humans, this is where experience and imagination (or creative intelligence) come into play. Intelligence is adaptive; an intelligent system must have a function to be able to determine the value of something, not simply know it, and it must care about not only the value of an action, but also the value of measuring different aspects.
@sebbes333
@sebbes333 6 жыл бұрын
5:05 The problem is that the robot will go crazy and try to put back the dust it has moved while moving, and try to put back the bird that happened to fly past the window, and so on...
@count_of_darkness5541
@count_of_darkness5541 5 жыл бұрын
It is hard to achieve as well, but to operate in a human-like manner the robot must learn to attribute some changes to itself and other changes to the world. Yes, it causes a lot of new problems, like "whether it was me who killed him or the bullet itself". On the other hand, now you only need to simulate the changes in the objects you are going to interact with (and whatever chain of changes those interactions may produce, where the less certain you are about the next step, the less responsible you are for its outcomes).
@NawiasemPiszac
@NawiasemPiszac 7 жыл бұрын
OK, sorry for bringing this up again. I've seen your video on Asimov's laws of robotics and how they wouldn't work, but minimizing changes in world state (i.e. don't squish a baby) seems to me very close to the 1st law. Of course the implementation would be vastly different from an idealized fictional rule set, but the end result would IMHO be a bit similar. Cheers
@RobertMilesAI
@RobertMilesAI 7 жыл бұрын
Well, for one thing, there's no "or by inaction..." clause here at all. This kind of impact reducing agent will absolutely allow a human to come to harm by inaction
@fenderrexfender
@fenderrexfender 4 жыл бұрын
In my time dealing with relational programming I've found that states need defined periods, and not all represented states have the same period. There is a cyclical nature that must be deduced. Producing an edge to trigger on, calculated by watching a state machine, makes me wish I had more experience in RTOS and Verilog.
@amaarquadri
@amaarquadri 6 жыл бұрын
I think I have an argument that shows that the concept of having a distance metric and subtracting is no better than the original approach of prohibiting every undesirable side effect. Suppose the robot has 2 choices: 1) knock over a vase on the way to make you tea, or 2) trample a baby on the way to make you tea. (For the sake of argument, assume that no other option is possible.) Between these two, clearly the more desirable option is to break the vase. In order for the AI to understand that, the squished-baby world state must be further (according to the distance function) from the current world state than the broken-vase world state is. Similar scenarios can be proposed for other undesirable side effects. Thus, in order for the AI to correctly choose between any given two undesirable side effects, the distance function must essentially be a ranking of all possible future world states. Moreover, suppose two world states were omitted from this list (because programmers can't think of every possibility). If the AI were faced with a choice between these two world states, it would be indifferent between them (perhaps it would trample the baby instead of the vase simply because that path to the tea bags is shorter).
@davood123
@davood123 3 жыл бұрын
Basically we need a "mindfulness" algorithm for AI systems
@TheSameDonkey
@TheSameDonkey 4 жыл бұрын
The problem stated here with respect to driving seems to me to be an issue of time/activity partitioning. If the evaluation is continuous, then it would fail as described (or perhaps get stuck in some sort of recursive loop). But if it's a setup where it compares to "do nothing from when the activity starts", it should be able to avoid at least that type of problem. What I do see as problematic, having thought about it for an hour or so, is the value scaling of various changes to the world.
@bronsoncarder2491
@bronsoncarder2491 3 жыл бұрын
Reducing the problem to kind of an absurd level (kind of like you did with the stamp collector): If your function literally just said, "Do this task, and then return the world state to how it was"... Well, a smart enough AGI would freak out. I'd actually be interested in hearing what you think would happen in that situation, because you've given the program a task that is literally impossible. Because of entropy, there is no way to make that tea back into water, to return the flavor to the tea leaves, to separate the sugar back into individual grains (especially not the same grains they were before). You said that the computer might refill the pitcher, but... Would it? I mean, as you said, it massively depends on how it is coded (and, of course, no one would actually code it this way, this is an incredibly extreme example) but... To a robot, who can actually see the differences, is one pitcher of water exactly identical to the other? Maybe this water has .0001% more granite content? To us, "yep, that's water." To a robot, "er... no, those two are nothing alike. That one has more granite."
@smob0
@smob0 6 жыл бұрын
What if you allow the tea robot to observe the kitchen for a while, by giving it a few weeks of security video that a person has gone through and somehow flagged for desired/undesired world states? (E.g. it may be typical for people to leave a mess when they make tea, but you can flag it as bad.) You could also have some threshold where, if the world around it is changing too much, it could ask for help or stop doing what it's doing.
@89sanson
@89sanson 3 жыл бұрын
How about minimizing the amount of energy required to complete the task? That should avoid all side effects that are avoidable, as it will always be more energy efficient to do the task without the side effect.
@tomatensalat7420
@tomatensalat7420 7 жыл бұрын
Isn't there still a problem when the prediction is somehow too far off, or when an unpredictable change happens? Not sure if that's possible, but my approach would be to try to separate changes made by the AI from changes made by someone or something else. You could still try to predict changes to the world and minimize those, but this might take care of changes made by your colleague.
@SolomonUcko
@SolomonUcko 4 жыл бұрын
For cases where it's possible, what about simplifying the environment? For things that interact with the real world, it's probably not possible, but for anything digital, you could probably restrict its actions to a safe subset.
@SlackwareNVM
@SlackwareNVM 7 жыл бұрын
Some random thoughts: What is our "utility function" as human beings. It changes from person to person, but are we able to define the commonalities somehow and give a base definition? If so then we can give it (the AI) some UF akin to ours. Then the problem would be that the AI starts indulging in socially negative behaviours (greed for example) and (being more capable than the average human) it actually starts working towards those goals. Well, in that case are we able to somehow teach it concepts such as virtues and morality, not only to understand but follow?
@maximkazhenkov11
@maximkazhenkov11 7 жыл бұрын
We don't have a utility function. The thing about humans is that though we are intelligent, intelligence is only a small part of our properties, but it is the one that makes us powerful. Abstractions such as virtues and morality are only useful to human-like minds because they assume a huge amount of unspecified common sense.
@flymypg
@flymypg 7 жыл бұрын
What about explicitly discussing goal decomposition? Isn't "Make tea" actually a very high-level goal composed of an ordered sequence of other goals? What about training for "Travel safely to destination"? And that, in turn, would have its own decomposition. The key issue would then seem to be ensuring that decomposed goals don't conflict with the high-level goal while still achieving their own goals. When this happens, either the high-level goal must be retried or deferred, or the lower-level goal must be redefined and retrained. So, wouldn't it be an unnecessary problem/complication if a high-level goal had to "worry" about the safety of lower-level goals? How should this be monitored/managed? At what point can/should we define "trust" in the safety domain?
@flymypg
@flymypg 7 жыл бұрын
I was hoping to trigger discussion concerning why creating a system by combining "safe" components won't work! The underlying problem is "emergent behavior": Parts that are "safe" in isolation can acquire new unsafe behaviors when combined. There are ways to mitigate such effects, but IIRC there is a proof that they cannot be completely eliminated in systems with sufficient complexity. That is to say, only "relatively simple" systems can be proven safe. So, if you want safety, keep your AI relatively dumb.
@ecocommuhippy
@ecocommuhippy 6 жыл бұрын
Would the AI eventually stop doing its task if it changed the world state too much and gained negative points? E.g. the robot makes tea for a party of 10 and loses 10 lots of world-changing points, because it's used 10 mugs, 10 teabags and 10 loads of milk, sugar and water. Will it refuse to do its function, or only carry out the maximum amount of "tea making" that still gains some positive reward points?
@harold2718
@harold2718 4 жыл бұрын
After giving you the tea, the robot forces you to go back to your computer and rewrite the robot's software, because that's what you would have done if the robot had done nothing (and you'd have concluded that it had a bug). It also puts back the biscuit that you took out, because you wouldn't have taken it if no tea had been made. No biscuits allowed.
@adelarscheidt
@adelarscheidt 7 жыл бұрын
5:20 hahahahaha
@marcotrevisan7137
@marcotrevisan7137 7 жыл бұрын
How is your blurritis doing? Hope you get better soon
@Cyberlisk
@Cyberlisk 2 жыл бұрын
My main question here is: how do you define how distant one state of the world is from another? For example, if you have an AI in the banking business, a simple flip of a bit (a really tiny physical change) can mean a lot of money. The AI would need a full understanding of the human semantics of each state. There are millions of bacteria and viruses flying around; how can an AI know which one is harmless and which one can cause a global pandemic? As in chaos theory, very small changes in the state can cause huge differences later, which is very difficult to predict even for an advanced AI.
@Uriel238
@Uriel238 3 жыл бұрын
I suspect an AI would keep a catalog of all the things in its environment categorizing things into types, e.g. (in our household setting) other robots: avoid and don't interfere; Inhabitants: track, avoid and note wellness status; articles on kitchen counter: check for cleanliness / functionality and return to storage... dangerous objects not in use would be handled differently than fragile objects which would be handled differently than commonly used objects and so on.
@haydenmaines5905
@haydenmaines5905 7 жыл бұрын
Hey Robert, I have a question (for the other viewers on here as well) about a solution that first came to my mind, differing from yours. Let's say the robot is trying not to minimize ALL differences in the world, or to simulate what the world would be like, but instead trying to minimize ITS differences to the world. Again, in the example of making tea while your colleague is making coffee: the robot considers that part of the dynamic, changing environment. If it tries to stop your colleague, it's now changing the environment. How can we break my solution? What are the ways it could go wrong?