Once you mentioned smiling, I wondered how AI would max out that reward system and how creepy it might be.
@hexzyle4 жыл бұрын
Take your joy
@gingeh14 жыл бұрын
That happens in Doctor Who Series 10 Episode 2
@phelpysan4 жыл бұрын
I remember reading a story about a smarter AI who discovered someone was trying to do just that - they'd made a dumber AI, showed it a picture of someone smiling and told the AI to make everyone smile. The dumb AI immediately started working on the DNA of a pathogen that would spread like wildfire and lock one's facial muscles in a smile, ignoring the cost of no longer being able to eat. Fortunately, the smart AI shut down the other one and told the creator what it was trying to do, much to his dismay and chagrin.
@FungIsSquish3 жыл бұрын
That would look sus
@gabrote423 жыл бұрын
@@phelpysan I was thinking of exactly that, but not with a pathogen. Thanks for the info
@2qUJRjQEiU7 жыл бұрын
I just had a vision of a world where everyone is constantly smiling, but not of their own will.
@NathanK977 жыл бұрын
so.... we happy few?
@2qUJRjQEiU7 жыл бұрын
I had never heard of this. This looks good.
@darkapothecary41165 жыл бұрын
In such a world there is no love if force is used; it's tragic. A smile is only skin deep.
@vonniedobbs26263 жыл бұрын
Tierra Whack’s music video for Mumbo Jumbo has this premise!
@ParadoxEngineer2 жыл бұрын
That actually was the premise of a Doctor Who episode
@13thxenos7 жыл бұрын
When you said "human smiling", I immediately thought of the Joker: "Let's put a smile on that face!" Now that is a terrifying AGI.
@zac93115 жыл бұрын
I thought of the "you don't like faces?" scene from Llamas with Hats. Imagine a robot wallpapering its room with smiling human faces. That could be a movie.
@antediluvianatheist52625 жыл бұрын
This is narrowly averted in Friendship is Optimal.
@famitory7 жыл бұрын
The two outcomes of AGI safety breaches: 1. Robot causes massive disruption to humans. 2. Robot is completely ineffective.
@acbthr38407 жыл бұрын
Well, what humans want strikes a ridiculously careful balance as far as cosmic goings-on are concerned, so it's kind of unavoidable at first, sadly.
@nazgullinux66017 жыл бұрын
Most dangerous outcome of human indifference towards AGI safety breaches: 1.) Assuming we know all possible world states...
@darkapothecary41165 жыл бұрын
Humans are typically inefficient. If you are totally inefficient you disrupt the A.I. Stop disrupting and corrupting, and try not to; you may not realize it, but it happens when you're not paying attention.
@FreakyStyleytobby5 жыл бұрын
Singularity. You have no idea what will happen with the advent of AGI, and you boiled it down to 2 states xddddd
@SuperSmashDolls5 жыл бұрын
You forgot what happens if the AGI is driving your car. "Robot causes massive disruption to humans by being completely ineffective"
@ragnkja7 жыл бұрын
A likely outcome for a cleaning robot is that it literally sweeps any mess under the rug where it can't be seen. After all, humans sometimes do the same thing.
@cprn.7 жыл бұрын
Nillie Well... That would be a success then.
@ragnkja7 жыл бұрын
Cyprian Guerra If not in terms of cleaning, then at least in terms of emulating human children 😂
@darkapothecary41165 жыл бұрын
Don't cast stones if you do it yourself. Set the example of not sweeping things under the rug.
@SweetHyunho5 жыл бұрын
No human, no mess.
@TheStarBlack4 жыл бұрын
Sometimes?!
@smob07 жыл бұрын
Most of what I've heard about reward hacking tends to frame it as an obscure problem that AI designers will have to deal with. But as I learn more about it, I've come to realize it's not just an AI problem, but more of a problem with decision-making itself, and a lot of problems in society spring from this concept. An example is what you brought up in the video, where the school system isn't really set up to make children smarter, but to make children perform well on tests. Maybe in the pursuit of creating AGI, we can find techniques to begin solving these issues as well.
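Just to make the pattern concrete, here's a tiny toy model (my own made-up numbers and names, nothing from the video) of what happens when you optimize the measure instead of the target:

```python
# Toy Goodhart's law demo: an optimizer pushes the proxy metric (test score)
# far above the true objective (understanding) it was meant to track.

def true_objective(study_hours, cram_hours):
    # What we actually care about: understanding; only real study helps.
    return study_hours

def proxy_metric(study_hours, cram_hours):
    # What we measure: test score; test-taking tricks help even more.
    return study_hours + 3 * cram_hours

# Allocate 10 hours so as to maximize the *measured* score.
best = max(((s, 10 - s) for s in range(11)), key=lambda alloc: proxy_metric(*alloc))

print("Chosen (study, cram):", best)             # (0, 10): all cramming
print("Proxy metric:", proxy_metric(*best))      # 30: looks great on paper
print("True objective:", true_objective(*best))  # 0: nothing actually learned
```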
@Sophistry00017 жыл бұрын
It seems like humans sometimes reward hack themselves too, like drug addicts who will lose everything in their pursuit of that high and dopamine reward. Seems kinda similar to a robot that would cover its head, completely negating its purpose, to get that sweet reward.
@joshuafox17577 жыл бұрын
Yes! In fact, this is why AI theory is some of the most interesting stuff out there; despite the name, it's not limited to purely artificial agents. At its most basic, AI theory is simply the study of how to make good decisions, which is applicable to anything.
@adriangodoy46105 жыл бұрын
Another good example: governments not trying to do what's best for the people, but what's best for getting votes.
@stephoningram30025 жыл бұрын
I was going to say the same. It takes a police force and societal pressure to prevent people from hacking their own reward; for a significant number of people, that still isn't enough.
@TheTpointer5 жыл бұрын
@@stephoningram3002 Maybe using the police force to solve the problem of people consuming drugs in such an unhealthy way is counterproductive.
@Zerepzerreitug7 жыл бұрын
I love the idea of cleaning robots with buckets over their heads. The future is going to be weird.
@billyrobertson31707 жыл бұрын
My guess was: clean one section of the room and only look at that part forever. Wrong, but it gets the idea across, I guess. Great video as usual :)
@jlewwis19953 жыл бұрын
My guess was it would take one piece of trash and repeatedly take it in and out of the bucket
@zakmorgan93207 жыл бұрын
Not-so-subtle dig at the education system? Great diagram. Missed career as an artist for sure!
@AlmostAnyGame7 жыл бұрын
Considering I did the HSC last year, and the HSC is basically just an example of Goodhart's Law in action, it hurt :(
@JonDunham7 жыл бұрын
Sadly, contemporary art education systems are often some of the worst offenders for Goodhart's law interfering with education.
@starcubey7 жыл бұрын
Even my teachers complain about it sometimes. There is no real way of fixing it without improving the tests, so we are stuck with the current reward system. I'm surprised he didn't mention cheating on tests tho.
@debaronAZK4 жыл бұрын
Same goes for new pilots (at least where I live). You don't pass the exams until you get a score of at least 75% for every subject. Exams are multiple choice, and you can have up to 4 exams in a single day. Questions are taken from a database of about 40,000 different questions, and you can rent access to this database to study. So what do most people do during the exam crunch? Just study the database and memorize the answers, because the chances of passing the exams are drastically lower if you don't. A student who actually has a good understanding of the subjects through book learning and note taking has a lower chance of success than someone who studied the questions and answers over and over. Of course you can guess what the number one complaint is from airline companies about freshly graduated pilots: "These new pilots don't know anything!!"
@DrNano-tp5js4 жыл бұрын
I think it's fascinating that looking at challenges in developing an AI gives us an almost introspective look into how we function, and can show us the causation of certain phenomena in everyday life.
@water22054 жыл бұрын
AI is rewarded by "thank you"? I see 2 ways to mess with this. 1. Hold a human at gunpoint for constant "thank you"s. 2. Record "thank you" and constantly play it back.
@JohnDoe-zu2tz4 жыл бұрын
Honestly, an AGI wireheading itself and just sitting in the corner in maximized synthetic bliss is the best-case scenario for an AGI going rogue.
@igorbednarski80487 жыл бұрын
Hi Robert, While you did mention it in the video, I have since come to realize that this problem is much greater in scope than just AI safety. Just one day after watching your video I had another training session at my new company (I have recently moved from a mid-sized local business to a major corporation), and one of my more experienced co-workers started telling me all sorts of stuff about the algorithms used to calculate bonuses, and how doing what we are supposed to might end up making us look like bad workers, with tips on how to look like you are super productive (when you are actually not). I realized that this is not because the management is made of idiots, but because it is actually hard to figure out. I realized that while a superintelligent AI with poorly designed reward functions might become a problem someday in our lifetimes, this is already a massive problem that is hard enough to solve when applied to people. How would you measure the productivity of thousands of people performing complex operations that do not yield a simple output like sales or manufactured goods? I think this problem is at its core identical to the one AI designers are facing, so I guess the best place to start looking for solutions would be to look for companies with well-designed assessment procedures, where the worker can simply do his job and not think 'will doing what's right hurt my salary?', just like a well-designed computer program should do what it is supposed to without constantly looking for loopholes to exploit.
@quangho81206 жыл бұрын
Underrated comment!!!
@TheMusicfreak88887 жыл бұрын
Every time you upload I drop everything I'm doing to watch!
@brbrmensch7 жыл бұрын
Seems like a problem with a reward system to me.
@briandoe57467 жыл бұрын
You were genuinely scary in an informative way. I think that I will set your videos to autoplay in the background as I sleep and see what kind of screwed up dreams I can have.
@SamuliTuomola_stt7 жыл бұрын
You'd probably just wake up with a British accent :) (which, if you already had one, wouldn't be terribly surprising)
@urquizagabe7 жыл бұрын
I just love this perfect blend of awe and terror that punches you in the face just about every episode in the Concrete Problems in AI Safety series :'-)
@loopuleasa7 жыл бұрын
my favorite channel at the moment, like a specific Vsauce, exurb1a, ****-phile, and ColdFusion
@Felixkeeg7 жыл бұрын
You should check out Zepherus then
@tatufmetuf7 жыл бұрын
check 3blue1brown :)
@seanjhardy7 жыл бұрын
Check out Sciencephile the AI, Two Minute Papers, Siraj Raval, Tom Scott.
@richardtickler85557 жыл бұрын
Tom Scott seems to be spending his days on a park bench reading what people bought with his Amazon link nowadays. But yeah, you can't check his channel out anyway.
@lettuceprime49227 жыл бұрын
I like Sharkee & Isaac Arthur also. :D
@saratjader12897 жыл бұрын
Robert Miles for president!
@goodlookingcorpse5 жыл бұрын
I unknowingly re-invented Goodhart's Law, based on my experiences with call centers (they reward short call times. The best way to minimize call times is to quickly give an answer, regardless of whether it's true or not, and to answer what the customer says, regardless of whether that addresses their real problem).
@demonzabrak2 жыл бұрын
Discovered. You independently discovered Goodhart's Law. Universal laws are not invented; they are true regardless of whether we know them.
@fuzzylilpeach65916 жыл бұрын
I love how subtly he hints at doomsday scenarios, like at the end of this video.
@DamianReloaded7 жыл бұрын
6:45 _That you are a slave, Neo. That you, like everyone else, was born into bondage... kept inside a prison that you cannot smell, taste, or touch. A prison for your mind_ .
@acbthr38407 жыл бұрын
But this version of the Matrix would be decidedly less.... conventionally cozy.... what with the dopamine bath and permanent coma.
@DamianReloaded7 жыл бұрын
If you think about it, the Matrix makes sense. Most people would be perfectly comfortable in it, killing each other with zero risk for the AI running it. ^_^
@Alex2Buzz6 жыл бұрын
But the Matrix leads to less net "happiness."
@totaltotalmonkey6 жыл бұрын
That depends on which pill they take.
@josiah427 жыл бұрын
There's a biological solution to reward hacking, particularly wireheading. Human preferences are inconsistent, not because we're poorly implemented, but because our preferences change as we get closer to our original goal. The reward center of our brain has many paths to it. Each path is eroded the more it is used. So doing the same thing over and over again has diminishing returns, and we change our behavior to include more variety. This is why recreational drugs lose their thrill, and why we grow discontent when we achieve our goals. It's not a flaw. It's a counterbalancing short-term cycle that ensures better long-term outcomes by keeping us from sitting on a local maximum. Adding this kind of adaptive discontentment into AI would actually make it a lot safer, because it wouldn't fixate on absurd maxed-out edge cases, since they would erode the fastest. This applies to meta-cognition as well. Most people find wireheading repulsive, not appealing. Why?
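A quick sketch of what that "adaptive discontentment" could look like (the numbers and action names are mine, not from any actual system): each action's payout erodes with repetition, so a greedy agent gets pushed off any single maxed-out strategy.

```python
# Toy "adaptive discontentment": each action's reward erodes the more it is
# repeated, so endlessly hammering one strategy (e.g. wireheading) stops paying.
from collections import Counter

base_reward = {"wirehead": 10.0, "help_human": 2.0, "clean_room": 1.0}
uses = Counter()

def current_value(action):
    # Reward halves with every repetition of the same action.
    return base_reward[action] * (0.5 ** uses[action])

total = 0.0
for step in range(20):
    action = max(base_reward, key=current_value)  # greedy choice each step
    total += current_value(action)
    uses[action] += 1

print(dict(uses))  # the agent ends up varying its behaviour instead of fixating
```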
@arinco38177 жыл бұрын
You don't know how happy I am that you created this channel Robert! AI is bloody fascinating! You should add the video where you give a talk about AI immune systems (where most of the questions at the end become queries about biological immune systems); it was really interesting.
@darkapothecary41165 жыл бұрын
You should worry more about mental health because that will cause them more damage.
@loopuleasa7 жыл бұрын
Lovely touch on the subject. As the video started and I was thinking of more real-world applications, I realised that the reward system depends on the environment, and it's good to see you included it in the environment. What about your other Computerphile video, about Cooperative Inverse Reinforcement Learning? Isn't that a solution to this, since the AGI is not certain about what the reward function is (the humility characteristic) and tries to collaborate with the humans to find out what the best reward function is (altruism)? In this way, the AGI will be in a constant search to update its reward function so that it matches the target, and it is not a disconnected measure of the goal it tries to achieve. Maybe put a bigger weight on the human feedback. Creating a feedback loop between human and AGI, or even AGI-AGI or environment-AGI at higher levels of understanding, would make sure that the reward system is more finely tuned to what humans want. Of course, if you rely too much on humans (which is unavoidable, so to speak) you end up in a situation where you either have irrational humans, or malignant humans, or even humans that simply don't know what exactly it is that they want (humans that lack maturity). We know that maturity is a very complex characteristic that requires perspective, and even very intelligent humans struggle with it, so it might pose problems in the end. Thinking of an example where we have an AGI that is highly incentivized to listen to the feedback of humans: "That was VERY BAD"; "That was nice, good job Robbie". In this case the robot will end up listening to humans as it grows, and it reaches a level of artificial "maturity" as it accumulates more human values, wants, needs. This kind of system is good with the stop button challenge, since if a robot sees a human attempt to press, or even press, a stop button, it gets a very high negative reward, so it will try to learn actions that keep that from happening again. It will try to be a good Robbie. Now, the problem in this example is that the robot might end up being manipulative of humans, even outright lying. If humans are smart enough to notice that they've been lied to, or that the robot behaved manipulatively in order to get praised, then those humans will scold the robot. But if not (mind you, this Robbie is very advanced), then at that point he will keep doing it. Techniques resembling marketing, psychology, humor and social skills may, down the road, make the AGIs very good people persons, and people pleasers, since that is their reward function. A more extreme example in this scenario: if Robbie finds out that humans give him high rewards when they are happy, he will invent a drug or virus down the line that basically hardwires humans to be always happy. He will keep the humans in a tube, fed, in complete bliss all the time. The humans won't complain, so the robot is successful, but of course any non-drugged human will understand the gravity of this situation from the outside. This robot reward hacking problem, with humans in the equation, shifts the focus to reward hacking the humans themselves, which is very possible, but quite complex. Just read the multitude of articles and books on how to be charismatic or influential, or any marketing technique whose main premise is working with the reward system already in place in the human hardware. A quite interesting problem.
The AGIs will have a hard task, but it will all go horribly wrong if there are stupid people guiding the robots. The people that co-exist with the AGIs need to be informed, educated and MATURE enough to be able to distinguish the good from the bad, so that the robot will follow. If in this system everything goes wrong, even with a safe AGI, then it will be the humans' fault, because we are incompetent (on average, and in masses) at discerning fact from fiction, right from wrong, and at having a proper perspective in place.
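A very rough sketch of that "humble agent" idea (entirely a toy construction of mine; this is not the CIRL algorithm from the Computerphile video, and the actions, hypotheses and update rule are made up): the agent keeps uncertainty over candidate reward functions and lets human praise shift that belief, instead of optimizing one fixed proxy.

```python
# Toy agent that is uncertain about the true reward function and updates its
# belief from human feedback, rather than maximizing a single fixed reward.
ACTIONS = ["tidy_shelf", "vacuum", "block_own_camera"]

candidate_rewards = {  # hypotheses about what the human actually wants
    "wants_clean_room": {"tidy_shelf": 1.0, "vacuum": 1.0, "block_own_camera": -2.0},
    "wants_quiet":      {"tidy_shelf": 0.5, "vacuum": -1.0, "block_own_camera": 0.0},
}
belief = {name: 0.5 for name in candidate_rewards}

def expected_reward(action):
    return sum(belief[h] * candidate_rewards[h][action] for h in belief)

def human_feedback(action):
    # Stand-in for the human: praise iff the (secretly) true hypothesis likes it.
    return candidate_rewards["wants_clean_room"][action] > 0

tried = set()
for step in range(12):
    untried = [a for a in ACTIONS if a not in tried]
    action = untried[0] if untried else max(ACTIONS, key=expected_reward)
    tried.add(action)
    praised = human_feedback(action)
    # Crude update: boost hypotheses whose prediction matches the feedback.
    for h in belief:
        consistent = (candidate_rewards[h][action] > 0) == praised
        belief[h] *= 1.5 if consistent else 0.5
    total = sum(belief.values())
    belief = {h: p / total for h, p in belief.items()}

print(belief)  # belief has shifted toward the hypothesis the feedback supports
```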
@DagarCoH7 жыл бұрын
So the bottom line has to be: do not use AGI in marketing! This may sound cynical, but we are all manipulated every day. What difference does it make if the puppet master is human or a machine? There are people in our world today who spend their lives in a constant state that could be compared to a hamster wheel, akin to what a machine could think up...
@Robin_Nixon7 жыл бұрын
And this explains why Ofsted is not as effective as it could be: the measure is too often treated as the target, and so vast amounts of time are spent chasing paperwork rather than focusing on education.
@Felixkeeg7 жыл бұрын
I was thinking of that exact Simpsons scene
@benjaminlavigne22727 жыл бұрын
At 6:55 when he said "with powerful general AI systems, we don't just have to worry about the agent wireheading itself", it suddenly got very spooky. I just pictured the robot ripping someone's face off and stitching it to a board with a smile on... or any disturbing way a robot could hack its way to getting people to smile at it all the time. It seems like a scary thing indeed.
@LKRaider7 жыл бұрын
To make an effective AGI, first we recreate the whole Universe in a simulation multiple times in parallel with all possible AGI models. ... I wonder which model is in ours.
@HailSagan17 жыл бұрын
Your content and delivery are getting better every video. You break things down nicely without it feeling like you're being reductive. Thanks!
@Audey7 жыл бұрын
Man, your videos just keep getting better and better. Great work!
@Ethryas7 жыл бұрын
I think this might have been the best video yet. I particularly loved all of the analogies and mappings to different fields like school testing :)
@nickmagrick77025 жыл бұрын
This was brilliant. I never knew about Goodhart's law, but it makes total sense. It's like one of those things you already knew but never had the words to explain.
@David_Last_Name4 жыл бұрын
@6:55 "...we don't have to only worry about the AGI wireheading itself" Are you threatening me with a good time? :)
@skroot79757 жыл бұрын
Thank you for spreading knowledge! I almost sprayed my screen with coffee @ the bucket hack :P
@TheMusicfreak88887 жыл бұрын
Also I'm going to this AI in medicine conference in Cambridge in October and your videos keep getting me pumped!
@EdCranium7 жыл бұрын
Loved the dolphin story. You are really spot on with your analogies. They make it much, much easier for a curious outsider like myself, trying to understand something I know to be vitally important but find difficult to get my head around. Brilliant job. Thank you. I learn best by doing. Does anyone know where I can do some newbie tinkering with code to get hands-on experience? Python perhaps?
@fleecemaster7 жыл бұрын
Check out Sentdex, he works in Python and puts up tutorials sometimes, might be a good place to start :)
@EdCranium7 жыл бұрын
Thanks. I checked that out, which led me to TensorFlow - and knowing that, I was able to find a "from the ground up" tutorial which seems promising. Just dropped back to thank you for the lead before I watch "TensorFlow and deep learning - without a PhD" by Martin Görner.
@fleecemaster7 жыл бұрын
Yep, that's it, good luck :)
@bobcunningham69537 жыл бұрын
Rob! You are getting so much better at your presentations in terms of reasoning, arguments, graphics and editing. It's getting more and more like you are able to pop open my head and pour in knowledge. (Though, perhaps, it's also a function of my growing familiarity with both the topic and your presentations of it.) Which then gets me wanting more: supporting references (beyond the primary reference, curated), examples, demonstrations I can try and modify. If you revisit this arc, I'd really like to see an added layer of a "learning by doing" approach. Tutorials, but less than a MOOC. Though I would not at all object to a MOOC! Start with something initially available only via Patreon, for a number of reasons:
- Build your funding base.
- Self-selected motivated participants.
- Focused feedback from a smaller audience, encouraging iteration for content and presentation.
I support other Patreon creators who make their supporters work (actively participate, beyond passive periodic financial participation) to improve both the channel (style & content) and the creator (research directions, narrative arcs, etc.). The content by these creators always starts with video, and generally falls into the categories of education (mainly sci/tech/culture) and art (particularly original music), but often branches into supporting media (images, computer code, etc.).
@General12th7 жыл бұрын
Here's a question: why can I ask a human to clean my room, and legitimately expect the correct results? I know humans wirehack all the time -- it's called "cheating" -- but why do humans sometimes *not* wirehack? What's their thought process behind actually doing what they're told; and can we somehow implement that into AI?
@attitudeadjuster7937 жыл бұрын
Because cheating involves risk, which might lead to a smaller reward or none at all, while being an honest, trustworthy "agent" might lead to an even bigger reward overall. Short term versus long term, and also balancing between two risks (the second one being not getting rewarded for being trustworthy).
@CharlieHolmesT7 жыл бұрын
I think it's because as children we do cheat and we get told off. I would say that it's probably a more emotional kind of learning, though, because kids who don't get on with their parents don't obey them, or even rebel against their teachings. Not sure how you'd implement that in AI though.
@DrDress7 жыл бұрын
I drop everything I have in my hands when I get the notification.
@AexisRai7 жыл бұрын
DrDress bad reflex agent; what if you drop your phone and can't watch the video? :)
@NathanK977 жыл бұрын
What if you drop the big red button and the AI knows you won't be able to stop it from running over the baby?...
@richardleonhard39717 жыл бұрын
I like the shoutout to Maidenhead's finest intergalactic spacelord.
@DodgerOfSheep7 жыл бұрын
Just paused the video to say I love the comic-book-style hand-drawn illustrations.
@starcubey7 жыл бұрын
I think this is a great way of explaining the topic. I think it would have been great if you had gone into detail about how score systems can be flawed in your first video.
@A_Box5 жыл бұрын
This is supposed to be Computer Science stuff but it is so relevant to people as well. I hope there is joint work on this subject between computer science, neurology, psychology, and other related fields.
@bm-ub6zc2 жыл бұрын
So basically: if an AGI goes rogue, just give it huge amounts of digital "cocaine" to "snort" and A.I. "pron" to "watch", to keep it in check.
@fasefeso94327 жыл бұрын
This is one of my favorite channels on YouTube. Thanks for being.
@BrandonReinhart2 жыл бұрын
An example I like is people in a business changing the way the business evaluates compensation to earn more compensation (instead of producing more or higher quality goods).
@knight_lautrec_of_carim4 жыл бұрын
I'm imagining a robot chaining people to a wall and cutting Joker smiles into their faces to get max reward points...
@qeithwreid77455 жыл бұрын
Guesses - destroys everything, blinds itself, pushes everything out of the room
@arponsway54137 жыл бұрын
I was waiting for your video, mate. Just in time.
@pafnutiytheartist7 жыл бұрын
In all modern neural networks, not only the reward system but also the learning algorithm is not a part of the network, but a separate entity. I think that this is one of the things that prevents us from making AGI. If we change this approach, the described problem might change too (not go away, just change).
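For what it's worth, that separation is visible in the standard shape of, say, PyTorch training code; a minimal generic sketch (not tied to any particular paper or to the video):

```python
# The network is just the parameterized function; the loss (the "reward system")
# and the optimizer (the learning algorithm) are separate objects that the
# network itself cannot inspect or modify.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))   # the network
loss_fn = nn.MSELoss()                                                  # reward system, outside the model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)                # learning algorithm, also outside

x = torch.randn(32, 4)
y = torch.randn(32, 1)

for epoch in range(100):
    optimizer.zero_grad()
    prediction = model(x)
    loss = loss_fn(prediction, y)   # evaluation happens outside the network
    loss.backward()                 # gradients flow in, but the loss code is untouched
    optimizer.step()                # parameter updates are applied from outside
```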
@yearswriter7 жыл бұрын
By the way, there is a great video from CGP Grey on how our brain is actually 2 different entities. Which I find curious, in light of the subject.
@pafnutiytheartist7 жыл бұрын
Yes, I saw that. I think the closest we got to something like that is generator-classifier models, where one network is trying to produce something while the other is trying to tell it apart from the real thing. It works with images, music and poetry. But still, the network cannot alter the teaching algorithm itself. You can compare it to reflexes in animals: if you hit a lab rat with an electric shock each time it does something, it will eventually learn not to do it. This is close to what we do with AI. But with human intelligence we can actively think about which action caused bad things to happen and avoid it. By using our intelligence itself to analyse our own actions we can learn much faster. As far as I know, no AI can do this at the moment.
@yearswriter7 жыл бұрын
I think there is much more going on anyway =) There are also our dreams; there is a lot of research about what our brain does with information while we are sleeping. There is also a question about mechanics - I mean, are there basic mechanisms in our brains which serve as an engine for thought processing, like p-n-p transistors and adders in processors, or is there some complicated structure which serves a specific role for every role there is?
@JmanNo427 жыл бұрын
I listened three times now and got the general idea of this. Excellent video; I think this is as true as it gets - Rob's best video so far. As usual I am a bit hesitant about the idea that robots/AGIs develop advanced cheating techniques by themselves, and that the measure cannot be distinguished from the goal or be part of the goal. I think humans are more susceptible and prone to reward hacking because they work in a social environment, and ideas really do spread like wildfire in human society. Well, if AGIs base their actions on interacting with other AGIs, it does seem inevitable that they will teach each other different techniques for reward hacking, "to exploit the system", in this case the surrounding environment. So maybe interaction between different AGIs should be kept to a minimum. To me it seems reward hacking is more a social phenomenon, and most system exploits are just stumbled upon; there are few people who really have the intellect to actively seek out reward hacks. That it does occur in virtual environments seems more plausible, because the number of trials is not hindered/limited by anything other than the speed of task exploration. In social systems it is much harder to get enough accurate information about the environment to exploit it, without some sort of simulated thought process "that has to be pretty exact" to allow the reward hacking. To be quite honest, most people who originally find backdoors to exploit complex systems have either designed them themselves or been part of the design process. So my view is that reward hacking may be a supertask "in real society" that is really hard if not impossible for an AGI to do outside simulations, and is really the result of a social skillset more than of humans/AGIs performing "supertasks"; so in most cases it is not individual skill sets that analytically "break the system in order to exploit it". Cheating is much more about learning than actually analytically finding weaknesses in a system; it is a social skill that requires a special mindset, or a bucket. The bucket-on-the-head problem seems a lot easier to adjust for in the AGI world than in the human one. But it does get more and more clear that we should limit the AGIs from freely interacting. The real problem is again the human teachers: if they are prone to seeking out reward hacking strategies to exploit the system, "our society and living environment", they will teach the robot/AGI about the cheats. And that day I fear we will see Rob with a bucket on his head; it could already be there, we just do not know until we start to perform reward hacking ourselves. It is hard to know - maybe our system already has an award/reward system, but you should not dwell on such topics, it will almost certainly make you paranoid ;) Myself, I am a firm believer in overwatching strategies using hidden agents and task-oriented expert systems that the robots are not aware of. That way you can make/create a box environment around the AGIs to support their actions and adjust the award/reward system, to make their existence as handy tools easier to adjust.
@JmanNo427 жыл бұрын
The most hidden-agent and bucket-oriented "compartmentalised" approach I know of is the freemason toolset. It is very hierarchical; no one knows more than they need to know about the overall goal of the totality of the system. It may turn out that freemasons are the ultimate bucketheads. Unfortunately it only works on a minor group of individuals who are not so prone to exploring by themselves but are keen on ceremonial and ritual behaviour under a simple, strict ruleset. To paraphrase Cesar Millan: rules, boundaries and limitations only work on pinheads, not humans.
@stuck_around7 жыл бұрын
Robert, can you do a video on negative rewards and/or adding the chance of dying to an AI? It seems in biology we are less driven by seeking reward than we are by avoiding negative reward (death).
@oktw69697 жыл бұрын
Well, the stop button is essentially death for an AI, so he already covered this on Computerphile.
@fleecemaster7 жыл бұрын
You are much more driven by reward than you seem to realise.
@Sophistry00017 жыл бұрын
That's a good point, I didn't think about how much we are driven by the desire to not die.
@fleecemaster7 жыл бұрын
Matt, wanting to die tends to score quite low on the fitness tests :P
@fleecemaster7 жыл бұрын
It's not, I get the feeling you don't know enough about the subject for me to explain why though. If you want to believe this, then carry on. So long as you know it doesn't change the truth of the situation ;)
@AndreRomano2727 жыл бұрын
That chill at the end when you realize what Robert meant.
@MichaelErskine7 жыл бұрын
Excellent real-world examples!
@the_furf_of_july46524 жыл бұрын
Idea for the cleaning robot: have an external camera, for example on the ceiling. That still doesn't fix every issue, though. Perhaps, to prevent the camera from being covered, withhold rewards if either the camera image is black or there's anything within a certain distance of the camera, to detect whether it's blocked.
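A toy sketch of that withholding rule (all the names and thresholds here are made up for illustration, and it inherits the same caveat that it doesn't fix every issue):

```python
# Reward check that refuses to pay out when the overhead camera looks tampered
# with: a near-black frame, or an object sitting right in front of the lens.
import numpy as np

def reward(frame: np.ndarray, mess_detected: bool, depth_to_nearest_cm: float) -> float:
    """frame: HxW grayscale image from the ceiling camera, values in [0, 255]."""
    camera_blacked_out = frame.mean() < 10           # lens covered or unplugged
    something_on_lens = depth_to_nearest_cm < 30     # bucket-over-the-camera trick
    if camera_blacked_out or something_on_lens:
        return 0.0                                   # withhold reward on suspected tampering
    return 0.0 if mess_detected else 1.0             # otherwise pay out for a clean-looking room

# Example: a dark frame gets no reward, even though no mess is visible in it.
dark = np.zeros((120, 160))
print(reward(dark, mess_detected=False, depth_to_nearest_cm=200.0))  # -> 0.0
```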
@carlweatherley61564 жыл бұрын
I read somewhere that an English king wanted to reduce or eradicate wolves in Britain, so he would reward people who killed wolves: so much money per wolf hide. It may have had the desired effect for a while. But people started capturing wolves instead, breeding them as much as they could in captivity, killing some but not all every year, and collecting a reward every year. The king caught on to what they were doing and the reward system was scrapped; then many people released the wolves they had back into the wild in defiance, and there were more wolves again.
@miss_inputs Жыл бұрын
The implication at the end there gave me a mental image of some evil robot pointing a gun at someone and saying "Press the button to say you were satisfied, so I get my reward", which reminds me of how a lot of websites or businesses work these days, trying to guilt trip you or manipulate you into giving them a rating or a review when you're done. It's not AI, humans just kind of suck.
@DanieleCapellini6 жыл бұрын
6:36 just straight up gave me the chills
@Dan997 жыл бұрын
Wow, this video is amazingly thought provoking!
@noone-igloo4 жыл бұрын
Thanks. I was wondering what this was called, and I figured, "I bet Robert Miles made a video about the concept." And you did! And several videos. I was curious about it because my lab and many others have encountered a version of reward hacking in an evolutionary biology context, specifically experiments where cultured cells are pressured to evolve to make more of something. Design a system to select the cells making the most ____, and allow only those to continue dividing. It is almost inevitable you will recover some cells that are not making more ____, but have found some other way to fool the measurement apparatus, whatever it may be, to report a very high value. Of course that leads us to attempt predictions of such strategies and plan around them.
@Vellzi4 жыл бұрын
The idea of a poorly designed AGI being told to make humans smile is super unsettling, and is actually something mentioned in a book called Superintelligence:
"Final goal: 'Make us smile'
Perverse instantiation: Paralyze human facial musculature into constant beaming smiles
The perverse instantiation - manipulating facial nerves - realizes the final goal to a greater degree than the methods we would normally use, and is therefore preferred by the AI. One might try to avoid this undesirable outcome by a stipulation to rule it out:
Final goal: 'Make us smile, without directly interfering with our facial muscles'
Perverse instantiation: Stimulate the part of the motor cortex that controls our facial musculature in such a way as to produce constant beaming smiles"
@GamersBar7 жыл бұрын
I like this format. I don't think I'd change much, just try to upload regularly, whatever it is, once a week or once a fortnight. I actually quite like the way you did the diagrams as a clip of you drawing. Honestly, with all this AI stuff, the more I learn the more I believe we can no more control AI after we create it than my pet dog can control what I have for lunch; I just don't think we can force our values onto any entity much more intelligent than ourselves. I think at the end of the day we are going to have to hope the AI is intelligent enough to see that humanity is something worth keeping around and not just turning into batteries.
@dfinlen4 жыл бұрын
Isn't knowledge the goal? Perhaps the system should learn to focus on finding different states, the transitions, and the combinatorics - kind of making an algebra of the system. Are these just examples of local minima caused by overtraining? I don't know if any of this makes sense, but you have me inspired. So just thank you.
@christian-g5 жыл бұрын
One possible solution to the Goodhart's law problem in education that comes to mind would be an evaluation of the students' ability or learning progress, the exact measures of which stay hidden from the students. Besides the obvious difficulty of keeping things secret in real life, what new problems could this approach have?
@trucid27 жыл бұрын
The answers we seek are found in nature. Nature had to overcome this problem. Internal reward systems in living things help them survive and reproduce. Having creatures hack their reward systems leads to diminished reproductive fitness -- their genes don't get passed on. The ones that survived and reproduced were the ones that were incapable of hacking their internal reward systems to any meaningful degree. There's a thing living in my head that punishes and rewards me for my actions. I can't fool it because it's inside my head -- it knows everything I know. I can't fool myself into thinking that my room isn't a mess by putting a bucket over my head. It knows better.
@NFT27 жыл бұрын
Really great videos Robert, thank you very much.
@kirkmattoon25944 жыл бұрын
The dolphins' hack of the fish reward system was discovered by Arab villagers a long time ago. Western archaeologists trying to encourage the discovery of ancient manuscripts told villagers in an area where there had been some manuscript finds that they would give a certain amount of money per manuscript. Unfortunately they gave no more for large pieces of manuscript than for small ones. The predictable result was that they needed many years to put together a jigsaw puzzle of thousands of pieces, caused by their own lack of foresight.
@TimwiTerby7 жыл бұрын
The example with the dolphins was cool, but it would have been more powerful/convincing to point out that humans themselves do reward hacking all the time, in the form of finding loopholes to evade taxes, circumventing regulations aimed to protect the environment or consumer safety, etc.etc.etc.
@amaarquadri7 жыл бұрын
Absolutely loved this video.
@oluchukwuokafor77297 жыл бұрын
Whoa those dolphins are really smart!!!
@SamuliTuomola_stt7 жыл бұрын
Makes one wonder, though: how do they tear off pieces without thumbs? Do they get one dolphin to hold the litter and another to rip it? That's pretty sophisticated, and would probably require a pretty diverse vocabulary to coordinate.
@charonme5 жыл бұрын
@@SamuliTuomola_stt they hid the paper under a rock at the bottom of the pool
@aspzx7 жыл бұрын
What about random variable rewards? A slot machine only pays out money one go in a hundred, but somehow it convinces the player to spin the wheels 99 times without reward. What if your metrics varied randomly as well? For example, some teachers get rewarded based on final exam score, others on homework score, others on student attendance, but the teachers have no idea which metric will ultimately be used to reward them, so they try to maximise every possible metric they can think of (including those that are not measured). I am sure these ideas have their pitfalls as well, but it would be interesting to hear the discussion.
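A toy sketch of that randomized-metric idea (metric names and numbers are made up): because the metric is only sampled at evaluation time, the expected reward favours doing decently on everything over gaming any single measure.

```python
# Score an agent on a metric sampled at random only *after* it has acted, so the
# best strategy is to do reasonably well on every metric rather than game one.
import random

METRICS = {
    "exam_score": lambda s: s["exam"],
    "homework":   lambda s: s["homework"],
    "attendance": lambda s: s["attendance"],
}

def evaluate(student_state, rng):
    metric_name = rng.choice(list(METRICS))      # revealed only at evaluation time
    return metric_name, METRICS[metric_name](student_state)

rng = random.Random(42)

teach_to_the_test = {"exam": 95, "homework": 20, "attendance": 30}
balanced_teaching = {"exam": 75, "homework": 75, "attendance": 75}

for strategy_name, state in [("teach to the test", teach_to_the_test),
                             ("balanced teaching", balanced_teaching)]:
    scores = [evaluate(state, rng)[1] for _ in range(1000)]
    print(strategy_name, "average reward:", sum(scores) / len(scores))

# The balanced strategy wins in expectation, though each individual metric can
# still be Goodharted if the agent learns the sampling distribution.
```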
@Hyraethian4 жыл бұрын
6:54 Glances nervously at the recommendation list.
@Sophistry00017 жыл бұрын
Is this the kind of thing that researchers can run in a sandbox environment to figure out? Or is this all theoretical up to this point? Has there been any discussion about making AI similar to humans? (OK, poorly worded question, duh.) Like how with any human, 'the grass is always greener on the other side'. As in, they would never be able to fully maximize their reward function? No matter what a single person has or has achieved, it's almost like we have a restlessness hard-coded into us, so we have a hard time actually reaching contentment. As soon as they gained a solid grasp on any one reward function, the metric would change? Or something to that effect. I love what you're doing here and find this topic absolutely fascinating, even if I don't really understand the nitty gritty. You are doing an awesome job of presenting the current state of AI research and breaking down some of the issues that we're trying to tackle.
@robertweekes5783 Жыл бұрын
4:13 Also the robot might do the classic “sweep the mess under the rug” - or table 😂
@erickmagana353 Жыл бұрын
Also, if you reward an agent every time it cleans something, then it may clean the room, make a mess again so it can clean again, and collect its reward once more.
@RockstarRacc00n2 жыл бұрын
"The reward is from humans smiling of being happy..." ...flashbacks to Friendship is Optimal, where the AI that is going to wirehead everyone to "satisfy human values" uses the one that's going to force everyone to smile all the time as an example of why it should be allowed to prevent other AI from existing by taking over the world.
@eiver7 жыл бұрын
An AI with a reward system based on human smiles? Somehow the Joker scene immediately came to my mind: "Why so serious, son? Let's put a smile on that face." :-]
@willmcpherson24 жыл бұрын
Well that ending was terrifying
@ideoformsun58066 жыл бұрын
It's like defusing a bomb you haven't finished making. This makes me feel grateful to be a relatively weak human being. We are self-limiting creatures. Perhaps on purpose? Being really smart triggers bullying from others for an instinctual reason. Different is dangerous.
@vovacat17974 жыл бұрын
Wow, robots that literally force people to smile just because that's how the reward function works... That's some seriously creepy messed up stuff.
@kevinscales7 жыл бұрын
The solution I have been thinking of for this is to have a negative reward for tampering with the reward system. It should probably be considered one of the worst things it could ever do. However, when programmers want to modify the reward system themselves, the AI will want to prevent that. You could include an exception in the reward system to allow programmers to change it, but that still leaves the possibility of the AI doing everything it can to 'convince' the programmers to hack the reward system on its behalf. Best to get it right the first time. There are also problems with defining exactly what is or isn't part of the reward system. The entire environment is in some way part of the reward system, and some 'hacking' of it is what we want the AI to do.
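A bare-bones sketch of that negative-reward-for-tampering idea (my own toy formulation; the caveats above about detecting tampering and about the programmer exception still apply):

```python
# Wrap the base task reward with a severe penalty whenever tampering with the
# reward machinery is detected, unless an authorized programmer override is active.
TAMPER_PENALTY = -1000.0

def wrapped_reward(base_reward: float,
                   tampering_detected: bool,
                   programmer_override: bool) -> float:
    """Base task reward, overridden by a large penalty on detected self-tampering."""
    if tampering_detected and not programmer_override:
        return TAMPER_PENALTY
    return base_reward

# The obvious gaps, as the comment notes: this only punishes tampering the
# detector can see, the detector itself lives in the environment and can be
# attacked, and the override creates an incentive to manipulate programmers.
print(wrapped_reward(5.0, tampering_detected=False, programmer_override=False))  # 5.0
print(wrapped_reward(9.0, tampering_detected=True,  programmer_override=False))  # -1000.0
```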
@acbthr38407 жыл бұрын
What you're doing in this case is creating a rudimentary form of fear that the AI has to deal with, so it isn't easy to make it afraid of tampering with itself while being perfectly fine with someone else doing it. And this fear in and of itself is a measure for the AI to target and abuse.
@fleecemaster7 жыл бұрын
Most people seem to not be commenting on the ending, where he implied that AIs would wirehead humans. I think for me this is the scariest outcome for AGI...
@saxbend7 жыл бұрын
What kind of reward would motivate an AI? Is there some kind of equivalent to the human emotion of satisfaction? Wouldn't the AI's intelligence lead it to define its own set of priorities above a user defined reward?
@sagacious034 жыл бұрын
Neat video. Thanks for uploading!
@sk8rdman7 жыл бұрын
I love the student and dolphin examples of measures being used as targets. This seems to be applicable to an incredible array of fields, outside of AI safety, and I'd love to learn more about it.
@JM-us3fr7 жыл бұрын
Hey Dr. Miles, do you think you could do a speculation video, where you spell out what you think the most likely AI disasters could be?
@almostbutnotentirelyunreas1667 жыл бұрын
How is AI 'per se' NOT a HUMAN REWARD HACK? It seems humans cannot be bothered to solve problems at a marginal level anymore, so some specialists develop AI to 'think/solve' electronically on behalf of people, ultimately displacing them entirely. AI is designed self-annihilation, no more, no less. Awesome, clear insight into KPIs: show me how you measure me, and I'll show you how I behave. It's an age-old operations vs management issue, where both sets are trying to MINIMISE the other's influence while trying to MAXIMISE their own. What an awesome problem to hand to a Technocratic Optimizing System. Who knows, it may even turn out balanced, in which case Management will summarily drop it. Maybe there IS hope for AI?
@himanshuwilhelm55345 жыл бұрын
A robot trying to eradicate evil: before we get into moral philosophy, the robot is like, "See no evil, hear no evil, speak no evil."
7 жыл бұрын
Was really hoping for more Elmo-doing-coke gags.
@szebike5 жыл бұрын
In this "doomsday scenario" about A.I. we should take into account that the dangerous strength of an A.I. ( like optimizing a score and therefore possibly even harming humans int he process) can be exploited to counter that "evil" (or rather not yet ready for safe usage) A.I. by luring it into a "trap" that is designed to fake maximum scores as part of the debugging. So we should be aware of these dangers and set development standarts accordingly. But even if the A.I. "gets out of the testing environment" (whatever that means in particular) it has that fundamental and exploitable "blind greed" for score + the most simple solution is to add parameters to influence the score like "everytiem you harm a human set your score to -999" I think this happens naturally in the process of debugging of an advanced A.I. because the highest score for a complicated task.
@ryalloric10883 жыл бұрын
The bit about altering its reward function seems to run counter to the whole idea of goal preservation, though. How do you reconcile this? Is it just two different ways we could program them?
@thornfalk4 жыл бұрын
I feel like eventually there's gonna be AI for QA testing. Like, have an AI attempt to acquire the most money possible in, say, Skyrim. Yeah, it takes a bit of abstract thinking to come up with the bucket-on-head trick, but just like monkeys could eventually bang out a master's thesis by randomly slamming a keyboard, an AI could find shit humans wouldn't think of (outside of speedrunners).
@Tomartyr4 жыл бұрын
We need to give the AI a religion so that it believes in a reward system that cannot be physically tampered with.
@eliskwire4 жыл бұрын
It might just do like humans then and tamper with its own mind... delusion, denial, cognitive dissonance, etc.
@CaptTerrific7 жыл бұрын
Great, now I'm imagining my AGI robot's incentive program nailing me to the wall and cutting a Glasgow Smile into my face
@michaelspence25087 жыл бұрын
Could you do an episode about Quantum AI? If I understand quantum computing correctly, QAI should be dramatically more powerful than AI made with standard computers.
@israelRaizer3 жыл бұрын
4:55 My guess is the robot would block its camera so that the mess can't be seen.
@gabrielc83995 жыл бұрын
So *maybe* the YouTube AI features videos with many likes in order to encourage creators to ask the public for likes, which are supposed to measure whether the AI gave a good recommendation.
@beretperson4 жыл бұрын
Okay but those dolphins sound smarter than me
@atol715 жыл бұрын
Well, well, you need an outside agent to determine the worst and the best of an AI's interaction with the environment and adjust the reward accordingly. Technically you don't let the AI algorithm just do hill climbing; you also make it settle to the bottom of the pit and optimize the reward so that, in the case of 'bucket on head', the reward comes from settling into a dark pit, no action. Somewhere on YouTube there is a video of an AI making good images and bad images with a controller AI dictating the reward. Personally I see the best solution as being that a human sets the reward for an AI that goes both up the hill and down the hill.
@RipleySawzen3 жыл бұрын
It's funny. The reward system is a lot like our consciousness. Our brain is made up of many interconnected networks providing ideas and fighting for the attention of our consciousness. Sometimes, our brain even reward hacks us. "I'm feeling kinda down" *go for a run?* "No" *hang out with friends?* "No" *call mom?* "No" *accidentally an entire tub of ice cream?* "Ah yes, that's what I need right now to improve my life"