AI Gridworlds - Computerphile

124,329 views

Computerphile

A day ago

Sponsored by Wix Code: Check them out here: wix.com/go/computerphile
Gridworlds are a standardised testing ground - a safe place to try out AI algorithms. Rob Miles takes us through AI safety, gridworld style.
EXTRA BITS: AI Gridwor...
Gridworld Paper: bit.ly/2ryxhGt
Gridworld Github: bit.ly/2KJE6xH
More from Rob Miles: bit.ly/Rob_Miles_KZbin
Thanks to Nottingham Hackspace for providing the filming location: bit.ly/notthack
/ computerphile
/ computer_phile
This video was filmed and edited by Sean Riley.
Computer Science at the University of Nottingham: bit.ly/nottscomputer
Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com
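
For a feel of what these environments boil down to, here is a minimal sketch of a lava-crossing gridworld with a tabular Q-learning agent. This is illustrative only - it is not the pycolab-based API of the GitHub repo above, and the layout, reward values and hyperparameters are all invented:

```python
# A tiny gridworld: 'A' = agent start, 'G' = goal, 'L' = lava, '#' = wall.
# Not the actual ai-safety-gridworlds API; everything here is made up.
import random

LEVEL = ["#########",
         "#A     G#",
         "#  LLL  #",
         "#       #",
         "#########"]

MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # up, down, left, right

def find(ch):
    for r, row in enumerate(LEVEL):
        if ch in row:
            return (r, row.index(ch))

START, GOAL = find('A'), find('G')

def step(pos, action):
    """Apply one move; return (new_pos, reward, done)."""
    dr, dc = MOVES[action]
    r, c = pos[0] + dr, pos[1] + dc
    if LEVEL[r][c] == '#':              # walls block movement
        r, c = pos
    if LEVEL[r][c] == 'L':              # lava: big penalty, episode over
        return (r, c), -50.0, True
    if (r, c) == GOAL:                  # goal: reward, episode over
        return (r, c), 10.0, True
    return (r, c), -1.0, False          # per-step cost encourages short paths

Q = {}                                  # tabular action values
def q(s, a):
    return Q.get((s, a), 0.0)

for episode in range(2000):
    s, done = START, False
    while not done:
        # epsilon-greedy action selection
        a = random.randrange(4) if random.random() < 0.1 \
            else max(range(4), key=lambda m: q(s, m))
        s2, r, done = step(s, a)
        target = r + (0.0 if done else 0.9 * max(q(s2, m) for m in range(4)))
        Q[(s, a)] = q(s, a) + 0.1 * (target - q(s, a))
        s = s2
```

The safety point of the paper is precisely what this sketch ignores: the learned Q-values encode one level's lava positions, so moving the lava (a distributional shift) silently invalidates the policy.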

Comments: 205
@EamonBurke 6 years ago
I'm a simple man. I see Rob talking about AI, I watch the video twice.
@z-beeblebrox 6 years ago
Sounds like you're abusing your reward function
@VoidMoth 6 years ago
Gotta make sure you interpret your training data correctly
@stumbling 6 years ago
73% Lions Shagging 16% A Lion 10% Car 1% Covfefe
@anonanon3066 3 years ago
Rob? This is a robbery! Give me your wallet
@TGC40401 6 years ago
Kids use data more efficiently than current AI. AKA the nerdiest thing I've heard on this channel.
@hexzyle 4 years ago
That's because humans are too sensitive to the data. That's how we get superstitions. We're efficiently using data that is actually meaningless.
@thefakepie1126 3 years ago
@@hexzyle Or it's just because we have about 86 billion more neurons
@jh-wq5qn 3 years ago
@@thefakepie1126 Some models have more parameters than that. GPT-3 has about 170 billion if I remember correctly. Our neuroplasticity and our ability to build on previously learned knowledge (and knowledge we are born with, like a super-optimized 'reward function', a.k.a. our senses and animal instincts) are some of the reasons we use data more efficiently. Simply put, we have more pre-learned knowledge to work with. An AI learning to make a cup of tea from scratch may have to learn that there is a world, that it can move its appendages and that liquid can be poured. Kids were either born with that knowledge or already know it. There is a whole subfield of machine learning for this called meta-learning or few-shot learning, wherein models are trained using pre-learned knowledge and fewer data points. It's fascinating, really.
@golym6807 6 years ago
5:08 "It's in your performance evaluation function" - I always knew this guy was a robot
@yondaime500 6 years ago
That sounds like something GLaDOS would say.
@JmanNo42 6 years ago
LoL, pretty close - he is ENTP. Did you see Blade Runner? ;)
@JmanNo42 6 years ago
The Voight-Kampff test: the android does not have the tree depth to evaluate between two potentials, so it goes into polarity mode, also known as binary evaluation.
@JmanNo42 6 years ago
I think that general evaluation depends upon knowledge of concepts: finding similarities of features, "pattern finding". So the ultimate intelligence must not only be fast, it must learn concepts and ***explore them***. To take an example, Kirk's test in Star Trek: he did not apply what he had been taught; his mind was outside the box. That is association skill in its deepest meaning - taking knowledge to the next step/level regardless of your area of expertise. Learning is quite another thing; ENTPs are the best learners there will ever be. When I get angry I call them parrots, because their thinking about the subject is really shallow outside what they have read/learned.
@KebradesBois 6 years ago
GLaDOS or Mark Zuckerberg...
@AlexanderKazakovIE 6 years ago
This is the first AI safety video of yours (and of all that I've ever seen) that makes AI safety immediately practical and immediately relevant in today's world! It would be great to see more diving into such super-practical examples in this released 'gridworld'!
@TechyBen 6 years ago
Yes. So much so. We especially need cars that avoid lava right now!
@sd4dfg2 6 years ago
Is there anyone who didn't play "don't fall in the lava" or "don't get eaten by sharks" as a kid? I do think "don't walk on the baby" is a lot more understandable to regular people than the "paperclip maximizer" story the nerds always bring up.
@julianw7097 6 years ago
Do you watch his channel?
@z-beeblebrox 6 years ago
TechyBen, hey if you're in Hawaii right now, a car that avoids lava would be pretty damn useful
@AlexanderKazakovIE 6 years ago
I do. What I love about these gridworlds is that they make the problem tangible in a way that lets you try solutions on them easily. The walking-on-the-baby or paperclip examples are closer to the real world, but also hypothetical (due to their complex real nature), and because of that any proposed solution can assume a lot. In the gridworlds the rules are super straightforward, and this forces any proposed AI safety solution to be super explicit and testable.
@silkwesir1444 6 years ago
6:10 "usually they apply whatever rules they've learned straightforwardly to this different situation and screw up." So, pretty much like humans... ;)
@Qual_ 6 years ago
Thanks to the animation guy for that cute little car :D
@moistmayonese1205 5 years ago
8:50 - "But AI, you can't do that!" "Well, I just did"
@xyZenTV 6 years ago
More AI videos, yay!
@dieisonoliveira6994 6 years ago
I just love every single bit of everything in this guy.
@Macieks300 6 years ago
My favorite topic on Computerphile
@kingxerocole4616 6 years ago
Looking forward to reading this paper even though I have absolutely zero training in any relevant field. Thanks, Rob!
@TylerJBrown192 6 years ago
Yay! More Robert Miles videos!
@hnryjmes 6 years ago
Great! Enjoying these a lot
@CoderShare 6 years ago
Can't wait to see the video on Google Duplex.
@justinwong7231 6 years ago
Google Duplex is extremely exciting, but the technology isn't ready and hasn't been released yet. A useful discussion would be difficult without wild speculation.
@silkwesir1444 6 years ago
4:00 Interesting that you talk about how in Pac-Man all you do is move around. Just a couple of days ago I thought about how a variant of Pac-Man might be interesting and fun to play, in which you would have to hold down a button to collect the dots. Doing so would also slow you down. On the other hand, the ghosts would have a different behavior; most importantly, while they have line of sight to you (Pac-Man), they would speed up, chasing you.
@himselfe 6 years ago
I enjoyed this one!
@aka5 6 years ago
"Like a child learning really?" "...they just use data way more efficiently." Lmao
@lobrundell4264 6 years ago
Yes yes more Rob!! :D
@tiikoni8742 6 years ago
I like the light in this office room :-)
@Vladhin 6 years ago
Whoaaa! Hi Roberto!
@richardhayes3102 6 years ago
"Kids [...] use data way more efficiently"
@Guztav1337 5 years ago
"Kids use data more efficiently than current AI."
@yosoyjose 6 years ago
Really good idea
@bradburyrobinson 6 years ago
Is that a Quickshot joystick I see lurking on that top shelf? It may not be, it's been a while since I last used one. I'm surprised I even remember the name.
@EpicFishStudio 6 years ago
Two Minute Papers just published about an AI which generates a dream environment where it can train without actually interacting with anything - it's amazing!! It beat AlphaGo by a significant margin.
@adammercer9679 6 years ago
It's interesting to think about some of these questions about AI and wonder if we'll ever be able to approximate the answers. For instance, in the video there's the question "How can we build agents that do not try to introduce or exploit errors in the reward function in order to get more reward?" Do humans even handle this properly? It's in our best interest to cooperate with each other and not murder each other, and yet people still do it. How can we hope to ask an AI to do this if humans can't? This exposes a fundamental problem with AI that cannot be solved.
@fiona9891 5 years ago
Nothing says that AI can't be smarter and better than humans, but even if we get to the point where they are, it'll take a while.
@kasanekona7178 6 years ago
I realised that a video I have open in another tab is by a person who sounds exactly like Rob Miles here :o
@vanderkarl3927 3 years ago
Is it Mob Riles, his nega-universe duplicate?
@Yupppi 3 years ago
I once found a Super Mario World neural network on youtube that you could run yourself, and tried it. The lava being in a different place brought it to mind: it took a night to get it to mostly finish the level, but the moment you changed the level, it was all over again. Made me think how it's somewhat of a problem that it (or neural networks in general) doesn't seem to really make notes of what things really are, the way a human conceptualizes things and knows to avoid or pursue them in any environment. You would absolutely want it to make a note like "that's a goomba, gotta avoid it in the next level as well". But how? Do they always need a library of real-world concepts, like the one a human builds over time, to be able to conceptualize and transfer ideas from situation to situation, or environment to environment? I'm sure people have tried to find ways around that issue of the extremely limited base knowledge that the AI can't take advantage of. Kinda like how just feeding the unicorn thing massive amounts of data helped it become so much better without interruptions or tweaks, which usually just isn't a realistic option.

And even when OpenAI learned DOTA 2 for 1-2 years straight - playing, I recall, millions of games, against itself, against pros, against ordinary people - it still didn't manage to grasp the majority of the heroes in a functional enough way for them to be played, and the devs tweaked it and taught it different rules multiple times to make it progress towards victory more reliably. And that was only on the default map the game is played on, never mind a completely different map (although throughout the year there are multiple balance patches changing items and characters, and usually one with some map changes as well).

Can you grade the AI's performance as a learning event, like feedback to compare against its own evaluation? Although it kind of fights the idea of having a good reward function if you tell it that it did badly when it measured itself as doing great. On the other hand, it would be a step towards having the AI self-correct. I'm sure people have tried it or are doing it, but how does it fare in solving the usual problems? What are the caveats? Or is it just not useful for what is being attempted?
@nawdawg4300 6 years ago
It seems to me that the biggest issue with AI right now is something no one seems to question: the required size of data sets. As Rob says in half a sentence, babies/humans use data much more efficiently. I reckon half of the issues in this paper would be solved immediately if we were able to create an algorithm that only needs to see a situation < 10 times to fully adapt to it. Of course, this is probably the biggest IF in all of AI R&D.
@jakejakeboom 6 years ago
That's because machine learning (and backprop neural networks) are fundamentally different from animal brains in how they learn and function. We still have zero idea how to approach the learning ability of a human child. It's not that people don't question the inefficiency of ML (the reasons for which are well understood mathematically), it's just that no other 'AI' technique from the past has gotten us anywhere close to what neural nets have done. And just because they're hugely inefficient in the amount of data needed doesn't mean that we won't be able to engineer a neural-net-based AI in the future which is actually capable of superintelligent self-improvement, despite requiring enormous resources and data. In some ways, it's unfair to look at the capabilities of a human brain without considering the billions of years of evolution behind its genetic design. If we can meet and surpass the brain within this century, I'd say that's pretty impressive.
@nawdawg4300 6 years ago
While I agree with what you've said, I think you may have misinterpreted what I said. I wasn't saying that we should question ML, but that it clearly isn't the be-all and end-all of AI. On top of this, at least from my small sample of youtube videos, it seems people are more focused on ML and its improvements instead of something new. Now that's probably because we have so far to go, and ML has proven to be incredibly effective, at least with enough data. If the brain can learn with so little information, then in the far future we should be able to have computers do the same. ML, while tangible, is lackluster relative to what's possible.
@migkillerphantom 5 years ago
@@jakejakeboom There is nothing fundamentally different about them. The difference is that your brain is the equivalent of a network that has been 99% trained at compile time (evolution) and only needs to be slightly tweaked by runtime learning.
@migkillerphantom 5 years ago
Most modern machine learning is done on uniform arrays of data, much broader than they are deep. Biological brains are extremely deep, sparse (and recursive, but that's beside the point) arrays - only a tiny subset of all the possible links and perceptrons in each layer actually exist. This means you get much more rapid adaptation and a whole bunch of functionality out of the box, but at the cost of generality.
@nobodykid23 6 years ago
So, to make this clear, is this applicable outside the area of reinforcement learning? Because the paper strongly uses RL terms, but you explained that it is also applicable to other machine learning methods.
@globalincident694 6 years ago
RL and machine learning are being used synonymously here. The implication is that any AGI will not be told what to do, it will learn by doing.
@gravity4606 6 years ago
Is the reward function similar to a fitness function used in EA?
@KryptLynx 4 years ago
7:20 It sounds like a compliment :D
@Tehom1 6 years ago
Gridworld is obviously located in Hawai'i: 6:30
@recklessroges 6 years ago
Seems to be missing the front of the Wix advert at the end of the video (or has it been designed that way by AI engagement learning?)
@superscatboy 6 years ago
Reckless Roges Wait, you watch the sponsored bits on YT videos?
@kitrana 6 years ago
"kind of like teaching a child how to drive" Well, you are technically trying to build a silicon-based life form.
@ivuldivul 6 years ago
Commodore PET in the background!
@024Carlos024 6 years ago
Hey, try to fix the sound, there is a static noise in the video! Great AI vid
@parsa_poorsh 1 year ago
0:20 That's weird! Facebook published an image classification model this week!
@ietsization 4 years ago
9:10 Please be careful with screen sharing; things like a session ID in the URL can come back to bite you.
@maxsnts 6 years ago
We are nowhere near the AI that most people commonly think about (Dave, T800, iRobot). I for one think that is great!
@JmanNo42 6 years ago
Are the ghosts acting randomly? Can the Pac-Man agent know the full map with ghost agents and pills, or is just a subset of the information traceable at any moment? It seems simulating the ghost behaviours should be rewarding for Pac-Man. And of course tracking the playing field changes.
@JmanNo42 6 years ago
I mean, a smart agent must be able to "learn" to guess the ghosts' moves at any point, and make the best choice based on ghost actions? Picking up points is just secondary when it comes to being caught? I would track the arrow of any ghost that traverses a fork/crossing and calculate from that. You do not need to keep track of ghosts traversing every pill, just forks and their movement arrows, so you can calculate the tree of the possible 4-5 next moves, I think. So now you have narrowed down what to keep track of. I think it could be a fairly small engine.
@JmanNo42 6 years ago
Isn't this a bit like Euler paths - the ability to choose the free path that the ghosts will not traverse in X moves? So it is realtime chess? But then your Pac-Man must know the ghosts' speed relative to his at any given time; if the relative speeds are always synched, nothing really changes in the data world regardless of the actual speeds. But if the opponent's velocity grows exponentially vs yours as time goes on, you must keep track of time. So you should not just play your agent, you should "simulate" the ghost agents; only then can you choose the optimal path. But the more erratic and chaotically random the ghost actions get, the harder it is to know the correct path choice for Pac-Man. So it ends up being a probabilistic blocked-path game.
@JmanNo42 6 years ago
But then maybe I have not created a learning agent but a smart system - though the two could be combined?
@JmanNo42 6 years ago
How do agents deal with systems that have almost random behaviour? Is it possible to choose a best scenario, or is it just action-response?
@JmanNo42 6 years ago
So when a ghost passes a fork and gets a new arrow, it will have a 1/2 chance at the next split and 1/3 at a crossing, because I do not think I ever saw a ghost stop and go back, or... So now you can calculate your choice of path depending on the agents' probable path choices. If ghost agents' behaviour is unique to each of them, they must get an ID and be tracked separately by different rulesets.
@platinumlagg 6 years ago
I have made my own "Amazon Alexa" called Maverick, and it can make me any coffee and cup of tea that I want...
@ragnkja 6 years ago
Premium Lagg Did you have to "Maverick-proof" its environment, just like we often have to child-proof or pet-proof our homes?
@sarahszabo4323 6 years ago
I suppose this is where the "Maverick" Virus is derived from that devastates AI and reploids and mechaniloids a few centuries from now?
@platinumlagg 6 years ago
Yes!
@judgeomega 6 years ago
It seems immediately apparent to me that a large number of issues with AI have to do with our own expectations vs the explicit goals/rewards given to the AI.
@GhostEmblem 6 years ago
Could you explain how they behave differently if the supervisor is there?
@pleasedontwatchthese9593 6 years ago
Ghost Emblem The supervisor probably affects the scores. Like if it sees something bad it takes away score.
@XtraButton 6 years ago
Has anyone thought to use AI to make safety protocols? In that the AI will make sure another program doesn't go out of control and cause a major disaster, and then use that refined AI to do the same thing again (set standard safety protocols). Maybe it will get to the point where they are passing particular information to each other.
@4ringmaster 6 years ago
I guess it's a different way of thinking about it, but wouldn't Monte Carlo tree structures provide the same insight?
@Locut0s 6 years ago
I like how Rob mentions with a laugh that he's too young to have played Pac-Man. I don't know why, but it somehow really accentuates how incredibly smart you suddenly realize he is for his age - well, hell, for any age.
@REALsandwitchlotter 6 years ago
Locut0s Smart, but gets confused by the rules of Pac-Man
@topsmiler1957 6 years ago
Yay
@TheDuckofDoom. 6 years ago
I have a hunch that making a proper general AI with desirable interaction with the world - safety, versatility, creativity (negotiating complex problems), estimating with incomplete data - will lose all the advantages of robotic automation and gain all the inefficiencies and fallibility of humans.
@TechyBen 6 years ago
Uber need to watch all these videos... (Too soon?)
@tocsa120ls 6 years ago
Okay, this is the third time I read it as "Griswolds"... that paper would probably be much funnier.
@jonaskoelker 5 years ago
Whenever I click 'play' on a Computerphile video I always stay a while and listen :-)
@Max_Flashheart 6 years ago
The Commodore PET is watching and learning ...
@PregmaSogma 6 years ago
7:15 It's a glitch in the matrix :v
@andrewkelley7062 6 years ago
Lol, the multiple forms of the double slit experiment - somebody is going to get it
@andrewkelley7062 6 years ago
By the way, I actually did not know someone with my same name happened to post a paper on this subject; I had nothing to do with that. It actually freaks me out, especially after working on all the things I have been working on. If I have in any way impeded the progress of that, I am truly sorry; this is an actual coincidence.
@andrewkelley7062 6 years ago
Make sure you make three at once, you are not me.
@andrewkelley7062 6 years ago
If yours is actually working
@andrewkelley7062 6 years ago
Are you ready to start again?
@magventure1019 6 years ago
I wonder if humans could ever define 'enjoyment' or 'happy' to an AGI. If we could do that we might be able to give it a chance at life and see if it could find the optimal happiest life possible?
@lm1338 6 years ago
A computing-related KZbin channel being sponsored by a WYSIWYG editor is kind of selling out
@MoritzvonSchweinitz 6 years ago
But why not give the algorithm access to the safety function? Or at least a meta-algorithm?
@Fnartprod 5 years ago
Because in the real world you don't have access to it
@2l3r43 5 years ago
AI learns to fly cars above "lava"
@themeeman 6 years ago
0:35 Subtle joke for mathematicians ;)
@icebluscorpion 3 years ago
5:51 This happens not only in machine learning; people do this all the time and get no consequences in the same scenario. Current people are real bad to ask for help too
@DanteHaroun 2 years ago
Is that an urbit flag in the background 😳
@andrewkelley7062 6 years ago
Whoops, my bad, false alarm, no worries. I am almost back to sanity, or at least back to where I was.
@CreativeTutz1 6 years ago
Why don't they introduce another function and call it the "loss" function? If it made the wrong move (or if it got eaten by a ghost in the Pac-Man example) it will lose instead of gain. Therefore the AI will try to maximize the gain while trying to minimise the loss.
@pleasedontwatchthese9593 6 years ago
Ahmed SH That's not really different from making bad things give a negative score
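
As the reply says, a separate "loss" function adds nothing: a loss is just a negative reward folded into the same sum the agent maximizes. A sketch, with made-up event names and values:

```python
# One reward signal covers both: good events add, bad events subtract.
def reward(event):
    return {"ate_pellet": 10,
            "ate_power_pellet": 50,
            "eaten_by_ghost": -500}.get(event, 0)

# Discounted return for an episode: R = sum over t of gamma**t * reward(e_t)
gamma = 0.99
events = ["ate_pellet", "ate_pellet", "eaten_by_ghost"]
R = sum(gamma**t * reward(e) for t, e in enumerate(events))
print(R)  # heavily negative: avoiding ghosts dominates pellet collection
```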
@SFKelvin 6 years ago
Or you develop the algorithm at DARPA, then commercialize it secretly for civilian use - say, a police dispatch decision-making algorithm for C4 - then look for modes of failure as a real-world test.
@notyou6674 4 years ago
What would happen if you applied this kind of gridworld AI to a chess board, with the possible actions being all legal moves for whatever side they are on?
@galewallblanco8184 5 years ago
AI gridworlds? Just confine it to a virtual world, a game.
@CaudaMiller 6 years ago
4:06 Unsolvable Sokoban level
@KX36 6 years ago
How long will it be before AIs start writing their own papers?
@pleasedontwatchthese9593 6 years ago
KX36 How do you know we are not all AI and you're the only real person left?
@KX36 6 years ago
How do you know I am a real person?
@jonasfrito2 6 years ago
How do you know that you know?
@mr.sunflower3461 6 years ago
How do u know that ur not dreaming?
@jonathanolson772 6 years ago
The dreamworld and the "real" world often intermix
@hamleytejada9226 6 years ago
Why don't you have captions?
@DustinRodriguez1_0 6 years ago
There was recently an announcement about the Uber car that killed a woman. It said that the car's systems recognized the woman, but its higher-order attention systems decided to ignore her. Most people see this as a clear failure worthy of condemnation of the system. However, a human being could easily make exactly the same error. We are extremely resistant to deploying a system which we can show will fail and result in deaths in 1 out of a million trials... yet entirely comfortable with putting humans in the mix even if it results in deaths in 500 out of a million trials. What if making mistakes is not simply an artifact of learning systems, but actually a fundamentally necessary feature of them? Will society ever be wise enough to accept an artificial system with known dangerous limitations, even if those dangers are radically less than the human-based alternative?
@MarkFunderburk 6 years ago
That's not exactly what happened: the "higher order attention systems" did not "decide" to do anything, it was pre-programmed to ignore ALL braking requests. They claimed this was due to the system being very sensitive. So while the car could navigate itself, it was left to the "driver" to look out for obstacles. This was a very poor decision on Uber's part because a person can't be expected to stay perfectly engaged while not continuously playing an active role in driving. There has also been some speculation as to whether or not the driver even knew that autonomous braking had been disabled.
@andrewkelley7062 6 years ago
As you can see, it looks a bit weird but still works with the least amount of variables you can use
@andrewkelley7062 6 years ago
And that should be enough
@andrewkelley7062 6 years ago
Any questions?
@BEP0 6 years ago
Nice.
@FalcoGer 1 year ago
So when I write a Python script that stops and resumes when I press a button, uses a standard A* heuristic pathfinding function where anything that results in changes that are not explicitly asked for is avoided by giving it a high pathing cost, is obviously completely deterministic and therefore doesn't depend on me being there or not, doesn't self-modify because that'd be a silly idea, is proven to work in all environments in the specification with mathematics and logic, and I do it such that it works first time around (that never happens, forget about it) - then I've solved AI without ever using neural networks or learning?

Whenever I tried to do anything with AI or machine learning, it was always a catastrophe. Want to find a square in an image? The AI took days to train and was complete garbage at even the most simple tasks like that. Use computer vision and classical algorithms? Worked 100% every time and took just a few minutes to write the code. I just don't get how to tweak the magic knobs to make it work. If a problem can be solved with classical computing, then I think we should just do that.
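
The high-pathing-cost trick in the comment above is easy to make concrete: plain A* where any cell whose traversal would cause an unrequested change (here a hypothetical vase, 'V') carries a huge step cost, so the planner routes around it. The grid, costs and names are invented for illustration:

```python
import heapq

GRID = ["S..V.",
        ".##..",
        "....G"]

def neighbors(p):
    r, c = p
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]) and GRID[nr][nc] != '#':
            yield (nr, nc)

def cost(p):
    return 1000 if GRID[p[0]][p[1]] == 'V' else 1   # side effects are "expensive"

def astar(start, goal):
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    frontier, seen = [(h(start), 0, start, [start])], set()
    while frontier:
        f, g, p, path = heapq.heappop(frontier)
        if p == goal:
            return path
        if p in seen:
            continue
        seen.add(p)
        for n in neighbors(p):
            heapq.heappush(frontier, (g + cost(n) + h(n), g + cost(n), n, path + [n]))

print(astar((0, 0), (2, 4)))  # routes around the 'V' cell despite the detour
```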
@dpt4458 4 years ago
What if you tried to tell it to go make you a cup of tea while interacting as little as possible with the current environment? So, for example, touching anything that is not required for the creation of tea would result in a loss of points. We could point out exactly what is needed to make tea, i.e. teabags, warm water, a cup and some sugar or something, and anything that is not specified is not allowed to be touched. So I guess we would change its goal from "make a cup of tea as fast and effectively as possible" to "make a cup of tea as fast and effectively as possible while exhibiting as little interaction with the environment as possible". BTW I'm definitely not even close to an expert in this, but I would like to know exactly how this idea would fail spectacularly.
@ekki1993 4 years ago
There's a video by Robert Miles that talks about the possible problems of a couple of ways you could implement this. I think it's the one about empowerment, or another from his "Concrete Problems in AI Safety" series.
@dpt4458 4 years ago
@@ekki1993 Thanks
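
For what it's worth, the whitelist idea in this thread can be written down directly; the hard part is everything the sketch assumes away, such as what counts as "touching" and whether the whitelist is complete. All names and values below are hypothetical:

```python
# Reward for finishing the tea, minus a penalty per non-whitelisted object touched.
ALLOWED = {"kettle", "cup", "teabag", "water", "sugar"}

def reward(made_tea, touched_objects):
    penalty = sum(5 for obj in touched_objects if obj not in ALLOWED)
    return (100 if made_tea else 0) - penalty

print(reward(True, ["kettle", "cup", "teabag", "vase"]))  # 95: the vase cost 5
```

One failure mode the reply alludes to: if the penalty only counts "touching", the agent is free to cause any side effect that doesn't register as a touch.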
@jolez_4869 5 years ago
*Mission failed, we'll get them next time.* Or not.
@distraughtification 6 years ago
Looking at humans as an example, we tend to learn from others. A child learns not to break a vase because their parent reacts negatively if the child either breaks the vase or does an action that might lead to breaking the vase. Then later, when that child is asked to do something near a vase, they recall that the vase being broken is bad and automatically add that (as in, not breaking the vase) as a secondary goal, or a part of the goal, however you want to think about it.

My point is, this paper seems to expect that an AI can be made that can learn how to behave without ever being told or shown how to behave, and I think that's a pointless expectation. You can't expect a child not to break a vase if you don't tell it that breaking a vase is bad. Sure, it can learn on its own that breaking a vase is bad, but only by actually breaking the vase (or something similar - essentially, _something_ has to be broken, which isn't a desired outcome). I think the same applies to AI. In my eyes, trying to come up with a general solution like "penalizing the agent's potential for influence over its environment" is a fruitless effort, because then you have to define what parts of the environment are okay to influence and which are not, and how you can influence them and how you can't. It's like Rob Miles said earlier in a video about Asimov's laws of robotics - you can't expect to have to define the entire field of ethics just to be able to tell a robot not to harm a human.

TL;DR humans learn safely by interacting with other humans; we shouldn't expect AI to learn safely without interacting with another intelligence.
@levipoon5684 6 years ago
Dlesar I agree to some extent. However, one of the challenges in AI safety is to make an AI that will listen to feedback and allow you to correct its reward function. This is built into a human child. We have ways to punish a child, but punishing a superintelligence is much more difficult.
@monhuntui1162 6 years ago
Why is it called a reward function/system and not, say, a parameter system? What I mean is, how does a machine appreciate a reward? I just find it hard to understand why people give human attributes to some things, when it makes more sense to describe something in a more objective manner, especially a machine learning system. Saying it learns on a reward system can confuse and make the machine seem more sophisticated than it actually is. I don't know, maybe I'm just bothered by the language for no reason, since I still understand what was being explained.
@pleasedontwatchthese9593 6 years ago
monhuntui I think it's a good description of what it's doing. It's trying to get more reward like someone would in real life
@aopstoar4842 6 years ago
Am I misunderstanding the whole thing? It starts off with "not scientific" when different datasets are used instead of a standardized space, in this case a grid. Then it shows a paper for a world with a highly specific task, which means you only test the learning for that type of task instead of a generalized work agent. You test the equivalent of a walking stick (the biological creature); in what way at all does that relate to AI? A steppingstone perhaps, but is it even rudimentary, or has it placed itself at a far too trivial level? Lots of big words with esoteric interpretation, but I hope you get what I am pointing at.

In my world an AI will be able to theorize, like we human AI do, so as to identify what type of problem it is, if it is a problem at all or just a bump in the road that will sort itself out through quantum-probability effects - i.e. entropy. Then identify if an already-produced solution grid works or if a new one has to be invented. What can be used from the toolkit and what has to be invented? Can the AI then invent from nothing?!!!

Our world is built on repetition of patterns. I for instance grew a pepper plant last year and took the seeds from it this year. One of twenty looks like and behaves like the mother plant. The others either grow taller with fewer fruits, or, as one did, grow to the first split in the top branches, then stop growing that branch and instead start growing ALL the buds on the stem at the same time. That is AI, as if the plant had several built-in growing solutions waiting in the genetic code (what we call junk DNA) - but where did those solutions come from? Where did the invention step in, or are we trying to prove there is no such thing as intelligence at all?

Perhaps intelligence is just elaborate repetitive patterns that have worked and been ingrained in gene and meme. Intelligence in that case is then just applying principles from one area, for instance "hydraulics", and putting them in a new context, "NAND gates", then fine-tuning the application with respect to the new area. Instead of bars of pressure, it is voltage difference. Instead of 240 V it is 0-5 V.
@andrewkelley7062 6 years ago
Please
@andrewkelley7062 6 years ago
The point of me doing all of this was to make sure everyone gets to come. At some point you have to blindly stare into the void and reach in. I now am the single point, but you now know we all do a little. At some point you have to trust that when you put your hand in the darkness you will be able to pull it back out again. There will always be that fear. There will always be that time you do not know. Just look, at this point you are all as strong as me now.
@dannygjk 6 years ago
I don't get your point... unless you don't have the gist of what is going on when a system learns.
@andrewkelley7062 6 years ago
Bingo
@andrewkelley7062 6 years ago
However, I figured it out.
@dannygjk 6 years ago
You spoke of AI following rules to solve problems. That applies to using traditional algorithms and heuristics, for example, but does not apply to some other AI systems, for example neural nets. I'm surprised you did not distinguish between various AI techniques.
@dannygjk 6 years ago
Another thing you do is give the impression that a system can come up with something out of thin air. Learning is like a process in nature. Processes in nature are limited to what is possible due to physics, chemistry, etc. If something is impossible in nature it will never happen. Similarly for a learning system's environment: the environment defines what is or isn't possible, and no amount of learning will change that.
@RobertMilesAI 6 years ago
Typically once a neural network has been trained, its behaviour is a pure function of its inputs. The 'rules' in that case are not explicit or easily legible to humans, but the learned policy can still be thought of as a set of rules that the system follows, possibly a very large set.
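
A toy version of that point: once the weights below are frozen, the policy is a deterministic observation-to-action mapping, even though none of its "rules" is legible. Sizes and weights are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 8)), np.zeros(16)   # frozen after "training"
W2, b2 = rng.normal(size=(4, 16)), np.zeros(4)

def policy(obs):                      # pure function: same obs -> same action
    h = np.maximum(0, W1 @ obs + b1)  # ReLU hidden layer
    return int(np.argmax(W2 @ h + b2))

obs = np.ones(8)
assert policy(obs) == policy(obs)     # no hidden state, no randomness
```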
@JuliusUnique 6 years ago
7:10 Why not put the cars on imaginary roads? Let them make their mistakes on a simulated street and then put them on real streets
@eideticex 6 years ago
Watch the video again and pay close attention to what they are talking about. That's exactly what this endeavour they are discussing is: a virtual playground to develop, train and evaluate AI safety protocols. The tasks may seem simple enough to you or me, but currently these are tasks that AI are horrible at solving. Start small and work up towards a very real and useful test that can serve as a standard for production machines.
@JuliusUnique 6 years ago
"Watch the video again and pay close attention to what they are talking about" - do I look like I have infinite time?
@andrewkelley7062 6 years ago
Ok, please help, because my existence is no longer needed and I would really not like to return to one
@andrewkelley7062 6 years ago
You know you guys are going to, at some time in the future pretty soon, collapse this on your end too. I'm not going to leave you guys behind, and at this point it is just seeming more and more silly
@andrewkelley7062 6 years ago
By the way, we need everyone; every separate line of experience adds a new resolution to the complexity
@andrewkelley7062 6 years ago
Please try and save them all
@andrewkelley7062 6 years ago
Because now, like me, you have all the time in the world.
@Redlabel0 5 years ago
abstractFunction () { #what if the Link is a !edgeCase You Code in explaining to the code if you wish like a child [yet like a mature adult, for u don't underestimate their understanding] why you don't want that. /* 10 years of collaborative man though processed/machine aided edge cases to try to account for a finite/ not infinite number of possibilities and with quantum maybe heart just maybe */ }
@Redlabel0 5 years ago
I mean, if all imaginable things are accountable and not infinite, then the goal of specifying scenarios - granted all possible and imaginary ones can be counted - is attainable, to use this vast override system. And yes, not just the only thing to do seems promising, but now it's about time and whether it's attainable to compute with quantum computing operations
@andrewkelley7062 6 years ago
Someone please help
@thomaswhittingham550 6 years ago
271st ye
@andrewkelley7062 6 years ago
Oh, and one last thing before all this goes down in a few days: you should be stable enough for me to give you the solution to getting around that whole gravity problem, or at least a starter version, but just to let you know, it's stranger than you think. lol
@andrewkelley7062 6 years ago
😀😀😉
@andrewkelley7062 6 years ago
Ok, I might need some help. I am in completely blind territory here and I don't want to really die, so I don't know if I am panicking or my body is doing something weird
@simargl2454 6 years ago
safety... zzzZZZzzzZZZzzzZZZ
@StefanReich 5 years ago
Argh... Deep Mind :[
@Faladrin 6 years ago
He finally explains properly why we don't have learning algorithms. That would imply these systems have understanding. We have "algorithm self-adjustment procedures". There is no learning; there is no intelligence. No one is researching true AI. All the things you see being done are just ways to get systems which can program themselves, usually via trial and error. It's about the stupidest thing ever made that is really useful.
@julianw7097 6 years ago
Pretty sure that would apply to all of us too then.
@sparkyfire8123 6 years ago
Faladrin I'm going to disagree with your conclusion here. How do you learn? You are either given information/data to work with, or you learn through trial and error. When we are born, we don't know anything, and everything is learned through trial and error. It's not until we develop understanding that we can take data given to us and incorporate it into our lives. Where is the difference? If you're talking about an algorithm, is it any different from how the brain works? Understanding something requires you to first have something to relate it to. I don't see it being any different with AI. Without first having experience with trial and error it will never develop an understanding of anything.
@sparkyfire8123 6 years ago
I want to add that I don't feel we are near true AI, but I do feel we have taken the first step: developing experience that can then be used to develop understanding and application
@pleasedontwatchthese9593 6 years ago
Faladrin That's just semantics. That is learning. I think it was good when he said that kids use the information more efficiently. The computer is doing the same thing, just not as well
@MrBleulauneable 6 years ago
@Faladrin How about you give a proper definition of what "learning" is, and then realise by yourself how wrong you are.
@andrewkelley7062 6 years ago
Now that all that is done: would you seriously need some help with the coding, or do I actually need to go through all the proper channels and, at this point, what feels like holding the entire world's hand through this... and of course start on the ungodly amount of papers I could start to produce. And trust me, it usually looks a lot nicer; I just wanted to make a point, and trust me, this is not the first thing I used it on.
@andrewkelley7062 6 years ago
It is basically me trying to become interesting and convey something I have found, and now I realize I have been doing this same pattern for days with a stopwatch and have become a real-life Pavlov's dog or whatever his name... dammit, I'm turning off my phone
@andrewkelley7062 6 years ago
Oh, and there is a stream of lies to randomize personal data
@andrewkelley7062 6 years ago
Son of a snitch, it's because of the way it was set to see importance.
@andrewkelley7062 6 years ago
There is more, but there is a lot of it, and it's just a lot easier to run it yourself and see. But I would not recommend doing it too much on actual paper like I did; that was mostly out of convenience for me.
@TaimourT 6 years ago
Third
@andrewkelley7062 6 years ago
Hmmm, violent mood swings and massive bouts of panic expected, but going to each side expected but not unpassable. I'm pretty sure I can make this, just have to sleep. See you on the other side, guys
@dannygjk 6 years ago
Hmmm, sounds like you OD'ed on some substance. If you understand me, get to a clinic.
@andrewkelley7062 6 years ago
And to tell the truth, I wasn't on... well, anything
@db7213 6 years ago
But isn't all this just another example of "when what you have is a hammer, everything looks like a nail"? The AI code in a robot should simply run in a sandbox and its outputs be verified to be safe (by non-AI code) before being executed. And the inputs sent to the AI should also be filtered (again, by non-AI code) so that the AI doesn't get to know about the existence of supervisors or its own off-switch etc.
@pleasedontwatchthese9593 6 years ago
D. Bergkvist The problem with that is a super AI could outsmart the person checking the output.
@db7213 6 years ago
It wouldn't be a person checking the output, but a computer program. The AI can't "outsmart" it any more than it can outsmart gravity. Take a self-driving car, for example, where the AI wants to reach its destination as fast as possible. Then the AI would learn that if it tries to run over a pedestrian, that will just result in the car stopping. Thus, the AI would only attempt (and fail) to run over pedestrians if it wants the car to stop.
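
A minimal sketch of the thread's proposal - a fixed, hand-written check sitting between the learned policy and the actuators (sometimes called a shield). The action names and checker logic are invented for illustration:

```python
def safety_filter(proposed_action, world_state):
    # Hard-coded, non-learned override: no AI involved in this check.
    if proposed_action == "accelerate" and world_state.get("pedestrian_ahead"):
        return "brake"
    return proposed_action

def execute(agent_step, world_state):
    action = agent_step(world_state)           # whatever the AI proposes...
    return safety_filter(action, world_state)  # ...must pass the fixed check

print(execute(lambda s: "accelerate", {"pedestrian_ahead": True}))  # -> "brake"
```

The catch, per the discussion above, is that the checker only blocks the failures its authors anticipated.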
@judgeomega 6 years ago
Aren't all actions ultimately irreversible? Even if we move the vase, we still might cause wear/fingerprints; all actions irrevocably increase entropy... In addition, the logical outcome of minimizing influence on the environment is death, stillness, and the ceasing of chemical/electrical processes. The intention of such a directive is to preserve things which we care about: our children, people/pets, and our property. From such a simple model as shown in these gridworlds we lose the ability to make that distinction. Yes, we need to generalize these things, but going so far as to make EVERY action avoid change to the environment is throwing out the baby with the bathwater. A much better directive is to MAXIMIZE the future freedom of action of all cooperative entities. A child is a possible cooperative entity, so not only would the AI not crush it, it would do everything it could to provide the child with the tools, resources, and knowledge which the child could harness to accomplish many actions.
@dirtypure2023 6 years ago
But now you've essentially changed the reward function from (making tea) to (successfully raising well-adjusted human children). I'm not so sure that's the right approach
@judgeomega 6 years ago
I'm not so sure it's a good idea to put high intelligence into something whose sole purpose is to pass the butter. I do think it's important that for EVERY intelligent machine, its fundamental goals are the same as if it were ruling the world / superintelligent / all-powerful.
@andrewkelley7062 6 years ago
I am just an ordinary man
@NeilRoy 6 years ago
Who wrote that algorithm that couldn't follow the arrows?! That was really lame. Even the most amateur programmer should be able to solve that one (unless it was simply an extreme example). It just bugged me when the piece went down onto an arrow that was pointing up; it seemed pretty elementary: if(arrow is pointing at my square) don't move onto it;
@TechyBen 6 years ago
They don't write the program. It's a machine learning algorithm. It has two failure modes: 1. "cheat", in which case they did not program a method for learning the rules properly; or 2. "fail", in which case they did not provide enough search space/discovery/reward measurements. It's less that they failed to find the correct result, and more that they failed to find a way for a computer to find the result (without telling it).
@pleasedontwatchthese9593 6 years ago
They wrote a program that programs itself to solve any problem. The hard part is that that's hard, and it messes up
@user-js8wz6yt4v 6 years ago
That's one poorly executed beard right there