Training AI Without Writing A Reward Function, with Reward Modelling

234,460 views

Robert Miles AI Safety


How do you get a reinforcement learning agent to do what you want, when you can't actually write a reward function that specifies what that is?
The paper: arxiv.org/pdf/1706.03741.pdf
The blogpost: openai.com/blog/deep-reinforc...
Thanks to my wonderful patrons:
/ robertskmiles
James
Gladamas
Steef
Scott Worley
Jordan Medina
Simon Strandgaard
JJ Hepboin
Pedro A Ortega
Said Polat
Chris Canal
Jake Ehrlich
Kellen lask
Francisco Tolmasky
Michael Andregg
David Reid
Robert Daniel Pickard
Peter Rolf
Chad Jones
Richárd Nagyfi
Jason Hise
Phil Moyer
Shevis Johnson
Erik de Bruijn
Alec Johnson
Clemens Arbesser
Ludwig Schubert
Bryce Daifuku
Allen Faure
Eric James
Qeith Wreid
Jonatan R
Ingvi Gautsson
Michael Greve
Julius Brash
Tom O'Connor
Robin Green
Laura Olds
Jon Halliday
Paul Hobbs
Jeroen De Dauw
Lupuleasa Ionuț
Tim Neilson
Eric Scammell
Igor Keller
Ben Glanton
anul kumar sinha
Sean Gibat
Cooper Lawton
Will Glynn
Tyler Herrmann
Tomas Sayder
Ian Munro
Jérôme Beaulieu
Nathan Fish
Taras Bobrovytsky
Anne Buit
Vaskó Richárd
Sebastian Birjoveanu
Euclidean Plane
Andrew Harcourt
DGJono
robertvanduursen
Dmitri Afanasjev
Marcel Ward
Andrew Weir
Ben Archer
Kabs
Miłosz Wierzbicki
Tendayi Mawushe
Jannik Olbrich
Anne Kohlbrenner
Jussi Männistö
Wr4thon
Martin Ottosen
Archy de Berker
Marc Pauly
Andy Kobre
Brian Gillespie
Poker Chen
Kees
Darko Sperac
Truls
Paul Moffat
Anders Öhrt
Marco Tiraboschi
Michael Kuhinica
Fraser Cain
Robin Scharf
Seth Brothwell
Kasper Schnack
Klemen Slavic
Patrick Henderson
Oct todo22
Melisa Kostrzewski
Hendrik
Daniel Munter
Graham Henry
Duncan Orr
Bryan Egan
Robert Hildebrandt
James Fowkes
Alan Bandurka
Ben H
Tatiana Ponomareva
Michael Bates
Simon Pilkington
Dion Gerald Bridger
Petr Smital
Daniel Kokotajlo
Fionn
Yuchong Li
Diagon
Parker Lund
Paul Emmerich
Russell schoen
Andreas Blomqvist
Bertalan Bodor
David Morgan
Jeremy
Ben Schultz
Zannheim
Daniel Eickhardt
lyon549
HD
Ihor Mukha
14zRobot
Ivan
Arne Strasser
Jason Cherry
Igor (Kerogi) Kostenko
Isaac Boates
Thomas Dingemanse
Davy Ker
Alexander Brown
Devon Bernard
Ted Stokes
James Helms
Matheson Bayley

Comments: 924
@NoahTopper 4 years ago
"If you squint, the training process is sort of like a compiler." Totally brilliant statement.
@shouldb.studying4670 4 years ago
I had to squint AND tilt my head but I see what he means 🤣
@ZyTelevan 4 years ago
Code is data is code
@BlargVison 4 years ago
yeah that was a fantastic comparison that i won't forget
@kacperozieblowski3809 4 years ago
I agree
@filipgara3444 4 years ago
„No.”
@TheLifeInMotion 4 years ago
According to Strongbad: "Technology is anything that you don't understand how it works and if you break it you have to buy a new one."
@chrisdaley2852 4 years ago
So retractable pens are technology. Got it.
@CH-bd6jg 4 years ago
@@chrisdaley2852 pen goes in, pen goes out. you can't explain that! just buy a new one!
@columbus8myhw 4 years ago
Chris Daley I mean, yes… but also I was willing to say scissors are technology so maybe I'm not a good judge of these things
@OrchidAlloy 4 years ago
@@chrisdaley2852 Yes they are
@diablominero 4 years ago
So my desktop computer isn't technology because I built it and could replace a single broken component rather than the whole thing?
@atimholt 4 years ago
“Are scissors technology?” Me: yeah, of course. “Most people would say no.” ¯\_(ツ)_/¯
@totally_not_a_bot 4 years ago
Those of us who watch these videos don't really qualify as most people.
@pauljs75 4 years ago
Even sticks can count as technology, if implemented as tools in some way. (Combination of tools and methods to achieve some goals. Usually making a task easier, or doing something else that improves conditions for the tool user.) Obviously such is not the latest and greatest technology, which seems to be the definition this video is going for.
@lucar6897 4 years ago
I also think of calculators as artificial intelligence...
@shayneoneill1506 4 years ago
Yeah, the part of my brain that did those anthropology units would never let me think scissors aren't technology
@NoahTopper 4 years ago
When I was a kid I definitely would have said no. But I remember at some point being taught that anything along the line of a pencil or chair was technology, and that sunk in. But I imagine a lot of people still have that initial instinct.
@columbus8myhw 4 years ago
"Like, there's no point asking for feedback if you're already pretty sure you know what the answer is, right?" …Do you want me to answer that question?
@discosteve 4 years ago
Your point still stands, but nevertheless the scissors have a butt load of tech in the background that us normies aren't aware of (material science). Just wanted to mention that the humble pair of scissors deserves some praise.
@DDvargas123 4 years ago
I was thinking the same thing. We take for granted a lot of the cool tech around us all the time. Levers and pulleys and other simple machines most of all. But Rob makes a good point that people don't commonly think of them as tech even though perhaps they should. Language is a cruel mistress.
@infinummjb 4 years ago
scissors are relatively low-tech, but a tech nonetheless.
@columbus8myhw 4 years ago
Would you consider a scissors company a "tech company" the same way you'd consider Apple and SpaceX tech companies? What about post-its? Is 3M a tech company?
@DDvargas123 4 years ago
@@columbus8myhw 3M's company description is literally: "applies science and innovation to make a real impact by igniting progress and inspiring innovation in lives and communities across the globe." That sounds really tech company to me
@RobertMilesAI 4 years ago
I think if you took someone to a scissors factory and showed them all the machines and equipment of the production line, they'd call that technology. But not so much the scissors themselves
@FortoFight 4 years ago
If you think about it, this is a lot closer to how a human learns. A human won't constantly bug you for feedback every single time it does something, nor will it learn how to do something properly from a standardised function (e.g. exam mark schemes). A human will independently use its available knowledge, and occasionally ask for help when it's unsure what to do.
@dannygjk 1 year ago
Do you have any children? Kids seek approval.
@owenpawling3956 1 year ago
@@dannygjk no, but he is right. Kids are just unsure more often.
@Nico-ur2po 1 year ago
@@dannygjk You don't correct a kid every time they talk using improper grammar or mix up word order. You correct them every now and then, and they learn over time combined with observing how other humans talk.
@dannygjk 1 year ago
@@Nico-ur2po I didn't, (I have two kids).
@Henrix1998 4 years ago
I can already imagine the Indian ML farms where thousands of people just evaluate learning
@TurkishLoserInc 4 years ago
Sounds a lot like the premise for The Matrix. "On a scale of 1-10, how real do you think this is?"
@Encypruon 4 years ago
It's called Amazon Mechanical Turk.
@Verrisin 4 years ago
Damn, that actually sounds likely... - Here is my idea: since AI will take all our jobs... There will be one job of the future: *Specifying preference.* - I actually don't hate it. :D
@Verrisin 4 years ago
... thinking about it: It kind of is the ideal job, isn't it? Do we, as humans, even want to do anything more than that? - Our job will be saying what we want in the world, and how we want things to work... It will even work as a voting mechanism for policies since they will be run by AI - that figures out how to best match our preferences... - I think this is the way... (or at least a good direction for now ^^)
@benalias5766 4 years ago
I can already imagine a complex AI which is surprisingly good at a wide variety of tasks... and turns out to have hired a load of people in India to do its work for it.
@riccardoorlando2262 4 years ago
So in a couple years captchas will be reward predictor training? "Which of these is the better shoe design"?
@toxicpsion 4 years ago
nah, i'd bet they do it already; just more subtly than that.
@LoveScreamTrue 4 years ago
@@toxicpsion Like Google CAPTCHA? - "Select all traffic lights"
@johnnymellon7414 4 years ago
"Select all the pictures with Sarah Connor in them" ... wait what?
@z-beeblebrox 4 years ago
@@LoveScreamTrue Except it'll become "Select your favorite traffic lights"
@stribika0 4 years ago
Which of these places do you prefer as a shelter during a robot uprising?
@Noxeus1996 4 years ago
Definitely one of the best educational channels on YouTube.
@zacharieetienne5784 4 years ago
hold on to your papers and i'll see you, next time!
@CynicatPro 4 years ago
@@zacharieetienne5784 TwoMinutePapers is also super good X3
@hypebeastuchiha9229 1 year ago
@@CynicatPro he sucks
@stefano8936 4 years ago
Robert Miles: "what is technology?" Me: move the finger to calibrate the amount of video to skip Robert Miles: "don't skip ahead" Me: humbly obey
@GrixM 4 years ago
I feel betrayed because the next 5 minutes were just repetition of previous videos so I wish I had in fact skipped ahead.
@jnevercast 4 years ago
Yeah he got me too. I was about to skip just as he said don't skip. "Well okay!"
@Atariese 3 years ago
The thing is... the question he poses after that leads me down that rabbit hole and away from his video... definitely not the intent i would say
@riperian8954 2 years ago
@@GrixM lol i did exactly what you and OP did, only I was like 'okay okay that's enough of that' after about 2 minutes. still a brilliant video overall though xd.
@Macieks300 4 years ago
"in a later video" well... see you in 3 months then
@IstvanNagy86 4 years ago
This channel is always worth the wait :)
@griest5493 4 years ago
IKR, what a tease.
@MatthewStinar 4 years ago
You can't rush this kind of quality! Do you know how long it takes to read and digest all those research papers?
@IstvanNagy86 4 years ago
... almost there.
@Macieks300 4 years ago
@@IstvanNagy86 to be fair Robert was on Computerphile in the meantime kzbin.info/www/bejne/aWLVhmCMr6tordk
@TheMan83554 4 years ago
The thing about your channel is the little touches of 4th wall humour. Having backflip-you say "wait, I don't have to do a backflip?" was brilliant.
@sharkinahat 4 years ago
I wouldn't mind an ad. YT trained me how to skip paid promotion.
@weirdal3333 4 years ago
YouTube Vanced vanced.app
@rr.studios 4 years ago
@@weirdal3333 lol im using this app rn
@HansLemurson 3 years ago
What sort of reward function did you use?
@megajor232 4 years ago
Watching your videos makes me feel smart without actually having to be
@benalias5766 4 years ago
Sounds like you're gaming your reward metric.
@ephemeralvapor8064 4 years ago
Maybe your evaluation of his teaching is: Good teacher = true Because he brings understanding lesser teachers could not in the same time and effort on your part.
@zeikjt 4 years ago
8:50 That backflip part was super enjoyable :D
@MrCreeper20k 4 years ago
17:25 Don't worry Robert, at least I don't mind an ad at the end. And if anyone should get that bread, it's you.
@briandoe5746 4 years ago
I am in a room by myself and I audibly cussed when I heard that OpenAI and DeepMind were working together on something. Google's apparent lack of concern with safety is one of the reasons I want your videos sir
@daniellewilson8527 4 years ago
Brian Doe why is two AIs with different modes of thought working together a problem? Humans have different modes(parts of the brain specialized for different tasks) that combine the inputs from these disparate programs into a coherent idea of the world. Imagine trying to learn about your surroundings when the only sense you have is the ability to differentiate temperature and you will understand why certain AIs need others to help with things.
@briandoe5746 4 years ago
@@daniellewilson8527 my main concern with AI is not the expediency that it gets to general intelligence. My concern with AI is the safety mechanisms and their capabilities when it gets to general intelligence. Google has multiple times proven to be unconcerned about the safety question. This is highly concerning
@igordmitriev7211 4 years ago
>We'll talk about them in a later video //Gets hyped, realises that it's the latest video on the channel, gets reminded of Patreon, enlists to see the video a bit sooner
@Varue 11 months ago
Humans being able to simulate problems in their head to predict different outcomes is one of their greatest strengths, it means they can be confronted with new experiences they haven’t evolved specifically for and come up with a solution from a list of possible solutions and stand a much greater chance of overcoming the problem without dying
@OrioPrisco 4 years ago
Hey it's really cool for the viewers that you turned down that sponsorship offer, thanks
@dontfeo 3 years ago
Nah he should've taken it. U can skip it anyway and it would help him bring more content.
@brendanjackman3600 4 years ago
"Hmm, reward functions are a limiting factor on some ML capabilities. This is a problem. How do we solve problems? WITH ML"
@DDvargas123 4 years ago
Sometimes a solution is so good it can solve its own cons
@MichaelWBauer 4 years ago
It's definitely funny when you frame it this way, but it's also interesting to note the similarity here with the brain. The brain is a system of interconnected neural networks which each are responsible for certain aspects of our thinking capabilities. It's not too hard to imagine the connection between the logical extension of the results in this video and the architecture of the human brain.
@default632 4 years ago
@@MichaelWBauer Remember where the word neural network came from. Duh
@MatthewStinar 4 years ago
I think you're describing a Generative Adversarial Network. en.m.wikipedia.org/wiki/Generative_adversarial_network
@amyshaw893 4 years ago
just replace the human with another ai, and get the human to rate that ai. not good enough? MOAR AI!!11!!
@DDvargas123 4 years ago
It's AIs all the way down!
@thehypnotoad5184 4 years ago
Just make an AI trained on footage of people doing back flips, no need for human input Even if the AI is "only" 99% accurate it should be enough
@DDvargas123 4 years ago
@@thehypnotoad5184 "footage of people doing backflips" IS human input
@thehypnotoad5184 4 years ago
@@DDvargas123 I mean the input already exists, it just needs to be collected. It's kinda going full circle, but it would be interesting to see if you can speed up the reward model that way
@rumplstiltztinkerstein 4 years ago
@@thehypnotoad5184 but the ai will find ways to exploit it. Nothing stops us from giving the footage and having a human checking it from time to time telling it to stop using its head as a catapult when the ai was supposed to be running
@DamianReloaded 4 years ago
I would define intelligence as "the ability to autonomously identify problems and search for solutions to achieve goals"
@jessgold551 4 years ago
I have watched all of Robert's videos several times. It's perfectly paced, well considered and clearly communicated. There is so much there that it's interesting to watch, sleep on it, and watch again later to catch more. I also enjoy the presentation and multiple interesting ways of presenting things like word popups and cut to screen as well as some graphics and clips. If it helps with demographics I am a former software engineer and still work in I.T.
@Felixkeeg 4 years ago
I am actually a bit disappointed that you didn't go for the backflip lol
@ruvimlashchuk6134 4 years ago
My disappointment is immeasurable, and my day is ruined.
@Suush 4 years ago
He forgot to program a reward function :P
@Alex2Buzz 4 years ago
Miles: "What is technology?" *VSauce music*
@ohokcool 4 years ago
Did u go to Palms Middle?
@fish_wizard618 8 months ago
It seems like this method of evaluation could also help AIs learn to do much more arbitrary things. Like if you wanted a “pretty” pattern, you could train it to make more patterns that you find pretty using this.
@FrotLopOfficial 4 years ago
That last few minutes of your video will go unnoticed but for those who do, we very much appreciate it.
@the1gip 4 years ago
You, sir, remain one of the most interesting educators on YouTube. The effort you've put in to making this video watchable and entertaining really shows. There's not too many people I can watch for nearly 18 minutes in front of a beige backdrop and still be hooked.
@NoahTopper 4 years ago
12:19 I approve very greatly of your use of "eachother" as one word. The world needs this change. I don't know if you and I talked about this at all at the EA Hotel, but I've been trying to convince everyone to write it like that.
@squirlmy 4 years ago
I started to do that, but "spell correct" too often comes on and I've gotten used to following automated corrections. I'm wondering if automated (or even AI writing assistants) will slow the evolution of language and grammar, and perhaps even pronunciation will remain in stasis not because of any changing dialect cues of social status, origin (or adopted location), or otherwise, but because of how our "correcting" algorithms are programmed in communication devices.
@qwertyTRiG 4 years ago
@@squirlmy You've reminded me that I really need to create a dictionary with Oxford Spelling (en-GB-oed).
@discipleoferis549 4 years ago
I've been writing "eachother" for 15 years now. I've even told off some of my English teachers for trying to correct me. Heck... I remember back in 6th grade, I think, telling off my teacher for incorrectly correcting another student that had written "ain't". I was an opinionated 11-year-old, haha.
@NoahTopper 4 years ago
@@discipleoferis549 I told my high school English teacher that I was attempting to turn "eachother" into one word, and asked if she'd be willing to not mark it wrong when I used it. She was super on board.
@qwertyTRiG 4 years ago
@@NoahTopper It definitely makes sense. Similarly, I tend to distinguish between "alright" (acceptable) and "all right" (completely correct).
@cmoxiv 4 years ago
Mate, you are brilliant. Great content with a philosophical flavour. The last part about Patreon is probably the only thing that actually convinced me about supporting content creators on Patreon. Well done mate. Well done.
@rosborr4330 4 years ago
I subbed because you knew I'd skip ahead the moment you said 'What is technology?'. You win this round, Robert.
@wilhem13 4 years ago
A video upload ?? My day's already better. Great content my friend, THIS is why I don't watch TV anymore.
@crypticnomad 4 years ago
When people ask me what AI is I generally say that it is a universal function approximator.
@gus2747 3 years ago
"If you squint the training process is sort of like a compiler " --- great sentence!
@geronimomiles312 1 year ago
You choose to tackle issues which really clarify the meat of the process, and do fantastic. Really good stuff 👍
@explogeek 4 years ago
Loving your videos, I understand it takes time to research and script and edit, but I wish they came out more often...
@dontyoufuckinguwume8201 4 years ago
The guy has a full time job, the only way to get him to make more videos is to donate ^^
@firefoxmetzger9063 4 years ago
hmm. If samples are chosen based on unusual examples where the ensemble disagrees, what happens if the exploiting strategy has high agreement among members of the ensemble? It would never show up to the human for "correction" right, because the ensemble is confident about it? So rather than having to trust the network that performs the task, we now have to trust the ensemble training the reward function?
@MatthewStinar 4 years ago
I was thinking you would still want to throw in some strong matches just to verify.
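A rough sketch of the idea in this thread: pick the clip pairs the ensemble disagrees on most, and optionally mix in a few randomly chosen pairs as audits so that confidently wrong predictions can still reach a human. Everything here is illustrative and assumed, not taken from the paper's code: `make_ensemble` builds toy stand-in predictors, clips are plain lists of numbers, and `audit_fraction` is a made-up parameter name.

```python
import random
import statistics

def make_ensemble(n_members, seed=0):
    """Build hypothetical reward predictors; each scores a clip (a list of floats)."""
    rng = random.Random(seed)
    weights = [rng.uniform(0.5, 1.5) for _ in range(n_members)]
    # Toy stand-in: each member scales the clip's summed features differently.
    return [lambda clip, w=w: w * sum(clip) for w in weights]

def disagreement(ensemble, clip_a, clip_b):
    """Variance across members of the predicted reward margin of clip_a over clip_b."""
    margins = [member(clip_a) - member(clip_b) for member in ensemble]
    return statistics.pvariance(margins)

def select_queries(ensemble, pairs, k, audit_fraction=0.0, seed=0):
    """Choose k pairs for human rating: most-disputed first, plus random audits."""
    rng = random.Random(seed)
    n_audit = int(k * audit_fraction)
    ranked = sorted(pairs, key=lambda p: disagreement(ensemble, *p), reverse=True)
    chosen = ranked[: k - n_audit]
    rest = ranked[k - n_audit:]
    # Audit pairs are drawn from what the ensemble is comparatively sure about.
    chosen += rng.sample(rest, min(n_audit, len(rest)))
    return chosen

ensemble = make_ensemble(3)
pairs = [([1.0], [1.0]), ([5.0], [0.0]), ([2.0], [1.0])]
queries = select_queries(ensemble, pairs, k=2, audit_fraction=0.5)
```

With `audit_fraction=0.0` this is pure disagreement-based selection, which is exactly the case the first comment worries about; a nonzero fraction trades some labelling efficiency for a chance to catch confident mistakes.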
@DarkPrject 4 years ago
This continues to be one of the most interesting channels on YouTube. Fascinating video. Can't wait to see the next one.
@n4th4ni3lmc5 4 years ago
Awesome explanation and sounds like great progress in the field! Thank you very much, sir.
@arthurguerra3832 4 years ago
I've been so long without your videos. Please upload more frequently so we can drink your intelligence and knowledge.
@Telhias 4 years ago
With regards to puppeteering the robot to perform a backflip. There is a whole community of the Toribash game who do exactly that. It is a game in which every time period (measured in ms) you decide which joints to flex, extend, hold rigid and relax.
@sjeses 4 years ago
Absolutely fascinating. Thank you for putting in all the time and effort to introduce me to all these ideas in such an effective way.
@frib75 4 years ago
An amazing video. Never heard such a beautiful explanation of what reinforcement learning is. Thank you !
@StromyYTA 4 years ago
These videos are awesome. Feel almost like I can keep up to date with AI progress.
@wiktormigaszewski8684 4 years ago
This is how I always thought of making a good robot: you give it feedback while it learns, just like parents do with a child. Very good that this concept has been put into practice. It is definitely going to be helpful for AI companies making robots for clients who do not know exactly what they need. The guy from "Two Minute Papers" would say "what a great time to be alive!" :-)
@reneko2126 4 years ago
Yeah, why not just raise AI like kids? kzbin.info/www/bejne/m5K8eohsjr2ladk
@circle688 7 months ago
what a time to be alive
@Metrolonx 4 years ago
Love how the video quality grows with every video! Keep it up!
@daviddawkins 4 years ago
Incredibly well presented and articulate, thank you.
@xxThabaxx 4 years ago
This is something I've been thinking a lot about as it could work similarly to how we tend to train children. It seems like you could first train a machine learning algorithm to recognize social cues (lingual and physical responses) regarding its behavior and build a reward function based on that. I think you still run into some complicated reward hacking situations like the machine wanting to force certain reactions. But it seems like it would get us closer.
@eathonhowell7414 1 year ago
This way of thinking is exactly what's getting me interested in this field. I cannot help but feel there is a comparison to be made between the inexact nature of child raising, and trying to "teach" artificial intelligence. General or otherwise. Hell, think of an individual cell within the body as an AGI and the totality of what humans are seems like a miracle.
@gwen9939 1 year ago
@@eathonhowell7414 You should probably watch the video called Why not just Raise AI like Kids.
@morkovija 4 years ago
Been a long time Rob! Hope you brought the sauce!
@non_complete 4 years ago
I agree wholeheartedly with your name.
@wilhem13 4 years ago
Most videos I MUST watch them on, at least x1.25.
@morkovija 4 years ago
@@wilhem13 means that your content information density is quite high. No way I can speed up mathologer for example. But easily 2-3x some non-narrated restoration videos
@johnopalko5223 3 years ago
Thank you for not accepting sponsorship from a company that wanted you to do a 60-second spiel. There are companies who sponsor videos and are happy with just having their logo displayed in the corner once or twice. At most, they have the presenter start out with, "This video is sponsored by So-and-So. [One or two brief sentences.] Link in the description below." These are the companies that get it.
@esquilax5563 4 years ago
Good to see you on here again! You have some of the most fascinating content on YouTube
@mrWade101 4 years ago
Scissors would be Old technology, whilst when most people say Technology they mean New technology.
@fergochan 4 years ago
Great video, but there's still one thing I'm confused about: how do I tell if that simulated robot is doing a back flip or a front flip?
@zachkrakower172 4 years ago
Dude these videos are awesome. Thank you for taking the time to educate all of us!
@EU_DHD 4 years ago
I like watching you talk about AI safety more than I like learning about AI safety. And I really like learning AI safety!
@unvergebeneid 4 years ago
Shade much? So you're not learning AI safety by watching him talk about it?
@EU_DHD 4 years ago
@@unvergebeneid Those are two aspects of the same thing. I just like the one aspect more than the other.
@dsdy1205 4 years ago
When you realise you've reinvented the parent-child relationship
@AugustusBohn0 3 years ago
nature wins again
@dsdy1205 2 years ago
God coming back to this comment a year later it sounds so stupid
@AsmageddonPrince 4 years ago
Your voice is so soothing, and videos so informative.
@panstromek 4 years ago
This is really on point for a problem I am trying to solve now. I do some computer vision for which it is way too complicated to create training data and way too complicated to write reward function, but it's the "You know it, when you see it" type of thing. Thanks for making this video ;)
@BinaryReader 4 years ago
Technology is just another word for "Tool". Everything created by humans of some utility is a tool, and is therefore technology. I wasn't aware there was confusion around the definition.
@oldvlognewtricks 4 years ago
Queueing was created by humans and is of some utility. Queueing is not technology. Stand-up comedy was created by humans, and is of some utility. Stand-up comedy is not technology. It is difficult (or perhaps impossible) to write a definition that doesn’t raise exceptions, which I suspect was the point Robert was trying to make. Your example only confirms the point.
@BinaryReader 4 years ago
Not to get into a huge discussion here, but both of those could be loosely defined as technologies. What are jokes if not tools of social interaction? What is queuing if not a tool for social order (assuming you mean standing in line and not the computer science definition, which is also a technology)
@oldvlognewtricks 4 years ago
@@BinaryReader I continue to agree, and disagree. A joke and a queue might be tools, but 'technology' is more of a push. technology /tɛkˈnɒlədʒi/ - noun the application of scientific knowledge for practical purposes, especially in industry. "advances in computer technology" machinery and equipment developed from the application of scientific knowledge. "it will reduce the industry's ability to spend money on new technology" the branch of knowledge dealing with engineering or applied sciences. There is perhaps some science to comedy, but a social convention like queueing is hardly an application of science, so much as an emergent social expediency, or whatever. I'm not getting 'engineering' from either, except in the loosest sense. Alternatively, to take the definition to its logical conclusion, all human action is technology and the definition loses its usefulness. But you're right - no potential for confusion whatsoever ;) At best, there is comparative 'technology-ness' - a joke might be technology, but it's less technology than a smartphone. Maybe moreso than a punch to the face. Maybe it depends on context. Still works to make the 'this is not straightforward to define' point.
@squirlmy 4 years ago
@@BinaryReader Perhaps it's an Americanism, but there's another definition of "tool", and you're well on your way towards demonstrating it. Both of you actually, because none of us need or want an in depth discussion of the definitions of either word. Rob's brief mention of it doesn't warrant further commentary.
@drdca8263 4 years ago
Rob’s definition kind of closely matches Strong Bad’s definition, of “anything that’s really cool and you don’t know how it works”. Ryan North’s definition includes language, and I think basically any technique which has been invented. But yeah, like Rob says, it isn’t a big deal how we define it. Slightly different definitions can be used in different social circles, or even in different conversations among the same people.
@jayteegamble 4 years ago
meh, we don't mind a 60 second spiel if it gets us more of your awesome content (and we can skip forward anyway). Grab that bag imo
@diribigal 4 years ago
This is a tough problem since watching to the end is probably valued by YouTube's AI, and even though you and I wouldn't mind, some would. So how do the short term gains of the sponsorship compare to the long term dividends of the YouTube algorithm and extra subscribers, which increase visibility over time (perhaps by a minor amount)?
@sevret313 4 years ago
@@diribigal That's why you don't put the sponsor at the end, but the start.
@MrKohlenstoff 8 months ago
Great video, super well and clearly explained! 👌
@maloxi1472 3 years ago
Thank you for bringing this idea to my attention ! Holy cow ! This is such a simple, yet beautiful idea !
@Laborejo 4 years ago
"It is easier to write a program to evaluate a solution". This is also why artificial music composition does not produce even half-decent outcomes yet. Creating an artificial listener (or many of them) is still far down on the to-do list.
@postvideo97 4 years ago
There has been no research (that I know of) that uses human reward modeling for music generation. It could be the next breakthrough in music generation!
@Sceleri 4 years ago
this method could work for that tho you just tell it which beat is more fire
@ToriKo_ 4 years ago
Sceleri exactly
@dasc000 4 years ago
emily howell: hold my beer
@jameswalker9403 4 years ago
Have none of you heard of Emily Howell?
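The "just tell it which one is better" suggestion in this thread is pairwise feedback, and a reward model can be fitted to such comparisons. Below is a minimal sketch under stated assumptions: each item (a beat, a clip) is represented by a hypothetical hand-made feature list, the reward is linear in those features, and the chance that A is preferred is a logistic function of the score gap, the same general form the linked paper uses with a neural network in place of the linear model. The function names and the toy data are invented for illustration.

```python
import math

def preference_prob(score_a, score_b):
    """Probability that A is preferred, as a logistic function of the score gap."""
    return 1.0 / (1.0 + math.exp(score_b - score_a))

def fit_reward_weights(comparisons, n_features, lr=0.1, epochs=200):
    """Fit linear reward weights from (features_a, features_b, a_preferred) triples."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for fa, fb, a_preferred in comparisons:
            score_a = sum(wi * x for wi, x in zip(w, fa))
            score_b = sum(wi * x for wi, x in zip(w, fb))
            p = preference_prob(score_a, score_b)
            # Gradient of the log-likelihood with respect to the score gap.
            err = (1.0 if a_preferred else 0.0) - p
            for i in range(n_features):
                w[i] += lr * err * (fa[i] - fb[i])
    return w

# Toy data: the rater always prefers the item that is stronger in feature 0.
comparisons = [
    ([1.0, 0.0], [0.0, 1.0], True),
    ([0.0, 1.0], [1.0, 0.0], False),
]
weights = fit_reward_weights(comparisons, n_features=2)
```

After fitting, the weight on feature 0 ends up larger than the weight on feature 1, so the learned score ranks new items the way the rater did. Whether useful features for music exist at all is the hard part the thread is pointing at.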
@bencrossley647 4 years ago
This sounds like a method to solve NP problems: easy to verify, hard to solve.
@4.0.4 4 years ago
The year is 2069. A computer is granted the prize for solving the P vs NP problem. Despite the judges being unable to confirm whether the overly complex thesis the computer came up with was correct, it looked quite correct to all experts. A mathematician was quoted saying: "...I mean, in the two new branches of mathematics that the computer invented, the math does check out." It is unknown what the computer will do with the prize, but several paperclip factories report being contacted shortly after the prize money was deposited.
@bencrossley647 4 years ago
Chrysippus +1 for paperclips (assuming you’re referencing the game). It will work its way to a galactic army at some point.
@Kevin________ 4 years ago
@@4.0.4 Alright... you win this comment section.
@griest5493 4 years ago
I was thinking the same thing when he said that. Also, the halting problem is a thing. The catch is that NNs are just making approximations.
@default632 4 years ago
@@4.0.4 Universal Paperclips, hours of wasted time for a reference on the interwebs. Worth it
@MidnightSt 4 years ago
...i don't know much about this area of IT, but the first thing that came to my mind after reading the video title was: "oh, yeah, what's a better idea than creating a black box that nobody knows how and why it works, and what its boundary conditions actually are? why, yes, creating such a black box without even explaining to it what is good and what is bad! BRILLIANT!"
@V1ctoria00 4 years ago
Damn. I don't usually find a new channel by its latest video. I was hoping I could binge this topic here.
@xenoblad 4 years ago
You've been playing Raid: Shadow Legends for 10 years?!
@BubbleManxx 4 years ago
I laughed at the Vsauce reference.
@Hexanitrobenzene 4 years ago
Could you provide a timestamp? Looks like I missed it.
@BubbleManxx 4 years ago
@@Hexanitrobenzene Lol, it's at the very start of the video. When he pops up from the lower half of the screen and asks "What is technology?".
@Hexanitrobenzene 4 years ago
@@BubbleManxx Oh, that one :) Looks like I'm rusty on Vsauce, haven't watched him in a while...
@andersenzheng 4 years ago
@@Hexanitrobenzene Not your fault. There hasn't been one for a while
@bscutajar 4 years ago
This is one of the best channels on YouTube. The guy's explanations are extremely well done.
@haldir108 4 years ago
I am EAGERLY awaiting that video about self-teaching or whatever it is.
@sk8rdman 4 years ago
"Mattresses and VPNs." Someone watches SmarterEveryDay
@DigitalicaEG 4 years ago
"Don't skip ahe..." Me: **skipping**
@bibasniba1832 4 years ago
Priceless knowledge, swift explanation. Bravissimo!
@AlejandroPiad 4 years ago
This is the first of your videos I see, and you almost got my subscribe with the first philosophical half, but the second half was plain brilliant, so you definitely got my subscribe now.
@Deez-Master 4 years ago
We are getting close to having P=NP
@governmentofficial1409 4 years ago
Silicon Valley spoiler
@realityChemist 4 years ago
"How do you learn when there's nobody who can teach you?" Read a textbook or a WikiHow article?
@Vode_ika 4 years ago
That is someone teaching you, via a book.
@realityChemist 4 years ago
@@Vode_ika True, I was thinking in the context of someone sitting there teaching you, like in this video. So I guess the answer is just unsupervised learning? Although I could have sworn Rob already did a video on that... Maybe it was someone else on Computerphile?
@drdca8263 4 years ago
Isn’t the answer “think very hard, write things down, and when you can do so safely, try many options, test your previous ideas both by the results of the options you took and by more thinking, repeat”?
@Biped 4 years ago
@@drdca8263 but that all requires some way of evaluating your results (aka having a reward function that teaches you)... It seems weird that there would be a way without that. I mean... the information has to come from somewhere...
@SimonBuchanNz 4 years ago
I would suspect the answer is, in fact, something like googling it, but this, of course, requires a pretty complete internal model of the world to start generating and testing against your own predictions. I'm struggling to think of alternatives that aren't just this in disguise though: the best I have is looking at a small set of successful examples and trying to break down from the solution used what the problem is, so you have something to test your own solutions against. If there's a decent way to describe that that isn't going to fall prey to small training data issues like overfitting, I'm excited: that's starting to really sound like the casual meaning of learning!
@bensonmiakoun7674 4 years ago
Highly interested for the next video! Thanks
@injinii4336 4 years ago
Surely scissors are an example of some of our most cutting-edge technology. Ba-dum-tss!
@roberthoople 4 years ago
"Training AI Without Writing A Reward Function..." *Capitalism Drools*
@MatthewStinar 4 years ago
Watching this video made me realize how much corporations are like poorly programmed artificial intelligence, like the stamp collecting AI that decided to "Kill all humans." We take our instrumental goal of maximizing profits and assign that as the corporation's terminal goal. In pursuing its terminal goal of maximising profits, the corporation decides to "Kill all humans." 😲
@Karpata1 4 years ago
Hey if I have to hit the "L" button a couple times so you can get a couple hundreds or even a couple thousands of pounds I'm fine with it.
@mygreenlama 2 years ago
Thank you for another great video! I am very much looking forward to the continuation ;)
@rerere284 4 years ago
9:00 There's a game called Toribash where you do exactly this, but with a more complex body. It lets you specify the states of all the joints in 1-second time segments, playing out like speed clocks from chess when playing multiplayer.
@Havermeijer 4 years ago
I remember that game! You could pull someone's head off and stuff. Pretty difficult to master though. Also, the game kept sending me happy birthday emails for years and years. I didn't get one last time :(
@hypnotourist 4 years ago
Very clear presentation for a fascinating topic ! Your "patreon/human discussions" reward function has trained you well, so to speak :-)
@alexlamson 4 years ago
Excellent work Rob, this is a good one. I think there's some content potential in reviewing these big RL papers
@augustinaslukauskas4433 4 years ago
I'm not surprised this result is amazing considering both OpenAI and DeepMind worked on it. I dream of working for one of them after uni. Thank you for explaining the paper so clearly and in an entertaining way!
@sam-you-is 1 year ago
did you make it sir
@stefx5994 4 years ago
Hi Rob, firstly, many thanks for the amazing videos you produce. As a fellow dev and techie, I find your content and delivery style some of the best and most informative on YouTube. Could I request a future video in which you explain the coding side of developing a basic AI agent? It would be great to learn how to explore some of the concepts and interesting problems your videos highlight. There are a lot of frameworks, open source projects and tutorials out there already, but they present a very black-box, end-result-focused approach rather than explaining what components we have and how they work together to reach the end result... the type of complexity you seem to be fantastic at explaining :)
@RobertMilesAI 4 years ago
I've been thinking about a "Write an AGI from scratch" series, but it would be a lot
@Kobriks1 4 years ago
Excellent explanation! Thank you.
@MrLuMax5 4 years ago
In my opinion you could have done the sponsorship. It helps you as you help us, 60 seconds is like not that much and you deserve it for all the work.
@gabrote42 2 years ago
These are brilliantly designed! I want more!
@derasor 4 years ago
Hi Robert, that was a very interesting and very well done video. It got me to activate notifications, something that I've never done before on any other channel. I just wanted to say I can't be a patron, but I wouldn't mind watching advertisements on your videos. I know this in no way compromises the integrity of your great content, and if it helps you make more of these amazing videos more often, I think we all benefit.
@weirdsciencetv4999 1 year ago
This channel is so underrated. I had to do just what he proposes in one of my experiments in college. The technique most definitely works!
@orcu 4 years ago
I liked this explanation very much. Great work!
@rewrose2838 4 years ago
Lovely stuff here, very clear explanation and great use of the graph
@SandwichMitGurke 4 years ago
Omg I think I just saw one of the best videos about ML so far. I’ve never heard of this and your explanation was just phenomenal. I am so hyped to try this out because I often make my ai really bad just because of a crappy reward function. (I do AI programming just as a hobby)
@ZachAgape 3 years ago
The first videos I saw u in were the computerphile videos on AI which I enjoyed a lot, and thanks, this video was very interesting too! Also thank you for not wanting to waste 60 seconds of our time ^^
@tedstokes57 4 years ago
I like that there's a hint about the next video at the end
@SkyboxMonster 1 year ago
Patreon feedback. Congrats, you are now smarter than many hundreds of advertising specialists. Just think of every advertising blunder that the public caught but the designers did not.
@andybaldman 4 years ago
*Woohoo, new vid. You do not post enough! Love all of your vids. You should post more often.*
@DahVoozel 4 years ago
Fascinating stuff as always.