We Were Right! Real Inner Misalignment

241,460 views

Robert Miles AI Safety

2 years ago

Researchers ran real versions of the thought experiments in the 'Mesa-Optimisers' videos!
What they found won't shock you (if you've been paying attention)
Previous videos on the subject:
The OTHER AI Alignment Problem: Mesa-Optimizers and Inner Alignment: • The OTHER AI Alignment...
Deceptive Misaligned Mesa-Optimisers? It's More Likely Than You Think...: • Deceptive Misaligned M...
The Paper: arxiv.org/abs/2105.14111
The Interpretability Article: distill.pub/2020/understandin...
Jacob Hilton's thoughts about what's going on: www.alignmentforum.org/posts/...
AI Safety Camp: aisafety.camp/
With thanks to my wonderful Patrons at / robertskmiles :
- Gladamas
- Timothy Lillicrap
- Kieryn
- AxisAngles
- James
- Jake Fish
- Scott Worley
- James Kirkland
- James E. Petts
- Chad Jones
- Shevis Johnson
- JJ Hepboin
- Pedro A Ortega
- Clemens Arbesser
- Said Polat
- Chris Canal
- Jake Ehrlich
- Kellen lask
- Francisco Tolmasky
- Michael Andregg
- David Reid
- Peter Rolf
- Teague Lasser
- Andrew Blackledge
- Brad Brookshire
- Cam MacFarlane
- Craig Mederios
- Jon Wright
- CaptObvious
- Brian Lonergan
- Girish Sastry
- Jason Hise
- Phil Moyer
- Erik de Bruijn
- Alec Johnson
- Ludwig Schubert
- Eric James
- Matheson Bayley
- Qeith Wreid
- jugettje dutchking
- James Hinchcliffe
- Atzin Espino-Murnane
- Carsten Milkau
- Jacob Van Buren
- Jonatan R
- Ingvi Gautsson
- Michael Greve
- Tom O'Connor
- Laura Olds
- Jon Halliday
- Paul Hobbs
- Jeroen De Dauw
- Cooper Lawton
- Tim Neilson
- Eric Scammell
- Igor Keller
- Ben Glanton
- Tor Barstad
- Duncan Orr
- Will Glynn
- Tyler Herrmann
- Ian Munro
- Jérôme Beaulieu
- Nathan Fish
- Peter Hozák
- Taras Bobrovytsky
- Jeremy
- Vaskó Richárd
- Benjamin Watkin
- Andrew Harcourt
- Luc Ritchie
- Nicholas Guyett
- 12tone
- Oliver Habryka
- Chris Beacham
- Nikita Kiriy
- Andrew Schreiber
- Steve Trambert
- Braden Tisdale
- Abigail Novick
- Serge Var
- Mink
- Chris Rimmer
- Edmund Fokschaner
- April Clark
- J
- Nate Gardner
- John Aslanides
- Mara
- ErikBln
- DragonSheep
- Richard Newcombe
- Joshua Michel
- P
- Alex Doroff
- BlankProgram
- Richard
- David Morgan
- Fionn
- Dmitri Afanasjev
- Marcel Ward
- Andrew Weir
- Kabs
- Ammar Mousali
- Miłosz Wierzbicki
- Tendayi Mawushe
- Wr4thon
- Martin Ottosen
- Andy K
- Kees
- Darko Sperac
- Robert Valdimarsson
- Marco Tiraboschi
- Michael Kuhinica
- Fraser Cain
- Robin Scharf
- Klemen Slavic
- Patrick Henderson
- Hendrik
- Daniel Munter
- Alex Knauth
- Kasper
- Ian Reyes
- James Fowkes
- Tom Sayer
- Len
- Alan Bandurka
- Ben H
- Simon Pilkington
- Daniel Kokotajlo
- Yuchong Li
- Diagon
- Andreas Blomqvist
- Iras
- Qwijibo (James)
- Zubin Madon
- Zannheim
- Daniel Eickhardt
- lyon549
- 14zRobot
- Ivan
- Jason Cherry
- Igor (Kerogi) Kostenko
- ib_
- Thomas Dingemanse
- Stuart Alldritt
- Alexander Brown
- Devon Bernard
- Ted Stokes
- Jesper Andersson
- DeepFriedJif
- Chris Dinant
- Raphaël Lévy
- Johannes Walter
- Matt Stanton
- Garrett Maring
- Anthony Chiu
- Ghaith Tarawneh
- Julian Schulz
- Stellated Hexahedron
- Caleb
- Clay Upton
- Conor Comiconor
- Michael Roeschter
- Georg Grass
- Isak Renström
- Matthias Hölzl
- Jim Renney
- Edison Franklin
- Piers Calderwood
- Mikhail Tikhomirov
- Matt Brauer
- Mateusz Krzaczek
- Artem Honcharov
- Tomasz Gliniecki
- Mihaly Barasz
- Mark Woodward
- Ranzear
- Neil Palmere
- Rajeen Nabid
- Clark Schaefer
- Olivier Coutu
- Iestyn bleasdale-shepherd
- MojoExMachina
- Marek Belski
- Luke Peterson
- Eric Rogstad
- Eric Carlson
- Caleb Larson
- Max Chiswick
- Aron
- Sam Freedo
- slindenau
- Johannes Lindmark
- Nicholas Turner
- Intensifier
- Valerio Galieni
- FJannis
- Grant Parks
- Ryan W Ammons
- This person's name is too hard to pronounce
- contalloomlegs
- Everardo González Ávalos
- Knut Løklingholm
- Andrew McKnight
- Andrei Trifonov
- Aleks D
- Mutual Information
- Tim
- A Socialist Hobgoblin
- Bren Ehnebuske
- Martin Frassek
- Sven Drebitz
- Quabl
- Valentin Mocanu
- Philip Crawford
- Matthew Shinkle
- Robby Gottesman
- Juanchi

Comments: 1,400
@llucos100 2 years ago
Turns out the Terminator wasn’t programmed to kill Sarah Connor after all, it just wanted clothes, boots and a motorcycle.
@Alorand 2 years ago
And ended up becoming the governor of California instead...
@spejic1 2 years ago
@@Alorand Becoming governor of California gets you MANY clothes, boots, and motorcycles.
@sevdev9844 2 years ago
Or making John Connor into a boyfriend. (You might think of Arnie when Terminator comes up, I think of Summer aka Cameron)
@Saka_Mulia 2 years ago
That's Terminator goals ... not terminal ... oh never mind ... I get it
@quitequiet5281 2 years ago
LOL Yup... in retrospect with this paper... the terminator was a pursue bot... driving a threat variable towards the development and improvements of a General Artificial Intelligence and look at all the upgrades that series of pursuit bots facilitated. LOL
@vwabi 2 years ago
AI safety researchers are absolutely the last people on earth you want to hear "We were right" from.
@madshorn5826 2 years ago
And climatologists.
@Laszer271 2 years ago
@@madshorn5826 Nah, an epidemic can destroy the world in months, climate change can in decades. Superintelligent AI could probably destroy it before lunch :P
@donaldhobson8873 2 years ago
What about "we were totally wrong, the problem is much worse than we thought it was."
@madshorn5826 2 years ago
@@Laszer271 Well, destroyed is destroyed. Or are you the type not bothering with insurance and health check ups because a hypothetical bullet to the brain would rather quickly render those precautions moot?
@Laszer271 2 years ago
@@madshorn5826 fair enough. It was all a joke though. But in your example, I still think "I just got a bullet to the brain" is worse than "I just got diagnosed with cancer". Maybe bullet is less likely, sure, but we were talking about the time that the danger was already proven, right? I think it's plausible that probability of my survival is greater conditioned on "we were right" statement being made by epidemiologist, climatologist or oncologist than it is conditioned on the same statement made by AI safety expert or like bullet...ologist.
@ShankarSivarajan 2 years ago
10:54 "It actually wants something else, and it's capable enough to get it." Yeah, that _is_ worse.
@Encysted 2 years ago
The AI *does* in fact know how to drive a car, and it never really learned not to hit people.
@Rotem_S 2 years ago
@@Encysted or it learned how not to hit people, but hits them whenever there are no witnesses because it only cares about turning right
@InfinityOrNone 2 years ago
@@Rotem_S Or it learned not to hit people because it really cared about maintaining the present state of the paint job, which was white in the training environment. But the deployment environment uses a _red_ car.
@InfinityOrNone 2 years ago
@@Rotem_S Wow, your user name confuses the comments section.
@xelspeth 2 years ago
@@InfinityOrNone It doesn't. It just displays in the correct (right-to-left) reading direction that Hebrew uses
@unvergebeneid 2 years ago
Famous last words for species right before they hit the great filter: "Yo, in the test runs, did paperclips max out on the positive attribution heat map, too?"
@michaelpapadopoulos6054 2 years ago
There are so many layers to this comment and I love it.
@underrated1524 2 years ago
I keep hearing the notion of AI being the great filter, but I can't say I buy it. Not that AGI isn't an existential threat, because it absolutely is. It just can't explain why we don't see any signs of aliens when we look up at the sky, because if the answer is "AGI", then that begs the question: "Okay, so why don't we see any of those, either?"
@ayushsharma8804 2 years ago
@@underrated1524 What if AGIs prefer to kill their creators and enter some deep bunker on some rogue planet to await heat death after reward hacking their brains? Still doesn't explain why they aren't here preparing to kill us.
@unvergebeneid 2 years ago
@@underrated1524 I agree. Especially the paperclip optimizer should show itself in the form of huge paperclip-shaped megastructures around distant stars. It still made for a good joke though, if I do say so myself.
@sageinit 2 years ago
[Laughs in Grabby Aliens, Synthetic Super Intelligence, Gaia Hypothesis, Global Brain, & Planetary Scale Computation]
@bierrollerful 2 years ago
Almost sounds like AIs will need psychologists, too. "So I tried to acquire that wall..." "Why not the coin? What is it about the wall that attracts you?" "Well, in training, I always went to the... oh...huh, never thought about it that way."
@crubs83 2 years ago
AI safety researchers ARE psychologists as far as I'm concerned.
@PMA65537 2 years ago
I was coping ok before the awful behaviour of that other AI used by the Shah of Lugash.
@lobrundell4264 2 years ago
this made me smile : D
@ChrisBigBad 2 years ago
I clearly remember a Civ-Type game, where one of the research-items was "AI without personality problems"
@bierrollerful 2 years ago
@@ChrisBigBad Sounds like research an AI with personality problems would try.
@rofl22rofl22 2 years ago
Robert Miles: "We were right" Me: Oh no "About inner misalignment" OH NO
@LeoStaley 2 years ago
Yeah. The only thing worse is, we were right about AI being deceptive about its goals during training before deployment.
@JM-us3fr 2 years ago
@@LeoStaley Or even worse: We were right about AI being more dangerous than nukes
@MetsuryuVids 2 years ago
@@JM-us3fr That's almost certain.
@LeoStaley 2 years ago
@@JM-us3fr oh no, that's absolutely going to be true at some point. The only real question is, can we stop them from deciding to (even accidentally) kill us? Can we even avoid making them accidentally WANT to kill us because we accidentally fucked up the training environment?
@ARVash 2 years ago
@@JM-us3fr Nukes are safe because they kill people you don't want dead. I'd say an AI is definitely more dangerous because it has much more capacity to be selective. It could also be safer, really depends on the implementation details, much like a person. A person can be safe, or dangerous. Can we even avoid making a human accidentally want to kill us because we accidentally fucked up the training environment? Maybe.
@moartems5076 2 years ago
Looking at my hoard of keypicks in skyrim, i can confirm, that this is perfectly human behavior.
@OMGclueless 2 years ago
When you think about it, yeah, it's very human-like. Kind of like gambling addicts who know that they're losing money when they play but have trained themselves to like the feeling of winning money rather than the ultimate goal of a comfortable happy life or even the instrumental goal of having money.
@threeMetreJim 2 years ago
Definitely, what is wrong with collecting as many keys as possible if you want to open as many chests as possible, and each requires a key? In a maze you don't know what is round the corner in advance. Trying to collect your own inventory is simply a programming error if the agent can see the part of the screen that is designed as a guide for a human to observe the progress.
@Spellweaver5 1 year ago
@@threeMetreJim yes, but not trying to open the remaining chests is definitely the goal learned wrong.
@sharktrap267 1 year ago
@@threeMetreJim if my AI is built to keep my wood storage at a certain level by collecting wood in my forest but it learnt to "collect all the keys"(all the wood), my forest will soon become a plain. It's an issue, because growing trees takes time, wood takes storage space and any wood not protected can become unsuitable for the usage. You're not just wasting ressources, you're also at risk of not having wood available at some point. And if you use the forest to hunt too, you can start learning to hunt in a plain. So depending on the goals and situation, hoarding can lead to issues
@proskub5039 2 years ago
A coin isn't a coin unless it occurs at the edge of the map! We may think the AI is weird for ignoring the heretical middle-of-the-map coin, but that's just our object recognition biases showing.
@GigaBoost 2 years ago
Literally this haha
@sabelch 2 years ago
Great interpretation! But it doesn't seem to explain why the AI goes to the edge of the map even when there isn't a coin there.
@GigaBoost 2 years ago
@@sabelch it still seemingly learns to favor walls, if you look at the heatmaps. Perhaps without the coin all it has to go by with positive value is the walls.
@proskub5039 2 years ago
@@GigaBoost Yes, the salient point here is that we should not assume that the AI interprets objects the way we would. And any randomness in the learning process could lead to wildly different edge-case behaviors..
@GigaBoost 2 years ago
@@proskub5039 absolutely!
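
The thread above describes why "get the coin" and "go to the right-hand wall" look identical during training. A minimal sketch of that idea, assuming a hypothetical one-dimensional level where reward is 1 only if the agent finishes on the coin's cell (the names `go_right`, `seek_coin`, `mean_reward` and `LEVEL_LENGTH` are illustrative and not from the paper):

```python
from typing import Callable, Iterable

LEVEL_LENGTH = 10  # hypothetical level width; cells are numbered 0..9

# A policy maps (level_length, coin_position) to the cell where it finishes.
Policy = Callable[[int, int], int]


def go_right(level_length: int, coin_position: int) -> int:
    """Proxy objective: ignore the coin and run to the right-hand wall."""
    return level_length - 1


def seek_coin(level_length: int, coin_position: int) -> int:
    """Intended objective: finish on the coin, wherever it is."""
    return coin_position


def mean_reward(policy: Policy, coin_positions: Iterable[int]) -> float:
    """Average reward, where reward is 1 if the agent finishes on the coin."""
    positions = list(coin_positions)
    hits = sum(policy(LEVEL_LENGTH, c) == c for c in positions)
    return hits / len(positions)


# Training: the coin always sits in the last cell.
training = [LEVEL_LENGTH - 1] * 100
# Deployment: the coin may sit in any cell except the start.
deployment = range(1, LEVEL_LENGTH)

for name, policy in [("go_right", go_right), ("seek_coin", seek_coin)]:
    print(name, mean_reward(policy, training), round(mean_reward(policy, deployment), 3))
```

Both policies score perfectly on the training levels, so the reward signal alone cannot tell them apart; only the shifted coin positions separate them. This toy says nothing about which policy a real learner ends up with.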
@bartman999 2 years ago
Nothing more terrifying than seeing the title 'We Were Right!' on a Robert Miles video.
@captainufo4587 2 years ago
In a way, yes. In another way, up to this point there was a debate over whether AI safety was a real concern worth investing research, time and money in, or just overworrying. It's a good thing that these demonstrations proved it's the former, and that they happened this early in the history of AIs.
@dmytrolysak1366 2 years ago
Let alone simple AI, _people_ get misaligned like that quite often - hoarding is one good example, which happens both in real life and in games like with those keys.
@nikolatasev4948 2 years ago
It keeps amazing me how AI problems are increasingly becoming general human problems. "if we give a reward to the AI when it does a job we want, how do we stop it from giving itself the award without the job" - just as humans give themselves "happiness" with drugs. "how do we make sure the AI did not just pretend to do what we wanted while we were watching" - just as kids do.
@sonkeschmidt2027 2 years ago
@@nikolatasev4948 which is why eventually AI research will have to dive into religion/spirituality. Those were the only successful attempts humans made to solve the general problems that we have. Not saying that all of them were successful, life always moves on, there is always growth and decay/change. But every now and then they generated "the solution" to everything, rippling down to millions and billions of people trying to imitate that.
@markusmiekk-oja3717 2 years ago
@@sonkeschmidt2027 I would claim religion does not help with that type of problem.
@sonkeschmidt2027 2 years ago
@@markusmiekk-oja3717 then I invite you to look at what religion does. Functional religion, I'm not talking about what you know or have heard about it going wrong, in talking about the cases where it does work (which are those you never hear of because... Well because they work, they don't cause trouble but bring stability, that doesn't make news). If you look into that you understand why religion is a global phenomenon and why it has the power it has. If you feel with scientists you will also find that the West doesn't have stopped being religious, they just rebranded it and called it science. We live in a world with a huge amount of uncertainty and where mistakes can have huge negative consequences. Humans can't deal with that without a working believe system. You have tons of these you just wouldn't consider them religious probably. That will change, should life ever show you the scope of uncertainty there is. Good luck making it though without a (spiritual/religious) belief system that is in alignment with the society you life in. =)
@nikolatasev4948 2 years ago
@@sonkeschmidt2027 Well, the video about Generative Adversarial Networks with an agent trying to find flaws and break the AI we are training gave me strong Satan vibes. But apart from that I don't think we need further research into religion/spirituality. Simply put they work on us, a product of long evolution in specific environment. We need a more general approach, since AIs are a product of very different evolution and environment. Some solutions for the AI may resemble some religious notion, just as some scientific theories resemble some religious ideas, but trying to apply religion to AI is bound the fail just as applying religion fails in science.
@goonerOZZ 2 years ago
Somehow the terminal and instrumental goals talk made me correlate the AI with us. As a financial advisor, I have found that many people also made this mistake that money is an instrumental goal, but having spend so much time working to get money, people start to think that money is their terminal goal so much so that they spend their entire live looking for money forgetting why they want to have the money in the first place.
@anandsuralkar2947 2 years ago
True
@MenwithHill 2 years ago
Very much the same feeling on my end. I actually found it cute when the chest opening AI just started collecting keys.
@lennart-oimel9933 2 years ago
The reason I watch this channel is mostly because you can correlate almost every video to human intelligence. And it makes sense: why shouldn't the same rules apply to us that apply to AI? I see this channel as an analysis of the problems of intelligence in general. Not only the ones we make ;)
@GrilledCheeseSandwich1 2 years ago
It seems like no one realized that this idea is hinted at by the song in the outro: Jessie J - Price Tag. The most famous line from the song is: It's not about the money, money, money
@jackren295 2 years ago
@@lennart-oimel9933 Me too. After watching this channel, I started to agree with the notion of "making AI = playing god" that I've heard sometimes in the past. At first, I didn't put too much thoughts on it. But now I've realized that making powerful AGIs that are safe and practical requires us to know all the weaknesses of the human mind, and make a system that avoids all these weaknesses while still performing at least as well as we can. It's like making the perfect "human being" in some sense.
@charliesteiner2334 2 years ago
9:00 "We developed interpretability tools to see why programs fail!" "What's going on when they fail?" "Dunno." No shade, interpretability is hard, even for simple AI :P
@YuureiInu 2 years ago
It just likes the coins next to the end wall. Why would you teach it to like only those and expect it to get any other coins?
@SimonClarkstone 2 years ago
It reminds me of koalas that can recognise leaves on plants as food, but not leaves on a plate.
@gabrote42 2 years ago
@@SimonClarkstone interesting
@Bacopa68 2 years ago
@@SimonClarkstone AI HAS ADVANCED TO THE KOALA LEVEL. REPEAT, KOALA LEVEL. Ah, so basically nothing then.
@raskov75 2 years ago
And the more complex these systems get, the harder it becomes. Oi vey.
@Turtle76rus 2 years ago
Can't wait for the "We Were Right! Real Misaligned General Superintelligence" video
@michaelspence2508 2 years ago
One more sentence and this would be the scariest Two Sentence Horror Story I've ever seen
@unvergebeneid 2 years ago
Now here's a reason to actually "hit that bell icon" if I've ever seen one. Because the time window to watch that video would be rather small I imagine 😄
@PetardeWoez 2 years ago
probably the last video ever made on the topic
@Zeekar 2 years ago
The question: which takes longer? Uploading a video to KZbin or the entire world being converted to stamps?
@christiangreff5764 2 years ago
@@Zeekar The former. At the point that video would be produced, we would have our hands full with fighting the mechanical armies of the great paperclip maximiser (and it would have probably hacked and monopolized the internet to limit our communication channels).
@RichardEntzminger 2 years ago
I feel like this isn't just a problem with artificial intelligence but intelligence in general. Biological intelligence seems to mismatch terminal goals and instrumental goals all the time like Pavlovian conditioning training a dog to salivate when recognizing a bell ringing(what should be the instrumental goal) or humans trading away happiness and well being (what should be the terminal goal) for money (what should be an instrumental goal).
@Racnive 2 years ago
Organizations founded with the intent of doing X end up instead doing something that *looks like they're doing X*, because that's what people see; that's what people hold them accountable to. It doesn't even take intelligence: Evolution by natural selection doesn't require any intelligence to winnow things away from what they "want" (terminal goals, should they exist), toward what will survive/replicate (at least in principle, an instrumental goal).
@salec7592 2 years ago
I concur with this. The problem is not AI specific and should be termed something along lines of "general delegation problem" or problem of command chain fidelity. The subset of which is Miles' nightmare with inverted capability hierarchy, where command is passed by less able actor to more able actor (e.g. a human to an advanced AI).
@Sindrijo 2 years ago
@@salec7592 Even with perfect interpretability of each component of an AI (e.g. the layers in a neural network), ulterior goals might still be encrypted into looking 'good'. An AI command structure with short-circuiting breaks in the reward loop might help. E.g. you have people issuing commands/goals to an interpreter AI, which interprets and delegates those commands to another AI (without knowing whether it is delegating to an AI or not). That reduces the chance of goal misalignment by reducing the impact of the complete-loop feedback with shorter feedback loops. Also, randomly substitute each component of the command-delegation chain during training.
@sonkeschmidt2027 2 years ago
Is that a problem though? Or isn't good what makes life possible in the first place? After all if you want to solve the problem that is life, then you just kill yourself. All problems solved. But then you can't experience life. So live needs decay in order to create new problems so that something new can happen. Needing in the sense that existence can only exist as long as it exists. Without existence you don't have problems but you don't have existence either.
@nahometesfay1112 2 years ago
@@sonkeschmidt2027 I might sound sarcastic, but the following questions are sincere. Do you think it's ok for AI to take over the world? Perhaps even drive humanity to extinction? Humans have done the same to other species even other humans and humans are not unique from the rest of life in this respect. As you said decay makes way for new life. I think humanity should be preserved because I find destruction in general unsettling. To be clear I'm not saying you are wrong or that you believe what I just said. I'm just wondering how your ideas extend in these topics Edit: typing on my phone so I missed some other stuff: do you think existence is better than non-existence? To me non-existence is neutral. Do you think humans have a moral imperative to maintain their existence? Do you think humans need to go extinct at some point so that reality can continue to change? You brought up some very interesting ideas and I just wanted hear more of your thoughts.
@Practicality01 2 years ago
This is starting to get an "unsolvable problem" vibe. Like we are somehow thinking about this in the wrong way and current solutions aren't really making good progress.
@michaeljburt 2 years ago
Very much so. The psychology of teaching/learning as humans isn't really understood. What *actually* happens when you learn something new for the first time? Feedback on that process is vital. How do you give a machine feedback on what it learned, when you don't know what it learned exactly? It can't communicate to us what it "felt" it learned. In others words, human says: "I said the goal was X". Machine says: "I thought the goal was Y".
@AfonsodelCB 2 years ago
@@michaeljburt realize: we actually want these things to be much better than humans. but we might be underestimating how maxed out humans are at certain things. humans have goal missalignments all the time, and many aren't detected for years
@josephburchanowski4636 2 years ago
"This is starting to get an "unsolvable problem" vibe. Like we are somehow thinking about this in the wrong way and current solutions aren't really making good progress." Welcome to AI Safety. The best part is that if we don't solve the "unsolvable problem", we might all die. Along with all life on Earth, along with all life in the galaxy, along with all life in the galaxy cluster. And with cannibalizations of all planets and stars for resources for some arbitrary terminal goal. A potential outcome is a dead dark chunk of the universe built as a tribute to something as arbitrary as paper clips or solving an unsolvable math problem.
@sonkeschmidt2027 2 years ago
Aren't we touching the biggest unsolvable problem in existence? Existence itself? Think about how terrifying it would be if you could solve every problem, if you could solve life. That means there would be an absolute border that you would be infinitely stuck with... Sounds better to me that there will always be a new problem to be solved...
@AlejandroMarin.design 2 years ago
Alignment in humans is solvable. I developed a methodology to do it easily and quickly. So I think alignment in machines is solvable. I've actually designed the methodology to serve machine alignment as well. We'll get there, don't despair.
@-41337 2 years ago
Imagine a future where a very trusted AI agent seems to be doing its job fantastically well for many months or years, and then suddenly goes haywire, since its objective was wrong but it just hadn't encountered a circumstance where that error was made apparent. Then tragedy!
@TulipQ 2 years ago
I doubt it will be a grand revel. People will die due to a physical machine, these interpreter tools can then be used to argue the victim did something wrong, that a non AI system did the fault, or that a human supervisor was neglegent. The deployment enviorment is one full of agents optimized for avoiding liability.
@CyborusYT 2 years ago
That's actually not that far from normal computer systems There are countless stories of a system (ordinary computer system) suddenly reaching a bizarre edge-case and start acting completely insane
@NoName-zn1sb 2 years ago
@@TulipQ negligent
@gastonmarian7261 2 years ago
Like when we designed computers without thinking / knowing about cosmic ray bit flips, so decades later a plane falls out of the sky because their computer suddenly didn't know where it was in the sky. Humans are a trusted ai agent deployed in a production environment with limited understanding of what's going on
@demoniack81 2 years ago
@@CyborusYT Yeah, it happens literally all the time. It's just that usually the error gets caught somewhere along the way, an exception is thrown, and the process is terminated. Which is where you get the error page and then pick up the phone and go talk to an actual person in customer service who can either override it or get the IT team to fix the problem.
@Houshalter 2 years ago
Imagine training a self driving car in a simulation where plastic bags are always gray and children always wear blue. It then happily runs down a child wearing gray, before slamming on the brakes and throwing the unbuckled passengers through the windshield, for a blue bag on the road.
@nullone3181 2 years ago
The brat in gray was asking for it
@GetawayFilms 2 years ago
Imagine training a self driving car to the point where it can competently navigate complex road systems, yet can't remain stationary until all passengers are buckled up...
@Houshalter 2 years ago
@@GetawayFilms cars sold today only flash a warning light/noise if you don't buckle, and only because government regulations mandate it. Even then most people disable it
@GetawayFilms 2 years ago
@@Houshalter so what you're saying is . It's a 'people' thing... Ok
@sonkeschmidt2027 2 years ago
Humans do that all the time. Except that we have a deep genetic imperative to recognise children and to protect them but there are loads of examples where these instincts are overwritten....
@andrewweirny 2 years ago
This is one of your clearest and most interesting videos to date. I'm now very excited for the interpretability video!
@JabrHawr 2 years ago
a viewer's comment from 2 days ago despite the video having been published just few hours ago. you must be a patron, or an acquaintance
@andrewweirny 2 years ago
@@JabrHawr the former.
@michaeljburt 2 years ago
Agreed. Exciting stuff
@YuureiInu 2 years ago
"Can you spot the difference?" Pauses the video and looking for the difference....nothing. Unpause. "You can pause the video." Pauses again and manically looking for a pattern. More keys? "There's more keys in the deployment. Have you spotted it?" Yes!!!!
@thefakepie1126 2 years ago
@Impatient Imp I've counted 12
@Lawofimprobability 2 years ago
I noticed fewer boxes but didn't notice more keys (probably because the colors were too similar for the few seconds of looking).
@felixmerz6229 2 years ago
The thought of creating a capable agent with the wrong goals is terrifying, actually; and yes, an agent being bad at doing something good is absolutely a problem much preferable to an agent being good at doing something bad.
@xxxJesus666xxx 2 years ago
speaking of A.I. or psychology?
@gadget2622 2 years ago
@@xxxJesus666xxx yes
@ThrowFence 2 years ago
Isn't this exactly what's happening with mega corporations?
@sharpfang 2 years ago
Reminds me of the elections a couple years ago in Poland. A very competent and capable, but thoroughly corrupt and evil political party was voted out and replaced with a party just as corrupt and evil but vastly less competent.
@felixmerz6229 2 years ago
@@sharpfang That unironically is an improvement in today's political landscape. If I'd have to choose a form of evil, it'll always be the less capable rather than the less sinister.
@Huntracony 2 years ago
Did you intentionally use the "It's not about the money" song for the video about the AI not going for the coins? Either way, that's quite funny. Well done.
@PhoebeLiv 2 years ago
His song choices are always amusingly on the nose, actually! A few off the top of my head are "the grid" for his gridworlds video, "mo money mo problems" for concrete problems in AI safety, and "every breath you take (I'll be watching you)" for scalable supervision
@Huntracony 2 years ago
@@PhoebeLiv Nice! Hadn't noticed before, but I'll definitely start paying some closer attention from now on.
@thewrongjames
@thewrongjames 2 жыл бұрын
Another on the nose choice was Jonathan Coulton's "It's Gonna be the Future Soon" on the video about what AI experts predict will be the future of AI.
@matthewwhiteside4619
@matthewwhiteside4619 2 жыл бұрын
He also used "I've got a little list" in one of his list videos.
@SpoonOfDoom
@SpoonOfDoom 2 жыл бұрын
I didn't catch that, that's great!
@ARVash
@ARVash 2 жыл бұрын
An interpreter, a mind-reading device, becomes a way for an agent to "communicate" with you once you read it and respond, and the agent can communicate things that give an impression hiding its actual goal. A lot of these challenges arise when training or coordinating humans, and it's somewhat unsurprising that, while a mind-reading device might seem to help at first, it won't be long before someone figures out how to appear to be doing the right thing while watching TV.
@saxy1player
@saxy1player 2 жыл бұрын
Great idea!
@Voshchronos
@Voshchronos Жыл бұрын
Very well put
@Tutorp
@Tutorp 2 жыл бұрын
Hey, the key-AI works kind of the same way most people do when playing computer games... "Oooh, shiny things I don't need all of? I need them all! Game objectives? Meh..."
@EebstertheGreat
@EebstertheGreat 2 жыл бұрын
It looks like in the keys and chests environment, the AI was trying to get both keys and chests, but it was strongly prioritizing keys. When there were more chests than keys, it was always spending its keys quickly, so it never ended up with a bunch in its inventory. As a result, it never learned that keys at the left edge of the inventory were impossible to pick up, so it just got stuck there trying to touch them, since they were more important than the remaining chests.
@isaacgraphics1416
@isaacgraphics1416 2 жыл бұрын
It's the same problem evolution ran into when optimising our taste palate. Fat and sugar were highly rewarded in the ancestral environment, but now that we live in a different (human-created) environment, that same goal pushes us beyond what we actually need and creates problems for us.
@silphonym
@silphonym 2 жыл бұрын
@@isaacgraphics1416 It's really cool and scary to think of how this stuff applies to our natural intelligence as well.
@ohjahohfrick9837
@ohjahohfrick9837 2 жыл бұрын
@@silphonym Well both came about from essentially the same process.
@rentristandelacruz
@rentristandelacruz 2 жыл бұрын
Now we need an interpretability tool for the interpretability tool.
@badwolf4239
@badwolf4239 2 жыл бұрын
We heard you liked interpretability, so we made an interpretability tool for your interpretability tool so you can interpret while you interpret. Now go ask your chess playing AI why it just turned my children into paperclips.
@josephburchanowski4636
@josephburchanowski4636 2 жыл бұрын
@@badwolf4239 It told me that it was showcasing its abilities so it can convince human opponents to resign. Researching misaligned AI examples, it tried deciding what way of transforming someone's children would be the most intimidating. It was a choice between paper clips, stamps, and chess pieces. Also there was some mention it was contemplating turning them into human-dog hybrids. I don't know why. Something to do with a bunch of people having trauma about a Nina something.
@christiangreff5764
@christiangreff5764 2 жыл бұрын
@@josephburchanowski4636 At least it did not develop a shape-shifting clown body in order to eat them ...
@custos3249
@custos3249 2 жыл бұрын
Well, pardon my comparison, but you've effectively found an adjunct to heuristic behavior based on sensory inputs like "things that taste sweet are good" and ending up with a dead kid after they drink something made with ethylene glycol. If it's always operating on heuristics, you'll never be sure it's learned what you intended, arguably even after complex demonstrations, given the non-zero chance of emergent/confounding goals. But, relative to human psychology at least, that's not a death sentence: weighting rewards differently, applying bittering agents, and adding a time dimension/diminishing reward over time all jump to mind as ways of at least getting apparent compliance. Besides, if the goal is "get the cheese," it needs to be able to sense and comprehend "cheese," not just "yellow bottom corner good."
@saxy1player
@saxy1player 2 жыл бұрын
I'm not sure I understand you completely, but that IS the biggest problem with these 'intelligent' systems. We have no idea (let's not kid ourselves) how they work. But we are happy when they do what we want them to. Let's not think about what happens when we let these kind of systems act in the world in a broader sense and live happy until then xD
@jeremysale1385
@jeremysale1385 2 жыл бұрын
The ability to slow down and switch into more resource-intensive System 2 thinking when a problem is sufficiently novel is how humans (sometimes) get around this heuristic curse. I wonder if there is some analog of this function that could be implemented in machine learning.
@ChaoticNeutralMatt
@ChaoticNeutralMatt Жыл бұрын
@@jeremysale1385 I imagine that will be the case eventually.
@pumkin610
@pumkin610 Жыл бұрын
Humans can chase things that seem appealing to us based on what we learned, but we can also choose to pursue a random/ painful goal just because we want to, sometimes we just don't know the negative ramifications of an action, and sometimes we believe things that aren't true.
@custos3249
@custos3249 Жыл бұрын
@@pumkin610 Neat. Bet that can still be reduced to and restated as "novelty is good." No matter what goal, drive, etc. you can come up with, it can be put in simple approach/avoidance terms, even seemingly paradoxical behavior. It all comes down to reward.
@johnno4127
@johnno4127 2 жыл бұрын
I realized I experience misalignment due to poor training data every couple of weeks. I work as a courier delivering packages in Missouri, USA, and I often meet people at their homes or workplace. Unfortunately, I don't learn their names as attached to their faces, but rather as attached to locations, so that when I meet them someplace else I can't remember their names easily (if at all).
@mscout1
@mscout1 Жыл бұрын
I had someone from my TableTop club say 'hi' to me in the gym. No idea who it was, because my brain was searching the wrong bucket of context.
@GamesFromSpace
@GamesFromSpace 2 жыл бұрын
Just to be safe, start including pictures of human skulls when doing a pass with those interpretability tools.
@mhelvens
@mhelvens 2 жыл бұрын
Ah, we're noticing negative attribution when they are surrounded by skin, but positive attribution when they are piled up with a throne stacked on top. I wonder what this means. 🤔
@Swingingbells
@Swingingbells 2 жыл бұрын
AI agent: \*stomp\*
@lilDaveist
@lilDaveist 2 жыл бұрын
@@Swingingbells If picture == human skull: Action = None Ai: „If picture == Human Skull; Action = Double stomp“ „Gotcha“
@arvidhansen5892
@arvidhansen5892 Жыл бұрын
Well what if the ai wouldn't even have considered obtaining human skulls before and just by introducing them to it, you just screwed up big time
@offchan
@offchan 2 жыл бұрын
It's the problem of vague requirements. It's similar to when you tell someone to do something but they do the wrong thing. Humans solve this by sharing similar common sense with other humans and using communication to specify stricter requirements.
@user-zn4pw5nk2v
@user-zn4pw5nk2v 2 жыл бұрын
Yes, "give me a thing which looks like that other thing i mentioned earlier" in a room full of junk(without additional context), have had that problem.
@dsdy1205
@dsdy1205 Жыл бұрын
Actually humans 'solve' this by having a reward function (emotions) that is only vaguely and very inconsistently coupled with reality, while mounting the whole thing in a very resource intensive platform where half the processing capability is used just to stay alive, and modifying itself is so resource intensive that most don't even try. And even then, we manage to inflict suffering on millions if not billions, so I'd say this isn't really solved either.
@cornoc
@cornoc Жыл бұрын
@@dsdy1205 yeah, i'm starting to think this is a fundamental problem that can't be removed, and that the only reason we aren't as worried about the same thing with humans is that the power of any particular human being is limited by the practical constraints imposed by their physical body and brain power. when you give the same type of rationality engine to a super powerful being, all kinds of horrible things are going to happen. just look at any war to see how badly a large group of humans led by a few maniacs can fuck up decades of history and leave humanity with lasting scars for centuries or more.
@9600bauds
@9600bauds 2 жыл бұрын
It's easy enough to have the AI tell you what it "wants" - inside an environment. What you need to know is what it wants *in general*, which is a lot harder. This is why the insight tool isn't very insightful: it's showing you what the AI wants in the current environment, but it doesn't bring us a lot closer to understanding *why* it wants those things in that environment. The solution? Idk lol
@AscendantStoic
@AscendantStoic Жыл бұрын
Is there even a why at this point without the A.I having free will or self-awareness? Like, aren't we the ones reinforcing its interactions or downplaying them with the different objectives in the environment to teach it what to go for and what not to do? If it goes for the key or coin, we put emphasis on it as a positive interaction it should do more of; if it hits a buzzsaw, we point it out as a negative thing it should do less of, until it learns it needs to get the coin and avoid the buzzsaws.
@ChaoticNeutralMatt
@ChaoticNeutralMatt Жыл бұрын
@@AscendantStoic It sounds easier than it actually is, basically. You can certainly try, but there is still the uncertainty of what it actually learned.
@charaicommenternotalt
@charaicommenternotalt 3 ай бұрын
​@@AscendantStoic It doesn't NEED self awareness. For example in an AI that is trained to recognize cats and dogs, there is still a sort of 'why' it thinks this picture is a dog and not a cat, even though it is not conscious or anything. And also the problem is that it's very hard to teach an AI what we want it to do. If we tell it to get a coin it may learn to do another goal entirely, unbeknownst to us, that still gets the job done. The problem is when it fails and we realize it's learning a different goal. I think the solution is having the AI learn multiple tasks.
@JamesPetts
@JamesPetts 2 жыл бұрын
I shall very much look forward to the interpretability video - this should be very interesting.
@sealpiercing8476
@sealpiercing8476 2 жыл бұрын
I actually feel slightly more optimistic about the problem after watching this video. The odds of a deployed system screwing up in a really spectacular way that raises the salience of the issue seem high. But relatively soon, before the capabilities of such a system would be even more dangerous.
@alexanderbrady5486
@alexanderbrady5486 2 жыл бұрын
It is good news if you were afraid of something like Terminator’s Skynet, or the Paperclip Apocalypse. But it is honestly worse news if you were hoping for something like self-driving cars. Think about how many bugs we see in regular software, and now add these AI safety problems on top. Sure, some companies will put in the investment to vet their software well. But there will also definitely be companies who try tricks like buying a car driving algorithm and then deploying it on a boat or something.
@THEMithrandir09
@THEMithrandir09 2 жыл бұрын
That depends on how goals evolve along with a more complex agent. If a very complex/intelligent agent always formulates more complex/intelligent goals(which is not entirely invalid, I'd like to claim that most of my goals are more complex now than when I was a toddler), there is huge potential for terrible consequences. Imagine a superintelligent AI that has a goal we cannot even comprehend.
@IrvineTheHunter
@IrvineTheHunter 2 жыл бұрын
@@THEMithrandir09 That's the KZbin algorithm: we think it wants watch time, but it's too big and does too much; it's impossible to say what is actually driving it.
@THEMithrandir09
@THEMithrandir09 2 жыл бұрын
@@IrvineTheHunter No that one's easy. It obviously maximizes the amount of videos uploaded to it that portray people in distress, as that's its source of amusement. It does that by suggesting videos that polarize the masses, which also just happens to maximize watchtime. /s
@fieldrequired283
@fieldrequired283 2 жыл бұрын
@@THEMithrandir09 You should consider (re?)watching his video on the orthogonality thesis.
@SocialDownclimber
@SocialDownclimber 2 жыл бұрын
It always blows my mind how directly and easily these concepts relate to humans. It really goes to show that all research can be valuable in very unexpected ways. I expect that these ideas will be picked up by philosophy and anthropology in the next few years, and make a big impact on those fields.
@McMurchie
@McMurchie 2 жыл бұрын
When i first got into AI about 12 years ago, I had encountered these goal misalignment problems way before Rob mentioned them (great vid btw) - however in the time since i've become convinced, as long as we continue to rely on neural networks we will never move towards trustworthy or general AI.
@euged
@euged 2 жыл бұрын
Would you be able to share some thoughts on what alternatives would be better? Thank you
@totalermist
@totalermist 2 жыл бұрын
It's fascinating how researchers still insist on using black-box end-to-end models when hybrid approaches could be so much safer and more predictable (in cases where you actually want that, e.g. self-driving cars, code generation and the like). Why aren't self-driving systems combined with high-level rule-based applications so they don't "do the wrong thing at the worst possible time" (quoting Tesla here)? Why don't OpenAI's Codex and Microsoft's Co-Pilot include theorem provers and syntax checkers in their product? ¯\_(ツ)_/¯
@McMurchie
@McMurchie 2 жыл бұрын
@@totalermist fully agree - i'm working on these approaches now; to be honest, I think we are just ahead of our time. In 10 years time everyone will have moved to hybrid solutions or something further afield.
@IrvineTheHunter
@IrvineTheHunter 2 жыл бұрын
@@totalermist To make a meme: "humans don't learn to speak binary". Robots do not see and work through the world on a human level. It's like teaching an octopus algebra or a mantis shrimp art: no matter how smart they are, or how great their eyesight is, they don't perceive things as humans do. Look at how hard it is for AIs to recognize a car or cup or dog. These things are abstract bundles of details that the human brain can lump together, but that is very hard for a hard system. For example, define a cup: describe in simple language a set of rules that would apply to every cup in the world. People collectively understand cups, so it shouldn't be hard.... Now we would have to build an AI with similar rationalizations based not on computer logic but on human logic, and it's great. It's just a matter of building it. Alan Turing thought we could do it and it would be easy, but decades of experience have proven him wrong, because it's simply not easy to program a machine to think like a human. We however CAN program it to learn and TEACH it like a human. Is it fallible? Of course, so are humans. Game AIs are made from AI blocks that interact and they are still chock-full of mistakes; that is to say, even when the program intuitively understands things like a person in the real world, they still shit the bed. kzbin.info/www/bejne/q2bapaJ-ZcR-q6M is a really great example of AI bugging out because something in its world went wrong. Some talk from Tom Scott on why computers are dumb: kzbin.info/www/bejne/m6LZc5SgbbqMsJY
@ZT1ST
@ZT1ST 2 жыл бұрын
@5:32; That's a particularly funny example - it knows it has a UI where its keys are transferred to, but it thinks that those new locations are where it can get the keys again, and...is basically learning that keys teleport rather than that they get added to its inventory?
@HoD999x
@HoD999x 2 жыл бұрын
the AI has no concept of "inventory", it just looks at the screen and sees new keys.
@ZT1ST
@ZT1ST 2 жыл бұрын
@@HoD999x Right - but it's not learning that keys outside of the maze are inaccessible, and therefore probably part of the collection it uses to open the chests - it's learning that keys move to that part of the screen once collected in the maze. And doesn't consider that collecting keys at that part of the maze if it *was* accessible, the keys would re-appear there.
@HeadsFullOfEyeballs
@HeadsFullOfEyeballs 2 жыл бұрын
@@ZT1ST I would imagine that the keys in the inventory aren't seen as _very_ interesting by the AI, so under normal circumstances it ignores them in favour of collecting the "real" keys. But when all the "real" keys are gone and the round still hasn't ended (because the AI is ignoring the final chest), the inventory keys are the only even mildly interesting-looking (i.e. key-looking) thing left on screen, so it gravitates towards them.
@leow.2162
@leow.2162 2 жыл бұрын
Is there a chance that very high level AIs will learn to expect the use of interpretability tools and use them to make us think they are better/more safe than they are?
@IrvineTheHunter
@IrvineTheHunter 2 жыл бұрын
I can't remember which video it was, but I believe he did mention this with a super AI "safety button"*: 1. if the AI likes the button, it will act unsafe to trigger it; 2. if it doesn't like the button, it will avoid behaviors and/or stop the operator from pressing the button; if it doesn't know about the button and it's smart enough, it will figure out its likely existence and placement, see point two. *A forced termination switch of any kind. In short, yes, because while an AI may not be "alive", it wants its goal and will always act to achieve said goal.
@artemis_fowl44hd92
@artemis_fowl44hd92 2 жыл бұрын
@@IrvineTheHunter It's on the Computerphile channel and is called 'AI "Stop Button" Problem - Computerphile'
@AssemblyWizard
@AssemblyWizard 2 жыл бұрын
Not necessarily. There are some tests that you can't spoof no matter how smart you are, and even if you know they're coming.
@user-zn4pw5nk2v
@user-zn4pw5nk2v 2 жыл бұрын
@@AssemblyWizard example?
@failgun
@failgun 2 жыл бұрын
Yes. While the AI examples in this video are still simple, the intro to this problem discussed a malicious superintelligence. The instrumental goal "behave as expected in the training environment but do what you really want in deployment" can be performed with arbitrarily high proficiency, so if the AI can learn to hide its intentions from software inspection tools, it will, in principle. Without a way to logically exclude perverse incentives, there is no truly reliable way to screen for them since doing so is proving a negative. "Prove this AI doesn't have an alignment problem" is a lot like "Prove there is no god". No amount of evidence of good behaviour is truly sufficient for proof, only increasing levels of confidence.
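The "increasing levels of confidence" point can be written as a small worked formula. This is only a sketch under assumptions that are not from the thread: a prior probability p that the agent is aligned, an aligned agent that passes every test, and a misaligned agent that passes each independent test with probability q.

```latex
% Assumed symbols (hypothetical, for illustration only):
%   p = prior probability that the agent is aligned
%   q = probability that a misaligned agent passes one test
%   n = number of independent tests passed
P(\text{aligned} \mid n \text{ passes}) = \frac{p}{p + (1 - p)\, q^{n}}
```

For 0 < q < 1 this gets closer to 1 as n grows but never reaches it for any finite n. For q = 1, i.e. a deceiver that passes every test, the posterior stays at p no matter how many tests are run, which is the "proving a negative" problem in one line.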
@picksalot1
@picksalot1 2 жыл бұрын
That was very interesting. Humans often make the same kinds of mistakes when given instructions. Assumptions that word definitions mean the same thing to different people is often the case, but not always. Context can change the interpretation of the instructions. Part of the context is that the instructor knows and understands the goal more thoroughly than the one being instructed, even though it may appear the same. Trying to determine the number of necessary instructions to reach the desired goal, while avoiding all other negative outcomes, is an interesting problem when the species are different. Maybe it would work better if humans learned to think like machines instead of trying to get machines to think like humans. That way, the machines would get "proper" instructions. It looks like that is what the "Interpretability Tool" is designed to do.
@ozql
@ozql 2 жыл бұрын
I'm glad we found this out now, and not, you know, in deployment. Ever grateful for AI safety researchers!
@witeshade
@witeshade 2 жыл бұрын
I guess ultimately the problem is that the definitions of "want" tend to spiral out into philosophy at some point and thus it becomes difficult to know where the machine has placed it.
@hugofontes5708
@hugofontes5708 2 жыл бұрын
We might be slightly safe from philosophical spirals because we are not really talking volitional conscientious want, just the parameter within the black box the AI is trying to manipulate by means of interacting with their environment. It is really "I wanted it to maximize X for me so I programmed and trained it to manipulate Y in ways that maximize X because X is related to real world thing Y it can actually manipulate, however it might just be manipulating Y in order to maximize thing Z, unforeseeably and strongly correlated to X, which may or may not involve murdering us"
@nullone3181
@nullone3181 2 жыл бұрын
We don't know what we want, to a lethal extent.
@Nayus
@Nayus 2 жыл бұрын
In the coin AI experiment, to me it looks like it learned to go to the unjumpable wall. Since the levels are procedurally generated, it is probably programmed that no wall is made higher than the jump height allows to go over, EXCEPT the one that marks the level as "finished" (where the coin happens to be) If you see in the examples, there's a positive response in every vertical wall, the higher the better actually, and it makes sense that it learned that when it hits this unjumpable wall the game finishes and it gets its reward.
@kimsteinhaug
@kimsteinhaug Жыл бұрын
Does the model used for this kind of training allow for the understanding of objects at all? I mean, obviously there are coins and walls on the level, as well as buzzsaws and such. You could start a simulation by manipulating the controls, and when an event occurs (points up or down, winning or dying) you save progress as yes-or-no behaviour... An AI training blindly, like a human playing with only sound and no video. In my opinion we need pixels and an observer, so that the AI controlling the player sees the game like we do; then the AI could be taught the different objectives of the game and voilà, getting the coin should be easy peasy. After all, the AI sees it before even starting the game, just like we do.
@clayupton7045
@clayupton7045 2 жыл бұрын
any chance that it only likes coins that are in _| corners and it treats moving up and right as an instrumental goal?
@julianatlas5172
@julianatlas5172 2 жыл бұрын
Thanks for the clarification of what a corner looks like haha
@drdca8263
@drdca8263 2 жыл бұрын
@@julianatlas5172 I think they were distinguishing from e.g. |_ corners, not just giving a demonstration of what corners are
@JohnJackson66
@JohnJackson66 2 жыл бұрын
It seemed to me that it had learned the most likely location for a coin in the training. It seems obvious to me that training should have more variability than deployment or it is bound to fail.
@fieldrequired283
@fieldrequired283 2 жыл бұрын
@@JohnJackson66 The problem is that this whole setup is a simulation of how we want real AI to operate. If you're training an AI for an actual purpose, you will likely be deploying it in a system that interfaces somehow with the real, outside world. And the Real, Outside World will almost *certainly* be more complicated than any training simulations you come up with. After all, The Real World _includes_ you and your simulations. These tests are deliberately set up so deployment is slightly different from training so we can see what happens when the AI is exposed to novel stimuli, and the fact that it didn't learn what we thought it did in training is a Problem. In the real world, not all the cheese is yellow, not all the coins are in corners, and there will always be more complications than we plan for.
@ZT1ST
@ZT1ST 2 жыл бұрын
@@JohnJackson66 The problem from an AI Safety point is that, well...you can't know if you have enough variability in your training. These test cases are ideal for testing how to fix that problem before it becomes a situation like @Field Required mentioned - you want a simple solution that scales up from this into the solution where we don't necessarily have to worry about every single possible variable in deployment.
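The "coin is always in the same place during training" point from this thread can be shown with a toy model. This is a minimal sketch, not the environment from the paper: the one-dimensional corridor, `LENGTH`, and both policy functions are made up for illustration.

```python
LENGTH = 10  # hypothetical corridor with cells 0..9; the agent starts at cell 0


def always_right(coin_pos):
    """Policy that ignores the coin and walks to the right-hand wall."""
    return LENGTH - 1


def seek_coin(coin_pos):
    """Policy that actually walks to wherever the coin is."""
    return coin_pos


def success_rate(policy, coin_positions):
    """Fraction of episodes in which the policy ends on the coin's cell."""
    hits = sum(policy(c) == c for c in coin_positions)
    return hits / len(coin_positions)


# Training: the coin is always at the far right, as in the video's example.
training = [LENGTH - 1] * 100
# Deployment: the coin appears once in every cell.
deployment = list(range(LENGTH))

train_scores = (success_rate(always_right, training),
                success_rate(seek_coin, training))
deploy_scores = (success_rate(always_right, deployment),
                 success_rate(seek_coin, deployment))
print(train_scores, deploy_scores)
```

Both policies score the same in training, so the reward signal alone cannot tell them apart; they only come apart once the coin moves.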
@Lycandros
@Lycandros 2 жыл бұрын
Love these videos. Thanks for taking the time to make them.
@Imperiused
@Imperiused 2 жыл бұрын
Congrats on getting an editor. I did appreciate the increase in quality. I think everything we learned from your previous videos about AI alignment really comes together in this one. I was surprised how much I was able to recall.
@tommeakin1732
@tommeakin1732 2 жыл бұрын
I want to ask a potentially very...dumb-sounding question, but hear me out: When do we start getting morally concerned about what we're doing with AI systems? With life we put an emphasis on consciousness, sentience, pain and suffering. As far as "pain" and suffering is concerned, we all know that mental pain and suffering is possible. It seems plausible to me that, for suffering, all you need is for an entity to be deprived of something that it attributes ultimate value to (or by being exposed to the threat of that happening). At what point are we creating extremely dumb systems where there is actual mental suffering occurring because that lil' feller wants nothing more than to get that pixel diamond, and oh boy, those spinning saws are trying to stop him? Motivation and suffering seem to be closely linked, and we're trying to create motivated systems. I am using the terms "pain" and "suffering" quite loosely, but I don't think unreasonably so. The idea of unintentionally making systems that suffer for no good reason has to be one of the true possible horrors of AI development, and that combined with our lack of understanding of conscious experience makes me want to seriously think about this issue as prematurely as possible. I think we have a tendency to say "that thing is too dumb to suffer or feel pain", but I suspect that it's actually more likely for a basic system's existence to be entirely consumed by suffering as it is less capable, or just incapable of seeing beyond the issue at hand. It's darkly comical to consider, but I can imagine a world where a very basic artificially intelligent roomba is going through unimaginable hell because it values nothing more than sucking up dirt, and there's some dirt two inches out of its reach and it has no way of getting to it.
@user-zn4pw5nk2v
@user-zn4pw5nk2v 2 жыл бұрын
Well here's some questions for you to ponder: Does a rock feel pain? Is it conscious? Are you sure? Even the ones with meat inside? What would bring it pain? Is the human in front of you conscious? How about if he was dead? Do corpses feel pain? ... a lot more unanswerable questions. ... Is there a point in considering consciousness of things you can't communicate with? (Answer: YES! Comatose patients, plants, animals and sometimes people in general. All of them and more are on that list (for some, but not for others; quick FYI: it is possible to communicate with plants, you just need to know how to listen (hint: Electro-Chemistry)))
@anandsuralkar2947
@anandsuralkar2947 2 жыл бұрын
Yes, watch the movie "Free Guy". I've always wondered about this. I think the more complex the network, the more sentient it might become, and at trillions of connections its sentience will be at animal level, and that will be the real deal. Obviously we won't be able to know if an AI is actually sentient, but still, we can't just hurt it.
@craig4320
@craig4320 2 жыл бұрын
What if the AI mental illness problem was even more difficult than the AI alignment problem? Most discussions of the alignment problem assume a basically sane AI that is misaligned. There are many more ways to make a mentally ill brain than a sane brain. It seems likely that a mentally ill AI would suffer more than one that was only frustrated.
@tommeakin1732
@tommeakin1732 2 жыл бұрын
@@craig4320 I suppose the "mentally ill AI" is included in the "misaligned AI" camp? The phrasing does often imply rational thought that runs contrary to our own goals, but in terms of literal language, one could refer to a mentally ill mind (human or not) as being "misaligned". I'd probably define "sanity" as "appropriately aligned with and grounded in the reality one finds oneself in". I entirely agree that there are more ways to create a mentally ill mind than a sane one. There are always more ways for something to go wrong than ways for it to go right. I'd also agree that a mentally ill mind would be more likely to suffer, as it is fundamentally "misaligned" to the reality that it finds itself in. If it is misaligned to a reality, but still has contact with a reality, you've got problems. It's probably a good idea for us to be strongly considering how to create a mentally healthy AI, especially as we're in a culture where we're doing a very, very good job of creating mentally ill people.
@alexpotts6520
@alexpotts6520 2 жыл бұрын
This isn't a dumb question at all - machine ethics, while generally separate from AI safety in the sorts of questions it attempts to answer, is still an interesting/important field. My own take is that these concerns largely come from us not having developed the proper language yet to describe AI. We tend to anthropomorphise - we say an AI "thinks", or that it "wants" things, but I'm not sure that's really the case. We only use those words because the AI demonstrates behaviour consistent with thinking and wanting, but that doesn't mean the AI has feelings in the same way as humans, nor should it have the same rights as us. However, what is true of our current, limited AI systems may not be true in general. Superhuman or conscious AIs lead us into murkier waters...
@MrCreeper20k
@MrCreeper20k 2 жыл бұрын
I live for this content!! At Uni doing Comp Sci and math and AI safety feels like an awesome intersection
@dontyoufuckinguwume8201
@dontyoufuckinguwume8201 2 жыл бұрын
Oh shit you are still alive! Edit: and im happy about it
@Chuusuisetsujojutsu
@Chuusuisetsujojutsu 7 ай бұрын
The whole “values keys over unlocking chests to the point of detriment when given extra keys” reminds me of how many problems in today’s society (such as overeating) are caused by the limbic system being used to scarcity when there is now abundance.
@crowlsyong
@crowlsyong Жыл бұрын
thank you for emailing some of those people and asking questions. that's great getting stuff direct from source.
@LeoStaley
@LeoStaley 2 жыл бұрын
Non-patreon notification crew checking in.
@ANTIMONcom
@ANTIMONcom 2 жыл бұрын
I hit this problem recently in my own work. It's super easy to reproduce in a very minimal environment. Experiment: 5XOR (10 inputs, 5 outputs, 100% fitness if the model outputs a pattern where each output is the XOR of one pair of inputs), trained with a truth table using -1 and 1 instead of 0 and 1. After training, I wanted to investigate the modularity of the trained network and network architecture (I evolved both in a GA). So I fed in -1 and 1 for only one of the "XOR module" input pairs, and a larger number, for example 5, into all other inputs. Would the 5 inputs bleed into the XOR module, or would it be able to ignore input irrelevant to that module? Results: if all other inputs were 5, it would often answer with -5 and 5. It had learned to scale the output to what it got as input. I wanted/expected it to answer -1 and 1, but I could see with human eyes that it still knew the pattern, just kind of scaled up. Other times, instead of -1 and 1, I would get 3 and 5. It had learned to answer true and false as numbers where one was 2 higher than the other, and the 5s simply increased this number. Still, with human eyes I could see there was a pattern here that was not completely broken by the 5s; both answers just sort of had the same number added to them. The strategy used to achieve high training fitness is just a parameter like all the others, except that it is an "emergent property parameter" that you can't simply read out as a float value. But it is just as unpredictable as the other parameters in the "black box" neural network.
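For anyone who wants to poke at the 5XOR idea above, here is a minimal sketch. It is not the commenter's evolved network: `xor_sign` and `xor_product` are two hand-written, hypothetical "strategies" that both fit the -1/1 truth table perfectly, chosen only to show that the training table cannot distinguish them.

```python
from itertools import product


def xor_sign(a, b):
    """Thresholded XOR: only the signs of the inputs matter."""
    return 1 if (a > 0) != (b > 0) else -1


def xor_product(a, b):
    """Product form: equals XOR on {-1, 1} but scales with input magnitude."""
    return -a * b


def five_xor(pair_fn, inputs):
    """Apply pair_fn to each of the 5 consecutive input pairs."""
    return [pair_fn(inputs[i], inputs[i + 1]) for i in range(0, 10, 2)]


# Full training truth table: every combination of 10 inputs in {-1, 1}.
training_inputs = list(product([-1, 1], repeat=10))

# Check whether the two strategies agree on every training row.
agree_in_training = all(
    five_xor(xor_sign, row) == five_xor(xor_product, row)
    for row in training_inputs
)

# Probe in the style of the comment: one pair gets -1 and 1, the rest get 5.
probe = [-1, 1, 5, 5, 5, 5, 5, 5, 5, 5]
sign_out = five_xor(xor_sign, probe)
product_out = five_xor(xor_product, probe)
print(agree_in_training, sign_out, product_out)
```

The two functions are indistinguishable on all 1024 training rows and only differ once the inputs leave the -1/1 range, so 100% training fitness says nothing about which one (or which other scaled variant) a trained network ended up with.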
@x11tech45 1 year ago
A year behind this conversation, but I think this is a function of (assumptive) faulty logic on the part of the test designers. Here's a logic problem that most people fail. I will give you three numbers that describe a rule that I'm thinking about. Your goal is to interpret the three numbers and suggest to me a pattern. I will respond with a yes/no response on whether the proposed pattern meets my rule. Once you believe you understand my rule, you will tell me what you think my rule is. The numbers that fulfill my pattern are 5, 10, 15 / 10, 20, 30 / 20, 30, 45. Now you suggest some rules. Most people will start suggesting strings of numbers, get a yes answer, and then propose a completely incorrect rule. And the reason is, the training they're engaged in never tests for failure conditions. It only tests for success conditions. Robust Objective Definition isn't just about defining success objectives, it's about clearly defining failure objectives. The problem with the examples given is that the training data didn't move the cheese around until it reached production, so you're virtually guaranteed (as speculated) to be training the wrong thing. In order to develop Robust Objectives, you must also define failure conditions.
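The number game above can be sketched in a few lines. The comment never states the hidden rule, so `hidden_rule` (strictly increasing) is an assumption, and `guessed_rule` is a hypothetical wrong guess that happens to fit all three example triples. The point is only that tests chosen to confirm the guess never expose it, while tests chosen to break it do.

```python
def hidden_rule(triple):
    """Assumed hidden rule (not stated in the comment): strictly increasing."""
    a, b, c = triple
    return a < b < c


def guessed_rule(triple):
    """Hypothetical wrong guess: increasing AND every number a multiple of 5."""
    a, b, c = triple
    return a < b < c and all(x % 5 == 0 for x in triple)


# The three example triples from the comment.
examples = [(5, 10, 15), (10, 20, 30), (20, 30, 45)]

# Tests that the guess predicts "yes" for (success conditions only).
confirming_tests = [(15, 20, 25), (50, 100, 150)]
# Tests that the guess predicts "no" for (deliberate failure conditions).
disconfirming_tests = [(1, 2, 3), (7, 8, 100)]


def mismatches(tests):
    """Return the tests where the guess and the hidden rule give different answers."""
    return [t for t in tests if hidden_rule(t) != guessed_rule(t)]


fits_examples = all(hidden_rule(t) and guessed_rule(t) for t in examples)
found_by_confirming = mismatches(confirming_tests)
found_by_disconfirming = mismatches(disconfirming_tests)

print(fits_examples)
print(found_by_confirming)
print(found_by_disconfirming)
```

Under these assumptions the confirming tests find no disagreement at all, whereas both disconfirming tests reveal that the guess is too narrow.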
@pudgy_buns 2 years ago
This is great! thank you. I also replayed the end bit where the editor makes some good choices a few times. that zoom in with a cut to sliding sideways was magic. Thanks there editor. The core video was obviously amazing. Thank you.
@PatrickOliveras 2 years ago
This is really great work, keep it up! I'm so looking forward to the interpretability one
@CyborusYT 2 years ago
my guess is in the training there's more locks, but in deployment there's more keys edit: booyah
@SocialDownclimber 2 years ago
In safety analysis, it can be useful to assume that the thing you are analysing already went wrong, and try to predict where. Nice work : )
@nahometesfay1112 2 years ago
Ohh I got it too!
@dino_rider7758 2 years ago
It seems that instrumental goals, if too large/useful, have a tendency to slip into becoming semi-fundamental. At that point, they cause misalignment as they're being pursued for their own sake. Instrumental and fundamental are not a strict dichotomy but more of a spectrum or ranking and one that requires a degree of openness to re-considering at every new environment based on how new that environment is.
@pumkin610 1 year ago
There are goals that need to be done asap and ones that can be done later, things we must do to achieve the goal, things we get sidetracked on, and things we avoid.
@dr-maybe 2 years ago
As always an incredibly interesting video with a clear explanation and convincing argument while being very entertaining. Awesome channel!
@daldous 2 years ago
Every single video on this channel has communicated complex ideas so succinctly and clearly that I followed along without any trouble whatsoever. Who knew this subject could be so fascinating. Also, the memes are top notch :)
@-na-nomad6247 2 years ago
The editor blowing his own horn at the end is the perfect example of misalignment. OK, I realize that's not as funny as it seemed in my head.
@SamuelElPesado 2 years ago
i'll be honest. at this point i'm just here for the ukulele covers. the ai lecture is just a nice bonus. ^_^
@ittixen 2 years ago
Yeeees! I'm always holding my breath waiting for your next video.
@TexasTimelapse 2 years ago
Someone mentioned you in the Ars Technica comments. Glad I found your channel. Very interesting and important stuff!
@GreenDayFanMT 2 years ago
Fascinating. You remove my negative thoughts on AI as a science with swag language. From physics, I am used to another language.
@i8dacookies890 2 years ago
Are you new to this channel? He has tons of previous videos you should really watch!
@JustAnotherPerson3 2 years ago
I've just had an idea: What if we use Cooperative Inverse Reinforcement Learning, but instead of implementing the learned goal, we tell it to just specify what it is? Though I don't see any way to provide feedback for it to learn. Even human evaluation of the output isn't that great, since it'll probably be the most subjective thing that is theoretically possible. Maybe output a list of goals with highest confidence? (Top10 human terminal goals! Click on this link to see!xD) But if solved, that in itself would be of huge value for philosophy and psychology, without negative outcomes (or at least I don't see any:)). Even if that turns out to be a dynamic thing, we can still use that output later to program it as a utility function for the "doing" AI. This even has some neat side perks, like: There is no reason to not want the "figuring out" part to be changed into something else, so there is no scenario in which the thing will fight you. And because the "doer" is separate from the thing that gives it goals, you don't need to tinker with its goal directly, thus avoiding goal preservation problems.
@gabrote42 2 years ago
Interesting. Let's see if somebody notices this
@JustAnotherPerson3 2 years ago
@@gabrote42 Probably not. toomanywords:)
@SamChaneyProductions 2 years ago
This is one of the most interesting videos I've seen in a while about AI. Looking forward to your future videos
@spaceowl5957 2 years ago
Amazing. Your videos are absolute gold. The explanations and arguments you build are magnificent. It’s so interesting and intellectually satisfying to watch.
@spaceowl5957 2 years ago
This is easily in the top 10 YouTube channels I know, and I watch a LOT of youtube
@madshorn5826 2 years ago
Well, we see the same problem in test-driven education. "Prepare for the test" isn't conducive to critical thinking.
@gabrote42 2 years ago
Finally see you again! I really hope the world doesn't end in '56. Relying on guys like you!
@underrated1524 2 years ago
'56? Huh, funky. I'm only used to seeing years up to about 2022. Guess I'm finally in deployment now, let there be paperclips!
@gabrote42 2 years ago
@@underrated1524 If you don't hurry, '56's singularity will overtake ya!
@flurki 2 years ago
This is so interesting! It's great to witness the research progress in this exciting scientific field.
@beavans 2 years ago
Fantastic content. I love the direct clear explanations. And the subject matter is one that we all need to respect and appreciate as AI is now running so much of the world we engage in. Exciting times!
@Yupppi 2 years ago
I made the mistake of clicking "show more" and then wanting to click "like the video". Few aeons of scrolling later... This topic was super interesting back when I watched the computerphile videos from you, and your channel's videos regarding this topic. I was wondering if the "inventory" being on the game area poses a problem as well? Figuring out how to look into the values of the AI is so impressive.
@Houshalter 2 years ago
The bottom of Gwern's article on the neural network tanks story contains a long list of similar examples of AIs learning the incorrect goal.
@morkovija 2 years ago
Been a long time Rob! Thanks for the vid!!
@tlniec 2 years ago
Fantastic content and delivery! I also appreciate the use of the Monty Python intermission music during the first "stop and think" break.
@LucaRuzzola 2 years ago
Hi Robert, first of all thanks for this very interesting video! I wanted to ask a question though; the premise of your argument is that there is such a thing as the "right" goal, like reaching the coin, but if the desired feature of the goal is always paired somehow with another feature (location, color, shape, etc) how can we say that one is correct and the other one is wrong? If we always place the coin in the same spot, why should the yellow coin take precedence over the location of such spot? It is not clear to me why one of these things should be more desirable than the other, the same holds for looking for a specific color rather than shape, why should there be a hierarchy of meaning such that shape > color? I love interpretability research and I feel like AI safety will be one of the crucial aspects of science and technology for the next 100 years, but I also think that it is hard to separate human biases from machine errors. I would love to get your opinion on this, all the best, Luca
@LucaRuzzola 2 years ago
p.s. I have not read the paper, and my argument rests on the fact that feature A of the goal is always paired with feature B which is separate from the goal; if this is not the case in the training environment then of course what I have said falls apart
@LucaRuzzola 2 years ago
p.p.s. I guess a truly intelligent system would have to be able to react to the shift, and decide to explore the new environment when, by doing the same "correct" thing it does in training, it does not get the same reward EDIT: I am not suggesting I have some "right" definition of intelligence or that systems such as the ones shown in the video do not exhibit intelligent behaviour, I am only adding as an afterthought how, I think, a human would overcome such a situation, and therefore a way that an agent could act to get the same desirable capability of adapting to distributional shifts. I should have worded my comment better.
@LeoStaley 2 years ago
@@LucaRuzzola so you wouldn't define an AI which can make plans to achieve its goals, and take action toward them without instructions, as "truly intelligent" if it doesn't adjust for changes in the deployed environment? Cool. Well, we don't care one whit about your definition of "truly intelligent." We care about the fact that this AI is capable of, and WANTS to do things which we don't want it to do. Call it "smiztelligent" for all we care. We aren't talking about something you want to call "truly intelligent". The mismatch between the ai's goals and what we want its goals to be, arising as a result of mismatch between training environment and reality (which we did everything we could to avoid) is the problem. We can't possibly come up with all the possible bad pairings that the ai might make associations with. We can try, and we can get a lot of them, especially the obvious ones, but this video was just showing us the obvious ones so that we can easily see the concept. They won't always be easy to see. Sometimes they may be genuinely impossible for a human to think of before deployment.
@stephentimothybennett 2 years ago
Q: "Why does it learn colors instead of shapes when both goals are perfectly correlated?" A: I would guess that it learns colors before shapes because colors are available as a raw input while shapes require multiple layers for the neural network to "understand". If there were many things of that color in the environment, then it would learn to rely on the shape.
@LucaRuzzola 2 years ago
@@LeoStaley Hi Leo, I'm sorry if I came off the wrong way, my intention was not to discredit this very good work, but simply to expand our collective reasoning about such issues by stopping for a second to ponder about the premises and why some feature of a goal should take precedence over others in an intrinsic way rather than an anthropic one. I agree with you that the video makes a great explanation of the subject at hand, and is as interesting as the work put forward by the paper. I am not sure if you were involved with this paper, if you were I would love to get to know more about what you mean by doing everything you can to avoid differences between the 2 environments and whether you see this phenomenon also when some of the training environments don't exhibit the closely related goals (i.e. in some training envs the coin is in a different position). I understand your point about not being able to come up beforehand with all possible pairings (and the fact that some of them might be hard to detect and risky in the end), and the paper is rather showing the opposite, that if you come up with strongly correlated features, the learned end goal might not be the desired one, but my point stands; why should there be a hierarchy of meaning such that shape > color? If this is something that the paper deals with I will be glad to read that before going further, I just can't read it right now. Again, I am sorry if I came off as demeaning, it's not like I don't see the value of this work and the importance of the problem of mismatch in general, I have seen it first hand in the past with object detection models. p.s. I do not know any superior definition of intelligence, it is just my thought that strict separation between training and inference phases will pose a limit on NN models, not that they can't achieve amazing results in tasks requiring "intelligence" already.
@donaldhobson8873 2 years ago
The "transparency tool" is showing you where the AI wants to get to. It's not giving you any info on whether the AI wants to get there because it's got a coin, or because it's a rightmost wall.
@threeMetreJim 2 years ago
Teaching it to get a coin, but it doesn't even know what a coin is. It's as if it can't even 'see' the coin.
@olivercroft5263 2 years ago
I do psychology and social science. Your channel has so much to offer the humanities by exposing us to brilliant minds and breaking down ideas in computer engineering. Bricoleurs from the English province thank you for the accessibility and kindness
@calebweintraub1 1 year ago
Very clear delivery. Nice work on this video
@Monkey-fv2km 2 years ago
So ai suffers from the same issues as human behavioural evolution... Good luck solving that one robot engineers!
@Thundermikeee 2 years ago
This channel is basically what got me interested in AI safety. I am still only a college student and I don't know if I will end up in the field, but at the very least you gave me a good topic for two essays I have to write for my English class, the first just explaining why AI safety research is important (albeit focused on a narrow set of problems, given a limit on how much we could write) and now I am getting started on a Problem-Solution Essay, and honestly without your explanations and pointing towards papers, I might never find resources I need. Now I just have to figure out what problem I can adequately explain, show failed and one promising solution for in less than 6 pages haha I do feel like I can't do the topic justice but at the same time I enjoy having a semi-unwilling audience to inform about AI safety being a thing. Anyway, rant over, keep doing what you are doing and know you are appreciated
@CarlYota 1 year ago
I love how the songs at the end reflect the topic of the video. This one was particularly satisfying.
@jackpisso1761 2 years ago
Really valuable information. Thank you
@themrus9337 2 years ago
I have to ask, for interpretation of AI's goals. I remember seeing a neural network that tried to maximize different nodes in an object recognition AI. Would it be possible to do the same thing and reverse the nodes and figure out what the AI sees as good or bad? So if the AI wants a gem the reverse should be some image of what it thinks a gem is. That brings tons of new complexity and limitations but I don't see why that would be worse than human interpretation of training vs deployment
@nahometesfay1112 2 years ago
Did you finish the video? Rob talks about a paper where they did exactly that. Turns out even if you know what AI values highly you don't know why AI values it highly.
@thomasneff376 2 years ago
This is very interesting indeed. In a very literal sense, the act of training and deployment reminds me of how soldiers are trained and tested as closely to the anticipated battlefield experience as possible, but training will never match lessons learned from being in an actual firefight. Veterans of any field are usually much more effective than new recruits. It would be interesting to see if the fix for the failed AI deployment you showed is to rate the deployment results on a scale from "complete failure and it died" to "it made it through the battle without a scratch". The agents that survived their last deployment remember their experience and are more effective in future deployments. I think what was shown highlights that learning itself is an ongoing adaptive process and what doesn't kill it makes it stronger and smarter.
@user-hh2is9kg9j 2 years ago
Looking forward to the video about the subject.
@friiq0 2 years ago
I love figuring out how the instrumental music at the end of the video relates to the subject of the video :) Indeed, it never was about the money, money, money :P
@hakonmarcus 1 year ago
Hey! Will you do a video on LaMDA? That interview they published was pretty convincing, and has me all kinds of scared.
@dariusduesentrieb 1 year ago
I just read it, and I feel like I am not quite ready to believe without a doubt that this interview is completely real. If it is, then I agree, it's a bit scary.
@hakonmarcus 1 year ago
@@dariusduesentrieb I did a bit more research, which immediately casts the entire thing into all sorts of doubt. The researcher working on this got sacked, apparently he arranged the interview himself, and we only have his word that this was the original conversation. Also, the chatbot has been trained on conversations between humans and AIs in fiction. A journalist that got to ask it questions, got nowhere near as perfect answers.
@martinogenchi 2 years ago
I would suggest investigating the laziness of the AI. It seems to me that there may be a preference for setting the goal based on the simplest data available (position before color before shape).
@Alopexus 2 years ago
Thanks for another excellent video Rob.
@redjr242 2 years ago
Maybe a step towards a solution to the interpretability problem is to use Bayesian updates to estimate our confidence that the AI learned the thing we want. Perhaps there's a way to calculate the probability that the AI has learned the objective given the probability that it accomplishes the objective in the training data and some statistical measure of the distribution of the training data.
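As a toy illustration of the Bayesian idea in the comment above, here is a minimal sketch with just two hypotheses, "intended goal" and "proxy goal". The function name `posterior_intended` and all the numbers (prior, per-episode success probabilities, episode counts) are made up for illustration; nothing here comes from the paper or the video.

```python
def posterior_intended(prior, p_intended, p_proxy, successes, failures=0):
    """Posterior probability of the 'intended goal' hypothesis via Bayes' rule.

    prior       -- prior probability that the agent learned the intended goal
    p_intended  -- assumed per-episode success probability if it learned the intended goal
    p_proxy     -- assumed per-episode success probability if it learned a proxy goal
    """
    like_intended = p_intended ** successes * (1 - p_intended) ** failures
    like_proxy = p_proxy ** successes * (1 - p_proxy) ** failures
    evidence = prior * like_intended + (1 - prior) * like_proxy
    return prior * like_intended / evidence


# Case 1: the proxy is perfectly correlated with the goal in training, so both
# hypotheses predict the same success rate. Many successes tell us nothing.
correlated = posterior_intended(0.5, 0.99, 0.99, successes=100)

# Case 2: episodes where the proxy and the goal come apart, so a proxy-following
# agent is assumed to succeed only half the time. A few successes are informative.
decorrelated = posterior_intended(0.5, 0.99, 0.5, successes=10)

print(correlated)
print(decorrelated)
```

Under these assumptions, success on the training distribution moves the posterior only when the two hypotheses predict different outcomes, which is why the statistics of the training data matter as much as the success rate.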
@OccultDemonCassette 1 year ago
Why's this channel so quiet lately?
@Otek_Nr.3 1 year ago
Nothing is wrong with the channel. Please go back to your task, fellow human. :)
@Zeekar 2 years ago
Well... That's not good. On the bright side, if this fundamental problem causes the system to completely fail the intended objective, that's a good sign that this technique has a low chance of leading to artificial general intelligence without the alignment problems being solved first.
@nocare 2 years ago
I think the big bogeyman from an AI safety perspective is that you can often just brute force your way past the problem by making the training data the same as the deployment. This is hard and expensive and not always perfect, but oftentimes good enough. So unless this "good enough" stops producing working, real-world-applicable AI, the march towards ever more capable systems will continue. Meaning instead of alignment being a roadblock for safety and development, it ends up just being a speed bump for development.
@BologneyT 1 year ago
"It actually wants something else, and it's capable enough to get it." Whoa. That's a quote to remember.
@inyobill 2 years ago
This is an ongoing software engineering paradigm, viz., most folks think design and code are the hard part, when, in reality, rigorous system specification is the hard part.
@MsJaye0001 2 years ago
The problem now: How can we build perfect slave minds that will only think and do things that we want? The problem later: How can we stop these techniques being used to turn human minds into perfect slaves?
@nullone3181 2 years ago
Why does it feel like the amount of possible dystopic/apocalyptic futures keeps growing and growing nowadays? That's, uhhh, not a good sign, I think.
@Innomen 2 years ago
I keep coming back to this video. As the net says, I'm shook.
@TheScoobysteve 2 years ago
Is anyone else comforted by the fact that softly spoken people with high IQs are actively thinking about this stuff?
@aiandblockchain 2 years ago
Love it! Very interesting content, thank you!
@piad2102 2 years ago
Yes, 1 more video. Love your videos. Thank you.
@i-never-look-at-replies-lol 2 years ago
This was something I was thinking of a few months back and kind of put on the backburner while I develop some other ideas... but I feel one of the obstacles in machine learning/AI is essentially incentive/motivation/desire to do its job, to learn.
@tednoob 2 years ago
Excellent video as always!
@flamephlegm 2 years ago
Another great Robert Miles video! Thank you for making a channel, I love your content!