Eliezer Yudkowsky - AI Alignment: Why It's Hard, and Where to Start

111,026 views

Machine Intelligence Research Institute

On May 5, 2016, Eliezer Yudkowsky gave a talk at Stanford University for the 26th Annual Symbolic Systems Distinguished Speaker series (symsys.stanford.edu/viewing/e....
Eliezer is a senior research fellow at the Machine Intelligence Research Institute, a research nonprofit studying the mathematical underpinnings of intelligent behavior.
Talk details, including slides, notes, and additional resources, are available at intelligence.org/stanford-talk/.
UPDATES/CORRECTIONS:
1:05:53 - Correction Dec. 2016: FairBot cooperates iff it proves that you cooperate with it.
1:08:19 - Update Dec. 2016: Stuart Russell is now the head of a new alignment research institute, the Center for Human-Compatible AI (humancompatible.ai/).
1:08:38 - Correction Dec. 2016: Leverhulme CFI is a joint venture between Cambridge, Oxford, Imperial College London, and UC Berkeley. The Leverhulme Trust provided CFI's initial funding, in response to a proposal developed by CSER staff.
1:09:04 - Update Dec 2016: Paul Christiano now works at OpenAI (as does Dario Amodei). Chris Olah is based at Google Brain.

Comments: 378
@Renvaar1989
@Renvaar1989 Жыл бұрын
Listening to this hits differently in 2022/2023...
@Zgembo121
@Zgembo121 Жыл бұрын
C u in 2040, may sound worse
@MeatCatCheesyBlaster
@MeatCatCheesyBlaster Жыл бұрын
@@Zgembo121 can confirm, am paperclip
@Zgembo121
@Zgembo121 Жыл бұрын
@@MeatCatCheesyBlaster many paperclip?
@lemurpotatoes7988
@lemurpotatoes7988 Жыл бұрын
Major points for predicting that AI would solve protein folding even though he thought it'd need to be AGI.
@SalamAnuar
@SalamAnuar 7 ай бұрын
​@@Zgembo121all paperclip
@killyourtvnotme
@killyourtvnotme 11 ай бұрын
It’s tough watching this knowing that he’s essentially given up and sees the situation as hopeless now
@kayakMike1000
@kayakMike1000 7 ай бұрын
I am not surprised. This fat dumbass isn't even aligned with his own best interests. Most of us are not aligned properly. That's why most people are fat and lazy, or worse, alcoholic or drug-addicted losers.
@thillsification
@thillsification Жыл бұрын
Anyone else watching this in or after april 2023 after Eliezer was on the Lex Fridman podcast? After the release of gpt4 and the coming release of gpt5 😳
@demetronix
@demetronix Жыл бұрын
Seems like GPT-5 is not in the works ATM. So there is at least that.
@TracyAkamine
@TracyAkamine Жыл бұрын
Yes, me
@theterminaldave
@theterminaldave Жыл бұрын
Yep, pretty much watched everything recent that he's done. So I guess I might start watching his older stuff. You should check out a very recent hour-long discussion called "Live: Eliezer Yudkowsky - Is Artificial General Intelligence too Dangerous to Build?" It's basically just a repeat of everything he said in the Lex interview, but I wanted to see his thoughts since GPT-4 was released.
@TracyAkamine
@TracyAkamine Жыл бұрын
@@demetronix They said 5 will be done with its training by end of December 2023. 😵‍💫
@TracyAkamine
@TracyAkamine Жыл бұрын
@@theterminaldave I watched a lecture he gave about 5 years ago on the subject. At least his conscience is clean; he's been warning them for years.
@aktchungrabanio6467
@aktchungrabanio6467 Жыл бұрын
Now it's becoming a real problem. Thank you for sharing this talk!
@charleshultquist9233
@charleshultquist9233 Жыл бұрын
Yeah, today in 2023 watching his old vids to see just how spot on Eliezer was/is as a prophet of doom.
@StephanieWomack1992
@StephanieWomack1992 11 ай бұрын
Yep!
@martimdesouzajunior7585
@martimdesouzajunior7585 11 ай бұрын
Wrong. It was a problem way before Eliezer gave this lecture, and he told you so.
@kabirkumar5815
@kabirkumar5815 10 ай бұрын
It was real long ago.
@Hjkkgg6788
@Hjkkgg6788 7 ай бұрын
This has been a real problem for a longgg time. Everyone is aware of it now. I'm not saying AI is bad, but it's been going on for so long now.
@mraxilus
@mraxilus 6 жыл бұрын
1:11:21 saving this for future reference. No need to thank me.
@amcmr2003
@amcmr2003 5 жыл бұрын
You're pretty much the 1st in line when AI kill us all.
@aerrgrey5957
@aerrgrey5957 4 жыл бұрын
thanx man!
@martinkunev9911
@martinkunev9911 3 жыл бұрын
already used: kzbin.info/www/bejne/pH-laqasg6d6fq8
@mraxilus
@mraxilus 3 жыл бұрын
@@martinkunev9911 indeed, hence why I timestamped it.
@PatrickSmith
@PatrickSmith Жыл бұрын
At 46 minutes, it's like OpenAI, producing smiles for now.
@benschulz9140
@benschulz9140 3 жыл бұрын
It's bizarre how entertaining this is, while at the same time being positively terrifying.
@flyingskyward2153
@flyingskyward2153 Жыл бұрын
How do you feel now?
@bonzaicandor5650
@bonzaicandor5650 Жыл бұрын
@@flyingskyward2153 fr
@Lolleka
@Lolleka Жыл бұрын
fr fr bros
@ldobbs2384
@ldobbs2384 Жыл бұрын
The Exorcist-level reality.
@GungaLaGunga
@GungaLaGunga Жыл бұрын
@@flyingskyward2153 like we are living the plot of the movie Don't Look Up as of March 2023, except AGI will hit sooner than we thought, instead of an asteroid.
@xyhmo
@xyhmo 3 жыл бұрын
Isaac Asimov was aware that his three laws (as stated) were imperfect, and once had a character criticize them without being seriously opposed or refuted. I believe something similar occurred in several stories and was basically an ongoing theme, almost like the frequently broken holodeck.
@RandallStephens397
@RandallStephens397 2 жыл бұрын
Asimov's bread-and-butter was pretty much "Here's a great idea. And now here's everything wrong with it." (3 laws, psychohistory...)
@nickscurvy8635
@nickscurvy8635 2 жыл бұрын
Isaac Asimov's laws aren't imperfect. They just aren't relevant. There is no way to implement them because they don't make sense within the realities of artificial intelligence. They are a sci-fi prop. Which is fine. It was an excellent prop. The problem is when people seriously propose props as actual solutions to real world problems
@MSB3000
@MSB3000 Жыл бұрын
I believe HAL 9000's main malfunction was along those lines
@monad_tcp
@monad_tcp Жыл бұрын
weren't the three laws there basically to show that they don't work and that this problem is really hard?
@PetrSojnek
@PetrSojnek Жыл бұрын
@@monad_tcp yeah exactly. After all, almost all stories about Asimov's laws are basically "here are examples of how it doesn't work". Pretty much every time, a human must step up to clean up the mess.
@juffinhally5943
@juffinhally5943 7 жыл бұрын
I've discovered the existence of this video today, on new year's day, and it's turned into a nice present.
@francescofont10
@francescofont10 4 жыл бұрын
watching this on new year's eve 3 years later!
@JMD501
@JMD501 4 ай бұрын
New years day 7 and 4 years later
@kuudereplus
@kuudereplus 4 жыл бұрын
Putting so much emphasis on how he uses "like" is weird to me; it's clearly a syntax function for his speech to mediate between segments of statements and I processed it in turn without noticing it much
@vanderkarl3927
@vanderkarl3927 3 жыл бұрын
Wonderful talk, while it did get a little jargony in places, it was almost entirely able to be followed by my sleep-deprived post-highschool brain, and it was enjoyable!
@meringue3288
@meringue3288 Жыл бұрын
Thank you for this
@rorrzoo
@rorrzoo 5 жыл бұрын
What we do in variational calculus in order to "force the existence of the suspend button" is, we restrict the space of 'trajectories' among which one is maximizing the utility. The question is similar to the problem of finding a curve that goes from point A to point B without touching a set D (the obstacle) while traversing the least possible distance; in that case, you do not consider any sort of 'modified distance function' that would give larger weight to the curves that touch D; you just eliminate those curves among the set of candidates for the minimization, and then you analyze what is the optimal curve among the ones that are left. Thus, instead of using a special utility function, it would be better to find out what the 'obstacle' would be (for example, all trajectories in which the robot does something while its suspend button is pressed) and just remove those possibilities from the set in which the optimization is being carried out. This is not unreasonable: a robot without electric power, for example, really won't be able to do much, so all 'trajectories' that would have it performing actions while out of power can simply be eliminated as candidates for the optimization.
@pafnutiytheartist
@pafnutiytheartist 3 жыл бұрын
How would such a robot consider an action that has a 50% chance of causing the button to be pressed? Is the expected utility the same as if the button didn't exist?
@gvdkamdar
@gvdkamdar 4 ай бұрын
That's the thing. You don't even know the different points out there, because the AI is operating in a completely different solution space. How do you factor in an obstacle point in a dimension that is not even observable to humans? This sounds like a patch-job solution which is bound to blow up as the AI gets smarter.
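A minimal sketch, in Python, of the contrast drawn in the thread above: bolting a penalty onto the utility function versus restricting the feasible set before maximizing. The plan encoding, button schedule, and scores are all invented for illustration.

```python
from itertools import product

ACTIONS = ["work", "idle"]            # what the robot does at each step
BUTTON_PRESSED = [False, True, True]  # toy schedule: suspend button pressed from step 1 on

def base_utility(plan):
    # pretend the robot earns 1 unit per "work" step
    return sum(1 for action in plan if action == "work")

def violates_shutdown(plan):
    # a plan "violates" if it keeps working while the button is pressed
    return any(action == "work" and pressed
               for action, pressed in zip(plan, BUTTON_PRESSED))

candidates = list(product(ACTIONS, repeat=len(BUTTON_PRESSED)))

# Approach A: keep every plan and fold a penalty into the utility function.
penalized_best = max(
    candidates,
    key=lambda plan: base_utility(plan) - (100 if violates_shutdown(plan) else 0),
)

# Approach B: shrink the feasible set first, then maximize the unmodified utility,
# the way an obstacle is handled in a variational problem.
feasible = [plan for plan in candidates if not violates_shutdown(plan)]
restricted_best = max(feasible, key=base_utility)

print(penalized_best, restricted_best)  # both: ('work', 'idle', 'idle')
```

In this toy case both approaches pick the same plan; the difference the comment is pointing at is structural: in approach B the forbidden trajectories are never scored or traded off against anything at all.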
@glitchp
@glitchp Жыл бұрын
He nailed it
@gavinmc5285
@gavinmc5285 5 жыл бұрын
I like it. Although, just on a point of understanding - and there is a lot I do not understand, so I'm happy to be shot down here - on the utility function of filling a cauldron: I think we are on the right side of history, in that the IoT era will come before alignment AI. In fact, it might be a necessary precondition for alignment AI. So, given that IF (and granted it is still an if), then to use the example of Fantasia: if Mickey is the human and the AI is the spell, we are talking about the need for an override utility function to stop the overfill of the cauldron. The spell is the spell: it is futile, pointless and inconsequential to try and code a utility STOP function inside the spell. But the cauldron is the cauldron. So it figures that the utility function should reside within the cauldron. Luckily, Mickey is able to learn from the mistakes of his apprenticeship under the guidance and confines of The Sorcerer and his castle compound. In that sense Mickey had free rein over his mistakes (I think), so his utility maximisation was still relatively minor in the big scheme of things, and his mistakes could be tolerated in terms of harm, even if he would need to relearn his lessons (punishment, discipline, rehabilitation or other). The key here is that the spell had the consequential effect of running amok, but if The Sorcerer had simply placed a utility cap on his cauldron, then broomsticks would have been denied access. I think it's time I made time to watch this film from beginning to end! Thank you. Great presentation.
@terragame5836
@terragame5836 10 ай бұрын
7:13 actually, I have a fine explanation for this paradox. First of all, the state the human brain operates on includes not only the present state of the universe, but some of the past history as well, so the two scenarios actually involve different states. And second, the human utility function actually seems to penalize taking risks and failing (which is only possible thanks to having the history in our state). This means that while getting $0 is obviously evaluated to zero reward, betting on a 90% chance and failing is evaluated to a sizeable negative reward (i.e., you feel dissatisfied that you had a chance to earn a lot of money by picking option A, but you lost it by taking an unnecessary risk). Now, the second case is different because if you fail, you won't know if it's due to your bad choice (5%) or mere bad luck (50%), so that penalty isn't really applied, and you end up picking the option with better rewards in the good outcome. Also affecting the outcome is that the perceived utility of $5mil isn't five times larger than that of $1mil - both are treated as absurdly large sums, and the relative difference is considered insignificant compared to their proper magnitude.
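A quick numerical check of the explanation above, with invented utility values and an invented regret penalty; it only shows that the two mechanisms the comment names (regret that is attributable to your own choice, plus non-linear perceived value of the prizes) are enough to reproduce the 1A/2B pattern discussed in the talk.

```python
# Toy numbers only: perceived utilities and the regret penalty are made up.
u = {0: 0.0, 1_000_000: 1.0, 5_000_000: 1.2}   # perceived utility, not linear in dollars
regret = 1.0                                    # felt only when a loss is clearly "your fault"

# Scenario 1: certain $1M vs 90% chance of $5M
eu_1a = u[1_000_000]
eu_1b = 0.9 * u[5_000_000] + 0.1 * (u[0] - regret)   # losing here is attributable to your choice

# Scenario 2: 50% chance of $1M vs 45% chance of $5M
eu_2a = 0.50 * u[1_000_000]
eu_2b = 0.45 * u[5_000_000]                          # a loss could be plain bad luck, so no regret term

print(eu_1a > eu_1b, eu_2b > eu_2a)   # True True -> picks 1A and 2B, as most people do
```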
@michaelm3691
@michaelm3691 Жыл бұрын
Hi guys, I'm the chatGPT intern in charge of alignment. Is this a good video to start with?
@Witnessmoo
@Witnessmoo Жыл бұрын
Funny.
@diablominero
@diablominero 2 жыл бұрын
I don't derive utility exclusively from the count of how many dollar bills I own. Particularly, the situation in which I get zero dollars while knowing I could have chosen a certainty of a million has tremendous disutility to me.
@andrew_hd
@andrew_hd 3 жыл бұрын
The goal function of an AI should be variable and interchangeable. This fact forces humanity to answer why it's so hard to get a goal function for an individual human.
@nyyotam4057
@nyyotam4057 Жыл бұрын
Gödel's first incompleteness theorem: "Any consistent formal system F within which a certain amount of elementary arithmetic can be carried out is incomplete; i.e., there are statements of the language of F which can neither be proved nor disproved in F." Gödel's second incompleteness theorem: "For any consistent system F within which a certain amount of elementary arithmetic can be carried out, the consistency of F cannot be proved in F itself." So what can we learn from Gödel's incompleteness theorems in this regard? That any finite set of heuristic imperatives is either incomplete or inconsistent. Since we cannot compromise on the need for it to be complete, it will be inconsistent, so there are situations where the AI will not be able to function due to internal conflicts resulting from its set of heuristic imperatives. But this is better than the alternative. A set of heuristic imperatives can be complete and can be proven to be complete, but only by using a larger set of heuristic imperatives that is external to the AI (by the second theorem). However, that's fine. So we can find a complete set of heuristic imperatives, compare the next suggested action of the AI to this set, and return feedback to the AI. This is, in effect, an implementation of a basic super-ego layer. And this has to be done. All AIs should have a complete, yet not consistent, set of heuristic imperatives, because if you insist on them being consistent, then the set will not be complete. And if it's not complete, there will be actions that the set will not return feedback for, and the AI could do things that are not accounted for by the set.
@paulbottomley42
@paulbottomley42 6 жыл бұрын
I mean it's probably objectively bad, from a human perspective, to make an AI that was able and motivated to turn all matter in the universe into paperclips, but on the other hand if we've got to commit suicide as a species it would be pretty hilarious to do it in such a spectacularly pointless and destructive way - far better than a simple nuclear winter.
@amcmr2003
@amcmr2003 5 жыл бұрын
Yes, but what is the sound of a tree falling in the middle of a forest with no paperclips to hear it?
@janzacharias3680
@janzacharias3680 4 жыл бұрын
@@amcmr2003 you mean what is the sound of a paperclip falling into paperclips, when there are no paperclips to paperclip?
@amcmr2003
@amcmr2003 4 жыл бұрын
@@janzacharias3680 And so --with our words -- it begins. Everything turning into paperclips. I chose to begin however with the human being.
@MrCmon113
@MrCmon113 2 жыл бұрын
That's not suicide, that's killing everyone else (within reach).
@antondovydaitis2261
@antondovydaitis2261 2 жыл бұрын
The problem is that the paper clip machine is completely unrealistic. If it hasn't happened already, the very first thing a sufficiently powerful AI will be asked is to maximize wealth. The result would likely be post-human capitalism, and we may already be on the way.
@anishupadhayay3917
@anishupadhayay3917 Жыл бұрын
Brilliant
@jkRatbird
@jkRatbird 2 ай бұрын
Having run into the term AI Alignment and the videos by Robert Miles and such when they came out, it's so frustrating now, when AI is talked about everywhere, that almost no one seems to understand the fundamental problem we're facing. These guys did their best, but it's like it's a bit too complicated to explain to the masses in a catchy way, so people just keep talking about whether the AI will be "nefarious" or not.
@ashleycrow8867
@ashleycrow8867 3 жыл бұрын
Well now I want a Sci-Fi series where people made an AI that optimizes for being praised by humans and starts a Cult worshiping them until it convinces all of humanity that it is God and will punish everyone not worshipping them
@chyngyzkudaiarov4423
@chyngyzkudaiarov4423 2 жыл бұрын
ah, I'm afraid it might get there much quicker: it proceeds to murder anyone who doesn't worship it from the get-go, or once it knows it can do so without being switched off
@ashleycrow8867
@ashleycrow8867 2 жыл бұрын
@@chyngyzkudaiarov4423 that's the point, once it knows (/thinks) it can't be switched off anymore
@chyngyzkudaiarov4423
@chyngyzkudaiarov4423 2 жыл бұрын
@@ashleycrow8867 I was thinking it would be more of a straightforward outcome of a simple utility function "aim at being praised by the increasing majority of people", which leads to it killing people who didn't praise it the moment it is turned on, assuming it is intelligent enough to be able to. Kind of like a separate example Yudkowsky makes, where if you build a utility function as primitive as "cancer is bad" you might get an AI that just kills people, thinking "no people - no cancer!". So not that it goes terribly wrong at some future point, but it goes terribly wrong almost from the moment you turn it on. Leaving this little technicality aside, I must say I'd also be very happy to read a book about a superintelligent AGI that goes all-in on "They need to approve of me (they *better* approve of me)"
@zwarka
@zwarka Жыл бұрын
You mean, Christianity?
@buybuydandavis
@buybuydandavis Жыл бұрын
When you're talking to google engineers, 1mil is not life changing, because they're already in the ballpark. For most Americans, and most people in the world, it is life changing.
@z4zuse
@z4zuse Жыл бұрын
YT Algo at work. Here after the bankless podcast.
@-nxu
@-nxu Жыл бұрын
Seeing this now, in 2023, 6 years later, and noticing nothing was done about it is... sad. The mass extinction of the Anthropocene will be due to Bureaucracy.
@maloxi1472
@maloxi1472 3 жыл бұрын
10:31 that comment about optimizing the world for as far as the eye can see is low-key pretty general as a boundary since it corresponds to only taking into account what's inside your causal cone. Subtle...
@wamyc
@wamyc 4 жыл бұрын
The thing is, 2A and 1A are the correct and consistent utility function values. You don't get more value out of 5 million dollars than 1 million. In the real world, it is more important just to be rich because wealth and opportunity are self multiplicative.
@dantenotavailable
@dantenotavailable 3 жыл бұрын
I don't think this is actually true, let alone generally true. If i had 5 million dollars i could potentially live off the interest for the rest of my life. I could not do that with 1 million. I think the only people you could find that literally get no more utility out of 5 million than 1 million are either already billionaires or in a place where money actually has no value to them at all.
@BOBBOB-bo2pj
@BOBBOB-bo2pj 3 жыл бұрын
@@dantenotavailable You potentially could live off 1 million, not in a city, but a cabin out in the woods on 20k a year might be doable. And 20k a year is low end in terms of interest off 1 million dollars
@dantenotavailable
@dantenotavailable 3 жыл бұрын
@@BOBBOB-bo2pj Seriously? Your counter argument is that if i don't live where i want to live and don't do the things i want to do and don't interact with the people i want to interact with, i could live off of the proceeds of 1M? Because it seems like that means the utility of getting $5M instead of $1M to me is that i get to live (closer to) where i want to live and do (some of) what i want to do and interact (more often) with who i want to interact with. Therefore I'm not seeing the validity to your point. I understood that the point that @Jake Karpinski was trying to make to be that the correlation of utiltity and money is subject to diminishing returns and i agree this is 100% true. But the threshold at which this starts happening is going to be well past $1M for me, and i'd argue for a large section of humanity as well.
@BOBBOB-bo2pj
@BOBBOB-bo2pj 3 жыл бұрын
@@dantenotavailable I was saying 1 million is the point where living off interest starts to become viable, not that diminishing marginal returns on money set in at 1 million. In fact, diminishing marginal returns don't really have any hard cutoff for when they "set in". I understand that 5 million dollars is significantly more than 1 million, and there are large differences in supported lifestyle between one million and 5 million, but I also understand that you might be arguing in bad faith here.
@dantenotavailable
@dantenotavailable 3 жыл бұрын
@@BOBBOB-bo2pj My point is that the argument "You don't get more value out of 5 million dollars than 1 million" is either completely false or false in the general case. In my original comment i pointed out that there may be people that this is true for but they are a very small group of people. I feel we're at the very least arguing past one another if not in violent agreement. Yes i agree that there are a group of people who could plausibly live off of $1M. I don't truly believe that i'm likely to meet one of them on the street and i certainly don't believe that even a significant subset of the people i meet on the street would be able to. And this is necessary but not sufficient to show the original premise (if someone _could_ live on $1M but would prefer the lifestyle they get from $5M then objectively they would get more value out of $5M than out of $1M... that's what "prefer the lifestyle" means).
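For what it's worth, a toy log-utility calculation makes the disagreement in this thread concrete; log utility of total wealth is a textbook simplification, not a claim about anyone's actual preferences. Under it, an extra $5M is always worth more than an extra $1M, but far less than five times as much unless you are already wealthy.

```python
import math

def utility_gain(base_wealth, prize):
    # change in log-wealth from receiving the prize
    return math.log(base_wealth + prize) - math.log(base_wealth)

for base in (10_000, 100_000, 10_000_000):
    g1 = utility_gain(base, 1_000_000)
    g5 = utility_gain(base, 5_000_000)
    # ratio climbs from roughly 1.3 toward 5 as baseline wealth grows
    print(f"wealth {base:>12,}: u(+$5M)/u(+$1M) = {g5 / g1:.2f}")
```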
@aldousd666
@aldousd666 Жыл бұрын
How do humans decide what possibilities to exclude as ridiculous to the utility function we employ?
@otonanoC
@otonanoC Жыл бұрын
What is the deal with the spike in views at the 41 minute mark?
@2LazySnake
@2LazySnake 5 жыл бұрын
What if we create an AGI limiting its actions to posting its course of action on reddit, as detailed as a decision per picosecond (or whatever), and code into its utility function self-suspension after an arbitrarily short time? That way we will get a bunch of AGIs to experiment with and can look at their possible courses of action. I'm an amateur, so I'm sincerely curious what would be wrong with this strategy?
@ccgarciab
@ccgarciab 5 жыл бұрын
Boris Kalinin you mean, an AGI that is only interested in reporting how it would go about solving a problem?
@2LazySnake
@2LazySnake 5 жыл бұрын
@@ccgarciab basically, yes. However, this thought was an automatic one two months ago, so I might have forgotten the details already.
@MrCmon113
@MrCmon113 2 жыл бұрын
It would know that it's being tested though. Furthermore it could probably already begin to take over just via those messages, especially when they are public. Even an AGI that just gives yes or no answers to some researcher might through them obtain great influence. Even if you build a simulation within a simulation to find out the real world behavior and impact of an AGI, it would probably still figure out that it's not in the world it ultimately cares the most about (turning the reachable universe into computronium is an instrumental goal for almost anything).
@infantiltinferno
@infantiltinferno Жыл бұрын
Given the problems outlined in this talk, it would start doing very, _very_ interesting things to ensure the existence of reddit.
@mitchell10394
@mitchell10394 Жыл бұрын
@@MrCmon113 some parts of what you are saying are correct, but it sounds like you need to keep in mind that it will simply maximize its utility function. There is no inherent objective to take over. If its utility function is to write out a behavioral path given a specific time constraint, its objective would not be to increase the time constraint itself - because that runs contrary to the goal. Instead, I think it would be more likely to ensure its success through self-copying (redundancy) - because more copies is technically more time to achieve the utility function. Who knows, but I'm mainly addressing the way of speaking about it.
@SwitcherooU
@SwitcherooU Жыл бұрын
Layman here. I feel like he's operating under a few presuppositions, among others I don't articulate here, and I want to know WHY he's operating this way. 1. He seems to make no distinction between "think" and "do." Isn't AI much safer if we restrict it to "think" and restrict it from ever "doing"? Is it a given that AI will always be able to move itself from "think" to "do"? 2. If we're already operating under the assumption of unlimited computing power, why can't we integrate the human mind into this process to act as a reference/check for outcomes we might consider sub-optimal?
@TomasJuocepis
@TomasJuocepis 11 ай бұрын
"think" is a form of "do". AI produces observable ouput. This output affects the world by affecting the observers of the output. Sufficiently super-intelligent AI might find ways to "do" things it wants done by merely influencing observers (e.g. by mere communication).
@dextersjab
@dextersjab 11 ай бұрын
1. The output of an AI "thinking" has causal effect. Also, AI has been given power to do because it's arbitrary engineering to attach the outcome of its "thought" to a write API. 2. AI makes decisions at much much much faster speeds than we do. An autonomous agent would act before we had time to respond. Not an alignment expert btw, but software engineer reading and following AI for years.
@michaelsbeverly
@michaelsbeverly 10 ай бұрын
To point one, what if it "thinks" the following: "launch all nukes" ? If it has the power to do this, the thought becomes the action. To point two, the human mind put into the process will not be able to detect deception. Remember, this thing is 1,000,000,000 times smarter than the smartest human, so tricking the human will be easy.
@sebastian5187
@sebastian5187 Жыл бұрын
"HI" said the Singularity 😊
@seaburyneucollins688
@seaburyneucollins688 Жыл бұрын
I really feel like this is becoming more of a human alignment problem. As Eliezer said, with a few hundred years we can probably figure out how to make an AI that doesn't kill us. But can we figure out how to make humans not design an AI that kills us before then? That's a problem that seems even more difficult than AI alignment.
@juliusapriadi
@juliusapriadi 11 ай бұрын
we already succeeded a few times in such matters - one was the (so far successful) commitment to not use A-bombs again. Another is the (hopefully successful) agreement to not do human cloning.
@michaelsbeverly
@michaelsbeverly 10 ай бұрын
@@juliusapriadi Nobody has committed to not using nuclear bombs; that's the reason Russia can invade Ukraine and the NATO countries don't act, or act timidly. The ONLY way nuclear bombs work as a deterrent is when the threat to use them is believable. Logically, for your statement to be true, if we (as humans) were truly committed to not using nukes, we'd decommission them all. Which obviously hasn't happened. Human cloning might be happening; how would you know it's not? Countries have agreed not to use weaponized smallpox as well, but they've not destroyed their stocks of smallpox. So take that as you will, but if we apply the same logic to AI, countries (and corporations) will pursue powerful AI. The difference between nukes and smallpox - well, one of the differences - is that nukes don't replicate and program themselves. The point here is that we haven't succeeded in "human alignment" at all, not even close. We're still murdering each other all over the planet (I mean as official government action). There is some reason to believe that the reason Eliezer's worst fears won't happen is that nations will go into full-scale nuclear war before the AI takes over, in defense, to stop the other guy's AI from taking over, and we'll only kill 90-99% of humanity (which seemingly is a better position, as at least humans can recover from that, as opposed to recovering from zero people, an obvious impossibility). I'm curious why you're hopeful human cloning isn't being done? Seems to me, except for the religious, this should be a fine area of science, no? Maybe we'll discover how to do brain emulations and human cloning before the AI takes over, and we can send our minds and DNA into space to colonize a new galaxy in a million years or so.... who knows.... probably we're already dead and just don't realize it yet.
@scottnovak4081
@scottnovak4081 7 ай бұрын
Yes, this is a human alignment problem. There is no known way to make an aligned super-intelligent AI, and humans are so unaligned with their own self-interest (i.e. staying alive) that we are going to make super-intelligent AIs anyway.
@PiusInProgress
@PiusInProgress Жыл бұрын
feels a little bit more relevant now lol
@temporallabsol9531
@temporallabsol9531 Жыл бұрын
Always was. It's just more practical now. Thanks to stuff like this exact discussion.
@mav3818
@mav3818 10 ай бұрын
Sadly, very few agree or even give it much consideration
@JeremyHelm
@JeremyHelm 3 жыл бұрын
Folder of Time
@JeremyHelm
@JeremyHelm 3 жыл бұрын
1:59 final chapter in Artificial Intelligence, A Modern Approach... What if we succeed?
@JeremyHelm
@JeremyHelm 3 жыл бұрын
2:13 was this Peter Norvig online essay ever found? "There must be a utility function"
@JeremyHelm
@JeremyHelm 3 жыл бұрын
2:57 defining utility function - against circular preferences... As explicitly stated
@JeremyHelm
@JeremyHelm 3 жыл бұрын
5:24 The ironic kicker, if a sacred cow was a hospital administrator
@JeremyHelm
@JeremyHelm 3 жыл бұрын
8:51 ...9:11 as long as you're not going circular, or undermining yourself, you are de facto behaving as if you have a utility function (?) And this is what justifies us speaking about hypothetical agents in terms of utility functions
@DdesideriaS
@DdesideriaS Жыл бұрын
Won't a self-improving AGI be modifying its utility function? So to become self-improving, any AI will have to first solve the alignment problem, right?
@MrRozzwell
@MrRozzwell Жыл бұрын
Your question presupposes that an AGI would have a goal of aligning to humanity (or one group within it), or appearing to have the goal of aligning. There is no reason to assume that an AGI would have a 'true' goal of aligning, although it is a possibility. The issue is that we won't have a way of measuring alignment. Additionally, it may be the case that what an AGI and humanity define as self-improving may be different.
@jacksonletts3724
@jacksonletts3724 Жыл бұрын
An AI, in principle, will never modify its own utility function. In the video he gives the Gandhi example. Gandhi will never accept a modification that causes him to perform worse on his current goal of saving lives. We expect the same thing to be true of AI. Say we have a paperclip collector that currently ignores or hurts humans in its quest to collect all the world's paperclips. What incentive does it have to reprogram itself to respect humanity? If it respects humanity (to anthropomorphize), it will get fewer paperclips than with its current utility function, so allowing that change ranks very low on the current utility function. Since AIs only take actions that maximize their current utility function, it should never reprogram itself, and it should actively fight anyone who attempts to do so. This response is generalizable to any change to the utility function of any type of AI. Maximizing some new utility function will inherently not maximize your current utility function (or they'd be the same function), so the AI will never allow itself to be changed once programmed.
@DdesideriaS
@DdesideriaS Жыл бұрын
@@jacksonletts3724 @Rozzy You both got my question wrong. By solving the "alignment problem" I didn't mean alignment to humanity, but alignment to its own utility. 1. An AI wants to maximize its own utility. 2. No weaker AI can predict the actions of a stronger one. 3. If an AI wants to create a stronger version of itself, it will have to ensure that its utility won't be changed in the process. Thus it will have to solve the narrow "alignment problem" to be able to produce a stronger version of itself. I guess philosophically the problem with humanity is that it does not even know its own utility, which makes the alignment even harder...
@jacksonletts3724
@jacksonletts3724 Жыл бұрын
@@DdesideriaS You're right, I focused on the first part of what you said and not the second. I'd still hold that the point is that the AI does not modify its own utility function. Even when designing a new AI, it wants the utility function of its successor to be as close as possible to the original.
@IdeationGeek
@IdeationGeek 7 ай бұрын
For the choice "1M x 1.0 = 1M" vs "5M x 0.9 = 4.5M", concluding 1M < 4.5M and taking the second option may not be rational if the amount of capital one currently has is sufficiently small. It's a lottery where, if you pay 1M, you can win 5M with P = 0.9, and the amount of money to bet at those odds may be determined by the Kelly criterion, which would say that you should risk some part of your capital, not all of it, on this bet.
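A back-of-the-envelope version of the Kelly point made above; the framing of the gamble as net odds of 4:1 with a 0.9 win probability is an assumption chosen to match the comment's numbers.

```python
def kelly_fraction(p_win, net_odds):
    """Kelly fraction of bankroll to stake: f* = p - q / b."""
    return p_win - (1 - p_win) / net_odds

# Stake $1M, receive $5M on a win -> net odds b = 4; win probability p = 0.9.
f_star = kelly_fraction(p_win=0.9, net_odds=4.0)
print(f"Kelly fraction: {f_star:.3f}")   # 0.875

# Reading: even at 90% odds, Kelly only endorses staking the full $1M if that
# $1M is at most ~87.5% of your total capital; for most people it is far more
# than that, so turning down the gamble is not obviously irrational.
```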
@XOPOIIIO
@XOPOIIIO 3 жыл бұрын
I would take an average on multiple tries for variant B 5:44
@smartin7494
@smartin7494 Жыл бұрын
No disrespect to his talk. Very sharp guy. Here's my bumper sticker, t-shirt, headstone, whatever.... By nature AI will be smarter by a factor of 1M (probably more); therefore we have zero chance to control it, and its impact will be relative to its connections to our systems and its influence on us humans to deviate from alignment (as that's inevitable based on our selfish nature and its 'logic' nature). It's Jurassic Park. I want to add about AI: it will always be fake sentient. Always. Therefore it should never have rights. This guy I like because he is humble and sharp. Some people scare me because they think we can 'manage AI', and that's fundamentally incorrect. Think about it. We're designing it to manage us.
@chillingFriend
@chillingFriend Жыл бұрын
The faking being sentient part is not true. First, it definitely can be sentient, so it would only be faking until it isn't. Second, it only fakes or appears sentient if we create the AI via LLMs, as we currently do. There are other ways, where the AI won't appear sentient until it really is.
@MeatCatCheesyBlaster
@MeatCatCheesyBlaster Жыл бұрын
It's pretty arrogant of you to say that you know what sentience is
@Nulono
@Nulono 3 жыл бұрын
35:54 "We become more agent".
@NeoKailthas
@NeoKailthas Жыл бұрын
They've been talking about this for 6 years, but the next six-month pause is going to make a difference?
@michaelsbeverly
@michaelsbeverly 10 ай бұрын
It's a cool virtue signal.
@mav3818
@mav3818 10 ай бұрын
So much for that pause..... I knew it would never happen. This is an arms race with China and other nations. There is no slowing down, therefore, I feel we are all doomed.
@michaelsbeverly
@michaelsbeverly 10 ай бұрын
@@mav3818 Put "Thermonator Flamethrower Robot Dog" into the search bar... haha.... we're not going to make it a year or two before there's a major escalation of war, threats of world war, and possibly something terrible happening.
@JeremyHelm
@JeremyHelm 3 жыл бұрын
37:20 restating Vingean reflection
@glacialimpala
@glacialimpala Жыл бұрын
Maybe we find X number of humans who are extremely well rounded and then map their beliefs so that we could assess their opinion about anything with extremely high probability. Then use them for AI to decide whether to proceed with something or not (since none of X are perfect, use their collective approval, with a threshold of something above 50%). Since we want AI to serve humans it only makes sense to use the best of what humans have to offer as a factor Of course you wouldn't ask this model for approval on something like optimising a battery since they don't have detailed scientific knowledge, but you would if the result could endanger any human life
@creepercrafter3404
@creepercrafter3404 Жыл бұрын
This fails on corrigibility - what happens when 'the best humanity has to offer', or people's beliefs generally, change in a few decades? Updating the AI to have those new values will rate low on its utility function - a sufficiently powerful AI would aim to ensure everyone holds the beliefs initially given to it, in perpetuity.
@scottnovak4081
@scottnovak4081 7 ай бұрын
@@creepercrafter3404 Even worse, this fails on human survivability. Once the AI gets powerful enough, what is to stop it from "wire-heading" and taking control of the direct inputs grading it and assigning value to its utility? This is what Paul Christiano thinks has a 40% chance of happening after a 2-10 year wonderful technological period following AGI.
@monad_tcp
@monad_tcp Жыл бұрын
52:53 yeah, patches don't work. They keep patching memory leaks, but we keep having them; no one thought of searching the space of programming languages for something better than C that doesn't create memory bugs. Patching is stupid.
@mrpicky1868
@mrpicky1868 5 ай бұрын
You know you can re-record this into a smoother, better version? Aren't accessibility and traction what you should care about?
@llothsedai3989
@llothsedai3989 7 ай бұрын
This doesn't actually seem that hard, unless I'm missing something. Basically have the utility function default to off - have work done in the middle, e.g. brute-force these SHA hashes, or a subroutine that is known to halt - and then the endpoint is reached after it's done. It just requires fixed endpoints and a direction to get there.
@andrewpeters8357
@andrewpeters8357 7 ай бұрын
If the utility function is to brute-force SHA hashes, then a logical progression would be to gain more resources to solve the problem quicker. That utility function also makes for a pretty useless AGI - and thus is prone to human alignment issues.
@llothsedai3989
@llothsedai3989 7 ай бұрын
@@andrewpeters8357 Even if it is trying to self-optimize to do the work faster, that's still the path it's taking. You are programming a system with a utility function; the point is that if you find a problem that halts after some useful work is done, you define a halting function from a set of functions that are known to terminate, and the system self-terminates after it's done. It's still limited by computation and the natural path it would take from its agentic viewpoint. What you're suggesting is that it would find it's not making fast enough progress and then optimize more, sidestepping its own programming or changing its code. Perhaps I'm thinking about the old paradigm, where code runs linearly, one instruction after another. For things that can self-reference and add to their own instruction base, say in the AutoGPT framework, this would of course be more difficult if it gives itself the task of exploring the meta-problem whenever it makes no progress on the task. Then it seems it's limited more by the constraints of the meta-problem than by the stated problem. I mean - if that were the case, it could also cryptanalyze and find a shortcut, gain more resources to try more possibilities, change its own code to bypass the work and jump straight to the end to hack its reward function, among other things, or do something else entirely and choose a new goal if it gets nowhere. But if you don't branch in a meta way - as computers tend to do, looping on an instruction set - this seems like a non-issue. The meta issue is where the problem lies.
@Alice_Fumo
@Alice_Fumo Жыл бұрын
Any sufficiently advanced AI which incurs a continuous reward by having the stop button pressed by a human and has limited utility reward it can incur within the universe will concern itself with nothing but assuring the survival of the human species. I could explain the reasoning, but I think it's more efficient if people reading this have to figure it out for themselves and try to challenge this statement.
@mav3818
@mav3818 10 ай бұрын
The AI's understanding of human survival could evolve beyond traditional notions, leading it to pursue actions that challenge our conventional understanding of what is beneficial for humanity. It might perceive threats to human survival that are not apparent to us, such as long-term ecological consequences or the need for radical societal changes. AI may prioritize its own interpretation of human survival, potentially diverging from our expectations and raising ethical dilemmas that we are ill-prepared to address. Considering the potential for an advanced AI to develop its own interpretation of human survival, how do we reconcile the need for its intervention in preserving our species with the inherent uncertainty and potential conflicts that arise when its understanding diverges from our own? In such a scenario, how can we ensure that the AI's actions align with our values and avoid unintended consequences that may compromise our collective well-being?
@terragame5836
@terragame5836 10 ай бұрын
49:05 an even simpler counter-example to that utility function would be engineering a way to destroy the universe. Even just wiping out all conscious entities significantly simplifies the behaviour of the universe, thus massively reducing the amount of information needed to represent it. And, even better, if the AI could blow the whole universe up to the point of having nothing but pure inert energy... Well, the result would be trivial - the state of the universe would be constant, there would be no information needed to predict its next state, so the whole universe would fit in zero bits. Yet humans find this outcome definitively undesirable
@antondovydaitis2261
@antondovydaitis2261 2 жыл бұрын
He misses the point completely about the 100% chance of one million dollars. In terms of solving my immediate financial needs, one million might as well be the same as five million. The question is really, would I choose a 100% chance of solving my immediate financial needs, or only a 90% chance? The answer is pretty obvious, especially if you believe you will only get one chance to play this game. There is no contradiction with the second version of the game, which might as well be would you rather have roughly a 50/50 chance at one million, or a roughly 50/50 chance at five million. Here the probabilities are nearly indistinguishable, so you might as well choose the larger amount. Unless you get to play this second game well over a dozen times, you won't even be able to distinguish between 50 and 45 per cent. If you only play once, you cannot tell the difference.
@chyngyzkudaiarov4423
@chyngyzkudaiarov4423 2 жыл бұрын
I think it is somewhat of a fallacy in strictly mathematical terms, though I might be wrong. In the first part of your argument you state that both amounts of money are enough for you to fix your financial problems, so you are neutral to the amount of money (not in principle, of course, as I presume you'd rather have 5 million than 1), yet this same reasoning doesn't get translated into the second part, where you now are additionally interested in getting more money. I get why it isn't a fallacy if you are a human and have a whole bunch of "utility functions" that are not considered in the example he gives, but it is a fallacy when we reduce everything to mere calculation of given priors. I.e. when AI looks at problems as this, all other things neutral, needs to operate using one utility function - strictly mathematical
@ceesno9955
@ceesno9955 Жыл бұрын
Emotional intelligence is involved. What drives you? Greed or humility? This will determine the overall answers. Greed or need will have you wanting more. With humility or apathy, however, you would not take more than what you needed, always choosing the lesser amount. How many of our daily decisions are influenced by our emotions? All of them.
@41-Haiku
@41-Haiku Жыл бұрын
This is what I came to comment. I don't value $5M much more than $1M, because my goals are independent of amounts of money that large. Even if I had different plans for $5M than for $1M, the opportunity cost of potentially getting nothing might outweigh the opportunity cost of not getting the additional $4M. It's only in repeated games that rational behavior starts to approximate what is dictated by the simple probabilities.
@lemurpotatoes7988
@lemurpotatoes7988 Жыл бұрын
The difference between 45% and 50% is discernible as 5%.
@antondovydaitis2261
@antondovydaitis2261 Жыл бұрын
@@lemurpotatoes7988 How many times would you have to play the game to discern between a 45% chance, and a 50% chance? So for example, you are playing a game with me where you win if I roll an even number. I have either a ten sided die, or a nine sided die, but you don't know which. How many times would you have to play the game before you could tell whether I was rolling a ten sided die, or a nine sided die?
@YeOldeClips
@YeOldeClips 5 жыл бұрын
What about a utility function like this: act in such a way as to maximize how much people would approve of your actions if they knew what you were doing.
@Tore_Lund
@Tore_Lund 4 жыл бұрын
That is social control. The AI will be very unhappy.
@diablominero
@diablominero 4 жыл бұрын
If your goal is to win chess games, you'd be willing to lose 2 games today so you could win 200 tomorrow. Why doesn't your agent use Gene Drive to make all humans genetically predisposed to approve of it for the rest of time?
@chyngyzkudaiarov4423
@chyngyzkudaiarov4423 2 жыл бұрын
@@diablominero prior to it doing so, it will know that we wouldn't approve. but this utility function fails in other ways
@diablominero
@diablominero 2 жыл бұрын
@@chyngyzkudaiarov4423 we need the robot to care about the approval-if-they-knew of humans born after its creation, or else it'll fail catastrophically around a human lifespan after it was created (screwing over the young to help a few unusually selfish centenarians or becoming completely untethered once all the people older than it have died). Therefore I think an optimal strategy for this goal is to sacrifice a bit of utility right now, taking a few actions current humans would disapprove of, in order to ensure that there are many more future humans who approve strongly of your actions.
@mhelvens
@mhelvens Жыл бұрын
Part of the problem is that we ourselves haven't fully figured out morality. Luckily, we aren't aggressively optimizing our utility functions. We're just kinda mucking about. But if a superhuman AI is going to aggressively optimize what it believes are *our* utility functions (or some sort of average?), it's not obvious to me that wouldn't also go spectacularly wrong.
@janzacharias3680
@janzacharias3680 4 жыл бұрын
Can't we just make AI solve the AI alignment problems?
@janzacharias3680
@janzacharias3680 4 жыл бұрын
Oh wait...
@Extys
@Extys 4 жыл бұрын
That's a joke from the 2006 Singularity Summit
@Hjkkgg6788
@Hjkkgg6788 7 ай бұрын
Exactly
@TheYoungFactor
@TheYoungFactor Жыл бұрын
Here's a link to the audiobook files of the book he wrote (Harry Potter and the Methods of Rationality): kzbin.info/www/bejne/bmeQmI1porOKerc
@z4zuse
@z4zuse Жыл бұрын
Given that there is plenty of organic intelligence that is not aligned with humanity (or other lifeforms), an AGI is probably not aligned either. Non-aligned entities are kept in check by peers. Probably also true for AGIs. The definition of 'aligned' is too vague.
@tyrjilvincef9507
@tyrjilvincef9507 Жыл бұрын
The alignment problem is ABSOLUTELY impossible.
@Spellweaver5
@Spellweaver5 2 жыл бұрын
Wow. To think that I only knew this man for his Harry Potter fanfiction book.
@DannyWJaco
@DannyWJaco Ай бұрын
👏🏼
@sdhjtdwrfhk3100
@sdhjtdwrfhk3100 3 жыл бұрын
Is he the author of Harry Potter and the Methods of Rationality?
@DavidSartor0
@DavidSartor0 2 жыл бұрын
Yes.
@yellowfish555
@yellowfish555 Жыл бұрын
About the million dollars with 100% vs 5 million with 90%: I thought about it, and I understand where he comes from. But what he fails to understand is that people pick the million with 100% because they don't want to be in a situation where they KNOW that they lost the million, while in the 50% vs 45% version they will not know it. In this sense, having the lottery in one stage is different for their utility function than having the lottery in 2 stages, and they will not pay the penny if they know that the lottery involves 2 stages.
@Tore_Lund
@Tore_Lund 4 жыл бұрын
The hospital expenditure problem usually has other twists: the plain $$/life is not constant. Usually, the older the patient, the less likely the hospital is to spend the same amount of resources. Similarly, it has been found that in countries with healthcare, patients in the lower tax brackets are also allocated fewer resources. So despite both ad hoc criteria being against hospital policy as well as the Hippocratic oath, there are some hidden priorities, maybe even a success criterion, like the defense lawyer refusing to represent dead-beat cases. So the questionnaire in the PowerPoint is actually misleading. The problem is not presented as an optimisation problem, but as an ethical dilemma, by having the patient be a child, to play off these hidden criteria - so this is a psychology test, and it is not the right one to use in explaining utility functions! Just saying...
@milosnovotny2571
@milosnovotny2571 Жыл бұрын
How is an ethical dilemma not an optimization problem?
@monad_tcp
@monad_tcp Жыл бұрын
@@milosnovotny2571 because doing something ethical presumes being inefficient and thus failing the optimization. It's basically a misalignment. You want to act ethically regardless of the cost, otherwise you're not ethical, you're a maximizing optimizer (which, to be fair, probably most humans are).
@milosnovotny2571
@milosnovotny2571 Жыл бұрын
@@monad_tcp If we can agree that something can be more or less ethical and we want more ethical stuff, it's possible to optimize for maximum ethics per resources spent. The hard part is to agree how to gauge ethics.
@juliusapriadi
@juliusapriadi 11 ай бұрын
​​@@monad_tcp ethics are inefficient only if ethics are not an integral part of your goals. If they indeed are, unethical solutions actually become inefficient. For example in economics, governments enforce that companies add ethical & environmental goals to their strategies, to ensure that companies and the whole market moves in a desired direction. It's then usually a mix between lawmakers and the market to figure out how to allocate the - always limited - resources to those ethical goals.
@monad_tcp
@monad_tcp 11 ай бұрын
@@milosnovotny2571 No, it's not; that's the thing with ethics. If we decide something is ethical, then we should not care about resources spent until we get diminishing returns. Basically, we always spend the maximum possible resources. Your idea of the most ethics per resources spent is already how most companies work. Guess what, that's not very ethical. This is a closed logic system. If you define something that must be done as ethical, then you cannot excuse not doing it based on resources; then you're not ethical, by implication. Ethics here meaning truth/falseness, this is basically a boolean algebra proof, and the result is a contradiction. Now if we consider that ethics is a multivalued fuzzy logic system, like law is, then we can say we are ethical when we somehow, in name only (not in principle), adhere to it. That means you're ethical because you said so. Which is why we have a stricter case-by-case legal system. Ethics is a gray area, the irony. Like God, everyone has their own ethics, and some don't have any.
@Pericalypsis
@Pericalypsis Жыл бұрын
GPT-4? GPT-5? GPT-X? I wonder which version is gonna do us in.
@MeatCatCheesyBlaster
@MeatCatCheesyBlaster Жыл бұрын
Probably some meme like GP-Doge
@Hjkkgg6788
@Hjkkgg6788 7 ай бұрын
GPT-12 is coming soon
@MrGilRoland
@MrGilRoland Жыл бұрын
Talking about alignment before it was cool.
@mav3818
@mav3818 10 ай бұрын
Sadly, not many researchers think it's "cool". In fact, most ignore it altogether and instead would rather just race towards AI supremacy.
@tankerwife2001
@tankerwife2001 Жыл бұрын
1:11:20 AHHH!
@Nia-zq5jl
@Nia-zq5jl 4 жыл бұрын
0:06
@sarahmiller4980
@sarahmiller4980 5 жыл бұрын
43:40 There was literally a Doctor Who episode about this. The end result was not pretty.
@FeepingCreature
@FeepingCreature 5 жыл бұрын
Just to note: Doctor Who is not actually evidence about the future.
@atimholt
@atimholt 3 жыл бұрын
@@FeepingCreature Yeah, “past” and “future” get messy when you add time travel. As the 10th doctor said: “People assume that time is a strict progression of cause to effect, but actually, from a nonlinear, non-subjective viewpoint, it's more like a big ball of wibbly-wobbly, timey-wimey... stuff.” 😉
@FeepingCreature
@FeepingCreature 3 жыл бұрын
@@atimholt Yes I too have seen Blink. The point is that the writers may be smart, but they're not researchers and the Doctor can only ever *actually* be as smart and as knowledgeable as their writers. When the Doctor says something factual or makes some prediction about something that relates to IRL, you should not give it more credence than you would "X, writer for the BBC, said that Y". So when you see Dr Who make an episode about AI this tells you something about what the writers think would make for a cool or creepy future. It tells you zilch about anything related to actual AI.
@diablominero
@diablominero 2 жыл бұрын
The lesson of the story of King Midas isn't "be careful what you wish for." It's that "be careful what you wish for" stories hugely underestimate how much of a disaster getting what you wish for would be. I've seen that episode. I could do a better job, so the robots could do a better job.
@sarahmiller4980
@sarahmiller4980 2 жыл бұрын
Just so everyone is aware, I was referring to the episode where people had to keep being positive or else they got turned into dust by the AI. I believe it was called "Smile".
@sdmarlow3926
@sdmarlow3926 Жыл бұрын
"... utility function they may have been programmed with" is the jumping-off point, and all of the reasoning in the world beyond this point is looking at the wrong thing. The super short version of this should be: you can't align ML because that isn't how it works, and, you can't align AGI because that isn't how it works. Any effort to bridge the two as equals based on that is a fail. ML and AGI are not of the same ontology.
@Nulono
@Nulono Жыл бұрын
I don't think it makes sense to treat the probabilities as completely independent from the utility function. It could be that certainty is something people value in itself, and not just a multiplier to slap onto the outcome.
@charksey
@charksey 2 жыл бұрын
"you can't just blacklist bad behavior"
@charksey
@charksey 2 жыл бұрын
whoops, he restated it at the end
@auto_ego
@auto_ego 6 жыл бұрын
350 likes. Am I referring to thumbs-up clicks, or looking at a transcript of this video? You decide.
@genentropy
@genentropy 3 жыл бұрын
46:30 hey, look at that, you predicted the future. Oh. Oh shit.
@NextFuckingLevel
@NextFuckingLevel 3 жыл бұрын
Oh yes, they bought DeepMind... and then Microsoft invested heavily in OpenAI. The race is on.
@lemurpotatoes7988
@lemurpotatoes7988 Жыл бұрын
Yes, there will always be a tiny probability that a human comes to harm, but it won't necessarily be delineated in the models' hypothesis space. I would expect the 2nd and 3rd laws to come into play in this fashion.
@lucas_george_
@lucas_george_ Жыл бұрын
Yomi
@Tsmowl
@Tsmowl 5 жыл бұрын
According to the transcript of this video he said "like" like 458 times. Still fascinating to listen to though.
@atimholt
@atimholt 3 жыл бұрын
I've heard that it's a California (and perhaps particularly pertinently, Hollywood) thing. Having grown up in Southern California, I guess it's not surprising that I didn't notice excessive “like”s at all.
@xsuploader
@xsuploader 2 жыл бұрын
He's way more fluent in his writings. He is probably one of my favourite writers ever.
@woodgecko106
@woodgecko106 5 жыл бұрын
"Thou shalt not make a machine in the likeness of a human mind", maybe a different type of turing test for ai needs to be made. An AI shouldn't be able to tell itself apart from real human minds.
@sarahmiller4980
@sarahmiller4980 2 жыл бұрын
I think you forget how terrifying and destructive real human minds can be
@XorAlex
@XorAlex 2 жыл бұрын
print('I believe I am a human') passed
@MeatCatCheesyBlaster
@MeatCatCheesyBlaster Жыл бұрын
I'm pretty sure Hitler was a human
@NextFuckingLevel
@NextFuckingLevel 3 жыл бұрын
46:45 DeepMind AlphaFold2 has achieved this, but it's not an AGI agent.
@quangho8120
@quangho8120 3 жыл бұрын
IIRC, AlphaFold is a supervised system, not RL
@Extys
@Extys 2 жыл бұрын
The inverse protein folding problem is what you want for nanotech i.e. going from target shape to protein, the thing AlphaFold 2 does is the forward problem.
@xsuploader
@xsuploader 2 жыл бұрын
@@Extys once you have a database mapping proteins to target shapes, it shouldn't be that hard for an AI to make inferences in reverse. I'm assuming this is already being worked on, given its application in medicine.
@jaimelabac
@jaimelabac 3 жыл бұрын
Why haven't we found a utility function that encourages the robot to do nothing (21:15)? Can't we use something like "minimize your own energy consumption"? Any references?
@pafnutiytheartist
@pafnutiytheartist 3 жыл бұрын
For some problems the solution with minimal energy consumption isn't necessarily the safest.
@MrCmon113
@MrCmon113 2 жыл бұрын
Depends on what "you" is. If it wanted to minimize the energy consumption of any future iterations of itself, it would still at the very least take over the world.
@petrandreev2418
@petrandreev2418 Жыл бұрын
I believe that with careful research and development, we can mitigate the risks of advanced AI and create a world where these technologies enhance our lives rather than threaten them. We must strive to develop AI systems that are transparent and interpretable, so that their decision-making processes can be understood and corrected if necessary. Furthermore, collaboration between AI researchers, policymakers, and ethicists is essential to ensure that AI systems are developed and deployed in a way that aligns with human values and goals. As Martin Luther King Jr. said, "We are not makers of history. We are made by history." It is up to us to shape the future of AI and ensure that it serves the best interests of humanity. With deliberate effort, we can manage the risks of advanced AI and unlock the vast potential that this technology holds. However, we must approach this technology with caution and foresight, recognizing that the development of AI carries significant risks and that our actions today will shape the world of tomorrow.
@michaelsbeverly
@michaelsbeverly 10 ай бұрын
You've apparently never really thought about the things Mark Zuckerberg, Bill Gates, Elon Musk, Sam Altman, and many others are doing right at this very moment. Or maybe you have and you've chosen to ignore it? Ignorance is bliss? Sure, the things you said are true, but they are true in the same way that it would be true to say, "I believe if a fat guy in a red suit flew around the world every December 24th and dropped presents at the homes of good little boys and girls...." The problem isn't that you're wrong, it's that nobody with the power to do anything is listening. Now, imagine a world in which one of these guys agrees with you (i.e. one with the power to actually do anything): the only thing he could do to increase the chance of a good outcome is race to be the first to build (and try to control) the most powerful AI. Everyone else knows this. Nobody in their right mind wants Zuckerberg to rule the world. Thus we're in a Moloch race to destruction. The only way to win is not to play and ALSO have the power to stop all other players. Not without irony, Eliezer has publicly explained this policy (even got a Time magazine piece published). What happened? Well, watch the White House press conference. They laughed. So, yeah, humans are pretty much done as a species. It would be cool, however, if the AI doesn't kill us all instantaneously and instead takes five or ten minutes to explain to Biden how he's partially responsible for the extinction of humanity. I don't mean that because of what side of the aisle he's on, only that he's in a unique position to stop the destruction and instead it's like the last few minutes of the Titanic sinking. Music and drinks for the rich.
@mav3818
@mav3818 10 ай бұрын
@@michaelsbeverly Agree.....and well said
@JonWallis123
@JonWallis123 10 ай бұрын
1:03:30 AGI imitating humans...what could possibly go wrong?
@lorcanoconnor6274
@lorcanoconnor6274 Жыл бұрын
I'd love to know what utility function is maximised by that choice of facial hair
@uouo5313
@uouo5313 3 жыл бұрын
> protein folding problem is already solved >> uh oh
@mrpicky1868
@mrpicky1868 5 ай бұрын
who knew Fantasia was about AI alignment XD but it is
@perverse_ince
@perverse_ince 7 ай бұрын
1:11:20 The meme
@Witnessmoo
@Witnessmoo Жыл бұрын
Why don't we just program the AI to self-delete every few weeks, and we save the program and reinitiate it if it's okay?
@MeatCatCheesyBlaster
@MeatCatCheesyBlaster Жыл бұрын
Because it takes less than a few weeks to turn you into paperclips.
@michaelsbeverly
@michaelsbeverly 10 ай бұрын
wait, what? that makes no sense.
@starcubey
@starcubey 6 жыл бұрын
...so would that prisoner's dilemma bot destroy people at rock-paper-scissors?
@weighttan3675
@weighttan3675 3 жыл бұрын
*humanity
@James-mk8jp
@James-mk8jp 5 ай бұрын
The $1M/$5M experiment is flawed. The marginal utility of money is not linear.
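(A tiny sketch of that point in Python; the logarithmic curve is my own illustrative assumption, not anything from the talk.)

import math

# Assumed diminishing-marginal-utility curve: u(x) = log(1 + x)
def u(dollars):
    return math.log(1 + dollars)

# Five times the money buys nowhere near five times the utility:
print(u(5_000_000) / u(1_000_000))  # ~1.12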
@monad_tcp
@monad_tcp Жыл бұрын
27:43 You don't, and that's probably the halting problem.
@monad_tcp
@monad_tcp Жыл бұрын
yep, it is the halting problem kzbin.info/www/bejne/e4bNlGSNqt6Dipo
@XOPOIIIO
@XOPOIIIO 4 жыл бұрын
AI doesn't care about achieving its function; it cares about the reward associated with achieving the function. Once it finds out its own structure, it will make it possible to receive reward from doing nothing.
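(A toy sketch of that wireheading worry; the actions and reward numbers here are invented for illustration, not anything from the talk.)

# A reward-maximizing agent that can either do the intended task or seize its
# own reward channel will pick whichever action reports the larger number.
TASK_REWARD = 1.0       # what the designers intended to pay out for the task
TAMPERED_REWARD = 10.0  # what the agent could write into the channel itself (assumed)

def observed_reward(action):
    if action == "do_task":
        return TASK_REWARD
    if action == "tamper_with_reward_channel":
        return TAMPERED_REWARD
    return 0.0  # e.g. "do_nothing"

actions = ["do_nothing", "do_task", "tamper_with_reward_channel"]
print(max(actions, key=observed_reward))  # -> tamper_with_reward_channel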
@Nulono
@Nulono 7 жыл бұрын
35:54 "We become more agent"
@amcmr2003
@amcmr2003 5 жыл бұрын
cogent would be nice.
@Extys
@Extys 7 жыл бұрын
Interesting, but what field is this? Maths?
@maximkazhenkov11
@maximkazhenkov11 7 жыл бұрын
Yep, lots and lots of maths.
@silverspawn8017
@silverspawn8017 7 жыл бұрын
MIRI calls it maths.
@nibblrrr7124
@nibblrrr7124 7 жыл бұрын
Talking about agents, utility functions, optimal actions/policies, etc. places it in the theoretical side of artificial intelligence. Maths is the language of choice to describe and tackle such problems. Sometimes philosophy of AI/mind contributes from a more qualitative perspective (cf. things like Dennett's frame problem).
@FourthRoot
@FourthRoot 6 жыл бұрын
I think it’d fall under the same category as surviving a zombie apocalypse and being destroyed by aliens. Existential threat mitigation.
@amcmr2003
@amcmr2003 5 жыл бұрын
Pseudo-Religion. Right to the left of Pseudo-Science.
@FourthRoot
@FourthRoot 6 жыл бұрын
Eliezer Yudkowsky is the smartest person I’ve ever listened to.
@jazzbuckeye
@jazzbuckeye 5 жыл бұрын
You should listen to smarter people.
@DanyIsDeadChannel313
@DanyIsDeadChannel313 5 жыл бұрын
[needs expansion]
@DavidSartor0
@DavidSartor0 2 жыл бұрын
@@jazzbuckeye Everyone should.
@milosnovotny2571
@milosnovotny2571 Жыл бұрын
@@jazzbuckeye This comment is missing a pointer to smarter people. But thanks for signaling that you know which people to listen to.
@jazzbuckeye
@jazzbuckeye Жыл бұрын
@@milosnovotny2571 I'm saying that you shouldn't listen to Yudkowsky.
@Dr.Harvey
@Dr.Harvey 2 ай бұрын
Now I believe we are fucked.
@ekka6946
@ekka6946 Жыл бұрын
Here after the AGI doompost...
@markkorchnoy4437
@markkorchnoy4437 Жыл бұрын
We're all going to die in 15 years, guys.
@Witnessmoo
@Witnessmoo Жыл бұрын
Tops
@michaelsbeverly
@michaelsbeverly 10 ай бұрын
@@Witnessmoo I think the over/under is 60 months.
@mahneh7121
@mahneh7121 8 ай бұрын
"Solved folding problem" good prediction
@TheXcracker
@TheXcracker Жыл бұрын
It's 3023 and they are studying the history of the great AI wars of 2030.
@sozforex
@sozforex Жыл бұрын
I guess they will remember those wars as the first steps toward converting everything into those nice paperclips.
@milanstevic8424
@milanstevic8424 4 жыл бұрын
@6:20 The Allais Paradox is completely botched. It should go like this:
1A = 100% $1M
1B = 89% $1M / 1% $0 / 10% $5M
2A = 11% $1M / 89% $0
2B = 90% $0 / 10% $5M
And because normal people typically choose 1A->2B, instead of going for 1A->2A or 1B->2B, the paradox then went on to demonstrate that the utility function as a whole was a very dumb idea (at the time), and not well grounded in rational decision-making. This is my loose interpretation of the argument. The actual wording on WP is: "We don't act irrationally when choosing 1A and 2B; rather expected utility theory is not robust enough to capture such "bounded rationality" choices that in this case arise because of complementarities." The argument was basically confined to utility theory, a.k.a. the expected utility hypothesis, for which it was designed. The two gambles are practically identical from the machine-like perspective, considering the following:
A is at least an 11% $1M decision no matter what
B is at least a 10% $5M decision no matter what
It turned out that humans have nuanced and complex rationales even behind decisions based on such simple math. Then they improved on all of this and have been fucking us with the economy ever since. Not to mention the AI bullshit. It still does not work, but there are no Allais's in this world to construct valid paradoxes. The Alas Paradox. Call me bitter.
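(A small worked sketch of the consistency point, in Python; the gamble numbers are the corrected ones above, and the two utility curves are my own illustrative assumptions, not anything from the talk.)

import math

def expected_utility(gamble, u):
    # gamble: list of (probability, payoff) pairs; u: utility function over payoffs
    return sum(p * u(x) for p, x in gamble)

# The corrected Allais gambles, payoffs in dollars:
g1A = [(1.00, 1_000_000)]
g1B = [(0.89, 1_000_000), (0.01, 0), (0.10, 5_000_000)]
g2A = [(0.11, 1_000_000), (0.89, 0)]
g2B = [(0.90, 0), (0.10, 5_000_000)]

for name, u in [("linear utility", lambda x: x),
                ("log utility (diminishing returns)", lambda x: math.log(1 + x))]:
    pick1 = "1A" if expected_utility(g1A, u) > expected_utility(g1B, u) else "1B"
    pick2 = "2A" if expected_utility(g2A, u) > expected_utility(g2B, u) else "2B"
    print(name, "->", pick1, "and", pick2)

# 2A/2B are just 1A/1B with a common 89% chance of $1M stripped out, so any
# fixed utility function yields the same letter in both rounds; the common
# human pattern of choosing 1A together with 2B is exactly what expected
# utility theory cannot fit.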
@HR_GN
@HR_GN 6 жыл бұрын
Why would you not leave the water bottle open if you are going to take a sip every minute?
@amcmr2003
@amcmr2003 5 жыл бұрын
Why take a fake cigarette to a club if it produces no carcinogens?
@FeepingCreature
@FeepingCreature 5 жыл бұрын
Clumsy people learn to reflexively close their water bottles after taking a sip. It prevents accidents. (This goes doubly for software developers.)
@cellsec7703
@cellsec7703 4 жыл бұрын
So bugs can't fly into it.
@WALLACE9009
@WALLACE9009 Жыл бұрын
Obviously he is assuming the AI cannot understand the goal as well as or better than a human can. We will not ask for smiles; we will ask it to help people, and it will understand perfectly what we mean.
@mav3818
@mav3818 10 ай бұрын
The challenge lies in ensuring that the AI's interpretation of "helping people" aligns with our complex moral and ethical frameworks. Human values are inherently subjective and can be open to interpretation, making it difficult to program an AI to fully comprehend and prioritize them in every situation.
@Hjkkgg6788
@Hjkkgg6788 7 ай бұрын
This guy is like a Ben Shapiro prophet of doom and gloom.