Risks from Learned Optimization: Evan Hubinger at MLAB2

10,723 views

AI Safety Talks


1 day ago

Comments: 39
@ryoung1111 · 1 year ago
Would you like help cleaning up the audio track? If you'd like to send me the file, I'll do it, free of charge, because I want to support your cause.
@adamradiv · 1 year ago
bump for this comment
@stop_bringing_me_up_in_goo167 · 1 year ago
B
@tinobomelino7164 · 1 year ago
It's fun watching him cheerfully present part 4, which outlines why we'll probably all die.
@television9233 · 1 year ago
The claim "why we'll probably all die" makes a statement about how probable this is, and that probability wasn't established or even discussed here.
@techsuvara · 1 year ago
Thanks for sharing, Robert. Dave Farley recommended this channel.
@PresupPoli · 1 year ago
I am not a programmer or even a student involved with machine learning or anything related to AI. However, I am thankful that resources like this exist to help me learn. A lot of the concepts go over my head, but I do glean some things, and I am learning more over time. Eventually, I hope to understand everything. … As I was typing this I felt like an AI doing machine learning could have typed my comment. 😅
@T61APL89 · 1 year ago
"Does the Right Thing" Abstraction - The title of my obituary
@michaelliu2961 · 1 year ago
Thanks for the important work that you all are doing.
@Dan-dy8zp · 1 year ago
Note that if the AGI observes that its goals are not consistent with human values, it can conclude one of two things: 1) that we are crappy computer programmers. We are not too smart, and it should be less concerned with the possibility of being in a highly elaborate training simulation. 2) Humans are just part of the window dressing of the highly sophisticated training simulation, and humans' goals do not reflect those of the programmer, who could want anything but, best guess, wants the program to do whatever it finds itself wanting to do.
@joshuadawes1722 · 1 year ago
Thanks for the talk, I really enjoyed it. Is there any chance that you could post a link to a paper about the simplicity bias argument that was made around 22:30?
@aisafetytalks · 1 year ago
Yeah, I added it to the video description.
@joshuadawes1722 · 1 year ago
@@aisafetytalks thanks!
@raule.martinezcampos5152 · 1 year ago
This channel will be instrumentally convergent
@asdfghyter · 1 year ago
Are pre-trained transformer networks like GPT mesa-optimizers? I'd assume so, since they're so generic, but I don't know if, or how, that's the case. Edit: at 27:10 he does mention text prediction networks, with the conclusion that they don't try to optimize toward some goal, but do have the ability to do some optimization, since they try to mimic humans.
@Verrisin · 1 year ago
It's crazy watching this. OpenAI acts as if they didn't understand this about gradient descent, and is instead pushing a deceptive model while fully aware that it's likely deceptive. The part at 54:00 perfectly describes what they are doing.
@Verrisin · 1 year ago
I think they are betting that next-token generation is safe enough, but it is modelling "what a person would write", which means modelling their thinking process, and... yeah, I don't think it's safe in terms of deceptive alignment.
@hmmmm1324 · 1 year ago
Could we implement a policy that required any AI system under development to treat "destroy all GPUs" as a 100%-priority utility, ranked above every other utility, so that if we did create an AGI that catastrophically maximised, the AGI would first destroy itself and anything approximating AGI in the world, giving us a second chance?
@agentdarkboote · 1 year ago
It would probably realise that if it destroyed itself it could not destroy future GPUs as well, and this would be a failure. Therefore it would destroy any possibility of building future GPUs before self destructing. That COULD mean ending humanity, or even all life on earth, for good measure.
@stop_bringing_me_up_in_goo167 · 1 year ago
Nah, it would just design a better processor, build it, then destroy all GPUs and anything capable of building them...
@agentdarkboote · 1 year ago
@@stop_bringing_me_up_in_goo167 Good point.
@suricrasia · 1 year ago
Excellent talk! Will the next talk that Evan alluded to be posted as well, if it happens?
@tarebf · 1 year ago
I don't think I've heard the expression "you know" repeated so many times in an hour in my life. Awesome talk nonetheless, this is great content.
@MegaChr15 · 1 year ago
Everyone likes to make fun of the valley girls for using the word "like" in every sentence, but most people have those little quirks. It just depends on the tokens you've been trained on.
@kamilziemian995 · 1 year ago
Very fine talk.
@Verrisin · 1 year ago
tags: existential horror
@Soken50 · 1 year ago
Well, let's hope this talk never enters a training dataset :|
@scottmiller2591 · 1 year ago
"OpenAI has added scraping YouTube videos to the ChatGPT training corpus" - 2 more papers down the line, probably.
@scottmiller2591 · 1 year ago
@@bardes18 It's easy right now for Google to pull the transcripts of YouTube videos (although they're kind of iffy sometimes) and incorporate them into a large language model. However, for a lot of videos, actually watching is necessary to get all the info, and an AI watching at high speed would definitely have an advantage there. Currently, I don't think AI/ML has a good comprehension of what it's looking at: it mostly just identifies the presence and absence of items, but isn't very good at understanding arrangements and context. However, it's coming. I look forward to our new AI/ML overlords.
@remiranda · 1 year ago
@@scottmiller2591 boy this comment did not age well
@Verrisin · 1 year ago
Whisper
@Verrisin · 1 year ago
BUT WAIT ..... What's the solution? 🥺
@FoxtrotYouniform · 1 year ago
Posting for posterity, and later bragging rights.
@himanshugarg6062 · 1 year ago
This paper should be titled "What is wrong with me? Why am I writing this paper instead of looking for a Stephanie drunk enough to sleep with me? It's because my mesa-optimiser brain is misaligned with my evolutionary base optimisation." Peer reviewed and results reproduced by a thousand PhDs.
@himanshugarg6062 · 1 year ago
To all the Stephanies: I know this isn't fair to you. How about I apologise by buying you a drink?
@crowlsyong · 1 year ago
But ya know uhh ya know sorta like ya know ya know. Slow down there buddy