Building a GENERAL AI agent with reinforcement learning

  21,692 views

Machine Learning Street Talk

1 day ago

Dr. Minqi Jiang and Dr. Marc Rigter explain an innovative new method for making agents more general-purpose: training them to learn many worlds before their usual goal-directed training (reinforcement learning).
Their new paper is called "Reward-Free Curricula for Training Robust World Models" arxiv.org/pdf/2306.09205.pdf
/ minqijiang
/ marcrigter
Interviewer: Dr. Tim Scarfe
Please support us on Patreon. Tim is now doing MLST full-time and taking a massive financial hit; if you love MLST and want it to continue, please show your support! In return you get early access to shows, plus a private Discord and networking. / mlst
We are also looking for show sponsors, please get in touch if interested mlstreettalk at gmail.
MLST Discord: / discord
00:00:00 - Intro
00:01:05 - Model-based Setting
00:02:41 - Similar to POET Paper
00:05:27 - Minimax Regret
00:07:21 - Why Explicitly Model the World?
00:12:47 - Minimax Regret Continued
00:18:17 - Why Would It Converge
00:20:36 - Latent Dynamics Model
00:24:34 - MDPs
00:27:11 - Latent
00:29:53 - Intelligence is Specialised / Overfitting / Sim2real
00:39:39 - Openendedness
00:44:38 - Creativity
00:48:06 - Intrinsic Motivation
00:51:12 - Deception / Stanley
00:53:56 - Sutton / Rewards is Enough
01:00:43 - Are LLMs Just Model Retrievers?
01:03:14 - Do LLMs Model the World?
01:09:49 - Dreamer and Plan to Explore
01:13:14 - Synthetic Data
01:15:21 - WAKER Paper Algorithm
01:21:24 - Emergent Curriculum
01:31:16 - Even Current AI is Externalised/Mimetic
01:36:39 - Brain Drain Academia
01:40:10 - Bitter Lesson / Do We Need Computation
01:44:31 - The Need for Modelling Dynamics
01:47:48 - Need for Memetic Systems
01:50:14 - Results of the Paper and OOD Motifs
01:55:47 - Interface Between Humans and ML

Comments: 47
@Ben_D. · 1 month ago
I love the long format and high level context. Excellent.
@MartinLaskowski · 1 month ago
I really value the effort you put into production detail on the show. Makes absorbing complex things feel natural
@CharlesVanNoland · 1 month ago
This is awesome. Thanks Tim! "If we just take a bunch of images and try and directly predict images, that's quite a hard problem, to just predict straight in image space. So the most common thing to do is kind of take your previous sequence of images and try and get a compressed representation of the history of images, in the latent state, and then predict the dynamics in the latent state." "There could be a lot of spurious features, or a lot of additional information, that you could be expending lots of compute and gradient updates just to learn those patterns when they don't actually impact the ultimate transition dynamics or reward dynamics that you need to learn in order to do well in that environment."
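The idea in this quote can be sketched in a few lines: compress each image into a small latent, fold it into a running summary of history, and predict dynamics entirely in latent space rather than in pixel space. This is a toy illustration with random (untrained) weights, not the guests' actual model; all names and dimensions are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: raw images are high-dimensional, latents are small.
OBS_DIM, LATENT_DIM = 64 * 64, 32

# Hypothetical parameters standing in for a learned encoder and a
# learned latent transition model (illustrative only, never trained).
W_enc = rng.normal(scale=0.01, size=(LATENT_DIM, OBS_DIM))
W_dyn = rng.normal(scale=0.1, size=(LATENT_DIM, LATENT_DIM + 1))  # +1 for the action

def encode(obs, prev_latent):
    # Compress the new observation and fold it into the running
    # summary of the image history (a crude recurrent state update).
    return np.tanh(W_enc @ obs + prev_latent)

def predict_next_latent(latent, action):
    # Dynamics are predicted in latent space, never in pixel space.
    return np.tanh(W_dyn @ np.concatenate([latent, [action]]))

# Roll a short trajectory of fake "images" through the model.
latent = np.zeros(LATENT_DIM)
for t in range(5):
    obs = rng.normal(size=OBS_DIM)   # stand-in for an image frame
    latent = encode(obs, latent)
    latent = predict_next_latent(latent, action=1.0)

print(latent.shape)  # a 32-dim latent, vs. 4096-dim pixel space
```

The point of the sketch is the shapes: every prediction step touches a 32-dimensional vector instead of 4096 pixels, which is exactly the "spurious features" saving the quote describes.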
@ehfik · 1 month ago
great guests, good interview, interesting propositions! MLST is the best!
@NextGenart99 · 1 month ago
Seemingly straightforward, yet profoundly insightful.
@Dan-hw9iu · 1 month ago
Superb interview, Tim. This is among your best. I was amused by the researchers hoping/expecting that future progress will require more sophisticated models in lieu of simply more compute; I would probably believe this too, if my career depended on it! But I suspect that we'll discover the opposite: the Bitter Lesson was a harbinger for the Bitter End. Human-level AGI needed no conceptual revolutions or paradigm shifts, just boosting parameters -- intellectual complexity doggedly follows from system complexity. More bit? More flip? More It. And why should we have expected a more romantic story? Using a dead simple objective function, Mother Nature marinated apes in a savanna for a while and out popped rocket ships. _Total accident._ No reasoning system needed. But if we _intentionally_ drive purpose-built systems toward a mental phenomenon like intelligence, approximately along a provably optimal learning path, for millions of FLOP-years... we humans will additionally need a satisfying cognitive model to succeed? I'm slightly skeptical. The power of transformers was largely due to the vast extra compute (massive training parallelism) that they unlocked. And what were the biggest advancements since their inception? Flash attention? That's approximating more intensive compute. RAG? Cached compute. Quantization? Trading accuracy for compute. Et cetera. If the past predicts the future, then we should expect progress via incremental improvements in compute (training more efficiently, on more data, with better hardware, for longer). We're essentially getting incredible mileage out of an algorithm from the '60s. Things like JEPA are wonderful contributions to that lineage. But if anyone's expecting some fundamentally new approach to reach human-level AGI, then I have a bitter pill for them to swallow...
@conorosirideain5512 · 1 month ago
It's wonderful that model based RL has become more popular recently
@diga4696 · 1 month ago
Amazing guests!!! Thank you so much. Human modalities, when symbolically reduced and quantized into language and subsequently distilled through a layered attention mechanism, represent a sophisticated attempt to model complexity. This process is not about harboring regret but rather acknowledges that regret is merely one aspect of the broader concept of free energy orthogonality. Such endeavors underscore our drive to understand reality, challenging the notion that we might be living in a simulation by demonstrating the depth and nuance of human perception and cognition.
@flyLeonardofly · 1 month ago
Great episode! Thank you!
@Niamato_inc · 1 month ago
Thank you wholeheartedly.
@BilichaGhebremuse · 1 month ago
Great interview
@sai4007 · 1 month ago
One important thing which world models bring in over a simple forward dynamics model is learning to infer latent Markovian belief-state representations from observations through probabilistic filtering. This distinguishes latent-state world models from standard MBRL! Partial observability is handled systematically by models like Dreamer, which use a recurrent variational inference objective along with a Markovian assumption on latent states to learn variational encoders that infer latent Markovian belief states.
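The probabilistic filtering this comment describes can be shown in its simplest discrete form: a toy Bayes filter that maintains a belief over hidden states and updates it with each observation. This is only the textbook predict/correct recursion, not Dreamer's recurrent variational model, and the matrices below are made-up illustrative numbers.

```python
import numpy as np

# Toy POMDP: 3 hidden states, 2 observation symbols (hypothetical numbers).
T = np.array([[0.8, 0.2, 0.0],    # T[i, j] = P(next state j | state i)
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
O = np.array([[0.9, 0.1],         # O[i, k] = P(observation k | state i)
              [0.5, 0.5],
              [0.1, 0.9]])

def belief_update(belief, obs):
    """One filtering step: predict through the dynamics, then correct."""
    predicted = T.T @ belief            # push belief forward in time
    posterior = O[:, obs] * predicted   # reweight by observation likelihood
    return posterior / posterior.sum()  # renormalize to a distribution

belief = np.full(3, 1 / 3)              # start maximally uncertain
for obs in [1, 1, 1]:                   # repeatedly observe symbol 1
    belief = belief_update(belief, obs)

# Probability mass concentrates on state 2, which explains symbol 1 best.
```

A latent-state world model learns a continuous, neural analogue of this recursion; the belief state is what makes the latent Markovian even when individual observations are not.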
@lancemarchetti8673 · 1 month ago
Wow! This was awesome
@XOPOIIIO · 1 month ago
I may have missed it: why exactly would it explore the world? What is the reward function?
@olegt3978 · 1 month ago
Amazing. We are on the highway to AGI in 2027-2030
@johnkintree763 · 1 month ago
There is a concept of a Wikibase Ecosystem that could become a shared world model on which effective agent actions could be planned.
@willbrenton8482 · 1 month ago
Can someone link their work with JEPAs?
@maddonotcare · 1 month ago
Impressive ideas and impressive endurance to hold that water bottle for 2 hours
@GameShark02 · 1 month ago
what is up my homies
@master7738 · 1 month ago
nice
@lancemarchetti8673 · 1 month ago
Imagine the day when an AGI agent can retain steganographic data within lossy image formats even after recompression or cropping.
@RokStembergar · 1 month ago
This is your Carl Sagan moment
@michaelwangCH · 1 month ago
The search problem is converted into a minimax optimization. But here is the problem: without training data from the specific environment, the maximum regret of each action cannot be defined. As with the Max-Cut problem, we cannot know that the function we found is the best action we can take. By avoiding the worst case in every action, the agent will end up with a model of mediocre performance; a world model would have to be Turing-complete, capable of dealing with all possible states. Such models will not exist, especially in stochastic environments where outcomes are uncertain. Conclusion: minimax is a mathematical problem that is still unsolved, so their publication and talk are purely theoretical, and they cannot show empirically that it works with real data. Predicting the latent state in RL is not a new idea either, and such models are highly dependent on the environments the agent is in. By learning only a representation in latent space, i.e. the abstract concept of the task without integrating the environment, the model will not generalize and will perform poorly.
@uber_l · 1 month ago
Here I provide a simple AGI recipe: reduction, then (simulation-relation-simulation), then action. Simulation could last a variable amount of time: for robots, nearly instant, using only accurate physics; for difficult tasks, increasingly complex imagination with rising randomness, think human dreams. Give it enough time and/or compute and it will move the world.
@eliaskouakou7051 · 1 month ago
People are too preoccupied with one-upping one another to ever ask: should we?
@paulnelson4821 · 1 month ago
It seems like you are going from a totally bounded training environment to “open ended” AGI. Joscha Bach has a multi-level system that includes Domesticated Adult and Ascended as a way to stratify human development. Maybe you need some kind of Bar Mitzvah or puberty to consider a staged development that would lead to general agency.
@johangodfroid4978 · 1 month ago
Not bad, though far from the final reward system of an AGI. I know how to build one, which is why I can say there is still a long way to go; the reward system is so much simpler. Still, a really good episode with interesting people.
@antdx316 · 1 month ago
👍
@eliaskouakou7051 · 1 month ago
Intelligence isn't about being optimised but about being set free. You can't develop intelligence in a box.
@Anders01 · 1 month ago
My amateur guess is that AI models will start to learn by themselves to become more general, especially things like robots and IoT devices that can receive a lot of data from the physical world. In the beginning some hardcoded strategy from humans might be needed, but after a while the AI models can start to optimize themselves, connected to compute clouds.
@rodneyericjohnson · 1 month ago
You see how making an AI model that seeks the unpredictable to make it more predictable leads to the end of all life, right?
@aladinmovies · 1 month ago
AGI is here
@cakep4271 · 1 month ago
I'm confused about the synthetic data thing: how could fake data ever actually be useful for learning something? How can studying fiction teach you about reality? It seems like it would just muddle what you learned from reality directly with stuff that's not true in reality.
@antdx316 · 1 month ago
AGI being able to figure out what you need before you can figure it out yourself is going to require the world to have a working UBI model soon, or else.
@awrjkf · 1 month ago
We need to start working on a UBI model now. I am also saving to buy a piece of land for farming, and I think we all should, no matter where the land is, as long as it is fertile. Because no matter what happens to the economy, as long as we can sustain ourselves, it would be a good safeguard for survival.
@geldverdienenmitgeld2663 · 1 month ago
The data does not come from humans; data comes from the world. And if humans could gather the data, machines can gather it as well.
@johnkintree763 · 1 month ago
Agreed. Language models can recognize entities and relationships, and represent them in a graph structure, which becomes the world model on which agents can plan actions.
@tobiasurban8065 · 1 month ago
I agree with the intuition but reject the detached observer perspective on agent versus world. I would phrase it: the information for the system comes from the environment of the system, where the observer itself is again a system.
@johntanchongmin · 1 month ago
My answer: No, we can't. But we can build a generally intelligent agent within a fixed set of environments that can use the same pre-defined action space
@dg-ov4cf · 1 month ago
nerds
@Greg-xi8yx · 1 month ago
This isn’t even up for debate anymore. The only question is: is it 1 or 5 years away?
@Onislayer · 1 month ago
Optimizing towards a Nash equilibrium still won't be generally intelligent. The intelligence in life that has pushed humanity forward lives at the extremes, not at some game-theoretically optimal objective. "Innovation" through optimization is lazy and uninspired.
@RanjakarPatel · 1 month ago
This incorrectly my dear but I am so proud four you’re try. Everyone’s need four improve branes four become expertise like four me. I am number computer rajasthan so please take care four you’re minds four acceleration educating
@andybaldman · 1 month ago
This all seems like really complicated ways of saying very simple things, which these models are not going to fully solve. The models are all too simple. No matter how they are architected, as long as they are made of non-agential parts, they will always be brittle.
@rcstann · 1 month ago
I'm sorry Dave, I'm afraid I can't do that.