Language Models as World Models

4,489 views

MITCBMM

1 day ago

Jacob Andreas, MIT

Comments: 10
@janerikbellingrath820
@janerikbellingrath820 1 day ago
Great talk.
@GerardSans
@GerardSans 7 days ago
The speaker has just discovered the effect of inputs in Transformers as if it were a fundamental breakthrough. Simply learning how attention works together with embeddings would spare him a lot of hypothetical scenarios and the resort to internal world models. You don't even need much to show that a unified world model doesn't exist in Transformers. It's important to learn the Transformer basics correctly before embarking on theories that are easily debunked.
@AlgoNudger
@AlgoNudger 8 days ago
Thanks.
@minhuang8848
@minhuang8848 8 days ago
Just to make a minor point about idiomatic language: the first Janet completion example IS plausible; it's just not necessarily evident that it comes from a hyper-stylized dialogue script from a movie like, say, Lucky Number Slevin. That, or just some very stimulated person interacting in a repetitive manner - but it's easy to come up with situations where this is reasonable and congruent language, for sure.
@xspydazx
@xspydazx 8 days ago
It's interesting, the guessing games played by these researchers! We have a neural network that is a collection of regression trees as well as word matrices, so it uses regression to predict the next word given the matrices for each position in the sequence. We trained the same model to produce outputs based on questions: we fed it inputs and outputs and forced it to match the output given the input, and we have many models like this too. We need to understand that the neural network itself did not change, and it can also be used to move a robotic arm. Because what is a neural network? It predicts based on past examples, so it picks the highest-probability output given the input. By training the model on multiple tasks, who knew that it could maintain past tasks? A model used to predict digits from handwriting can also be trained to answer a question. So what is actually going on is regression, since we can map nearly any task to a regression model, which is the structure of the neural network.

The transformer uses word matrices as a state, but to drive a car we would need a different state, and to generate a sound or an image we would need different states again. To make the model very versatile, it all comes down to what state we can pass through the model: it builds regressions over this state at various layers. The layer count helps with the transformation, and the more complex the task, the more layers are required. Today we have found that, as long as we can put the state in TEXT format, the transformer can be used for various kinds of predictive tasks. So what is the state inside the model now? Is it word-to-word matrices? No, it's tensors and vectors, so any mathematically represented data can be regressed and predicted. Right now we only use tensors of massive width to represent the massive state of the sequence, but it could be smaller.

So "attention is all you need" is a very important step in the transformer, allowing retargeting of the expected output and keeping the model from straying from the actual expected outcome. These various attention methods are the decisive factor in the network: at these locations the state is what is attended to, and it is rewoven into the current layer or step. This allows us to have many layers, gradually changing the output as it passes through them. Interestingly, we find that we can take valid outputs from various layers, hence the attention layers are actually doing more than regression!
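For readers who want the mechanics behind the phrase "attention is all you need" in the comment above: here is a minimal NumPy sketch of scaled dot-product attention, the step that mixes the per-token state at each layer. The shapes, weights, and function name are illustrative assumptions for a toy example, not code from the talk.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d) arrays of query, key, and value vectors.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # similarity of each query to every key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: attention weights per query
    return weights @ V                               # each output is a weighted mix of value vectors

# Toy usage (hypothetical shapes): 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                          # token embeddings, i.e. the sequence "state"
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)
print(out.shape)                                     # (4, 8): one updated vector per token
```

Each output row is a re-weighted combination of the value vectors, which is the sense in which the state is "rewoven" at every layer before the feed-forward (regression-like) step is applied.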