The speaker just learned the effects of inputs on Transformers as if it were a fundamental breakthrough. Simply learning how attention works together with embeddings would have saved him a lot of hypothetical scenarios and the resort to internal world models. You don't even need much to show that a unified world model doesn't exist in Transformers. It's important to first learn the Transformer basics correctly before embarking on theories that are easily debunked.
@AlgoNudger 8 days ago
Thanks.
@minhuang8848 8 days ago
Just to make a minor point about idiomatic language: the first Janet completion example IS plausible; it's just not necessarily evident that this is from a hyperstylized dialogue script from a movie like, say, Lucky Number Slevin. That, or just some very stimulated person interacting in a repetitive manner — but it's easy to come up with situations where this is reasonable and congruent language, for sure.
@xspydazx 8 days ago
It's interesting, the guessing games played by these researchers! We have a neural network, which is a collection of regression trees as well as word matrices: it uses regression to predict the next word given the matrices for each position in the sequence. We then trained the same model to produce outputs based on questions — fed it inputs and outputs and forced it to match the output given the input. The neural network itself didn't change, and it can equally be used to move a robotic arm, because that's what a neural network is: it predicts based on past data, picking the highest-probability output given the input. By training the model on multiple tasks, who knew it could maintain past tasks? A model once used to predict digits from handwriting can also be trained to answer a question. So what's actually going on is regression: we can map nearly any task onto a regression model, which is the structure of the neural network. The transformer uses word matrices as its state, but to drive a car we'd have a different state, and to generate a sound or image a different state again. To make the model versatile, it's all about what state we pass through it; the model produces regressions over that state at various layers. The layer count helps with the transformation — the more complex the task, the more layers are required. Today we've found that with the transformer model, as long as we can place the state in TEXT format, we can use it for various predictive tasks. So what is the state inside the model now? Word-to-word matrices? No! It's tensors and vectors — any mathematically represented data can be regressed and predicted. Right now we're using tensors of massive width to represent the massive state of the sequence, but it could be smaller!
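The "pick the highest-probability output given the input" idea above can be sketched in a few lines. This is a toy illustration, not any real model: the vocabulary, weights, and state vector are made-up assumptions, standing in for what a trained transformer would actually compute.

```python
import numpy as np

# Toy sketch: next-token prediction as regression over a state vector.
# All values here are random stand-ins, not from a real trained model.
rng = np.random.default_rng(0)

vocab = ["the", "cat", "sat", "mat"]  # hypothetical tiny vocabulary
d_model = 8

# A "state" summarizing the sequence so far (in a real transformer this
# would come from embeddings plus the attention layers).
state = rng.normal(size=d_model)

# Output projection: maps the state to one score (logit) per vocab word.
W_out = rng.normal(size=(d_model, len(vocab)))
logits = state @ W_out

# Softmax turns scores into probabilities; the prediction is simply
# the highest-probability token.
probs = np.exp(logits - logits.max())
probs /= probs.sum()
next_token = vocab[int(np.argmax(probs))]
```

Whatever the task, the same machinery applies: encode the state as numbers, regress, take the most probable output.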
So we find that "attention is all you need" was a very important step for transformers, allowing retargeting of the expected output and keeping the model from straying from the actual expected outcome. These various attention methods are the distinguishing factor in the network: at these locations, the attended state is rewoven into the current layer or step. This is what allows us to have many layers, gradually shaping the output as it passes through them. Interestingly, we find we can take valid outputs from various layers — so the attention layers are actually doing more than regression!
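The "rewoven into the current layer" step can be sketched as scaled dot-product attention: each position's new state is a weighted mix of every position's value vectors, so earlier context is blended back in at each layer. A minimal sketch with toy shapes and random values (not any particular model's weights):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: mix value vectors by query-key similarity."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Softmax over keys: each row of weights sums to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted mix of all positions' values —
    # the state being "rewoven" into the current layer.
    return weights @ V, weights

# Toy example: 4 positions, 8-dimensional states (assumed shapes).
rng = np.random.default_rng(1)
seq_len, d_k = 4, 8
Q = rng.normal(size=(seq_len, d_k))
K = rng.normal(size=(seq_len, d_k))
V = rng.normal(size=(seq_len, d_k))

out, weights = attention(Q, K, V)
```

Stacking this (with projections and feed-forward blocks in between) is what lets many layers gradually reshape the state.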