This video is sooo cool. I have a question: isn't the problem with hidden variables that there are infinitely many of them? So how would this model scale?
@Thinkstr 14 days ago
Hi, thanks for watching! I think a big problem with hidden states is vanishing/exploding gradients, so even though there are infinitely many possible hidden states, they have trouble holding information for very long. The model might need to be a lot larger to have longer memory. If you use ChatGPT a lot, you probably notice it forgets what you said a long time ago.
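A minimal, hedged sketch of the vanishing-gradient effect mentioned in the reply above (not from the video; plain PyTorch with default settings, all names invented): the gradient of a vanilla RNN's final hidden state with respect to an early input typically shrinks as the sequence gets longer, which is why the hidden state struggles to hold information for long.

```python
# Rough illustration only -- not the video's model. Measures how much the
# last hidden state of a vanilla RNN still "feels" the very first input.
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=32, batch_first=True)

for seq_len in (5, 50, 500):
    x = torch.randn(1, seq_len, 8, requires_grad=True)
    out, _ = rnn(x)                    # out: (1, seq_len, 32)
    out[:, -1].sum().backward()        # gradient of the final hidden state
    early_grad = x.grad[:, 0].norm()   # sensitivity to the input at t=0
    print(f"seq_len={seq_len:4d}  grad norm at t=0: {early_grad:.2e}")
```

With default initialization the printed norm usually decays sharply with sequence length, which is the "forgetting" behavior described above.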
@F_Sacco 3 months ago
Loved this video! Also, I think this is relevant to training LLMs. Dataset curation is often really important because you want the LLM to learn only from "high quality text"... But shouldn't the LLM be capable of recognizing high-quality text and ignoring bad text on its own? Ideally you would want the LLM to ignore text that is low quality or that it already knows very well, and only focus on what is "high quality". I think having an LLM capable of being curious or bored, just like us humans, would solve this problem.
@Thinkstr 3 months ago
I like that! Maybe the FEP could help with vanishing gradients by figuring out what information is good.
@ЕвгенийИванов-ю2б 2 months ago
Hi! Really nice work! If I understand correctly, in your case the actor and critic use that hidden state instead of observations. At the same time, the hidden state is the conceptualization of the world provided by the forward model. The goal of the forward model is to predict future observations while keeping a stable world view by not being overly surprised by something irrelevant (on which it would be difficult to predict the next observation given an action). Can I ask: do you freeze the forward model's weights while training the actor and critic? I guess the forward model can learn patterns relevant to predicting the future state, or patterns that are also relevant for the actor and critic. How is backpropagation routed? For example, in GANs you freeze the discriminator while backpropagating through the generator.
@Thinkstr 2 months ago
@@ЕвгенийИванов-ю2б Hi, thanks for watching! Yeah, your understanding sounds like it's about as deep as mine, but I can't guarantee that's all there is, ha ha. About backpropagation: I don't freeze the forward model, but the hidden states are detached before the actor and critic train with them. I could treat the forward model, actor, and critic as one model making actions, predictions of value, and predictions of observations, and then the curiosity term could incorporate all those parts, but I tried it, and I think those hyperparameters were too tough to fine-tune.
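For readers curious about the detach pattern described in this reply, here is a hedged sketch. It is not the author's actual code: the tiny modules and names are invented stand-ins. It only shows the general idea that the forward model keeps training on prediction error while the actor and critic see a detached copy of the hidden state.

```python
# Hedged sketch of "detach the hidden state before actor/critic training".
# All modules, shapes, and names here are hypothetical stand-ins.
import torch
import torch.nn as nn

obs_dim, act_dim, hid_dim = 4, 2, 16
forward_model = nn.GRUCell(obs_dim + act_dim, hid_dim)          # stand-in recurrent world model
predictor     = nn.Linear(hid_dim, obs_dim)                     # predicts the next observation
actor         = nn.Sequential(nn.Linear(hid_dim, act_dim), nn.Tanh())
critic        = nn.Linear(hid_dim + act_dim, 1)

obs      = torch.randn(1, obs_dim)
action   = torch.randn(1, act_dim)
next_obs = torch.randn(1, obs_dim)

hidden = forward_model(torch.cat([obs, action], dim=-1))        # new hidden state (keeps its graph)

# 1) The forward model is never frozen: it trains on prediction error.
fm_loss = ((predictor(hidden) - next_obs) ** 2).mean()
fm_loss.backward()

# 2) The actor and critic only see a detached hidden state, so their
#    losses cannot push gradients back into the forward model.
h = hidden.detach()
actor_loss = -critic(torch.cat([h, actor(h)], dim=-1)).mean()
actor_loss.backward()
```

This mirrors the GAN analogy in the question: instead of freezing weights, the gradient path is simply cut at the hidden state.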
@ЕвгенийИванов-ю2б 2 months ago
@@Thinkstr Thanks for your answer! If I got that correctly, it's a bit like a game where a dog is chasing a cat and the cat moves away from the dog. They see and must recognize each other among other animals. For deep features, you could train a VAE by reconstructing images of them, or you could use the convolutional part of a pretrained ResNet50, which was trained on a domain-specific classification task that includes, among other classes, cats and dogs. It's a kind of dilemma what should be primary and what secondary in terms of deep features, because the deep feature vector usually doesn't match the original image vector in size; you have to compress the information. The actor and critic may be interested in some domain specificity, and that may not (hypothetically) be provided by the conceptualization of the world learned by the forward model... I mean, my internal mental state should be built upon features that are relevant to my specific task to a certain extent, not only on the ability to predict the future state, if I get that right... Anyway, great job! There is a shortage of anything practical and palpable in terms of FEP usage in reinforcement learning. If I write an article on the FEP (maybe some time), I'll definitely link to your article.
@Thinkstr 2 months ago
@@ЕвгенийИванов-ю2б Oh, wow, thanks! Just saying it's a palpable use of the FEP is an honor, because it's such a hard idea to grasp, haha. And yeah, the actor and critic might be better off seeing the observations themselves, but it saves a lot of computation time to use the forward model's hidden states instead, because then you only need to process each observation once!
@Naker-l1e 3 months ago
Wow!! I have been waiting for this video for a long time; I didn't think the results would be so impressive. I will definitely read your paper carefully! I am working on applying the free energy principle and active inference to Transformers. Do you know of any work on this? Congratulations from Spain on your work.
@Thinkstr 3 months ago
Thanks for watching! The paper on arXiv should eventually be replaced by the same version with Neural Computation's edits, like border spacing. Also, I'll eventually upload a video that goes a little deeper into the math than this one. I haven't used transformers much, but maybe I should look into them more!