Thank you so much, Richard. There is very little material out there on this subject; your videos are very valuable.
@skeletonmasterkiller · 27 days ago
Thank you so much for this
@skeletonmasterkiller · 27 days ago
Is there any advantage to using alternate representations, like Gray-coded vectors or binary spatter codes?
@richardaragon8471 · 27 days ago
The #1 thing I have learned from my research into this is that EVERYTHING is a variable. Even whether you encode the data as the shape of a Poincaré curve versus a torus is a variable. Why? I couldn't offer the first guess as to why, but I know it is a variable. So the way you mathematically represent the data is 100% a variable.
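For readers unfamiliar with the binary spatter codes mentioned in the question, here is a minimal sketch of how Kanerva-style codes typically work: binding is element-wise XOR and bundling is a bit-wise majority vote. The dimensionality and the role/filler names are illustrative assumptions, not anything taken from the video or the Colab notebook.

```python
import numpy as np

D = 10_000                      # hypervector dimensionality (illustrative choice)
rng = np.random.default_rng(0)

def random_hv():
    """A random binary spatter code: a dense vector of 0/1 bits."""
    return rng.integers(0, 2, size=D, dtype=np.uint8)

def bind(a, b):
    """Binding is element-wise XOR; the result looks unrelated to both inputs."""
    return np.bitwise_xor(a, b)

def bundle(hvs):
    """Bundling is a bit-wise majority vote; the result stays similar to each input."""
    return (np.sum(hvs, axis=0) * 2 > len(hvs)).astype(np.uint8)

def similarity(a, b):
    """Normalized Hamming similarity in [0, 1]; ~0.5 means unrelated."""
    return 1.0 - np.mean(a != b)

role, filler = random_hv(), random_hv()
pair = bind(role, filler)                      # store a role/filler association
print(similarity(pair, filler))                # ~0.5: binding hides the filler
print(similarity(bind(pair, role), filler))    # 1.0: XOR-unbinding recovers it exactly

items = [random_hv() for _ in range(5)]
bag = bundle(items)
print(similarity(bag, items[0]))               # > 0.5: each bundled item stays detectable
```

A Gray-coded or dense real-valued representation would slot into the same pattern; only `random_hv`, `bind`, and `similarity` would change, which is one way to see the representation itself as "just another variable."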
@skeletonmasterkiller · 27 days ago
@@richardaragon8471 One interesting idea I have is to encode the state of the network itself as a hyperdimensional vector, and whenever a new query comes in, traverse over the good states to find the answer.
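A rough sketch of that idea, assuming the network state can be flattened into a vector and encoded with a fixed random projection; the encoder, the sizes, and the "good state" memory here are hypothetical placeholders, not what the linked Colab does.

```python
import numpy as np

D = 10_000          # hypervector dimensionality (illustrative)
STATE_DIM = 512     # size of a flattened network-state snapshot (hypothetical)
rng = np.random.default_rng(0)

# Hypothetical encoder: a fixed random projection followed by sign() maps any
# flattened snapshot of activations/weights into a bipolar hypervector.
projection = rng.standard_normal((D, STATE_DIM))

def encode_state(snapshot):
    return np.sign(projection @ snapshot)

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Pretend these snapshots came from runs that ended well ("good states").
good_states = [encode_state(rng.standard_normal(STATE_DIM)) for _ in range(8)]

# When a new query arrives, encode the current state and traverse the stored
# good states by similarity, treating the closest one as the target to move toward.
current = encode_state(rng.standard_normal(STATE_DIM))
best = max(good_states, key=lambda s: cosine(current, s))
print(cosine(current, best))
```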
@richardaragon8471 · 27 days ago
@@skeletonmasterkiller I can't believe you are mentioning this, lmfao. I figured out exactly how to do it yesterday; I just haven't made a video on it yet: colab.research.google.com/drive/1hSuy5n_jyBplQlTGx4rMKQ3IbonBdgEr?usp=sharing
@richardaragon8471 · 27 days ago
@@skeletonmasterkiller I made a podcast video about it on my second channel lol: kzbin.info/www/bejne/pXvQi5SXeLOSnJo
@skeletonmasterkiller · 27 days ago
@@richardaragon8471 Wow, nice. This is very similar to neural gas models or Kohonen self-organizing maps, but being so data-dependent makes the network brittle. It is a matter of where the network gets its supervisory learning signal from: it can get it from the data itself (self-supervision), but it can also get it from previous 'experiences' it had while solving problems on other data sets it has encountered. So my idea is a little different. During each step of training or fine-tuning an LLM, I want to create a high-dimensional state vector and store it. The sequence of state vectors that leads to a "positive" result is bound together (binding/aggregation); this is the experience vector. We use the best experience vector to formulate a state plan for the network, which would require a modification to the weight update for the task it is currently encountering. So training and inference are both conducted during the learning phase. At inference time we form a state plan that traverses the state-vector space and, out of all the sequences observed thus far, chooses the best sequence that predicts or solves the task put before it. If no plan is found, the network is retrained on the samples it got wrong, using intermediate states, and expanded (check out zig-zag products and expander graphs, they are really cool) until it solves the task.
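A minimal sketch of the experience-vector idea described above, assuming each training step already yields a bipolar state vector. Encoding sequence position by cyclic permutation and selecting by cosine similarity are one common hyperdimensional-computing choice; they are not necessarily the exact binding/aggregation scheme the commenter has in mind.

```python
import numpy as np

D = 10_000
rng = np.random.default_rng(1)

def rand_state():
    """Stand-in for one encoded training-step state vector (bipolar)."""
    return rng.choice([-1, 1], size=D)

def permute(v, k):
    """Cyclic shift marks the position of a state within its sequence."""
    return np.roll(v, k)

def experience(states):
    """Bind/aggregate a sequence of state vectors into one experience vector."""
    acc = np.zeros(D)
    for t, s in enumerate(states):
        acc += permute(s, t)
    return np.sign(acc)

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Two training runs whose state sequences led to a "positive" result.
run_a = [rand_state() for _ in range(4)]
run_b = [rand_state() for _ in range(4)]
experiences = [experience(run_a), experience(run_b)]

# At inference, encode the trajectory observed so far and pick the stored
# experience it matches best; that experience serves as the "state plan".
partial = experience(run_a[:2])
print(cosine(partial, experiences[0]))   # clearly positive: shares two permuted states
print(cosine(partial, experiences[1]))   # ~0: unrelated experience
```

The "retrain and expand" fallback is not shown here; in this sketch it would amount to adding new state vectors (and new experience vectors built from them) whenever no stored experience scores above some similarity threshold.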