This is a visualization of catastrophic forgetting and complete forgetting, two distinct phenomena that occur while training machine learning algorithms.
Let K denote the field of complex numbers (everything would work out about the same if K were the field of real numbers or the division ring of quaternions). Let d=e=8. Let D be a collection of pairs (u,v) where u belongs to K^d, v belongs to K^e, and u,v are unit vectors. Let r=3. The goal is to find e-by-d matrices A_1,...,A_r over K such that A_1 u,...,A_r u together represent the vector v whenever (u,v) is a data point. We need multiple vectors to represent the vector v because the relation D is most likely highly non-linear.
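To make this setup concrete, here is a minimal NumPy sketch of such a data set. The actual data set used in the video is not specified, so the nonlinear relation below (v built from the elementwise square of u) is purely a hypothetical stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)
d, e, r = 8, 8, 3  # dimensions and number of matrices, as in the video


def random_unit(n):
    """A random unit vector in C^n."""
    z = rng.normal(size=n) + 1j * rng.normal(size=n)
    return z / np.linalg.norm(z)


def make_pair():
    """One data point (u, v): unit vectors related nonlinearly.

    The relation here (elementwise square plus noise) is a made-up
    example; any highly non-linear relation would do.
    """
    u = random_unit(d)
    v = u ** 2 + 0.1 * random_unit(e)
    return u, v / np.linalg.norm(v)


# D plays the role of the 200 training data points in the video.
D = [make_pair() for _ in range(200)]
```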
The fitness level of the matrices A=(A_1,...,A_r) on the data set D is

[\sum_{(u,v)\in D} log[\sum_{k=1}^{r} |dot(A_k u, v)|^2] / |D|] - 2 log(norm(A)),

where norm(A) is defined by norm(A)^2 = norm(A_1)^2 + ... + norm(A_r)^2 and norm(A_j) is the Frobenius norm of A_j. We optimize this fitness function using gradient ascent; for this animation, we use gradient ascent with momentum. When we normalize A so that norm(A)=1, the fitness level for A reduces to \sum_{(u,v)\in D} log[\sum_{k=1}^{r} |dot(A_k u, v)|^2] / |D|.
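In code, this fitness function might look like the following NumPy sketch (the function name and layout are my own; `np.vdot` computes the Hermitian inner product, so `abs(np.vdot(A @ u, v)) ** 2` is the |dot(A_k u, v)|^2 term):

```python
import numpy as np


def fitness(As, D):
    """Fitness of the tuple As = (A_1, ..., A_r) on the data set D.

    Computes (1/|D|) * sum_{(u,v) in D} log( sum_k |<A_k u, v>|^2 )
             - 2 * log(norm(A)),
    where norm(A)^2 is the sum of the squared Frobenius norms of the A_k.
    """
    total = sum(
        np.log(sum(abs(np.vdot(A @ u, v)) ** 2 for A in As))
        for u, v in D
    )
    norm_sq = sum(np.linalg.norm(A) ** 2 for A in As)
    # -2*log(norm(A)) is the same as -log(norm(A)^2)
    return total / len(D) - np.log(norm_sq)
```

One sanity check: this fitness is invariant under rescaling A, since multiplying every A_k by a scalar c adds 2 log|c| to the averaged log term and subtracts the same 2 log|c| through the norm penalty, which is why normalizing to norm(A)=1 loses nothing.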
For the visualization, there are 200 training data points in total, and for each training data point we plot the value \sum_{k=1}^{r} |dot(A_k u, v)|^2. In order to compare how the training behaves under different initializations, we train the tuple of matrices (A_1,...,A_r) five separate times, and each training run is shown with dots of a different color. During each round of training, we train on only 50 of the 200 training data points. This training schedule is evident in the visualization, since the fitness levels for the data points currently being trained on are generally higher than the fitness levels for the other training points.
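A self-contained sketch of one round of gradient ascent with momentum on this fitness function might look as follows. For brevity it uses a finite-difference gradient over the real and imaginary parts of each entry; the actual animation presumably uses an analytic or autodiff gradient, and the hyperparameters `lr` and `beta` below are made-up illustrative values:

```python
import numpy as np


def fitness(As, D):
    """(1/|D|) * sum log(sum_k |<A_k u, v>|^2) - 2*log(norm(A))."""
    total = sum(np.log(sum(abs(np.vdot(A @ u, v)) ** 2 for A in As))
                for u, v in D)
    norm_sq = sum(np.linalg.norm(A) ** 2 for A in As)
    return total / len(D) - np.log(norm_sq)


def numerical_gradient(As, D, eps=1e-6):
    """Central finite-difference gradient of the fitness with respect to
    the real and imaginary parts of every entry of every A_k."""
    grads = []
    for i in range(len(As)):
        g = np.zeros_like(As[i])
        for idx in np.ndindex(As[i].shape):
            for step in (eps, 1j * eps):  # real part, then imaginary part
                plus = [A.copy() for A in As]
                minus = [A.copy() for A in As]
                plus[i][idx] += step
                minus[i][idx] -= step
                d = (fitness(plus, D) - fitness(minus, D)) / (2 * eps)
                g[idx] += d if step.real != 0 else 1j * d
        grads.append(g)
    return grads


def train(As, D, steps=100, lr=0.02, beta=0.9):
    """Gradient ascent with momentum (illustrative hyperparameters).

    In the video, each round of training would call this with a batch
    of 50 of the 200 data points as D.
    """
    vel = [np.zeros_like(A) for A in As]
    for _ in range(steps):
        g = numerical_gradient(As, D)
        for k in range(len(As)):
            vel[k] = beta * vel[k] + g[k]
            As[k] = As[k] + lr * vel[k]
    return As
```

Running `train` on one 50-point batch and then on a disjoint batch, while plotting \sum_k |dot(A_k u, v)|^2 for all 200 points, would reproduce the forgetting behavior described below.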
From this visualization, we observe two prominent phenomena which we shall call catastrophic forgetting and complete forgetting.
In catastrophic forgetting, the fitness level for a data point drops to zero as soon as we stop training on that data point and instead train on different data points. Catastrophic forgetting occurs with typical neural networks, not just with my machine learning algorithm.
Complete forgetting is an extreme form of catastrophic forgetting. In this visualization, all five tuples of matrices eventually approximate one another. This means that with enough training, the tuples of matrices completely forget how they were initialized: after enough training, the initialization has no effect at all on the resulting tuple of matrices. The phenomenon of complete forgetting still occurs even though we train on only 50 data points at a time, but complete forgetting takes more training time than catastrophic forgetting.
This is my own machine learning algorithm.
Catastrophic forgetting is sometimes seen as a negative aspect of neural networks, but it has some benefits. When retraining a network, it is natural for that network to forget some of the material it was trained on in the past but not in the present, since neural networks have a limited amount of memory; however, neural networks tend to forget information even when they have spare capacity for learning.
Machine learning models that have exhibited complete forgetting should be seen as more interpretable and safer than machine learning models that do not exhibit complete forgetting. Networks that exhibit complete forgetting retain no random or pseudorandom information that is independent of the training data, and such random or pseudorandom information may interfere with interpretation attempts. Furthermore, complete forgetting is a mathematical property of a machine learning algorithm, and algorithms that exhibit such mathematical properties are more likely to be interpretable than ones that do not.
Unless otherwise stated, all algorithms featured on this channel are my own. You can go to github.com/spo... to support my research on machine learning algorithms. I am also available to consult on the use of safe and interpretable AI for your business. I am designing machine learning algorithms for AI safety such as LSRDRs. In particular, my algorithms are designed to be more predictable and understandable to humans than other machine learning algorithms, and my algorithms can be used to interpret more complex AI systems such as neural networks. With more understandable AI, we can ensure that AI systems will be used responsibly and that we will avoid catastrophic AI scenarios. There is currently nobody else who is working on LSRDRs, so your support will ensure a unique approach to AI safety.