For this visualization, we use 5 neural networks N1,...,N5 of the form Chain(Dense(1,mn,tanh), Dense(mn,mn,tanh), Dense(mn,mn,tanh), Dense(mn,mn,tanh), Dense(mn,1,tanh)) where mn=40. In particular, each of these neural networks takes a single real number as input and returns a single real number as output.
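The Chain above is Julia's Flux syntax. As an illustrative stand-in, here is a minimal NumPy sketch of the same architecture (the function names and the initialization scheme are my own assumptions, not taken from the video):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_network(rng, mn=40):
    """Weights for a 1 -> mn -> mn -> mn -> mn -> 1 MLP, matching the Chain above.
    The 1/sqrt(fan_in) scaling is an assumed initialization, not from the source."""
    sizes = [1, mn, mn, mn, mn, 1]
    return [(rng.normal(0.0, 1.0 / np.sqrt(nin), (nout, nin)), np.zeros(nout))
            for nin, nout in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Apply the network to a scalar x; tanh on every layer, including the last,
    so the output always lies strictly inside (-1, 1)."""
    a = np.array([x], dtype=float)
    for W, b in params:
        a = np.tanh(W @ a + b)
    return a[0]
```

Because the output layer also uses tanh, each network's range is bounded to (-1, 1), which is why all five graphs fit in the same vertical window.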
The loss level for a data point (x,y) and a neural network N is the value (N(x)-y)^2. The networks are trained to minimize the mean loss level over all data points. The training data points are pairs of the form (x,Ni(x)), where i belongs to {1,...,5} and x is a uniformly random point in the domain shown in the visualization. After each gradient update to the networks, we replace the training data set with freshly sampled points. This way, in order to minimize the loss level, the neural networks all need to compute the same function.
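The data-generation step can be sketched as follows. This is a hedged illustration of the scheme described above, not the video's actual code; the batch size, the interval [-1, 1], and the helper names are my own assumptions:

```python
import numpy as np  # numpy's Generator supplies the rng objects used below

def squared_loss(net_forward, params, x, y):
    """Loss level for one data point (x, y): (N(x) - y)^2."""
    return (net_forward(params, x) - y) ** 2

def make_batch(rng, nets, net_forward, lo=-1.0, hi=1.0, size=64):
    """Fresh training data: pairs (x, N_i(x)) with i and x drawn uniformly.
    Per the description, a new batch replaces the old one after every gradient step."""
    xs = rng.uniform(lo, hi, size)
    idx = rng.integers(0, len(nets), size)
    return [(x, net_forward(nets[i], x)) for x, i in zip(xs, idx)]
```

Since the targets are themselves network outputs that change every step, the only way to drive the mean loss to zero is for all five networks to agree everywhere on the sampled domain.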
On the right side of the visualization, we see the graphs of N1,...,N5, while the left side of the visualization shows N1-M,...,N5-M, where M=(N1+...+N5)/5 is the pointwise mean of all 5 networks.
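The left-panel curves are simple to compute once the networks are available. A minimal sketch (the function names are hypothetical; `net_forward` stands for whatever evaluates one network at a point):

```python
import numpy as np

def mean_network(nets, net_forward, x):
    """M(x) = (N1(x) + ... + N5(x)) / 5, the pointwise mean of the ensemble."""
    return np.mean([net_forward(p, x) for p in nets])

def residuals(nets, net_forward, x):
    """The left-panel curves N_i(x) - M(x). By construction they sum to zero
    at every x, so zooming in on them isolates the inter-network disagreement."""
    m = mean_network(nets, net_forward, x)
    return [net_forward(p, x) - m for p in nets]
```

Subtracting M removes everything the networks agree on, which is what makes the small residual differences visible at all.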
We observe that the neural networks are able to compute functions that look similar on the training interval, but when we zoom into the differences between these networks, we see that they cannot all converge to the same function. While the networks N1,...,N5 are trained to be similar to one another, the similarities between them are just superficial. Although N1,...,N5 resemble each other on the training interval, outside that interval they behave substantially differently from each other.
The notion of a neural network is not my own. I am simply making neural network visualizations in order to demonstrate the properties of neural networks, especially with regard to AI safety and interpretability. This visualization is a negative result for AI safety and interpretability, since the networks are clearly incapable of interpreting each other, and since the differences between these networks are themselves difficult to interpret when we zoom into them. A safer and more interpretable AI algorithm would be one where the networks converge to exactly the same function when trained to imitate each other.
Unless otherwise stated, all algorithms featured on this channel are my own. You can go to github.com/spo... to support my research on machine learning algorithms. I am also available to consult on the use of safe and interpretable AI for your business. I am designing machine learning algorithms for AI safety such as LSRDRs. In particular, my algorithms are designed to be more predictable and understandable to humans than other machine learning algorithms, and my algorithms can be used to interpret more complex AI systems such as neural networks. With more understandable AI, we can ensure that AI systems will be used responsibly and that we will avoid catastrophic AI scenarios. There is currently nobody else who is working on LSRDRs, so your support will ensure a unique approach to AI safety.