Handwritten digits generated by an AI that maximizes fitness

6,212 views

Joseph Van Name

1 day ago

This is a visualization of a machine learning algorithm generating the handwritten digits 0-9, twice each, by optimizing a fitness level using gradient ascent.
The MNIST data set is a collection of 60,000 handwritten digits used to train an AI model, together with 10,000 more used to test it. Each of these digits is represented as a 28 by 28 matrix.
For our purposes, we only use the first 100 training data points.
Suppose that D_1,...,D_{100} are our handwritten digits, and v_1,...,v_{100} are their numerical values, one-hot encoded as vectors of length 10.
Let F denote the 28 by 28 discrete Fourier transform matrix.
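To make this concrete, here is a minimal sketch of this setup in Julia (the language this was implemented in). The MLDatasets.jl package is an assumption for illustration, and its accessor names vary between versions:

    using LinearAlgebra
    using MLDatasets  # assumption: any way of loading MNIST works here

    data = MNIST(split = :train)
    # First 100 training digits as 28 by 28 matrices.
    Ds = [Float64.(data.features[:, :, i]) for i in 1:100]
    # Labels one-hot encoded as vectors of length 10.
    vs = [Float64.(0:9 .== data.targets[i]) for i in 1:100]
    # The unitary 28 by 28 discrete Fourier transform matrix F.
    F = [exp(-2im * pi * j * k / 28) / sqrt(28) for j in 0:27, k in 0:27]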
Let A be a 20 by 784 matrix. Let B_1,...,B_{784} be 10 by 20 matrices. The fitness level of (A,(B_1,...,B_{784})) is
\sum_{i=1}^{100} log(abs(dot((\sum_{j=1}^{784} B_j red(abs(vec(F D_i F)))_j)*A*vec(D_i), v_i)))/100 - log(norm(adjoint(A)*A))/2 - log(norm(Z^2))/4 where Z = \sum_{j=1}^{784} adjoint(B_j)*B_j, where the absolute value function abs is applied elementwise, and where, if v is a vector, then red(v) is the vector obtained by replacing the first entry of v with zero. There are many variations of this sort of fitness function.
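Written out in the same Julia sketch, the fitness is a direct transcription of this formula; taking norm to be the Frobenius norm in the two regularization terms (matching the Frobenius norm used below) is an assumption:

    # red(v): replace the first entry of v (the constant Fourier mode) with zero.
    red(v) = [zero(eltype(v)); v[2:end]]

    # Fitness of (A, (B_1, ..., B_784)) on the digits Ds with one-hot labels vs.
    function fitness(A, Bs, Ds, vs, F)
        s = 0.0
        for i in 1:length(Ds)
            c = red(abs.(vec(F * Ds[i] * F)))     # 784 nonnegative coefficients
            M = sum(c[j] * Bs[j] for j in 1:784)  # a 10 by 20 matrix
            s += log(abs(dot(M * A * vec(Ds[i]), vs[i])))
        end
        Z = sum(B' * B for B in Bs)               # 20 by 20
        return s / length(Ds) - log(norm(A' * A)) / 2 - log(norm(Z^2)) / 4
    end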
We maximize this fitness using gradient ascent (with an initialization consisting of matrices with positive entries) to obtain a tuple (A,(B_1,...,B_{784})) which can be used to distinguish MNIST digits.
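As a sketch, the gradient ascent itself can be done with an automatic differentiation package; Zygote.jl, the step count, and the learning rate below are placeholder assumptions:

    using Zygote  # assumption: any reverse-mode automatic differentiation works

    # Gradient ascent on the fitness from an initialization with positive entries.
    function train(Ds, vs, F; steps = 2000, lr = 0.01)  # placeholder hyperparameters
        A  = rand(20, 784)
        Bs = [rand(10, 20) for _ in 1:784]
        for _ in 1:steps
            gA, gBs = Zygote.gradient((A, Bs) -> fitness(A, Bs, Ds, vs, F), A, Bs)
            A .+= lr .* gA
            for j in 1:784
                Bs[j] .+= lr .* gBs[j]
            end
        end
        return A, Bs
    end

    A, Bs = train(Ds, vs, F)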
After maximizing the fitness level, define functions g_0,...,g_9 by setting g_i(x) = abs(dot((\sum_{j=1}^{784} B_j red(abs(vec(F x F)))_j)*A*vec(x), v_i))/norm(x)^3, where norm(x) refers to the Frobenius norm. The functions g_0,...,g_9 can be used to classify handwritten digits by declaring a matrix x to have value k if g_k(x) = \max_j g_j(x). The functions g_0,...,g_9 may even be used to determine whether an image is an image of a handwritten digit in the first place: if all of the values g_0(x),...,g_9(x) are low, then x is not likely to be a handwritten digit at all.
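Since each v_i is one-hot, all ten values g_0(x),...,g_9(x) can be read off from a single matrix-vector product. Continuing the sketch:

    # All ten values g_0(x), ..., g_9(x) at once: the dot products with the
    # one-hot vectors v_i just pick out entries of the length-10 output.
    function gvals(A, Bs, F, x)
        c = red(abs.(vec(F * x * F)))
        M = sum(c[j] * Bs[j] for j in 1:784)
        return abs.(M * A * vec(x)) ./ norm(x)^3  # norm(x): Frobenius norm
    end

    # Declare x to have the value k at which g_k(x) is largest.
    classify(A, Bs, F, x) = argmax(gvals(A, Bs, F, x)) - 1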
For the visualization, we maximize the function g_i(x) using gradient ascent twice for each digit i in {0,...,9}, and we show the two matrices during the course of training. We observe that for each digit, in both instances, we obtain the same matrix after training.
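A sketch of this generation step, with the same placeholder assumptions about the optimizer as above:

    # Generate a digit image by gradient ascent on g_k from a random start.
    function generate(A, Bs, F, k; steps = 2000, lr = 0.1)  # placeholder values
        x = rand(28, 28)
        for _ in 1:steps
            g = Zygote.gradient(y -> gvals(A, Bs, F, y)[k + 1], x)[1]
            x = x .+ lr .* g
        end
        return x
    end

    # The observation above: two runs from different random starts agree.
    generate(A, Bs, F, 7) ≈ generate(A, Bs, F, 7)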
The absolute value of the Fourier transform of the image will be the same if the image is cyclically shifted up-down or left-right, as the check below shows. The absolute value of the Fourier transform also non-linearly expands our matrix with positive entries into two matrices with positive entries, and it seems easier to distinguish MNIST digits using two matrices with positive entries than using just one. We need positivity in order for our algorithms to consistently produce the same results, so we had to take the absolute value of the Fourier transform.
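This invariance is easy to check numerically, with "shifted" meaning a cyclic shift of the rows or columns:

    # |F x F| is unchanged when the image is cyclically shifted in either axis.
    x  = rand(28, 28)
    xs = circshift(x, (3, -5))          # shift rows by 3, columns by -5
    abs.(F * xs * F) ≈ abs.(F * x * F)  # true up to floating point error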
The AI algorithm for producing MNIST digits is my own. This experiment has a few good interpretability characteristics. First of all, after training, the tuple (A,(B_1,...,B_{784})) will not contain any random or pseudorandom information, in the sense that if we train (A,(B_1,...,B_{784})) multiple times, the resulting tuples will be the same up to floating point errors and symmetry. The values x where g_j(x) is maximized are also unique (up to a sign) even though g_j incorporates the absolute value function. Finally, the values x where g_j(x) is maximized are actual MNIST digits instead of adversarial examples. With that being said, these machine learning models for distinguishing and generating MNIST digits, and the MNIST handwritten digit dataset itself, are very simple and easy to analyze. These AI algorithms for distinguishing and producing MNIST digits do not use the grid structure of the 28 by 28 images, nor do they generalize to algorithms where adding many layers improves performance very much. The model g_0,...,g_9 for distinguishing handwritten digits is barely non-linear, and it does not seem to be particularly suited for image processing.
Unless otherwise stated, all algorithms featured on this channel are my own. You can go to github.com/spo... to support my research on machine learning algorithms. I am also available to consult on the use of safe and interpretable AI for your business. I am designing machine learning algorithms for AI safety such as LSRDRs. In particular, my algorithms are designed to be more predictable and understandable to humans than other machine learning algorithms, and my algorithms can be used to interpret more complex AI systems such as neural networks. With more understandable AI, we can ensure that AI systems will be used responsibly and that we will avoid catastrophic AI scenarios. There is currently nobody else who is working on LSRDRs, so your support will ensure a unique approach to AI safety.

Comments: 59
@josephvanname3377 5 days ago
After training, we always get the same images regardless of our initialization.
@SeanStClair-cr9jl 2 days ago
Woah, that is a cool observation
@Hylianmonkeys 2 days ago
I guess that is what makes them great for using as numbers. Easy to recognize.
@josephvanname3377 2 days ago
@@SeanStClair-cr9jl A good chunk of my videos are about various machine learning algorithms that tend to return the same trained model regardless of the initialization.
@BadChess56 2 days ago
Fascinating
@npc4416 1 day ago
yes this was also observed with efficient compute line
@halfsine 2 days ago
i found out that i'm actually just a crayon sniffer compared to actually smart programmers with that description
@josephvanname3377 2 days ago
I can't say that crayons are significantly worse than the standard American diet (this is not dietary advice obviously).
@RubyPiec 2 days ago
this is math tbf
@josephvanname3377 1 day ago
@@RubyPiec It is better to say that math is sensible and logical and able to be understood.
@Hylianmonkeys 2 days ago
I like that the 7 has like 3 or 4 different versions overlapping.
@josephvanname3377 2 days ago
I tried to incorporate an absolute value of the Fourier transform layer in order to try to prevent the AI from simply stacking the training examples on top of each other (and because non-linearity makes it interesting). But the AI stacked the digits anyways.
@npc4416 1 day ago
my dumass was thinking its gonna show 10 next 😹😭
@josephvanname3377 1 day ago
Don't worry. I will make another video where it actually does show '10' so you are not disappointed.
@teemteem 1 day ago
@@josephvanname3377 ❤
@npc4416 1 day ago
@josephvanname3377 yayyyyyy
@danieldelacruz7038 5 days ago
Are those the most average handwritten digits possible?
@josephvanname3377 5 days ago
They bear some resemblance to the average of the training samples that I used. But you can see that the boundary of the 28 by 28 matrices after training is not a uniform color, whereas in the MNIST training data the boundary is uniformly colored. Also, the generated images contain some negative values, which cannot happen if we are just averaging. The reason for this is that the second layer of the network uses the absolute value of the Fourier transform of the images, and to get the Fourier transform right, the generated handwritten digits need to fill all entries of the 28 by 28 matrix with non-zero values. And I generated the new digits from a sample size of 100.
@CrimsomGloryXD 2 days ago
So the most fit 4 is the Inverted Chair(tm)
@josephvanname3377 2 days ago
That is because there are multiple images of 4 and some of them are shifted and rotated a bit. A bit of preprocessing may help with this, but I did not do any preprocessing.
@anonymous_memer7397 1 hour ago
These could just be hung up in an abstract art museum and no one would bat an eye
@deadhorseak 2 days ago
you should do this with a sort of facial recognition framework. really rudimentary, just whether or not said pixels meet threshold for a "real face" and then see how high above the threshold you can get
@deadhorseak 2 days ago
i remember someone did something similar with a SAT solver optimizing the fitness for a standard set of Viola-Jones algorithm parameters ages ago. will have to find it again...
@josephvanname3377 2 days ago
In that case, I will need to make something that looks like a convolutional network, but I should make it so that if we train twice, then we get exactly the same thing. I have gotten better results on other tasks, but I have not had much computer vision and image processing success yet (I won't use standard neural networks, since with today's neural networks, if we train multiple times, we will tend to get completely different networks each time, because today's neural networks are messy and non-mathematical).
@Uhhhhhhhhh777 1 day ago
The 8 actually came out really nice!
@josephvanname3377 1 day ago
But your question mark looks even better.
@usernametaken017 1 day ago
that's a weird-looking 4
@denki2558 1 day ago
Did you compare that with the direct average of each class of training samples? If the results are invariant from the model, then you're likely uncovering the statistics of the training data.
@josephvanname3377 1 day ago
I did compare the generated image with the average. The main difference is that the generated digits have values below zero which cannot happen if we were to take the average of the training sample. The generated digits also are uneven in the corners which cannot happen if we simply took the average of the digits. The digits that I have produced are statistics of the training data, but they are more complicated than simply the mean of the training examples especially since I had to use the absolute value of the discrete Fourier transform in everything. I tried to produce statistics of the training data, but I wanted to produce more complicated statistics of the training data that look like their own training examples (but I only was able to use a few layers since I don't know how to generalize this procedure to more layers).
@Solar_427 2 days ago
I thought it says maximizes the error
@josephvanname3377 2 days ago
The difference is that we want the error value to go all the way down to zero. Here the locally maximum fitness level is some value, but it is not a priori clear what that locally maximized fitness level actually is until we do the experiment.
@Solar_427 2 days ago
@@josephvanname3377 no no I know this, error value, gradient descent etc. I just read it wrong and was really confused for a second
@Solar_427 2 days ago
@@josephvanname3377 but thanks anyway
@casev799 5 days ago
I think the MNIST is beat, and now I'm wondering just how different numbers are in device/system fonts. Using a custom one I installed a while back and the 5 looks kinda similar(†) to the one here. †: I should say kinda similar, I doubt it could ever be an exact match unless I set it to these outputs
@josephvanname3377 5 days ago
The MNIST is solvable even by a basic linear regression model, so I am making it more difficult by requiring a few more standards besides accuracy on the test data:
1. If we train the network multiple times to obtain multiple networks N, M, we end up with N(x) = M(x) for all inputs x.
2. After training the network N, if we use gradient descent to generate an input x_j for each digit j and we do this again to generate a new input y_j for each digit j, we end up with x_j = y_j regardless of the initialization.
3. The images x_j that we produce should visually represent MNIST digits.
@casev799 5 days ago
I have typed this about 3 times and I still made errors, I see.... dang
@NoenD_io 1 day ago
Next: numbers that ai cannot suspect are not themselves
@josephvanname3377 17 hours ago
That is a double negative.
@NoenD_io 17 hours ago
@josephvanname3377 ?
@josephvanname3377 17 hours ago
@@NoenD_io "ai canNOT suspect are NOT themselves".
@NoenD_io 17 hours ago
@josephvanname3377 tü iupo heja reff curtö
@josephvanname3377 17 hours ago
@@NoenD_io Are you doing alright?
@Lord_Rynan 1 day ago
1 is a bone
@josephvanname3377 1 day ago
Thanks for the ink-blot analysis.
@takeraparterer 1 day ago
try a GAN architecture
@josephvanname3377 1 day ago
This network behaves completely opposite to GANs. Here, if we use the same data and hyperparameters multiple times, we get the exact same results. But GANs are anything but consistent, which is not good for AI safety and interpretability. I should look into convolutional networks where, if we train the network multiple times, we get the exact same thing, because I need to incorporate the grid structure into the AI architecture.
@arothron6973 2 days ago
In what programming language did you write this AI? (I'm not fluent in English)
@josephvanname3377 2 days ago
I used Julia.
@V0W4N 1 day ago
ohhh so that's why mnist didn't work as well as the other dataset i used. is there a resource to read so i can render fitness like this?
@josephvanname3377 17 hours ago
I do not give expert advice to anonymous entities for free on this site.
@twixerclawford 2 days ago
that 2 looked wack!
@josephvanname3377 2 days ago
Regardless of how the 2 looks, we produced the 2 twice using different initializations and got the same digit. Perhaps I could have done some pre-processing to get all of the handwritten digits lined up. But then again, the absolute value of the discrete Fourier transform that I applied was supposed to completely ignore any shifts in the image.
@peashooterman3 2 days ago
wait a minute i remember you you're the sine function guy lol
@josephvanname3377 2 days ago
I have made a lot more content than just a video about the sine function. This site simply recommends the same content to everyone. The sine function is high school mathematics, and I have made content about various topics such as large cardinals, quantum information theory, and reversible computing.
@Im_Rainrot 3 days ago
Yo I'm not einstein so like what's fitness mean here
@Aldueja 3 days ago
I'm guessing it's how much the image looks like its number counterpart, but I could be wrong
@tetrasimplex3236 3 days ago
There is a popular set of data used when people first start making neural networks (simple AIs) on their own, consisting of a bunch of handwritten digits. The intended objective of the set is to create a neural network that can match the image of a handwritten number to a digit. Fitness is a commonly used term in the AI/neural network space, usually meaning conforming to something very well or performing very well. In this case, the author of the video trained a neural network to create the most easily guessable images possible, aka the 'ideal handwritten number'. How easily the original neural network guesses an image is measured by the fitness value. tl;dr: the fitness value is how close the image is to a very easily identifiable number.
@killianobrien2007 2 days ago
How well it fits with existing data
@josephvanname3377 2 days ago
I gave a detailed description of the fitness function that we used here. The fitness is supposed to measure how well the image represents a handwritten digit, but the fitness function's parameters were trained using its own fitness function to be able to recognize handwritten digits. I designed the fitness function so that if we maximize the fitness level multiple times, we would get the exact same thing regardless of the initialization.