The Dimpled Manifold Model of Adversarial Examples in Machine Learning (Research Paper Explained)

16,012 views

Yannic Kilcher

A day ago

Comments: 55
@YannicKilcher 3 years ago
OUTLINE:
0:00 - Intro & Overview
7:30 - The old mental image of Adversarial Examples
11:25 - The new Dimpled Manifold Hypothesis
22:55 - The Stretchy Feature Model
29:05 - Why do DNNs create Dimpled Manifolds?
38:30 - What can be explained with the new model?
1:00:40 - Experimental evidence for the Dimpled Manifold Model
1:10:25 - Is Goodfellow's claim debunked?
1:13:00 - Conclusion & Comments

Paper: arxiv.org/abs/2106.10151
My replication code: gist.github.com/yk/de8d987c4eb6a39b6d9c08f0744b1f64
Goodfellow's Talk: kzbin.info/www/bejne/eXrJpHWVer6mjKs
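The actual replication code is in the gist linked above. As a purely hypothetical sketch of that style of experiment (this is not the linked gist; all dimensions, architecture choices, and hyperparameters below are made up), one could train a small classifier on data lying on a known low-dimensional subspace and then check how much of an adversarial perturbation points off that subspace:

```python
# Toy sketch: data on a k-dim linear "image manifold" inside R^d, a small MLP
# classifier, and an FGSM-style perturbation decomposed into on- and
# off-manifold components. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
d, k, n = 100, 10, 2000                          # ambient dim, manifold dim, samples

basis, _ = torch.linalg.qr(torch.randn(d, k))    # orthonormal basis of the manifold
coords = torch.randn(n, k)
x = coords @ basis.T                             # points on the manifold
y = (coords[:, 0] > 0).long()                    # label from one manifold coordinate

model = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(500):
    opt.zero_grad()
    F.cross_entropy(model(x), y).backward()
    opt.step()

# FGSM-style perturbation for one test point.
x0 = x[:1].clone().requires_grad_(True)
loss = F.cross_entropy(model(x0), y[:1])
grad, = torch.autograd.grad(loss, x0)
delta = 0.5 * grad.sign()

# Split the perturbation into its on-manifold and off-manifold parts.
on_manifold = (delta @ basis) @ basis.T
off_manifold = delta - on_manifold
print("on-manifold norm: ", on_manifold.norm().item())
print("off-manifold norm:", off_manifold.norm().item())
```

If the dimpled-manifold picture held, most of the perturbation norm would be expected in the off-manifold component; the toy setup above only illustrates how one might measure that, not what the paper or the gist actually finds.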
@ahnafapathan 3 years ago
Yannic, I simply cannot put into words my gratitude for you. Please never stop.
@ZimoNitrome 3 years ago
To add, I agree that their simpler decision boundary is in fact not much simpler, if at all. Readers can easily be fooled into seeing a difference between the models simply because the figures and examples are 2D vs. 3D. The classic "scattered" 2D samples can just as well have extremely awkward positioning in additional dimensions. The dimpled model would be just as complex. Edit: Nevermind, Yannic covers everything.
@AICoffeeBreak 3 years ago
Me until 01:20: Nice, this is exactly how I have been thinking about adversarial examples this whole time! 😎 01:27 Yannic: "I don't think this is really useful to think in this way and I will explain why." Me: Okay, seems like I am going to watch the whole video now. 😂
@ukhu_pacha 3 years ago
I haven't finished the whole video and you are already commenting? I guess I'm late
@AICoffeeBreak 3 years ago
@@ukhu_pacha I haven't either, I commented about the first minute. 😅 I actually have to postpone watching this video because I have to go in 10 mins. 😑
@ukhu_pacha 3 years ago
​@@AICoffeeBreak Coffee bean needs help!
@ZimoNitrome 3 years ago
Everyone is biased. Since it's difficult to get around that, it's good that you disclose it. Good vid. Keep it up!
@josephclements2145 3 years ago
To me, biases are intuitive understandings of patterns observed through experience that are often difficult to explain logically. So biases should be carefully evaluated to determine whether there are legitimate rules generating the pattern or whether our personal perspective is distorting our understanding. Comparing our biases with those of someone with different biases is one way of accomplishing that task.
@oshri8 3 years ago
Great video as always. Fun fact: Adi Shamir is the "S" in the RSA encryption algorithm.
@herp_derpingson 3 years ago
14:25 This way of thinking is quite similar to SVM kernels. There is some plane and you classify stuff depending on which side of the plane the input data lies.

31:40 IDK if someone has tried this already. The reason the decision boundaries are so close by is that when SGD sees a zero gradient it says "good enough" and moves on, even if it is right on the edge. I wonder if we can add an "It's good, but let's go a bit further in the same direction just in case" parameter. IDK how to implement it in SGD though. Otherwise SGD will always go for the minimum-energy manifold. Something that makes SGD push the centroids far from each other, even at zero gradient.

35:19 Yes, the squiggly line is another way of drawing a decision boundary, but doing it this way has a "higher energy" associated with it. Forgive me for not knowing the technical term for it; I will just call it "energy". Since SGD is a greedy algorithm, it will always minimize "energy". Think about it like how much energy you would need to hammer a metal sheet into the squiggly line's valleys and ravines vs. how much energy you would need to simply dimple the sheet metal at the exact data points.

38:47 Up and "dowm" ;)

40:40 It can be guacamole, you need a 4D space to visualize it. If you have a sufficiently big D, anything is possible.

45:35 I am not sure about the fur thing. What I understand is that any "human recognizable features" are along the manifold and all non-human-recognizable features are perpendicular to the manifold. So, by definition, if a human cannot see it, then it must be perpendicular to the manifold; otherwise it would bring a sufficient change in image space.

57:30 The adversarial datapoints are now *pulling* the manifold towards themselves instead of pushing it. So, when the adversarial noise is reverted, the image jumps to the other side of the now-pulled manifold. It is still at minimum energy.

1:00:00 I am not sure about projection. I just think that the SGD-like algorithm in the human brain is not satisfied at the zero gradient, but continues further and makes the dimple deeper and wider.

1:08:00 How do you even define "perpendicular" for a hyperplane? Sorry, I did not do that much math in college, so I cannot comment on this. I was planning to see if moving the point towards any centroid and making it adversarial is perpendicular to the optimally small-normed adversarial example.

IDK, I actually find the dimpled manifold theory rather convincing. With image augmentations, what we do is make these "dimple pockets" bigger, so adversarial datapoints have to go further in image space before they can get out of the pocket.

In fact, we can also take this understanding to the double descent paper. Let n be the dimensions of the network and k be the dimensions of the natural image manifold. k should be constant as it is a property of the dataset. In the beginning k > n, so the NN behaves poorly as it cannot separate everything. As n increases, the performance improves. As n approaches k, the dimpled manifold effect takes its true form and it starts overfitting. Then, when n becomes sufficiently large, the extra dimensions allow more wiggle room, so a small change in image space causes a large change in manifold space, simply because the space is so high-dimensional. Effectively this makes the pockets larger, which in turn prevents overfitting/adversarial examples.
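One hypothetical way to implement the "keep pushing a bit past zero gradient" idea from the 31:40 point above is to add a term that still rewards a growing logit margin after cross-entropy has flattened out. This is only an illustrative sketch in the spirit of SVM-style margins, not an established fix:

```python
# Sketch of a margin-encouraging training loss (hypothetical; weight is arbitrary).
import torch
import torch.nn.functional as F

def margin_loss(logits, labels, margin_weight=0.1):
    """Cross-entropy plus a term that keeps growing the logit margin."""
    ce = F.cross_entropy(logits, labels)
    mask = F.one_hot(labels, num_classes=logits.size(1)).bool()
    correct = logits[mask]                                          # true-class logit per sample
    runner_up = logits.masked_fill(mask, float("-inf")).max(dim=1).values
    # Subtracting the mean margin keeps a non-zero gradient that pushes the
    # boundary away from the data even after cross-entropy has nearly vanished.
    return ce - margin_weight * (correct - runner_up).mean()
```

Used in place of plain cross-entropy, this keeps pushing class clusters apart, at the cost of an unbounded objective, so the weight would need to stay small.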
@YannicKilcher 3 years ago
very good points, nice connections to the double descent phenomenon. the way they make things "perpendicular" is that they linearize the manifold around a data sample, using essentially a first order taylor approximation. from there, perpendicular just means large inner product with the normal to that plane.
@hsooi2748 3 years ago
I think instead of calling it perpendicular, we can call it "orthogonal". Basically, it means a direction at a 90° angle into a new dimension, if you visualize a 2D space extending into 3D. If it is a 4D space, then it is extending into 5D...
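To make the linearization Yannic describes concrete: if a (hypothetical) decoder g: R^k -> R^d parametrizes the data manifold, its Jacobian at a latent point spans a first-order (tangent-plane) approximation, and "perpendicular" just means that most of a perturbation's norm lies outside that span. A minimal sketch, assuming d >= k and with g, z0, and delta as placeholder inputs:

```python
# How much of a perturbation lies off the linearized tangent plane at g(z0).
import torch

def off_tangent_fraction(g, z0, delta):
    """Fraction of delta's norm outside the tangent plane of g at z0."""
    J = torch.autograd.functional.jacobian(g, z0)   # (d, k): columns span the tangent space
    Q, _ = torch.linalg.qr(J)                       # orthonormal basis of that span
    tangent_part = Q @ (Q.T @ delta)                # projection onto the tangent plane
    return ((delta - tangent_part).norm() / delta.norm()).item()
```

A value near 1 corresponds to "perpendicular to the manifold" in the sense discussed above; a value near 0 means the perturbation stays along it.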
@sayakpaul3152 3 years ago
37:35: Totally agree with the argument on Kernelized SVMs, especially if you go to higher dimensions. Not only is that backed by good theory, it also ties into the implicit bias of SGD, i.e. the idea that continuing training will converge to an SVM solution.
@stacksmasherninja7266 3 years ago
Kinda unsure how this model scales to the multiclass setting. Moreover, how do the dimples explain targeted adversarial examples? You can make the classifier classify a "cat" image as literally any class in the dataset (by decreasing the loss w.r.t. the new target class instead of increasing the loss w.r.t. the original target class) using PGD. Any idea?
@YannicKilcher 3 years ago
Yes, the model would somehow have to explain which "side" the dimples go around the data in high-dimensional space.
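For reference, the targeted attack described in the question above is standard PGD with the sign of the step flipped: descend the loss toward the chosen target class rather than ascend it for the true class. A minimal sketch, where model, eps, alpha, and steps are arbitrary placeholders:

```python
# Targeted PGD sketch (L_inf ball, pixel values assumed in [0, 1]).
import torch
import torch.nn.functional as F

def targeted_pgd(model, x, target_class, eps=0.03, alpha=0.007, steps=40):
    x_orig = x.detach()
    x_adv = x_orig.clone()
    target = torch.full((x.shape[0],), target_class, dtype=torch.long)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target)
        grad, = torch.autograd.grad(loss, x_adv)
        # Step *down* the loss toward the target class (note the minus sign).
        x_adv = x_adv.detach() - alpha * grad.sign()
        x_adv = x_orig + (x_adv - x_orig).clamp(-eps, eps)   # stay inside the L_inf ball
        x_adv = x_adv.clamp(0, 1)
    return x_adv
```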
@ukhu_pacha 3 years ago
Squishy fippy stretchy feature model, say that 100 times.
@AhmedNaguibVideos 3 years ago
36:36 omg that’s right, how did I not see that until that point!
@ericadar 3 years ago
Why should it be the case that with a sufficiently diverse distribution of training examples and sufficiently complex function (NN) we don’t see convergence on a feature space via gradient descent with k dimensions (k
@YannicKilcher 3 years ago
I think the fact that our visual system and these models don't use exactly the same architecture means there will always be features picked up by one but not the other.
@WhatsAI 3 years ago
Amazing video as always Yannic, thank you for sharing and explaining so clearly!
@jsunrae 3 years ago
Ever thought of doing a follow-up with the researchers?
@Addoagrucu 3 years ago
Hey Yannic, could you try the experiment at the end with more randomly generated manifolds, maybe even with more adversarial attacks on your part to see if what you claim isn't possibly an artifact as well?
@YannicKilcher 3 years ago
Code is linked in the description, have at it 😁
@eugenioshi 3 years ago
does anyone know which app he's using to do these annotations?
@victorrielly4588 3 years ago
Very good tests. This is what research needs: people stepping up to demonstrate that most research is bogus. Not that this paper is bogus; it makes reasonable claims, but like with all machine learning papers, the support for the claims is far poorer than the authors claim. We should move to a publication system where reviews of papers and reproductions of results are at least as valuable, if not more valuable, and more published than original works.

I have an idea of how one might go about validating their claim. Assuming we are working in a binary classification framework, the main claim of their paper to test is that the decision boundary lies close to the image manifold. They can use an autoencoder to estimate the image manifold. One can also sample from the manifold created by the classification boundaries by selecting images with the constraint that, when passing the image through the classifier, the output is close to .5 (halfway between the two classifications). Determining whether the two manifolds are similar is then a simple matter of determining whether, for any "reasonable" sample from the image manifold generated from the autoencoder, there is a sample close to the decision boundary that is also close to this sample, and vice versa. Essentially, we can characterize both manifolds; if there is also an effective way to determine the difference between the manifolds, you will have your answer.

My suggestions would be: first, instead of using real data, use a toy dataset generated from a predefined and known manifold. This will remove the need for training an autoencoder. You can estimate the difference between two manifolds by sampling from one and finding the closest point in the other to your sample, and then sampling from the other and finding the closest point to the first from this sample as well. Do this a bunch of times and this will be something like a least-squares distance between the manifolds.

Things to keep in mind: the manifold created by the decision boundary will be 2-d while the actual image manifold may be any dimension. In their example, the manifold learned by the auto-encoder is k-d
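A rough sketch of the validation procedure proposed above, under the suggested toy setting: boundary points are found by keeping candidates whose predicted probability is near 0.5 (assuming a single-logit binary classifier), and the two point sets are compared with a symmetric nearest-neighbour squared distance. Every input here is a placeholder:

```python
# Sketch of the proposed manifold comparison (illustrative only).
import torch

def boundary_samples(model, candidates, tol=0.05):
    """Keep candidate points the classifier scores within tol of 0.5."""
    with torch.no_grad():
        p = torch.sigmoid(model(candidates)).squeeze(-1)
    return candidates[(p - 0.5).abs() < tol]

def symmetric_nn_distance(a, b):
    """Mean squared nearest-neighbour distance, averaged in both directions."""
    d = torch.cdist(a, b)                                   # (|a|, |b|) pairwise distances
    return 0.5 * (d.min(dim=1).values.pow(2).mean() +
                  d.min(dim=0).values.pow(2).mean())
```

With a known toy manifold, the first point set can be sampled directly from that manifold instead of from a trained autoencoder, as suggested in the comment.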
@victorrielly4588 3 years ago
I’m sorry, I made a mistake, the manifold defined by the decision boundary is something like d-1 dimensional, because the input is d-dimensional, and there is one constraint on the output. For example, if the model is linear, the input is d-dimensional, and the constraint is x^Tw = c, that manifold is a d-1 dimensional hyperplane. On the other hand, if the problem was a k-class problem, the decision boundary would be something like a d-k+1 manifold?
@victorrielly4588 3 years ago
In the k-class case the decision manifold would be defined as the set of inputs that give the same score to each class (if the last layer is a softmax, all outputs would be 1/k). An arbitrarily small perturbation of such an input could then be made to send the output to any desired class. That might answer your question of how these authors expect their results to hold for multi-class classification problems.
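Written out, the dimension count from the two corrections above (assuming the constraints are independent, i.e. generic position):

```latex
% Binary case: one scalar constraint on a d-dimensional input.
\dim\{\, x \in \mathbb{R}^d : x^\top w = c \,\} = d - 1
% k-class case: requiring equal scores f_1(x) = \dots = f_k(x)
% imposes k-1 independent constraints.
\dim\{\, x \in \mathbb{R}^d : f_1(x) = f_2(x) = \dots = f_k(x) \,\} = d - (k-1) = d - k + 1
```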
@eelcohoogendoorn8044 3 years ago
Why can't I find any Google hits on the intersection of SAM and adversarial examples yet? Must be because I suck at search, because it seems like something pretty obvious to investigate.
@DistortedV12 3 years ago
Does this method explain the "adversarial examples are features, not bugs" finding from Aleksander Madry, where they trained the classifier on the adversarial examples and it still had good generalization performance?
@senli6842 3 years ago
they claim they did, but I don't think they did
@sacramentofwilderness6656 3 years ago
Thanks a lot for this video and the very deep and thoughtful discussion! I have a question concerning adversarial examples: we start from an image, say, of a cat, and after moving some distance our classifier says it is guacamole. But what happens if one moves further in this direction? Does the model become more and more confident that it is guacamole, or can some other class appear, like an apple or a helicopter, so that moving far along this direction produces weird changes of classes, even though from the point of view of human perception nothing meaningful is depicted? Could it be the case that far from the data manifold one has a lot of sharp and rapidly changing decision boundaries?
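One way to probe the question above empirically is simply to keep stepping along a fixed adversarial direction and record the predicted class and confidence at each step. A minimal sketch, where model, x (a single input), and direction are placeholders:

```python
# Walk along a fixed direction and log (distance, predicted class, confidence).
import torch
import torch.nn.functional as F

def walk_along(model, x, direction, step=0.01, n_steps=200):
    direction = direction / direction.norm()
    history = []
    with torch.no_grad():
        for i in range(n_steps):
            probs = F.softmax(model(x + i * step * direction), dim=-1)
            conf, cls = probs.max(dim=-1)
            history.append((i * step, cls.item(), conf.item()))
    return history
```

Plotting the class and confidence over the walk would show whether the prediction stays "guacamole" with growing confidence or flips through other classes far from the data.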
@minecraftermad 3 years ago
I really don't get why they make this so complicated. Isn't it just that there are through-lines through the neural net that have a little bit too much strength, and when you slightly increase the color on those through-lines it affects the end way more than you'd expect, because they stack up? As to how to fix this... now that's a more difficult question... maybe try cutting up the neural net sideways in the middle. 47:00 and this would make sense in how I think about it, because it would directly not favor the through-lines for those images, killing them.
@DamianReloaded 3 years ago
I kinda intuit that adversarial attacks have more to do with how CNNs are built than with the data distribution. Because when a CNN outputs a high probability that it's "seeing" a bird, what it's really saying is "the feature detectors that make this value high have been activated". So in reality, as long as you can create a feature-detector activator, you can produce an adversarial attack. It's not about classes or similarity between classes. CNNs probably "know" nothing about classes (as we do). They aren't "deep"/robust enough. They're just sensitive to pixel values. EDIT: Monochromatic images should be more resilient to adversarial attacks. Maybe monochrome filters should have greater weight for the final classification (of color images). EDIT: I meant B&W as in 1-bit images or posterization.
@004307ec 3 years ago
To me, it just looks like a classic SVM plot. The hyperplane can be presented as a curve that sometimes splits highly similar sample points into two groups.
@MrJaggy123 3 years ago
TL;DR: the authors think that the old way of thinking, x, is deeply flawed. Yannic points out "nobody thinks x".
@bublylybub8743 3 years ago
I swear I have seen papers talking about this perspective back in 2019.
@senli6842 3 years ago
I think the proposed dimpled-manifold explanation is somewhat different from the existing explanation. Of course, such discussion is helpful, but I really can't understand why the decision boundary tends to run parallel to the data manifold and puts training data in small dimples, which does not make sense, as models with many small dimples would generalize poorly to unseen data. Besides, why are these dimples similar enough in different models that adversarial examples transfer across models?
@BooleanDisorder 10 months ago
Huh, might explain why some image generators had such a hard time creating a pink Labrador for me!
@psxz1 3 years ago
Maybe a random manifold makes more sense in general, since it's supposed to be noise anyway, from what little I know of the basics of GANs.
@G12GilbertProduction 3 years ago
As 3000 dimensions it adversarial sampling data got? How would you calculated for your mindfeed, Yannic? I bet it is more than 10³³. :D
@paxdriver 3 years ago
You're such an entertaining character.
@swordwaker7749 3 years ago
Suggestion: training ASMR where you watch model TRAIN and see loss slowly go down.
@scottmiller2591 3 years ago
Maybe SVM margin maximization had a point?
@swordwaker7749 3 years ago
Israelis are also into machine learning? Nice to see papers from all over the world.
@drorsimon 3 years ago
Actually, Israel is ranked in the top 10 countries in AI and deep learning when considering the number of publications in NeurIPS. chuvpilo.medium.com/ai-research-rankings-2019-insights-from-neurips-and-icml-leading-ai-conferences-ee6953152c1a
@swordwaker7749 3 years ago
@@drorsimon Interesting... Somehow, Russia ranks at number 11 despite the president himself showing signs of support.
@oleksiinag3150 3 years ago
It looks like you were a reviewer for this paper, and they pissed you off.
@sieyk 3 years ago
Doesn't the adversarial dataset problem get explained by simply realising that the adversarial noise for the cat was tailored for the original network, therefore the network that classifies that adversarial cat as a dog would require weights that classify the original cat as a cat? I would assume that (very!) minor alterations to the model architecture would make this adversarial-only training fail.
@vslaykovsky 3 years ago
Ok, meet simple dimple in machine learning!
@sieyk 3 years ago
I think a big misconception is that NNs encode information in a sensible way; actually, all information in the input is used, with each colour channel considered separately. It just so happens that derailing a trained network requires modifying specific pixels in specific channels such that the target network activates certain kernels (possibly otherwise entirely dormant ones) to produce a kernel activation pattern that activates the FC layers in such a way (perhaps again exploiting dormant neurons) that the final layer activates the way the adversarial network wanted. This may sound complicated, but it's really straightforward. NNs don't necessarily logically group samples together or even know samples are related; they just train to reduce error (obviously, this doesn't apply to tasks that specifically seek to solve that problem). This paper seems to believe that, somehow, a trained network will _understand_ that all blue samples are encoded "up" and all red ones "down", but this doesn't generally happen in practice. This also plays into the underspecification of training data.
@Addoagrucu 3 years ago
non natural pseudo guacamole is my rapper name
@sieyk 3 years ago
What? Why do they think the simple example shows a 2D set separated by a curved 1D line? 1D lines _cannot_ be curved; it is an intersection of a high-dimensional plane defined by each neuron being a 1D line (in a 2D space) on its own axis, where the neurons all share a common y-intercept, since a neuron activation is just y = mx + b, where the b comes from the bias of the _next_ neuron. You can't just _have_ a curved 3D sheet; the maximal curvature of the separation plane is directly related to the number of neurons, which is precisely why it's a straight plane in high dimensions.