Euclidean distance and the Mahalanobis distance (and the error ellipse)

Views: 38,979

TileStats

Comments: 66

@tilestats 3 years ago

Note that the covariance matrix shown at 6:10 should be [0.724 0.687; 0.687 1.046] (rows separated by a semicolon) for more accurate calculations.

@azibatorbanigo4043 2 years ago

How did you compute the covariance matrix from the green data points?

@youngzproduction7498 3 years ago

I love the way you explicitly explain every step of the calculations. It helps me, someone who is not a math expert, understand the concept with ease. Thanks.

@tilestats 3 years ago

Great!

@alecmunnur5918 3 years ago

That was a heck of a good explanation. Thanks very much 👍

@tilestats 3 years ago

Thank you!

@startupeco2257 1 year ago

Very well explained! Even for a non-mathematician.

@szymonk.7237 3 years ago

So clearly explained! 😮 Thank you for it ❤️

@tilestats 3 years ago

Thank you!

@kyle9697 6 months ago

Very comprehensive explanation. Thank you.

@merythegirl 2 years ago

This video helped a lot, thank you for this!

@forrestoakley4882 2 years ago

Thank you! Very clear explanation

@tilestats 3 years ago

I got this comment: "Are you sure the inverse of the covariance matrix is correct? This is what I get when I put it into symbolab. [4.1 -2.82 -2.82 2.95]." The difference arises because the covariance matrix shown in the video has been rounded. This is the covariance matrix with more decimals:

       x          y
x  0.7241053  0.6869474
y  0.6869474  1.0462105
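
To see how much the rounding matters, here is a minimal sketch that inverts both versions of the matrix. It assumes the rounded matrix was [0.72 0.69; 0.69 1.00], the values typed into R in another comment in this thread; the helper name `inverse_2x2` is made up for this illustration.

```python
def inverse_2x2(m):
    """Invert a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


# Assumed rounded matrix (taken from the R session quoted elsewhere in the thread)
rounded = [[0.72, 0.69], [0.69, 1.00]]
# Matrix with more decimals, as given in the comment above
precise = [[0.7241053, 0.6869474], [0.6869474, 1.0462105]]

inv_rounded = inverse_2x2(rounded)
inv_precise = inverse_2x2(precise)

for name, inv in (("rounded", inv_rounded), ("precise", inv_precise)):
    print(name, [[round(v, 3) for v in row] for row in inv])
```

The determinant of this matrix is small (roughly 0.24 to 0.29), so small changes in the entries are amplified when dividing by it.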

@compsci91 3 years ago

Got it! Thank you for clearing that up!

@tabyonyt8091 2 years ago

This was enlightening, thanks a lot.

@lba7238 1 year ago

Excellent video. I'm currently studying how to break up a single model into sub-models, and I'm trying to use the Mahalanobis distance.

@guidenote771 3 years ago

Thank you sir for another great video!

@tilestats 3 years ago

Thank you!

@ricardpunsola 2 years ago

Very helpful, thanks 👍🏻

@ya00278 3 years ago

Super clear. Thank you!!

@tilestats 3 years ago

Thank you!

@amankushwaha8927 3 years ago

Thanks. It was really informative

@tilestats 3 years ago

Thank you!

@Nada-yc8uo 3 years ago

Thank you sir

@Unaimend 1 year ago

Hi Andreas, could you explain why I should expect a chi-square distribution at 8:26? As always, a nice video :)

@tilestats 1 year ago

If you square values drawn from a standard normal distribution, the squared values follow a chi-square distribution with 1 degree of freedom. So calculations that involve squaring normally distributed quantities usually lead us to the chi-square distribution.
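
A quick simulation can illustrate this. The sketch below is only a sanity check under stated assumptions: it draws standard normal values with Python's `random.gauss`, squares them, and compares the results with two well-known chi-square facts (mean 1 and 95th percentile 3.841 for 1 df; 95th percentile 5.991 for 2 df, which is the case relevant to a two-dimensional Mahalanobis distance). The sample size and seed are arbitrary.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
n = 100_000

# Squared standard normal values: should behave like chi-square with 1 df
squared = [random.gauss(0.0, 1.0) ** 2 for _ in range(n)]
mean_sq = sum(squared) / n
# 3.841 is the 95th percentile of chi-square with 1 df (1.96 squared)
frac_above_1df = sum(v > 3.841 for v in squared) / n

# Sum of two independent squared values: should behave like chi-square with 2 df
pairs = [a + b for a, b in zip(squared[::2], squared[1::2])]
# 5.991 is the 95th percentile of chi-square with 2 df
frac_above_2df = sum(v > 5.991 for v in pairs) / len(pairs)

print(mean_sq, frac_above_1df, frac_above_2df)
```

The mean should come out close to 1 and both fractions close to 0.05.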

@Unaimend 1 year ago

@tilestats Thanks for the explanation

@wagon19 2 years ago

Can you tell me how you built the ellipse? Preferably in Scilab.

@tilestats 2 years ago

I answered a similar question below. Hope that helps.

@Jonathan_wow 3 years ago

How did you arrive at the critical value 13.82 at 9:50 in the video if the cutoff is 0.001? Could you kindly explain?

@tilestats 3 years ago

If you want a cutoff of 0.001, you should take the corresponding value from a chi-square distribution, that is, the value that cuts off 0.001 of the upper tail. In this example, the area to the right of 13.82 in a chi-square distribution with 2 degrees of freedom is 0.001. Use software or a chi-square table to get this value. The cutoff 0.001 is an arbitrary, but common, value for detecting outliers.
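
For the special case of 2 degrees of freedom, the value can also be checked by hand, because the upper-tail area of that chi-square distribution has the closed form exp(-x/2). The sketch below uses only this fact; the function names are made up for the illustration, and for other degrees of freedom you would need a table or a statistics package (in R, for example, `qchisq(0.001, df = 2, lower.tail = FALSE)`).

```python
import math


def chi2_upper_tail_2df(x):
    """P(X > x) for a chi-square distribution with exactly 2 degrees of freedom."""
    return math.exp(-x / 2.0)


def chi2_critical_2df(alpha):
    """Value whose upper-tail area is alpha (2 degrees of freedom only)."""
    return -2.0 * math.log(alpha)


print(round(chi2_critical_2df(0.001), 2))    # 13.82
print(round(chi2_upper_tail_2df(13.82), 4))  # 0.001
```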

@TM-vg4mx 3 years ago

great video, thanks

@tilestats 3 years ago

Thank you!

@mathematicswithmushtaqkhan8647 4 months ago

Excellent

@yd3130 1 year ago

Is it the centroid that has to be computed, or the mean? I think they aren't always the same, right?

@tilestats 1 year ago

I would say the overall mean in the multivariate space. As you point out, a centroid might have different meanings in different fields.

@tone5875 3 years ago

Hi, can you elaborate more on generating the 95% error ellipse? Do we use a random number generator with a normal distribution to create it? Is there a simple example of generating random numbers with an intended distribution? A long time ago I read, in connection with Monte Carlo methods, that we can use the Cholesky decomposition to create data from a correlation matrix. I'm curious to know the mechanics behind this.

@tilestats 3 years ago

You simply draw the ellipse based on the eigenvectors and eigenvalues of the covariance matrix. I used the package ellipse in R to draw the ellipse, but if you would like to know the details, I suggest this page: www.visiondummy.com/2014/04/draw-error-ellipse-representing-covariance-matrix/#google_vignette
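
As a rough illustration of the eigenvector and eigenvalue approach (not the code used for the video, which relied on the R package ellipse), the sketch below generates points on a 95% ellipse for a 2x2 covariance matrix. It assumes the covariance values and the centroid (3.1, 3.0) quoted elsewhere in this thread, and it scales the axes by the chi-square value for 2 df; the function name `ellipse_points` is hypothetical.

```python
import math


def ellipse_points(center, cov, scale, n=100):
    """Points on the ellipse where the squared Mahalanobis distance equals `scale`."""
    (a, b), (_, d) = cov  # symmetric 2x2 covariance matrix
    half_trace = (a + d) / 2.0
    spread = math.hypot((a - d) / 2.0, b)
    lam1, lam2 = half_trace + spread, half_trace - spread  # eigenvalues, largest first
    theta = 0.5 * math.atan2(2.0 * b, a - d)  # angle of the first eigenvector
    r1, r2 = math.sqrt(scale * lam1), math.sqrt(scale * lam2)  # semi-axis lengths
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    pts = []
    for k in range(n):
        t = 2.0 * math.pi * k / n
        u, v = r1 * math.cos(t), r2 * math.sin(t)  # point on an axis-aligned ellipse
        # rotate into the eigenvector directions and shift to the center
        pts.append((center[0] + u * cos_t - v * sin_t,
                    center[1] + u * sin_t + v * cos_t))
    return pts


cov = [[0.7241053, 0.6869474], [0.6869474, 1.0462105]]
center = (3.1, 3.0)
scale = -2.0 * math.log(0.05)  # chi-square value with 5% upper tail, 2 df (about 5.991)
points = ellipse_points(center, cov, scale)
print(points[:3])
```

The resulting list of (x, y) pairs can be passed to any plotting tool as a closed polygon.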

@tone5875 3 years ago

@tilestats Thanks a lot.

@shivamsharma6255 1 year ago

Really enjoyed it, brother.

@lorenzotagliari6699 7 months ago

I did not understand why the cutoff of 0.001 would not be appropriate when we have many data points. Could you clear this up for me?

@tilestats 7 months ago

Because 0.1% of the data points will fall outside the ellipse by chance alone. If, for example, you have 1 million data points, you should expect about 1,000 of them to be outside the ellipse, right? It would then not be appropriate to define all of these as outliers.
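
The arithmetic behind this can be written out for a few sample sizes. This is only a sketch, assuming the data really follow a bivariate normal distribution so that the 0.001 tail probability applies; the function name is hypothetical.

```python
def expected_outside(n_points, alpha=0.001):
    """Expected number of points beyond the cutoff purely by chance."""
    return n_points * alpha


for n in (100, 10_000, 1_000_000):
    print(n, expected_outside(n))
```

With 100 points you expect far fewer than one point outside the ellipse by chance, so a point beyond the cutoff is noteworthy; with a million points, about a thousand such points are simply what the distribution predicts.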

@cmindaaa 3 years ago

How do you get 6.45 as the MD for point 2? When I calculate it using the same method as for point 1, I get the same MD as for point 1.

@tilestats 3 years ago

Go to minute 6:32, and replace vector [5 5] by [5 1] for data point 2. Try and do the math again and let me know if it works.

@cmindaaa 3 years ago

@tilestats Yep, I have tried and I still do not get it. My working: [1.9 -2] * matrix * [1.9 -2]. Eventually, I get sqrt(5.080360804). I took 5 - 3.1 = 1.9 and 1 - 3 = -2.

@tilestats 3 years ago

@cmindaaa If you multiply the row vector [1.9 -2] by the matrix, you should get the row vector [11.83 -9.56]. If you multiply this row vector by the column vector [1.9 -2.0], you should get the number 41.597. The square root of this number is about 6.45.
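
The same two multiplication steps can be sketched in code. This assumes the centroid (3.1, 3.0) used in the exchange above and the covariance matrix with more decimals given in another comment; because of rounding, the intermediate values may differ slightly from the ones quoted, but the distance for point 2 comes out at about 6.45. The function name `mahalanobis_2d` is made up for this example.

```python
import math


def mahalanobis_2d(point, center, cov):
    """Mahalanobis distance of a 2D point from `center`, given a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]  # inverse covariance matrix
    dx, dy = point[0] - center[0], point[1] - center[1]
    # Step 1: row vector [dx dy] times the inverse covariance matrix
    row = (dx * inv[0][0] + dy * inv[1][0], dx * inv[0][1] + dy * inv[1][1])
    # Step 2: resulting row vector times the column vector [dx dy], then the square root
    return math.sqrt(row[0] * dx + row[1] * dy)


center = (3.1, 3.0)
cov = [[0.7241053, 0.6869474], [0.6869474, 1.0462105]]

md_point1 = mahalanobis_2d((5, 5), center, cov)
md_point2 = mahalanobis_2d((5, 1), center, cov)
print(round(md_point1, 2), round(md_point2, 2))
```

Note that flipping the sign of only one component of the difference vector changes the cross term, which is why point 2 ends up much farther away than point 1 even though both are the same Euclidean distance from the centroid.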

@cmindaaa 3 years ago

@tilestats OMG, I got it! Thank you so much!!

@jacksonchen8679 2 years ago

Thank you

@tilestats 2 years ago

Thank you!

@rambisneves2077 3 years ago

Hi Tile, could you share these points in an Excel file?

@tilestats 3 years ago

I do not have the original data since it was randomly generated. However, the data below should work to reproduce the calculations:

x=[4.6, 4.4, 3.9, 3.9, 3.8, 3.5, 3.8, 3.4, 3.0, 2.7, 3.7, 3.0, 2.5, 2.2, 2.9, 2.5, 2.3, 2.1, 2.1, 1.5]
y=[4.6, 4.1, 4.5, 3.9, 3.5, 4.0, 3.3, 3.2, 3.7, 3.5, 2.1, 2.7, 3.1, 3.2, 2.3, 2.0, 2.3, 1.8, 1.4, 1.0]
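
For readers who asked how the covariance matrix is obtained from the data points, here is a minimal sketch using the values above. It uses the sample covariance (dividing by n - 1), which is an assumption on my part but reproduces the matrix with more decimals quoted in another comment; the helper names are made up.

```python
x = [4.6, 4.4, 3.9, 3.9, 3.8, 3.5, 3.8, 3.4, 3.0, 2.7,
     3.7, 3.0, 2.5, 2.2, 2.9, 2.5, 2.3, 2.1, 2.1, 1.5]
y = [4.6, 4.1, 4.5, 3.9, 3.5, 4.0, 3.3, 3.2, 3.7, 3.5,
     2.1, 2.7, 3.1, 3.2, 2.3, 2.0, 2.3, 1.8, 1.4, 1.0]


def mean(values):
    return sum(values) / len(values)


def sample_cov(u, v):
    """Sample covariance of two equally long lists (divides by n - 1)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


# The variances go on the diagonal, the covariance off the diagonal
cov = [[sample_cov(x, x), sample_cov(x, y)],
       [sample_cov(y, x), sample_cov(y, y)]]

print("means:", round(mean(x), 2), round(mean(y), 2))
for row in cov:
    print([round(v, 7) for v in row])
```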

@rambisneves2077 3 years ago

@tilestats Thanks. What do you think about drawing the ellipse in the Excel file?

@MrTOCSY 3 years ago

Is it correct to calculate the error ellipse for the autoscaled data for PCA calculation?

@tilestats 3 years ago

Not sure I understand. It would be OK to calculate the error ellipse based on the scores in 2D (if that is what you mean).

@MrTOCSY 3 years ago

@tilestats Yes, I meant the 2D score plot. "It would be OK to calculate the error ellipse based on the scores in 2D." But why? The data were previously autoscaled, i.e. divided by the standard deviation. Is it correct to calculate the error ellipse for the scores, since scores and autoscaled data are DIFFERENT in nature?

@MrTOCSY 3 years ago

@tilestats Sorry for bothering you, but you explain things transparently and simply. A rare phenomenon when it comes to statistics )

@tilestats 3 years ago

Yes, since scaling does not affect the relative distances between the points. If you create an error ellipse for unscaled data and, for example, identify 2 points outside that ellipse, the same points will be outside the ellipse after you scale the data, provided of course that you calculate the ellipse on the scaled data. Try this on a simple data set; it will help you understand.
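
One way to try this on a simple data set is sketched below. It reuses the example x and y values posted elsewhere in this thread, computes the Mahalanobis distance of every point before and after autoscaling (subtracting the mean and dividing by the standard deviation), and compares the two sets of distances. The helper names are hypothetical.

```python
import math

x = [4.6, 4.4, 3.9, 3.9, 3.8, 3.5, 3.8, 3.4, 3.0, 2.7,
     3.7, 3.0, 2.5, 2.2, 2.9, 2.5, 2.3, 2.1, 2.1, 1.5]
y = [4.6, 4.1, 4.5, 3.9, 3.5, 4.0, 3.3, 3.2, 3.7, 3.5,
     2.1, 2.7, 3.1, 3.2, 2.3, 2.0, 2.3, 1.8, 1.4, 1.0]


def mean(values):
    return sum(values) / len(values)


def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)


def standardize(values):
    m, s = mean(values), math.sqrt(cov(values, values))
    return [(e - m) / s for e in values]


def mahalanobis_all(xs, ys):
    """Mahalanobis distance of each point from the mean of its own data set."""
    mx, my = mean(xs), mean(ys)
    a, b, d = cov(xs, xs), cov(xs, ys), cov(ys, ys)
    det = a * d - b * b
    out = []
    for px, py in zip(xs, ys):
        dx, dy = px - mx, py - my
        # quadratic form with the inverse of [[a, b], [b, d]]
        out.append(math.sqrt((d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det))
    return out


md_raw = mahalanobis_all(x, y)
md_scaled = mahalanobis_all(standardize(x), standardize(y))
print(max(abs(r - s) for r, s in zip(md_raw, md_scaled)))
```

The printed maximum difference should be zero up to floating-point error, which means any cutoff on the distance flags the same points in both versions of the data.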

@MrTOCSY 3 years ago

Is it correct to calculate the MD using the correlation matrix instead of the covariance matrix?

@tilestats 3 years ago

No, you will then not get the correct value, unless you have standardized data, in which case the covariance and correlation matrices are identical. Have a look at my video about this: kzbin.info/www/bejne/aJPGnp6iq9eLirM
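
The claim about standardized data can be checked numerically. The sketch below reuses the example data posted elsewhere in this thread, computes the correlation coefficient of the raw data, and then computes the covariance matrix of the standardized data; the helper names are made up for the illustration.

```python
import math

x = [4.6, 4.4, 3.9, 3.9, 3.8, 3.5, 3.8, 3.4, 3.0, 2.7,
     3.7, 3.0, 2.5, 2.2, 2.9, 2.5, 2.3, 2.1, 2.1, 1.5]
y = [4.6, 4.1, 4.5, 3.9, 3.5, 4.0, 3.3, 3.2, 3.7, 3.5,
     2.1, 2.7, 3.1, 3.2, 2.3, 2.0, 2.3, 1.8, 1.4, 1.0]


def mean(values):
    return sum(values) / len(values)


def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)


def standardize(values):
    m, s = mean(values), math.sqrt(cov(values, values))
    return [(e - m) / s for e in values]


# Correlation coefficient of the raw data
r = cov(x, y) / math.sqrt(cov(x, x) * cov(y, y))

# Covariance matrix of the standardized (autoscaled) data
zx, zy = standardize(x), standardize(y)
cov_z = [[cov(zx, zx), cov(zx, zy)],
         [cov(zy, zx), cov(zy, zy)]]

print(round(r, 4))
for row in cov_z:
    print([round(v, 4) for v in row])
```

The covariance matrix of the standardized data has ones on the diagonal and r off the diagonal, which is exactly the correlation matrix.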

@MrTOCSY 3 years ago

The data are autoscaled. Numerical values of elements of correlation matrix and covariance matrix are equal.

@MrTOCSY 3 years ago

And one more question, if I may. If we want to find an outlier on a 2D score plot of principal components, should we use the covariance matrix of the SCORES?

@tilestats 3 years ago

Yes, but note that PC1 and PC2 are uncorrelated.
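
Because the scores are uncorrelated, their covariance matrix is diagonal, and the Mahalanobis distance reduces to a sum of squared scores, each divided by its own variance. The sketch below illustrates this with a hand-rolled two-variable PCA on the example data posted elsewhere in this thread. For brevity it runs on the unscaled data, which is an assumption of this example; the same property holds for scores from autoscaled data.

```python
import math

x = [4.6, 4.4, 3.9, 3.9, 3.8, 3.5, 3.8, 3.4, 3.0, 2.7,
     3.7, 3.0, 2.5, 2.2, 2.9, 2.5, 2.3, 2.1, 2.1, 1.5]
y = [4.6, 4.1, 4.5, 3.9, 3.5, 4.0, 3.3, 3.2, 3.7, 3.5,
     2.1, 2.7, 3.1, 3.2, 2.3, 2.0, 2.3, 1.8, 1.4, 1.0]


def mean(values):
    return sum(values) / len(values)


def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)


mx, my = mean(x), mean(y)
a, b, d = cov(x, x), cov(x, y), cov(y, y)

# Direction of PC1 for a symmetric 2x2 covariance matrix; PC2 is perpendicular to it
theta = 0.5 * math.atan2(2.0 * b, a - d)
c, s = math.cos(theta), math.sin(theta)

# Scores: centered data projected onto PC1 and PC2
pc1 = [(px - mx) * c + (py - my) * s for px, py in zip(x, y)]
pc2 = [-(px - mx) * s + (py - my) * c for px, py in zip(x, y)]

cov_scores = cov(pc1, pc2)            # covariance between the two score vectors
var1, var2 = cov(pc1, pc1), cov(pc2, pc2)

# Mahalanobis distance of the first point, computed from the scores ...
md_from_scores = math.sqrt(pc1[0] ** 2 / var1 + pc2[0] ** 2 / var2)
# ... and from the original variables with the full covariance matrix
dx, dy = x[0] - mx, y[0] - my
md_from_xy = math.sqrt((d * dx * dx - 2 * b * dx * dy + a * dy * dy) / (a * d - b * b))

print(cov_scores, md_from_scores, md_from_xy)
```

The covariance between the scores should be zero up to floating-point error, and the two distances should agree.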

@juhoke 2 years ago

I wish I had seen this video during my clustering methods course. I had to drop it because I did not understand, for example, the meaning of centroids.

@tilestats 2 years ago

I have two vids on clustering if you like to catch up: kzbin.info/www/bejne/q4jJkJKBfrCthrM kzbin.info/www/bejne/anbCdXmDqZtjqMU

@eyupyondem4818 2 years ago

Hi sir, this is a really nice and clear explanation. However, the covariance matrix inversion may be incorrect, since I get a different result when I compute it in R:

> X
     [,1] [,2]
[1,] 0.72 0.69
[2,] 0.69 1.00
> solve(X)
          [,1]      [,2]
[1,]  4.100041 -2.829028
[2,] -2.829028  2.952030
> X %*% solve(X)
     [,1] [,2]
[1,]    1    0
[2,]    0    1

@tilestats 2 years ago

That is because I show rounded values in the covariance matrix. In the first comment below the video, I show the covariance matrix with more decimals.