Note that the covariance matrix shown at 6:10 should be [0.724 0.687 0.687 1.046] for more accurate calculations.
@azibatorbanigo40432 жыл бұрын
How did you compute the covariance matrix from the green data points?
@youngzproduction74983 жыл бұрын
I love the way you explicitly explain every step of the calculations. As someone who is not a math expert, it helps me understand the concept with ease. Thanks.
@tilestats3 жыл бұрын
Great!
@alecmunnur59183 жыл бұрын
That was a heck of a good explanation. Thanks very much👍
@tilestats3 жыл бұрын
Thank you!
@startupeco2257 Жыл бұрын
Very well explained! Even for a non-mathematician.
@szymonk.72373 жыл бұрын
So clearly explained ! 😮 Thank you for it ❤️
@tilestats3 жыл бұрын
Thank you!
@kyle96976 ай бұрын
Very comprehensive explanation. Thank you.
@merythegirl2 жыл бұрын
This video helped a lot, thank you for this!
@forrestoakley48822 жыл бұрын
Thank you! Very clear explanation
@tilestats3 жыл бұрын
I got this comment: "Are you sure the inverse of the covariance matrix is correct? This is what I get when I put it into symbolab. [4.1 -2.82 -2.82 2.95]." The difference arises because the covariance matrix shown in the video has been rounded. This is the covariance matrix with more decimals:
          x         y
x 0.7241053 0.6869474
y 0.6869474 1.0462105
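The effect of the rounding can be checked with a minimal sketch in plain Python. The helper name `inv2x2` is hypothetical, and the rounded entries 0.72, 0.69 and 1.00 are an assumption: they are the values that reproduce the quoted result.

```python
def inv2x2(m):
    """Invert a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

# Assumed rounded entries (they reproduce the quoted inverse).
rounded = [[0.72, 0.69], [0.69, 1.00]]
# Covariance matrix with more decimals, as given in the comment above.
precise = [[0.7241053, 0.6869474], [0.6869474, 1.0462105]]

inv_rounded = inv2x2(rounded)
inv_precise = inv2x2(precise)

for name, inv in (("rounded", inv_rounded), ("precise", inv_precise)):
    print(name, [[round(v, 3) for v in row] for row in inv])
```

The determinant of this matrix is small, so a change in the second decimal of the entries shifts the elements of the inverse noticeably.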
@compsci913 жыл бұрын
Got it! Thank you for clearing that up!
@tabyonyt80912 жыл бұрын
this was enlightening, thanks a lot
@lba7238 Жыл бұрын
Excellent video. I'm currently studying how to break up a single model into sub-models, and I'm trying to use the Mahalanobis distance for that.
@guidenote7713 жыл бұрын
Thank you sir for another great video!
@tilestats3 жыл бұрын
Thank you!
@ricardpunsola2 жыл бұрын
Very helpful, thanks 👍🏻
@ya002783 жыл бұрын
Super clear. Thank you!!
@tilestats3 жыл бұрын
Thank you!
@amankushwaha89273 жыл бұрын
Thanks. It was really informative
@tilestats3 жыл бұрын
Thank you!
@Nada-yc8uo3 жыл бұрын
Thank you sir
@Unaimend Жыл бұрын
Hi Andreas, could you explain why I should expect a chi-square distribution at 8:26? As always, a nice video :)
@tilestats Жыл бұрын
If you square values drawn from a standard normal distribution, the squared values follow a chi-square distribution with 1 df. So calculations that involve squaring normally distributed quantities usually lead us to the chi-square distribution.
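A small simulation can illustrate this. It is only a sketch: the seed and the sample size are arbitrary choices. It squares standard normal draws (1 df) and sums two independent squared draws (2 df, the case with two variables), then compares the sample means with the theoretical means, which equal the degrees of freedom.

```python
import math
import random

random.seed(1)   # arbitrary seed, used only for reproducibility
n = 100_000      # arbitrary sample size

# Squared standard normal values: chi-square with 1 df.
one_df = [random.gauss(0, 1) ** 2 for _ in range(n)]
# Sum of two independent squared values: chi-square with 2 df.
two_df = [random.gauss(0, 1) ** 2 + random.gauss(0, 1) ** 2 for _ in range(n)]

mean_one = sum(one_df) / n   # theoretical mean is 1
mean_two = sum(two_df) / n   # theoretical mean is 2

# For 2 df the upper tail area is exp(-x / 2), so the 5% critical value is:
crit_95 = -2.0 * math.log(0.05)
frac_above = sum(v > crit_95 for v in two_df) / n   # should be close to 0.05

print(round(mean_one, 2), round(mean_two, 2), round(frac_above, 3))
```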
@Unaimend Жыл бұрын
@@tilestats Thanks for the explanation
@wagon192 жыл бұрын
Can you tell me how you built the ellipse? Preferably in the program scilab
@tilestats2 жыл бұрын
I answered a similar question below. Hope that helps.
@Jonathan_wow3 жыл бұрын
How did you get the critical value 13.82 at 9:50 in the video if the cutoff is 0.001? Could you kindly explain it?
@tilestats3 жыл бұрын
If you want a cutoff of 0.001, you take the corresponding critical value from a chi-square distribution, that is, the value that cuts off 0.001 of the upper tail. In this example, the area to the right of 13.82 in a chi-square distribution with 2 degrees of freedom is 0.001. Use software or a chi-square table to get this value. The cutoff 0.001 is an arbitrary, but common, choice for detecting outliers.
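For the special case of 2 degrees of freedom, the critical value can be sketched without a table, because the upper tail area of that chi-square distribution is exp(-x/2). The function names below are hypothetical, and the closed form does not apply to other degrees of freedom.

```python
import math

def chi2_upper_tail_2df(x):
    """Area to the right of x in a chi-square distribution with 2 df."""
    return math.exp(-x / 2.0)

def chi2_critical_2df(alpha):
    """Value with upper tail area alpha (valid for 2 df only)."""
    return -2.0 * math.log(alpha)

crit = chi2_critical_2df(0.001)
print(round(crit, 2))                      # 13.82
print(round(chi2_upper_tail_2df(13.82), 4))  # 0.001
```

With more than two variables, a statistics package or a chi-square table is the safer route.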
@TM-vg4mx3 жыл бұрын
great video, thanks
@tilestats3 жыл бұрын
Thank you!
@mathematicswithmushtaqkhan86474 ай бұрын
Excellent
@yd3130 Жыл бұрын
Is it the centroid that has to be computed, or the mean? I think they aren't always the same, right?
@tilestats Жыл бұрын
I would say the overall mean in the multivariate space. As you point out, a centroid might have different meanings in different fields.
@tone58753 жыл бұрын
Hi, can you elaborate on generating the 95% error ellipse? Do we use a random number generator with a normal distribution to create it? Is there a simple example of generating random numbers with an intended distribution? I read a long time ago, in the context of Monte Carlo, that a Cholesky decomposition can be used to create data from a correlation matrix. Curious to know the mechanics behind them.
@tilestats3 жыл бұрын
You simply draw the ellipse based on the eigenvectors and eigenvalues of the covariance matrix. I used the R package ellipse to draw it, but if you would like to know the details, I suggest this page: www.visiondummy.com/2014/04/draw-error-ellipse-representing-covariance-matrix/#google_vignette
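The eigenvalue approach can be sketched in plain Python for the 2D case. This is not the code of the R package mentioned above. The function name `ellipse_points` is hypothetical, and the center (3.09, 3.01) is an assumption taken as the means of the example data posted elsewhere in this thread. Each half-axis is sqrt(critical value × eigenvalue), and the ellipse is rotated to the direction of the first eigenvector.

```python
import math

def ellipse_points(center, cov, level=0.95, n=100):
    """Return n points on the error ellipse of a 2x2 covariance matrix."""
    (a, b), (_, d) = cov
    crit = -2.0 * math.log(1.0 - level)   # chi-square quantile, 2 df only
    # Eigenvalues of the symmetric matrix [[a, b], [b, d]].
    half_trace = (a + d) / 2.0
    spread = math.hypot((a - d) / 2.0, b)
    lam1, lam2 = half_trace + spread, half_trace - spread
    # Angle of the eigenvector that belongs to the larger eigenvalue.
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    r1, r2 = math.sqrt(crit * lam1), math.sqrt(crit * lam2)
    cx, cy = center
    pts = []
    for k in range(n):
        t = 2.0 * math.pi * k / n
        u, v = r1 * math.cos(t), r2 * math.sin(t)   # axis-aligned ellipse
        pts.append((cx + u * math.cos(theta) - v * math.sin(theta),
                    cy + u * math.sin(theta) + v * math.cos(theta)))
    return pts

cov = [[0.7241053, 0.6869474], [0.6869474, 1.0462105]]
points = ellipse_points((3.09, 3.01), cov)   # assumed center
print(points[0])
```

Every point returned has the same squared Mahalanobis distance from the center, equal to the critical value, so the list can be passed to any plotting tool as a closed curve.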
@tone58753 жыл бұрын
@@tilestats thx a lot.
@shivamsharma6255 Жыл бұрын
Really enjoyed it, brother.
@lorenzotagliari66997 ай бұрын
I did not understand why the cutoff of 0.001 would not be appropriate in cases where we have many data points. Could you clear this up for me?
@tilestats7 ай бұрын
Because 0.1% of the data points are expected to fall outside the ellipse by chance alone. If you, for example, have 1 million data points, you should expect about 1000 of them to be outside the ellipse, right? It would then not be appropriate to define all of these as outliers.
@cmindaaa3 жыл бұрын
How do you get 6.45 as the MD for point 2? When I use the same method as for point 1, I get the same MD as for point 1.
@tilestats3 жыл бұрын
Go to minute 6:32, and replace vector [5 5] by [5 1] for data point 2. Try and do the math again and let me know if it works.
@cmindaaa3 жыл бұрын
@@tilestats Yeap, I have tried and I still did not get it. My workings: [1.9 -2] * matrix * [1.9 -2]. Eventually, I get sqrt(5.080360804). I took 5 - 3.1 = 1.9 and 1 -3 = -2
@tilestats3 жыл бұрын
@@cmindaaa If you multiply the row vector [1.9 -2 ] by the matrix, you should get the row vector [11.83 -9.56]. If you multiply this row vector by the column vector [1.9 -2.0], you should get the number 41.597. The square root of this number is about 6.45.
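The same steps can be sketched in plain Python. The function name `mahalanobis` is hypothetical, the mean (3.1, 3.0) is the rounded value used in the question above, and the covariance matrix is the more precise one from the pinned comment, so intermediate values may differ in the last digits from those quoted.

```python
import math

def mahalanobis(point, mean, cov):
    """Mahalanobis distance of a 2D point from the mean."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    inv = [[d / det, -b / det], [-b / det, a / det]]
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Row vector [dx dy] times the inverse covariance matrix.
    r0 = dx * inv[0][0] + dy * inv[1][0]
    r1 = dx * inv[0][1] + dy * inv[1][1]
    # Resulting row vector times the column vector [dx dy].
    return math.sqrt(r0 * dx + r1 * dy)

mean = (3.1, 3.0)   # rounded mean, as used in the question above
cov = [[0.7241053, 0.6869474], [0.6869474, 1.0462105]]

md1 = mahalanobis((5, 5), mean, cov)   # data point 1
md2 = mahalanobis((5, 1), mean, cov)   # data point 2
print(round(md1, 2), round(md2, 2))
```

The two points differ only in the sign of the second difference (2 versus -2). That sign flips the cross term, which is why losing it reproduces the distance of point 1.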
@cmindaaa3 жыл бұрын
@@tilestats omg i got it! thank you so much!!
@jacksonchen86792 жыл бұрын
Thank you
@tilestats2 жыл бұрын
Thank you!
@rambisneves20773 жыл бұрын
Hi Tile, Could you share these points in an excel file?
@tilestats3 жыл бұрын
I do not have the original data since that was randomly generated. However, the data below should work to reproduce the calculations:
x=[4.6, 4.4, 3.9, 3.9, 3.8, 3.5, 3.8, 3.4, 3.0, 2.7, 3.7, 3.0, 2.5, 2.2, 2.9, 2.5, 2.3, 2.1, 2.1, 1.5]
y=[4.6, 4.1, 4.5, 3.9, 3.5, 4.0, 3.3, 3.2, 3.7, 3.5, 2.1, 2.7, 3.1, 3.2, 2.3, 2.0, 2.3, 1.8, 1.4, 1.0]
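As a minimal sketch, the means and the sample covariance matrix can be computed from these data in plain Python. The helper names `mean` and `sample_cov` are hypothetical, and the division by n - 1 (the usual sample covariance) is an assumption about how the matrix was obtained.

```python
x = [4.6, 4.4, 3.9, 3.9, 3.8, 3.5, 3.8, 3.4, 3.0, 2.7,
     3.7, 3.0, 2.5, 2.2, 2.9, 2.5, 2.3, 2.1, 2.1, 1.5]
y = [4.6, 4.1, 4.5, 3.9, 3.5, 4.0, 3.3, 3.2, 3.7, 3.5,
     2.1, 2.7, 3.1, 3.2, 2.3, 2.0, 2.3, 1.8, 1.4, 1.0]

def mean(v):
    return sum(v) / len(v)

def sample_cov(u, v):
    """Sample covariance of two equally long lists (divides by n - 1)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

center = (mean(x), mean(y))
cov = [[sample_cov(x, x), sample_cov(x, y)],
       [sample_cov(y, x), sample_cov(y, y)]]

print([round(c, 2) for c in center])
print([[round(v, 7) for v in row] for row in cov])
```

The diagonal holds the variances of x and y, and the off-diagonal elements hold their covariance.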
@rambisneves20773 жыл бұрын
@@tilestats Thanks. What do you think about drawing the ellipse in the Excel file?
@MrTOCSY3 жыл бұрын
Is it correct to calculate the error ellipse for the autoscaled data for PCA calculation?
@tilestats3 жыл бұрын
Not sure I understand. It would be OK to calculate the error ellipse based on the scores in 2D (if that is what you mean).
@MrTOCSY3 жыл бұрын
@@tilestats, yes, I meant the 2D score plot. "It would be OK to calculate the error ellipse based on the scores in 2D" But why? The data were previously autoscaled, i.e. divided by the standard deviation. Is it correct to calculate the error ellipse for scores, since scores and autoscaled data are DIFFERENT in their own nature?
@MrTOCSY3 жыл бұрын
@@tilestats Sorry for bothering you, but you explain things transparently and simply. A rare phenomenon when it comes to statistics )
@tilestats3 жыл бұрын
Yes, since standardizing the variables does not change the Mahalanobis distances of the points. If you create an error ellipse from unscaled data and, for example, identify 2 points outside that ellipse, the same points will be outside the ellipse if you scale the data, provided of course that you calculate the ellipse on the scaled data. Try this on a simple data set; it will help you understand.
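This can be tried with a short sketch in plain Python, using the example data posted elsewhere in this thread. The helper names and the test point (5, 1) are choices made for illustration. The Mahalanobis distance is computed once on the raw data and once after each variable has been centered and divided by its standard deviation.

```python
import math

x = [4.6, 4.4, 3.9, 3.9, 3.8, 3.5, 3.8, 3.4, 3.0, 2.7,
     3.7, 3.0, 2.5, 2.2, 2.9, 2.5, 2.3, 2.1, 2.1, 1.5]
y = [4.6, 4.1, 4.5, 3.9, 3.5, 4.0, 3.3, 3.2, 3.7, 3.5,
     2.1, 2.7, 3.1, 3.2, 2.3, 2.0, 2.3, 1.8, 1.4, 1.0]

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def mahalanobis(px, py, u, v):
    """MD of (px, py) from the mean of the sample (u, v)."""
    a, b, d = cov(u, u), cov(u, v), cov(v, v)
    det = a * d - b * b
    dx, dy = px - mean(u), py - mean(v)
    return math.sqrt((d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det)

# Standardize each variable: subtract the mean, divide by the SD.
sx, sy = math.sqrt(cov(x, x)), math.sqrt(cov(y, y))
zx = [(a - mean(x)) / sx for a in x]
zy = [(b - mean(y)) / sy for b in y]

px, py = 5.0, 1.0   # illustrative test point
md_raw = mahalanobis(px, py, x, y)
md_scaled = mahalanobis((px - mean(x)) / sx, (py - mean(y)) / sy, zx, zy)
print(round(md_raw, 4), round(md_scaled, 4))
```

After standardization the covariance matrix of `zx` and `zy` has ones on the diagonal, so it coincides with the correlation matrix of the original data.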
@MrTOCSY3 жыл бұрын
Is it correct to calculate MD using the correlation matrix instead of the covariance matrix?
@tilestats3 жыл бұрын
No, you will then not get the correct value, unless you have standardized data, in which case the covariance and correlation matrices are identical. Have a look at my video about this: kzbin.info/www/bejne/aJPGnp6iq9eLirM
@MrTOCSY3 жыл бұрын
The data are autoscaled. Numerical values of elements of correlation matrix and covariance matrix are equal.
@MrTOCSY3 жыл бұрын
And one more question, if I may. If we are up to find an outlier on a 2D score plot of principal components should we use a covariance matrix of SCORES?
@tilestats3 жыл бұрын
Yes, but note that PC1 and PC2 are uncorrelated, so the off-diagonal elements of that covariance matrix are zero.
@juhoke2 жыл бұрын
I wish I had seen this video during my clustering methods course. I had to drop it because I did not understand, for example, the meaning of centroids.
@tilestats2 жыл бұрын
I have two vids on clustering if you like to catch up: kzbin.info/www/bejne/q4jJkJKBfrCthrM kzbin.info/www/bejne/anbCdXmDqZtjqMU
@eyupyondem48182 жыл бұрын
Hi sir; this is a really nice and clear explanation. However, there may be an incorrect covariance matrix inversion, since when I compute the values in R, it gives me another result.
> X
     [,1] [,2]
[1,] 0.72 0.69
[2,] 0.69 1.00
> solve(X)
          [,1]      [,2]
[1,]  4.100041 -2.829028
[2,] -2.829028  2.952030
> X %*% solve(X)
     [,1] [,2]
[1,]    1    0
[2,]    0    1
@tilestats2 жыл бұрын
That is because I show rounded values in the covariance matrix. In the first comment below the video, I show the covariance matrix with more decimals.