Standard deviation of residuals or Root-mean-square error (RMSD)

Views: 132,273

7 years ago

Courses on Khan Academy are always 100% free. Start practicing, and saving your progress, now: www.khanacademy.org/math/stat...
Calculating the standard deviation of residuals (also called root-mean-square error (RMSE) or root-mean-square deviation (RMSD)) to measure disagreement between a linear regression model and a set of data.

Comments: 37

@abakella 2 years ago

Once again Sal Khan saves my life on an important assignment. A tale as old as time.

@galactic-nucleus 2 years ago

Two problems with this video.
1) The degrees of freedom for simple regression are n-2: one for the estimate of the y-intercept and one for the estimate of the slope. So the denominator here should be 2, not 3. That gives a value of 0.866, not 0.707.
2) Residual standard error (or standard deviation of residuals, as you are calling it) does use the degrees of freedom as the denominator. For RMSE, however, most statisticians use n in the denominator. So technically, RMSE and residual standard error are not the same thing.
I'll cut you some slack, though, since many software packages, even SPSS, conflate these two different things. SPSS reports residual standard error as RMSE.
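
The three denominators discussed in this thread (n, n-1, and n-2) can be compared numerically. The sketch below is only an illustration: the four data points and the line ŷ = 2.5x - 2 are assumptions, not taken from the comments, chosen because their squared residuals sum to 1.5, which reproduces the 0.612, 0.707, and 0.866 figures quoted here. The function name `residual_spread` is hypothetical.

```python
import math

# Assumed example data (not given in the thread): four points and the
# least-squares line y_hat = 2.5 * x - 2 fitted to them.
xs = [1, 2, 2, 3]
ys = [1, 2, 3, 6]
slope, intercept = 2.5, -2.0


def residual_spread(xs, ys, slope, intercept, ddof):
    """Square root of (sum of squared residuals) / (n - ddof)."""
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - ddof))


rmse = residual_spread(xs, ys, slope, intercept, ddof=0)         # divide by n
video_value = residual_spread(xs, ys, slope, intercept, ddof=1)  # divide by n - 1
std_error = residual_spread(xs, ys, slope, intercept, ddof=2)    # divide by n - 2

print(round(rmse, 3), round(video_value, 3), round(std_error, 3))
# prints: 0.612 0.707 0.866
```

With only four points the choice of denominator changes the answer noticeably; as n grows, the three values move closer together.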

@nancyj795 A year ago

Yes: most spreadsheets call it the Standard Error of the Estimate ("STEYX"), and it does come out to your number of 0.866.

@mickmertens3554 4 years ago

Perfectly described, thanks a lot!

@124deaper1 7 years ago

Can you please post a video that talks about modulus? That chapter has always tripped me up.

@techtak5948 4 years ago

Perfectly described. Thank you

@punyipeter8174 5 years ago

Thanks for the good explanation

@abimaeldominguez4126 3 years ago

What do people use RMSE for? "To figure out how much the model disagrees with the actual data". Thank you for that!

@rholin0997 3 years ago

@4:10 I wish I could freehand lines this straight on my computer. Sal has some serious skill.

@XinhLe 3 years ago

He has a special device for this work, I guess.

@jingyiwang5113 A year ago

Thank you so much for this amazing video!

@baptistezhong4321 2 years ago

Is there any way to normalize RMSD, the way R² is normalized in linear regression? If you tell me an RMSD value, I still don't know how good the fit is, since I don't know the data set.
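
Two common ways to put RMSD on a scale-free footing are to divide it by the range of the observed y values (often called normalized RMSD) or to report R² instead. The sketch below shows both under stated assumptions: the four data points and the line ŷ = 2.5x - 2 are illustrative and do not come from the thread, and dividing by the range is just one of several normalizations in use.

```python
import math

# Assumed example data and fitted line (illustrative, not from the thread).
xs = [1, 2, 2, 3]
ys = [1, 2, 3, 6]
slope, intercept = 2.5, -2.0

residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
sse = sum(r ** 2 for r in residuals)           # sum of squared residuals
y_mean = sum(ys) / len(ys)
sst = sum((y - y_mean) ** 2 for y in ys)       # total sum of squares

rmse = math.sqrt(sse / len(ys))                # RMSE with n in the denominator
nrmse = rmse / (max(ys) - min(ys))             # normalized by the range of y
r_squared = 1 - sse / sst                      # coefficient of determination

print(round(nrmse, 3), round(r_squared, 3))
```

Both numbers are unitless, so they can be compared across data sets measured on different scales, which a raw RMSD cannot.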

@surajnakhate8986 2 years ago

Really informative, thank you.

@Vihntage 3 years ago

What would be the equations for the (two) regression lines that are one standard deviation away from the original regression line? That is, the two gold/yellow lines parallel to the regression line.
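
If "one standard deviation away" is measured vertically, the way residuals are, then the two parallel lines keep the same slope and only the intercept shifts by ±s. A minimal sketch of that reading follows; the slope 2.5, intercept -2, and s = 0.707 are assumed values for illustration, and `band_lines` is a hypothetical helper name.

```python
def band_lines(slope, intercept, s):
    """Return (slope, intercept) for the lines shifted up and down by s.

    Assumes the shift is vertical, i.e. y = slope * x + intercept +/- s.
    """
    upper = (slope, intercept + s)
    lower = (slope, intercept - s)
    return upper, lower


# Assumed regression line y_hat = 2.5 * x - 2 and residual spread s = 0.707.
upper, lower = band_lines(2.5, -2.0, 0.707)
print("upper: y = %.3f x + (%.3f)" % upper)
print("lower: y = %.3f x + (%.3f)" % lower)
```

Under these assumptions the lines would be roughly y = 2.5x - 1.293 and y = 2.5x - 2.707.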

@jingyiwang5113 A year ago

And I have learned a lot from your channel!

@spinLOL533 3 years ago

thank you!

@thegorillaz4759 4 years ago

thank you sir

@sweetberries4611 4 years ago

amazing

@cnlowry 6 years ago

Why is it n-1=3 in the denominator and not n=4?

@freelusion7330 5 years ago

For a sample standard deviation, it is standard to divide by n-1, which corrects for the tendency of a sample to understate the spread a little bit.

@syedib 5 years ago

I used the same dataset to find the root mean squared error with the scikit-learn Python library and got a value of 0.6123724356957945. It is taking n as 4 and not 3, which is a bit confusing.

@mausunk 5 years ago

They are calculating the mean of a subset (or sample) of a whole population. When you work with a population, you use σ (sigma) for the standard deviation and μ (mu) for the mean. For a population you would use (big N) N=4, which means there are 4 degrees of freedom (df), because the population mean μ is the true mean, so every point counts as a degree of freedom. When you work with a subset (sample) of the population, you use x̄ (x-bar) for the mean and s for the standard deviation. Because we are working with a sample, there are (small n) n-1 (here 3) degrees of freedom, since we are not working with the true mean of the population: estimating x̄ takes one degree of freedom, and the remaining n-1 data points take the others. Think of it as knowing x̄ beforehand and then knowing the values of 3 of your 4 data points: the last data point has to be one specific value to give the right mean. Dividing by n-1 in the sample calculation gives a better APPROXIMATION of the true population variance (and hence of the standard deviation). Hope this clarifies things. For about the same explanation in video format, check out watch?v=9ONRMymR2Eg.
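
The population versus sample distinction described above maps onto two functions in Python's standard `statistics` module: `pstdev` divides by N and `stdev` divides by n-1. The sketch below uses four assumed values (not from the thread) and also shows the "last data point is forced" idea behind losing one degree of freedom.

```python
import statistics

# Assumed sample of four values, for illustration only.
data = [1, 2, 3, 6]

population_sd = statistics.pstdev(data)  # divides by N (treats data as the whole population)
sample_sd = statistics.stdev(data)       # divides by n - 1 (treats data as a sample)

# Degrees of freedom: once the mean and any three values are known,
# the fourth value is no longer free to vary.
mean = statistics.mean(data)
forced_last = len(data) * mean - sum(data[:3])

print(round(population_sd, 3), round(sample_sd, 3), forced_last)
```

The sample version is always the larger of the two, because the same sum of squared deviations is divided by a smaller number.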

@MrVivekc 5 years ago

@syedib I tried the same on paper, taking n=4, and the value is the same as yours, 0.612...

@tuur319 4 years ago

All the other formulas I see use n-2. Did you make a mistake, or does it depend on the situation?

@seanvespucci9788 3 years ago

It depends on the assumptions he's using. I've been taught that for each parameter we estimate, one is subtracted from n. With the standard deviation of residuals, the slope and intercept are generally both estimated, so you subtract 2 from n.

@alexwyler4570 4 years ago

Why are we squaring the residuals? Why are we dividing by the number of data points minus 1?

@Penguinian 4 years ago

Alex Wyler, you square the residuals because if you don't, they just add up to zero when you sum them, I think. I don't know about the -1 thing.
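
The cancellation described in this reply can be checked directly: for a least-squares line with an intercept, positive and negative residuals offset each other, so their plain sum carries no information about the size of the errors. The sketch below assumes four illustrative points and the line ŷ = 2.5x - 2, which is the least-squares fit to them; none of these values appear in the thread.

```python
# Assumed example data and its least-squares line (illustrative only).
xs = [1, 2, 2, 3]
ys = [1, 2, 3, 6]
slope, intercept = 2.5, -2.0

residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

plain_sum = sum(residuals)                    # positives and negatives cancel
squared_sum = sum(r ** 2 for r in residuals)  # every term is non-negative

print(residuals)    # [0.5, -1.0, 0.0, 0.5]
print(plain_sum)    # 0.0
print(squared_sum)  # 1.5
```

Squaring keeps every miss from being cancelled by a miss in the opposite direction, which is why the squared sum is what goes under the square root.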

@gabrielkpaka806 2 months ago

This is helpful.

@VyTran-sm1qp 4 years ago

Could you please tell me how we can find ŷ?

@rasoolkilani3170 3 years ago

If you have some data, you can enter it into MS Excel and plot it, then show the trend line along with its formula, which calculates ŷ (just google "trend line + show formula in Excel"). Choose linear, or whatever relation produces the highest R².
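
For a linear trend line, the same ŷ can be computed by hand with the ordinary least-squares formulas: slope = Sxy / Sxx and intercept = ȳ - slope · x̄. The sketch below is one way to do that in plain Python; the four data points are assumed for illustration, and `fit_line` and `predict` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def predict(x, slope, intercept):
    """y-hat: the value the fitted line gives at x."""
    return slope * x + intercept


# Assumed example data (not from the thread).
xs = [1, 2, 2, 3]
ys = [1, 2, 3, 6]

slope, intercept = fit_line(xs, ys)
y_hats = [predict(x, slope, intercept) for x in xs]

print(slope, intercept)  # 2.5 -2.0
print(y_hats)            # [0.5, 3.0, 3.0, 5.5]
```

Each ŷ is simply the height of the fitted line at that x, and the residual for a point is its observed y minus that ŷ.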