Unbiased Estimators (Why n-1 ???) : Data Science Basics

45,082 views

ritvikmath

1 day ago

91 Comments
@davidszmul2141 4 years ago
To be even more practical, I would simply say: - Mean: you only need 1 value to estimate it (the mean is the value itself). - Variance: you need at least 2 values to estimate it. The variance estimates the spread between values (the more variance, the more spread around the mean). It is impossible to get any sense of spread from a single value. For me that is enough to explain practically why it is n for the mean and n-1 for the variance.
@chonky_ollie 3 years ago
Best and shortest example I’ve ever seen. What a gigachad
@YusufRaul 4 years ago
Great video, now I understand why I failed that test years ago 😅
@venkatnetha8382 4 years ago
payhip.com/b/ndY6
@jamiewalker329 4 years ago
How I think about it: suppose you have n data points: x1, x2, x3, x4, ..., xn. We don't really know the population mean, so let's just pick the data point on our list that is closest to the sample mean and use it to approximate the population mean. Say this is xi. We can then code the data by subtracting xi from each element; this doesn't affect any measure of spread (including the variance). After coding we will have a list x1', x2', ..., xn', but the i'th position will be 0. Then only the other n-1 data points will contribute to the spread around the mean, so we should take the average of those n-1 squared deviations.
@gfmsantos 4 years ago
I guess only the other n-1 data points will contribute to the spread around zero, not the mean... I got lost.
@jamiewalker329 4 years ago
@@gfmsantos 0 is the mean of the coded data.
@gfmsantos 4 years ago
@@jamiewalker329 Yes, but you didn't know the mean before you chose the point. As far as I understood, you've just picked a point that might be close to the sample mean, haven't you?
@jamiewalker329 4 years ago
@@gfmsantos Yes, the sample mean. It's not supposed to be rigorous, just a way of thinking: given any data point as a reference point, there are n-1 independent deviations from that point. One data point gives zero indication of spread. With 2 data points, only the 1 distance between them gives an indication of spread, and so on...
@gfmsantos 4 years ago
@@jamiewalker329 I see. Good. Thanks
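The "n-1 independent deviations" idea in the thread above can be checked numerically. A quick sketch (the data values are my own, not from the video): deviations from the sample mean always sum to zero, so once n-1 of them are known, the last one is fixed.

```python
# Deviations from the sample mean sum to (numerically) zero,
# so only n - 1 of them can vary freely.
data = [4.0, 7.0, 1.0, 9.0, 3.0]
xbar = sum(data) / len(data)
devs = [x - xbar for x in data]

# The full set of deviations sums to zero...
assert abs(sum(devs)) < 1e-12

# ...so the last deviation is determined by the first n - 1.
last_from_others = -sum(devs[:-1])
assert abs(last_from_others - devs[-1]) < 1e-12
```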
@Physicsnerd1 3 years ago
Best explanation I've seen on YouTube. Excellent!
@ritvikmath 3 years ago
Wow, thanks!
@Matthew-ez4ze 1 year ago
I am reading a book on Jim Simons, who ran the Medallion fund. I’ve gone down the rabbit hole of Markov chains and this is an excellent tutorial. Thank you.
@ritvikmath 1 year ago
Wonderful!
@abderrahmaneisntthatenough6905 4 years ago
I wish you'd cover all the math related to ML and data science.
@699ashi 4 years ago
I believe this is the best channel I have discovered in a long time. Thanks man.
@stelun56 3 years ago
The lucidity of this explanation is commendable.
@junechu9701 1 year ago
Thanks!! I love the way of saying "boost the variance."
@ritvikmath 1 year ago
Any time!
@DistortedV12 4 years ago
I watch all your vids in my free time. Thanks for sharing!
@venkatnetha8382 4 years ago
For a 1200-page question bank of real-world scenarios to make you think like a data scientist, please visit payhip.com/b/ndY6. You can download the sample pages to see the quality of the content.
@cadence_is_a_penguin 1 year ago
been trying to understand this for weeks now, this video cleared it all up. THANK YOU :))
@neelabhchoudhary2063 1 year ago
Dude, this is amazingly clear.
@vvalk2vvalk 4 years ago
What about n-2 or n-p? How come the more estimators we have, the more we adjust? How exactly does it transfer into the calculation and what is the logic behind it?
@Ni999 4 years ago
That last blue equation looks more straightforward to me as:
[n/(n-1)] [σ² - σ²/n] = [σ²n/(n-1)] [1 - 1/n] = σ² [n/(n-1)] [(n-1)/n] = σ²
...but that's entirely my problem. :D Anyway, great video, well done, many thanks! PS - On the job we used to say that σ² came from the whole population, n, but s² comes from n-1 because we lost a degree of freedom when we sampled it. Not accurate, but a good way to socialize the explanation.
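The algebra in the comment above can be sanity-checked exactly (a sketch of my own; the particular value of σ² is arbitrary): for any n ≥ 2, [n/(n-1)](σ² - σ²/n) collapses back to σ².

```python
from fractions import Fraction

# Exact check that [n/(n-1)] * (s2 - s2/n) == s2 for a range of n.
s2 = Fraction(7, 3)  # an arbitrary exact "sigma squared"
for n in range(2, 50):
    corrected = Fraction(n, n - 1) * (s2 - s2 / n)
    assert corrected == s2
```

Using `Fraction` avoids any floating-point round-off, so the identity holds exactly rather than approximately.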
@kvs123100 3 years ago
Thanks for the great explanation! But one question: why minus 1? Why not 2? I know the DoF concept comes in here, but in every explanation I have gone through, they fix the value of the mean so as to make the last sample not independent. In reality, as we take samples, the mean is not fixed: it is itself dependent on the values of the samples, so the DoF would be the number of samples itself!
@musevanced 4 years ago
Great video. But anyone else feel unsatisfied with the intuitive explanation? I've read a better one. When calculating the variance, the values we are using are x_i from 1 to n and x_bar. Supposedly, each of these values represents some important information that we want to include in our calculations. But suppose we forget the value x_n and consider JUST the values x_i from 1 to (n-1) and x_bar. It turns out we actually haven't lost any information! This is because we know that x_bar is the average of x_i from 1 to n. We know all the data points except one, and we know the average of ALL of the data points, so we can easily recalculate the value of the lost data point. This logic applies not just to x_n. You can "forget" any individual data point and recalculate it if you know the average. Note that if you forget more than one data point, you can no longer recalculate them and you have indeed lost information. The takeaway is that when you have some values x_i from 1 to n and their average x_bar, exactly one of those values (whether it's x_1 or x_50 or x_n or x_bar) is redundant. The point of dividing by (n-1) is that instead of averaging over every data point, we want to average over every piece of new information. And finally, what if we somehow knew the true population mean, μ, and decided to use μ instead of x_bar in our calculations? In this case, we would divide by n instead of (n-1), as there would be no redundancy in our values.
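The "one value is redundant" argument above can be demonstrated directly (a sketch with made-up numbers): given x_bar and all but one data point, the missing point is fully recoverable.

```python
data = [3.0, 8.0, 5.0, 12.0, 2.0]
n = len(data)
xbar = sum(data) / n

# "Forget" the last data point; the mean still pins it down:
# x_n = n * xbar - (x_1 + ... + x_{n-1})
recovered = n * xbar - sum(data[:-1])
assert abs(recovered - data[-1]) < 1e-12
```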
@cuchulainkailen 4 years ago
Right. The phraseology is this: the system has only n-1 degrees of freedom when you use xbar... xbar has "taken one away".
@richardchabu4254 3 years ago
Well explained, very clear and easy to understand.
@DonLeKouT 4 years ago
Try explaining the above ideas using the degrees of freedom.
@cuchulainkailen 4 years ago
Correct.
@tyronefrielinghaus3467 1 year ago
Good intuitive explanation, thanks!
@AbrarAhmed-ox2fd 3 years ago
Exactly what I have been looking for.
@陳冠熏-m3d 8 months ago
The last section is so helpful, thank you!
@ritvikmath 8 months ago
Glad it was helpful!
@yassine20909 2 years ago
Now it makes total sense. Thank you 👏👍
@Set_Get 4 years ago
Thank you. Could you please do a clip on expected value, its rules, and how to derive some results?
@ChakravarthyDSK 2 years ago
Please do one lesson on the concept of ESTIMATORs. It would be good if the basics of these ESTIMATORs were understood before getting into the concept of being BIASED or not. Anyway, you are doing extremely well and your way of explaining is simply superb. Clap, clap!
@amittksingh 1 month ago
Great explanation!
@braineater351 4 years ago
I wanted to ask a question. For E(x bar): x bar is calculated from a sample of size n, so is E(x bar) the average value of x bar over all samples of size n? Other than that, I think this has been one of the more informative videos on this topic. Additionally, people often tie the concept of degrees of freedom into this, but usually they show why you have n-1 degrees of freedom and then just say "that's why we divide by n-1". I understand why it's n-1 degrees of freedom, but not how that justifies dividing by n-1. I was wondering if you had any input on this?
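The reading in this question is the right one: E(x bar) averages x bar over the sampling distribution. For a tiny population it can be enumerated exhaustively. A sketch of my own (sampling with replacement, population values chosen arbitrarily):

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 6]
mu = Fraction(sum(population), len(population))  # population mean

n = 2  # sample size
# Every equally likely ordered sample of size n, with replacement:
sample_means = [Fraction(sum(s), n) for s in product(population, repeat=n)]

# E(x_bar): the average of x_bar over every possible sample
e_xbar = sum(sample_means) / len(sample_means)
assert e_xbar == mu  # the sample mean is an unbiased estimator of mu
```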
@subhankarghosh1233 9 months ago
Marvelous... Loved it...❤
@ritvikmath 9 months ago
Thanks a lot 😊
@martinw.9786 2 years ago
Great explanation! Love your videos.
@missghani8646 3 years ago
This is how we can understand stats, not by just throwing numbers at students.
@alexandersmith6140 1 year ago
Hi @ritvikmath, I want to understand the derivations in the red brackets. Do you have a good set of sources that will explain why those three expected values return their respective formulae?
@nguyenkimquang0201 1 year ago
Thank you for the great content!!!❤❤❤
@ritvikmath 1 year ago
You are so welcome!
@chinmaybhalerao5062 2 years ago
I guess the second approach to the n-1 explanation is only right when the population and the sample follow the same distribution, which is a very rare case.
@yitongchen75 4 years ago
Is it because we lose 1 degree of freedom when we use the estimated mean to calculate the estimated variance?
@cuchulainkailen 4 years ago
Correct. It's NOT, as the author states, that the variance is boosted.
@prof.g5140 2 years ago
Incorrect intuition. This is more accurate: ideally the sample mean equals the population mean, but the actual sample mean is rarely ideal and carries some error. If the sample happens to be concentrated on lower values, the sample mean will be lower than the population mean; since the sample mean has shifted down along with the sample, the deviations of the samples from the sample mean will mostly be smaller than their deviations from the population mean, lowering the sample variance. If the sample is instead concentrated on higher values, the sample mean will be higher than the population mean; again the sample mean has shifted toward the sample, so the deviations from the sample mean will mostly be smaller than the deviations from the population mean, again lowering the sample variance. Whether the sample is concentrated on lower or higher values (and for small sample sizes it usually is one or the other), the sample variance (using n as the denominator) will probably be lower than the population variance. Therefore we need a correction factor.
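The underestimation described in the comment above is easy to see in simulation. A sketch of my own (normal population, seed and parameters chosen arbitrarily): averaged over many samples, the /n estimator lands near σ²(n-1)/n while the /(n-1) estimator lands near σ².

```python
import random

# Monte Carlo: dividing the sum of squared deviations by n
# underestimates the population variance; dividing by n-1 does not.
random.seed(0)
mu, sigma2 = 10.0, 4.0     # population mean and variance
n, trials = 5, 200_000

biased_total = 0.0         # accumulates the /n estimator
unbiased_total = 0.0       # accumulates the /(n-1) estimator
for _ in range(trials):
    sample = [random.gauss(mu, sigma2 ** 0.5) for _ in range(n)]
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)
    biased_total += ss / n
    unbiased_total += ss / (n - 1)

biased = biased_total / trials       # ≈ sigma2 * (n-1)/n = 3.2 (too small)
unbiased = unbiased_total / trials   # ≈ sigma2 = 4.0
```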
@GauravSharma-ui4yd 4 years ago
Amazing...
@plttji2615 2 years ago
Thank you for the video. Can you help me prove unbiasedness in this question? Question: compare the average height of employees at Google with the average height in the United States. Do you think it is an unbiased estimate? If not, how would you prove the two don't match?
@user-or7ji5hv8y 3 years ago
Great video, but I'm still not convinced by the intuition. How do you know the adjustment compensates for the missing tail in sampling? And if so, why not n-2, etc.? I'd guess that if data is missing anywhere, it would be in the tail.
@yezenbraick6598 2 years ago
Yes, why not n-2? Jamie Walker's comment explains it another way; check that out.
@jingsixu4665 3 years ago
Thanks for the explanation from this perspective. Can you talk more about why "n-1"? I remember it has something to do with degrees of freedom, but I never fully understood that when I was learning it.
@samtan6304 3 years ago
I also had this confusion when I first learned it. Say you have a sample with the values 1, 2, 3, and you calculate the sample variance. The numerator will be [(1 - 2)² + (2 - 2)² + (3 - 2)²]. Notice that in this calculation you are implicitly saying the sample mean must be 2, because you are subtracting 2 from every value. Using this implicit information, you will realize that one term in the numerator cannot vary given the other two terms.
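The 1, 2, 3 example in the reply above looks like this in code (a sketch): with the sample mean fixed at 2, the third deviation is forced by the other two, so only n - 1 = 2 terms are free.

```python
sample = [1, 2, 3]
xbar = sum(sample) / len(sample)      # 2.0
devs = [x - xbar for x in sample]     # deviations from the sample mean

# Given the first two deviations, the third is forced (they sum to 0):
assert devs[2] == -(devs[0] + devs[1])

# Hence the n - 1 denominator for the sample variance:
s2 = sum(d * d for d in devs) / (len(sample) - 1)
```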
@soumikdey1456 2 years ago
just wow!
@mm_ww_2 3 years ago
Thanks, great explanation!
@jeffbezos4474 2 years ago
you're hired!
@nelsonk1341 2 years ago
You are GREAT.
@AmineChM21 4 years ago
Quality video, keep it up!
@EkShunya 1 year ago
good one
@pranavjain9799 1 year ago
You are awesome
@ritvikmath 1 year ago
Thanks, you too!
@Titurel 10 months ago
4:38 You really should give links to the derivation; otherwise it still feels hand-wavy.
@asifshikari 1 year ago
Why n-1? Couldn't we adjust even better by doing n-2?
@mohammadreza9910 10 months ago
Useful!
@jtm1283 9 months ago
Two criticisms (of an otherwise very nice video): 1. all the real work in the proof is done by the formulae in black on the right, for which you provided no explanation; and 2. talking about the sample SD without mentioning degrees of freedom seems incomplete. WRT the latter, just look inside the summation and ask "how many of these are there?" For the mean, there are n different things (the x-sub-i values), so you divide by n. For the sample SD there are n things (the x-sub-i values) minus 1 thing (x-bar), so it's n-1.
@thomaskim5394 4 years ago
You still haven't made clear, intuitively, why we use n-1 instead of n in the sample variance.
@jamiewalker329 4 years ago
See my comment.
@thomaskim5394 4 years ago
@@jamiewalker329 I have already seen an argument similar to yours.
@cuchulainkailen 4 years ago
@@jamiewalker329 It's convoluted. The answer is what I posted: the number of degrees of freedom is reduced to n-1 by the use of xbar.
@gianlucalepiscopia3123 3 years ago
Never understood why "data science" and not "statistics"
@yepitsodex 1 year ago
The "we need it to be slightly smaller to make up for it being a sample and not the population" argument isn't needed or realistic. Having n-1 regardless of the size of the sample would make the 1 completely arbitrary, just tweaking it by the smallest amount. In reality, when you go from the population space to the sample space, you lose exactly one degree of freedom. That's why it's n-1 and not n-2 or something else. If you had all of the sample values except one, the value of the last one would be fixed, because the values have to average out to the sample mean. Since it can't be just anything, that is a loss of a degree of freedom, which justifies the use of n-1.
@tooirrational 4 years ago
Bias is not the factor used to decide the best estimators; it's mean squared error. n-1 is used because the error is low, not because it's unbiased.
@rhke6789 1 year ago
Ah, learning is in the details. You just skipped over the "not interesting" steps that let the logic flow. Not good. Even mentioning the names of the formulas you used without explaining them would be helpful: the variance decomposition formula or the squared-deviation formula.
@BigHotCrispyFry 3 years ago
good stuff!