ERRATA: - 2:44: for the scaling property, the mean is unchanged only when the mean of that normal is 0. Otherwise, the mean is also scaled by the factor b. I took for granted that we usually center at zero before scaling. Thanks to those who pointed this out.
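The corrected scaling property is easy to verify by simulation. A quick sketch (an editor's addition, not from the video; assumes NumPy, and the values of mu, sigma, and b are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, b = 5.0, 2.0, 3.0

x = rng.normal(mu, sigma, size=1_000_000)
bx = b * x

# For X ~ N(mu, sigma^2), bX ~ N(b*mu, b^2 * sigma^2):
# the mean AND the standard deviation are both scaled by b.
print(bx.mean())  # close to b * mu = 15
print(bx.std())   # close to b * sigma = 6
```

Only when mu = 0 does scaling leave the mean alone, which is the case the video implicitly assumed.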
@matthewmatos-pacheco1777 a year ago
I love your videos! I'm trying to make my own. How are you so consistent releasing every week? How long does a video take you? It takes me hours just to record mine. Do you have any help, or do you do it all by yourself?
@pbcascual a year ago
@@matthewmatos-pacheco1777 I've done a lot of experimenting with how to pace my videos, and I actually upload on a 10-day pace rather than weekly. I'm usually working on two videos at a time: writing the script for one while editing the other. Editing takes me forever, so it's something I need to do every day. There are lots of times where I think I could improve the quality of the video, but I choose to stick to a regular upload schedule, since I'll usually just apply that improvement to the next one. My recommendation would be to find a pace that you can manage with your life and stick to it, since my belief is that the long game will lead to your success rather than viral hits, especially for an education channel.
@ale5579 9 months ago
Also, at 10:20 you forgot to take the square root of the variance.
@bcs1793 4 months ago
Small detail: At 5:48 it is not most distributions, but all others. This is called Geary's Theorem, and it is a pretty crazy property of the normal distribution
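The independence property behind Geary's theorem (the sample mean and sample variance are independent only for normal data) can be illustrated by simulation. A sketch (an editor's addition, assuming NumPy; note that near-zero correlation is only suggestive of independence, not a proof of it):

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_var_corr(draw, n=10, reps=100_000):
    """Correlation between the sample mean and sample variance
    across many repeated samples of size n."""
    samples = draw((reps, n))
    means = samples.mean(axis=1)
    variances = samples.var(axis=1, ddof=1)
    return np.corrcoef(means, variances)[0, 1]

# Normal samples: the sample mean and variance are independent,
# so their correlation is essentially zero.
print(mean_var_corr(lambda size: rng.normal(size=size)))

# Skewed (exponential) samples: the two are clearly correlated.
print(mean_var_corr(lambda size: rng.exponential(size=size)))
```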
@FTFP1300 a year ago
5:55 -- "Proving this is a character building exercise" Can confirm that the image is accurate as I'm currently struggling through this proof.
@academyofuselessideas a year ago
Pretty cool that you give an example in which the central limit theorem fails. The insidious danger of the normal distribution is that it appears in so many useful phenomena that we tend to forget that not everything is normal. As for that example, I would say that you used a Cauchy distribution to generate samples where the central limit theorem does not hold, though any distribution without finite variance would do.
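The failure mode described in this comment is easy to see in simulation. A sketch (an editor's addition, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)

# Means of n = 1000 normal draws concentrate tightly around 0...
normal_means = rng.normal(0, 1, size=(10_000, 1_000)).mean(axis=1)
print(np.mean(np.abs(normal_means) > 5))  # 0.0

# ...but the mean of n standard Cauchy draws is itself standard
# Cauchy for every n, so wildly large sample means never go away.
cauchy_means = rng.standard_cauchy(size=(10_000, 1_000)).mean(axis=1)
print(np.mean(np.abs(cauchy_means) > 5))  # roughly 0.13
```

No matter how large n gets, averaging Cauchy draws buys nothing: the average has exactly the same distribution as a single draw.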
@aguerokun-c1g a year ago
bro this channel is a godsend, thank you
@bcs1793 4 months ago
About the rule of thumb at 11:00 - I think it comes from the fact that the t-distribution becomes very close to normal when n = 30. If you take a look at a table for the t-distribution, the critical value for a 0.05 level of significance becomes very stable after n = 30. As a matter of fact, the tables we used to need for exams and exercises would list the critical value for every degree of freedom up to 30, and then just a few extra rows for larger numbers like 40, 50, 100, and "infinity" (i.e., the normal).
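The stabilizing critical values this comment describes can be reproduced directly. A sketch (an editor's addition, assuming SciPy):

```python
from scipy import stats

# Two-sided 5% critical values for the t-distribution: they barely
# move after ~30 degrees of freedom, which is why old printed
# tables stopped listing every df at 30.
for df in (5, 10, 30, 100):
    print(df, round(stats.t.ppf(0.975, df), 3))
print("normal", round(stats.norm.ppf(0.975), 3))
```

At df = 30 the critical value is about 2.042, already close to the normal's 1.960, and the gap keeps shrinking from there.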
@timmychanson a month ago
It also depends on the number of variables considered. If we work with multivariate data, then n = 30 might not work, but determining the n at which the CLT kicks in works the same way: find the value of T² (Hotelling's T²) and divide it by χ²(p), where p is the number of variables. If the ratio is around 1.0, then that's the minimum n needed.
@adamgee1020 a year ago
My professor told me the "rule of thumb" that we need 30 samples was made up because there are roughly 30 lines on a page, and that's how many values of the standard normal statisticians used to write on paper back in the old days before computers. Like you said, there's nothing special about the number 30 theoretically.
@very-normal a year ago
Secret paper industry conspiracy in the statistics world
@pipertripp 10 months ago
@@very-normal My understanding is that 30 crops up because the normal distribution becomes a very good approximation of the t-distribution when the degrees of freedom of the t-distribution >= 30.
@christophersoo 8 months ago
The reason we use Z is because Karl Pearson, a renowned statistician, was German, and in German the word "central" is "Zentral".
@jjmyt13 a year ago
Practically speaking, the location-scale student-t is similar to the Cauchy for small numbers of samples (e.g., df of 1) but becomes approximately normal when the degrees of freedom (df) are about 30. So that might be worth looking into when considering why some people focus on 30 samples for reasons of significance.
@xavierlarochelle2742 a year ago
For the last example, I'm guessing you are using the Cauchy distribution, which does not converge to a mean since it has infinite variance (I think).
@santiagodm3483 a year ago
I love your channel!! You are a great mentor for me!
@saminchowdhury7995 a year ago
I wanna like this a 1000 times
@christinaminty 2 months ago
Thank you for the visual aids. I struggle a lot to even grasp the basics of statistics. But I wonder, is there any prerequisite knowledge needed to understand your videos properly? I'm coming from an absolutely no-math background and I just can't work with or understand any formulas. I only understand the concepts you explain.
@RomanNumural9 a year ago
The reason n = 30 is chosen (I've heard anywhere from 20-30, actually) is that the rate of convergence to the normal is known. Another theorem (Berry-Esseen) determines the convergence rate to be proportional to 1/sqrt(n), provided the third moment exists. Suppose, for example, our iid random variables have zero mean, variance 2, and third moment 1. Then by this theorem the error between the converging and limit distributions is 3/(8*sqrt(n)) = 3/40 in the case n = 25, which is less than 0.1. This is far from an actual proof, but I think this is more or less where the n = 30 reasoning comes from, along with the other mentions about making t-tables look nice.
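The Berry-Esseen rate can be illustrated numerically. A sketch (an editor's addition, assuming NumPy and SciPy; it uses Exponential(1) draws, with mu = sigma = 1, rather than the commenter's hypothetical moments):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def kolmogorov_distance(n, reps=100_000):
    """Sup distance between the empirical CDF of the standardized
    sample mean of Exponential(1) draws and the standard normal CDF."""
    means = rng.exponential(size=(reps, n)).mean(axis=1)
    z = np.sort((means - 1.0) * np.sqrt(n))  # mu = sigma = 1 for Exp(1)
    ecdf = np.arange(1, reps + 1) / reps
    return np.max(np.abs(ecdf - stats.norm.cdf(z)))

# Berry-Esseen: this distance shrinks proportionally to 1/sqrt(n).
for n in (5, 25, 100):
    print(n, kolmogorov_distance(n))
```

The printed distances shrink roughly in line with the 1/sqrt(n) bound, and are already small by n in the 25-100 range for this moderately skewed distribution.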
@vix_ki_youtube 4 months ago
This is extremely information-dense for an employee training video, mate... but thanks for the lesson.
@very-normal 4 months ago
all the best for my “employees”
@mokamed3443 a year ago
Awesome video - did catch an error at 10:15, (\bar{X} - \mu)/(\sigma/\sqrt{n}) ~ AN(0, 1). Cheers, and happy holidays.
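The corrected standardization in this comment can be checked numerically. A sketch (an editor's addition, assuming NumPy; the values mu = 3, sigma = 2, n = 50 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 3.0, 2.0, 50, 100_000

means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

# Dividing by sigma / sqrt(n) -- the standard error, not the
# variance sigma^2 / n -- yields a standard normal statistic.
z = (means - mu) / (sigma / np.sqrt(n))
print(z.mean(), z.std())  # close to 0 and 1
```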
@TRIBYE a year ago
2:33 is incorrect. For X ~ N(mu, sigma^2), bX ~ N(b * mu, b^2 * sigma^2) i.e. you're missing the b * mu. What you wrote at 2:33 (and 2:48) is incorrect for all mu =/= 0
@very-normal a year ago
Thanks for pointing that out, you’re right. I missed talking about a step in my original script, I’ve added an errata about your correction
@pipertripp 10 months ago
Are you simulating the Cauchy at the end of the vid?
@very-normal 10 months ago
Yes! You got it, the classic pathological distribution lol
@pipertripp 10 months ago
@@very-normal Indeed. It looks so unassuming, but beneath that symmetric curve lies danger.
@samlevey3263 a year ago
There's a pretty big mistake in this video, where you say that scaling a normal variable only changes the variance and not the mean. That's only true if the mean is 0. Otherwise, nicely done as usual.
@very-normal a year ago
You’re right, thanks for catching that. I’ll add it as an errata in a pinned comment
@pipertripp 10 months ago
I've always thought that the whole N=30 thing comes from the fact that the standard normal becomes a very good approximation of the t-distribution when N=30. But I have to admit that I have no idea if this is actually true or not. Anybody know?
@bin4ry_d3struct0r 4 months ago
Not to argue with professional statisticians or anything, but 30 seems like a very low minimum threshold for CLT to apply. In a sample of 30 people, you can easily find one outlier that you most probably couldn't find in a sample of 30,000. I'm not saying it's wrong ... I'm just saying it's super sus.
@very-normal 4 months ago
lol yeah for an asymptotic theorem, it feels weird to be taught that it should approach infinity and then be told “30 is cool”