An Introduction to Inference for a Proportion

71,831 views

jbstatistics

Comments

@taladiv3415 3 years ago

3:52, Thank you for clarifying this point.

@ZicoGhosh 10 months ago

Question: Unlike previous scenarios, the population proportion does not follow a normal distribution (since it's between 0 and 1, it's sort of bounded). We know this for a fact, so can we not start with the assumption that the population proportion follows, for example, a Beta distribution? And using that, can we do some hypothesis testing?

@jbstatistics 10 months ago

This is a very good question. The answer is that exact methods are quite easy to come by. The sample proportion p hat = X / n can be viewed as a scaled binomial random variable, since under typical assumptions X ~ Bin(n, p). So if, say, we wanted to test H_0: p = 0.3 against H_a: p > 0.3, and we got 38 successes in a random sample of n = 100, then the exact p-value (based on the binomial distribution) is P(X >= 38) for X ~ Bin(100, 0.3). In R, 1 - pbinom(37, 100, .3) returns 0.05304559. If we instead used the z test, z = (.38 - .30)/sqrt(.3*.7/100) = 1.745743, and in R, 1 - pnorm((.38 - .30)/sqrt(.3*.7/100)) returns 0.0404278.

The normal approximation is often used for a number of reasons:
1) The parallels with other tests we see in intro stats.
2) We can't pick alpha to be whatever we want it to be with the exact method. Here, for example, with the exact method there is no test that will give us an alpha of exactly 0.05. Conceptually this isn't a big deal, but it adds a level of complication that makes things a bit trickier when learning intro stats.
3) (Related to 2.) Confidence intervals are a bit trickier.

Your idea of a beta distribution would work in a Bayesian setting, where we could set our prior distribution for p to be beta and update it with the information from the sample.
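A minimal Python sketch of the same comparison, using only the standard library (math.comb for the binomial probabilities and math.erfc for the normal tail) in place of R's pbinom and pnorm. The function names are illustrative, not from the video. The last loop prints the significance levels the exact test can actually attain near 0.05, which is the discreteness issue in point 2.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Bin(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Bin(n, p); the analogue of 1 - pbinom(k - 1, n, p) in R."""
    return sum(binom_pmf(j, n, p) for j in range(k, n + 1))


def normal_upper_tail(z: float) -> float:
    """P(Z >= z) for a standard normal Z; the analogue of 1 - pnorm(z) in R."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def compare_pvalues(x: int, n: int, p0: float) -> tuple[float, float, float]:
    """Exact p-value, z statistic and z-test p-value for H_0: p = p0 vs H_a: p > p0."""
    exact = binom_upper_tail(x, n, p0)
    z = (x / n - p0) / math.sqrt(p0 * (1 - p0) / n)  # null SE uses p0, not p hat
    return exact, z, normal_upper_tail(z)


exact, z, approx = compare_pvalues(38, 100, 0.3)
print(f"exact p-value = {exact:.8f}")
print(f"z = {z:.6f}, z-test p-value = {approx:.7f}")

# Discreteness: rejection regions of the form X >= k give only these alphas,
# and none of them equals 0.05 exactly.
for k in range(36, 41):
    print(f"reject H_0 when X >= {k}: alpha = {binom_upper_tail(k, 100, 0.3):.4f}")
```

The exact and z-test p-values should agree with the R output quoted in the reply (about 0.0530 and 0.0404). The attainable alpha drops from just above 0.05 (reject when X >= 38) to roughly 0.034 (reject when X >= 39), so no exact test here has alpha = 0.05.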

@ZicoGhosh 10 months ago

@jbstatistics Thank you so much for the reply. This makes sense. I did not think about the fact that we can't really choose a standard significance level like 0.05 or 0.025 (95% or 97.5% confidence), since it is a discrete distribution. Would that be correct? Since it's a discrete distribution, we don't have the option to choose any significance level we like? Another question: in the example you showed, we get a p-value < 0.05 when assuming a normal distribution, but a p-value > 0.05 when assuming a binomial. Is this a matter of concern? With how we usually do hypothesis testing, we would reject H0 in the former case and fail to reject it in the latter. However, when I try N = 1,000 with 380 successes and test the same hypotheses, the two approaches agree and reject it. So is it correct to understand that in the first case, the sample of ~100 was not large enough for our normal approximation to hold? And finally, on the Bayesian setting: on your suggestion I did a little reading, and as I understand it, my statement in the original comment (that the population proportion follows a Beta distribution) is incorrect, since the population proportion is essentially a constant. I should have said the sampling distribution of the sample proportion follows a Beta distribution. But I also realised that if we are only interested in testing H0: p = 0.3 against H1: p > 0.3, then we can do it even without knowledge of the prior distribution of p, since p hat = X / n is a scaled binomial random variable. Is this understanding making sense?
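The follow-up questions can be checked numerically with a similar standard-library sketch. The first part repeats the one-sided test for 38/100 and 380/1000; the binomial pmf is computed on the log scale with math.lgamma so the sums stay stable at n = 1000. The second part is a Bayesian version in which the Beta prior and posterior are placed on p itself (the Beta-binomial conjugate update, posterior Beta(a + x, b + n - x)). The uniform Beta(1, 1) prior is an assumption chosen for illustration, and the tail-probability helper relies on the identity P(p <= p0) = P(Bin(a + b - 1, p0) >= a), which holds only for integer Beta parameters.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Bin(n, p), 0 < p < 1, on the log scale so large n is safe."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Bin(n, p)."""
    return sum(binom_pmf(j, n, p) for j in range(k, n + 1))


def z_test_pvalue(x: int, n: int, p0: float) -> float:
    """One-sided z-test p-value for H_0: p = p0 vs H_a: p > p0."""
    z = (x / n - p0) / math.sqrt(p0 * (1 - p0) / n)
    return 0.5 * math.erfc(z / math.sqrt(2))


def beta_posterior(x: int, n: int, a: int = 1, b: int = 1) -> tuple[int, int]:
    """Beta(a, b) prior on p plus x successes in n trials gives Beta(a + x, b + n - x).
    The default Beta(1, 1) (uniform) prior is an illustrative assumption."""
    return a + x, b + n - x


def prob_p_greater(p0: float, alpha: int, beta: int) -> float:
    """P(p > p0) when p ~ Beta(alpha, beta), for integer alpha and beta only.
    Uses P(p <= p0) = P(Bin(alpha + beta - 1, p0) >= alpha)."""
    return 1.0 - binom_upper_tail(alpha, alpha + beta - 1, p0)


# Frequentist check: exact vs z p-values for the same one-sided test.
for x, n in [(38, 100), (380, 1000)]:
    print(f"n = {n}: exact = {binom_upper_tail(x, n, 0.3):.3g}, "
          f"z test = {z_test_pvalue(x, n, 0.3):.3g}")

# Bayesian check with the uniform prior and 38 successes in 100 trials.
alpha, beta = beta_posterior(38, 100)
print(f"posterior Beta({alpha}, {beta}), mean = {alpha / (alpha + beta):.3f}")
print(f"P(p > 0.3 | data) = {prob_p_greater(0.3, alpha, beta):.3f}")
```

With 380/1000 both p-values are far below 0.05, consistent with what the commenter found. Under the uniform prior the posterior for p is Beta(39, 63), with mean 39/102 (about 0.382), and the posterior probability that p > 0.3 comes out above 0.9; note that this is a posterior probability, not a p-value.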

@jt007rai 2 years ago

Why do we use a z-score for inference on a proportion but a t-score for inference on a mean?

@RafauPe 2 years ago

We also use the z-score for inference on the mean; the t-score is used only if we do not know the population sigma. Here we know our sigma: it's this square root (strictly only as n tends to infinity, but this whole inference is based on that assumption).

@cococnk388 2 years ago

We also use the z-score to infer the mean... the sample size should be greater than 30 and you have to know the variance of the population; if not, use the t-score.
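A small sketch of the distinction these replies are drawing. For a test on a proportion, the null hypothesis fixes p = p0, so the standard error sqrt(p0(1 - p0)/n) is known and nothing is estimated; for a mean with unknown sigma, the sample standard deviation has to stand in for sigma, which is the setting where the t distribution applies. The function names and the five-point data set are hypothetical illustrations.

```python
import math
import statistics


def z_stat_proportion(x: int, n: int, p0: float) -> float:
    """z statistic for H_0: p = p0. Under H_0 the standard error
    sqrt(p0 * (1 - p0) / n) is fully determined, so nothing is estimated."""
    return (x / n - p0) / math.sqrt(p0 * (1 - p0) / n)


def t_stat_mean(data: list[float], mu0: float) -> float:
    """One-sample t statistic for H_0: mu = mu0 when sigma is unknown:
    the sample SD s (divisor n - 1) stands in for sigma."""
    n = len(data)
    s = statistics.stdev(data)
    return (statistics.fmean(data) - mu0) / (s / math.sqrt(n))


print(z_stat_proportion(38, 100, 0.3))  # compared with a standard normal
print(t_stat_mean([1, 2, 3, 4, 5], 2))  # compared with t on n - 1 = 4 df (hypothetical data)
```

The first call reproduces z = 1.745743 from jbstatistics' earlier reply. For the hypothetical data the mean is 3 and s = sqrt(2.5), so the t statistic is sqrt(2), about 1.414.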

@yekhtiari 5 years ago

I am confused, as some books use np > 5 or np > 10 for normality. You are using np > 15. Which one is more accurate?

@michaelstassen6551 4 years ago

The key is at 2:55 -- "The sampling distribution is approximately normal if the sample size is large." The larger the sample size, measured by the sizes of np and n(1-p), the better the approximation. The rules "np and n(1-p) at least 5", "np and n(1-p) at least 10", and "np and n(1-p) at least 15" are attempts to answer the question, "What sample size is large enough for the approximation to be 'good'?" 15 is a more conservative answer than 10, which is more conservative than 5.

@cococnk388 2 years ago

Books talk about the population proportion, not the sample proportion. He uses n × (p hat) > 15 when we only have the sample to infer a parameter of the population; np > 5 is for when we know the population proportion p and want to carry out hypothesis testing.
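To see the point that larger np and n(1 - p) give a better approximation, here is a hedged Python sketch. success_failure_ok is a hypothetical helper implementing the "np and n(1 - p) at least threshold" rule for whichever threshold a book uses, and max_approx_error measures the largest gap between the exact binomial CDF and the continuity-corrected normal CDF over all k. The value p = 0.3 is an arbitrary choice for illustration.

```python
import math


def success_failure_ok(n: int, p: float, threshold: float = 15) -> bool:
    """Rule of thumb: both n*p and n*(1 - p) at least `threshold` (books use 5, 10 or 15)."""
    return n * p >= threshold and n * (1 - p) >= threshold


def binom_cdf(n: int, p: float) -> list[float]:
    """[P(X <= 0), ..., P(X <= n)] for X ~ Bin(n, p), with 0 < p < 1."""
    cdf, total = [], 0.0
    for k in range(n + 1):
        total += math.exp(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                          + k * math.log(p) + (n - k) * math.log1p(-p))
        cdf.append(total)
    return cdf


def normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def max_approx_error(n: int, p: float) -> float:
    """Largest |P(X <= k) - Phi((k + 0.5 - np) / sd)| over k = 0..n,
    i.e. the worst-case error of the continuity-corrected normal approximation."""
    mu, sd = n * p, math.sqrt(n * p * (1 - p))
    cdf = binom_cdf(n, p)
    return max(abs(cdf[k] - normal_cdf((k + 0.5 - mu) / sd)) for k in range(n + 1))


p = 0.3  # arbitrary illustrative value
for n in (10, 20, 50, 100, 1000):
    checks = {t: success_failure_ok(n, p, t) for t in (5, 10, 15)}
    print(f"n = {n:4d}, np = {n * p:6.1f}, rule met {checks}, "
          f"max error = {max_approx_error(n, p):.4f}")
```

The worst-case error shrinks steadily as n grows (roughly in proportion to 1/sqrt(np(1 - p))), so the thresholds 5, 10 and 15 are simply different cut-offs on a continuum, with 15 the most conservative.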

@weisanpang755 1 year ago

Hi Professor Jeremy, at ~4 minutes into the video, the Z test statistic was used in calculating the confidence interval for the population proportion. My limited understanding is that if the variance of the sampling distribution cannot be determined exactly (because the population proportion is unknown in this case), then a t distribution should be used in place of the Z distribution, since the variance here is estimated using the sample proportion. Could you kindly point out where my understanding is wrong?

@jbstatistics 1 year ago

This is a common point of confusion, and a very understandable one. It shows an understanding of the earlier inference procedures on means. As you know, when conducting inference procedures for the mean of a normally distributed population, we use the t procedure when the population standard deviation is unknown and is estimated by sample data. The hand-waving argument at the time is that using the sample SD to estimate the true SD introduces more variability and uncertainty, so we need to correct for that by using a distribution that has more area in the tails than the standard normal. That was a hand-waving argument, and a reasonable and knowledgeable person could think the same type of argument applies here. But it's not quite right this time. The t distribution arises from a very specific set of mathematical circumstances. The technical details are that the t distribution arises as the distribution of a standard normal random variable divided by the square root of an independent chi-square random variable over its degrees of freedom. That's a pretty big mouthful, and not really necessary to know in practice. But that situation *does* hold when we're sampling from a normally distributed population and using the sample SD to estimate sigma. It does not hold here. Not as important, but perhaps easier to think about, is that these z methods on proportions require a large sample size for the normality assumption to be reasonable, and if we have that, there wouldn't be much difference between z and t anyway.
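A sketch of the large-sample z interval for a proportion, p hat ± z* sqrt(p hat (1 - p hat)/n) (often called the Wald interval), assuming this is the interval used around the 4-minute mark. Here z* comes from statistics.NormalDist().inv_cdf. The coverage function is an illustrative addition: it computes exactly, from binomial probabilities, how often the interval would capture a true p, showing that the z interval behaves well only when the sample is large.

```python
import math
from statistics import NormalDist


def z_interval(x: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Large-sample interval p_hat +/- z* sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = x / n
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def coverage(n: int, p: float, conf: float = 0.95) -> float:
    """Exact probability that the interval captures the true p, summing
    binomial probabilities over every possible count x (fine for moderate n)."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = z_interval(x, n, conf)
        if lo <= p <= hi:
            total += math.comb(n, x) * p**x * (1 - p) ** (n - x)
    return total


print(z_interval(38, 100))  # 95% interval for 38 successes in 100 trials
print(coverage(10, 0.3))    # small n: well below the nominal 0.95
print(coverage(100, 0.3))   # larger n: close to 0.95
```

For 38 successes in 100 trials the 95% interval is roughly (0.285, 0.475). With a true p of 0.3, the exact coverage is close to 0.95 at n = 100 but only about 0.84 at n = 10.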

@weisanpang755 1 year ago

@jbstatistics Hi Professor Jeremy, thank you so much for your explanation. It seems a good understanding of the mathematical origin of the t distribution is required to properly understand its use cases, but I believe your explanation is as good as it could be without getting into too much technical detail.