Logistic Regression Details Pt 3: R-squared and p-value

Views: 291,961


Comments: 297

@statquest 4 years ago

Correction: The formula at 13:58 should be 2[(LL(saturated) - LL(overall)) - (LL(saturated) - LL(fit))]. I got the terms flipped. Support StatQuest by buying my book, The StatQuest Illustrated Guide to Machine Learning, or a Study Guide or Merch!!! statquest.org/statquest-store/
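
Written out, the corrected expression simplifies because the saturated-model terms cancel. A short derivation, using the same LL(·) terms as the correction above (the subscript notation is just shorthand introduced here):

```latex
\begin{aligned}
2\big[(LL_{\text{saturated}} - LL_{\text{overall}}) - (LL_{\text{saturated}} - LL_{\text{fit}})\big]
  &= 2\big[LL_{\text{saturated}} - LL_{\text{overall}} - LL_{\text{saturated}} + LL_{\text{fit}}\big] \\
  &= 2\big[LL_{\text{fit}} - LL_{\text{overall}}\big]
\end{aligned}
```

This is the same 2(LL(fit) - LL(overall)) quantity that other comments in this thread compare against a chi-square distribution.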

@falaksingla6242 2 years ago

Hi Josh, Love your content. Has helped me to learn a lot & grow. You are doing an awesome work. Please continue to do so. Wanted to support you but unfortunately your Paypal link seems to be dysfunctional. Please update it.

@hayagreevansriram326 4 years ago

4 days on this channel, I've learnt more than 12 weeks' lectures at college. Thank you, Josh!!

@statquest 4 years ago

Awesome!!! I'm glad you're enjoying my videos. :)

@hayagreevansriram326 4 years ago

@@statquest Enjoying them as well as hoping they'll help me ace my Data Mining exam tomorrow 😂

@statquest 4 years ago

@@hayagreevansriram326 Good luck tomorrow and let me know how it goes.

@peasant12345 4 years ago

what do you think about the tuition colleges charge?

@gracel2931 2 years ago

Same 😂

@sharonlee5219 6 years ago

I've been binge-watching many of your videos recently to learn more about stats & RNA-Seq and I just wanted to say thank you for all the work you do! these videos are amazing and have been so incredibly helpful in explaining things :)

@statquest 6 years ago

You’re welcome!!! I’m glad you like my videos so much. I have a lot of fun putting them together. :)

@statquest 6 years ago

ps, I have 3 more videos on logistic regression coming out in July. :)

@rameshbabu2228 2 years ago

Your explanation always unique sir. I completed Masters in Statistics, my brother did PhD in Statistics had explained Logistic Regression theoretically but not satisfied. I have huge confidence on your explanation and hard work so listened got 200 % satisfication. Thank you so much sir

@statquest 2 years ago

Thank you!

@xiaoyuqian5317 2 years ago

Hi, Josh. I started watching your video 3 years ago. At that time, I was a master in bioinformatics, I came across many questions in statistics while doing my research. Your video is clear and instructive, which allows me to put the models mentioned in your video into my research very quickly. It means a lot to me. Now I have already started my career as a PhD candidate in statistical genetics. Your videos have really helped me a lot at an important time in my career, I can't put your name in my journal article, but it deserves a place there, a sincere thank you for the video you uploaded. Wish you happiness every day.

@statquest 2 years ago

Thank you very much!!! I'm so glad that my videos have helped you and good luck with your PhD! BAM! :)

@karakter3 4 years ago

I've been having difficulty going through grad level stats after taking a loong break from academics and found your videos very useful and so much fun, thank you !

@statquest 4 years ago

Thank you! :)

@magtazeum4071 5 years ago

I'm addicted to these intro songs..

@statquest 5 years ago

@vincenttan6303 3 years ago

I always wondered what the interviewers wanted me to say... I didn't know what I didn't know... until this.

@statquest 3 years ago

bam!

@jesscharon9146 4 years ago

Thank you Josh, I’m a PhD student from China, and I’ve never learnt logistic regression before. But this is sooooo good for beginners like us, clear examples, clear explanations, humorous way of talking. I really appreciate you for making these fantastic videos. This gonna help me finish the most difficult quant. data analysis chapter. Thank you so much. Btw the singing at beginning is cute as always XDD

@statquest 4 years ago

Thank you very much! :)

@dainegai 5 years ago

Enjoying going through the logistic regression StatQuestline (i.e. playlist) :D Small nitpick @3:09 -- the horizontal line corresponding to the mean of the data is *not* the "worst" fitting line in a sum-of-squared-residuals sense (you can make some pretty bad-fitting lines if you wanted to ;p ). It's actually "the best-fitting line (in a sum-of-squared-residuals sense) when you're forced to have a slope of zero". (It's the best-fitting model with 1 less **degree of freedom** than the model that includes a potentially non-zero slope.) This corresponds to a flat line "y = (mean of the data)".

@statquest 5 years ago

Very true.
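
The point about the horizontal line at the mean can be checked numerically. The following is a minimal sketch with made-up y values (not data from the video): among flat lines, the one at the mean has the smallest sum of squared residuals, and it is easy to build flat lines that fit far worse.

```python
def ssr_flat(y_values, height):
    """Sum of squared residuals around a horizontal line at the given height."""
    return sum((y - height) ** 2 for y in y_values)

# Hypothetical observations, chosen only for illustration.
y = [1.0, 2.0, 4.0, 7.0]
mean_y = sum(y) / len(y)

# Try a few flat lines near the mean and pick the one with the smallest SSR.
candidates = [mean_y - 1.0, mean_y - 0.1, mean_y, mean_y + 0.1, mean_y + 1.0]
ssr = {height: ssr_flat(y, height) for height in candidates}
best_height = min(ssr, key=ssr.get)

print("mean:", mean_y, "best flat line:", best_height)
print("SSR at mean:", ssr[mean_y], "SSR at y = 0:", ssr_flat(y, 0.0))
```

So SS(mean) is better read as a baseline (the best fit available without any x-axis information) than as the worst possible line.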

@cezarystorczyk1722 10 months ago

Thank you.

@statquest 10 months ago

Thank you very much for supporting StatQuest!!! TRIPLE BAM!!! :)

@bolajiadedasola6369 3 months ago

You are the best in teaching

@statquest 3 months ago

Thank you!

@alvaroaguado3 6 years ago

Awesome vids!! I don’t miss a statquest

@statquest 6 years ago

Thank you! :)

@jessicatan278 6 years ago

why is it 0.55 and not 0.56? at min 6:47

@statquest 6 years ago

Ooops. I didn't do a good job rounding! The true value is 0.55555555....repeating, which rounds to 0.56. However, I messed up on the next slide and just put 0.55. Sorry for the confusion.

@carloscamargo566 4 years ago

I'm watching your videos from Colombia and it's amazing how trivial has become distance and money to get access to extremely good quality knowledge , I really appreciate the work you put on your videos it have really helped me a lot on improving my Statistical analysis skills , thank you!

@statquest 4 years ago

Hooray!!! I'm so glad you can watch and learn from my videos. I'm very passionate about helping everyone learn.

@TheImpulsiveRamble 4 months ago

Hi Josh, I don't know if you're still monitoring comments, but let me begin by thanking you for putting together these videos. As someone who didn't enjoy math and stat back when I was a student, it's refreshing to have someone provide such clear and concise explanations of the intuition behind concepts instead of getting muddied up in abstractions and notations. I have a few clarifying questions regarding the interpretation of the p-value of the McFadden's R-squared described in 11:55 of this video and the p-value of the coefficients described in 10:41 of the Pt1: Coefficients video. Is it appropriate to think of these as being analogous to the f-test and t-test in linear regression, respectively (i.e., the first tests the significance of the overall model whereas the second tests the significance of a single coefficient)? If so, just as the f-test can find that coefficients are significant jointly while the t-test can fail to find that coefficients are significant individually, can a similar situation occur with the aforementioned p-values in a logistic regression context? Thanks in advance for your reply.

@statquest 4 months ago

Yes and presumably. At least to me, it seems reasonable that you could have a model with a lot of parameters, where each parameter only contributes a tiny amount to the overall fit - so in the big picture, you have a predictive model, but the individual parameters don't have much of an effect.

@JohnWick-ls7yt 3 years ago

You are the best musistician in the world!

@statquest 3 years ago

Triple bam! :)

@margotalalicenciatura1376 5 years ago

First of all a million thanks for your work man! It's really outstanding and almost infuriating to think how bad teachers are most of the people in stats by contrast. Got two questions: first, you say we can't use least squares since in the log odds scale the residuals are infinite, couldn't we just use them in the probability scale with the squiggly line? Second, are you planning in eventually doing a MCMC StatQuest? That'd be reaaaaally handy. Thankss

@NazaninYari 2 years ago

You are a GENIUS. Hats off to you!

@statquest 2 years ago

Thank you!

@omercoskun6042 1 year ago

I wonder why you mentioned SS(mean) as the worst fitting line. Clearly, there are worse lines that we can fit. I always thought SS(mean) as a base value, the line that minimizes the sum of squares if we only had y values and no x values (no input). By the way, loving your lectures, they are all clearly explained and super helpful!

@statquest 1 year ago

The mean of the thing we want to predict is thought of as the worst fitting line because that is what we would fit if we had nothing to predict (no x-axis value).

@russelllavery2281 1 year ago

this series is great! Thanks.

@statquest 1 year ago

Glad you enjoy it!

@soya1226 4 years ago

this is extremely well explained!!much appreciated!

@statquest 4 years ago

Thank you! :)

@casperhansen3012 5 years ago

Hey Josh, I was wondering about the projecting of points at negative or positive infinity onto the candidate line, or just any line in general. You just say that we project the data onto the line at 5:57. But how does the math work?

@エリアル-d7x 5 years ago

Here is what I think: there are 5 obese mice and 4 that are not obese, 9 mice in total. Without considering weight, the probability of a mouse being obese is 5/9 = 0.56. If we map that probability (5/9) to the figure on the right, we get log(0.56 / (1 - 0.56)) = log(5/4) = 0.22.
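
The arithmetic in this comment can be reproduced in a few lines. This is a small sketch using only the counts quoted above (5 obese, 4 not obese); the variable names are illustrative.

```python
import math

n_obese, n_not_obese = 5, 4

# Overall probability of obesity, ignoring weight.
p_overall = n_obese / (n_obese + n_not_obese)

# Probability -> log(odds); note the parentheses around (1 - p).
log_odds = math.log(p_overall / (1 - p_overall))

# log(odds) -> probability, to confirm the round trip.
p_back = math.exp(log_odds) / (1 + math.exp(log_odds))

print(round(p_overall, 2), round(log_odds, 2), round(p_back, 2))
```

The log(odds) equals log(5/4) because the 9 in the denominators cancels: (5/9) / (4/9) = 5/4.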

@ToloSanso-dg3po 1 month ago

I think your channel is the best in stats! I have a question about this video. At min 9:73, how can you project the data onto the candidate line? The line is so vertical that I can't see how you can do that projection in order to get the log(1) and the log(0) in the log(likelihood) at min 10:03. Thank you

@statquest 1 month ago

The line is near, but not quite vertical. If we had a much larger computer screen, we would see that the line has y-axis coordinates that correspond to the x-axis values for the data. We can solve for those y-axis coordinates by multiplying the x-axis values by 22.42 and adding the intercept -63.72.
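
As a rough sketch of what "projecting onto the line" means here: plug each x-axis value into the line to get a log(odds), then convert that to a probability. The slope 22.42 and intercept -63.72 come from the reply above; the weights below are hypothetical, since the actual data values are not given in this thread.

```python
import math

INTERCEPT = -63.72  # from the reply above
SLOPE = 22.42       # from the reply above

def log_odds_on_line(weight):
    """y-axis coordinate on the candidate line for a given x-axis value."""
    return INTERCEPT + SLOPE * weight

def probability(log_odds):
    """Convert log(odds) back to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical weights, chosen only to show the behaviour of a steep line.
for weight in [2.0, 2.8, 3.5]:
    lo = log_odds_on_line(weight)
    print(weight, round(lo, 2), probability(lo))
```

With a line this steep, most points land at very large positive or negative log(odds), so their probabilities are extremely close to 1 or 0, which is why the log(likelihood) terms look like log(1) and log(0).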

@ToloSanso-dg3po 17 days ago

@@statquest I understand! Thank you very much!

@StephenRoseDuo 6 years ago

Now I can't wait for the deviance videos!

@statquest 6 years ago

I've got the slides all done for it - so it's ready to go. The bummer is that I'm traveling a lot in the next two weeks so it won't be out for a while... unless I can somehow make it happen this Friday.... I'll see what I can do.

@miguelangelpastorvalverde9196 3 years ago

Thank you very much Josh for clarifying my doubts. I am doing a logistic regression, and I have 2 questions 1) Why do I get a significant p- Value and I get an r-square of 2 percent for a specific independent variable? If I get a r-square of 2 percent, I should get a pvalue greater than 0.05 (not significant)? 2) How valid that probability equation will serve me? Look residual ?

@statquest 3 years ago

You can have a terrible R-squared value and still have a small p-value if you have a lot of data. However, if the R^2 value is bad, then, even with a significant p-value, your model may not be worth very much.

@miguelangelpastorvalverde9196 3 years ago

I really appreciate the time you take to answer questions !! Thanks, I already have it clearer

@lprasai 2 years ago

Who liked the way he says StatQueeest!

@statquest 2 years ago

bam!

@abiyosopurnomosakti1994 5 years ago

What a prolific teaching Josh! Enjoy your song as well! :)

@statquest 5 years ago

Thank you! :)

@manikdhingra1606 5 years ago

Hello Josh, again much thanks for the video. QQ- @13:27 how did you calculate the p-value using formula [ 2*(LL(fit) - LL(overall Probability))]? I've already watched P-value video but unable to figure out. Don't know what I am missing. Thanks in advance!

@jhfoleiss 5 years ago

Hi! I think Josh would give you a much better explanation, but I'll try :)
Chi-square distributions come in different degrees of freedom. In this example, the degrees of freedom is 1: the logistic regression has 2 parameters (y-intercept and slope) and the overall probability has 1 parameter (a y-intercept, just a horizontal line), thus 2-1=1. So you need to use the Chi-square distribution with 1 degree of freedom.
*The p-value is given by the area under the 1-DoF chi-square distribution (integral) from [ 2*(LL(fit) - LL(overall Probability))] to infinity!*
In the first example: by definition, the area under a statistical distribution curve is always 1. Since [ 2*(LL(fit) - LL(overall Probability))] = 0, the integral is over the entire distribution (the chi-square support (domain) is from 0 to +infinity), thus 1. Therefore, the p-value = 1.
In the second example: [ 2*(LL(fit) - LL(overall Probability))] = 4.82. The integral of the 1-DoF chi-square distribution from 4.82 to +infinity is 0.03. Thus, the p-value = 0.03, which is statistically significant in most situations, since it is less than 0.05.
Hope this helps!
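
The two p-values described above can be reproduced with the standard library alone. This is a minimal sketch that relies on a known identity for the 1-degree-of-freedom case: the right-tail area of the chi-square distribution at x equals erfc(sqrt(x / 2)). The function name is made up for this example.

```python
import math

def chi2_right_tail_1dof(statistic):
    """Area under the 1-DoF chi-square curve from `statistic` to infinity."""
    return math.erfc(math.sqrt(statistic / 2.0))

# First example: 2*(LL(fit) - LL(overall probability)) = 0.
p_no_relationship = chi2_right_tail_1dof(0.0)

# Second example: 2*(LL(fit) - LL(overall probability)) = 4.82.
p_with_weight = chi2_right_tail_1dof(4.82)

print(p_no_relationship, round(p_with_weight, 2))
```

For models with more than one degree of freedom this shortcut no longer applies, and a statistics library's chi-square survival function would be the usual choice.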

@mortezamohammadi9963 1 year ago

The formula to calculate the p-value from the test statistic in logistic regression is based on the principles of hypothesis testing and the properties of the standard normal distribution. Here's a step-by-step explanation of how the formula is derived:
1. **Null Hypothesis and Test Statistic**: In hypothesis testing, you start with a null hypothesis (\(H_0\)) that assumes no effect (e.g., the coefficient is zero). The test statistic \(z\) is calculated to measure how far the estimated coefficient (\(\hat{\beta}\)) is from the null hypothesis value (usually zero). The formula for the test statistic is: \[ z = \frac{\hat{\beta}}{SE(\hat{\beta})} \]
2. **Standard Normal Distribution**: Under the null hypothesis, the test statistic \(z\) follows a standard normal distribution (\(N(0, 1)\)). This is a fundamental property of hypothesis testing.
3. **Two-Tailed Test**: Since you're interested in whether the coefficient is significantly different from zero (two-tailed test), you want to calculate the probability of observing a test statistic as extreme as \(z\) in either tail of the standard normal distribution.
4. **Cumulative Distribution Function (CDF)**: The cumulative distribution function (\(\Phi(z)\)) of the standard normal distribution gives you the probability that a standard normal random variable is less than or equal to \(z\). In mathematical notation: \(\Phi(z) = P(Z \leq z)\).
5. **Probability Calculation**: The p-value is the probability of observing a test statistic as extreme as \(z\) in both tails of the distribution. Since the standard normal distribution is symmetric, you can calculate the probability of observing a test statistic as extreme as \(z\) in one tail and then multiply it by 2 to account for both tails: \[ p = 2 \cdot (1 - \Phi(|z|)) \] Here, \(|z|\) ensures that the value inside the cumulative distribution function is positive.
In summary, the formula \(p = 2 \cdot (1 - \Phi(|z|))\) calculates the p-value by determining the probability of observing a test statistic as extreme as \(z\) in both tails of the standard normal distribution. If this probability is small (i.e., the p-value is small), you have evidence to reject the null hypothesis and conclude that the coefficient is statistically significant.

@ruxiz2007 4 years ago

This is great great explanation, thanks!

@statquest 4 years ago

Thanks!

@almonddonut1818 3 years ago

Thank you so much for your videos!

@statquest 3 years ago

Glad you like them!

@Felicidade101 6 years ago

Amazing Thank you Josh!

@statquest 6 years ago

You’re welcome! I’m glad you like the videos! I have 3 more on Logistic Regression coming out in July. :)

@phongapex3741 1 year ago

Hello! At the 8:24, you can determine the maximum likelihood with the intercept of -0.22. How can you know that? Which line do we have first? squiggle line OR straight line? I do not actually understand that at the beginning, we already had a squiggle line, then found p values of points to calculate log(odds) in order to get the straight line of the log(odds) graph. How did we have that squiggle line at the beginning? OR, we already had a straight line, then projected points to find the log(odds) values, next, calculated the p values in order to have the squiggle line. How did we have that straight line at the beginning? I AM STILL CONFUSED ...

@statquest 1 year ago

To learn more about how we fit lines and squiggles to data in logistic regression, see: kzbin.info/www/bejne/eJeukqGiZsaGfZI

@saltedfish_is_good 6 months ago

I am finally clear. Time for relu logistic model

@statquest 6 months ago

bam! :)

@jiayoongchong2606 4 years ago

13:56 out in the wild R squared value commonly written as

@ivanrecalde8543 4 years ago

Incredible! Greetings from Argentina

@statquest 4 years ago

Thank you!!! :)

@UncleLoren 4 years ago

So we took log(5/4) = .22, plugged it into the (e/1+e) equation and got .56, which we could have gotten from 5/9, proving there are two ways to come up with the same number, with one inducing a migraine. That's OK; I got it. Then, for some reason you plugged .55 into an equation -- not .56 -- and later used a NEGATIVE .22 to arrive at something that resulted in .45, the complement of .55...which you adjust to .44. WHY the .01 adjustment?? THROW ME A BONE, BRO!!! PLEASE. ****Update****: I just noticed in the "proof" portion of video that you changed the ratio of obesity from 5/4 to 4/5 which explains how #s got turned upside down. You just HAD to pick something strikingly similar to the previous example to confuse me, right? But why, Josh? If your videos make 99.999% of the people viewing them smarter and one person ends up smashing themselves in the head with a hammer, can you see how this might be a problem? It reminds me of the class imbalance problem. For a certain audience, your videos are excellent, you're a saint for creating them and it's unfortunate that I am an imbecile. Thank you for reading. (Only joking. I am getting smarter, just gotta stick with it. Thanks a million.)

@jodischmodi 3 years ago

you're better than my prof

@statquest 3 years ago

BAM! :)

@tysonliu2833 11 months ago

so essentially, with a model where weight is a very poor predictor of obesity, the best line that we can find will be as poor as LL(overall probability), therefore R2 is 0; otherwise, with a perfect predictor, LL(fit) is dramatically different from LL(overall probability), so that R2 is 1

@statquest 11 months ago

yep
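
The two extremes described above can be sketched numerically. This is an illustration, not the video's code: it assumes McFadden's R-squared in the form (LL(overall) - LL(fit)) / LL(overall), uses the 5 obese / 4 not obese counts mentioned elsewhere in this thread for LL(overall), and backs out LL(fit) from the 2*(LL(fit) - LL(overall)) = 4.82 value quoted in another comment.

```python
import math

def mcfadden_r2(ll_overall, ll_fit):
    """McFadden's pseudo R-squared from two log-likelihoods."""
    return (ll_overall - ll_fit) / ll_overall

# LL(overall): every mouse gets the same probability of obesity, 5/9.
ll_overall = 5 * math.log(5 / 9) + 4 * math.log(4 / 9)

# LL(fit) implied by the quoted statistic 2*(LL(fit) - LL(overall)) = 4.82.
ll_fit = ll_overall + 4.82 / 2

print(round(mcfadden_r2(ll_overall, ll_fit), 2))      # weight helps somewhat
print(mcfadden_r2(ll_overall, ll_overall))            # weight is useless: 0.0
print(mcfadden_r2(ll_overall, 0.0))                   # perfect fit, LL(fit) = 0: 1.0
```

A perfect fit gives every point a likelihood of 1, so LL(fit) = log(1) summed over the data = 0, which is what pushes the ratio to 1.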

@ml6352 5 years ago

Hi Josh, really good explanations :) I have seen already all the logistic regression series. Just one question: I would assume that the Part 1 [Coefficients] is basically the last part occurring when performing a logistic regression, right? I mean the algorithm will first optimize the squiggly line to the best fit(Part 2) , then evaluate for the significance (Part 3) . Finally the results can be seen by interpreting the coefficients (Part 1) which are given in terms of log(Odds). I hope you understand my question :) Thanks in advance and happy holidays. Marcelo

@statquest 5 years ago

You are correct. The reason I organized the videos the way I did was to follow the output that R gives you when you do Logistic Regression. The first thing it prints out is the coefficients, and the last thing it prints out is the R-squared. So I was just going from the top and working my way down the output.

@ml6352 5 years ago

@@statquest Thank you 😊. Best regards from Germany

@statquest 5 years ago

@@ml6352 Thanks! :)

@bhargavpotluri5147 4 years ago

I found out your channel 2 days back. Since then, my learning curve is going towards infinity (Original axis & not on the log odds axis :P). superb videos & content. Thanks a lot MAN !! Also one more suggestion, can you also include the cost function of the respective model so that it is 100% complete.

@statquest 4 years ago

Awesome! I'm glad you like my videos! :)

@bhargavpotluri5147 4 years ago

@@statquest Hi Josh, Can you please come up with Image processing algorithms or NN models as well

@statquest 4 years ago

@@bhargavpotluri5147 I'm working on the NN videos.

@bhargavpotluri5147 4 years ago

@@statquest Wow, Thanks Josh :)

@mriduls95 4 years ago

but what are the 2 groups of values on which we perform the chi square in the end? As chi square is performed on groups

@statquest 4 years ago

In this case we are using a Chi-Square distribution to determine a p-value, but we are not performing a standard Chi-Squared test. This is similar to how a z-test is based on the normal distribution, but the normal distribution is used for a lot more things than just the z-test.

@tallwaters9708 6 years ago

Nice stuff as always! If you're still taking video ideas I'd love to see some stuff on Bayesian models, monte carlo, markov chains :)

@statquest 6 years ago

Those are all on the to-do list... I'll get to them one day! I hope that day is soon! :)

@construenist6966 3 years ago

Very useful content 🔥

@statquest 3 years ago

Thank you! :)

@adenuristiqomah984 4 years ago

I am currently on your Machine Learning playlist, Josh. Keep up the good work

@statquest 4 years ago

Thanks, will do!

@rishavdhariwal4782 1 year ago

Hi Josh, I don't know if you will see this, but I had a question: how does one know which distribution to compare against to determine the p-values? In the video at 12:01 you said that the metric follows a chi-squared distribution, but how does one get the intuition for when to use which distribution to get the corresponding p-value of the metric?

@statquest 1 year ago

We can use theory to derive the distribution. This is pretty advanced stuff (I did it once a long time ago), so usually we just look it up when needed rather than derive it from scratch.

@rishavdhariwal4782 1 year ago

Thanks for the reply Josh, Can you give an example of the keywords we may use to lookup the corresponding distribution? Like i know for testing the coefficients of a linear regression model we use the T-test, but in time-series data, we use the ADF test for checking stationarity. Here the value for the T statistic of a coefficient is to see if it is higher than a certain threshold and based on that we reject or fail to reject the hypothesis. The problem is the threshold that is set here is higher than the one you get if you test it with a normal T-test(I don't know the exact distribution but it follows another distribution). So how may i go about finding the distribution for testing the statistic in the above case? @@statquest

@statquest 1 year ago

@@rishavdhariwal4782 To be honest, I'm not sure I understand your question. However, if you are interested in why these specific statistics have a chi-squared distribution, you can look at how Mcfadden's R-squared is derived.

@nataliakos4932 3 years ago

I watch this series with such commitment as if I were watching a good Netflix series. Just can't stop.

@statquest 3 years ago

bam! :)

@LakshyaIIITD 1 year ago

3:09 I think the worst fitting line is perpendicular to the best fitting line

@statquest 1 year ago

You are correct - I should have been a little more careful with my words at that point.

@desmondturner5435 3 years ago

Thank you for the help! This series is amazing. at 12:31 would the degrees of freedom for 2 independent variables be 2? and for 3, 3, etc?

@statquest 3 years ago

I believe that is correct.

@deuteros 3 years ago

Josh, I have read that pseudo R2 is not a good metric to compare models which predict the same variable through different covariates (different models built from individual covariates, y ~ x1, y ~ x2, y ~ x3, etc..). What is, in your opinion, the best way to do this comparison?

@statquest 3 years ago

You can also use a confusion matrix and associated metrics (like sensitivity and specificity and ROC). For details, see: kzbin.info/www/bejne/gZXWoWmppNZ0bdE kzbin.info/www/bejne/rIGTZ5SDpN9nrJo kzbin.info/www/bejne/apu1c4V6l6-Yo68

@rabbitazteca23 2 years ago

Can we also use the maximum likelihood instead of its log version for calculating R^2

@statquest 2 years ago

Maybe! I don't know off the top of my head. However, the log is often used to avoid underflow errors, so if you don't have too much data, it might work without the log.
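
The underflow concern is easy to demonstrate. In this sketch the 200 per-point likelihoods of 0.01 are invented purely to force the problem: their product is smaller than the smallest positive double, so it collapses to 0.0, while the sum of logs stays a perfectly ordinary number.

```python
import math

# Hypothetical per-point likelihoods: 200 observations, each with likelihood 0.01.
likelihoods = [0.01] * 200

# Multiplying raw likelihoods: the true value is 1e-400, below the double-precision range.
likelihood = math.prod(likelihoods)

# Summing logs instead: about 200 * log(0.01), which is easy to represent.
log_likelihood = sum(math.log(p) for p in likelihoods)

print(likelihood)       # underflows to 0.0
print(log_likelihood)   # a finite negative number
```

Once the product has underflowed to zero, any ratio built from it is meaningless, which is one practical reason to stay on the log scale.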

@hang1445 3 years ago

13:40 Hello Josh, thanks for making this useful video list so that I can learn machine learning rather than studying in uni. And I would like to clarify sth. The logistic model you have built has a p-value of 0.03, does it indicate that there is a strong relationship between weight and obesity? Just like what you have said in the video, it is not due to chance. For the R^2 value, 0.39, does it indicate that the model is not good enough? We may need to add more parameters other than weight to classify whether the mice are obese or not. Hope you can correct me if I get sth wrong, thanks 😁

@statquest 3 years ago

The p-value only tells us if the relationship is significantly different from random noise. The r-squared value tells us the strength of the relationship. How "strong" is "strong" depends on the field or area being studied.

@hang1445 3 years ago

So the relationship is significantly different from random noise as the p value is so small. Here, I have one thing to ask, what is random noise? Though, the relationship is significantly different from random noise, the strength of the relationship is not quite good as we obtain only 0.39. Do I interpret correct?

@statquest 3 years ago

@@hang1445 Random Noise is just "random stuff", things that are not related. And if the p-value small, then you can conclude that your relationship is significantly different from random stuff that is not related (and that suggests it represents a true relationship). As for the R-squared value. Depending on the field, 0.39 may be considered a "weak" relationship, other fields might consider it "strong". It depends on the type of data you are working with.

@hang1445 3 years ago

Well explained! Thanks ：）

@SS-ve1jm 2 years ago

Amazing content please continue to upload videos always and grow this channel🎉 Triple BAM🎉

@statquest 2 years ago

Thank you! :)

@thomasamet5853 3 years ago

Great explanations !!! At 11:06, is it the log( likelihood of the data given the line) or the log(likelihood having this squiggly line given the data)?

@statquest 3 years ago

I believe it is the log( likelihood of the data given the line)

@thomasamet5853 3 years ago

@@statquest Thank you for the answer. I thought we were trying to find optimum parameters of the linear equation which would yield in the best sigmoid. Thus finding the MLE of the sigmoid (hence parameters) given the data. I'll watch your video on the MLE again then. I am still confused with the difference between the two.

@statquest 3 years ago

@@thomasamet5853 Regardless of how you phrase it, the likelihoods are the y-axis coordinates on the squiggle for each data point.
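
One way to make this concrete is the sketch below. It assumes the usual convention that a point labelled obese contributes the squiggle's height p at its weight, while a point labelled not obese contributes 1 - p; the probabilities and labels are hypothetical, not values from the video.

```python
import math

def log_likelihood(probabilities, labels):
    """Sum of log-likelihoods: log(p) for label 1, log(1 - p) for label 0."""
    total = 0.0
    for p, label in zip(probabilities, labels):
        total += math.log(p) if label == 1 else math.log(1.0 - p)
    return total

# Hypothetical squiggle heights at four data points, with their observed labels.
squiggle_heights = [0.1, 0.3, 0.8, 0.9]
is_obese = [0, 0, 1, 1]

print(log_likelihood(squiggle_heights, is_obese))
```

Changing the line in log(odds) space changes the squiggle heights, and the fitted line is the one that makes this sum as large as possible.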

@thomasamet5853 3 years ago

That helps a lot. Thank you again for taking the time to answer and for the amazing content :)

@willychen6967 4 years ago

Hi Josh, I really enjoy these videos. Can you possibly do one that relates extreme value theory ( I'm thinking of T1EV) to the logit function?

@rrrprogram8667 6 years ago

Here it comess.... Great teaching josh... Thanks for all ur efforts...

@statquest 6 years ago

You are welcome!!! I'm always so happy to hear how much you like the videos! :)

@rrrprogram8667 6 years ago

StatQuest with Josh Starmer this is awesome channel for machine learning... Hope next exercise is in R

@statquest 6 years ago

I've got one more video, on the saturated model and deviance statistics, and then we put everything together with "Logistic Regression in R".

@rrrprogram8667 6 years ago

StatQuest with Josh Starmer woowwww.... We love statquest videos

@elrishiilustrado9592 3 years ago

It's very clear, thank you ! so the number of degrees of freedom its equal to the number of Xi variables? in this case we have a y variable and only 1 x variable, so we have only 1degree of freedom, but if we have 3 xi variables the degrees of freedom would be 3? bonus question : how do you compare logistic models ? how can i choose the best ? Thanks !

@statquest 3 years ago

The degrees of freedom is the difference in the number of parameters between the fitted model and the overall probability (which typically only has 1 parameter). So if the fitted model has 3 parameters, then DF = 3 - 1 = 2. People often use the Akaike information criterion (AIC) to choose the best model. For details, see: en.wikipedia.org/wiki/Akaike_information_criterion
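
Both ideas in this reply fit in a few lines. This is a hedged sketch: the helper names are invented, the AIC formula used is the standard 2k - 2·LL, and the log-likelihoods are rounded values implied by numbers quoted elsewhere in this thread (5 obese and 4 not obese mice, and a statistic of 4.82), so treat them as illustrative.

```python
def degrees_of_freedom(n_params_fit, n_params_overall=1):
    """Difference in parameter counts between the fitted model and the overall probability."""
    return n_params_fit - n_params_overall

def aic(log_likelihood, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood

# Intercept + slope for weight versus intercept only.
print(degrees_of_freedom(2))  # 1
# A fitted model with 3 parameters, as in the reply above.
print(degrees_of_freedom(3))  # 2

# Illustrative log-likelihoods (rounded; see the lead-in for where they come from).
ll_overall, ll_fit = -6.18, -3.77
aic_overall = aic(ll_overall, n_params=1)
aic_fit = aic(ll_fit, n_params=2)
print(aic_overall, aic_fit)
```

Here the model with weight has the lower AIC even after paying the penalty for its extra parameter, so AIC would prefer it.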

@BeginnerVille 1 year ago

If directly project the data into the S shape logistice regression, wouldn't can get same image as 5:04? Don't get why need to do these.

@statquest 1 year ago

I'm not sure I understand your question, can you rephrase it?

@BeginnerVille 1 year ago

@@statquest Sorry, I mean original data distrubute on continuously x and binary y(0,1) But with the S shape logistice regression, it's intuition to direct project the y(0,1) on the regression line to get y values(0.01,0.5 0.99) directly. (Same as input x and get the y from regression line.) Why I must turn into log ,turn back into p, then get the same graph as what I mention to calculate LL()? Thanks for your amazing visualized teaching~

@statquest 1 year ago

@@BeginnerVille Have you watched my video on how the 's' shape is fit to the data to begin with? kzbin.info/www/bejne/eJeukqGiZsaGfZI The answer you want may be there. Anyway, the reason we start out in the log(odds) space to begin with is that the "best fitting" line is linear with respect to the coefficients, and thus, we can easily optimize it. In contrast, we can't optimize the 's' shape squiggle directly. Thus, we start with a straight line (or linear function) in log(odds) space and then translate it to the 's' shape fit in probability space. We can then evaluate how well the 's' fits the data by calculating the log(likelihoods). We then use those log(likelihoods) to compare the fit to alternatives.

@BeginnerVille 1 year ago

@@statquest Thanks! I finally get the working logic. Would you mind explaining more about why you said "In contrast, we can't optimize the 's' shape squiggle directly"? As I understand it, the sigmoid function can use some coefficients like c1,c2. As: 1/(1+e**(c1*(x-c0))) By changing these two coefficients and projecting y onto the sigmoid line, can't I directly optimize the shape with the same maximum likelihood? What's the limit of this approach? Thank you for your thoughtful assistance.

@statquest 1 year ago

@@BeginnerVille First, the equation for the sigmoid is non-linear with respect to c1 and c2 because they are in the exponent for 'e'. This means we need to use a non-linear, or numerical technique (like gradient descent kzbin.info/www/bejne/qXXZZZlqqJeGeJo ) to find the optimal values for c1 and c2. And I believe that part of the problem with using the sigmoid equation is that the output values are restricted to be between 0 and 1, instead of -infinity and +infinity, and this makes the math for optimization much more complicated. In contrast, in log(odds) space, the output values can be any value between -infinity and +infinity, so standard numerical techniques can be easily used.
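
As a rough illustration of the numerical approach mentioned above, the sketch below fits an intercept and slope in log(odds) space by gradient ascent on the log-likelihood. Everything here is an assumption for illustration: the data are invented, gradient ascent is just one possible numerical technique (statistical software often uses other methods), and the learning rate and step count are arbitrary choices that happen to work for this small example.

```python
import math

def sigmoid(log_odds):
    return 1.0 / (1.0 + math.exp(-log_odds))

def log_likelihood(xs, ys, intercept, slope):
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(intercept + slope * x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

def fit_logistic(xs, ys, learning_rate=0.1, n_steps=5000):
    """Gradient ascent on the mean log-likelihood, starting from a flat line."""
    intercept, slope = 0.0, 0.0
    n = len(xs)
    for _ in range(n_steps):
        grad_intercept, grad_slope = 0.0, 0.0
        for x, y in zip(xs, ys):
            error = y - sigmoid(intercept + slope * x)  # observed minus predicted
            grad_intercept += error
            grad_slope += error * x
        intercept += learning_rate * grad_intercept / n
        slope += learning_rate * grad_slope / n
    return intercept, slope

# Hypothetical weights and obesity labels (classes overlap, so the fit is finite).
weights = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
obese = [0, 0, 0, 1, 0, 1, 1, 1]

# Centering the weights keeps the optimization well behaved.
mean_weight = sum(weights) / len(weights)
centered = [w - mean_weight for w in weights]

intercept, slope = fit_logistic(centered, obese)
ll_start = log_likelihood(centered, obese, 0.0, 0.0)
ll_fit = log_likelihood(centered, obese, intercept, slope)
print(intercept, slope, ll_start, ll_fit)
```

The parameters being adjusted are the straight line's intercept and slope; the squiggle is only used to score each candidate line through its log-likelihood.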

@marcobarreto5429 4 years ago

In the case of comparing a Ridge vs a Logistic model would R^2 be a good approach?

@statquest 4 years ago

You would probably compare accuracy or some other metric used for classification.

@sajozsattila 2 years ago

I have a question about the p-value. The value 2(LL(fit)-LL(overall)) is a single point on the x-axis, so the chi-square density f( 2(LL(fit)-LL(overall)) ) just gives us the height of the curve at that single value. In your example f_{\chi^2}(4.82) \approx 0.0163. So to get the actual p-value we need to use: 1 - F_{\chi^2}( 2(LL(fit)-LL(overall)) ), which is the area of the right tail where x > 2(LL(fit)-LL(overall)). In your example, the actual p-value is approx 0.0281. Am I right?

@statquest 2 years ago

That seems correct. I rounded the value to 0.03.
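
Both numbers in this exchange can be checked with closed forms for the 1-degree-of-freedom chi-square distribution. A small sketch, with function names invented for this example: the density is exp(-x/2) / sqrt(2πx) and the right-tail area is erfc(sqrt(x/2)).

```python
import math

def chi2_density_1dof(x):
    """Height of the 1-DoF chi-square curve at x."""
    return math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi * x)

def chi2_right_tail_1dof(x):
    """Area under the 1-DoF chi-square curve to the right of x (the p-value)."""
    return math.erfc(math.sqrt(x / 2.0))

statistic = 4.82
density = chi2_density_1dof(statistic)
p_value = chi2_right_tail_1dof(statistic)

print(round(density, 4), round(p_value, 4))
```

The density is a height, not a probability; the p-value is the tail area, which is why it is the second number that rounds to the 0.03 quoted in the video.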

@annillonaa 4 years ago

amazing!!! So helpful !! the song makes it ever greater!!! Thank u!!

@statquest 4 years ago

Thanks! :)

@xinzhaotong6531 6 months ago

Hi Josh, at 11:39, the arrangement of the red and blue dots on p = 0.44 of the left figure seems incorrect. They should be positioned as follows from left to right: three red dots, two blue dots, one red dot, and three blue dots, as depicted in the figure on the right. This mistake should not impact the overall probability results of LL. Please correct me if I'm wrong. Thank you.

@statquest 6 ай бұрын

The ordering of the red and blue dots in the left figure at 11:39 is based on the ordering that is introduced at 7:44, when weight has no relationship with obesity.

@murselmusabasic4260 4 жыл бұрын

What does it mean to project data onto the fit line? Thanks for great lessons!

@statquest 4 жыл бұрын

Plug the x-axis coordinate for the data into the equation for the line to find the corresponding y-axis coordinate on the line.
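As a minimal sketch of that projection, assuming a hypothetical intercept and slope (the numbers below are made up for illustration):

```python
def project_onto_line(x, intercept, slope):
    # Keep the x-axis coordinate and read the y-axis coordinate off the line.
    return x, intercept + slope * x

intercept, slope = -3.0, 0.5  # hypothetical line, not from the video
points = [project_onto_line(x, intercept, slope) for x in [2.0, 6.0, 10.0]]
print(points)
```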

@PunmasterSTP 7 ай бұрын

LL Cool J? More like LL "StatQuest is here to stay!" 👍

@statquest 7 ай бұрын

This is your best yet.

@PunmasterSTP 7 ай бұрын

@@statquest Thank you! If you ever want to hear a pun on a particular topic, just let me know.

@jaegermeistersfriend 3 жыл бұрын

you are single-handedly saving my bachelor's thesis! I could not make sense of anything about logreg in text books. Thank you!

@statquest 3 жыл бұрын

Good luck! :)

@jaegermeistersfriend 3 жыл бұрын

@@statquest Thanks! (: and while we're at it, can I ask what program you use to make your graphics?

@statquest 3 жыл бұрын

@@jaegermeistersfriend I draw most things by hand in Keynote. Other graphs are created in R.

@Mona-so9ss 6 жыл бұрын

what if we have a discrete variable instead of weight? how do we find the best fit then? also would love to see a video on multiple logistic regression!!

@statquest 6 жыл бұрын

This is a good question! I talk about this in "Part 1" and "Part 2" of this series: kzbin.info/www/bejne/rH-YlIGEZ5J7jac and kzbin.info/www/bejne/eJeukqGiZsaGfZI

@statquest 6 жыл бұрын

Also, once you understand how parameters are estimated for Logistic Regression, it's easy to see that it works just like regular multiple regression when you have more variables predicting whatever it is you're predicting.

@Mona-so9ss 6 жыл бұрын

Thanks! one more (stupid) question. When you convert the probability of obesity to log odds of obesity, the x axis- weight is also converted to log weight? If not then what is the x axis in log odds graph?

@statquest 6 жыл бұрын

Not a stupid question at all. The x-axis stays the same. The parameter (slope) tells you that for every one unit of weight (the x-axis in the original units), you increase (or decrease, depending on the sign of the slope) the log(odds) of obesity (you either go up or down along the y-axis, which is now in log(odds) units).
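A small sketch of that interpretation, assuming a hypothetical intercept and slope: weight stays in its original units, each extra unit of weight shifts the log(odds) by exactly the slope, and the probability follows from the log(odds).

```python
import math

intercept = -3.48  # hypothetical value, for illustration only
slope = 1.83       # hypothetical value, for illustration only

def log_odds(weight):
    # A straight line in log(odds) space; weight is in its original units.
    return intercept + slope * weight

def probability(weight):
    # Convert log(odds) back to a probability with the sigmoid function.
    return 1.0 / (1.0 + math.exp(-log_odds(weight)))

for weight in [1.0, 2.0, 3.0]:
    print(weight, round(log_odds(weight), 2), round(probability(weight), 3))
```

The step in log(odds) between consecutive weights is constant, while the step in probability is not.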

@foreverpali 2 жыл бұрын

Your videos are amazing! You make statistic modules so simple and understandable, thank you!

@statquest 2 жыл бұрын

Glad you like them!

@arshsadh7332 Жыл бұрын

Hey Josh, Thanks for sharing this. It really helped me clear some doubts. I have one doubt, how do I find p-values using the chi-squared distribution if degrees of freedom is 10, for example?

@statquest Жыл бұрын

It depends on what tool you use. In R, we calculate it with: 1 - pchisq(2*(ll.proposed - ll.null), df=10).
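For readers without R, here is a rough Python equivalent using only the standard library. It relies on the closed form for the chi-squared right tail that holds when the degrees of freedom are even (such as 10); the log-likelihood values are hypothetical and the function name is illustrative.

```python
import math

def chi2_right_tail_even_df(x, df):
    # P(X > x) for a chi-squared variable with an even number of degrees of
    # freedom: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    term = 1.0   # the k = 0 term
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k   # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total

ll_proposed = -20.0  # hypothetical log-likelihood of the fitted model
ll_null = -29.5      # hypothetical log-likelihood of the null model
statistic = 2.0 * (ll_proposed - ll_null)
p_value = chi2_right_tail_even_df(statistic, df=10)
print(f"statistic = {statistic:.2f}, p-value = {p_value:.4f}")
```

For odd degrees of freedom this shortcut does not apply, and a statistics library is the simpler choice.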

@henri9289 4 жыл бұрын

Hi, do you have any instructions for multinomial ordinal logistic regression?

@statquest 4 жыл бұрын

Not yet.

@henri9289 4 жыл бұрын

@@statquest I cannot find any content on it on the internet. I have been beaten by this statistic... most academics usually teach only the binomial one.

@statquest 4 жыл бұрын

@@henri9289 Noted

@henri9289 4 жыл бұрын

@@statquest I have searched for content both on the internet and in the library, and I have only found the binomial equations... I am looking for the multinomial ones in order to write the equations in my dissertation.

@chuangchen5547 5 жыл бұрын

In the last part of the lecture, why does it follow a chi-square distribution when we calculate the p-value? Further, why is the chi-square value determined by 2*(LL(fit) - LL(overall))? Thanks.

@lishanjiang260 5 жыл бұрын

The likelihood ratio test statistic converges in distribution to a chi-square distribution asymptotically.

@elenaviter4138 5 жыл бұрын

en.wikipedia.org/wiki/Wilks%27_theorem

@iraidaredondo5008 4 жыл бұрын

Hi Josh, I would really appreciate it if you could help me with some doubts I have dealing with my own data. I'm trying to figure out if some morphological features determine reproductive status (0 = not reproductive in a given season; 1 = reproductive in a given season) in a wild passerine. Instead of analyzing each phenotypic trait separately, we decided to do a logistic regression where status is the response variable and the morphological features are the explanatory ones. In my case, the capture year is placed as a random factor in our model. My question is: is there a better way to get an R^2 for generalized mixed models? I've enjoyed this series a lot since it has helped me build confidence and knowledge about what I was doing! Thank you so much!

@statquest 4 жыл бұрын

Unfortunately I can't help you with mixed models at this time.

@rabbitazteca23 2 жыл бұрын

If my model has a high p-value (x variable is not correlated to y) but has a high R-squared value (meaning the variance in the y data is explained by x = our line fit our data well) what does this tell us? How can x be not related to y but at the same time our y's correspond to correct and reasonable values for x?

@statquest 2 жыл бұрын

If we only had 2 data points, then we could get the squiggle or line to fit them perfectly, resulting in a high r-squared. However, any two random points will result in a perfect fit (just connect the two points), so the p-value will be terrible. Thus, one thing the p-value can tell us is how much data supports the r-squared value.

@utsavprabhakar5072 6 жыл бұрын

What are R-squared and p? Do you have a StatQuest where they are explained or mentioned for the first time?

@statquest 6 жыл бұрын

These are great questions. I have a bunch of videos that talk about R-squared and P-values. Check out: kzbin.info/www/bejne/a4ucgHyPdp17m5o kzbin.info/www/bejne/aHK0fKCtZpmgfq8 kzbin.info/www/bejne/pJyVdIR_idKSm9E

@utsavprabhakar5072 6 жыл бұрын

StatQuest with Josh Starmer thanks :)

@kanikabagree1084 3 жыл бұрын

This is the best channel I've come across to understand the stats behind the ML algorithms. Thaaank you Josh ❤️ love from India.

@statquest 3 жыл бұрын

Awesome, thank you!

@narendrasompalli5536 4 жыл бұрын

Sir, how do we calculate the intercept and slope for logistic regression? Please tell me with an example.

@statquest 4 жыл бұрын

We use maximum likelihood and gradient descent. For an example, see: kzbin.info/www/bejne/eJeukqGiZsaGfZI and kzbin.info/www/bejne/qXXZZZlqqJeGeJo

@narendrasompalli5536 4 жыл бұрын

Sir, can't we calculate the slope and intercept for logistic regression without using gradient descent?

@statquest 4 жыл бұрын

@@narendrasompalli5536 There is not an analytical solution, so you have to use some iterative method. Gradient Descent is a popular method, but there are others you could use.

@narendrasompalli5536 4 жыл бұрын

Sir, I meant that we can calculate the best slope in linear regression by using sum((x - x̄)(y - ȳ)) / sum((x - x̄)^2)

@narendrasompalli5536 4 жыл бұрын

Like that can't we calculate in logistic regression!? Sir
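To illustrate the iterative approach from this thread, here is a minimal sketch of fitting the intercept and slope by gradient ascent on the log-likelihood. It assumes made-up data (hypothetical weights for 4 obese and 5 non-obese mice); `fit_logistic`, `weights`, and `obese` are illustrative names, and statistical software often uses a faster iterative method (R's `glm`, for example, uses iteratively reweighted least squares).

```python
import math

# Hypothetical toy data, not taken from the video.
weights = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
obese = [0, 0, 0, 1, 0, 1, 0, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(intercept, slope, xs, ys):
    # Sum of log(p) for obese mice and log(1 - p) for non-obese mice.
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(intercept + slope * x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

def fit_logistic(xs, ys, learning_rate=0.1, steps=20000):
    # Centering x makes plain gradient ascent converge more reliably.
    n = len(xs)
    mean_x = sum(xs) / n
    centered = [x - mean_x for x in xs]
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(centered, ys):
            error = y - sigmoid(b0 + b1 * x)  # gradient contribution
            g0 += error
            g1 += error * x
        b0 += learning_rate * g0 / n
        b1 += learning_rate * g1 / n
    # Undo the centering so the intercept refers to the original x-axis.
    return b0 - b1 * mean_x, b1

intercept, slope = fit_logistic(weights, obese)
print(intercept, slope, log_likelihood(intercept, slope, weights, obese))
```

There is no formula like the linear regression slope here because the sigmoid makes the equations for the best parameters non-linear, so the parameters are improved step by step until the log-likelihood stops increasing.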

@shivanidhawal8261 4 жыл бұрын

Hey Josh! Loved every video of yours. Question: I have read many books saying R^2 has a range of -infinity to 1, with negative R^2 in the case where the regression completely fails to explain the variation in the data. Is this correct? But you took the range from 0 to 1. Which one is correct?

@statquest 4 жыл бұрын

For linear regression, R^2 can never go below 0. This is because your model can never be worse than the base line model. However, in other settings it is possible to have your model fit worse than the base line model.
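A small sketch of the "other settings" case, assuming R^2 is computed as 1 - SS(fit)/SS(mean) and that the predictions come from something other than a least-squares fit to the same data (the numbers are made up):

```python
def r_squared(observed, predicted):
    # Compare squared residuals around the predictions with squared
    # residuals around the mean (the baseline model).
    mean = sum(observed) / len(observed)
    ss_mean = sum((y - mean) ** 2 for y in observed)
    ss_fit = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_fit / ss_mean

observed = [1.0, 2.0, 3.0, 4.0]
backwards = [4.0, 3.0, 2.0, 1.0]  # hypothetical model with the trend reversed
mean_only = [2.5, 2.5, 2.5, 2.5]  # the baseline model itself

print(r_squared(observed, backwards))  # worse than the baseline
print(r_squared(observed, mean_only))  # same as the baseline
print(r_squared(observed, observed))   # perfect predictions
```

A least-squares line with an intercept can always fall back to the mean, which is why its R^2 on the data it was fit to does not drop below 0.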

@shivanidhawal8261 4 жыл бұрын

@@statquest thanks alot :) !

@yulinliu850 6 жыл бұрын

Excellent! Much appreciated!

@statquest 6 жыл бұрын

Thank you!

@TheRamnath007 5 жыл бұрын

The squiggly line is the best-fit line, right? Its log-likelihood is -3.77. But in the later part of the video you take -6.18 and call it LL(fit), even though that is LL(overall probability). Why is that?

@statquest 5 жыл бұрын

There is a lot in this video, so can you tell me what time point (minute and seconds) is confusing you?

@TheRamnath007 5 жыл бұрын

@@statquest Check the video at 5.18(LLfit) , 6.51 (overall prob) and 8.41 (LLfit)

@statquest 5 жыл бұрын

@@TheRamnath007 OK, so in this video, I use three different datasets to demonstrate how to calculate the R^2 value. For the first dataset weight is correlated with obesity, and I calculate LL(fit) = -3.77 and LL(overall) = -6.18. Then I calculate the R^2 = 0.39 at 7:25 . Thus, the R^2 confirms that weight is correlated with obesity. After that first example, I then create a new dataset that does not have a correlation between weight and obesity. I then calculate LL(fit) and LL(overall) for the new dataset. In this case, both LL(fit) and LL(overall) = -6.18. I then plug this number into the formula for R^2 and get R^2 = 0 (see 9:22 ). So the R^2 confirms that this new dataset is not correlated. After the second example, I then create a new dataset where there is tons of correlation between weight and obesity. I then calculate LL(fit) = 0 and LL(overall) = -6.18 for this new dataset. Lastly, I calculate R^2 and get 1 (see 11:26 ). My guess is that the thing that is confusing is that the number -6.18 keeps coming up in each example. This is because each made up dataset for the three examples has 4 obese mice and 5 mice that are not-obese. This means that the LL(overall) will be -6.18 in all three examples. However, it also means that LL(fit) = -6.18 in the second example because the data are not correlated and the best fit is a horizontal line at the log(odds), just like LL(overall). Does this make sense?
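The three examples above can be reproduced with a short sketch. It assumes the R^2 in the video is computed as (LL(overall) - LL(fit)) / LL(overall), which another commenter in this thread identifies as McFadden's; the function names are illustrative.

```python
import math

def ll_overall(n_obese, n_not_obese):
    # Log-likelihood when every mouse gets the overall probability of obesity.
    n = n_obese + n_not_obese
    p = n_obese / n
    return n_obese * math.log(p) + n_not_obese * math.log(1.0 - p)

def r_squared(ll_fit, ll_null):
    return (ll_null - ll_fit) / ll_null

ll_null = ll_overall(4, 5)            # 4 obese mice, 5 not obese
r2_first = r_squared(-3.77, ll_null)  # weight is correlated with obesity
r2_second = r_squared(ll_null, ll_null)  # no relationship: LL(fit) = LL(overall)
r2_third = r_squared(0.0, ll_null)    # perfect fit: LL(fit) = 0

print(round(ll_null, 2), round(r2_first, 2), r2_second, r2_third)
```

Because all three datasets have the same counts of obese and non-obese mice, `ll_null` is identical in every example; only LL(fit) changes.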

@kevinshah8471 4 жыл бұрын

Hey Josh! Great videos. I have a doubt though. In the first video, you used the intercept and slope of the log-odds graph to show that the p-value is not less than 0.05 (using walds). Here, for the same model, you used maximum likelihood and got a p-value less than 0.05. I don't understand why the two differ. Is it that using walds is one method and maximum likelihood is another and I'll accept one of the two values? Thanks.

@statquest 4 жыл бұрын

Your question makes me suspect that you skipped watching Part 2 in this series. Part 2 explains the role that maximum likelihood plays in logistic regression. Hint, maximum likelihood does something completely different from Wald's test. For more details, see: kzbin.info/www/bejne/eJeukqGiZsaGfZI

@kevinshah8471 4 жыл бұрын

@@statquest I went back and rewatched the video. Thanks man!

@zhiyongbai4414 3 жыл бұрын

Thanks both! I have the same qn here: 1) does it mean with one x-variable, the p value of the coefficient (part 1) and p value of the model (part 3) are the same? 2) and if there are more than 1 x-variable, p value of the model (part 3) means if the combined effects of the x-variables are stats sig? Thank you!

@jonathanbarajas7940 2 жыл бұрын

What a great video!

@statquest 2 жыл бұрын

Thank you very much!

@kt4nk95 2 жыл бұрын

This may be a silly question, but I'm still confused where the 2[LL(fit) - LL(overall probability)] came from. How do we know to use that to calculate the p-value?

@statquest 2 жыл бұрын

Unfortunately, deriving that equation would probably take a whole video.

@zhou6075 2 жыл бұрын

so understandable

@statquest 2 жыл бұрын

Hooray!

@jessicatan278 6 жыл бұрын

and why is it 0.44 and not 0.45 at min 8:37? :'(

@statquest 6 жыл бұрын

Again, this is just poor rounding on my behalf. The true value is 0.4452208, which rounds to 0.45.

@anshulsaini5401 3 жыл бұрын

I had a doubt: in logistic regression, what does this R-squared value actually tell us? In linear regression it tells us the amount of variance explained by our model. How do we interpret it in logistic regression? Is it really helpful in logistic regression, or can we just skip its interpretation?

@statquest 3 жыл бұрын

Umm... this whole video is intended to answer your question. Is there a specific time point that is confusing?

@anshulsaini5401 3 жыл бұрын

@@statquest I was reading an article on Google and it said that R-squared in logistic regression is not used to tell the explained variance but rather the improvement in model likelihood over the null. I wasn't able to relate that to this video. Just wondering what it actually represents in logistic regression.

@statquest 3 жыл бұрын

@@anshulsaini5401 I guess I don't understand the question since this video, and the article on google, say that the R^2 is the improvement in model likelihood over the null.

@mahdimohammadalipour3077 2 жыл бұрын

I've heard that we can not apply LSE to find the best fit in logistic regression and honestly, yet I don't know why? (When it comes to log(odds) I know that residuals are infinity and we can't) but why don't we simply assume that our data is only 0 or 1 and simply use LSE just like linear model to find best fit. i.e. we have data that are obese (1) and not obese (0) and we use logistic regression with specific threshold (0.5) to predict 0 and 1's and then we define cost function and try to minimize it?

@statquest 2 жыл бұрын

It's actually possible to use the sum of the squared residuals, but it doesn't always work as well. To learn more see: kzbin.info/www/bejne/bHLVhKypatZ7d7c (NOTE: To understand what is going on, just replace "cross entropy" with "log(odds)")

@remid5842 2 жыл бұрын

Shouldn't it be 0.56 instead of 0.55 at 6:46? Or did I misunderstand?

@statquest 2 жыл бұрын

You are correct. That's a typo. Sorry for the confusion.

@janinajochim1843 4 жыл бұрын

Hi there! Thank you for this fantastic video! I've been struggling to understand the outcome of the pseudo-R square in my model and what this means for me to proceed. For McFadden's R-square, I got 0.03 for my final model. Whilst the internet tells me 1. to be careful with the interpretation, 2. that a score of 0.2 - 0.4 is desirable, 3. that the interpretation is 'not the same as for OLS R-square' and 4. that pseudo R-squares are smaller in general than OLS R-squares, it doesn't really tell me where to go from here. How bad is 0.03? Can I still interpret my odds ratios or do I need to re-specify my model? There is no doubt that I am lacking relevant variables in my model, however, none of them were assessed in the study! Thank you so much in advance (PLEASE HELP ME!!!!).

@janinajochim1843 4 жыл бұрын

* I should have also added that I have multiple IVs in my model and 3-4 of them are significant. I wonder to what extent I can interpret them as important predictors regardless of high R-square

@statquest 4 жыл бұрын

0.03 seems pretty small to me, and thus, despite the significance of the independent variables, they do not give you very much information about what is really going on with what you are trying to model.

@janinajochim1843 4 жыл бұрын

@@statquest The promised funny story: Recently overheard two of my fellow students having the following exchange: Student 1: I am not sure what to do over the summer Student 2: Mh ... Student 1: Was thinking about doing some modelling Student 2: Oh cool. What like for magazines? Student 1: What? Student 2: You didn't mean on catwalks, right? Student 1: What? I meant with my mice- data!

@statquest 4 жыл бұрын

@@janinajochim1843 That is great!!! Very funny. I got a big laugh out of that. :)

@maidang4081 3 жыл бұрын

Your videos are very well explained and clearly understandable, and your BAM is a huge hugeee plus. I learnt more from your videos than from my grad school's ML lectures. Also, I have a small question. I am new to Machine Learning and also have a fear of it... so anyway, can you please explain to me "Why the residuals for Logistic Regression are all infinite?" because the data point is a probability, so its range is between 0 and 1...? I just can't get my brain to stretch around it T.T

@statquest 3 жыл бұрын

I answer your question in this video: kzbin.info/www/bejne/eJeukqGiZsaGfZI

@maidang4081 3 жыл бұрын

@@statquest Thank you so much!!! I will look into that :)

@tamerosman774 2 жыл бұрын

Can you do the linear and logistic regression in matrix form please

@statquest 2 жыл бұрын

I go through design matrices in these videos: kzbin.info/www/bejne/hHeYkJWqhMZ2n8k kzbin.info/www/bejne/eaKveKmtnpJohsU and kzbin.info/www/bejne/fqPVY5SkrrCSa9U

@tamerosman774 2 жыл бұрын

@@statquest Thank you! Are there any videos on Bayesian Networks?

@statquest 2 жыл бұрын

@@tamerosman774 Not yet.

@alexandrezajic4426 5 жыл бұрын

Hi Josh - appreciate your videos! I'm curious why you say that R squared only goes between 0 and 1, when it can go between negative infinity and 1. Any model can have infinitely poor fit - leading to significantly worse residuals than the mean's residuals. While this indicates your model is terrible, in the off chance that it happens (which it has for me), it would clear up any ensuing confusion that something must be broken with your programs. Thanks!

@statquest 5 жыл бұрын

Yeah, it's possible to have negative R-squared values. However, typically with Logistic Regression we compare "nested models". In other words, one model is the "simple model" and the other, the "fancy model", contains all of the variables in the "simple model" plus others. When this is the case for Logistic Regression, the fancy model can not do worse than the simple model because otherwise the parameters for the new variables would be zero (or not significantly different from zero), and thus, in the worst case, the simple model = the fancy model, which results in an R^2 = 0. However, when you don't use nested models, or you are working with something other than logistic regression, you can get negative values.

@alex_zetsu 5 жыл бұрын

10 different ways to calculate R squared? I'm just curious what they are so I can look them up. I can only find 4. McFadden's is the only one that seems to make sense to me since it's close to the linear models (presumably why you chose it), but I am curious as to what are all the ways to do it.

@statquest 5 жыл бұрын

Mittlbock and Schemper (1996) “Explained variation in logistic regression.” discuss *12* different R-squared formulas for Logistic Regression: citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.477.3328&rep=rep1&type=pdf

@janinajochim1843 4 жыл бұрын

stats.idre.ucla.edu/other/mult-pkg/faq/general/faq-what-are-pseudo-r-squareds/

@deuteros 3 жыл бұрын

@@janinajochim1843 Thanks!

@xuemeiwang1881 4 жыл бұрын

great man

@statquest 4 жыл бұрын

Thank you! :)

@coinatlas5953 3 жыл бұрын

What about the assumptions of a logistic regression which must not be violated?

@statquest 3 жыл бұрын

In log(odds) space, you want to have a linear response.

@coinatlas5953 3 жыл бұрын

@@statquest But this linearity must be checked only if the predictor is continuous, right? Is there anything to check for categorical variables? Also thanks for responding.

@Nordlinger.Dr4ke 2 жыл бұрын

Thanks a lot, my friends and I really enjoy your content. We really appreciate it; these are some of the best statistics videos I have ever seen.

@statquest 2 жыл бұрын

Thank you so much 😀

@punchline9131 3 жыл бұрын

Is LL(fit) the same as the maximum-likelihood? And thanks for your excellent work! 👌

@statquest 3 жыл бұрын

LL(fit) is the log-likelihood of the fitted squiggle. We can use that as input to an algorithm that can maximize the likelihood. To learn more about maximum likelihood, see: kzbin.info/www/bejne/jpbTiaeibr5-rcU

@bennybenbenw 2 жыл бұрын

Hi Josh, shouldn't the value in log(likelihood of data given overall probability) be 0.56? What you have written is 0.55.

@statquest 2 жыл бұрын

What time point, minutes and seconds, are you referring to?

@bennybenbenw 2 жыл бұрын

@@statquest kzbin.info/www/bejne/rqmpiqWlbbaojqM & kzbin.info/www/bejne/rqmpiqWlbbaojqM

@statquest 2 жыл бұрын

@@bennybenbenw I see. Yes, that's just a rounding error.

@evan168gt6 4 жыл бұрын

Hello, Josh! Your content is so useful, it’s single handedly carried me through my paper! I thank you very much and hope you continue to post content. Also as a side note, is there no possible way of calculating the correlation of a logistic regression? Any insight is greatly appreciated!

@statquest 4 жыл бұрын

Thanks! There is no way to calculate a "normal" correlation for logistic regression because of the infinite distance between the data and the log(odds) linear fit.
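A short sketch of why those distances are infinite: the raw data are 0 (not obese) or 1 (obese), and converting those probabilities to log(odds) pushes them to negative and positive infinity. The helper name is illustrative.

```python
import math

def log_odds(p):
    # log(p / (1 - p)); the endpoints 0 and 1 map to -infinity and +infinity.
    if p == 0.0:
        return -math.inf
    if p == 1.0:
        return math.inf
    return math.log(p / (1.0 - p))

for p in [0.0, 0.5, 1.0]:
    print(p, log_odds(p))
```

Since every observed point sits at plus or minus infinity on the log(odds) axis, its residual from any straight line is infinite, so the squared residuals used for an ordinary correlation or least-squares fit cannot be computed.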