Correction: The formula at 13:58 should be 2[(LL(saturated) - LL(overall)) - (LL(saturated) - LL(fit))]. I got the terms flipped. Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
@falaksingla62422 жыл бұрын
Hi Josh, Love your content. Has helped me to learn a lot & grow. You are doing an awesome work. Please continue to do so. Wanted to support you but unfortunately your Paypal link seems to be dysfunctional. Please update it.
@hayagreevansriram3264 жыл бұрын
4 days on this channel, I've learnt more than 12 weeks' lectures at college. Thank you, Josh!!
@statquest4 жыл бұрын
Awesome!!! I'm glad you're enjoying my videos. :)
@hayagreevansriram3264 жыл бұрын
@@statquest Enjoying them as well as hoping they'll help me ace my Data Mining exam tomorrow 😂
@statquest4 жыл бұрын
@@hayagreevansriram326 Good luck tomorrow and let me know how it goes.
@peasant123454 жыл бұрын
what do you think about the tuition colleges charge?
@gracel29312 жыл бұрын
Same 😂
@sharonlee52196 жыл бұрын
I've been binge-watching many of your videos recently to learn more about stats & RNA-Seq and I just wanted to say thank you for all the work you do! these videos are amazing and have been so incredibly helpful in explaining things :)
@statquest6 жыл бұрын
You’re welcome!!! I’m glad you like my videos so much. I have a lot of fun putting them together. :)
@statquest6 жыл бұрын
ps, I have 3 more videos on logistic regression coming out in July. :)
@rameshbabu22282 жыл бұрын
Your explanations are always unique, sir. I completed a Master's in Statistics, and my brother, who did a PhD in Statistics, explained logistic regression to me theoretically, but I was not satisfied. I have huge confidence in your explanations and hard work, so I listened and got 200% satisfaction. Thank you so much, sir.
@statquest2 жыл бұрын
Thank you!
@xiaoyuqian53172 жыл бұрын
Hi, Josh. I started watching your videos 3 years ago. At that time, I was a master's student in bioinformatics, and I came across many statistics questions while doing my research. Your videos are clear and instructive, which allowed me to put the models they cover into my research very quickly. It means a lot to me. Now I have started my career as a PhD candidate in statistical genetics. Your videos have really helped me at an important time in my career. I can't put your name in my journal article, but it deserves a place there. A sincere thank you for the videos you have uploaded. Wish you happiness every day.
@statquest2 жыл бұрын
Thank you very much!!! I'm so glad that my videos have helped you and good luck with your PhD! BAM! :)
@karakter34 жыл бұрын
I've been having difficulty going through grad level stats after taking a loong break from academics and found your videos very useful and so much fun, thank you !
@statquest4 жыл бұрын
Thank you! :)
@magtazeum40715 жыл бұрын
I'm addicted to these intro songs..
@statquest5 жыл бұрын
:)
@vincenttan63033 жыл бұрын
I always wondered what the interviewers wanted me to say... I didn't know what I didn't know... until this.
@statquest3 жыл бұрын
bam!
@jesscharon91464 жыл бұрын
Thank you Josh, I’m a PhD student from China, and I’ve never learnt logistic regression before. But this is sooooo good for beginners like us, clear examples, clear explanations, humorous way of talking. I really appreciate you for making these fantastic videos. This gonna help me finish the most difficult quant. data analysis chapter. Thank you so much. Btw the singing at beginning is cute as always XDD
@statquest4 жыл бұрын
Thank you very much! :)
@dainegai5 жыл бұрын
Enjoying going through the logistic regression StatQuestline (i.e. playlist) :D Small nitpick @3:09 -- the horizontal line corresponding to the mean of the data is *not* the "worst" fitting line in a sum-of-squared-residuals sense (you can make some pretty bad-fitting lines if you wanted to ;p ). It's actually "the best-fitting line (in a sum-of-squared-residuals sense) when you're forced to have a slope of zero". (It's the best-fitting model with 1 less **degree of freedom** than the model that includes a potentially non-zero slope.) This corresponds to a flat line "y = (mean of the data)".
@statquest5 жыл бұрын
Very true.
@cezarystorczyk172210 ай бұрын
Thank you.
@statquest10 ай бұрын
Thank you very much for supporting StatQuest!!! TRIPLE BAM!!! :)
@bolajiadedasola63693 ай бұрын
You are the best in teaching
@statquest3 ай бұрын
Thank you!
@alvaroaguado36 жыл бұрын
Awesome vids!! I don’t miss a statquest
@statquest6 жыл бұрын
Thank you! :)
@jessicatan2786 жыл бұрын
why is it 0.55 and not 0.56? at min 6:47
@statquest6 жыл бұрын
Ooops. I didn't do a good job rounding! The true value is 0.55555555....repeating, which rounds to 0.56. However, I messed up on the next slide and just put 0.55. Sorry for the confusion.
@carloscamargo5664 жыл бұрын
I'm watching your videos from Colombia, and it's amazing how trivial distance and money have become for getting access to extremely good quality knowledge. I really appreciate the work you put into your videos; it has really helped me a lot in improving my statistical analysis skills. Thank you!
@statquest4 жыл бұрын
Hooray!!! I'm so glad you can watch and learn from my videos. I'm very passionate about helping everyone learn.
@TheImpulsiveRamble4 ай бұрын
Hi Josh, I don't know if you're still monitoring comments, but let me begin by thanking you for putting together these videos. As someone who didn't enjoy math and stat back when I was a student, it's refreshing to have someone provide such clear and concise explanations of the intuition behind concepts instead of getting muddied up in abstractions and notations. I have a few clarifying questions regarding the interpretation of the p-value of the McFadden's R-squared described in 11:55 of this video and the p-value of the coefficients described in 10:41 of the Pt1: Coefficients video. Is it appropriate to think of these as being analogous to the f-test and t-test in linear regression, respectively (i.e., the first tests the significance of the overall model whereas the second tests the significance of a single coefficient)? If so, just as the f-test can find that coefficients are significant jointly while the t-test can fail to find that coefficients are significant individually, can a similar situation occur with the aforementioned p-values in a logistic regression context? Thanks in advance for your reply.
@statquest4 ай бұрын
Yes and presumably. At least to me, it seems reasonable that you could have a model with a lot of parameters, where each parameter only contributes a tiny amount to the overall fit - so in the big picture, you have a predictive model, but the individual parameters don't have much of an effect.
@JohnWick-ls7yt3 жыл бұрын
You are the best musistician in the world!
@statquest3 жыл бұрын
Triple bam! :)
@margotalalicenciatura13765 жыл бұрын
First of all a million thanks for your work man! It's really outstanding and almost infuriating to think how bad teachers are most of the people in stats by contrast. Got two questions: first, you say we can't use least squares since in the log odds scale the residuals are infinite, couldn't we just use them in the probability scale with the squiggly line? Second, are you planning in eventually doing a MCMC StatQuest? That'd be reaaaaally handy. Thankss
@NazaninYari2 жыл бұрын
You are a GENIUS. Hats off to you!
@statquest2 жыл бұрын
Thank you!
@omercoskun6042 Жыл бұрын
I wonder why you mentioned SS(mean) as the worst fitting line. Clearly, there are worse lines that we can fit. I always thought SS(mean) as a base value, the line that minimizes the sum of squares if we only had y values and no x values (no input). By the way, loving your lectures, they are all clearly explained and super helpful!
@statquest Жыл бұрын
The mean of the thing we want to predict is thought of as the worst fitting line because that is what we would fit if we had nothing to predict (no x-axis value).
@russelllavery2281 Жыл бұрын
this series is great! Thanks.
@statquest Жыл бұрын
Glad you enjoy it!
@soya12264 жыл бұрын
this is extremely well explained!!much appreciated!
@statquest4 жыл бұрын
Thank you! :)
@casperhansen30125 жыл бұрын
Hey Josh, I was wondering about the projecting of points at negative or positive infinity onto the candidate line, or just any line in general. You just say that we project the data onto the line at 5:57. But how does the math work?
@エリアル-d7x5 жыл бұрын
Here is what I think: There are 5 obese mice and 4 that are not obese, 9 mice in total. Without considering weight, the probability of a mouse being obese is 5/9 = 0.56. If we map that probability (5/9) onto the figure on the right, we get log(0.56 / (1 - 0.56)) = log(5/4) = 0.22.
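The arithmetic in this comment can be checked with a few lines of Python. This is only a sketch of the calculation described above; the counts (5 obese, 4 not obese) come from the comment, and the variable names are made up for illustration.

```python
import math

obese, not_obese = 5, 4  # counts quoted in the comment above

# Overall probability of obesity, ignoring weight.
p = obese / (obese + not_obese)

# log(odds) = log(p / (1 - p)); the parentheses around (1 - p) matter.
log_odds = math.log(p / (1 - p))

# Converting back: p = e^log(odds) / (1 + e^log(odds)).
p_back = math.exp(log_odds) / (1 + math.exp(log_odds))

print(round(p, 2), round(log_odds, 2), round(p_back, 2))  # 0.56 0.22 0.56
```

Both routes (counting mice directly, or going through the log(odds)) land on the same overall probability.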
@ToloSanso-dg3poАй бұрын
I think your channel is the best in stats! I have a question about this video. At 9:73, how can you project the data onto the candidate line? The line is so vertical that I can't see how you can do that projection to get the log(1) and log(0) terms in the log(likelihood) at 10:03. Thank you
@statquestАй бұрын
The line is near, but not quite vertical. If we had a much larger computer screen, we would see that the line has y-axis coordinates that correspond to the x-axis values for the data. We can solve for those y-axis coordinates by multiplying the x-axis values by 22.42 and adding the intercept -63.72.
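As a rough sketch of the projection described in this reply: the slope (22.42) and intercept (-63.72) are the values quoted above, while the weights below are hypothetical and only there to show the mechanics. The function names are also made up for illustration.

```python
import math

SLOPE = 22.42       # quoted in the reply above
INTERCEPT = -63.72  # quoted in the reply above

def project(weight):
    """Return the log(odds) on the candidate line for a given weight."""
    return SLOPE * weight + INTERCEPT

def to_probability(log_odds):
    """Convert a log(odds) value to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical weights, NOT taken from the video.
for w in (2.0, 2.84, 3.5):
    lo = project(w)
    print(w, round(lo, 2), to_probability(lo))
```

Because the line is so steep, small changes in weight push the log(odds) far from zero, so the probabilities end up very close to 0 or 1, which is where log-likelihood terms close to log(1) come from.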
@ToloSanso-dg3po17 күн бұрын
@@statquest I understand! Thank you very much!
@StephenRoseDuo6 жыл бұрын
Now I can't wait for the deviance videos!
@statquest6 жыл бұрын
I've got the slides all done for it - so it's ready to go. The bummer is that I'm traveling a lot in the next two weeks so it won't be out for a while... unless I can somehow make it happen this Friday.... I'll see what I can do.
@miguelangelpastorvalverde91963 жыл бұрын
Thank you very much, Josh, for clarifying my doubts. I am doing a logistic regression, and I have 2 questions. 1) Why do I get a significant p-value together with an R-squared of 2 percent for a specific independent variable? If I get an R-squared of 2 percent, shouldn't I get a p-value greater than 0.05 (not significant)? 2) How valid is the resulting probability equation for me? Should I look at the residuals?
@statquest3 жыл бұрын
You can have a terrible R-squared value and still have a small p-value if you have a lot of data. However, if the R^2 value is bad, then, even with a significant p-value, your model may not be worth very much.
@miguelangelpastorvalverde91963 жыл бұрын
I really appreciate the time you take to answer questions !! Thanks, I already have it clearer
@lprasai2 жыл бұрын
Who liked the way he says StatQueeest!
@statquest2 жыл бұрын
bam!
@abiyosopurnomosakti19945 жыл бұрын
What a prolific teaching Josh! Enjoy your song as well! :)
@statquest5 жыл бұрын
Thank you! :)
@manikdhingra16065 жыл бұрын
Hello Josh, again much thanks for the video. QQ- @13:27 how did you calculate the p-value using formula [ 2*(LL(fit) - LL(overall Probability))]? I've already watched P-value video but unable to figure out. Don't know what I am missing. Thanks in advance!
@jhfoleiss5 жыл бұрын
Hi! I think Josh would give you a much better explanation, but i'll try :) Chi-square distributions come in different degrees of freedom. In the case of logistic regression, the degrees of freedom is 1 (2 parameters in the logistic regression (y-intercept and slope), and 1 parameter for the overall probability (y-intercept, just a horizontal line), thus 2-1=1). Thus, you need to use the Chi-square distribution with 1 degree of freedom. *The p-value is given by the area under the 1-DoF chi-square distribution (integral) from [ 2*(LL(fit) - LL(overall Probability))] to infinity!* In the first example: Since, by definition, the area under a statistical distribution curve is always 1, and [ 2*(LL(fit) - LL(overall Probability))] = 0, the integral is over the entire distribution (chi-square support (domain) is from 0 to +infty), thus 1. Therefore, the p-value = 1. In the second example: [ 2*(LL(fit) - LL(overall Probability))] = 4.82. The integral of the 1-DoF chi-square distribution from 4.82 to +infinity is 0.03. Thus, the p-value = 0.03, which is statistically significant in most situations, since it is less than 0.05. Hope this helps!
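For the 1-degree-of-freedom case described above, the right-tail area has a closed form, so the two examples can be checked without a statistics library. This is a sketch: it relies on the fact that a chi-squared variable with 1 degree of freedom is a squared standard normal, so P(X > x) = erfc(sqrt(x / 2)). The function name is made up for illustration.

```python
import math

def chi2_sf_1dof(x):
    """Right-tail area of the chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# First example: 2 * (LL(fit) - LL(overall probability)) = 0
print(chi2_sf_1dof(0.0))             # 1.0

# Second example: 2 * (LL(fit) - LL(overall probability)) = 4.82
print(round(chi2_sf_1dof(4.82), 2))  # 0.03
```

With more than one degree of freedom this shortcut no longer applies, and a library routine for the chi-squared survival function would be the usual choice.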
@mortezamohammadi9963 Жыл бұрын
The formula to calculate the p-value from the test statistic in logistic regression is based on the principles of hypothesis testing and the properties of the standard normal distribution. Here's a step-by-step explanation of how the formula is derived:
1. **Null Hypothesis and Test Statistic**: In hypothesis testing, you start with a null hypothesis (\(H_0\)) that assumes no effect (e.g., the coefficient is zero). The test statistic \(z\) is calculated to measure how far the estimated coefficient (\(\hat{\beta}\)) is from the null hypothesis value (usually zero). The formula for the test statistic is: \[ z = \frac{\hat{\beta}}{SE(\hat{\beta})} \]
2. **Standard Normal Distribution**: Under the null hypothesis, the test statistic \(z\) follows a standard normal distribution (\(N(0, 1)\)). This is a fundamental property of hypothesis testing.
3. **Two-Tailed Test**: Since you're interested in whether the coefficient is significantly different from zero (two-tailed test), you want to calculate the probability of observing a test statistic as extreme as \(z\) in either tail of the standard normal distribution.
4. **Cumulative Distribution Function (CDF)**: The cumulative distribution function (\(\Phi(z)\)) of the standard normal distribution gives you the probability that a standard normal random variable is less than or equal to \(z\). In mathematical notation: \(\Phi(z) = P(Z \leq z)\).
5. **Probability Calculation**: The p-value is the probability of observing a test statistic as extreme as \(z\) in both tails of the distribution. Since the standard normal distribution is symmetric, you can calculate the probability of observing a test statistic as extreme as \(z\) in one tail and then multiply it by 2 to account for both tails: \[ p = 2 \cdot (1 - \Phi(|z|)) \] Here, \(|z|\) ensures that the value inside the cumulative distribution function is positive.
In summary, the formula \(p = 2 \cdot (1 - \Phi(|z|))\) calculates the p-value by determining the probability of observing a test statistic as extreme as \(z\) in both tails of the standard normal distribution. If this probability is small (i.e., the p-value is small), you have evidence to reject the null hypothesis and conclude that the coefficient is statistically significant.
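The two-tailed formula in this comment, p = 2(1 - Φ(|z|)), can be sketched in Python using the identity 2(1 - Φ(|z|)) = erfc(|z| / sqrt(2)). The coefficient and standard error below are hypothetical numbers chosen only to illustrate the call, and the function name is made up.

```python
import math

def wald_p_value(beta_hat, standard_error):
    """Two-tailed p-value for z = beta_hat / SE under a standard normal."""
    z = beta_hat / standard_error
    # 2 * (1 - Phi(|z|)) is the same quantity as erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical estimate and standard error, giving z = 1.96.
print(wald_p_value(1.96, 1.0))
```

A z of about 1.96 sits right at the conventional 0.05 threshold, which is a handy sanity check for the function.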
@ruxiz20074 жыл бұрын
This is great great explanation, thanks!
@statquest4 жыл бұрын
Thanks!
@almonddonut18183 жыл бұрын
Thank you so much for your videos!
@statquest3 жыл бұрын
Glad you like them!
@Felicidade1016 жыл бұрын
Amazing Thank you Josh!
@statquest6 жыл бұрын
You’re welcome! I’m glad you like the videos! I have 3 more on Logistic Regression coming out in July. :)
@phongapex3741 Жыл бұрын
Hello! At the 8:24, you can determine the maximum likelihood with the intercept of -0.22. How can you know that? Which line do we have first? squiggle line OR straight line? I do not actually understand that at the beginning, we already had a squiggle line, then found p values of points to calculate log(odds) in order to get the straight line of the log(odds) graph. How did we have that squiggle line at the beginning? OR, we already had a straight line, then projected points to find the log(odds) values, next, calculated the p values in order to have the squiggle line. How did we have that straight line at the beginning? I AM STILL CONFUSED ...
@statquest Жыл бұрын
To learn more about how we fit lines and squiggles to data in logistic regression, see: kzbin.info/www/bejne/eJeukqGiZsaGfZI
@saltedfish_is_good6 ай бұрын
I am finally clear. Time for relu logistic model
@statquest6 ай бұрын
bam! :)
@jiayoongchong26064 жыл бұрын
13:56 out in the wild R squared value commonly written as
@ivanrecalde85434 жыл бұрын
Incredible! Greetings from Argentina
@statquest4 жыл бұрын
Thank you!!! :)
@UncleLoren4 жыл бұрын
So we took log(5/4) = .22, plugged it into the (e/1+e) equation and got .56, which we could have gotten from 5/9, proving there are two ways to come up with the same number, with one inducing a migraine. That's OK; I got it. Then, for some reason you plugged .55 into an equation -- not .56 -- and later used a NEGATIVE .22 to arrive at something that resulted in .45, the complement of .55...which you adjust to .44. WHY the .01 adjustment?? THROW ME A BONE, BRO!!! PLEASE. ****Update****: I just noticed in the "proof" portion of video that you changed the ratio of obesity from 5/4 to 4/5 which explains how #s got turned upside down. You just HAD to pick something strikingly similar to the previous example to confuse me, right? But why, Josh? If your videos make 99.999% of the people viewing them smarter and one person ends up smashing themselves in the head with a hammer, can you see how this might be a problem? It reminds me of the class imbalance problem. For a certain audience, your videos are excellent, you're a saint for creating them and it's unfortunate that I am an imbecile. Thank you for reading. (Only joking. I am getting smarter, just gotta stick with it. Thanks a million.)
@jodischmodi3 жыл бұрын
you're better than my prof
@statquest3 жыл бұрын
BAM! :)
@tysonliu283311 ай бұрын
So essentially, with a model where weight is a very poor predictor of obesity, the best line that we can find will be as poor as LL(overall probability), and therefore R2 is 0. Otherwise, with a perfect predictor, LL(fit) is dramatically different from LL(overall probability), so that R2 is 1.
@statquest11 ай бұрын
yep
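The two extremes in this exchange, plus the example from the video, can be sketched numerically. Assumptions are labeled in the comments: LL(overall probability) is computed from the 5 obese and 4 non-obese mice mentioned elsewhere in this thread, and LL(fit) is backed out of the statistic 2(LL(fit) - LL(overall)) = 4.82 that is also quoted in the thread. The function name is made up for illustration.

```python
import math

def mcfadden_r2(ll_fit, ll_overall):
    """McFadden's pseudo R-squared: (LL(overall) - LL(fit)) / LL(overall)."""
    return (ll_overall - ll_fit) / ll_overall

# LL(overall probability) for 5 obese and 4 non-obese mice (counts from the thread).
ll_overall = 5 * math.log(5 / 9) + 4 * math.log(4 / 9)

# Assumption: recover LL(fit) from the statistic 4.82 quoted in the thread.
ll_fit = ll_overall + 4.82 / 2

print(round(ll_overall, 2), round(ll_fit, 2))   # -6.18 -3.77
print(round(mcfadden_r2(ll_fit, ll_overall), 2))  # 0.39

# Extremes: a fit no better than the overall probability, and a perfect fit (LL = 0).
print(mcfadden_r2(ll_overall, ll_overall))
print(mcfadden_r2(0.0, ll_overall))
```

The last two calls give 0 and 1, matching the poor-predictor and perfect-predictor cases described in the comment.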
@ml63525 жыл бұрын
Hi Josh, really good explanations :) I have seen already all the logistic regression series. Just one question: I would assume that the Part 1 [Coefficients] is basically the last part occurring when performing a logistic regression, right? I mean the algorithm will first optimize the squiggly line to the best fit(Part 2) , then evaluate for the significance (Part 3) . Finally the results can be seen by interpreting the coefficients (Part 1) which are given in terms of log(Odds). I hope you understand my question :) Thanks in advance and happy holidays. Marcelo
@statquest5 жыл бұрын
You are correct. The reason I organized the videos the way I did was to follow the output that R gives you when you do Logistic Regression. The first thing it prints out are the coefficients, and the last thing it prints out is the R^squared. So I was just going from the top and working my way down the output.
@ml63525 жыл бұрын
@@statquest Thank you 😊. Best regards from Germany
@statquest5 жыл бұрын
@@ml6352 Thanks! :)
@bhargavpotluri51474 жыл бұрын
I found out your channel 2 days back. Since then, my learning curve is going towards infinity (Original axis & not on the log odds axis :P). superb videos & content. Thanks a lot MAN !! Also one more suggestion, can you also include the cost function of the respective model so that it is 100% complete.
@statquest4 жыл бұрын
Awesome! I'm glad you like my videos! :)
@bhargavpotluri51474 жыл бұрын
@@statquest Hi Josh, Can you please come up with Image processing algorithms or NN models as well
@statquest4 жыл бұрын
@@bhargavpotluri5147 I'm working on the NN videos.
@bhargavpotluri51474 жыл бұрын
@@statquest Wow, Thanks Josh :)
@mriduls954 жыл бұрын
but what are the 2 groups of values on which we perform the chi square in the end? As chi square is performed on groups
@statquest4 жыл бұрын
In this case we are using a Chi-Square distribution to determine a p-value, but we are not performing a standard Chi-Squared test. This is similar to how a z-test is based on the normal distribution, but the normal distribution is used for a lot more things than just the z-test.
@tallwaters97086 жыл бұрын
Nice stuff as always! If you're still taking video ideas I'd love to see some stuff on Bayesian models, monte carlo, markov chains :)
@statquest6 жыл бұрын
Those are all on the to-do list... I'll get to them one day! I hope that day is soon! :)
@construenist69663 жыл бұрын
Very useful content 🔥
@statquest3 жыл бұрын
Thank you! :)
@adenuristiqomah9844 жыл бұрын
I am currently on your Machine Learning playlist, Josh. Keep up the good work
@statquest4 жыл бұрын
Thanks, will do!
@rishavdhariwal4782 Жыл бұрын
Hi Josh, I don't know if you will see this, but I had a question: how does one know which distribution to use to determine the p-values? In the video at 12:01 you said that the metric follows a chi-squared distribution, but how does one get the intuition for when to use which distribution to get the corresponding p-value of the metric?
@statquest Жыл бұрын
We can use theory to derive the distribution. This is pretty advanced stuff (I did it once a long time ago), so usually we just look it up when needed rather than derive it from scratch.
@rishavdhariwal4782 Жыл бұрын
Thanks for the reply Josh, Can you give an example of the keywords we may use to lookup the corresponding distribution? Like i know for testing the coefficients of a linear regression model we use the T-test, but in time-series data, we use the ADF test for checking stationarity. Here the value for the T statistic of a coefficient is to see if it is higher than a certain threshold and based on that we reject or fail to reject the hypothesis. The problem is the threshold that is set here is higher than the one you get if you test it with a normal T-test(I don't know the exact distribution but it follows another distribution). So how may i go about finding the distribution for testing the statistic in the above case? @@statquest
@statquest Жыл бұрын
@@rishavdhariwal4782 To be honest, I'm not sure I understand your question. However, if you are interested in why these specific statistics have a chi-squared distribution, you can look at how Mcfadden's R-squared is derived.
@nataliakos49323 жыл бұрын
I watch this series with such commitment as if I were watching a good Netflix series. Just can't stop.
@statquest3 жыл бұрын
bam! :)
@LakshyaIIITD Жыл бұрын
3:09 I think the worst-fitting line is perpendicular to the best-fitting line.
@statquest Жыл бұрын
You are correct - I should have been a little more careful with my words at that point.
@desmondturner54353 жыл бұрын
Thank you for the help! This series is amazing. at 12:31 would the degrees of freedom for 2 independent variables be 2? and for 3, 3, etc?
@statquest3 жыл бұрын
I believe that is correct.
@deuteros3 жыл бұрын
Josh, I have read that pseudo R2 is not a good metric to compare models which predict the same variable through different covariates (different models built from individual covariates, y ~ x1, y ~ x2, y ~ x3, etc..). What is, in your opinion, the best way to do this comparison?
@statquest3 жыл бұрын
You can also use a confusion matrix and associated metrics (like sensitivity and specificity and ROC). For details, see: kzbin.info/www/bejne/gZXWoWmppNZ0bdE kzbin.info/www/bejne/rIGTZ5SDpN9nrJo kzbin.info/www/bejne/apu1c4V6l6-Yo68
@rabbitazteca232 жыл бұрын
Can we also use the maximum likelihood instead of its log version for calculating R^2
@statquest2 жыл бұрын
Maybe! I don't know off the top of my head. However, the log is often used to avoid underflow errors, so if you don't have too much data, it might work without the log.
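The underflow issue mentioned in this reply is easy to reproduce. The sketch below uses a hypothetical data set of 2000 observations that each have likelihood 0.5; the numbers are illustrative only.

```python
import math

# Hypothetical per-observation likelihoods (not from the video).
probs = [0.5] * 2000

# Multiplying raw likelihoods: the product becomes too small for a float.
likelihood = 1.0
for p in probs:
    likelihood *= p

# Summing logs instead keeps the value well within floating-point range.
log_likelihood = sum(math.log(p) for p in probs)

print(likelihood)      # 0.0 (underflow)
print(log_likelihood)  # roughly -1386.29
```

With only a handful of observations, as in the video's example, the raw product would still be representable, which fits the "might work without the log" caveat above.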
@hang14453 жыл бұрын
13:40 Hello Josh, thanks for making this useful video list so that I can learn machine learning rather than studying in uni. And I would like to clarify sth. The logistic model you have built has a p-value of 0.03, does it indicate that there is a strong relationship between weight and obesity? Just like what you have said in the video, it is not due to chance. For the R^2 value, 0.39, does it indicate that the model is not good enough? We may need to add more parameters other than weight to classify whether the mice are obese or not. Hope you can correct me if I get sth wrong, thanks 😁
@statquest3 жыл бұрын
The p-value only tells us if the relationship is significantly different from random noise. The r-squared value tells us the strength of the relationship. How "strong" is "strong" depends on the field or area being studied.
@hang14453 жыл бұрын
So the relationship is significantly different from random noise as the p value is so small. Here, I have one thing to ask, what is random noise? Though, the relationship is significantly different from random noise, the strength of the relationship is not quite good as we obtain only 0.39. Do I interpret correct?
@statquest3 жыл бұрын
@@hang1445 Random Noise is just "random stuff", things that are not related. And if the p-value small, then you can conclude that your relationship is significantly different from random stuff that is not related (and that suggests it represents a true relationship). As for the R-squared value. Depending on the field, 0.39 may be considered a "weak" relationship, other fields might consider it "strong". It depends on the type of data you are working with.
@hang14453 жыл бұрын
Well explained! Thanks :)
@SS-ve1jm2 жыл бұрын
Amazing content please continue to upload videos always and grow this channel🎉 Triple BAM🎉
@statquest2 жыл бұрын
Thank you! :)
@thomasamet58533 жыл бұрын
Great explanations !!! At 11:06, is it the log( likelihood of the data given the line) or the log(likelihood having this squiggly line given the data)?
@statquest3 жыл бұрын
I believe it is the log( likelihood of the data given the line)
@thomasamet58533 жыл бұрын
@@statquest Thank you for the answer. I thought we were trying to find optimum parameters of the linear equation which would yield in the best sigmoid. Thus finding the MLE of the sigmoid (hence parameters) given the data. I'll watch your video on the MLE again then. I am still confused with the difference between the two.
@statquest3 жыл бұрын
@@thomasamet5853 Regardless of how you phrase it, the likelihoods are the y-axis coordinates on the squiggle for each data point.
@thomasamet58533 жыл бұрын
That helps a lot. Thank you again for taking the time to answer and for the amazing content :)
@willychen69674 жыл бұрын
Hi Josh, I really enjoy these videos. Can you possibly do one that relates extreme value theory ( I'm thinking of T1EV) to the logit function?
@rrrprogram86676 жыл бұрын
Here it comess.... Great teaching josh... Thanks for all ur efforts...
@statquest6 жыл бұрын
You are welcome!!! I'm always so happy to hear how much you like the videos! :)
@rrrprogram86676 жыл бұрын
StatQuest with Josh Starmer this is awesome channel for machine learning... Hope next exercise is in R
@statquest6 жыл бұрын
I've got one more video, on the saturated model and deviance statistics, and then we put everything together with "Logistic Regression in R".
@rrrprogram86676 жыл бұрын
StatQuest with Josh Starmer woowwww.... We love statquest videos
@elrishiilustrado95923 жыл бұрын
It's very clear, thank you ! so the number of degrees of freedom its equal to the number of Xi variables? in this case we have a y variable and only 1 x variable, so we have only 1degree of freedom, but if we have 3 xi variables the degrees of freedom would be 3? bonus question : how do you compare logistic models ? how can i choose the best ? Thanks !
@statquest3 жыл бұрын
The degrees of freedom is the difference in the number of parameters between the fitted model and the overall probability (which typically only has 1 parameter). So if the fitted model has 3 parameters, then DF = 3 - 1 = 2. People often use the Akaike information criterion (AIC) to choose the best model. For details, see: en.wikipedia.org/wiki/Akaike_information_criterion
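A small sketch of the two ideas in this reply: the degrees of freedom as a difference in parameter counts, and AIC (defined as 2k - 2LL, where k is the number of parameters) as a way to compare models. The log-likelihood values below are illustrative assumptions, and the function names are made up.

```python
def degrees_of_freedom(n_params_fit, n_params_overall=1):
    """Difference in parameter count between the fitted and overall-probability models."""
    return n_params_fit - n_params_overall

def aic(n_params, log_likelihood):
    """Akaike information criterion: 2k - 2 * LL. Lower is better."""
    return 2 * n_params - 2 * log_likelihood

print(degrees_of_freedom(2))  # intercept + slope vs. intercept only -> 1
print(degrees_of_freedom(3))  # 3 parameters vs. 1 -> 2

# Illustrative log-likelihoods (assumed values, not output from a real fit).
print(aic(2, -3.77))  # model with intercept and slope
print(aic(1, -6.18))  # overall-probability model
```

Under these assumed values the two-parameter model has the lower AIC, so it would be preferred despite the penalty for its extra parameter.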
@BeginnerVille Жыл бұрын
If we directly project the data onto the S-shaped logistic regression curve, wouldn't we get the same image as at 5:04? I don't get why we need to do these steps.
@statquest Жыл бұрын
I'm not sure I understand your question, can you rephrase it?
@BeginnerVille Жыл бұрын
@@statquest Sorry, I mean the original data are distributed over a continuous x and a binary y (0, 1). But with the S-shaped logistic regression curve, it seems intuitive to project the y (0, 1) values directly onto the regression curve to get the y values (0.01, 0.5, 0.99), the same as plugging in x and reading y off the regression curve. Why must I convert to log(odds), convert back to p, and then get the same graph as the one I mentioned in order to calculate LL()? Thanks for your amazing visualized teaching~
@statquest Жыл бұрын
@@BeginnerVille Have you watched my video on how the 's' shape is fit to the data to begin with? kzbin.info/www/bejne/eJeukqGiZsaGfZI The answer you want may be there. Anyway, the reason we start out in the log(odds) space to begin with is that the "best fitting" line is linear with respect to the coefficients, and thus, we can easily optimize it. In contrast, we can't optimize the 's' shape squiggle directly. Thus, we start with a straight line (or linear function) in log(odds) space and then translate it to the 's' shape fit in probability space. We can then evaluate how well the 's' fits the data by calculating the log(likelihoods). We then use that log(likelihood) to compare to alternatives.
@BeginnerVille Жыл бұрын
@@statquest Thanks! I finally get the working logic. Would you mind explaining more about why you said "In contrast, we can't optimize the 's' shape squiggle directly"? As I understand it (shallowly), the sigmoid function can use some coefficients like c1, c2, as in: 1/(1+e**(c1*(x-c0))). By changing these two coefficients and projecting y onto the sigmoid line, can't I directly optimize the shape with the same maximum likelihood? What's the limit of this approach? Thank you for your thoughtful assistance.
@statquest Жыл бұрын
@@BeginnerVille First, the equation for the sigmoid is non-linear with respect to c1 and c2 because they are in the exponent for 'e'. This means we need to use a non-linear, or numerical technique (like gradient descent kzbin.info/www/bejne/qXXZZZlqqJeGeJo ) to find the optimal values for c1 and c2. And I believe that part of the problem with using the sigmoid equation is that the output values are restricted to be between 0 and 1, instead of -infinity and +infinity, and this makes the math for optimization much more complicated. In contrast, in log(odds) space, the output values can be any value between -infinity and +infinity, so standard numerical techniques can be easily used.
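To make the "numerical technique" in this reply concrete, here is a minimal sketch of gradient ascent on the log(likelihood), with the line parameterized by an intercept and slope in log(odds) space. Everything specific here is an assumption for illustration: the data are hypothetical, the step size and iteration count are arbitrary, and this is not claimed to be the method any particular statistics package uses.

```python
import math

# Hypothetical, non-separable data (NOT from the video).
weights = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
obese = [0, 0, 0, 1, 0, 1, 1, 0, 1]

def probability(intercept, slope, x):
    """Translate the straight line in log(odds) space into a probability."""
    log_odds = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-log_odds))

def log_likelihood(intercept, slope):
    """Sum of log(p) for obese mice and log(1 - p) for non-obese mice."""
    total = 0.0
    for x, y in zip(weights, obese):
        p = probability(intercept, slope, x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

def fit(steps=5000, learning_rate=0.01):
    """Gradient ascent on the log(likelihood), starting from a flat line."""
    intercept, slope = 0.0, 0.0
    for _ in range(steps):
        grad_intercept = 0.0
        grad_slope = 0.0
        for x, y in zip(weights, obese):
            residual = y - probability(intercept, slope, x)
            grad_intercept += residual
            grad_slope += residual * x
        intercept += learning_rate * grad_intercept
        slope += learning_rate * grad_slope
    return intercept, slope

ll_start = log_likelihood(0.0, 0.0)
intercept, slope = fit()
ll_final = log_likelihood(intercept, slope)
print(ll_start, ll_final, intercept, slope)
```

The gradient takes a simple form (observed value minus predicted probability, optionally times x) because the coefficients enter linearly in log(odds) space, which is part of why that parameterization is convenient.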
@marcobarreto54294 жыл бұрын
In the case of comparing a Ridge vs a Logistic model would R^2 be a good approach?
@statquest4 жыл бұрын
You would probably compare accuracy or some other metric used for classification.
@sajozsattila2 жыл бұрын
I have a question about the p-value. 2(LL(fit) - LL(overall)) is a single value of the test statistic, so evaluating the chi-squared density at it, f_{\chi^2}(2(LL(fit) - LL(overall))), only gives the density at that one point. In your example, f_{\chi^2}(4.82) \approx 0.0163. So to get the actual p-value we need to use 1 - F_{\chi^2}(2(LL(fit) - LL(overall))), which is the area of the right tail where x > 2(LL(fit) - LL(overall)). In your example, the actual p-value is approximately 0.0281. Am I right?
@statquest2 жыл бұрын
That seems correct. I rounded the value to 0.03.
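The distinction raised in this exchange, the height of the density at 4.82 versus the area of the tail beyond 4.82, can be sketched numerically. The density formula below is the chi-squared density with 1 degree of freedom; the trapezoid-rule integration, its upper cutoff of 80, and the function names are illustrative choices rather than anything from the video.

```python
import math

def chi2_pdf_1dof(x):
    """Chi-squared density with 1 degree of freedom, valid for x > 0."""
    return math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi * x)

def tail_area(x, upper=80.0, steps=20_000):
    """Trapezoid-rule area under the density from x to `upper`.

    The area beyond `upper` is negligible for this distribution.
    """
    h = (upper - x) / steps
    total = 0.5 * (chi2_pdf_1dof(x) + chi2_pdf_1dof(upper))
    for i in range(1, steps):
        total += chi2_pdf_1dof(x + i * h)
    return total * h

print(round(chi2_pdf_1dof(4.82), 4))  # height of the curve at 4.82
print(round(tail_area(4.82), 4))      # area to the right of 4.82 (the p-value)
```

The first number is the value the commenter calls the density at a single point, and the second is the right-tail area that rounds to the 0.03 reported in the video.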
@annillonaa4 жыл бұрын
amazing!!! So helpful !! the song makes it ever greater!!! Thank u!!
@statquest4 жыл бұрын
Thanks! :)
@xinzhaotong65316 ай бұрын
Hi Josh, at 11:39, the arrangement of the red and blue dots on p = 0.44 of the left figure seems incorrect. They should be positioned as follows from left to right: three red dots, two blue dots, one red dot, and three blue dots, as depicted in the figure on the right. This mistake should not impact the overall probability results of LL. Please correct me if I'm wrong. Thank you.
@statquest6 ай бұрын
The ordering of the red and blue dots in the left figure at 11:39 is based on the ordering that is introduced at 7:44, when weight has no relationship with obesity.
@murselmusabasic42604 жыл бұрын
What does it mean to project data onto the fit line? Thanks for great lessons!
@statquest4 жыл бұрын
Plug the x-axis coordinate for the data into the equation for the line to find the corresponding y-axis coordinate on the line.
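As a rough illustration of that answer, here is a small sketch. The intercept, slope, and weights are hypothetical values chosen for the example and are not taken from the video.

```python
import math

# Hypothetical candidate line in log(odds) space (illustrative values only).
INTERCEPT = -3.5
SLOPE = 1.2

def project_onto_line(weight):
    """Return the y-axis (log(odds)) coordinate on the line for a given weight."""
    return INTERCEPT + SLOPE * weight

def log_odds_to_probability(log_odds):
    """Convert a log(odds) value back to a probability between 0 and 1."""
    return math.exp(log_odds) / (1 + math.exp(log_odds))

weights = [1.0, 2.5, 4.0]  # made-up x-axis coordinates
for w in weights:
    y = project_onto_line(w)
    print(w, round(y, 2), round(log_odds_to_probability(y), 2))
```

Each weight is plugged into the line to get a log(odds) value, which can then be turned into a probability on the squiggle.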
@PunmasterSTP7 ай бұрын
LL Cool J? More like LL "StatQuest is here to stay!" 👍
@statquest7 ай бұрын
This is your best yet.
@PunmasterSTP7 ай бұрын
@@statquest Thank you! If you ever want to hear a pun on a particular topic, just let me know.
@jaegermeistersfriend3 жыл бұрын
you are single-handedly saving my bachelor's thesis! I could not make sense of anything about logreg in text books. Thank you!
@statquest3 жыл бұрын
Good luck! :)
@jaegermeistersfriend3 жыл бұрын
@@statquest Thanks! (: and while we're at it, can I ask what program you use to make your graphics?
@statquest3 жыл бұрын
@@jaegermeistersfriend I draw most things by hand in Keynote. Other graphs are created in R.
@Mona-so9ss6 жыл бұрын
what if we have a discrete variable instead of weight? how do we find the best fit then? also would love to see a video on multiple logistic regression!!
@statquest6 жыл бұрын
This is a good question! I talk about this in "Part 1" and "Part 2" of this series: kzbin.info/www/bejne/rH-YlIGEZ5J7jac and kzbin.info/www/bejne/eJeukqGiZsaGfZI
@statquest6 жыл бұрын
Also, once you understand how parameters are estimated for Logistic Regression, it's easy to see that it works just like regular multiple regression when you have more variables predicting whatever it is you're predicting.
@Mona-so9ss6 жыл бұрын
Thanks! one more (stupid) question. When you convert the probability of obesity to log odds of obesity, the x axis- weight is also converted to log weight? If not then what is the x axis in log odds graph?
@statquest6 жыл бұрын
Not a stupid question at all. The x-axis stays the same. The parameter (slope) tells you that for every one unit of weight (the x-axis in the original units), you increase (or decrease, depending on the angle of the slope) the log(odds) of obesity (you either go up or down along the y-axis, which is now in log(odds) units).
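A small sketch of that interpretation, assuming a hypothetical intercept and slope (neither value comes from the video): moving one unit along the unchanged weight axis adds the slope to the log(odds), which is the same as multiplying the odds by e raised to the slope.

```python
import math

INTERCEPT = -3.5  # hypothetical value
SLOPE = 1.2       # hypothetical change in log(odds) per one unit of weight

def log_odds_of_obesity(weight):
    """Weight stays in its original units; only the y-axis is in log(odds)."""
    return INTERCEPT + SLOPE * weight

before = log_odds_of_obesity(2.0)
after = log_odds_of_obesity(3.0)  # one unit of weight heavier

change_in_log_odds = after - before
odds_ratio = math.exp(after) / math.exp(before)

print(round(change_in_log_odds, 3), round(odds_ratio, 3))
```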
@foreverpali2 жыл бұрын
Your videos are amazing! You make statistic modules so simple and understandable, thank you!
@statquest2 жыл бұрын
Glad you like them!
@arshsadh7332 Жыл бұрын
Hey Josh, Thanks for sharing this. It really helped me clear some doubts. I have one doubt, how do I find p-values using the chi-squared distribution if degrees of freedom is 10, for example?
@statquest Жыл бұрын
It depends on what tool you use. In R, we calculate it with: 1 - pchisq(2*(ll.proposed - ll.null), df=10).
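For readers not using R, here is a rough Python sketch of the same calculation using only the standard library. It assumes an even number of degrees of freedom (such as 10), for which the chi-squared right tail has a simple closed form; the log-likelihood values are hypothetical. In practice a library routine such as scipy.stats.chi2.sf would be the more usual choice.

```python
import math

def chi2_sf_even_df(x, df):
    """Right-tail area P(X > x) for a chi-squared distribution with even df.

    For even df = 2m the tail equals exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

# Hypothetical log-likelihoods, chosen only for illustration.
ll_proposed = -40.0
ll_null = -50.0

p_value = chi2_sf_even_df(2 * (ll_proposed - ll_null), df=10)
print(p_value)
```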
@henri92894 жыл бұрын
Hi, do you have any instructions on multinomial ordinal logistic regression?
@statquest4 жыл бұрын
Not yet.
@henri92894 жыл бұрын
@@statquest I cannot find any content on it on the internet. I have been beaten by this statistic... most academics usually only teach the binomial one.
@statquest4 жыл бұрын
@@henri9289 Noted
@henri92894 жыл бұрын
@@statquest I have searched for content both on the internet and in the library, and I have only found the binomial equations... I am looking for the multinomial ones in order to write the equations in my dissertation.
@chuangchen55475 жыл бұрын
In the last part of the lecture, why does it follow a chi-square distribution when we calculate the p-value? Further, why is the chi-square value determined by 2*(LL(fit) - LL(overall))? Thanks.
@lishanjiang2605 жыл бұрын
The likelihood ratio test statistic converges in distribution to a chi-square distribution asymptotically.
@elenaviter41385 жыл бұрын
en.wikipedia.org/wiki/Wilks%27_theorem
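For reference, a hedged statement of the result the link above points to (Wilks' theorem), written in this thread's notation. Here k is the number of extra parameters in the fitted model compared with the overall-probability model (assumed to be 1 in the video's example, the slope). The result holds under the null hypothesis that the simpler model is adequate, and only approximately for finite samples.

```latex
2\,\bigl[\mathrm{LL}(\text{fit}) - \mathrm{LL}(\text{overall})\bigr]
\;\xrightarrow{\;d\;}\; \chi^2_{k}
\qquad \text{as the sample size } n \to \infty
```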
@iraidaredondo50084 жыл бұрын
Hi, Josh. I would really appreciate it if you could help me with some doubts I have dealing with my own data. I'm trying to figure out if some morphological features determine reproductive status (0 = not reproductive in a given season; 1 = reproductive in a given season) in a wild passerine. Instead of analyzing each phenotypic trait separately, we decided to do a logistic regression where status is the response variable and the morphological features are the explanatory ones. In my case, the capture year is included as a random factor in our model. My question is: is there a better way to get an R^2 for mixed generalized models? I've enjoyed this series a lot since it has helped me build confidence and knowledge about what I was doing! Thank you so much!
@statquest4 жыл бұрын
Unfortunately I can't help you with mixed models at this time.
@rabbitazteca232 жыл бұрын
If my model has a high p-value (x variable is not correlated to y) but has a high R-squared value (meaning the variance in the y data is explained by x = our line fit our data well) what does this tell us? How can x be not related to y but at the same time our y's correspond to correct and reasonable values for x?
@statquest2 жыл бұрын
If we only had 2 data points, then we could get the squiggle or line to fit them perfectly, resulting in a high r-squared. However, any two random points will result in a perfect fit (just connect the two points), so the p-value will be terrible. Thus, one thing the p-value can tell us is how much data supports the r-squared value.
@utsavprabhakar50726 жыл бұрын
What are R-squared and p? Do you have a StatQuest where they are explained or mentioned for the first time?
@statquest6 жыл бұрын
These are great questions. I have a bunch of videos that talk about R-squared and P-values. Check out: kzbin.info/www/bejne/a4ucgHyPdp17m5o kzbin.info/www/bejne/aHK0fKCtZpmgfq8 kzbin.info/www/bejne/pJyVdIR_idKSm9E
@utsavprabhakar50726 жыл бұрын
StatQuest with Josh Starmer thanks :)
@kanikabagree10843 жыл бұрын
This is the best channel I've come across to understand the stats behind the ML algorithms. Thank you, Josh ❤️ Love from India.
@statquest3 жыл бұрын
Awesome, thank you!
@narendrasompalli55364 жыл бұрын
Sir, how do we calculate the intercept and slope for logistic regression? Please show me with an example.
@statquest4 жыл бұрын
We use maximum likelihood and gradient descent. For an example, see: kzbin.info/www/bejne/eJeukqGiZsaGfZI and kzbin.info/www/bejne/qXXZZZlqqJeGeJo
@narendrasompalli55364 жыл бұрын
Sir, can't we calculate the slope and intercept for logistic regression without using gradient descent?
@statquest4 жыл бұрын
@@narendrasompalli5536 There is not an analytical solution, so you have to use some iterative method. Gradient Descent is a popular method, but there are others you could use.
@narendrasompalli55364 жыл бұрын
Sir, I meant that we can calculate the best slope in linear regression by using sum((x - x bar)(y - y bar)) / sum((x - x bar)^2).
@narendrasompalli55364 жыл бұрын
Can't we calculate it like that in logistic regression, sir?
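To make the reply about iterative methods concrete, here is a minimal sketch of one such method: plain gradient ascent on the log-likelihood (equivalent to gradient descent on its negative). The data, learning rate, and step count are made up for illustration and are not the video's; real software typically uses faster iterative schemes.

```python
import math

# Made-up data for illustration only (not the video's dataset).
weights = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
obese = [0, 0, 0, 1, 0, 1, 0, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(intercept, slope):
    """Sum of log(p) for obese mice and log(1 - p) for the others."""
    total = 0.0
    for x, y in zip(weights, obese):
        p = sigmoid(intercept + slope * x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

def fit(learning_rate=0.1, steps=5000):
    """Repeatedly nudge the intercept and slope uphill on the log-likelihood."""
    intercept, slope = 0.0, 0.0
    n = len(weights)
    for _ in range(steps):
        grad_intercept = 0.0
        grad_slope = 0.0
        for x, y in zip(weights, obese):
            error = y - sigmoid(intercept + slope * x)
            grad_intercept += error
            grad_slope += error * x
        intercept += learning_rate * grad_intercept / n
        slope += learning_rate * grad_slope / n
    return intercept, slope

ll_start = log_likelihood(0.0, 0.0)
intercept, slope = fit()
ll_final = log_likelihood(intercept, slope)
print(round(intercept, 2), round(slope, 2), round(ll_start, 2), round(ll_final, 2))
```

Unlike the closed-form slope for linear regression, there is no single formula to evaluate here, so the estimates are improved step by step until the log-likelihood stops increasing.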
@shivanidhawal82614 жыл бұрын
Hey Josh! Loved every video of yours. Question: I have read many books saying R^2 has a range of -infinity to 1, with a negative R^2 in the case where the regression completely fails to explain the variation in the data. Is this correct? You took the range from 0 to 1. Which one is correct?
@statquest4 жыл бұрын
For linear regression, R^2 can never go below 0. This is because your model can never be worse than the base line model. However, in other settings it is possible to have your model fit worse than the base line model.
@shivanidhawal82614 жыл бұрын
@@statquest thanks alot :) !
@yulinliu8506 жыл бұрын
Excellent! Much appreciated!
@statquest6 жыл бұрын
Thank you!
@TheRamnath0075 жыл бұрын
The squiggly line is the best-fit line, right? Its log-likelihood is -3.77. But in the later part of the video you take -6.18 and say it is LL(fit). But that is LL(overall probability). Why is that so?
@statquest5 жыл бұрын
There is a lot in this video, so can you tell me what time point (minute and seconds) is confusing you?
@TheRamnath0075 жыл бұрын
@@statquest Check the video at 5:18 (LL(fit)), 6:51 (overall prob) and 8:41 (LL(fit))
@statquest5 жыл бұрын
@@TheRamnath007 OK, so in this video, I use three different datasets to demonstrate how to calculate the R^2 value. For the first dataset weight is correlated with obesity, and I calculate LL(fit) = -3.77 and LL(overall) = -6.18. Then I calculate the R^2 = 0.39 at 7:25 . Thus, the R^2 confirms that weight is correlated with obesity. After that first example, I then create a new dataset that does not have a correlation between weight and obesity. I then calculate LL(fit) and LL(overall) for the new dataset. In this case, both LL(fit) and LL(overall) = -6.18. I then plug this number into the formula for R^2 and get R^2 = 0 (see 9:22 ). So the R^2 confirms that this new dataset is not correlated. After the second example, I then create a new dataset where there is tons of correlation between weight and obesity. I then calculate LL(fit) = 0 and LL(overall) = -6.18 for this new dataset. Lastly, I calculate R^2 and get 1 (see 11:26 ). My guess is that the thing that is confusing is that the number -6.18 keeps coming up in each example. This is because each made up dataset for the three examples has 4 obese mice and 5 mice that are not-obese. This means that the LL(overall) will be -6.18 in all three examples. However, it also means that LL(fit) = -6.18 in the second example because the data are not correlated and the best fit is a horizontal line at the log(odds), just like LL(overall). Does this make sense?
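The three examples described above can be reproduced with a short sketch. It assumes the McFadden form of the formula, R^2 = 1 - LL(fit)/LL(overall), which is algebraically the same as (LL(overall) - LL(fit))/LL(overall); the function names are made up for illustration.

```python
import math

def log_likelihood_overall(n_obese, n_not_obese):
    """LL(overall probability): every mouse gets the same overall probability."""
    p = n_obese / (n_obese + n_not_obese)
    return n_obese * math.log(p) + n_not_obese * math.log(1 - p)

def mcfadden_r_squared(ll_fit, ll_overall):
    return 1 - ll_fit / ll_overall

# 4 obese and 5 not-obese mice in every example, so LL(overall) is shared.
ll_overall = log_likelihood_overall(4, 5)

examples = [
    ("weight correlated with obesity", -3.77),
    ("no relationship", ll_overall),  # best fit is the horizontal line
    ("perfect fit", 0.0),
]
for name, ll_fit in examples:
    print(name, round(mcfadden_r_squared(ll_fit, ll_overall), 2))
```

With these inputs LL(overall) is about -6.18 and the three R^2 values come out at roughly 0.39, 0 and 1, matching the description above.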
@kevinshah84714 жыл бұрын
Hey Josh! Great videos. I have a doubt though. In the first video, you used the intercept and slope of the log-odds graph to show that the p-value is not less than 0.05 (using walds). Here, for the same model, you used maximum likelihood and got a p-value less than 0.05. I don't understand why the two differ. Is it that using walds is one method and maximum likelihood is another and I'll accept one of the two values? Thanks.
@statquest4 жыл бұрын
Your question makes me suspect that you skipped watching Part 2 in this series. Part 2 explains the role that maximum likelihood plays in logistic regression. Hint, maximum likelihood does something completely different from Wald's test. For more details, see: kzbin.info/www/bejne/eJeukqGiZsaGfZI
@kevinshah84714 жыл бұрын
@@statquest I went back and rewatched the video. Thanks man!
@zhiyongbai44143 жыл бұрын
Thanks both! I have the same qn here: 1) does it mean with one x-variable, the p value of the coefficient (part 1) and p value of the model (part 3) are the same? 2) and if there are more than 1 x-variable, p value of the model (part 3) means if the combined effects of the x-variables are stats sig? Thank you!
@jonathanbarajas79402 жыл бұрын
Que gran video!
@statquest2 жыл бұрын
Muchas gracias!
@kt4nk952 жыл бұрын
This may be a silly question, but I'm still confused where the 2[LL(fit) - LL(overall probability)] came from. How do we know to use that to calculate the p-value?
@statquest2 жыл бұрын
Unfortunately, deriving that equation would probably take a whole video.
@zhou60752 жыл бұрын
so understandable
@statquest2 жыл бұрын
Hooray!
@jessicatan2786 жыл бұрын
and why is it 0.44 and not 0.45 at min 8:37? :'(
@statquest6 жыл бұрын
Again, this is just poor rounding on my behalf. The true value is 0.4452208, which rounds to 0.45.
@anshulsaini54013 жыл бұрын
I had a doubt: in logistic regression, what does this R-squared value actually tell us? In linear regression it tells us the amount of variance explained by our model. How do we interpret it in logistic regression? Is it really helpful in logistic regression, or can we just skip its interpretation?
@statquest3 жыл бұрын
Umm... this whole video is intended to answer your question. Is there a specific time point that is confusing?
@anshulsaini54013 жыл бұрын
@@statquest I was reading an article on Google and it said that R-squared in logistic regression is not used to tell the explained variance but rather the improvement in model likelihood over the null. I wasn't able to relate that to this video. Just wondering what it actually represents in logistic regression.
@statquest3 жыл бұрын
@@anshulsaini5401 I guess I don't understand the question since this video, and the article on google, say that the R^2 is the improvement in model likelihood over the null.
@mahdimohammadalipour30772 жыл бұрын
I've heard that we can not apply LSE to find the best fit in logistic regression and honestly, yet I don't know why? (When it comes to log(odds) I know that residuals are infinity and we can't) but why don't we simply assume that our data is only 0 or 1 and simply use LSE just like linear model to find best fit. i.e. we have data that are obese (1) and not obese (0) and we use logistic regression with specific threshold (0.5) to predict 0 and 1's and then we define cost function and try to minimize it?
@statquest2 жыл бұрын
It's actually possible to use the sum of the squared residuals, but it doesn't always work as well. To learn more see: kzbin.info/www/bejne/bHLVhKypatZ7d7c (NOTE: To understand what is going on, just replace "cross entropy" with "log(odds)")
@remid58422 жыл бұрын
Shouldn't it be 0.56 instead of 0.55 at 6:46? Or did I misunderstand?
@statquest2 жыл бұрын
You are correct. That's a typo. Sorry for the confusion.
@janinajochim18434 жыл бұрын
Hi there! Thank you for this fantastic video! I've been struggling to understand the outcome of the pseudo-R square in my model and what this means for me to proceed. For McFadden's R-square, I got 0.03 for my final model. Whilst the internet tells me to be 1. Careful with the interpretation 2. That a score of 0.2 - 0.4 is desirable and that 3. The interpretation is 'not the same as for OLS R-square' and 4. That pseudo R-squares are smaller in general than OLS R-squares, it doesn't really tell me where to go from here. How bad is 0.03? Can I still interpret my odds ratios or do I need to re-specify my model? There is no doubt that I am lacking relevant variables in my model, however, none of them were assessed in the study! Thank you so much in advance (PLEASE HELP ME!!!!).
@janinajochim18434 жыл бұрын
* I should have also added that I have multiple IVs in my model and 3-4 of them are significant. I wonder to what extent I can interpret them as important predictors regardless of high R-square
@statquest4 жыл бұрын
0.03 seems pretty small to me, and thus, despite the significance of the independent variables, they do not give you very much information about what is really going on with what you are trying to model.
@janinajochim18434 жыл бұрын
:C
@janinajochim18434 жыл бұрын
@@statquest The promised funny story: Recently overheard two of my fellow students having the following exchange: Student 1: I am not sure what to do over the summer Student 2: Mh ... Student 1: Was thinking about doing some modelling Student 2: Oh cool. What like for magazines? Student 1: What? Student 2: You didn't mean on catwalks, right? Student 1: What? I meant with my mice- data!
@statquest4 жыл бұрын
@@janinajochim1843 That is great!!! Very funny. I got a big laugh out of that. :)
@maidang40813 жыл бұрын
Your videos are very well explained and clearly understandable, and your BAM is a huge, huge plus. I have learnt much more from your videos than from my grad school's ML lectures. Also, I have a small question. I am new to Machine Learning and also have a fear of it... so anyway, can you please explain to me "Why are the residuals for Logistic Regression all infinite?" The data points are probabilities, so their range is between 0 and 1...? I just can't stretch my brain around it T.T
@statquest3 жыл бұрын
I answer your question in this video: kzbin.info/www/bejne/eJeukqGiZsaGfZI
@maidang40813 жыл бұрын
@@statquest Thank you so much!!! I will look into that :)
@tamerosman7742 жыл бұрын
Can you do the linear and logistic regression in matrix form please
@statquest2 жыл бұрын
I go through design matrices in these videos: kzbin.info/www/bejne/hHeYkJWqhMZ2n8k kzbin.info/www/bejne/eaKveKmtnpJohsU and kzbin.info/www/bejne/fqPVY5SkrrCSa9U
@tamerosman7742 жыл бұрын
@@statquest Thank you! Are there any videos on Bayesian Networks?
@statquest2 жыл бұрын
@@tamerosman774 Not yet.
@alexandrezajic44265 жыл бұрын
Hi Josh - appreciate your videos! I'm curious why you say that R squared only goes between 0 and 1, when it can go between negative infinity and 1. Any model can have infinitely poor fit - leading to significantly worse residuals than the mean's residuals. While this indicates your model is terrible, in the off chance that it happens (which it has for me), it would clear up any ensuing confusion that something must be broken with your programs. Thanks!
@statquest5 жыл бұрын
Yeah, it's possible to have negative R-squared values. However, typically with Logistic Regression we compare "nested models". In other words, one model is the "simple model" and the other, the "fancy model", contains all of the variables in the "simple model" plus others. When this is the case for Logistic Regression, the fancy model can not do worse than the simple model because otherwise the parameters for the new variables would be zero (or not significantly different from zero), and thus, in the worst case, the simple model = the fancy model, which results in an R^2 = 0. However, when you don't use nested models, or you are working with something other than logistic regression, you can get negative values.
@alex_zetsu5 жыл бұрын
10 different ways to calculate R squared? I'm just curious what they are so I can look them up. I can only find 4. McFadden's is the only one that seems to make sense to me since it's close to the linear models (presumably why you chose it), but I am curious as to what are all the ways to do it.
@statquest5 жыл бұрын
Mittlbock and Schemper (1996) “Explained variation in logistic regression.” discuss *12* different R-squared formulas for Logistic Regression: citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.477.3328&rep=rep1&type=pdf
What about the assumptions of a logistic regression which must not be violated?
@statquest3 жыл бұрын
In log() space, you want to have a linear response.
@coinatlas59533 жыл бұрын
@@statquest But this linearity must be checked only if the predictor is continuous, right? Is there anything to check for categorical variables? Also, thanks for responding.
@Nordlinger.Dr4ke2 жыл бұрын
Thanks a lot, my friends and I really enjoy your content. We really appreciate it; these are among the best statistics videos I have ever seen.
@statquest2 жыл бұрын
Thank you so much 😀
@punchline91313 жыл бұрын
Is LL(fit) the same as the maximum-likelihood? And thanks for your excellent work! 👌
@statquest3 жыл бұрын
LL(fit) is the log-likelihood of the fitted squiggle. We can use that as input to an algorithm that can maximize the likelihood. To learn more about maximum likelihood, see: kzbin.info/www/bejne/jpbTiaeibr5-rcU
@bennybenbenw2 жыл бұрын
hi josh, log(likelihood of data given overall probability) isnt 0.56, but what u written is 0.55
@statquest2 жыл бұрын
What time point, minutes and seconds, are you referring to?
@@bennybenbenw I see. Yes, that's just a rounding error.
@evan168gt64 жыл бұрын
Hello, Josh! Your content is so useful, it’s single handedly carried me through my paper! I thank you very much and hope you continue to post content. Also as a side note, is there no possible way of calculating the correlation of a logistic regression? Any insight is greatly appreciated!
@statquest4 жыл бұрын
Thanks! There is no way to calculate a "normal" correlation for logistic regression because of the infinite distance between the data and the log(odds) linear fit.
@wolfisraging6 жыл бұрын
Kudos to power kudos to you
@statquest6 жыл бұрын
Thank you!
@michael0520754 жыл бұрын
Very clear explanation. Thank you!
@statquest4 жыл бұрын
Thanks! :)
@billzen62293 жыл бұрын
why is it that the logistic regression residual are infinite? didn't quite get it
@statquest3 жыл бұрын
Because in log odds space (the graphs on the right side), probability = 1 is infinity and probability = 0 is negative infinity.
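A tiny sketch of why that happens, assuming the usual definition log(odds) = log(p / (1 - p)); the helper name is made up for illustration. The raw data sit at probability 0 or 1, so their log(odds) are infinite, and the distance from any finite point on the line to them is infinite too.

```python
import math

def log_odds(p):
    """Convert a probability to log(odds), handling the endpoints explicitly."""
    if p == 0.0:
        return -math.inf
    if p == 1.0:
        return math.inf
    return math.log(p / (1 - p))

line_value = 0.8  # hypothetical log(odds) on the fitted line at some weight

for p in [0.0, 0.5, 1.0]:
    y = log_odds(p)
    residual = y - line_value
    print(p, y, residual)
```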