Saturated Models and Deviance

115,166 views

StatQuest with Josh Starmer

A day ago

Comments: 195
@statquest
@statquest 5 years ago
NOTE: In statistics, machine learning and most programming languages, the default base for the log() function is 'e'. In other words, when I write, "log()", I mean "natural log()", or "ln()". Thus, the log to the base 'e' of 2.718 is approximately 1. ALSO NOTE: This video is about Saturated Models and Deviance as applied to Logistic Regression (and not ordinary linear regression). Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
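A quick check of the note above (my own sketch, not from the video; Python's math.log() also defaults to base 'e'):

```python
import math

# Python's math.log() defaults to the natural log (base 'e'),
# matching the convention used in the video.
print(math.log(math.e))        # essentially 1.0
print(math.log(7.389056099))   # roughly e^2, so essentially 2.0
```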
@falaksingla6242
@falaksingla6242 2 years ago
Hi Josh, Love your content. It has helped me learn a lot & grow. You are doing awesome work. Please continue to do so. I wanted to support you, but unfortunately your PayPal link seems to be broken. Please update it.
@MilaBear
@MilaBear 4 years ago
You know you've been watching too many StatQuest videos when you say "double bam" before he does.
@statquest
@statquest 4 years ago
You made me laugh out loud! Funny. :)
@mili3212
@mili3212 4 years ago
dude wtf. why have I just discovered your channel? you have a gift to make things as simple as possible
@statquest
@statquest 4 years ago
Thanks! :)
@spartan9729
@spartan9729 A year ago
bro you are still 2 years earlier than me. I am discovering this right now when almost everyone is a pro in machine learning. 😭😭😭
@amarnathmishra8697
@amarnathmishra8697 3 years ago
I love the weird tunes you make during any sort of calculations. Besides, needless to say, your teaching is amazing!!!
@statquest
@statquest 3 years ago
Wow, thank you!
@mariaelisaperesoliveira4419
@mariaelisaperesoliveira4419 5 years ago
OMG HAHAHAHA You make statistics seem so easy and simple!!! I looooove this channel
@statquest
@statquest 5 years ago
Awesome! I'm impressed. Not many people watch this video because they don't want the details, but you've gone all the way. You deserve a prize! :)
@AndrewCarlson005
@AndrewCarlson005 5 years ago
10:17 BOOP BOOP BEEP BEEP BOOP BOOP BEEP BEEP BOOP BOOP BEEP BEEP HAHAHA had me dying
@statquest
@statquest 5 years ago
:) That's the sound of plugging in numbers. :)
@ehg02
@ehg02 4 years ago
@@statquest can you make a song about that hahaha
@taotaotan5671
@taotaotan5671 4 years ago
Another exciting Statquest! Thank you Josh
@statquest
@statquest 4 years ago
Bam! :)
@muhammedhadedy4570
@muhammedhadedy4570 3 years ago
The best KZbin channel that has ever existed.
@statquest
@statquest 3 years ago
Thanks!
@mutonchops1
@mutonchops1 5 years ago
Why on Earth did I not have you giving this lecture in my course - I actually understood it! Thanks!
@inamahdi7959
@inamahdi7959 A year ago
I have been listening to your lectures as a review on my commute and I am mesmerized. I never made the connection between the LRT and these methods. They are one and the same! Just rewritten differently. Or where the negative sign or the factor of 2 comes from.
@statquest
@statquest A year ago
Glad you are enjoying the videos and making some deep connections.
@cfalguiere
@cfalguiere 5 years ago
Here for the song :-) and because it was an unknown piece of jargon to me. Jargon clarified... BAM!
@statquest
@statquest 5 years ago
Awesome!!!! Thank you very much. :)
@hgupta32323
@hgupta32323 3 years ago
When you are calculating the likelihood of each of the null (2:39), proposed (3:18), and saturated (3:57) models, where do those raw values you are multiplying together come from? To me, it looks a bit like you have drawn several probability density functions (PDFs) and are looking at the value of a given PDF for each of the points. However, I know my interpretation is wrong because the values are often greater than 1, thus I am confused as to what those values are?
@statquest
@statquest 3 years ago
Your interpretation is actually correct. I drew PDFs and I use the y-axis value above each point on the x-axis. The y-axis values are likelihoods. Likelihoods are not probabilities, and thus can be larger than 1. For example, if you draw a normal curve with mean = 0 and sd = 0.1, you will see many y-axis coordinates on the curve that are > 1. For more details on the difference between likelihoods and probabilities, see: kzbin.info/www/bejne/porbf4aLebh5fpY
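The reply above can be verified numerically. A minimal sketch (my own illustration; the function below is just the standard normal density formula):

```python
import math

def normal_pdf(x, mean=0.0, sd=1.0):
    """Density (i.e., likelihood) of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

# A narrow curve (sd = 0.1) has likelihoods well above 1 near the mean,
# even though the total area under the curve is still 1.
print(normal_pdf(0.0, mean=0.0, sd=0.1))  # about 3.99
```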
@craigmauz
@craigmauz 5 years ago
@2:34 How did you calculate the LL of the data?
@taotaotan5671
@taotaotan5671 4 years ago
It's surprising to see t-test, linear model and anova roughly fit the framework of MLE, and likelihood ratio test. BAMMMMM
@taotaotan5671
@taotaotan5671 4 years ago
Wait... I want to confirm if that is correct. I tried to manually calculate the p-value from a t-test (equal variance), and manually calculate the log-likelihood ratio and use chi-square to resolve the p-value. But it seems there is a small difference (second decimal). Is that because of the biased estimation of variance in MLE? It is really cool to resolve the same question from two very distinct frameworks (t-test and likelihood ratio test). triple BAM!
@statquest
@statquest 4 years ago
I'm not really sure what causes the small difference, but I'm glad that it was small! That's a very cool thing to try out.
@taotaotan5671
@taotaotan5671 4 years ago
StatQuest with Josh Starmer BAM
@PunmasterSTP
@PunmasterSTP 7 months ago
10:17 12:23 Peak number-plugging-in sound effect! 👌
@statquest
@statquest 7 months ago
Haha! :)
@WeLoveChouBJu
@WeLoveChouBJu 4 years ago
I cannot like this video enough.
@statquest
@statquest 4 years ago
Thanks! :)
@margotalalicenciatura1376
@margotalalicenciatura1376 5 years ago
Hey Josh! Again, thanks for your awesome work as always! In this specific case, I got into more questions than answers, but after trying to find answers on my own I assume it's a conscious decision to not overly complicate things. Namely, you say that the "-2" in the log-likelihood ratio is what makes it chi-squared distributed. The origin of this seems to be the fairly complicated derivation in Wilks (1937), so I assume there is no quick intuitive way to answer why it is chi-squared distributed (i.e. at 13:42, which is the standard normal variable that's being squared?). Along the same lines, about why we are working with the log of the likelihoods: is it because the saturated likelihood is obtained by maximization and thus it is easier to work with the logs? If that's the case, I'm not clear on what dictates which densities are used for the likelihood computation. As you say, you used the normal distribution here for illustration purposes, but with regards to the actual computation of the likelihoods, what distributions are used by the glm function, for example? In logistic regression I would first say that it's a binomial, but that's only conditional on the x_is and would demand one parameter per observation. And last but not least, the most simple question: why are deviances and likelihoods required in GLMs, in contrast to straight sums of squares and residuals? Again, thanks a lot for your time!
@Gengar99
@Gengar99 2 years ago
I loved your video, super no-brainer examples, thank you.
@statquest
@statquest 2 years ago
Thank you!
@mahadmohamed2748
@mahadmohamed2748 3 years ago
15:26 Why does multiplying the log-likelihoods by 2 give a chi-squared distribution with degrees of freedom equal to the difference in the number of parameters of the proposed model and the null model?
@statquest
@statquest 3 years ago
That would require a whole other StatQuest to explain.
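For anyone curious, the chi-squared behavior comes from Wilks' theorem, and it can be seen in a tiny simulation. This toy example is my own (not from the video) and assumes normal data with known variance 1, where 2*(LL(proposed) - LL(null)) simplifies to n times the squared sample mean:

```python
import random

random.seed(42)
n, sims = 50, 2000
stats = []
for _ in range(sims):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    xbar = sum(xs) / n
    # With known variance 1, 2*(LL(proposed) - LL(null)) reduces to n * xbar^2.
    stats.append(n * xbar * xbar)

# Wilks' theorem says this statistic follows a chi-squared distribution with
# 1 degree of freedom, whose mean is 1, so the simulated average should be close to 1.
print(sum(stats) / sims)
```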
@snehashishpaul2740
@snehashishpaul2740 6 years ago
Love your Bams and oan ouannn as much as your teaching. :)
@statquest
@statquest 6 years ago
Thank you!!! :)
@suyashmishra8821
@suyashmishra8821 2 years ago
sir U r great. U r Ultimate. U r dangerous. lots of love from a Data Scientist 🥰🥰🥰
@statquest
@statquest 2 years ago
Thanks!
@dominicj7977
@dominicj7977 5 years ago
How did you calculate the likelihood of univariate data?
@bobo0612
@bobo0612 5 years ago
I am wondering the same thing. How can the likelihood be bigger than 1...
@mktsp2
@mktsp2 3 years ago
@@bobo0612 a normal density with a small standard deviation rises above 1.
@Maha_s1999
@Maha_s1999 6 years ago
That's right Josh- here just for the song 😂 Seriously - your videos are super helpful!
@statquest
@statquest 6 years ago
Thanks so much!!! Wow, you're even watching the video on Saturated Models! I like this one, but it's not very popular.
@Maha_s1999
@Maha_s1999 6 years ago
You are welcome! I am a graduate stat student and now studying generalised linear models (binomial, ordinal/nominal and count models) so any other videos like this one will be really useful. If I find myself watching more videos from your channel, I would definitely like to support you like I do with other channels I am an avid follower of. I saw that you prefer your subscribers buy your music - do you have a Patreon account for regular donations?
@aojing
@aojing 10 months ago
@9:37 Hi Josh, can you explain why the Residual Deviance is defined in this form and what the connection to the chi-square test is, or provide a reference link? Thanks.
@statquest
@statquest 10 months ago
Unfortunately I can't explain it, other than the definition allows the values to be modeled by a chi-square distribution.
@jic3897
@jic3897 5 years ago
17:06 Can someone explain why the likelihoods will be between 0 and 1? I thought likelihoods are greater than 0 but can go to positive infinity.
@statquest
@statquest 5 years ago
The value of the likelihoods depends on the distribution we are using. For example, with a normal distribution, the likelihoods can be much larger than 1. However, for the sigmoidal function used with Logistic Regression, which has a range from 0 to 1, the maximum value is 1. In other words, the likelihood is the vertical distance from a point on the x-axis to a point on the sigmoidal curve. Since that curve only goes up to 1, the maximum value is 1.
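The reply above can be sketched in code (my own illustration, not from the video): in logistic regression, each observation's likelihood is read off the sigmoid curve, so it is between 0 and 1 and its log is at most 0:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bernoulli_likelihood(y, z):
    """Likelihood of one observation: the height of the sigmoid curve
    (for y = 1) or one minus that height (for y = 0)."""
    p = sigmoid(z)
    return p if y == 1 else 1.0 - p

lik = bernoulli_likelihood(1, 2.0)
print(lik, math.log(lik))  # likelihood below 1, so log-likelihood below 0
```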
@Han-ve8uh
@Han-ve8uh 4 years ago
How is the saturated model at 16:19 created? I thought lines can only go upwards from left to right, but there is a line going downwards from 1st blue to 3rd red dot. Also how are there 3 line segments? Are they created from 3 logistic regression models, with first 2 red dot + 1st blue dot creating 1st model, 1st blue dot + 3rd red dot creating 2nd model, 3rd red dot + last 3 blue dot creating 3rd model?
@statquest
@statquest 4 years ago
The saturated model is just a model that fits the data perfectly. It does not have to follow rules like "must always increase" etc., because if it did, it would not be able to fit the data perfectly.
@xruan6582
@xruan6582 4 years ago
(2:50) can anyone explain why the probabilities 1.5 and 2.5 are greater than 1?
@statquest
@statquest 4 years ago
Although we commonly treat the words "likelihood" and "probability" as if they mean the same thing, in statistics they are different. What you're seeing at 2:50 are likelihoods, not probabilities. I explain the difference in this video: kzbin.info/www/bejne/porbf4aLebh5fpY
@kimicheng5611
@kimicheng5611 4 years ago
@@statquest Could you please do an example where the likelihood is greater than 1? I still feel confused about how you get 1.5 and 2.5 after watching the Probability vs Likelihood video.
@statquest
@statquest 4 years ago
@@kimicheng5611 Plot a normal distribution with mean = 0 and standard deviation = 0.1. The maximum y-axis value is 4.
@franzvonmoor6145
@franzvonmoor6145 4 years ago
@@statquest Hey Josh, hey everyone, your videos are totes useful, thanks so much! I want to agree with other viewers here that the likelihoods above 1 in this video do cause quite a bit of confusion, since in most other videos you explain likelihoods with values on the scale of [0, 1]. Hence, I first understood it as likelihoods always being between zero and one and, correspondingly, log(likelihoods) always being at or below zero. Looking it up on StackExchange I found a helpful answer (stats.stackexchange.com/questions/4220/can-a-probability-distribution-value-exceeding-1-be-ok). I don't believe I have totally grasped it yet, and hence can only humbly recommend bridging this gap by updating the video where you explain likelihoods vs. probabilities to explain why and when likelihoods can be above 1. Best regards from Germany!
@jakobmeryn61
@jakobmeryn61 3 years ago
@@franzvonmoor6145 I still struggle with this concept. Would be great to have a new video.
@TheFacial83
@TheFacial83 2 years ago
Do you have a video about multinomial logistic regression? Your videos are the best in stats. Thanks a lot
@statquest
@statquest 2 years ago
Unfortunately I don't
@alanzhu7538
@alanzhu7538 4 years ago
2:37 How do you come up with the numbers for the likelihood of the data?
@statquest
@statquest 4 years ago
The likelihoods are the y-axis coordinates on the normal curve. For more details, see: kzbin.info/www/bejne/porbf4aLebh5fpY
@alanzhu7538
@alanzhu7538 4 years ago
StatQuest with Josh Starmer That is clear, thank you!
@kartikeyasharma4017
@kartikeyasharma4017 2 years ago
At 14:31, we have the p-value = 0.002, which means that the proposed and the null model are statistically different, but can't we infer that the result is by chance (connecting it with the root definition of the p-value)?
@statquest
@statquest 2 years ago
I probably should have been more careful with my wording at that point. I assumed that because the p-value was less than 0.05, we had rejected the null hypothesis that the data are due to random chance. However, you are correct, the p-value of 0.002 tells us that the probability of random chance giving us the observed data or something rarer is 0.002.
@stefanopalliggiano6596
@stefanopalliggiano6596 3 years ago
Question: when you calculate the likelihood based on two means at 03:20, how do you know which distribution to use? The arrows intersect both distributions, so for each data point we should multiply two likelihoods? Could you please clarify?
@statquest
@statquest 3 years ago
The data are assigned to one of the two distributions. That's the whole idea of the "fancy" model - that some observations come from the distribution on the left and others come from the distribution on the right. So, given those assignments (which you make as part of creating the "fancy" model), you can calculate likelihoods.
@stefanopalliggiano6596
@stefanopalliggiano6596 3 years ago
@@statquest I see now, thank you for the reply!
@patrickbormann8103
@patrickbormann8103 4 years ago
Is the Saturated Model always a model that's overfitted in machine learning lingo (16:30 in the vid)? If so, does it make sense to say we are looking for a proposed model that is close to the saturated model? OR is the term "close" the keyword, as we don't want an overfit, but just a very good model? Or are Overfit and Saturated Models something completely different, and not related to each other?
@statquest
@statquest 4 years ago
They are related concepts, but used in very different ways. When we use a model in a machine learning context, we are worried about overfitting. In contrast, when we are using a model in a statistical analysis context, where we are trying to determine if some variable (or set of variables) is related to another, the goal is to get it to fit the data as closely as possible, because that means our model "explains the data" and the variables are related.
@patrickbormann8103
@patrickbormann8103 4 years ago
@@statquest Ok, thanks ! Got it :-) Quest on!
@eurekam6117
@eurekam6117 3 years ago
wow this opening song is great!
@statquest
@statquest 3 years ago
:)
@bfod
@bfod A year ago
Hi Josh, amazing videos. You are a godsend for visual learners and showing the concepts behind the math. I am a little confused about why we calculate the p-value for the R^2 the way we do. We have: Null Deviance ~ ChiSq(df = num params sat - null) Residual Deviance ~ ChiSq(df = num params sat - prop) and we use these distributions to calculate the p-values for the null and residual deviance. We also have: LL R^2 = (LL_null - LL_prop) / (LL_null - LL_sat) but we calculate the p-value via: null deviance - residual deviance ~ ChiSq(df = num param prop - null model) Can it be shown that (LL_null - LL_prop) / (LL_null - LL_sat) = null deviance - residual deviance and therefore R^2 ~ ChiSq(df = num param prop - null model) or am I missing something?
@statquest
@statquest A year ago
Off the top of my head, I don't know how to respond to your question. Unfortunately you'll have to find someone else to verify it one way or the other.
@xruan6582
@xruan6582 4 years ago
(12:05) Why did the null deviance change from "saturated model - ..." to "proposed model - ..." after greying out?
@statquest
@statquest 4 years ago
That's just a typo.
@Han-ve8uh
@Han-ve8uh 4 years ago
This video talks a lot about using the chi-squared distribution, and I have a general question about how distributions come about, because it seems people can design infinitely many different experiments to generate infinitely many distributions. Was the chi-squared distribution invented to measure the p-value of the R2 of logistic regression? If not, is it pure coincidence that this p-value can be read off a chi-squared distribution, or did the inventors of the 2(null dev - res dev) formula see some useful properties of the chi-squared distribution and decide to use it, and invent the null dev - res dev formula? Is this formula tied to chi-squared only, or can we put the 21.34 value at 12:40 on the x-axis of another distribution that is not chi-squared? Similarly, how/why was the F distribution chosen to provide the p-value of R2 for linear regression models?
@statquest
@statquest 4 years ago
The people that created the statistics knew about the Chi-squared distribution in advance and saw that, given this type of data, they could create something that followed that distribution. In other words, for logistic regression, knowledge of the distributions came first and they tried to make use of them.
@wexwexexort
@wexwexexort 4 years ago
You deserve a big bam. BAM!!!
@statquest
@statquest 4 years ago
Thanks! :)
@palsshin
@palsshin 6 years ago
You nailed it, much appreciated
@statquest
@statquest 6 years ago
Hooray! I'm glad you like this video! :)
@karannchew2534
@karannchew2534 2 years ago
For my future revision. To work out R², we need Log Likelihood of: .Saturated Model (best possible) .Null Model (most lousy) .Proposed Model (fitting) To work out the p value, we need the Deviance. Residual Deviance = Deviance of Saturated (best possible) Model vs Proposed (fitted) Model = 2 * (LL_saturated_model - LL_proposed_model) = the value for chi stats to work out the p value Is it from Log[prob saturated model / prob proposed model ]² ? Null Deviance = Deviance of Saturated (best possible) Model vs Null (lousiest) Model = 2 * (LL_saturated_model - LL_null_model) Null Deviance - Residual Deviance = Yet Another Deviance = The Chi Sq value to work out significance between Null and Proposed model. BAM!!!
@statquest
@statquest 2 years ago
YES! BAM! :)
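The quantities in the summary above can be wired together in a short sketch, using the logistic-regression log-likelihoods quoted elsewhere in these comments (LL(null) = -4.8, LL(proposed) = -3.3; for logistic regression, LL(saturated) = 0). Note the closed-form chi-squared survival function below assumes 1 degree of freedom, which is my simplification for the example:

```python
import math

ll_saturated = 0.0   # for logistic regression, the saturated log-likelihood is 0
ll_proposed = -3.3
ll_null = -4.8

residual_deviance = 2 * (ll_saturated - ll_proposed)
null_deviance = 2 * (ll_saturated - ll_null)
lr_statistic = null_deviance - residual_deviance  # = 2 * (LL(proposed) - LL(null))

# p-value from a chi-squared distribution; with 1 degree of freedom the
# survival function has the closed form erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(lr_statistic / 2))
print(residual_deviance, null_deviance, lr_statistic, p_value)
```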
@mahadmohamed2748
@mahadmohamed2748 3 years ago
4:11 How can the likelihood of the data given the distribution be larger than 1? I thought each specific likelihood was a probability (so at most 1).
@mahadmohamed2748
@mahadmohamed2748 3 years ago
I think the above is only for logistic regression, since the y-axis for the sigmoid function is a probability. Though this is not the case for the normal distribution.
@statquest
@statquest 3 years ago
Likelihood are conceptually different from probabilities and can be much larger than 1. For details, see: kzbin.info/www/bejne/porbf4aLebh5fpY
@salvatoregiordano6816
@salvatoregiordano6816 5 years ago
Great explanation. Thank you sir!
@potatopancake5259
@potatopancake5259 2 years ago
Hey Josh, thanks for this video on saturated models and deviances! I'm wondering whether I'm missing some point in your video or whether there may be some slight confusion. On the slide at 14:22 it says that the p-value computed on the slide before for the difference between NullDeviance and ResidualDeviance (NullDev - ResDev) gives a p-value for the R-squared (R2). I can follow the argument up to the point that NullDev - ResDev has a chi-square distribution, since it is in fact a classical likelihood ratio, namely the same as 2*(LL(Prop model) - LL(Null model)). However, if we want to express the R2 in terms of deviances, R2 would be (NullDev - ResDev)/NullDev, i.e. we divide by NullDev. The distribution of this is not chi-square anymore and we would get a different p-value. Would you be able to clarify this?
@statquest
@statquest 2 years ago
I'm not sure this answers your question, but the R^2 we are using in this video is McFadden's Pseudo-R^2 ( see: kzbin.info/www/bejne/b4WTqJ-BmcqqbKs ), and that is what we are calculating the p-value for. I'm not sure the formula you are using can be applied to Logistic Regression.
@potatopancake5259
@potatopancake5259 2 years ago
@@statquest Thanks a lot for your quick reply! So we want to express Mc Fadden's R^2 in terms of deviances, and we get in the numerator: LL(null) - LL(Prop) = ResDev - NullDev. In the denominator we get: LL(Null) - LL(Sat) = - NullDev. (Ignoring the factor 2, which cancels out between numerator and denominator). So in total for McFadden's R^2 we get (NullDev - ResDev)/NullDev. In the slides (at 13:28; kzbin.info/www/bejne/b4WTqJ-BmcqqbKs), the distribution of NullDev - ResDev is developed (chi-squared) and p-value computed for this quantity. And there is the statement that this p-value is the p-value for Mc Fadden's R^2 (slide at 14:28; kzbin.info/www/bejne/b4WTqJ-BmcqqbKs). However, Mc Fadden's R^2 would rather be (NullDev - ResDev)/NullDev instead of (NullDev - ResDev), as far as I can see. So, I'm confused at what happens between slide at 14:19 (kzbin.info/www/bejne/b4WTqJ-BmcqqbKs) and slide at 14:28 (kzbin.info/www/bejne/b4WTqJ-BmcqqbKs). I'm wondering if I'm missing a piece here. Any input appreciated.
@statquest
@statquest 2 years ago
@@potatopancake5259 To be honest, I'm not sure I understand your question. We are calculating two different quantities: McFadden's R-squared and it's corresponding p-value, and it appears that you seem to want to use the same equation for both. If so, I'm not sure I understand why.
@hilfskonstruktion
@hilfskonstruktion 2 years ago
@@statquest What I'm asking is whether you're really calculating the p-value for McFadden's R-squared. The way I follow your slides, I think you are calcuating the p-value for the likelihood ratio test statistics (NullDev - ResDev in your notation) instead.
@statquest
@statquest 2 years ago
@@hilfskonstruktion It's possible that I am wrong here, but I believe what I have is consistent with McFadden's original manuscript. See eqs. 29 and 30 on page 121 of this: eml.berkeley.edu/reprints/mcfadden/zarembka.pdf
@jukazyzz
@jukazyzz 4 years ago
Really nice video, like all others! However, I'm still unsure about how we can disregard the saturated model in the logistic regression to find our R^2 when that could mean that our R^2 will be larger than 1. For example, that would happen in this example that you showed. I'm probably missing something, could you please explain?
@statquest
@statquest 4 years ago
At 15:59 I show how that for logistic regression, the log(likelihood) of the saturated model = 0. Thus, it can be ignored for logistic regression.
@jukazyzz
@jukazyzz 4 years ago
Yes, I saw that, but the R^2 of that logistic regression would be higher than 1 in your particular example from the video. What to do then?
@statquest
@statquest 4 years ago
Why do you say that the R^2 would be > 1 in this example? LL(Null) = -4.8, LL(Proposed) = -3.3, so R^2 = (-4.8 - (-3.3)) / -4.8 = 0.31.
@jukazyzz
@jukazyzz 4 years ago
@@statquest Because I thought that the example from the first part of the video where you explain saturated models and deviance only is the same as the one in the later part of the video where you explain why we can get rid of the saturated models in the logistic regression. Specifically, numbers from the first example were: LL(null) = -3.51; LL(proposed) = 1.27. If these were the numbers for the logistic regression, we would get R^2 higher than 1 without using saturated model? I didn't realise that you used another example (numbers) for logistic regression because now it makes sense after your comment.
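To make the arithmetic in Josh's reply above concrete, here is the same McFadden's pseudo R^2 calculation as a tiny sketch:

```python
# McFadden's pseudo R-squared, using the log-likelihoods quoted in this thread.
ll_null, ll_proposed, ll_saturated = -4.8, -3.3, 0.0

r_squared = (ll_null - ll_proposed) / (ll_null - ll_saturated)
print(round(r_squared, 2))  # 0.31
```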
@winglau7713
@winglau7713 5 years ago
learned a lot from your lessons, thx so much. BAM!!
@karollipinski76
@karollipinski76 5 years ago
Keep it up, Josh Star-scat-mer.
@statquest
@statquest 5 years ago
You just made me laugh out loud! And you get double likes for watching this relatively obscure video. Only the best make it this far into my catalog! :)
@karollipinski76
@karollipinski76 5 years ago
@@statquest You are probably too modest. I'm trying to remedy my own obscurity here. By the way, will you discuss models more like the scream of Tarzan or Johnny Rotten?
@statquest
@statquest 5 years ago
I will do my best! :)
@shalompatole5710
@shalompatole5710 4 years ago
@Josh Starmer. First of all, thanks a lot. Secondly, I am confused about how you got your saturated model likelihood. You multiplied 3.3 six times. I understand 6 is the no. of parameters. What is 3.3 here?
@statquest
@statquest 4 years ago
3.3 is the likelihood. To learn more about likelihoods, check out this 'Quest: kzbin.info/www/bejne/porbf4aLebh5fpY
@shalompatole5710
@shalompatole5710 4 years ago
@@statquest I have seen that video. I was just wondering how you got a value of 3.3 for it here? I understand that likelihood is value on y-axis for a given data point. Should it not be bounded by 1 in that case?
@statquest
@statquest 4 years ago
@@shalompatole5710 Likelihoods are not probabilities, so they are not bounded by 1. All that is required for a probability distribution is for the area under the curve to be equal to 1. For example, a uniform distribution that goes from 0 to 0.5 has a maximum value of 2, and this means that the area is 1 (since 0.5 * 2 = 1, where 0.5 is the width of the rectangle and 2 is the height of the rectangle). A normal distribution with standard deviation of 0.25 has a maximum value of 1.6. (In R, you can see this with: dnorm(x=0, mean=0, sd=0.25) ).
@shalompatole5710
@shalompatole5710 4 years ago
@@statquest As usual, a very crisp and clear explanation. Sigh!! How do you do it, man? Thanks so much. Dunno how I would have managed without your explanations.
@monishadamodaran677
@monishadamodaran677 5 years ago
How did you calculate log-likelihoods @16:10 and @16:33???
@statquest
@statquest 5 years ago
I used the natural log. The log base 'e' is the standard log function for statistics and machine learning.
@navneetpatwari1305
@navneetpatwari1305 2 years ago
How does the likelihood for the saturated model come to 3.3 at 4:03?
@statquest
@statquest 2 years ago
The likelihood is the y-axis coordinate on each curve over the red dot. For details on likelihoods, see: kzbin.info/www/bejne/porbf4aLebh5fpY
@anand.aditya31
@anand.aditya31 2 years ago
@@statquest Hi Josh! Just to make sure I am getting the concept right: if the std dev is smaller, the likelihood value will be higher, right? And the value of the likelihood at the mean will depend on the std dev only? At the end, thanks a ton for making these awesome videos. People like you, Andrew Ng, and Salman Khan of Khan Academy are making this world so much better. ❤️ Please clarify my doubt.
@statquest
@statquest 2 years ago
@@anand.aditya31 For a normal curve, the height of the likelihood is determined only by the standard deviation. A very narrow standard deviation, like 0.1, results in a y-axis coordinate at the mean of about 4. We can see this in R with the command: > dnorm(0, sd=0.1) [1] 3.989423
@junhaowang6513
@junhaowang6513 4 years ago
3:20 The likelihood is greater than 1? Is it possible to have a likelihood greater than 1?
@statquest
@statquest 4 years ago
Yes. Likelihoods are not probabilities and are not limited to being values between 0 and 1. See: kzbin.info/www/bejne/porbf4aLebh5fpY For example, the likelihood at 0 for a normal distribution with mean = 0 and sd=0.25 is 1.6, which is > 1.
@junhaowang6513
@junhaowang6513 4 years ago
@@statquest Thank you!
@shaadakhtar5986
@shaadakhtar5986 A year ago
Why is the LL at 4:00 3.3 for each curve?
@statquest
@statquest A year ago
The likelihood is the y-axis coordinate on the curve above each point. And each curve is the same height, so the y-axis coordinate is the same for each point. For more details, see: kzbin.info/www/bejne/porbf4aLebh5fpY and kzbin.info/www/bejne/ep-Zk2yceK6Ipq8
@shubhamshukla5093
@shubhamshukla5093 4 years ago
2:34 How do you get the likelihood of the data? Please explain.
@statquest
@statquest 4 years ago
The likelihood is the y-axis coordinate value. So we just plug the x-axis coordinate into the equation for the normal distribution and we get the likelihoods. For more details, see: kzbin.info/www/bejne/ep-Zk2yceK6Ipq8
@hajer3335
@hajer3335 6 years ago
If I have a model containing four parameters with three variables, like concentration, time and their interaction, is the model with just their interaction as a variable, containing two parameters, a nested model?
@statquest
@statquest 6 years ago
It could be. I'd have to see the formulas to be sure.
@hajer3335
@hajer3335 6 years ago
StatQuest with Josh Starmer yes sure, I saw this formula in a paper about biology: Y is the transformed FDI, which may depend on time t, concentration C and their interaction (i.e. Ct), giving a four-parameter model Y = a + b1*log(t) + b2*log(c) + b3*log(t)*log(c) (1). If we omit the influence of the interaction between C and t, we get a three-parameter model of the form Y = a + b1*log(t) + b2*log(c) (2). If the independent data do not depend on C or t separately, but on the product Ct, the parameters b1 and b2 can be merged into a single parameter b and we get the two-parameter model, which we will use here, of the form Y = a + b*log(Ct)
@statquest
@statquest 6 years ago
I'll be honest, I haven't had a lot of experience with models like this. However, I would think that in order for the models to be nested, you would need the full model (the first one) to be this: y = a + b1log(Ct) + b2log(t)log(c) and the second (the reduced model) to be: y = a + b1log(Ct).
@hajer3335
@hajer3335 6 years ago
StatQuest with Josh Starmer Anyway thank you so much for your patience.🙏🏻
@vinhnguyenthi8656
@vinhnguyenthi8656 3 years ago
Thank you for this video! I have some questions after watching. How is the curve in the proposed model created, and why do you choose this curve when a point falls below both curves, to calculate the log-likelihood?
@vinhnguyenthi8656
@vinhnguyenthi8656 3 years ago
3:57 Is 3.3 equal to the mean of the normal curve?
@statquest
@statquest 3 years ago
3.3 is the likelihood, the y-axis coordinate that corresponds to the highest point in the curve. The highest point on the curve occurs at the mean value.
@dominicj7977
@dominicj7977 5 years ago
Hi Josh. I don't understand how you got the likelihood values. Mathematically, likelihood is just a probability.
@ahjiba
@ahjiba 3 years ago
Can you explain what the log likelihood based R^2 actually represents? I know that R^2 in normal linear regression just represents the strength of correlation but what is it here?
@statquest
@statquest 3 years ago
Are you asking about this? kzbin.info/www/bejne/rqmpiqWlbbaojqM
@panpiyasil790
@panpiyasil790 6 years ago
This one is difficult. Do you plan to do other goodness-of-fit methods as well? I'm looking for a way to do count R-squared for conditional logistic regression in R.
@statquest
@statquest 6 years ago
Are you asking if I have plans to cover other approximations of R-squared for logistic regression?
@panpiyasil790
@panpiyasil790 6 years ago
Yes
@statquest
@statquest 6 years ago
I'll add that to my "to-do" list.
@panpiyasil790
@panpiyasil790 6 years ago
Thank you. I really appreciate your work.
@TheSambita20
@TheSambita20 A year ago
Is there any prerequisite video to watch before this one? Finding it a little complicated to understand.
@statquest
@statquest Жыл бұрын
Well, it helps if you've stumbled over the term "saturated model" before. If not, check out this playlist, which will give you context: kzbin.info/aero/PLblh5JKOoLUKxzEP5HA2d-Li7IJkHfXSe
@scottnelson7841
@scottnelson7841 4 жыл бұрын
I want to earn my PhD in BAM!!! Is there a thesis option?
@statquest
@statquest 4 жыл бұрын
That would be awesome! :)
@PedroRibeiro-zs5go
@PedroRibeiro-zs5go 6 жыл бұрын
Thanks! The video was great as usual :)
@statquest
@statquest 6 жыл бұрын
You're welcome! :)
@hiclh4128
@hiclh4128 3 жыл бұрын
Could someone explain why it is a chi-square distribution instead of an F distribution? I am confused because I saw there's a ratio of two chi-square distributions.
@statquest
@statquest 3 жыл бұрын
That would take a whole other video, but I'll keep the topic in mind.
@panagiotisgoulas8539
@panagiotisgoulas8539 5 жыл бұрын
How do we know that mouse weight follows a normal distribution?
@statquest
@statquest 5 жыл бұрын
We can plot a histogram of mouse weights and look at it. If it looks normal, then that's a good indication that it is. We can also draw a "q-q plot" kzbin.info/www/bejne/pZzNip15obidhck and use that to determine if it is normal.
@panagiotisgoulas8539
@panagiotisgoulas8539 5 жыл бұрын
@@statquest I mean, we assumed normality from these 6 data points only? Also, why would we care about normality of the mouse weight, which is an independent variable? As far as I recall, the assumptions in linear regression concern normality of the residuals and the dependent variable, so I assume something similar would be required for logistic regression, since you basically transform the y-axis into a log-odds one. Thanks
@statquest
@statquest 5 жыл бұрын
@@panagiotisgoulas8539 In this video, the normal distribution is just an example, that let's us illustrate the concept of how to calculate likelihoods of the null and saturated models - it's not a requirement. The concepts work with any model that we can use to calculate likelihoods. At the end of the video at 15:59, I show how we can use a squiggle to calculate likelihoods. So, don't worry too much about the normal distribution here - it's just used to illustrate the concepts.
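One way to see "the likelihood of the data given the null model" concretely: fit a single normal curve to all the points and add up the log of the curve's height at each point. A minimal sketch with made-up weights (not the video's mouse data):

```python
import math

def normal_log_pdf(x, mu, sigma):
    """Log of the normal curve's height at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

weights = [1.0, 2.0, 3.0]            # made-up mouse weights
mu = sum(weights) / len(weights)     # null model: one mean for everything
sigma = math.sqrt(sum((w - mu) ** 2 for w in weights) / len(weights))  # MLE sd

# Log-likelihood of the data under the null model = sum of log heights
ll_null = sum(normal_log_pdf(w, mu, sigma) for w in weights)
```

The same recipe works with any curve you can evaluate heights on, including the logistic squiggle shown at the end of the video.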
@adenuristiqomah984
@adenuristiqomah984 4 жыл бұрын
I know it's weird but I was waiting for the jackhammer's sound XD
@statquest
@statquest 4 жыл бұрын
:)
@ehg02
@ehg02 4 жыл бұрын
Which models necessitate including the LL(saturated model) or equiv. to calculate R^2?
@statquest
@statquest 4 жыл бұрын
Logistic Regression
@ehg02
@ehg02 4 жыл бұрын
@@statquest But I thought you said the "sigmoidal plot" of the saturated model fits all the data points, and hence the LL(saturated model) is zero. Thus, we can ignore it? Thank you.
@statquest
@statquest 4 жыл бұрын
@@ehg02 Technically, Logistic Regression needs it, but it goes away. However, there are other generalized linear models, like poisson regression, that may make more use of it.
@BeVal
@BeVal 6 жыл бұрын
Oh My! I really love this man
@statquest
@statquest 6 жыл бұрын
Hooray!!! :)
@hajer3335
@hajer3335 6 жыл бұрын
I do logistic regression in MAPLE 18. I want to learn another program to apply it in, and I need advice from you. How about R? I do not want to waste time. Please help me?
@statquest
@statquest 6 жыл бұрын
Today I will put up a video on how to do logistic regression in R.
@hajer3335
@hajer3335 6 жыл бұрын
Thank you so much, I'm waiting for your video.
@hajer3335
@hajer3335 6 жыл бұрын
R^2 = 0.45 means the proposed model does not fit the data! Is this right? (I think R-squared must be very small.)
@statquest
@statquest 6 жыл бұрын
It depends on what you are modeling. R^2=0.45 means that 45% of the variation in the data is explained by the model. In some fields, like human genetics, that would be awesome and would mean you have a super model. In other fields, like engineering, that would be very low and mean you have a bad model. So it all depends on what you are studying.
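The log-likelihood-based R^2 from the video can be sketched for logistic regression: R^2 = (LL(null) - LL(fit)) / (LL(null) - LL(saturated)), and since LL(saturated) = 0 for logistic regression this reduces to McFadden's 1 - LL(fit)/LL(null). The outcomes and fitted probabilities below are made up, just to exercise the formula:

```python
import math

def bernoulli_ll(y, p):
    """Log-likelihood of 0/1 outcomes y given predicted probabilities p."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))

def ll_based_r2(y, p_fit):
    """R^2 = (LL(null) - LL(fit)) / (LL(null) - LL(saturated)).
    For logistic regression, LL(saturated) = 0."""
    p_null = [sum(y) / len(y)] * len(y)  # null model: overall proportion for everyone
    ll_null = bernoulli_ll(y, p_null)
    return (ll_null - bernoulli_ll(y, p_fit)) / (ll_null - 0.0)

y = [0, 0, 1, 1]                 # made-up outcomes
p_fit = [0.1, 0.2, 0.8, 0.9]     # made-up fitted probabilities
r2 = ll_based_r2(y, p_fit)       # roughly 0.76 here
```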
@justinking5964
@justinking5964 4 жыл бұрын
Hi Mr. Handsome Josh. Is it possible to do an analysis for lottery Pick 3? I have a different theory about lottery Pick 3 and want to verify it.
@statquest
@statquest 4 жыл бұрын
Maybe one day - right now I have my hands full.
@ausrace
@ausrace 6 жыл бұрын
Can you explain how you calculated the likelihood of the data please? If you are multiplying the probabilities then surely they must all be less than 1?
@statquest
@statquest 6 жыл бұрын
You are correct that probabilities must all be less than one, but likelihoods are different and can be larger. The probability is the area under the curve between two points. The likelihood is the height of the curve at a specific point. For more details, check out this StatQuest: kzbin.info/www/bejne/porbf4aLebh5fpY
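That reply can be checked numerically: for a narrow normal curve, the height (likelihood) at the mean exceeds 1, while the area under the curve (a probability) still totals 1. A minimal sketch:

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x (a likelihood, NOT a probability)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 0.0, 0.1               # a narrow curve

# A height (likelihood) can exceed 1...
peak = normal_pdf(mu, mu, sigma)   # about 3.99

# ...but the total area under the curve (probability) is still 1.
step = 0.001
area = sum(normal_pdf(x * step, mu, sigma) * step
           for x in range(-1000, 1000))
```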
@ausrace
@ausrace 6 жыл бұрын
Thanks, will do.
@dominicj7977
@dominicj7977 5 жыл бұрын
I don't think it should be greater than 1. I think these values are not probabilities. But likelihood is just a probability
@gamaputradanusohibien5730
@gamaputradanusohibien5730 4 жыл бұрын
Why can a likelihood value be > 1? Can a probability value be > 1?
@statquest
@statquest 4 жыл бұрын
Likelihood and probability are not always the same thing. Here's why: kzbin.info/www/bejne/porbf4aLebh5fpY
@bartoszszafranski8051
@bartoszszafranski8051 4 жыл бұрын
This got me confused. I've always thought of R^2 as a measure of how much better our model is at predicting the dependent variable given the features, but adding a saturated model to the equation makes it less intuitive for me. Of course, when we have as many parameters as observations we will maximize R^2, just because that's how the math works (with that many parameters you can always fit the curve through every point), but overfitting is obviously malpractice. I don't get why we measure our proposed model's predictive ability by comparing it to a saturated one (which is pretty useless).
@mahdimohammadalipour3077
@mahdimohammadalipour3077 2 жыл бұрын
Why do we use the saturated model? Why do we use it as the basis for evaluating how good our model is? I can't comprehend the concept behind it! In the saturated model we consider a separate parameter for each point, and obviously that results in overfitting, doesn't it?
@statquest
@statquest 2 жыл бұрын
The Saturated model simply provides an upper bound for how well a model fits the data and we can use that upper bound to compare and contrast the model we are interested in using.
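A minimal sketch of why the saturated model is the upper bound for logistic regression: when each point effectively gets its own parameter, the predicted probabilities match the 0/1 outcomes exactly, so the log-likelihood is 0 and the deviance of a proposed model is just -2 times its log-likelihood. The toy outcomes and probabilities below are made up:

```python
import math

def bernoulli_log_likelihood(y, p):
    """Log-likelihood of 0/1 outcomes y given predicted probabilities p."""
    ll = 0.0
    for yi, pi in zip(y, p):
        ll += math.log(pi) if yi == 1 else math.log(1 - pi)
    return ll

y = [0, 1, 1]                                  # made-up outcomes

# Saturated model: predicted probabilities match y exactly,
# so every term is log(1) = 0.
ll_sat = bernoulli_log_likelihood(y, y)        # 0.0 -- the upper bound

# A proposed model with made-up fitted probabilities:
ll_prop = bernoulli_log_likelihood(y, [0.2, 0.7, 0.9])

# Residual deviance = 2 * (LL(saturated) - LL(proposed)) = -2 * LL(proposed)
deviance = 2 * (ll_sat - ll_prop)
```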
@duynguyennhu5677
@duynguyennhu5677 Жыл бұрын
Can somebody explain to me how the likelihood of the data is calculated?
@statquest
@statquest Жыл бұрын
Sure - see: kzbin.info/www/bejne/porbf4aLebh5fpY and kzbin.info/www/bejne/pmS3XpKCgtepeMU
@dr.kingschultz
@dr.kingschultz 2 жыл бұрын
Your explanation at 6:20 is very confusing.
@statquest
@statquest 2 жыл бұрын
Are you familiar with R-squared? If not, see kzbin.info/www/bejne/aHK0fKCtZpmgfq8
@DrewAlexandros
@DrewAlexandros 2 ай бұрын
@statquest
@statquest 2 ай бұрын
:)
@emonreturns7811
@emonreturns7811 5 жыл бұрын
did you really do that with your mouth??????
@jayjayf9699
@jayjayf9699 5 жыл бұрын
I have no clue what u talking about
@statquest
@statquest 5 жыл бұрын
Bummer.
@jayjayf9699
@jayjayf9699 5 жыл бұрын
@@statquest how did u calculate the null model? Or the two parameter model (fancier model)?
@statquest
@statquest 5 жыл бұрын
For details of how to fit models to data, you should watch my series of videos on Linear Models (i.e., linear regression) and my series of videos on Logistic Regression. You can find links to these videos on my website here: statquest.org/video-index/ Once you have those concepts down, this video will make a lot more sense.
@jayjayf9699
@jayjayf9699 5 жыл бұрын
@@statquest OK, I'll check it out. I've already got the concept of linear regression down, using inference on the slope estimate, sum of squares, etc.
@statquest
@statquest 5 жыл бұрын
If you really understand linear regression well, then you should know how to fit a one-parameter model to data and you should know how to fit a 2 (or more) parameter model to data. So that should answer your original question.