NOTE: In statistics, machine learning, and most programming languages, the default base for the log() function is 'e'. In other words, when I write "log()", I mean "natural log()", or "ln()". Thus, the log to the base 'e' of 2.718 is approximately 1. ALSO NOTE: This video is about Saturated Models and Deviance as applied to Logistic Regression (not ordinary linear regression). Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
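For example, a quick sketch in R confirming the default base (other bases have to be requested explicitly):
log(exp(1))        # natural log of e = 1
log(2.718281828)   # approximately 1
log10(100)         # base-10 log must be requested explicitly: 2
log(8, base = 2)   # an explicit base-2 log: 3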
@falaksingla62422 жыл бұрын
Hi Josh, Love your content. Has helped me to learn a lot & grow. You are doing an awesome work. Please continue to do so. Wanted to support you but unfortunately your Paypal link seems to be dysfunctional. Please update it.
@MilaBear4 жыл бұрын
You know you've been watching too many StatQuest videos when you say "double bam" before he does.
@statquest4 жыл бұрын
You made me laugh out loud! Funny. :)
@mili32124 жыл бұрын
dude wtf. why have I just discovered your channel? you have a gift for making things as simple as possible
@statquest4 жыл бұрын
Thanks! :)
@spartan9729 Жыл бұрын
bro you are still 2 years earlier than me. I am discovering this right now when almost everyone is a pro in machine learning. 😭😭😭
@amarnathmishra86973 жыл бұрын
I love the weird tunes you make during any sort of calculation. Besides, needless to say, your teaching is amazing!!!
@statquest3 жыл бұрын
Wow, thank you!
@mariaelisaperesoliveira44195 жыл бұрын
OMG HAHAHAHA You make statistics seem so easy and simple!!! I looooove this channel
@statquest5 жыл бұрын
Awesome! I'm impressed. Not many people watch this video because they don't want the details, but you've gone all the way. You deserve a prize! :)
@AndrewCarlson0055 жыл бұрын
10:17 BOOP BOOP BEEP BEEP BOOP BOOP BEEP BEEP BOOP BOOP BEEP BEEP HAHAHA had me dying
@statquest5 жыл бұрын
:) That's the sound of plugging in numbers. :)
@ehg024 жыл бұрын
@@statquest can you make a song about that hahaha
@taotaotan56714 жыл бұрын
Another exciting Statquest! Thank you Josh
@statquest4 жыл бұрын
Bam! :)
@muhammedhadedy45703 жыл бұрын
The best KZbin channel that has ever existed.
@statquest3 жыл бұрын
Thanks!
@mutonchops15 жыл бұрын
Why on Earth did I not have you giving this lecture in my course - I actually understood it! Thanks!
@inamahdi7959 Жыл бұрын
I have been listening to your lectures as a review on my commute and I am mesmerized. I never made the connection between the LRT and these methods. They are one and the same, just re-written differently! I also never knew where the negative sign or the factor of 2 came from.
@statquest Жыл бұрын
Glad you are enjoying the videos and making some deep connections.
@cfalguiere5 жыл бұрын
Here for the song :-) and because it was an unknown piece of jargon to me. Jargon clarified... BAM!
@statquest5 жыл бұрын
Awesome!!!! Thank you very much. :)
@hgupta323233 жыл бұрын
When you are calculating the likelihood of each of the null (2:39), proposed (3:18), and saturated (3:57) models, where do those raw values you are multiplying together come from? To me, it looks a bit like you have drawn several probability density functions (PDFs) and are looking at the value of each PDF at each of the points. However, I know my interpretation must be wrong because the values are often greater than 1, so I am confused about what those values are.
@statquest3 жыл бұрын
Your interpretation is actually correct. I drew PDFs and use the y-axis value of the curve above each point on the x-axis. Those y-axis values are likelihoods. Likelihoods are not probabilities, and thus can be larger than 1. For example, if you draw a normal curve with mean = 0 and sd = 0.1, you will see many y-axis coordinates on the curve that are > 1. For more details on the difference between likelihoods and probabilities, see: kzbin.info/www/bejne/porbf4aLebh5fpY
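If it helps, here is a minimal R sketch of that idea (the values are just the heights of the normal PDF, not probabilities):
dnorm(0,   mean = 0, sd = 0.1)   # 3.989... a likelihood well above 1
dnorm(0.1, mean = 0, sd = 0.1)   # 2.420... still above 1
dnorm(0,   mean = 0, sd = 1)     # 0.399... a wider curve gives smaller likelihoods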
@craigmauz5 жыл бұрын
@2:34 How did you calculate the LL of the data?
@taotaotan56714 жыл бұрын
It's surprising to see the t-test, linear models, and ANOVA roughly fit the framework of MLE and the likelihood ratio test. BAMMMMM
@taotaotan56714 жыл бұрын
Wait... I want to confirm whether that is correct. I tried to manually calculate the p-value from a t-test (equal variance), and to manually calculate the log likelihood ratio and use the chi-square distribution to get the p-value. But there seems to be a small difference (in the second decimal). Is that because of the biased estimation of the variance in MLE? It is really cool to resolve the same question with two very distinct frameworks (the t-test and the likelihood ratio test). Triple BAM!
@statquest4 жыл бұрын
I'm not really sure what causes the small difference, but I'm glad that it was small! That's a very cool thing to try out.
Hey Josh! Again, thanks for your awesome work as always! In this specific case, I ended up with more questions than answers, but after trying to find answers on my own I assume it's a conscious decision not to overly complicate things. Namely, you say that the "-2" in the log likelihood ratio is what makes it chi-squared distributed. The origin of this seems to be the fairly complicated derivation in Wilks (1937), so I assume there is no quick, intuitive answer for why it is chi-squared distributed (i.e., at 13:42, which standard normal variable is being squared?). Along the same lines, why are we working with the log of the likelihoods? Is it because the saturated likelihood is obtained by maximization and thus it is easier to work with the logs? If that's the case, I'm not clear on what dictates which densities are used for the likelihood computation. As you say, you used the normal distribution here for illustration purposes, but with regards to the actual computation of the likelihoods, what distributions are used by the glm function, for example? In logistic regression I would first say it's a binomial, but that's only conditional on the x_i's and would demand one parameter per observation. And last but not least, the simplest question: why are deviances and likelihoods required in GLMs, in contrast to plain sums of squares and residuals? Again, thanks a lot for your time!
@Gengar992 жыл бұрын
I loved your video, super no-brainer examples, thank you.
@statquest2 жыл бұрын
Thank you!
@mahadmohamed27483 жыл бұрын
15:26 Why does multiplying the log likelihoods by 2 give a chi-squared distribution with degrees of freedom equal to the difference in the number of parameters between the proposed model and the null model?
@statquest3 жыл бұрын
That would require a whole other StatQuest to explain.
@snehashishpaul27406 жыл бұрын
Love your Bams and oan ouannn as much as your teaching. :)
@statquest6 жыл бұрын
Thank you!!! :)
@suyashmishra88212 жыл бұрын
sir U r great. U r Ultimate. U r dangerous. lots of love from a Data Scientist 🥰🥰🥰
@statquest2 жыл бұрын
Thanks!
@dominicj79775 жыл бұрын
How did you calculate the likelihood of univariate data?
@bobo06125 жыл бұрын
I am wondering the same thing. How can the likelihood be bigger than 1?
@mktsp23 жыл бұрын
@@bobo0612 A normal density with a small standard deviation rises above 1.
@Maha_s19996 жыл бұрын
That's right Josh- here just for the song 😂 Seriously - your videos are super helpful!
@statquest6 жыл бұрын
Thanks so much!!! Wow, you're even watching the video on Saturated Models! I like this one, but it's not very popular.
@Maha_s19996 жыл бұрын
You are welcome! I am a graduate stat student and now studying generalised linear models (binomial, ordinal/nominal and count models) so any other videos like this one will be really useful. If I find myself watching more videos from your channel, I would definitely like to support you like I do with other channels I am an avid follower of. I saw that you prefer your subscribers buy your music - do you have a Patreon account for regular donations?
@aojing10 ай бұрын
@9:37 Hi Josh, can you explain why the Residual Deviance is defined in this form and what the connection is to the chi-square test, or provide a reference link? Thanks.
@statquest10 ай бұрын
Unfortunately I can't explain it, other than the definition allows the values to be modeled by a chi-square distribution.
@jic38975 жыл бұрын
17:06 Can someone explain why the likelihoods will be between 0 and 1? I thought likelihoods are greater than 0 but can be arbitrarily large.
@statquest5 жыл бұрын
The value of the likelihoods depends on the distribution we are using. For example, with a normal distribution, the likelihoods can be much larger than 1. However, for the sigmoidal function used with Logistic Regression, which has a range from 0 to 1, the maximum value is 1. In other words, the likelihood is the vertical distance from a point on the x-axis to a point on the sigmoidal curve. Since that curve only goes up to 1, the maximum value is 1.
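As a minimal R sketch of that point (the fitted probabilities below are made up for illustration): since every likelihood is between 0 and 1, every log(likelihood) is <= 0, and a saturated fit that predicts each point perfectly has log(likelihood) = 0:
p_fit <- c(0.9, 0.8, 0.3)   # hypothetical fitted probabilities for three observations
y     <- c(1,   1,   0)     # observed 0/1 outcomes
sum(y * log(p_fit) + (1 - y) * log(1 - p_fit))   # log-likelihood of the fit, always <= 0
sum(log(c(1, 1, 1)))   # saturated model: each point's likelihood is 1 when the prediction is perfect, so the log-likelihood is 0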
@Han-ve8uh4 жыл бұрын
How is the saturated model at 16:19 created? I thought the lines could only go upwards from left to right, but there is a line going downwards from the 1st blue dot to the 3rd red dot. Also, how are there 3 line segments? Are they created from 3 logistic regression models, with the first 2 red dots + the 1st blue dot creating the 1st model, the 1st blue dot + the 3rd red dot creating the 2nd model, and the 3rd red dot + the last 3 blue dots creating the 3rd model?
@statquest4 жыл бұрын
The saturated model is just a model that fits the data perfectly. It does not have to follow rules like "must always increase" etc., because if it did, it would not be able to fit the data perfectly.
@xruan65824 жыл бұрын
(2:50) Can anyone explain why the probabilities 1.5 and 2.5 are greater than 1?
@statquest4 жыл бұрын
Although we commonly treat the words "likelihood" and "probability" as if they mean the same thing, in statistics they are different. What you're seeing at 2:50 are likelihoods, not probabilities. I explain the difference in this video: kzbin.info/www/bejne/porbf4aLebh5fpY
@kimicheng56114 жыл бұрын
@@statquest Could you please do an example where the likelihood is greater than 1? I still feel confused about how you get 1.5 and 2.5 after watching the Probability vs Likelihood video.
@statquest4 жыл бұрын
@@kimicheng5611 Plot a normal distribution with mean = 0 and standard deviation = 0.1. The maximum y-axis value is 4.
@franzvonmoor61454 жыл бұрын
@@statquest Hey Josh, hey everyone, Your videos are totes useful, thanks so much! I want to agree with other viewers here that the likelihoods above 1 in this video do cause quite a bit of confusion, since in most other videos you explain likelihoods with values on the scale of [0, 1]. Hence, I first understood it to mean that likelihoods are always between zero and one and, respectively, log(likelihoods) are always below or equal to zero. Looking it up on StackExchange I found a helpful answer (stats.stackexchange.com/questions/4220/can-a-probability-distribution-value-exceeding-1-be-ok). I believe I have not totally grasped it yet, and hence can only humbly recommend bridging this gap by updating the video where you explain likelihoods vs probabilities and explaining why and when likelihoods can be above 1. Best regards from Germany!
@jakobmeryn613 жыл бұрын
@@franzvonmoor6145 I still struggle with this concept. Would be great to have a new video.
@TheFacial832 жыл бұрын
Do you have a video about multinomial logistic regression? Your videos are the best in stats, thanks a lot.
@statquest2 жыл бұрын
Unfortunately I don't
@alanzhu75384 жыл бұрын
2:37 How do you come up with the numbers for the likelihood of the data?
@statquest4 жыл бұрын
The likelihoods are the y-axis coordinates on the normal curve. For more details, see: kzbin.info/www/bejne/porbf4aLebh5fpY
@alanzhu75384 жыл бұрын
StatQuest with Josh Starmer That is clear, thank you!
@kartikeyasharma40172 жыл бұрын
At 14:31, we have the p-value = 0.002, which means that the proposed and the null models are statistically different. But can't we still infer that the result is by chance (connecting it with the root definition of the p-value)?
@statquest2 жыл бұрын
I probably should have been more careful with my wording at that point. I assumed that because the p-value was less than 0.05, we had rejected the null hypothesis that the data are due to random chance. However, you are correct: the p-value of 0.002 tells us that the probability of random chance giving us the observed data, or something rarer, is 0.002.
@stefanopalliggiano65963 жыл бұрын
Question: when you calculate the likelihood based on two means at 03:20, how do you know which distribution to use? The arrows intersect both distributions, so should we multiply two likelihoods for each data point? Could you please clarify?
@statquest3 жыл бұрын
The data are assigned to one of the two distributions. That's the whole idea of the "fancy" model - that some observations come from the distribution on the left and others come from the distribution on the right. So, given those assignments (which you make as part of creating the "fancy" model), you can calculate likelihoods.
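As a minimal R sketch of that idea (the data and the standard deviation below are made up; in practice you would use the fitted values): each point's likelihood comes from the curve its group is assigned to, and the model's log-likelihood is the sum of the individual log-likelihoods:
left  <- c(1.8, 2.1, 2.4)   # hypothetical points assigned to the left curve
right <- c(3.9, 4.2, 4.5)   # hypothetical points assigned to the right curve
sum(dnorm(left,  mean = mean(left),  sd = 0.3, log = TRUE)) +
  sum(dnorm(right, mean = mean(right), sd = 0.3, log = TRUE))   # log-likelihood of the two-mean ("fancy") model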
@stefanopalliggiano65963 жыл бұрын
@@statquest I see now, thank you for the reply!
@patrickbormann81034 жыл бұрын
Is the Saturated Model always a model that's overfitted, in machine learning lingo (16:30 in the vid)? If so, does it make sense to say we are looking for a proposed model that is close to the saturated model? OR is the term "close" the keyword, as we don't want an overfit model, just a very good one? Or are overfit and saturated models something completely different and not related to each other?
@statquest4 жыл бұрын
They are related concepts, but used in very different ways. When we use a model in a machine learning context, we are worried about overfitting. In contrast, when we are using a model in a statistical analysis context, where we are trying to determine if some variable (or set of variables) is related to another, the goal is to get it to fit the data as closely as possible, because that means our model "explains the data" and the variables are related.
@patrickbormann81034 жыл бұрын
@@statquest Ok, thanks ! Got it :-) Quest on!
@eurekam61173 жыл бұрын
wow this opening song is great!
@statquest3 жыл бұрын
:)
@bfod Жыл бұрын
Hi Josh, amazing videos. You are a godsend for visual learners and for showing the concepts behind the math. I am a little confused about why we calculate the p-value for the R^2 the way we do. We have:
Null Deviance ~ ChiSq(df = num params saturated - null)
Residual Deviance ~ ChiSq(df = num params saturated - proposed)
and we use these distributions to calculate the p-values for the null and residual deviance. We also have:
LL R^2 = (LL_null - LL_prop) / (LL_null - LL_sat)
but we calculate the p-value via:
Null Deviance - Residual Deviance ~ ChiSq(df = num params proposed - null)
Can it be shown that (LL_null - LL_prop) / (LL_null - LL_sat) = Null Deviance - Residual Deviance, and therefore R^2 ~ ChiSq(df = num params proposed - null), or am I missing something?
@statquest Жыл бұрын
Off the top of my head, I don't know how to respond to your question. Unfortunately you'll have to find someone else to verify it one way or the other.
@xruan65824 жыл бұрын
(12:05) Why did the null deviance change from "saturated model - ..." to "proposed model - ..." after greying out?
@statquest4 жыл бұрын
That's just a typo.
@Han-ve8uh4 жыл бұрын
This video talks a lot about using the chi-squared distribution, and I have a general question about how distributions come about, because it seems people can design infinitely many different experiments to generate infinitely many distributions. Was the chi-squared distribution invented to measure the p-value of the R2 of logistic regression? If not, is it pure coincidence that this p-value can be read off the chi-squared distribution, or did the inventors of the 2*(null dev - res dev) formula see some useful properties of the chi-squared distribution, decide to use it, and invent the null dev - res dev formula around it? Is this formula tied to the chi-squared only, or can we put the 21.34 value at 12:40 on the x-axis of another distribution that is not chi-squared too? Similarly, how/why was the F distribution chosen to provide the p-value of R2 for linear regression models?
@statquest4 жыл бұрын
The people that created the statistics knew about the Chi-squared distribution in advance and saw that, given this type of data, they could create something that followed that distribution. In other words, for logistic regression, knowledge of the distributions came first and they tried to make use of them.
@wexwexexort4 жыл бұрын
You deserve a big bam. BAM!!!
@statquest4 жыл бұрын
Thanks! :)
@palsshin6 жыл бұрын
You nailed it, much appreciated
@statquest6 жыл бұрын
Hooray! I'm glad you like this video! :)
@karannchew25342 жыл бұрын
For my future revision. To work out R², we need the log likelihood of:
.the Saturated Model (best possible)
.the Null Model (lousiest)
.the Proposed Model (the fitted model)
To work out the p-value, we need the deviances.
Residual Deviance = deviance of the Saturated (best possible) Model vs the Proposed (fitted) Model = 2 * (LL_saturated_model - LL_proposed_model) = the chi-squared statistic used to work out the p-value. Is it from log[prob. of saturated model / prob. of proposed model]²?
Null Deviance = deviance of the Saturated (best possible) Model vs the Null (lousiest) Model = 2 * (LL_saturated_model - LL_null_model)
Null Deviance - Residual Deviance = yet another deviance = the chi-squared value used to work out the significance between the Null and Proposed models. BAM!!!
@statquest2 жыл бұрын
YES! BAM! :)
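To make the arithmetic concrete, here is a minimal R sketch using the logistic-regression log-likelihoods quoted later in this thread (LL(Null) = -4.8, LL(Proposed) = -3.3) and assuming, for illustration, that the proposed model has just one extra parameter, so df = 1:
ll_null <- -4.8   # log-likelihood of the null model
ll_prop <- -3.3   # log-likelihood of the proposed model
ll_sat  <-  0     # saturated model: for logistic regression it fits perfectly, so LL = 0
null_dev <- 2 * (ll_sat - ll_null)                     # null deviance = 9.6
res_dev  <- 2 * (ll_sat - ll_prop)                     # residual deviance = 6.6
(ll_null - ll_prop) / (ll_null - ll_sat)               # McFadden's pseudo R-squared = 0.31
pchisq(null_dev - res_dev, df = 1, lower.tail = FALSE) # p-value for that R-squared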
@mahadmohamed27483 жыл бұрын
4:11 How can the likelihood of the data given the distribution be larger than 1? I thought each specific likelihood was a probability (and so at most 1).
@mahadmohamed27483 жыл бұрын
I think the above is only for logistic regression, since the y axis for the sigmoid function is a probability. Though this is not the case for the normal distribution.
@statquest3 жыл бұрын
Likelihoods are conceptually different from probabilities and can be much larger than 1. For details, see: kzbin.info/www/bejne/porbf4aLebh5fpY
@salvatoregiordano68165 жыл бұрын
Great explanation. Thank you sir!
@potatopancake52592 жыл бұрын
Hey Josh, thanks for this video on saturated models and deviances! I'm wondering whether I'm missing some point in your video or whether there may be some slight confusion. On the slide at 14:22 it says that the p-value computed on the slide before for the difference between NullDeviance and ResidualDeviance (NullDev - ResDev) gives a p-value for the R-squared (R2). I can follow the argument up to the point that NullDev - ResDev has a chi-square distribution, since it is in fact a classical likelihood ratio, namely the same as 2*(LL(Prop model) - LL(Null model)). However, if we want to express the R2 in terms of deviances, R2 would be (NullDev - ResDev)/NullDev, i.e. we divide by NullDev. The distribution of this is not chi-square anymore and we would get a different p-value. Would you be able to clarify this?
@statquest2 жыл бұрын
I'm not sure this answers your question, but the R^2 we are using in this video is McFadden's Pseudo-R^2 ( see: kzbin.info/www/bejne/b4WTqJ-BmcqqbKs ), and that is what we are calculating the p-value for. I'm not sure the formula you are using can be applied to Logistic Regression.
@potatopancake52592 жыл бұрын
@@statquest Thanks a lot for your quick reply! So we want to express Mc Fadden's R^2 in terms of deviances, and we get in the numerator: LL(null) - LL(Prop) = ResDev - NullDev. In the denominator we get: LL(Null) - LL(Sat) = - NullDev. (Ignoring the factor 2, which cancels out between numerator and denominator). So in total for McFadden's R^2 we get (NullDev - ResDev)/NullDev. In the slides (at 13:28; kzbin.info/www/bejne/b4WTqJ-BmcqqbKs), the distribution of NullDev - ResDev is developed (chi-squared) and p-value computed for this quantity. And there is the statement that this p-value is the p-value for Mc Fadden's R^2 (slide at 14:28; kzbin.info/www/bejne/b4WTqJ-BmcqqbKs). However, Mc Fadden's R^2 would rather be (NullDev - ResDev)/NullDev instead of (NullDev - ResDev), as far as I can see. So, I'm confused at what happens between slide at 14:19 (kzbin.info/www/bejne/b4WTqJ-BmcqqbKs) and slide at 14:28 (kzbin.info/www/bejne/b4WTqJ-BmcqqbKs). I'm wondering if I'm missing a piece here. Any input appreciated.
@statquest2 жыл бұрын
@@potatopancake5259 To be honest, I'm not sure I understand your question. We are calculating two different quantities: McFadden's R-squared and it's corresponding p-value, and it appears that you seem to want to use the same equation for both. If so, I'm not sure I understand why.
@hilfskonstruktion2 жыл бұрын
@@statquest What I'm asking is whether you're really calculating the p-value for McFadden's R-squared. The way I follow your slides, I think you are calcuating the p-value for the likelihood ratio test statistics (NullDev - ResDev in your notation) instead.
@statquest2 жыл бұрын
@@hilfskonstruktion It's possible that I am wrong here, but I believe what I have is consistent with McFadden's original manuscript. See eqs. 29 and 30 on page 121 of this: eml.berkeley.edu/reprints/mcfadden/zarembka.pdf
@jukazyzz4 жыл бұрын
Really nice video, like all others! However, I'm still unsure about how we can disregard the saturated model in the logistic regression to find our R^2 when that could mean that our R^2 will be larger than 1. For example, that would happen in this example that you showed. I'm probably missing something, could you please explain?
@statquest4 жыл бұрын
At 15:59 I show how that for logistic regression, the log(likelihood) of the saturated model = 0. Thus, it can be ignored for logistic regression.
@jukazyzz4 жыл бұрын
I saw that, yes, but the R^2 of that logistic regression would be higher than 1 in your particular example from the video. What do we do then?
@statquest4 жыл бұрын
Why do you say that the R^2 would be > 1 in this example? LL(Null) = -4.8 and LL(Proposed) = -3.3, so R^2 = (-4.8 - (-3.3)) / -4.8 = 0.31.
@jukazyzz4 жыл бұрын
@@statquest Because I thought the example from the first part of the video, where you explain saturated models and deviance, was the same as the one in the later part of the video, where you explain why we can get rid of the saturated model in logistic regression. Specifically, the numbers from the first example were LL(null) = -3.51 and LL(proposed) = 1.27. If those were the numbers for the logistic regression, we would get an R^2 higher than 1 without using the saturated model? I didn't realise that you used another example (other numbers) for logistic regression; now it makes sense after your comment.
@winglau77135 жыл бұрын
learned a lot from your lessons, thx so much. BAM!!
@karollipinski765 жыл бұрын
Keep it up, Josh Star-scat-mer.
@statquest5 жыл бұрын
You just made me laugh out loud! And you get double likes for watching this relatively obscure video. Only the best make it this far into my catalog! :)
@karollipinski765 жыл бұрын
@@statquest You are probably too modest. I'm trying to remedy my own obscurity here. By the way, will you discuss models more like the scream of Tarzan or Johnny Rotten?
@statquest5 жыл бұрын
I will do my best! :)
@shalompatole57104 жыл бұрын
@Josh Starmer. First of all, thanks a lot. Secondly, I am confused about how you got your saturated model likelihood. You multiplied 3.3 six times. I understand that 6 is the number of parameters. What is 3.3 here?
@statquest4 жыл бұрын
3.3 is the likelihood. To learn more about likelihoods, check out this 'Quest: kzbin.info/www/bejne/porbf4aLebh5fpY
@shalompatole57104 жыл бұрын
@@statquest I have seen that video. I was just wondering how you got a value of 3.3 for it here. I understand that the likelihood is the value on the y-axis for a given data point. Should it not be bounded by 1 in that case?
@statquest4 жыл бұрын
@@shalompatole5710 Likelihoods are not probabilities, so they are not bounded by 1. All that is required for a probability distribution is for the area under the curve to be equal to 1. For example, a uniform distribution that goes from 0 to 0.5 has a maximum value of 2, and this means that the area is 1 (since 0.5 * 2 = 1, where 0.5 is the width of the rectangle and 2 is the height of the rectangle). A normal distribution with standard deviation of 0.25 has a maximum value of 1.6. (In R, you can see this with: dnorm(x=0, mean=0, sd=0.25) ).
@shalompatole57104 жыл бұрын
@@statquest As usual, a very crisp and clear explanation. Sigh!! How do you do it, man? Thanks so much. Dunno how I would have managed without your explanations.
@monishadamodaran6775 жыл бұрын
How did you calculate log-likelihoods @16:10 and @16:33???
@statquest5 жыл бұрын
I used the natural log. The log base 'e' is the standard log function for statistics and machine learning.
@navneetpatwari13052 жыл бұрын
How does the likelihood for the saturated model come out to 3.3 at 4:03?
@statquest2 жыл бұрын
The likelihood is the y-axis coordinate on each curve over the red dot. For details on likelihoods, see: kzbin.info/www/bejne/porbf4aLebh5fpY
@anand.aditya312 жыл бұрын
@@statquest Hi Josh! Just to make sure I am getting the concept right: if the std dev is smaller, the likelihood value will be higher, right? And the value of the likelihood at the mean depends on the std dev only? Finally, thanks a ton for making these awesome videos. People like you, Andrew Ng, and Salman Khan of Khan Academy are making this world so much better. ❤️ Please clarify my doubt.
@statquest2 жыл бұрын
@@anand.aditya31 For a normal curve, the height of the likelihood is determined only by the standard deviation. A very narrow standard deviation, like 0.1, results in a y-axis coordinate at the mean of about 4. We can see this in R with the command: > dnorm(0, sd=0.1) [1] 3.989423
@junhaowang65134 жыл бұрын
3:20 The likelihood is greater than 1? Is it possible to have a likelihood greater than 1?
@statquest4 жыл бұрын
Yes. Likelihoods are not probabilities and are not limited to being values between 0 and 1. See: kzbin.info/www/bejne/porbf4aLebh5fpY For example, the likelihood at 0 for a normal distribution with mean = 0 and sd=0.25 is 1.6, which is > 1.
@junhaowang65134 жыл бұрын
@@statquest Thank you!
@shaadakhtar5986 Жыл бұрын
Why is the LL at 4:00 3.3 for each curve?
@statquest Жыл бұрын
The likelihood is the y-axis coordinate on the curve above each point. And each curve is the same height, so the y-axis coordinate is the same for each point. For more details, see: kzbin.info/www/bejne/porbf4aLebh5fpY and kzbin.info/www/bejne/ep-Zk2yceK6Ipq8
@shubhamshukla50934 жыл бұрын
2:34 How do you get the likelihood of the data? Please explain.
@statquest4 жыл бұрын
The likelihood is the y-axis coordinate value. So we just plug the x-axis coordinate into the equation for the normal distribution and we get the likelihoods. For more details, see: kzbin.info/www/bejne/ep-Zk2yceK6Ipq8
@hajer33356 жыл бұрын
If I have a model containing four parameters, with three variables (concentration, time, and their interaction), is a model that uses just their interaction as a variable and contains two parameters a nested model?
@statquest6 жыл бұрын
It could be. I'd have to see the formulas to be sure.
@hajer33356 жыл бұрын
StatQuest with Josh Starmer Yes, sure. I saw this formula in a paper about biology: Y is the transformed FDI, which may depend on time t, concentration C, and their interaction (i.e. Ct), giving a four-parameter model:
Y = a + b1*log(t) + b2*log(C) + b3*log(t)*log(C)   (1)
If we omit the influence of the interaction between C and t, we get a three-parameter model of the form:
Y = a + b1*log(t) + b2*log(C)   (2)
If the data do not depend on C or t separately, but on the product Ct, the parameters b1 and b2 can be merged into a single parameter b and we get the two-parameter model, which we will use here, of the form:
Y = a + b*log(Ct)
@statquest6 жыл бұрын
I'll be honest, I haven't had a lot of experience with models like this. However, I would think that in order for the models to be nested, you would need the full model (the first one) to be: y = a + b1*log(Ct) + b2*log(t)*log(C), and the second (the reduced model) to be: y = a + b1*log(Ct).
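If it helps, nested fits like these can be compared in R; here is a minimal sketch assuming a hypothetical data frame dat with columns Y, t, and C:
full    <- lm(Y ~ log(t * C) + log(t):log(C), data = dat)   # a + b1*log(Ct) + b2*log(t)*log(C)
reduced <- lm(Y ~ log(t * C), data = dat)                    # a + b1*log(Ct)
anova(reduced, full)                                         # F-test comparing the nested fits
lrt <- 2 * (logLik(full) - logLik(reduced))                  # likelihood ratio statistic
pchisq(as.numeric(lrt), df = 1, lower.tail = FALSE)          # chi-squared approximation, df = 1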
@hajer33356 жыл бұрын
StatQuest with Josh Starmer Anyway thank you so much for your patience.🙏🏻
@vinhnguyenthi86563 жыл бұрын
Thank you for this video! I have some questions after watching. How do you create the curves in the proposed model, and why do you choose those curves to calculate the log-likelihood?
@vinhnguyenthi86563 жыл бұрын
3:57 Is 3.3 equal to the mean of the normal curve?
@statquest3 жыл бұрын
3.3 is the likelihood, the y-axis coordinate that corresponds to the highest point in the curve. The highest point on the curve occurs at the mean value.
@dominicj79775 жыл бұрын
Hi Josh. I don't understand how you got the likelihood values. Mathematically likelihood is just a probability.
@ahjiba3 жыл бұрын
Can you explain what the log likelihood based R^2 actually represents? I know that R^2 in normal linear regression just represents the strength of correlation but what is it here?
@statquest3 жыл бұрын
Are you asking about this? kzbin.info/www/bejne/rqmpiqWlbbaojqM
@panpiyasil7906 жыл бұрын
This one is difficult! Do you plan to do other goodness-of-fit methods as well? I'm looking for a way to compute a count R-squared for conditional logistic regression in R.
@statquest6 жыл бұрын
Are you asking if I have plans to cover other approximations of R-squared for logistic regression?
@panpiyasil7906 жыл бұрын
Yes
@statquest6 жыл бұрын
I'll add that to my "to-do" list.
@panpiyasil7906 жыл бұрын
Thank you. I really appreciate your work.
@TheSambita20 Жыл бұрын
Is there any prerequisite video to watch before this one? I'm finding it a little complicated to understand.
@statquest Жыл бұрын
Well, it helps if you've stumbled over the term "saturated model" before. If not, check out this playlist, which will give you context: kzbin.info/aero/PLblh5JKOoLUKxzEP5HA2d-Li7IJkHfXSe
@scottnelson78414 жыл бұрын
I want to earn my PhD in BAM!!! Is there a thesis option?
@statquest4 жыл бұрын
That would be awesome! :)
@PedroRibeiro-zs5go6 жыл бұрын
Thanks! The video was great as usual :)
@statquest6 жыл бұрын
You're welcome! :)
@hiclh41283 жыл бұрын
Could someone explain why it is a chi-squared distribution instead of an F distribution? I am confused because I saw there's a ratio of two chi-squared distributions.
@statquest3 жыл бұрын
That would take a whole other video, but I'll keep the topic in mind.
@panagiotisgoulas85395 жыл бұрын
How do we know that mouse weight follows a normal distribution?
@statquest5 жыл бұрын
We can plot a histogram of mouse weights and look at it. If it looks normal, then that's a good indication that it is. We can also draw a "q-q plot" kzbin.info/www/bejne/pZzNip15obidhck and use that to determine if it is normal.
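A minimal R sketch of both checks, using a hypothetical vector of mouse weights:
weights <- c(1.8, 2.1, 2.4, 3.9, 4.2, 4.5)   # hypothetical mouse weights
hist(weights)      # does the histogram look roughly bell-shaped?
qqnorm(weights)    # quantile-quantile plot against a normal distribution
qqline(weights)    # points falling near this line suggest the data are approximately normal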
@panagiotisgoulas85395 жыл бұрын
@@statquest I mean, did we assume normality from these 6 data points only? Also, why would we care about normality for the mouse weight, which is an independent variable? As far as I recall, the assumptions in linear regression concern normality of the residuals and the dependent variable, so I assume something similar would be required for the logistic one. Thanks
@statquest5 жыл бұрын
@@panagiotisgoulas8539 In this video, the normal distribution is just an example, that let's us illustrate the concept of how to calculate likelihoods of the null and saturated models - it's not a requirement. The concepts work with any model that we can use to calculate likelihoods. At the end of the video at 15:59, I show how we can use a squiggle to calculate likelihoods. So, don't worry too much about the normal distribution here - it's just used to illustrate the concepts.
@adenuristiqomah9844 жыл бұрын
I know it's weird but I was waiting for the jackhammer's sound XD
@statquest4 жыл бұрын
:)
@ehg024 жыл бұрын
Which models necessitate including the LL(saturated model) or equiv. to calculate R^2?
@statquest4 жыл бұрын
Logistic Regression
@ehg024 жыл бұрын
@@statquest But I thought you said the "sigmoidal plot" of the saturated model fits all the data points and hence the LL(sat. model) is zero. Thus, we can ignore it? Thank you.
@statquest4 жыл бұрын
@@ehg02 Technically, Logistic Regression needs it, but it goes away. However, there are other generalized linear models, like Poisson regression, that may make more use of it.
@BeVal6 жыл бұрын
Oh My! I really love this man
@statquest6 жыл бұрын
Hooray!!! :)
@hajer33356 жыл бұрын
I do logistic regression in MAPLE 18. I want to learn another program to apply it in, and I need some advice from you. How about R? I do not want to waste time. Please help me!
@statquest6 жыл бұрын
Today I will put up a video on how to do logistic regression in R.
@hajer33356 жыл бұрын
Thank you so much, I'm waiting for your video.
@hajer33356 жыл бұрын
R^2 = 0.45 means the proposed model does not fit the data! Is this right? (I think the R-squared must be very small.)
@statquest6 жыл бұрын
It depends on what you are modeling. R^2=0.45 means that 45% of the variation in the data is explained by the model. In some fields, like human genetics, that would be awesome and would mean you have a super model. In other fields, like engineering, that would be very low and mean you have a bad model. So it all depends on what you are studying.
@justinking59644 жыл бұрын
Hi Mr. Handsome Josh. Is it possible to do an analysis for the Pick 3 lottery? I have a different theory about Pick 3 and want to verify it.
@statquest4 жыл бұрын
Maybe one day - right now I have my hands full.
@ausrace6 жыл бұрын
Can you explain how you calculated the likelihood of the data please? If you are multiplying the probabilities then surely they must all be less than 1?
@statquest6 жыл бұрын
You are correct that probabilities should all be less than one, but likelihoods are different and can be larger. The probability is the area under the curve between two points. The likelihood is the height of the curve at a specific point. For more details, check out this StatQuest: kzbin.info/www/bejne/porbf4aLebh5fpY
@ausrace6 жыл бұрын
thanks will do.
@dominicj79775 жыл бұрын
I don't think it should be greater than 1. I think these values are not probabilities. But likelihood is just a probability
@gamaputradanusohibien57304 жыл бұрын
Why can a likelihood value be > 1? Can a probability value be > 1?
@statquest4 жыл бұрын
Likelihood and probability are not always the same thing. Here's why: kzbin.info/www/bejne/porbf4aLebh5fpY
@bartoszszafranski80514 жыл бұрын
This got me confused. I've always thought of R^2 as a measure of how much better our model is at predicting the value of the dependent variable given the values of the features, but adding a saturated model to the equation makes it less intuitive for me. Of course, when we have the same number of observations as dependent variables, we will maximize R^2, just because that's how the math works (since they're in the same dimension, you'll always be able to fit a geometric figure to all of them), but overfitting is obviously malpractice. I don't get why we measure our proposed model's predictive ability by comparing it to a saturated one (which is pretty useless on its own).
@mahdimohammadalipour30772 жыл бұрын
Why do we use the saturated model? Why do we use it as the basis for evaluating how good our model is? I can't comprehend the concept behind it! In the saturated model we consider a separate model for each point, and obviously that results in overfitting, doesn't it?
@statquest2 жыл бұрын
The Saturated model simply provides an upper bound for how well a model fits the data and we can use that upper bound to compare and contrast the model we are interested in using.
@duynguyennhu5677 Жыл бұрын
Can somebody explain to me how the likelihood of the data is calculated?
@statquest Жыл бұрын
Sure - see: kzbin.info/www/bejne/porbf4aLebh5fpY and kzbin.info/www/bejne/pmS3XpKCgtepeMU
@dr.kingschultz2 жыл бұрын
Your explanation at 6:20 is very confusing.
@statquest2 жыл бұрын
Are you familiar with R-squared? If not, see kzbin.info/www/bejne/aHK0fKCtZpmgfq8
@DrewAlexandros2 ай бұрын
@statquest2 ай бұрын
:)
@emonreturns78115 жыл бұрын
did you really do that with your mouth??????
@jayjayf96995 жыл бұрын
I have no clue what u talking about
@statquest5 жыл бұрын
Bummer.
@jayjayf96995 жыл бұрын
@@statquest how did u calculate the null model? Or the two parameter model (fancier model)?
@statquest5 жыл бұрын
For details of how to fit models to data, you should either watch my series of videos on Linear Models (i.e. linear regression) and my series of videos on Logistic Regression. You can find links to these videos on my website here: statquest.org/video-index/ Once you have those concepts down, this video will make a lot more sense.
@jayjayf96995 жыл бұрын
@@statquest OK, I'll check it out. I've already got the concepts of linear regression down, including inference on the slope estimate, sums of squares, etc.
@statquest5 жыл бұрын
If you really understand linear regression well, then you should know how to fit a one-parameter model to data, and you should know how to fit a two (or more) parameter model to data. That should answer your original question.