Seriously, I am doing a business analytics course at a top-ranked university, and I am relying so heavily on your various videos. Why can't my profs speak in human language when you manage to do it so effortlessly!
@PunmasterSTP 2 years ago
How are your studies going?
@requirementsrequired4384 2 years ago
Dude! Me toooooo! Where are you studying? My professor doesn't seem to care. He just throws a book at us. I bombed 2 tests and I need 100% on my next 3 to get an 88%.
@komakush3358 6 years ago
No joke, your tutorials are much better than the course I'm taking rn.
@MoinulHossain-rw2ry 6 months ago
Your presentations are far better than what many professors provide in the classroom. I am majoring in accounting and currently taking a course on econometrics, and these videos are nailing it every time for me. Thanks a ton for sharing knowledge in such a comprehensive yet simple way.
@worawatsr9803 4 years ago
The coefficient on the natural log of Odometer has a negative sign, so a 1% increase in the odometer reading should decrease the price by about $10.
@nitincsawant 4 years ago
This series on linear regression deserves a standing ovation. I am grateful for such a wonderful explanation. I have a question though. I am still trying to understand why we need to include the Age term when we know the relationship between Age and the price of the car is not linear and we are already including Age-squared in the equation. Any reference on this would be much appreciated.
@CollegeVideoFreak 4 years ago
The most general equation for a straight line is y = ax + b. You would be making an implicit assumption about the intercept if you used y = ax instead. Similarly, the general equation for a parabola is y= ax^2 + bx + c. Should the second term be unnecessary, its coefficient will come out as zero. As such, there is no loss in adding it.
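The argument above can be checked numerically. Below is a minimal Python sketch using made-up data (the numbers and the helper names `solve` and `fit_quadratic` are illustrative assumptions, not anything from the video). It fits y = c + b*x + a*x^2 by least squares twice: once to data from a parabola with no linear term, where b comes out at essentially zero, and once to data from a shifted parabola, where the linear term is needed.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_quadratic(xs, ys):
    """Least-squares fit of y = c + b*x + a*x^2 via the normal equations; returns (c, b, a)."""
    X = [[1.0, x, x * x] for x in xs]
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
    Xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(3)]
    return tuple(solve(XtX, Xty))


xs = [float(x) for x in range(11)]  # hypothetical x values 0..10

# Case 1: the true curve has no linear term, y = 2 + 3x^2.
c1, b1, a1 = fit_quadratic(xs, [2 + 3 * x**2 for x in xs])

# Case 2: the true curve is a shifted parabola, y = (x - 4)^2 = 16 - 8x + x^2.
c2, b2, a2 = fit_quadratic(xs, [(x - 4) ** 2 for x in xs])

print(f"case 1: c={c1:.3f}, b={b1:.3f}, a={a1:.3f}")
print(f"case 2: c={c2:.3f}, b={b2:.3f}, a={a2:.3f}")
```

Dropping the linear term would force the vertex of the fitted parabola to sit at x = 0, which is the same kind of implicit assumption as forcing a straight line through the origin.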
@amits310874 4 years ago
Your lectures are very interesting and simple; they explain a complex subject in much easier language. Thank you very much.
@PunmasterSTP 2 years ago
Logarithms? More like "Lo and behold, this is the best of 'em!" Your videos are so helpful; thanks again for making and sharing them.
@panagiotisgoulas8539 5 years ago
I liked both the splitting into 2 scatter plots to find appropriate models and the explanation of why the β1·Age term lets the model capture an (x+a)^2 function. Also a nice trick with % for logs, though it doesn't work for the inverse.
@omey1989 6 years ago
Thank you for these videos. One question: at 17:15, why do you say that a 1% increase in the Odometer reading "increases" the price by $10.79 on average, when the sign on the coefficient is negative? Shouldn't that be "decreases" instead?
@SiddharthPrabhu1983 6 years ago
I think it's just a typo. It should be "decreases".
@habibmrad8116 6 years ago
Good catch, it must be "decreases".
@jackdepaula 6 years ago
Agree
@chandraprakash2851 4 years ago
Yeah, he corrected that at 20:27.
@cococnk388 2 years ago
I came to the comment section just to see if someone had the same question. Thanks, my doubts are cleared up.
@existentialrap521 1 year ago
You said, “Alright, gang.” What set you rep? Out here reviewing. You know what it be. 🎉
@Kk-nc8xk 4 years ago
Not all heroes wear capes! Respect!
@amphibious5577 3 years ago
Best explanation of nonlinear relationships in linear models. Thank you.
@danielabraha3871 1 year ago
Following your videos really built my confidence for my exam.
@achudakhinkudachin2048 1 year ago
17:30 are you sure it is correct? There is a minus sign there. But the explanation is so calm and lucid!
@martinrussell1404 6 years ago
Excellent video, I may get this data science thing down yet! Thank you.
@mmamm16 3 years ago
Regarding R^2, I would say that even if we do not compare it between price and ln(price), the former model performs better, as it explains a larger proportion of the variation (around 60%).
@pnm3225 2 years ago
Hey Justin, awesome video! Thank you so much!! But I have a question: how would you interpret the Model 4 output with Age and Age^2, please?
@batatambor 4 years ago
Hello Justin, thanks for the great material. In the regression videos you explain the 6 assumptions underlying regression models: linearity, constant error variance, independent error terms, normal errors, no multicollinearity, and exogeneity. However, in this video you also state that we want to deal with variables following normal distributions. Why? Isn't it possible to satisfy the 6 assumptions above without a normally distributed Y variable? Why is it important that "we've corrected some of those heavily skewed variables"?
@samtaylor2169 3 years ago
Great explanations on all videos. Btw your voice sounds like that narrator for Lemony Snicket's Series of Unfortunate Events.
@uguumurkhosbayar1683 2 years ago
Thank you so much for preparing this useful content. I was quite confused about the logarithm coefficient part, but I figured it out eventually.
@sugbksugbk6556 1 year ago
Justin, can you please share the Excel file used in this series? You said in the video that you'd put a link to the Excel file in the description, but there is none.
@dslitmanovich 2 years ago
You are great, Justin!
@antoniaangelopoulou4269 4 years ago
Bravo!!! Really good teaching skills. Please interpret b2 in model 3, which refers to Age^2.
@janjohansen4132 4 years ago
I will be happy to help you out. The easiest way to interpret Age squared is through an example. To keep it simple, we will remove the variable "Odometer" from the linear model, so our model looks like this:
Price = 11863.1 - 365.6*(Age) + 6.63*(Age)^2
That is, we want to estimate the price of the car only by observing its age. Say the car is 10 years old; we insert this into our model:
Price = 11863.1 - 365.6*(10) + 6.63*(10)^2 = 11863.1 - 3656 + 663 = 8870.1
Back to your question: we saw from the scatterplot that b1's strictly linear assumption was slightly incorrect, as the relationship was more like a parabola. Thus a discount in the price of 3656 is also incorrect. b2 accounts for this by adding back 663, making the age discount about 3000 instead. (Rationale: cars depreciate most in the early years, and old cars can even appreciate.)
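The arithmetic above is easy to reproduce. A small Python sketch, using the coefficients quoted in this comment (with Odometer dropped, as the comment assumes); the function name `estimated_price` is made up for illustration:

```python
def estimated_price(age, b0=11863.1, b1=-365.6, b2=6.63):
    """Price = b0 + b1*Age + b2*Age^2, using the coefficients quoted in the comment."""
    return b0 + b1 * age + b2 * age**2


age = 10
linear_part = -365.6 * age        # what the linear term alone takes off the price
quadratic_part = 6.63 * age**2    # what the squared term adds back
net_age_effect = linear_part + quadratic_part

print(f"linear term:    {linear_part:.1f}")
print(f"squared term:   {quadratic_part:.1f}")
print(f"net age effect: {net_age_effect:.1f}")
print(f"estimated price at age {age}: {estimated_price(age):.1f}")
```

Trying other ages in `estimated_price` shows how the squared term claws back more of the linear discount as the car gets older.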
@nimashahidinia4503 4 years ago
@@janjohansen4132 Just what I was looking for.. thanks!
@callppatel1 4 years ago
At 17:21, why do you say the price increases when the odometer reading increases by 1%, holding age constant? It should be a decrease, right?
@JoaoVitorBRgomes 4 years ago
I found it strange too. I think it should be a decrease too, because of the minus sign.
@Sergio-td7mn 3 years ago
yes, I would say so too
@ramilaliyev518 3 years ago
Yes, it should decrease
@harnett1011 3 years ago
At 17:34 the text box states that an increase in the odometer reading increases the price. In the equation the Ln(odometer) term is negative. Shouldn't the price be negatively affected by the odometer?
@liamhoward2208 3 years ago
R^2 = percentage of variation in your y-variable that can be explained by the variation in your x-variables. Therefore, 17% of the variation in price can be explained by the age of the car and its odometer (mileage) reading. On its surface this model doesn’t seem to predict too well. I would think that age and miles would explain more of the variation in price.
@abelrodger 4 years ago
At 16:50 - A 1% increase in the odometer reading *decreases* the price by $10.79
@AI-ew1rj 7 years ago
Please do multivariable regression too!!
@r5t6ymax 7 years ago
This entire video is an example of multivariable regression; the two variables are age and odometer.
@chithrasrinivasan9904 6 years ago
Sir, a wonderful explanation of the nonlinearity aspect. Can you please show us how to create the model that includes the squaring and inverse functions in R?
@salmanhussain6863 5 years ago
Can you give me some sources from where I can learn R?
@biancacatherine9980 3 years ago
I am really enjoying your videos, thank you :) Just a question: at the 11:11 time stamp, you have beta 2 for both Age_i squared and the inverse of odometer. Is this on purpose? And if so, why are they both beta 2? Thank you in advance for a response.
@capper3360 1 year ago
I think that is a mistake, it should be beta 3.
@myunghee7231 4 years ago
Hello, I clicked your URL to download the data but I could not find it. Where can I download it? I want to follow the process while I am watching your video. Thank you!
@AJ-et3vf 2 years ago
Awesome video! Thank you!
@resap.9128 6 years ago
This is amazing!
@sinkingtitanic2965 1 year ago
Nice and simple, thanks.
@hr-xd8bf 3 years ago
Hey zed, thanks for the comprehensive vid! One question about the log transformation for multiple regression models: my model includes two independent variables, of which only one (intensity) is not normally distributed. Can I just log this IV (intensity) and leave the other IV as it is, or do I have to transform both of them?
@evannotley3838 2 years ago
@zedstatistics -> how does a 1 unit increase in the ln(odometer) represent a 100% increase in odometer? Wouldn’t that be an increase by e/2% since base 2 represents a doubling or 100% change per unit increase? So it should be log2(odometer) for a 1 unit increase to equal 100% increase?
@PoleReseal 1 year ago
What I thought. Seeing a 1% increase in odometer, one expects to see a ln(1.01) * coeff change in the response variable...
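To put numbers on this, here is a sketch in Python comparing the exact change, coeff * ln(1 + p), with the "divide the coefficient by 100" shortcut. The coefficient of -1079 is an assumption, backed out from the $10.79-per-1% figure quoted in the comments rather than read off the regression output.

```python
import math

BETA_LN_ODOMETER = -1079.0  # assumed: implied by "$10.79 per 1%", not taken from the output


def exact_change(beta, pct):
    """Change in y when x rises by pct percent in the model y = b0 + beta*ln(x)."""
    return beta * math.log(1 + pct / 100)


def shortcut_change(beta, pct):
    """The usual approximation: beta/100 per 1% change in x."""
    return beta * pct / 100


for pct in (1, 10, 100):
    print(f"{pct:>3}% increase: exact {exact_change(BETA_LN_ODOMETER, pct):9.2f}, "
          f"shortcut {shortcut_change(BETA_LN_ODOMETER, pct):9.2f}")
```

The two agree closely for a 1% change because ln(1.01) is about 0.00995, but they drift apart for large changes. A one-unit increase in ln(odometer) multiplies the odometer reading by e, which is roughly a 172% increase rather than 100%.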
@existentialrap521 1 year ago
I was wondering this too. Not sure if it’s a mistake or if I’m a goof.
@lewists9475 11 months ago
I am thinking the same
@lewists9475 11 months ago
He is probably wrong
@KevinNguyenSESE 11 months ago
Thanks Zed, super helpful! I love the way you explained logs and how to interpret them as well. How might I interpret the relationship between age, age^2 and ln(price)? Does the coefficient of age/100 correspond to a 1% movement in price? How about age^2?
@riki2404 4 years ago
Thanks
@CVfidjosterra 3 years ago
Model 4 says that a 1% increase in odometer decreases price by 0.197% (basically the coefficient). But I am struggling to understand how you picked a 1% change. What happens if you pick a 10% change?
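For the log-log case, the question about a 10% change can be worked out directly. A hedged sketch in Python: it takes the -0.197 figure quoted above as the coefficient on ln(odometer) in a model for ln(price) and holds everything else constant; the function name is made up for illustration.

```python
ELASTICITY = -0.197  # coefficient on ln(odometer), as quoted in the comment


def pct_change_in_price(pct_change_in_odometer, elasticity=ELASTICITY):
    """Exact % change in price implied by ln(price) = ... + elasticity*ln(odometer)."""
    ratio = (1 + pct_change_in_odometer / 100) ** elasticity
    return (ratio - 1) * 100


for pct in (1, 10, 50):
    exact = pct_change_in_price(pct)
    rule_of_thumb = ELASTICITY * pct
    print(f"{pct:>2}% more km: exact {exact:.3f}%, rule of thumb {rule_of_thumb:.3f}%")
```

1% is a convenient choice because the "coefficient = % change" reading is nearly exact for small changes; for a 10% change the exact figure is a little smaller in magnitude than 10 x 0.197.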
@khwaab4284 2 years ago
Beautiful video, thanks.
@tacticisacting3702 2 years ago
I'm no expert, but as far as I am aware, to shift a parabola you would rather have b2 * (age - mean(age))^2, no? How would adding an extra linear feature move age on the x axis? When age is zero, both coefficients are still multiplied by zero.
@mouradmadouni8277 3 years ago
Thank you very much!
@iamtrash288 2 years ago
I suppose, in the regression descriptions, the low p-values said that the probability that the regression fit like this by pure chance was low?
@PunmasterSTP 2 years ago
I think that can be reasonable. I tend to think of the p-value as the probability that we'd expect things to fit as well as they do *if the null hypothesis were true*. So for instance, if our p-value was 0.003, then that means that if the null hypothesis were true we'd only expect to come up with a regression as or more extreme than we did about 0.3% of the time.
@ashablinski 4 years ago
It seems that you are putting a rather high priority on the normal distribution of the logged data. Isn't that supposed to be the lowest priority among the assumptions of regression modelling?
@panagiotisgoulas8539 5 years ago
@zedstatistics Justin, I have some questions; if you have the time, please reply. At some point you say: "We like to input variables which are roughly normally distributed". This is the first time I have heard this in regression regarding independent variables. I understand the normality assumptions about the residuals, and about the dependent variable for every value of the independent variables.
a) What are you trying to say exactly? You then plot the histogram for the odometer (odometer, price), pointing out the skewness, and correct it with the log, plotting (ln(odometer), price) and highlighting the normality of ln(odometer).
b) The histogram for (Age, price) seems to have problems as well. So, according to your logic, if you plot the histogram for (β1Age+β2Age^2, price) on your dataset, should it look normal as well?
c) Nowhere in your table or in the model did you mention anything regarding cars sold. Then you run a histogram of price vs cars sold and decide to log the price, while cars sold was not even part of the regression equation you made. Confusing...
@jamesrobertson9149 4 years ago
These are good questions!
@archidar1 4 years ago
If you look at the graph on the right at 8:32 you can see that skew is present in the data. Specifically, the data is right-skewed: most of the points are clustered on the left but a few are on the right with very large values. The problem with this graph of Y against X is that the outliers can heavily influence where the model draws the line of best fit.
For example, imagine a perfectly straight line of 10 points evenly distributed between 0.1 and 1.0, then 1 point at 100, with Y values of 1 to 11. That is, use the data (0.1, 1), (0.2, 2), (0.3, 3), ... (1.0, 10), (100, 11). The point at 100 will cause a huge squared error term if the linear model just matches the 10 points perfectly, so the priority of linear regression via ordinary least squares is to draw a line as close to this outlier as possible, even if it does not match the perfect line the 10 points are on.
Scenario 1: Draw a line that matches the 10 points but completely overshoots the 11th point, and get a huge squared error term from it.
Scenario 2: Draw a line that passes through the middle of the 10 points and through the 11th point, but does not match the exact line of the 10 points. The sum of squared errors won't be very large, since the 10 points will be quite close to the line anyway.
But if a log (or any other) transformation can get the data to be more evenly spread, so there aren't any points that dictate the line, then we can avoid this problem. The point is, regression is sensitive to outliers, and the log transformation is effective in drawing in outliers monotonically (i.e. we can usually reverse the transformation). He touches on this point at 14:28.
Regarding sensitivity, something else to think about is how the arithmetic mean is more sensitive to outliers than the median. This is why the median is sometimes used when countries report income levels (i.e. median income).
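The toy data in this comment can be run directly. A minimal sketch in plain Python (the helper `ols_line` is hypothetical, written here only for illustration) that fits a simple least-squares line with and without the point at x = 100:

```python
def ols_line(xs, ys):
    """Simple least-squares line y = intercept + slope*x; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Ten points on a perfect line, (0.1, 1) ... (1.0, 10), plus one outlier at (100, 11).
clean_x = [i / 10 for i in range(1, 11)]
clean_y = [float(i) for i in range(1, 11)]
all_x = clean_x + [100.0]
all_y = clean_y + [11.0]

intercept_clean, slope_clean = ols_line(clean_x, clean_y)
intercept_all, slope_all = ols_line(all_x, all_y)

print(f"without outlier: y = {intercept_clean:.3f} + {slope_clean:.3f}x")
print(f"with outlier:    y = {intercept_all:.3f} + {slope_all:.3f}x")
```

With the outlier included the fitted line is almost flat: it passes close to the 11th point and through the middle of the other ten, which is Scenario 2.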
@mohammadaasifkhaja1892 1 year ago
you are simply good
@osaabd390 5 years ago
The odometer and age are correlated; isn't that a problem for the whole regression model? Or what does age mean here? The age of the car? Doesn't that imply that some cars are used? Or are all the cars new, and age refers to the year the car was produced? But since we have an odometer reading, some of these cars must be used, so age could refer to the year of production if new and the number of years it was used otherwise. This implies correlation between the 2 IVs. Is that problematic if it is the case?
@QZainyQ 3 years ago
Does a 1% increase in the odometer reading cause a $10.79 increase or decrease in price?
@qinghuafeng1705 1 year ago
Excellent!
@tanjimrafi6211 5 years ago
Your videos are awesome
@sathanasaetiao8219 3 years ago
I still don't understand, at 17:21, why the coefficient means the change in price for a 100% change in odometer. Please help.
@atarabishi 4 years ago
This is amazing. I just have one question: how can we interpret age in this example, where we have age and age squared, and B1 is -429 while B2 = 7.318? If anyone can help me with that I would appreciate it.
@DDranks 4 years ago
My interpretation is that B1 (Age) normally decreases the price. However, there are extreme outliers, really old cars that sell for a high price, that B2 (Age^2) represent. I imagine those old cars to be something like old classics that collectors buy for big bucks.
@TheCsePower 2 years ago
Put an age greater than 58.6 into the equation and you see that it has a positive effect on the price. Put in an age lower than 58.6 and you will see that it has a negative effect. The two terms essentially try to capture that. And this is probably because, like @Pyry Kontio said, cars that are 60 years and older are considered collector cars.
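Two different ages come out of these coefficients, and it is worth separating them. A sketch in Python, using B1 = -429 and B2 = 7.318 as quoted in the question above and ignoring the other terms in the model:

```python
B1 = -429.0   # coefficient on Age, as quoted in the thread
B2 = 7.318    # coefficient on Age^2, as quoted in the thread


def age_contribution(age):
    """Combined contribution of the Age and Age^2 terms to the predicted price."""
    return B1 * age + B2 * age**2


def marginal_effect(age):
    """Change in predicted price per extra year at a given age: d/dAge = B1 + 2*B2*Age."""
    return B1 + 2 * B2 * age


turning_point = -B1 / (2 * B2)   # age at which an extra year stops lowering the price
break_even = -B1 / B2            # age at which the two terms cancel out

print(f"turning point: {turning_point:.1f} years")
print(f"break-even:    {break_even:.1f} years")
```

On these numbers, 58.6 is where the combined age contribution returns to zero; the predicted price already starts rising with each extra year from about age 29.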
@gemini22581 5 months ago
How would you interpret it if the dependent variable (price) is logged but the independent variables stay as they are? Please help with this interpretation.
@eirinla1140 3 years ago
Thank you for saving my master
@dorafragkouli2321 6 years ago
you are the best!
@prasitrattanapiseth1058 5 years ago
I have learned a lot from your kind post. Thanks, and may Lord Buddha be with you and bless you all.
@rhishikeshjoshi6806 4 years ago
Referring to timestamp 10:40. We are using age and age^2 both, but will this not run into a problem of perfect multi-collinearity? Will the regression still work? Thanks
@zedstatistics 4 years ago
Great question. Believe it or not, age^2 will not be problematically collinear with age. Remember that 'collinear' means co-LINEAR. By squaring, the relationship is no longer linear. Yes, the value of rho will be closer to 1 than 0, but generally it won't be a big problem.
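This is easy to check. A sketch in Python with hypothetical ages of 1 to 20 years (not the video's data): the Pearson correlation between age and age^2 is high but stays below 1, so the two columns are not perfectly collinear.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


ages = list(range(1, 21))              # hypothetical car ages, 1..20 years
ages_squared = [a**2 for a in ages]

rho = pearson(ages, ages_squared)
print(f"correlation between age and age^2: {rho:.3f}")
```

Perfect multicollinearity would need rho to be exactly 1 (or -1), which only happens when one column is an exact linear function of the other.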
@rhishikeshjoshi6806 4 years ago
@@zedstatistics Understood. Thanks a ton for this Justin! This helps.
@DoctorJ-NY 9 months ago
Is the interpretation of a 1% increase of natural log(x) resulting in the beta value change to y, the exact same if we are using Base 10?
@wormself 2 years ago
Your intro is killing me, dude. 💀🤣
@Skey1337 2 years ago
When we construct the age squared variable, how do we interpret that? Do we interpret the age or age squared?
@anaverageloser8394 6 years ago
Q. What will my new variable (eg. age ^2 for age since it was parabola as in above video) be if my existing variable is in the form of y = x^2 i.e. right leaning parabola. Thank you in advance for your time.
@jackyhuang6034 2 years ago
I don't really get the part "the phrase linear regression refers to the coefficients themselves". I thought whenever we have ^2 or 1/x, it becomes nonlinear.
@bobchannell3553 5 years ago
I'm having trouble understanding why the price would increase when the odometer reading increases. Is this a mistake of some sort, or am I missing something? As far as I can tell, this does seem to change to decrease in the final part of the video. Hmm.. Oh, this has been asked and answered in previous comments. OK. Thanks!
@serikshamgunov7940 6 years ago
thank you very much for this video
@ricklongley9172 4 years ago
Did the log video ever come to fruition?
@AlawiAlAlawi 5 years ago
Thanks for the video. About the ending interpretation, "a 1% increase in Odometer": is it a 1% increase from the mean of the log Odometer variable?
@sugusyang9331 4 years ago
kzbin.info/www/bejne/l4mld36BnZpne9U you can check this video for interpretation of log regression models
@jeevanpati1993 4 years ago
Hi, there are no files in the description. Can you please give a link to them, @zedstatistics?
@fengnicole8021 4 years ago
Jeevan Pat Same here. Could you upload the link here? Thanks.
@m.c.degroffdavis9885 4 years ago
@@fengnicole8021 If you head over to his page, the links are on the video playlist page (about halfway down, right side).
@sanjitstyleicon 4 years ago
Instead of using log, can we not normalize the variable by subtracting the mean and dividing by the standard deviation to get the bell-shaped curve?
@PunmasterSTP 2 years ago
I think that would be a different procedure, and in this situation, we'd want to use a log to be able to capture a sense of scale as opposed to normalizing a distribution relative to the standard normal.
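One way to see the difference: standardizing is a linear rescaling, so it leaves the shape of the distribution (including its skew) unchanged, whereas taking logs changes the shape. A sketch in Python with made-up, right-skewed odometer readings (the data and helper names are assumptions for illustration):

```python
import math


def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


def skewness(values):
    """Population skewness: third central moment divided by variance^1.5."""
    m = mean(values)
    m2 = mean([(v - m) ** 2 for v in values])
    m3 = mean([(v - m) ** 3 for v in values])
    return m3 / m2**1.5


# Hypothetical odometer readings in thousands of km, deliberately right-skewed.
odometer = [5, 8, 10, 12, 15, 20, 30, 60, 150, 400]

m = mean(odometer)
sd = math.sqrt(mean([(v - m) ** 2 for v in odometer]))
standardized = [(v - m) / sd for v in odometer]
logged = [math.log(v) for v in odometer]

print(f"skew of raw data:          {skewness(odometer):.2f}")
print(f"skew after standardizing:  {skewness(standardized):.2f}")
print(f"skew after taking logs:    {skewness(logged):.2f}")
```

The standardized values have mean 0 and standard deviation 1 but exactly the same skew as the raw readings, so subtracting the mean and dividing by the standard deviation does not by itself produce a bell-shaped curve.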
@salaartassadaq2177 4 years ago
At 17:33, shouldn't it be written that a 1% increase in the odometer reading decreases the price by $10.79? Amazing video indeed, thanks Justin. Love from Australia and Pakistan!
@letslearntogether8199 5 years ago
Please share any video on DOE
@wilsonchung 2 years ago
Thank you!
@theresayessivania6016 3 years ago
"because what the h*ll is the natural log of the odometer reading" hahaha
@manojshankar4315 4 years ago
Hi, I'm getting a bit confused. Polynomial and quadratic relationships are nonlinear, right? But in Python we still use them with the linear regression model, so now I am confused about whether polynomial and quadratic are linear or nonlinear relationships. Can you please clarify?
@zedstatistics 4 years ago
When you square a variable, you are essentially creating a new variable which is the square of another variable. It is still a linear regression because all you are doing is linearly relating Y to this new variable. So it can become Y = B0 + B1(NEW VARIABLE). Python doesn't care how that variable was created. It just runs a linear regression with it. You, however, know that it is a squared variable.
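A small illustration of this point in Python, with made-up data where y = 5 + 2x^2 exactly. The squared column is built first, and then an ordinary straight-line fit of y on that new column recovers the coefficients. The helper `ols_line` is written here for illustration and is not a library function.

```python
def ols_line(zs, ys):
    """Ordinary least-squares line y = b0 + b1*z; returns (b0, b1)."""
    n = len(zs)
    z_bar = sum(zs) / n
    y_bar = sum(ys) / n
    b1 = (sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys))
          / sum((z - z_bar) ** 2 for z in zs))
    return y_bar - b1 * z_bar, b1


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [5 + 2 * xi**2 for xi in x]        # curved in x

new_variable = [xi**2 for xi in x]     # the "new variable": x squared
b0, b1 = ols_line(new_variable, y)     # plain linear regression of y on it

print(f"y = {b0:.2f} + {b1:.2f} * (x squared)")
```

The fitting routine never sees x itself, only the column it is handed, which is why the model is still linear in its coefficients even though the curve is not a straight line in x.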
@manojshankar4315 4 years ago
@@zedstatistics Thanks a lot for your reply, and thanks for the awesome videos and your guidance 😀👌
@ajaykulkarni576 4 years ago
can you please post the log video?
@marcospark2803 7 months ago
Which one is the Basic Regression 3 video?
@karannchew2534 2 years ago
Shouldn't it be 1% increase in *"natural log"* of odometer?
@helloworld1537 2 years ago
Yes, I think so. I don’t know if it is an error in the video or in my understanding.
@lewists9475 11 months ago
you are probably right
@Larbjorn 5 years ago
Are you using R^2 or R^2 adjusted?
@jerkeraberg1363 2 years ago
Is that your Volvo? :)
@abdelkaderkaouane1944 1 year ago
Haha, I understand statistics better now than when I was in university.
@Ricky-rv6bv 5 years ago
can't find jaybob.csv dataset
@callppatel1 4 years ago
justin-zeltzer.squarespace.com/s/jaybob.csv
@rachadlakis1 2 years ago
great
@prabhakarz6521 2 years ago
I didn't get why we need to include age as age squared.
@ting-chiehhuang6937 4 years ago
I couldn't find the data set on the website... please help.
@zedstatistics 4 years ago
It's right there. www.zstatistics.com/videos#/regression see the section Regression 4.
@panju01 3 years ago
Where is jaybob.csv?
@shreyasarojkar5267 4 years ago
What's the t value used for?
@requirementsrequired4384 2 years ago
Dude, thank you so much. I was bombing my tests, but I am feeling confident now. Well, at least a bit more now! You earned a SUB from my personal and education account later today!(:
@MBC999able 5 years ago
The interpretation of the log coefficient is wrong. Please check again.
@samanthavalderrama8865 4 years ago
Hi, thank you for the video! Can you interpret age in the last model, with ln(price) = y?
@lewists9475 11 months ago
Alert, people: the explanation of the log coefficient is wrong, regardless of whether it says increase or decrease.