REGRESSION: Non-Linear relationships & Logarithms

153,611 views

Comments: 128

@haroldbradford690 3 years ago

Seriously, I am doing a business analytics course at a top-ranked university, and I am relying so heavily on your various videos. Why can't my profs speak in human language when you manage to do it so effortlessly!

@PunmasterSTP 2 years ago

How are your studies going?

@requirementsrequired4384 2 years ago

Dude! Me toooooo! Where are you studying? My professor does seem to care. He throws a book at us. I bombed 2 tests and I need 100% on my next 3 to get an 88%.

@komakush3358 6 years ago

No joke, your tutorials are much better than the course I'm taking rn.

@MoinulHossain-rw2ry 6 months ago

Your presentations are far better than what many professors provide in the classroom. I am majoring in accounting and currently taking a course on econometrics, and these videos are nailing it every time for me. Thanks a ton for sharing knowledge in such a comprehensive yet simple way.

@worawatsr9803 4 years ago

The natural log of Odometer has a negative coefficient, so a 1% increase in the odometer reading should decrease the price by about $10.

@nitincsawant 4 years ago

This series on linear regression deserves a standing ovation. I am grateful for such a wonderful explanation. I have a question though. I am still trying to understand why we need to include the Age term when we know the relationship between Age and the price of the car is not linear and we are already including Age-squared in the equation. Any reference on this would be much appreciated.

@CollegeVideoFreak 4 years ago

The most general equation for a straight line is y = ax + b. You would be making an implicit assumption about the intercept if you used y = ax instead. Similarly, the general equation for a parabola is y = ax^2 + bx + c. Should the second term be unnecessary, its coefficient will come out as zero. As such, there is no loss in adding it.
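To make the parabola point concrete, here is a sketch of completing the square. The symbols b_0, b_1, b_2 are generic coefficients assumed for illustration, not values from the video's output:

```latex
b_0 + b_1 x + b_2 x^2
  \;=\; b_2\left(x + \frac{b_1}{2 b_2}\right)^{2}
  + \left(b_0 - \frac{b_1^{2}}{4 b_2}\right),
  \qquad b_2 \neq 0
```

The turning point sits at x = -b_1/(2 b_2). Dropping the linear term forces b_1 = 0, which pins the turning point at x = 0; keeping it lets the data decide where the curve bends.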

@amits310874 4 years ago

Your lectures are very interesting and simple; they explain a complex subject in much easier language. Thank you very much.

@PunmasterSTP 2 years ago

Logarithms? More like "Lo and behold, this is the best of 'em!" Your videos are so helpful; thanks again for making and sharing them.

@panagiotisgoulas8539 5 years ago

I liked both the split into two scatter plots to find appropriate models and the explanation of why the β1·Age term kind of accounts for an (x+a)^2 function. Also a nice trick with % for logs, though it doesn't work for the inverse.

@omey1989 6 years ago

Thank you for these videos. One question: at 17:15, why do you say that a 1% increase in the odometer reading "increases" the price by $10.79 on average, when the sign on the coefficient is negative? Shouldn't that be "decreases" instead?

@SiddharthPrabhu1983 6 years ago

I think it's just a typo. It should be "decreases".

@habibmrad8116 6 years ago

good note, it must be decreases

@jackdepaula 6 years ago

Agree

@chandraprakash2851 4 years ago

Yeah, he corrected that at 20:27.

@cococnk388 2 years ago

I came to the comment section just to see if someone had the same question. Thanks, my doubts are cleared up.

@existentialrap521 1 year ago

You said, “Alright, gang.” What set you rep? Out here reviewing. You know what it be. 🎉

@Kk-nc8xk 4 years ago

Not all heroes wear capes ! Respect !

@amphibious5577 3 years ago

Best explanation of nonlinear relationships in linear models. Thank you

@danielabraha3871 1 year ago

Following your videos really built my confidence for my exam.

@achudakhinkudachin2048 1 year ago

17:30 are you sure it is correct? There is a minus sign there. But the explanation is so calm and lucid!

@martinrussell1404 6 years ago

Excellent video, I may get this data science thing down yet! Thank you.

@mmamm16 3 years ago

Regarding R2, I would say that even if we do not compare it for price vs ln(price), the former model performs better, as it explains a larger proportion of the variation (around 60%).

@pnm3225 2 years ago

Hey Justin, awesome video! Thank you so much! But I have a question: how would you interpret the Model 4 output with Age and Age^2, please?

@batatambor 4 years ago

Hello Justin, thanks for the great material. In the regression videos you explain the 6 assumptions underlying regression models: Linearity, Constant Error Variance, Independent Error Terms, Normal Errors, No Multicollinearity, Exogeneity. However, in this video you also state that we want to deal with variables following normal distributions. Why? Isn't it possible to satisfy the 6 assumptions above without a normally distributed Y variable? Why is it important that "we've corrected some of those heavily skewed variables"?

@samtaylor2169 3 years ago

Great explanations on all videos. Btw your voice sounds like that narrator for Lemony Snicket's Series of Unfortunate Events.

@uguumurkhosbayar1683 2 years ago

Thank you so much for preparing this useful content. However, I was so confused about the logarithm coefficient part. But figured it out eventually.

@sugbksugbk6556 1 year ago

Justin, can you please share the Excel file used in this series? You said in the video that you'd put a link to the Excel file in the description, but there is none.

@dslitmanovich 2 years ago

You are great, Justin!

@antoniaangelopoulou4269 4 years ago

Bravo!!! Really good teaching skills. Please interpret b2 in model 3, which refers to age^2.

@janjohansen4132 4 years ago

I will be happy to help you out. The easiest way to interpret age squared is through an example. To keep it simple, we remove the variable "Odometer" from the linear model, so the model looks like this:
Price = 11863.1 - 365.6*(Age) + 6.63*(Age)^2
i.e. we want to estimate the price of the car from its age alone. Take a car that is 10 years old and insert its age into the model:
Price = 11863.1 - 365.6*(10) + 6.63*(10)^2 = 11863.1 - 3656 + 663 = 8870.1
Back to your question: we saw from the scatterplot that b1's strictly linear assumption was slightly incorrect, as the relationship looked more like a parabola. Thus a price discount of 3656 is also incorrect. b2 accounts for this by adding back 663, making the age discount about 3000 instead. (Rationale: cars depreciate most in the early years, and old cars can even appreciate.)
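The arithmetic in that worked example can be checked with a few lines of Python. This is only a sketch: the coefficients are the ones quoted in the comment, and the function name `predicted_price` is made up for illustration.

```python
def predicted_price(age, b0=11863.1, b1=-365.6, b2=6.63):
    """Quadratic age-only model quoted in the comment above."""
    return b0 + b1 * age + b2 * age ** 2

linear_part = -365.6 * 10        # what the Age term alone takes off at age 10
quadratic_part = 6.63 * 10 ** 2  # what the Age^2 term adds back at age 10
price_at_10 = predicted_price(10)

print(round(linear_part, 1), round(quadratic_part, 1), round(price_at_10, 1))
```

The net age discount at 10 years is the sum of the two parts, which is the "about 3000" figure mentioned in the comment.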

@nimashahidinia4503 4 years ago

@janjohansen4132 Just what I was looking for. Thanks!

@callppatel1 4 years ago

At 17:21, why does the price increase for a 1% increase in the (logged) odometer reading, holding age constant? It should be a decrease, right?

@JoaoVitorBRgomes 4 years ago

I found it strange too. I think it should be a decrease too, because of the minus sign.

@Sergio-td7mn 3 years ago

yes, I would say so too

@ramilaliyev518 3 years ago

Yes, it should decrease

@harnett1011 3 years ago

At 17:34 the text box states that an increase in the odometer reading increases the price. In the equation the Ln(odometer) term is negative. Shouldn't the price be negatively affected by the odometer?

@liamhoward2208 3 years ago

R^2 = Percentage of variation in your y-variable that can be explained by the variation in your x-variable. Therefore, 17% of the variation in price can be explained by the age of the car and its odometer (mileage) reading. On its surface this model doesn’t seem to predict too well. I would think that age and miles would explain more of the variation in price

@abelrodger 4 years ago

At 16:50 - A 1% increase in the odometer reading *decreases* the price by $10.79

@AI-ew1rj 7 years ago

Please do multivariable regression too!!

@r5t6ymax 7 years ago

This entire video is an example of multivariable regression; the two variables are age and odometer.

@chithrasrinivasan9904 6 years ago

Sir, a wonderful explanation of the non-linearity aspect. Can you please show us how to create the model that includes the squaring and inverse functions in R?

@salmanhussain6863 5 years ago

Can you give me some sources from where I can learn R?

@biancacatherine9980 3 years ago

I am really enjoying your videos, thank you :) Just a question: at the 11:11 time stamp, you have beta 2 for both Age_i squared and the inverse of odometer. Is this on purpose? And if so, why are they both beta 2? Thank you in advance for a response.

@capper3360 1 year ago

I think that is a mistake, it should be beta 3.

@myunghee7231 4 years ago

Hello, I clicked your URL to download the data but I could not find it. Where can I download it? I want to follow the process while I am watching your video. Thank you!

@AJ-et3vf 2 years ago

Awesome video! Thank you!

@resap.9128 6 years ago

This is amazing!

@sinkingtitanic2965 1 year ago

Nice and simple, thanks.

@hr-xd8bf 3 years ago

Hey zed, thanks for the comprehensive vid! One question about the log transformation for multiple regression models: my model includes two independent variables, of which only one (intensity) is not normally distributed. Can I just log this iv intensity and leave the other iv as it is or do I have to transform both of them?

@evannotley3838 2 years ago

@zedstatistics -> how does a 1 unit increase in the ln(odometer) represent a 100% increase in odometer? Wouldn’t that be an increase by e/2% since base 2 represents a doubling or 100% change per unit increase? So it should be log2(odometer) for a 1 unit increase to equal 100% increase?

@PoleReseal 1 year ago

What I thought. Seeing a 1% increase in odometer, one expects to see a ln(1.01) * coeff change in the response variable...
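For anyone who wants to see the size of the gap, here is a rough sketch of the exact change versus the "divide the coefficient by 100" shortcut. The coefficient of -1079 on ln(Odometer) is an assumption, picked only because it matches the $10.79 figure quoted in other comments; the function name is made up.

```python
import math

beta = -1079.0  # ASSUMED coefficient on ln(Odometer), chosen to match the $10.79 figure

def price_change(pct_increase, coef=beta):
    """Exact change in price when Odometer rises by pct_increase percent."""
    return coef * math.log(1 + pct_increase / 100)

exact_1pct = price_change(1)      # coef * ln(1.01)
shortcut_1pct = beta / 100        # the "divide by 100" rule of thumb
exact_100pct = price_change(100)  # coef * ln(2), smaller in size than the full coefficient

print(exact_1pct, shortcut_1pct, exact_100pct)
```

For small changes the shortcut and the exact figure are close because ln(1.01) is roughly 0.01. A doubling of the odometer moves ln(Odometer) by ln(2), not by 1; a full one-unit rise in ln(Odometer) corresponds to multiplying the odometer by e.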

@existentialrap521 1 year ago

I was wondering this too. Not sure of mistake or if I’m a goof.

@lewists9475 11 months ago

I am thinking the same

@lewists9475 11 months ago

He is probably wrong

@KevinNguyenSESE 11 months ago

Thanks Zed, super helpful! I love the way you explained logs and how to interpret them as well. How might I interpret the relationship between age, age^2 and ln(price)? Does the coefficient of age/100 correspond to a 1% movement in price? How about age^2?

@riki2404 4 years ago

Thanks

@CVfidjosterra 3 years ago

Model 4 says that a 1% increase in odometer decreases price by 0.197% (basically the coefficient). But I am struggling to understand how you picked a 1% change. What happens if you pick a 10% change?
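One way to explore the 10% question is to compute the exact percentage change implied by a log-log model. This is a sketch only: it uses the -0.197 coefficient quoted in the comment and assumes everything else in the model is held fixed; the function name is made up.

```python
import math

elasticity = -0.197  # coefficient on ln(Odometer) quoted in the comment above

def pct_change_in_price(pct_increase, e=elasticity):
    """Exact percentage change in price for a given percentage rise in Odometer."""
    return ((1 + pct_increase / 100) ** e - 1) * 100

for pct in (1, 10, 100):
    print(pct, round(pct_change_in_price(pct), 3))
```

The 1% case lands very close to the coefficient itself, which is why 1% is the convenient choice. For larger changes, simply scaling the coefficient (10 times 0.197) slightly overstates the drop, and the gap grows with the size of the change.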

@khwaab4284 2 years ago

beautiful video.. thanks

@tacticisacting3702 2 years ago

I'm no expert, but as far as I am aware, to shift a parabola you would rather have b2 * (age - mean(age))^2, no? How would adding an extra linear feature move age on the x axis? When age is zero, both coefficients are still multiplied by zero.

@mouradmadouni8277 3 years ago

Thank you very much !

@iamtrash288 2 years ago

I suppose, in the regression descriptions, the low p-values said that the probability that the regression fit like this by pure chance was low?

@PunmasterSTP 2 years ago

I think that can be reasonable. I tend to think of the p-value as the probability of seeing a fit at least as good as the one we got *if the null hypothesis were true*. So for instance, if our p-value was 0.003, then if the null hypothesis were true we'd only expect to come up with a regression as extreme as, or more extreme than, ours about 0.3% of the time.

@ashablinski 4 years ago

It seems that you are putting a rather high priority on the normality of the logged data. Isn't that supposed to be the lowest priority among the assumptions of regression modelling?

@panagiotisgoulas8539 5 years ago

@zedstatistics Justin, I have some questions; if you have the time, please reply. At some point you say: "We like to input variables which are roughly normally distributed". This is the first time I've heard this in regression regarding independent variables. I understand the normality assumptions about the residuals and about the dependent variable for every value of the independent variables, but:
a) What are you trying to say exactly? You plot the histogram for the odometer (odometer, price), point out the skewness and correct it with the log, then plot (ln(odometer), price), highlighting the normality of ln(odometer).
b) The histogram for (Age, price) seems to have problems as well. So, by your logic, if you plot the histogram for (β1Age+β2Age^2, price) on your dataset, should it look normal as well?
c) Nowhere in your table or in the model did you mention anything regarding cars sold. Then you run a histogram of price vs cars sold and decide to log the price, even though cars sold was not part of the regression equation you made. Confusing...

@jamesrobertson9149 4 years ago

these are good questions!

@archidar1 4 years ago

If you look at the graph on the right at 8:32 you can see that skew is present in the data. Specifically, the data is right-skewed: most of the points are clustered on the left but a few are on the right with very large values. The problem with this graph of Y against X is that the outliers can heavily influence where the model draws the line of best fit.
For example, imagine a perfectly straight line of 10 points evenly distributed between 0.1 and 1.0, then 1 point at 100. (Let's just give them Y values of 1 to 11.) Use data (0.1, 1), (0.2, 2), (0.3, 3), ... (1.0, 10), (100, 11). The point at 100 will cause a huge squared error term if the linear model just matches the 10 points perfectly, so the priority of Linear Regression via Ordinary Least Squares is to draw a line as close as possible to this outlier, even if it does not match the perfect line the 10 points are on.
Elaboration:
Scenario 1: Draw a line that matches the 10 points but completely overshoots the 11th point, and get a huge squared error term from it.
Scenario 2: Draw a line that passes through the middle of the 10 points and through the 11th point, but does not match the exact line of the 10 points. The Sum of Squared Errors won't be very large, since the 10 points will be quite close to the line anyway.
But if a log (or any other) transformation can get the data to be more evenly spread, so there aren't any points that dictate the line, then we can avoid this problem. The point is, regression is sensitive to outliers, and the log transformation is effective in drawing in outliers monotonically (i.e. we can usually reverse the transformation). He touches on this point at 14:28.
Regarding sensitivity, something else to think about is how the arithmetic mean is more sensitive to outliers than the median. This is why the median is sometimes used when countries report income levels (i.e. median income).
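The toy dataset above is small enough to fit by hand. Here is a minimal sketch in plain Python, using the textbook least-squares formulas; the helper name `ols` is made up, and the data are the eleven points listed in the comment.

```python
import math

def ols(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

xs = [i / 10 for i in range(1, 11)] + [100.0]  # 0.1 ... 1.0, then the outlier
ys = [float(i) for i in range(1, 12)]          # 1 ... 11

_, slope_clean = ols(xs[:10], ys[:10])             # ten collinear points only
_, slope_all = ols(xs, ys)                         # outlier included
_, slope_log = ols([math.log(x) for x in xs], ys)  # outlier included, x logged

print(slope_clean, slope_all, slope_log)
```

With the outlier included, the slope collapses from 10 to roughly 0.06, so the line is almost entirely dictated by the single point at x = 100. Logging x pulls that point back towards the others, and the fitted slope is no longer driven by it to the same degree.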

@mohammadaasifkhaja1892 1 year ago

you are simply good

@osaabd390 5 years ago

The odometer and age are correlated; isn't that a problem for the whole regression model? Or what does age mean here? The age of the car? Doesn't that imply that some cars are used? Or are all the cars new, with age referring to the year the car was produced? Since we have an odometer reading, some of these cars must be used, so age could refer to the year of production if new and to the number of years in use otherwise. This implies correlation between the 2 IVs. Is that problematic if it is the case?

@QZainyQ 3 years ago

Does a 1% increase in odometer reading cause a $10.79 increase or decrease in price?

@qinghuafeng1705 1 year ago

Excellent!

@tanjimrafi6211 5 years ago

Your videos are awesome

@sathanasaetiao8219 3 years ago

I still don't understand 17:21: why does the coefficient mean the change in price for a 100% change in odometer? Please help.

@atarabishi 4 years ago

This is amazing. I just have one question: how can we interpret age in this example, where we have age and age squared, and B1 = -429 while B2 = 7.318? If anyone can help me with that I would appreciate it.

@DDranks 4 years ago

My interpretation is that B1 (Age) normally decreases the price. However, there are extreme outliers, really old cars that sell for a high price, which B2 (Age^2) represents. I imagine those old cars to be something like old classics that collectors buy for big bucks.

@TheCsePower 2 years ago

Put an age greater than 58.6 into the equation and you will see that the two age terms together have a positive effect on the price. Put in an age lower than 58.6 and the combined effect is negative. The two terms essentially try to capture that. And this is probably because, as @Pyry Kontio said, cars that are 60 years and older are considered collector cars.
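The 58.6 figure can be reproduced from the two coefficients quoted in the question, along with a second age worth knowing: where the curve stops falling. A small sketch, assuming only B1 = -429 and B2 = 7.318 and ignoring the other terms in the model; the names are made up.

```python
b1, b2 = -429.0, 7.318  # Age and Age^2 coefficients quoted in the question above

def age_effect(age):
    """Combined contribution of the two age terms to predicted price."""
    return b1 * age + b2 * age ** 2

turning_point = -b1 / (2 * b2)  # age at which the curve stops falling
break_even = -b1 / b2           # age at which the combined effect returns to zero

print(round(turning_point, 1), round(break_even, 1))
```

So under these coefficients an extra year starts adding to the predicted price from about age 29, while the combined age effect only becomes positive relative to a brand-new car after about age 58.6.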

@gemini22581 5 months ago

How would you interpret the model if the dependent variable (price) is logged but the independent variables stay as they are? Please help with this interpretation.

@eirinla1140 3 years ago

Thank you for saving my master's.

@dorafragkouli2321 6 years ago

you are the best!

@prasitrattanapiseth1058 5 years ago

I have learned a lot from your kind post. Thx, and may Lord Buddha be with you and bless you all.

@rhishikeshjoshi6806 4 years ago

Referring to timestamp 10:40. We are using age and age^2 both, but will this not run into a problem of perfect multi-collinearity? Will the regression still work? Thanks

@zedstatistics 4 years ago

Great question. Believe it or not, age^2 will not be problematically collinear with age. Remember that 'collinear' means co-LINEAR. By squaring, the relationship is not linear. Yes, the value of rho will be closer to 1 than 0, but generally it won't be a big problem.
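To get a feel for how close to 1 that rho can be, here is a quick sketch with made-up ages of 1 to 20 years (an assumption for illustration, not the video's data). It also shows the common trick of subtracting the mean before squaring.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

ages = list(range(1, 21))                # HYPOTHETICAL ages, 1..20 years
r_raw = pearson(ages, [a ** 2 for a in ages])

mean_age = sum(ages) / len(ages)
centred = [a - mean_age for a in ages]   # subtract the mean before squaring
r_centred = pearson(centred, [c ** 2 for c in centred])

print(round(r_raw, 3), round(r_centred, 3))
```

The raw correlation is high but still short of 1, so the regression can be estimated. Centring removes the correlation entirely here only because these toy ages are perfectly symmetric around their mean; with real data it would usually shrink rather than vanish.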

@rhishikeshjoshi6806 4 years ago

@zedstatistics Understood. Thanks a ton for this Justin! This helps.

@DoctorJ-NY 9 months ago

Is the interpretation of a 1% increase of natural log(x) resulting in the beta value change to y, the exact same if we are using Base 10?

@wormself 2 years ago

Your intro is killing me, dude. 💀🤣

@Skey1337 2 years ago

When we construct the age squared variable, how do we interpret that? Do we interpret the age or age squared?

@anaverageloser8394 6 years ago

Q. What will my new variable (eg. age ^2 for age since it was parabola as in above video) be if my existing variable is in the form of y = x^2 i.e. right leaning parabola. Thank you in advance for your time.

@jackyhuang6034 2 years ago

I don't really get the part "the phrase linear regression refers to the coefficients themselves". I thought whenever we have ^2 or 1/x, it becomes nonlinear.

@bobchannell3553 5 years ago

I'm having trouble understanding why the price would increase when the odometer reading increases. Is this a mistake of some sort, or am I missing something? As far as I can tell, this does seem to change to decrease in the final part of the video. Hmm.. Oh, this has been asked and answered in previous comments. OK. Thanks!

@serikshamgunov7940 6 years ago

thank you very much for this video

@ricklongley9172 4 years ago

Did the log video ever come to fruition?

@AlawiAlAlawi 5 years ago

Thanks for the video. The ending interpretation ... A 1% increase in Odometer ?? ... Is it a 1% increase from the mean of Log Odometer Variable ??

@sugusyang9331 4 years ago

kzbin.info/www/bejne/l4mld36BnZpne9U you can check this video for interpretation of log regression models

@jeevanpati1993 4 years ago

Hi, there are no files in the description. Can you please give link to that @zedstatistics

@fengnicole8021 4 years ago

Jeevan Pat Same here. Could you upload the link here? Thanks .

@m.c.degroffdavis9885 4 years ago

@fengnicole8021 If you head over to his page, the links are on the video playlist page (about halfway down, right side).

@sanjitstyleicon 4 years ago

Instead of using log, can we not normalize the variable by subtracting the mean and dividing by the standard deviation to get the bell-shaped curve?

@PunmasterSTP 2 years ago

I think that would be a different procedure, and in this situation, we'd want to use a log to be able to capture a sense of scale as opposed to normalizing a distribution relative to the standard normal.

@salaartassadaq2177 4 years ago

At 17:33, shouldn't it be written that a 1% increase in the odometer reading decreases the price by $10.79? Amazing video indeed, thanks Justin. Love from Australia and Pakistan!

@letslearntogether8199 5 years ago

Please share any video on DOE

@wilsonchung 2 years ago

Thank you!

@theresayessivania6016 3 years ago

"because what the h*ll is the natural log of the odometer reading" hahaha

@manojshankar4315 4 years ago

Hi, I'm getting a bit confused. Polynomial and quadratic relationships are non-linear, right? But in Python we still fit them with a linear regression model, so now I'm not sure whether polynomial and quadratic models count as linear or non-linear. Can you please clarify?

@zedstatistics 4 years ago

When you square a variable, you are essentially creating a new variable which is the square of another variable. It is still a linear regression because all you are doing is linearly relating Y to this new variable. So it can become Y = B0 + B1(NEW VARIABLE). Python doesn't care how that variable was created. It just runs a linear regression with it. You, however, know that it is a squared variable.
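A tiny sketch of that idea in plain Python: the data below are made up to follow y = 3 + 2x^2 exactly, and the helper `ols` is a hypothetical hand-rolled straight-line fit rather than any particular library's API.

```python
def ols(zs, ys):
    """Simple least squares of ys on a single regressor zs: (intercept, slope)."""
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    sxy = sum((z - mz) * (y - my) for z, y in zip(zs, ys))
    sxx = sum((z - mz) ** 2 for z in zs)
    slope = sxy / sxx
    return my - slope * mz, slope

xs = [0, 1, 2, 3, 4, 5]
ys = [3 + 2 * x ** 2 for x in xs]    # MADE-UP data: curved in x

new_variable = [x ** 2 for x in xs]  # the "new variable" is just x squared
b0, b1 = ols(new_variable, ys)       # an ordinary straight-line fit in the new variable

print(b0, b1)
```

The fit recovers the intercept and slope used to build the data, even though y is curved in x: the model is linear in the coefficients and in the new variable, which is all that "linear regression" requires.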

@manojshankar4315 4 years ago

@zedstatistics Thanks a lot for your reply. Thanks for the awesome videos and your guidance 😀👌

@ajaykulkarni576 4 years ago

Can you please post the log video?

@marcospark2803 7 months ago

Which one is the Basic Regression 3 video?

@karannchew2534 2 years ago

Shouldn't it be 1% increase in *"natural log"* of odometer?

@helloworld1537 2 years ago

Yes I think so, I don’t know if it is the error of the video or my understanding

@lewists9475 11 months ago

you are probably right

@Larbjorn 5 years ago

Are you using R^2 or R^2 adjusted?

@jerkeraberg1363 2 years ago

Is that your Volvo? :)

@abdelkaderkaouane1944 1 year ago

hhh, I understand statistics now better than when I was in university

@Ricky-rv6bv 5 years ago

can't find jaybob.csv dataset

@callppatel1 4 years ago

justin-zeltzer.squarespace.com/s/jaybob.csv

@rachadlakis1 2 years ago

great

@prabhakarz6521 2 years ago

I didn't get why we need to take age as age squared?

@ting-chiehhuang6937 4 years ago

I couldn't find data set from the website...please help.

@zedstatistics 4 years ago

It's right there. www.zstatistics.com/videos#/regression see the section Regression 4.

@panju01 3 years ago

where is the jaybob.csv ?

@shreyasarojkar5267 4 years ago

What's t value used for ?

@requirementsrequired4384 2 years ago

Dude, thank you so much. I was bombing my tests, but I am feeling confident now. Well, at least a bit more now! You earned a SUB from my personal and education account later today!(: