If x1 and x2 are strongly correlated, then we should check each one's individual correlation with the target and select the variable that is more highly correlated with the target; we can also check the p-values of the variables.
@UnfoldDataScience · 3 years ago
Correct. Thank you.
@rizalrifai6010 · 1 year ago
What's a p-value?
@sanjeevkmr5749 · 3 years ago
Thanks a lot for the detailed discussion on this topic. For the question asked in the video (which feature to remove in case of high correlation), I guess that of the two, we have to remove the one that contributes least to (is less correlated with) the target variable. That way, we preserve the feature with the higher contribution.
@UnfoldDataScience · 3 years ago
Thanks Sanjeev. True.
@babareddy44 · 3 years ago
How do we know which one contributes least? Help?
@arslanshahid3454 · 2 years ago
@@babareddy44 From R², the F-value, or the p-value?
@beautyisinmind2163 · 1 year ago
@@babareddy44 You can use a random forest model to see which features contribute the most.
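A minimal sketch of the random-forest idea, assuming scikit-learn's `RandomForestRegressor` and synthetic data where only `x1` (plus a little `x3`) drives `y`. One caveat: impurity-based importances share credit between correlated features, so treat them as a rough ranking rather than a definitive test.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (assumption): x2 is a noisy proxy of x1, x3 is independent.
rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=n)
x3 = rng.normal(size=n)
y = 2.0 * x1 + 0.5 * x3 + rng.normal(scale=0.5, size=n)

X = np.column_stack([x1, x2, x3])
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
importances = forest.feature_importances_  # normalized to sum to 1

for name, score in zip(["x1", "x2", "x3"], importances):
    print(f"{name}: {score:.3f}")

# Between the correlated pair x1/x2, the lower-importance one is the drop candidate.
drop = "x1" if importances[0] < importances[1] else "x2"
print("drop candidate:", drop)
```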
@koustavdutta1176 · 3 years ago
First, great explanation!! Now, coming to your question: we have to check the bivariate strength between the dependent variable and each independent variable. The independent variable with the weakest strength should be removed from the model.
@UnfoldDataScience · 3 years ago
Awesome. Thank you. :)
@jamiainaga5853 · 3 years ago
What is bivariate?
@sowsow5199 · 2 years ago
@@jamiainaga5853 Bivariate means involving two variables at a time, e.g., the two variables that have been found to be highly correlated with each other.
@kavankomer3048 · 1 year ago
How do we find this bivariate strength?
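One common reading of "bivariate strength" is the absolute Pearson correlation between each independent variable and the target. Here is a minimal pandas sketch on synthetic data. `pick_feature_to_drop` is a hypothetical helper, not a library function. Taking the absolute value means a strong negative correlation also counts as strong.

```python
import numpy as np
import pandas as pd

# Synthetic data (assumption): x2 is strongly correlated with x1, but only x1 drives y.
rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + rng.normal(scale=0.5, size=n)
y = 2.0 * x1 + rng.normal(size=n)
df = pd.DataFrame({"x1": x1, "x2": x2, "y": y})

def pick_feature_to_drop(data, a, b, target):
    """Hypothetical helper: return whichever of a, b has the weaker |Pearson r| with target."""
    corr_a = abs(data[a].corr(data[target]))
    corr_b = abs(data[b].corr(data[target]))
    return a if corr_a < corr_b else b

print(df.corr().round(2))
print("drop:", pick_feature_to_drop(df, "x1", "x2", "y"))
```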
@samruddhideshmukh5928 · 3 years ago
Simple, clear, and amazing explanation!!! I think we can remove one of the columns based on its p-value. If p > 0.05, we fail to reject the null hypothesis that the coefficient of that variable is 0, so we cannot conclude that the variable contributes significantly. Sir, please make a video on how to use Ridge and Lasso regression to handle multicollinearity.
@UnfoldDataScience · 3 years ago
Thanks Samruddhi, here are the videos you asked for: kzbin.info/www/bejne/bYnZc6qHmrlshas kzbin.info/www/bejne/aGK3mH6erpZ6j5Y
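The following is a rough illustration of how the two penalties behave on near-duplicate columns. It assumes scikit-learn's `LinearRegression`, `Ridge`, and `Lasso` with arbitrary `alpha` values and synthetic data. Plain OLS can assign large, offsetting coefficients. Ridge tends to share the weight between the duplicates. Lasso tends to concentrate the weight on fewer features, though with near-exact duplicates, which one it keeps is fairly arbitrary.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Synthetic data (assumption): x2 is a near-duplicate of x1; only x1 drives y.
rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(size=n)

models = {
    "OLS": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),  # L2 penalty: shrinks and tends to share weight
    "Lasso": Lasso(alpha=0.1),  # L1 penalty: can set coefficients to exactly 0
}
for name, model in models.items():
    model.fit(X, y)
    print(f"{name:5s} coefficients: {np.round(model.coef_, 2)}")
```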
@dariakrupnova6245 · 3 years ago
Wow, I think I owe you my mark on the Econometrics final; you blew my mind. I had no idea it was so simple. Thank you!
@umeshrawat8827 · 1 year ago
To omit either X1 or X2, we can use PCA and remove the variable with low variance.
@akashchatterjee7678 · 8 months ago
Why the one with low variance?
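On the "why low variance" question: PCA does not score the original columns; it builds new components. For two strongly correlated, standardized features, almost all the variance lies along the x1 + x2 direction. The low-variance component is roughly x1 - x2, which carries little independent information. Dropping that component is the PCA way of handling the redundancy, but it does not by itself tell you which original column to delete. Here is a minimal scikit-learn sketch on synthetic data:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): x2 is x1 plus a little noise.
rng = np.random.default_rng(4)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.2, size=500)
X = StandardScaler().fit_transform(np.column_stack([x1, x2]))

pca = PCA().fit(X)
print("explained variance ratio:", pca.explained_variance_ratio_.round(3))
# Each row is a component; both mix x1 and x2 with weights of magnitude ~0.707,
# i.e. roughly (x1 + x2) and (x1 - x2), so neither row singles out one original column.
print("components:\n", pca.components_.round(3))
```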
@swatikute219 · 3 years ago
Amazing pace, crisp word choice, and good examples. Thank you, Aman, for the great videos!!
@UnfoldDataScience · 3 years ago
Thanks Swati.
@sangeethasaga · 9 months ago
I've never seen such a clear, understandable explanation... thank you so much!
@Bididudy_ · 1 year ago
Thank you for the detailed explanation. I tried to learn this concept from other channels, but it was a bit difficult to get. Your way of explaining terms is very simple, which helps me understand the subject. Really glad that I visited your channel.👍
@UnfoldDataScience · 1 year ago
Thanks Bala. Keep learning.
@datapointpune6216 · 3 years ago
Very informative, Aman.
@UnfoldDataScience · 3 years ago
Thanks a lot.
@ChenLiangrui · 5 months ago
Awesome video! Very clear and beginner-friendly, no broken train of thought, very problem-focused.
@UnfoldDataScience · 5 months ago
Glad you liked it!
@jhonatangilromero2311 · 1 year ago
It is evident that a lot of work goes into developing these very informative videos. Thank you!
@KidsRiddleFun · 3 years ago
Best explanation... keep up the good work.
@UnfoldDataScience · 3 years ago
Thanks a lot.
@shahbazkhalilli8593 · 7 months ago
I don't know which one I should take. By the way, the video is great.
@abdulhaseebshah9109 · 2 years ago
Amazing explanation, Aman. I have a question: are VIF and auxiliary regression both used to detect multicollinearity?
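They are closely linked. The VIF of feature j is computed from an auxiliary regression of x_j on all the other features, VIF_j = 1 / (1 - R_j²), and tolerance is simply 1 / VIF_j. Here is a minimal NumPy sketch on synthetic data; `vif_via_auxiliary_regression` is a hypothetical helper. statsmodels also provides `variance_inflation_factor` in `statsmodels.stats.outliers_influence`. As far as I know, it regresses on the columns exactly as you pass them, so add a constant column yourself if you want an intercept in the auxiliary regression.

```python
import numpy as np

def vif_via_auxiliary_regression(X, j):
    """Hypothetical helper: regress X[:, j] on the other columns (plus an intercept),
    take that auxiliary R^2, and return VIF = 1 / (1 - R^2)."""
    target = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    resid = target - A @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
    return 1.0 / (1.0 - r2)

# Synthetic data (assumption): x1 and x2 nearly collinear, x3 unrelated.
rng = np.random.default_rng(10)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

vifs = {name: vif_via_auxiliary_regression(X, j) for j, name in enumerate(["x1", "x2", "x3"])}
for name, vif in vifs.items():
    print(f"{name}: VIF = {vif:.1f}, tolerance = {1 / vif:.3f}")
```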
@KastijitBabar · 6 months ago
The best explanation on the whole of KZbin! Thank you.
@datafuturelab_ssb4433 · 3 years ago
Great explanation, sir. Thanks for sharing and making my fundamentals strong.
@UnfoldDataScience · 3 years ago
Most welcome.
@shadow82000 · 3 years ago
If X1 and X2 have high correlation, can I choose to drop the X with the lower correlation to Y, based on the correlation matrix?
@UnfoldDataScience · 3 years ago
Yes, right.
@shadow82000 · 3 years ago
@@UnfoldDataScience Thank you kind sir. High quality content as always!
@carlmemes9763 · 3 years ago
👍❤️
@sudhirnanaware1944 · 3 years ago
Hi Aman, as far as I know, we can use the VIF (variance inflation factor), a heatmap, and the corr() function to detect multicollinearity and remove it. Please confirm, and suggest other techniques.
@UnfoldDataScience · 3 years ago
Yes Sudhir; apart from these, some other regression techniques can be used.
@sudhirnanaware1944 · 3 years ago
Thanks Aman. May I know which regression techniques remove multicollinearity? I will definitely learn them; it will be helpful for me.
@csprusty · 3 years ago
We can create and compare two models, each using one of the correlated explanatory variables, and select the model with the better R-squared value.
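Here is a minimal sketch of that comparison, assuming scikit-learn and synthetic data. It uses 5-fold cross-validated R² instead of training R². That choice is made here, not in the comment, so the comparison is less optimistic. With one predictor in each model, plain and adjusted training R² would rank the two models the same way anyway.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data (assumption): x2 is correlated with x1, but only x1 drives y.
rng = np.random.default_rng(5)
n = 300
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.4, size=n)
y = 2.0 * x1 + rng.normal(size=n)

scores = {}
for name, col in {"x1": x1, "x2": x2}.items():
    # One single-feature model per candidate; score each by mean 5-fold CV R^2.
    r2 = cross_val_score(LinearRegression(), col.reshape(-1, 1), y, cv=5, scoring="r2")
    scores[name] = r2.mean()
    print(f"model with only {name}: mean CV R^2 = {scores[name]:.3f}")

keep = max(scores, key=scores.get)
print("keep:", keep)
```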
@kunalchakraborty3037 · 3 years ago
My questions: 1. Is multicollinearity a concern for predictive modeling? I mean, is the prediction altered if we neglect this phenomenon? 2. In the case of GAMs, do we have to worry about multicollinearity? 3. How does collinearity inflate the variance?
@UnfoldDataScience · 3 years ago
Thanks Kunal for asking. The answer to the first question: predictions will not be impacted much, but the coefficients will be. I will cover the 2nd and 3rd in another video.
@kunalchakraborty3037 · 3 years ago
@@UnfoldDataScience Thanks 👍. Really appreciate your videos.
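On question 3, the usual textbook formula is Var(β̂_j) = σ² / Σ_i (x_ij - x̄_j)² × 1 / (1 - R_j²), where R_j² comes from regressing x_j on the other predictors; the second factor is the VIF. The simulation below is a minimal sketch on synthetic data, and all of its numbers are assumptions. It refits OLS on many fresh samples and compares how much the x1 coefficient and one prediction vary. The prediction point follows the same x1 ≈ x2 pattern as the training data; predicting far outside that pattern is a different story.

```python
import numpy as np

rng = np.random.default_rng(6)

def simulate(noise_scale, n=100, reps=500):
    """Refit OLS on `reps` fresh samples; return the spread (std) of the x1 coefficient
    and of the prediction at x1 = x2 = 1."""
    x_new = np.array([1.0, 1.0, 1.0])  # [intercept, x1, x2]; follows the x1 ~ x2 pattern
    coefs, preds = [], []
    for _ in range(reps):
        x1 = rng.normal(size=n)
        x2 = x1 + rng.normal(scale=noise_scale, size=n)  # smaller scale = stronger collinearity
        y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)
        X = np.column_stack([np.ones(n), x1, x2])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        coefs.append(beta[1])
        preds.append(x_new @ beta)
    return np.std(coefs), np.std(preds)

weak = simulate(noise_scale=1.0)     # corr(x1, x2) around 0.7
strong = simulate(noise_scale=0.05)  # corr(x1, x2) around 0.999
print(f"weak collinearity:   sd(coef x1) = {weak[0]:.3f}, sd(prediction) = {weak[1]:.3f}")
print(f"strong collinearity: sd(coef x1) = {strong[0]:.3f}, sd(prediction) = {strong[1]:.3f}")
```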
@faozanindresputra3096 · 1 year ago
Will multicollinearity also be a problem in correlation analysis, where we just focus on finding which variables correlate rather than on regression, as in PCA?
@nurlanimanov9503 · 3 years ago
Hello sir! First, thank you for the video! I have 2 questions, and I would be glad if you answered them: 1) Can we say that we don't need to be concerned about correlated features in, for example, decision-tree-based models? I mean, do we need this concept only in linear models? 2) Is it true that we don't need to touch correlated features when we use Lasso or Ridge regression? Will the model handle that by itself in that case?
@UnfoldDataScience · 3 years ago
1. This is a problem with regression-based models, where coefficients come into the picture. 2. You still need to take care of it.
@hemanthkumar42 · 3 years ago
@@UnfoldDataScience Following your first answer, why is multicollinearity not a problem in neural networks? Please make a video on this, sir...
@saurabhagrawal9874 · 2 years ago
@@hemanthkumar42 Note that multicollinearity does not affect the prediction accuracy of linear regression; it only makes interpretation harder. We mostly use linear regression for interpretation, whereas with a neural network we already know it's a kind of black box. We don't want to interpret it, we want good predictions. That's why we don't bother about multicollinearity in neural networks.
@smegala3815 · 1 year ago
Thank you, sir... best explanation.
@UnfoldDataScience · 1 year ago
Most welcome
@arshiyasaba2259 · 2 years ago
If the value is less than the threshold value of 0.5/0.7, as the reference suggests, then we can remove those variables.
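A common variant of this rule flags pairs of features whose absolute pairwise correlation exceeds the threshold and drops one feature from each pair. Here is a minimal pandas sketch; `correlated_columns_to_drop` is a hypothetical helper. It simply drops the later column of each pair, so in practice you might instead keep whichever one is more correlated with the target. Using the absolute value also catches strong negative correlations.

```python
import numpy as np
import pandas as pd

def correlated_columns_to_drop(df, threshold=0.7):
    """Hypothetical helper: scan the upper triangle of |corr| and return every column
    that has |corr| > threshold with some earlier column."""
    corr = df.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))  # NaN elsewhere
    return [col for col in upper.columns if (upper[col] > threshold).any()]

# Synthetic data (assumption): b is strongly *negatively* correlated with a; c is unrelated.
rng = np.random.default_rng(11)
a = rng.normal(size=300)
df = pd.DataFrame({
    "a": a,
    "b": -a + rng.normal(scale=0.2, size=300),
    "c": rng.normal(size=300),
})

print(correlated_columns_to_drop(df, threshold=0.7))  # ['b']
```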
@shanmukhchandrayama8508 · 3 years ago
Aman, your videos are great. But many of the videos are connected to each other, so could you please make a video explaining in which order to follow the playlists to learn machine learning from the basics? It would be really helpful😅
@UnfoldDataScience · 3 years ago
Noted. Thank you for suggesting.
@datafuturelab_ssb4433 · 3 years ago
Remove the variable that has low impact on the target variable... Sir, I have 2 questions: 1. If there is multicollinearity in a classification problem, how do we handle it? 2. What is VIF, and how is standardization done? 3. Can we use StandardScaler in a regression problem?
@UnfoldDataScience · 3 years ago
There are three questions; I will cover them in a separate video. Thanks for asking.
@MuhammadImran-o4c · 3 years ago
Sir, whoever gives you an answer, you say it's correct; you are saying yes to everyone.
@UnfoldDataScience · 3 years ago
Mostly, the answers are correct.
@roshinidhinesh5490 · 3 years ago
Such a great explanation, sir... thanks a lot!
@UnfoldDataScience · 3 years ago
Thanks Roshini.
@YourRandomVariable · 3 years ago
Hi Aman, what should we do when the constant term's p-value is high? Mostly I see that people keep it without worrying about it. Could you please explain this?
@allaboutstat1103 · 3 years ago
Thanks for the clear explanation, and God bless!
@UnfoldDataScience · 3 years ago
Welcome.
@AMVSAGOs · 3 years ago
Great explanation... At 7:50 you said, "that's why we should not have multicollinearity in regression." So is it okay to have multicollinearity in classification? Could you please clarify?
@UnfoldDataScience · 3 years ago
When I say that, I mean the regression family of algorithms, including logistic regression.
@AMVSAGOs · 3 years ago
@@UnfoldDataScience Thank you Aman Sir
@nivednambiar6845 · 9 months ago
Hi Aman, hope you are doing well! I want to ask one thing: when you mention regression models, you mean linear models, not tree-based regression models, am I correct? Does multicollinearity affect tree-based models?
@shivamthakur4079 · 3 years ago
Really loved it, sir. I can say that you have a great way of explaining concepts. I can blindly follow you, sir.
@UnfoldDataScience · 3 years ago
Thanks Shivam.
@ShubhamSharma-zb9uh · 3 years ago
09:11 We have to consider the variable with the larger coefficient value for the analysis.
@manavgora · 10 months ago
Great, easily understandable.
@RamanKumar-ss2ro · 3 years ago
Great content.
@UnfoldDataScience · 3 years ago
Thanks a lot.
@ugwukelechi9476 · 2 years ago
You are a great teacher! I learnt something new today.
@UnfoldDataScience · 2 years ago
Please share within data science groups.
@prateeksachdeva1611 · 2 years ago
We will drop whichever feature has the lower correlation with the dependent variable compared with the other one.
@muhammadaliabid5793 · 3 years ago
Thank you for the excellent explanation. I have a few questions, please: 1. I used the PolynomialFeatures method in sklearn and it significantly improved the accuracy of my linear regression prediction model, but I found that the newly created features are correlated with the existing features, since I created squares and cubes! As per your explanation, I understand that this will lead to a multicollinearity problem, so the coefficients do not give the true picture. However, can I use this type of model for predictions? 2. What threshold correlation value would you suggest for multicollinearity? Thanks
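On question 1: predictions can still be fine, but the individual coefficients of x, x², and x³ are hard to interpret. Below is a minimal sketch, assuming scikit-learn's `PolynomialFeatures` and a synthetic, strictly positive feature. It shows where the correlation comes from, and that centering x before expanding removes most of the correlation between x and x². Odd powers such as x and x³ typically stay correlated even after centering.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Synthetic, strictly positive feature (assumption).
rng = np.random.default_rng(12)
x = rng.uniform(1, 10, size=(500, 1))

poly = PolynomialFeatures(degree=3, include_bias=False)  # columns: x, x^2, x^3
raw = poly.fit_transform(x)
centered = poly.fit_transform(x - x.mean())  # expand the mean-centered feature instead

def corr_first_two(features):
    """Pearson correlation between the x and x^2 columns."""
    return np.corrcoef(features[:, 0], features[:, 1])[0, 1]

print("corr(x, x^2), raw:     ", round(corr_first_two(raw), 3))
print("corr(x, x^2), centered:", round(corr_first_two(centered), 3))
```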
@mariapramiladcosta1972 · 3 years ago
Sir, if there are 3 predictors and one dependent variable, and all three independent variables are highly correlated, which type of regression model can be used? Multiple regression cannot be used, right? Can we use linear regression? Aren't a tolerance above 0.1 and a VIF below 10 good enough to indicate that there is no multicollinearity? For your question, I think the more weakly correlated one should be removed.
@anmolpardeshi3138 · 3 years ago
Regarding the question of which variable to remove from a set of highly correlated variables: can this be answered by PCA (principal component analysis)? Or will PCA weight them the same because they are highly correlated?
@UnfoldDataScience · 3 years ago
Hi Anmol, I didn't mean in terms of PCA; I asked generally.
@hakimandishmand1068 · 2 years ago
Good and perfect.
@UnfoldDataScience · 2 years ago
Thank you
@kar2194 · 3 years ago
Sorry, so does it mean that when there is multicollinearity, for example between x2 and x3, if I increase x2, x3 will automatically increase? Great video, by the way!
@UnfoldDataScience · 3 years ago
Internally, at some level, yes. Thank you.
@zakiaa7464 · 1 year ago
You are a genius. Thanks.
@bijaynayak6473 · 2 years ago
Which one should we eliminate? Compute the VIF of each feature and set the threshold at > 5.
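Here is a minimal sketch of the usual iterative procedure. Compute every feature's VIF, drop the worst one if it exceeds the threshold, and repeat. The threshold here is 5, as in the comment; 10 is another common cut-off, equivalent to a tolerance of 0.1. `vif_scores` and `drop_high_vif` are hypothetical helpers written with NumPy for illustration, and the data are synthetic.

```python
import numpy as np

def vif_scores(X):
    """Hypothetical helper: VIF of every column, from regressing it on the rest (with intercept)."""
    n, k = X.shape
    scores = []
    for j in range(k):
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ coef
        tss = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        scores.append(tss / (resid @ resid))  # TSS / RSS equals 1 / (1 - R^2)
    return np.array(scores)

def drop_high_vif(X, names, threshold=5.0):
    """Hypothetical helper: repeatedly remove the column with the largest VIF until all are <= threshold."""
    names = list(names)
    while X.shape[1] > 1:
        vifs = vif_scores(X)
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names

# Synthetic data (assumption): x1 and x2 nearly collinear, x3 unrelated.
rng = np.random.default_rng(13)
n = 400
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)
x3 = rng.normal(size=n)
X_kept, kept = drop_high_vif(np.column_stack([x1, x2, x3]), ["x1", "x2", "x3"], threshold=5.0)
print("kept:", kept)
```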
@sriadityab4794 · 3 years ago
Do we need to remove multicollinearity while building a time series model?
@UnfoldDataScience · 3 years ago
Not necessarily.
@atomicbreath4360 · 3 years ago
Sir, can you give some ideas on how to know which types of ML models are affected by multicollinearity?
@UnfoldDataScience · 3 years ago
Regression-based models.
@ethiodiversity-1184 · 2 years ago
Great explanation.
@UnfoldDataScience · 2 years ago
Thank you 🙂
@nurlanimanov9503 · 3 years ago
Hello sir, after reading the comments I saw the answer to your question: they said we have to remove the one with the lower correlation coefficient with the target variable in the correlation matrix. That confused me on one point. Can we say that the coefficients in front of each feature, which we get after running the regression model, indicate the impact of each feature on the target? I mean, when deciding which of two correlated features to remove, can I use these coefficients instead of the correlation-matrix values with the target variable? In this context, do the coefficients in front of each feature say the same thing as the values in the correlation matrix with the target variable?
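Not in general. With one standardized predictor, the OLS slope equals the Pearson correlation with the target. With several correlated predictors, each coefficient measures the effect while holding the others fixed, so it can differ a lot from the raw correlation. Here is a minimal NumPy sketch on synthetic data where x2 is correlated with y only through x1:

```python
import numpy as np

# Synthetic data (assumption): y depends on x1 only; x2 is correlated with x1.
rng = np.random.default_rng(8)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)
y = 1.0 * x1 + rng.normal(size=n)

def standardize(v):
    """Zero mean, unit (population) standard deviation."""
    return (v - v.mean()) / v.std()

Z = np.column_stack([standardize(x1), standardize(x2)])
zy = standardize(y)

r_with_target = np.array([np.corrcoef(Z[:, j], zy)[0, 1] for j in range(2)])

# One standardized predictor: the OLS slope equals Pearson r.
single_slope = (Z[:, 0] @ zy) / (Z[:, 0] @ Z[:, 0])

# Both predictors together: standardized partial coefficients (no intercept needed).
beta, *_ = np.linalg.lstsq(Z, zy, rcond=None)

print("correlation with y:   ", r_with_target.round(2))
print("slope using x1 alone: ", round(single_slope, 2))
print("coefs using x1 and x2:", beta.round(2))
```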
@suryadhakal3608 · 3 years ago
Great.
@UnfoldDataScience · 3 years ago
Thanks Surya.
@salajmondal3437 · 8 months ago
Should I check multicollinearity for a classification problem?
@UnfoldDataScience · 8 months ago
For logistic regression - yes.
@salajmondal3437 · 8 months ago
@@UnfoldDataScience Is it necessary to check multicollinearity between categorical features, or between numerical and categorical features?
@bhavanichatrathi7435 · 3 years ago
Hi Aman, it's a very good explanation... please make a video on penalized regression like Lasso, Ridge, and Elastic Net. There is too much mathematics in those; please explain them in a simple way. Thank you.
@UnfoldDataScience · 3 years ago
Thanks Bhavani, sure, will do.
@rafibasha4145 · 2 years ago
Multicollinearity is a problem in classification as well, right? @3:57
@UnfoldDataScience · 2 years ago
Yes, if it's a linear model like logistic regression.
@harshadbobade2200 · 2 years ago
Simple and to-the-point explanation 🤘
@UnfoldDataScience · 2 years ago
Thanks Harshad.
@sharadpkumar · 2 years ago
Hi Aman, nice work, keep it up... I have a doubt: why is the normal distribution so important? Why do we need our independent variables to show a normal distribution for a good model? I am not finding a satisfying answer. Can you please help?
@UnfoldDataScience · 2 years ago
Hi Sharad, in simple language: it's easy for the model to learn a pattern if you give it examples from a large range of values (that is your normal distribution). Take the example below: predict the salary of an individual (Y, the target) based on his/her expenses (the X variable). Scenario 1: in your training set you have Y values like 10 LPA, 15 LPA, 20 LPA. Here the model won't be able to learn the pattern for people earning 3 LPA; maybe there is a difference in the income/expense pattern for junior people. Scenario 2: you give many values of Y from all over, like 2 LPA, 4 LPA, 5 LPA, 100 LPA, as if they were normally distributed. Here it's easy for the model to learn the pattern, as it sees a range of values, and the resulting model will be more reliable. Hope it's clear now.
@sharadpkumar · 2 years ago
@@UnfoldDataScience Thanks for the clarification. Does a huge dataset always show a normal distribution?
@UnfoldDataScience · 2 years ago
No, not always... it depends on the data.
@ashulohar8948 · 2 years ago
Please, please make a video on how to select the drivers in linear regression that drive sales.
@RAJANKUMAR-mi1ib · 3 years ago
Hi... thanks for the nice explanation. I have a question: is multicollinearity a problem for linear regression only? If not, how is it a problem for non-linear regression?
@UnfoldDataScience · 3 years ago
For regression-based models like linear/logistic regression, etc.
@sidrahms7458 · 3 years ago
Awesome explanation. I have a question: if I have nominal, ordinal, and continuous variables, how can I find multicollinearity among them?
@UnfoldDataScience · 3 years ago
Hi Sidrah, answered.
@sidrahms7458 · 3 years ago
I can't find your answer. I understand that we should use VIF for continuous variables, but what if I need to see the correlation among all the ordinal, numeric, and nominal variables?
@beautyisinmind2163 · 1 year ago
Can we also remove highly negatively correlated features, or not? Someone reply, please.
@hemanthkumar42 · 3 years ago
Is multicollinearity a problem for neural networks?
@UnfoldDataScience · 3 years ago
Not always.
@akhileshgandhe5934 · 3 years ago
Hi Aman, I have 9 categorical and 6 numerical columns, and it's a regression problem. I can find the correlation between the numerical columns using a correlation heatmap, but how do I find the relationship between the categorical ones? Can I use the chi-square test? When I use it, all 9 categorical columns come out as dependent on each other. So what should my next step be? Please guide me. Thanks
@UnfoldDataScience · 3 years ago
Yes, chi-square can be used; I have a dedicated video on this topic.
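Here is a minimal sketch, assuming SciPy's `chi2_contingency` and pandas' `crosstab`, with synthetic data. One possible reason every pair comes out "dependent": with many rows, the chi-square test flags even weak associations as significant. It can therefore help to also look at an effect size such as Cramér's V, where 0 means no association and 1 means perfect association. `cramers_v` is a hypothetical helper.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(a, b):
    """Hypothetical helper: Cramér's V and the chi-square p-value for two categorical arrays."""
    table = pd.crosstab(a, b)
    chi2, p_value, dof, expected = chi2_contingency(table, correction=False)
    n = table.to_numpy().sum()
    return np.sqrt(chi2 / (n * (min(table.shape) - 1))), p_value

# Synthetic data (assumption).
rng = np.random.default_rng(14)
n = 2000
city = rng.choice(["A", "B", "C", "D"], size=n)
region = np.where(np.isin(city, ["A", "B"]), "north", "south")  # fully determined by city
color = rng.choice(["red", "blue", "green"], size=n)            # unrelated to city

for name, other in [("region", region), ("color", color)]:
    v, p = cramers_v(city, other)
    print(f"city vs {name}: Cramér's V = {v:.3f}, p-value = {p:.3g}")
```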
@omkarlokhande3692 · 1 year ago
Sir, what should we do if multicollinearity is affecting a binary classification problem?
@UnfoldDataScience · 1 year ago
There are many ways to take care of it. I have discussed them in the classification videos.
@sandipansarkar9211 · 3 years ago
Finished watching.
@shafeeqaabdussalam6195 · 3 years ago
Thank you
@UnfoldDataScience · 3 years ago
Thanks again.
@trushnamayeenanda5431 · 2 years ago
The independent variable with the higher correlation among the similar factors should be removed.
@MuhammadImran-o4c · 3 years ago
Thanks, sir ji. I think we should remove the unnecessary variable.
@UnfoldDataScience · 3 years ago
Yes, true.
@sudheeshe1384 · 3 years ago
You always rock :)
@UnfoldDataScience · 3 years ago
Thanks for watching Sudheesh.
@KumarHemjeet · 3 years ago
Remove the feature that has the lower correlation with the target.
@jaheerkalanthar816 · 2 years ago
I think we keep whichever variable is more highly correlated with the target variable.
@squadgang1678 · 2 years ago
I would find the correlation between x1 and y and between x2 and y individually, see which one is lower, and delete the one with the lower correlation.
@bezagetnigatu1173 · 2 years ago
Thank you!
@UnfoldDataScience · 2 years ago
Welcome.
@ameerrace2284 · 3 years ago
Great video. Please create a video on the Python implementation of Lasso and Ridge regression.
@UnfoldDataScience · 3 years ago
Thanks Ameer. Sure!!
@rohitnalage6366 · 2 years ago
Sir, please explain Lasso and Ridge; if you have made a video, please share the link.
It depends on feature importance: the feature with less importance will be dropped. Correct me if I am wrong :0
@UnfoldDataScience · 3 years ago
Correct, Sujith.
@anirudhchandnani9917 · 3 years ago
Hi Aman, could you please make a detailed video explaining the difference between Gradient Boosting, AdaBoost, and Extreme Gradient Boosting (XGBoost)? Why is AdaBoost called adaptive? Is it only because it updates the weights of the misclassified instances? XGBoost and Gradient Boosting are also adaptive in that way, aren't they? Also, why are XGBoost and Gradient Boosting more robust to outliers than AdaBoost, despite all of them having a log term in their loss functions? I would really appreciate your reply. Thanks
@karthikganesh4679 · 3 years ago
Sir, please make a video on post-pruning decision trees.
@UnfoldDataScience · 3 years ago
OK, Karthik.
@sreejadas4417 · 2 years ago
I want to be a data analyst, but I want sequential courses from you; please guide me.
@UnfoldDataScience · 2 years ago
www.unfolddatascience.com
@squadgang1678 · 2 years ago
Is machine learning better than deep learning, or is deep learning better than machine learning?
@UnfoldDataScience · 2 years ago
It depends on the problem statement, data availability, infrastructure availability, etc.; we can't say one is better than the other.