Multicollinearity in Decision Trees

2,283 views

Dimitri Bianco

Days ago

Comments: 24

@Septumsempra8818 2 years ago

This type of video is my favorite. It's unique on KZbin. s/o from South Africa

@DimitriBianco 2 years ago

Thanks! I have a similar one coming out in a few weeks on how not to do cross validation and sampling.

@andresrossi9 2 years ago

I missed this one for some reason. Anyway, as someone who knows decision trees quite "in-depth", I'd say this is a very clear lesson. Very good material, as always.

@seanmichael6579 2 years ago

Absolutely love this content and these kinds of videos. This was nuanced and came with real-world examples and experience. Beats anything I ever got from my stats grad school years. All the best.

@DimitriBianco 2 years ago

Thanks! I'll try and make a few more of these types of videos.

@QuantPy 2 years ago

As always, great-quality video, Dimitri! If you are looking for video suggestions, I would really like to see a video on the possible risks of building models based on observed Granger causality between financial time series (perhaps not explainable because of the high number of independent variables) that may have led to good out-of-sample prediction performance. I would also like to hear a practical example of model monitoring (perhaps some of the more popular metrics you have used previously) that could help detect whether the model is deteriorating. Thanks again for the effort you put into these types of videos.

@DimitriBianco 2 years ago

I'll look into making some videos around these ideas.

@Yasharghami 2 years ago

Can't wait.

@bhargav7476 2 years ago

Great stuff as always! I was wondering what's the average age group of your viewers?

@DimitriBianco 2 years ago

85% are between 18 and 34 years old.

@didierdupont5784 2 years ago

Great video! How would one avoid such a situation? In a scenario where there are thousands of predictors, I can hardly imagine looking at correlations before building the model could help, as there are just too many to manually go through. The same would apply when pruning a tree.

@DimitriBianco 2 years ago

Cluster analysis. You create clusters based on statistical relationships using something like PCA. At some point, the value of adding more clusters becomes trivial. Often we end up with around 20 clusters for 500 variables. Then you manually review the top few variables in each cluster and build a model with those variables, which would give you around 60 final variables (roughly three per cluster).

@didierdupont5784 2 years ago

@DimitriBianco Makes sense, thank you!

@Rizzickk 2 years ago

Please make more ✅

@Shawro 2 years ago

Hi Dimitri. Great video. I’m currently halfway through my first year of undergrad, doing a dual math/CS degree. I’ve chosen the Stats ‘track’ for the math part of my degree, but I’m not sure what the optimal ‘track’ for CS would be if I’m looking to best prepare myself for quant work. My options are data science, machine learning, and scientific computing. I’m sure they’re all valuable skills to learn, but which do you think is the best foundation for quant work? Thanks in advance.

@DimitriBianco 2 years ago

I would do scientific computing, but all three are decent choices as ML and data science are taking off. Scientific computing should give you some nice math overlap, and numerical analysis is a key part of quant finance.

@FatmaNurAydin-p7r 1 year ago

Do you have any suggestions for scientific articles on the topic you mentioned in the video? Thank you...

@DimitriBianco 1 year ago

No, but you could Google it and see if any come up. Multicollinearity can be logically drawn from the math and method of trees. You don't need a paper to come to this conclusion.
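The claim that this follows from the math of trees can be sketched with the split criterion itself. The sketch assumes Gini impurity (entropy-based criteria behave the same way) and takes the extreme case where $x_2$ is a strictly increasing function $f$ of $x_1$; real multicollinearity is weaker, so the exact tie below becomes a near-tie. Here $p_k(S)$ is the share of class $k$ in node $S$.

```latex
\begin{aligned}
G(S) &= 1 - \sum_{k} p_k(S)^2, \\
\Delta G(S; j, t) &= G(S) - \frac{|S_L|}{|S|}\,G(S_L) - \frac{|S_R|}{|S|}\,G(S_R),
  \quad S_L = \{ i \in S : x_{ij} \le t \},\; S_R = S \setminus S_L, \\
x_2 = f(x_1),\ f \text{ strictly increasing}
  &\;\Rightarrow\; \{ i \in S : x_{i2} \le f(t) \} = \{ i \in S : x_{i1} \le t \}
  \;\Rightarrow\; \Delta G(S; 2, f(t)) = \Delta G(S; 1, t), \\
\text{after splitting on } x_1 \le t:
  &\quad \Delta G(S_L; 2, f(t)) = G(S_L) - 1 \cdot G(S_L) - 0 = 0.
\end{aligned}
```

The first implication is why a tree chooses between such variables essentially arbitrarily; the second is why, once one of them is used, re-splitting on the other at the same cut adds nothing further down that branch.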

@Jay-xb5du 2 years ago

Hi Dimitri, informative and great video as always! Just a quick question: do you personally think that a degree in statistics followed by an MFE would give me a better chance of becoming a quant analyst, or a financial mathematics degree followed by an MFE? Which degree do you think would also prepare me better for an MFE? Thanks

@jasdeepsinghgrover2470 2 years ago

If one of the correlated variables is used in a split, then the other ones automatically become unlikely to be chosen, as they won't reduce impurity further. Won't this help a decision tree be more robust? I've reposted my question from the premiere in case someone has the same doubt.

@DimitriBianco 2 years ago

The strength of a decision tree is that it will prevent multicollinearity further down a branch. The issue is when variables are blindly selected based on correlation. If the wrong variable is used, it is highly likely the tree will fail quickly, which reduces robustness.
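The exchange above can be checked on a toy example. This sketch assumes Gini impurity and binary splits of the form `x <= t`; `gini`, `best_split`, and the data are hypothetical illustrations, not code from the video. A variable that is a monotone transform of another ties with it exactly at the root, so which one a real library picks comes down to its tie-breaking rules. On this data the first split already separates the classes, so there is nothing left for the second variable to do; more generally, such a variable can only offer the same cuts again.

```python
def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(xs, ys):
    """Return (threshold, impurity decrease) of the best split `x <= threshold`.

    Returns (None, 0.0) when no split reduces impurity.
    """
    n = len(ys)
    parent = gini(ys)
    best = (None, 0.0)
    # Skip the largest value: splitting there would send every row left.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        gain = parent - len(left) / n * gini(left) - len(right) / n * gini(right)
        if gain > best[1]:  # strict ">" keeps the first threshold on ties
            best = (t, gain)
    return best


x1 = [1, 2, 3, 4, 5, 6]
x2 = [2 * v + 1 for v in x1]  # strictly increasing in x1: extreme multicollinearity
y = [0, 0, 0, 1, 1, 1]

print(best_split(x1, y))  # (3, 0.5)
print(best_split(x2, y))  # (7, 0.5): the same partition, an exact tie

# Grow the left branch after splitting on x1 <= 3 and ask whether x2 still helps.
left_x2 = [b for a, b in zip(x1, x2) if a <= 3]
left_y = [c for a, c in zip(x1, y) if a <= 3]
print(best_split(left_x2, left_y))  # (None, 0.0): nothing left to gain
```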

@nyboret6384 2 years ago

@DimitriBianco It's true: blindly selecting variables for a model is a very dangerous business in ML/data science, and especially in XAI, where we want to interpret partial dependence plots, yet blind selection can give them wrong and/or noisy signs. Thanks for the good explanations. From an Asian (Cambodian) applied and theoretical economist, econometrician, and mathematical statistician

@Felix-vg4mv 2 years ago

Imagine you were the FBI and you predicted crime statistics based on ice cream sales. Suddenly, in November, a video is posted on Facebook and mass riots ensue. Ice cream sales wouldn't change, yet crime would rise.

@Septumsempra8818 1 year ago

How do we fix it?