6.5 Gini & Entropy versus misclassification error (L06: Decision Trees)

12,185 views

Sebastian Raschka


Sebastian's books: sebastianrasch...
This video explains why we use entropy (or Gini) instead of the misclassification error as the impurity metric in the information gain equation of CART decision trees.
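To make the comparison concrete, here is a minimal sketch in Python (the split counts are hypothetical and the helper functions are my own, not code from the lecture) that computes the information gain of two candidate splits under entropy, Gini impurity, and misclassification error:

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (in bits) of a vector of class counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()   # class probabilities, ignoring empty classes
    return float(-np.sum(p * np.log2(p)))

def gini(counts):
    """Gini impurity of a vector of class counts."""
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    return float(1.0 - np.sum(p ** 2))

def misclassification_error(counts):
    """1 minus the proportion of the majority class."""
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    return float(1.0 - p.max())

def information_gain(parent, children, impurity):
    """Parent impurity minus the sample-weighted average impurity of the children."""
    n = float(sum(sum(c) for c in children))
    weighted_child_impurity = sum(sum(c) / n * impurity(c) for c in children)
    return impurity(parent) - weighted_child_impurity

parent = [40, 40]                    # balanced parent node
split_a = [[30, 10], [10, 30]]       # both children remain mixed
split_b = [[20, 40], [20, 0]]        # one child becomes pure

for name, impurity in [("entropy", entropy),
                       ("gini", gini),
                       ("error", misclassification_error)]:
    print(f"{name:8s} gain(A) = {information_gain(parent, split_a, impurity):.4f}  "
          f"gain(B) = {information_gain(parent, split_b, impurity):.4f}")
```

Misclassification error assigns the same gain (0.25) to both splits, whereas entropy and Gini both assign a higher gain to split B, the one that produces a pure child node; that difference in behavior is the kind of distinction the video's concavity argument explains.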
-------
This video is part of my Introduction to Machine Learning course.
Next video: • 6.6 Improvements & dea...
The complete playlist: • Intro to Machine Learn...
A handy overview page with links to the materials: sebastianrasch...
-------
If you want to be notified about future videos, please consider subscribing to my channel: / sebastianraschka

Comments: 14
@visheshgoyal5903 3 years ago
I haven't seen such a great series on decision trees. Great work; all the concepts are crystal clear. Thanks a lot.
@sharkofjoy 3 years ago
You genius, you explained it in a way that my teacher could not. Thank you. The terms you use are different from the ones we use, but the concepts are the same.
@SebastianRaschka 2 years ago
Thanks a lot for the kind words!
@tymothylim6550 3 years ago
Thank you very much for this video! It was helpful for comparing the different metrics for classification trees!
@akhileshpandey123 3 years ago
Thanks for providing the tree-based models lecture. Nice explanation.
@anna-lenab8343 1 year ago
Great explanation! That was exactly what I was missing for my bachelor thesis :) But as @danieleboch3224 already said, the last short example with the [25, 25] child nodes (from the [50, 50] parent node) does not work, since the gain is zero there regardless of whether Gini, entropy, or misclassification error is used. I replied to their comment below with an explanation of why.
@danieleboch3224 1 year ago
Wait, in the case where we split the [50, 50] node into two [25, 25] nodes, the information gain is still 0...
@anna-lenab8343 1 year ago
You are right, thanks for pointing that out! In case you or anyone else wants to know more: if the class proportions in both child nodes are exactly the same as in the parent node, the information gain is zero. In the plot at the end of the video, both child nodes would then lie at exactly the same spot as the parent node, and so would the child-node average. In any other case, where the class proportions of the child nodes differ from those of the parent node (*), the information gain is positive, as seen in the example in the video. But you are right, the last short example with the two [25, 25] child nodes was wrong. (*) It can never happen that only one of the child nodes has a different proportion: as soon as one child node has a different class proportion than the parent (for example, a smaller percentage of one class), the other child node must differ in the opposite direction (a larger percentage). In that case, one child node lies above the parent node and the other below it (thinking of the last plot in the video again).
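To check this numerically, here is a tiny self-contained sketch (hypothetical counts and my own helper functions; Gini impurity is used here, but entropy and misclassification error behave the same way for these two cases):

```python
def gini(counts):
    """Gini impurity of a list of class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def gain(parent, children, impurity=gini):
    """Parent impurity minus the sample-weighted average impurity of the children."""
    n = sum(sum(c) for c in children)
    return impurity(parent) - sum(sum(c) / n * impurity(c) for c in children)

# Both children keep the parent's 50/50 class proportions -> zero gain,
# exactly as pointed out in this thread.
print(gain([50, 50], [[25, 25], [25, 25]]))   # 0.0

# As soon as the child proportions differ from the parent's, the gain is positive.
print(gain([50, 50], [[40, 10], [10, 40]]))   # 0.18
```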
@DamnightSC2 3 years ago
Very good explanation
@ft_smile 3 years ago
Thanks a lot 🌸🌸
@urthogie 8 months ago
The flaw with the example is that splitting on x_2 would give the greatest information gain in all of the cases. There is no need to be concerned with x_1 at all; you could do it all with an x_2 stump. You say to "make the assumption that x_1 is a better split", but this is clearly not the case, since x_2 splits everything perfectly, as seen in the leaf nodes.
@mshirazbaig6055 2 years ago
Not a good presentation. The speaker does not explain anything.