Gini Index and Entropy|Gini Index and Information gain in Decision Tree|Decision tree splitting rule

156,288 views

Unfold Data Science


Comments: 294
@islamicinterestofficial
@islamicinterestofficial 3 жыл бұрын
There is a mistake in your video: you said to choose the attribute that has less information gain, but actually we have to choose the one that has the highest information gain...
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
Yes Naat, thanks for pointing that out. I have pinned the comments related to it on the video for everyone's benefit.
@islamicinterestofficial
@islamicinterestofficial 3 жыл бұрын
@@UnfoldDataScience Pleasure sir
@nikhilgupta4859
@nikhilgupta4859 3 жыл бұрын
If you are saying that we have to choose high information gain, then as per the video we should take the impure node. For a pure node Gini would come out to 0 and hence 0 IG. Isn't something wrong?
@DK-il7ql
@DK-il7ql 3 жыл бұрын
At what time was that said and corrected?
@RaviSingh-xx2wq
@RaviSingh-xx2wq 3 жыл бұрын
@@DK-il7ql At 10:37 he said low information gain by mistake, instead of high information gain.
@ahmedalqershi1245
@ahmedalqershi1245 4 жыл бұрын
I usually don't like commenting on YouTube videos, but for this one I felt like I had to show appreciation, because this video was truly extremely helpful. University professors spend hours explaining what you just explained in 11 minutes. And you are the winner. Perfect explanation. Thank you so much!!!!
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
I appreciate it Ahmed. Your comments motivate me :)
@malavikadutta1011
@malavikadutta1011 4 жыл бұрын
Institutes spend two hours explaining these two concepts and you made it clear in a few minutes. Excellent explanation.
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
Thanks a lot :)
@sandipansarkar9211
@sandipansarkar9211 3 жыл бұрын
@@UnfoldDataScience I agree
@akhilgangavarapu9728
@akhilgangavarapu9728 4 жыл бұрын
If i feel any concept is hard to understand, first thing i do is search for your videos. Very intuitive and easy to understand. Thank you so much!
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
Your comments are my motivation Akhil. Thanks a lot. Happy learning. Tc
@jehanbhathena6270
@jehanbhathena6270 3 жыл бұрын
This has become my favourite channel for ML/Data Science topics,thank you very much for sharing your knowledge
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
Thanks Jehan, your words are my motivation.
@indrajithvasudevan8199
@indrajithvasudevan8199 3 жыл бұрын
Best channel to learn ML and Data science concepts. Thank you sir
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
Thanks Indrajit. Kindly share video within data science groups if possible.
@__anonymous__4533
@__anonymous__4533 Жыл бұрын
I have an assignment due tomorrow and this helped a lot!
@shyampratapsingh4878
@shyampratapsingh4878 4 жыл бұрын
The simplest and best explanation so far.
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
Glad it was helpful Shyam.
@yhbarve
@yhbarve 2 ай бұрын
Thanks for the explanation. It's made the concept a lot clearer for me.
@travelbearmama
@travelbearmama 4 жыл бұрын
With your clear explanation, I finally understand what Gini index is. Thank you so much!
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
You are welcome. happy learning. Stay Safe!!
@hassangharbi3687
@hassangharbi3687 3 жыл бұрын
Very good and clear. I'm a French speaker and I understood almost everything.
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
Thanks Hassan.
@vishesh_soni
@vishesh_soni 3 жыл бұрын
Your first video that I came across. Subscribed!
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
Thanks Vishesh.
@Guidussify
@Guidussify 7 ай бұрын
Excellent, to the point, good examples. Great work!
@KASHOKKUMARgnitcECE
@KASHOKKUMARgnitcECE Жыл бұрын
Thanks bro...explained in easy manner...
@valor36az
@valor36az 3 жыл бұрын
I just discovered this channel what a gem
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
Thanks a lot. please share with others in various data science groups as well.
@zainahmed6502
@zainahmed6502 4 жыл бұрын
Wow! Not only was your explanation amazing but you also answered every single comment! True dedication. Keep it up!
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
Thanks a ton Zain.
@priyankabachhav5315
@priyankabachhav5315 2 жыл бұрын
Thank you so much sir. Before watching this video I had watched 4 videos related to impurity, but everyone mixes up entropy and impurity and it was not really clear what exactly the formula is or how it works. After watching your video it is totally clear now. Thank you for this beautiful and clear explanation.
@UnfoldDataScience
@UnfoldDataScience 2 жыл бұрын
Glad you understood
@Pesions
@Pesions 4 жыл бұрын
You have really good explanation skills, thank you man, I finally understand it.
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
Most Welcome :)
@joeycopperson
@joeycopperson Жыл бұрын
thanks for clear and easy explanation
@indronilbhattacharjee2788
@indronilbhattacharjee2788 3 жыл бұрын
finally i am getting some clear explanations for various concepts
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
thanks Indra.
@9495tj
@9495tj 3 жыл бұрын
Awesome video.. Thank You so much!
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
Thank you.
@alexandre52045
@alexandre52045 8 ай бұрын
Thanks for the video! It was really clear and well executed. It would have been great to detail the entropy calculation though; I find it a bit elusive without an example.
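For readers who, like the comment above, would like the entropy calculation spelled out, here is a small worked example (an illustrative addition, not taken from the video):

    Parent node: 5 "yes" and 5 "no"  ->  p(yes) = p(no) = 0.5
    Entropy(parent) = -(0.5*log2(0.5) + 0.5*log2(0.5)) = 1.0
    Split into left = (4 yes, 1 no) and right = (1 yes, 4 no):
    Entropy(left)  = -(0.8*log2(0.8) + 0.2*log2(0.2)) ≈ 0.722
    Entropy(right) ≈ 0.722
    Weighted child entropy = 0.5*0.722 + 0.5*0.722 = 0.722
    Information gain = 1.0 - 0.722 ≈ 0.278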
@shubhangiagrawal336
@shubhangiagrawal336 4 жыл бұрын
very well explained
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
Thanks for watching Subhangi.
@zerdaloo
@zerdaloo 4 ай бұрын
Brilliant video sir
@datafuturelab_ssb4433
@datafuturelab_ssb4433 3 жыл бұрын
Great explanation. I have a question: can the Gini index be negative?
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
Hi, no it can not be.
@ARJUN-op2dh
@ARJUN-op2dh 3 жыл бұрын
Simple & clear
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
Thanks a lot.
@seanpeng12
@seanpeng12 4 жыл бұрын
Your explanation is awesome, thanks.
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
Thanks a lot for your valuable feedback.
@ece7700
@ece7700 Жыл бұрын
thank you so much
@deepikanadarajan3407
@deepikanadarajan3407 3 жыл бұрын
very clear explanation and very helpfull
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
Glad it was helpful Deepika.
@RaviSingh-xx2wq
@RaviSingh-xx2wq 3 жыл бұрын
Amazing explanation
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
Thanks Ravi.
@jarrelldunson
@jarrelldunson 4 жыл бұрын
Thank you
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
Welcome Jarrell.
@abhishekraturi
@abhishekraturi 4 жыл бұрын
Just to make clear, the Gini index ranges from 0 to 0.5 and not 0 to 1. Jump to the video at 7:10.
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
Yes, this is a common comment from many users. You are right, Abhishek.
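To see the bound concretely: a node's Gini impurity is 1 - sum(p_i^2) over its class proportions. A minimal Python sketch (an illustration; the helper name gini is the editor's own, not from the video):

    # Gini impurity of a node from its class counts: 1 - sum(p_i^2)
    def gini(counts):
        total = sum(counts)
        return 1.0 - sum((c / total) ** 2 for c in counts)

    print(gini([10, 0]))    # pure two-class node               -> 0.0
    print(gini([5, 5]))     # worst two-class mix               -> 0.5
    print(gini([4, 4, 4]))  # with 3+ classes it can exceed 0.5 -> ~0.667

So the 0-to-0.5 range holds for binary classification; in general the maximum is 1 - 1/k for k classes.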
@awanishkumar6308
@awanishkumar6308 3 жыл бұрын
I appreciate your concepts for Gini and Entropy
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
Thanks Awanish.
@reviewsfromthe60025
@reviewsfromthe60025 3 жыл бұрын
Great video
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
Thanks a lot.
@Kumarsashi-qy8xh
@Kumarsashi-qy8xh 3 жыл бұрын
You are doing a great job, sir.
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
Thanks a lot.
@zuzulorentzen8653
@zuzulorentzen8653 Жыл бұрын
Thanks man
@mavaamusicmachine2241
@mavaamusicmachine2241 2 жыл бұрын
Thank you for this video very helpful
@Shonashoni1
@Shonashoni1 2 жыл бұрын
Amazing explanation sir
@fromthenorthfromthenorth8224
@fromthenorthfromthenorth8224 4 жыл бұрын
Thanks for this clear and well explain Gini index.... Thanks ....
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
Glad it was helpful!
@arindamn4880
@arindamn4880 Ай бұрын
Is it possible to derive entropy formula?
@OverConfidenceGamingYT
@OverConfidenceGamingYT 3 жыл бұрын
Thank you ❣️
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
Welcome.
@onurrrrr77
@onurrrrr77 2 ай бұрын
My question would be: does the splitting algorithm also consider all the feature combinations besides splits on single features? For example, id>2 & loan>250 in a single step?
@kunaldhuria3935
@kunaldhuria3935 4 жыл бұрын
short simple and sweet, thank you so much
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
You're welcome Kunal.
@sadhnarai8757
@sadhnarai8757 4 жыл бұрын
Great content.
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
Thank you.
@adityasrivastava78
@adityasrivastava78 Жыл бұрын
Good teaching
@UnfoldDataScience
@UnfoldDataScience Жыл бұрын
Keep watching
@yyndsai
@yyndsai 2 жыл бұрын
Thank you, no one could have done better
@UnfoldDataScience
@UnfoldDataScience 2 жыл бұрын
Your comments mean a lot to me.
@dracula5505
@dracula5505 4 жыл бұрын
Do we have to calculate both Gini and entropy to figure out which is best for the dataset??
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
Only one at a time.
@eiderdiaz7219
@eiderdiaz7219 4 жыл бұрын
love it, very clear explanation
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
Thanks Eider. Happy learning. Tc
@sandipansarkar9211
@sandipansarkar9211 3 жыл бұрын
great explanation
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
Glad it was helpful!
@abhinai2713
@abhinai2713 4 жыл бұрын
@10:38 Where the information gain is high, that is where we try to split the node, right?
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
That is a good question. The formula you see @10:38 is for the entropy of a node. Information gain for a split = entropy of the node - weighted average entropy of the child nodes after the split. The decision tree splits at the place where the information gain is highest; in other words, the decision tree splits where entropy is reduced to the largest extent.
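A minimal Python sketch of that rule for a two-class node (the function names and toy data are illustrative, not from the video):

    import math

    def entropy(labels):
        n = len(labels)
        probs = [labels.count(c) / n for c in set(labels)]
        return -sum(p * math.log2(p) for p in probs if p > 0)

    def information_gain(parent, left, right):
        n = len(parent)
        weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        return entropy(parent) - weighted

    parent = ['yes'] * 5 + ['no'] * 5
    # candidate A gives fairly pure children, candidate B leaves them mixed
    split_a = (['yes', 'yes', 'yes', 'yes', 'no'], ['yes', 'no', 'no', 'no', 'no'])
    split_b = (['yes', 'yes', 'no', 'no', 'no'], ['yes', 'yes', 'yes', 'no', 'no'])
    # the tree keeps the candidate with the HIGHEST information gain (here, A)
    best = max([split_a, split_b], key=lambda s: information_gain(parent, *s))
    print(best is split_a)  # True: split A has the larger gain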
@bhargavsai8181
@bhargavsai8181 4 жыл бұрын
This is On point, thank you so much.
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
You are so welcome Bhargav.
@skvali3810
@skvali3810 2 жыл бұрын
I have one question, Aman. At the root node, is the Gini or entropy high or low?
@ykokadwar
@ykokadwar 2 жыл бұрын
Can you help explain the entropy equation intuitively?
@yohanessatria2220
@yohanessatria2220 3 жыл бұрын
So, the only difference between Gini and information gain is the performance speed, right? I assume that with the same decision-making state and data, both Gini and information gain will pick the same best attribute, right? Great video btw!
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
That is correct. Also the internal mathematical formula is different.
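To illustrate the "different formula, usually similar choice" point, here is a small comparison of the two impurity measures for a two-class node (an illustrative sketch; both are 0 for pure nodes and maximal at a 50/50 mix, which is why they usually rank candidate splits the same way):

    import math

    def gini_binary(p):
        return 1 - p**2 - (1 - p)**2

    def entropy_binary(p):
        if p in (0.0, 1.0):
            return 0.0
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

    for p in [0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0]:
        print(f"p={p:.1f}  gini={gini_binary(p):.3f}  entropy={entropy_binary(p):.3f}")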
@response2u
@response2u 2 жыл бұрын
Thank you, sir!
@UnfoldDataScience
@UnfoldDataScience 2 жыл бұрын
Very welcome!
@vishalrai2859
@vishalrai2859 3 жыл бұрын
Thank you so much sir please do some projects
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
Thanks Vishal.
@muhyidinarif9248
@muhyidinarif9248 4 жыл бұрын
thank you so much, this helps me a lot!!!
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
I'm so glad!
@abhijitkunjiraman6899
@abhijitkunjiraman6899 4 жыл бұрын
This is brilliant. Thank you so much!
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
Thanks Abhijit. Keep Watching. Stay Safe!!
@lalitsaini3276
@lalitsaini3276 4 жыл бұрын
Nicely explained....! Subscribed :)
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
Thanks Lalit. So nice of you :)
@melvincotoner4878
@melvincotoner4878 4 жыл бұрын
thanks
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
Welcome.
@karthikganesh4679
@karthikganesh4679 3 жыл бұрын
Sir kindly explain entropy in detail just like the way you presented gini index
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
Sure Karthik. Keep watching.
@frosty2164
@frosty2164 3 жыл бұрын
Which model has low bias and high variance: logistic regression, decision tree, or random forest? Can you please help?
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
Decision tree - high variance, low bias. Logistic regression - high bias, low variance. Random forest - tries to reduce the high variance of the decision tree; bias is low.
@frosty2164
@frosty2164 3 жыл бұрын
@@UnfoldDataScience Thank you very much. Can you also share the reason behind this, or a link where I can understand it?
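A rough empirical way to see the variance claim above, using scikit-learn on synthetic data (an illustrative sketch, not from the video; the train/test accuracy gap is used here as a crude proxy for variance):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for name, model in [("logistic", LogisticRegression(max_iter=1000)),
                        ("tree", DecisionTreeClassifier(random_state=0)),
                        ("forest", RandomForestClassifier(random_state=0))]:
        model.fit(X_tr, y_tr)
        # a large gap between train and test accuracy suggests higher variance
        print(name, model.score(X_tr, y_tr), model.score(X_te, y_te))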
@nalisharathod6098
@nalisharathod6098 4 жыл бұрын
Great Explanation !! very helpful . Thank you :)
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
Glad it was helpful!
@kamran_desu
@kamran_desu 4 жыл бұрын
Very nice explanation and icing on the cake for comparing their performance at the end. Just to confirm, is Gini/IG only for classification? For the regression trees we would use loss functions like sum of squared residuals?
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
That's a good question. Since it is based on class probabilities, it applies to classifiers. For regression trees, we minimize something like SSE or another error measure.
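For the regression case, a minimal scikit-learn sketch (illustrative; in recent versions the squared-error criterion is spelled "squared_error", older versions called it "mse"):

    from sklearn.datasets import make_regression
    from sklearn.tree import DecisionTreeRegressor

    X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
    # squared-error (variance) reduction replaces Gini/entropy for regression trees
    reg = DecisionTreeRegressor(criterion="squared_error", max_depth=4, random_state=0)
    reg.fit(X, y)
    print(reg.score(X, y))  # R^2 on the training data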
@mannankohli
@mannankohli 4 жыл бұрын
@@UnfoldDataScience Hi sir, as per my knowledge "Information Gain" is used when the attributes are categorical in nature. while "Gini Index" is used when attributes are continuous in nature.
@anil90kumar
@anil90kumar 4 жыл бұрын
Good explanation, but a correction is needed. Gini oscillates between 0 and 0.5 (for two classes). The worst split could be half positive, half negative: the Gini impurity for that branch is 0.5, and the overall weighted Gini would also be 0.5. It is entropy that oscillates between 0 and 1.
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
You are right, Anil. This feedback is coming from other viewers as well; maybe I stated this part wrong in the video. I am pinning your comment to the top for everyone's benefit. Thanks again.
@bhagyashreemourya7071
@bhagyashreemourya7071 3 жыл бұрын
I'm a bit confused between Gini and entropy. I mean, is it necessary to use both methods while analyzing, or can we go with just one of them?
@nikhilgupta4859
@nikhilgupta4859 3 жыл бұрын
We have to use only one of them. Which one to choose depends on data.
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
It depends on the case; both are not to be used together.
@umair.ramzan
@umair.ramzan 4 жыл бұрын
I think we select the split with the highest information gain when using entropy. Please correct me if I'm wrong.
@abdobourenane9294
@abdobourenane9294 4 жыл бұрын
You are right. When an internal node is split, the split is performed in such a way that information gain is maximized.
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
Thanks Abdo. Yes, maximum IG is considered for the split. I probably missed including that in the video.
@abdobourenane9294
@abdobourenane9294 4 жыл бұрын
@@UnfoldDataScience You are welcome. i also get some new informations from your video
@nomanshaikhali3355
@nomanshaikhali3355 4 жыл бұрын
For the Titanic dataset, what type of criterion do we have to use??
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
Hi Noman, can't say; we need to try and see which one works better.
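One way to "try and see", assuming scikit-learn and features already encoded as a numeric X, y (an illustrative sketch; the synthetic data below stands in for your own preprocessed dataset):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.tree import DecisionTreeClassifier

    # stand-in for your encoded Titanic features and labels
    X, y = make_classification(n_samples=800, n_features=10, random_state=0)

    param_grid = {"criterion": ["gini", "entropy"], "max_depth": [3, 5, 7, None]}
    search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
    search.fit(X, y)
    print(search.best_params_, search.best_score_)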
@Kumarsashi-qy8xh
@Kumarsashi-qy8xh 5 жыл бұрын
Sir, your explanation really helps me a lot, thank you.
@UnfoldDataScience
@UnfoldDataScience 5 жыл бұрын
You are welcome.
@sahilmehta885
@sahilmehta885 2 жыл бұрын
✌🏻✌🏻
@23ishaan
@23ishaan 4 жыл бұрын
Great video !
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
Thanks for the visit
@chrisamyrotos8313
@chrisamyrotos8313 4 жыл бұрын
Very Good!!!
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
Thank you Chris. happy learning. stay safe. tc
@mx1327
@mx1327 4 жыл бұрын
Does CART go through all the possible numerical values under "loan" to find the best condition? If you have a large amount of data, wouldn't that be very slow?
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
That is a good question, thanks for asking. In general, for a numerical variable, the first split point is chosen randomly and then the point is optimized based on the direction in which the loss function is moving. Please note, the loss in this case is the node purity after the split.
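For reference, a common textbook CART approach for a numeric column is to sort its unique values and evaluate the candidate thresholds between consecutive values, keeping the one with the lowest weighted impurity. A minimal sketch of that standard procedure (illustrative helper names and toy data, not the internals of any particular library):

    def gini(labels):
        n = len(labels)
        return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

    def best_numeric_split(values, labels):
        # candidate thresholds: midpoints between consecutive sorted unique values
        uniq = sorted(set(values))
        best = (None, float("inf"))
        for lo, hi in zip(uniq, uniq[1:]):
            thr = (lo + hi) / 2
            left = [lab for v, lab in zip(values, labels) if v <= thr]
            right = [lab for v, lab in zip(values, labels) if v > thr]
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if weighted < best[1]:
                best = (thr, weighted)
        return best

    loan = [120, 250, 300, 420, 500, 610]
    default = ['no', 'no', 'yes', 'yes', 'yes', 'yes']
    print(best_numeric_split(loan, default))  # -> (275.0, 0.0): the cleanest threshold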
@shivanshjayara6372
@shivanshjayara6372 4 жыл бұрын
Sir, I am confused regarding the selection criteria for the root node. Somewhere I have studied that the feature whose I.G. value is maximum will be selected as the root node, and here you have said that the one whose I.G. is less will be selected as the root node... I am confused.
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
That's a good question. Entropy and IG are related. Understand it like this: entropy should be low and IG should be high. IG from a split = entropy of the parent node - entropy of the child nodes created. The decision tree will try to split in such a way that IG is maximum; in other words, entropy is reduced to the maximum extent. Hope it's clear now.
@shivanshjayara6372
@shivanshjayara6372 4 жыл бұрын
@@UnfoldDataScience Thanks for this response, but is it true that the feature with the maximum gain value will be selected as the root node, and after that splitting takes place based on that root node (feature)? If we get a pure split then no further splitting takes place, but if we get an impure split then splitting will be based on the feature whose gain is the second highest. Is that right?
@shivanshjayara6372
@shivanshjayara6372 4 жыл бұрын
@@UnfoldDataScience And if possible please share your email id. I have a few more questions; I need to send images of some points so that you can help me out.
@PrithivirajSaminathan
@PrithivirajSaminathan 5 жыл бұрын
Buddy, Gini does not lie between 0 and 1; it's entropy that lies between 0 and 1. Gini is always at most 0.5 (for two classes), so it lies between 0 and 0.5.
@UnfoldDataScience
@UnfoldDataScience 5 жыл бұрын
I think yes, Gini lies between 0 to 1. Please help me with more details if you disagree.
@thenewnormal1197
@thenewnormal1197 3 жыл бұрын
How is a root node selected based on gini?
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
All nodes use the same strategy; the split criterion is whatever you set in the criterion parameter.
@bishwajeetsingh8834
@bishwajeetsingh8834 2 жыл бұрын
Which one should we choose? How can I tell by looking at the data whether to use Gini or IG?
@UnfoldDataScience
@UnfoldDataScience 2 жыл бұрын
Can't decide in advance; it's more trial and error (there are some pointers, though).
@ranad2037
@ranad2037 Жыл бұрын
Thanks a lot!
@UnfoldDataScience
@UnfoldDataScience Жыл бұрын
You're welcome!
@rajashekar7679
@rajashekar7679 3 жыл бұрын
Can I use these metrics for multi-class classification?
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
Yes, in a different way.
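For reference, the impurity formulas themselves extend directly to more than two classes; a small illustrative sketch:

    import math

    counts = {'low': 4, 'medium': 3, 'high': 3}        # a 3-class node
    n = sum(counts.values())
    probs = [c / n for c in counts.values()]
    gini = 1 - sum(p ** 2 for p in probs)              # 1 - sum(p_i^2)
    entropy = -sum(p * math.log2(p) for p in probs)    # -sum(p_i * log2(p_i))
    print(gini, entropy)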
@SivaKumar-rv1nn
@SivaKumar-rv1nn 3 жыл бұрын
Thankyou sir
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
Welcome Siva.
@geethanjaliravichandhran8109
@geethanjaliravichandhran8109 3 жыл бұрын
Sir, how does root node selection work if two candidate splits share the same lowest Gini index value?
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
Happens very rarely, Geethanjali.
@subhajitdutta1443
@subhajitdutta1443 2 жыл бұрын
Hello Aman, hope you are well. I have a question; hope you can help me here. If probability (P) = 0, then Gini impurity becomes 1 as per the formula. Then why does it always range from 0 to 0.5? Thank you, Subhajit
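A short worked note on the arithmetic in this question: for a two-class node with P(class 1) = p, the other class has probability 1 - p, so Gini = 1 - p^2 - (1 - p)^2. Plugging in:

    p = 0   : 1 - 0^2  - 1^2  = 0     (a pure node, not an impurity of 1)
    p = 0.5 : 1 - 0.25 - 0.25 = 0.5   (the two-class maximum)
    p = 1   : 1 - 1^2  - 0^2  = 0

This is why the two-class Gini index stays between 0 and 0.5.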
@gmcoy213
@gmcoy213 4 жыл бұрын
So if I am using the C5.0 algorithm, which splitting criterion will be used?
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
Entropy is used for measuring purity.
@rosh70
@rosh70 2 жыл бұрын
Can you show one numerical example using entropy? When the formula starts with a negative sign, how can the value be positive? Just curious.
@kunalshaw2440
@kunalshaw2440 2 жыл бұрын
Because log(x) is negative for x between 0 and 1, the leading minus sign makes the overall value positive. For example, -(0.5*log2(0.5) + 0.5*log2(0.5)) = -(-0.5 - 0.5) = 1.
@ameerhamza-zr5oc
@ameerhamza-zr5oc 3 жыл бұрын
Hello sir can you please tell me about min split and max split in decision tree?
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
Hi Ameer, min split is the minimum number of samples that must exist in a node before a split is attempted. In other words, if the node has two members and the minimum split is set to 5, the node will become terminal, that is, no split will be attempted.
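In scikit-learn the parameter that matches this description is min_samples_split; the closest controls on the "max" side are max_depth and max_leaf_nodes. A minimal illustrative sketch:

    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=8, random_state=0)
    # a node with fewer than 5 samples will not be split further; depth capped at 4
    clf = DecisionTreeClassifier(min_samples_split=5, max_depth=4, random_state=0)
    clf.fit(X, y)
    print(clf.get_depth(), clf.get_n_leaves())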
@ameerhamza-zr5oc
@ameerhamza-zr5oc 3 жыл бұрын
@@UnfoldDataScience thank you sir and sir please make a video on min and max split also in future
@rahuljaiswal141
@rahuljaiswal141 4 жыл бұрын
is there any rule when to use gini index and IG?
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
It is subjective. we need to check with Data.
@tathagatpatil1419
@tathagatpatil1419 3 жыл бұрын
When should we use Gini index and when should we use entropy?
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
It's a tuning parameter based on your data; for some inputs one may work better, for others the other one.
@stevenadiwiguna1995
@stevenadiwiguna1995 4 жыл бұрын
Hi! i want to make sure about gini index. You said that "criteria of the split will be selected based on minimum GINI INDEX from all the possible condition". Is it "gini index" or "weighted gini index"? Thanks a lot tho! Learn a lot from this video!
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
Thanks Steven. "Gini Index".
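For completeness, in the usual CART formulation the quantity compared across candidate splits is the weighted average of the child nodes' Gini values (weighted by the fraction of samples in each child). A small worked example: a split sends 8 samples left (6 yes, 2 no) and 2 samples right (0 yes, 2 no):

    Gini(left)  = 1 - (6/8)^2 - (2/8)^2 = 0.375
    Gini(right) = 1 - 0^2 - 1^2         = 0
    Weighted Gini of the split = (8/10)*0.375 + (2/10)*0 = 0.30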
@MrKhaledpage
@MrKhaledpage 4 жыл бұрын
Thank you, well explained
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
Glad it was helpful!
@samhitagiriprabha6533
@samhitagiriprabha6533 4 жыл бұрын
Awesome Explanation, very sharp! I have 2 questions: 1. Since this algorithm calculates Gini index for ALL splits in EACH column, is this process time-consuming? 2. What if the algorithm finds TWO conditions where GINI Index is 0. Then how does it decide which condition to split on? Thank you in advance!
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
1. It is time-consuming, but internally it does not happen one by one for numerical columns; the algorithm tries to figure out in which direction it should move smartly. For categorical columns it happens one by one and is time-consuming.
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
2. A Gini of 0 means a homogeneous set, hence no further split will happen.
@GopiKumar-ny3xx
@GopiKumar-ny3xx 5 жыл бұрын
Nice presentation.. Keep going....
@UnfoldDataScience
@UnfoldDataScience 5 жыл бұрын
Thanks a lot.
@eramitjangra4660
@eramitjangra4660 4 жыл бұрын
With entropy, splitting would be based on the highest information gain, not the minimum information gain.
@divyad4058
@divyad4058 3 жыл бұрын
Yes. Gini index should be minimum and information gain should be maximum
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
Yes. This comment came before as well. IG has to be maximum. I will move this comment on top for everyone's benefit. Thanks.
@awanishkumar6308
@awanishkumar6308 3 жыл бұрын
But if we have datasets with more columns than in this example, then how do we decide which input column should be split on?
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
Answered.
@tanzeelmohammed9157
@tanzeelmohammed9157 2 жыл бұрын
Sir, range of Gini Index is from 0 to 1 or 0 to 0.5? i am confused
@UnfoldDataScience
@UnfoldDataScience 2 жыл бұрын
see previous comments. we have discussed it.
@nikhildevnani9207
@nikhildevnani9207 3 жыл бұрын
Amazing explanation, Aman. I have one doubt: suppose there are 5 columns (4 independent and 1 target). For splitting I have used columns 1, 2, 4, 3 and another person is using 3, 2, 1, 4. Then on what factors can we decide whether my splits are best or the other person's splits are best?
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
It's the algorithm's decision which columns to use at each split.
@Sagar_Tachtode_777
@Sagar_Tachtode_777 4 жыл бұрын
Thank you for your wonderful explanation. Please make a video on PSI and KS index.
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
Will do soon Sager. Thanks for feedback.
@prernamalik5579
@prernamalik5579 4 жыл бұрын
It was very informative, Sir. Thank you :)
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
Most welcome Prerna.
@prasanthkumar632
@prasanthkumar632 4 жыл бұрын
Aman, Can you please explain entropy also with an example like you did for Gini Index
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
Yes Prasanth, I will try to cover that topic in one of the upcoming video.
@prasanthkumar632
@prasanthkumar632 4 жыл бұрын
Thank you Aman
@satwaki007
@satwaki007 3 жыл бұрын
Where did u study data science ?
@UnfoldDataScience
@UnfoldDataScience 3 жыл бұрын
Answered Satwaki.
@ashishbhatnagar9590
@ashishbhatnagar9590 4 жыл бұрын
Sir, why does a decision tree give good accuracy on an imbalanced dataset compared to logistic regression?
@UnfoldDataScience
@UnfoldDataScience 4 жыл бұрын
Good question Ashish. It is because there is no mathematical equation involved in a decision tree, hence learning happens purely on rules.