There is a mistake in your video: you said to choose the attribute that has less information gain, but actually we have to choose the one that has the highest information gain...
@UnfoldDataScience3 жыл бұрын
Yes Naat, thanks for pointing that out. I have pinned the comments related to it on the video for everyone's benefit.
@islamicinterestofficial3 жыл бұрын
@@UnfoldDataScience Pleasure sir
@nikhilgupta48593 жыл бұрын
If you are saying that we have to choose high information gain, then as per the video we should take the impure node. For a pure node, Gini would come out to 0 and hence 0 IG. Isn't something wrong?
@DK-il7ql3 жыл бұрын
At what time was that said and corrected?
@RaviSingh-xx2wq3 жыл бұрын
@@DK-il7ql At 10:37 he said low information gain by mistake instead of high information gain.
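To clarify the pure-node point above (a worked illustration added here, not a quote from the video): information gain is computed for a split, as the parent's entropy minus the weighted entropy of its children, so a split that produces pure children gives the largest gain rather than zero. For a balanced parent of 10 yes / 10 no split into two pure children:

```latex
H_{\text{parent}} = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1, \qquad H_{\text{left}} = H_{\text{right}} = 0
IG = H_{\text{parent}} - \left(\tfrac{10}{20}\,H_{\text{left}} + \tfrac{10}{20}\,H_{\text{right}}\right) = 1 - 0 = 1
```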
@ahmedalqershi12454 жыл бұрын
I usually don't like commenting on YouTube videos, but for this one I felt like I had to show appreciation, because this video was truly extremely helpful. University professors spend hours explaining what you just explained in 11 minutes. And you are the winner. Perfect explanation. Thank you so much!!!!
@UnfoldDataScience4 жыл бұрын
I appreciate it Ahmed. Your comments motivate me :)
@malavikadutta10114 жыл бұрын
Institutes spend two hours explaining these two concepts and you made them clear in a few minutes. Excellent explanation.
@UnfoldDataScience4 жыл бұрын
Thanks a lot :)
@sandipansarkar92113 жыл бұрын
@@UnfoldDataScience I agree
@akhilgangavarapu97284 жыл бұрын
If i feel any concept is hard to understand, first thing i do is search for your videos. Very intuitive and easy to understand. Thank you so much!
@UnfoldDataScience4 жыл бұрын
Your comments are my motivation Akhil. Thanks a lot. Happy learning. Tc
@jehanbhathena62703 жыл бұрын
This has become my favourite channel for ML/Data Science topics,thank you very much for sharing your knowledge
@UnfoldDataScience3 жыл бұрын
Thanks Jehan, your words are my motivation.
@indrajithvasudevan81993 жыл бұрын
Best channel to learn ML and Data science concepts. Thank you sir
@UnfoldDataScience3 жыл бұрын
Thanks Indrajit. Kindly share the video within data science groups if possible.
@__anonymous__4533 Жыл бұрын
I have an assignment due tomorrow and this helped a lot!
@shyampratapsingh48784 жыл бұрын
The simplest and best explanation so far.
@UnfoldDataScience4 жыл бұрын
Glad it was helpful Shyam.
@yhbarve2 ай бұрын
Thanks for the explanation. It's made the concept a lot clearer for me.
@travelbearmama4 жыл бұрын
With your clear explanation, I finally understand what Gini index is. Thank you so much!
@UnfoldDataScience4 жыл бұрын
You are welcome. happy learning. Stay Safe!!
@hassangharbi36873 жыл бұрын
Very good and clear. I'm a French speaker and I understood almost everything.
@UnfoldDataScience3 жыл бұрын
Thanks Hassan.
@vishesh_soni3 жыл бұрын
Your first video that I came across. Subscribed!
@UnfoldDataScience3 жыл бұрын
Thanks Vishesh.
@Guidussify7 ай бұрын
Excellent, to the point, good examples. Great work!
@KASHOKKUMARgnitcECE Жыл бұрын
Thanks bro... explained in an easy manner...
@valor36az3 жыл бұрын
I just discovered this channel what a gem
@UnfoldDataScience3 жыл бұрын
Thanks a lot. please share with others in various data science groups as well.
@zainahmed65024 жыл бұрын
Wow! Not only was your explanation amazing but you also answered every single comment! True dedication. Keep it up!
@UnfoldDataScience4 жыл бұрын
Thanks a ton Zain.
@priyankabachhav53152 жыл бұрын
Thank you so much sir. Before watching this video I had watched 4 videos related to impurity, but everyone mixes up entropy and impurity, and it was not really clear what exactly the formula is or how it works. After watching your video it is totally clear now. Thank you for this beautiful and clear explanation.
@UnfoldDataScience2 жыл бұрын
Glad you understood
@Pesions4 жыл бұрын
You have really good explanation skills, thank you man. I finally understand it.
@UnfoldDataScience4 жыл бұрын
Most Welcome :)
@joeycopperson Жыл бұрын
thanks for clear and easy explanation
@indronilbhattacharjee27883 жыл бұрын
finally i am getting some clear explanations for various concepts
@UnfoldDataScience3 жыл бұрын
thanks Indra.
@9495tj3 жыл бұрын
Awesome video.. Thank You so much!
@UnfoldDataScience3 жыл бұрын
Thank you.
@alexandre520458 ай бұрын
Thanks for the video! It was really clear and well executed. It would have been great to detail the entropy calculation though; I find it a bit elusive without an example.
@shubhangiagrawal3364 жыл бұрын
very well explained
@UnfoldDataScience4 жыл бұрын
Thanks for watching Subhangi.
@zerdaloo4 ай бұрын
Brilliant video sir
@datafuturelab_ssb44333 жыл бұрын
Great explanation. I have a question: can the Gini index be negative?
@UnfoldDataScience3 жыл бұрын
Hi, no, it cannot be.
@ARJUN-op2dh3 жыл бұрын
Simple & clear
@UnfoldDataScience3 жыл бұрын
Thanks a lot.
@seanpeng124 жыл бұрын
Your explanation is awesome, thanks.
@UnfoldDataScience4 жыл бұрын
Thanks a lot for your valuable feedback.
@ece7700 Жыл бұрын
thank you so much
@deepikanadarajan34073 жыл бұрын
Very clear explanation and very helpful.
@UnfoldDataScience3 жыл бұрын
Glad it was helpful Deepika.
@RaviSingh-xx2wq3 жыл бұрын
Amazing explanation
@UnfoldDataScience3 жыл бұрын
Thanks Ravi.
@jarrelldunson4 жыл бұрын
Thank you
@UnfoldDataScience4 жыл бұрын
Welcome Jarrell.
@abhishekraturi4 жыл бұрын
Just to make it clear, the Gini index ranges from 0 to 0.5 (for binary classification) and not 0 to 1. Jump to the video at 7:10.
@UnfoldDataScience4 жыл бұрын
Yes, this is a common comment from many users. You are right, Abhishek.
@awanishkumar63083 жыл бұрын
I appreciate your concepts for Gini and Entropy
@UnfoldDataScience3 жыл бұрын
Thanks Awanish.
@reviewsfromthe600253 жыл бұрын
Great video
@UnfoldDataScience3 жыл бұрын
Thanks a lot.
@Kumarsashi-qy8xh3 жыл бұрын
You are doing a great job sir.
@UnfoldDataScience3 жыл бұрын
Thanks a lot.
@zuzulorentzen8653 Жыл бұрын
Thanks man
@mavaamusicmachine22412 жыл бұрын
Thank you for this video, very helpful.
@Shonashoni12 жыл бұрын
Amazing explanation sir
@fromthenorthfromthenorth82244 жыл бұрын
Thanks for this clear and well-explained Gini index video.... Thanks....
@UnfoldDataScience4 жыл бұрын
Glad it was helpful!
@arindamn4880Ай бұрын
Is it possible to derive the entropy formula?
@OverConfidenceGamingYT3 жыл бұрын
Thank you ❣️
@UnfoldDataScience3 жыл бұрын
Welcome.
@onurrrrr772 ай бұрын
My question would be: does the splitting algorithm also consider feature combinations, besides splits on single features? For example, id > 2 & loan > 250 in a single step?
@kunaldhuria39354 жыл бұрын
short simple and sweet, thank you so much
@UnfoldDataScience4 жыл бұрын
You're welcome Kunal.
@sadhnarai87574 жыл бұрын
Great content.
@UnfoldDataScience4 жыл бұрын
Thank you.
@adityasrivastava78 Жыл бұрын
Good teaching
@UnfoldDataScience Жыл бұрын
Keep watching
@yyndsai2 жыл бұрын
Thank you, no one could have done better
@UnfoldDataScience2 жыл бұрын
Your comments mean a lot to me.
@dracula55054 жыл бұрын
Do we have to calculate both Gini and entropy to figure out which is best for the dataset??
@UnfoldDataScience4 жыл бұрын
Only one at a time.
@eiderdiaz72194 жыл бұрын
love it, very clear explanation
@UnfoldDataScience4 жыл бұрын
Thanks Eider. Happy learning. Tc
@sandipansarkar92113 жыл бұрын
great explanation
@UnfoldDataScience3 жыл бұрын
Glad it was helpful!
@abhinai27134 жыл бұрын
@10:38 where the information gain is high, that's where we try to split the node, right??
@UnfoldDataScience4 жыл бұрын
That is a good question. The formula you see @10:38 is for the entropy of a node. Information gain for a split = entropy of the parent node - weighted entropy of the child nodes after the split. The decision tree splits at the place where the information gain is highest. In other words, the decision tree splits where entropy is reduced to the largest extent.
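A minimal sketch of that calculation in Python (an added illustration; the helper functions and class counts are made up for demonstration, not taken from the video):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical example: a split that separates the classes well has high gain.
parent = ["yes"] * 6 + ["no"] * 4
left = ["yes"] * 5 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 3
print(information_gain(parent, [left, right]))  # ~0.26 bits
```

The tree-growing algorithm evaluates this quantity for every candidate split and keeps the one with the highest gain.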
@bhargavsai81814 жыл бұрын
This is On point, thank you so much.
@UnfoldDataScience4 жыл бұрын
You are so welcome Bhargav.
@skvali38102 жыл бұрын
I have one question Aman. At the root node, is the Gini or entropy high or low?
@ykokadwar2 жыл бұрын
Can you help explain the entropy equation intuitively?
@yohanessatria22203 жыл бұрын
So, the only difference between Gini and information gain is the performance speed, right? I assume that with the same decision-making state and data, both Gini and information gain will pick the same best attribute, right? Great video btw!
@UnfoldDataScience3 жыл бұрын
That is correct. Also, the internal mathematical formula is different.
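For reference, the two impurity measures side by side for a binary node with class probability p (an added illustration; the small helper functions below are just for demonstration):

```python
from math import log2

def gini(p):
    """Gini impurity of a binary node: 1 - p^2 - (1 - p)^2 = 2p(1 - p)."""
    return 1 - p**2 - (1 - p)**2

def entropy(p):
    """Entropy of a binary node, in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

for p in (0.0, 0.1, 0.3, 0.5):
    print(f"p={p:.1f}  gini={gini(p):.3f}  entropy={entropy(p):.3f}")
# Both are 0 for a pure node and peak at p = 0.5 (Gini 0.5, entropy 1.0).
# Entropy needs a logarithm, which is why Gini is usually slightly faster to compute.
```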
@response2u2 жыл бұрын
Thank you, sir!
@UnfoldDataScience2 жыл бұрын
Very welcome!
@vishalrai28593 жыл бұрын
Thank you so much sir please do some projects
@UnfoldDataScience3 жыл бұрын
Thanks Vishal.
@muhyidinarif92484 жыл бұрын
thank you so much, this helps me a lot!!!
@UnfoldDataScience4 жыл бұрын
I'm so glad!
@abhijitkunjiraman68994 жыл бұрын
This is brilliant. Thank you so much!
@UnfoldDataScience4 жыл бұрын
Thanks Abhijit. Keep Watching. Stay Safe!!
@lalitsaini32764 жыл бұрын
Nicely explained....! Subscribed :)
@UnfoldDataScience4 жыл бұрын
Thanks Lalit. So nice of you :)
@melvincotoner48784 жыл бұрын
thanks
@UnfoldDataScience4 жыл бұрын
Welcome.
@karthikganesh46793 жыл бұрын
Sir, kindly explain entropy in detail, just like the way you presented the Gini index.
@UnfoldDataScience3 жыл бұрын
Sure Karthik. Keep watching.
@frosty21643 жыл бұрын
Which model has less bias and high variance: logistic regression, decision tree, or random forest? Can you please help?
@UnfoldDataScience3 жыл бұрын
Decision tree - high variance, low bias. Logistic regression - high bias, low variance. Random forest - tries to reduce the high variance of the decision tree; bias is low.
@frosty21643 жыл бұрын
@@UnfoldDataScience Thank you very much. Can you also share the reason behind this, or a link where I can understand it?
@nalisharathod60984 жыл бұрын
Great Explanation !! very helpful . Thank you :)
@UnfoldDataScience4 жыл бұрын
Glad it was helpful!
@kamran_desu4 жыл бұрын
Very nice explanation, and comparing their performance at the end was the icing on the cake. Just to confirm, is Gini/IG only for classification? For regression trees would we use loss functions like the sum of squared residuals?
@UnfoldDataScience4 жыл бұрын
That's a good question. Since it's based on probabilities, it is applicable to classifiers. For regression, we minimize something like SSE or another error measure.
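A small sketch of the regression counterpart mentioned here (illustrative code with made-up numbers and hypothetical helper names): a regression tree scores a candidate split by how much it reduces the sum of squared errors around the child means.

```python
def sse(values):
    """Sum of squared errors of a node's target values around their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def sse_reduction(parent, left, right):
    """Reduction in squared error from a candidate split (higher is better)."""
    return sse(parent) - (sse(left) + sse(right))

# Hypothetical target values for a regression node and one candidate split.
parent = [10, 12, 11, 30, 32, 31]
left, right = [10, 12, 11], [30, 32, 31]
print(sse_reduction(parent, left, right))  # 600.0 -> a large reduction, good split
```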
@mannankohli4 жыл бұрын
@@UnfoldDataScience Hi sir, as per my knowledge, "Information Gain" is used when the attributes are categorical in nature, while "Gini Index" is used when the attributes are continuous in nature.
@anil90kumar4 жыл бұрын
Good explanation, but a correction is needed. Gini oscillates between 0 and 0.5. The worst split could be half positive, half negative; the Gini impurity for that wing is 0.5, and the overall weighted Gini would also be 0.5. It is entropy that oscillates between 0 and 1.
@UnfoldDataScience4 жыл бұрын
You are right, Anil. This feedback is coming from other viewers as well; maybe I stated this part wrong in the video. I am pinning your comment to the top for everyone's benefit. Thanks again.
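For completeness, here are the binary-class maxima behind this correction (a short derivation added for reference, not from the video):

```latex
G(p) = 1 - p^2 - (1-p)^2 = 2p(1-p), \qquad \max_{p \in [0,1]} G(p) = G\!\left(\tfrac{1}{2}\right) = \tfrac{1}{2}
H(p) = -p\log_2 p - (1-p)\log_2(1-p), \qquad \max_{p \in [0,1]} H(p) = H\!\left(\tfrac{1}{2}\right) = 1
```

With k classes the Gini maximum is 1 - 1/k, so it only approaches 1 when there are many classes; for the binary case in the video, 0.5 is the ceiling.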
@bhagyashreemourya70713 жыл бұрын
I'm a bit confused between Gini and entropy. I mean, is it necessary to use both methods while analyzing, or can we go for just one of them?
@nikhilgupta48593 жыл бұрын
We have to use only one of them. Which one to choose depends on data.
@UnfoldDataScience3 жыл бұрын
It depends on the case; both are not to be used together.
@umair.ramzan4 жыл бұрын
I think we select the split with the highest information gain when using entropy. Please correct me if I'm wrong.
@abdobourenane92944 жыл бұрын
You are right. When an internal node is split, the split is performed in such a way that information gain is maximized.
@UnfoldDataScience4 жыл бұрын
Thanks Abdo. Yes, the maximum IG is considered for the split. I probably missed including that in the video.
@abdobourenane92944 жыл бұрын
@@UnfoldDataScience You are welcome. I also got some new information from your video.
@nomanshaikhali33554 жыл бұрын
For the Titanic dataset, what type of criterion do we have to use??
@UnfoldDataScience4 жыл бұрын
Hi Noman, can't say; we need to try and see which one works better.
@Kumarsashi-qy8xh5 жыл бұрын
Sir, your explanation really helps me very much, thank you.
@UnfoldDataScience5 жыл бұрын
You are welcome.
@sahilmehta8852 жыл бұрын
✌🏻✌🏻
@23ishaan4 жыл бұрын
Great video !
@UnfoldDataScience4 жыл бұрын
Thanks for the visit
@chrisamyrotos83134 жыл бұрын
Very Good!!!
@UnfoldDataScience4 жыл бұрын
Thank you Chris. happy learning. stay safe. tc
@mx13274 жыл бұрын
Does CART go through all the possible numerical values under loan to find the best condition? If you have a large amount of data, wouldn't it be very slow?
@UnfoldDataScience4 жыл бұрын
That is a good question. Thanks for asking. In general, for a numerical variable, a first split point is chosen randomly and then the point is optimized based on the direction in which the loss function is moving. Please note, the loss in this case is the node impurity after the split.
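For reference, a common way libraries handle a numeric column such as loan (for example, scikit-learn's default "best" splitter) is to sort the values and evaluate candidate thresholds between consecutive distinct values, keeping the one with the lowest weighted impurity. A minimal sketch with hypothetical data and helper names:

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(values, labels):
    """Try midpoints between sorted distinct values; return (threshold, weighted Gini)."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold exists between equal feature values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= thr]
        right = [lab for v, lab in pairs if v > thr]
        score = len(left) / len(pairs) * gini(left) + len(right) / len(pairs) * gini(right)
        if score < best[1]:
            best = (thr, score)
    return best

# Hypothetical loan amounts and default labels.
print(best_threshold([100, 150, 200, 250, 300], ["no", "no", "no", "yes", "yes"]))
# -> (225.0, 0.0): both children are pure, so the weighted Gini is 0
```

Sorting once per feature keeps this reasonably fast even on large datasets.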
@shivanshjayara63724 жыл бұрын
Sir, I am confused regarding the selection criteria for the root node. Somewhere I have studied that the feature whose I.G. value is maximum will be selected as the root node, and here you have said that the one whose I.G. is less will be selected as the root node. I am confused.
@UnfoldDataScience4 жыл бұрын
That's a good question. Entropy and IG are related. Understand it like this: entropy should be low and IG should be high. IG from a split = entropy of the parent node - weighted entropy of the child nodes created. The decision tree will try to split in such a way that IG is maximum; in other words, entropy is reduced to the maximum extent. Hope it's clear now.
@shivanshjayara63724 жыл бұрын
@@UnfoldDataScience Thanks for this response, but is it true that the feature whose gain value is maximum will be selected as the root node, and after that splitting takes place based on that root node (feature)? If we get a pure split then no further splitting takes place, but if we get an impure split then splitting happens based on the feature whose gain is second highest among those features. Is that how it works?
@shivanshjayara63724 жыл бұрын
@@UnfoldDataScience And if possible, please give me your email ID. I have a few more questions. I need to send images of some points so that you can help me out.
@PrithivirajSaminathan5 жыл бұрын
Buddy, Gini does not lie between 0 and 1; it's entropy that lies between 0 and 1. For binary classification, Gini is at most 0.5, so it always lies between 0 and 0.5.
@UnfoldDataScience5 жыл бұрын
I think yes, Gini lies between 0 to 1. Please help me with more details if you disagree.
@thenewnormal11973 жыл бұрын
How is the root node selected based on Gini?
@UnfoldDataScience3 жыл бұрын
All nodes use the same strategy; you just set criteria = "your parameter".
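In scikit-learn, for example, this is the criterion parameter, and the same impurity measure is applied at the root and at every later node (a minimal sketch on a toy dataset, assuming scikit-learn is what is being referred to):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# The chosen criterion ("gini" or "entropy") drives every split, including the root.
for criterion in ("gini", "entropy"):
    tree = DecisionTreeClassifier(criterion=criterion, max_depth=3, random_state=0)
    tree.fit(X, y)
    print(criterion, tree.score(X, y))
```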
@bishwajeetsingh88342 жыл бұрын
Which one to choose? By looking at the data, how can I decide whether to use Gini or IG?
@UnfoldDataScience2 жыл бұрын
Can't decide in advance; it's more of a trial-and-error thing (there are some rough guidelines, though).
@ranad2037 Жыл бұрын
Thanks a lot!
@UnfoldDataScience Жыл бұрын
You're welcome!
@rajashekar76793 жыл бұрын
Can I use these metrics for multi-class classification?
@UnfoldDataScience3 жыл бұрын
Yes, in a different way.
@SivaKumar-rv1nn3 жыл бұрын
Thank you sir.
@UnfoldDataScience3 жыл бұрын
Welcome Siva.
@geethanjaliravichandhran81093 жыл бұрын
Well sir, how does root node selection occur if two attributes share the same (lowest) Gini index value?
@UnfoldDataScience3 жыл бұрын
Happens very rarely, Geethanjali.
@subhajitdutta14432 жыл бұрын
Hello Aman, hope you are well. I have a question; hope you can help me here. If a probability (P) = 0, then the Gini impurity becomes 1 as per the formula. Then why does it always range from 0 to 0.5? Thank you, Subhajit
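A quick check of the formula may help here (a worked illustration added for reference, not an official reply): with two classes, if one class has probability 0 then the other has probability 1, so

```latex
\text{Gini} = 1 - \sum_i p_i^2 = 1 - (0^2 + 1^2) = 0
```

A probability of 0 therefore gives a pure node with Gini 0, and for two classes the maximum of 0.5 occurs at p = 0.5.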
@gmcoy2134 жыл бұрын
So what if I am using the C5.0 algorithm? Which splitting criterion will be used?
@UnfoldDataScience4 жыл бұрын
Entropy is used for measuring purity.
@rosh702 жыл бұрын
Can you show one numerical example using entropy? When the formula starts with a negative sign, how can the value be positive? Just curious.
@kunalshaw24402 жыл бұрын
Because log(x) is negative (or zero) for 0 < x <= 1, the leading minus sign makes the overall value positive.
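For instance, a numerical check for a node with 3 positives and 1 negative (an added illustration): both probabilities are below 1, so each log term is negative and the leading minus sign flips the result positive.

```latex
H = -\tfrac{3}{4}\log_2\tfrac{3}{4} - \tfrac{1}{4}\log_2\tfrac{1}{4} \approx -0.75(-0.415) - 0.25(-2) = 0.311 + 0.5 = 0.811
```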
@ameerhamza-zr5oc3 жыл бұрын
Hello sir, can you please tell me about min split and max split in a decision tree?
@UnfoldDataScience3 жыл бұрын
Hi Ameer, min split is the minimum number of samples that must exist in a node before a split is attempted. In other words, if the node has two members and the minimum split is set to 5, the node will become terminal; that is, no split will be attempted.
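In scikit-learn terms this is the min_samples_split parameter (a small illustrative sketch with made-up data, assuming that is the setting being described): a node with fewer samples than the setting becomes a leaf.

```python
from sklearn.tree import DecisionTreeClassifier

# Tiny hypothetical dataset: only 4 samples, so with min_samples_split=5 the root cannot split.
X = [[100], [150], [250], [300]]
y = [0, 0, 1, 1]

tree = DecisionTreeClassifier(min_samples_split=5).fit(X, y)
print(tree.get_n_leaves())  # 1 -> the root became a terminal (leaf) node
```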
@ameerhamza-zr5oc3 жыл бұрын
@@UnfoldDataScience Thank you sir, and please also make a video on min and max split in the future.
@rahuljaiswal1414 жыл бұрын
Is there any rule for when to use the Gini index and when to use IG?
@UnfoldDataScience4 жыл бұрын
It is subjective; we need to check with the data.
@tathagatpatil14193 жыл бұрын
When should we use the Gini index and when should we use entropy?
@UnfoldDataScience3 жыл бұрын
It's a question of tuning the parameter based on your data; for some inputs one may work better than the other.
@stevenadiwiguna19954 жыл бұрын
Hi! I want to make sure about the Gini index. You said that the "criteria of the split will be selected based on the minimum GINI INDEX from all the possible conditions". Is it the "Gini index" or the "weighted Gini index"? Thanks a lot though! Learned a lot from this video!
@UnfoldDataScience4 жыл бұрын
Thanks Steven. It is the weighted Gini index of the resulting child nodes; the split with the lowest weighted Gini is chosen.
@MrKhaledpage4 жыл бұрын
Thank you, well explained
@UnfoldDataScience4 жыл бұрын
Glad it was helpful!
@samhitagiriprabha65334 жыл бұрын
Awesome Explanation, very sharp! I have 2 questions: 1. Since this algorithm calculates Gini index for ALL splits in EACH column, is this process time-consuming? 2. What if the algorithm finds TWO conditions where GINI Index is 0. Then how does it decide which condition to split on? Thank you in advance!
@UnfoldDataScience4 жыл бұрын
1. It is time-consuming, but internally it does not happen one by one for numerical columns; the algorithm tries to figure out smartly in which direction it should move. For categorical columns it happens one by one and is time-consuming.
@UnfoldDataScience4 жыл бұрын
2. A Gini index of 0 means a homogeneous set, hence no further split will happen there.
@GopiKumar-ny3xx5 жыл бұрын
Nice presentation.. Keep going....
@UnfoldDataScience5 жыл бұрын
Thanks a lot.
@eramitjangra46604 жыл бұрын
With entropy, splitting would be based on the highest information gain, not the minimum information gain.
@divyad40583 жыл бұрын
Yes. Gini index should be minimum and information gain should be maximum
@UnfoldDataScience3 жыл бұрын
Yes, this comment came up before as well. IG has to be maximum. I will move this comment to the top for everyone's benefit. Thanks.
@awanishkumar63083 жыл бұрын
But if we have datasets with more columns than in this example, how do we decide which input column should be split on?
@UnfoldDataScience3 жыл бұрын
Answered.
@tanzeelmohammed91572 жыл бұрын
Sir, is the range of the Gini index from 0 to 1 or from 0 to 0.5? I am confused.
@UnfoldDataScience2 жыл бұрын
See the previous comments; we have discussed it.
@nikhildevnani92073 жыл бұрын
Amazing explanation Aman. I have one doubt: suppose there are 5 columns (4 independent and 1 target). For splitting I have used columns 1, 2, 4, 3 and another person is using 3, 2, 1, 4. Then on what factors can we decide whether my splits are best or the other person's splits are best?
@UnfoldDataScience3 жыл бұрын
It's the algorithm's decision which columns to use.
@Sagar_Tachtode_7774 жыл бұрын
Thank you for your wonderful explanation. Please make a video on PSI and KS index.
@UnfoldDataScience4 жыл бұрын
Will do soon, Sagar. Thanks for the feedback.
@prernamalik55794 жыл бұрын
It was very informative, Sir. Thank you :)
@UnfoldDataScience4 жыл бұрын
Most welcome Prerna.
@prasanthkumar6324 жыл бұрын
Aman, can you please explain entropy also with an example, like you did for the Gini index?
@UnfoldDataScience4 жыл бұрын
Yes Prasanth, I will try to cover that topic in one of the upcoming videos.
@prasanthkumar6324 жыл бұрын
Thank you Aman
@satwaki0073 жыл бұрын
Where did you study data science?
@UnfoldDataScience3 жыл бұрын
Answered Satwaki.
@ashishbhatnagar95904 жыл бұрын
Sir, why does a decision tree give good accuracy on an imbalanced dataset compared to logistic regression?
@UnfoldDataScience4 жыл бұрын
Good question Ashish. It is because there is no mathematical equation involved in a decision tree, so learning happens purely through rules.