Machine Learning Tutorial Python - 9 Decision Tree

  520,721 views

codebasics

1 day ago

Comments: 1,000
@codebasics 2 years ago
Check out our premium machine learning course with 2 industry projects: codebasics.io/courses/machine-learning-for-data-science-beginners-to-advanced
@honeymilongton8401 2 years ago
It would be better for us if you could provide the slides as well, sir. Can you please send them?
@adiflorense1477 1 year ago
Cool
@kisholoymukherjee 1 year ago
Hi Dhaval sir, please note that I tried to register for the Python course, but the link is not working on the site.
@Swormy097 11 months ago
@codebasics Hello Sir, regarding the encoding approach (label encoding) used in the video: I read in the sklearn documentation that LabelEncoder should be used only on the target variable (output "y") and not on the input features ("X"). The documentation states that for input features one should use OneHotEncoder, OrdinalEncoder, or dummy variable encoding. Also, I was expecting you to use OneHotEncoder (OHE), since the input features (company, job and degree) are nominal and not ordinal variables. Is it best practice to use OHE for nominal variables, or does it just not matter? Could you please clarify? Thank you.
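To make the question above concrete, here is a minimal sketch of the three encoders side by side. It assumes a small made-up table that only borrows the video's column names (company, job, degree); the rows and the target values are invented for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder

# Made-up rows that only borrow the column names used in the video.
X = pd.DataFrame({
    "company": ["google", "abc pharma", "facebook", "google"],
    "job": ["sales executive", "computer programmer",
            "business manager", "computer programmer"],
    "degree": ["bachelors", "masters", "bachelors", "masters"],
})
y = pd.Series(["no", "yes", "yes", "no"])

# LabelEncoder works on a single 1-D array, which suits the target.
y_encoded = LabelEncoder().fit_transform(y)

# OrdinalEncoder takes the whole 2-D table: one integer column per input column.
X_ordinal = OrdinalEncoder().fit_transform(X)

# OneHotEncoder: one 0/1 column per category (sparse by default, so densify).
X_onehot = OneHotEncoder(handle_unknown="ignore").fit_transform(X).toarray()

print(y_encoded)        # [0 1 1 0]
print(X_ordinal.shape)  # (4, 3)
print(X_onehot.shape)   # (4, 8): 3 companies + 3 jobs + 2 degrees
```

OrdinalEncoder and LabelEncoder both produce integer codes; the practical difference is that OrdinalEncoder handles several feature columns in one call and remembers a separate mapping for each.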
@Koome777 9 months ago
My model got a score of 98.6%. I dropped all the rows with missing Age values, which reduced the sample size from 812 to 714. I label-encoded the Sex column and then used a test size of 0.2, with the remaining 0.8 as the training size. I am all smiles. Thanks @codebasics
@ansh6848 2 years ago
Actually this man has made learning Machine Learning easy for everyone, whereas other channels show big mathematical equations and formulas, which makes beginners uncomfortable learning ML. Thanks to this channel.♥️🥰
@bhawnaverma5532 2 years ago
Very true. Complex concepts explained in a very understandable way. Hats off, really.
@kartikeyamishra4641 5 years ago
This is by far the most straightforward and amazing video on decision trees I have come across! Keep making more videos Sir! I am totally hooked on your channel :) :)
@codebasics 5 years ago
Thanks Kartikeya for your valuable feedback. 👍
@anitoons999 7 days ago
I completed the Titanic survival model and the accuracy (score) of my model is 0.987. I had a lot of fun creating the model, and it's a very interesting one. Thank you, sir😊.
@proplayerzone5122 2 years ago
Hi sir, I am a 10th grade student learning ML, and in the exercise my model got 81% accuracy😀. I will make many models while learning and share them with you. Thanks for the tutorials, sir.
@codebasics 2 years ago
It is OK to learn ML, but make sure you find time for outdoor activities, sports and some fun things. Childhood will never come back, so do not waste it in search of some shiny career. If you are that concerned, I would advise focusing on math and statistics at this stage and worrying about ML later.
@proplayerzone5122 2 years ago
@codebasics OK sir. Thanks for the guidance!
@kalaipradeep2753 11 months ago
Hi bro, what are you doing now?
@kalaipradeep2753 11 months ago
How to fill empty values in the Age feature?
@toxiclegacy5948 7 months ago
@codebasics Absolutely correct. It's great to learn new things, but this is not the right age to learn all this. Make more and more memories in childhood. I am 23 and trust me, life is very painful…
@nikhilrana668 3 years ago
For those wondering what 'information gain' is, it is just the measure of the decrease in entropy after the dataset is split.
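That definition can be written out in a few lines. This is a minimal sketch using only the standard library; the function names and the toy labels are made up for illustration, and entropy is measured in bits.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child groups."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical node with 8 rows, 4 of each class.
parent = [1, 1, 1, 1, 0, 0, 0, 0]
perfect_split = [[1, 1, 1, 1], [0, 0, 0, 0]]  # each child is pure
useless_split = [[1, 1, 0, 0], [1, 1, 0, 0]]  # children look like the parent

print(information_gain(parent, perfect_split))  # 1.0
print(information_gain(parent, useless_split))  # 0.0
```

A split that leaves every child pure recovers the full entropy of the parent as gain, while a split that leaves the class mix unchanged gains nothing.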
@g.scholtes 2 years ago
In cell In [8] you use the "le_company" LabelEncoder object 3 times and never use the "le_job" and "le_degree" objects. It still works, so my guess would be that you only need one LabelEncoder object to do the job.
@rajubhatt2 2 years ago
LabelEncoder basically converts categorical values to numerical ones. Since job and degree are categorical, you still need them to be label-encoded, and he did that; look carefully at the fit_transform() calls.
@omdusane8685 1 year ago
@rajubhatt2 He encoded them using only the company object, though.
@AkhileshKumar-mg9vs 1 year ago
Well, here it worked because sir used fit_transform. But if he had split the data into train and test sets, he would have had to call transform on the test set, and for that a separate encoder instance would be required for each column.
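The point about needing one encoder per column once there is a separate test set can be checked with a short sketch. The two tiny DataFrames are made up; only the column names come from the video.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up training and test rows using the video's column names.
train = pd.DataFrame({"company": ["google", "facebook", "abc pharma"],
                      "degree": ["bachelors", "masters", "bachelors"]})
test = pd.DataFrame({"company": ["facebook"], "degree": ["masters"]})

# Reusing one encoder: every fit_transform call refits it, so only the
# categories of the last column are remembered.
shared = LabelEncoder()
shared.fit_transform(train["company"])
shared.fit_transform(train["degree"])
print(list(shared.classes_))  # ['bachelors', 'masters']

# One encoder per column: each keeps its own mapping for a later transform().
encoders = {col: LabelEncoder().fit(train[col]) for col in train.columns}
test_encoded = pd.DataFrame(
    {col: encoders[col].transform(test[col]) for col in test.columns}
)
print(test_encoded.values.tolist())  # [[1, 1]]
```

With the shared encoder, calling transform on the test set's company column would fail, because the company names are no longer among its known classes.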
@PAWANKELA-rh7yj 3 months ago
When I use only one object, the first 2 rows are dropped from my dataset. Why?
@nnasirhussain 1 year ago
Excellent tutorial. In the exercise I used three different methods to fill Age: 1. Backward fill, forward fill, and the median of Age. 2. The median age of surviving females to fill the missing ages of surviving females, the median age of surviving males for surviving males, and the same for non-survivors. 3. The interpolate method. Using train_test_split with a test size of 0.3, I get a maximum of 82% accuracy. I also changed gini to entropy for each approach.
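Several of the fill strategies mentioned in this thread map directly onto pandas methods. A minimal sketch on a made-up Series (the real exercise would use the Titanic Age column):

```python
import numpy as np
import pandas as pd

# Made-up ages with gaps; stands in for the Titanic "Age" column.
age = pd.Series([22.0, np.nan, 26.0, np.nan, np.nan, 40.0])

filled_ffill = age.ffill()                 # carry the previous value forward
filled_bfill = age.bfill()                 # carry the next value backward
filled_median = age.fillna(age.median())   # median of the known ages (26.0)
filled_interp = age.interpolate()          # linear interpolation between neighbours

print(filled_ffill.tolist())   # [22.0, 22.0, 26.0, 26.0, 26.0, 40.0]
print(filled_bfill.tolist())   # [22.0, 26.0, 26.0, 40.0, 40.0, 40.0]
print(filled_median.tolist())  # [22.0, 26.0, 26.0, 26.0, 26.0, 40.0]
print(filled_interp.tolist())
```

Forward fill leaves a leading gap unfilled and backward fill leaves a trailing one, so on real data it can be worth checking isna().sum() afterwards.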
@WorldsTuber13 5 years ago
Your videos are absolutely awesome. Those who want a career transition into DS basically spend more than 3k US dollars on a certification, and what they ultimately get is a diploma or degree certificate in Data Science, not an understanding of what is actually happening in data science. But when a scholar like you trains us, we come to know what's happening in it.
@codebasics 5 years ago
K Prabhu, thanks for your kind words of appreciation.
@minsaralokunarangoda4251 3 months ago
Thanks for the awesome tutorial. I dropped all NA values in the Age column, which reduced the sample size from 812 to 714, and ran the model a couple of times. The best accuracy I got was 83.21%.
@WestCoastBrothers_ 3 years ago
Incredible video! Thank you for sharing your knowledge. Scored 83.15%. I changed the hyperparameter "criterion" to entropy instead of gini and it consistently performed better. Looking forward to seeing how changing other hyperparameters affects accuracy.
@codebasics 3 years ago
That’s the way to go Niko, good job working on that exercise
@franky0226 4 years ago
Got an accuracy of 78.92. Thanks for the lovely tutorial!
@larrybuluma2458 4 years ago
Thanks for this tutorial mate, it is the best straightforward DTC tutorial. Using entropy I got 81% accuracy, and using gini I got 78%.
@codebasics 4 years ago
That’s the way to go Larry, good job working on that exercise
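The gini versus entropy comparison several commenters ran can be repeated with a short loop. This is a sketch on synthetic data from make_classification, not the Titanic table, so the numbers it prints say nothing about the exercise itself; the sample sizes and seeds are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; swap in the prepared Titanic features to repeat the experiment.
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

scores = {}
for criterion in ("gini", "entropy"):
    model = DecisionTreeClassifier(criterion=criterion, random_state=0)
    model.fit(X_train, y_train)
    scores[criterion] = model.score(X_test, y_test)  # accuracy on held-out rows

print(scores)
```

Fixing random_state for both the split and the tree keeps the comparison about the criterion rather than about which rows happened to land in the test set.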
@kuldeepsharma7924 4 years ago
Got an accuracy of 97.20%. Dropped all rows whose values were missing. Thank you, Dhaval sir.
@codebasics 4 years ago
Kuldeep, that is indeed a nice score. Good job buddy.
@elvenkim 2 years ago
Mine is 98.459%. Likewise I removed all missing data for Age.
@ShubhamSharma-qb1bw 2 years ago
@elvenkim Why are you removing the missing values? It is possible to fill them with the mean or the median, depending on the outliers present in the Age column.
@sujankatwal9255 4 years ago
Thank you so much for the tutorial. I'm doing all the exercises. I got an accuracy of 81% on the Titanic dataset.
@codebasics 4 years ago
Sujan, that's a decent score. Good job 👍👏
@DataScienceHarrison 7 months ago
Thanks for the video. My model got an accuracy of 83.5%. Glad to be this far along the data science roadmap. Continue the good work, sir.
@abhishekkhare6175 3 years ago
Got 97.4% accuracy; filled the empty cells in Age with the mean. Thanks a lot for the perfect tutorial.
@nitinmalusare6763 3 years ago
How do I calculate accuracy for the dataset mentioned in the video?
@muskanagrawal9428 6 months ago
Thanks, it helped me increase my accuracy.
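On the question of how accuracy is calculated: it is the fraction of predictions that match the true labels. A minimal sketch with made-up labels, computed by hand and with scikit-learn's accuracy_score:

```python
from sklearn.metrics import accuracy_score

# Hypothetical true labels and predictions for six passengers.
y_true = [1, 0, 0, 1, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0]

# Accuracy = number of correct predictions / number of predictions.
manual = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(manual)                          # 4 of 6 correct
print(accuracy_score(y_true, y_pred))  # same value
```

For scikit-learn classifiers, model.score(X, y) reports this same metric on whatever rows are passed in.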
@pablu_7 4 years ago
I got 98.4% on the Titanic dataset. Thank you Sir, you are the best.
@codebasics 4 years ago
Oh wow, good job Arnab 👍😊
@jayrathod2172 3 years ago
I don't want to hurt your feelings, but 98.4% is only possible if you are checking the model score on the training data instead of the test data.
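The gap between scoring on training rows and scoring on held-out rows is easy to reproduce. This sketch uses synthetic, deliberately noisy data (flip_y randomly reassigns some labels) as a stand-in for the Titanic table; the sizes and seeds are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with label noise, standing in for the Titanic table.
X, y = make_classification(n_samples=600, n_features=8, flip_y=0.2, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

# An unconstrained tree keeps splitting until it fits the training rows.
model = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)

train_score = model.score(X_train, y_train)  # rows the tree has already seen
test_score = model.score(X_test, y_test)     # held-out rows

print(train_score, test_score)
```

An unpruned tree memorises the training rows, so the first number says little about how the model behaves on passengers it has not seen.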
@blaze9558 7 months ago
@jayrathod2172 True.
@vikassengupta8427 5 months ago
@jayrathod2172 Yes, I was about to say that. It is also possible if you have changed the random state multiple times, so your model has seen all your data and is now overfitted.
@vanshoberoi2154 23 days ago
how the fuckkkkk
@stephenngumbikiilu3988 2 years ago
Thanks for these awesome videos. I have been learning a lot through your ML tutorials. I replaced the missing values in the 'Age' column with the median. My test set was 20% and my accuracy on the test data was 99.44%.
@AnanyaRay-ct8nx 1 year ago
How? Can you share the solution?
@vikassengupta8427 5 months ago
There is a high chance that the model is overfitted and not generalized.
@vikassengupta8427 5 months ago
And chances are that your model has already seen your test data. Better to rerun from the first cell once and check...
@anujack7023 3 years ago
I got 74.4% accuracy. It is good to do everything on my own.
@codebasics 3 years ago
That’s the way to go Anujack, good job working on that exercise
@irmscher9 5 years ago
for x in features.columns:
    features[x] = le.fit_transform(features[x])
@prabur3296 5 years ago
How do I write the predicted values to a CSV file? For example, for model.predict(test_data), I want the output array in a CSV file, submission.csv.
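One way to do this is to wrap the predictions in a DataFrame and call to_csv. In this sketch the ID list and prediction list are stand-ins for the real test DataFrame and for model.predict(test_data), and the column names PassengerId and Survived are assumptions borrowed from the Titanic dataset.

```python
import pandas as pd

# Stand-ins: in practice these would come from the test DataFrame
# and from model.predict(test_data).
passenger_ids = [892, 893, 894]
predictions = [0, 1, 0]

submission = pd.DataFrame({"PassengerId": passenger_ids, "Survived": predictions})

# index=False keeps pandas from writing the row index as an extra column.
submission.to_csv("submission.csv", index=False)

print(pd.read_csv("submission.csv"))
```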
@Pacificatorrr 1 month ago
Hi! Thank you for this playlist
@alexplastow9496 3 years ago
Thanks for helping me get my homework done. By God, it was a mistake to wait till the last day.
@yeru2480 3 years ago
Oh, I couldn't agree more.
@aayushichaudhari9357 3 months ago
Hello sir, I got an accuracy of 97.97% on the given exercise. Thank you for the wonderful tutorials; all of them are very helpful and I am doing all the exercises that you give at the end of each video.
@valapoluprudhviraj9778 4 years ago
Hurray! Sir, I got an accuracy of 97.38% by using the interpolate method for the Age column.😍✨
@HipHop-cz6os 4 years ago
Did you use the train_test_split method?
@codebasics 4 years ago
Good job Prudhvi, that’s a pretty good score. Thanks for working on the exercise
@jixa2109 2 years ago
It was easy. I got 98.3%.
@vanshoberoi2154 23 days ago
Can you walk me through what extra you did to get a 97 score? Normally I'm getting 82. I found the right random state.
@iamfavoured9142 2 years ago
Thank you so much codebasics. I have some questions: Why use label encoding for nominal categorical values such as company and job? Degree is the only ordinal categorical value that calls for label encoding. And why instantiate encoders for job and degree and then use the one for company to fit and transform all the columns?
@moeintorabi2205 4 years ago
There are some NaN values in the Age column. I filled them through padding. I also split my data for testing, and in the end I got an accuracy of 0.8.
@piyushtale0001 2 years ago
Use fillna with the median and the accuracy will be 0.9777 by the normal method.
@tejassrivastava6971 2 years ago
@piyushtale0001 I used median() for Pclass, Age and Fare but got a score of around 78. How can I improve it?
@AnilAnvesh 2 years ago
Thanks for this video. I used the train and test CSV files of Titanic, cleaned both datasets, implemented a Decision Tree Classifier and got a test score of 0.74 ❤️
@codebasics 2 years ago
That’s the way to go Anil, good job working on that exercise
@ritamsadhu2873 1 year ago
The score is 97.75% for the exercise dataset. I filled the null values in the Age column with the median value.
@RohithS-ig4hl 1 year ago
I did the same thing, but I still get an accuracy of around 79%. Any suggestions?
@istiakahmed3033 10 months ago
@RohithS-ig4hl Hey, I got 80% accuracy. I also got a low accuracy like yours.
@maruthiprasad8184 2 years ago
Got an accuracy of 76.22%. Tried tweaking the train and test data but saw no significant difference. Thank you very much for the simple and clear explanation.
@kalaipradeep2753 11 months ago
How to fill empty values in the Age feature?
@ss57hd 5 years ago
Your videos are always awesome! Can you suggest some websites where I can find questions like those in your exercises?
@codebasics 5 years ago
Hey, honestly I am not aware of any good resource for this. Kaggle.com is there, but it is for competitions and a little more advanced. Try googling it. Sorry.
@MunnaSingh-dx3or 4 years ago
Simple explanation, thank you! The exercise you gave got a score of 98.18%, and it's predicting pretty well 👍 Thank you once again.
@niyazahmad9133 4 years ago
best_params_, please?
@user-fz9ni1ff6x 4 years ago
This is unbelievable. I saw someone use Random Forest, SVM, Gradient Boosting etc., and the best score on testing data was 84%. With a simple Decision Tree, the best score would be around 82%, I think.
@eliashossain9849 4 years ago
Exercise result for the Titanic dataset: score 0.77 (using Decision Tree Classifier).
@cyberversary262 3 years ago
Dude, can you please share the code? I'm getting an accuracy of 1.0.
@prakashdolby2031 3 years ago
@cyberversary262 You are giving the entire dataset for training. Better to try with test_size != 1 (use 0.2 to 0.3) to get better results.
@cyberversary262 3 years ago
@prakashdolby2031 Dude, I asked this question 3 months ago 😂😂😂
@moustafa_kb 3 years ago
I got a 97.9% score. I replaced the NaN values in Age with the mean age! Thank you for these great tutorials ^^
@codebasics 3 years ago
Good job Moustapha, that’s a pretty good score. Thanks for working on the exercise
@zainsattar7364 3 years ago
I also did the same but got only 81% accuracy. I think you computed your score on the training data set.
@mayanktripathi3168 3 years ago
@zainsattar7364 You get a higher score if you don't split your dataset into train and test.
@noorameera26 3 years ago
I will never get tired of saying thank you on every video I watch, but honestly, you're the best! :) Keep posting great videos.
@codebasics 3 years ago
I am happy this was helpful to you.
@renjitlp2000 27 days ago
Thank you sir, for your excellent class. My score is 76.5%.
@kirankumarb2190 3 years ago
Why didn't we use the dummy column concept here, like we did for linear regression?
@naveedarif6285 3 years ago
Trees have many levels, so the dummy variable concept doesn't work well here and we try to avoid it.
@snehagupta-xz1fs 3 years ago
@naveedarif6285 How can we split this dataset into train and test sets? Please help.
@oatilemothuloe9178 2 years ago
Played with the test_size a bit and managed to push out a score of 87% max. Appreciate the tutorial a lot!
@ajaykumaars2154 4 years ago
Hi Sir, thanks for the great video. I have a question: why didn't we use one hot encoding here for our categorical variables?
@codebasics 4 years ago
We can, but for a decision tree it doesn't make much difference, which is why I didn't use it.
@ajaykumaars2154 4 years ago
@codebasics Ohh, OK Sir. Thank you
@whatever_5913 3 years ago
@codebasics But then doesn't the model give a higher priority (value) to Facebook than to Google on the basis of the number assigned by label encoding? Just confused here.
@regithabaiju 3 years ago
Thanks for sharing this awesome video. I have learned more about ML using this.
@codebasics 3 years ago
Great to hear!
@gaganbansal386 3 years ago
Why have we not created dummy variables here, as we did in Logistic Regression using OneHotEncoder?
@mohitb5230 3 years ago
In the one hot encoding tutorial you mentioned it's better because then the encoded values have no implied relation to each other. Please clarify. These videos are teaching me a lot.
@anshulagarwal6682 2 years ago
Yes, same doubt. Have you cleared it up? If yes, then please tell.
@anshulagarwal6682 2 years ago
I think company should be one hot encoded, while job and degree should be label encoded.
@mukulborole 2 years ago
Thank you for this awesome tutorial, sir. I got an accuracy of 97.98%. I replaced the missing age values with the mean of the whole Age column.
@surajraika9245 2 years ago
Where did you get that dataset?
@mukulborole 2 years ago
@surajraika9245 You can find the dataset in his GitHub repo.
@surajraika9245 2 years ago
@mukulborole Thanks
@ganeshyugesh9559 2 years ago
I have only started to learn about data science using Python and I have a question: why use LabelEncoder rather than dummy variables for the categorical variables? Is LabelEncoder more efficient?
@yourskoolboy 1 year ago
I prefer .get_dummies()
@iradukundapacifique987 4 years ago
Thank you sir. I got 83.3% accuracy on the Titanic exercise and used train_test_split.
@codebasics 4 years ago
👍😊
@nkechiesomonu8764 4 years ago
Please, did you remove the NaN values? I used the median.
@naveenkalhan95 4 years ago
Really appreciate your work, learning a lot. Just want to confirm something from the tutorial: at 7:40 you are using fit_transform with the le_company object for all the other columns and did not use the le_job and le_degree objects. Is that OK, or should we use them? Thank you very much again.
@sadiqabbas5239 3 years ago
That's just the variable name; you can do it that way too.
@amanyadav411 4 years ago
My model accuracy is 79.32. Thanks for the nice data science series🙏
@krijanprajapati6816 4 years ago
Thank you so much sir, I really appreciate your tutorial, I learnt a lot
@codebasics 4 years ago
Krijancool, thanks for the comment. By the way your name is really cool 😎
@Bull3r13 3 years ago
Thank you very much! You really helped me
@codebasics 3 years ago
Glad I could help!
@udaysai2647 5 years ago
Great tutorials, keep going. But I have a doubt: why haven't you used OneHotEncoder for company here, as it is a nominal variable? And please make a tutorial on what exactly these parameters are, and on random forests.
@Bobette_2409 4 years ago
True, one hot encoding is better than LabelEncoder, as assigning integer categories can result in prediction errors if that feature is chosen, because a higher category is treated as greater than the others. In this case, if Google = 0 and FB = 1, then FB > Google.
@aravindabilash151 4 years ago
@Bobette_2409 Thank you for the clarification. Actually I was trying it with OneHotEncoder and it resulted in a misprediction.
@hasnahanakhatun8824 4 years ago
Thank you so much sir. All the videos are very helpful.
@musicsense2799 4 years ago
Amazing video! But I have some doubts, please help me here: 1. We made three LabelEncoder instances here. Can't we use just one to encode all three? 2. We use label encoding and not OneHotEncoding; however, the latter made more sense, as our model might assume that our variables have some order/precedence. It would be great if you could clarify my doubts. Thanks!
@paulkornreich9806 2 years ago
It is necessary to understand the underlying logic of the algorithm. In regression, the algorithm tries to fit a line or curve (or a higher-dimensional object in SVM), so the relative value (the order, or where it sits on the axis) matters. In a decision tree, the algorithm is just asking yes/no questions, such as "Is the company Facebook?" or "Does the employee have only a bachelor's degree?", so the order is not significant. Therefore, the label encoder is valid for a decision tree. While it might have been possible to lump the label encoders into one, say by using a power of 10 to distinguish them, that would have given too much weight to the highest power of 10 (the algorithm understands numbers, so it is going to ask >, <, = questions), and the whole point of using a decision tree was for *the algorithm* to find the precedence of features that gives the quickest prediction. Therefore it is better to have more features (i.e. more label encoders). Then, if more features are better, one could ask again why not one-hot encoding, which would give even more columns. Now the issue is the tradeoff of accuracy vs conciseness. Here there were only 3 companies, but a problem could involve over 100 companies, and a one-hot column for every company would get quite cumbersome.
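The claim that a tree can work with arbitrary integer codes can be checked on a toy case. In this sketch the six rows and the rule (only the middle code is positive) are made up; the codes follow the alphabetical order LabelEncoder would assign to these three lowercase names.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Label-encoded company: 0 = abc pharma, 1 = facebook, 2 = google.
# Hypothetical rule: only the middle code (facebook) belongs to class 1.
X = np.array([[0], [0], [1], [1], [2], [2]])
y = np.array([0, 0, 1, 1, 0, 0])

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

print(tree.predict([[0], [1], [2]]))  # [0 1 0]
print(tree.get_depth())               # 2: one threshold below 1, one above it
```

The tree isolates the middle category by stacking two threshold questions, which is why the arbitrary ordering costs it depth rather than correctness. A model that fits a single line through the codes could not separate this pattern.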
@muhammedrajab2301 4 years ago
You can also write the LabelEncoder part like this: le_company = le_job = le_degree = LabelEncoder() This worked for me!
@codebasics 4 years ago
Thanks for the tip Muhammed
@shreehari2589 4 years ago
I think only one instance of LabelEncoder is enough, I guess!
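It may help to see what the chained assignment actually creates. A minimal sketch with made-up category lists:

```python
from sklearn.preprocessing import LabelEncoder

# Chained assignment binds all three names to the SAME encoder object.
le_company = le_job = le_degree = LabelEncoder()
print(le_company is le_job is le_degree)  # True

le_company.fit(["google", "facebook"])
le_degree.fit(["bachelors", "masters"])   # refits the one shared object

print(list(le_company.classes_))          # ['bachelors', 'masters']
```

So this form is equivalent to using a single encoder under three names. That is fine when each column is encoded once with fit_transform, but the earlier mappings are gone if transform or inverse_transform is needed later.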
@tejobhiru1092 3 years ago
Thank you for such amazing, well detailed and easy to understand tutorial(s)! I'm following your channel exclusively for learning ML, along with Kaggle competitions, and also recommending your channel to my peers. Great work! PS: I got 75.8% as the score of my model for the exercise. Any tips to improve the score?
@shreyansengupta2594 2 years ago
Take test_size=0.5; it increases to 78.15%.
@pranav9339 1 year ago
Re-execute the train_test_split function, as it selects rows randomly. Then fit the model again and score it. Repeat this 4-5 times until you get somewhere around 95% accuracy. So this set of data is the most accurate for training the model.
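Scores that swing from split to split are a sign that a single split is a noisy estimate, and keeping only the luckiest one tends to overstate how good the model is. A sketch of one common alternative, cross-validation, on synthetic stand-in data (the dataset, fold count and seeds are arbitrary choices):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared Titanic features and target.
X, y = make_classification(n_samples=500, n_features=6, random_state=0)

model = DecisionTreeClassifier(random_state=0)

# Accuracy on 5 different held-out folds; every row is tested exactly once.
scores = cross_val_score(model, X, y, cv=5)

print(scores)         # one score per fold
print(scores.mean())  # steadier than any single split
```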
@Neerajkumar-xl9kx 3 years ago
This is how machine learning, or anything in general, should be taught, not by just lecturing at a whiteboard with a marker.
@mohammedalshen3147 4 years ago
Thank you so much for making it very simple. As an ML learner, do we need to understand the code behind each of these sklearn functions?
@codebasics 4 years ago
Not necessary. If you know the math and internal details, it can help when you want to write your own customised ML algorithm, but otherwise no.
@areejbasudan4732 2 years ago
@codebasics Can you recommend videos for understanding the math behind it? Thanks
@rahulkambadur147 5 years ago
Do you have anything related to sentiment analysis, text mining or text analysis? Please make a tutorial on text analytics, as the other videos are so good. I also request you to create charts for AUC and a model evaluation according to the CRISP-DM model.
@slainiae 6 months ago
84.44%. Used one-hot encoding for male/female as they are nominal categories. For missing ages, I averaged depending on whether they were children ("Master"), Miss, Mrs., or Mr. Used the 'gini' criterion.
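The title-based age fill described above could look roughly like this. The six rows are made up but follow the Titanic Name format ("Surname, Title. Given names"); the regular expression and the new Title column are assumptions for illustration, not the commenter's actual code.

```python
import numpy as np
import pandas as pd

# Made-up rows in the Titanic "Name" format.
df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen", "Heikkinen, Miss. Laina", "Palsson, Master. Gosta",
             "Allen, Mr. William", "Rice, Master. Eugene", "Sandstrom, Miss. Marguerite"],
    "Sex": ["male", "female", "male", "male", "male", "female"],
    "Age": [22.0, 26.0, 2.0, np.nan, np.nan, 4.0],
})

# Pull out the title that sits between the comma and the first period.
df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()

# Fill each missing age with the mean age of passengers sharing that title.
df["Age"] = df["Age"].fillna(df.groupby("Title")["Age"].transform("mean"))

# One-hot encode Sex into 0/1 columns.
df = pd.get_dummies(df, columns=["Sex"], dtype=int)

print(df[["Title", "Age", "Sex_female", "Sex_male"]])
```

Here the missing "Mr" age becomes 22.0 and the missing "Master" age becomes 2.0, so a child is not assigned an adult's average age.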
@KallolMedhi 5 years ago
Can anyone tell me why we didn't use OneHotEncoding in this example? Does it mean that we need dummy variables only in regression algorithms?
@daisydiary1895 5 years ago
I also have the same question. I'd appreciate it if somebody could help.
@daisydiary1895 5 years ago
Maybe here is the answer: "Still there are algorithms like decision trees and random forests that can work with categorical variables just fine". datascience.stackexchange.com/questions/9443/when-to-use-one-hot-encoding-vs-labelencoder-vs-dictvectorizor
@Bobette_2409 4 years ago
Use pandas.get_dummies
@amalsunil4722 4 years ago
Using one hot encoding worsens the accuracy of trees, therefore it's recommended to use label encoding.
@SohamPaul-xy9jw 1 year ago
I tried the exercise, filled the Age gaps with the median and included that column in the data frame. Then I trained my model and the score came up to 0.97979, or 97.97%.
@vincentdey4313 5 months ago
I also got 97.97%.
@yashchavan1350 3 years ago
Sir, in the exercise you used map on the Sex column and I did it using LabelEncoder. I like it when you show us a different approach to the same task. One more question, sir: instead of the mean, why can't we use the mode for the Age column? By the way, my score is 79%.
@yashdewan3633 2 years ago
My score: 0.8044692737430168
@vaibhavdhand1140 4 years ago
Thank you, sir. The exercises that you give at the end of your lectures help us experiment and gain in-depth knowledge of the algorithm. Accuracy achieved = 0.87
@codebasics 4 years ago
Perfect. That's a pretty good score. Good job.
@jaihind5092 4 years ago
@codebasics Sir, I got 97.7% accuracy
@HarshalDev 4 years ago
@jaihind5092 How did you achieve a score of 97.7%? I only achieved 82 :( Even after removing all NaN values from Age and converting Age and Fare to int, my score went from 74 to 80 and finally flattened at 82! Help me improve.
@HarshalDev 4 years ago
How did you achieve a score of 87%? I only achieved 82 :( Even after removing all NaN values from Age and converting Age and Fare to int, my score went from 74 to 80 and finally flattened at 82! Help me improve. Thanks
@codebasics 4 years ago
Step by step roadmap to learn data science in 6 months: kzbin.info/www/bejne/fmW8lKSLgb5kY7M
Exercise solution: github.com/codebasics/py/blob/master/ML/9_decision_tree/Exercise/9_decision_tree_exercise.ipynb
Complete machine learning tutorial playlist: kzbin.info/www/bejne/nZ7Zp5Sll9Jqm7M
5 FREE data science projects for your resume with code: kzbin.info/www/bejne/b2aal4R5opqUetE
@bestineouya5716 4 years ago
97.97% accurate
@rahulpatidar9905 4 years ago
@bestineouya5716 I also got the same accuracy.
@praveenkamble89 4 years ago
Great explanation sir, thanks a lot for your efforts and help. I got 97.76% accuracy. I did not map male and female to 1 and 2; instead I used them as is. Is it necessary to do that? Is there any significance to it?
@harris7294 3 years ago
Exercise results: accuracy 0.8229665071770335. I used your CSV file for training and the test.csv provided on Kaggle as test data. This increased my training data (which would have been smaller if I had split my data), increased accuracy (as we have more data to train on), and reduced the chances of overfitting that I would have had if I had used the same data for both training and testing. Thank you for the great video.
@anonym9158 3 years ago
0.98
@aadarsh14 6 months ago
Thank you very much for this course! Super helpful. I was able to get an accuracy of 83.24%
@usmanasad3146 5 years ago
As usual, all your videos are awesome to watch. Thanks for the same :)
@blaze9558 7 months ago
Thanks a lot sir, I just learnt so many things without getting bored (usually we don't get to do hands-on work for these topics). This was super helpful.
@patelshivam1965 5 years ago
Can anyone please tell me how to increase our model's accuracy, i.e. the score?
@codebasics 5 years ago
Increasing the score is an art as well as a science. If your question is specific to decision trees, then try fine-tuning model parameters such as criterion, tree depth etc. You can also try some feature engineering and see if it helps.
@samitpatra8615 5 years ago
I tried increasing the training data and the score increased.
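The parameter tuning suggested in this thread can be automated with GridSearchCV. This is a sketch on synthetic stand-in data; the candidate values in param_grid are arbitrary starting points rather than recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared Titanic features and target.
X, y = make_classification(n_samples=500, n_features=6, random_state=0)

# Candidate values are arbitrary starting points.
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [2, 4, 6, None],
}

# Each combination is scored with 5-fold cross-validation.
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)  # mean cross-validated accuracy of the best combination
```

After fitting, search.best_estimator_ holds a tree refit on all the data with the winning parameters.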
@piyushjha8888 4 years ago
Sir, accuracy for the given exercise = 98.20 percent. Thanks once again for the great video.
@codebasics 4 years ago
Great, that's an excellent score Piyush. Good job :)
@piyushjha8888 4 years ago
@codebasics Thanks sir. Your ML series is a great source to learn from. I do all your exercises.
@learnerlearner4090 4 years ago
Thanks so much for these tutorials! These are the best tutorials I've found so far. The code you shared for the examples and exercises is very helpful. I got a score of 76% for the exercise. How is it possible to get a different score for the same model and the same data? The steps followed are the same too.
@codebasics 4 years ago
train_test_split generates different samples every time, so when you run your code multiple times it will give a different score. Specify random_state in the train_test_split method, let's say 10; after that, when you run your code, you get the same score. This is because your train and test samples are now the same between different runs.
@learnerlearner4090 4 years ago
@codebasics Got it. Thanks!
@anujvyas9493 4 years ago
Same, I too got an accuracy of 76%, but I was aware of the random_state parameter! :)
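The effect of random_state is quick to verify. A minimal sketch on a made-up array, using the value 10 suggested in the reply above:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 10 rows, 2 features.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Same random_state, so the same rows land in the test set on every call.
_, X_test_a, _, y_test_a = train_test_split(X, y, test_size=0.2, random_state=10)
_, X_test_b, _, y_test_b = train_test_split(X, y, test_size=0.2, random_state=10)

print(y_test_a, y_test_b)
print(np.array_equal(y_test_a, y_test_b))  # True
```

DecisionTreeClassifier also accepts a random_state, which can matter when several candidate splits are equally good.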
@bharadwajkamepalli3903 11 months ago
I tried different values for the hyperparameter 'min_samples_leaf' and the accuracy improved: increasing it up to 6 raised the accuracy from 78.77 to 85.47, after which it started decreasing. I used the mean to fill the NA values in the Age column, and the 'ffill' method gave the same accuracy as the mean.
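A sweep like the one described can be sketched as a loop. The data here is synthetic and noisy, standing in for the Titanic table, so the scores will not match the commenter's; the list of candidate values is arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic noisy data standing in for the Titanic table.
X, y = make_classification(n_samples=600, n_features=8, flip_y=0.2, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

results = {}
for leaf in (1, 2, 4, 6, 10):
    model = DecisionTreeClassifier(min_samples_leaf=leaf, random_state=1)
    model.fit(X_train, y_train)
    # Record the tree size next to the held-out accuracy.
    results[leaf] = (model.get_n_leaves(), model.score(X_test, y_test))
    print(leaf, results[leaf])
```

Larger min_samples_leaf values cap how many leaves the tree can grow, which is one reason test accuracy often rises at first and then falls once the tree becomes too coarse.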
@GauravKumar-mq7xx
@GauravKumar-mq7xx 4 жыл бұрын
Really loved the way you have explained the concept..you made it tooo easy.thanks a lot and keep making more videos.
@codebasics
@codebasics 4 жыл бұрын
Thanks a lot 😊
@zainhana2968
@zainhana2968 2 жыл бұрын
i start to learn about machine learning and your video help me so much to make understanding
@rajmourya35
@rajmourya35 3 жыл бұрын
I just increased the training ka dataset to 90% and score increased to 81%. Awesome tutorial
@codebasics
@codebasics 3 жыл бұрын
That’s the way to go raj, good job working on that exercise
@ashishbirajdar5
@ashishbirajdar5 2 жыл бұрын
Amazin video, thank you so much! I have a question.. In the dummy variable video, you had mentioned that we should always make sure when we do the One Hot Encoding, we should create different columns. ie. if Monroe township = 1, Robbinville = 2 and West Windosor = 3.. and so we want to avoid confusing the model which may assume Monroe township < Robbinville < West Windosor.. But in this video, you're assigning company names Google = 0, ABC Pharma = 1 and Facebook = 2. Is it the right thing to do?
@codebasics
@codebasics 2 жыл бұрын
Decision tree is one of those algorithms where label encoding works ok in some cases like ours and you can save some memory space by not using OHE. Check this for some insights: datascience.stackexchange.com/questions/9443/when-to-use-one-hot-encoding-vs-labelencoder-vs-dictvectorizor Having said that since a number of categories are small we can use OHE as there is no concern with sparsity. If I have to re-record this session, I'd probably use OHE.
@mponcardas94
@mponcardas94 5 жыл бұрын
97.88% score. you are thoughtful and great teacher
@codebasics
@codebasics 5 жыл бұрын
Perfect Mai. Thanks for working on exercise.
@amirhosseindaneshpour8714
@amirhosseindaneshpour8714 4 жыл бұрын
I got an accuracy of 97.75% ! thanks for this extremely useful content. keep it up please ! :)
@codebasics
@codebasics 4 жыл бұрын
That’s the way to go Amir, good job working on that exercise
@HarshalDev
@HarshalDev 4 жыл бұрын
How did you achieve a score of 97.75%? I only achieved 82 :( Even after removing all NaN values from age and converting age and fare to int, my score went from 74 to 80 and finally flattened at 82! Help me improve. Thanks
@amirhosseindaneshpour8714
@amirhosseindaneshpour8714 4 жыл бұрын
@@HarshalDev Hi. I put my Jupyter notebook on GitHub at the link below. Check it out, and if you have any questions I'd be happy to help ;) github.com/uncleamir/Decision-tree-code-basics-solution
@muralidharang6140
@muralidharang6140 5 жыл бұрын
Hi Sir, I got the score:
model.score(df_n, target)
Out[247]: 0.8799102132435466
model.predict([[2, 1, 29]])
Out[251]: array([0], dtype=int64)
model.predict([[0, 0, 52]])
Out[252]: array([1], dtype=int64)
Thanks for the wonderful videos.
@user-fe7kg7jt5w
@user-fe7kg7jt5w 3 жыл бұрын
My score is:
Without train/test split: 0.97
With train/test split: 0.77
Thanks for your video!
@codebasics
@codebasics 3 жыл бұрын
Good work. Thanks for working on the exercise.
@dataguy7013
@dataguy7013 Жыл бұрын
Best description of Information gain, your explanation is really the only resource that explains the intuition well
@moushmi_nishiganddha
@moushmi_nishiganddha 2 жыл бұрын
Thank you for this ML playlist. Your way of teaching is the best; anybody can understand if they watch the videos in sequence. My model score is 1. I replaced all the NaN values in Age with the mean Age for each Pclass.
@abhishekgoyal7580
@abhishekgoyal7580 2 жыл бұрын
You didn't split the dataset into training and test sets, and maybe that's why it's 1: your test data is the same data the model was trained on. Split the dataset and check the score.
@moushmi_nishiganddha
@moushmi_nishiganddha 2 жыл бұрын
@@abhishekgoyal7580 I did split the data, but I passed x_train, y_train as the parameters to the score method. Now my score shows 0.79. Thanks for correcting me.
@abhishekgoyal7580
@abhishekgoyal7580 2 жыл бұрын
@@moushmi_nishiganddha just saw your profile. You’re from houston too?
@moushmi_nishiganddha
@moushmi_nishiganddha 2 жыл бұрын
@@abhishekgoyal7580 yes
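The point made in this thread (a score of 1 on the training data versus about 0.79 on held-out data) can be reproduced with a short sketch. It uses synthetic data from `make_classification` rather than the Titanic file, so the exact numbers are not comparable to the ones reported above; the dataset parameters are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with some label noise (flip_y), not the Titanic set
X, y = make_classification(n_samples=500, n_features=8, n_informative=4,
                           flip_y=0.1, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# An unpruned tree keeps splitting until every training leaf is pure
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Scoring on the data the tree was fitted on mostly measures memorization
train_score = model.score(X_train, y_train)
# Scoring on held-out rows estimates how well the tree generalizes
test_score = model.score(X_test, y_test)

print(f"train accuracy: {train_score:.3f}")
print(f"test accuracy:  {test_score:.3f}")
```

Passing `X_train, y_train` to `score` is the mistake described above; the number to report is the one computed from `X_test, y_test`.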
@laurenlin7478
@laurenlin7478 2 жыл бұрын
much better explanation than my prof!
@tulikabhardwaj484
@tulikabhardwaj484 3 жыл бұрын
Thanks for the tutorial mate
@codebasics
@codebasics 3 жыл бұрын
No problem 👍
@yasarahmedshaik6623
@yasarahmedshaik6623 Жыл бұрын
Thanks a lot Sir. It was Awesome and Excellent.
@SushiTheLeo
@SushiTheLeo 2 жыл бұрын
Very apt tutorial, simplified everything.
@rambaldotra2221
@rambaldotra2221 3 жыл бұрын
Sir, thank you so much for these wonderful lectures. I got an accuracy of 83% on the Titanic dataset.
@codebasics
@codebasics 3 жыл бұрын
Good job Ram, that’s a pretty good score. Thanks for working on the exercise
@rambaldotra2221
@rambaldotra2221 3 жыл бұрын
@@codebasics Thanks a lot Sir ✨
@abhisheksharma1031
@abhisheksharma1031 2 жыл бұрын
Such a nice explanation, now I don't need to watch any further videos. This video was very satisfactory and convincing!!
@anirudhgangadhar6158
@anirudhgangadhar6158 2 жыл бұрын
I got a test accuracy of 78.77% with a train-test split of 80-20. The dataset was normalized as part of pre-processing, and NaNs were filled with the variable's mean.
@vivek9917333300
@vivek9917333300 5 жыл бұрын
Your videos are quite simple and very understandable. It is very useful for a beginner. I am requesting you please make some videos about SQL and SAS. That will be very helpful for us thanks.
@codebasics
@codebasics 5 жыл бұрын
Thanks for commenting Vivek 👍
@MLLearner
@MLLearner 6 ай бұрын
81% accuracy Sir! Thanks, a lot.
@saisanthosh8370
@saisanthosh8370 Жыл бұрын
For testing: 0.79. For training: 0.985. Thank you for the lectures; they make machine learning smooth to learn.
@kalaipradeep2753
@kalaipradeep2753 11 ай бұрын
How do I fill the empty values in the Age feature?
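Other commenters on this video mention three approaches: dropping the rows with a missing Age, filling with the overall mean, and filling with the mean Age per Pclass. A minimal pandas sketch of each follows. The five-row DataFrame is hypothetical and only borrows the Titanic column names `Pclass` and `Age`; the extra `Age_*` column names are assumptions made so the options can be compared side by side.

```python
import numpy as np
import pandas as pd

# Hypothetical rows using Titanic-style column names
df = pd.DataFrame({
    "Pclass": [1, 1, 3, 3, 3],
    "Age": [38.0, np.nan, 22.0, 26.0, np.nan],
})

# Option 1: fill with the overall mean of the known ages
df["Age_mean"] = df["Age"].fillna(df["Age"].mean())

# Option 2: fill with the median, which is less sensitive to outliers
df["Age_median"] = df["Age"].fillna(df["Age"].median())

# Option 3: fill with the mean age of passengers in the same Pclass
class_mean = df.groupby("Pclass")["Age"].transform("mean")
df["Age_by_class"] = df["Age"].fillna(class_mean)

# Option 4: drop rows where Age is missing (shrinks the dataset)
dropped = df.dropna(subset=["Age"])

print(df)
print(len(dropped), "rows left after dropping")
```

Which option scores best depends on the data and the split, so it is worth trying more than one.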
@bhumitbedse8156
@bhumitbedse8156 3 жыл бұрын
Hello sir, at 7:50 a LabelEncoder is created for each of the columns (company, job and degree), but when we call fit_transform, why is only le_company used? For job and degree, shouldn't we write le_job.fit_transform() and le_degree.fit_transform()? Am I right? Please answer 😶
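This thread does not show the video's actual code, so the following is only a sketch of the pattern the question describes, with hypothetical data and the `_n` column suffix as an assumption. `LabelEncoder.fit_transform` refits the encoder on every call, so reusing a single encoder for all columns still produces valid integer codes; the catch is that the encoder then remembers only the classes of the last column it saw, which matters if you later need `inverse_transform`.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical rows, not the dataset from the video
inputs = pd.DataFrame({
    "company": ["google", "abc pharma", "facebook"],
    "job": ["sales executive", "business manager", "computer programmer"],
    "degree": ["bachelors", "masters", "bachelors"],
})

# One encoder per column, so each keeps its own category mapping
le_company = LabelEncoder()
le_job = LabelEncoder()
le_degree = LabelEncoder()

inputs["company_n"] = le_company.fit_transform(inputs["company"])
inputs["job_n"] = le_job.fit_transform(inputs["job"])
inputs["degree_n"] = le_degree.fit_transform(inputs["degree"])

# Each encoder can map its integer codes back to the original labels
jobs_back = le_job.inverse_transform(inputs["job_n"])

# Reusing one encoder also yields valid codes, but it only remembers
# the classes of the most recent column it was fitted on
shared = LabelEncoder()
company_codes = shared.fit_transform(inputs["company"])
degree_codes = shared.fit_transform(inputs["degree"])
print(shared.classes_)  # the degree labels only
```

So separate encoders, as suggested in the question, are the safer habit whenever the mapping needs to be recovered later.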
@nihalchidambaram3395
@nihalchidambaram3395 2 жыл бұрын
Hello Sir, Great tutorial. My model's accuracy for the titanic dataset came out to be 82%. Thank you.
@olufemilawore3691
@olufemilawore3691 2 жыл бұрын
With LogisticRegression, I got 78.9% and discovered that when sex was ==1, the model reads wrongly. I also used train_test_split with an accuracy of 79.3% (training size=0.3) and similar problems with logistic regression. Using the Decision tree, I got a score of about 92% and the prediction was superb.
@me29pranavagarwal31
@me29pranavagarwal31 Жыл бұрын
can you please help me with your code... I'm a beginner and getting 78% accuracy
@me29pranavagarwal31
@me29pranavagarwal31 Жыл бұрын
I want to learn how you are getting 92% accuracy, please
@dmcg_creative
@dmcg_creative 2 ай бұрын
This is a wonderful video, very clear overview, thank you! Is there a way to predict a continuous variable vs just a binary one (yes/no)? For example if I wanted to take purchase amount, gender, and whether or not they started a subscription, how much is this person likely to spend over the next year? Thanks in advance!
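For a continuous target like the one asked about here, scikit-learn provides `DecisionTreeRegressor`, which predicts the average target value of the training rows in each leaf instead of a class. The sketch below is an illustration under stated assumptions: the data is randomly generated, and the feature names, the spend formula, and `max_depth=4` are all hypothetical choices, not anything from the video.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 200

# Hypothetical features: purchase amount, gender (0/1), subscribed (0/1)
purchase_amount = rng.uniform(10, 200, n)
gender = rng.integers(0, 2, n)
subscribed = rng.integers(0, 2, n)
X = np.column_stack([purchase_amount, gender, subscribed])

# Made-up continuous target: spend over the next year
yearly_spend = 5 * purchase_amount + 300 * subscribed + rng.normal(0, 20, n)

# A shallow tree to limit overfitting; the depth is an arbitrary choice
reg = DecisionTreeRegressor(max_depth=4, random_state=0)
reg.fit(X, yearly_spend)

# Predicted spend for one new customer: amount 120, gender 1, subscribed 1
prediction = reg.predict([[120.0, 1, 1]])
print(prediction)
```

For a regressor, `score` returns R² rather than accuracy, and it should be checked on held-out data just as with the classifier.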
@jayeshparmar2603
@jayeshparmar2603 11 ай бұрын
I got 100 percent accurate score 🤩
@cometolearn8598
@cometolearn8598 Жыл бұрын
BRO SOLVES THE EZ DATASET HIMSELF IN THE VID SO IT IS SIMPLE AND EZ FOR HIM BUT FOR US HE GAVE LIKE THE BIGGEST DATASET EVER SEEN ☠☠💀💀
@BCM_VISHNUPRIYAV
@BCM_VISHNUPRIYAV 3 жыл бұрын
I got 80.67% accuracy on Titanic. Thanks a lot, sir. I did it myself.
@codebasics
@codebasics 3 жыл бұрын
Nice work!
@ahaditab6364
@ahaditab6364 2 жыл бұрын
You are an amazing teacher.
@geethanjaliravichandhran8109
@geethanjaliravichandhran8109 3 жыл бұрын
Hi sir, your videos are really great. Thank you so much. In the Titanic survival exercise, my score differs depending on whether I use the train-test split method or score the model directly with inputs_n and target. Could you please make a video on how to remove outliers, and on when to use which algorithm so that prediction accuracy is higher? Please also add one more video on high information gain and entropy in decision trees, and on the difference between a decision tree classifier and a regressor. Your videos are highly informative. Thank you so much once again, sir.
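On the entropy and information gain part of this request, a small worked sketch may help. Entropy of a set of labels is H = -Σ p_i log2(p_i), and the information gain of a split is the parent's entropy minus the size-weighted average entropy of the child groups. The helper names `entropy` and `information_gain` and the toy labels below are hypothetical, written only for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Toy labels: 4 positives and 4 negatives, so the parent entropy is 1 bit
parent = [1, 1, 1, 1, 0, 0, 0, 0]

# A split that separates the classes perfectly removes all uncertainty
pure_split = [[1, 1, 1, 1], [0, 0, 0, 0]]
# A split that leaves both children as mixed as the parent gains nothing
mixed_split = [[1, 1, 0, 0], [1, 1, 0, 0]]

gain_pure = information_gain(parent, pure_split)
gain_mixed = information_gain(parent, mixed_split)
print(gain_pure, gain_mixed)
```

A tree built with the entropy criterion prefers, at each node, the candidate split with the highest gain, which is why the first split here would be chosen over the second.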
@spicytuna08
@spicytuna08 2 жыл бұрын
you explain so well. thanks.