Classification Trees in Python from Start to Finish

186,310 views

StatQuest with Josh Starmer

Comments: 582
@statquest 4 years ago
NOTE: You can support StatQuest by purchasing the Jupyter Notebook and Python code seen in this video here: statquest.gumroad.com/l/tzxoh Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
@ezzouaouia.r1127 4 years ago
The site is offline. 11/07 12:00
@statquest 4 years ago
Thanks for the note. It's back up.
@ezzouaouia.r1127 4 years ago
@statquest Thanks very much.
@dfinance2260 3 years ago
Still offline unfortunately. Would love to check the code.
@statquest 3 years ago
@dfinance2260 It should be back up now.
@funnyclipsutd 4 years ago
BAM! My best decision this year was to follow your channel.
@statquest 4 years ago
BAM! :)
@mike19558 3 years ago
Mood, been so useful!
@renekokoschka707 3 years ago
I just started my bachelor's thesis and I really wanted to thank you! Your videos are helping me so much. You are a LEGEND!!!!!
@statquest 3 years ago
Thank you and good luck! :)
@user-lc8gc6vb3j 11 months ago
Thank you, this video helped me a lot! For anyone else following along in 2023, the way the confusion matrix is drawn here didn't work for me anymore. I replaced it with the following code:
    cm = confusion_matrix(y_test, clf_dt_pruned.predict(x_test), labels=clf_dt_pruned.classes_)
    disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=['Does not have HD', "Has HD"])
    disp.plot()
    plt.show()
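A self-contained version of the snippet above may help anyone trying it outside the notebook. This is only a sketch: the data is a synthetic stand-in generated with `make_classification` (not the heart disease dataset from the video), `ccp_alpha=0.01` is an arbitrary illustrative value, and the headless `Agg` backend is chosen just so the code runs without a display. The older `plot_confusion_matrix` helper was removed from recent scikit-learn releases, which is presumably why the original call stopped working.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

from sklearn.datasets import make_classification
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption, not the dataset used in the video).
X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# ccp_alpha=0.01 is an arbitrary illustrative value, not a tuned one.
clf_dt_pruned = DecisionTreeClassifier(random_state=42, ccp_alpha=0.01)
clf_dt_pruned.fit(X_train, y_train)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(
    y_test, clf_dt_pruned.predict(X_test), labels=clf_dt_pruned.classes_
)
disp = ConfusionMatrixDisplay(
    confusion_matrix=cm, display_labels=["Does not have HD", "Has HD"]
)
disp.plot()
# In a notebook, plt.show() (from matplotlib.pyplot) would render the figure here.
```

In a Jupyter notebook the `matplotlib.use("Agg")` line can be dropped.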
@statquest 11 months ago
BAM! Thank you. Also, I updated the Jupyter notebook.
@jahanvi9429 1 year ago
You are so so helpful!! I am a data science major and your videos saved my academics. Thank you!!
@statquest 1 year ago
Happy to help!
@ravi_krishna_reddy 3 years ago
I was searching for a tutorial related to statistics and landed here. At first, I thought this is just one among many low quality content tutorials out there, but I was wrong. This is one of the best statistics and data science related channels I have seen so far, wonderful explanation by Josh. Addicted to this channel and subscribed. Thank you Josh for sharing your knowledge and making us learn in a constructive way.
@statquest 3 years ago
Thank you very much! :)
@joaomanoellins2219 4 years ago
I loved your Brazil polo shirt! Triple bam!!! Thank you for your videos. Regards from Brazil!
@statquest 4 years ago
Muito obrigado!!!
@cindinishimoto9528 4 years ago
@statquest paying homage to Brazil!!
@statquest 4 years ago
@cindinishimoto9528 Eu amo do Brasil!
@kaimueric9390 4 years ago
I actually think it would be great if you created more videos for other ML algorithms. After teaching us almost every aspect of machine learning algorithms as far as the mechanics and the related fundamentals are concerned, I feel it is high time to see those in action, and Python is, of course, the best way to go.
@statquest 4 years ago
I'm working on them!!! :)
@juniotomas8563 6 months ago
Come on, Buddy! I just saw a recommendation for your channel, and in the first video I see you in a Brazilian t-shirt. Nice surprise!
@statquest 6 months ago
Muito obrigado! :)
@pratyushmisra2516 4 years ago
My intro song for this channel: "It's like Josh has got his hands on Python right, He teaches ML and AI really well and tight ---- STAT QUEST" btw thanks Brother for so much wonderful content for free.....
@statquest 4 years ago
Thank you! :)
@Mohamm-ed 3 years ago
This voice reminds me of listening to the radio in the UK. Love that. I want to go again.
@statquest 3 years ago
:)
@ozzyfromspace 3 years ago
I dunno how I stumbled on your channel a few videos ago, but you've really got me interested in statistics. Nice Work sir 😃
@statquest 3 years ago
Hooray!
@liranzaidman1610 4 years ago
Josh, this is really great. Can you upload videos with some insights on your personal research and which methods you used? And some examples of why you prefer one method over another? I mean, not only because you get a better result in ROC/AUC, but is there a "biological" reasoning for using a specific method?
@statquest 4 years ago
Great suggestion!
@creativeo91 3 years ago
This video helped me a lot for my Data Mining assignment.. Thank you..
@statquest 3 years ago
Glad it helped!
@ccuny1 4 years ago
I have already commented but I watched the video again and I have to say I am even more impressed than before. Truly fantastic tutorial, not too verbose but with every action clarified and commented in the code, beautifully presented (I have to work on my markdown; there are quite a few markdown formats you use that I cannot replicate...to study when I get the notebook). So all in all, one of the very top ML tuts I have ever watched (including paid-for training courses). Can't wait for today's or tomorrow's webinars. Can't join in real time as I'm based in Europe, but will definitely pick it up here and get the accompanying study guides/code.
@statquest 4 years ago
Hooray!!! Thank you very much!!!
@dhruvishah9077 3 years ago
I'm an absolute beginner and this is what I was looking for. Thank you so much for this. Much appreciated sir!!
@statquest 3 years ago
Glad it was helpful! :)
@montserratramirez4824 4 years ago
I love your content! Definitely my favorite channel this year. Regards from Mexico!
@statquest 4 years ago
Wow, thanks! Muchas gracias! :)
@1988soumya 4 years ago
Hey Josh, it’s so good to see you are doing this, I am preparing for some interviews, it will help a lot
@statquest 4 years ago
Good luck! :)
@ccuny1 4 years ago
Another hit for me. I will be getting the Jupyter notebook and some if not all of your study guides (I only just realised they existed).
@statquest 4 years ago
BAM! :) Thank you very much! :)
@korcankomili7398 1 year ago
I wish you were my uncle Josh or something. I could imagine how hard I would have had discussions with my parents to spend time with my TRIPLE cool uncle.
@statquest 1 year ago
bam! :)
@bressanini 2 years ago
Hey Josh, follow this equation: You + Brazilian Flag Polo Shirt + Awesome Content = TRIPLE BAM!!!
@statquest 2 years ago
Muito bem! :)
@anishchhabra5313 2 years ago
This is legen..... wait for it ....dary!! 😎 This detailed coding explanation of Decision Tree is hard to find but Josh you are brilliant. Thank you for such a great video.
@statquest 2 years ago
Glad you liked it!
@rajatjain7465 1 year ago
Wowowowwo, the best course ever, even better than all those paid quests. Thank you @Josh Starmer for these materials.
@statquest 1 year ago
Thank you! :)
@jefferyg3504 3 years ago
You explain things in a way that is easy to understand. Bravo!
@statquest 3 years ago
Thank you! :)
@sameepshah3835 2 months ago
I love you so much Josh. Thank you so much for everything.
@statquest 2 months ago
Thanks!
@Moiez101 1 year ago
1 hour statquest? in the words of Barney Rubble's son: "BAM BAM!"
@statquest 1 year ago
double bam! :)
@willw4096 1 year ago
1:00:20 Use color to visualize the category and the Gini impurity
@breopardo6691 3 years ago
As Tina Turner would say: "You are simply the best!" 🎵🎵🎵
@statquest 3 years ago
BAM! :)
@aleksandartta 2 years ago
How would you implement a pipeline with cost complexity pruning? Consider the part that starts just before 49:00... Thanks in advance! You are the best teacher...
@statquest 2 years ago
I'll work on that
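One possible way to combine a pipeline with cost complexity pruning is sketched below. This is not the notebook's code: the synthetic data, the `SimpleImputer` step, and the step names `impute` and `tree` are all assumptions made for illustration. The idea is to take candidate alphas from `cost_complexity_pruning_path` and let `GridSearchCV` pick among them through the pipeline's `tree__ccp_alpha` parameter.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption, not the dataset used in the video).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Candidate alphas come from the pruning path of a full tree on the training data.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_train, y_train
)
# Drop the last alpha (it prunes the tree down to the root) and clip any tiny
# negative values caused by floating point error.
alphas = np.unique(np.clip(path.ccp_alphas[:-1], 0, None))

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # hypothetical preprocessing step
    ("tree", DecisionTreeClassifier(random_state=0)),
])

# "tree__ccp_alpha" routes each candidate alpha to the step named "tree".
search = GridSearchCV(pipe, {"tree__ccp_alpha": alphas}, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_alpha = search.best_params_["tree__ccp_alpha"]
test_accuracy = search.score(X_test, y_test)
print(f"best alpha={best_alpha:.4f}, test accuracy={test_accuracy:.3f}")
```

Because the preprocessing lives inside the pipeline, it is refit on each training fold, so nothing from the validation fold leaks into it.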
@DANstudiosable 4 years ago
OMG... I thought you'd ignore me when I asked you to post this webinar on YouTube. I'm glad you posted it. Thank you!
@statquest 4 years ago
BAM!!!!
@shubhankarde4732 4 years ago
Double BAM
@bessa0 2 years ago
Kind Regards from Brazil. Loved your book!
@statquest 2 years ago
Thank you!
@gbchrs 2 years ago
Your channel is the best at explaining complex machine learning algorithms step by step. Please make more videos.
@statquest 2 years ago
Thank you very much!!! Hooray! :)
@bayesian7404 5 months ago
You are fantastic! I'm hooked on your videos. Thank you for all your work.
@statquest 5 months ago
Glad you like them!
@beebee_0136 2 years ago
I'd like to thank you so much for making this stream cast available!
@statquest 2 years ago
:)
@aryamohan7533 3 years ago
This entire video is a triple bam! Thank you for all your content, I would be lost without it :)
@statquest 3 years ago
Glad you enjoyed it!
@lawrencegayundato8398 3 years ago
@statquest This is Quadruple BAM!!!! Thank you Mr. Josh :)
@sharmakartikeya 3 years ago
Hurray! I saw your face for the first time! Nice to see one of those whom I have subscribed
@statquest 3 years ago
bam!
@filosofiadetalhista 2 years ago
Loved it. I am working on Decision Trees at my job this week.
@statquest 2 years ago
bam!
@randyluong6275 2 years ago
We have data scientists out there. We have a "data artist" right in this video.
@statquest 2 years ago
Wow! Thank you!
@JoRoCaRa 1 year ago
brooo... this is insane!! thanks so much! this is amazing saving me so many headaches
@statquest 1 year ago
Glad it helped!
@3ombieautopilot 4 years ago
Thank you very much for this one! Your channel is incredible! Hats off to you.
@statquest 4 years ago
Bam! :)
@fuckooo 3 years ago
Love your videos Josh, the notebook missing values sounds like a great one to do!
@statquest 3 years ago
Awesome!
@magtazeum4071 4 years ago
BAM...!!! I'm getting notifications from your channel again
@statquest 4 years ago
BAM! :)
@amc9520 1 year ago
Thanks for making my life easy.
@statquest 1 year ago
Any time!
@josephgan1262 3 years ago
Hi Josh, thanks for the video again! I have some questions about pruning and hyperparameter tuning in general; I hope you don't mind clarifying. As I understand it, the video does the following to find the best alpha:
1) After the train/test split, find the best alpha by comparing training and testing accuracy on a single split (50:32).
2) Recheck that alpha with cross validation (52:33). There is huge variation in the accuracy, which implies that alpha is sensitive to the training set.
3) Redo the cross validation to find the best alpha by taking the mean accuracy for each alpha.
My questions:
a) At step 2, do we still need to plot the training set accuracy to check for overfitting? It is always mentioned that we should compare training and testing accuracy to check for overfitting, but there is a debate about this as well. Others argue that, given model A with training/testing accuracy of 99%/90% and model B with 85%/85%, we should pick model A because its 90% testing accuracy is higher than 85%, even though model B has no gap (overfitting) between training and testing. What are your thoughts on this?
b) What if I skip steps 1 and 2 and go straight to step 3? Is this bad practice? Do I still need to plot the training accuracy to compare with the testing accuracy if I skip steps 1 and 2?
c) I always see the final hyperparameter chosen by the highest mean accuracy across all K folds. Do we need to consider the variance across the folds as well? Surely we don't want our accuracy to jump all over the place in production. If so, is there a general rule of thumb for how much variance in accuracy is considered bad?
Sorry for the long post. Thanks!
@statquest 3 years ago
a) Ultimately the optimal model depends on a lot of things - and often domain knowledge is one of those things - so there are no hard rules and you have to be flexible about the model you pick.
b) You can skip the first two steps - those were just there to illustrate the need for using cross validation.
c) It's probably a good idea to also look at the variation.
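Step 3 and the point about looking at the variation can be sketched roughly as follows. This is an illustration under assumptions, not the notebook's code: the data is synthetic, and the choice to record the standard deviation next to the mean is just one simple way to look at the spread across folds.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption, not the dataset used in the video).
X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Candidate alphas from the pruning path; the last one prunes to the root.
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(
    X_train, y_train
)
ccp_alphas = np.clip(path.ccp_alphas[:-1], 0, None)  # guard against tiny negatives

results = []
for alpha in ccp_alphas:
    clf = DecisionTreeClassifier(random_state=42, ccp_alpha=alpha)
    scores = cross_val_score(clf, X_train, y_train, cv=5)  # accuracy per fold
    results.append((alpha, scores.mean(), scores.std()))

# Pick the alpha with the highest mean accuracy, but keep its spread in view.
best_alpha, best_mean, best_std = max(results, key=lambda r: r[1])
print(f"alpha={best_alpha:.4f} mean accuracy={best_mean:.3f} std={best_std:.3f}")
```

If two alphas have similar means, the one with the smaller standard deviation is arguably the safer pick, though that is a judgment call rather than a rule.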
@utkarshsingh2675 2 years ago
This is what I have been looking for on YouTube...thanks a lot sir!!
@statquest 2 years ago
Thanks!
@nataliatenoriomaia1635 3 years ago
Great video, Josh! Thanks for sharing it with us. And I have to say: the Brazilian shirt looks great on you! ;-)
@statquest 3 years ago
BAM! :)
@rhn122 3 years ago
Great tutorial! One question, by looking at the features included in the final tree, does it mean that only those 4 features are considered for prediction, i.e., we don't need the rest so we could drop those columns for further usage?
@statquest 3 years ago
That is correct.
@xiolee7597 4 years ago
Really enjoy all the videos! Can you do a series about mixed models as well, random effects, choosing models, interpretation etc. ?
@statquest 4 years ago
It's on the to-do list.
@junaidmalik9593 4 years ago
Hi Josh, one amazing thing about the playlist is the song you sing before starting the video; that refreshes me. You know how to keep the listener awake for the next video, hehe. And really, thanks for the amazing explanation.
@statquest 4 years ago
Awesome, thank you!
@floral7448 3 years ago
Finally have the honor to see Josh :)
@statquest 3 years ago
:)
@rahulthaker694 4 years ago
You look exactly how I thought you'd look 😂
@statquest 4 years ago
BAM! :)
@rahulthaker694 4 years ago
@statquest yesss legend 😂🙏
@statquest 4 years ago
@rahulthaker694 :)
@chrissmith1152 4 years ago
He's done a lot of live streams previously.
@BeSharpInCSharp 4 years ago
I think he looks like Joshua from Friends (friend of Rachel).
@robertmitru7234 3 years ago
Awesome StatQuest! Great channel! Make more videos like this one for the other topics. Thank you for your time!
@statquest 3 years ago
Thanks! Will do!
@jonastrex05 2 years ago
Amazing video! One of the best out there for this Education! Thank you Josh
@statquest 2 years ago
Thank you!
@mahdimj6594 4 years ago
Neural Network Pleaseee, Bayesian and LARS as well. And Thank you. You actually make things much easier to understand.
@statquest 4 years ago
Thanks! :)
@simaykazc1508 3 years ago
Josh is the best. I learned a lot from him!
@statquest 3 years ago
Wow! Thank you!
@naveenagrawal_nice 7 months ago
Love this channel, Thank you Josh
@statquest 7 months ago
Glad you enjoy it!
@douglasaraujo9763 4 years ago
Your videos are always very good. But today I’ll have to commend you on your fashion choice as well. Great-looking shirt! I hope you have had the opportunity to visit Brazil.
@statquest 4 years ago
Muito obrigado! Eu amo do Brasil! :)
@SamirMishra6174 4 years ago
Wow, is that a tabla in the background?
@statquest 4 years ago
Yes! I used to play tabla a long time ago.
@SamirMishra6174 4 years ago
@statquest Amazing, you are multi-talented.
@fernandosicos 2 years ago
Greetings from Brazil!
@statquest 2 years ago
Muito obrigado! :)
@ramendrachaudhary9784 3 years ago
We need to see you play some tabla to one of your songs. Double BAM!! Great content btw :)
@statquest 3 years ago
Maybe one day!
@amalsakr1381 6 months ago
Thank you for your powerful tutorial.
@statquest 6 months ago
Glad it was helpful!
@_ahahahahaha9326 3 years ago
I really learn a lot from you.
@statquest 3 years ago
Thanks!
@umairkazi5537 4 years ago
Thank you very much. This video is very helpful and clears up a lot of concepts for me.
@statquest 4 years ago
Bam! :)
@ericwr4965 4 years ago
I absolutely love your videos and I love your channel. Thanks for this.
@statquest 4 years ago
Thanks! :)
@paulovinicius5833 3 years ago
I know I'll love all the content, but I started liking the video immediately because of the music! haha
@statquest 3 years ago
Thank you! :)
@jihowoo9667 4 years ago
I really love your video, it helps me a lot!! Regards from China.
@statquest 4 years ago
Awesome! Thank you!
@estebannantes8567 4 years ago
Hi Josh. Loved this video. I have two questions:
1- Is there any way to save our final decision tree model to use it later on unseen data without having to train it all again?
2- Once you have decided on your final alpha, why not train your tree on the full, unsplit dataset? I know you will not be able to generate a confusion matrix, but wouldn't your final tree be better if it is trained with all the examples?
@statquest 4 years ago
Yes and yes. You can write the decision tree to a file if you don't want to keep it in memory (or want to back it up). See: scikit-learn.org/stable/modules/model_persistence.html
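Both answers can be illustrated with a short sketch. It assumes `joblib` (which ships as a scikit-learn dependency) as the persistence tool, which is one of the options the linked page describes; the synthetic data, the alpha value, and the file name `clf_dt_pruned.joblib` are made up for illustration.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption, not the dataset used in the video).
X, y = make_classification(n_samples=300, n_features=8, random_state=42)

# Question 2: once alpha is chosen, refit on the full, unsplit dataset.
# ccp_alpha=0.01 stands in for whatever value cross validation selected.
clf = DecisionTreeClassifier(random_state=42, ccp_alpha=0.01).fit(X, y)

# Question 1: write the fitted tree to disk and load it back later.
model_path = os.path.join(tempfile.mkdtemp(), "clf_dt_pruned.joblib")  # hypothetical file name
joblib.dump(clf, model_path)
restored = joblib.load(model_path)

# The restored model predicts without any retraining.
same_predictions = bool((restored.predict(X) == clf.predict(X)).all())
print(same_predictions)
```

A saved model should only be loaded from a trusted source and, ideally, with the same scikit-learn version that wrote it.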
@lucillewiid5476 7 months ago
Hi Josh, I recommend your videos to all my students and love watching and learning from them 👍. Can we still download this notebook, or do we need to buy it? Regards from South Africa!
@statquest 7 months ago
This notebook has always been for sale and is still for sale if you would like it.
@liranzaidman1610 4 years ago
Fantastic, this is exactly what I needed
@statquest 4 years ago
Hooray! :)
@chaitanyasharma6270 3 years ago
I loved your video "Support Vector Machines in Python from Start to Finish" and this one too!!! Can you make more on different algorithms?
@statquest 3 years ago
I will try!
@avramisthename 10 months ago
great insight and refresher, thank you for documenting
@statquest 10 months ago
Glad you enjoyed it!
@Kenwei02 2 years ago
Thank you so much for this tutorial! This has helped me out a lot!
@statquest 2 years ago
Glad it helped!
@abdelrhmansayed5436 3 years ago
Thank you for your great effort and simple explanation. I have only one question: why did you split the data into X_train and y_train and then give those to cross_val_score? Shouldn't cross validation work on all of X?
@statquest 3 years ago
In theory we are trying to save some data for a final validation of the model.
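The idea of saving data for a final validation can be sketched like this, assuming synthetic data and an arbitrary `ccp_alpha` in place of the notebook's values. Cross validation only ever sees the training rows; the test rows are scored once, at the very end.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption, not the dataset used in the video).
X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# ccp_alpha=0.01 is an arbitrary illustrative value.
clf = DecisionTreeClassifier(random_state=42, ccp_alpha=0.01)

# The five folds are drawn from the training rows only.
cv_scores = cross_val_score(clf, X_train, y_train, cv=5)

# The held-out rows were never used for tuning, so this score is a final check.
clf.fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)
print(f"CV mean={cv_scores.mean():.3f}, held-out accuracy={test_accuracy:.3f}")
```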
@teetanrobotics5363 3 years ago
Amazing man. I love your channel. Could you please put this video, SVMs and XGBoost in the correct order in the playlist?
@statquest 3 years ago
Yes!
@vipanpatial2243 2 years ago
BAM!! You are the best.
@statquest 2 years ago
Double bam!
@mcmiloy3322 3 years ago
Really nice video. I thought you were actually going to implement the tree classifier itself, which would have been a real bonus but I guess that would have taken a lot longer.
@statquest 3 years ago
Noted
@saiakhil4751 3 years ago
Wow!! Josh on live? made my day...
@statquest 3 years ago
:)
@ayatkhrisat5964 4 years ago
kindly add this video to the machine learning list
@statquest 4 years ago
Will do!
@srmsagargupta 3 years ago
Thank you Sir for this wonderful webinar
@statquest 3 years ago
Thank you!
@michelchaghoury870 2 years ago
MANNNN so useful, please keep going
@statquest 2 years ago
Thanks!
@amitsaxena6530 4 years ago
Hi Josh, I request you to make more such ML videos in Python that cover all ML concepts holistically. I am sure this course will then become more popular than any of the available ML courses. Pls pls pls....
@statquest 4 years ago
I'll consider it.
@TalesLimaFonseca 2 years ago
Man, you are awesome! Vai BRASIL!!!
@statquest 2 years ago
Muito obrigado!
@prnv5 2 years ago
Hi Josh! I'm a HS student trying to learn ML algorithms and your videos are genuinely my saving grace. They're so concise, information heavy and educational. I understand concepts perfectly through your statquests, and I'm really grateful for that. One quick question: The algorithm used in this case to build a decision tree: is it the CART algorithm? I'm writing a paper on the CART algorithm and would hence like to confirm the same. Thanks again!
@statquest 2 years ago
Yes, this is the "classification tree" in CART.
@prnv5 2 years ago
@statquest Thank you so much 🥰
@kaimueric9390 4 years ago
I liked before watching
@statquest 4 years ago
Hooray!!!
@krishanudebnath1959 2 years ago
Love the tabla and your content.
@statquest 2 years ago
Thanks! My father used to teach at IIT-Madras so I spent a lot of time there when I was young.
@alpatul 2 years ago
This is great, do you have any more python webinars related to machine learning? I would love to go through them.
@statquest 2 years ago
I'm working on more right now.
@oliveryoule11 2 years ago
@statquest At 19 minutes you say you have plans for a whole webinar on missing data! This is what I need. Where can I find it or is it still in production? :D
@statquest 2 years ago
@oliveryoule11 Dang! I'd forgotten about that. I guess you could say it's still in production. :)
@oliveryoule11 2 years ago
@statquest Thanks for replying! I can see how easy it is to forget! You have so much content it's unreal! Very impressive! I just purchased your Notebook through the link - but it doesn't appear to have arrived in my inbox. Can you advise? I am also strongly considering paying for your Patreon account. I currently pay for Datacamp - but your material is so much better!
@statquest 2 years ago
@oliveryoule11 Wow! Thanks for supporting me and I'm sorry you had trouble purchasing the notebook. If you contact me through my website, I can send it to you directly: statquest.org/contact/
@DanteNoguez 2 years ago
Double BAM! Haha, I love this guy
@statquest 2 years ago
Thank you!
@PinkFloydTheDarkSide 2 years ago
Somehow your room and furniture remind me of my grad building room at the Univ. of Chicago.
@statquest 2 years ago
Cool!
@shindepratibha31 4 years ago
I have almost completed the Machine learning playlist and it was really helpful. One request, can you please make a short video on 'handling the imbalanced dataset'?
@statquest 4 years ago
I've got a rough draft on that topic here: kzbin.info/www/bejne/n4Xbq4WMgdSHh5I
@Denise_lili 2 years ago
Thank you Josh for the nice videos! Questions: 1) What is accuracy? Is there a relationship between Gini impurity/Sum of squared residuals and accuracy (i.e. Lower Gini impurity means higher accuracy)? 2) Once we create a tree classifier with a certain alpha, will different training data sets give different fitted trees? And how are they different?
@statquest 2 years ago
Accuracy is the percentage of the data that are correctly classified. Generally, the lower the Gini index, the higher the accuracy. Different training datasets will probably give a different value for alpha, so it's good to use cross validation to find the best value.
@Denise_lili 2 years ago
@statquest Thank you Josh for the quick reply. This is very helpful! But AFTER the optimal alpha value is identified via cross validation, will different training datasets give different final pruned trees in the call below?
    clf_dt_pruned = clf_dt_pruned.fit(X_train, y_train)
@statquest 2 years ago
@Denise_lili Yes
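To make the two answers in this thread concrete, here is a small sketch under stated assumptions (synthetic data, an arbitrary fixed `alpha`, and three arbitrary split seeds). It computes accuracy by hand as the fraction of correct predictions, checks it against `accuracy_score`, and records the size of the tree fitted on each training set so the trees can be compared.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption, not the dataset used in the video).
X, y = make_classification(n_samples=300, n_features=8, random_state=42)
alpha = 0.01  # held fixed; stands in for the value chosen by cross validation

summaries = []
for seed in (0, 1, 2):  # each seed yields a different training set
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=seed)
    clf_dt_pruned = DecisionTreeClassifier(random_state=42, ccp_alpha=alpha)
    clf_dt_pruned = clf_dt_pruned.fit(X_train, y_train)

    y_pred = clf_dt_pruned.predict(X_test)
    # Accuracy is the fraction of samples whose predicted class matches the truth.
    manual_accuracy = float(np.mean(y_pred == y_test))
    assert np.isclose(manual_accuracy, accuracy_score(y_test, y_pred))

    summaries.append(
        (seed, clf_dt_pruned.tree_.node_count, clf_dt_pruned.get_depth(), manual_accuracy)
    )

for seed, n_nodes, depth, acc in summaries:
    print(f"seed={seed} nodes={n_nodes} depth={depth} accuracy={acc:.3f}")
```

Even with alpha fixed, the node counts, depths, and accuracies may differ from one training set to the next, because the splits are chosen from whatever rows the tree was given.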
@rogertea1857 3 years ago
Pruning is better than setting max_depth or min_samples beforehand overall I guess. Thanks for another great tutorial : )
@statquest 3 years ago
Thanks!
@bitseatomos 4 years ago
Congratulations! Ten times triple bam!!
@statquest 4 years ago
Hooray! :)
@Mustistics 2 years ago
Hey Josh. One thing that bugs me about this tutorial: when you do binary classification, you need to take into account class imbalance. Accuracy is the worst metric for this. Was that neglected for a reason?
@statquest 2 years ago
No, ideally we would take class imbalance into account.
@Mustistics 2 years ago
@statquest Thanks for the response. So theoretically, I can follow your tutorial exactly, but use recall wherever you use accuracy?
@statquest 2 years ago
@Mustistics Sure!
@Mustistics 2 years ago
@statquest You're the best (and I'm not just saying that)!
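Swapping recall in for accuracy could look roughly like the sketch below. The imbalanced synthetic data (about 80% negatives), the stratified split, and the alpha value are assumptions for illustration; the only change relative to an accuracy-based workflow is the `scoring="recall"` argument and the use of `recall_score` for the final check.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Imbalanced synthetic data: roughly 80% class 0 and 20% class 1 (an assumption).
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.8, 0.2], random_state=42
)
# stratify=y keeps the class ratio similar in the training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

# ccp_alpha=0.01 is an arbitrary illustrative value.
clf = DecisionTreeClassifier(random_state=42, ccp_alpha=0.01)

cv_accuracy = cross_val_score(clf, X_train, y_train, cv=5, scoring="accuracy")
cv_recall = cross_val_score(clf, X_train, y_train, cv=5, scoring="recall")

clf.fit(X_train, y_train)
test_recall = recall_score(y_test, clf.predict(X_test))  # recall for class 1

print(
    f"CV accuracy={cv_accuracy.mean():.3f} "
    f"CV recall={cv_recall.mean():.3f} test recall={test_recall:.3f}"
)
```

With imbalanced classes, accuracy can look fine while recall on the rare class stays low, which is the concern raised in this thread.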
@GokulSKumar-uz9dy 4 years ago
Great video sir. :) I just have a doubt about one part. At 52:14, instead of using X_train and y_train, aren't we supposed to use the entire dataset (i.e. X_encoded and y) when implementing cross-validation? Also, later in the video at 52:54, the value for alpha was found using only X_train and y_train in the cross-validation.
@statquest 4 years ago
There are different schools of thought about what datasets you should use for cross validation. The most common one, however, is to do it as presented in this video.
@GokulSKumar-uz9dy 4 years ago
@statquest Thanks a lot! Just in case it might be useful, I tried using the entire dataset in cross-validation before splitting it into train and test. I could see from the corresponding confusion matrix that the model correctly predicted 90% of the people without heart disease, whereas there was no increase in the percentage for people with heart disease. Again, loved the video a lot. Waiting for the next webinar. :)
@KeigoEdits 2 years ago
@statquest Hey Josh sir, after reading this comment I tried cross-validation with the whole dataset. I also read that you mentioned we should use the whole dataset when it is small, and I personally think a dataset with 297 data points can be called small. This gave me better results at alpha=0.021097, with accuracy varying between 0.74 and 0.88. What are your views on it?
@statquest 2 years ago
@GokulSKumar-uz9dy It really depends on how noisy your data is and what you hope to do with it.
@RAMAYATRI 3 years ago
I can see a tabla in the background.... Planning to use it in any future video?
@statquest 3 years ago
One day!
@BeSharpInCSharp 4 years ago
I wanted to learn DT from scratch, but it seems we should already know things like the confusion matrix here. I'd better study that first and come back to this video.
@statquest 4 years ago
Yep.
@HB-ys9rt 3 years ago
You're a great professor. One point that I'd like to ask for your opinion is that once you created dummy variables for a categorical variable with several levels, you did not remove one of the dimensions from X. Do you think that multicollinearity would be causing a bias in the model?
@statquest 3 years ago
No. Classification trees are not affected by multicollinearity.
@HB-ys9rt 3 years ago
@statquest thank you. You’re just incredible!
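For readers wondering what "not removing one of the dimensions" looks like in code, here is a toy sketch. The tiny DataFrame, its values, and the column names `age`, `cp`, and `hd` are hypothetical and only for illustration. `pd.get_dummies` is called with its default `drop_first=False`, so every category level keeps its own column, and the tree is fitted on all of them.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: one numeric column, one categorical column, one target.
df = pd.DataFrame({
    "age": [63, 67, 41, 56, 62, 57, 44, 52],
    "cp": [1, 4, 2, 3, 4, 4, 2, 3],   # categorical, four levels
    "hd": [0, 1, 0, 0, 1, 1, 0, 1],   # target
})

# drop_first is left at its default (False), so all four cp levels get a column.
X_encoded = pd.get_dummies(df[["age", "cp"]], columns=["cp"], dtype=int)
print(list(X_encoded.columns))

# The dummy columns are perfectly collinear (each row sums to 1), yet the tree
# fits without complaint because it only ever evaluates one column per split.
clf = DecisionTreeClassifier(random_state=42).fit(X_encoded, df["hd"])
print(clf.score(X_encoded, df["hd"]))
```

For models that estimate coefficients, such as linear or logistic regression, dropping one level is the more common choice; the reply above is specifically about trees.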
@InternatoMiguel 9 months ago
Hello Josh, thank you so much for another great video! Did you end up doing a webinar on imputing values? If so, where can I find it? :)
@statquest 9 months ago
Maybe. What time point in the video, minutes and seconds, are you asking about?
@InternatoMiguel 9 months ago
@statquest 18:29
@statquest 9 months ago
@InternatoMiguel Unfortunately, except for imputing data for Random Forests, I haven't covered that topic very much. However, if you are interested in how Random Forests do it... kzbin.info/www/bejne/qYKbaGOXibCkn68
@raindrop0405070 4 years ago
First, thank you. You explain complicated things in the easiest way, with visualization. But you should have a better microphone. I think I am going to keep watching your videos.
@statquest 4 years ago
Thanks for the tips!
@beibeima524 2 years ago
Hi Josh, thanks so much for the video! My question is: should we do one-hot encoding before or after splitting the data into training and testing sets? Thanks!
@statquest 2 years ago
As long as all categories are in both sets, it doesn't matter.
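The condition in that reply can be checked, and worked around, with a few lines of pandas. This is a sketch on hypothetical toy data (the column names `cp` and `hd` and their values are made up): encoding before the split guarantees identical columns, while encoding after the split may need the test columns aligned to the training columns in case a category is missing from one side.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: one categorical column with four levels and a target.
df = pd.DataFrame({"cp": [1, 2, 3, 4] * 5, "hd": [0, 1] * 10})

# Option 1: encode first, then split. Both sets share the same columns by design.
X_encoded = pd.get_dummies(df[["cp"]], columns=["cp"], dtype=int)
X_train, X_test, y_train, y_test = train_test_split(
    X_encoded, df["hd"], random_state=42
)
same_columns = list(X_train.columns) == list(X_test.columns)

# Option 2: split first, then encode each part. If a level is absent from one
# part, its column would be missing, so align the test set to the training set.
raw_train, raw_test = train_test_split(df[["cp"]], random_state=42)
enc_train = pd.get_dummies(raw_train, columns=["cp"], dtype=int)
enc_test = pd.get_dummies(raw_test, columns=["cp"], dtype=int).reindex(
    columns=enc_train.columns, fill_value=0
)
aligned = list(enc_test.columns) == list(enc_train.columns)
print(same_columns, aligned)
```

The `reindex` call fills any column missing from the test set with zeros and drops any test-only column, so the fitted tree always sees the columns it was trained on.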