XGBoost in Python from Start to Finish

234,845 views

StatQuest with Josh Starmer

Comments: 728
@statquest 4 years ago
NOTE: You can support StatQuest by purchasing the Jupyter Notebook and Python code seen in this video here: statquest.gumroad.com/l/uroxo Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
@PetStuBa 4 years ago
Dear Josh, I have a request for new videos or live chats: could you explain the Tukey, Bonferroni and Scheffé tests? They are hard for me to understand, and you explain everything so well. It could be very helpful for a lot of people out there. Have a nice day, greetings from Europe.
@znull3356 3 years ago
Please keep doing these long-form Python tutorials on the various ideas we've covered in earlier 'Quests. They're great for those of us working in Python, and they give me another way to support the channel. It has been a more-than-pleasant surprise that as I've grown from learning the basics of stats to machine learning and eventually deep learning, StatQuest has grown along with me into those very same fields. Thanks Josh.
@statquest 3 years ago
That's the plan!
@kelvinhsueh5434 3 years ago
You are amazing. Can't imagine how much work you put into those step-by-step tutorials. Just bought the Jupyter Notebook code and it's beyond worth it! Thank you :)
@statquest 3 years ago
Thank you very much for your support! :)
@zheyizhao4865 4 years ago
Hey Josh, I just purchased all 3 of your Jupyter Notebooks! I transferred from an Econ major to Data Science, and it was a nightmare before I found your channel. Your channel shed light on my academic career! I look forward to more of the 'Python from Start to Finish' series, and I will definitely support it!
@statquest 4 years ago
Awesome! Thank you!
@samxu5320 2 years ago
Your pronunciation is the most authentic and clearest that I have ever heard
@statquest 2 years ago
Wow! Thank you!
@abhinaym5923 4 years ago
I am purchasing the Jupyter notebook to contribute to your work! Thanks a lot for this video! You are awesome! Will be very very happy to have more ML tutorials and thank you Josh!
@statquest 4 years ago
Thank you very much! :)
@kd6600xt 2 years ago
Quick note: At 43:30, plot_confusion_matrix() is now deprecated; use ConfusionMatrixDisplay.from_estimator() instead. Add the line: from sklearn.metrics import ConfusionMatrixDisplay at the start with the other imports. Then, when drawing the confusion matrix, use the line: ConfusionMatrixDisplay.from_estimator(clf_xgb, X_test, y_test, display_labels=["Did not leave", "Left"])
@statquest 2 years ago
Thank you very much! I really appreciate it.
@kd6600xt 2 years ago
@@statquest no problem!
@emilioluissaenzguillen5719 1 year ago
I am getting this error following what you say: XGBoostError: [17:07:52] c:\buildkite-agent\builds\buildkite-windows-cpu-autoscaling-group-i-08de971ced8a8cdc6-1\xgboost\xgboost-ci-windows\src\c_api\c_api_utils.h:167: Invalid missing value: null Do you know why it might be? Thanks.
@emilioluissaenzguillen5719 1 year ago
I solved the above error by setting missing=0 in the code above, as follows: clf_xgb = xgb.XGBClassifier(objective='binary:logistic', missing=0, seed=42, ## the next three arguments set up early stopping. eval_metric='aucpr', early_stopping_rounds=10)
@benchen9910 1 year ago
@@emilioluissaenzguillen5719 thanks it works
@tantalumCRAFT 2 years ago
This is hands down the best Python tutorial on KZbin.. not just for XGBoost, but overall Python logic and syntax. Nice work, subscribed!!
@statquest 2 years ago
Wow! Thank you!
@OlgaW_Lavender 4 months ago
Outstanding! Complete with the thinking path - how to analyze variables in a logical way - and with common errors. Just purchased the Notebook. Thank you for all of your work on this channel.
@statquest 4 months ago
Triple bam!!! Thank you very much for supporting StatQuest! :)
@sudheerrao07 4 years ago
Wow. Finally I see a face for the name. Your previous videos have been immensely helpful. I assumed you were a very senior person. I am not measuring your age. I mean, your way of explaining seemed like a professor with half a century of experience. But in reality, you are quite young. Thank you for all your simple-yet-detailed videos. No words to quantify how much I appreciate them. 🙏
@statquest 4 years ago
Wow, thanks!
@RahulEdvin 4 years ago
Josh, you’re well and truly phenomenal ! Love from Madras !
@prashanthb6521 4 years ago
Chennai
@statquest 4 years ago
BAM! Thank you very much!!!
@starmerf 4 years ago
Hi Rahul, I taught at IIT-Madras 19192-1993 and lived on campus across from the post office. Josh visited us there.
@RahulEdvin 4 years ago
Frank Starmer Hello Frank, wow! That’s great to know ! :) I’m sure you must have had a good time here. Cheers :)
@prashanthb6521 4 years ago
@@starmerf wow the world is a small place ☺
@socksdealer 22 days ago
I understand you better than if it would be explained in my native language) Thank you for your work!
@statquest 22 days ago
Thank you!
@janskovajsa237 4 months ago
41:21 In my version of xgboost, the parameters early_stopping_rounds=10 and eval_metric='aucpr' have moved from fit() to the XGBClassifier constructor, so if the code in the video is not working I suggest trying that. Although I really appreciate the value of StatQuest videos, I have to admit I really hate all that singing and the Bams. It makes me feel like I am attending a school for slower children.
@statquest 4 months ago
Sorry to hear that.
@ovrava 14 days ago
@@statquest i like it. its cool.
@RahulVarshney_ 4 years ago
"25:36" that's what i was waiting for from the beginning...Truly amazing.. You are providing precious information..CHEERS
@statquest 4 years ago
Glad it was helpful!
@RahulVarshney_ 4 years ago
@@statquest one small request..can you provide some valuable information through a video like which model to chose for different datasets..how do we decide what model we should chose...thanks in advance
@statquest 4 years ago
@@RahulVarshney_ I'll keep that in mind. In the mean time, check out: scikit-learn.org/stable/tutorial/machine_learning_map/index.html
@RahulVarshney_ 4 years ago
@@statquest that is amazing ...i will complete it today itself thanks again for your prompt reply Can i get your email
@markrauschkolb5370 3 years ago
Extremely helpful - would love to see more from the "start to finish" series
@statquest 3 years ago
I'm working on it.
@navyasreepinjala1582 2 years ago
I love your teaching style. Extremely helpful for a beginner like me. Really helped me a lot in my exams. No words. You are the best!!!!
@statquest 2 years ago
Thank you!
@jinwooseong2862 3 years ago
I watched all your videos on XGBoost. They helped me a lot. Much appreciated!
@statquest 3 years ago
Glad it helped!
@julieirwin3288 4 years ago
What did we do to deserve a great guy like Josh ? Thank you Josh!
@statquest 4 years ago
Thanks! :)
@cszthomas 2 years ago
Thank you for the great work!
@statquest 2 years ago
Wow! Thank you so much for supporting StatQuest!!! BAM! :)
@AdamsJamsYouTube 2 years ago
Josh, this video is epic and really helped me understand the actual process of tuning hyperparameters, something that had been a bit of a black box until I saw this video. Your channel is awesome too - great jingles as well :D
@statquest 2 years ago
Thank you!
@fernandes1431 2 years ago
Can't thank you enough for the clearest and best explanation on KZbin
@statquest 2 years ago
Thank you!
@sane7263 1 year ago
That's the Best video I've ever seen. Period. TRIPLE BAM! :)
@statquest 1 year ago
Wow, thanks!
@romanroman5226 3 years ago
Awesome video! The cleanest XGBoost explanation I have ever seen.
@statquest 3 years ago
Wow, thanks!
@SergioPolimante 3 years ago
This kind of content is SUPER HARD to produce. I really understand and appreciate your effort here. Thanks and congratulations.
@statquest 3 years ago
Thank you very much!
@josephhayes9152 3 years ago
Thanks for the great tutorial! You covered a lot of details (mostly data cleaning) that are often overlooked or skipped as 'trivial' steps.
@statquest 3 years ago
Thank you! Yes, "data cleaning" is 95% of the job.
@darksoul1381 4 years ago
I was wondering how to find stuff regarding dealing with actual churn data and sampling issues. The tutorial addressed a lot of them. Thanks!
@statquest 4 years ago
Thanks!
@Krath1988 4 years ago
Liked, favorited, recommended, shared, and sacrificed my first-born to this video.
@statquest 4 years ago
TRIPLE BAM! :)
@mykindofgaming7345 6 months ago
😂😂😂
@VarunKumar-pz5si 3 years ago
I'm very grateful to have you as my teacher.
@statquest 3 years ago
Thanks!
@maurosobreira8695 3 years ago
A true, real Master Class - You got my support!
@statquest 3 years ago
Thank you! :)
@KukaKaz 4 years ago
Yes pls more videos with python❤thank u for the webinar
@statquest 4 years ago
Thanks! :)
@thomsondcruz 1 year ago
Absolutely loved this video Josh. It breaks down everything into understandable chunks. Thank you and God bless. BAM! The only thing I missed (and its very minor) was taking in a new data row and making an actual prediction by using the model.
@statquest 1 year ago
Thanks! For new data, you just call clf_xgb.predict() with the row of new data.
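A rough sketch of scoring one new customer follows. The practical catch is that the new row has to be one-hot encoded into the same columns, in the same order, as the training data. The tiny DataFrame, its column names, and the DecisionTreeClassifier stand-in are all assumptions for illustration; with the video's model, the last call would be clf_xgb.predict(X_new).

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Tiny illustrative training set (assumption, not the Telco data).
train = pd.DataFrame({
    "Tenure_Months": [1, 40, 3, 60, 2, 24],
    "Contract": ["Monthly", "Two_Year", "Monthly", "One_Year", "Monthly", "Two_Year"],
    "Churn_Value": [1, 0, 1, 0, 1, 0],
})
X_train = pd.get_dummies(
    train.drop(columns="Churn_Value"), columns=["Contract"], dtype=int
)
y_train = train["Churn_Value"]

# Stand-in model (assumption): any fitted classifier with predict() works here.
clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# One new, unseen customer.
new_customer = pd.DataFrame({"Tenure_Months": [5], "Contract": ["Monthly"]})
X_new = pd.get_dummies(new_customer, columns=["Contract"], dtype=int)

# A single row only produces the dummy columns it contains, so reindex to the
# training columns and fill the absent categories with 0.
X_new = X_new.reindex(columns=X_train.columns, fill_value=0)

prediction = clf.predict(X_new)
print(prediction)
```

Keeping X_train.columns around (or saving it next to the model) is what makes the reindex step possible later.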
@francovega7089 2 years ago
I really appreciate your content Josh. Thanks for your time
@statquest 2 years ago
Thank you!
@henkhbit5748 4 years ago
Greatly appreciated this video. Like you said, telcos should put more effort into retaining their current customers. In real practice you want to know the probability that a current customer will not renew the subscription. You should then try to retain the high-risk customers with incentives.
@statquest 4 years ago
True!
@danielmagical6298 4 years ago
Hi Josh, great job, really helpful material as I'm discovering XGBoost just now. Thank you and keep up the great work!
@statquest 4 years ago
Thank you very much! :)
@aksharkottuvada 2 years ago
Thank you Josh. Needed this tutorial to better solve a ML Problem as part of my internship :)
@statquest 2 years ago
Glad it helped!
@minseong4644 3 years ago
Such an amazing job Josh.. Couldn't find any better explanation than this! Mesmerizing!
@statquest 3 years ago
Wow, thanks!
@muskanroxx22 3 years ago
You're a very kind human being Josh!! Thank you so much for making these videos. Your content is gold!!! I am new to data science and this is exactly what I needed!! :) Much love from India!
@statquest 3 years ago
Glad you like my videos!! BAM! :)
@muskanroxx22 3 years ago
@@statquest Hey Josh! I am learning about Bayesian Optimizer and I don't seem to get it even after watching tons of tutorials, can you suggest where I should learn it from please? I couldn't find a video on your channel on this.
@statquest 3 years ago
@@muskanroxx22 Unfortunately I don't know of a good source for that.
@daniloyukihara2143 3 years ago
hurray, i picture you totally different! Thanks a lot for all the videos!
@statquest 3 years ago
Glad you like them!
@parismollo7016 4 years ago
I haven't watched it yet but I know this will be great!!!!!!!! Thank you Josh.
@statquest 4 years ago
BAM! :)
@codinghighlightswithsadra7343 1 year ago
Thank you so much for the work that you used in step by step tutorial. it was amazing.
@statquest 1 year ago
You're very welcome!
@marekslazak1003 2 years ago
Jesus, I just learned more in 10 minutes of this than I did throughout an entire semester of a similar subject in CS. ++ tutorial
@statquest 2 years ago
Thank you!
@godoren 4 years ago
Thank you for your job, the explanation of the topic is very clear and transparent.
@statquest 4 years ago
Thank you very much! :)
@dagma3437 4 years ago
I'm so glad you are a bad-ass stats guru and a teacher waaaaaaaaay before a singer and a guitarist ...Thank you! ;)
@statquest 4 years ago
joshuastarmer.bandcamp.com/
@dagma3437 4 years ago
StatQuest with Josh Starmer ...not bad. A poor man’s Jack Johnson 🤔
@dagma3437 4 years ago
Just pulling your leg. Thanks for all the content on stats
@keyurshah8451 2 years ago
Hey Mate, amazing tutorial. Very complex problem explained in really simple and effective way. I am using XGBOOST for one of the classification model and after watching your video it made me realise I can further improve my model. So thank you again and keep making those videos. Kudos to you and long live data science 🙏🙏
@statquest 2 years ago
Glad it helped!
@marcelocoip7275 2 years ago
Hard work here. It's funny how the responsible scientist and the funny guy coexist. Very useful lesson, thanks!
@statquest 2 years ago
Thanks! 😃!
@ketanshetye5029 4 years ago
Could not help you with money right now, but I watched all the ads in the video; hope that helps you financially. Love your videos. Keep it up!!
@statquest 4 years ago
I appreciate that
@sreejaysreedharan4085 4 years ago
Lovely and priceless video Josh...BAM BAM BAM as usual !! :) God bless. .
@statquest 4 years ago
Thank you very much! :)
@Toyotaman 1 year ago
38:05 In stratify=y, the y does not stand for "yes"; it is the dependent variable y. If your response variable has a different name, you have to pass that variable to stratify.
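The point about stratify can be checked with a minimal sketch on made-up labels (the 75/25 class balance and the array names are assumptions). Passing the label array to stratify keeps the class proportions the same in the training and testing sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up imbalanced labels (assumption): 75 customers stayed, 25 left.
y = np.array([0] * 75 + [1] * 25)
X = np.arange(100).reshape(-1, 1)

# stratify takes the array of class labels, whatever that variable is called.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# The fraction of positives is preserved in both splits.
print(y.mean(), y_train.mean(), y_test.mean())
```

If the labels were stored in a variable called churn, the call would read stratify=churn.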
@statquest 1 year ago
Oops! thanks for catching that.
@jimmyrico5364 4 years ago
This is a great piece of work, thanks for sharing it! The only additional piece I'd add, which I've found useful in the XGBoost documentation, is that one can take advantage of parallel computing (more cores on your own machine or in the cloud) by simply passing the parameter n_jobs=-1 both to RandomizedSearchCV and to the XGBoost estimator (XGBRegressor, for example).
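As a hedged illustration of the n_jobs tip, the sketch below runs a small randomized search across all available CPU cores. The RandomForestClassifier is a stand-in so the example depends only on scikit-learn, and the data and parameter grid are assumptions; an XGBoost estimator could be dropped into the same estimator slot.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# Illustrative search space (assumption), three values per hyperparameter.
param_distributions = {
    "max_depth": [3, 4, 5],
    "n_estimators": [20, 40, 60],
    "min_samples_leaf": [1, 5, 10],
}

search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=42),  # stand-in estimator
    param_distributions=param_distributions,
    n_iter=5,
    scoring="roc_auc",
    cv=3,
    n_jobs=-1,        # run the candidate fits on all available CPU cores
    random_state=42,
)
search.fit(X, y)
print(search.best_params_)
```

Setting n_jobs=-1 on both the search and the estimator can oversubscribe the CPU, so it may be worth timing both configurations.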
@statquest 4 years ago
Great tip! BAM!
@mdaroza 3 years ago
Amazingly organized and well explained!
@statquest 3 years ago
Thank you!
@chiragpalan9780 4 years ago
This guy is amazing. DOUBLE BAM 💥 💥
@statquest 4 years ago
Thank you! :)
@saeedesmailii 4 years ago
It was extremely helpful. Please continue making these videos. I suggest making a video to explain the clustering with unlabeled data, and predicting the future trend in time-series data.
@statquest 4 years ago
I'll keep that in mind. :)
@andrewxie9896 3 years ago
you are simply an amazing human being, also the notebooks are great! :D
@statquest 3 years ago
Thanks!
@danielpinzon9284 4 years ago
Love u Josh.... you are a TRIPLE BAM!!! Greetings from Bogotá, Colombia.
@statquest 4 years ago
Thank you very much!!! :)
@nehabalani7290 4 years ago
Good to also see you sing rather than just hear :).. i had to comment this even before starting the training
@statquest 4 years ago
😊 thanks
@jessehe9286 4 years ago
Great video! Love it! request that you do a comparison of XGBoost, CatBoost, and LightGBM, and a quest on ensemble learning.
@statquest 4 years ago
I'll keep those topics in mind.
@Fressia94 4 years ago
Many thanks for your great and so understandable video. It literally helps me a lot with Python and the XGBoost package.
@statquest 4 years ago
Glad it helped!
@felixwhise4165 4 years ago
just here to say thank you! will come back in a month when I have time to watch it. :)
@statquest 4 years ago
BAM! :)
@dillonmears6696 2 years ago
Great video! You did a wonderful job of explaining the process. Thanks!
@statquest 2 years ago
Thanks!
@PradeepMahato007 4 years ago
BAMMMMM !!! This is awesome 👍 Josh !! Thank you for your contribution, really helpful for new learners.😊😊😊
@statquest 4 years ago
Glad you liked it!
@statquest 3 years ago
@@salilgupta9427 Thanks!
@felixzhao3435 2 years ago
Thanks!
@statquest 2 years ago
WOW! Thank you so much for supporting StatQuest!!! BAM! :)
@pacificbloom1 3 years ago
Wonderful video Josh..... pleasee pleasee pleasee make more start-to-finish videos on Python for different models..... I have actually submitted my assignments using your techniques and got better results than with what I learned in my class. Waiting for more to come, especially on Python :)
@statquest 3 years ago
Thanks! There should be more python coming out soon.
@hollyching 4 years ago
Thanks Josh for another GREAT video! Just some sharing and minor questions.
1. Try pandas_profiling when doing EDA. I personally love it. :)
2. Some features are highly correlated (e.g., city name and zip code). Do we need to handle that before running XGB?
3. Why choose 10 for early_stopping_rounds?
4. What's the difference between df.loc[df['Total_Charges']==' '] and df[df['Total_Charges']==' ']?
5. What's the difference between y=df['Churn_Value'].copy() and y=df['Churn_Value']?
Many thanks in advance! H
@statquest 4 years ago
1) Thanks for the tip on pandas_profiling.
2) No.
3) It's a commonly used number.
4) I don't know.
5) I believe the former is copy by value and the latter is copy by reference.
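Questions 4 and 5 can be explored with a small pandas sketch (the toy DataFrame is an assumption). With a boolean mask, df.loc[mask] and df[mask] select the same rows, and .copy() gives an independent Series whose later edits do not reach the original DataFrame. Note that .copy needs the parentheses to actually be called.

```python
import pandas as pd

# Toy data (assumption) with blank strings standing in for missing charges.
df = pd.DataFrame({
    "Total_Charges": ["29.85", " ", "108.15", " "],
    "Churn_Value": [0, 1, 0, 1],
})

# Question 4: both forms filter rows with the same boolean mask.
mask = df["Total_Charges"] == " "
same_rows = df.loc[mask].equals(df[mask])
print(same_rows)

# Question 5: .copy() returns an independent Series, so changing y
# leaves df untouched.
y = df["Churn_Value"].copy()
y.iloc[0] = 99
print(df["Churn_Value"].iloc[0], y.iloc[0])
```

What happens without .copy() depends on the pandas version and its copy-on-write settings, which is one reason the explicit copy is the safer habit.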
@NLarsen1989 3 years ago
Yikes, if I ever understand something enough to explain it as succinctly as you do then I'd be very happy. I've been smashing through a lot of your videos the last few days after spending countless months on python, sklearn and all the usual plug and play solutions and it's not been until I've started watching these that I've started to feel things click into place
@statquest 3 years ago
Awesome! I'm glad my videos are helpful! :)
@gisleberge4363 2 years ago
Appreciate the Python related videos...helps to manoeuvre the code when I try to replicate the method later on...easy to follow the whole thing, also for beginners... 🙂
@statquest 2 years ago
Thanks! There will be a lot more python stuff soon.
@viniantunes5944 4 years ago
Josh, you're the didactic in person form. Thanks!
@statquest 4 years ago
I appreciate that!
@trendytrenessh462 3 years ago
It is really lovely to be able to put a face to the "Hooray!", "BAM !!!" and "Note:"s 😄❤
@statquest 3 years ago
bam!
@azingo2313 1 year ago
This man deserves Nobel Prize for peace of mind ❤❤
@statquest 1 year ago
bam! :)
@williamTjS 2 years ago
Amazing! Thanks so much for the detailed video
@statquest 2 years ago
Thanks!
@miguelbarajas9892 2 years ago
Freaking amazing! You explain everything so well. Thank you!
@statquest 2 years ago
Thank you!
@coolmusic4meyee 10 months ago
Great explanation and walk-through, big thanks!
@statquest 10 months ago
Glad you enjoyed it!
@karannchew2534 3 years ago
51:41 Send them StatQuest coupons instead. Better alternative to milkshake coupon.
@statquest 3 years ago
Bam! :)
@abdulkayumshaikh5411 3 years ago
Hello josh, you are doing amazing work keep doing
@statquest 3 years ago
Thanks!
@haskycrawford 4 years ago
I love the channel! I learn more here than in my undergraduate degree! You're great, Josh!
@statquest 4 years ago
Thank you very much! :)
@marceloherdy2379 4 years ago
Man, this video is awesome! Congratulations!
@statquest 4 years ago
Thank you! :)
@miloszpabis 8 months ago
Love these Python tutorials after watching theory videos:D
@statquest 8 months ago
Glad you like them!
@nikhilshaganti5585 2 years ago
Thank you for this great tutorial Josh! Your videos have helped immensely in understanding some of the complex topics. One thing I noticed while watching this tutorial is the handling of categorical features. I think the explanation you gave for "Why not to use LabelEncoding?" is applicable to models like Linear Regression, SVM, and NN, but not to trees, because they only focus on the order of the feature values. For example, in a set of [1,2,3,4], threshold < 1.5 would be equivalent to threshold == 1. Please let me know if my thought process is wrong.
@statquest 2 years ago
To be honest, I'm not really sure I understand your question or your example. If we have a categorical feature with 4 colors, red, blue, green, and black, and we give them numbers, like 1, 2, 3 and 4, then a threshold < 2.5, would not make much sense and, based on how trees are implemented, there would be no options for threshold == 2 or threshold == 3. So we wouldn't be able to separate colors very well.
@nikhilshaganti5585 2 years ago
right, in your example, the threshold of threshold
@statquest 2 years ago
@@nikhilshaganti5585 Sure, you can continue to separate things in later branches - but the greedy nature of the algorithm doesn't ensure that you'll get to those later branches. So you start by making a guess that it makes sense to group red and blue together.
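The disagreement in this thread can be made concrete with a small pure-Python sketch. It lists every grouping that a single threshold can produce when the four colors from the reply above are label-encoded as 1 to 4, and compares that with one-hot columns. The helper name threshold_splits is hypothetical.

```python
# Label encoding from the example in the thread.
colors = {"red": 1, "blue": 2, "green": 3, "black": 4}

def threshold_splits(codes):
    """Return every (threshold, left, right) grouping from one numeric split."""
    values = sorted(set(codes.values()))
    splits = []
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2  # midpoint between adjacent codes
        left = sorted(n for n, c in codes.items() if c < threshold)
        right = sorted(n for n, c in codes.items() if c >= threshold)
        splits.append((threshold, left, right))
    return splits

label_splits = threshold_splits(colors)
for threshold, left, right in label_splits:
    print(f"code < {threshold}: {left} | {right}")

# Colors that one split on the label-encoded feature can isolate by themselves.
isolated_by_label = set()
for _, left, right in label_splits:
    if len(left) == 1:
        isolated_by_label.add(left[0])
    if len(right) == 1:
        isolated_by_label.add(right[0])

# With one-hot encoding, each color has its own 0/1 column, so a single split
# on that column isolates any color.
isolated_by_one_hot = set(colors)
print(isolated_by_label, isolated_by_one_hot)
```

Only the two end codes can be separated in one step with label encoding; the middle colors need a second split further down the tree, which the greedy algorithm may or may not reach.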
@andrewnguyen5881 4 years ago
Again another quality video. I was following along with your every word, which did bring up some questions:
1. When XGBoost deals with missing data, does it ever consider splitting the missing data in half? Using your example, would it ever do 1 blue and 1 green? What would happen if XGBoost encountered a data set with a lot of missing values?
2. When you ran your Cross-Validation, was there a reason you only used 3 values for each hyperparameter? Could you have done more if you wanted to?
3. When I ran my Cross-Validation, my scale_pos_weight didn't change even though I used the same parameters you did. What do you think the problem could be?
@statquest 4 years ago
1. Not that I know of. If there were a lot of missing values, it would still proceed just as described.
2. I wanted the cross validation to run in a short period of time, so I picked 3 values for each hyperparameter. If I had more time on my hands, or a cluster of computers, I might have considered trying more.
3. I'm not sure.
@andrewnguyen5881 4 years ago
@@statquest Gotcha! Thank you for actually answering my questions haha I really appreciate the help as someone getting into more Machine Learning. Not sure what your upcoming videos will be but I think some great videos would be: -How you did the Cross Validation for the hyper parameters in this video? ( I have watched your Cross-Validation video, but I want to learn more about actually doing it) [I also still can get my first run to output the same values you had lol] -XGBoost for Regression in Python -How to select the best algorithm based on the scenario for Regression or Classification Just some thoughts :)
@statquest 4 years ago
@@andrewnguyen5881 I'll keep those topics in mind.
@aaltinozz 4 years ago
all week searched for this thank u very much
@statquest 4 years ago
Enjoy!
@Justjemming 1 year ago
Prof Starmer, I’ve a question regarding what you said at 31:30 on OHE not working for regression models. Would you be able to kindly explain how to encode categorical data before training these models then?
@statquest 1 year ago
Sure, see: kzbin.info/www/bejne/eaKveKmtnpJohsU
@kandiahchandrakumaran8521 11 months ago
Amazing. Wonderful videos. I started only 3 months ago and with your videos I am very confident doing analysis with Python. Many thanks. Is it possible to create a video on nomograms for competing risks in time-to-event (survival) analysis based on CPH outputs?
@statquest 11 months ago
I'll keep that topic in mind.
@HardikShah17 1 year ago
Excellent Video @StatQuest ! Can we please have more Start to Finish python videos? Like Lightgbm maybe?
@statquest 1 year ago
I'll keep that in mind! :)
@FlexCrush1981 4 years ago
Very enjoyable webinar Josh. Thanks for posting. I'm not 100% sure how to interpret the leaves. The largest leaf value is 0.188 where Dependents_No
@statquest 4 years ago
The leaves are how much to increase or decrease the log(odds) for one category. For more details, see: kzbin.info/www/bejne/bpOUe3h6q8qhh7c
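To make the log(odds) interpretation tangible, here is a small worked sketch that converts a leaf value into a probability. It assumes a single tree and a starting prediction of 0.5 (log(odds) of 0), and it takes the 0.188 leaf value quoted in the question above; a real model would sum the leaf values from every tree before converting.

```python
import math

def log_odds_to_probability(log_odds):
    """Logistic function: maps log(odds) to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-log_odds))

initial_log_odds = 0.0   # assumption: initial prediction of 0.5
leaf_value = 0.188       # leaf value quoted in the question above

# The leaf shifts the log(odds); the logistic function turns it into a probability.
probability = log_odds_to_probability(initial_log_odds + leaf_value)
print(round(probability, 3))
```

A positive leaf value nudges the predicted probability of leaving above 0.5, and a negative one pushes it below.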
@eeera-op8vw 2 months ago
awesome!! I hope you can do a video about XGBoost with regression too.
@statquest 2 months ago
I'll keep that in mind.
@shazm4020 3 years ago
Thank you so much Josh Starmer! BAM!
@statquest 3 years ago
bam!
@k44zackie 3 years ago
Thank you very much for nice video! Very helpful for me.
@statquest 3 years ago
Glad it was helpful!
@CharlotteWilson-j9d 1 year ago
First you’ve saved me this is super clear! I love all your videos so much 😊 I do have two questions… 1. How would you handle a classification problem with time series data? 2. Is there any other evaluation test you should or could do to evaluate the effectiveness of your model?
@statquest 1 year ago
1. I've never used XGBoost with time series (or done much of any time series stuff before), so I can't answer this question. 2. There are lots of ways to evaluate a model. I only present a few, but there are many more, and they really depend on what you want your model to do. Just google it.
@yurimartins1499 4 years ago
Thank you Josh!! As a suggestion, you could do a StatQuest explaining the measures in market basket analysis?
@statquest 4 years ago
I'll keep that in mind.
@fgfanta 4 years ago
This is gold, thank you! I am a rookie of this stuff, still I am unsure one-hot encoding is the best to do, especially to encode the city; being a category with high cardinality, all those variables for 1-hot encoding will require many splits (I guess). Perhaps using a different encoding, like mean encoding or frequency encoding, would be better, may allow to have a good fit with fewer splits.
@statquest 4 years ago
Maybe. Try it out and let me know if you get something that works better.
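For anyone who wants to try the suggestion, here is a minimal sketch of frequency encoding next to one-hot encoding for a city column. The toy city values are an assumption; whether this helps XGBoost on the churn data is an open question that the thread leaves untested.

```python
import pandas as pd

# Toy high-cardinality column (assumption): real data would have many cities.
df = pd.DataFrame({
    "City": ["Los Angeles", "San Diego", "Los Angeles",
             "Fresno", "Los Angeles", "San Diego"],
})

# One-hot encoding: one new column per distinct city.
one_hot = pd.get_dummies(df["City"], dtype=int)

# Frequency encoding: a single column holding each city's share of the rows.
city_share = df["City"].value_counts(normalize=True)
frequency_encoded = df["City"].map(city_share)

print(one_hot.shape[1], "one-hot columns vs 1 frequency column")
print(frequency_encoded.tolist())
```

Frequency encoding keeps the feature count fixed, at the cost of giving cities with equal counts the same value.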
@thiagotanure2212 3 years ago
amazing tutorial Josh! Shared with my friends =D Could you do one of these about pygam? It would be amazing :)
@statquest 3 years ago
I'll keep that in mind.
@imranselim5978 3 years ago
Thank you for the excellent content and walk through. While One Hot Encoding the categorical variables should not we use k-1 variables for k categories? For example, for the Payment_Method column if Mailed_Check=0, Electronic_Check=0, and Bank_Transfer=0 doesn't that imply Credit_Card=1 and make Credit_Card column redundant?
@statquest 3 years ago
In this setting it doesn't matter - unlike linear models, xgboost doesn't have to worry about inverting a matrix to solve for parameters.
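The k versus k-1 question can be seen directly in pandas. The sketch below, on a toy Payment_Method column taken from the question (the rows themselves are an assumption), shows that drop_first=True removes one redundant column, which matters for linear models but, per the reply above, not for XGBoost.

```python
import pandas as pd

# Toy column with the four payment methods named in the question.
df = pd.DataFrame({
    "Payment_Method": ["Mailed_Check", "Electronic_Check",
                       "Bank_Transfer", "Credit_Card"],
})

# k columns: every category gets its own indicator.
full = pd.get_dummies(df, columns=["Payment_Method"], dtype=int)

# k-1 columns: the first category (in sorted order) is dropped and is implied
# when all remaining indicators are 0.
reduced = pd.get_dummies(
    df, columns=["Payment_Method"], drop_first=True, dtype=int
)

print(full.shape[1], reduced.shape[1])
```

Each row of the full encoding sums to 1, which is exactly the linear dependence that forces the k-1 convention in regression.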
@ArunKumar-fg1yj 3 years ago
Hello Josh, I am trying to follow your steps. However, plot_confusion_matrix(clf_xgb, X_test, y_test, values_format = 'd', display_labels=['Did not leave','Left']) Throw an below error message:- XGBoostError: [09:28:33] c:\users\administrator\workspace\xgboost-win64_release_1.5.1\src\c_api\c_api_utils.h:161: Invalid missing value: null Can you please tell me what i am doing wrong
@statquest 3 years ago
Are you using the code in the jupyter notebook or your own?
@arijitdas4504 4 years ago
If "Stay Cool" had a face, it'd be you :)
@statquest 4 years ago
Bam!
@matattz 1 year ago
hey first of all thank you so much for all your videos! I understood everything and i am hyped about trying out what i have learned, but i have one question. So after we built our model and we are happy with how it performs, how do we feed it with actual new unseen data? If i have for example just the data for one specific customer and i want to check if he/she leaves or stays, what would the code look like.
@statquest 1 year ago
Just use "clf_xgb.predict(YOUR DATA)", where "YOUR DATA" is formatted in the same order as the data used for training.
@matattz 1 year ago
@@statquest oh god of course thank you! I had a blackout for a moment haha
@nalidbass 3 years ago
Josh, won't there be target leakage when we evaluate 'aucpr' on the testing dataset to determine the number of trees?
@statquest 3 years ago
Yes. We probably should have done that with cross validation and just the training data.
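One simple way to avoid the leakage discussed here, sketched under assumptions, is to carve a validation set out of the training data and use that for early stopping, so the test set is only touched for the final evaluation. The split sizes, toy arrays, and variable names below are illustrative; cross validation on the training data, as the reply suggests, is the more thorough option.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (assumption): 200 customers, 25% of whom left.
X = np.arange(200).reshape(-1, 1)
y = np.array([0] * 150 + [1] * 50)

# First split: hold out the test set for the final evaluation only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Second split: take a validation set from the training data. This is what
# would be passed as eval_set=[(X_val, y_val)] for early stopping.
X_fit, X_val, y_fit, y_val = train_test_split(
    X_train, y_train, test_size=0.2, random_state=42, stratify=y_train
)

print(len(X_fit), len(X_val), len(X_test))
```

The number of trees is then chosen without ever looking at X_test, so the test-set metrics stay an honest estimate.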
@BiffBifford 4 years ago
I am not a math geek. I am here strictly for the intro song!
@magtazeum4071 4 years ago
same here
@statquest 4 years ago
DOUBLE BAM! :)
@rishavpaudel7591 4 years ago
@@statquest Sir, your double and triple bams have really taught me lots of things, to be honest. As a student doing a postgraduate degree in AI, lots of love from Nepal.
@tas3159 4 years ago
Thanks for that very clear tutorial. A question: at 21:25, why do you use loc and not iloc?
@statquest 4 years ago
iloc can only take integers for indexes, and here we are using booleans. So if we wanted to use iloc, we'd have to convert the booleans to integers.
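A small sketch of the loc versus iloc point, on a toy DataFrame (an assumption): a boolean Series mask works directly with loc, which aligns on labels. As far as I know, iloc, being position-based, rejects a boolean Series but does accept a plain boolean NumPy array, so the mask has to be converted first.

```python
import pandas as pd

# Toy data (assumption) with a non-default index to keep labels and
# positions visibly different.
df = pd.DataFrame(
    {"Total_Charges": ["29.85", " ", "108.15"]},
    index=["a", "b", "c"],
)

mask = df["Total_Charges"] == " "   # boolean Series indexed by label

# loc accepts the boolean Series as-is.
by_label = df.loc[mask]

# iloc works by position, so hand it the underlying boolean array instead.
by_position = df.iloc[mask.to_numpy()]

print(by_label.equals(by_position))
```

Both selections return the same single row, so loc is simply the shorter way to write it.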
@tas3159 4 years ago
@@statquest kind of a low level question in a high level tutorial. Thanks for taking the time to answer.
@sunsiney7014 2 years ago
Great video! Very informative and clearly explained! Could you please also present BART?
@statquest 2 years ago
I'll keep that in mind.
@its_me7363 4 years ago
Can we use xgboost for multilabel classification? If yes, what parameters should be changed?
@statquest 4 years ago
Yes, change the loss function.
@juliakuchno7308 9 months ago
Hey! Thank you so much for all the work you have been doing! I have a question regarding the leaves values if we use xgboost in a regression problem. What do they mean then? Is it probability that the average of the observations that were segregated to that particular leaf has this and this probability of contributing to the loss function? Thank you so much for your help!
@statquest 9 months ago
see: kzbin.info/www/bejne/haWnaaqMlqugbKc