NOTE: You can support StatQuest by purchasing the Jupyter Notebook and Python code seen in this video here: statquest.gumroad.com/l/uroxo Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
@PetStuBa4 жыл бұрын
Dear Josh ... I have a request for new videos or livechats ... could you explain us these tests maybe ? ... Tukey, Bonferroni and Scheffé , it's hard for me to understand , you explain everything so well ... could be very helpful for a lot of people out there ... have a nice day , greetings from Europe
@znull33563 жыл бұрын
Please keep doing these long-form Python tutorials on the various ideas we've covered in earlier 'Quests. They're great for those of us working in Python, and they give me another way to support the channel. It has been a more-than-pleasant surprise that as I've grown from learning the basics of stats to machine learning and eventually deep learning, StatQuest has grown along with me into those very same fields. Thanks Josh.
@statquest3 жыл бұрын
That's the plan!
@kelvinhsueh54343 жыл бұрын
You are amazing. Can't imagine how much work you put into those step-by-step tutorials. Just bought the Jupyter Notebook code and it's beyond worth it! Thank you :)
@statquest3 жыл бұрын
Thank you very much for your support! :)
@zheyizhao48654 жыл бұрын
Hey Josh, I just purchased all 3 of your Jupyter Notebooks! I transferred from an Econ major to Data Science, and it was a nightmare before I found your channel. Your channel shed light on my academic career! Looking forward to more of the 'Python from Start to Finish' series, and I will definitely support it!
@statquest4 жыл бұрын
Awesome! Thank you!
@samxu53202 жыл бұрын
Your pronunciation is the most authentic and clearest that I have ever heard
@statquest2 жыл бұрын
Wow! Thank you!
@abhinaym59234 жыл бұрын
I am purchasing the Jupyter notebook to contribute to your work! Thanks a lot for this video! You are awesome! Will be very very happy to have more ML tutorials and thank you Josh!
@statquest4 жыл бұрын
Thank you very much! :)
@kd6600xt2 жыл бұрын
Quick note: At 43:30, instead of using plot_confusion_matrix(), which is now deprecated, you need to use ConfusionMatrixDisplay.from_estimator(). This can be done as follows. Include: from sklearn.metrics import ConfusionMatrixDisplay at the start with the other imports. Then, when drawing the confusion matrix, use the line: ConfusionMatrixDisplay.from_estimator(clf_xgb, X_test, y_test, display_labels=["Did not leave", "Left"])
@statquest2 жыл бұрын
Thank you very much! I really appreciate it.
@kd6600xt2 жыл бұрын
@@statquest no problem!
@emilioluissaenzguillen5719 Жыл бұрын
I am getting this error following what you say: XGBoostError: [17:07:52] c:\buildkite-agent\builds\buildkite-windows-cpu-autoscaling-group-i-08de971ced8a8cdc6-1\xgboost\xgboost-ci-windows\src\c_api\c_api_utils.h:167: Invalid missing value: null Do you know why it might be? Thanks.
@emilioluissaenzguillen5719 Жыл бұрын
I solved the above error by setting missing=0 on the above code as follows: clf_xgb = xgb.XGBClassifier(objective='binary:logistic', missing=0, seed=42, ## the next three arguments set up early stopping. eval_metric='aucpr', early_stopping_rounds=10)
@benchen9910 Жыл бұрын
@@emilioluissaenzguillen5719 thanks it works
@tantalumCRAFT2 жыл бұрын
This is hands down the best Python tutorial on YouTube, not just for XGBoost, but for overall Python logic and syntax. Nice work, subscribed!!
@statquest2 жыл бұрын
Wow! Thank you!
@OlgaW_Lavender4 ай бұрын
Outstanding! Complete with the thinking path - how to analyze variables in a logical way - and with common errors. Just purchased the Notebook. Thank you for all of your work on this channel.
@statquest4 ай бұрын
Triple bam!!! Thank you very much for supporting StatQuest! :)
@sudheerrao074 жыл бұрын
Wow. Finally I see a face for the name. Your previous videos have been immensely helpful. I assumed you were a very senior person. I am not measuring your age. I mean, your way of explaining seemed like a professor with half a century of experience. But in reality, you are quite young. Thank you for all your simple-yet-detailed videos. No words to quantify how much I appreciate them. 🙏
@statquest4 жыл бұрын
Wow, thanks!
@RahulEdvin4 жыл бұрын
Josh, you’re well and truly phenomenal ! Love from Madras !
@prashanthb65214 жыл бұрын
Chennai
@statquest4 жыл бұрын
BAM! Thank you very much!!!
@starmerf4 жыл бұрын
Hi Rahul, I taught at IIT-Madras 19192-1993 and lived on campus across from the post office. Josh visited us there.
@RahulEdvin4 жыл бұрын
Frank Starmer Hello Frank, wow! That’s great to know ! :) I’m sure you must have had a good time here. Cheers :)
@prashanthb65214 жыл бұрын
@@starmerf wow the world is a small place ☺
@socksdealer22 күн бұрын
I understand you better than if it would be explained in my native language) Thank you for your work!
@statquest22 күн бұрын
Thank you!
@janskovajsa2374 ай бұрын
41:21 In my version of xgboost, the parameters early_stopping_rounds=10 and eval_metric='aucpr' have been moved from fit() to the XGBClassifier() constructor, so if the code from the video is not working, I suggest trying this. Although I really appreciate the value of StatQuest videos, I have to admit I really hate all that singing and the Bams. It makes me feel like I'm attending school for slower children.
@statquest4 ай бұрын
Sorry to hear that.
@ovrava14 күн бұрын
@@statquest i like it. its cool.
@RahulVarshney_4 жыл бұрын
"25:36" that's what i was waiting for from the beginning...Truly amazing.. You are providing precious information..CHEERS
@statquest4 жыл бұрын
Glad it was helpful!
@RahulVarshney_4 жыл бұрын
@@statquest one small request..can you provide some valuable information through a video like which model to chose for different datasets..how do we decide what model we should chose...thanks in advance
@statquest4 жыл бұрын
@@RahulVarshney_ I'll keep that in mind. In the mean time, check out: scikit-learn.org/stable/tutorial/machine_learning_map/index.html
@RahulVarshney_4 жыл бұрын
@@statquest that is amazing ...i will complete it today itself thanks again for your prompt reply Can i get your email
@markrauschkolb53703 жыл бұрын
Extremely helpful - would love to see more from the "start to finish" series
@statquest3 жыл бұрын
I'm working on it.
@navyasreepinjala15822 жыл бұрын
I love your teaching style. Extremely helpful for a beginner like me. Really helped me a lot in my exams. No words. You are the best!!!!
@statquest2 жыл бұрын
Thank you!
@jinwooseong28623 жыл бұрын
I watched all of your videos on XGBoost. They helped me a lot. Much appreciated!
@statquest3 жыл бұрын
Glad it helped!
@julieirwin32884 жыл бұрын
What did we do to deserve a great guy like Josh ? Thank you Josh!
@statquest4 жыл бұрын
Thanks! :)
@cszthomas2 жыл бұрын
Thank you for the great work!
@statquest2 жыл бұрын
Wow! Thank you so much for supporting StatQuest!!! BAM! :)
@AdamsJamsYouTube2 жыл бұрын
Josh, this video is epic and really helped me understand the actual process of tuning hyperparameters, something that had been a bit of a black box until I saw this video. Your channel is awesome too - great jingles as well :D
@statquest2 жыл бұрын
Thank you!
@fernandes14312 жыл бұрын
Can't thank you enough for the clearest and best explanation on YouTube
@statquest2 жыл бұрын
Thank you!
@sane7263 Жыл бұрын
That's the Best video I've ever seen. Period. TRIPLE BAM! :)
@statquest Жыл бұрын
Wow, thanks!
@romanroman52263 жыл бұрын
Awesome video! The cleanest xgboost explanation I have ever seen.
@statquest3 жыл бұрын
Wow, thanks!
@SergioPolimante3 жыл бұрын
This kind of content is SUPER HARD to produce. I really understand and appreciate your effort here. Thanks and congratulations.
@statquest3 жыл бұрын
Thank you very much!
@josephhayes91523 жыл бұрын
Thanks for the great tutorial! You covered a lot of details (mostly data cleaning) that are often overlooked or skipped as 'trivial' steps.
@statquest3 жыл бұрын
Thank you! Yes, "data cleaning" is 95% of the job.
@darksoul13814 жыл бұрын
I was wondering how to find stuff regarding dealing with actual churn data and sampling issues. The tutorial addressed a lot of them. Thanks!
@statquest4 жыл бұрын
Thanks!
@Krath19884 жыл бұрын
Liked, favorited, recommended, shared, and sacrificed my first-born to this video.
@statquest4 жыл бұрын
TRIPLE BAM! :)
@mykindofgaming73456 ай бұрын
😂😂😂
@VarunKumar-pz5si3 жыл бұрын
I'm very grateful to have you as my teacher.
@statquest3 жыл бұрын
Thanks!
@maurosobreira86953 жыл бұрын
A true, real Master Class - You got my support!
@statquest3 жыл бұрын
Thank you! :)
@KukaKaz4 жыл бұрын
Yes pls more videos with python❤thank u for the webinar
@statquest4 жыл бұрын
Thanks! :)
@thomsondcruz Жыл бұрын
Absolutely loved this video Josh. It breaks down everything into understandable chunks. Thank you and God bless. BAM! The only thing I missed (and its very minor) was taking in a new data row and making an actual prediction by using the model.
@statquest Жыл бұрын
Thanks! For new data, you just call clf_xgb.predict() with the row of new data.
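A minimal sketch of predicting for one new row, as described above. The column names, the tiny training set, and the GradientBoostingClassifier stand-in are assumptions for illustration; with the tutorial's model, the same predict() call would be made on clf_xgb. The new row must have the same columns, in the same order, as the training data.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Tiny hypothetical training set (assumption for illustration only).
X_train = pd.DataFrame(
    {
        "Tenure_Months": [1, 2, 40, 60, 3, 55],
        "Monthly_Charges": [90.0, 85.5, 30.0, 25.0, 99.0, 20.0],
    }
)
y_train = [1, 1, 0, 0, 1, 0]  # 1 = left, 0 = did not leave

# Stand-in for the fitted clf_xgb from the tutorial.
clf = GradientBoostingClassifier(random_state=42).fit(X_train, y_train)

# One new customer, with the columns deliberately given in a different order.
new_customer = pd.DataFrame([{"Monthly_Charges": 95.0, "Tenure_Months": 2}])

# Reorder the columns to match the training data before predicting.
new_customer = new_customer[X_train.columns]

prediction = clf.predict(new_customer)              # predicted class label
probability = clf.predict_proba(new_customer)[:, 1]  # probability of leaving
print(prediction, probability)
```

predict() returns the class label, while predict_proba() returns the probability, which is often more useful for ranking customers by risk.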
@francovega70892 жыл бұрын
I really appreciate your content Josh. Thanks for your time
@statquest2 жыл бұрын
Thank you!
@henkhbit57484 жыл бұрын
Greatly appreciated this video. Like you said, telcos should put more effort into retaining their current customers. In practice, you want to know the probability that a current customer will not renew their subscription. You should then try to retain the high-risk customers with incentives.
@statquest4 жыл бұрын
True!
@danielmagical62984 жыл бұрын
Hi Josh, great job really helpful material as I'm discovering XGBoost just now. Thank you and keep you great work!
@statquest4 жыл бұрын
Thank you very much! :)
@aksharkottuvada2 жыл бұрын
Thank you Josh. Needed this tutorial to better solve a ML Problem as part of my internship :)
@statquest2 жыл бұрын
Glad it helped!
@minseong46443 жыл бұрын
Such an amazing job Josh.. Couldn't find any better explanation than this! Mesmerizing!
@statquest3 жыл бұрын
Wow, thanks!
@muskanroxx223 жыл бұрын
You're a very kind human being Josh!! Thank you so much for making these videos. Your content is gold!!! I am new to data science and this is exactly what I needed!! :) Much love from India!
@statquest3 жыл бұрын
Glad you like my videos!! BAM! :)
@muskanroxx223 жыл бұрын
@@statquest Hey Josh! I am learning about Bayesian Optimizer and I don't seem to get it even after watching tons of tutorials, can you suggest where I should learn it from please? I couldn't find a video on your channel on this.
@statquest3 жыл бұрын
@@muskanroxx22 Unfortunately I don't know of a good source for that.
@daniloyukihara21433 жыл бұрын
hurray, i picture you totally different! Thanks a lot for all the videos!
@statquest3 жыл бұрын
Glad you like them!
@parismollo70164 жыл бұрын
I haven't watched it yet but I know this will be great!!!!!!!! Thank you Josh.
@statquest4 жыл бұрын
BAM! :)
@codinghighlightswithsadra7343 Жыл бұрын
Thank you so much for the work that you used in step by step tutorial. it was amazing.
@statquest Жыл бұрын
You're very welcome!
@marekslazak10032 жыл бұрын
Jesus, I just learned more in 10 minutes of this than I did throughout an entire semester of a similar subject in CS. ++ tutorial
@statquest2 жыл бұрын
Thank you!
@godoren4 жыл бұрын
Thank you for your job, the explanation of the topic is very clear and transparent.
@statquest4 жыл бұрын
Thank you very much! :)
@dagma34374 жыл бұрын
I'm so glad you are a bad-ass stats guru and a teacher waaaaaaaaay before a singer and a guitarist ...Thank you! ;)
@statquest4 жыл бұрын
joshuastarmer.bandcamp.com/
@dagma34374 жыл бұрын
StatQuest with Josh Starmer ...not bad. A poor man’s Jack Johnson 🤔
@dagma34374 жыл бұрын
Just pulling your leg. Thanks for all the content on stats
@keyurshah84512 жыл бұрын
Hey Mate, amazing tutorial. Very complex problem explained in really simple and effective way. I am using XGBOOST for one of the classification model and after watching your video it made me realise I can further improve my model. So thank you again and keep making those videos. Kudos to you and long live data science 🙏🙏
@statquest2 жыл бұрын
Glad it helped!
@marcelocoip72752 жыл бұрын
Hard work here. It's funny how the responsible scientist and the funny guy coexist. Very useful lesson, thanks!
@statquest2 жыл бұрын
Thanks! 😃!
@ketanshetye50294 жыл бұрын
Could not help you with money right now, but I watched all the ads in the video; hope that helps you financially. Love your videos. Keep it up!!
@statquest4 жыл бұрын
I appreciate that
@sreejaysreedharan40854 жыл бұрын
Lovely and priceless video Josh...BAM BAM BAM as usual !! :) God bless. .
@statquest4 жыл бұрын
Thank you very much! :)
@Toyotaman Жыл бұрын
38:05 In stratify=y, the y does not stand for "yes"; it is the dependent variable y. If your response variable has a different name, you have to pass that name to stratify instead.
@statquest Жыл бұрын
Oops! thanks for catching that.
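A small sketch of what stratify=y does, using a made-up response variable with 20% positives (the data here is an assumption for illustration). Passing the response variable to stratify keeps the class proportions the same in the training and testing sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 80 customers who stayed (0) and 20 who left (1).
y = np.array([0] * 80 + [1] * 20)
X = np.arange(100).reshape(-1, 1)

# stratify takes the response variable itself, whatever it is named.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Both splits keep the original 20% share of positives.
print(y_train.mean(), y_test.mean())
```

Without stratify, a random split of imbalanced data can leave the test set with noticeably more or fewer positives than the training set.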
@jimmyrico53644 жыл бұрын
This is a great piece of work, thanks for sharing it! The only additional piece I'd add, which I found useful in the XGBoost documentation, is that you can take advantage of parallel computing (more cores, or a graphics card on your machine or in the cloud) by simply passing the parameter n_jobs=-1 both in the RandomizedSearchCV stage and when creating the XGBoost model (XGBRegressor, for example).
@statquest4 жыл бұрын
Great tip! BAM!
@mdaroza3 жыл бұрын
Amazingly organized and well explained!
@statquest3 жыл бұрын
Thank you!
@chiragpalan97804 жыл бұрын
This guy is amazing. DOUBLE BAM 💥 💥
@statquest4 жыл бұрын
Thank you! :)
@saeedesmailii4 жыл бұрын
It was extremely helpful. Please continue making these videos. I suggest making a video to explain the clustering with unlabeled data, and predicting the future trend in time-series data.
@statquest4 жыл бұрын
I'll keep that in mind. :)
@andrewxie98963 жыл бұрын
you are simply an amazing human being, also the notebooks are great! :D
@statquest3 жыл бұрын
Thanks!
@danielpinzon92844 жыл бұрын
Love u Josh.... you are a TRIPLE BAM!!! Greetings from Bogotá, Colombia.
@statquest4 жыл бұрын
Muchas gracias!!! :)
@nehabalani72904 жыл бұрын
Good to also see you sing rather than just hear :).. i had to comment this even before starting the training
@statquest4 жыл бұрын
😊 thanks
@jessehe92864 жыл бұрын
Great video! Love it! request that you do a comparison of XGBoost, CatBoost, and LightGBM, and a quest on ensemble learning.
@statquest4 жыл бұрын
I'll keep those topics in mind.
@Fressia944 жыл бұрын
Many thanks for your great and very understandable video. It literally helps me a lot with Python and the XGBoost package.
@statquest4 жыл бұрын
Glad it helped!
@felixwhise41654 жыл бұрын
just here to say thank you! will come back in a month when I have time to watch it. :)
@statquest4 жыл бұрын
BAM! :)
@dillonmears66962 жыл бұрын
Great video! You did a wonderful job of explaining the process. Thanks!
@statquest2 жыл бұрын
Thanks!
@PradeepMahato0074 жыл бұрын
BAMMMMM !!! This is awesome 👍 Josh !! Thank you for your contribution, really helpful for new learners.😊😊😊
@statquest4 жыл бұрын
Glad you liked it!
@statquest3 жыл бұрын
@@salilgupta9427 Thanks!
@felixzhao34352 жыл бұрын
Thanks!
@statquest2 жыл бұрын
WOW! Thank you so much for supporting StatQuest!!! BAM! :)
@pacificbloom13 жыл бұрын
Wonderful video Josh... please, please, please make more start-to-finish videos in Python for different models... I have actually submitted my assignments using your techniques and got better results than with what I learned in my class. Waiting for more to come, especially on Python :)
@statquest3 жыл бұрын
Thanks! There should be more python coming out soon.
@hollyching4 жыл бұрын
Thanks Josh for another GREAT video! Just some sharing and minor questions. 1. Try pandas_profiling when doing EDA. I personally love it. :) 2. Some features are highly correlated (e.g. city name and zip code). Do we need to handle that before running XGB? 3. Why choose 10 for early_stopping_rounds? 4. What's the difference between df.loc[df['Total_Charges']==' '] and df[df['Total_Charges']==' ']? 5. What's the difference between y=df['Churn_Value'].copy() and y=df['Churn_Value']? Many thanks in advance! H
@statquest4 жыл бұрын
1) Thanks for the tip on pandas_profiling. 2) No. 3) It's a commonly used number 4) I don't know. 5) I believe the former is copy by value and the latter is copy by reference.
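Questions 4 and 5 above can be checked directly in pandas. This is a small sketch on a made-up three-row DataFrame (an assumption for illustration): for selecting rows with a boolean mask, df[mask] and df.loc[mask] return the same rows, but loc can also name a column, which is what makes it the usual choice for assignments. Calling .copy() (with parentheses) gives an independent object, so later edits to the copy do not touch df.

```python
import pandas as pd

# Made-up miniature of the churn data (assumption for illustration only).
df = pd.DataFrame(
    {"Total_Charges": ["29.85", " ", "108.15"], "Churn_Value": [0, 1, 0]}
)
mask = df["Total_Charges"] == " "

# Question 4: both forms select the same rows.
rows_with_loc = df.loc[mask]
rows_with_brackets = df[mask]
print(rows_with_loc.equals(rows_with_brackets))

# loc can also select a column, so it works for assignment in place.
df.loc[mask, "Total_Charges"] = 0

# Question 5: .copy() returns an independent Series.
y = df["Churn_Value"].copy()
y.iloc[0] = 99                      # change the copy only
print(df["Churn_Value"].iloc[0])    # the original value is untouched
```

Whether the version without .copy() shares data with df depends on the pandas version and its copy-on-write settings, which is why taking an explicit copy is the safer habit.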
@NLarsen19893 жыл бұрын
Yikes, if I ever understand something enough to explain it as succinctly as you do then I'd be very happy. I've been smashing through a lot of your videos the last few days after spending countless months on python, sklearn and all the usual plug and play solutions and it's not been until I've started watching these that I've started to feel things click into place
@statquest3 жыл бұрын
Awesome! I'm glad my videos are helpful! :)
@gisleberge43632 жыл бұрын
Appreciate the Python related videos...helps to manoeuvre the code when I try to replicate the method later on...easy to follow the whole thing, also for beginners... 🙂
@statquest2 жыл бұрын
Thanks! There will be a lot more python stuff soon.
@viniantunes59444 жыл бұрын
Josh, you're the didactic in person form. Thanks!
@statquest4 жыл бұрын
I appreciate that!
@trendytrenessh4623 жыл бұрын
It is really lovely to be able to put a face to the "Hooray!", "BAM !!!" and "Note:"s 😄❤
@statquest3 жыл бұрын
bam!
@azingo2313 Жыл бұрын
This man deserves Nobel Prize for peace of mind ❤❤
@statquest Жыл бұрын
bam! :)
@williamTjS2 жыл бұрын
Amazing! Thanks so much for the detailed video
@statquest2 жыл бұрын
Thanks!
@miguelbarajas98922 жыл бұрын
Freaking amazing! You explain everything so well. Thank you!
@statquest2 жыл бұрын
Thank you!
@coolmusic4meyee10 ай бұрын
Great explanation and walk-through, big thanks!
@statquest10 ай бұрын
Glad you enjoyed it!
@karannchew25343 жыл бұрын
51:41 Send them StatQuest coupons instead. Better alternative to milkshake coupon.
@statquest3 жыл бұрын
Bam! :)
@abdulkayumshaikh54113 жыл бұрын
Hello josh, you are doing amazing work keep doing
@statquest3 жыл бұрын
Thanks!
@haskycrawford4 жыл бұрын
I love the channel! I learn more here than in my undergraduate degree! You're great, Josh!
@statquest4 жыл бұрын
Muito obrigado! :)
@marceloherdy23794 жыл бұрын
Man, this video is awesome! Congratulations!
@statquest4 жыл бұрын
Thank you! :)
@miloszpabis8 ай бұрын
Love these Python tutorials after watching theory videos:D
@statquest8 ай бұрын
Glad you like them!
@nikhilshaganti55852 жыл бұрын
Thank you for this great tutorial Josh! Your videos have helped immensely in understanding some of the complex topics. One thing I noticed while watching this tutorial is the handling of categorical features. I think the explanation you gave for "Why not to use LabelEncoding?" is applicable to models like Linear Regression, SVM, and NN, but not to trees, because they only focus on the order of the feature values. For example, in a set of [1,2,3,4], threshold < 1.5 would be equivalent to threshold == 1. Please let me know if my thought process is wrong.
@statquest2 жыл бұрын
To be honest, I'm not really sure I understand your question or your example. If we have a categorical feature with 4 colors, red, blue, green, and black, and we give them numbers, like 1, 2, 3 and 4, then a threshold < 2.5, would not make much sense and, based on how trees are implemented, there would be no options for threshold == 2 or threshold == 3. So we wouldn't be able to separate colors very well.
@nikhilshaganti55852 жыл бұрын
right, in your example, the threshold of threshold
@statquest2 жыл бұрын
@@nikhilshaganti5585 Sure, you can continue to separate things in later branches - but the greedy nature of the algorithm doesn't ensure that you'll get to those later branches. So you start by making a guess that it makes sense to group red and blue together.
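The point about thresholds in this thread can be made concrete with a small pure-Python sketch. The color codes are the ones from the reply above; the helper function name is hypothetical. With integer codes, a single threshold can only separate contiguous groups, so no single split isolates blue, while a one-hot column for blue does it in one split.

```python
# Integer (label) encoding of the four colors from the discussion.
codes = {"red": 1, "blue": 2, "green": 3, "black": 4}


def threshold_splits(encoding):
    """List every split a tree can make with one threshold on the codes."""
    values = sorted(encoding.values())
    splits = []
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = {color for color, v in encoding.items() if v < threshold}
        right = set(encoding) - left
        splits.append((threshold, left, right))
    return splits


label_encoded_splits = threshold_splits(codes)
for threshold, left, right in label_encoded_splits:
    print(threshold, sorted(left), "|", sorted(right))

# With one-hot encoding, a split on the "is blue" column isolates blue directly.
one_hot_blue_split = (
    {color for color in codes if color == "blue"},
    {color for color in codes if color != "blue"},
)
print(sorted(one_hot_blue_split[0]), "|", sorted(one_hot_blue_split[1]))
```

A deeper tree can still isolate blue with two splits on the integer codes, but, as noted above, the greedy algorithm is not guaranteed to get there.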
@andrewnguyen58814 жыл бұрын
Again, another quality video. I was following along with your every word, which did bring up some questions: 1. When XGBoost deals with missing data, does it ever consider splitting the missing data in half? Using your example, would it ever do 1 blue and 1 green? What would happen if XGBoost encountered a data set with a lot of missing values? 2. When you ran your Cross-Validation, was there a reason you only used 3 values for each hyperparameter? Could you have done more if you wanted to? 3. When I ran my Cross-Validation, my scale_pos_weight didn't change even though I used the same parameters you did. What do you think the problem could be?
@statquest4 жыл бұрын
1. Not that I know of. If there was a lot of missing values, it would still proceed just as described. 2. I wanted the cross validation to run in a short period of time, so I picked 3 values for each hyperparameter. If I had more time on my hand, or a cluster of computers, I might have considered trying more. 3. I'm not sure.
@andrewnguyen58814 жыл бұрын
@@statquest Gotcha! Thank you for actually answering my questions haha I really appreciate the help as someone getting into more Machine Learning. Not sure what your upcoming videos will be but I think some great videos would be: -How you did the Cross Validation for the hyper parameters in this video? ( I have watched your Cross-Validation video, but I want to learn more about actually doing it) [I also still can get my first run to output the same values you had lol] -XGBoost for Regression in Python -How to select the best algorithm based on the scenario for Regression or Classification Just some thoughts :)
@statquest4 жыл бұрын
@@andrewnguyen5881 I'll keep those topics in mind.
@aaltinozz4 жыл бұрын
all week searched for this thank u very much
@statquest4 жыл бұрын
Enjoy!
@Justjemming Жыл бұрын
Prof Starmer, I’ve a question regarding what you said at 31:30 on OHE not working for regression models. Would you be able to kindly explain how to encode categorical data before training these models then?
@statquest Жыл бұрын
Sure, see: kzbin.info/www/bejne/eaKveKmtnpJohsU
@kandiahchandrakumaran852111 ай бұрын
Amazing. Wonderful videos. I started only 3 months ago, and with your videos I am very confident doing analysis with Python. Many thanks. Is it possible to create a video on nomograms for competing risks in time-to-event (survival) analysis based on CPH outputs?
@statquest11 ай бұрын
I'll keep that topic in mind.
@HardikShah17 Жыл бұрын
Excellent Video @StatQuest ! Can we please have more Start to Finish python videos? Like Lightgbm maybe?
@statquest Жыл бұрын
I'll keep that in mind! :)
@FlexCrush19814 жыл бұрын
Very enjoyable webinar Josh. Thanks for posting. I'm not 100% sure how to interpret the leaves. The largest leaf value is 0.188 where Dependents_No
@statquest4 жыл бұрын
The leaves are how much to increase or decrease the log(odds) for one category. For more details, see: kzbin.info/www/bejne/bpOUe3h6q8qhh7c
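A worked sketch of what a leaf value means for one customer, using the 0.188 leaf mentioned in the question. It assumes a single tree, an initial prediction of 0.5, and that the plotted leaf value is already scaled by the learning rate; all three are assumptions for illustration, and a real model adds up the leaves from every tree.

```python
import math


def log_odds_to_probability(log_odds: float) -> float:
    """Convert log(odds) to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Assumed initial prediction of 0.5, which is log(odds) = 0.
initial_probability = 0.5
initial_log_odds = math.log(initial_probability / (1 - initial_probability))

# Leaf value from the question; it shifts the log(odds), not the probability.
leaf_value = 0.188

updated_log_odds = initial_log_odds + leaf_value
updated_probability = log_odds_to_probability(updated_log_odds)
print(round(updated_probability, 3))  # 0.547
```

A positive leaf value pushes the predicted probability of leaving above the starting point, and a negative one pulls it below.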
@eeera-op8vw2 ай бұрын
awesome!! I hope you can do a video about XGBoost with regression too.
@statquest2 ай бұрын
I'll keep that in mind.
@shazm40203 жыл бұрын
Thank you so much Josh Starmer! BAM!
@statquest3 жыл бұрын
bam!
@k44zackie3 жыл бұрын
Thank you very much for nice video! Very helpful for me.
@statquest3 жыл бұрын
Glad it was helpful!
@CharlotteWilson-j9d Жыл бұрын
First you’ve saved me this is super clear! I love all your videos so much 😊 I do have two questions… 1. How would you handle a classification problem with time series data? 2. Is there any other evaluation test you should or could do to evaluate the effectiveness of your model?
@statquest Жыл бұрын
1. I've never used XGBoost with time series (or done much of any time series stuff before), so I can't answer this question. 2. There are lots of ways to evaluate a model. I only present a few, but there are many more, and they really depend on what you want your model to do. Just google it.
@yurimartins14994 жыл бұрын
Thank you Josh!! As a suggestion, you could do a StatQuest explaining the measures in market basket analysis?
@statquest4 жыл бұрын
I'll keep that in mind.
@fgfanta4 жыл бұрын
This is gold, thank you! I am a rookie of this stuff, still I am unsure one-hot encoding is the best to do, especially to encode the city; being a category with high cardinality, all those variables for 1-hot encoding will require many splits (I guess). Perhaps using a different encoding, like mean encoding or frequency encoding, would be better, may allow to have a good fit with fewer splits.
@statquest4 жыл бұрын
Maybe. Try it out and let me know if you get something that works better.
@thiagotanure22123 жыл бұрын
amazing tutorial Josh! Shared with my friends =D Could you do one of these about pygam? It would be amazing :)
@statquest3 жыл бұрын
I'll keep that in mind.
@imranselim59783 жыл бұрын
Thank you for the excellent content and walk through. While One Hot Encoding the categorical variables should not we use k-1 variables for k categories? For example, for the Payment_Method column if Mailed_Check=0, Electronic_Check=0, and Bank_Transfer=0 doesn't that imply Credit_Card=1 and make Credit_Card column redundant?
@statquest3 жыл бұрын
In this setting it doesn't matter - unlike linear models, xgboost doesn't have to worry about inverting a matrix to solve for parameters.
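For anyone who does want k-1 columns, for example when reusing the same features in a linear model, pandas can drop one category per variable. A small sketch, using the Payment_Method categories from the question on a made-up four-row DataFrame (an assumption for illustration):

```python
import pandas as pd

# One row per payment method (made-up data for illustration only).
df = pd.DataFrame(
    {
        "Payment_Method": [
            "Mailed_Check",
            "Electronic_Check",
            "Bank_Transfer",
            "Credit_Card",
        ]
    }
)

# k columns, one per category, as in the tutorial.
all_columns = pd.get_dummies(df, columns=["Payment_Method"])

# k-1 columns: the first category (alphabetically) is dropped.
k_minus_1 = pd.get_dummies(df, columns=["Payment_Method"], drop_first=True)

print(list(all_columns.columns))
print(list(k_minus_1.columns))
```

In the k-1 version, a row with zeros in every remaining column belongs to the dropped category.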
@ArunKumar-fg1yj3 жыл бұрын
Hello Josh, I am trying to follow your steps. However, plot_confusion_matrix(clf_xgb, X_test, y_test, values_format='d', display_labels=['Did not leave','Left']) throws the error message below: XGBoostError: [09:28:33] c:\users\administrator\workspace\xgboost-win64_release_1.5.1\src\c_api\c_api_utils.h:161: Invalid missing value: null. Can you please tell me what I am doing wrong?
@statquest3 жыл бұрын
Are you using the code in the jupyter notebook or your own?
@arijitdas45044 жыл бұрын
If "Stay Cool" had a face, it'd be you :)
@statquest4 жыл бұрын
Bam!
@matattz Жыл бұрын
hey first of all thank you so much for all your videos! I understood everything and i am hyped about trying out what i have learned, but i have one question. So after we built our model and we are happy with how it performs, how do we feed it with actual new unseen data? If i have for example just the data for one specific customer and i want to check if he/she leaves or stays, what would the code look like.
@statquest Жыл бұрын
Just use "clf_xgb.predict(YOUR DATA)", where "YOUR DATA" is formatted in the same order as the data used for training.
@matattz Жыл бұрын
@@statquest oh god of course thank you! I had a blackout for a moment haha
@nalidbass3 жыл бұрын
Josh, won't there be target leakage when we evaluate 'aucpr' on the testing dataset to determine the number of trees?
@statquest3 жыл бұрын
Yes. We probably should have done that with cross validation and just the training data.
@BiffBifford4 жыл бұрын
I am not a math geek. I am here strictly for the intro song!
@magtazeum40714 жыл бұрын
same here
@statquest4 жыл бұрын
DOUBLE BAM! :)
@rishavpaudel75914 жыл бұрын
@@statquest Sir your double and triple bam has really taught lots of things for me to be honest. Me as a student doing Post-Graduate in AI, lots of love form Nepal.
@tas31594 жыл бұрын
Thanks for that very clear tutorial. A question. On 21:25 why do you use loc and not iloc.
@statquest4 жыл бұрын
iloc can only take integers for indexes, and here we are using booleans. So if we wanted to use iloc, we'd have to convert the booleans to integers.
@tas31594 жыл бұрын
@@statquest kind of a low level question in a high level tutorial. Thanks for taking the time to answer.
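A small sketch of the loc versus iloc point on a made-up three-row DataFrame (an assumption for illustration). loc accepts the boolean Series directly. In the pandas versions I am aware of, iloc rejects a boolean Series, but it does accept a plain boolean array or the integer positions.

```python
import pandas as pd

# Made-up miniature of the Total_Charges column (assumption for illustration).
df = pd.DataFrame({"Total_Charges": ["29.85", " ", "108.15"]})
mask = df["Total_Charges"] == " "   # boolean Series, True for the blank row

by_label = df.loc[mask]                  # loc takes the boolean Series as is
by_position = df.iloc[mask.to_numpy()]   # iloc needs a plain boolean array
by_integer = df.iloc[[1]]                # or explicit integer positions

print(by_label.equals(by_position), by_label.equals(by_integer))
```

All three selections return the same single row, so loc is simply the most convenient choice when the filter is already a boolean Series.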
@sunsiney70142 жыл бұрын
Great video! Very informative and clearly explained! Could you please also present BART?
@statquest2 жыл бұрын
I'll keep that in mind.
@its_me73634 жыл бұрын
Can we use xgboost for multilabel classification? If yes, what parameters should be changed?
@statquest4 жыл бұрын
Yes, change the loss function.
@juliakuchno73089 ай бұрын
Hey! Thank you so much for all the work you have been doing! I have a question regarding the leaves values if we use xgboost in a regression problem. What do they mean then? Is it probability that the average of the observations that were segregated to that particular leaf has this and this probability of contributing to the loss function? Thank you so much for your help!