XGBoost Part 1 (of 4): Regression

612,559 views

StatQuest with Josh Starmer

A day ago

XGBoost is an extreme machine learning algorithm, and that means it's got lots of parts. In this video, we focus on the unique regression trees that XGBoost uses when applied to Regression problems.
NOTE: This StatQuest assumes that you are already familiar with...
The main ideas behind Gradient Boost for Regression: • Gradient Boost Part 1 ...
...and the main ideas behind Regularization: • Regularization Part 1:...
Also note, this StatQuest is based on the following sources:
The original XGBoost manuscript: arxiv.org/pdf/1603.02754.pdf
And the XGBoost Documentation: xgboost.readthedocs.io/en/lat...
For a complete index of all the StatQuest videos, check out:
statquest.org/video-index/
If you'd like to support StatQuest, please consider...
Buying The StatQuest Illustrated Guide to Machine Learning!!!
PDF - statquest.gumroad.com/l/wvtmc
Paperback - www.amazon.com/dp/B09ZCKR4H6
Kindle eBook - www.amazon.com/dp/B09ZG79HXC
Patreon: / statquest
...or...
YouTube Membership: / @statquest
...a cool StatQuest t-shirt or sweatshirt:
shop.spreadshirt.com/statques...
...buying one or two of my songs (or go large and get a whole album!)
joshuastarmer.bandcamp.com/
...or just donating to StatQuest!
www.paypal.me/statquest
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
/ joshuastarmer
0:00 Awesome song and introduction
2:35 The initial prediction
3:11 Building an XGBoost Tree for regression
4:07 Calculating Similarity Scores
8:23 Calculating Gain to evaluate different thresholds
13:02 Pruning an XGBoost Tree
15:15 Building an XGBoost Tree with regularization
19:29 Calculating output values for an XGBoost Tree
21:39 Making predictions with XGBoost
23:54 Summary of concepts and main ideas
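The chapter list above walks through the core tree-building math (Similarity Scores at 4:07, Gain at 8:23). As a companion, here is a short, hedged sketch of those two formulas in plain Python — this is not StatQuest's code, and the Dosage/residual numbers are toy values patterned on the video's example, with lambda defaulting to 0 as in the un-regularized walkthrough:

```python
def similarity(residuals, lam=0.0):
    # Similarity = (sum of residuals)^2 / (number of residuals + lambda)
    return sum(residuals) ** 2 / (len(residuals) + lam)

def gain(left, right, lam=0.0):
    # Gain = Left similarity + Right similarity - Root similarity
    return similarity(left, lam) + similarity(right, lam) - similarity(left + right, lam)

# Toy data patterned on the video's example: Dosages and the residuals
# (observed effectiveness minus the initial prediction of 0.5).
dosages   = [10, 20, 25, 35]
residuals = [-10.5, 6.5, 7.5, -7.5]

# Scan every midpoint between adjacent dosages; keep the threshold with the largest Gain.
best_thr, best_gain = None, -float("inf")
for i in range(1, len(dosages)):
    thr = (dosages[i - 1] + dosages[i]) / 2
    left  = [r for d, r in zip(dosages, residuals) if d < thr]
    right = [r for d, r in zip(dosages, residuals) if d >= thr]
    g = gain(left, right)
    if g > best_gain:
        best_thr, best_gain = thr, g

print(best_thr, best_gain)  # Dosage < 15 gives the largest Gain (about 120.33)
```

With these numbers, the split "Dosage < 15" wins, which is the same conclusion the video reaches.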
Corrections:
16:50 I say "66", but I meant to say "62.48". However, either way, the conclusion is the same.
22:03 In the original XGBoost documents they use the epsilon symbol to refer to the learning rate, but in the actual implementation, this is controlled via the "eta" parameter. So, I guess to be consistent with the original documentation, I made the same mistake! :)
#statquest #xgboost

Comments: 800
@statquest 4 years ago
Corrections: 16:50 I say "66", but I meant to say "62.48". However, either way, the conclusion is the same. 22:03 In the original XGBoost documents they use the epsilon symbol to refer to the learning rate, but in the actual implementation, this is controlled via the "eta" parameter. So, I guess to be consistent with the original documentation, I made the same mistake! :) Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
@blacklistnr1 4 years ago
Terminology alert!! "eta" refers to the Greek letter Η (upper case) / η (lower case). It is one of Greek's many "ee" sounds (as in wheeeeee); it's definitely not epsilon.
@MrPopikeyshen 2 years ago
like just for this sound 'bip-bip-pilulipup'
@servaastilkin7733 a year ago
@@blacklistnr1 I came here to say the same thing. Maybe this helps: eta (η) sounds somewhat like the vowel in "air"; epsilon (ε) sounds somewhat like the vowel in "get".
@pulkitkapoor4091 3 years ago
I got my first job in Data Science because of the content you prepare and share. Can't thank you enough Josh. God bless :)
@statquest 3 years ago
That is awesome! Congratulations! TRIPLE BAM! :)
@SaurabhMishra-tt5qt 2 years ago
which company bro?
@sendhana-46 a year ago
kya company bhai?
@ImGeneralJAckson 3 months ago
Same :-)
@Hardson 4 years ago
That's why I pay for my Internet.
@statquest 4 years ago
Thanks! :)
@nikilisacrow2339 3 years ago
Can I just say I LOVE STATQUEST! Josh does the intuition of a complex algorithm and the math of it so well, and then makes it into an engaging video that is so easy to watch. It's just amazing! I just LOVE this channel. You boosted the gradient of my learning on machine learning in an extreme way. Really appreciate these videos.
@statquest 3 years ago
Wow! Thank you very much!!! I'm so glad you like the videos. :)
@giannislazaridis6788 4 years ago
I'm starting to write my Master's Thesis and there were still some things I needed to make clear before using XGBoost for my classification problem. God Bless You
@statquest 4 years ago
Thank you! :)
@hanyang4321 3 years ago
I watched all of the videos on your channel and they're extremely awesome! Now I have a much deeper understanding of many algorithms. Thanks for your excellent work and I'm looking forward to more lovely videos and your sweet songs!
@statquest 3 years ago
Thank you very much! :)
@glowish1993 4 years ago
You make learning math and machine learning interesting and allow viewers to understand the essential points behind complicated algorithms, thank you for this amazing channel :)
@statquest 4 years ago
Thank you! :)
@kennywang9929 4 years ago
Man, you do deserve all the thanks from the comments! Waiting for part2! Happy new year!
@statquest 4 years ago
Thanks!!! I just recorded Part 2 yesterday, so it should be out soon.
@PauloBuchsbaum 4 years ago
An incredible job of clear, concise and non-pedantic explanation. Absolutely brilliant!
@statquest 4 years ago
Thank you very much!
@shhdeshp 5 months ago
I just LOVE your channel! Such a joy to learn some complex concepts. Also, I've been trying to find videos that explain XGBoost under the hood in detail and this is the best explanation I've come across. Thank you so much for the videos and also boosting them with an X factor of fun!
@statquest 5 months ago
Awesome, thank you!
@guoshenli4193 3 years ago
I am a graduate student at Duke. Since some of the materials are not covered in class, I always watch your videos to boost my knowledge. Your videos help me a lot in learning the concepts of these tree models!! Great thanks to you!!!!! You make a lot of great videos and contribute a lot to online learning!!!!
@statquest 3 years ago
Thank you very much and good luck with your studies! :)
@modandtheganggaming3617 4 years ago
Thank you! I'd been waiting for XGBoost to be explained for so long.
@statquest 4 years ago
I'm recording part 2 today (or tomorrow) and it will be available for early access on Monday (and for everyone a week from monday).
@mainhashimh5017 2 years ago
Man, the quality and passion put into this. As well as the sound effects! I'm laughing as much as I'm learning. DAAANG. You're the f'ing best!
@statquest 2 years ago
Thank you very much! :)
@andreitolkachev8295 3 years ago
I wanted to watch this video last week, but you sent me on a magical journey through adaboost, logistic regression, logs, trees, forests, gradient boosting.... Good to be back
@statquest 3 years ago
Glad you finally made it back!
@pranavjain9799 a year ago
same haha
@hellochii1675 4 years ago
XGBoosting! This must be my Christmas 🎁 ~~ Happy holidays ~
@statquest 4 years ago
Yes, this is sort of an early christmas present. :)
@jaikishank 3 years ago
Thanks Josh for your explanation. XGBoost explanation cannot be made simpler and illustrative than this. I love your videos.
@statquest 3 years ago
Thank you very much! :)
@gawdman 4 years ago
Hey Josh! This is fantastic. As an aspiring data scientist with a couple of job interviews coming up, this really helped!
@statquest 4 years ago
Awesome!!! Good luck with your interviews and let me know how they go. :)
@nickbohl2555 4 years ago
I have been super excited for this quest! Thanks as always Josh
@statquest 4 years ago
Hooray!!!!
@jjlian1670 4 years ago
I have been waiting for your video for XGBoost, hope for LightGBM next!
@breopardo6691 3 years ago
In my heart, there is a place for you! Thank you Josh!
@statquest 3 years ago
Thanks!
@geminicify 4 years ago
Thank you for posting this! I have been waiting for it for long!
@statquest 4 years ago
Hooray! :)
@anupriy 2 years ago
Thanks for making such great videos, sir! You indeed get each concept CLEARLY EXPLAINED.
@statquest 2 years ago
Thank you! :)
@mangli4669 4 years ago
Hey Josh, first I wanted to say thank you for your awesome content. You are the number one reason I am graduating with my degree haha! I would love a behind-the-scenes video about how you make your videos: how you prepare for a topic, how you make your animations and your fancy graphs! And some more singing, of course!
@statquest 4 years ago
That would be awesome. Maybe I'll do something like this in 2020. :)
@pavankumar6992 4 years ago
Fantastic explanation for XGBoost. Josh Starmer, you are the best. Looking forward to your Neural Network tutorials.
@statquest 4 years ago
Thanks! I hope to get to Neural Networks as soon as I finish this series on XGBoost (which will have at least 3 more videos).
@Azuremastery 3 years ago
Thank you! Super easy to understand one of the most important ML algorithms, XGBoost. The visual illustrations are the best part!
@statquest 3 years ago
Thank you very much! :)
@guillemperdigooliveras5351 4 years ago
As always, loved it! I can now wear my Double Bam t-shirt even more proudly :-)
@statquest 4 years ago
Awesome!!!!!! :)
@anggipermanaharianja6122 3 years ago
why not wearing the Triple Bam?
@guillemperdigooliveras5351 3 years ago
@@anggipermanaharianja6122 For a second you gave me hope that new StatQuest t-shirts were available with a Triple Bam drawing!
@DonDon-gs4nm 4 years ago
After watching your video, I understood the concept of 'understanding'.
@tusharsub1000 3 years ago
I had given up all hope of learning machine learning owing to its complexity. But because of you I am still giving it a shot... and so far I am enjoying it...
@statquest 3 years ago
Hooray!
@liuxu7879 2 years ago
Hey Josh, I really love your contents, you are the one who really explains the model details.
@statquest 2 years ago
WOW! Thank you so much for supporting StatQuest!
@siddharth4251 a year ago
Sir, you are awesome. I really don't have enough words to express my gratitude... no one else could make XGBoost as easy to understand as you have. Huge respect for you. To explain such a complex topic to people like me, who are just below average, is no ordinary skill.
@statquest a year ago
Thank you! :)
@shubhambhatia4968 4 years ago
Woah woah woah woah!... Now I get the true meaning of understanding after coming to your channel. As always, I loved the XGBoost series as well. Thank you, brother. ;)
@statquest 4 years ago
Thank you very much! :)
@sidbhatia4230 4 years ago
Thanks, it helped a lot! Looking forward to part 2, and if possible please make one on catboost as well!
@user-jx7ft7ir7d 4 years ago
Awesome video!!! It's the best tutorial I have ever seen about XGBoost. Thank you very much!
@statquest 4 years ago
Thank you! :)
@nitinvijayy 2 years ago
Best Channel for anyone Working in the Domain of Data Science and Machine Learning.
@statquest 2 years ago
Thanks!
@lxk19901 4 years ago
This is really helpful, thanks for putting them together!
@statquest 4 years ago
Thank you! :)
@mentordedados a year ago
You are the best, Josh. Greetings from Brazil! We are looking forward to your video clearly explaining LightGBM!
@statquest a year ago
I hope to have that video soon.
@kamalamarepalli1165 2 months ago
I have never seen a data science video like this... very informative, very clear, a super explanation of the math, wonderful animation, and an energetic voice... I'm learning many things very easily... thank you so much!!
@statquest 2 months ago
Thank you very much!
@jackytsui422 3 years ago
I am learning machine learning from scratch and your videos helped me a lot. Thank you very much!!!!!!!!!!!
@statquest 3 years ago
Good luck! :)
@fivehuang7557 4 years ago
Happy holiday man! Waiting for your next episode
@statquest 4 years ago
It should be out in the first week in 2020.
@SaraSilva-zu7wn 2 years ago
Clear explanations, little songs and a bit of silliness. Please keep them all, they're your trademark. :-)
@statquest 2 years ago
Thank you! BAM! :)
@moidhassan5552 3 years ago
Wow, I am really interested in Bioinformatics and was learning Machine Learning techniques to apply to my problems and out of curiosity, I checked your LinkedIn profile and turns out you are a Bioinformatician too. Cheers
@statquest 3 years ago
Bam! :)
@RidWalker 7 months ago
I've never had so much fun learning something new! Not since I stared at my living room wall for 20 min and realized it wasn't pearl, but eggshell white! Thanks for this!
@statquest 7 months ago
Glad you got the wall color sorted out! Bam! :)
@natashadavina7592 3 years ago
Your videos have helped me a lot!! Thank you so much. I hope you keep on making these videos :)
@statquest 3 years ago
Thanks!
@aksaks2338 4 years ago
Hey Josh! Thanks for the video, just wanted to know when will you release part 2 and 3 of this?
@statquest 4 years ago
Part 2 is already available for people with early access (i.e. channel members and patreon supporters). Part 3 will be available for early access in two weeks. I usually release videos to everyone 1 or 2 weeks after early access.
@yulinliu850 4 years ago
Great Xmas present! Thanks Josh!
@statquest 4 years ago
Hooray! :)
@junaidbutt3000 4 years ago
This has been one video I’ve been waiting for and it was well worth it. Brilliant as usual Josh. I wanted to ask about the differences between the XGBoost regression tree and the traditional regression tree with Boosting. It seems that the main difference is that the XGBoost version uses the gain measure (made of similarity) to determine the split thresholds for each feature (I presume if we had more than dosage we would consider them in the same way) and prunes according to the gamma parameter. Whereas the traditional tree uses a measure like Gini impurity to split and a method like cost complexity pruning. Is that the main difference? Or are there any more? Could you also mention why this type of tree is better than the traditional version? It seems like the algorithm has some optimisation for this type of tree than the other.
@statquest 4 years ago
There are lots of differences; however, the fundamental difference in trees is a big one. I believe the reason for XGBoost trees is that the computation can be easily optimized compared to traditional regression trees. The other major differences are optimizations for very large datasets - XGBoost was one of the first machine learning algorithms developed specifically for "big data", so it has tricks for working with datasets that can't all fit into memory. I'll talk about these in Part 4 (Part 2 covers how XGBoost trees work for classification, and Part 3 derives the math and theory that underlie XGBoost trees).
@ashfaqueazad3897 4 years ago
Life saver. Was waiting for this.
@urvishfree0314 3 years ago
Thank you so much! I watched it 3-4 times already, but finally everything makes sense. Thank you so much!
@statquest 3 years ago
Hooray!
@anzei331 4 years ago
Best XGBoost explanation I've found on the internet! Keep it up. Are you going to touch on alpha and other parameters later in the series?
@iop09x09 4 years ago
Wow! Very well explained, hats off.
@statquest 4 years ago
Thanks! :)
@SeitzAl1 4 years ago
amazing lesson as always. thanks josh!
@statquest 4 years ago
Thank you! :)
@HANTAIKEJU 3 years ago
Hi Josh, love your videos. Currently preparing for data science interviews based on your videos. I'd really love to hear one about LightGBM!
@statquest 3 years ago
I'll keep that in mind.
@tc322 4 years ago
Xtreme Christmas gift!! :) Thanks!!
@statquest 4 years ago
:)
@gorilaz0n 2 years ago
Gosh! I love your fellow-kids vibe!
@statquest 2 years ago
Thanks!
@bernardmontgomery3859 4 years ago
xgboosting! my Christmas gift!
@statquest 4 years ago
Hooray! :)
@monkeydrushi a year ago
God, thank you for your "beep boop" sounds. They just made my day!
@statquest a year ago
Hooray! :)
3 years ago
Thank you for sharing this amazing video!
@statquest 3 years ago
Thank you! :)
@machi992 3 years ago
I actually started out looking for XGBoost, but every video assumes I know something. I have ended up watching more than 8 videos just to cover the prerequisites and understand everything without any problems, and I find them awesome.
@statquest 3 years ago
Bam! Congratulations!
@DrJohnnyStalker 4 years ago
Best XGBoost explanation i have ever seen! This is Andrew Ng Level!
@statquest 4 years ago
Thank you very much! I just released part 4 in this series, so make sure you check them all out. :)
@DrJohnnyStalker 4 years ago
@@statquest I have binge-watched them all. All are great, and by far the best intuitive explanation videos on XGBoost. A series on LightGBM and CatBoost would complete the pack of gradient boosting algorithms. Thx for this great channel.
@statquest 4 years ago
@@DrJohnnyStalker Thanks! :)
@tobiasksr23 2 years ago
I just found this channel and I think it's amazing.
@statquest 2 years ago
Glad to hear it!
@oldguydoesntmatter2872 4 years ago
Bravo! Excellent presentation. I've been through it a bunch of times trying to write my own code for my own specialized application. There's a lot of detail and nuance buried in a really short presentation (that's a compliment - congratulations!). Since you have nothing else to do (ha! ha!), would you consider writing a "StatQuest" book? I'll bid high for the first autographed copy!
@statquest 4 years ago
Thank you very much!
@vladimirmihajlovic1504 2 months ago
Love StatQuest. Please cover lightGBM and CatBoost!
@statquest 2 months ago
I've got CatBoost; you can find it here: statquest.org/video-index/
@RS-el7iu 4 years ago
Thank you for the knowledge you're sharing. Much love.
@statquest 4 years ago
You are so welcome! :)
@sajjadabdulmalik4265 2 years ago
You are always awesome; I've never seen a better explanation ❤️❤️ Big fan 🙂🙂... Triple bammm!!! Hope we have LightGBM coming soon.
@statquest 2 years ago
I've recently posted some notes on LightGBM on my twitter account. I hope to convert them into a video soon.
@gokulprakash8694 3 years ago
Stat quest is the bestttttt!!! love it love it love it!!!!!!
@statquest 3 years ago
Thank you! :)
@reyhanehhashemi5772 4 years ago
Many thanks Josh ! it was just amazing ! when is part 2 coming ? cannot wait ..
@statquest 4 years ago
The good news is that Part 2 is super close to being done... the bad news is that it won't be out until January 6th (we're just about to go on a holiday). But Part 3 will come out the following week. So there's that.
@reyhanehhashemi5772 4 years ago
@@statquest Great, thank you so much ! have great holidays ^^
@ahmedelhamy1845 2 years ago
Wonderful as usual Josh
@statquest 2 years ago
Thanks!
@saileshpatra2488 4 years ago
Love the content. How many parts will there be for XGBoost?
@scotthalpern5631 8 months ago
This is fantastic!
@statquest 8 months ago
Thanks!
@palvinderbhatia3941 8 months ago
Wow woww wowww !! How can you explain such complex concepts so easily. I wish I can learn this art from you. Big Fan!! 🙌🙌
@statquest 8 months ago
Thank you so much 😀
@ramnareshraghuwanshi516 3 years ago
Thanks for uploading this... I am your biggest fan!! I have noticed too many ads these days, which are really disturbing :)
@statquest 3 years ago
Sorry about the ads. YouTube does that and I cannot control it.
@nilanjana1588 11 months ago
You make it a little bit easier to understand, Josh. I am saved.
@statquest 11 months ago
Thanks!
@oriol-borismonjofarre6114 a year ago
Josh you are amazing!
@statquest a year ago
Thank you!
@oldguydoesntmatter2872 4 years ago
I've been using Random Forests with various boosting techniques for a few years. My regression (not classification) database has 500,000 - 5,000,000 data points with 50-150 variables, many of them highly correlated with some of the others. I like to "brag" that I can overfit anything. That, of course, is a problem, but I've found a tweak that is simple and fast that I haven't seen elsewhere. The basic idea is that when selecting a split point, pick a small number of data vectors randomly from the training set. Pick the variable(s) to split on randomly. (Variables plural because I usually split on 2-4 variables into 2^^n boosting regions - another useful tweak.) The thresholds are whatever the data values are for the selected vectors. Find the vector with the best "gain" and split with that. I typically use 5 - 100 tries per split and a learning rate of .5 or so. It's fast and mitigates the overfitting problem. Just thought someone might be interested...
@zhonghengzhang603 4 years ago
Sounds awesome, would you like share the code?
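The randomized split-search tweak described in the comment above can be sketched roughly like this. This is my reading of the comment, simplified to one variable and two regions (the commenter's multi-variable 2^n-region splits and learning-rate details are omitted), and the step-function toy data is made up:

```python
import random

def sim(rs, lam=1.0):
    # Similarity = (sum of residuals)^2 / (number of residuals + lambda)
    return sum(rs) ** 2 / (len(rs) + lam) if rs else 0.0

def random_split(xs, residuals, tries=20, lam=1.0, seed=0):
    # Sample `tries` rows; use each sampled row's value as a candidate threshold
    # and keep the one with the best Gain, instead of scanning every midpoint.
    rng = random.Random(seed)
    best = None
    for _ in range(tries):
        thr = xs[rng.randrange(len(xs))]
        L = [r for x, r in zip(xs, residuals) if x < thr]
        R = [r for x, r in zip(xs, residuals) if x >= thr]
        if not L or not R:
            continue  # degenerate threshold, skip it
        g = sim(L, lam) + sim(R, lam) - sim(residuals, lam)
        if best is None or g > best[0]:
            best = (g, thr)
    return best

# Step-function toy data: the true change point is at x = 30.
xs = list(range(100))
residuals = [-1.0 if x < 30 else 1.0 for x in xs]
print(random_split(xs, residuals))  # (gain, threshold) of the best sampled split
```

For large datasets this trades a small amount of split quality for a big reduction in work per split, which matches the commenter's stated goal of speed plus overfitting mitigation.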
@alimmr2008 3 years ago
Excellent Job!
@statquest 3 years ago
Thanks!
@anggipermanaharianja6122 3 years ago
Awesome... this vid should be mandatory in every school.
@statquest 3 years ago
bam! :)
@shivasaib9023 3 years ago
I fell in love with XGBOOST. While Pruning every node I was like whatttt :p
@statquest 3 years ago
:)
@omkarjadhav13 4 years ago
You are just amazing, Josh. Xtreme Bam!!! You make our lives so easy. Waiting for the neural net video and further XGBoost parts. Please plan a meetup in Mumbai. #queston
@statquest 4 years ago
Thanks so much!!! I hope to visit Mumbai in the next year.
@ksrajavel 4 years ago
@@statquest Happy New Year, Mr. Josh. New year arrived. Awaiting you in India.
@statquest 4 years ago
@@ksrajavel Thank you! Happy New Year!
@emrzful 2 years ago
Thanks for the awesome content
@statquest 2 years ago
Glad you enjoy it!
@aldo605 a year ago
Thank you so much. You are the best
@statquest a year ago
Thank you very much for supporting StatQuest! BAM! :)
@karannchew2534 3 years ago
For my future reference.
1) Initiate with a predicted value, e.g. 0.5.
2) Get residuals: each sample vs. the initial predicted value.
3) Build a mini tree using the residual values of each sample:
- Try different values of the feature as the cut-off point at branches; each value gives a set of Similarity and Gain scores.
- Similarity (uses lambda, the regularisation parameter): measures how close the residual values are to each other.
- Gain (affected by lambda).
- Pick the feature value that gives the highest Gain; this determines how to split the data, which creates the branch (and leaves), which produces a mini tree.
4) Prune the tree using a gain threshold (aka complexity parameter), gamma. If Gain > gamma, keep the branch; else prune.
5) Get the Output Value (OV) for each leaf. Mini tree done. OV = sum of residuals / (no. of residuals + lambda)
6) Predict a value for each sample using the newly created mini tree: run each sample through the mini tree. New predicted value = last predicted value + eta * OV
7) Get a new set of residuals: new predicted value vs. actual value of each sample.
8) Redo from step 3, creating more mini trees...
- Each tree 'boosts' the prediction, improving the result.
- Each tree creates new residuals as input for the next new tree.
...until there is no more improvement or the number of trees is reached.
@statquest 3 years ago
Noted
@carlpiaf4476 a year ago
Could be improved by adding how the decision cut-off point is chosen.
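The numbered recipe above can be turned into a runnable sketch. A loud caveat: this is a hedged illustration using depth-1 trees (stumps) only, and the toy data is patterned on the video's example; real XGBoost grows deeper trees and adds many more optimizations.

```python
def best_split(x, residuals, lam):
    # Score every midpoint threshold by Gain and return the best one.
    def sim(rs):
        return sum(rs) ** 2 / (len(rs) + lam)
    best_thr, best_gain = None, -float("inf")
    for i in range(1, len(x)):
        thr = (x[i - 1] + x[i]) / 2
        left  = [r for xi, r in zip(x, residuals) if xi < thr]
        right = [r for xi, r in zip(x, residuals) if xi >= thr]
        g = sim(left) + sim(right) - sim(residuals)
        if g > best_gain:
            best_thr, best_gain = thr, g
    return best_thr, best_gain

def fit_stump_booster(x, y, n_trees=20, eta=0.3, lam=1.0, gamma=0.0):
    pred = [0.5] * len(y)                                   # step 1: initial prediction
    trees = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]    # step 2: residuals
        thr, g = best_split(x, residuals, lam)              # step 3: build a mini tree
        if g - gamma < 0:                                   # step 4: prune -> stop early
            break
        left  = [r for xi, r in zip(x, residuals) if xi < thr]
        right = [r for xi, r in zip(x, residuals) if xi >= thr]
        out_l = sum(left)  / (len(left)  + lam)             # step 5: output values
        out_r = sum(right) / (len(right) + lam)
        trees.append((thr, out_l, out_r))
        pred = [pi + eta * (out_l if xi < thr else out_r)   # step 6: eta-scaled update
                for pi, xi in zip(pred, x)]
    return trees, pred                                      # steps 7-8 happen in the loop

# Toy data (sorted by dosage), patterned on the video's example.
x = [10, 20, 25, 35]
y = [-10, 7, 8, -7]
trees, pred = fit_stump_booster(x, y)
```

After a few rounds, the predictions move steadily toward the observed values, which is exactly the "each tree boosts the prediction" behavior the recipe describes.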
@jihaekim4327 3 years ago
This is really helpful!! Hope for LightGBM next!
@statquest 3 years ago
I'll keep that in mind.
@adityanimje843 3 years ago
Hey Josh, love your videos :) Any idea when you will make the videos for CatBoost and Light GBM ?
@statquest 3 years ago
Maybe as early as July.
@adityanimje843 3 years ago
@@statquest Thank you :) One more question - I was reading the LightGBM documentation and it said LightGBM grows "leaf wise" whereas most DT algorithms grow "level wise", and that is a major advantage of LightGBM. But in your videos (RF and other DT algorithm ones), all of the videos show that they are grown "leaf wise". Am I misunderstanding something here?
@statquest 3 years ago
@@adityanimje843 I won't know the answer to that until I start researching Light GBM in July
@adityanimje843 3 years ago
@@statquest Sure - thank you for the swift reply. Looking forward to your new videos in July :)
@sarrae100 4 years ago
Love u Ppl, StatQuest the 👍💯, Super BAM!!!
@statquest 4 years ago
Thanks! :)
@weizhengtop a year ago
Hi Josh, wonderful job making these valuable videos. They are very helpful for students to learn from. I am wondering if you could make a series about the Bayesian Additive Regression Tree model. They are very closely related topics.
@statquest a year ago
I'll keep that in mind.
@shaz-z506 4 years ago
Extreme Bam! Finally xgboost is here
@statquest 4 years ago
That's a good one! :)
@ecotrix132 a month ago
Thanks for the wonderful content! How does XGBoost select which feature to split on? From the explanation, does each feature get its own full tree, unlike the bootstrapped subsets in a random forest, where multiple features are used in each subset tree?
@statquest a month ago
To select which feature to split on, XGBoost tests each feature in the dataset and selects the one that performs the best.
@stylianosiordanis9362 4 years ago
Please post slides; this is the best channel for ML. Thank you.
@hubert1990s 4 years ago
Can't wait for part 2.
@statquest 4 years ago
I'm recording it this weekend. It should be available for early access by Monday afternoon.
@shangauri 2 years ago
Great video Josh. In practice, what is the best way to find the optimal values for lambda, gamma and eta?
@statquest 2 years ago
Cross validation.
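"Cross validation" in practice means scoring each candidate hyperparameter combination on held-out folds and keeping the winner. Here is a minimal, self-contained sketch of that idea; the `train_stump` stand-in model, the toy data, and the grid values are mine, not from the video, and gamma is omitted only because a single stump has nothing to prune (it would join the grid the same way, as would real XGBoost hyperparameters):

```python
import itertools

def train_stump(xs, ys, eta, lam):
    """Fit one lambda-regularized stump to residuals from a 0.5 base prediction."""
    pairs = sorted(zip(xs, ys))
    res = [yv - 0.5 for _, yv in pairs]
    def sim(rs):
        return sum(rs) ** 2 / (len(rs) + lam) if rs else 0.0
    best = None
    for i in range(1, len(pairs)):
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        L = [r for (xv, _), r in zip(pairs, res) if xv < thr]
        R = [r for (xv, _), r in zip(pairs, res) if xv >= thr]
        g = sim(L) + sim(R) - sim(res)
        if best is None or g > best[0]:
            out_l = sum(L) / (len(L) + lam) if L else 0.0
            out_r = sum(R) / (len(R) + lam) if R else 0.0
            best = (g, thr, out_l, out_r)
    _, thr, out_l, out_r = best
    return lambda v: 0.5 + eta * (out_l if v < thr else out_r)

def cv_mse(xs, ys, eta, lam, k=3):
    """k-fold cross-validated mean squared error."""
    folds = [list(range(i, len(xs), k)) for i in range(k)]
    total = 0.0
    for hold in folds:
        keep = [i for i in range(len(xs)) if i not in hold]
        model = train_stump([xs[i] for i in keep], [ys[i] for i in keep], eta, lam)
        total += sum((ys[i] - model(xs[i])) ** 2 for i in hold)
    return total / len(xs)

xs = [3, 6, 9, 12, 15, 18, 21, 24]
ys = [1, 2, 3, 4, 10, 11, 12, 13]

# Try every (eta, lambda) combination and keep the one with the lowest CV error.
grid = list(itertools.product([0.1, 0.3, 1.0], [0.0, 1.0, 10.0]))
best_params = min(grid, key=lambda p: cv_mse(xs, ys, *p))
print(best_params)
```

With the real XGBoost library you would run the same loop over its actual hyperparameters rather than this stand-in model.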
@keizerneptune4594 4 years ago
Great video! When r u gonna release part 2?
@statquest 4 years ago
It should be out for early access viewing on January 6th.
@PriyanshiSharma14oct 3 years ago
Awesome!! Thanks a lot!
@statquest 3 years ago
Thanks!
@sachinrathi7814 4 years ago
I've been waiting for this video for a long time.
@statquest 4 years ago
I hope it was worth the wait! :)
@sachinrathi7814 4 years ago
@@statquest Indeed. I have gone through many posts, but every one just says it combines weak classifiers to make a strong classifier, with the same description everywhere. The way of describing things is what sets Josh Starmer apart from others. Merry Christmas 🤗
@kamaldeep8257 4 years ago
Hi Josh, great explanation - it helped me understand every tiny bit of the complex methods used. But I want to know one thing: you only considered one independent variable when building the individual trees. If we have more than one independent variable, as in your previous explanations of AdaBoost and Gradient Boost, do we use the same methods, like the Gini index or information gain, to decide a variable's importance for making the split? Thank you
@statquest 4 years ago
XGBoost trees are fit to residuals, which are always continuous values, and this makes them incompatible with Gini, which only works if the values are "true" or "false". So, for each independent variable in our dataset, XGBoost calculates the similarity scores, based on how the residuals are clustered, and the gain for different splits. The variable with the largest gain is the one XGBoost uses for the split.
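The multi-feature selection described in the reply above can be sketched like this. The similarity and gain formulas are from the video; the "dosage" values echo the video's toy example, while the "age" feature and its values are made up purely for illustration:

```python
def sim(rs, lam=0.0):
    # Similarity = (sum of residuals)^2 / (number of residuals + lambda)
    return sum(rs) ** 2 / (len(rs) + lam)

def best_gain(values, residuals, lam=0.0):
    # Best Gain over all midpoint thresholds for one feature.
    best = -float("inf")
    svals = sorted(values)
    for a, b in zip(svals, svals[1:]):
        thr = (a + b) / 2
        L = [r for v, r in zip(values, residuals) if v < thr]
        R = [r for v, r in zip(values, residuals) if v >= thr]
        best = max(best, sim(L, lam) + sim(R, lam) - sim(residuals, lam))
    return best

residuals = [-10.5, 6.5, 7.5, -7.5]            # video-style toy residuals
features = {
    "dosage": [10, 20, 25, 35],                # patterned on the video's example
    "age":    [25, 30, 20, 35],                # made-up second feature
}

# Split on whichever feature's best threshold has the largest Gain.
winner = max(features, key=lambda f: best_gain(features[f], residuals))
print(winner)
```

With these numbers the best "dosage" split has a larger Gain than the best "age" split, so "dosage" is chosen for the split, exactly as the reply describes.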
@vishalshira7398 4 years ago
Hi, first of all, thanks for uploading this video. It's 4 times BAM!! Can you please tell me how to decide the gamma value used to prune the tree? Do we need to figure it out by trial and error, or is there a better way? Thanks in advance!!
@statquest 4 years ago
We use cross validation - which is just a fancy type of trial and error: kzbin.info/www/bejne/nITcpa19rNx1jNk
@metiseh 2 years ago
Bam!!! I am totally hypnotized
@statquest 2 years ago
Thanks!
@rishabhahuja2506 3 years ago
Thanks Josh for this great video. Your explanations are damn good!! Waiting for CatBoost and LightGBM. Bammmm!!!!
@statquest 3 years ago
Thanks!
@kn58657 4 years ago
I'm doing a club remix of the humming during calculations. Stay tuned!
@statquest 4 years ago
Awesome!!!!! I can't wait to hear.
@des_224 2 years ago
Hey Josh, I remember in your regression tree video you mentioned that after we split the root node, subsequent splits of the child nodes only happen if their size is greater than a typical threshold (~20) to prevent overfitting; e.g. if a node has size < 20, then we take the average value and make it a leaf. Does XGBoost do the same for an actual dataset, and does it also do the random forest thing where it randomly selects m of p features (where m ~ root(p)) when generating the individual splits for each tree? Thanks!
@statquest 2 years ago
In practice, XGBoost has options for both (requiring a minimum number of samples in a node in order to make a split, and randomly selecting features at each node).
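For reference, both knobs mentioned in the reply above exist in the XGBoost library. The parameter names below come from the library's documented interface, not from the video, and the values are arbitrary examples:

```python
params = {
    "min_child_weight": 20,   # minimum sum of instance weights (hessians) in a child;
                              # for squared-error regression this acts roughly like a
                              # minimum number of samples per node
    "subsample": 0.8,         # row sampling per tree (bagging-style)
    "colsample_bytree": 0.8,  # feature sampling per tree
    "colsample_bynode": 0.5,  # feature sampling per split - the random-forest-style option
}
# e.g. model = xgboost.XGBRegressor(**params)  # assuming the xgboost package is installed
```

The `colsample_*` options compose multiplicatively, so combining per-tree and per-node sampling gives an even smaller candidate set at each split.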
@felipegutierre7037 6 months ago
Amazing!
@statquest 6 months ago
Thanks!