NOTE: At 7:23 I should have said that the cosine similarity was 0.71. To learn more about Lightning: lightning.ai/ Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
@sahilpalsaniya724 a year ago
"BAM" and its variants are stuck in my head. every time I solve a problem my head plays your voice
@statquest a year ago
bam! :)
@Monkey_uho a year ago
Awesome work! I've been watching a lot of your videos to understand the basic ML algorithms, keep it up! Thank you for taking the time and energy to spread knowledge with others. Also, I would like to say that, like others, I would also love a video explaining the concepts behind LightGBM.
@statquest a year ago
Thank you! And one day I hope to do LightGBM
@weipenghu4463 a year ago
looking forward to it❤
@Quami111 a year ago
At 2:09 and 12:40, you assigned the row with height=1.32 to bin=1, but you said that rows with smaller heights would have bin=0. It doesn't appear that way at 11:24, where the row with height=1.32 has bin=0, so I guess it is a mistake.
@statquest a year ago
Oops! That was a mistake. 1.32 was supposed to be in bin 0 the whole time.
@OscarMartinez-gg5du a year ago
@@statquest At 1:15, when you create the randomized tree for the first build, the heights also seem to be shuffled relative to their corresponding Favorite Color, and that changes the examples for the creation of the stumps. However, the explanation is very clear, I love your videos!!
@aakashdusane 6 months ago
Not gonna lie, CatBoost's nuances were significantly more difficult to understand than those of any other ensemble model to date, although the basic intuition is pretty straightforward.
@statquest 6 months ago
It's a weird one for sure.
@LL-hj8yh a year ago
Hey Josh, thanks as always! Are you planning to roll out lightgbm videos as well?
@statquest a year ago
Eventually that's the plan.
@drelijahmikail3916 2 months ago
How would we extract and abstract the common mechanisms behind constructing the whole decision tree family? That family includes Gradient Boost, Regression Trees, Random Forests, XGBoost, AdaBoost, CatBoost, ensembles, etc., for both regression and classification. There is a detailed walkthrough of the "how" and less of the "why". One observation about some members of the decision tree family is that they construct multiple "weak learners", compute a fit measure (SSR: sum of squares due to regression, similar to LMS: least mean squares), and order the splits from the root down to the lower nodes.
@razielamadorrios7284 a year ago
Such a great video Josh! I really enjoyed it. Any chance you could do an explanation of LightGBM? Thanks in advance. Additionally, I'm a huge fan of your work :)
@statquest a year ago
I'll keep that in mind.
@rishabhsoni a year ago
Great video. One question: is the intuition behind using high cosine similarity to pick the threshold that, since we add the scaled leaf outputs to create predictions, leaf outputs that are closer to the residuals mean we are moving in the right direction, because the residuals represent how far away we are from the actual target? Usually we minimize the residuals, which more or less means finding similarity with the target.
@statquest a year ago
I think that is correct. A high similarity means the output value is close to the residuals, so we're moving in the right direction.
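For anyone who wants to see the idea in code, here is a minimal NumPy sketch of scoring one candidate threshold by the cosine similarity between the residuals and the leaf outputs each row would receive. It is only an illustration with made-up numbers, and it uses plain leaf means rather than CatBoost's ordered leaf-output calculation.

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def score_threshold(feature, residuals, threshold):
    # Send each row to the left or right leaf based on the threshold, then give
    # every row the mean residual of its leaf as its "leaf output".
    left = feature <= threshold
    predictions = np.empty_like(residuals)
    predictions[left] = residuals[left].mean()
    predictions[~left] = residuals[~left].mean()
    # A higher cosine similarity between residuals and leaf outputs = a better split.
    return cosine_similarity(residuals, predictions)

# made-up feature values and residuals (not the numbers from the video)
feature = np.array([0.1, 0.4, 0.5, 0.9])
residuals = np.array([-1.2, -0.3, 0.5, 1.0])
for t in [0.25, 0.45, 0.7]:
    print(t, round(score_threshold(feature, residuals, t), 3))
```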
@rishabhsoni a year ago
But one question that comes to mind: cosine similarity is based on the L2 norm, i.e. Euclidean distance. Wouldn't the number of rows of data act as the dimension in this case and cause weird output due to the curse of dimensionality?
@TheDataScienceChannel a year ago
As always a great video. Was wondering if you intend to add a code tutorial as well?
@statquest a year ago
I'll keep it in mind!
@asmaain5856 a year ago
@@statquest Please make it soon, I reaaaaally need it.
@rikki146 a year ago
APIs for shallow models are mostly similar :\
@satyashah3045 26 days ago
How does multiclass classification work in CatBoost? Or, in regression, when there are many bins, how is ordered target encoding done? Is it done individually for each class? If possible, can you make a single video on the multiclass classification problem for the boosting algorithms? Your videos are very easy to understand; they are really great.
@statquest 26 days ago
I'll keep that topic in mind. However, often people just create a bunch of 1 vs All models, one per target class.
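A rough sketch of the "1 vs All" idea: train one binary model per class and predict the class whose model is most confident. The toy data and the CatBoostClassifier settings below are made up for illustration, and the wrapper would work the same way with any binary classifier.

```python
import numpy as np
from catboost import CatBoostClassifier  # assumes the catboost package is installed

def fit_one_vs_all(X, y, n_classes):
    """Train one binary model per class: class k vs. everyone else."""
    models = []
    for k in range(n_classes):
        model = CatBoostClassifier(iterations=50, depth=3, verbose=0)  # made-up settings
        model.fit(X, (y == k).astype(int))
        models.append(model)
    return models

def predict_one_vs_all(models, X):
    # Each model reports P(class k); pick the class with the highest probability.
    probs = np.column_stack([m.predict_proba(X)[:, 1] for m in models])
    return probs.argmax(axis=1)

# toy data: 3 classes, 2 features (completely made up)
rng = np.random.default_rng(0)
X = rng.normal(size=(90, 2))
y = rng.integers(0, 3, size=90)
models = fit_one_vs_all(X, y, n_classes=3)
print(predict_one_vs_all(models, X[:5]))
```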
@satyashah3045 26 days ago
@@statquest Can we also do this using the softmax function? Like, suppose there are three classes in the output; then we can calculate the ordered target encoding for each class 0, 1, and 2 for each datapoint to make trees. Then we place the three different logit (log odds) values obtained from adding more trees into the softmax, and the predicted class is the one with the highest softmax value? That way we can run the algorithm for all classes simultaneously.
@statquest 25 days ago
@@satyashah3045 I'm not sure you need the softmax in this case, but I know that you use the cross entropy loss function.
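For the softmax idea in the comment above, here is a small sketch of the math (not necessarily what CatBoost does internally): per-class log(odds) scores go through a softmax, and with cross-entropy loss the "residual" for each class column is simply the one-hot label minus the predicted probability, so one set of trees per class can be grown simultaneously.

```python
import numpy as np

def softmax(logits):
    # subtract the row max for numerical stability
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# made-up per-class log(odds) scores for 4 rows and 3 classes
logits = np.array([[ 0.2, -0.1,  0.5],
                   [ 1.0,  0.0, -0.5],
                   [-0.3,  0.8,  0.1],
                   [ 0.0,  0.0,  0.0]])
y = np.array([2, 0, 1, 1])      # true classes
one_hot = np.eye(3)[y]          # one-hot encode the targets

probs = softmax(logits)
residuals = one_hot - probs     # negative gradient of cross-entropy loss, per class
print(residuals.round(3))
print(probs.argmax(axis=1))     # predicted class = largest softmax probability
```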
@Mark_mochi 11 months ago
At 8:25, why does the threshold change to 0.87 all of a sudden?
@statquest 11 months ago
Oops. That looks like a typo.
@nitinsiwach1989 9 months ago
Hello Josh, thank you for your amazing channel. In the catboost package, why do we have both 'depth' and 'max_leaves' as parameters? One would think that since the trees here are oblivious, the two are deterministically related. Can you shed some light on this?
@statquest 9 months ago
That's a good question. Unfortunately, there have been a lot of changes to CatBoost since it was originally published, and it's hard to get answers about what's going on.
@nitinsiwach1989 10 months ago
What do bins have to do with the ordered encoding computation you mention at 11:26? In the video, you mention one use case for the bins, which is to reduce the number of thresholds tested, like other gradient boosting methods do.
@statquest 10 months ago
The bins are used to give us a discrete target value for Ordered Target Encoding (since it doesn't work directly with a continuous target). For details, see: kzbin.info/www/bejne/gYmyhYahhbFljpY
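Here is a minimal sketch of that two-step idea: first bin the continuous target, then run Ordered Target Encoding on the bins, using only the rows that came before each row. The binning rule (split at the mean) and the prior of 0.05 are assumptions made purely for illustration.

```python
import numpy as np

def ordered_target_encode(categories, bins, prior=0.05):
    """Encode each category using only the rows that came before it:
    (count of earlier same-category rows in bin 1 + prior) / (number of earlier same-category rows + 1)."""
    encoded = []
    for i, cat in enumerate(categories):
        earlier = [j for j in range(i) if categories[j] == cat]
        option_count = sum(bins[j] for j in earlier)
        encoded.append((option_count + prior) / (len(earlier) + 1))
    return np.array(encoded)

# made-up data: bin the continuous target (Height) into 0/1, then encode Favorite Color
colors = ["blue", "green", "blue", "green", "blue"]
height = np.array([1.62, 1.85, 1.32, 1.91, 1.70])
bins = (height > height.mean()).astype(int)   # an assumed binning rule for illustration
print(bins)
print(ordered_target_encode(colors, bins))
```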
@aryanshrajsaxena6961 7 months ago
Will we use k-fold target encoding for the case of more than 2 bins?
@statquest 7 months ago
I believe that is correct.
@nitinsiwach1989 3 months ago
I have a few questions. 1. CatBoost is still gradient boosting, right? The residuals are computed as the first derivative of the loss function, and on the appropriate scale, which would be log(odds) for classification. You are getting the residuals as a simple difference because you are assuming L2 loss, right? 2. Are the output values also computed as the value that would minimize the loss function? Or is it always the mean of the target values that fall in the node? For classification, would it be the mean of the labels? For classification the output should be on the log(odds) scale; how is that done here?
@statquest 3 months ago
CatBoost is just Gradient Boosting + a different way to build trees. However, everything regarding the loss functions and how things are computed is lifted straight from Gradient Boosting. To learn more about those details, see: kzbin.info/www/bejne/aKnYlYOFd99grNU and kzbin.info/www/bejne/iaW6imiHjLKLedk
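A tiny sketch of why the plain difference works as the "residual" in this setting: with squared-error (L2) loss, the negative gradient of the loss with respect to the current prediction is exactly observed minus predicted, which is what each new tree is fit to.

```python
import numpy as np

# With squared-error loss L = 1/2 * (y - F)^2, the negative gradient with
# respect to the current prediction F is (y - F): the ordinary residual.
y = np.array([1.62, 1.85, 1.32, 1.91])   # made-up observed targets
F = np.full_like(y, y.mean())            # initial prediction: the mean

pseudo_residuals = y - F                 # what the next tree is fit to
print(pseudo_residuals)
```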
@yehonatanavidan9904 2 months ago
First, I'm a big fan, thanks for the excellent explanations! 🙏 Secondly, am I wrong, or did you shuffle the connection between the 'greens' and their target when you randomized the favorite colors?
@statquest 2 months ago
What time point, minutes and seconds are you asking about?
@yehonatanavidan9904 2 months ago
@statquest 1:00 vs 01:27
@statquest 2 months ago
@@yehonatanavidan9904 That's a "typo" at 1:27 - I should have kept the colors and values connected.
@serdargundogdu7899 a year ago
I wish you could replay this part again :)
@statquest a year ago
:)
@АлександраРыбинская-п3л a year ago
Dear Josh, I have a question about using CatBoost for classification. In this video, which covers using CatBoost for regression, we calculated the output value for a leaf as the average of the residuals in that leaf. How do we calculate the output value for classification? Do we use the same formula as for Gradient Boosting? I mean, (sum of residuals) in the numerator and sum of (previous probability(i) * (1 - previous probability(i))) in the denominator.
@statquest a year ago
CatBoost is, fundamentally, based on Gradient Boost, which does classification by converting the target into a log(odds) value and then treating it like a regression problem. For details, see: kzbin.info/www/bejne/oKnYf39-asmLedU
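Here is a short sketch of the Gradient Boost recipe described in the comment, applied to made-up labels: start from the overall log(odds), compute residuals as observed minus predicted probability, and use sum(residuals) / sum(p * (1 - p)) as a leaf's output. This is the standard Gradient Boost version; whether CatBoost uses exactly this leaf formula is not confirmed here.

```python
import numpy as np

def log_odds(p):
    return np.log(p / (1 - p))

y = np.array([1, 1, 0, 1, 0, 0, 1])            # made-up 0/1 labels

p0 = y.mean()                                   # overall proportion of 1's
F0 = log_odds(p0)                               # initial prediction on the log(odds) scale
prev_prob = np.full_like(y, p0, dtype=float)    # previous predicted probabilities

residuals = y - prev_prob                       # observed minus predicted probability

# leaf output for the rows that land in one leaf, using the formula from the comment
leaf_rows = np.array([0, 2, 4])                 # made-up leaf membership
leaf_output = residuals[leaf_rows].sum() / (prev_prob[leaf_rows] * (1 - prev_prob[leaf_rows])).sum()
print(F0, leaf_output)
```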
@sanukurien2752 7 months ago
What happens at inference time, when the target is not available? How are the categorical variables encoded then?
@statquest 7 months ago
You use the full training dataset to encode the new data.
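A minimal sketch of that inference-time encoding: for a new row, the category is converted to a number using every training row that has the same category. The (OptionCount + prior) / (n + 1) form and the prior of 0.05 are assumptions made for illustration.

```python
def encode_for_new_data(new_color, train_colors, train_bins, prior=0.05):
    """Encode a category for a *new* row using all matching training rows.

    During training, "previous rows" means the rows above the current one;
    for new data, it means the entire training set.
    """
    matches = [b for c, b in zip(train_colors, train_bins) if c == new_color]
    option_count = sum(matches)   # matching training rows that fall in bin 1
    n = len(matches)              # total matching training rows
    return (option_count + prior) / (n + 1)

# toy training data: two "blue" rows, one in bin 0 and one in bin 1
train_colors = ["blue", "green", "blue", "green"]
train_bins = [0, 1, 1, 1]
print(encode_for_new_data("blue", train_colors, train_bins))   # (1 + 0.05) / (2 + 1)
```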
@alexpowell-perry2233 a year ago
How does CatBoost decide on the best split at level 2 in the tree if it has to be symmetric? What if the best threshold for the LHS node is different from the best threshold for the RHS node?
@statquest a year ago
It finds the best threshold given that it has to be the same for all nodes at that level. Compared to how a normal tree is created, this is not optimal. However, the point is not to make an optimal tree, but instead to create a "weak learner" so that we can combine a lot of them to build something that is good at making predictions. Pretty much all "boosting" methods do something to make the trees a little worse at predicting on their own because trees are notorious for overfitting the training data. By making the trees a little worse, they prevent overfitting.
@alexpowell-perry2233 a year ago
@@statquest Thanks so much for the reply, but I still don't quite understand this. So does each LEVEL get a similarity score? I don't understand how you can quantify a threshold when that threshold is being applied to more than one node in the tree. In your example you showed us how to calculate the cosine similarity for a split applied to just one node; how do we calculate it when it is applied (in the case of a level 2 split) to two nodes simultaneously? I also have one more question: since the tree must be symmetrical, I am assuming that a characteristic (in your example, "Favourite Film") can only ever appear in a tree once?
@statquest a year ago
@@alexpowell-perry2233 In the video I show how the cosine similarity is calculated using 2 leaves. Adding more leaves doesn't change the process. Regardless of how many leaves are on a level, we calculate the cosine similarity between the residuals and the predictions for all of the data. And yes, a feature will not be used if it can no longer split the data into smaller groups.
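To make the level-wise idea concrete, here is a NumPy sketch that applies one threshold to every leaf on the current level at once and scores it with a single cosine similarity over all of the rows. As in the earlier sketch, it uses plain leaf means and made-up numbers rather than CatBoost's ordered outputs.

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def score_level_split(leaf_ids, feature, residuals, threshold):
    """Score one threshold applied to EVERY leaf on the current level at once.

    Each row stays in its current leaf but goes left or right within it, so the
    number of leaves doubles. Every row then gets the mean residual of its new
    leaf, and all of those leaf outputs are compared to all of the residuals.
    """
    new_leaf = 2 * leaf_ids + (feature > threshold).astype(int)
    predictions = np.empty_like(residuals)
    for leaf in np.unique(new_leaf):
        rows = new_leaf == leaf
        predictions[rows] = residuals[rows].mean()
    return cosine_similarity(residuals, predictions)

# made-up level-2 example: rows were already split into leaves 0 and 1 by the level-1 rule
leaf_ids = np.array([0, 0, 1, 1, 0, 1])
feature = np.array([0.1, 0.9, 0.3, 0.7, 0.5, 0.2])
residuals = np.array([-1.0, 0.4, -0.2, 1.1, 0.3, -0.6])
for t in [0.25, 0.6]:
    print(t, round(score_level_split(leaf_ids, feature, residuals, t), 3))
```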
@renyuduan a year ago
Thanks a lot~ I'm looking for an answer! For the new data whose "Favorite Color" is blue, why does it belong to bin #0 instead of bin #1?
@statquest a year ago
The new data is not assigned to a bin at all. We just use the old bin numbers associated with the Training Data (and only the training data) to convert the color, "blue", to a number. The bin numbers in the training data are used for the sum of the 1's for the numerator.
@renyuduan a year ago
@@statquest I misunderstood, sorry~ For new data whose "Favorite Color" is blue, we use all the rows with the same color, "blue", which is where OptionCount and n come from.
@statquest a year ago
@@renyuduan yep
@ВалерийГайнанов-и9о a year ago
Thank you for your content! It's very nice, everything is clear. I hope you won't stop producing your content :)
@statquest a year ago
Thank you!
@yufuzhang1187 a year ago
Dr. Starmer, when you have a chance, can you please make videos on LightGBM, which is quite popular these days? Also, can you do ChatGPT or GPT or Transformers, clearly explained? Thank you so much!
@statquest a year ago
I'm working on Transformers right now.
@yufuzhang1187 a year ago
@@statquest Thank you so much! Looking forward!
@xaviernogueira a year ago
@@statquest excited for that
@danieleboch3224 8 months ago
I have a question about leaf outputs. Don't gradient boosting algorithms on trees build a new tree all the way down and only after that assign values to its leaves? You did it iteratively instead, calculating outputs before the tree was fully built.
@statquest 8 months ago
As you can see in this video, not all gradient boosting algorithms with trees do things the same way. In this case, the trees are built differently, and this is done to avoid leakage.
@danieleboch3224 8 months ago
@@statquest Thanks, I got it now! But I have another question: in the CatBoost documentation there is a leaf estimation parameter (set to "Newton"), and it is weird because the Newton method is exactly the method used to find leaf values in XGBoost; it uses the second derivative of the loss function and builds the tree according to an information criterion based on that method. But why would we need that if we already build trees in the ordered way, finding the best split with the cosine similarity function?
@statquest 8 months ago
@@danieleboch3224 To be honest, I can only speculate about this. My guess is that they started to play around with different leaf estimation methods and found that the one XGBoost uses works better than the one they originally came up with. Honestly, the "theory" of CatBoost seems to be quite different from how it works in practice, and that is very disappointing to me.
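For reference, the "Newton" (second-order) leaf estimate in the XGBoost sense is a one-line formula. Whether the catboost package applies it exactly this way is speculation, but for log-loss the math looks like this (regularization omitted):

```python
import numpy as np

# Newton-style leaf value: leaf_value = -sum(gradients) / sum(hessians)
# For log-loss with predicted probability p and label y:
#   gradient = p - y,   hessian = p * (1 - p)
y = np.array([1, 0, 1, 1])           # made-up labels of the rows in one leaf
p = np.array([0.6, 0.4, 0.7, 0.5])   # current predicted probabilities for those rows

grad = p - y
hess = p * (1 - p)
leaf_value = -grad.sum() / hess.sum()
print(leaf_value)
```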
@serdargundogdu7899 a year ago
How was "favorite color < 29" changed into "favorite color < 0.87" at 8:28? Could you please explain?
@statquest a year ago
That's just a horrible and embarrassing typo. :( It should be 0.29.
@DeepaliBaghel-l9n a year ago
Big Fan !! 🙌
@statquest a year ago
Thanks!
@near_. a year ago
Awesome. I'm your new subscriber 🙂
@statquest a year ago
Thank you! :)
@reynardryanda245 a year ago
12:41 How did you get the optionCount for the prediction? I thought it was the number of times that color appears for that bin sequentially. But if it's for a prediction, we don't know the actual bin, right?
@statquest a year ago
At 12:41, we are trying to predict the height of someone who likes the color blue. So, in order to change "blue" into a number, we look at the training data on the left, which has two rows with the color blue in it. One of those rows is in Bin 0 and the other is in Bin 1. Thus, to get the option count for "blue", we add 0 + 1 = 1. In other words, the option count for the new observation is derived entirely from the training dataset.
@alexpowell-perry2233 a year ago
At 11:48, when you are calculating the output values of the second tree, the residual for the 3rd record, with a Favourite Colour value of 0.525 and a Residual of 1.81, gets sent down the LHS leaf, even though the LHS leaf contains Residuals that are
@statquest a year ago
Oops! That's a mistake. Sorry for the confusion!
@frischidn3869 a year ago
What will the residuals and leaf output be when it is a multiclass classification?
@statquest a year ago
Presumably it's log likelihoods from cross entropy. I don't show how this works with CatBoost, but I show how it works with Neural Networks here: kzbin.info/www/bejne/bHLVhKypatZ7d7c
@alphatyad8131 a year ago
Excuse me again, Dr. Starmer. Do you know how CatBoost determines the final tree (I mean, out of the many gradient boosting trees that CatBoost builds) before that becomes a rule it can use to predict new data? I haven't found a source that gives an explicit explanation of how CatBoost builds the decision trees up to the point where they can be used to predict. Thanks in advance, Dr. (Or, for anyone who knows, I would appreciate your help.)
@statquest a year ago
You build a bunch of trees and see if the predictions have stopped improving. If so, then you are done. If not, and it looks like the general trend is to continue improving, then build more trees.
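In practice, this "keep building trees until the predictions stop improving" idea is usually handled with a validation set and early stopping. Below is a sketch with made-up data; the parameter names (eval_set, early_stopping_rounds, use_best_model) assume the current catboost API.

```python
import numpy as np
from catboost import CatBoostRegressor  # assumes the catboost package is installed

# made-up regression data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2 * X[:, 0] + rng.normal(scale=0.5, size=200)
X_train, X_val = X[:150], X[150:]
y_train, y_val = y[:150], y[150:]

# Ask for lots of trees, but stop once the validation score has not improved
# for 25 trees in a row, and keep only the best model seen so far.
model = CatBoostRegressor(iterations=2000, learning_rate=0.1, depth=4, verbose=0)
model.fit(X_train, y_train,
          eval_set=(X_val, y_val),
          early_stopping_rounds=25,
          use_best_model=True)
print(model.tree_count_)   # how many trees were actually kept
```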
@alphatyad8131 a year ago
I got it & really appreciate it, Dr. And if I could ask again: is it safe to say CatBoost is similar to the XGBoost method in the way it chooses features for building the tree (the predictor) and in defining, in this case, the classification class for the given data?
@statquest a year ago
@@alphatyad8131 They're pretty different. To learn more about XGBoost, see: kzbin.info/www/bejne/haWnaaqMlqugbKc and kzbin.info/www/bejne/bpOUe3h6q8qhh7c
@alphatyad8131 a year ago
@@statquest Well explained, Dr. Josh Starmer. Actually, I'm still learning by watching your videos on 'Machine Learning'. I appreciate it; I no longer feel stuck in the same place as before, thanks to your help. Have a nice day, Dr.
@recklesspanda28 a year ago
Does it still work like that if I use classification?
@statquest a year ago
I believe classification is just like classification for standard Gradient Boost: kzbin.info/www/bejne/oKnYf39-asmLedU
@recklesspanda28 a year ago
@@statquest thank you🤗
@bhavanisankarlenka 20 days ago
Hurray Great BAMM!!😄
@statquest 20 days ago
Thank you!
@alphatyad8131 a year ago
Dr. Starmer, I tried to calculate it manually, and with a calculator too, several times, but my result was different from the one at 7:23. I get 0.7368, but the video shows 0.79. Am I missing something? Does anyone get the same result as me?
@statquest a year ago
That's just a typo in the video. Sorry for the confusion.
@alphatyad8131 a year ago
Okay. Thank you for your attention and the great explanation, Dr. Josh Starmer. Such an honor and my pleasure to contribute to this video. Have a great day, Dr.
@YUWANG-du4pv a year ago
Dr. Starmer, could you explain LightGBM? 🤩
@statquest a year ago
I'll keep that in mind.
@TrusePkay a year ago
Do a video on LightGBM
@statquest a year ago
I'll keep that in mind.
@nilaymandal2408 10 months ago
5:28
@statquest 10 months ago
A good moment.
@TheDankGoat a year ago
obnoxious, arrogant, has mistakes, but useful....
@statquest a year ago
What parts do you think are obnoxious? What parts are arrogant? And what time points, minutes and seconds, are mistakes? (The mistakes I might be able to correct, or at least add a note mentioning them.)