8.3 Bias-Variance Decomposition of the Squared Error (L08: Model Evaluation Part 1)

10,399 views

Sebastian Raschka · 1 day ago

Comments: 37
@elnuisance · 3 years ago
This was life-saving. Thank you so much, Sebastian, especially for explaining why 2ab = 0 while deriving the decomposition.
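For readers who want the step on paper, here is a compact sketch of the derivation the comment refers to, under the video's simplifying assumption that the target y is a fixed value and the expectation E[·] is taken over training sets:

```latex
\begin{aligned}
E\big[(y - \hat{y})^2\big]
 &= E\Big[\big((y - E[\hat{y}]) + (E[\hat{y}] - \hat{y})\big)^2\Big] \\
 &= (y - E[\hat{y}])^2
    + 2\,(y - E[\hat{y}])\,E\big[E[\hat{y}] - \hat{y}\big]
    + E\big[(E[\hat{y}] - \hat{y})^2\big] \\
 &= \underbrace{(y - E[\hat{y}])^2}_{\text{Bias}^2}
    + \underbrace{E\big[(\hat{y} - E[\hat{y}])^2\big]}_{\text{Variance}}
\end{aligned}
```

The cross term 2ab vanishes because y − E[ŷ] is a constant that factors out of the expectation, and E[E[ŷ] − ŷ] = E[ŷ] − E[ŷ] = 0.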
@bluepenguin5606 · 2 years ago
Hi Professor, thank you so much for the excellent explanation!! I learned the bias-variance decomposition a long time ago but never fully understood it until I watched this video. The detailed explanation of each definition helps a lot. The code implementation also helps me not only understand the concepts but also apply them in real applications, which is the part I always struggle with. I'll definitely find time to watch the other videos to make my ML foundation more solid.
@SebastianRaschka · 2 years ago
Wohoo, glad this was so useful! 😊
@whenmathsmeetcoding1836 · 2 years ago
This was wonderful, Sebastian. After searching, I found no other video on YouTube with such an explanation.
@SebastianRaschka · 2 years ago
Wohoo, thanks so much for the kind comment!
@kairiannah · 1 year ago
This is how you teach machine learning; respectfully, the prof at my university needs to take notes!
@PriyanshuSingh-hm4tn · 2 years ago
The best explanation of bias and variance I've encountered so far. It would be helpful if you could include the "noise" term too.
@SebastianRaschka · 2 years ago
Thanks! Haha, I would defer the noise term to my statistics class, but yeah, maybe I should do a bonus video on that. A director's cut. :)
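For the curious, the standard statement with the noise term included, assuming y = f(x) + ε with E[ε] = 0 and Var(ε) = σ²:

```latex
E\big[(y - \hat{y})^2\big]
 = \underbrace{\big(f(x) - E[\hat{y}]\big)^2}_{\text{Bias}^2}
 + \underbrace{E\big[(\hat{y} - E[\hat{y}])^2\big]}_{\text{Variance}}
 + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The σ² term is irreducible: no choice of model can remove it, which is why the video's noise-free setting drops it for simplicity.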
@khuongtranhoang9197 · 3 years ago
Do you know that you are doing truly good work! Clear down to every single detail.
@SebastianRaschka · 3 years ago
Thanks, this is very nice to hear!
@gurudevilangovan · 3 years ago
Thank you so much for the bias-variance videos. Though I understood it intuitively, these equations never made sense to me before I watched the videos. Truly appreciated!!
@SebastianRaschka · 3 years ago
Awesome, I am really glad to hear that I was able to explain it well :)
@ashutoshdave1 · 2 years ago
Thanks for this! One of the best explanations out there 👏
@SebastianRaschka · 2 years ago
Thanks! Glad to hear!
@ashutoshdave1 · 2 years ago
@SebastianRaschka Hi Sebastian, I visited your awesome website resource for ML/DL. Thanks again. Can't wait for the Bayesian part to be completed.
@XavierL-m6g · 11 months ago
Thank you so much for the intuitive explanation! The notation is clear and easy to understand, and it just instantly clicked.
@krislee9296 · 2 years ago
Thank you so much. This helped me understand the bias-variance decomposition mathematically.
@SebastianRaschka · 2 years ago
Awesome! Glad to hear!
@imvijay1166 · 2 years ago
Thank you for this great lecture series!
@SebastianRaschka · 2 years ago
Glad to hear that you are liking it!
@andypandy1ify · 3 years ago
This is an absolutely brilliant video, Sebastian - thank you. I have no problem deriving the bias-variance decomposition mathematically, but no one seems to explain what the variance or expectation is taken with respect to. Is it just one value? Multiple training sets? Different values within one training set? You explained it excellently.
@SebastianRaschka · 3 years ago
Thanks for the kind words! Glad it was useful!
@Rictoo · 4 months ago
I have a couple of questions. Regarding the variance: is it calculated across different parameter estimates given the same functional form of the model? Also, those parameter estimates depend on the optimization algorithm used, right? That is, the model predictions come from 'empirically derived' models rather than some theoretically optimal parameter combination for a given functional form. If so, would this mean that, technically speaking, there is an additional source of error in the loss calculation - something like 'implementation variance' due to our model likely not having the most optimal parameters compared to some theoretical optimum? Hope this makes sense; I'm not a mathematician. Thanks!
@siddhesh119369 · 1 year ago
Hi, thanks for teaching - really helpful 😊
@justinmcgrath753 · 9 months ago
At 10:20, the bias comes out backward because the error should be y_hat - y, not y - y_hat. The "true value" in an error is subtracted from the estimate, not the other way around. This is easily remembered by thinking of a simple random variable with mean mu and error e: y = mu + e. Thus, e = y - mu.
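One note on the sign convention: since the bias enters the squared-error decomposition squared, either convention yields the same decomposition; the sign only matters when interpreting the direction of the bias:

```latex
\mathrm{Bias}^2 = \big(E[\hat{y}] - y\big)^2 = \big(y - E[\hat{y}]\big)^2
```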
@bashamsk1288 · 1 year ago
When you say bias² + variance, is that for a single model? In the beginning you defined bias and variance over different models trained on different datasets - which one is it? If we consider a single model, is bias then just the mean error, and variance the mean squared error?
@tykilee9683 · 2 years ago
So helpful 😭😭😭
@kevinshao9148 · 3 years ago
Thanks for the great video! One question: at 8:42, why is y constant? y = f(x) here also has a distribution; it's a random variable, is that correct? And when you say "apply the expectation on both sides," is this expectation over y or over x?
@SebastianRaschka · 3 years ago
Good point. For simplicity, I assumed that y is not a random variable but a fixed target value instead.
@kevinshao9148 · 3 years ago
@SebastianRaschka Thank you so much for the reply! Yeah, that's exactly where my confusion lies. So what is the expectation taken over? If the expectation is over all x values, then you cannot make this assumption, right?
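For context, the convention that makes the derivation go through (and the one consistent with the reply above) is that the expectation is taken over training sets D at a fixed query point x, so y = f(x) is a constant while the prediction ŷ_D(x) varies from training set to training set:

```latex
\mathrm{Bias}^2 = \big(E_D[\hat{y}_D(x)] - y\big)^2, \qquad
\mathrm{Var} = E_D\Big[\big(\hat{y}_D(x) - E_D[\hat{y}_D(x)]\big)^2\Big]
```

Averaging over many x (e.g., over a test set) is then a separate, outer average, which is what simulation-based implementations compute in practice.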
@DeepS6995 · 3 years ago
Professor, does your bias_variance_decomp work in Google Colab? It did not for me. It worked just fine in Jupyter, but the problem with Jupyter is that bagging is way slower (that's my computer) than what I could get in Colab.
@SebastianRaschka · 3 years ago
I think Google Colab has a very old version of MLxtend as the default. I recommend the following: !pip install mlxtend --upgrade
@DeepS6995 · 3 years ago
@SebastianRaschka It works now. Thanks for the prompt response.
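For anyone who wants to try it, here is a minimal sketch of the kind of call being discussed, assuming a recent MLxtend (pip install mlxtend --upgrade). The synthetic dataset, the decision-tree estimator, and num_rounds=100 are illustrative choices, not the video's exact setup:

```python
# Minimal sketch: simulation-based bias-variance decomposition of the
# squared error with MLxtend. Dataset and estimator are illustrative.
from mlxtend.evaluate import bias_variance_decomp
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=10,
                       noise=10.0, random_state=123)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123)

tree = DecisionTreeRegressor(random_state=123)

# Refits the estimator on bootstrap samples of the training set
# (num_rounds times) and averages squared loss, squared bias, and
# variance over the test points.
avg_loss, avg_bias, avg_var = bias_variance_decomp(
    tree, X_train, y_train, X_test, y_test,
    loss='mse', num_rounds=100, random_seed=123)

print(f'Average expected loss: {avg_loss:.2f}')  # ~ avg_bias + avg_var
print(f'Average bias^2:        {avg_bias:.2f}')
print(f'Average variance:      {avg_var:.2f}')
```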
@1sefirot9 · 3 years ago
Any good sources or hints on dataset stratification for regression problems?
@SebastianRaschka · 3 years ago
Not sure if this is the best way, but personally I approached that by manually specifying bins for the target variable and then proceeding with stratification as in classification. There may be more sophisticated techniques out there, though, e.g., based on KL divergence or so.
@1sefirot9 · 3 years ago
@SebastianRaschka Hm, given a sufficiently large number of bins this should be a sensible approach, and it's easy to implement. I will play around with that. I am trying some of the things taught in this course on the Walmart Store Sales dataset (available from Kaggle); a naive training of LightGBM already returns marginally better results than what the instructor on Udemy got (he used XGBoost with hyperparameters returned by the AWS SageMaker auto-tuner).
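A minimal sketch of the binning approach described above, assuming pandas and scikit-learn; the placeholder data and the choice of 10 quantile bins are arbitrary illustrative assumptions:

```python
# Sketch: stratified train/test split for regression by binning the target.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))         # placeholder features
y = rng.gamma(shape=2.0, size=1000)    # placeholder skewed target

# Quantile bins give roughly equal-sized strata even for skewed targets.
y_bins = pd.qcut(y, q=10, labels=False)

# Stratify on the bin labels, exactly as one would for classification.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y_bins, random_state=123)
```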
@jayp123 · 5 months ago
I don't understand why you can't multiply 'E', the expectation, by 'y', the constant.
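For what it's worth, the standard rules this question touches on: when y is treated as a fixed constant (as in the video), the expectation leaves it unchanged, and constants factor out by linearity:

```latex
E[y] = y, \qquad E[y\,\hat{y}] = y\,E[\hat{y}]
```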
Related videos:
8.2 Intuition behind bias and variance (L08: Model Evaluation Part 1) · 15:35 · Sebastian Raschka · 4.5K views
7.7 Stacking (L07: Ensemble Methods) · 34:13 · Sebastian Raschka · 11K views
The Mean Squared Error of an Estimator and the Bias Variance Tradeoff · 6:58 · Mike, the Mathematician · 1.3K views
Bias-Variance Tradeoff: Data Science Basics · 12:25 · ritvikmath · 50K views
The Bias Variance Trade-Off · 15:24 · Mutual Information · 16K views
Lecture 08 - Bias-Variance Tradeoff · 1:16:51 · caltech · 163K views
6. Singular Value Decomposition (SVD) · 53:34 · MIT OpenCourseWare · 225K views
[Proof] MSE = Variance + Bias² · 4:36 · math et al · 11K views