Reconciling modern machine learning and the bias-variance trade-off

12,341 views

Yannic Kilcher

Comments: 26
@PeterJMPuyneers 4 years ago
I struggled with understanding this paper due to a lack of background knowledge (conceptually speaking), but after seeing your explanation, everything is clear. Thank you very much.
@AntonPanchishin 4 years ago
Mind blown. Super cool! I have so many tests to rerun with a higher parameter count now.
@danielbigham 5 years ago
Fantastic video -- thank you! Fascinating...
@kristoferkrus 5 years ago
Mind blown. Very interesting paper! Does this mean that if you are in the regime where the test loss has started to decrease again (as a function of the number of parameters) and you add more training examples, your test accuracy will get worse, because the extra examples make it harder for the optimizer to find a simple function that perfectly matches the training data? In theory, this could make it beneficial to reduce the number of training examples, but intuitively that feels wrong.
@YannicKilcher 5 years ago
That's a very interesting point. Technically yes, but I agree it seems strange.
@YannicKilcher 5 years ago
I think it all comes down to the inductive bias given implicitly by the network architecture and the optimizer. In this framework, adding training data takes capacity away from the inductive bias and can potentially worsen your result.
@andreg5206 4 years ago
I know this is 10 months old, but at the end of 2019 OpenAI published a paper that suggests exactly what you imply here: openai.com/blog/deep-double-descent/
@kristoferkrus 4 years ago
@andreg5206 Yes, I saw that; it's so bizarre! Thanks for reminding me about it :)
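The interpolation regime discussed in this thread can be reproduced with a toy experiment. Below is a minimal sketch (not the paper's exact setup; the dataset, sample sizes, and feature counts are arbitrary choices) that fits a minimum-norm least-squares model on random ReLU features. Test error typically peaks when the number of features approaches the number of training points and falls again past the interpolation threshold.

# Minimal double-descent sketch: random ReLU features plus minimum-norm
# least squares on a toy 1-D regression problem. Test error typically
# spikes near n_features ~ n_train and decreases again beyond it.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, noise=0.1):
    x = rng.uniform(-1, 1, size=(n, 1))
    y = np.sin(4 * x[:, 0]) + noise * rng.standard_normal(n)
    return x, y

def relu_features(x, w, b):
    return np.maximum(x @ w + b, 0.0)  # shape (n, n_features)

x_tr, y_tr = make_data(40)
x_te, y_te = make_data(1000)

for n_feat in [5, 10, 20, 40, 80, 160, 640, 2560]:
    w = rng.standard_normal((1, n_feat))
    b = rng.standard_normal(n_feat)
    phi_tr = relu_features(x_tr, w, b)
    phi_te = relu_features(x_te, w, b)
    # Minimum-norm least-squares solution (pinv handles both regimes).
    coef = np.linalg.pinv(phi_tr) @ y_tr
    test_mse = np.mean((phi_te @ coef - y_te) ** 2)
    print(f"features={n_feat:5d}  test MSE={test_mse:.3f}")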
@MLDawn 3 years ago
You did a great job. This just left me speechless!
@DasGrosseFressen 4 years ago
A high-complexity solution be like "Braaaah! Brrraah!" 😂👍
@995Fede 5 years ago
I started reading this paper over the last few days and I can confirm that it is really interesting! However, I have some doubts about the way they evaluate the MSE (how do they deal with the fact that the function h(x) is complex-valued?) and the zero-one loss / norm of coefficients (since it is a multi-class classification problem, they probably use one-hot encoding, but again, how do they deal with the complex-valued h(x)? Moreover, if they use one-hot encoding, the regressor is a 2D matrix, so what norm are they plotting? The L2 norm for matrices?). Did you try to reproduce their plots with the MNIST dataset? Are these technical passages clear to you? Thank you again for the video!
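One plausible reading of the evaluation in that question can be sketched as follows; this is an assumption, not the paper's exact procedure. It uses real-valued random Fourier features (cosine in place of the complex exponential), one-hot targets, a minimum-norm least-squares fit, and reports the Frobenius norm of the resulting coefficient matrix, which is the usual interpretation when an "L2 norm" is quoted for a matrix regressor.

# Hedged sketch of one plausible reading of the setup, not the paper's
# exact procedure: cosine random Fourier features, one-hot targets,
# minimum-norm least squares, Frobenius norm of the coefficient matrix.
import numpy as np

rng = np.random.default_rng(1)

n_train, n_classes, d, n_feat = 200, 10, 20, 500
x_tr = rng.standard_normal((n_train, d))
labels = rng.integers(0, n_classes, size=n_train)   # stand-in for MNIST labels
y_onehot = np.eye(n_classes)[labels]                # shape (n_train, n_classes)

# Random Fourier features approximating a Gaussian kernel.
w = rng.standard_normal((d, n_feat))
b = rng.uniform(0, 2 * np.pi, size=n_feat)
phi = np.sqrt(2.0 / n_feat) * np.cos(x_tr @ w + b)  # shape (n_train, n_feat)

coef = np.linalg.pinv(phi) @ y_onehot               # coefficient matrix, (n_feat, n_classes)
frob_norm = np.linalg.norm(coef)                    # defaults to Frobenius for a 2-D array
pred = np.argmax(phi @ coef, axis=1)
zero_one = np.mean(pred != labels)                  # training zero-one loss
print(f"Frobenius norm = {frob_norm:.2f}, train zero-one loss = {zero_one:.3f}")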
@DrAhdol 5 years ago
This is an interesting paper; I wonder if it applies to boosting/bagging with models that don't have many parameter options, like multinomial naive Bayes. Would parameter optimization on ensemble models have the same effect when the base models inside are linear? An interesting option for some testing here.
@YannicKilcher 5 years ago
Seems worth a try :) I don't even know if boosting models can overfit in the classic sense...
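A quick way to run the test suggested here is to track test error as a function of boosting rounds. The sketch below uses scikit-learn's gradient boosting with tree base learners rather than the multinomial naive Bayes setup mentioned above; the synthetic dataset and hyperparameters are arbitrary choices for illustration.

# Hedged sketch: does test error eventually rise again as boosting rounds
# increase (classic overfitting), or does it keep flattening out?
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

clf = GradientBoostingClassifier(n_estimators=500, max_depth=3,
                                 learning_rate=0.1, random_state=0)
clf.fit(X_tr, y_tr)

# staged_predict yields predictions after each boosting round.
test_err = [np.mean(pred != y_te) for pred in clf.staged_predict(X_te)]
for i in [9, 49, 99, 199, 499]:
    print(f"rounds={i + 1:3d}  test error={test_err[i]:.3f}")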
@gyeonghokim 3 years ago
Thanks a lot!
@sayakpaul3152 4 years ago
This is such an amazing study. So many synergies with the Deep Double Descent paper.
@parker1981xxx 3 years ago
Actually, if you have a single parameter, it is commonly the bias term, not the slope.
@herp_derpingson 5 years ago
Can you elaborate on the Hilbert space thing? What does Hilbert space have to do with neural networks?
@YannicKilcher 5 years ago
That's a bit too much for a YouTube comment, but the concept is usually well explained in introductory ML classes, in the advanced section on kernelized SVMs.
@singhay_mle 5 years ago
Look up 3Blue1Brown's video on it.
@herp_derpingson 5 years ago
@singhay_mle That does not explain what it has to do with neural networks.
@singhay_mle 5 years ago
@herp_derpingson Sure, try this: users.umiacs.umd.edu/~hal/docs/daume04rkhs.pdf. Also, it has more to do with the kernel used by SVM/SVC than with NNs.
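A compact way to see the connection alluded to in this thread: a random feature map (like the random first layer of a two-layer network) approximates an RBF kernel, and functions built from that kernel live in its reproducing kernel Hilbert space. The sketch below is an illustration only; the feature dimension and bandwidth are arbitrary.

# Random Fourier features approximate the RBF kernel k(x1, x2) = exp(-gamma*||x1-x2||^2),
# whose associated function space is a reproducing kernel Hilbert space (RKHS).
import numpy as np

rng = np.random.default_rng(2)
d, n_feat, gamma = 5, 5000, 0.5

x1 = rng.standard_normal(d)
x2 = rng.standard_normal(d)

# Exact RBF kernel value.
k_exact = np.exp(-gamma * np.sum((x1 - x2) ** 2))

# Random feature map z(x); its inner product approximates the kernel.
w = rng.normal(scale=np.sqrt(2 * gamma), size=(n_feat, d))
b = rng.uniform(0, 2 * np.pi, size=n_feat)
z = lambda x: np.sqrt(2.0 / n_feat) * np.cos(w @ x + b)

k_approx = z(x1) @ z(x2)
print(f"exact kernel: {k_exact:.4f}  random-feature approximation: {k_approx:.4f}")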
@agusavior_channel 2 years ago
Very clear
@ujjwalkar1886 2 years ago
Does the complexity of H mean the number of features here?