I am watching this in 2025, and this is super helpful. Thank you for making this channel!
@1.41421 · 6 days ago
Forgot this was an old video
@lars1597 · a month ago
very good explanation! thank you
@dr.ganeshbhokare6547 · a month ago
Sir, if any vector is by default a column vector, then the dot product equation should be "Theta transpose x," which gives a scalar y; otherwise, it will give a matrix. I think it is a typo. If your assumption is that all are row vectors, then it is fine.
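A quick shape check of the point being made (a small NumPy sketch assuming the column-vector convention; the variable names are mine, not from the video):

```python
import numpy as np

theta = np.array([[1.0], [2.0], [3.0]])  # column vector, shape (3, 1)
x = np.array([[0.5], [1.5], [2.5]])      # column vector, shape (3, 1)

scalar_y = theta.T @ x   # (1, 3) @ (3, 1) -> shape (1, 1), effectively a scalar
matrix = theta @ x.T     # (3, 1) @ (1, 3) -> shape (3, 3), a matrix, not a scalar

print(scalar_y.shape, matrix.shape)  # (1, 1) (3, 3)
```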
@yurpipipchz75 · 3 months ago
Thank you for the knowledge!
@newbie8051 · 3 months ago
A bit tough to follow without any visualizations. The relation to k-means was intuitive, as Gaussian Mixture Models essentially group the inputs as being sampled from k Gaussians... thanks
@newbie8051 · 3 months ago
Oh, the visualization at 12:50 was amazing. I drew a Gaussian on the x-axis to better understand this. I love how I am progressing with this, thanks!!!!
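For anyone who wants to poke at the E-step and M-step themselves, a minimal 1-D EM-for-GMM sketch (my own toy example in NumPy, not code from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy 1-D data drawn from two Gaussians
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 0.5, 200)])

K = 2
weights = np.full(K, 1.0 / K)   # mixture weights
mu = rng.choice(x, K)           # initial means
var = np.full(K, x.var())       # initial variances

def gauss(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

for _ in range(100):
    # E-step: responsibilities r[n, k] = p(z = k | x_n)
    r = np.stack([weights[k] * gauss(x, mu[k], var[k]) for k in range(K)], axis=1)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means, and variances from the responsibilities
    Nk = r.sum(axis=0)
    weights = Nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / Nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk

print(weights, mu, var)  # should roughly recover the two generating Gaussians
```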
@user-nk1kz2fz2v · 4 months ago
Why is f(x,b) = sign(x·x - b) a circle?
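For anyone else puzzling over this, one standard reading of the notation (not a quote from the video): with x = (x1, x2), the decision boundary is

$$x^\top x = x_1^2 + x_2^2 = b,$$

which is a circle of radius sqrt(b) centered at the origin, so f(x, b) = sign(x·x - b) labels points outside the circle +1 and points inside -1.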
@ckjdinnj · 5 months ago
Thank you! This video is so much better than all the garbage machine learning click bait videos.
@rasu84 · 6 months ago
at 11:00 what if the blue point is in between the two red points? I don't think a straight line can shatter them in that case. Or did I just misinterpret the entire thing?
@JieunKo-v1l · 6 months ago
Thanks for the wonderful explanation. Do you share the slides?
@dadmehrdidgar4971 · 9 months ago
great video even after 10 years! thanks! :)
@pedroaragon3435 · 9 months ago
By far the best explanation of GMMs and especially the EM algorithm.
@carpediemcotidiem · 9 months ago
00:01 Gradient boosting is a method of converting a sequence of weak learners into a very complex predictor.
01:31 Gradient Boosting in a nutshell
02:56 Ensembling improves predictions by combining weak learners
04:30 Ensembles in Gradient Boosting add complexity gradually for better fit
05:52 Gradient Boosting involves adjusting predictions to reduce error
07:14 Gradient Boosting involves fitting models to error residuals
08:56 Explaining the importance of step size alpha K in Gradient Boosting
10:20 Gradient boosting uses weighted sums of regressors for better predictions.
Crafted by Merlin AI.
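A minimal sketch of the residual-fitting loop these timestamps describe, assuming squared-error loss and scikit-learn regression trees as the weak learners (function names and data are illustrative, not from the video):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_rounds=100, alpha=0.1, max_depth=2):
    """Fit F(x) = f0 + sum_k alpha * h_k(x) by fitting each h_k to the current residuals."""
    f0 = y.mean()                       # initial constant predictor
    pred = np.full_like(y, f0, dtype=float)
    learners = []
    for _ in range(n_rounds):
        residual = y - pred             # negative gradient of the squared-error loss
        h = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        pred += alpha * h.predict(X)    # step size alpha scales each weak learner
        learners.append(h)
    return f0, learners

def gradient_boost_predict(X, f0, learners, alpha=0.1):
    return f0 + alpha * sum(h.predict(X) for h in learners)

# Toy usage on a noisy sine curve
X = np.linspace(0, 6, 200).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * np.random.randn(200)
f0, learners = gradient_boost_fit(X, y)
print(gradient_boost_predict(X[:5], f0, learners))
```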
@mirettegeorgy5123 · 10 months ago
Thank you for this video, helps a lot!
@Noah-jz3gt · 11 months ago
Very clear and straightforward while containing all the necessary contents to understand the concept!
@shivampadmani_iisc · a year ago
Thank you so much so much sooooo much
@samfriedman5031 · a year ago
4:07 MLE for sigma-hat should be X by X-transpose (outer product) not X-transpose by X (inner product)
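For reference, the corrected estimator the comment points to, in the column-vector convention (my notation, not a transcript of the slide):

$$\hat\mu = \frac{1}{m}\sum_{j=1}^{m} x^{(j)}, \qquad \hat\Sigma = \frac{1}{m}\sum_{j=1}^{m}\left(x^{(j)} - \hat\mu\right)\left(x^{(j)} - \hat\mu\right)^{\top},$$

an outer product, so the estimated covariance is d x d rather than a scalar.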
@jokotajokisi · a year ago
Oh my G. After 5 years of confusion, I finally understood Lp regularization! Thank you so much Alex!
@remandev9074 · a year ago
This is not a very good explanation at all. There's WAY too much theorem dumping with difficult-to-parse variables all over the place, and a big lack of tangible examples. I don't know what other people see in this video.
@rickclark4832 · a year ago
Exceptionally clear explanation of the use of EM with Gaussian Mixture Models
@Why_I_am_a_theist · a year ago
Nice video, this is what I dig for on YouTube: an actual concise, clear explanation worth any paid course.
@spitalhelles3380 · a year ago
Thanks :)
@GenerativeDiffusionModel_AI_ML · a year ago
Learning.
@ZLYang · a year ago
At 4:32, if x and μ are row vectors, [x - μ] should also be a row vector. Then how do we multiply (Σ^(-1)) * [x - μ]? The dimension of Σ^(-1) is 2x2, and the dimension of [x - μ] is 1x2.
@ZLYang · a year ago
The best explanation I have ever seen. Hope you can talk a bit about how to derive the equation.
@liubo19831214 · a year ago
Prof. Ihler, could you provide the reference for hard EM (in the last slide)? Thx!
@adityabarge8603 · a year ago
thanks for the explanation
@wenzwenzel2529 · a year ago
Bootstrap-Aggregation. So helpful when you're trying to learn every AWS Machine Learning tool!
@wenzwenzel2529 · a year ago
Very good nuance here.
@theSpicyHam · a year ago
excellent presentation
@celisun1013 · 2 years ago
please publish more videos, professor Ihler!
@Phoenix231092 · 2 years ago
Very few videos online give some key concepts here, like what we're truly trying to minimize with the penalty expression. Most just give the equation but never explain the intuition behind L1 and L2. Kudos man
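For context, the penalized objective being described, in one common form (my notation; the slide's symbols may differ):

$$\hat\theta = \arg\min_{\theta}\ \sum_{i=1}^{m}\left(y^{(i)} - \theta^\top x^{(i)}\right)^2 + \lambda \sum_{j}\left|\theta_j\right|^p,$$

where p = 1 gives the L1 (lasso) penalty, p = 2 gives the L2 (ridge) penalty, and a larger lambda shrinks the weights more aggressively.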
@Aakash-mi8xq · 2 years ago
Very well explained. Thank you!
@nikhilchalla7216 · 2 years ago
Thank you very much for the amazing video! I have a couple of questions. 1. Would shatter be the right criterion to define VC dimension when classifying the data into the right class is also important? In the circle classifier example, you will be able to shatter two points but you will not be able to classify the two points correctly as shown in the video. 2. One parameter with lots of power: would it be possible to share a complete example for this case, or any reference literature if you have any?
@Ranger-x5k · 2 years ago
Excellent explanation!! (One correction for the mistake: the outer product was written as the inner product in the slides.)
@erisha78 · 2 years ago
Beautiful!
@manueljenkin95 · 2 years ago
This video was so hard to follow and watch (too wordy and very few pauses), but I'm thankful anyway, since eventually I got to understand it by thinking about it and rewatching a few times.
@KulvinderSingh-pm7cr · 2 years ago
Simply best explanation!!
@KulvinderSingh-pm7cr · 2 years ago
Beautiful
@nathanzorndorf8214 · 2 years ago
Next time, I'd love it if you included the effect lambda has on regularization, including visuals!
@nathanzorndorf8214 · 2 years ago
Wow, that was such a great explanation. Thank you.
@TheProblembaer2 · 2 years ago
Hi Alexander Ihler, I watched plenty of videos, and this explanation is the best I have found. Thank you!
@spyhunter0066 · 2 years ago
Could you explain more about the sum of the vectors in your notation for the maximum likelihood estimates at minute 1:45? As far as I have noticed, there has been only one data set, namely one x vector. So what are you actually summing over with the j indices? Cheers.
@spyhunter0066 · 2 years ago
At minute 1:34, the maximum likelihood estimate formula has a 1-over-N coefficient. On the other hand, at minute 3:13, there is a 1-over-m coefficient. We know that N and m are the total number of values in the sums, but what is the reason you used the different notations N and m? Is it just to separate the univariate and multivariate cases while they keep their definitions (or meaning)? Also, the j values in the lower and upper limits of the sum symbols are not so clear in this notation. Should we write j=1 to j=m or N, for instance?
@spyhunter0066 · 2 years ago
One more question about the example at minute 4:24: you said independent x1 and x2 variables. Independent of what??? As far as I see, you can have two univariate formulas like in this example, but when you combine them to see the combined likelihood, you have to have a mean vector of size 2 and a Sigma matrix of size 2x2. That's always the case, right? The size of the mean vector and the Sigma matrix looks like it is defined by the number of combined x values. Is that right? I saw another example somewhere else where you can have L(μ=28, σ=2 | x1=32 and x2=34), for instance, to find the combined likelihood at x1=32 and x2=34, and he uses only one mean and sigma for both. REF:kzbin.info/www/bejne/ep-Zk2yceK6Ipq8&ab_channel=StatQuestwithJoshStarmer
@spyhunter0066 · 2 years ago
I'd like to know what you call your x value for the univariate case, or x value set for the multivariate case, in your Gaussian distributions. Do you name them a "data set" or "variable set"? Also, what makes the size of the mean vector the same as the size of the x data? Thanks in advance. Should we think that we create one mean average for every added x data point in our data set? That's why we average them when we find the best estimated value in the end.
@spyhunter0066 · 2 years ago
In the formula at minute 2:11, when you find the inverse of the Sigma matrix in the exp(...), do you use the unit matrix method, code, or some other method? Cheers.
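Not the lecturer, but for what it's worth: in code one usually avoids forming the explicit inverse and solves a linear system instead. A small NumPy sketch with made-up numbers:

```python
import numpy as np

Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
d = np.array([0.5, -1.2])   # d = x - mu

# Two equivalent ways to get the quadratic form d^T Sigma^{-1} d in the Gaussian exponent:
q_inv = d @ np.linalg.inv(Sigma) @ d     # explicit inverse (fine for tiny matrices)
q_solve = d @ np.linalg.solve(Sigma, d)  # solve Sigma z = d, usually preferred numerically

print(q_inv, q_solve)  # both print the same value
```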
@spyhunter0066 · 2 years ago
At 5:23, you should have said (x - mu) transpose.
@AlexanderIhler · 2 years ago
These slides have a number of transposition notation errors, due to my having migrated from column to row notation that year. Unfortunately YouTube does not allow updating videos, so the errors remain. It should be clear in context, since I say "outer product" for the few non-inner products.
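For readers hitting the same transpose confusion, the two equivalent conventions side by side (my summary of the reply above, not a quote from the slides):

$$\text{row vectors } (1 \times d):\quad (x-\mu)\,\Sigma^{-1}(x-\mu)^\top, \qquad \hat\Sigma = \frac{1}{m}\sum_{j}\left(x^{(j)}-\hat\mu\right)^\top\left(x^{(j)}-\hat\mu\right)$$

$$\text{column vectors } (d \times 1):\quad (x-\mu)^\top\Sigma^{-1}(x-\mu), \qquad \hat\Sigma = \frac{1}{m}\sum_{j}\left(x^{(j)}-\hat\mu\right)\left(x^{(j)}-\hat\mu\right)^\top$$

Both quadratic forms are scalars and both covariance estimates are d x d; only the notation differs.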
@spyhunter0066 · 2 years ago
@AlexanderIhler No worries, we can spot them.
@d-rex7043 · 2 years ago
This should be mandatory viewing, before being assaulted with the symbolic derivations!