Sir, if every vector is by default a column vector, then the dot-product equation should be "theta transpose x," which gives a scalar y; otherwise it will give a matrix. I think it is a typo. If your assumption is that all vectors are row vectors, then it is fine.
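For anyone puzzling over the same shapes, here is a minimal numpy sketch (my own illustration with made-up numbers, not from the video) of why the column-vector convention needs the transpose to produce a scalar prediction:

```python
import numpy as np

# Column-vector convention: theta and x are both d x 1.
theta = np.array([[0.5], [-1.0], [2.0]])   # shape (3, 1)
x     = np.array([[1.0], [2.0], [3.0]])    # shape (3, 1)

y = theta.T @ x            # (1, 3) @ (3, 1) -> (1, 1), effectively a scalar
print(y.shape, y.item())   # (1, 1) 4.5

outer = theta @ x.T        # (3, 1) @ (1, 3) -> (3, 3) matrix, not a prediction
print(outer.shape)         # (3, 3)

# Row-vector convention: if x is 1 x d, then x @ theta gives the same scalar.
x_row = x.T                # shape (1, 3)
print((x_row @ theta).item())  # 4.5
```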
@yurpipipchz75 · 1 month ago
Thank you for the knowledge!
@newbie8051 · 1 month ago
A bit tough to follow without any visualizations. The relation to k-means was intuitive, though, since Gaussian mixture models essentially group the inputs as being sampled from k Gaussians... thanks
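In case it helps other readers, here is a small sketch of the soft-assignment (E-step) idea that connects GMMs to k-means; it is a toy example with made-up means, covariances, and mixture weights, not code from the lecture:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])  # toy data

# Assumed current parameters of k = 2 Gaussian components.
means   = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]
covs    = [np.eye(2), np.eye(2)]
weights = np.array([0.5, 0.5])

# E-step: responsibility resp[i, k] = P(component k | x_i).
dens = np.column_stack([w * multivariate_normal(m, c).pdf(X)
                        for w, m, c in zip(weights, means, covs)])
resp = dens / dens.sum(axis=1, keepdims=True)

# Hard-assigning each point to its argmax responsibility gives a k-means-like clustering.
labels = resp.argmax(axis=1)
print(resp[:3].round(3), labels[:5])
```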
@newbie8051 · 1 month ago
Oh, the visualization at 12:50 was amazing. I drew a Gaussian on the x-axis to better understand this. I love how I am progressing with this, thanks!!!!
@user-nk1kz2fz2v · 3 months ago
Why is f(x,b) = sign(x·x − b) a circle?
@ckjdinnj · 4 months ago
Thank you! This video is so much better than all the garbage machine-learning clickbait videos.
@rasu84 · 4 months ago
at 11:00 what if the blue point is in between the two red points? I don't think a straight line can shatter them in that case. Or did I just misinterpret the entire thing?
@JieunKo-v1l · 5 months ago
Thanks for the wonderful explanation. Do you share the slides?
@dadmehrdidgar4971 · 7 months ago
great video even after 10 years! thanks! :)
@pedroaragon3435 · 7 months ago
By far the best explanation of GMMs and especially of the EM algorithm.
@carpediemcotidiem · 7 months ago
00:01 Gradient boosting is a method of converting a sequence of weak learners into a very complex predictor.
01:31 Gradient boosting in a nutshell
02:56 Ensembling improves predictions by combining weak learners
04:30 Ensembles in gradient boosting add complexity gradually for a better fit
05:52 Gradient boosting involves adjusting predictions to reduce error
07:14 Gradient boosting involves fitting models to error residuals
08:56 Explaining the importance of the step size alpha_k in gradient boosting
10:20 Gradient boosting uses weighted sums of regressors for better predictions.
Crafted by Merlin AI.
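To make the residual-fitting idea in those chapters concrete, here is a minimal sketch of gradient boosting for squared-error regression; it is a toy illustration with an assumed constant step size alpha and shallow sklearn trees as the weak learners, not the video's own code:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

alpha, n_rounds = 0.1, 100
F = np.full_like(y, y.mean())      # start from a constant prediction
learners = []

for _ in range(n_rounds):
    residual = y - F                              # residuals = negative gradient of half the squared error
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    F += alpha * tree.predict(X)                  # take a small step toward the residuals
    learners.append(tree)

def predict(X_new):
    # Weighted sum of weak regressors plus the initial constant.
    return y.mean() + alpha * sum(t.predict(X_new) for t in learners)

print(np.mean((y - F) ** 2))  # training error shrinks as the number of rounds grows
```

Each round fits a small tree to the current residuals and takes a short step toward them, which is how the ensemble adds complexity gradually.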
@mirettegeorgy5123 · 8 months ago
Thank you for this video, it helps a lot!
@Noah-jz3gt · 9 months ago
Very clear and straightforward, while containing all the necessary content to understand the concept!
@shivampadmani_iisc · 10 months ago
Thank you so much so much sooooo much
@samfriedman5031 · 11 months ago
4:07 — the MLE for sigma-hat should be x times x-transpose (outer product), not x-transpose times x (inner product).
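For anyone checking the orientation, a small numpy sanity check (my own, assuming the samples are stacked as rows of a data matrix): the MLE covariance is the average of per-sample outer products, which in row-stored form is written Xc.T @ Xc / m.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))      # m samples stored as rows
m = X.shape[0]

mu_hat = X.mean(axis=0)
Xc = X - mu_hat

# Average of outer products (x_j - mu)(x_j - mu)^T, treating each x_j as a column vector.
sigma_hat = sum(np.outer(x, x) for x in Xc) / m

# The same estimate written with the row-stored centered data matrix.
sigma_hat2 = Xc.T @ Xc / m

print(np.allclose(sigma_hat, sigma_hat2))  # True
print(sigma_hat.round(3))
```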
@jokotajokisi · 11 months ago
Oh my G. After 5 years of confusion, I finally understood Lp regularization! Thank you so much Alex!
@remandev9074 · 1 year ago
This is not a very good explanation at all. There's WAY too much theorem dumping with difficult-to-parse variables all over the place, and a big lack of tangible examples. I don't know what other people see in this video.
@rickclark4832 · 1 year ago
Exceptionally clear explanation of the use of EM with Gaussian Mixture Models
@Why_I_am_a_theist · 1 year ago
Nice video. This is what I dig on YouTube: an actual concise, clear explanation worth any paid course.
@spitalhelles3380 · 1 year ago
Thanks :)
@GenerativeDiffusionModel_AI_ML · 1 year ago
Learning.
@ZLYang · 1 year ago
At 4:32, if x and μ are row vectors, [x−μ] should also be a row vector. Then how do we multiply Σ^(-1) by [x−μ]? The dimension of Σ^(-1) is 2×2, and the dimension of [x−μ] is 1×2.
@ZLYang · 1 year ago
The best explanation I have ever seen. I hope you can talk a bit about how to derive the equation.
@liubo19831214 · 1 year ago
Prof. Ihler, could you provide the reference for hard EM (in the last slide)? Thx!
@adityabarge8603 · 1 year ago
thanks for the explanation
@wenzwenzel2529 · 1 year ago
Bootstrap-Aggregation. So helpful when you're trying to learn every AWS Machine Learning tool!
@wenzwenzel2529 · 1 year ago
Very good nuance here.
@theSpicyHam · 1 year ago
excellent presentation
@celisun1013 · 2 years ago
Please publish more videos, Professor Ihler!
@Phoenix231092 · 2 years ago
Very few videos online cover the key concepts here, like what we're truly trying to minimize with the penalty expression. Most just give the equation but never explain the intuition behind L1 and L2. Kudos, man.
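For readers who want that objective written out, here is a minimal sketch (my own notation, with mean squared error assumed as the data-fit term) of the penalized loss being minimized under L2 (ridge) and L1 (lasso) regularization:

```python
import numpy as np

def objective(w, X, y, lam, penalty="l2"):
    """Regularized least squares: data-fit term plus a penalty on the weights."""
    data_fit = np.mean((X @ w - y) ** 2)
    if penalty == "l2":
        reg = lam * np.sum(w ** 2)      # ridge: shrinks all weights smoothly
    else:
        reg = lam * np.sum(np.abs(w))   # lasso: encourages some weights to be exactly zero
    return data_fit + reg

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = np.array([2.0, 0.0, 0.0, -1.0, 0.0])
y = X @ w_true + 0.1 * rng.normal(size=100)

w = np.ones(5)
print(objective(w, X, y, lam=0.1, penalty="l2"),
      objective(w, X, y, lam=0.1, penalty="l1"))
```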
@Aakash-mi8xq · 2 years ago
Very well explained. Thank you!
@nikhilchalla7216 · 2 years ago
Thank you very much for the amazing video! I have a couple of questions.
1. Would shattering be the right criterion for defining the VC dimension when classifying the data into the right class is also important? In the circle-classifier example, you can shatter two points, but you cannot classify the two points correctly, as shown in the video.
2. One parameter with lots of power: would it be possible to share a complete example for this case, or any reference literature if you have any?
@Ranger-x5k · 2 years ago
Excellent explanation!! (One correction: the outer product was written as the inner product in the slides.)
@erisha78 · 2 years ago
Beautiful!
@manueljenkin95 · 2 years ago
This video was so hard to follow and watch (too wordy and very few pauses), but I'm thankful anyway, since I eventually came to understand it by thinking about it and rewatching a few times.
@KulvinderSingh-pm7cr · 2 years ago
Simply the best explanation!!
@KulvinderSingh-pm7cr · 2 years ago
Beautiful
@nathanzorndorf8214 · 2 years ago
Next time, I'd love it if you included the effect lambda has on regularization, including visuals!
@nathanzorndorf8214 · 2 years ago
Wow, that was such a great explanation. Thank you.
@TheProblembaer2 · 2 years ago
Hi Alexander Ihler, I watched plenty of videos, and this explanation is the best I've found. Thank you!
@spyhunter0066 · 2 years ago
Could you explain more about the sum over vectors in your notation for the maximum likelihood estimates at minute 1:45? As far as I can tell, there is only one data set, namely one x vector. So what exactly are you summing over with the index j? Cheers.
@spyhunter0066 · 2 years ago
At minute 1:34, the maximum likelihood estimate formula has a 1/N coefficient. On the other hand, at minute 3:13, there is a 1/m coefficient. We know that N and m are the total number of values in the sums, but why did you use the two different symbols N and m? Is it just to separate the univariate and multivariate cases while they keep the same meaning? Also, the j values in the lower and upper limits of the sum symbols are not so clear in this notation. Should we write j=1 to j=m (or N), for instance?
@spyhunter0066 · 2 years ago
One more question about the example at minute 4:24: you said x1 and x2 are independent variables. Independent of what??? As far as I can see, you can have two univariate formulas, like in this example, but when you combine them to get the joint likelihood, you have to have a mean vector of size 2 and a Sigma matrix of size 2x2. That's always the case, right? The sizes of the mean vector and the Sigma matrix seem to be defined by the number of x values being combined. Is that right? I saw another example somewhere else where you can have, for instance, L(μ=28, σ=2 | x1=32 and x2=34) to find the joint likelihood at x1=32 and x2=34, and he uses only one mean and sigma for both. REF: kzbin.info/www/bejne/ep-Zk2yceK6Ipq8&ab_channel=StatQuestwithJoshStarmer
@spyhunter0066 · 2 years ago
I'd like to know what you call the x value in the univariate case, or the set of x values in the multivariate case, in your Gaussian distributions. Do you call them a "data set" or a "variable set"? Also, what makes the size of the mean the same as the size of the x data? Thanks in advance. Should we think of it as creating one mean for every x data point added to our data set? Is that why we average them when we find the best estimate in the end?
@spyhunter0066 · 2 years ago
In the formula at minute 2:11, when you find the inverse of the Sigma matrix inside the exp(...), do you use the unit-matrix (Gauss–Jordan) method, some code, or some other method? Cheers.
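Not speaking for the video, but in practice the inverse is usually not formed explicitly; a common approach is to evaluate the quadratic form in the exponent with a linear solve instead. A minimal numpy sketch with made-up numbers:

```python
import numpy as np

mu    = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x     = np.array([1.5, 1.0])

d = x - mu

# Option 1: explicit inverse (fine for tiny examples).
q1 = d @ np.linalg.inv(Sigma) @ d

# Option 2: solve Sigma z = d instead of inverting (numerically preferable).
q2 = d @ np.linalg.solve(Sigma, d)

# The density then uses exp(-q/2) together with the normalizing constant.
k = len(mu)
norm_const = 1.0 / np.sqrt((2 * np.pi) ** k * np.linalg.det(Sigma))
p = norm_const * np.exp(-0.5 * q2)
print(np.isclose(q1, q2), p)
```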
@spyhunter0066 · 2 years ago
At 5:23, you should have said (x − mu) transpose.
@AlexanderIhler · 2 years ago
These slides have a number of transposition notation errors, due to my having migrated from column to row notation that year. Unfortunately YouTube does not allow updating videos, so the errors remain. It should be clear in context, since I say “outer product” for the few non-inner products.
@spyhunter0066 · 2 years ago
@@AlexanderIhler No worries, we spot them.
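For anyone double-checking those transposes, a tiny numpy check (my own illustration) that the quadratic form in the exponent gives the same scalar under either convention, (x−μ)ᵀ Σ⁻¹ (x−μ) with column vectors or (x−μ) Σ⁻¹ (x−μ)ᵀ with row vectors:

```python
import numpy as np

Sigma_inv = np.linalg.inv(np.array([[2.0, 0.3],
                                    [0.3, 1.0]]))
delta_col = np.array([[0.5], [-1.0]])   # x - mu as a 2x1 column vector
delta_row = delta_col.T                 # x - mu as a 1x2 row vector

q_col = (delta_col.T @ Sigma_inv @ delta_col).item()  # (1x2)(2x2)(2x1) -> scalar
q_row = (delta_row @ Sigma_inv @ delta_row.T).item()  # (1x2)(2x2)(2x1) -> scalar

print(np.isclose(q_col, q_row))  # True: only the notation differs
```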
@d-rex7043 · 2 years ago
This should be mandatory viewing, before being assaulted with the symbolic derivations!
@spyhunter0066 · 2 years ago
Can you tell me the difference between the bivariate and multivariate cases? Can you also talk about the case when the parameters are dependent, where we add an extra dependence-coefficient parameter? Here is a sample video to refer to, to give a better idea: kzbin.info/www/bejne/e5nQYaCZob-ma5Y
@AlexanderIhler · 2 years ago
Bivariate = 2 variables; multivariate = more than one variable. So bivariate is a special case, in which the mean is two-dimensional and the covariance is 2x2. Above 2 dimensions it is hard to visualize, so I usually just draw 2D distributions; but the mathematics is exactly the same.
@spyhunter0066 · 2 years ago
@@AlexanderIhler Your initial case of a 1D Gaussian with only one x value is indeed a bivariate case, with one x value and two parameters, the mean and the sigma value, right? Also, the bivariate case can be called the simplest multivariate case, right? If we have a data set x and multiple means and sigmas, we have to use your MULTIVARIATE CASE, with a vector of x values and mean values and a covariance matrix for the sigma values, shouldn't we? Thanks in advance for the help.
@AlexanderIhler · 2 years ago
No, those are the parameters; if “x” (the random variable) is scalar, it is univariate, although the distribution may have any number of parameters. So, if x is bivariate, x=[x1,x2], the mean will have 2 entries and the covariance 4 (3 free parameters, since it is symmetric), so the distribution has 5 parameters total.
@spyhunter0066 · 2 years ago
@@AlexanderIhler x is your data point, right! If it is only one scalar value, the case is called univariate, but if it is a vector of two scalar values, it is called bivariate by definition. That's it. For the bivariate and multivariate cases, where the data variable x is a vector of size d, the mean is also a vector of the same size as x. Thus, the covariance matrix by definition has to be a d-by-d square matrix if x and the mean have d dimensions, as you said. I assume you said 5 parameters in total because the symmetric terms of the covariance matrix are equal, so 4−1=3 parameters come from the Sigma matrix of size d×d.
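As a quick sanity check on that counting, the general formula for a d-dimensional Gaussian is d mean entries plus d(d+1)/2 free entries of the symmetric covariance; a tiny script (my own) confirms the 5 for d=2:

```python
def gaussian_param_count(d):
    # d mean entries + d*(d+1)/2 free entries of a symmetric d x d covariance matrix
    return d + d * (d + 1) // 2

print(gaussian_param_count(1), gaussian_param_count(2), gaussian_param_count(3))  # 2 5 9
```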
@spyhunter0066 · 2 years ago
Should we also take the x vector as a row vector of length d, just like the mu (mean) vector at minute 1:44?
@chyldstudios · 2 years ago
Solid explanation.
@itarabichi · 2 years ago
Great explanation! Every bit of it can be comprehended. Well Done!