Great and helpful video, Dr. Brunton! Can you also make a video on nonlinear regression (like Gaussian Process Regression)? Thank you for your consistently great videos!
@AbhayaParthy 3 years ago
Thanks Steve. Great videos!
@thomle984 4 years ago
Thank you so much!!
@dhananjaykansal8097 4 years ago
This is completely new to me; I haven't seen this kind of coding before. It's a lot to take in, but I'm definitely going to sit down and go through the code line by line to understand what's happening. But no words on VIF or potential outliers? Can you kindly say something about this, please? Thanks.
@_J_A_G_ 2 years ago
> But, no words on VIF or potential outliers?
This video series is about the SVD: kzbin.info/aero/PLMrJAkhIeNNSVjnsviglFoY2nXildDCcv. Regression is just an example he mentions. The video isn't clear on that in itself, but don't expect it to be anything more than a simple demo, not a full deep dive into statistics and analysis. In an earlier video he mentioned outliers and did promise to cover "robust" methods, but I haven't seen that yet; it's probably in the book though. As for VIF, I assume that is the variance inflation factor, so future readers can look for that.
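For readers who do want to check VIF on this kind of attribute matrix, here is a minimal sketch of the usual computation (assuming a NumPy array A with one feature per column; this is not part of the video's code):

```python
import numpy as np

def vif(A):
    """Variance inflation factor for each column of a feature matrix A.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all remaining columns (ordinary least squares with an intercept).
    """
    A = np.asarray(A, dtype=float)
    out = []
    for j in range(A.shape[1]):
        y = A[:, j]
        X = np.delete(A, j, axis=1)
        X = np.column_stack([X, np.ones(len(X))])    # intercept column
        beta = np.linalg.pinv(X) @ y                 # least-squares fit
        r2 = 1.0 - np.var(y - X @ beta) / np.var(y)  # R^2 of that fit
        out.append(1.0 / (1.0 - r2))                 # perfectly collinear column -> inf
    return np.array(out)
```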
@colinbledsoe7204 4 years ago
Can you explain why the A matrix is standardized to z-scores prior to computing the SVD, but the b vector is left in its raw form? See the attribute significance plot at 5:20.
@colinbledsoe7204 4 years ago
Another question: was it intentional to leave the b vector sorted from least to greatest when it was used to generate the attribute significance plot?
@David-pe2dt 4 years ago
I was also asking myself this question... If you plot the same graph with b mean-centered and then normalized by its standard deviation, the shape of the graph does not change. What really matters is normalizing the attributes matrix A, since you want to compare slopes of the same dimensions (either non-dimensional, by normalizing b as well, or with the dimensions of the response vector b).
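A small sketch of what is being compared here (assuming the notebook's A and b are already loaded; variable names are illustrative, not from the video):

```python
import numpy as np

# Z-score each attribute column so the fitted slopes share one scale
A_z = (A - A.mean(axis=0)) / A.std(axis=0)
x_z = np.linalg.pinv(A_z) @ b

# Standardizing b too only divides every slope by b.std(): the columns of
# A_z are mean-centered, so subtracting b's mean does not change the fit,
# and the shape of the significance bar plot stays the same.
x_zb = np.linalg.pinv(A_z) @ ((b - b.mean()) / b.std())
```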
@David-pe2dt 4 years ago
@@colinbledsoe7204 I believe this to be a mistake in the code.
@berkeozgurarslan9745 a month ago
@@David-pe2dt Yes. To see that this is actually a mistake, I plotted individual columns that the significance plot deems positively correlated against b, and they are completely uncorrelated. If you do this with the unsorted b instead, you can clearly see the negative and positive correlations in the correct plot.
@marcinkrupowicz6834 a year ago
What's the advantage of solving least squares using the pseudo-inverse vs. the normal equations? Is it about the numerical stability of inv(A.T @ A)?
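Largely yes. The SVD-based pseudo-inverse works on A directly, whereas the normal equations work with A.T @ A, whose condition number is the square of A's, so near-collinearity gets amplified. A quick sketch of the difference (illustrative synthetic data, not from the video):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 3))
A = np.column_stack([A, A[:, 0] + 1e-8 * rng.normal(size=200)])  # nearly collinear column
b = rng.normal(size=200)

# Normal equations: cond(A.T @ A) = cond(A)**2, so the solve is poorly conditioned
x_ne = np.linalg.solve(A.T @ A, A.T @ b)

# SVD-based pseudo-inverse works on A itself and can drop tiny singular values
x_pi = np.linalg.pinv(A) @ b

print(np.linalg.cond(A), np.linalg.cond(A.T @ A))
```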
@MrTechie2020 3 years ago
Why was the cement data not padded with ones, but only the housing data?
@_J_A_G_ 2 years ago
Great question! I think a fair answer is that he wanted to start simple, for pedagogical reasons. This is an example anyway, not a model that's supposed to be perfect. There's discussion of what the padding does in another comment: kzbin.info/www/bejne/rH_NfaidmcZ6rNU&lc=Ugx6muuzT_xCOXFHUnh4AaABAg

Now that we know how easy it is to add another parameter and give the model more freedom, we could try it out and see whether the line comes out similar or very different. On the other hand, in that example he said/showed that the model worked well, so that's also an answer: start simple, and if that is good enough, don't add complexity to the model. This may sound like a useless answer, and yet it's also the answer to a possible later question of "why didn't we build a 12-layer neural network model".

In some domains (e.g. physics experiments) it's quite natural to assume a line through the origin. If you predict the volume of an iron weight, you won't be surprised if it goes to exactly zero as the weight goes to zero. I haven't looked into the cement data, but it may be one of those examples: no cement means no heat. One might also suggest that house prices should become zero when every feature is zero, but I'm sure you'll still need to pay commission to the real estate agent even if you buy a house with zero rooms. :) (A more serious guess is that the linear approximation for houses isn't complete; there are hidden features not accounted for, but it may be good enough as a predictive model even if physically incorrect.)
@salihaamoura232 4 years ago
Thank you
@akhilife_t 4 years ago
Can you share the Jupyter notebook, please?
@ahmedsaliem7041 6 months ago
Thank you so much
@hamzaullahkhan8602 4 years ago
How can we fit quadratic equations?
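The same least-squares machinery covers this case if you put powers of the input into the columns of A; a minimal sketch with toy data (not from the video):

```python
import numpy as np

# Toy data from a noisy quadratic b = 3a^2 - a + 0.5
a = np.linspace(-2, 2, 50)
b = 3 * a**2 - a + 0.5 + 0.1 * np.random.randn(50)

# Stack powers of a as columns, then solve exactly as in the linear case
A = np.column_stack([a**2, a, np.ones_like(a)])
x = np.linalg.pinv(A) @ b      # x is roughly [3, -1, 0.5]
b_fit = A @ x                  # fitted parabola
```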
@David-pe2dt 4 years ago
A question that I would like to ask concerns computing Pearson or Spearman correlation coefficients between the original attributes matrix A and the response vector b. If the correlation coefficient for a given attribute has opposite sign to the slope of that attribute from multilinear regression, does that imply that the linear model is not a good fit for that particular attribute?
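Not necessarily: when attributes are correlated with each other, a partial regression slope can have the opposite sign to the marginal correlation even though the fit is good (this is exactly where the VIF mentioned above becomes relevant). A rough sketch of the comparison on synthetic data (not from the video):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(1)
z = rng.normal(size=300)
A = np.column_stack([z + 0.1 * rng.normal(size=300),    # two strongly
                     z + 0.1 * rng.normal(size=300)])   # correlated attributes
b = A[:, 0] - 0.5 * A[:, 1] + 0.1 * rng.normal(size=300)

x = np.linalg.pinv(A) @ b                    # multilinear regression slopes
for j in range(A.shape[1]):
    r_p, _ = pearsonr(A[:, j], b)            # marginal linear correlation
    r_s, _ = spearmanr(A[:, j], b)           # rank correlation
    print(j, round(r_p, 2), round(r_s, 2), round(x[j], 2))
# Column 1 gets a negative slope (about -0.5) but a positive marginal
# correlation with b, even though the model fits the data very well.
```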
@kurtstraemann470 4 years ago
Hey there Steve, would it be possible to get a link to the housing dataset you used?
I would recommend installing the scikit-learn library. Not only does it have plug-and-play ML models, it also has a nice collection of datasets included in its `datasets` module (including the Boston housing dataset): scikit-learn.org/stable/datasets/index.html
@Ajwadmohimin0 2 years ago
Can anyone please explain why x is not sorted while plotting and working with the sorted housing data? 4:28
@luuktheman 2 years ago
A bug in the code. The sorted b is used in the remainder of the code, which is incorrect. So the last bar chart is incorrect.
@_J_A_G_ 2 years ago
@@luuktheman I can add that the same bug is in the Matlab code. Someone hinted at using immutable data rather than suddenly changing the meaning of b. When using Jupyter and running code cells out of order, I'd say that recommendation makes even more sense.
@_J_A_G_ 2 years ago
Although there is a bug with the sorted b, 4:28 is all fine. The problem is only with the plot at 5:16. @Ajwadmohimin Explanation: a correct x is a column vector whose elements are in the same order as the features (columns) of A. If you don't rearrange the features, the unchanged x works with any row of A. A[sort_ind,:] rearranges rows (neighborhood samples, into the order matching b) but doesn't change the order of features (the columns of A, the :).

The x comes from the pseudo-inverse at 3:50. That runs before the order of b is changed, with the unchanged A, which is correct. Changing the order of b later doesn't change the x already calculated. At 5:10 a new x is calculated from the rearranged b, which is incorrect and affects the bar plot. Note that rearranging x wouldn't help; that x is simply wrong. The problem is that we mapped each neighborhood to the wrong price and made x a model of that. Later in the video, b is reloaded from the file and never sorted. No more problems!
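A compressed reconstruction of the order of operations described above, assuming the housing A and b are already loaded (this mirrors the video's code but is not a verbatim copy):

```python
import numpy as np

# Correct: x is computed from the pseudo-inverse before anything is sorted
U, S, VT = np.linalg.svd(A, full_matrices=False)
x = VT.T @ np.diag(1 / S) @ U.T @ b

# Sorting is only for the nicer "prices in increasing order" plot at 4:28
sort_ind = np.argsort(b)
b_sorted = b[sort_ind]
A_sorted = A[sort_ind, :]   # rows (samples) reordered, columns (features) untouched
# A_sorted @ x still lines up with b_sorted, because every row of A keeps its own price

# The bug behind the 5:16 bar plot: refitting the unsorted A against the sorted b
x_bad = np.linalg.pinv(A) @ b_sorted   # pairs each neighborhood with the wrong price
```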
@JohnSmith-ok9sn 3 years ago
I wonder why you didn't choose an 80/20, 75/25, 87/13, or similar train/test split, along with the randomized sampling, rather than just 50/50? This would probably improve the fit significantly. When I did this while taking an online ML course, I got one of the best fits for my model. I am very new to all this and may not know all the intricacies, so it would be great if you could explain it in a couple of sentences.
@_J_A_G_2 жыл бұрын
At 6:10 an explanation for the split is given, but allow me to add some more reasoning to make it clearer. You seem to have done a good analysis of your experiment, so I'll be more general; for the specific question, Brunton likely wasn't aiming for perfect results anyway.

You do the split to verify your model on unseen data. A 50/50 split is the obvious balance between training and validation. If you have only a little data, using more for training probably helps, but once you ask yourself whether you can use 75% or 87% of the data, you may then lean toward 95% or 99%... The problem with using most of the data for training is the risk of overfitting. It may look very good on your current data, but when you deploy it later on unseen data, you can expect problems. Training on less data won't give a good model either, but at least you'll be aware of it, because the validation data exposes the weakness. That is a clue to acquire more data overall, rather than to leave less for verification. In practice, using 70% for training may be considered a rule of thumb, so your question is of course valid, but I'd claim 50% is the fairest choice.

For completeness, it's also worth mentioning that best practice isn't to split the data into two sets but three: train/validate/test. After training, use the validation set (a.k.a. the dev set) to calibrate any hyper-parameters. This way you can make use of unseen data without spoiling the test set.
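For reference, the split being discussed is only a few lines; a sketch assuming the notebook's A and b (not the exact code from the video):

```python
import numpy as np

rng = np.random.default_rng(0)
perm = rng.permutation(A.shape[0])        # randomize sample order
n_train = A.shape[0] // 2                 # the 50/50 split discussed above
train, test = perm[:n_train], perm[n_train:]

x = np.linalg.pinv(A[train]) @ b[train]   # fit on the training half
rel_err = np.linalg.norm(A[test] @ x - b[test]) / np.linalg.norm(b[test])  # check on the held-out half
```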
@hyperduality2838 4 years ago
Optimizing predictions = Teleological physics or target tracking. Teleological physics is dual to non teleological physics. Increasing syntropy (optimizing predictions) is dual to increasing entropy. Thesis is dual to anti-thesis, the time independent Hegelian dialectic. Alive (thesis, being) is dual to not alive (anti-thesis, non being) -- Schrodinger's or Hegel's cat.
@pavelkonovalov8931 4 years ago
Can you share the config of your Jupyter template? 🙏
@tilohauke9033 4 years ago
I enjoy this course, but padding the matrix with ones is poorly explained, also in the book.
@_J_A_G_ 2 years ago
I think 2:26 isn't that bad of an explanation, but allow me to add some more background to make it clearer. The parameters your model can learn are the rows in x; how many you have is decided by the number of columns in A, because the two must match in the matrix multiplication. The features you measure are placed in the columns of A. Each sample (an observation of each feature) is a row in A (and in b).

Consider the simple case of only one feature, a scalar a, which places the samples on the (a, b) plane. Linear regression fits a parametric line to the samples in that space. With one column in A, we get one parameter in x. A one-parameter equation of a line is ka = b. For a = 0 it forces b = 0 regardless of k: any such line must go through the origin, so the model is severely restricted. For this reason the general equation of a line is usually written ka + m = b, which allows an arbitrary offset b = m at a = 0.

What can we do to get more parameters than features without breaking the multiplication? It's easy to rewrite ka + m = b as a matrix multiplication: replacing variables with A = [a 1], x = [k, m], b = [y] turns the general line equation into Ax = b. So what you do when padding A with ones is actually to introduce the offset parameter m in x. Of course, if the data so indicates, the parameter m can turn out to be zero, but if the parameter is excluded, it can never be anything other than zero.

Now, if there are more features, you have a higher-dimensional space, but the math is the same. If you don't want to force the line through the origin, you need one more parameter than you have features, and an easy way is to add a last column of ones. You can work out for yourself whether it matters if the column is placed first or anywhere else, and whether it is filled with 2s instead of 1s. Why earlier examples didn't have this padding is discussed in another comment: kzbin.info/www/bejne/rH_NfaidmcZ6rNU&lc=Ugxl2sJNF-VxoJYQg514AaABAg
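A tiny self-contained sketch of that padding step (toy numbers, just to illustrate):

```python
import numpy as np

# One feature a, fitting the general line b = k*a + m
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.9, 5.1, 7.0, 9.1])

A = np.column_stack([a, np.ones_like(a)])   # pad with ones -> extra offset parameter m
k, m = np.linalg.pinv(A) @ b                # x = [k, m]; here k is about 2.05, m about 0.9

A0 = a.reshape(-1, 1)                       # without padding, the line is forced through the origin
(k0,) = np.linalg.pinv(A0) @ b              # a single slope, no offset possible
```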