Great and helpful video, Dr. Brunton! Can you also make a video on nonlinear regression (like Gaussian Process Regression)? Thank you for your consistently great videos!
@AbhayaParthy 3 years ago
Thanks Steve. Great videos!
@thomle984 4 years ago
Thank you so much!!
@dhananjaykansal8097 4 years ago
This is completely new to me; I haven't seen this kind of coding before. It's a lot to take in, but I'm definitely going to sit down and go through the code line by line to understand what's happening. But no words on VIF or potential outliers? Can you kindly say something about this, please? Thanks.
@_J_A_G_ 2 years ago
> But, no words on VIF or potential outliers?
This video series is about the SVD: kzbin.info/aero/PLMrJAkhIeNNSVjnsviglFoY2nXildDCcv. Regression is just an example he mentions. The video isn't clear on that in itself, but don't expect it to be anything more than a simple demo, not a full deep dive into statistics and analysis. In an earlier video he mentioned outliers and did promise to cover "robust" methods, but I haven't seen that yet; it's probably in the book though. As for VIF, I assume that is the variance inflation factor, so future readers can look for that.
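For readers who do want to check VIF on this kind of attribute matrix, here is a minimal sketch of the usual computation (assuming a NumPy array A with one feature per column; this is not part of the video's code):

```python
import numpy as np

def vif(A):
    """Variance inflation factor for each column of a feature matrix A.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all remaining columns (ordinary least squares with an intercept).
    """
    A = np.asarray(A, dtype=float)
    out = []
    for j in range(A.shape[1]):
        y = A[:, j]
        X = np.delete(A, j, axis=1)
        X = np.column_stack([X, np.ones(len(X))])    # intercept column
        beta = np.linalg.pinv(X) @ y                 # least-squares fit
        r2 = 1.0 - np.var(y - X @ beta) / np.var(y)  # R^2 of that fit
        out.append(1.0 / (1.0 - r2))                 # perfectly collinear column -> inf
    return np.array(out)
```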
@colinbledsoe7204 4 years ago
Can you explain why the A matrix is standardized to z-scores prior to computing the SVD, but the b vector is left in its raw form? See the attribute significance plot at 5:20.
@colinbledsoe7204 4 years ago
Another question: was it intentional to leave the b vector sorted from least to greatest when it was used to generate the attribute significance plot?
@David-pe2dt 4 years ago
I was also asking myself this question... If you plot the same graph with b mean-centered and then normalized by its standard deviation, the shape of the graph does not change. What really matters is normalizing the attributes matrix A, since you want to compare slopes of the same dimensions (either non-dimensional, by normalizing b as well, or with the dimensions of the response vector b).
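A small sketch of what is being compared here (assuming the notebook's A and b are already loaded; variable names are illustrative, not from the video):

```python
import numpy as np

# Z-score each attribute column so the fitted slopes share one scale
A_z = (A - A.mean(axis=0)) / A.std(axis=0)
x_z = np.linalg.pinv(A_z) @ b

# Standardizing b too only divides every slope by b.std(): the columns of
# A_z are mean-centered, so subtracting b's mean does not change the fit,
# and the shape of the significance bar plot stays the same.
x_zb = np.linalg.pinv(A_z) @ ((b - b.mean()) / b.std())
```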
@David-pe2dt 4 years ago
@@colinbledsoe7204 I believe this to be a mistake in the code.
@berkeozgurarslan9745 a month ago
@@David-pe2dt Yes. To see that this is actually a mistake, I plotted individual columns that the significance plot deems positively correlated against b, and they are completely uncorrelated. If you do this with the unsorted b instead, you can clearly see the negative and positive correlations in the correct plot.
@marcinkrupowicz6834 a year ago
What's the advantage of solving least squares using the pseudo-inverse vs. the normal equations? Is it about the numerical stability of inv(A.T @ A)?
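Largely yes. The SVD-based pseudo-inverse works on A directly, whereas the normal equations work with A.T @ A, whose condition number is the square of A's, so near-collinearity gets amplified. A quick sketch of the difference (illustrative synthetic data, not from the video):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 3))
A = np.column_stack([A, A[:, 0] + 1e-8 * rng.normal(size=200)])  # nearly collinear column
b = rng.normal(size=200)

# Normal equations: cond(A.T @ A) = cond(A)**2, so the solve is poorly conditioned
x_ne = np.linalg.solve(A.T @ A, A.T @ b)

# SVD-based pseudo-inverse works on A itself and can drop tiny singular values
x_pi = np.linalg.pinv(A) @ b

print(np.linalg.cond(A), np.linalg.cond(A.T @ A))
```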
@MrTechie2020 3 years ago
Why was the cement data not padded with ones, but only the housing data?
@_J_A_G_ 2 years ago
Great question! I think a fair answer is that he wanted to start simple, for pedagogical reasons. This is an example anyway, not a model that's supposed to be perfect. There's discussion of what the padding does in another comment: kzbin.info/www/bejne/rH_NfaidmcZ6rNU&lc=Ugx6muuzT_xCOXFHUnh4AaABAg

Now that we know how easy it is to add another parameter and give the model more freedom, we could try it out and see whether the line comes out similar or very different. On the other hand, in that example he said/showed that the model worked well, so that's also an answer: start simple, and if that is good enough, don't add complexity to the model. This may sound like a useless answer, and yet it's also the answer to a possible later question of "why didn't we build a 12-layer neural network model".

In some domains (e.g. physics experiments) it's quite natural to assume a line through the origin. If you predict the volume of an iron weight, you won't be surprised if it goes to exactly zero as the weight goes to zero. I haven't looked into the cement data, but it may be one of those examples: no cement means no heat. One might also suggest that house prices should become zero when every feature is zero, but I'm sure you'll still need to pay commission to the real estate agent even if you buy a house with zero rooms. :) (A more serious guess is that the linear approximation for houses isn't complete; there are hidden features not accounted for, but it may be good enough as a predictive model even if physically incorrect.)
@salihaamoura232 4 years ago
Thank you
@akhilife_t 4 years ago
Can you share the Jupyter notebook, please?
@ahmedsaliem7041 6 months ago
Thank you so much
@hamzaullahkhan8602 4 years ago
How can we fit quadratic equations?
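The same least-squares machinery covers this case if you put powers of the input into the columns of A; a minimal sketch with toy data (not from the video):

```python
import numpy as np

# Toy data from a noisy quadratic b = 3a^2 - a + 0.5
a = np.linspace(-2, 2, 50)
b = 3 * a**2 - a + 0.5 + 0.1 * np.random.randn(50)

# Stack powers of a as columns, then solve exactly as in the linear case
A = np.column_stack([a**2, a, np.ones_like(a)])
x = np.linalg.pinv(A) @ b      # x is roughly [3, -1, 0.5]
b_fit = A @ x                  # fitted parabola
```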
@David-pe2dt 4 years ago
A question that I would like to ask concerns computing Pearson or Spearman correlation coefficients between the original attributes matrix A and the response vector b. If the correlation coefficient for a given attribute has opposite sign to the slope of that attribute from multilinear regression, does that imply that the linear model is not a good fit for that particular attribute?
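Not necessarily: when attributes are correlated with each other, a partial regression slope can have the opposite sign to the marginal correlation even though the fit is good (this is exactly where the VIF mentioned above becomes relevant). A rough sketch of the comparison on synthetic data (not from the video):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(1)
z = rng.normal(size=300)
A = np.column_stack([z + 0.1 * rng.normal(size=300),    # two strongly
                     z + 0.1 * rng.normal(size=300)])   # correlated attributes
b = A[:, 0] - 0.5 * A[:, 1] + 0.1 * rng.normal(size=300)

x = np.linalg.pinv(A) @ b                    # multilinear regression slopes
for j in range(A.shape[1]):
    r_p, _ = pearsonr(A[:, j], b)            # marginal linear correlation
    r_s, _ = spearmanr(A[:, j], b)           # rank correlation
    print(j, round(r_p, 2), round(r_s, 2), round(x[j], 2))
# Column 1 gets a negative slope (about -0.5) but a positive marginal
# correlation with b, even though the model fits the data very well.
```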
@kurtstraemann470 4 years ago
Hey there Steve, would it be possible to get a link to the housing dataset you used?
I would recommend installing the scikit-learn library. Not only does it have plug-and-play ML models, it also has a nice collection of datasets included in its `datasets` module (including the Boston housing dataset): scikit-learn.org/stable/datasets/index.html
@Ajwadmohimin0 2 years ago
Can anyone please explain why x is not sorted while plotting and working with the sorted housing data? 4:28
@luuktheman 2 years ago
A bug in the code. The sorted b is used in the remainder of the code, which is incorrect. So the last bar chart is incorrect.
@_J_A_G_ 2 years ago
@@luuktheman I can add that the same bug is in the Matlab code. Someone hinted at using immutable data rather than suddenly changing the meaning of b. When using Jupyter and running code cells out of order, I'd say that recommendation makes even more sense.
@_J_A_G_ 2 years ago
Although there is a bug with the sorted b, 4:28 is all fine. The problem is only with the plot at 5:16. @Ajwadmohimin Explanation: a correct x is a column vector whose elements are in the same order as the features (columns) of A. If you don't rearrange the features, the unchanged x works with any row of A. A[sort_ind,:] rearranges rows (neighborhood samples, into the order matching b) but doesn't change the order of features (the columns of A, the :).

The x comes from the pseudo-inverse at 3:50. That runs before the order of b is changed, with the unchanged A, which is correct. Changing the order of b later doesn't change the x already calculated. At 5:10 a new x is calculated from the rearranged b, which is incorrect and affects the bar plot. Note that rearranging x wouldn't help; that x is simply wrong. The problem is that we mapped each neighborhood to the wrong price and made x a model of that. Later in the video, b is reloaded from the file and never sorted. No more problems!
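A compressed reconstruction of the order of operations described above, assuming the housing A and b are already loaded (this mirrors the video's code but is not a verbatim copy):

```python
import numpy as np

# Correct: x is computed from the pseudo-inverse before anything is sorted
U, S, VT = np.linalg.svd(A, full_matrices=False)
x = VT.T @ np.diag(1 / S) @ U.T @ b

# Sorting is only for the nicer "prices in increasing order" plot at 4:28
sort_ind = np.argsort(b)
b_sorted = b[sort_ind]
A_sorted = A[sort_ind, :]   # rows (samples) reordered, columns (features) untouched
# A_sorted @ x still lines up with b_sorted, because every row of A keeps its own price

# The bug behind the 5:16 bar plot: refitting the unsorted A against the sorted b
x_bad = np.linalg.pinv(A) @ b_sorted   # pairs each neighborhood with the wrong price
```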
@JohnSmith-ok9sn 3 years ago
I wonder why you didn't choose an 80/20, 75/25, 87/13, or similar train/test split, along with the randomized sampling, rather than just 50/50? This would probably improve the fit significantly. When I did this while taking an online ML course, I got one of the best fits for my model. I am very new to all this and may not know all the intricacies, so it would be great if you could explain it in a couple of sentences.
@_J_A_G_2 жыл бұрын
At 6:10 an explanation for the split is given, but allow me to add some more reasoning to make it clearer. You seem to have done a good analysis of your experiment, so I'll be more general; for the specific question, Brunton likely wasn't aiming for perfect results anyway.

You do the split to verify your model on unseen data. A 50/50 split is the obvious balance between training and validation. If you have only a little data, using more for training probably helps, but once you ask yourself whether you can use 75% or 87% of the data, you may then lean toward 95% or 99%... The problem with using most of the data for training is the risk of overfitting. It may look very good on your current data, but when you deploy it later on unseen data, you can expect problems. Training on less data won't give a good model either, but at least you'll be aware of it, because the validation data exposes the weakness. That is a clue to acquire more data overall, rather than to leave less for verification. In practice, using 70% for training may be considered a rule of thumb, so your question is of course valid, but I'd claim 50% is the fairest choice.

For completeness, it's also worth mentioning that best practice isn't to split the data into two sets but three: train/validate/test. After training, use the validation set (a.k.a. the dev set) to calibrate any hyper-parameters. This way you can make use of unseen data without spoiling the test set.
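For reference, the split being discussed is only a few lines; a sketch assuming the notebook's A and b (not the exact code from the video):

```python
import numpy as np

rng = np.random.default_rng(0)
perm = rng.permutation(A.shape[0])        # randomize sample order
n_train = A.shape[0] // 2                 # the 50/50 split discussed above
train, test = perm[:n_train], perm[n_train:]

x = np.linalg.pinv(A[train]) @ b[train]   # fit on the training half
rel_err = np.linalg.norm(A[test] @ x - b[test]) / np.linalg.norm(b[test])  # check on the held-out half
```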
@hyperduality2838 4 years ago
Optimizing predictions = Teleological physics or target tracking. Teleological physics is dual to non teleological physics. Increasing syntropy (optimizing predictions) is dual to increasing entropy. Thesis is dual to anti-thesis, the time independent Hegelian dialectic. Alive (thesis, being) is dual to not alive (anti-thesis, non being) -- Schrodinger's or Hegel's cat.
@pavelkonovalov8931 4 years ago
Can you share the config of your Jupyter template? 🙏
@tilohauke9033 4 years ago
I enjoy this course, but padding the matrix with ones is poorly explained, also in the book.
@_J_A_G_ 2 years ago
I think 2:26 isn't that bad of an explanation, but allow me to add some more background to make it clearer. The parameters your model can learn are the rows in x; how many you have is decided by the number of columns in A, because the two must match in the matrix multiplication. The features you measure are placed in the columns of A. Each sample (an observation of each feature) is a row in A (and in b).

Consider the simple case of only one feature, a scalar a, which places the samples on the (a, b) plane. Linear regression fits a parametric line to the samples in that space. With one column in A, we get one parameter in x. A one-parameter equation of a line is ka = b. For a = 0 it forces b = 0 regardless of k: any such line must go through the origin, so the model is severely restricted. For this reason the general equation of a line is usually written ka + m = b, which allows an arbitrary offset b = m at a = 0.

What can we do to get more parameters than features without breaking the multiplication? It's easy to rewrite ka + m = b as a matrix multiplication: replacing variables with A = [a 1], x = [k, m], b = [y] turns the general line equation into Ax = b. So what you do when padding A with ones is actually to introduce the offset parameter m in x. Of course, if the data so indicates, the parameter m can turn out to be zero, but if the parameter is excluded, it can never be anything other than zero.

Now, if there are more features, you have a higher-dimensional space, but the math is the same. If you don't want to force the line through the origin, you need one more parameter than you have features, and an easy way is to add a last column of ones. You can work out for yourself whether it matters if the column is placed first or anywhere else, and whether it is filled with 2s instead of 1s. Why earlier examples didn't have this padding is discussed in another comment: kzbin.info/www/bejne/rH_NfaidmcZ6rNU&lc=Ugxl2sJNF-VxoJYQg514AaABAg
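A tiny self-contained sketch of that padding step (toy numbers, just to illustrate):

```python
import numpy as np

# One feature a, fitting the general line b = k*a + m
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.9, 5.1, 7.0, 9.1])

A = np.column_stack([a, np.ones_like(a)])   # pad with ones -> extra offset parameter m
k, m = np.linalg.pinv(A) @ b                # x = [k, m]; here k is about 2.05, m about 0.9

A0 = a.reshape(-1, 1)                       # without padding, the line is forced through the origin
(k0,) = np.linalg.pinv(A0) @ b              # a single slope, no offset possible
```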