Machine Learning Tutorial Python - 19: Principal Component Analysis (PCA) with Python Code

Views: 217,785

codebasics

1 day ago

Comments: 175

@codebasics 6 days ago

Folks, here's a link to our bootcamp for learning AI and Data Science in the most practical way: tinyurl.com/395u4mnm

@Rainbow-lj5pp 9 months ago

This is a really easy to understand and thorough explanation of principal component analysis. Many others I watched were either too technical and math theory oriented or too basic in showing how to use the function but not what it does. This is a great balance of understanding and practicality.

@mayishamaliha6369 1 year ago

super helpful for newbies not scaring them off with too many statistical terms and getting overwhelmed. thank u so much

@rakeshbullet7363 1 year ago

Awesome videos - Simple explanations. A balanced approach to teaching with the right mixture of theory and practicals, without overwhelming the learners. I loved the approach. After seeing numerous ML training videos from across the spectrum, this is by far the best one I have seen. Thank you for taking the time to create these videos.

@dushyyanta5305 2 years ago

You are the best! I am doing PG in DS but still, I watch your videos for better understanding. Kudos! Keep it up!

@mohitupadhayay1439 3 years ago

The last few minutes were BANG ON! This is what i wanted to hear. Thanks!

@AqsaChappalwala 9 months ago

Masters in Data Science in the UK and still loves watching only your videos :-)

@ernestanonde3218 2 years ago

This is the best channel on YouTube. You are simply amazing. You just saved my career. Thanks a million

@ernestanonde3218 1 year ago

@Karthiktanu I am a student of data science and analytics

@ernestanonde3218 1 year ago

@@Thanusree234 yes

@bhaskarg8438 3 years ago

Thank you, the PCA concept is clearly explained. I still need to understand what we prioritize in real-life scenarios: model performance or processing time.

@prakashkoneti7630 1 year ago

I really appreciate your hard work in making these videos and making complex topics easy.

@luciamatamorospava4382 1 year ago

it's like the 10th video i'm watching on PCA and the FIRST one I understand, thank you so much!

@swagsterfut9992 2 years ago

At 17:35, shouldn't we be doing pca.fit_transform() on our scaled dataset (X_scaled in our case) rather than on X?

@spatialnasir 2 years ago

This came to my mind also. Perhaps the accuracy would have been higher if he scaled before pca.

@anirudhgangadhar6158 2 years ago

Yes it should be on X_scaled.

@IkramShah-s8j 1 month ago

When using train_test_split, how should I use: 1. fit() 2. transform() and 3. fit_transform()?
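
The questions in this thread (scaled versus unscaled input, and how fit(), transform() and fit_transform() relate to a train/test split) can be illustrated with a minimal sketch. It assumes the digits dataset from the video; the choice of LogisticRegression, the random_state value and the variable names are illustrative assumptions, not the code shown in the video. The idea is that fit_transform() learns statistics from the training rows and applies them, while transform() only reuses what was already learned.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)

# Split first, so nothing is learned from the test rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=30  # random_state chosen arbitrarily
)

scaler = StandardScaler()
# fit_transform(): learn mean/std from the training rows, then scale them.
X_train_scaled = scaler.fit_transform(X_train)
# transform(): reuse the training mean/std on the test rows.
X_test_scaled = scaler.transform(X_test)

# Keep enough components to explain 95% of the variance in the scaled data.
pca = PCA(n_components=0.95)
X_train_pca = pca.fit_transform(X_train_scaled)
X_test_pca = pca.transform(X_test_scaled)

model = LogisticRegression(max_iter=1000)
model.fit(X_train_pca, y_train)
test_accuracy = model.score(X_test_pca, y_test)

print(X_train_pca.shape, X_test_pca.shape)
print(test_accuracy)
```

Calling fit() followed by transform() on the same data gives the same result as fit_transform(); the combined call is just a shortcut for the training set.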

@BG4INDIA 2 years ago

Impressed with the clarity of explanation

@codebasics 2 years ago

glad you liked it

@guillermokinderman8267 1 year ago

I was trying to understand PCA, this video helped me a lot

@akinloluwababalola6666 3 years ago

Hello Code basics. I usually enjoy your videos as I learn a lot from them. Can you make a video on association rules, apriori algorithms and any machine model that deals with the determination of interrelationships amongst variables? Thank you

@boogersincoffee 3 years ago

Ahhhhhh I've been struggling to understand this and this cleared everything up, thank you

@maruthiprasad8184 3 years ago

Thank you very much for simple and great explanation. I got higher accuracy in SVM=86.74 %, after PCA I got accuracy in RF=73.06

@pranjiljain4500 2 years ago

But the run time and the compute needed also decrease heavily

@mohammeddanishreza4902 2 years ago

can you share the github link for your code please.

@CharzIntLtd 3 years ago

Thanks sir the great work, your explanation makes ML easier for sure 🙏

@TK-fx8dh 3 years ago

My long-awaited topic!!!! Thank you for posting this PCA lesson

@dees900 1 year ago

great explanation on PCA. It's an abstract concept to grasp. well done

@asamadawais 2 years ago

I am watching this video for the 2nd or 3rd time. @Dhavel you are the best among equals...👍👍

@jinks3669 2 years ago

Another very informative video. DHANYAVAAD ! :)

@richardshaw8326 11 months ago

Great explanation on PCA. @codebasics: I must have missed it though, but after running the PCA to identify which features will give the results, I missed where one might get the features.

@mukeshkumaryadav350 2 years ago

It was an amazing explanation of PCA without much mathematics or the eigenvalues and eigenvectors that scare me. Interesting learning: 1. We can see the variance explained by each PC, which helps.
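
For readers who want to see the "variance explained by each PC" mentioned here, a small sketch follows. It assumes the digits dataset and StandardScaler; the variable names are illustrative. scikit-learn exposes the per-component share of variance as explained_variance_ratio_, and its cumulative sum shows how many components are needed to reach a given threshold.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_digits(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# With no n_components argument, PCA keeps every component.
pca = PCA()
pca.fit(X_scaled)

# Share of total variance captured by each component, largest first.
ratios = pca.explained_variance_ratio_
cumulative = np.cumsum(ratios)

# Smallest number of components whose cumulative share reaches 95%.
n_for_95 = int(np.searchsorted(cumulative, 0.95) + 1)

print(ratios[:5])
print(n_for_95)
```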

@gnaneshgn8341 3 years ago

Nice Video sir. Please make a video for the math behind PCA. Thanks in Advance Sir

@geekyprogrammer4831 3 years ago

kzbin.info/www/bejne/fJjEnI2ta7Bkh7M this should be sufficient if you want to know mathematics

@shubhanshugupta9754 2 years ago

U can get best of pcs by taking log(total features)

@GOALnGOAT007 1 month ago

Explained really well

@punnarahul4068 3 years ago

great looking for more videos dhaval bhai............../

@arjunprashanth7824 9 months ago

Shouldn't X_scaled be passed to the pca.fit_transform() method? Because if you're passing X, there's no point in having done the scaling, right?

@ANASS-AHMADD 7 months ago

Exactly, i was about to ask the same question.

@userhandle-u7b 7 months ago

I tried both. When passing X to PCA without scaling, I got a higher score. But you're right, I also think X_scaled should be passed for a like-for-like comparison.

@RijoSLal 3 months ago

It's not always necessary. If the variance is smaller, you can do PCA without rescaling.

@AkaExcel 3 years ago

@codebasics Thank You for Teaching and helping us!

@nastaran1010 1 year ago

Hi. I have a question: when you perform PCA, why did you pass X as the input instead of X_scaled?

@Maniclout 3 years ago

Amazing explanation, I understand PCA now.

@jjanna07751 2 years ago

thanku ..pca explained very easily

@salahmahmoud2119 1 year ago

You are the best!!! 👏

@leamon9024 2 years ago

Thanks for this amazing tutorial. Hope you could do one video about when to use feature selection and feature extraction, or even combination of them.

@ogobuchiokey2978 2 years ago

Your videos have helped me to complete my MSc research. Thank you for being a great teacher. I do have a question, during the explanation, you said we should always use PCA on the scaled data but during implementation, you used the unscaled data. Could you explain this?

@kreativeaman7688 1 year ago

I had the same question following through.

@kreativeaman7688 1 year ago

I tried using PCA on the scaled data and used it in SVM, Logistic Regression and RandomForest classifier, but the results were almost the same as to using regular data with PCA.

@Mansijswarnkar 2 years ago

Wonderful, as always - thanks for making this video, it has helped me a lot ! Regards

@MohammadYs77 2 years ago

Very informative and practical.

@souhamahmoudi7745 2 years ago

Thanks for sharing, it's highly appreciated

@purebackend1993 3 years ago

You kill it, amazing!

@DrizzyJ77 8 months ago

Thank you code basics❤

@wavyjones96 2 years ago

I HAVE SOME QUESTIONS: 1) If you apply PCA to data that was scaled before any train/test split, wouldn't it cause data leakage? 2) Shouldn't the target be dropped?

@hrithiksarma1204 2 years ago

I had the same doubt, have you got any update on this ?

@anirudh7150 11 months ago

Thank you Sir. It was really helpful.

@rey40 2 months ago

I was wondering why you did not use the StandardScaled data for the PCA step? Excellent video tho!

@sarangali4595 2 years ago

Sir also make a video on how PCA actually works and what type of information we can gain from the loadings like how are these features affecting the label

@usamaalicraft3646 3 years ago

Thanks sir 😊😊

@ogochukwustanleyikegbo2420 1 year ago

After completing the assignment, i got a best score of 0.85 with svm:rbf kernel and after PCA my best score reduced to 0.68 still svm:rbf kernel

@albertoachavalrodriguez2461 2 years ago

Great video!

@anirudhgangadhar6158 2 years ago

17:45 - shouldn't it be pca.fit_transform(X_scaled) instead of X?

@codebasics 2 years ago

X is correct, PCA will take care of dimension reduction on the original dataset. However you can also try X_scaled, nothing wrong with it. You may in fact see better results, can you try and please post your findings here?

@anirudhgangadhar6158 2 years ago

@@codebasics Using X, I got an Accuracy of 97.22%, using X_scaled, I got a slightly lower accuracy of 96.39% which is interesting. I also tried with "StandardScaler" instead of "MinMaxScaler" and observed this trend.

@namantiwari8251 1 year ago

Sir, can you please tell which features it reduces? How can I get those particular selected (reduced) features as output?

@aaditya1267 1 year ago

nice explanation !!

@krishnadaskv2197 3 years ago

I am getting around 80 % score when using PCA(0.99999) in exercise, which is higher than the score before using PCA, and also getting a better score without removing outliers.

@codebasics 3 years ago

That’s the way to go kv, good job working on that exercise

@mr.luvnagpal7407 3 years ago

Thankyouu so much for this amazing video

@manjularathore1076 3 years ago

You are absolutely amazing.

@talkingbirb2808 11 months ago

I would add that reducing number of columns should help with overfitting

@nikhilanand9022 1 year ago

Here are my two questions for you: 1. Why don't you scale the target column (y)? 2. For the accuracy score, why don't you compare actual and predicted values? You pass X_test and y_test to score; why not y_pred and y_test?

@vanshoberoi2154 5 months ago

1. y is the target; it doesn't influence training the way the X inputs do. The raw values of y are used directly to calculate errors (e.g., loss functions). Scaling y is generally unnecessary and could alter the model's predictions in unintended ways. 2. As sir has previously explained, when you use accuracy_score you pass y_pred and y_test, but model.score takes X_test and y_test and internally converts X_test to y_pred.
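
The second point above can be checked directly. The sketch below assumes the digits dataset and a LogisticRegression classifier (both illustrative choices); it shows that, for a scikit-learn classifier, model.score(X_test, y_test) and accuracy_score(y_test, model.predict(X_test)) report the same number.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)

# score() predicts on X_test internally and compares with y_test.
score_from_model = model.score(X_test, y_test)

# The same comparison done by hand.
y_pred = model.predict(X_test)
score_from_metric = accuracy_score(y_test, y_pred)

print(score_from_model, score_from_metric)
```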

@TamimXplore 12 days ago

Boss,thanks a lot!

@anirudhgangadhar6158 2 years ago

Highest accuracy: SVM - 85.83%, after PCA (3 PCs), accuracy was 83.87%. For all 3 models, accuracy slightly (

@sarangali4595 2 years ago

Sir please also make a video on how to find relations using descriptive technique.

@levimungai1846 3 months ago

I got a question. I understand that what is contained in the PCA array is the loading scores of each feature. why do we use this as our new training data? what do these figures in the PCA array really represent?

@babalolamayowamercy186 2 years ago

Nice video Thank you

@anonymous-bi6ul 6 months ago

Why didn't you use X_scaled as the parameter to the fit_transform function of PCA?

@Ooo12376 3 years ago

Please also explain the math behind it. You get questions on math behind PCA in interviews. People ask the derivation of PCA

@suriyaprakashgopi 1 year ago

nicely done

@tamirat9797 11 months ago

Thank you 🙏

@ahsanurrahman8915 1 year ago

Very nicely described ! I have a question: In your example PCA(0.95) reduces the dimension to 29. But, how do we know which dimensions it picked? I am asking this because I want to use PCA to determine the principal drivers in determining the targets.

@akashbhargava906 1 year ago

hey buddy, PCA doesn't pick any existing dimension. It creates new dimensions which by the naked eye won't make much sense to you.
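
To make this reply concrete: each principal component is a weighted combination of all original features, and scikit-learn stores those weights in pca.components_. The sketch below assumes the unscaled digits data and PCA(0.95) as discussed in this thread; the pixel_0 ... pixel_63 column names are made up for illustration. Inspecting the largest absolute weights is one rough way to see which original features contribute most to a component, though it is not the same as feature selection.

```python
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()

# Hypothetical column names, one per pixel of the 8x8 image.
feature_names = [f"pixel_{i}" for i in range(digits.data.shape[1])]

pca = PCA(n_components=0.95)
X_pca = pca.fit_transform(digits.data)

# One row per principal component, one column per original feature.
loadings = pd.DataFrame(
    pca.components_,
    columns=feature_names,
    index=[f"PC{i + 1}" for i in range(pca.n_components_)],
)

# Original features with the largest absolute weight in the first component.
top_for_pc1 = loadings.loc["PC1"].abs().sort_values(ascending=False).head(5)

print(loadings.shape)
print(top_for_pc1)
```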

@tobe7602 2 years ago

Hi Good tutorial, i think you must use X_train in pca.fit_transform and not X. Thanks

@nyangwindicollins1018 3 years ago

Superb

@tigrayrimey6418 3 years ago

Nice points.

@pateltapasvi7277 1 year ago

How can I get selected features in dataframe along with its feature name instead of number 1, 2, 3,etc.?

@mohammadhosseinkazemi8558 1 year ago

Thank you for the video. I have one problem, though: Shouldn't we first split the data into training and test sets, then scale each set separately using StandardScaler(), RobustScaler(), etc. ?

@kalluriyaswanthkumar2275 1 year ago

Sir, you said that we should scale before PCA, but in the code you apply PCA to unscaled data

@slainiae 10 months ago

Highest accuracy 0.8729 with SVM (linear) and with PCA n_components = 11.

@farahamirah2091 2 years ago

I have a question: we can train a model using PCA, but what about an imbalanced dataset? Don't we need to handle the imbalance?

@siddheshmhatre2811 1 year ago

Thanks ❤

@ShanthoshKumaarSomiRajesh 1 year ago

I have a query to ask. You said we should pass the data to PCA after scaling but you passed the original X instead of X_scaled. Why ??

@dhineshv2590 5 months ago

Since we are using grayscale image, it's already kinda in scaled values(px values).

@dhineshv2590 5 months ago

Since we are using grayscale image, its already kinda in scaled value I guess.

@youktinathbhowmick4673 2 years ago

Thanks for the explanation. I have one question: when you do PCA, you take the whole dataset and only then do the train/test split. Isn't that a bit unethical? Also, if I fit PCA on the training data, can the same PCA be applied to the test data? Is there any way to store the PCA transformation so it can be applied to the test data?

@hrithiksarma1204 2 years ago

I had the same doubt, have you got any update on this ?
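
One common way to address the question in this thread is to wrap the scaler, PCA and classifier in a scikit-learn Pipeline, fit it on the training split only, and serialize the fitted pipeline. The sketch below is a suggestion under assumptions: the digits dataset, an SVC classifier and in-memory pickling are illustrative choices, not the approach used in the video. In practice the bytes would be written to a file and loaded later.

```python
import pickle

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=30
)

# Scaler and PCA are fitted on the training rows only, inside fit().
pipeline = make_pipeline(StandardScaler(), PCA(n_components=0.95), SVC())
pipeline.fit(X_train, y_train)

# Store the fitted transformation and model together, then restore them.
saved = pickle.dumps(pipeline)
restored = pickle.loads(saved)

# A new sample with the original 64 features goes through the same
# scaling and PCA steps automatically before classification.
new_sample = X_test[:1]
prediction = restored.predict(new_sample)

original_score = pipeline.score(X_test, y_test)
restored_score = restored.score(X_test, y_test)
print(prediction, original_score, restored_score)
```

Because the pipeline applies transform() rather than fit_transform() at prediction time, the test rows never influence the learned mean, standard deviation or principal axes.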

@RamveerSingh-el6zl 1 year ago

Can you tell how to do varimax rotation?

@krishnapatel8852 1 year ago

hello, if I want to visulize this data in 3D, then what will be z axis ?

@MonilModi10 2 years ago

Why does PCA rotate the axes? What is the significance of that?

@debatradas1597 2 years ago

Thank you so much

@nriezedichisom1676 11 months ago

Thank you

@wangjessica1275 10 months ago

How to interpret PCA result in regression?

@self.__osman 2 years ago

Hi. I might not be making any sense here, but I wanted to know if the same thing could be achieved with entropy and information gain. We know information gain gives the importance of a feature as a number. Therefore, in theory, we could remove all the features with really low information gain. I think this would work better with discrete data. I don't know if it already exists. If it does, what method does this? If it doesn't, is this solution practical?

@FutureAIDev2015 2 years ago

I have no idea where to start on the exercise or even what "z-score" means for getting rid of outliers.

@slainiae 11 months ago

Check out video #41 in this video series. That teaches everything about Z Scores.
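
As a quick pointer for the exercise: a z-score measures how many standard deviations a value lies from the mean, z = (x - mean) / std, and a common rule of thumb treats values with |z| > 3 as outliers. The sketch below uses synthetic data with two injected outliers; the data and the threshold of 3 are assumptions for illustration, not the exercise dataset.

```python
import numpy as np

# Synthetic column: 1000 roughly normal values plus two obvious outliers.
rng = np.random.default_rng(0)
values = rng.normal(loc=100.0, scale=10.0, size=1000)
values = np.append(values, [250.0, -80.0])

# z-score: distance from the mean in units of standard deviation.
z_scores = (values - values.mean()) / values.std()

# Keep only the rows within 3 standard deviations of the mean.
mask = np.abs(z_scores) <= 3
cleaned = values[mask]

print(len(values), len(cleaned))
```

For a DataFrame, the same mask can be computed per column and used to filter rows before scaling and PCA.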

@zainnaveed267 2 years ago

Sir, I have a question: how can one predict target values when PCA creates all-new columns based on its own calculations?

@VickyKumar-dk6rd 3 years ago

feature_names is no longer showing up in the load_digits() dataset

@sohailshaikh786 2 years ago

Thanks

@PiiViiDave 2 years ago

Where is the data coming from?

@Nameiscantsay 4 months ago

From your a$$

@Cat_Sterling 2 years ago

Should you scale the data before PCA?

@ridoychandraray2413 2 years ago

How to see which columns are this?

@makoriobed 6 months ago

is it X_pca=pca.fit_transform(X) or X_pca=pca.fit_transform(X_scaled)

@LamNguyen-jp5vh 2 years ago

Hi, I just want to ask why we use StandardScaler instead of MinMaxScaler in the lecture (not exercise). Thank you so much for your help!

@RiteshKumar-yv8nx 7 months ago

Why didn't you normalise y(i.e. the dataset.target)?

@ujjwalchetan4907 5 months ago

Thanks.

@snehasneha9290 3 years ago

After reducing the dimensions, is it possible to know which columns were selected by PCA? In this example we got 29 features; is it possible to know which 29 of the data frame's features those are?

@kunalchavan4685 2 years ago

I have the same question, but I think these are not 29 of the original 64 columns; they are 29 entirely new columns that contain information from all 64 columns

@amrutha46 2 years ago

How do we know which columns got removed?

@einnairo 1 year ago

I have a question. After going through scaling, and then PCA, the features are now all different to the original of values between 0 and 16. When I have a new digit to classify and been provided with the same 64 features, how do I make this new prediction?

@ogochukwustanleyikegbo2420 1 year ago

You have to scale the new digit and apply PCA to it before classification

@ajaxx627 3 years ago

Please I have a problem with some work. I was given a list of words let’s say about 200 different words. And I’m meant to create a code that generates 3 random words each together. Eg wordlist=[a, b, c, d, e,................z] Output should be = a, d, z c, o, x And so on Please how do I do it?

@vtechguruG 2 years ago

Hello! I am confused about scaling data before PCA. What should I do if 20% of the features in my dataset are categorical? Do I need to scale them?

@mohamedgad9429 1 year ago

I think you should apply one-hot encoding first
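
A rough sketch of the one-hot suggestion follows. Everything in it is an assumption for illustration: the toy DataFrame, the column names age, income and city, and the choice to scale only the numeric columns while leaving the 0/1 indicator columns as they are. Whether PCA is a good fit for data with many categorical features depends on the dataset, so treat this as a starting point rather than a recommendation.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Toy data with two numeric columns and one categorical column (hypothetical).
df = pd.DataFrame({
    "age": [25.0, 32.0, 47.0, 51.0, 38.0, 29.0],
    "income": [30000.0, 52000.0, 81000.0, 90000.0, 61000.0, 40000.0],
    "city": ["A", "B", "A", "C", "B", "C"],
})

numeric_cols = ["age", "income"]

# One-hot encode the categorical column into 0/1 indicator columns.
encoded = pd.get_dummies(df, columns=["city"], dtype=float)

# Scale only the numeric columns; the indicators already lie in [0, 1].
encoded[numeric_cols] = StandardScaler().fit_transform(encoded[numeric_cols])

pca = PCA(n_components=2)
X_pca = pca.fit_transform(encoded)

print(encoded.columns.tolist())
print(X_pca.shape)
```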

@gulnawaz9670 3 years ago

Hi Sir, very informative video. I have a problem: I loaded a local dataset, and dataset.keys() shows Index(['Unnamed: 0', 'Flow ID', ............ Then pd.DataFrame(dataset.data, columns=dataset.feature_names) raises an error. I even changed data to Unnamed, but the same problem occurs: AttributeError: 'DataFrame' object has no attribute 'data'. Waiting for your kind reply. Thanks.
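
The error in this comment most likely arises because .data and .feature_names belong to the Bunch object returned by scikit-learn loaders such as load_digits(), while a locally loaded CSV is already a pandas DataFrame. The sketch below builds a tiny stand-in DataFrame; the columns Flow Duration, Packet Count and Label are hypothetical, since the real column list is truncated in the comment. In real use the frame would come from pd.read_csv on the local file.

```python
import pandas as pd

# Stand-in for a frame loaded with pd.read_csv(...); columns are hypothetical.
df = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3],
    "Flow Duration": [10.0, 12.5, 9.1, 30.2],
    "Packet Count": [3, 4, 2, 11],
    "Label": ["benign", "benign", "benign", "attack"],
})

# "Unnamed: 0" is usually a saved index column and can be dropped.
df = df.drop(columns=["Unnamed: 0"])

# The DataFrame already holds the features; no .data attribute is needed.
X = df.drop(columns=["Label"])  # feature columns
y = df["Label"]                 # target column

print(X.columns.tolist())
print(y.value_counts())
```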