Check out our premium machine learning course with 2 Industry projects: codebasics.io/courses/machine-learning-for-data-science-beginners-to-advanced
@Rainbow-lj5pp 7 months ago
This is a really easy-to-understand and thorough explanation of principal component analysis. Many others I watched were either too technical and math-theory oriented or too basic, showing how to use the function but not what it does. This is a great balance of understanding and practicality.
@mayishamaliha6369 1 year ago
Super helpful for newbies: it doesn't scare them off or overwhelm them with too many statistical terms. Thank you so much.
@rakeshbullet7363 1 year ago
Awesome videos, simple explanations. A balanced approach to teaching, with the right mixture of theory and practice, and not overwhelming for learners. I loved the approach. After seeing numerous ML training videos from across the spectrum, this is by far the best one I have seen. Thank you for taking the time to create these videos.
@dushyyanta5305 2 years ago
You are the best! I am doing PG in DS but still, I watch your videos for better understanding. Kudos! Keep it up!
@AqsaChappalwala 6 months ago
Doing a Master's in Data Science in the UK and still love watching only your videos :-)
@mohitupadhayay1439 2 years ago
The last few minutes were BANG ON! This is what I wanted to hear. Thanks!
@ernestanonde3218 2 years ago
This is the best channel on YouTube. You are simply amazing. You just saved my career. Thanks a million.
@ernestanonde3218 1 year ago
@Karthiktanu I am a student of data science and analytics.
@ernestanonde3218 1 year ago
@Thanusree234 yes
@bhaskarg8438 2 years ago
Thank you, the PCA concept is clearly explained. I need to understand what we consider in actual real-life scenarios: the performance or the processing time?
@prakashkoneti7630 1 year ago
I really appreciate your hard work in making these videos and making the complex easy.
@luciamatamorospava4382 1 year ago
It's like the 10th video I'm watching on PCA and the FIRST one I understand, thank you so much!
@BG4INDIA 1 year ago
Impressed with the clarity of explanation.
@codebasics 1 year ago
Glad you liked it.
@akinloluwababalola6666 2 years ago
Hello Codebasics. I usually enjoy your videos as I learn a lot from them. Can you make a video on association rules, the Apriori algorithm, and any machine learning model that deals with determining interrelationships among variables? Thank you.
@maruthiprasad8184 2 years ago
Thank you very much for the simple and great explanation. I got higher accuracy with SVM = 86.74%; after PCA I got accuracy with RF = 73.06.
@pranjiljain4500 2 years ago
But training time and compute requirements also decrease heavily.
@mohammeddanishreza4902 1 year ago
Can you share the GitHub link for your code, please?
@richardshaw8326 8 months ago
Great explanation of PCA. @codebasics: I must have missed it, but after running PCA to identify which features will give the results, where does one get those features?
@swagsterfut9992 2 years ago
At 17:35, shouldn't we be calling pca.fit_transform() on our scaled dataset (X_scaled in our case) rather than on X?
@geoafrikana 2 years ago
This came to my mind also. Perhaps the accuracy would have been higher if he had scaled before PCA.
@anirudhgangadhar6158 2 years ago
Yes, it should be on X_scaled.
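For readers following this sub-thread, here is a minimal sketch of the comparison being discussed, assuming the same scikit-learn digits dataset and the 95% variance setting used in the video. The variable names `pca_raw` and `pca_scaled` are illustrative, not taken from the video's notebook.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)

# Standardize first so every pixel column has zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Keep enough components to explain 95% of the variance in each case.
pca_raw = PCA(n_components=0.95).fit(X)
pca_scaled = PCA(n_components=0.95).fit(X_scaled)

print("components kept on raw X:   ", pca_raw.n_components_)
print("components kept on X_scaled:", pca_scaled.n_components_)
```

The two counts can differ because scaling changes how much variance each pixel contributes, so which input is better for a given model is something to check empirically rather than assume.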
@guillermokinderman8267 1 year ago
I was trying to understand PCA; this video helped me a lot.
@mukeshkumaryadav350 2 years ago
It was an amazing explanation of PCA without much mathematics or the eigenvalues and eigenvectors that scare me. Interesting learning: 1. We can find the variance explained by each PC, which helps.
@CharzIntLtd 3 years ago
Thanks, sir, for the great work. Your explanation makes ML easier for sure 🙏
@boogersincoffee 2 years ago
Ahhhhhh I've been struggling to understand this and this cleared everything up, thank you
@TK-fx8dh 3 years ago
My long-awaited topic!!!! Thank you for posting this PCA lesson.
@ogobuchiokey2978 2 years ago
Your videos have helped me to complete my MSc research. Thank you for being a great teacher. I do have a question: during the explanation, you said we should always use PCA on the scaled data, but during implementation you used the unscaled data. Could you explain this?
@kreativeaman7688 10 months ago
I had the same question following through.
@kreativeaman7688 10 months ago
I tried using PCA on the scaled data and used it with SVM, Logistic Regression, and RandomForest classifiers, but the results were almost the same as using the unscaled data with PCA.
@gnaneshgn8341 3 years ago
Nice video, sir. Please make a video on the math behind PCA. Thanks in advance, sir.
@geekyprogrammer4831 3 years ago
kzbin.info/www/bejne/fJjEnI2ta7Bkh7M This should be sufficient if you want to know the mathematics.
@shubhanshugupta9754 2 years ago
You can get the best number of PCs by taking log(total features).
@anirudhgangadhar6158 2 years ago
Highest accuracy: SVM - 85.83%, after PCA (3 PCs), accuracy was 83.87%. For all 3 models, accuracy slightly (
@asamadawais 1 year ago
I am watching this video for the 2nd or 3rd time. @Dhavel, you are the best among equals...👍👍
@wavyjones96 2 years ago
I have some questions: 1) If you use PCA data that was scaled before doing any train/test split, wouldn't it cause data leakage? 2) Shouldn't the target be dropped?
@hrithiksarma1204 2 years ago
I had the same doubt. Have you got any update on this?
@slainiae 7 months ago
Highest accuracy 0.8729 with SVM (linear) and with PCA n_components = 11.
@dees900 1 year ago
Great explanation of PCA. It's an abstract concept to grasp. Well done.
@ogochukwustanleyikegbo2420 1 year ago
After completing the assignment, I got a best score of 0.85 with SVM (rbf kernel); after PCA my best score dropped to 0.68, still with SVM (rbf kernel).
@levimungai1846 3 days ago
I've got a question. I understand that what is contained in the PCA array is the loading scores of each feature. Why do we use this as our new training data? What do these figures in the PCA array really represent?
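One way to look at this question, as a small sketch on the digits data: in scikit-learn the loadings live in `pca.components_`, while the array returned by `transform` / `fit_transform` holds each sample's coordinates along those components. The choice of 5 components and the name `manual_scores` are assumptions made only for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)
pca = PCA(n_components=5).fit(X)

# Loadings: one row per component, one column per original feature.
loadings = pca.components_            # shape (5, 64)

# Scores: each sample expressed in the new coordinate system.
scores = pca.transform(X)             # shape (n_samples, 5)

# The scores are the mean-centered data projected onto the loadings.
manual_scores = (X - pca.mean_) @ loadings.T
print(np.allclose(scores, manual_scores))
```

So the transformed array is a compressed description of every sample, which is why it can stand in for the original columns as training data.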
@namantiwari8251 1 year ago
Sir, can you please tell which features it reduces? How can I get those particular selected (reduced) features as output?
@talkingbirb2808 8 months ago
I would add that reducing the number of columns should help with overfitting.
@leamon9024 1 year ago
Thanks for this amazing tutorial. Hope you could do one video about when to use feature selection and feature extraction, or even a combination of them.
@nastaran1010 9 months ago
Hi. I have a question: when you performed PCA, why did you give x as the input and not x_scaled?
@arjunprashanth7824 6 months ago
Shouldn't X_scaled be passed to the pca.fit_transform() method? Because if you're passing X, there was no point in doing the scaling, right?
@ANASS-AHMADD 4 months ago
Exactly, I was about to ask the same question.
@userhandle-u7b 4 months ago
I tried both. When passing X to PCA without scaling, I got a higher score. But you're right, I also believe X_scaled should be passed for a like-for-like comparison.
@RijoSLal 23 days ago
It's not always necessary; if the variance is smaller you can do PCA without rescaling it.
@krishnadaskv2197 3 years ago
I am getting around an 80% score when using PCA(0.99999) in the exercise, which is higher than the score before using PCA, and I am also getting a better score without removing outliers.
@codebasics 2 years ago
That's the way to go, kv. Good job working on that exercise.
@jjanna07751 2 years ago
Thank you. PCA explained very easily.
@jinks3669 2 years ago
Another very informative video. DHANYAVAAD ! :)
@Maniclout 3 years ago
Amazing explanation, I understand PCA now.
@sarangali4595 2 years ago
Sir, please also make a video on how PCA actually works and what type of information we can gain from the loadings, such as how these features affect the label.
@mohammadhosseinkazemi8558 1 year ago
Thank you for the video. I have one problem, though: shouldn't we first split the data into training and test sets, then scale each set separately using StandardScaler(), RobustScaler(), etc.?
@ahsanurrahman8915 1 year ago
Very nicely described! I have a question: in your example, PCA(0.95) reduces the dimension to 29. But how do we know which dimensions it picked? I am asking because I want to use PCA to determine the principal drivers of the targets.
@akashbhargava906 10 months ago
Hey buddy, PCA doesn't pick any existing dimension. It creates new dimensions, which won't make much sense to the naked eye.
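To make the reply above concrete, here is a rough sketch of how one might inspect which original pixels weigh most heavily in a component. The pixel names are built by hand for illustration and the "top 5 by absolute loading" rule is just one possible way to read the loadings, not an official feature-importance measure.

```python
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()
X = digits.data
# Hand-built column labels (illustrative), one per pixel of the 8x8 image.
names = [f"pixel_{i // 8}_{i % 8}" for i in range(X.shape[1])]

pca = PCA(n_components=0.95).fit(X)

# Rows are components, columns are original pixels, values are loadings.
loadings = pd.DataFrame(
    pca.components_,
    columns=names,
    index=[f"PC{i + 1}" for i in range(pca.n_components_)],
)

# Original features with the largest absolute weight in the first component.
top_pc1 = loadings.loc["PC1"].abs().sort_values(ascending=False).head(5)
print(top_pc1)
```

Every component mixes all 64 pixels, so this shows influence on a component rather than a list of "selected" columns.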
@nikhilanand9022 1 year ago
Here are my two questions for you: 1) Why don't you scale the target column (y)? 2) For the accuracy score, why don't you compare actual and predicted values? You pass x_test and y_test to score; why not y_pred and y_test?
@vanshoberoi2154 2 months ago
1) y is the target; it doesn't influence the training the way the X inputs do. The raw values of y are used directly to calculate errors (e.g., loss functions). Scaling y is generally unnecessary and could alter the model's predictions in unintended ways. 2) As sir explained previously, when you use accuracy_score you pass y_pred and y_test, but model.score takes X_test and y_test and internally computes y_pred from X_test.
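The second point in the reply above can be checked directly. This is a small sketch using an SVC on the digits data; the classifier, split size, and random_state are arbitrary choices for the example.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=30
)

model = SVC().fit(X_train, y_train)

# Route 1: let the classifier predict internally and score itself.
score_a = model.score(X_test, y_test)

# Route 2: predict explicitly, then compare predictions with the truth.
y_pred = model.predict(X_test)
score_b = accuracy_score(y_test, y_pred)

print(score_a, score_b)
```

For scikit-learn classifiers both routes report accuracy, so the two numbers agree.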
@AkaExcel 3 years ago
@codebasics Thank You for Teaching and helping us!
@punnarahul4068 3 years ago
Great, looking forward to more videos, Dhaval bhai.
@DrizzyJ77 5 months ago
Thank you code basics❤
@anonymous-bi6ul 4 months ago
Why didn't you use X_scaled as the argument to PCA's fit_transform function?
@anirudh7150 8 months ago
Thank you Sir. It was really helpful.
@souhamahmoudi7745 1 year ago
Thanks for sharing, it's highly appreciated
@mansijswarnkar4389 2 years ago
Wonderful, as always. Thanks for making this video, it has helped me a lot! Regards
@purebackend1993 2 years ago
You killed it, amazing!
@Ooo12376 3 years ago
Please also explain the math behind it. You get questions on the math behind PCA in interviews. People ask for the derivation of PCA.
@salahmahmoud2119 1 year ago
You are the best!!! 👏
@MohammadYs77 2 years ago
Very informative and practical.
@youktinathbhowmick4673 2 years ago
Thanks for the explanation. I have one question: when you do PCA, you take the whole dataset and only afterwards do the train/test split. Isn't that a bit unethical? Also, if I fit PCA on the training data, can the same PCA be applied to the test data? Is there any way to store the PCA transformation so it can be applied to the test data?
@hrithiksarma1204 2 years ago
I had the same doubt. Have you got any update on this?
@tobe7602 2 years ago
Hi, good tutorial. I think you should use X_train in pca.fit_transform and not X. Thanks.
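A sketch of the train-only fitting suggested in these comments, assuming the digits data and an SVC as a stand-in model. The fitted `scaler` and `pca` objects are what "store the transformation": they are fit on the training split only and then reused, unchanged, on the test split.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=30
)

# Learn the scaling and the principal axes from the training split only.
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=0.95).fit(scaler.transform(X_train))

# Apply the already-fitted transformers to both splits.
X_train_pca = pca.transform(scaler.transform(X_train))
X_test_pca = pca.transform(scaler.transform(X_test))

model = SVC().fit(X_train_pca, y_train)
test_accuracy = model.score(X_test_pca, y_test)
print("test accuracy:", test_accuracy)
```

Calling `fit` (or `fit_transform`) again on the test split would learn different axes, so only `transform` is used there.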
@pateltapasvi7277 11 months ago
How can I get the selected features in a DataFrame with their feature names instead of the numbers 1, 2, 3, etc.?
@sarangali4595 2 years ago
Sir, please also make a video on how to find relations using descriptive techniques.
@kalluriyaswanthkumar2275 1 year ago
Sir, you said that we should scale before PCA, but in the code you are applying PCA to unscaled data.
@self.__osman 2 years ago
Hi. I might not be making any sense here, but I wanted to know if the same thing could be achieved with entropy and information gain. Information gain gives each feature a number reflecting how informative or important it is. Therefore, in theory, we can remove all the features with really low information gain. I think this would work better with discrete data. I don't know if it already exists. If it does, what method does this? If it doesn't, can I know whether this solution is practical?
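Something close to this idea exists in scikit-learn as mutual-information-based feature selection. Below is a hedged sketch, assuming the digits data (whose pixel values are small integers, so they are treated as discrete here); keeping `k=20` features and the helper name `mi_score` are arbitrary illustrative choices. Unlike PCA, this keeps a subset of the original columns instead of creating new ones.

```python
from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_digits(return_X_y=True)
X = X.astype(int)  # pixel intensities are integers 0..16


def mi_score(X, y):
    # Mutual information between each (discrete) feature and the label.
    return mutual_info_classif(X, y, discrete_features=True, random_state=0)


# Keep the 20 columns that share the most information with the target.
selector = SelectKBest(score_func=mi_score, k=20).fit(X, y)
X_selected = selector.transform(X)
kept = selector.get_support(indices=True)

print("kept column indices:", kept)
```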
@LamNguyen-jp5vh 2 years ago
Hi, I just want to ask why we use StandardScaler instead of MinMaxScaler in the lecture (not exercise). Thank you so much for your help!
@farahamirah2091 1 year ago
I have a question: we can train a model using PCA, but what about an imbalanced dataset? Don't we need to handle the imbalance?
@manjularathore1076 3 years ago
You are absolutely amazing.
@mr.luvnagpal7407 2 years ago
Thank you so much for this amazing video.
@albertoachavalrodriguez2461 2 years ago
Great video!
@nriezedichisom1676 8 months ago
Thank you
@sohailshaikh786 2 years ago
Thanks
@usamaalicraft3646 3 years ago
Thanks sir 😊😊
@tamirat9797 8 months ago
Thank you 🙏
@debatradas1597 2 years ago
Thank you so much
@nyangwindicollins1018 2 years ago
Superb
@BG4INDIA 1 year ago
Hi Mr. Dhaval, thank you for sharing such an informative video. Like "ogobuchiokey2978", I wanted to know if there is a specific reason for not using X_scaled when fitting PCA. In the above demo, if I fit raw X, I get 29 new PCA features, but if I fit scaled X, I get 40 new PCA features. Similarly, in your exercise, if I fit scaled X, I get 10 features (only 1 attribute is removed) with an accuracy of 85%, and if I fit raw X, I get 2 attributes, but accuracy dips to 69% (Random Forest). I believe this depends on the data as well.
@Cat_Sterling 1 year ago
Should you scale the data before PCA?
@ujjwalchetan4907 2 months ago
Thanks.
@girishtripathy3354 2 years ago
The number of dimensions to reduce to: isn't it another hyperparameter? For 2 dimensions, yes, you can visualize. For more than 2, visualization is not possible. How do you decide what dimension you should reduce your dataset to?
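One common way to act on this, sketched under assumptions: treat the number of components as a hyperparameter and let cross-validation choose it. The candidate values, the k-nearest-neighbours classifier, and the 3-fold setting below are illustrative picks, not recommendations from the video.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("clf", KNeighborsClassifier()),
])

# Try several component counts; each is scored by 3-fold cross-validation.
candidates = [5, 10, 20, 30, 40]
search = GridSearchCV(pipe, {"pca__n_components": candidates}, cv=3)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Because the scaler and PCA sit inside the pipeline, they are refit on each training fold, which also avoids leaking information from the validation fold.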
@suriyaprakashgopi 1 year ago
Nicely done.
@bommubhavana8794 2 years ago
Hello, I have recently started working on a PCR project. I am stuck at a point and could really use some help ASAP. Thanks a lot in advance. I am working in Python. We created a PCA instance using PCA(0.85) and transformed the input data. We ran a regression on the principal components explaining 85 percent of the variance (say N components), so we now have a regression equation in terms of N PCs. We took this equation and expressed it in terms of the original variables. To QC the coefficients in terms of the original variables, we took the N components (85% variance), reconstructed the data from them, and ran a regression on this reconstructed data, hoping it would give the same coefficients and intercept as the regression equation derived above. The issue is that the coefficients do not match when we take N components, but when we take all the components the coefficients and intercept match exactly. Also, the R-squared value and the predictions from these two equations are exactly the same even though the coefficients do not match. I am so confused as to why this is happening. I might be missing something in the concept of PCA. Any help is greatly appreciated. Thank you!
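This does not diagnose the exact setup described above, but here is a sketch of the back-mapping step on synthetic data (the data, the fixed 3 components, and all variable names are assumptions for illustration). One thing worth noting: data reconstructed from N components has rank at most N after centering, so a regression on it does not have a unique coefficient vector, even though the fitted values are unique.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

# Synthetic data: 200 samples, 6 original variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
true_coef = np.array([1.0, -2.0, 0.5, 0.0, 3.0, 1.5])
y = X @ true_coef + rng.normal(scale=0.1, size=200)

# Regress y on the first 3 principal components.
pca = PCA(n_components=3).fit(X)
Z = pca.transform(X)                      # Z = (X - mean) @ components.T
reg = LinearRegression().fit(Z, y)

# Rewrite y_hat = b0 + Z @ g in terms of the original variables:
#   y_hat = (b0 - mean @ beta) + X @ beta,  with beta = components.T @ g
beta = pca.components_.T @ reg.coef_
intercept = reg.intercept_ - pca.mean_ @ beta

pred_pc = reg.predict(Z)
pred_orig = X @ beta + intercept
print(np.allclose(pred_pc, pred_orig))
```

Comparing predictions, as done here, is a safer check than comparing coefficients whenever fewer than all components are kept.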
@aaditya1267 1 year ago
Nice explanation!!
@babalolamayowamercy186 2 years ago
Nice video, thank you.
@zainnaveed267 2 years ago
Sir, I have a question: how can one predict target values when PCA creates all-new columns based on its own calculations?
@siddheshmhatre2811 1 year ago
Thanks ❤
@makoriobed 3 months ago
Is it X_pca=pca.fit_transform(X) or X_pca=pca.fit_transform(X_scaled)?
@HT-xt4cn 3 months ago
What was the purpose of scaling X at 14:18?
@MonilModi10 1 year ago
Why does PCA rotate the axes? What is the significance of that?
@krishnapatel8852 1 year ago
Hello, if I want to visualize this data in 3D, what will the z-axis be?
@tigrayrimey6418 2 years ago
Nice points.
@mayank66 1 year ago
Amazing
@RiteshKumar-yv8nx 4 months ago
Why didn't you normalize y (i.e., dataset.target)?
@ShanthoshKumaarSomiRajesh 10 months ago
I have a question. You said we should pass the data to PCA after scaling, but you passed the original X instead of X_scaled. Why?
@dhineshv2590 2 months ago
Since we are using grayscale images, the data is already kind of in scaled values (pixel values).
@dhineshv2590 2 months ago
Since we are using grayscale images, it's already kind of scaled, I guess.
@gulnawaz9670 2 years ago
Hi sir, very informative video. I have a problem: I loaded a local dataset, and dataset.keys() shows Index(['Unnamed: 0', 'Flow ID', ............ Now pd.DataFrame(dataset.data, columns=dataset.feature_names) raises an error; I even changed data to Unnamed, but the same problem occurs: AttributeError: 'DataFrame' object has no attribute 'data'. Waiting for your kind reply. Thanks.
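A likely reading of this error, offered as a sketch: `.data` and `.feature_names` belong to the object returned by scikit-learn loaders such as `load_digits()`, whereas a CSV read with pandas is already a DataFrame, so features and target are picked by column name instead. The tiny inline CSV and the `Duration` and `Label` columns below are made up for illustration; only `Unnamed: 0` and `Flow ID` come from the comment.

```python
import io

import pandas as pd

# Stand-in for the local file; a blank first header cell becomes 'Unnamed: 0'.
csv_text = (
    ",Flow ID,Duration,Label\n"
    "0,a-1,12.5,benign\n"
    "1,b-2,3.1,attack\n"
    "2,c-3,7.8,benign\n"
)
dataset = pd.read_csv(io.StringIO(csv_text))

# Drop the leftover index column, then split into features and target.
dataset = dataset.drop(columns=["Unnamed: 0"])
X = dataset.drop(columns=["Flow ID", "Label"])  # numeric feature columns
y = dataset["Label"]                            # hypothetical target column

print(X.columns.tolist(), y.tolist())
```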
@snehasneha9290 2 years ago
After reducing the dimensions, is it possible to know which columns were selected by PCA? In this example we got 29 features; is it possible to know which 29 of the data frame's features those are?
@kunalchavan4685 2 years ago
I have the same question, but I think these are not 29 of the original 64 columns; they are 29 totally different columns that contain data from all 64 columns.
@RamveerSingh-el6zl 1 year ago
Can you tell how to do varimax rotation?
@ajaxx627 3 years ago
Please, I have a problem with some work. I was given a list of words, let's say about 200 different words, and I'm meant to write code that generates groups of 3 random words. E.g. wordlist=[a, b, c, d, e,................z] Output should be = a, d, z c, o, x And so on. Please, how do I do it?
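For the word-list question above, a minimal sketch using only the standard library. The short `wordlist` and the helper name `random_triples` are placeholders; a real list of about 200 words would work the same way.

```python
import random

# Placeholder list; substitute the real words here.
wordlist = ["apple", "bridge", "cloud", "delta", "ember", "forest", "grape"]


def random_triples(words, how_many, seed=None):
    """Return `how_many` groups of 3 distinct words drawn at random."""
    rng = random.Random(seed)
    return [rng.sample(words, 3) for _ in range(how_many)]


for triple in random_triples(wordlist, how_many=4, seed=1):
    print(", ".join(triple))
```

`sample` never repeats a word within one group, though the same word can appear in different groups.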
@sharmilasenguptachowdhry509 2 years ago
Thanks very much! Can you please help explain eigenvalues and eigenvectors from the data science perspective? Thanks again.
@taimoorneutron2940 3 years ago
Hello sir, I am 27 now and my master's is in progress. I have teaching experience, but now I want to start my career in machine learning or data science. Is it possible? Every company wants fresh newcomers, so what should I do?
@vanshoberoi2154 2 months ago
Doesn't 0.95, i.e. 95% retention, mean about 60 out of 64 features should have been retained? Why 25?
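On this question: in scikit-learn, a float passed as `n_components` is a fraction of variance to keep, not a fraction of columns. A small sketch on the digits data showing how the component count follows from the cumulative explained variance (the names `cumulative` and `n_needed` are illustrative):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)

# Fit with all components to see how variance accumulates.
pca_full = PCA().fit(X)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# Smallest number of components whose cumulative variance reaches 95%.
n_needed = int(np.argmax(cumulative >= 0.95)) + 1
print("components needed for 95% variance:", n_needed)
```

Because a few directions carry most of the variance, the count can be far below 95% of the original columns.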
@jayuchawla1892 2 years ago
You applied PCA on the unscaled dataframe, whereas in the theory part you explained that we need to apply it on the scaled dataframe.
@VickyKumar-dk6rd 2 years ago
The feature_names attribute no longer appears in the load_digits() dataset.
@MyManiratnam 3 years ago
Hi, I have seen your videos on PCA; they are really informative and your explanation is really cool. I have a doubt: we apply PCA on the dataset and later go for model fitting; for example, for a classification problem we fit a classification model. After building the model we validate it with the test set. After that, if I have a new observation, i.e. a new row in the dataset, how do I predict its label?
@kirubakaran6145 10 months ago
Hello Maniratnam, did you get an answer to the above question?
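One way to handle a new row, sketched with assumptions: wrap the scaler, PCA, and classifier in a single scikit-learn pipeline so that a new observation with the original columns goes through the same fitted steps automatically. The SVC and the held-out row standing in for "new data" are illustrative choices.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_new, y_train, _ = train_test_split(
    X, y, test_size=0.1, random_state=0
)

# Scaling, PCA, and the classifier are fitted together on the training data.
model = make_pipeline(StandardScaler(), PCA(n_components=0.95), SVC())
model.fit(X_train, y_train)

# A new observation keeps the original 64 columns; the pipeline
# scales and projects it before the classifier sees it.
new_row = X_new[:1]                 # shape (1, 64)
predicted = model.predict(new_row)
print(predicted)
```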
@FutureAIDev2015 2 years ago
I have no idea where to start on the exercise or even what "z-score" means for getting rid of outliers.
@slainiae 8 months ago
Check out video #41 in this video series. It teaches everything about z-scores.
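As a quick illustration of the idea, not the exercise solution: a z-score says how many standard deviations a value lies from the column mean, and rows beyond some cutoff are dropped. The synthetic `height` column and the cutoff of 3 are assumptions made for this sketch.

```python
import numpy as np
import pandas as pd

# Synthetic column: 200 ordinary values plus two obvious outliers.
rng = np.random.default_rng(0)
heights = np.append(rng.normal(loc=170, scale=10, size=200), [400.0, -50.0])
df = pd.DataFrame({"height": heights})

# z-score: distance from the mean measured in standard deviations.
df["zscore"] = (df["height"] - df["height"].mean()) / df["height"].std()

# Keep only rows within 3 standard deviations of the mean.
cleaned = df[df["zscore"].abs() <= 3]
print(len(df), "->", len(cleaned))
```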