Dimensional Reduction| Principal Component Analysis

  164,465 views

Krish Naik

A day ago

Here is a detailed explanation of dimensionality reduction using Principal Component Analysis.
Github link: github.com/kri...
Please subscribe to the channel: @krishnaik06
Machine Learning Playlist: Data Science and Machi...
You can buy my book, where I have provided a detailed explanation of how we can use Machine Learning and Deep Learning in finance using Python.
Packt url : prod.packtpub....
Amazon url: www.amazon.com...

Comments: 134
@shushantgambhir2002 7 months ago
This is one of the best videos on the Internet on this topic. Can't thank you enough, sir.
@kevinkennynatashawilfredpa9023 1 year ago
Thank you Krish, for the concise and clear explanation!
@bea59kaiwalyakhairnar37 2 years ago
Sir, the video is very helpful. The analysis is very helpful because it is perfect.
@kamalkantverma6252 5 years ago
Thanks for making this type of content. You explain things in a very clear and easy way.
@AkshaykumarPatilAkki 4 years ago
Super explanation, Anna. You rocked data science.
@justusndegwa 2 years ago
Fantastic. Thanks from Nairobi, Krish.
@hassamsiddiqui3373 2 years ago
Amazingly explained video, sir. Keep it up.
@vineetsansi 5 years ago
I think it's better to specify how much variance you want to keep rather than the number of components. For example, PCA(.80) keeps 80% of the variance and creates as many principal components as are needed to reach it. Hope this is helpful.
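A minimal sketch of this suggestion, assuming scikit-learn and its bundled breast cancer dataset (30 numeric features) as a stand-in for the video's data. Passing a float between 0 and 1 as `n_components` makes `PCA` keep the smallest number of components whose cumulative explained variance exceeds that fraction; 0.80 is just the example value from the comment.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# 569 samples x 30 numeric features
X = load_breast_cancer().data
X_scaled = StandardScaler().fit_transform(X)

# A float in (0, 1) asks PCA for the smallest number of components
# whose cumulative explained variance exceeds that fraction.
pca = PCA(n_components=0.80)
X_reduced = pca.fit_transform(X_scaled)

print(pca.n_components_)                    # number of components actually kept
print(pca.explained_variance_ratio_.sum())  # at least 0.80
```

The number of components is then a result of the fit (`pca.n_components_`) rather than something chosen up front.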
@alankarshukla4385 4 years ago
Can we always use PCA for creating our ML model?
@rehansiddique1875 4 years ago
@alankarshukla4385 No, we cannot always use PCA. We only use it when we have too many features or variables.
@manusingh9007 4 years ago
@rehansiddique1875 Why is PCA only applicable to unsupervised models?
@rehansiddique1875 4 years ago
@manusingh9007 You can also use it with supervised models.
@srashtisingh1799 3 years ago
Excellent!! Your whole channel is extremely helpful. Very well explained.
@jazzorcazz 2 years ago
Very good explanation. Thank you so much!
@vishalsharda7508 2 years ago
Thanks for this video.👌👌👌
@MrSubhransusekhar 3 years ago
Beautifully explained
@betanapallisandeepra 3 years ago
Wonderful. Thank you for doing it, sir.
@jongcheulkim7284 2 years ago
Thank you. This is very helpful.
@osho2810 2 years ago
Thanks, sir... it is great.
@pritamgorain8365 4 years ago
Thanks for the video, Krish. But I'm wondering: a fresher like me gets puzzled by so many feature selection techniques. It would be great if you told us which feature selection technique to use and when. Regards, Pritam
@souhamahmoudi7745 2 years ago
Thanks for sharing!
@sandipansarkar9211 4 years ago
Great. Now I have completed my practice in a Jupyter notebook successfully. Cheers
@dilipgawade9686 5 years ago
This is a super useful video, Krish.
@adityasingh788 5 years ago
Thank you for putting the video back :)
@pankajgoikar4158 2 years ago
Thank you so much, Sir.
@gopalakrishna9510 5 years ago
If the data has categorical variables, what do we have to do?
@HonestADVexplorer 4 years ago
Great explanation! Thanks Krish
@nikitasinha8181 2 years ago
Thank you so much sir
@shashiaradya6905 5 years ago
Super explanation, sir.
@Sir_AD 9 months ago
Can we check which two features are selected from the 30?
@jayeshkumar2604 3 years ago
Awesome content... amazing
@CFATrainer 5 months ago
Excellent
@ankitachaudhari99 3 years ago
Explained well!
@saikatroy3818 4 years ago
Thanks for the nice presentation with hands-on.
@manojnahak7776 5 years ago
Using PCA the number of dimensions can be reduced, but can you please tell us on what basis these dimensions/variables are reduced? Is it the entropy value, or something else?
@abul4933 3 years ago
He explained it at the start: the reduction is based on projecting the data. You can think of it like the shadow of the data in 2 dimensions.
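To make the "shadow" picture concrete: a principal component score is the dot product of a centred data point with the component's unit direction. The sketch below uses hypothetical, randomly generated 2-D data and checks that scikit-learn's `transform` (with the default `whiten=False`) matches that projection computed by hand.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical 2-D data with two strongly correlated features
x = rng.normal(size=200)
X = np.column_stack([x, 0.5 * x + rng.normal(scale=0.2, size=200)])

pca = PCA(n_components=1).fit(X)
scores = pca.transform(X)  # one new feature: the "shadow" on PC1, shape (200, 1)

# Same values by hand: centre the data, then project onto the unit vector of PC1
manual = (X - pca.mean_) @ pca.components_.T
print(np.allclose(scores, manual))
```

These projected values are the numbers that make up the new feature.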
@johannesmphaka7433 5 years ago
Thanks for your videos, I'm learning a lot from you. Can you prove that model accuracy decreases when you increase the number of dimensions? Also, is it necessary to reduce my dimensions if I have only a few, like 5D? Will my model still improve if 5D were reduced to 2 dimensions?
@soumendradash5979 5 years ago
I have some doubts... First, can we apply PCA to categorical data? Second, how can we calculate the optimum value for n_components? Do we have to calculate the explained variance by manually trying out different values of n_components?
@raj_harsh_ 2 years ago
You can use np.cumsum(pca.explained_variance_ratio_) to see how much variance the first k components explain. Say 7 components explain 80% of the variance; then use these 7 components for your model.
@sunilc8684 4 years ago
Best explanation of PCA. Could you please make a video on Linear Discriminant Analysis? Also, please explain the eigenvector and eigenvalue concepts behind PCA.
@deepcontractor6968 5 years ago
Make a video on PCR
@mehnaztabassum1878 3 years ago
@krish naik, could you please tell us whether PCA is learnable/trainable?
@af121x 3 years ago
Thank you Krish * 1 million
@sandipansarkar9211 4 years ago
Great explanation. Need to get my hands dirty in a Jupyter notebook. Thanks
@jetendramulinti6443 5 years ago
Super video😀
@krishnaik06 5 years ago
Thanks
@vijaymurugesan581 5 years ago
Hi Krish, your videos are great!! Thanks a ton :)
@the_imposter_analyst 5 years ago
How can you determine the optimal number of components to reduce your features to? Love your tutorials, btw!!!
@rajanbalki7553 4 years ago
Generally, we use a scree plot for this. You can plot it using the explained_variance_ratio_ attribute of PCA in sklearn.
@linuxtubers7313 4 years ago
How do we decide the number of components for PCA?
@KnowledgeAmplifier1 3 years ago
Hello Linux Tubers, that depends on how much variance you want to capture after dimensionality reduction (more variance == more information preserved). I created a script for this that might be helpful if you are using MATLAB: kzbin.info/www/bejne/n36rhZqtiat9oLM Happy Learning :-)
@tagoreji2143 2 years ago
Thank you so much, sir.
@MonilModi10 2 years ago
Why does PCA rotate the axes? What is the significance of that?
@gurinderpartapsingh8694 5 years ago
Hello sir, I have a question... how can we be sure that we should keep 2 principal components? Why are we neglecting the other features, when some of them may be important for the model? Please answer this, sir... I'm in doubt.
@varunkukade7971 4 years ago
I have the same doubt.
@deepjyotisaikia382 4 years ago
We are not neglecting any features. PCA does not reduce dimensions by discarding some of the features. Instead, each principal component is a linear combination of all the features, and we keep the number of principal components that explains most of the variance.
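A sketch illustrating this point, assuming scikit-learn's bundled breast cancer dataset as a stand-in for the video's data: `components_` holds one row per principal component and one weight (loading) per original feature, so every feature contributes to every component. Listing the largest absolute loadings shows which features dominate a component, which is the closest thing to "which features were selected"; ranking by absolute loading is an illustrative choice here, not a feature-selection method.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_scaled = StandardScaler().fit_transform(data.data)
pca = PCA(n_components=2).fit(X_scaled)

# One row per principal component, one column (loading) per original feature
print(pca.components_.shape)

# Each component is a unit-length weighted combination of all 30 features;
# list the three features with the largest absolute weight in each.
for i, loadings in enumerate(pca.components_):
    top = np.argsort(np.abs(loadings))[::-1][:3]
    print(f"PC{i + 1}:", [str(data.feature_names[j]) for j in top])
```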
@vinyasshreedhar9833 3 years ago
If the orthogonal line loses a huge amount of variance for each variable, then for all 30 features we could take only the 1st component, right? Why do we even have to consider the 2nd component? Please provide your insights.
@divyanipal2705 5 years ago
Hi, I have a question: if we have more than 100 features, how do we decide how many components (n_components) to take? Is there a methodology for deciding n_components?
@swathys7818 5 years ago
Step 1: Fit PCA with n_components=None.
Step 2: Look at explained_variance_ratio_ for all the components (say 10).
Step 3: Find how many components you need to sum to reach the variance you want; that number is your n_components.
Example: say you have 5 components with ratios (0.70, 0.10, 0.08, ...). The first 3 components explain 88% of the total variance, so you can choose PCA(n_components=3).
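The three steps above, sketched with scikit-learn on its bundled breast cancer dataset (an assumption; any numeric feature matrix works). The 0.80 threshold is an arbitrary example, and `np.argmax` on a boolean array returns the index of the first `True`.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_breast_cancer().data)

# Step 1: fit with n_components=None so every component is kept
pca_full = PCA(n_components=None).fit(X_scaled)

# Step 2: running total of the variance explained by the first k components
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# Step 3: smallest k whose cumulative explained variance reaches the threshold
threshold = 0.80
k = int(np.argmax(cumulative >= threshold)) + 1
print(k, round(float(cumulative[k - 1]), 3))

pca = PCA(n_components=k).fit(X_scaled)
```

This yields the same k as passing `PCA(n_components=0.80)` directly. Plotting `cumulative` against the component number gives the cumulative explained-variance curve that several comments refer to.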
@SantoshKumar-fr5tm 5 years ago
Hi Krish, nice explanation, thanks. But I have one question: how can we measure how effective PCA is? For example, how can we tell whether reducing to 2 dimensions is less useful than reducing to 3? In other words, how can we be sure that PCA does not lose much information?
@TheMangz1611 3 years ago
You should go into more of the maths and how it works... anyone can call fit and transform.
@ramleo1461 5 years ago
Hi Krish, your videos are very useful, thank you for them. I have a doubt: the reason we do PCA is to reduce the number of features, right? So how will we know which features from the given data are useful when applying different models to our data?
@g.sai_koushik 1 year ago
We can check with the help of correlation: if a feature is correlated with the target, whether positively or negatively, we can say that feature will be useful for us. Coming to PCA, it's not what you think. PCA creates new values from the existing variables, and we use those PCA-derived variables for analysis.
@sathvikjoel1525 5 years ago
Which system are you using??
@prernanichani8516 4 years ago
Hi, how do we choose the correct n_components for PCA? For example, I have 80 features in the dataset; how do I choose n_components? Is there any logic for selecting the number of components?
@KK-rh6cd 3 years ago
I have the same question. Searching for it, I found this: stackoverflow.com/questions/12067446/how-many-principal-components-to-take
@fitbeat8231 5 years ago
Hello sir, I have two columns, ID_code and target. There are 200,000 observations and 202 features in the dataset in total. How can I apply PCA to this dataset? All the data is numeric.
@sakshamshivhare2474 2 years ago
Is there any math behind selecting the number of components in PCA?
@saurabhbarasiya4721 5 years ago
Great sir
@kushkumar6467 3 years ago
Thanks for the nice video. I have one doubt: how do we decide when to apply PCA? Say when there are 2, 3, or more than 3 features. Is there a fixed number of features for that, and can you explain the math behind it? Kudos and cheers, mate!
@saipavan5194 3 years ago
It depends on the requirement: how many columns you need.
@vishalverma5837 2 years ago
I would suggest not applying PCA based on the number of features. Instead, we should apply PCA in the following scenarios:
1. to reduce the memory needed for the dataset;
2. to speed up the learning algorithm;
3. to visualize high-dimensional data in 2D or 3D plots.
@prakashdwivedy5739 5 years ago
Can PCA be used with multiple linear regression??
@DanielWeikert 6 years ago
Why does PCA actually require perpendicular lines for the second, third, ... components?
@krishnaik06 6 years ago
One way of stating the goal of PCA is to find the linear projection that gives you the "best" representation of your data for a given dimensionality. It defines "best" as the representation with the minimal squared reconstruction error. When going from 2 dimensions to 1 dimension, as you do there, you are not actually trying to find the line that best predicts y from x. Rather, you're trying to find the combination of x and y such that the new, combined value "best" represents all your initial 2-D points. Essentially, PCA considers the perpendicular distance because it doesn't try to model y as a function of x.
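A small numerical check of this answer, on hypothetical correlated 2-D data: the first principal component gives a smaller sum of squared perpendicular distances (the reconstruction error) than the ordinary least-squares line of y on x, which minimises vertical distances instead. The helper `perpendicular_error` is written for this illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# Hypothetical correlated 2-D data
x = rng.normal(size=300)
y = 0.8 * x + rng.normal(scale=0.5, size=300)
X = np.column_stack([x, y])
Xc = X - X.mean(axis=0)

def perpendicular_error(direction):
    """Sum of squared perpendicular distances from the centred points to the line through the origin along `direction`."""
    u = direction / np.linalg.norm(direction)
    projected = np.outer(Xc @ u, u)
    return float(np.sum((Xc - projected) ** 2))

pca = PCA(n_components=1).fit(X)
pc1 = pca.components_[0]

# Direction of the least-squares line y = slope * x + b (it minimises vertical errors)
slope = np.polyfit(x, y, 1)[0]
ols_direction = np.array([1.0, slope])

print(perpendicular_error(pc1), perpendicular_error(ols_direction))

# PCA's own reconstruction from 1 component gives the same error as for pc1
X_hat = pca.inverse_transform(pca.transform(X))
print(float(np.sum((X - X_hat) ** 2)))
```

Any other direction gives a perpendicular error at least as large as PC1's, which is the "minimal squared reconstruction error" property in the answer.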
@DanielWeikert 6 years ago
@krishnaik06 thanks
@md.ahsanulkabirarif5448 4 years ago
Without finding the eigenvalues and eigenvectors, how can you determine that n_components = 2?
@basavarajpatil9821 3 years ago
Why hasn't Krish answered this? Actually, this is a good question.
@llmstr 3 years ago
I think you could plot a scree plot so that you know how many components are representative enough.
@SoumendraBagh 3 years ago
Here the original data is 2-dimensional, so after applying PCA we cannot go beyond the original number of dimensions; N = 2 is the maximum PCA can generate. Getting more dimensions than the original dataset through a transformation is something kernel functions do. The kernel functions used in kernel SVMs project the data into a higher dimension so that it has a clearly separable boundary, and a linear boundary with a margin can be drawn for the SVC classifier. Hence, PCA is for compression and kernels are for the opposite.
@monikajain7803 4 years ago
Sir, as per your explanation, can I say that when transforming an image we should select only PC1, as there is less data loss with it?
@bharathkumar5870 4 years ago
For visualization in 2D, the first two PC lines are enough.
@abhijeetjain8228 2 years ago
@bharathkumar5870 But how do we decide the PC1 line in the first place?
@sachinx30 5 years ago
In this playlist, the previous video is private. Can we have it?
@apekshaagnihotri5124 5 years ago
This is great content, Krish. I don't understand how to interpret the two features. Would someone please explain the final graph to me?
@SivaKumar-ny8pg 3 years ago
The features are a transformation of the original features, mapped onto new axes. You can't say there is a one-to-one mapping to the original ones. When you predict, it is important to apply the same PCA to the prediction input before passing it to the ML algorithm.
@debatradas1597 3 years ago
Thanks
@swethakulkarni3563 5 years ago
Why don't you start a community on Slack?
@darpanaswal9291 2 years ago
That would be awesome
@lovefrommars7468 4 years ago
Still confused... I didn't understand. When we draw a perpendicular line and project the points onto it, we are creating one feature. So what will the values of that feature be?
@maneeshdhardwivedi 4 years ago
Thanks for the video, Krish. One question: how do we know which feature to select as a principal component, and why, in the scatter plot, does another feature not work in place of cancer['Target']?
@KVishya 5 years ago
Hi Krish, great video. I have one question: how do you decide the n_components value? Is there an ideal value, or should it be decided based on the initial number of features?
@swathys7818 5 years ago
Step 1: Fit PCA with n_components=None.
Step 2: Look at explained_variance_ratio_ for all the components (say 10).
Step 3: Find how many components you need to sum to reach the variance you want; that number is your n_components.
Example: say you have 5 components with ratios (0.70, 0.10, 0.08, ...). The first 3 components explain 88% of the total variance, so you can choose PCA(n_components=3).
@socially_apt 4 years ago
There is something called a scree plot; read about it. You pick the PC number at which the explained variance levels off. It is drawn with the cumulative explained variance on the y-axis and the PC number on the x-axis. Some people use eigenvalues instead of explained variance, which gives the same picture. @krishnaik, want to comment?
@socially_apt 4 years ago
My question is: why don't you consider PCA an ML technique? I have used PCA for unsupervised clustering and achieved amazing results.
@swathys7818 4 years ago
@socially_apt Yes, true, but it purely depends on the data.
@chillbro2432 3 years ago
@swathys7818 You might have explained it well, but I didn't get what you said. Could you please elaborate a bit more? Thanks in advance.
@itsmesuchethanv 4 years ago
How do we find how many features are obtained from an image if the image size is 100*100?
@JaiSreeRam466 5 years ago
How do we convert the input values into the two features to predict whether a person has cancer or not?
@banduriambrish5092 3 years ago
It is not a prediction problem.
@tanaygupta2865 3 years ago
Did we remove the target/output feature from the dataset before applying PCA? @krish
@banduriambrish5092 3 years ago
No.
@ShahzadQureshii 3 years ago
PCA vs feature selection?
@nehamanpreet1044 4 years ago
What is the difference between PCA and SVD???
@anands2239 4 years ago
Nice video. Can you also apply a linear regression on top of the PCA and show a sample? I mean, do a train/test run and predict, just to see how it works?
@manjunath.c2944 5 years ago
Hi Krish, kindly make a video on the ARIMA model.
@equbalmustafa 3 years ago
How do we decide n_components?
@prachinainawa3055 3 years ago
How will I know what value to set for n_components? How do you decide to reduce 30 features to only 2?
@datascience3046 2 years ago
I'm facing the same problem.
@manjunath.c2944 5 years ago
Super, Krish
@pablo_CFO 5 years ago
So... if we apply PCA to reduce the number of dimensions of our dataset and then create a model to predict a class (as with the cancer dataset), what happens when we receive information about a new patient and need to classify it? In other words, how do we handle new data given in the original format (all features) if our classification algorithm is based on the new PCA variables?
@adityakyatham 4 years ago
We have to apply the same PCA transformation (fitted on the training data) to every new patient and then send the result to the model.
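A minimal sketch of that workflow, assuming scikit-learn, its bundled breast cancer dataset, and logistic regression as an arbitrary choice of classifier. Wrapping the scaler, PCA and model in a pipeline means `fit` learns the scaling and the components from the training data only, and `predict` applies those same fitted transforms to a new sample given in the original 30-feature format.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Scaler, PCA and classifier are all fitted on the training data only
model = make_pipeline(StandardScaler(), PCA(n_components=2), LogisticRegression())
model.fit(X_train, y_train)

# A "new patient" arrives with all 30 original features. predict() reuses the
# fitted mean/std and principal components; nothing is refitted here.
new_patient = X_test[:1]
print(model.predict(new_patient))
print(model.score(X_test, y_test))  # accuracy on the held-out split
```

Refitting PCA on each new sample would give different axes every time, which is why the fitted transform is reused.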
@asutoshghanto3419 3 years ago
We need to find totally independent components in those features, which can only be obtained from eigenvalues.
@harishvijay8490 5 years ago
Clear explanation. Keep uploading videos like this.
@mercyjhansi8190 2 years ago
Sir, one question: do we always use only StandardScaler before PCA, even if some of the features are highly skewed? Or can we use RobustScaler in that case?
@zicobanerjee 2 years ago
Always use StandardScaler. PCA basically picks out eigenvectors, which work best with scaled numbers.
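A sketch of why scaling matters for PCA in the first place, on hypothetical data with two independent features on very different scales: without scaling, the first component simply follows the large-scale feature. It shows only the effect of unequal scales. It does not settle the question of StandardScaler versus RobustScaler for skewed features.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two independent hypothetical features: standard deviation 1 and 1000
X = np.column_stack([
    rng.normal(scale=1.0, size=500),
    rng.normal(scale=1000.0, size=500),
])

raw = PCA().fit(X).explained_variance_ratio_
scaled = PCA().fit(StandardScaler().fit_transform(X)).explained_variance_ratio_

print(raw)     # PC1 carries almost all the variance: it is essentially the large-scale feature
print(scaled)  # close to an even split once both features have unit variance
```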
@kalpanaregmi2137 4 years ago
Hello sir, I have a question about PCA... why are we neglecting the other features, when some of them may be important for the model? On what basis is a particular column selected? Please answer this, sir... I'm in doubt.
@abhijeetjain8228 2 years ago
He never replies..
@mdashad1582 4 years ago
How do I find this actual dataset?
@pmanojkumar5260 5 years ago
Thanks Bhai..
@florinaling2902 4 years ago
Can you do PCA for a regression problem? I'm curious how different it is to implement PCA in supervised learning, as I saw in some articles.
@manusingh9007 4 years ago
I'm thinking the same.
@florinaling2902 4 years ago
@manusingh9007 I have looked at the mathematics side of it, and it seems quite confusing. I hope somebody makes more videos about this.
@midhileshmomidi2434 5 years ago
I have a doubt: whenever a dataset has many features, do we have to use PCA?
@bhanuPrakash-yo5wd 5 years ago
Yes
@mallarapubharath 4 years ago
Could you please explain PCA more mathematically, e.g. eigenvectors, eigenvalues...?
@Nifty1976 4 years ago
PCA is a statistical technique first introduced in 1901 by Karl Pearson.
@bharteshtandon5095 4 years ago
Why don't we use PCA in every project to reduce the dimensions? When should we apply PCA??
@rehansiddique1875 4 years ago
No, we cannot always use PCA. We only use it when we have too many features or variables.
@kulamanisahoo4785 4 years ago
PCA also requires some computation in the background. If there are only a few features, it will not give much benefit.
@ankitachaudhari99 2 years ago
Good, crisp explanation, but not detailed.
@thepresistence5935 3 years ago
Bro, I think you know Tamil. If possible, explain just one video in Tamil. - Aravind
@mayankgupta1728 1 year ago
Thanks