Principal components analysis in R

Views: 158,907

Hefin Rhys

1 day ago

Comments: 209

@vplougoboy 4 years ago

No one explains R better than Hefin. Give this man a medal already!!

@PhinaLovesMusic 5 years ago

I'm in graduate school and you just explained PCA better than my professor. GOD BLESS YOU!!!!

@HarmonicaTool 2 years ago

5 year old video still one of the best I found on the topic on YT. Thumbs up

@sadian3392 6 years ago

I had listened to several other lectures on this topic but the pace and the detail covered in this video is simply the best. Please keep up the good work!

@hefinrhys8572 6 years ago

Thanks Sadia! Glad to be of help.

@maitivandenbosch1541 4 years ago

Never has a tutorial about PCA been so clear and simple. Thanks

@rebecai.m.6670 6 years ago

OMG, this tutorial is perfection, I'm serious. You make it sound so easy and you explain every single step. Also, that is the prettiest plot I've seen. Thank you so much for this.

@hefinrhys8572 6 years ago

You're very welcome! If you like pretty plots, check out my video on using ggplot2 ;) kzbin.info/www/bejne/Z3jQgmh4maabfZY

@Rudblattner 3 years ago

I never comment on videos, but you really saved me here. Nothing was working on my dataset and this went smoothly. Well done on the explanations too, everything was crystal clear.

@WatchMacro16 5 years ago

Finally a perfect tutorial for PCA in RStudio. Thanks mate!

@jackiemwaniki1266 5 years ago

How I came across this video a week before my final-year project due date is a miracle. Thank you so much Hefin Rhys.

@mohamedadow8153 5 years ago

Jackie Mwaniki doing?

@jackiemwaniki1266 5 years ago

@@mohamedadow8153 my topic is on Macroeconomic factors and the stock prices using the APT framework.

@user-kb6ui2sh5v 1 year ago

really useful video thank you, I've just started my MSc project using PCA, so thank you for this. I will be following subsequent videos.

@chinmoysarangi9399 4 years ago

I have my exam in 2 days and Your video saved me tons of effort in combing through so many other articles and videos explaining PCA. A BIG Thank You! Hope you do many more videos and impart your knowledge to newbies like me. :)

@timisoutdoors 4 years ago

Quite literally, the best tutorial I've ever seen on an advanced multivariate topic. Job well done, sir!

@shantanutamuly6932 4 years ago

Excellent tutorial. I have used this for analysis of my research. Thanks a lot for sharing your valuable knowledge.

@Axle_Tavish 2 years ago

Explained everything one might need. If only every tutorial on KZbin is like this one!

@tylerripku8222 3 years ago

The best run through I've seen for using and understanding PCA.

@johnkaruitha2527 4 years ago

Great help, been doing my own work following step by step this tutorial...the whole night

@johnmandrake8829 3 years ago

It's so funny, I don't think you realize, but myPR ("my pyaar") in Urdu/Hindi means "my love". Thank you for an amazing and extremely helpful video

@lilmune 4 years ago

In all honesty this is the best tutorial I've seen in months. Nice job!

@ditshegoralefeta1315 4 years ago

I've been going through your tutorials and I'm so impressed. Legend!!!

@jackpumpunifrimpong-manso6523 4 years ago

Excellent! Words cannot show how grateful I am!

@fabriziomauri9109 4 years ago

Damn, your accent is hypnotic! The explanation is good too!

@hefinrhys8572 4 years ago

Thanks! 😘

@siktrading3117 3 years ago

This tutorial is outstanding. Excellent explanation! Thank you very much!!!

@glenndejucos3891 3 years ago

This video gave a major leap in my study. Thanks.

@0xea31c0 3 years ago

The explanation is just perfect. Thank you.

@nrlzt9443 1 year ago

Really love your explanation! Thank you so much for your video, really helpful and I can understand it! Keep it up! Looking forward to your many more upcoming videos

@lisakaly6371 2 years ago

In fact I found out how to overcome the multicollinearity, by using the eigenvalues of PC1 and PC2! I love PCA!

@elenavlasenko5452 6 years ago

I can say for sure that it's the best explanation I've ever seen!! Keep going, and I would be really grateful if you made one on Time Series and Forecasting :)

@hefinrhys8572 6 years ago

Thanks Elena! Thank you also for the feedback; I may make a video on time series in the future.

@HDgamesFTW 4 years ago

Best explanation I’ve found so far! Thanks mate, legend!

@HDgamesFTW 4 years ago

Uploaded the script as well what a guy

@brunopiato 7 years ago

Great video. Very instructive. Please keep making them

@brunocamargodossantos5049 2 years ago

Thanks for the video, it helped me a lot!! Your explanation is very didactic!

@tankstube09 6 years ago

Very nice tutorial, nicely explained and really complete, looking forward to learn more in R with other of your vids, thank you for the tremendous help!

@hefinrhys8572 6 years ago

Thank you! I'm glad it helped.

@em70171 3 years ago

This is gold. I absolutely love you for this

@chris-qm2tq 2 years ago

Excellent walkthrough. Thank you!

@blackpearlstay 4 years ago

Thank you so much for this SUPER helpful video. (P.S. The explanation with the iris dataset was especially convenient for me as I'm working on a dataset with dozens of recorded plant traits:D)

@andreamonge5025 3 years ago

Thank you so much for the very clear and concise explanation!

@vagabond197979 2 years ago

Added to my stats/math playlist! Very useful.

@arunkumarmallik9091 5 years ago

Thanks for the nice and easy explanation. It really helps me a lot.

@himand11 2 years ago

Thank you so so much!! You just saved the day and helped me really understand my homework for predictive analysis.

@OZ88 4 years ago

Ok so the Sepal.Width contributes mostly over 80% to the PC2 and the other three to PC1 more. 14:32 and so Sepal Width is fair enough as an info to separate setosa in the next plot. Isn't it also advisable to apply pca to linear problems?

@hefinrhys8572 4 years ago

You're correct about the relative contributions of the variables to each principal component. The Setosa species is discriminated from the other two species mainly by PC1, to which Sepal.Width contributes less than the other variables. As PCA is a linear dimension reduction technique, it will best reveal clusters of cases that are linearly separable, but PCA is still a valid and useful approach to compress information, even in situations where this isn't true, or when we don't know about the structures in the data. Non-linear techniques such as t-SNE and UMAP are excellent at revealing non-linearly-separable clusters of cases in data, but interpreting their axes is very difficult/impossible.
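A minimal sketch of how the contributions discussed in this reply can be inspected. It assumes the built-in iris data and a centred, scaled prcomp() call; the variable names pca, loadings and contrib are illustrative and not taken from the video's script.

```r
# Assumption: PCA on the four numeric iris columns, centred and scaled
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Loadings (eigenvectors): one column per principal component
loadings <- pca$rotation

# Squared loadings sum to 1 within each component, so they can be read
# as each variable's share of that component
contrib <- loadings^2
print(round(contrib[, 1:2], 3))
```

Under these assumptions, the PC2 column should be dominated by Sepal.Width, while Sepal.Width has the smallest share of PC1.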

@biochemistry9729 4 years ago

Thank you so much! This is GREAT! You explained very clearly and smoothly.

@rVnikov 7 years ago

Excellent tutorial Hefin. Hooked and subscribed...

@hefinrhys9234 7 years ago

Vesselin Nikov thank you! Feel free to let me know if there are other topics you'd like to see covered.

@florama5210 6 years ago

It is a really nice and clear tutorial! Thanks a lot, Hefin~

@hefinrhys8572 6 years ago

You're welcome Flora! Thank you!

@kasia9904 1 year ago

When I generate the PCA plot with the code explained at 20:46, my legend appears as a gradient rather than as separate values (as in your three different species appearing in red, blue, and green). How can I change this?

@kevinroberts5703 1 year ago

thank you so much for this video. incredibly helpful.

@shafiqullaharyan261 4 years ago

Perfect! Never seen such explanation

@Fan-vk9gx 4 years ago

You are really a life saver! Thank you!

@testchannel5805 4 years ago

Very nice, guys hit the subscribe button, the best explanation so far.

@murambiwanyati3607 2 years ago

Great teacher you are, thanks

@mativillagran1684 4 years ago

thank you so much! you are the best, very clear explanation.

@mustafa_sakalli 4 years ago

Finally understood this goddamn topic! Thank you dude

@timothystewart7300 3 years ago

Fantastic video Hefin! thanks

@SUMITKUMAR-hj8im 4 years ago

a perfect tutorial for PCA... Thank you

@sandal-city-pet-clinic-1 5 years ago

simple and clear. very good

@fatimaelmansouri9338 4 years ago

Super well-explained, thank you!

@aliosmanturgut102 4 years ago

Very informative and clear Thanks.

@harryainsworth6923 4 years ago

this tutorial is slap bang fuckin perfect, god bless you, you magnificent bastard

@hefinrhys8572 4 years ago

😘

@harryainsworth6923 4 years ago

@@hefinrhys8572 stats assignment due in 12 hours and you saved me a lot of hassle

@christianberntsen3856 2 years ago

10:21 - When using "prcomp", the calculation is done by a singular value decomposition. So, these are not actually eigenvectors, right?

@hefinrhys8572 2 years ago

SVD still finds eigenvectors as it's a generalization of eigen-decomposition. This might be useful: web.mit.edu/be.400/www/SVD/Singular_Value_Decomposition.htm
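The point in this reply can be checked numerically. The sketch below assumes the built-in iris data with centring and scaling, and compares the rotation matrix from prcomp() (computed via SVD) with the eigenvectors of the covariance matrix of the same scaled data. The object names are illustrative.

```r
# Assumption: the four numeric iris columns, centred and scaled
x <- scale(iris[, 1:4])

pca <- prcomp(x)        # prcomp() works via a singular value decomposition
eig <- eigen(cov(x))    # explicit eigendecomposition of the covariance matrix

# Eigenvectors are only defined up to sign, so compare absolute values
max_diff_vectors <- max(abs(abs(unname(pca$rotation)) - abs(eig$vectors)))

# Eigenvalues equal the squared standard deviations of the components
max_diff_values <- max(abs(pca$sdev^2 - eig$values))

print(c(vectors = max_diff_vectors, values = max_diff_values))
```

Both differences should be at the level of floating-point error, which is the sense in which the SVD route still yields the eigenvectors.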

@christianberntsen3856 2 years ago

@@hefinrhys8572 Thank you for answering! I will look into it.

@rockcandy28 6 years ago

Hello! Thanks for the video, just a question how would you modify the code if you have NA values? In advance, thank you!

@Badwolf_82 4 years ago

Thank you so much for this tutorial, it really helped me!

@stephravelo 3 years ago

Hi, I wonder if it's possible to put a label on each point? I tried geom_text but I get an error

@hefinrhys8572 3 years ago

Yes you should be able to. What have you tried? If you have a column called names with the label for each point, something like this should work: ggplot(df, aes(PC1, PC2, label = names)) + geom_text(). Or use geom_label() if you prefer. You can also check out the ggrepel package if you have many overlapping points.
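A self-contained sketch of the suggestion in this reply, assuming the built-in iris data and the ggplot2 package. The data frame scores and its names column are hypothetical stand-ins for df and names in the reply; here the labels are just row numbers.

```r
library(ggplot2)

# Assumption: scaled PCA on the four numeric iris columns
pca <- prcomp(iris[, 1:4], scale. = TRUE)

# The PC1, PC2, ... columns used in aes() come from the score matrix pca$x
scores <- as.data.frame(pca$x)

# Hypothetical label column: the row number of each observation, as text
scores$names <- as.character(seq_len(nrow(scores)))

p <- ggplot(scores, aes(PC1, PC2, label = names)) +
  geom_point(colour = "grey60") +
  geom_text(size = 3, vjust = -0.6)   # swap in geom_label() if preferred

print(p)
```

Any character column can be used for the labels; converting pca$x to a data frame first is what makes PC1 and PC2 available to ggplot().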

@stephravelo 3 years ago

@@hefinrhys8572 I have 18 observations and 9 variables which represent my environmental parameters. I successfully produced the ggplot figure. But I wanted to put a label on all the points in the figure to know what variables cluster together. I tried your suggestion but it gives me the numerical value, not the environmental variables. Any other suggestion?

@aminsajid123 2 years ago

Amazing video! Thanks for the explaining everything very simply. Could you please do a video on PLS-DA?

@esterteran2872 4 years ago

Good tutorial!I have learnt a lot. Thanks !

@tonyrobinson9046 1 year ago

Outstanding. Thank you.

@metadelabegaz6279 6 years ago

Sweet baby Jesus. Thank you for making this video!

@hefinrhys8572 6 years ago

You're very welcome!

@alessandrorosati969 2 years ago

How is it possible to generate outliers uniformly in the p-parallelotope defined by the coordinate-wise maxima and minima of the ‘regular’ observations in R?

@DesertHash 4 years ago

At 5:50, don't you mean that if we measured sepal width in kilometers then it would appear LESS important? Because if we measured it in kilometers instead of millimeters, our numerical values will be smaller and vary far less, making it less important in the context of PCA. Thank you for this video.

@hefinrhys8572 4 years ago

Yes, you're absolutely correct! What I meant to say was that if that length was kilometers, but we measured it in millimeters, then it would be given greater importance. But yes, larger values are given greater importance.
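The effect of measurement units described in this exchange can be sketched as follows. The example assumes the built-in iris data (recorded in centimetres) and converts Sepal.Width to millimetres as a smaller-scale stand-in for the kilometres-versus-millimetres case; all object names are illustrative.

```r
x_cm <- iris[, 1:4]
x_mm <- x_cm
x_mm$Sepal.Width <- x_mm$Sepal.Width * 10   # same measurements, now in mm

# Without scaling, the variable with the larger numeric spread dominates
unscaled_cm <- prcomp(x_cm, scale. = FALSE)
unscaled_mm <- prcomp(x_mm, scale. = FALSE)

# Share of PC1 attributable to Sepal.Width (squared loading)
share <- c(
  cm = unscaled_cm$rotation["Sepal.Width", "PC1"]^2,
  mm = unscaled_mm$rotation["Sepal.Width", "PC1"]^2
)
print(round(share, 3))

# With scale. = TRUE the choice of units no longer matters
scaled_cm <- prcomp(x_cm, scale. = TRUE)
scaled_mm <- prcomp(x_mm, scale. = TRUE)
unit_effect <- max(abs(abs(scaled_cm$rotation) - abs(scaled_mm$rotation)))
print(unit_effect)
```

Expressing Sepal.Width in millimetres inflates its variance and therefore its share of PC1, while scaling each variable to unit variance removes the dependence on units.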

@DesertHash 4 years ago

@@hefinrhys8572 Alright, thanks for the reply and for the video!

@blessingtate9387 4 years ago

You "R" AWESOME!!!

@galk32 5 years ago

amazing video, thank you

@kmowl1994 3 years ago

Very helpful, thanks!

@salvatoregiordano2511 4 years ago

Hi Hefin, Thanks for this tutorial. What do we do if PC1 and PC2 can only explain around 50% of the variation? Do we also include PC3 and PC4? If so, how?

@maf4421 3 years ago

Thank you Hefin Rhys for explaining PCA in detail. Can you please explain how to find weights of a variable by PCA for making a composite index? Is it rotation values that are for PC1, PC2, etc.? For example, if I have (I=w1*X+w2*Y+w3*Z) then how to find w1, w2, w3 by PCA.

@stinkbomb13 3 years ago

Error in svd(x, nu = 0, nv = k) : infinite or missing values in 'x' ???

@yayciencia 4 years ago

Thank you! This was very helpful to me

@jackiemwaniki1266 5 years ago

Thanks again. Quick one... Would you mind also doing the Fama and MacBeth analysis without using the KenFrench dataframe?

@Sunny-China3 4 years ago

Very informative video. Can you help me? When I make the last plot with ggplot, I get an error: R says there is no package called digest. Kindly advise on how to deal with it.

@stephaniefaithravelo3510 3 years ago

Hi Hefin, can I put a percentage in the PCA 1 and PC2 in the x and y-axis? How to do that?

@JibHyourinmaru 3 years ago

If my biological data only has numbers(1,2 & 3 digits) and a lot of zeros, do I need to scale also?

@stephaniefaithravelo3510 3 years ago

Hey Hefin, I wonder if you can also do a tutorial of PCA producing triplot graph?

@patriciaamado9897 4 years ago

can I put the loadings scores in the ggplot, as well?

@fsxaviator 2 years ago

Where did you define PC1 and PC2 (where you use them in the ggplot)? I'm getting "Error: object 'PC1' not found"

@AcademicActuary 4 years ago

Great presentation! However, why did you not binarize the categorical variable first, and then do the subsequent analysis? Thanks!

@anjangowdas2541 4 years ago

Thank you, it was very helpful.

@Marinkaasje 3 years ago

I run into this error when running line 17 (in the download file): Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 510, 382. What is going wrong?

@Actanonverba01 5 years ago

Clear and straight forward, good work! Bully for you! Lol

@hellthraser550 4 years ago

How can i input desired fonts and font size in that graph ?

@Jjhukri 6 years ago

Amazing video Hefin, there are a lot of details covered in a 27 min video, we just have to be careful not to miss a second of it. I have some questions: How are the scores calculated for each PC? Why do we have to check the correlation between the variables and PC1 and PC2? What value does it add in practice?

@Orange-xw4lt 4 years ago

Hi, good job but If I have an input data as a wave how can I take and separate the values of the crests starting from a certain threshold?

@rafaeu7904 4 years ago

How can I see the residuals? And correlate with scores

@heartfighters2055 5 years ago

just brilliant

@Emmyb 6 years ago

this video is fab thank you!

@hefinrhys8572 6 years ago

Thank you Emily! Happy dimension reduction!

@rafaborkowski580 2 years ago

How can I upload my data into RStudio to work with ?

@tiberiusjimbo9176 4 years ago

Thank you. Very helpful.

@mohammadtuhinali1430 2 years ago

Many thanks for your efforts to make this complex issue much easier for us. Could you enlighten me on how to understand group similarity and dissimilarity using PCA?

@amggwarrior 4 years ago

Thank you for this very clear video. Question about interpretation: I get just one cluster in my ggplot. What does this mean? That all my variables relate to the same construct (component) and that they can't really be differentiated?

@hefinrhys8572 4 years ago

So when you apply PCA to your own data and plot the first two components, you see just a single cloud of data? This would indicate that you don't have distinct, linearly-separable sub-classes of cases in your dataset. PCA will still compress the majority of the information of your many variables into a smaller number of variables, so even if it doesn't reveal a class structure in your data, it can still be beneficial for dimension reduction.

@amggwarrior 4 years ago

@@hefinrhys8572 thanks for the quick reply. Yes I only see a single cloud. I am not using PCA for dimension reduction - just using it to explore my data before including these variables in a SEM. In particular, I wanted to see if it makes sense to relate these 5 variables to a single latent variable in my SEM. All the loadings for PC1 are 0.7, 0.8, or more, and PC1 captures 0.7 of the variation. Can I take this result as support for considering these 5 variables as part of the same measurement model (linked to the same latent variable) in my SEM? Theoretically it makes sense to, but I wanted to see if the data supported this. I have never done PCA or SEM so no idea if I am doing this right.

@lisakaly6371 2 years ago

Thank you for this great video. Can you show how to detect or treat multicollinearity with PCA? I have a data set with 40 variables with high intercorrelation because of cross-reactivity. VIF and the correlation matrix don't work, probably because of multiple comparisons... :(((

@EV4UTube 3 years ago

Can I confess something that baffles me? Because, I see this all the time. OK, so you, personally, are motivated to share your knowledge with the world, right? I mean, you took time, effort, energy, focus, planning, equipment, software, etc. to prepare this explanation and exercises. You screen-captured it, you set up your microphone, you edited the video, you did all this enormous amount of work. You're clearly motivated. Yet, when it actually comes time to deliver that instruction, you think it is 100% acceptable to place all your code into an absolutely minuscule fraction of the entire screen. Like, pretty-close to 96% of the screen is 'dead-space' from the perspective of the learner. The size of the typeface is minuscule (depending on your viewing system). It would be like producing a major blockbuster film, but then publishing it at the size of a postage stamp. Surely, it would be possible for you to 'zoom-into' that section of the IDE to show people what it was you were typing - the operators, the functions, the arguments, etc. I'm not really picking on you, individually, per se. I see this happen all the time with instructors of every stripe. I have this insane idea that instruction has much, much less to do with the instructor's ability to demonstrate their knowledge to an uninformed person and has much, much more to do with the instructor's ability to 'meet' the student 'where' they are and to carry the student from a place of relative ignorance (about a specific topic) to a place of relative competence. One of the best tools for assessing whether you're meeting that criteria is to PRETEND that you know nothing about the topic - then watch your own video (stripping-out all the assumptions you would automatically make about what is going on based on your existing knowledge). If you didn't have a 48" monitor and excellent eye-sight, would you be able to see what was being written? Like... why would you do that?
If writing of the code IS NOT important - don't bother showing it. If writing of the code IS important, then make it (freaking) visible and legible. This really baffles me. I guess instructors are so "in-their-own-head" when they're delivering content, they don't take time to realize that no one can see what is happening. It just baffles me how often I see this.

@EV4UTube 3 years ago

If 'zooming-in' is not easily achieved, the least instructors could do is go into the preferences of the IDE and jack-up the size of the text so that it would be reasonably legible on a screen typical of, say, a laptop or tablet. It just seems like such a low-hanging fruit, and easy fix to facilitate learning and ensure legibility.

@Pancho96albo 2 years ago

@@EV4UTube chill out dude

@zahrasattari8738 3 years ago

Thanks a lot for a great video. Could you possibly guide me to a source with info on performing cross validation in R after doing PCA on the data? Possibly as clear as yours:) I've been searching and mostly came across guides on how it's performed after doing PLS-DA. I'm preparing a report for a modeling course and am asked to provide (describe and perform) a validation step.

@hefinrhys8572 3 years ago

So cross-validation is only useful for supervised learning modelling, because we have a ground truth to evaluate the model performance against. PCA is an unsupervised algorithm, (it's really just a transformation of the data), and it doesn't make predictions. In contrast, PLS-DA is a supervised classification algorithm. To train it, you need to start with labelled data. So in this case, cross-validation is a useful tool to evaluate model performance, because we can compare the model predictions to the ground truth. Does that make sense?

@zahrasattari8738 3 years ago

@@hefinrhys8572 Sure it does. Thanks a lot! If I'm right PCA is considered modeling when it comes to picking the appropriate number of PCs that best describe the data. And as it is meant to be reported for a course project and I need to include/perform a validation step I am pushing to get somewhere with it without having to include PLS-DA in the whole story.. I guess I can still consider the step with "making sure I have picked the right number of PCs for my model", as a validation step, am I right about this? I came across this link, which I guess is close to what I'm looking for www.r-bloggers.com/2018/10/obtaining-the-number-of-components-from-cross-validation-of-principal-components-regression/

@hefinrhys8572 3 years ago

Ah ok, well if you're using principal components as predictors in a supervised model, then you can use cross-validation to guide the number of components you should include. For example, you train a model with the first 5 principal components and use cross-validation to evaluate the performance of this model, then try the first 4, then 3 and so on. You can pick the number of principal components that gives you the best cross-validation performance. This is essentially a feature selection problem. You can do this manually, or the mlr package in R can help you achieve this by creating a 'wrapped learner'.
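The manual route mentioned in this reply could look roughly like the sketch below. It uses base R only rather than the mlr package, and it makes several illustrative assumptions: the built-in iris data, Petal.Width as the response, the other three measurements as predictors, a linear model, and 5 folds.

```r
set.seed(1)

# Assumptions: illustrative predictors and response taken from iris
predictors <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length")]
response <- iris$Petal.Width

n_folds <- 5
folds <- sample(rep(seq_len(n_folds), length.out = nrow(iris)))

# Cross-validated RMSE when using the first 1, 2 or 3 principal components
cv_rmse <- sapply(1:3, function(n_pc) {
  fold_err <- sapply(seq_len(n_folds), function(k) {
    train <- folds != k
    # Fit the PCA on the training fold only, then project the held-out fold
    pca <- prcomp(predictors[train, ], scale. = TRUE)
    train_df <- data.frame(y = response[train],
                           pca$x[, 1:n_pc, drop = FALSE])
    test_df <- data.frame(
      predict(pca, predictors[!train, ])[, 1:n_pc, drop = FALSE]
    )
    fit <- lm(y ~ ., data = train_df)
    sqrt(mean((response[!train] - predict(fit, test_df))^2))
  })
  mean(fold_err)
})
names(cv_rmse) <- paste0(1:3, "_PCs")

best_n <- which.min(cv_rmse)   # number of components with the lowest error
print(cv_rmse)
```

Fitting the PCA inside each training fold keeps information from the held-out cases out of the transformation, which is the same reason a wrapped learner bundles preprocessing with the model.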

@zahrasattari8738 3 years ago

thanks again.. hope I can get somewhere with it:)

@zahrasattari8738 3 years ago

@@hefinrhys8572 Hi it's me again:) I am using your nice PCA plot code for plotting some data with limited number of samples and the idea is to show they are not grouped based on some variable (which they are not). Then I'd like to show in my plot the sample numbers. This is probably a very basic question, I was wondering how I could insert the sample numbers in the plot. (I'm guessing I should add something to the geom_point line (?))

@abhiagni242 7 years ago

Thanks for the video..helped a lot :)

@hefinrhys9234 7 years ago

ABHI agni Glad it helped :) Feel free to give feedback on other topics that would be useful.

@hoseinmousavi4890 4 years ago

Thanks for your nice job! I have a question. I have a biostatistics dataset. As you said in this video, we do not need to know what our variable for colour grouping is! Actually, I have a problem, and it does not work for me! aes(x = PC1, y = PC2, col = ???) I would really appreciate it if you replied!

@djangoworldwide7925 1 year ago

Great tutorial but it leaves me with the question, what do I do with it? Is this just the beginning of a k-means classification that gives me an idea of the proper k?

@djangoworldwide7925 1 year ago

Lol you just replied in 26:00... Thank you so much!