No one explains R better than Hefin. Give this man a medal already!!
@PhinaLovesMusic 5 years ago
I'm in graduate school and you just explained PCA better than my professor. GOD BLESS YOU!!!!
@HarmonicaTool 2 years ago
A 5-year-old video and still one of the best I've found on the topic on YT. Thumbs up!
@sadian3392 6 years ago
I had listened to several other lectures on this topic, but the pace and the detail covered in this video are simply the best. Please keep up the good work!
@hefinrhys8572 6 years ago
Thanks Sadia! Glad to be of help.
@maitivandenbosch1541 4 years ago
Never seen a tutorial about PCA so clear and simple. Thanks!
@rebecai.m.6670 6 years ago
OMG, this tutorial is perfection, I'm serious. You make it sound so easy and you explain every single step. Also, that is the prettiest plot I've seen. Thank you so much for this.
@hefinrhys8572 6 years ago
You're very welcome! If you like pretty plots, check out my video on using ggplot2 ;) kzbin.info/www/bejne/Z3jQgmh4maabfZY
@Rudblattner 3 years ago
I never comment on videos, but you really saved me here. Nothing was working on my dataset and this went smoothly. Well done on the explanations too; everything was crystal clear.
@WatchMacro16 5 years ago
Finally a perfect tutorial for PCA in RStudio. Thanks mate!
@jackiemwaniki1266 5 years ago
How I came across this video a week before my final-year project due date is a miracle. Thank you so much Hefin Rhys.
@mohamedadow8153 5 years ago
Jackie Mwaniki, what project are you doing?
@jackiemwaniki1266 5 years ago
@@mohamedadow8153 My topic is on macroeconomic factors and stock prices, using the APT framework.
@user-kb6ui2sh5v a year ago
Really useful video, thank you. I've just started my MSc project using PCA, so thank you for this. I will be following subsequent videos.
@chinmoysarangi9399 4 years ago
I have my exam in 2 days and your video saved me tons of effort in combing through so many other articles and videos explaining PCA. A BIG thank you! Hope you do many more videos and impart your knowledge to newbies like me. :)
@timisoutdoors 4 years ago
Quite literally, the best tutorial I've ever seen on an advanced multivariate topic. Job well done, sir!
@shantanutamuly6932 4 years ago
Excellent tutorial. I have used this for the analysis of my research. Thanks a lot for sharing your valuable knowledge.
@Axle_Tavish 2 years ago
Explained everything one might need. If only every tutorial on KZbin were like this one!
@tylerripku8222 3 years ago
The best run through I've seen for using and understanding PCA.
@johnkaruitha2527 4 years ago
Great help; I've been doing my own work following this tutorial step by step... the whole night.
@johnmandrake8829 3 years ago
It's so funny, I don't think you realize, but "myPR" ("my pyaar") in Urdu/Hindi means "my love". Thank you for an amazing and extremely helpful video.
@lilmune 4 years ago
In all honesty this is the best tutorial I've seen in months. Nice job!
@ditshegoralefeta1315 4 years ago
I've been going through your tutorials and I'm so impressed. Legend!!!
@jackpumpunifrimpong-manso6523 4 years ago
Excellent! Words cannot show how grateful I am!
@fabriziomauri9109 4 years ago
Damn, your accent is hypnotic! The explanation is good too!
@hefinrhys8572 4 years ago
Thanks! 😘
@siktrading3117 3 years ago
This tutorial is outstanding. Excellent explanation! Thank you very much!!!
@glenndejucos3891 3 years ago
This video gave my study a major leap forward. Thanks.
@0xea31c0 3 years ago
The explanation is just perfect. Thank you.
@nrlzt9443 a year ago
Really love your explanation! Thank you so much for your video; it's really helpful and I can understand it! Keep it up! Looking forward to your many more upcoming videos.
@lisakaly6371 2 years ago
In fact I found out how to overcome the multicollinearity, by using the eigenvalues of PC1 and PC2! I love PCA!
@elenavlasenko5452 6 years ago
I can say for sure that it's the best explanation I've ever seen!! Go on, and I would be really grateful if you made one on Time Series and Forecasting :)
@hefinrhys8572 6 years ago
Thanks Elena! Thank you also for the feedback; I may make a video on time series in the future.
@HDgamesFTW 4 years ago
Best explanation I’ve found so far! Thanks mate, legend!
@HDgamesFTW 4 years ago
Uploaded the script as well, what a guy.
@brunopiato 7 years ago
Great video. Very instructive. Please keep making them!
@brunocamargodossantos5049 2 years ago
Thanks for the video, it helped me a lot!! Your explanation is very didactic!
@tankstube09 6 years ago
Very nice tutorial, nicely explained and really complete. Looking forward to learning more in R with your other vids; thank you for the tremendous help!
@hefinrhys8572 6 years ago
Thank you! I'm glad it helped.
@em70171 3 years ago
This is gold. I absolutely love you for this
@chris-qm2tq 2 years ago
Excellent walkthrough. Thank you!
@blackpearlstay 4 years ago
Thank you so much for this SUPER helpful video. (P.S. The explanation with the iris dataset was especially convenient for me, as I'm working on a dataset with dozens of recorded plant traits :D)
@andreamonge5025 3 years ago
Thank you so much for the very clear and concise explanation!
@vagabond197979 2 years ago
Added to my stats/math playlist! Very useful.
@arunkumarmallik9091 5 years ago
Thanks for the nice and easy explanation. It really helped me a lot.
@himand11 2 years ago
Thank you so so much!! You just saved the day and helped me really understand my homework for predictive analysis.
@OZ88 4 years ago
OK, so Sepal.Width contributes mostly (over 80%) to PC2, and the other three variables contribute more to PC1 (14:32), so Sepal.Width is fair enough as information to separate setosa in the next plot. Isn't it also advisable to apply PCA only to linear problems?
@hefinrhys8572 4 years ago
You're correct about the relative contributions of the variables to each principal component. The setosa species is discriminated from the other two species mainly by PC1, to which Sepal.Width contributes less than the other variables. As PCA is a linear dimension reduction technique, it will best reveal clusters of cases that are linearly separable, but PCA is still a valid and useful approach to compress information, even in situations where this isn't true, or when we don't know about the structures in the data. Non-linear techniques such as t-SNE and UMAP are excellent at revealing non-linearly-separable clusters of cases in data, but interpreting their axes is very difficult/impossible.
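If you want to check those contributions yourself, here's a minimal sketch (assuming the same iris data used in the video):
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
pca$rotation  # loadings: each column shows how strongly each variable contributes to that PC
summary(pca)  # proportion of variance explained by each component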
@biochemistry9729 4 years ago
Thank you so much! This is GREAT! You explained it very clearly and smoothly.
@rVnikov 7 years ago
Excellent tutorial Hefin. Hooked and subscribed...
@hefinrhys9234 7 years ago
Vesselin Nikov thank you! Feel free to let me know if there are other topics you'd like to see covered.
@florama5210 6 years ago
It is a really nice and clear tutorial! Thanks a lot, Hefin~
@hefinrhys8572 6 years ago
You're welcome Flora! Thank you!
@kasia9904 a year ago
When I generate the PCA plot with the code explained at 20:46, my legend appears as a gradient rather than as separate values (as in your three different species appearing in red, blue, green). How can I change this?
@kevinroberts5703 a year ago
Thank you so much for this video. Incredibly helpful.
@shafiqullaharyan261 4 years ago
Perfect! Never seen such a clear explanation.
@Fan-vk9gx 4 years ago
You are really a life saver! Thank you!
@testchannel5805 4 years ago
Very nice. Guys, hit the subscribe button; this is the best explanation so far.
@murambiwanyati3607 2 years ago
Great teacher you are, thanks
@mativillagran1684 4 years ago
Thank you so much! You are the best; very clear explanation.
@mustafa_sakalli 4 years ago
Finally understood this goddamn topic! Thank you dude
@timothystewart7300 3 years ago
Fantastic video Hefin! Thanks.
@SUMITKUMAR-hj8im 4 years ago
A perfect tutorial for PCA... Thank you.
@sandal-city-pet-clinic-1 5 years ago
Simple and clear. Very good.
@fatimaelmansouri9338 4 years ago
Super well-explained, thank you!
@aliosmanturgut102 4 years ago
Very informative and clear. Thanks.
@harryainsworth6923 4 years ago
this tutorial is slap bang fuckin perfect, god bless you, you magnificent bastard
@hefinrhys8572 4 years ago
😘
@harryainsworth6923 4 years ago
@@hefinrhys8572 stats assignment due in 12 hours and you saved me a lot of hassle
@christianberntsen3856 2 years ago
10:21 - When using "prcomp", the calculation is done by a singular value decomposition. So, these are not actually eigenvectors, right?
@hefinrhys8572 2 years ago
SVD still finds eigenvectors, as it's a generalization of eigen-decomposition: the right singular vectors of the centred data matrix are exactly the eigenvectors of its covariance matrix. This might be useful: web.mit.edu/be.400/www/SVD/Singular_Value_Decomposition.htm
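If you want to convince yourself numerically, a quick sketch (using iris; the columns agree up to arbitrary sign flips):
x <- scale(iris[, 1:4])  # centre and scale the data
pca <- prcomp(x)
ev <- eigen(cov(x))$vectors  # eigenvectors of the covariance matrix
round(abs(pca$rotation) - abs(ev), 10)  # all ~0: the right singular vectors are those eigenvectors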
@christianberntsen3856 2 years ago
@@hefinrhys8572 Thank you for answering! I will look into it.
@rockcandy28 6 years ago
Hello! Thanks for the video. Just a question: how would you modify the code if you have NA values? Thank you in advance!
@Badwolf_82 4 years ago
Thank you so much for this tutorial, it really helped me!
@stephravelo 3 years ago
Hi, I wonder if it's possible to put a label on each point? I tried geom_text but I get an error.
@hefinrhys8572 3 years ago
Yes, you should be able to. What have you tried? If you have a column called names with the label for each point, something like this should work:
ggplot(df, aes(PC1, PC2, label = names)) + geom_text()
Or use geom_label() if you prefer. You can also check out the ggrepel package if you have many overlapping points.
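For instance, a minimal sketch of the ggrepel approach (df and its names column here are just placeholders for your own data):
library(ggplot2)
library(ggrepel)
ggplot(df, aes(PC1, PC2, label = names)) +
  geom_point() +
  geom_text_repel()  # nudges overlapping labels apart automatically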
@stephravelo 3 years ago
@@hefinrhys8572 I have 18 observations and 9 variables which represent my environmental parameters. I successfully produced the ggplot figure, but I wanted to put a label on all the points in the figure to know which variables cluster together. I tried your suggestion, but it gives me the numerical value, not the environmental variables. Any other suggestion?
@aminsajid123 2 years ago
Amazing video! Thanks for explaining everything very simply. Could you please do a video on PLS-DA?
@esterteran2872 4 years ago
Good tutorial! I have learnt a lot. Thanks!
@tonyrobinson9046 a year ago
Outstanding. Thank you.
@metadelabegaz6279 6 years ago
Sweet baby Jesus. Thank you for making this video!
@hefinrhys8572 6 years ago
You're very welcome!
@alessandrorosati969 2 years ago
How is it possible to generate outliers uniformly in the p-parallelotope defined by the coordinate-wise maxima and minima of the 'regular' observations in R?
@DesertHash 4 years ago
At 5:50, don't you mean that if we measured sepal width in kilometers then it would appear LESS important? Because if we measured it in kilometers instead of millimeters, our numerical values will be smaller and vary far less, making it less important in the context of PCA. Thank you for this video.
@hefinrhys8572 4 years ago
Yes, you're absolutely correct! What I meant to say was that if that length was kilometers, but we measured it in millimeters, then it would be given greater importance. But yes, larger values are given greater importance.
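You can see the effect directly with a quick sketch (iris again; this comparison isn't from the video):
summary(prcomp(iris[, 1:4], scale. = FALSE))  # unscaled: variables measured in bigger numbers dominate PC1
summary(prcomp(iris[, 1:4], scale. = TRUE))   # scaled: every variable contributes on an equal footing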
@DesertHash 4 years ago
@@hefinrhys8572 Alright, thanks for the reply and for the video!
@blessingtate9387 4 years ago
You "R" AWESOME!!!
@galk32 5 years ago
Amazing video, thank you!
@kmowl1994 3 years ago
Very helpful, thanks!
@salvatoregiordano2511 4 years ago
Hi Hefin, thanks for this tutorial. What do we do if PC1 and PC2 can only explain around 50% of the variation? Do we also include PC3 and PC4? If so, how?
@maf4421 3 years ago
Thank you Hefin Rhys for explaining PCA in detail. Can you please explain how to find the weights of a variable by PCA for making a composite index? Is it the rotation values for PC1, PC2, etc.? For example, if I have I = w1*X + w2*Y + w3*Z, then how do I find w1, w2, w3 by PCA?
@stinkbomb13 3 years ago
Error in svd(x, nu = 0, nv = k) : infinite or missing values in 'x' ???
@yayciencia 4 years ago
Thank you! This was very helpful to me
@jackiemwaniki1266 5 years ago
Thanks again. Quick one... would you mind also doing the Fama-MacBeth analysis without using the Ken French data frame?
@Sunny-China3 4 years ago
Very informative video. Can you tell me: when I'm plotting the last ggplot, it showed an error; R said there is no package called 'digest'. How do I deal with it? Kindly advise.
@stephaniefaithravelo3510 3 years ago
Hi Hefin, can I put the percentage of variance explained for PC1 and PC2 on the x- and y-axes? How do I do that?
@JibHyourinmaru 3 years ago
If my biological data only has numbers (1, 2 & 3 digits) and a lot of zeros, do I also need to scale?
@stephaniefaithravelo3510 3 years ago
Hey Hefin, I wonder if you can also do a tutorial on PCA producing a triplot graph?
@patriciaamado9897 4 years ago
Can I put the loading scores in the ggplot as well?
@fsxaviator 2 years ago
Where did you define PC1 and PC2 (where you use them in the ggplot)? I'm getting "Error: object 'PC1' not found"
@AcademicActuary 4 years ago
Great presentation! However, why did you not binarize the categorical variable first, and then do the subsequent analysis? Thanks!
@anjangowdas2541 4 years ago
Thank you, it was very helpful.
@Marinkaasje 3 years ago
I run into an error when running line 17 (in the download file): Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 510, 382. What is going wrong?
@Actanonverba01 5 years ago
Clear and straightforward, good work! Bully for you! Lol
@hellthraser550 4 years ago
How can I set the desired font and font size in that graph?
@Jjhukri 6 years ago
Amazing video Hefin; there are a lot of details covered in a 27-minute video, so we just have to be careful not to miss a second of it. I have a question: how are the scores calculated for each PC? Why do we have to check the correlation between the variables and PC1 & PC2? What value does it add practically?
@Orange-xw4lt 4 years ago
Hi, good job! But if my input data is a wave, how can I extract and separate the values of the crests above a certain threshold?
@rafaeu7904 4 years ago
How can I see the residuals, and correlate them with the scores?
@heartfighters2055 5 years ago
just brilliant
@Emmyb 6 years ago
This video is fab, thank you!
@hefinrhys8572 6 years ago
Thank you Emily! Happy dimension reduction!
@rafaborkowski580 2 years ago
How can I load my data into RStudio to work with?
@tiberiusjimbo9176 4 years ago
Thank you. Very helpful.
@mohammadtuhinali1430 2 years ago
Many thanks for your efforts to make this complex issue much easier for us. Could you enlighten me on understanding group similarity and dissimilarity using PCA?
@amggwarrior 4 years ago
Thank you for this very clear video. Question about interpretation: I get just one cluster in my ggplot; what does this mean? That all my variables relate to the same construct (component) and that they can't really be differentiated?
@hefinrhys8572 4 years ago
So when you apply PCA to your own data and plot the first two components, you see just a single cloud of data? This would indicate that you don't have distinct, linearly-separable sub-classes of cases in your dataset. PCA will still compress the majority of the information of your many variables into a smaller number of variables, so even if it doesn't reveal a class structure in your data, it can still be beneficial for dimension reduction.
@amggwarrior 4 years ago
@@hefinrhys8572 Thanks for the quick reply. Yes, I only see a single cloud. I am not using PCA for dimension reduction, just using it to explore my data before including these variables in a SEM. In particular, I wanted to see if it makes sense to relate these 5 variables to a single latent variable in my SEM. All the loadings for PC1 are 0.7 or 0.8 or more, and PC1 captures 0.7 of the variation. Can I take this result as support for considering these 5 variables part of the same measurement model (linked to the same latent variable) in my SEM? Theoretically it makes sense, but I wanted to see if the data supported this. I have never done PCA or SEM, so no idea if I am doing this right.
@lisakaly6371 2 years ago
Thank you for this great video. Can you show how to detect or treat multicollinearity with PCA? I have a data set with 40 variables with high intercorrelation because of cross-reactivity. VIF and the correlation matrix don't work, probably because of multiple comparisons... :(((
@EV4UTube 3 years ago
Can I confess something that baffles me? Because I see this all the time. You, personally, are motivated to share your knowledge with the world, right? You took the time, effort, energy, focus, planning, equipment, software, etc. to prepare this explanation and these exercises. You screen-captured it, you set up your microphone, you edited the video; you did an enormous amount of work. You're clearly motivated. Yet when it actually comes time to deliver that instruction, you think it is 100% acceptable to place all your code into a minuscule fraction of the screen. Close to 96% of the screen is dead space from the perspective of the learner, and the typeface is tiny (depending on your viewing system). It's like producing a major blockbuster film and then publishing it at the size of a postage stamp. Surely it would be possible to zoom into that section of the IDE to show people what you were typing: the operators, the functions, the arguments, etc. I'm not really picking on you individually; I see this happen all the time with instructors of every stripe. Instruction has much less to do with an instructor's ability to demonstrate their knowledge to an uninformed person, and much more to do with their ability to meet students where they are and carry them from relative ignorance (about a specific topic) to relative competence. One of the best tools for assessing whether you're meeting that criterion is to PRETEND you know nothing about the topic, then watch your own video, stripping out all the assumptions you would automatically make based on your existing knowledge. If you didn't have a 48" monitor and excellent eyesight, would you be able to see what was being written? If writing the code is NOT important, don't bother showing it. If writing the code IS important, then make it (freaking) visible and legible. It baffles me how often I see this.
@EV4UTube 3 years ago
If zooming in is not easily achieved, the least instructors could do is go into the IDE preferences and jack up the text size so that it would be reasonably legible on a typical laptop or tablet screen. It just seems like such low-hanging fruit: an easy fix to facilitate learning and ensure legibility.
@Pancho96albo 2 years ago
@@EV4UTube chill out dude
@zahrasattari8738 3 years ago
Thanks a lot for a great video. Could you possibly guide me to a source with info on performing cross-validation in R after doing PCA on the data? Possibly as clear as yours :) I've been searching and have mostly come across guides on how it's performed after PLS-DA. I'm preparing a report for a modeling course and am asked to provide (describe and perform) a validation step.
@hefinrhys8572 3 years ago
So cross-validation is only useful for supervised learning modelling, because we have a ground truth to evaluate the model performance against. PCA is an unsupervised algorithm, (it's really just a transformation of the data), and it doesn't make predictions. In contrast, PLS-DA is a supervised classification algorithm. To train it, you need to start with labelled data. So in this case, cross-validation is a useful tool to evaluate model performance, because we can compare the model predictions to the ground truth. Does that make sense?
@zahrasattari8738 3 years ago
@@hefinrhys8572 Sure it does. Thanks a lot! If I'm right, PCA is considered modeling when it comes to picking the appropriate number of PCs that best describe the data. And as it is meant to be reported for a course project and I need to include/perform a validation step, I am pushing to get somewhere with it without having to include PLS-DA in the whole story. I guess I can still consider the step of "making sure I have picked the right number of PCs for my model" as a validation step; am I right about this? I came across this link, which I guess is close to what I'm looking for: www.r-bloggers.com/2018/10/obtaining-the-number-of-components-from-cross-validation-of-principal-components-regression/
@hefinrhys8572 3 years ago
Ah ok, well if you're using principal components as predictors in a supervised model, then you can use cross-validation to guide the number of components you should include. For example, you train a model with the first 5 principal components and use cross-validation to evaluate the performance of this model, then try the first 4, then 3, and so on. You can pick the number of principal components that gives you the best cross-validation performance. This is essentially a feature selection problem. You can do this manually, or the mlr package in R can help you achieve this by creating a 'wrapped learner'.
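As a rough illustration of the manual approach, a minimal sketch (not from the video; it predicts Sepal.Length in iris from PCs of the other three measurements, with an arbitrary 5-fold split):
set.seed(1)
x <- as.matrix(iris[, 2:4])
y <- iris$Sepal.Length
folds <- sample(rep(1:5, length.out = nrow(x)))  # random fold assignment
cv_mse <- sapply(1:3, function(k) {
  mean(sapply(1:5, function(f) {
    pca <- prcomp(x[folds != f, ], scale. = TRUE)  # fit PCA on the training fold only
    train <- data.frame(pca$x[, 1:k, drop = FALSE], y = y[folds != f])
    test <- as.data.frame(predict(pca, x[folds == f, ])[, 1:k, drop = FALSE])
    fit <- lm(y ~ ., data = train)
    mean((y[folds == f] - predict(fit, test))^2)  # test-fold mean squared error
  }))
})
which.min(cv_mse)  # number of PCs with the lowest cross-validated error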
@zahrasattari8738 3 years ago
Thanks again... hope I can get somewhere with it :)
@zahrasattari8738 3 years ago
@@hefinrhys8572 Hi, it's me again :) I am using your nice PCA plot code to plot some data with a limited number of samples, and the idea is to show they are not grouped based on some variable (which they are not). Then I'd like to show the sample numbers in my plot. This is probably a very basic question: I was wondering how I could insert the sample numbers into the plot. (I'm guessing I should add something to the geom_point line?)
@abhiagni242 7 years ago
Thanks for the video... helped a lot :)
@hefinrhys9234 7 years ago
ABHI agni Glad it helped :) Feel free to give feedback on other topics that would be useful.
@hoseinmousavi4890 4 years ago
Thanks for your nice job! I have a question. I have biostat data. As you said in this video, we do not need to know what our variable for colour grouping is! Actually, I have a problem and it does not work for me: aes(x = PC1, y = PC2, col = ???). I'd really appreciate it if you replied!
@djangoworldwide7925 a year ago
Great tutorial, but it leaves me with the question: what do I do with it? Is this just the beginning of a k-means clustering that gives me an idea of the proper k?
@djangoworldwide7925 a year ago
Lol, you just answered it at 26:00... Thank you so much!