Video tutorial on running principal components analysis (PCA) in R with RStudio. Please view in HD (cog in bottom right corner). Download the R script here: drive.google.com/open?id=1tbi...
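For readers following along without the video, the core call can be sketched in a few lines of base R. This is a minimal sketch that assumes the built-in iris data and the myPr object name that comes up in the comments. The linked script itself is not reproduced here, so details may differ from the video.

```r
# Minimal PCA on R's built-in iris data (base R only)
data(iris)

# Drop column 5 (Species): PCA works on numeric variables only.
# scale. = TRUE standardises every variable to unit variance first.
myPr <- prcomp(iris[, -5], center = TRUE, scale. = TRUE)

summary(myPr)      # variance explained by each component
myPr$rotation      # loadings: one column per principal component
head(myPr$x)       # scores: each flower's coordinates on the components
```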
Comments: 209
@sadian3392 · 6 years ago
I had listened to several other lectures on this topic, but the pace and the detail covered in this video are simply the best. Please keep up the good work!
@hefinrhys8572 · 6 years ago
Thanks Sadia! Glad to be of help.
@rebecai.m.6670 · 6 years ago
OMG, this tutorial is perfection, I'm serious. You make it sound so easy and you explain every single step. Also, that is the prettiest plot I've seen. Thank you so much for this.
@hefinrhys8572 · 6 years ago
You're very welcome! If you like pretty plots, check out my video on using ggplot2 ;) kzbin.info/www/bejne/Z3jQgmh4maabfZY
@HarmonicaTool · 2 years ago
A 5-year-old video and still one of the best I've found on the topic on YT. Thumbs up
@maitivandenbosch1541 · 4 years ago
Never seen a tutorial about PCA so clear and simple. Thanks
@vplougoboy · 3 years ago
No one explains R better than Hefin. Give this man a medal already!!
@user-kb6ui2sh5v · 1 year ago
Really useful video, thank you. I've just started my MSc project using PCA, so thank you for this. I will be following subsequent videos.
@WatchMacro16 · 5 years ago
Finally a perfect tutorial for PCA in RStudio. Thanks mate!
@Rudblattner · 3 years ago
I never comment on videos, but you really saved me here. Nothing was working on my dataset and this went smoothly. Well done on the explanations too; everything was crystal clear.
@PhinaLovesMusic · 5 years ago
I'm in graduate school and you just explained PCA better than my professor. GOD BLESS YOU!!!!
@jackiemwaniki1266 · 4 years ago
How I came across this video a week before my final-year project due date is a miracle. Thank you so much Hefin Rhys.
@mohamedadow8153 · 4 years ago
Jackie Mwaniki, what is your project on?
@jackiemwaniki1266 · 4 years ago
@@mohamedadow8153 my topic is on macroeconomic factors and stock prices using the APT framework.
@Axle_Tavish · 2 years ago
Explained everything one might need. If only every tutorial on KZbin were like this one!
@chinmoysarangi9399 · 4 years ago
I have my exam in 2 days and your video saved me tons of effort combing through so many other articles and videos explaining PCA. A BIG thank you! Hope you do many more videos and impart your knowledge to newbies like me. :)
@timisoutdoors · 4 years ago
Quite literally the best tutorial I've ever seen on an advanced multivariate topic. Job well done, sir!
@tylerripku8222 · 3 years ago
The best run-through I've seen for using and understanding PCA.
@shantanutamuly6932 · 4 years ago
Excellent tutorial. I have used this for the analysis in my research. Thanks a lot for sharing your valuable knowledge.
@johnkaruitha2527 · 3 years ago
Great help; I've been doing my own work following this tutorial step by step... the whole night.
@jackpumpunifrimpong-manso6523 · 4 years ago
Excellent! Words cannot show how grateful I am!
@nrlzt9443 · 1 year ago
Really love your explanation! Thank you so much for your video; it's really helpful and I can understand it! Keep it up! Looking forward to your many more upcoming videos.
@ditshegoralefeta1315 · 4 years ago
I've been going through your tutorials and I'm so impressed. Legend!!!
@chris-qm2tq · 1 year ago
Excellent walkthrough. Thank you!
@lilmune · 4 years ago
In all honesty this is the best tutorial I've seen in months. Nice job!
@brunocamargodossantos5049 · 2 years ago
Thanks for the video, it helped me a lot!! Your explanation is very didactic!
@fabriziomauri9109 · 4 years ago
Damn, your accent is hypnotic! The explanation is good too!
@hefinrhys8572 · 4 years ago
Thanks! 😘
@HDgamesFTW · 4 years ago
Best explanation I've found so far! Thanks mate, legend!
@HDgamesFTW · 4 years ago
Uploaded the script as well, what a guy!
@siktrading3117 · 2 years ago
This tutorial is outstanding. Excellent explanation! Thank you very much!!!
@0xea31c0 · 2 years ago
The explanation is just perfect. Thank you.
@glenndejucos3891 · 3 years ago
This video gave my study a major leap forward. Thanks.
@lisakaly6371 · 1 year ago
In fact, I found out how to overcome the multicollinearity by using the eigenvalues of PC1 and PC2! I love PCA!
@brunopiato · 6 years ago
Great video. Very instructive. Please keep making them.
@vagabond197979 · 1 year ago
Added to my stats/math playlist! Very useful.
@em70171 · 3 years ago
This is gold. I absolutely love you for this.
@andreamonge5025 · 2 years ago
Thank you so much for the very clear and concise explanation!
@elenavlasenko5452 · 6 years ago
I can say for sure that it's the best explanation I've ever seen!! Keep going, and I would be really grateful if you made one on time series and forecasting :)
@hefinrhys8572 · 6 years ago
Thanks Elena! Thank you also for the feedback; I may make a video on time series in the future.
@tankstube09 · 6 years ago
Very nice tutorial, nicely explained and really complete. Looking forward to learning more R from your other videos; thank you for the tremendous help!
@hefinrhys8572 · 6 years ago
Thank you! I'm glad it helped.
@kevinroberts5703 · 1 year ago
Thank you so much for this video. Incredibly helpful.
@murambiwanyati3607 · 2 years ago
Great teacher you are, thanks
@johnmandrake8829 · 3 years ago
It's so funny, I don't think you realize, but myPR ("my pyaar") in Urdu/Hindi means "my love". Thank you for an amazing and extremely helpful video.
@harryainsworth6923 · 4 years ago
this tutorial is slap bang fuckin perfect, god bless you, you magnificent bastard
@hefinrhys8572 · 4 years ago
😘
@harryainsworth6923 · 4 years ago
@@hefinrhys8572 stats assignment due in 12 hours and you saved me a lot of hassle
@himand11 · 2 years ago
Thank you so so much!! You just saved the day and helped me really understand my homework for predictive analysis.
@florama5210 · 6 years ago
It is a really nice and clear tutorial! Thanks a lot, Hefin~
@hefinrhys8572 · 6 years ago
You're welcome Flora! Thank you!
@blackpearlstay · 3 years ago
Thank you so much for this SUPER helpful video. (P.S. The explanation with the iris dataset was especially convenient for me, as I'm working on a dataset with dozens of recorded plant traits :D)
@biochemistry9729 · 4 years ago
Thank you so much! This is GREAT! You explained very clearly and smoothly.
@arunkumarmallik9091 · 4 years ago
Thanks for the nice and easy explanation. It really helps me a lot.
@mustafa_sakalli · 3 years ago
Finally understood this goddamn topic! Thank you dude
@shafiqullaharyan261 · 4 years ago
Perfect! Never seen such an explanation.
@rVnikov · 6 years ago
Excellent tutorial Hefin. Hooked and subscribed...
@hefinrhys9234 · 6 years ago
Vesselin Nikov, thank you! Feel free to let me know if there are other topics you'd like to see covered.
@Fan-vk9gx · 3 years ago
You are really a life saver! Thank you!
@tonyrobinson9046 · 1 year ago
Outstanding. Thank you.
@sandal-city-pet-clinic-1 · 5 years ago
Simple and clear. Very good.
@timothystewart7300 · 3 years ago
Fantastic video Hefin! Thanks
@blessingtate9387 · 4 years ago
You "R" AWESOME!!!
@mativillagran1684 · 4 years ago
Thank you so much! You are the best, very clear explanation.
@testchannel5805 · 4 years ago
Very nice. Guys, hit the subscribe button; the best explanation so far.
@Jjhukri · 5 years ago
Amazing video Hefin, there are a lot of details covered in this 27-minute video; we just have to be careful not to miss a second of it. I have a question: how are the scores calculated for each PC? Why do we have to check the correlation between the variables and PC1 and PC2? What value does it add in practice?
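On the second question: correlating each original variable with a component shows how strongly that variable drives the component. That is what lets you interpret, or name, the axes. The sketch below assumes prcomp() on the scaled iris measurements. For standardised data, these correlations equal each loading multiplied by that component's standard deviation. They therefore carry the same information as the rotation matrix, just on a correlation scale.

```r
myPr <- prcomp(iris[, -5], center = TRUE, scale. = TRUE)

# Correlation of each original variable with the first two PC scores
var_pc_cor <- cor(iris[, -5], myPr$x[, 1:2])
round(var_pc_cor, 2)

# For standardised input: cor(variable j, PC k) = rotation[j, k] * sdev[k]
implied <- sweep(myPr$rotation[, 1:2], 2, myPr$sdev[1:2], `*`)
all.equal(unname(var_pc_cor), unname(implied))
```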
@kmowl1994 · 2 years ago
Very helpful, thanks!
@fatimaelmansouri9338 · 3 years ago
Super well-explained, thank you!
@SUMITKUMAR-hj8im · 4 years ago
A perfect tutorial for PCA... Thank you
@metadelabegaz6279 · 6 years ago
Sweet baby Jesus. Thank you for making this video!
@hefinrhys8572 · 6 years ago
You're very welcome!
@aliosmanturgut102 · 3 years ago
Very informative and clear. Thanks.
@mario17-t34 · 2 years ago
Thanks much Hefin!!!
@Actanonverba01 · 5 years ago
Clear and straightforward, good work! Bully for you! Lol
@galk32 · 5 years ago
Amazing video, thank you
@esterteran2872 · 3 years ago
Good tutorial! I have learnt a lot. Thanks!
@Badwolf_82 · 3 years ago
Thank you so much for this tutorial, it really helped me!
@heartfighters2055 · 5 years ago
Just brilliant
@OZ88 · 4 years ago
OK, so Sepal.Width contributes over 80% to PC2, and the other three variables contribute more to PC1 (14:32). So Sepal.Width is informative enough to separate setosa in the next plot. Isn't it also advisable to apply PCA to linear problems?
@hefinrhys8572 · 4 years ago
You're correct about the relative contributions of the variables to each principal component. The setosa species is discriminated from the other two species mainly by PC1, to which Sepal.Width contributes less than the other variables. As PCA is a linear dimension reduction technique, it will best reveal clusters of cases that are linearly separable, but PCA is still a valid and useful approach to compress information, even in situations where this isn't true, or when we don't know about the structures in the data. Non-linear techniques such as t-SNE and UMAP are excellent at revealing non-linearly-separable clusters of cases in data, but interpreting their axes is very difficult or impossible.
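The "contribution" figures discussed here are commonly computed as squared loadings. Each column of myPr$rotation has unit length, so its squared entries sum to 1 and can be read as proportions. The sketch below assumes that convention, which is one common definition and not necessarily the one used in the video.

```r
myPr <- prcomp(iris[, -5], center = TRUE, scale. = TRUE)

# Percentage contribution of each variable to each component
contrib <- 100 * myPr$rotation^2
round(contrib[, 1:2], 1)

# Sepal.Width: a small share of PC1, the dominant share of PC2
contrib["Sepal.Width", c("PC1", "PC2")]
```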
@yayciencia · 3 years ago
Thank you! This was very helpful to me
@rockcandy28 · 5 years ago
Hello! Thanks for the video. Just a question: how would you modify the code if you have NA values? Thank you in advance!
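The simplest option is to drop incomplete rows before calling prcomp(), because its default matrix interface fails when the data contain NA values. The sketch below injects two hypothetical missing values into iris to show the idea. If dropping rows would lose too much data, imputing the missing values first is an alternative, not shown here.

```r
# Hypothetical example: introduce two missing values
iris_na <- iris
iris_na[c(3, 10), "Sepal.Length"] <- NA

# Keep only rows with no missing measurements
complete_rows <- complete.cases(iris_na[, -5])
myPr <- prcomp(iris_na[complete_rows, -5], center = TRUE, scale. = TRUE)

# Subset the grouping variable the same way so plot colours stay aligned
species_kept <- iris_na$Species[complete_rows]

nrow(myPr$x)  # 148 rows remain
```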
@andrewh8747 · 2 years ago
Fantastic!
@Emmyb · 6 years ago
This video is fab, thank you!
@hefinrhys8572 · 6 years ago
Thank you Emily! Happy dimension reduction!
@aminsajid123 · 2 years ago
Amazing video! Thanks for explaining everything very simply. Could you please do a video on PLS-DA?
@maf4421 · 3 years ago
Thank you Hefin Rhys for explaining PCA in detail. Can you please explain how to find the weights of variables with PCA for making a composite index? Are they the rotation values for PC1, PC2, etc.? For example, if I have I = w1*X + w2*Y + w3*Z, how do I find w1, w2 and w3 by PCA?
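One common answer, though not the only defensible one, is to use the PC1 loadings (the first column of myPr$rotation) as w1, w2, w3, ... applied to the standardised variables. The index is then identical to the PC1 scores. The sketch below uses iris in place of your own variables. The names std_vars, w and index are illustrative.

```r
myPr <- prcomp(iris[, -5], center = TRUE, scale. = TRUE)

# Weights = PC1 loadings. The sign of a component is arbitrary, so you may
# want to flip all weights so that higher index values mean "more".
w <- myPr$rotation[, "PC1"]

# Composite index: weighted sum of the standardised variables
std_vars <- scale(iris[, -5])
index <- as.vector(std_vars %*% w)

# The index reproduces the PC1 scores exactly
all.equal(index, unname(myPr$x[, "PC1"]))
```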
@anjangowdas2541 · 3 years ago
Thank you, it was very helpful.
@tiberiusjimbo9176 · 3 years ago
Thank you. Very helpful.
@lindseykoper761 · 2 years ago
Thank you so much for your videos!! Your videos are the best I have seen, hands down :) All of your explanations and step-by-step walkthroughs in R are what I needed for my research. One area I am having trouble with (since I am not a statistician) is making sure I run my data through all the necessary statistical tests before running the PCA. My data are similar to the iris dataset (skull measurements categorized by family and subfamily), but I am seeing different sources run different tests before the PCA (ANOVA vs non-parametric tests). If anything, would you be able to recommend some good sources for me to refer to? Thank you! I really appreciate it!
@salvatoregiordano2511 · 3 years ago
Hi Hefin, thanks for this tutorial. What do we do if PC1 and PC2 only explain around 50% of the variation? Do we also include PC3 and PC4? If so, how?
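One way to decide is to look at the cumulative proportion of variance and keep enough components to reach a chosen threshold. Every pair of the kept components can then be plotted, not just PC1 against PC2. The sketch below uses iris. The 80% cut-off is an arbitrary illustration, not a rule.

```r
myPr <- prcomp(iris[, -5], center = TRUE, scale. = TRUE)

# Cumulative proportion of variance: PC1, PC1 + PC2, ...
cum_var <- summary(myPr)$importance["Cumulative Proportion", ]
cum_var

# Smallest number of components that reaches the threshold
n_keep <- which(cum_var >= 0.80)[1]

# Keep those scores for downstream analysis and plot every pair of them
scores <- myPr$x[, seq_len(n_keep), drop = FALSE]
if (n_keep > 1) pairs(scores, col = as.integer(iris$Species))
```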
@bitanbasu1965 · 5 years ago
Thanks Hefin :)
@EV4UTube · 3 years ago
Can I confess something that baffles me? Because I see this all the time. OK, so you, personally, are motivated to share your knowledge with the world, right? I mean, you took time, effort, energy, focus, planning, equipment, software, etc. to prepare this explanation and exercises. You screen-captured it, you set up your microphone, you edited the video, you did all this enormous amount of work. You're clearly motivated. Yet, when it actually comes time to deliver that instruction, you think it is 100% acceptable to place all your code into an absolutely minuscule fraction of the entire screen. Like, pretty close to 96% of the screen is 'dead space' from the perspective of the learner. The typeface is minuscule (depending on your viewing system). It would be like producing a major blockbuster film, but then publishing it at the size of a postage stamp. Surely, it would be possible for you to 'zoom into' that section of the IDE to show people what you were typing: the operators, the functions, the arguments, etc. I'm not really picking on you individually, per se. I see this happen all the time with instructors of every stripe. I have this insane idea that instruction has much, much less to do with the instructor's ability to demonstrate their knowledge to an uninformed person and much, much more to do with the instructor's ability to 'meet' the student 'where' they are and to carry the student from a place of relative ignorance (about a specific topic) to a place of relative competence. One of the best tools for assessing whether you're meeting that criterion is to PRETEND that you know nothing about the topic, then watch your own video (stripping out all the assumptions you would automatically make about what is going on based on your existing knowledge). If you didn't have a 48" monitor and excellent eyesight, would you be able to see what was being written? Like... why would you do that?
If writing the code IS NOT important, don't bother showing it. If writing the code IS important, then make it (freaking) visible and legible. This really baffles me. I guess instructors are so "in their own head" when they're delivering content that they don't take time to realize that no one can see what is happening. It just baffles me how often I see this.
@EV4UTube · 3 years ago
If 'zooming in' is not easily achieved, the least instructors could do is go into the preferences of the IDE and jack up the size of the text so that it would be reasonably legible on a screen typical of, say, a laptop or tablet. It just seems like such low-hanging fruit, and an easy fix to facilitate learning and ensure legibility.
@Pancho96albo · 2 years ago
@@EV4UTube chill out dude
@alessandrorosati969 · 1 year ago
How is it possible to generate outliers uniformly in the p-parallelotope defined by the coordinate-wise maxima and minima of the 'regular' observations in R?
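Because the region is bounded by the coordinate-wise minima and maxima, it is an axis-aligned box. Drawing each coordinate independently from a uniform distribution between that column's minimum and maximum therefore gives points uniformly distributed inside it. The sketch below uses a hypothetical helper, runif_in_box(), with iris as the 'regular' observations. Whether the generated points actually behave as outliers depends on how the regular data fill the box.

```r
# Hypothetical helper: n points drawn uniformly inside the box spanned by
# the column-wise minima and maxima of a numeric matrix or data frame
runif_in_box <- function(x, n) {
  x <- as.matrix(x)
  lo <- apply(x, 2, min)
  hi <- apply(x, 2, max)
  # One runif() call per column, each with that column's own bounds
  out <- sapply(seq_along(lo), function(j) runif(n, min = lo[j], max = hi[j]))
  out <- matrix(out, nrow = n)  # stay a matrix even when n == 1
  colnames(out) <- colnames(x)
  out
}

set.seed(1)
outliers <- runif_in_box(iris[, -5], n = 10)
```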
@evidenceandlogic6936 · 4 years ago
Top notch. Thank you.
@abhiagni242 · 7 years ago
Thanks for the video... helped a lot :)
@hefinrhys9234 · 7 years ago
ABHI agni, glad it helped :) Feel free to give feedback on other topics that would be useful.
@tiffanyd6543 · 2 years ago
THANK YOU SO MUCH
@rifathasnat3495 · 2 years ago
Thank you!
@mohammadtuhinali1430 · 1 year ago
Many thanks for your efforts to make this complex issue much easier for us. Could you enlighten me on how to understand group similarity and dissimilarity using PCA?
@samuelokt · 4 years ago
Thanks for the tutorial!!
@AcademicActuary · 3 years ago
Great presentation! However, why did you not binarize the categorical variable first, and then do the subsequent analysis? Thanks!
@khanofficial2249 · 3 years ago
Very informative video. Can you tell me something? When I plot the last ggplot plot, it shows an error: R says there is no package called 'digest'. How do I deal with this? Kindly advise.
@lisakaly6371 · 1 year ago
Thank you for this great video. Can you show how to detect or treat multicollinearity with PCA? I have a dataset with 40 variables with high intercorrelation because of cross-reactivity. VIF and the correlation matrix don't work, probably because of multiple comparisons... :(((
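Principal components regression is one standard way to use PCA against multicollinearity. The PC scores are uncorrelated by construction, so they can stand in for the collinear predictors. Whether this suits a 40-variable cross-reactivity dataset is a judgment call. The sketch below uses iris as stand-in data, with Petal.Width as an illustrative response and the other three measurements as the correlated predictors.

```r
predictors <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length")]
round(cor(predictors), 2)      # strongly correlated predictors

pr <- prcomp(predictors, center = TRUE, scale. = TRUE)
round(cor(pr$x), 10)           # identity matrix: scores are uncorrelated

# Regress the response on the first two scores instead of the raw predictors
pcr_df <- data.frame(Petal.Width = iris$Petal.Width, pr$x[, 1:2])
pcr_fit <- lm(Petal.Width ~ PC1 + PC2, data = pcr_df)
coef(pcr_fit)
```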
@federicogarland272 · 2 years ago
Thank you very much
@danieldavieau1517 · 6 years ago
Damn good job!
@hefinrhys8572 · 6 years ago
Thanks Daniel!
@sandracuadros3787 · 4 years ago
Hi! I have a question: does it make sense to run a PCA on discrete data? I am trying something using your tutorial as a guide, but I get a weird result in the plot, and I am wondering if it is because of the nature of my data. Thanks
@hefinrhys8572 · 4 years ago
Great question! If your data are not ordinal, you may get some use out of PCA if you numerically encode your discrete variables, but you may get more out of Multiple Correspondence Analysis (MCA) than PCA. Have a look here: www.rpubs.com/piterii/dimension_reduction
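If you go the numeric-encoding route the reply mentions, one common encoding for nominal variables is one-hot (dummy) coding, which model.matrix() produces. The sketch below uses a made-up data frame; df, colour, size and weight are hypothetical names. Full dummy coding makes the dummy columns sum to 1, so one component ends up with near-zero variance. For MCA itself, packages such as FactoMineR provide an MCA() function, not shown here.

```r
# Hypothetical data: one nominal variable and two numeric ones
set.seed(42)
df <- data.frame(
  colour = factor(sample(c("red", "green", "blue"), 30, replace = TRUE)),
  size   = rnorm(30),
  weight = rnorm(30)
)

# One-hot encode; "- 1" keeps a dummy column for every level of colour
encoded <- model.matrix(~ colour + size + weight - 1, data = df)
colnames(encoded)

pr <- prcomp(encoded, center = TRUE, scale. = TRUE)
round(pr$sdev, 4)  # the last component is ~0 because the dummies sum to 1
```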
@DesertHash · 3 years ago
At 5:50, don't you mean that if we measured sepal width in kilometers, it would appear LESS important? If we measured it in kilometers instead of millimeters, our numerical values would be smaller and vary far less, making it less important in the context of PCA. Thank you for this video.
@hefinrhys8572 · 3 years ago
Yes, you're absolutely correct! What I meant to say was that if that length were in kilometers but we measured it in millimeters, it would be given greater importance. But yes, larger values (and hence larger variances) are given greater importance.
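The point about units can be checked directly. The sketch below uses the iris measurements and rewrites Sepal.Width in micrometres, a deliberately extreme, hypothetical change of units. It then compares PCA with and without scale. = TRUE.

```r
# Sepal.Width converted from centimetres to micrometres
iris_um <- iris[, -5]
iris_um$Sepal.Width <- iris_um$Sepal.Width * 10000

unscaled <- prcomp(iris_um, center = TRUE, scale. = FALSE)
scaled   <- prcomp(iris_um, center = TRUE, scale. = TRUE)

# Without scaling, PC1 is almost entirely Sepal.Width
round(unscaled$rotation[, "PC1"], 3)

# With scaling, the loadings match the analysis in the original units
# (up to an arbitrary sign flip per component, hence abs())
original <- prcomp(iris[, -5], center = TRUE, scale. = TRUE)
all.equal(abs(scaled$rotation), abs(original$rotation))
```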
@DesertHash · 3 years ago
@@hefinrhys8572 Alright, thanks for the reply and for the video!
@jackiemwaniki1266 · 4 years ago
Thanks again. Quick one... would you mind also doing the Fama-MacBeth analysis without using the Ken French data frame?
@kasia9904 · 1 year ago
When I generate the PCA plot with the code explained at 20:46, my legend appears as a gradient rather than separate values (like your three species appearing in red, blue and green). How can I change this?
@yuvenmuniandy8202 · 6 years ago
Amazing tutorial. Very simple and straight to the point. Already subscribed. I have some questions. PCA is an unsupervised method, isn't it? Is it possible to further decompose the data for versicolor and virginica to find further groupings? I have read that there are also supervised methods. Do you have a tutorial for those?
@hefinrhys8572 · 6 years ago
Thanks enthiran! Yes, PCA is unsupervised because we don't give it any information about group membership; we give it unlabelled data and let it find the optimal projection of the data into a lower-dimensional space that maximises the explained variance. If you wanted to build a model to predict group membership, then you would need to use a supervised classification algorithm, where you supply a training dataset with grouping labels (this is what makes it supervised). The algorithm will then learn which features in the data associate with each group, such that when you give the model unlabelled data, it will predict group membership. I have a video on various clustering algorithms here: kzbin.info/www/bejne/homYn4Z4fKdoitk
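The reply describes supervised learning in general terms without naming an algorithm. The sketch below uses a deliberately simple stand-in, not the method from the linked video: a base-R nearest-centroid classifier. It "trains" on labelled rows by averaging each species' measurements, then labels held-out rows by their closest centroid. predict_nearest() is a hypothetical helper written for this example.

```r
set.seed(123)
train_idx <- sample(nrow(iris), 100)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Training: mean of each measurement within each labelled species
centroids <- aggregate(. ~ Species, data = train, FUN = mean)
centroid_mat <- as.matrix(centroids[, -1])
rownames(centroid_mat) <- as.character(centroids$Species)

# Prediction: assign each row to the centroid at the smallest Euclidean distance
predict_nearest <- function(newdata, centroid_mat) {
  x <- as.matrix(newdata[, colnames(centroid_mat)])
  d <- apply(centroid_mat, 1, function(ctr) sqrt(colSums((t(x) - ctr)^2)))
  rownames(centroid_mat)[apply(d, 1, which.min)]
}

pred <- predict_nearest(test, centroid_mat)
mean(pred == test$Species)  # proportion of held-out rows classified correctly
```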
@shapsgh · 5 years ago
I have a question. Why is "iris[,-5]*myPr$rotation" not equal to "myPr$x"? Isn't the "myPr$rotation" matrix the factor loadings? Thanks in advance...
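Two things differ from what prcomp() does. First, * in R multiplies element by element, whereas the scores need a matrix product (%*%). Second, prcomp() centres (and here scales) the data before rotating it. The sketch below assumes myPr <- prcomp(iris[, -5], center = TRUE, scale. = TRUE). With scale. = FALSE, myPr$scale is FALSE and the same code still works.

```r
myPr <- prcomp(iris[, -5], center = TRUE, scale. = TRUE)

# Centre/scale exactly as prcomp() did, then use matrix multiplication
scores <- scale(iris[, -5], center = myPr$center, scale = myPr$scale) %*% myPr$rotation
all.equal(unname(scores), unname(myPr$x))

# predict() on a prcomp object performs the same calculation
all.equal(unname(predict(myPr, newdata = iris[, -5])), unname(myPr$x))
```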
@djangoworldwide7925 · 1 year ago
Great tutorial, but it leaves me with the question: what do I do with it? Is this just the beginning of a k-means clustering that gives me an idea of the proper k?
@djangoworldwide7925 · 1 year ago
Lol, you just answered that at 26:00... Thank you so much!
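One option along those lines is to run k-means on the first few PC scores. This is a sketch only, since the approach at 26:00 is not transcribed here. It uses kmeans() from base R's stats package. k = 3 is chosen simply because iris has three species; with real data, choosing k is usually the open question.

```r
myPr <- prcomp(iris[, -5], center = TRUE, scale. = TRUE)

# Cluster on the first two PC scores; nstart = 25 tries several random starts
set.seed(1)
km <- kmeans(myPr$x[, 1:2], centers = 3, nstart = 25)

# Compare the clusters with the known species labels
table(cluster = km$cluster, species = iris$Species)
```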
@JibHyourinmaru · 2 years ago
If my biological data only contain numbers (1, 2 and 3 digits) and a lot of zeros, do I need to scale too?
@MiloLabradoodle · 4 years ago
Thanks for the link to the R code.
@Orange-xw4lt · 4 years ago
Hi, good job! But if my input data form a wave, how can I extract the values of the crests above a certain threshold?
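One reading of this question is: find the local maxima (crests) of a sampled wave and keep those above a threshold. The base-R sketch below makes that assumption and uses a synthetic sine wave in place of the real input. A crest is located where the slope changes from positive to negative.

```r
# Hypothetical signal: two periods of a sine wave
time_pts <- seq(0, 4 * pi, length.out = 401)
x <- sin(time_pts)
threshold <- 0.5

# Local maxima: the sign of the first difference drops from +1 to -1
peaks <- which(diff(sign(diff(x))) == -2) + 1

# Keep only crests above the threshold
crests <- peaks[x[peaks] > threshold]
data.frame(index = crests, time = time_pts[crests], value = x[crests])
```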