Correction: 3:23 The array should only have wt1 through wt5, ko1 through ko5. Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
@GoktugAsc123 3 years ago
Thank you, I was mentioning 3:23. Your videos are great. I am a medical doctor from Turkey and currently, I am planning a career change to data science and I have been watching your videos to get prepared for a data scientist position. Could you create a few videos regarding data science interviews if it is relevant for your channel content? Best Regards, Göktuğ Aşcı, MD.
@statquest 3 years ago
@@GoktugAsc123 I'll keep that in mind.
@statquest 3 years ago
@@keerthik3791 Unfortunately the random forest implementations for Python are really bad and they don't have all of the features. If you're going to use a random forest, I would highly recommend that you do it in R instead.
@keerthik2168 3 years ago
@@statquest Thank you for the suggestion. I am good at Python, MATLAB. Can I do random forest in MATLAB? Or is learning R necessary here?
@statquest 3 years ago
@@keerthik2168 I have no idea. I've never tried to do random forests in Matlab.
@pressiyamu8976 4 years ago
Dude you deserve a humanitarian award.
@statquest 4 years ago
Thanks! :)
@joshuamcguire4832 4 years ago
he is a good human in my eyes
@rezab314 3 years ago
@@joshuamcguire4832 bam!
@joshuamcguire4832 3 years ago
@@rezab314 super bammm!!!
@mohammedghouse235 3 years ago
Not only the best PCA demonstration but also THE BEST introduction to Python. Hats off to you man!!
@statquest 3 years ago
Thank you! :)
@LittleScience 3 years ago
I have been dabbling in data science for a while now, and only now learned that pandas stands for "panel data" xd This channel never ceases to amaze
@statquest 3 years ago
:)
@advaitshirvaikar4751 4 years ago
Whenever I search for some machine learning based explanation, I add 'by statquest' in it ^_^. Keep up the great work :')
@statquest 4 years ago
Thank you very much!
@shaktishivalingam3880 4 years ago
@@statquest It's True I do the same thing ..thank you for your hard work
@mattheckel2609 4 years ago
"Note: We use samples as columns in this example because... but there is no requirement to do so." "Alternatively, we could have used..." "One last note about scaling with sklearn vs scale() in R" This is some of the gold that sets StatQuest apart. Thank you! ❤
@statquest 4 years ago
Thank you! :)
@AniruddhModi-y2x 16 days ago
The fact that you said bam when the plot showed what we wanted really shows that even if you are a pro python programmer, you still feel happy when you code correct, relatableeee
@statquest 16 days ago
bam! :)
@hayskapoy 5 years ago
Finally! You explain in the language I understand much better than English haha Thanks !!!
@statquest 5 years ago
:)
@superiorphi5778 4 years ago
but you are watching a tutorial \(-_-)/
@tl-lay 4 years ago
YOU ARE SAVING MY DEGREE I LOVE YOU SO MUCH I CANT EVEN BELIEVE THIS IS THE SAME MATERIAL IM LEARNING IN MY MACHINE LEARNING CLASS RIGHT NOW.
@statquest 4 years ago
Happy to help!
@vedparulekar478 1 year ago
One of the best videos ever made on this topic. This channel has helped me a lot in understanding machine learning in greater detail. Keep up the good work !!
@statquest 1 year ago
Thank you!
@jiayoongchong2606 4 years ago
6:31 using scikit PCA 8:35 plotting scree plot 10:37 loading scores for each principal component
@statquest 4 years ago
Thanks for the time point! I'll add those to the description to divide the video into chapters.
@x11y22z33me 2 years ago
Simply loving StatQuest. Concise, clear and fun videos. One point I noted while watching this video is that the latest version of sklearn PCA() will center the data for you, but not scale it. So if you just need centering for doing pca, you don't need to worry about preprocessing.
@statquest 2 years ago
Thanks for the update!
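A quick way to check the point above about centering versus scaling. This is a minimal sketch on made-up data (the array `X`, its offsets and its spreads are assumptions, not the video's dataset). It compares PCA on raw data with PCA on hand-centered data, and then with PCA on standardized data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

rng = np.random.default_rng(0)
# 10 samples (rows) x 4 features (columns) with very different offsets and spreads
X = rng.normal(loc=[5.0, -3.0, 100.0, 0.0], scale=[1.0, 2.0, 50.0, 0.5], size=(10, 4))

# PCA subtracts the column means itself, so centering by hand changes nothing.
raw_coords = PCA(n_components=2).fit_transform(X)
centered_coords = PCA(n_components=2).fit_transform(X - X.mean(axis=0))
# The sign of a PC is arbitrary, so compare absolute values.
same_after_centering = np.allclose(np.abs(raw_coords), np.abs(centered_coords))

# Scaling is not done for you: standardizing first changes the result.
raw_ratio = PCA(n_components=2).fit(X).explained_variance_ratio_
scaled_ratio = PCA(n_components=2).fit(scale(X)).explained_variance_ratio_

print(same_after_centering)
print(raw_ratio, scaled_ratio)
```

If the features are on very different scales, as the third column is here, the unscaled PCA is dominated by the largest one, which is why the scaling step still matters.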
@BrunetteViking 2 years ago
This channel is the best YouTube channel that I discovered. Thank you, sir!
@statquest 2 years ago
Thanks!
@DATABOI 7 years ago
Python. Now you're speaking my language :)
@HK-sw3vi 4 years ago
want me to take out my python?
@joicejoseph2176 4 years ago
@@HK-sw3vi ...weirdo
@spag5296 4 years ago
You've got the right formula for simple explanations. Teach me dawg
@statquest 4 years ago
Thank you! :)
@reneeliu6676 6 years ago
I am watching the 1st minute and I'm already super excited. Thanks!!
@statquest 6 years ago
Hooray!!!!!! :)
@raphael3835 6 years ago
The only good step by step explanation I found on the web. Thank you so much!
@statquest 6 years ago
Hooray!!! Thank you so much! :)
@oswaldocastro9600 6 years ago
Hi Josh... Simply incredible all StatQuest videos... Triple Bam!!!
@statquest 6 years ago
Thank you! :)
@neptunesbounty1786 4 years ago
I learn so much better in Python for some reason, I think it's because it's more interactive and you can play around with the data! Good one. Stattttquueeeeeest.
@statquest 4 years ago
Thanks! There should be a lot more Python videos and learning material out soon.
@godsperson5571 3 years ago
@@statquest looking forward to it :).
@shanmugapriyak7269 4 years ago
Always can find a new and detailed explanation of steps from your videos! Thank you!
@statquest 4 years ago
Thank you! :)
@samirsaci6723 3 years ago
I push the like button even before I play the video. Because Josh never fails to amaze me.
@statquest 3 years ago
bam!
@antomartanto 3 years ago
You are one of the best teachers that I've ever found. Thank you very much!
@statquest 3 years ago
Thank you! :)
@LincolnFrias 6 years ago
It's awesome to have the explanation based on python code. Thanks a lot!
@statquest 6 years ago
No problem. I'm doing a lot more python coding these days, so hopefully I'll make more of these "in python" videos.
@amribrahim7850 3 years ago
Awesome. Please create more videos about how to implement the machine learning as well as data science concepts explained here into Python. That would be super helpful for us, in particular beginners.
@statquest 3 years ago
Thanks, will do!
@christopheryogodzinski6860 7 years ago
Another Great StatQuest in the books!
@fvviz409 4 years ago
MAKE MORE PYTHON CONTENT PLEASE I LOVE IT
@statquest 4 years ago
I'm working on it. :)
@jack.1. 3 years ago
Wish there were more statquest coding in python videos, they are the best! Much prefer to regular content although that is still really high quality
@statquest 3 years ago
Noted.
@jiangxu3895 3 years ago
Thank you Josh. Such practice is important and valuable!! And you really also taught some Python tricks that I don’t know.
@statquest 3 years ago
Thank you! :)
@saiakhil4751 3 years ago
Wow Josh.. Thanks for that unpacking concept. I never knew that my whole life...
@statquest 3 years ago
You bet!
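For anyone else meeting the unpacking trick for the first time, here is a small sketch of what `[*wt, *ko]` does. The sample names follow the wt1 to wt5 and ko1 to ko5 naming from the correction at the top of the thread; the variable `nested` is only added here for contrast. It needs Python 3.5 or later.

```python
# Build the two lists of sample names
wt = ['wt' + str(i) for i in range(1, 6)]
ko = ['ko' + str(i) for i in range(1, 6)]

# The star "unpacks" each list, so the result is one flat list of 10 names
samples = [*wt, *ko]

# Without the stars we would get a list that contains two lists
nested = [wt, ko]

print(samples)
print(len(samples), len(nested))
```

The flat list is what pandas expects for `columns=` or `index=`; `wt + ko` would give the same result here.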
@nonalcoho 4 years ago
I like the way you plot the ratio of each PC~~ It is really easy to read! BAM~~~~~~~~~~
@statquest 4 years ago
Thank you!
@RimaHandewi 7 months ago
Wow, your explanation is so clear!!
@statquest 7 months ago
Thank you! 😃
@rohitrajora9832 3 years ago
Really appreciate this and would love to see more concepts implemented in python.
@statquest 3 years ago
Thanks!
@KikiBah 3 years ago
This was so clear, thanks! Finally I can do PCA in python, BAM 😊 You DA BEST!
@statquest 3 years ago
Thanks!
@merrimac1 5 years ago
Thanks for the tutorial! One thing I don't understand is why PC1 can separate the wt and ko samples. Their gene expression values are generated in the same way.
@3stepsahead704 2 years ago
Just stating I have the same question 2 years later.
@IntegralDeLinha 2 years ago
Woww! That was absolutely awesome!!! Thank you so much!
@statquest 2 years ago
Glad you liked it!
@danielcozetto421 3 years ago
Hello Josh, Thank you for the amazing video! Quick question, at 9:18 how can I adapt "index=[*wt, *ko]" for an Excel input? Let's say that we have the same variables (Genes vs wt/ko) but in an Excel file. How can I add these labels to the final plot (9:47)? Thank you again!!
@statquest 3 years ago
I'm not sure I understand your question. You can export your data from excel and import it into python (or R or whatever). Or are you asking about something else?
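One possible reading of the Excel question above: once the spreadsheet is loaded into a DataFrame, the sample labels are already in `data.columns`, so they can stand in for the hand-built `[*wt, *ko]` list. The sketch below assumes the sheet was exported to CSV with gene names in the first column; the tiny table and its values are made up. `pd.read_excel(path, index_col=0)` should behave the same way for an .xlsx file, assuming an Excel reader such as openpyxl is installed.

```python
import io
import pandas as pd
from sklearn import preprocessing
from sklearn.decomposition import PCA

# Stand-in for a CSV exported from Excel (hypothetical values)
csv_text = """gene,wt1,wt2,ko1,ko2
gene1,10,12,95,99
gene2,50,48,5,7
gene3,30,31,29,33
"""
data = pd.read_csv(io.StringIO(csv_text), index_col=0)  # genes become the row index

# Samples are columns in the file, so transpose before scaling
scaled_data = preprocessing.scale(data.T)
pca_data = PCA().fit_transform(scaled_data)

# Reuse the column names from the file as the sample labels
labels = ['PC' + str(i) for i in range(1, pca_data.shape[1] + 1)]
pca_df = pd.DataFrame(pca_data, index=data.columns, columns=labels)
print(pca_df)
```

The row labels of `pca_df` can then be used to annotate the points in the PCA plot, the same way the video uses the names it created by hand.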
@rabiabibi8634 5 years ago
Hi Josh. The best PCA explanation. Thanks a lot :-) May GOD bless you 😊
@statquest 5 years ago
Thank you! :)
@pressiyamu8976 4 years ago
Yes, May god bless you 100 times. May the troubles of today’s world not reach your doorstep. You’re a great person.
@vipulsonawane7508 2 years ago
What a playlist, I simply loved it 😘
@statquest 2 years ago
Thank you!
@danielvmartins4635 1 year ago
Excellent work!!! 👏👏
@statquest 1 year ago
Thanks a lot!
@jeremylv3029 3 years ago
Man, u r a gem. I will pay for the knowledge later after my graduation bro. lol
@statquest 3 years ago
Wow! Thank you! :)
@timharris72 6 years ago
This was a really good explanation using Python
@henkhbit5748 4 years ago
As always a great presentation and the python code just give the extra bite...
@statquest 4 years ago
Thanks!
@KnightPapa 5 years ago
Thank you! This video helped a lot with what I'm trying to do.
@statquest 5 years ago
Awesome!
@kannurajnathamuni9966 2 months ago
You are the best!!!! It would be great if you could make a video on speculative decoding using medusa and quantization of neural networks in general
@kannurajnathamuni9966 2 months ago
@statquest
@statquest 2 months ago
I'll keep that in mind! :)
@miskaknapek 1 month ago
very much enjoy your explanation style. many thanks for the great videos!
@statquest 1 month ago
Thanks!
@miskaknapek 1 month ago
@@statquest excellent going - really. difficult to know what's up and down in data science, and so i'm happy your videos cover subjects from mathematical concepts to code implementation. excellent spirit and explanations, again. (sorry about the superlative avalanche - in the vast ocean that's the net, it's difficult finding authoritative sources covering subjects well ) bests from Germany/Denmark ;)
@statquest 1 month ago
@@miskaknapek BAM! :)
@wuyanyun 5 years ago
Thank you! I’ve been struggling with this problem for so long !
@statquest 5 years ago
Hooray! I'm glad the video was helpful. :)
@geraldopontes37 3 years ago
Your videos are great! Thanks
@statquest 3 years ago
Thanks!
@KomangWahyuTrisna 4 years ago
i really like your clear explanation. please do some videos about deep learning and NLP.
@statquest 4 years ago
I'm working on them.
@KomangWahyuTrisna 4 years ago
@@statquest yeah! I am waiting for that
@olehsorokin7963 3 years ago
That's a cool one. The fact that observations are columns makes it so confusing though. I'm really used to the tidy data notation
@statquest 3 years ago
Noted
@liranzaidman1610 4 years ago
Amazing! this is so important, thanks a lot.
@statquest 4 years ago
Thanks! :)
@ccli 2 years ago
generally, in ML, we use 'columns' as 'features (variables)' and 'rows' as 'examples', but in the video, it is the inverse. But it is not a big deal.
@statquest 2 years ago
It depends on the field you are in. I used to work in Genetics and this is the format they used. So it's always worth checking to make sure you have the data correctly oriented.
@angeloperera2022 9 months ago
Amazing video! I initially watched the video explaining PCA and i was mind-blown, thank you so much! I was hoping to ask if anyone on the comment section or even StatQuest if possible, would know how to implement PCA in a multivariate timeseries dataset and also "examine the loading scores" in such a dataset. Thanks in advance! :) P.S - extremely clueless on anything coding or ML, but Ive got to use PCA (and other dimensionality reduction methods) on my timeseries dataset. so would greatly appreciate any direction on how to proceed.
Incredible French accent “Poisson distribution” , I saw it three times 😆
@statquest 10 months ago
:)
@andresfelipehiguera785 4 years ago
Python ε> now we are talking!
@statquest 4 years ago
:)
@kamogelomaila3904 6 years ago
Hi Joshua, thanks for that. really helpful. i'm quite new to python myself, and i'm trying to compile a PCA across a range of macro-economic factors (inflation,gdp,fx, policy rate etc.,), now in all that you've done above where is the display of the PCA i.e: the newly uncorrelated data set, is it the loading scores you printed? or the wt, and ko variables you plotted? Thanks
@guohanzhao7813 6 years ago
COOOOOL, so easy to understand!
@damianwysokinski3285 5 years ago
Instead of scaled_data = preprocessing.scale(data.T) we can write scaled_data = preprocessing.scale(data, axis=1).T. With axis=1 each row (here, each gene) is centered to mean 0 and scaled; with axis=0 (the default) each column is. The trailing .T is still needed so that the samples end up as rows for PCA. For me it is more intuitive :)
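A small check of the `axis` suggestion above. This is a sketch on made-up data (the 6 x 4 Poisson table is an assumption): `scale(data.T)` and `scale(data, axis=1)` standardize the same values, but the second keeps the original orientation, so it has to be transposed before it is handed to PCA.

```python
import numpy as np
from sklearn import preprocessing

rng = np.random.default_rng(1)
# 6 genes (rows) x 4 samples (columns), like the layout in the video
data = rng.poisson(lam=50, size=(6, 4)).astype(float)

a = preprocessing.scale(data.T)        # shape (4, 6): samples are rows
b = preprocessing.scale(data, axis=1)  # shape (6, 4): genes are still rows

# Same numbers, different orientation
same_numbers = np.allclose(a, b.T)
print(a.shape, b.shape, same_numbers)
```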
@matthsant 5 years ago
Excellent tutorial!!
@statquest 5 years ago
Thank you! :)
@karannchew2534 3 years ago
Question please... 09:50 wt and ko samples are both created with the same random function Poisson (10, 1000). Why are wt samples (and ko samples) more correlated??
@statquest 3 years ago
Because rd.randrange(10, 1000) returns a random number between 10 and 1000. Once we get that random value, we use it to generate 5 values for the wt samples using a poisson distribution. Then we select another random value between 10 and 1000 and use it to generate 5 values for the ko using a different (because the random value is different) poisson distribution.
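A rough reconstruction of the data-generating step described in the reply above, not a copy of the video's code. The seeds and the helper names `wt_mean` and `ko_mean` are assumptions added so the sketch is reproducible. For each gene, one random mean is drawn for the wt samples and a different one for the ko samples, which is why samples within a group look alike.

```python
import random as rd
import numpy as np
import pandas as pd

rd.seed(42)                      # assumed seeds, only for reproducibility
rng = np.random.default_rng(42)

genes = ['gene' + str(i) for i in range(1, 101)]
wt = ['wt' + str(i) for i in range(1, 6)]
ko = ['ko' + str(i) for i in range(1, 6)]

rows = []
for _ in genes:
    wt_mean = rd.randrange(10, 1000)   # one Poisson mean shared by the 5 wt samples
    ko_mean = rd.randrange(10, 1000)   # a separate mean shared by the 5 ko samples
    rows.append(np.concatenate([rng.poisson(lam=wt_mean, size=5),
                                rng.poisson(lam=ko_mean, size=5)]))

data = pd.DataFrame(rows, index=genes, columns=[*wt, *ko])
print(data.head())
```

Looking at any single row, the five wt values cluster around one number and the five ko values around another, and that per-gene difference is what PC1 picks up.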
@metvava 9 months ago
great video! thanks for these!!! have you done a redundancy analysis and dbRDA plot video? thank you for contributing to our education
@statquest 9 months ago
I haven't done that yet.
@metvava 9 months ago
@@statquest let us know if you ever do! It would be a double bam from me. It just clicks the way you explain! Thank you again for your content!!!
@zishanahmedshaikh 6 years ago
Hi Joshua, Great Videos!
@revolution77N 6 years ago
Thank you very much! Super helpful!
@petersq5532 5 years ago
to fill up the dataframe: df.iloc[:, :5] = df.iloc[:,:5].apply(lambda x: np.random... ..., axis = 1, result_type='broadcast') posh...:)
@superaluis 5 years ago
Thanks! I noticed his code just put the same values for every column. Your comment was a gem!
@khastehshodam 5 years ago
Hi Josh, thank you for the video. It was a great tutorial. Just one question: isn't what you called loading_score in the Python code in fact a component score? It was a score for each record (gene). Please correct me if I am wrong, but isn't a loading score the correlation between the original fields (wt1, wt2, etc.) and the components? Thank you
@advaitshirvaikar4751 4 years ago
Hey Josh, how do I find out which feature in the original dataset is to be removed(the one that least affects the variance im assuming)? I know we use PCA for the same, but I just can't understand how we select the unimportant feature from the original dataset using PCA.
@statquest 4 years ago
You can set a threshold for the loading scores. All features with loading scores below that threshold can be discarded.
@advaitshirvaikar4751 4 years ago
@@statquest okay, thanks a lot!
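The threshold idea from the reply above could look something like this. It is only a sketch: the data are simulated so that just the first three genes differ between two groups, and the cutoff of 0.3 is a hypothetical value, not a recommendation from the video. Using PC1 alone is also a simplification; in practice you may want to look at several PCs.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

rng = np.random.default_rng(0)
genes = ['gene' + str(i) for i in range(1, 9)]

# 100 samples (rows) x 8 genes (columns); only gene1..gene3 separate the groups
X = rng.normal(size=(100, 8))
X[:50, :3] += 10

pca = PCA().fit(scale(X))

# Loading scores for PC1, one per gene
loading_scores = pd.Series(pca.components_[0], index=genes)

threshold = 0.3  # hypothetical cutoff on the absolute loading score
keep = loading_scores[loading_scores.abs() >= threshold].index.tolist()
print(loading_scores.abs().sort_values(ascending=False))
print(keep)
```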
@prakhars962 2 years ago
this is so good
@statquest 2 years ago
Thank you!
@saulmartinez7351 2 years ago
4:46 Why does gene4, ko1 have a value over 1000 if the command says "get a random value between 0 and 1000"? Thanks for the value!!
@statquest 2 years ago
We select a random number between 10 and 1000 to be the mean of a poisson distribution. That's just the average value, and there can be larger and smaller values.
@saulmartinez7351 2 years ago
@@statquest oh! i see!! thank you so much, I'm still learning about this
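To see the reply above in numbers, here is a small simulation. The mean of 990 is an assumed example of a value that `rd.randrange(10, 1000)` could return; it is not taken from the video.

```python
import numpy as np

rng = np.random.default_rng(0)

lam = 990  # assumed example of a mean just below 1000
values = rng.poisson(lam=lam, size=10_000)

# The mean is close to lam, but individual draws scatter around it
share_above_1000 = (values > 1000).mean()
print(values.mean(), values.max(), share_above_1000)
```

So a mean below 1000 can still produce individual counts above 1000, which is what happened for gene4 in ko1.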
3 years ago
I don't know why in the PCA graph you plot the "features", in some other videos, they plot all the data point and visualize the data in the new subspace... And I don't know what is the meaning of the x-axis in the same plot, what does -10 mean in the PC1-89.9%? thanks
@statquest 3 years ago
I don't plot the features, I plot the subjects. For details, see: kzbin.info/www/bejne/fJjEnI2ta7Bkh7M
@vasanthakumar1991 6 years ago
BAM!!! I understood what you said. I show my gratitude. But I have a query. I am confused about my dataset, regarding which to consider as rows and which as columns. My dataset comes from phasor measurement units (PMUs) used in the electrical grid, that is, the distribution lines we see around. One single PMU measures 21 electrical parameters for a timestamp. We use around four PMUs, each measuring the 21 parameters at different locations at the same time, continuously over a period of time. How can I arrange the above data for performing PCA, sir?
@vasanthakumar1991 6 years ago
Sir, those two cases you mentioned where PCA would work are what I am also interested in calculating, apart from the combination of all of the PMUs' timestamps. Can you mention how to arrange the data (rows and columns) for both of the mentioned viable cases? Thanking you so much!! You are really awesome, sir
@saiakhil4751 3 years ago
Math learned statistics from Josh ;)
@statquest 3 years ago
:)
@karannchew2534 3 years ago
Question please. This line transforms the original data to a 10x10 array: pca_data = pca.transform(scaled_data) The video says: it generates the coordinates for PCA graph based on loading score and scaled data. Apart from the coordinates in graph, what do the values actually represent? How should I interpret them - Is it the amount of variance of sample values attributed to each PC? The distance of each sample on PC line to the origin? What is the unit?
@statquest 3 years ago
The coordinates do not have units. And, as far as I know, they are just coordinates.
@statquest 3 years ago
Oops!! I made a mistake and deleted your follow-up comment. Sorry about that. However, my response is "Yes, the PCA graph is a graph that uses PCs as the axes."
@karannchew2534 3 years ago
@@statquest No problem. Thanks for confirming.
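One way to make the "just coordinates" answer above concrete: with scikit-learn's default settings, `pca.transform` projects each centered sample onto the PCs, so each value is that sample's position along one PC axis. The sketch below checks this on made-up data (the random `scaled_data` array is an assumption standing in for the video's scaled matrix).

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
scaled_data = rng.normal(size=(10, 6))  # stand-in: 10 samples x 6 variables

pca = PCA()
pca.fit(scaled_data)
pca_data = pca.transform(scaled_data)

# Manual projection: center, then take dot products with each PC direction
manual = (scaled_data - pca.mean_) @ pca.components_.T
matches = np.allclose(pca_data, manual)

# The spread of the coordinates along each PC is that PC's explained variance
variances_match = np.allclose(pca_data.var(axis=0, ddof=1), pca.explained_variance_)
print(matches, variances_match)
```

In other words, a coordinate is the signed distance from the origin to the sample's projection on that PC, measured in the same (unitless, after scaling) scale as the input.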
@ColeKillian 5 years ago
Amazing video thank you very much
@statquest 5 years ago
Thanks! :)
@sarathkareti8993 3 years ago
This video is really awesome! I am just confused on one thing, what are your predictors and what is your target?
@statquest 3 years ago
PCA does not have predictors and targets. All variables are just...variables. For more details about PCA, see: kzbin.info/www/bejne/fJjEnI2ta7Bkh7M
@3stepsahead704 2 years ago
Very concise, I will surely be coming back to this video, however I would like to know why PCA is able to group these two categories (wt and ko), when it's shown they are generated from the same random method. If all indexes were generated at the same time, I would get it, but as they are generated index by index, I seem not to be able to grasp it.
@statquest 2 years ago
The trick is at 3:48. For each group, wt and ko, we select a different parameter for the poisson distribution and generate 5 measurements from each of those two different distributions. One set is for wt and the other set is for ko.
@3stepsahead704 2 years ago
@@statquest I think my confusion comes from the fact that these will make the two groups different from one another (all wt's different from ko's), but I wouldn't predict them to be similar within the group (wt1 is close in vertical to wt2, and to wt3...,), thus I tend to believe PCA should tell them apart, but not in exactly two groups (wt's vs ko's), I would predict more like two clouds instead of two "vertical lines of points" in the 2-D.
@statquest 2 years ago
@@3stepsahead704 Remember how PCA actually works, it finds the axis that has the most variation (which is between WT and KO) and focuses on that. Then it finds the secondary differences (among the WT and KO). However, because the differences between WT and KO are big, the scale on the x-axis will be much bigger than the scale on the y-axis. Thus, the samples will appear to be in a vertical line rather than spaced apart like you might guess they should be. In short, check the scales of the axes, they will explain the difference between what you think you see and what you expect.
@3stepsahead704 2 years ago
@@statquest Thank you very much for taking the time to explain this. I now get it!
@azrahasan3796 3 years ago
Hi, You are a lifesaver. I am trying to do PCA on my own data, but every demo video either uses built-in datasets or, like yours, creates its own data. I am missing some crucial steps, especially in defining the index when I am doing it with my data. Would it be too much to ask for a few more videos on machine learning where you use Excel sheet data from your laptop?
@azrahasan3796 3 years ago
I am a newbie in data science and programming. I am a Molecular Biologist who would love to learn machine learning.
@statquest 3 years ago
I'll keep that in mind for a future video.
@naviddavanikabir 4 years ago
fantastic, like always. I wonder how Poisson distribution caused each wt samples and ko samples to be correlated with each other?
@statquest 4 years ago
Because we generated the data, I selected different lambda values for the wt from the ko samples.
@sayanbhattacharya3233 4 years ago
Please post some intuitions on sparse deconvolution and compressive sensing..Would love to understand your approach..❤️
@donkkey245 4 years ago
dear instructor, will you release a python version of your ml course. super fan here!
@statquest 4 years ago
One day I will.
@donkkey245 4 years ago
@@statquest hope that day comes quick. stay well.
@dnuyc 3 years ago
Great tutorial, sorry if my question may be amateur, but how did they differentiate WT and KO apart in the final PCA, I thought the data set was randomly generated?
@statquest 3 years ago
Early on we gave the rows and columns names and kept track of them.
@innocenceesstt1 4 years ago
Thank you very much for this tutorial. Please can you explain how to get correlation matrix
@statquest 4 years ago
With numpy, you use corrcoef().
@innocenceesstt1 4 years ago
@@statquest Thank you very much
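A minimal example of the `corrcoef()` suggestion above, on made-up data (the three variables are assumptions chosen so that two are strongly related and one is not). Note that `np.corrcoef` treats each row as a variable by default.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)

# Three variables as rows: x, a noisy multiple of x, and unrelated noise
data = np.vstack([
    x,
    2 * x + rng.normal(scale=0.1, size=50),
    rng.normal(size=50),
])

corr = np.corrcoef(data)  # 3 x 3 correlation matrix
print(np.round(corr, 2))
```

If your variables are in columns instead, pass `rowvar=False` or transpose the array first.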
@mramadan2009 4 years ago
Hi Josh Thank you for your efforts, really statquest is a magnificent channel , Could you please make video for Singular Value decomposition SVD. thanks
@Cat_Sterling 1 year ago
Thank you!!! When we are speaking about variation in PCA, is that the same as variance?
@statquest 1 year ago
Yep.
@Cat_Sterling 1 year ago
@@statquest Thank you very much for the clarification! I googled it, and seems that it's two different things, but sometimes they can be used interchangeably or be the same thing.
@statquest 1 year ago
@@Cat_Sterling Yes, I guess it depends on how you want to use them and whether you divide by 'n' or 'n-1', but, at least on a conceptual level, they are the same.
@Cat_Sterling 1 year ago
@@statquest Thank you so much again! Really appreciate your reply! Your channel helped me so much!!!
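The 'n' versus 'n-1' remark above can be seen directly in numpy through the `ddof` argument. The five numbers below are an arbitrary example. The last lines also show that sklearn's `scale()` standardizes with the 'n' version, which is, as far as I know, the difference from R's `scale()` that the video mentions.

```python
import numpy as np
from sklearn.preprocessing import scale

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

var_n = np.var(x)                  # divides by n     -> 2.0
var_n_minus_1 = np.var(x, ddof=1)  # divides by n - 1 -> 2.5

# sklearn's scale() uses the divide-by-n standard deviation
z = scale(x.reshape(-1, 1)).ravel()
print(var_n, var_n_minus_1, z.std(ddof=0), z.std(ddof=1))
```

For PCA the choice only rescales every variable by the same constant factor, so the shape of the PCA plot does not change.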
@neilanthony7596 2 years ago
As PCA can be used to identify principal components which cluster a sample away from some clutter, can i then use these principal components as an input to a neural network to help better identify the sample? Or is this something which happens as matter of course in a neural network (ie the neural net doesnt need any help from PCA)? (i'm new to data science, and just discovering PCA, and where i can go now with all these amazing techniques, to help identify samples.)
@statquest 2 years ago
PCA is often used to "denoise" data (by removing PCs that account for only a small amount of variation) before inputting it into all kinds of algorithms, neural networks included.
@neilanthony7596 2 years ago
@@statquest Many thanks, I see now. PCA is a good tool to begin to investigate the usefulness, or otherwise, of metrics, before moving on to more sophisticated algorithms of data science. Your YouTube videos are about the best there are for an introduction to data science. Thanks, I look forward to more of your videos!
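A sketch of the "denoise" idea from the reply above, on simulated data (the rank-1 signal, the noise level and the choice of one PC are all assumptions for illustration). Keeping only the leading PC and mapping back with `inverse_transform` throws away most of the noise.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# 200 samples x 5 features driven by a single hidden factor, plus noise
t = rng.normal(size=(200, 1))
signal = t @ np.array([[1.0, 2.0, -1.0, 0.5, 3.0]])
noisy = signal + rng.normal(scale=0.3, size=signal.shape)

# Keep only the first PC, then map back to the original 5 features
pca = PCA(n_components=1).fit(noisy)
denoised = pca.inverse_transform(pca.transform(noisy))

err_before = np.mean((noisy - signal) ** 2)
err_after = np.mean((denoised - signal) ** 2)
print(err_before, err_after)
```

The low-dimensional output of `pca.transform` (rather than the reconstructed matrix) is what would typically be fed into a downstream model such as a neural network.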
@weiqingwang1202 4 years ago
Is loading score eigenvalues? Wish to see a more linear algebra method of explaining pca!
@statquest 4 years ago
For more details on how PCA works, see: kzbin.info/www/bejne/fJjEnI2ta7Bkh7M
@vaggeliskyrilas3525 4 years ago
Firstly, very good video. Secondly I am running the code in some spectra I have and I get good PCA plot but the loading scores seems to be wrong. Have you any idea why?
@statquest 4 years ago
No idea.
@Андрюхаслазерки 3 years ago
It is great. sorry for my English. I want to see a video about lasso (l1-regularization).
@statquest 3 years ago
Here's a video on Ridge regression: kzbin.info/www/bejne/h2mUg4VprrChaZI , lasso regression: kzbin.info/www/bejne/hHjJYamlibKfmdU and ridge vs lasso: kzbin.info/www/bejne/jp6VdJKdiaafbsU
@redcat7467 4 years ago
I have listened to the song twice.
@statquest 4 years ago
bam!
@richardlin6993 4 years ago
Looking forward to Kernel PCA in Python or explanation!
@neilanthony7596 2 years ago
What is the optimum number of components in a PCA analysis? I can have have say 3 or 20, i just dont know the relative importance of each component, so i need to run an analysis. However, is there some maximum number of components, beyond which a code will crash, or take too long to run? Many thanks, N
@statquest 2 years ago
I've done PCA with 1000's of components before.
@neilanthony7596 2 years ago
@@statquest Hi Josh, thanks for your reply. OK so i guess you take as many components as you can, and the PCA then tells you the dimensionality of a reduced form of your data? Ie you might take 1000 components, only to be told that PCA can reduce the dimensionality to say 4, just for argument sake? You then continue your work with just these main components.
@statquest 2 years ago
@@neilanthony7596 As shown in the video, you can look at the scree plot and see how many PCs account for most of the variation in the data. So, you might start out with 1000 dimensions, and see that you only need 4 PCs to account for 99% of the variation. So you can use those 4 PCs, and only those, to continue with your work.
@neilanthony7596 2 years ago
thanks for the great videos, it's fantastic stuff!
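The "how many PCs account for 99% of the variation" step in the exchange above can also be read off numerically from the same percentages that go into the scree plot. This is a sketch on simulated data: 50 features driven by 4 hidden factors is an assumption chosen to mimic the example in the reply, and the 99% target is taken from that reply.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# 200 samples x 50 features that are really driven by 4 hidden factors
factors = rng.normal(size=(200, 4))
mixing = rng.normal(size=(4, 50))
X = factors @ mixing + rng.normal(scale=0.05, size=(200, 50))

pca = PCA().fit(X)

# Running total of the variation explained by PC1, PC1+PC2, ...
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of PCs whose running total reaches 99%
n_needed = int(np.searchsorted(cumulative, 0.99) + 1)
print(np.round(cumulative[:6], 4), n_needed)
```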
@harryliu1005 3 years ago
Hi Josh!, this is a very excellent video that helped me a lot!!! I have a question, what if PC3 PC4 is also essential? Do I need to draw 2 2-D graphs, or what do I need to do?
@statquest 3 years ago
If you want to draw the PCs and the data, then you'll have to draw multiple graphs. Or you can use the projections from the first 4 PCs and input to a dimension reduction algorithm like t-SNE: kzbin.info/www/bejne/hHbEhoaGab6YqK8
@alexisaddicted 4 years ago
What about when you have +100 variables ? Plotting graph might not be a solution, and we perhaps would need to "guess"/test the amount of PCA's that we need beforehand, is there any systematic way to achieve that ? It is mostly like, just run a loop, each with increasing number of PCAs and just see which one yields better results ? (Spray and pray type of solution)
@statquest 4 years ago
When you have more variables, you simply draw a scree plot to determine how many PCs to use in the graph. I describe this process starting at 8:27
@alexisaddicted 4 years ago
@@statquest thank you for the response. I think that I might have previously misunderstood that part, but I get it now. Appreciate the clarification! 👍🏻
@statquest 4 years ago
@@alexisaddicted If you want to learn more about the scree plot and what it means exactly, check out this video: kzbin.info/www/bejne/fJjEnI2ta7Bkh7M
@alexisaddicted 4 years ago
@@statquest I had previously watched two of your videos about PCA, from 2015 and 2018 before watching this one :D . Can I ask you another question, that was not very clear for me ? For example, let's say that we are going to use two PCA's, is there a way for us to find out which variables these PCA's they represent ?
@statquest 4 years ago
@@alexisaddicted Yes, you look at the loading scores. I talk about these at 10:19
@godsperson5571 3 years ago
Thanks for the "easy" to follow tutorial. I am trying to do a PCA for my RNAseq data but when I run scaled_tmp = StandardScaler().fit_transform(tmp.T) I get an error message: 'could not convert string to float: 'lcl|NC_000913.3_cds_NP_414542.1_1'. The lcl... is my target gene ID and I cannot edit it since I will need it later on to identify specific genes. Please how do I solve this error message?
@statquest 3 years ago
It looks like one of the columns in your matrix is some sort of identifier instead of sequencing data. In the video, when we create the data, we move identifiers to be row names or column names (see: 3:17). Other than the row and column names, the matrix that we do math on can only contain numbers because... how do we do math with identifiers?
@godsperson5571 3 years ago
@@statquest Thanks I was able to rectify the issue. I did not indicate "my gene_id" column as my index when loading data. After setting the index column it now works well.
@statquest 3 years ago
@@godsperson5571 Hooray!
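The fix described in this thread can be reproduced on a toy table. Everything in the CSV below is made up (the column name `gene_id` and the ids `geneA` and `geneB` are placeholders). Without an index column the gene ids stay inside the matrix and the scaler cannot convert them; with `index_col` they become row labels and the matrix is purely numeric.

```python
import io
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical expression table: one id column, three sample columns
csv_text = """gene_id,s1,s2,s3
geneA,5,7,9
geneB,100,80,120
"""

# Loaded without an index: the ids are data, so scaling fails
no_index = pd.read_csv(io.StringIO(csv_text))
try:
    StandardScaler().fit_transform(no_index.T)
    failed = False
except (ValueError, TypeError):
    failed = True

# Loaded with the id column as the index: only numbers remain
with_index = pd.read_csv(io.StringIO(csv_text), index_col='gene_id')
scaled = StandardScaler().fit_transform(with_index.T)
print(failed, scaled.shape, list(with_index.index))
```

The ids are not lost: they remain available in `with_index.index` for looking up specific genes later, for example when labeling loading scores.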
@trustmebaz 6 years ago
Thank you Joshua for this wonderful explanation. Thanks a lot. I am using your code for generating a scree plot in the same way and I obtain this error: bar() missing 1 required positional argument: 'left'
@trustmebaz 6 years ago
Yes, I was using the original code given. I am using Python3, could that be the issue?
@fmetaller 6 years ago
First I want to thank you for all the awesome videos on PCA. I wanted to experiment with the demo code you published but I'm having problems in the data generation. The asterisk method used to stack wt and ko series is not working.
@statquest 6 years ago
Which version of Python are you using? The code was written for Python 3 and I'm not sure the asterisk method works in Python 2.x.
@fmetaller 6 years ago
Oh, you are right, I was accidentally using Python 2. Now it works well in Python 3
@statquest 6 years ago
Hooray! I'm glad you got it working :)
@vannanuon6077 6 years ago
Hi Joshua, Thanks a lot for the clear explanation; you walked me through each step well. I have one question relating to scikit-learn's PCA. Actually, you mentioned it in your clip, but I would like to ask to get a bit clearer. When using scikit-learn's PCA, we must do train and test, is that right? The one in your clip is just part of it, right? I ask this because the result I get following your steps is different from the result from another program (CANOCO). Many thanks and look forward to hearing from you soon.
@vannanuon6077 6 years ago
Thanks a lot Joshua for your clear explanation. I hope to see a new clip from you about train and test with PCA.
@vannanuon6077 6 years ago
I apologise for one more question. I used the script in your video with the data from this link ("archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"). It ran well except for the last step (the chart). It shows the following error. Do you know what happened with the data or the script? TypeError: cannot convert the series to Thanks, and sorry again for another question.
@vannanuon6077 6 years ago
Thanks a lot Joshua for the link. It is really useful.
@MainakDev 10 months ago
at 5:10 why do we scale our data?
@statquest 10 months ago
I explain why we scale the data in this video: kzbin.info/www/bejne/pYPZmKRva5uskMk
@gbchrs 3 years ago
is centering included in sklearn's pca model and that's why there is no extra step to center?
@statquest 3 years ago
I believe so.
@jiayiwu4101 3 years ago
What is the point to look at loading scores at the final step? My understanding is as follows. Each gene is a sample. If their loading scores on PC1 are similar, it means a lot of samples are projecting around a similar position on PC1. So they are clustering apparently. Am I right?
@statquest 3 years ago
In this case, loading scores tell us which genes have the most influence on the PCs. This can tell us which genes have the most variation and are the most useful for determining why the cells cluster the way the do. For more details, see: kzbin.info/www/bejne/fJjEnI2ta7Bkh7M
@jiayiwu4101 3 years ago
@@statquest Thank you! Just found you replied to my response very fast! Wish I knew how to look at those notifications earlier!