@mzheng85 2 years ago
I searched for an hour to get a clear understanding of normalization and you explained it in 30 seconds. Thank you!
@statquest 2 years ago
BAM! :)
@tanishasharma3665 4 years ago
Short, well-explained and so much better than the confusing webpages I was wasting my time on. Saw your video on quantiles and percentiles as well! Thank you so much for these videos, they help me both with my job and statistical knowledge!
@statquest 4 years ago
Glad it was helpful!
@alirezaforoozani7833 6 years ago
Oh how I wish you were my maths teacher, you make it seem so easy! I thank and salute you, sir!
@statquest 6 years ago
Hooray! I'm glad you like the video. :)
@charliejin8620 5 years ago
thanks! quite a simple and clear explanation! much much better than our lecturers
@statquest 5 years ago
Thank you! :)
@hieuthepunk 1 year ago
This is the clearest answer I've gotten about normalization. Thank you
@statquest 1 year ago
Thanks!
@RDFannin3 4 years ago
Good grief, this is good stuff! I wish there was something as clear for learning R code
@statquest 4 years ago
I have a few videos on R code listed on my website: statquest.org/video-index/
@khaledgeba7099 1 month ago
Thanks so much Josh! Nice Explanation, you made my day!!
@statquest 1 month ago
Thank you!
@lunapeverell3602 3 years ago
As always, this is the only way statistics comes easy and non unbearable to me. Thanks!
@statquest 3 years ago
Happy to help!
@severtone263 9 months ago
Quick, easy and fun. Thank you Josh!
@statquest 9 months ago
Thanks!
@blenderwang5061 2 years ago
You did a great explanation, man! Thank you!
@statquest 2 years ago
Glad you liked it!
@afs208 5 years ago
my mum asked what am I watching when she heard the intro, didn't know how to say it's an educational video
@statquest 5 years ago
Ha! :)
@moomoocheng6009 5 years ago
Great video, explained so clearly that even I can understand. I am pretty new to bioinformatics, and I am very confused about the relationship between quantile normalization and MAS5 or RMA normalization.
@shuaishigao6356 7 years ago
Very helpful, that's exactly what I'm looking for! Thanks Joshua.
@veeranagoudayaligar 7 years ago
It looked complicated, but after watching your video, umm, it's very simple. Thanks a lot.
@ai1888 6 years ago
At 3:32, one thing I noticed is that the red colored gene for samples 1 and 3 now have the exact same intensity values. In reality this is almost certainly not true. I notice this a lot when performing RMA for microarrays, where quantile normalization compresses smaller fold change differences. Is this just a caveat of the normalization method we just have to accept?
@ai1888 6 years ago
I just checked and it does perform a strict quantile normalization just the way you described. Following that it fits a linear model to the normalized data and performs a median polish.
@ai1888 6 years ago
Hooray!
@paveldvorak2014 5 years ago
@Josh, which software do you use for these videos? It looks like PowerPoint, but some more advanced version 😄 👍👍
@statquest 5 years ago
I started out using PowerPoint (and this video was done with PowerPoint). But PowerPoint doesn't work well on my computer so I switched to Apple's "Keynote" program. Now I like Keynote a lot more than PowerPoint.
@kinzarian8926 1 year ago
Thank you!
@statquest 1 year ago
Hooray!!! Thank you for supporting StatQuest!!! BAM! :)
@oanaflorean83 4 years ago
Awesome BAM!! Thx buddy :)
@statquest 4 years ago
Any time!
@muhammadabdullahnabeel6039 6 months ago
@StatQuest Doesn't this normalization remove information? For example, in sample 2, the levels of expression are too high compared to sample 1 and we can't conserve this information.
@statquest 6 months ago
Yes, some information is lost, but we gain the ability to make a comparison that we didn't have before.
@muhammadabdullahnabeel6039 6 months ago
@@statquest Thanks for the reply! I am still learning and transitioning to computational biology. I will further research improved methods if there are any.
@zhaowu3193 2 years ago
Thank you for this simple yet illustrative example of quantile normalization. I would like to know what happens if we have missing values in some of the samples. Can we still do the quantile normalization?
@statquest 2 years ago
That's a good question. I'm pretty sure you would need to impute the missing values first.
@jdm89s13 5 years ago
So what if I have microarray data for different cohorts, and I am not worried about the specific intensity values, but just want to compare gene expression level across cohorts (i.e. which samples express a certain gene high versus those which express it low)? Would quantile normalization be a valid way to scale the data prior to clustering?
@statquest 5 years ago
Quantile normalization is commonly used with microarray data, so I would give it a try.
@etzhaim 5 years ago
Thanks for this video. A question: Why perform quantile normalization instead of z-scores?
@statquest 5 years ago
Great question! I think the big difference is quantiles allow you to compare ranks (i.e. quantiles tell us which measurement was the largest, or the 75th largest etc), and z-scores are more quantitative (how many standard deviations away from the mean a given data point is). Test scores are often reported using quantiles since they make it easy to know how your test ranked among the others. If I said your test score was the top quantile, then you would know your test score was the best. In contrast, if I told you your test score was two standard deviations above the mean, you wouldn't know if it was the best or not... Does that make sense? There are also statistical tests that work well with rank data (quantiles), and those might be more appropriate in certain situations - but explaining all that detail might be better done in a video rather than a comment.... :)
@pratapseshachalam2859 5 years ago
Nice video. The order of genes is preserved. My doubt is that gene expression is shown at the same level across the samples after quantile normalisation. Then how can you see the difference among the samples for a gene?
@statquest 5 years ago
@@pratapseshachalam2859 Like I said in the previous response, there are statistical tests that work with rank data, which is what you have after quantile normalization. That's a subject for another StatQuest. In the meantime, check out the Mann-Whitney U test: en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test
@pratapseshachalam2859 5 years ago
@@statquest Thanks a lot :)
@HR-yd5ib 5 years ago
@@statquest, actually, since the ranks are not changed by QN, you can do rank tests just as well on the original data. I think the point is that one assumes that the total mRNA distribution in a cell/sample doesn't change with condition. Hence all mRNA distributions from all samples are made identical. This should improve subsequent t-tests, FC computations etc.
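For anyone curious about the Mann-Whitney U test mentioned in this thread, here is a minimal Python sketch of how the U statistic can be computed from ranks. It is only an illustration: the function names `average_ranks` and `mann_whitney_u` and the example values are made up for this sketch, tied values are assumed to share the average of their ranks, and no p-value is computed.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U1, U2) for two groups of measurements (hypothetical helper)."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    u1 = rank_sum_x - len(x) * (len(x) + 1) / 2
    u2 = len(x) * len(y) - u1
    return u1, u2


# Made-up example: every value in the first group is below the second group.
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))  # (0.0, 9.0)
```

Because only the ranks enter the calculation, any transformation that keeps the ordering of the pooled values leaves U unchanged.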
@dorjexx 4 years ago
Thank you, Josh. BUT, I got a question about 'BUT': at the end of the video, you said: "after quantile normalization, the values for each sample are the same... BUT, the original gene orders are preserved." if the values are the same, the orders are the same, right? So, why use 'but'?
@statquest 4 years ago
At 3:43 you see that the original values are on the left and the normalized values are on the right. The normalized values are all equal for each sample (1, 2 and 3), this is what I meant by "same". These equal values, however, are different from the original values on the left. So the "but" means that even though we changed the values to be all equal, the order of the values on the right is the same as the order of the values on the left.
@dorjexx 4 years ago
@@statquest Thank you very much, Josh. Now I see. ;) Cheers.
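The "same values, but original order preserved" idea from this thread can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the video: the function name `quantile_normalize` and the numbers are made up, every sample is assumed to have the same number of genes with no missing values, and ties within a sample are broken by position rather than averaged.

```python
def quantile_normalize(samples):
    """Quantile-normalize equally sized samples (one list of values per sample)."""
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same number of genes")

    # Sort each sample, then average across samples at each rank.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [sum(col) / len(samples) for col in zip(*sorted_samples)]

    normalized = []
    for s in samples:
        # order[r] is the index of the gene that holds rank r in this sample.
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, gene in enumerate(order):
            out[gene] = rank_means[rank]
        normalized.append(out)
    return normalized


# Made-up intensities: three samples, four genes each (same gene order).
samples = [
    [2.0, 5.0, 4.0, 3.0],
    [4.0, 14.0, 8.0, 9.0],
    [3.0, 8.0, 6.0, 5.0],
]
normalized = quantile_normalize(samples)
for row in normalized:
    print(row)
```

After the call, every sample contains the same set of values (the rank means), but within each sample the gene that was highest before is still highest, the second highest is still second, and so on.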
@糜家睿 7 years ago
Hej, Joshua. Could you talk about the statistical methods that are used in single-cell RNA-seq, especially the normalization methods we used and the difference in analysis between bulk-RNA-seq and single-cell RNA-seq.
@糜家睿 7 years ago
Yeah, I have gone through some online tutorials about single-cell RNA-seq. But most of them just talk about how to run the code and the subsequent fancy data visualization. The basic statistical methods are more important, especially considering there are still quite a lot of differences between bulk and single-cell. Really looking forward to your upcoming videos!!!
@msumode4493 4 years ago
Thank you so much Josh.
@saptashwachatterjee6875 4 years ago
Please do a video on quantile regression
@leesweets4110 2 years ago
And if the data sets in each sample have different numbers of genes? How do you quantile normalize between sets of different sizes? My first thought before starting the real explanation of this video... was that we'd simply scale and shift the data in each sample according to their own standard deviations and means. This would preserve order, fix the means, and preserve relative intensities within each sample.
@statquest 2 years ago
Regardless of the number of genes in each sample, you can match the quantiles.
@leesweets4110 2 years ago
@@statquest but that doesn't explain where to place the point on the graph.
@statquest 2 years ago
@@leesweets4110 Ah. Ok, now I understand the question better. Since the datasets have different sizes, you need to look at quantile normalization with missing values. I believe one commonly used approach is to interpolate the missing values first, to equalize the datasets, and then apply quantile normalization as described in this video.
@mrcoolgs100 6 years ago
very good explanation! thank you!
@user-or7ji5hv8y 4 years ago
just wondering, could we not normalize each data set into standard normal by using its respective mean and standard deviation?
@statquest 4 years ago
That's definitely a common way to normalize things.
@grantsmith3653 4 years ago
Great vid!
@statquest 4 years ago
:)
@amberrose8965 2 years ago
I appreciate this!
@statquest 2 years ago
Thanks!
@shubha1Ana2 2 years ago
Hello Sir, I have a doubt. Is image segmentation for finding the size, shape, and pleomorphism of nuclei always necessary for classification of H&E WSI? If we use deep learning networks, can we pass H&E images (maybe rescaled) as they are, without segmentation? Kindly answer if possible.
@statquest 2 years ago
My series of videos on neural networks, which includes image classification, might help: kzbin.info/www/bejne/eaKyl5xqZrGZetk
@gpgor 5 years ago
How about median normalization?
@cjgilmore283 1 year ago
THANK YOU you're amazing
@statquest 1 year ago
Thanks!
@mpat53 4 years ago
Hi John, I have a question: I have been provided with a table of quantile normalized read data (RNA seq). I want to progress using the program IDEP online. Should I enter this data as 'read count data' or as 'normalised expression values eg RNA seq FPKM, microarray etc'? I think it's the second one as it is quantile normalised, but I'm not fully sure as it's not FPKM. Thanks
@statquest 4 years ago
I think the "etc" in the second option covers the quantile normalization. You can always email support or the authors just to be sure.
@stephenpower6876 3 years ago
Hi Josh, great video. I'm very new to bioinformatics / statistics; I've been provided with a massive RNASeq dataset, and I've no idea if the data is quantile normalised or not. Do you know of any handy way I can check to see if quantile normalisation has been performed?
@statquest 3 years ago
Do all of the highest expressed genes in each sample have the exact same value? If so, it is probably quantile normalized.
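The check suggested in this reply can be sketched in Python. The version below goes one step further than comparing only the highest value: it compares the full sorted list of values across samples, which is an assumption on my part about what a stricter check could look like. The function name `looks_quantile_normalized`, the tolerance, and the example numbers are all hypothetical, and data normalized with special handling of ties or missing values may not match exactly.

```python
def looks_quantile_normalized(samples, tol=1e-9):
    """Return True if every sample has the same sorted values (within tol)."""
    reference = sorted(samples[0])
    for s in samples[1:]:
        other = sorted(s)
        if len(other) != len(reference):
            return False
        if any(abs(a - b) > tol for a, b in zip(reference, other)):
            return False
    return True


# Made-up examples: raw intensities versus quantile-normalized intensities.
raw = [[2.0, 5.0, 4.0, 3.0], [4.0, 14.0, 8.0, 9.0]]
normalized = [[3.0, 9.0, 6.5, 5.5], [3.0, 9.0, 5.5, 6.5]]

print(looks_quantile_normalized(raw))         # False
print(looks_quantile_normalized(normalized))  # True
```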
@lucarauchenberger628 3 years ago
finally, I got it now!!
@statquest 3 years ago
Hooray!
@danielwiczew 4 years ago
A question: couldn't we just normalize the data in the y axis, by turning it into 0 mean and 1 variance? Then the scale on the y axis would be 0 ... 1.
@statquest 4 years ago
Sure, you could do that, but that would be a different type of normalization. There are lots of ways to normalize data, and quantile normalization is just one of them.
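As a rough illustration of the "0 mean and 1 variance" normalization discussed in this thread, here is a small Python sketch of per-sample standardization (z-scores). The function name `standardize` and the numbers are made up, and the population standard deviation (`pstdev`) is an assumed choice. Note that the resulting values are centered on 0 and are not confined to the range 0 to 1.

```python
from statistics import mean, pstdev


def standardize(values):
    """Shift and scale one sample so it has mean 0 and variance 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumed choice)
    return [(v - mu) / sigma for v in values]


# Made-up sample with mean 5 and population standard deviation 2.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = standardize(sample)
print(z)
```

Unlike quantile normalization, this keeps the relative spacing of the values within each sample, but samples with differently shaped distributions will still have differently shaped distributions afterwards.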
@ranjeetkumar273216 6 years ago
Hi, nice explanation. Could you talk about the difference between PCA and factor analysis?
@statquest 6 years ago
One day I'll do that. Right now I'm gearing up to cover lasso and ridge regression techniques. Those videos should be out by the end of September.
@fkhan4504 6 years ago
Thanks for making the video
@statquest 6 years ago
I'm glad you like it! I'll make more! :)
@hengdezhu2832 5 years ago
Thank you. Got a question: the same color in each sample represents the same gene measured in different experiments, is that right?
@statquest 5 years ago
Yes. One color per gene.
@hengdezhu2832 5 years ago
@@statquest What if different samples have different numbers of genes? How do you do quantile normalization then? For example, Sample1 has 3 genes, A, B, C. Sample2 has 4 genes, A, B, C, D. Can I set gene D in Sample1 to zero and do the quantile normalization?
@statquest 5 years ago
@@hengdezhu2832 That might work, but, to be honest, I'm not sure what is best in this situation.
@hengdezhu2832 5 years ago
@@statquest Ok, thank you so much!
@omarabdelrahman3739 3 years ago
How about a quantile regression video?...PLEASE?
@statquest 3 years ago
Noted
@ChadMc74 4 years ago
Is this similar to blocking?
@statquest 4 years ago
I'm not sure what you mean. Can you elaborate on your question?
@urjaswitayadav3188 7 years ago
Great video. Thank you!
@TheEbbemonster 5 years ago
What is the purpose for doing this?
@statquest 5 years ago
It helps normalize data when you have a lot of technical noise.
@SergeySenigov 1 year ago
Say three perfume experts rate 4 new perfumes. It is known that absolute scores are less reliable than relative ones. So we want to average equally ranked absolute perfume scores and preserve the relative ones. Now suppose we have very little distance between the 2nd and 3rd ranks. Then we cannot confidently choose between blue (ranks 1, 2, 2) and yellow (2, 1, 3) because ranks 2 and 3 are close. Presumably we should engage a fourth expert. However, if the distance between the 2nd and 3rd ranks is large, we confidently choose blue.
@adetayoaborisade9346 4 years ago
Double Bam
@statquest 4 years ago
:)
@abdrnasr 4 years ago
Is there an example where this can be helpful ?
@statquest 4 years ago
I believe quantile normalization was invented for microarrays (a method for measuring gene expression). However, I've seen it used in other situations when people wanted a non-parametric way to remove batch effects.
@alecvan7143 5 years ago
Amazing!
@statquest 5 years ago
:)
@MrWater2 11 months ago
Pros and cons of this normalization?
@statquest 11 months ago
Pros, no worries about outliers. Cons? You lose a lot of nuance in the data.
@MrWater2 11 months ago
@@statquest Yep! But what I don't understand is that the data (values) after the transformation are the same across variables? It makes no sense to me; probably I misunderstood something.
@statquest 11 months ago
@@MrWater2 In this case, what is important is the relative position and ranking of each measurement, rather than its actual value. Lots of non-parametric statistical tests can be performed on ranks.
@MrWater2 11 months ago
Aha, perfect. But I guess I can't use it as a preprocessing step in statistical learning, because the transformed matrix must be ill-conditioned. Right?
@illiap3865 4 years ago
But doesn't it erase information about how measurements compare to each other in one sample?
@statquest 4 years ago
You still retain information about rank (i.e. gene X is higher than gene Y), but you can no longer quantify the difference. However, you wouldn't quantile normalize in the first place if you were only interested in the values within a single sample.