Quantile Normalization, Clearly Explained!!!

Views: 75,530

StatQuest with Josh Starmer

Comments: 118
@statquest 2 years ago
Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
@mzheng85 2 years ago
I searched for an hour to get a clear understanding of normalization and you explained it in 30 seconds. Thank you!
@statquest 2 years ago
BAM! :)
@tanishasharma3665 4 years ago
Short, well-explained and so much better than the confusing webpages I was wasting my time on. Saw your video on quantiles and percentiles as well! Thank you so much for these videos, they help me both with my job and statistical knowledge!
@statquest 4 years ago
Glad it was helpful!
@alirezaforoozani7833 6 years ago
Oh how I wish you were my maths teacher, you make it seem so easy! I thank and salute you, sir!
@statquest 6 years ago
Hooray! I'm glad you like the video. :)
@charliejin8620 5 years ago
Thanks! Quite a simple and clear explanation! Much, much better than our lecturers.
@statquest 5 years ago
Thank you! :)
@hieuthepunk 1 year ago
This is the clearest answer I've gotten about normalization. Thank you.
@statquest 1 year ago
Thanks!
@RDFannin3 4 years ago
Good grief, this is good stuff! I wish there was something as clear for learning R code.
@statquest 4 years ago
I have a few videos on R code listed on my website: statquest.org/video-index/
@khaledgeba7099 1 month ago
Thanks so much Josh! Nice explanation, you made my day!!
@statquest 1 month ago
Thank you!
@lunapeverell3602 3 years ago
As always, this is the only way statistics comes easy and bearable to me. Thanks!
@statquest 3 years ago
Happy to help!
@severtone263 9 months ago
Quick, easy and fun. Thank you Josh!
@statquest 9 months ago
Thanks!
@blenderwang5061 2 years ago
You did a great explanation, man! Thank you!
@statquest 2 years ago
Glad you liked it!
@afs208 5 years ago
My mum asked what I was watching when she heard the intro; I didn't know how to say it's an educational video.
@statquest 5 years ago
Ha! :)
@moomoocheng6009 5 years ago
Great video, explained so clearly that even I can understand. I am pretty new to bioinformatics, and I am very confused about the relationship between quantile normalization and MAS5 or RMA normalization.
@shuaishigao6356 7 years ago
Very helpful, that's exactly what I'm looking for! Thanks Joshua.
@veeranagoudayaligar 7 years ago
It looked complicated, but after watching your video it's very simple. Thanks a lot.
@ai1888 6 years ago
At 3:32, one thing I noticed is that the red colored gene for samples 1 and 3 now has the exact same intensity value. In reality this is almost certainly not true. I notice this a lot when performing RMA for microarrays, where quantile normalization compresses smaller fold change differences. Is this just a caveat of the normalization method we just have to accept?
@ai1888 6 years ago
I just checked and RMA does perform a strict quantile normalization just the way you described. Following that it fits a linear model to the normalized data and performs a median polish.
@ai1888 6 years ago
Hooray!
@paveldvorak2014 5 years ago
@Josh, which software do you use for these videos? It looks like PowerPoint, but some more advanced version 😄 👍👍
@statquest 5 years ago
I started out using PowerPoint (and this video was done with PowerPoint). But PowerPoint doesn't work well on my computer so I switched to Apple's "Keynote" program. Now I like Keynote a lot more than PowerPoint.
@kinzarian8926 1 year ago
Thank you!
@statquest 1 year ago
Hooray!!! Thank you for supporting StatQuest!!! BAM! :)
@oanaflorean83 4 years ago
Awesome BAM!! Thx buddy :)
@statquest 4 years ago
Any time!
@muhammadabdullahnabeel6039 6 months ago
@StatQuest Doesn't this normalization remove information? For example, in sample 2 the expression levels are much higher than in sample 1, and that information is not preserved.
@statquest 6 months ago
Yes, some information is lost, but we gain the ability to make a comparison that we didn't have before.
@muhammadabdullahnabeel6039 6 months ago
@@statquest Thanks for the reply! I am still learning and transitioning to computational biology. I will look further into improved methods, if there are any.
@zhaowu3193 2 years ago
Thank you for this simple yet illustrative example of quantile normalization. I would like to know what happens if we have missing values in some of the samples. Can we still do the quantile normalization?
@statquest 2 years ago
That's a good question. I'm pretty sure you would need to impute the missing values first.
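One simple way to impute before normalizing is to fill each sample's missing entries with that sample's median. The sketch below is only an illustration: the function name `impute_with_median`, the use of `None` to mark a missing value, and the example numbers are all assumptions, and real pipelines may prefer other imputation methods.

```python
from statistics import median


def impute_with_median(sample):
    """Replace None entries with the median of the observed values."""
    observed = [x for x in sample if x is not None]
    if not observed:
        raise ValueError("sample has no observed values to impute from")
    fill = median(observed)
    return [fill if x is None else x for x in sample]


# Hypothetical sample with one missing measurement.
sample = [2.0, None, 4.0, 9.0]
filled = impute_with_median(sample)  # the gap is filled with median(2, 4, 9)
```

Note that median imputation creates ties, so the normalization step that follows needs some rule for tied ranks.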
@jdm89s13 5 years ago
So what if I have microarray data for different cohorts, and I am not worried about the specific intensity values, but just want to compare gene expression levels across cohorts (i.e. which samples express a certain gene at a high level versus those which express it at a low level)? Would quantile normalization be a valid way to scale the data prior to clustering?
@statquest 5 years ago
Quantile normalization is commonly used with microarray data, so I would give it a try.
@etzhaim 5 years ago
Thanks for this video. A question: Why perform quantile normalization instead of z-scores?
@statquest 5 years ago
Great question! I think the big difference is that quantiles allow you to compare ranks (i.e. quantiles tell us which measurement was the largest, or the 75th largest, etc.), and z-scores are more quantitative (how many standard deviations away from the mean a given data point is). Test scores are often reported using quantiles since they make it easy to know how your test ranked among the others. If I said your test score was in the top quantile, then you would know your test score was the best. In contrast, if I told you your test score was two standard deviations above the mean, you wouldn't know if it was the best or not... Does that make sense? There are also statistical tests that work well with rank data (quantiles), and those might be more appropriate in certain situations - but explaining all that detail might be better done in a video rather than a comment.... :)
@pratapseshachalam2859 5 years ago
Nice video. The order of genes is preserved. My question is: after quantile normalization, gene expression is shown at the same level across the samples. Then how can you see the difference among the samples for a given gene?
@statquest 5 years ago
@@pratapseshachalam2859 Like I said in the previous response, there are statistical tests that work with rank data, which is what you have after quantile normalization. That's a subject for another StatQuest. In the meantime, check out the Mann-Whitney U test: en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test
@pratapseshachalam2859 5 years ago
@@statquest Thanks a lot :)
@HR-yd5ib 5 years ago
@@statquest Actually, since the ranks are not changed by QN, you can do rank tests just as well on the original data. I think the point is that one assumes that the total mRNA distribution in a cell/sample doesn't change with condition. Hence the mRNA distributions from all samples are made identical. This should improve subsequent t-tests, FC computations, etc.
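For readers curious about the rank-based test mentioned in this thread, here is a minimal sketch of the Mann-Whitney U statistic in plain Python. It computes only the U values (no p-value); the function name and the example intensities are hypothetical, and for a real analysis a statistics package would be the safer choice.

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) for two groups, giving tied values their average rank."""
    # Pool both groups, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in group_a] + [(v, 1) for v in group_b])

    # Assign 1-based ranks; a run of tied values shares the mean of its ranks.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    rank_sum_a = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    n_a, n_b = len(group_a), len(group_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2  # pairs where a beats b (ties count half)
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical intensities for one gene in two conditions.
control = [3.0, 5.0, 6.0]
treated = [9.0, 10.0, 12.0]
u_control, u_treated = mann_whitney_u(control, treated)
```

In this made-up example no control value exceeds any treated value, so `u_control` is 0 and `u_treated` equals the number of pairs, 9. Only the ordering of the values matters, not their size.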
@dorjexx 4 years ago
Thank you, Josh. BUT, I have a question about 'BUT': at the end of the video, you said: "after quantile normalization, the values for each sample are the same... BUT, the original gene orders are preserved." If the values are the same, the orders are the same, right? So, why use 'but'?
@statquest 4 years ago
At 3:43 you see that the original values are on the left and the normalized values are on the right. After normalization, each sample (1, 2 and 3) has the same set of values; this is what I meant by "same". These shared values, however, are different from the original values on the left. So the "but" means that even though we changed the values so that every sample has the same set, within each sample the order of the genes on the right is the same as the order on the left.
@dorjexx 4 years ago
@@statquest Thank you very much, Josh. Now I see. ;) Cheers.
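The procedure discussed in this thread can be sketched in a few lines of Python: sort each sample, average the values at each rank across samples, then give every gene the average for its rank. This is a minimal sketch, assuming every sample measures the same genes and breaking ties by position (real implementations often average tied ranks); the function name and the intensities are made up and are not the values from the video.

```python
def quantile_normalize(samples):
    """Quantile-normalize equal-length samples (one list of gene values per sample)."""
    n_genes = len(samples[0])
    if any(len(s) != n_genes for s in samples):
        raise ValueError("all samples must measure the same number of genes")

    # Sort each sample, then average across samples at each rank position.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [sum(col) / len(samples) for col in zip(*sorted_samples)]

    normalized = []
    for s in samples:
        # order[r] is the index of the gene that holds rank r (0 = lowest) in this sample.
        order = sorted(range(n_genes), key=lambda i: s[i])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical intensities: 3 samples x 4 genes.
raw = [
    [2.0, 5.0, 4.0, 3.0],   # sample 1
    [4.0, 14.0, 8.0, 9.0],  # sample 2
    [3.0, 8.0, 5.0, 4.0],   # sample 3
]
norm = quantile_normalize(raw)
```

Here every sample ends up with the same four values (3, 5, 6, 9), yet within each sample the genes keep their original order from lowest to highest. Samples 1 and 3 become identical even though their raw values differed, which is the loss of detail several commenters ask about.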
@糜家睿 7 years ago
Hi, Joshua. Could you talk about the statistical methods that are used in single-cell RNA-seq, especially the normalization methods and the differences in analysis between bulk RNA-seq and single-cell RNA-seq?
@糜家睿 7 years ago
Yeah, I have gone through some online tutorials about single-cell RNA-seq. But most of them just talk about how to run the code and the subsequent fancy data visualization. The basic statistical methods are more important, especially considering there are still quite a lot of differences between bulk and single-cell. Very much looking forward to your upcoming videos!!!
@msumode4493 4 years ago
Thank you so much Josh.
@saptashwachatterjee6875 4 years ago
Please do a video on quantile regression.
@leesweets4110 2 years ago
And if the data sets in each sample have different numbers of genes? How do you quantile normalize between sets of different sizes? My first thought, before the real explanation in this video started, was that we'd simply scale and shift the data in each sample according to their own standard deviations and means. This would preserve order, fix the means, and preserve relative intensities within each sample.
@statquest 2 years ago
Regardless of the number of genes in each sample, you can match the quantiles.
@leesweets4110 2 years ago
@@statquest But that doesn't explain where to place the point on the graph.
@statquest 2 years ago
@@leesweets4110 Ah. OK, now I understand the question better. Since the datasets have different sizes, you need to look at quantile normalization with missing values. I believe one commonly used approach is to interpolate the missing values first, to equalize the datasets, and then apply quantile normalization as described in this video.
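One possible reading of "interpolate to equalize the datasets" is to put each sample's sorted values onto a shared grid of quantiles by linear interpolation, so that samples of different sizes can be averaged rank by rank. The sketch below shows only that resampling step; the function name `resample_sorted` and the numbers are assumptions, not a method confirmed in the video.

```python
def resample_sorted(sample, n_points):
    """Linearly interpolate a sample's sorted values onto n_points evenly spaced quantiles."""
    if n_points < 2 or not sample:
        raise ValueError("need a non-empty sample and at least 2 points")
    values = sorted(sample)
    out = []
    for k in range(n_points):
        # Fractional index into `values` for the k-th quantile on the shared grid.
        pos = k * (len(values) - 1) / (n_points - 1)
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


# Hypothetical case: a 3-gene sample stretched onto a 5-point grid.
short_sample = [5.0, 1.0, 3.0]
grid = resample_sorted(short_sample, 5)
```

With every sample resampled to the same number of points, the rank-by-rank averaging can proceed as usual. Mapping each original gene back to a normalized value would need a second interpolation step that is not shown here.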
@mrcoolgs100 6 years ago
Very good explanation! Thank you!
@user-or7ji5hv8y 4 years ago
Just wondering, could we not normalize each data set into standard normal by using its respective mean and standard deviation?
@statquest 4 years ago
That's definitely a common way to normalize things.
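The mean-and-standard-deviation approach asked about here (z-scores) can be sketched as follows. The function name and the sample values are hypothetical. Note that z-scoring only shifts and rescales a sample: it sets the mean to 0 and the standard deviation to 1, but it does not change the shape of the distribution, so two samples will generally not end up with identical values the way they do after quantile normalization.

```python
from statistics import mean, pstdev


def z_scores(sample):
    """Shift a sample to mean 0 and scale it to standard deviation 1."""
    mu = mean(sample)
    sigma = pstdev(sample)  # population standard deviation
    if sigma == 0:
        raise ValueError("cannot z-score a sample with zero spread")
    return [(x - mu) / sigma for x in sample]


# Hypothetical sample with mean 5 and population standard deviation 2.
sample_1 = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z1 = z_scores(sample_1)
```

The lowest value here maps to -1.5 and the highest to 2.0, so the spacing between measurements is kept, unlike with rank-based methods.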
@grantsmith3653 4 years ago
Great vid!
@statquest 4 years ago
:)
@amberrose8965 2 years ago
I appreciate this!
@statquest 2 years ago
Thanks!
@shubha1Ana2 2 years ago
Hello Sir, I have a question. Is image segmentation for finding the size, shape, and pleomorphism of nuclei always necessary for classification of H&E WSI? If we use deep learning networks, can we pass H&E images (maybe rescaled) as they are, without segmentation? Kindly answer if possible.
@statquest 2 years ago
My series of videos on neural networks, which includes image classification, might help: kzbin.info/www/bejne/eaKyl5xqZrGZetk
@gpgor 5 years ago
How about median normalization?
@cjgilmore283 1 year ago
THANK YOU you're amazing
@statquest 1 year ago
Thanks!
@mpat53 4 years ago
Hi John, I have a question. I have been provided with a table of quantile normalized read data (RNA-seq). I want to proceed using the online program IDEP. Should I enter this data as 'read count data' or as 'normalised expression values eg RNA seq FPKM, microarray etc'? I think it's the second one as it is quantile normalised, but I'm not fully sure as it's not FPKM. Thanks.
@statquest 4 years ago
I think the "etc" in the second option covers the quantile normalization. You can always email support or the authors just to be sure.
@stephenpower6876 3 years ago
Hi Josh, great video. I'm very new to bioinformatics / statistics; I've been provided with a massive RNASeq dataset, and I've no idea if the data is quantile normalised or not. Do you know of any handy way I can check to see if quantile normalisation has been performed?
@statquest 3 years ago
Does the highest expressed gene in each sample have the exact same value across samples? If so, it is probably quantile normalized.
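The check suggested in this reply can be extended from the top gene to every rank: after a strict quantile normalization, each sample's sorted values should match. A minimal sketch, with a hypothetical function name and a small tolerance for rounding (implementations that average tied ranks may not give exactly identical distributions, so a `False` result is not conclusive):

```python
def looks_quantile_normalized(samples, tol=1e-9):
    """Return True if every sample has the same sorted values (within tol)."""
    reference = sorted(samples[0])
    for s in samples[1:]:
        values = sorted(s)
        if len(values) != len(reference):
            return False
        if any(abs(a - b) > tol for a, b in zip(values, reference)):
            return False
    return True


# Hypothetical data: one normalized table and one raw table.
normalized_table = [[3.0, 9.0, 6.0, 5.0], [3.0, 9.0, 5.0, 6.0]]
raw_table = [[2.0, 5.0, 4.0, 3.0], [4.0, 14.0, 8.0, 9.0]]
is_normalized = looks_quantile_normalized(normalized_table)
is_raw_normalized = looks_quantile_normalized(raw_table)
```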
@lucarauchenberger628 3 years ago
Finally, I got it now!!
@statquest 3 years ago
Hooray!
@danielwiczew 4 years ago
A question: couldn't we just normalize the data in the y axis, by turning it into 0 mean and 1 variance? Then the scale on the y axis would be 0 ... 1.
@statquest 4 years ago
Sure, you could do that, but that would be a different type of normalization. There are lots of ways to normalize data, and quantile normalization is just one of them.
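For comparison, a 0 to 1 range comes from min-max scaling rather than from setting the mean to 0 and the variance to 1 (standardized values can fall outside 0 to 1, and are negative below the mean). A minimal sketch of min-max scaling, with a hypothetical function name and made-up values:

```python
def min_max_scale(sample):
    """Rescale a sample linearly so its minimum maps to 0 and its maximum to 1."""
    lo, hi = min(sample), max(sample)
    if hi == lo:
        raise ValueError("cannot rescale a sample whose values are all equal")
    return [(x - lo) / (hi - lo) for x in sample]


# Hypothetical intensities for one sample.
scaled = min_max_scale([2.0, 4.0, 10.0])
```

Like z-scoring, this is a linear rescaling of each sample on its own, so a single extreme value stretches the scale for everything else, which is one reason rank-based methods are sometimes preferred.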
@ranjeetkumar273216 6 years ago
Hi, nice explanation. Could you talk about the difference between PCA and factor analysis?
@statquest 6 years ago
One day I'll do that. Right now I'm gearing up to cover lasso and ridge regression techniques. Those videos should be out by the end of September.
@fkhan4504 6 years ago
Thanks for making the video
@statquest 6 years ago
I'm glad you like it! I'll make more! :)
@hengdezhu2832 5 years ago
Thank you. I have a question: does the same color in each sample represent the same gene measured in different experiments?
@statquest 5 years ago
Yes. One color per gene.
@hengdezhu2832 5 years ago
@@statquest What if different samples have different numbers of genes? How do you do quantile normalization then? For example, Sample1 has 3 genes: A, B, C. Sample2 has 4 genes: A, B, C, D. Can I set gene D in Sample1 to zero and do the quantile normalization?
@statquest 5 years ago
@@hengdezhu2832 That might work, but, to be honest, I'm not sure what is best in this situation.
@hengdezhu2832 5 years ago
@@statquest Ok, thank you so much!
@omarabdelrahman3739 3 years ago
How about a quantile regression video?...PLEASE?
@statquest 3 years ago
Noted
@ChadMc74 4 years ago
Is this similar to blocking?
@statquest 4 years ago
I'm not sure what you mean. Can you elaborate on your question?
@urjaswitayadav3188 7 years ago
Great video. Thank you!
@TheEbbemonster 5 years ago
What is the purpose for doing this?
@statquest 5 years ago
It helps normalize data when you have a lot of technical noise.
@SergeySenigov 1 year ago
Say three perfume experts rate 4 new perfumes. It is known that absolute scores are less reliable than relative ones. So we want to average the equally ranked absolute perfume scores and preserve the relative order. Now suppose there is very little distance between the 2nd and 3rd ranks. Then we cannot confidently choose between blue (ranks 1, 2, 2) and yellow (2, 1, 3) because ranks 2 and 3 are close. Presumably we should bring in a fourth expert. However, if the distance between the 2nd and 3rd ranks is large, we can confidently choose blue.
@adetayoaborisade9346 4 years ago
Double Bam
@statquest 4 years ago
:)
@abdrnasr 4 years ago
Is there an example where this can be helpful?
@statquest 4 years ago
I believe quantile normalization was invented for microarrays (a method for measuring gene expression). However, I've seen it used in other situations when people wanted a non-parametric way to remove batch effects.
@alecvan7143 5 years ago
Amazing!
@statquest 5 years ago
:)
@MrWater2 11 months ago
Pros and cons of this normalization?
@statquest 11 months ago
Pros: no worries about outliers. Cons? You lose a lot of nuance in the data.
@MrWater2 11 months ago
@@statquest Yep! But what I don't understand is that the data (values) after the transformation are the same across variables? It makes no sense to me; I probably misunderstood something.
@statquest 11 months ago
@@MrWater2 In this case, what is important is the relative position and ranking of each measurement, rather than its actual value. Lots of non-parametric statistical tests can be performed on ranks.
@MrWater2 11 months ago
Aha, perfect. But I guess I can't use it as a preprocessing step in statistical learning, because the transformed matrix must be ill-conditioned. Right?
@illiap3865 4 years ago
But doesn't it erase information about how measurements compare to each other in one sample?
@statquest 4 years ago
You still retain information about rank (i.e. gene X is higher than gene Y), but you can no longer quantify the difference. However, you wouldn't quantile normalize in the first place if you were only interested in the values within a single sample.
@eiderdiaz7219 4 years ago
I love it
@statquest 4 years ago
Hooray! :)
@hamade7997 5 years ago
You are a fucking king.
@ayoubbakar7907 6 years ago
triple baaaam
@statquest 6 years ago
That's right! :)
@Barbirose 5 years ago
Ez pz so ez ur explanation sucks tho