Normalization methods for single-cell RNA-Seq data (high-level overview)

13,411 views

Florian Wagner

1 day ago

Comments: 46

@shahidwani7586 2 years ago

Hey Florian, this is the best video explaining single-cell normalization. 👍🏽👍🏽👍🏽

@abasu000 3 years ago

Clear and accessible explanation, thanks for the tutorial.

@cuiwang743 4 months ago

Thank you very much for posting this precious video! It makes things so much easier for beginners!

@derricmorgan2282 3 years ago

Thanks a lot, really useful even after having read several papers and articles concerning the matter.

@muratseker6406 3 years ago

Thank you for the video, it is clearly explained! Looking forward to seeing more videos on scRNA-Seq :)

@Jenkins-f7s 1 year ago

Wow, you're one of my new YT favorites, work-wise. Just FYI, they are shamelessly adding ads!

@Jenkins-f7s 1 year ago

Just a question about scaling. Shouldn't the amount of RNA be used in cell type characterization, or in quality control? Seems weird to scale it all away.

@wasima4463 2 years ago

The example data structures are transposed relative to the theoretical data structure (1:38), which creates confusion.

@marcelochocki6281 2 years ago

Thank you so much for that video. Keep going :)

@I_Explain_Research 9 months ago

1:56 @florianwagner1255 Could you please explain how to generate this plot with 10X scRNA-seq data in R?

@nikakhoshnevis6574 1 year ago

Thank you so, so much. Very informative, and it cleared up things I was confused about.

@pancake9191 2 years ago

For your example at 10:15, if you assume this matrix has already been through scaling, why is the total number of reads in the two cells still so different?

@asunnyday3749 10 months ago

Well done

@Yglandir 3 years ago

Hi Florian, thanks for the great explanation. I have a question though: in your last example concerning Pearson residuals, how do you get to these numbers? If I try to follow the formula mentioned on the slide before, I get different results. Did you simplify that formula and instead use the formula stated in Hafemeister and Satija (2019) or Lause et al. (2021) for the calculations? Did you do something else, or am I just confused?

@florianwagner1255 3 years ago

Thank you! I could be confused, you could be confused, or we could both be confused :) Can you tell me why you think my math is off? For gene 1 I calculated a mean of 4, so you divide all the measurements by sqrt(4)=2. 8/2=4. For gene 2 I calculated a mean of 0.09, so you divide all the measurements by sqrt(0.09)=0.3. 4.5/0.3=15. Does that make sense?

@Yglandir 3 years ago

@@florianwagner1255 Thanks for your quick response! My confusion originates in the question how do you calculate the mean expression for each gene? For me the mean of gene 1 is (0+8+8)/3 = 5.333 and gene 2 (0+0+4.5)/3 = 1.5. Therefore "my" pearson residuals are 8/sqrt(5.333) = 3.46 (gene1) and 4.5/sqrt(1.5) = 3.67 (gene 2).

@Yglandir 3 years ago

I think I finally found my mistake! I did not take the percentages into account. If I do, then my mean for gene 1 is 0*0.5 + 8*0.48 + 8*0.02 = 4. Following the same logic, the mean for gene 2 is 4.5*0.02 = 0.09. Thanks for helping me find my mistake! =)

@florianwagner1255 3 years ago

@@Yglandir oh I think you are ignoring the cell type proportions specified in the example... Gene 1 has an expression of 8 in exactly 50% of the cells and 0 in the other 50%, so the mean is 4. Similarly, Gene 2 is only expressed in 2% of the cells. I hope that makes sense.
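
The arithmetic in this thread can be checked with a short sketch. It assumes the simplified weighting described in the replies above (each value divided by the square root of the gene's mean across all cells), not the full Pearson residual formula from the cited papers. The function names and the two-group layout are illustrative only.

```python
import math

def weighted_mean(groups):
    """Mean expression over all cells.

    groups is a list of (fraction_of_cells, expression) pairs whose
    fractions are assumed to sum to 1.
    """
    return sum(fraction * expression for fraction, expression in groups)

def weight_by_sqrt_mean(value, mean):
    """Simplified weighting from the thread: value / sqrt(gene mean)."""
    return value / math.sqrt(mean)

# Gene 1: expression 8 in 50% of the cells, 0 in the other 50%.
gene1_mean = weighted_mean([(0.5, 8.0), (0.5, 0.0)])
# Gene 2: expression 4.5 in 2% of the cells, 0 in the rest.
gene2_mean = weighted_mean([(0.02, 4.5), (0.98, 0.0)])

gene1_weighted = weight_by_sqrt_mean(8.0, gene1_mean)  # 8 / sqrt(4)
gene2_weighted = weight_by_sqrt_mean(4.5, gene2_mean)  # 4.5 / sqrt(0.09)

print(round(gene1_mean, 6), round(gene1_weighted, 6))
print(round(gene2_mean, 6), round(gene2_weighted, 6))
```

An unweighted average such as (0+8+8)/3 treats every listed value as if it represented the same number of cells, which is where the discrepancy in the thread came from.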

@Jenkins-f7s 1 year ago

Hi Yglandir! Thanks for open honest questioning - scientists need to do this more. Might I ask where you're from?

@tommasogiacomello7870 1 year ago

Hi! Really clear explanation, thanks a lot, it was very useful. I have a question: how do I choose the scaling factor?

@Mirabell97 2 years ago

Hey! Thanks for the great explanation, helped a ton! Did I get that correctly, that for Pearson-residual based normalization, no Scaling is done prior to the multiplication with the weight?

@florianwagner1255 2 years ago

No, in the way that I've explained it, the same scaling applies. I'm always using this method to get rid of "efficiency noise", which would otherwise throw off these very simple approaches to normalization.

@Mirabell97 2 years ago

@@florianwagner1255 thanks a lot!

@bio_mark 3 years ago

Hi Florian, thank you for your clear explanation. I am not an expert on RNA-Seq analysis and I am trying to learn on my own. I just have one (maybe big?) question: how would you conduct differential expression analysis after scaling and transforming your data as you explained? I know DESeq2 in R cannot be used with previously normalized data. Which R pipeline would you use after this? Thank you

@bio_mark 3 years ago

Or is it possible to continue with DESeq2 after these steps? Thank you

@florianwagner1255 3 years ago

Hi Marcos, I think most of the things I talk about in this video are not directly relevant to differential expression (DE) analysis. I think in many cases you probably still want to do a scaling step, but I don't think the transformations are very useful in the context of DE analysis. I wouldn't claim to be an expert on DE analysis of scRNA-Seq data, but I think this website might be interesting for you: biocellgen-public.svi.edu.au/mig_2019_scrnaseq-workshop/public/dechapter.html

@bio_mark 3 years ago

@@florianwagner1255 thank you for your reply and for the material!

@SuperMixedd 10 months ago

@@bio_mark DESeq2 always works on counts, so you'd be better off with raw counts if you work with 10x data

@sfmambero 2 years ago

Thank you for the clear explanation! The toy examples really helped in understanding the effects of the different types of normalization. What did you mean by “clipping” though when you talked about Pearson residuals?

@florianwagner1255 2 years ago

I was referring to a situation where the evidence of non-uniform expression for a gene is so strong that the Pearson residuals become very large. This happens, for example, if there is a very rare cell type that has very high and specific expression of certain genes (e.g., hemoglobin genes in a few red blood cells that are contaminants in PBMC samples). "Clipping at X" means setting all values larger than a certain number X to X. The motivation for "clipping" is the idea that there isn't any benefit to letting Pearson residuals grow arbitrarily large, and it may result in strange outliers in certain analyses. I don't think clipping is always necessary, but it is something that has been described in the literature, so I mentioned it here.
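
"Clipping at X" as defined in this reply can be sketched in a few lines. The function name, the residual values, and the threshold of 10 are made up for illustration; the sketch only caps large positive values, as the reply describes, and published implementations may also clip large negative values.

```python
def clip_at(values, x):
    """Set every value larger than x to x; smaller values are unchanged."""
    return [min(v, x) for v in values]

# Hypothetical residuals for one gene, with a single extreme value.
residuals = [0.4, 2.1, 57.3, 3.0]

# The threshold 10.0 is arbitrary and chosen only for this example.
clipped = clip_at(residuals, 10.0)
print(clipped)
```

Only the extreme value changes, so downstream analyses no longer see one cell dominating that gene.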

@sfmambero 2 years ago

@@florianwagner1255 Understood. Thank you again!

@davidvanbergen2283 2 years ago

Thanks for the great explanation! One question: why consider the delta (10:52) and not the fold-change? (In my understanding, fold-change is more biologically relevant.)

@florianwagner1255 2 years ago

Thank you! I am discussing fold changes while I'm talking about the examples on the slide.

@muratseker6406 3 years ago

When we look at the raw data, how can we get an idea of what the raw data look like across all cells, so that we can make the same kind of assessment as in your example?

@sailingintosunshine 2 years ago

really helpful, thanks!

@pariaalipour61 2 years ago

Thank you so much for this helpful video. I have two questions, if you don't mind. First, does the order of normalization and scaling matter? You mentioned scaling first; however, in the Satija vignette normalization is done first. What is the difference? Second, I realized that normalization is separate from scaling. In that case, is normalization the same as transformation?

@florianwagner1255 2 years ago

Thank you! The goal of the scaling step is to get rid of efficiency noise and convert from absolute expression levels to concentrations. This needs to be done first, because the transformation step is non-linear, so scaling after transformation doesn't have the same effect. Yes, "normalization" is sometimes used to mean transformation, but I've defined the term here to include both scaling and transformation, which I thought is more common.
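
A small sketch shows why the order matters. It assumes log(1 + x) as the non-linear transformation and a target total of 100 per cell; both are illustrative choices, as are the function names and the two toy cells, which have the same relative composition but different total counts.

```python
import math

def scale_then_transform(counts, target_total):
    """Scale a cell to a common total, then apply log(1 + x)."""
    factor = target_total / sum(counts)
    return [math.log1p(c * factor) for c in counts]

def transform_then_scale(counts, target_total):
    """Apply log(1 + x) first, then multiply by the same scaling factor."""
    factor = target_total / sum(counts)
    return [math.log1p(c) * factor for c in counts]

# Two toy cells with identical composition (10% / 90%);
# cell_b simply has twice as many counts as cell_a.
cell_a = [10, 90]
cell_b = [20, 180]

a_first = scale_then_transform(cell_a, 100)
b_first = scale_then_transform(cell_b, 100)
a_second = transform_then_scale(cell_a, 100)
b_second = transform_then_scale(cell_b, 100)

print(a_first, b_first)    # the two cells agree
print(a_second, b_second)  # the two cells disagree
```

With scaling first, the difference in total counts is removed before the logarithm is applied, so the two cells end up with the same values. With the order reversed, the logarithm has already compressed the counts non-linearly, and multiplying by the scaling factor afterwards no longer cancels the difference.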

@pariaalipour61 2 years ago

@@florianwagner1255 Thanks a lot for your explanation. Sorry, I'm trying to compare this with the Seurat vignette. I think the scaling + transformation you mentioned here is done by NormalizeData in Seurat. Please correct me if I'm wrong. But what about ScaleData in Seurat? Did you mention it, or is it something else?

@florianwagner1255 2 years ago

@@pariaalipour61 Yes I think you're right, NormalizeData does both scaling and transformation. ScaleData does something completely different, it subtracts the mean of each gene and divides by its standard deviation, which is usually called (feature) standardization or z-score normalization: satijalab.org/seurat/reference/scaledata
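
The standardization described in this reply (subtract each gene's mean, divide by its standard deviation) can be sketched as follows. This is not Seurat's code; the function name and the toy values are made up, and using the sample standard deviation is an assumption.

```python
import statistics

def standardize(values):
    """Z-score one gene: subtract its mean and divide by its standard deviation.

    Assumes the gene varies across cells (non-zero standard deviation).
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    return [(v - mean) / sd for v in values]

# Hypothetical normalized expression of one gene across three cells.
gene_values = [1.0, 2.0, 3.0]
z_scores = standardize(gene_values)
print(z_scores)
```

After this step every gene has mean 0 and unit standard deviation, so highly expressed genes no longer dominate purely because of their magnitude.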

@pariaalipour61 2 years ago

@@florianwagner1255 you didn't mention that. Do you think it's not necessary for downstream analysis?