Pseudobulk single-cell analysis in Python with Scanpy and pyDeseq2

9,606 views

A day ago

Comments: 33

@sanbomics a year ago

Important typo in the code when making pseudo-replicates: you need to add [indices[i]]. It should read as follows: rep_adata = sc.AnnData(X = samp_cell_subset[indices[i]].X.sum(axis = 0), var = samp_cell_subset[indices[i]].var[[]]) Also, if you get an error about the shape, you will have to add .reshape(1, -1) to the end of sum(axis = 0).
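The correction above can be illustrated in plain Python. This is only a sketch, assuming the counts are held as a list of per-cell rows instead of an AnnData object; `split_indices` and `sum_rows` are hypothetical helpers, not functions from the video. The point is that each pseudo-replicate sums only the cells selected by `indices[i]`, not the whole subset.

```python
import random

def split_indices(n_cells, n_reps, seed=0):
    """Shuffle cell indices and deal them into n_reps roughly equal groups."""
    idx = list(range(n_cells))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_reps] for i in range(n_reps)]

def sum_rows(counts, rows):
    """Sum the selected cells (rows) gene by gene, giving one pseudobulk row."""
    n_genes = len(counts[0])
    return [sum(counts[r][g] for r in rows) for g in range(n_genes)]

# Toy matrix: 6 cells x 3 genes (hypothetical raw counts).
counts = [
    [1, 0, 2],
    [0, 1, 0],
    [3, 0, 1],
    [0, 2, 0],
    [1, 1, 1],
    [2, 0, 0],
]

indices = split_indices(len(counts), n_reps=2)
# Each replicate uses only its own slice of cells, mirroring samp_cell_subset[indices[i]].
pseudo_reps = [sum_rows(counts, indices[i]) for i in range(len(indices))]
```

With AnnData, the equivalent row is `samp_cell_subset[indices[i]].X.sum(axis = 0)`, reshaped with `.reshape(1, -1)` if a shape error appears.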

@Brickkzz a year ago

Eternally grateful for this channel - the most useful resource on scRNAseq analysis in Python on the internet!

@sanbomics a year ago

Thank you :) ... Born of my avoidance of R at all costs xD

@gracegregory4846 8 months ago

Not sure if the DeseqDataSet parameters have changed since this tutorial, but I had to change clinical to metadata when running: dds = DeseqDataSet(counts=counts, metadata=pb.obs, design_factors="tumour")

@sanbomics 8 months ago

Yup, it's changed a lot. I'll be remaking it soon!
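Because the keyword changed from `clinical` to `metadata` between releases, one way to keep a notebook working across versions is to check which name the installed class accepts. This is a sketch only: `metadata_kwarg` and `build` are hypothetical helpers, and the two stand-in classes merely imitate the old and new signatures so the example runs without pyDeseq2 installed.

```python
import inspect

def metadata_kwarg(cls):
    """Return the keyword that cls accepts for the sample annotation table."""
    params = inspect.signature(cls).parameters
    for name in ("metadata", "clinical"):
        if name in params:
            return name
    raise TypeError("no 'metadata' or 'clinical' parameter found")

# Stand-ins that only imitate the two signatures (not the real pyDeseq2 classes).
class OldStyle:
    def __init__(self, *, counts=None, clinical=None, design_factors="condition"):
        self.obs = clinical

class NewStyle:
    def __init__(self, *, counts=None, metadata=None, design_factors="condition"):
        self.obs = metadata

def build(cls, counts, obs, design_factors):
    """Construct cls, passing obs under whichever keyword it supports."""
    kwargs = {"counts": counts, "design_factors": design_factors}
    kwargs[metadata_kwarg(cls)] = obs
    return cls(**kwargs)

old = build(OldStyle, counts=[[1, 2]], obs={"tumour": ["yes"]}, design_factors="tumour")
new = build(NewStyle, counts=[[1, 2]], obs={"tumour": ["yes"]}, design_factors="tumour")
```

In a real notebook the call would presumably look like `build(DeseqDataSet, counts, pb.obs, "tumour")`. Other arguments may have changed between releases too, so it is worth checking the documentation for the installed version.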

@lly6115 a year ago

My gratitude. Thank you for your time.

@sanbomics a year ago

Any time

@Luvinlife411 2 months ago

Unfortunately pyDeseq2 does not work anymore. They updated it at some point and, for example, the clinical= parameter doesn't exist. And as soon as I ran dds.deseq2() with my data or the test data on their GitHub, the RAM shot up and the kernel crashed. Back to R, ugh.

@sanbomics 2 months ago

It still works, but you just have to change it a little, e.g., clinical should be metadata instead. Hmm, how much RAM do you have? I have never seen it use that much at all.

@neishajmoments 10 months ago

You are a lifesaver! 😊 Thanks

@ZnaniumTV 11 months ago

Thank you very much for this very helpful video. I have a question regarding batch correction before using DESeq2. I obtained 6 samples using hashing; however, they were sequenced in 2 lanes, leading to a significant batch effect that can be observed. Usually, this is corrected with integration methods in Scanpy or Seurat. However, if we pseudobulk based on our hashing and obtain the raw data needed for DESeq2, we lose this batch correction step. Would you have any ideas on how to address this? I've checked that some of the options are RUVSeq or SVA. Thank you very much.

@marwanmohamed3844 10 months ago

I have a similar issue with batch effects in my libraries, and if I use pseudobulk raw counts for DESeq2 I see a strong batch effect. Did you manage to solve this? Thanks, I would appreciate your advice on this.

@sjorsmaassen3764 a year ago

Thanks a lot for the tutorial. You are really doing a great service for anyone who is trying to learn more about scRNA-seq analysis. I have a question that I hope someone here can answer: for making a pseudobulk, wouldn't it make more sense to take the mean of your counts instead of the sum? I would say the sum can be influenced by the total number of cells in a condition. So if by random chance you have outliers from a batch, or you just have more of a certain cell type in your tissue (which I would imagine to be the case for macrophages during a COVID infection), this would influence your results.

@sanbomics a year ago

Good question. Later, the counts are corrected by size factors, which will account for differences due to the total number of cells.
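To see why summing is not a problem once size factors are applied, here is a small sketch of DESeq2-style median-of-ratios size factors in plain Python. The numbers are hypothetical and `size_factors` is an illustrative helper, not the library's implementation; sample B is simply sample A with twice as many cells.

```python
import math
from statistics import median

def size_factors(samples):
    """Median-of-ratios size factors for a list of per-sample count rows."""
    n_genes = len(samples[0])
    # Use only genes that are non-zero in every sample, so the logs are defined.
    genes = [g for g in range(n_genes) if all(s[g] > 0 for s in samples)]
    # Log geometric mean of each usable gene across samples.
    log_geo = {g: sum(math.log(s[g]) for s in samples) / len(samples) for g in genes}
    return [math.exp(median(math.log(s[g]) - log_geo[g] for g in genes)) for s in samples]

# Hypothetical per-cell profile for 3 genes.
per_cell = [4, 10, 1]
sample_a = [c * 50 for c in per_cell]    # 50 cells summed
sample_b = [c * 100 for c in per_cell]   # 100 cells summed

sf = size_factors([sample_a, sample_b])
norm_a = [c / sf[0] for c in sample_a]
norm_b = [c / sf[1] for c in sample_b]
```

Sample B's size factor comes out twice that of sample A, so the normalized counts of the two samples coincide even though B was summed over twice as many cells.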

@qhawenid a year ago

Thanks much for such a concise and informative tutorial. One question. Is there a way to do pseudobulk DGE analysis between cell types? Thanks in advance.

@sanbomics a year ago

You could just subset the cells by cell type, similarly to what we do here. You can pseudobulk any set of cells you can subset from your data. That said, cell type differences are usually so apparent that you don't really have to worry about pseudobulk. It may be useful if you are comparing cell type subpopulations.

@qhawenid a year ago

@sanbomics Thanks for the timely response. You're doing God's work!

@sanbomics a year ago

Thanks :) You're too kind.. It wasn't that timely xD

@stefisjustthebest 7 months ago

Have you come across omicverse, which uses pydeg to compare two cell types, and do you think that's a valid way of doing it? I'm not sure they even aggregate the cells by sample origin, but I would be interested to hear your thoughts!

@carlahamilcaro6457 8 months ago

Hello, thank you so much. I was wondering: could I do differential expression analysis, control vs treatment, on all cell types at the same time?

@sanbomics 8 months ago

I would put each cell type in a loop and do them separately but you can put all the results back together in the end. I'll have an example posted in the next couple of weeks.
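A rough sketch of the loop described above, in plain Python with made-up data. `pseudobulk_by_cell_type` is a hypothetical helper, and the differential expression step is left as a placeholder, since the actual call depends on the pyDeseq2 version in use.

```python
from collections import defaultdict

def pseudobulk_by_cell_type(cells):
    """Sum counts per (cell_type, sample); return {cell_type: {sample: summed_row}}."""
    out = defaultdict(dict)
    for cell_type, sample, row in cells:
        current = out[cell_type].get(sample)
        out[cell_type][sample] = row[:] if current is None else [a + b for a, b in zip(current, row)]
    return dict(out)

# Hypothetical cells: (cell type, sample of origin, raw counts for 2 genes).
cells = [
    ("T cell", "ctrl_1", [1, 0]),
    ("T cell", "ctrl_1", [2, 1]),
    ("T cell", "treat_1", [0, 3]),
    ("B cell", "ctrl_1", [5, 0]),
    ("B cell", "treat_1", [1, 1]),
]

results = {}
for cell_type, samples in pseudobulk_by_cell_type(cells).items():
    # Placeholder for the per-cell-type DE step (e.g. the DESeq2 run from the video).
    # Here the pseudobulk table itself is stored so results can be combined afterwards.
    results[cell_type] = samples
```

Keying the results by cell type makes it straightforward to put everything back together at the end, for example by adding a cell type column to each result table before concatenating.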

@carlahamilcaro6457 8 months ago

@sanbomics Oh, that is amazing, thank you so much! Another question: would it also be possible to do DE on 3 categories at the same time? Say I want control vs samples that responded to treatment vs samples that did not respond to treatment. Thank you for all the help!

@qhawenid 10 months ago

How can I randomly partition samples (for a scRNA-seq dataset with one sample per condition) to obtain pseudo-replicate samples, and annotate these in the metadata of the main adata object? Or is there a way to map the newly generated pseudo-replicates to the main adata object?

@leoburgy 10 months ago

You can insert the partition (described in the video) as a column (e.g., "replicate") of the adata.obs dataframe (of the main adata).

@qhawenid 10 months ago

@leoburgy Thank you for this
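One way to build such a column, sketched in plain Python under the assumption that there is one sample label per cell, in the same order as the rows of the main object. `assign_replicates` is a hypothetical helper and the label format is arbitrary.

```python
import random

def assign_replicates(samples, n_reps, seed=0):
    """Return one label per cell, e.g. 'ctrl_rep1', splitting each sample at random."""
    rng = random.Random(seed)
    labels = [None] * len(samples)
    for sample in sorted(set(samples)):
        idx = [i for i, s in enumerate(samples) if s == sample]
        rng.shuffle(idx)
        for k, i in enumerate(idx):
            labels[i] = f"{sample}_rep{k % n_reps + 1}"
    return labels

# Hypothetical per-cell sample annotation.
samples = ["ctrl", "ctrl", "ctrl", "ctrl", "treat", "treat", "treat", "treat"]
replicate = assign_replicates(samples, n_reps=2)
```

The resulting list could then be stored as `adata.obs["replicate"] = replicate`, which keeps the pseudo-replicate assignment attached to every cell in the main object.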

@jalv1499 a year ago

Thank you very much! This is very helpful! I have one question: can you clarify the difference between differential abundance analysis and this pseudobulk approach for studying the difference between two conditions?

@sanbomics a year ago

They are similar, but pseudobulk looks at the summed expression of a population of cells, while other methods might look at the distribution of expression across all cells in a population. One issue with the latter, among others, is that the high sample size of many cells inflates significance.
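The inflation can be shown with a toy calculation. This is a simplified illustration using a two-sample z-test with invented numbers, not what any single-cell tool actually computes: the effect size and spread stay fixed, and only the number of observations per group changes.

```python
import math

def two_sided_p(diff, sd, n_per_group):
    """Two-sided p-value of a z-test for a mean difference between two equal groups."""
    z = diff / (sd * math.sqrt(2 / n_per_group))
    return math.erfc(abs(z) / math.sqrt(2))

# Same small effect and spread; only the number of observations changes.
p_samples = two_sided_p(diff=0.1, sd=1.0, n_per_group=3)      # 3 pseudobulk samples per group
p_cells = two_sided_p(diff=0.1, sd=1.0, n_per_group=3000)     # 3000 cells per group
```

Treating every cell as an independent observation makes the same tiny difference look highly significant, even though cells from one sample are not independent replicates.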

@ramadatta7046 5 months ago

Hi, great channel and videos. May I know if we can use SoupX-corrected counts instead of raw counts?

@sanbomics 2 months ago

I would recommend using SoupX (or other denoised) counts over the raw counts if they are available.

@estebanelias6958 a year ago

Hi. Firstly, thank you very much for these tutorials. Very useful. I have 3 questions: 1. How can I check if I saved my raw data after normalization? 2. Can pseudo-replicates be applied in an experiment with 2 conditions that contains pools of cells from 2-3 different samples? 3. How can differences in the number of cells in a cluster between 2 conditions affect DGE results with this method? Thanks

@sanbomics a year ago

1) Make sure to save the raw data in a layer before you normalize, or it won't be there. 2) Yes, this should be OK. 3) Theoretically, the counts are normalized by size factors, but if the numbers of cells are vastly different, some lowly expressed genes may show up in the larger population just because it's larger. It shouldn't affect the genes with higher expression.
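For point 1, a quick sanity check is to test whether the stored values still look like counts. The sketch below is plain Python with hypothetical values; `looks_like_raw_counts` is an illustrative helper, not part of Scanpy.

```python
def looks_like_raw_counts(values, tol=1e-9):
    """True if every value is a non-negative whole number, as raw counts should be."""
    return all(v >= 0 and abs(v - round(v)) <= tol for v in values)

raw = [0, 3, 12, 1, 0, 7]                    # hypothetical raw UMI counts
normalized = [0.0, 1.386, 2.565, 0.693]      # hypothetical log-normalized values

raw_ok = looks_like_raw_counts(raw)
norm_ok = looks_like_raw_counts(normalized)
```

On a real object, the same check could be run on a sample of values from `adata.X` or from the layer where the counts were stored. Fractional values suggest the data were already normalized.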