Important typo in the code when making pseudo-replicates: you need to add [indices[i]]. It should be as follows: rep_adata = sc.AnnData(X = samp_cell_subset[indices[i]].X.sum(axis = 0), var = samp_cell_subset[indices[i]].var[[]]) Also, if you get an error about the shape, add .reshape(1, -1) to the end of sum(axis = 0).
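The indexing and reshape fix described above can be illustrated with plain NumPy. This is only a minimal sketch on a toy dense matrix, not the tutorial's code: the function name pseudo_replicate_sums and the toy data are made up for illustration. Summing a dense 2-D array over axis 0 returns a 1-D vector, which is why a reshape to a single row may be needed before building a one-sample AnnData.

```python
import numpy as np

def pseudo_replicate_sums(counts, n_reps, seed=0):
    """Split cells (rows) into n_reps random groups and sum the counts per group."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(counts.shape[0])   # random order of cell indices
    indices = np.array_split(shuffled, n_reps)    # one index array per pseudo-replicate
    # counts[idx] selects only this replicate's cells (the "[indices[i]]" fix);
    # reshape(1, -1) turns the 1-D gene vector into a single row (one sample).
    return [counts[idx].sum(axis=0).reshape(1, -1) for idx in indices]

counts = np.arange(24).reshape(6, 4)   # toy data: 6 cells x 4 genes
reps = pseudo_replicate_sums(counts, n_reps=3)
```

Each element of reps has shape (1, n_genes), and the replicate sums add up to the total over all cells.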
@Brickkzz 1 year ago
Eternally grateful for this channel - the most useful resource on scRNAseq analysis in Python on the internet!
@sanbomics 1 year ago
Thank you :) ... Born of my avoidance of R at all costs xD
@gracegregory4846 8 months ago
Not sure if the DeseqDataSet parameters have changed since this tutorial, but I had to change clinical to metadata when running: dds = DeseqDataSet(counts=counts, metadata=pb.obs, design_factors="tumour")
@sanbomics 8 months ago
Yup, it's changed a lot. I'll be remaking it soon!
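For code that has to run against both older and newer installs, one option is to look up which keyword the constructor accepts before calling it. The sketch below is an assumption-laden illustration: metadata_kwarg is a hypothetical helper, and OldStyle and NewStyle are stand-in classes that only mimic the clinical/metadata rename mentioned in this thread, not the real PyDESeq2 class.

```python
import inspect

def metadata_kwarg(dds_class):
    """Return the keyword ('metadata' or 'clinical') that the constructor accepts."""
    params = inspect.signature(dds_class.__init__).parameters
    for name in ("metadata", "clinical"):
        if name in params:
            return name
    raise TypeError("constructor accepts neither 'metadata' nor 'clinical'")

# Stand-in classes (hypothetical) mimicking the old and new signatures.
class OldStyle:
    def __init__(self, counts=None, clinical=None, design_factors=None):
        pass

class NewStyle:
    def __init__(self, counts=None, metadata=None, design_factors=None):
        pass

old_name = metadata_kwarg(OldStyle)
new_name = metadata_kwarg(NewStyle)
```

With PyDESeq2 installed, the same idea could be applied as DeseqDataSet(counts=counts, design_factors="tumour", **{metadata_kwarg(DeseqDataSet): pb.obs}), assuming the other arguments are unchanged in the installed version.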
@lly6115 1 year ago
My gratitude. Thank you for your time.
@sanbomics 1 year ago
Any time
@Luvinlife411 2 months ago
Unfortunately PyDESeq2 does not work anymore. They updated it at some point and, for example, the clinical= parameter doesn't exist. And as soon as I ran dds.deseq2() with my data or the test data on their GitHub, the RAM shot up and the kernel crashed. Back to R, ugh.
@sanbomics 2 months ago
It still works, but you have to change it a little, e.g., clinical should be metadata instead. Hmm, how much RAM do you have? I have never seen it use that much at all.
@neishajmoments 10 months ago
You are a lifesaver! 😊 Thanks
@ZnaniumTV 11 months ago
Thank you very much for this very helpful video. I have a question regarding batch correction before using DESeq2. I obtained 6 samples using hashing; however, they were sequenced in 2 lanes, leading to a significant batch effect that can be observed. Usually, this is corrected with integration methods in Scanpy or Seurat. However, if we pseudobulk based on our hashing and obtain the raw data needed for DESeq2, we lose this batch correction step. Would you have any ideas on how to address this? I've checked that some of the options are RUVSeq or SVA. Thank you very much.
@marwanmohamed3844 10 months ago
I have a similar issue with batch effects in my libraries: if I use pseudobulk raw counts for DESeq2, I see a strong batch effect. Did you manage to solve this? Thanks, I would appreciate your advice on this.
@sjorsmaassen3764 1 year ago
Thanks a lot for the tutorial. You are really doing a great service for anyone who is trying to learn more about scRNA-seq analysis. I have a question that I hope someone here can answer: when making a pseudobulk, wouldn't it make more sense to take the mean of your counts instead of the sum? I would say the sum can be influenced by the total number of cells in a condition. So if by random chance you have outliers from a batch, or you simply have more of a certain cell type in your tissue (which I would imagine to be the case for macrophages during a COVID infection), this would influence your results.
@sanbomics 1 year ago
Good question. Later, the counts are corrected by size factor which will account for differences due to the total number of cells.
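To see why summing is not a problem here, the size-factor correction can be sketched with the median-of-ratios idea that DESeq2-style normalization is built on. This is a simplified illustration on made-up numbers, not the library's implementation; size_factors and the toy values are assumptions.

```python
import numpy as np

def size_factors(counts):
    """Median-of-ratios size factors for a samples x genes count matrix."""
    counts = np.asarray(counts, dtype=float)
    expressed = (counts > 0).all(axis=0)        # use only genes with no zero counts
    log_counts = np.log(counts[:, expressed])
    log_geo_mean = log_counts.mean(axis=0)      # per-gene log geometric mean across samples
    # Each sample's factor is the median ratio of its counts to the geometric means.
    return np.exp(np.median(log_counts - log_geo_mean, axis=1))

per_cell = np.array([2.0, 10.0, 50.0, 4.0])                 # toy average counts per cell
pseudobulk = np.vstack([100 * per_cell, 200 * per_cell])    # summed over 100 vs 200 cells
factors = size_factors(pseudobulk)
normalized = pseudobulk / factors[:, None]
```

In this toy case the sample built from twice as many cells gets a size factor twice as large, so the normalized counts of the two samples coincide.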
@qhawenid 1 year ago
Thanks very much for such a concise and informative tutorial. One question: is there a way to do pseudobulk DGE analysis between cell types? Thanks in advance.
@sanbomics 1 year ago
You could just subset the cells by cell type, similarly to what we do here. You can pseudobulk any set of cells you can subset from your data. Although, usually cell type differences are so apparent that you don't really have to worry about pseudobulk. Maybe useful if you are comparing cell type subpopulations
@qhawenid 1 year ago
@@sanbomics Thanks for the timely response. You're doing God's work!
@sanbomics 1 year ago
Thanks :) You're too kind. It wasn't that timely xD
@stefisjustthebest 7 months ago
Have you come across omicverse, which uses pydeg to compare two cell types, and do you think that's a valid way of doing it? I'm not sure they even aggregate the cells by sample origin, but I would be interested to hear your thoughts!
@carlahamilcaro6457 8 months ago
Hello, thank you so much. I was wondering, could I do differential expression analysis (control vs treatment) on all cell types at the same time?
@sanbomics 8 months ago
I would put each cell type in a loop and do them separately but you can put all the results back together in the end. I'll have an example posted in the next couple of weeks.
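The loop described above might look roughly like the following sketch. Both de_per_cell_type and the run_de callable are hypothetical: run_de stands in for whatever pseudobulk and DE steps are run for one cell type and is assumed to return a pandas DataFrame of results.

```python
import pandas as pd

def de_per_cell_type(cell_types, run_de):
    """Run DE separately per cell type and stack the results into one table."""
    results = []
    for cell_type in cell_types:
        res = run_de(cell_type).copy()
        res["cell_type"] = cell_type     # remember which cell type each row came from
        results.append(res)
    return pd.concat(results, ignore_index=True)

def fake_run_de(cell_type):
    # Placeholder for the real per-cell-type analysis; returns made-up values.
    return pd.DataFrame({"gene": ["GeneA", "GeneB"], "log2FoldChange": [1.0, -0.5]})

combined = de_per_cell_type(["T cell", "Macrophage"], fake_run_de)
```

The combined table can then be filtered or plotted by the cell_type column.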
@carlahamilcaro6457 8 months ago
@@sanbomics Oh, that is amazing, thank you so much! Another question: would it also be possible to do DE on 3 categories at the same time? Say I want control vs samples that responded to treatment vs samples that did not respond to treatment. Thank you for all the help!
@qhawenid 10 months ago
How can I randomly partition samples (for a scRNA-seq dataset with one sample per condition) to obtain pseudo-replicate samples, and annotate these in the metadata of the main adata object? Or is there a way to map the newly generated pseudo-replicates to the main adata object?
@leoburgy 10 months ago
You can insert the partition (described in the video) as a column (e.g., "replicate") of the adata.obs dataframe (of the main adata).
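A minimal sketch of that suggestion, using a plain pandas DataFrame as a stand-in for adata.obs. The function add_replicate_column, the "sample" column name, and the label format are assumptions for illustration; cell IDs in the index are assumed to be unique.

```python
import numpy as np
import pandas as pd

def add_replicate_column(obs, sample_key="sample", n_reps=3, seed=0):
    """Randomly assign each cell to one of n_reps pseudo-replicates within its sample."""
    rng = np.random.default_rng(seed)
    obs = obs.copy()
    obs["replicate"] = ""
    for sample, cell_ids in obs.groupby(sample_key).groups.items():
        shuffled = rng.permutation(np.asarray(cell_ids))   # shuffle this sample's cell IDs
        for i, chunk in enumerate(np.array_split(shuffled, n_reps)):
            obs.loc[chunk, "replicate"] = f"{sample}_rep{i}"
    return obs

# Toy stand-in for adata.obs: 6 control cells and 5 treated cells.
obs = pd.DataFrame(
    {"sample": ["ctrl"] * 6 + ["treated"] * 5},
    index=[f"cell{i}" for i in range(11)],
)
labelled = add_replicate_column(obs, n_reps=3)
```

On a real object the result could be written back with adata.obs["replicate"] = labelled["replicate"], after which cells can be subset and summed by that column.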
@qhawenid 10 months ago
@@leoburgy Thank you for this
@jalv1499 1 year ago
Thank you very much! This is very helpful! I have one question: can you clarify the difference between differential abundance analysis and this pseudobulk approach for studying the difference between two conditions?
@sanbomics 1 year ago
They are similar, but pseudobulk looks at the summed expression of a population of cells, while other methods might look at the distribution of expression across all cells in a population. One issue with the latter, among others, is that the high sample size of many cells inflates significance.
@ramadatta7046 5 months ago
Hi, great channel and videos. May I know if we can use SoupX-corrected counts instead of raw counts?
@sanbomics 2 months ago
I would recommend using SoupX (or other denoised) counts over the raw counts if they are available.
@estebanelias6958 1 year ago
Hi. Firstly, thank you very much for these tutorials. Very useful. I have 3 questions: 1. How can I check if I saved my raw data after normalization? 2. Can pseudo-replicates be applied in an experiment with 2 conditions that contains pools of cells from 2-3 different samples? 3. How can differences in the number of cells in a cluster between 2 conditions affect DGE results with this method? Thanks
@sanbomics 1 year ago
1) Make sure to save the raw data in a layer before you normalize or it won't be there. 2) Yes, this should be OK. 3) Theoretically, the counts are normalized by size factors, but if the numbers of cells are vastly different, some lowly expressed genes may show up in the larger population just because it's larger. It shouldn't affect the genes with higher expression.
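For point 1, a rough way to check whether a matrix still holds raw counts is to test that every value is a non-negative whole number. This is a heuristic sketch on toy dense arrays; looks_like_raw_counts is a made-up helper, and a sparse matrix would need converting first (for example with .toarray()). A common pattern for keeping the raw data is adata.layers["counts"] = adata.X.copy() before normalizing, where the layer name "counts" is just a convention.

```python
import numpy as np

def looks_like_raw_counts(X):
    """Heuristic: raw counts are non-negative whole numbers."""
    X = np.asarray(X, dtype=float)
    return bool((X >= 0).all() and np.allclose(X, np.round(X)))

raw = np.array([[0, 3, 12], [5, 0, 1]])                      # toy counts: 2 cells x 3 genes
# Library-size normalization followed by log1p, similar in spirit to a typical workflow.
normalized = np.log1p(raw / raw.sum(axis=1, keepdims=True) * 1e4)

raw_ok = looks_like_raw_counts(raw)
norm_ok = looks_like_raw_counts(normalized)
```

Here raw_ok is True and norm_ok is False, so running the same check on adata.X and on the saved layer shows which one still contains counts.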
@emilynwo4254 1 year ago
Could you do a video on other RNA-seq analyses, such as SLAM-seq?