DoubletFinder: Detect doublets in single-cell RNA-Seq data in R | Detailed workflow tutorial

21,488 views

Bioinformagician

A day ago

Comments: 57

@熊飞-b5k 3 months ago

Hello Khusbu, when I run "> sweep.res.list

@juliwang3751 A month ago

I think it's important that you explain why we assume a 7.5% doublet rate in our data. I know it has something to do with the number of droplets captured. But how do we determine the number of droplets captured (in order to infer the estimated % of real doublets)? Thank you!

@kitdordkhar4964 2 years ago

This was very useful. It was different from our analyst's strategy. Small request: instead of using the bash terminal, it would be helpful if you could navigate to the folders and files from within R [setwd> ]. Thanks!

@Bioinformagician 2 years ago

Thank you for the suggestion, I am more comfortable in maneuvering through the folders via terminal. However, I shall try to do it via R next time :)

@hyebinhan6473 2 years ago

THANK YOU!!! This was a life saver. Quick question: I plan to use Tabula Muris Senis, the mega mouse single-cell dataset, and I was able to maneuver through selecting the ages/organs I wanted to use. BUT I believe they have datasets per mouse and per organ... if that's the case, do I still have to run doubletFinder on each mouse, or do you think I can use the selected age/organ, with the assumption that the preparation process was similar enough that batch effects would likely be minimal..... I have 15 mice from Tabula Muris I plan to use and an additional 15 mice I have to filter 🥲

@Bioinformagician 2 years ago

I suggest you first process your data with all 15 mice at once, as a merged object, and visualize it. Look for batch effects. If you don't find any, then run doubletFinder on the merged object. If you do find batch effects in your data, then you will have to run doubletFinder on each individual mouse.

@xiaosajackxu4242 2 years ago

Amazing job! Can you paste the code for how you subset and recluster singlets after finishing DoubletFinder? Or can you confirm whether you did exactly the same as the following steps? Thanks! singlet

@Bioinformagician 2 years ago

Yes, I would run the steps you ran to recluster my cells after removing doublets from my data. Thank you for the suggestions for video topics, I have them in my pipeline :)

@mostafaismail4253 2 years ago

Please, we need a video on applying NMF (non-negative matrix factorization) to scRNA-seq data to find expression programs

@Bioinformagician 2 years ago

I'll consider making a video on this soon :) Thanks for the suggestion.

@sonaaritra 11 months ago

Hello Khusbu, I'm working with a publicly available dataset, GSE193688, where they have provided individual .h5 files for every sample. I'm trying to run DoubletFinder on it, but since you mentioned that it is preferable not to run it on merged samples, should I run it on each one separately? I have a total of 18 files for individual biopsy samples. Is there any faster method?

@marionaisern6420 A year ago

I don't understand why, in a dataset of 15000 real cells, a pN of 0.25 would represent the integration of 5000 artificial doublets... If anyone can answer my question... Thank you!!!
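One way to read the numbers in this question: pN appears to be the fraction of artificial doublets in the combined (real + artificial) data, not a fraction of the real cells. A minimal base R sketch of that arithmetic, assuming this definition of pN (the variable names are illustrative only):

```r
# Assumption: pN = artificial doublets / (real cells + artificial doublets)
n_real_cells <- 15000
pN <- 0.25

# Solve pN = n_art / (n_real + n_art) for n_art
n_artificial <- round(n_real_cells / (1 - pN) - n_real_cells)

n_artificial                                  # 5000
n_artificial / (n_real_cells + n_artificial)  # 0.25
```

Under that reading, 5000 artificial doublets out of 20000 total cells is exactly 25%, which is where the 5000 comes from.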

@문홍만-y2t A year ago

22:25 I want to remove the rows labeled as doublets in the DF.classification column of the metadata table. How can I remove them with a command? I need to remove the doublets and then integrate all samples.

@giovaniclaresta2356 A year ago

Hi, thank you for the very detailed tutorial!! May I know how I can get the cell identity from demuxlet data after I get all the singlets? Thank you

@熊飞-b5k 2 years ago

Thank you for this video, but my question is whether doublet detection and removal should be carried out before data merging and QC. In your previous video on data integration, you merged 7 samples. Does that mean that we need to clean the data 7 times before merging? Hoping for your reply.

@熊飞-b5k 2 years ago

What I mean is: when we need to integrate several datasets, before which step should we perform doublet detection? Before merging the datasets? If doublet detection should be done before the merge() function, is it necessary to perform QC and the standard pre-processing workflow for each dataset separately?

@Bioinformagician 2 years ago

Yes, it is recommended to perform doublet removal and QC for each dataset individually before integrating datasets. It can, however, be run on merged data. The standard workflow steps just help identify and remove clusters of cells with low UMI counts or high mitochondrial %. These low-quality cells must be filtered out before running a doublet prediction algorithm and before integrating and moving ahead with further downstream analysis.

@EdDone-q6g 9 months ago

Thanks for this workflow and for sharing the code. I have an issue when I run your code at the second-to-last step. > DimPlot(pbmc.seurat.filtered, reduction = 'umap', group.by = "DF.classifications_0.25_0.21_691") Error in `[.data.frame`(data, , group) : undefined columns selected In addition: Warning message: The following requested variables were not found: DF.classifications_0.25_0.21_691 Could you please help me check it? Thanks.
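A possible cause, offered as an assumption rather than a diagnosis: the classification column appears to be named from the pN, pK and nExp values of the run (the name in the error has that shape), so a run with a different cell count or pK would produce a different column name. The sketch below uses a plain data frame as a stand-in for the object's metadata; `meta`, its columns and its values are hypothetical.

```r
# Hypothetical stand-in for pbmc.seurat.filtered@meta.data
meta <- data.frame(
  nCount_RNA = c(1200, 3400, 2100),
  pANN_0.25_0.21_702 = c(0.10, 0.62, 0.08),
  DF.classifications_0.25_0.21_702 = c("Singlet", "Doublet", "Singlet"),
  check.names = FALSE
)

# Name copied from the tutorial; it only exists if pN, pK and nExp all match
wanted <- "DF.classifications_0.25_0.21_691"
wanted %in% colnames(meta)   # FALSE

# List the classification columns that are actually present
present <- grep("^DF\\.classifications", colnames(meta), value = TRUE)
present
```

If this is indeed what happened, passing the name found in `present` to `group.by` instead of the hard-coded one should resolve the error.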

@Carolina_pt A year ago

Thank you so much for this tutorial it's very informative. I was wondering if you knew how to find the expected number of doublets for icell8 sequencing data? Thank you in advance

@veerachon2281 2 years ago

Could you please explain how this value was chosen, or whether it is a commonly expected value? -> Assuming 7.5% doublet formation rate

@Bioinformagician 2 years ago

10X user guides provide the expected multiplet rate for different protocols. Here I have used the table on page 18 of the Chromium Next GEM Single Cell 3ʹ Reagent Kits v3.1 user guide (support.10xgenomics.com/single-cell-gene-expression/library-prep/doc/user-guide-chromium-single-cell-3-reagent-kits-user-guide-v31-chemistry) to get the doublet formation rate.

@youvikasingh7955 A year ago

@@Bioinformagician But what if I had 10000 cells as input and approx. 1100 recovered cells? 🤔 Thanks, really helpful channel 😍

@jessicacastillo8535 11 months ago

@@youvikasingh7955 How did you solve that issue? Thanks!
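For this sub-thread, a rough sketch of how an expected doublet count could be derived from the number of recovered cells. It assumes the commonly quoted rule of thumb of about 0.8% multiplets per 1,000 recovered cells for 10x 3' chemistry, which is only an approximation; the table in the user guide for the chemistry actually used should take precedence. The function name and its default are hypothetical.

```r
# Assumption: ~0.8% multiplet rate per 1,000 recovered cells (approximation)
estimate_expected_doublets <- function(n_recovered, rate_per_1000 = 0.008) {
  doublet_rate <- rate_per_1000 * n_recovered / 1000
  round(doublet_rate * n_recovered)
}

estimate_expected_doublets(1100)   # 0.88% of 1,100 cells, about 10 doublets
```

The rule of thumb presumes a typical ratio of loaded to recovered cells, so an unusually low recovery such as 1,100 out of 10,000 may not fit it well; the result is a starting point, not a measured rate.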

@kalpanidesilva3062 A year ago

Thank you very much. Can you please do a tutorial on how to use the DropletUtils library?

@chadhighfill4578 2 years ago

How would you filter out the doublets?

@tomasmontserrat704 2 years ago

I think you can use subset(): pbmc.seurat.filtered

@Bioinformagician 2 years ago

That's right! You can use subset() to filter out doublets.

@chadhighfill4578 2 years ago

@@Bioinformagician How do you do this when DF.classification_SOME VALUE is always changing? i.e. how do you filter out the doublets in a dynamic way?
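One way to avoid hard-coding the column name is to look it up by its prefix. A minimal sketch, using a plain data frame as a stand-in for the Seurat metadata; the barcodes and values are made up, and the prefix pattern is an assumption based on the column name shown elsewhere in this thread.

```r
# Hypothetical stand-in for seurat_obj@meta.data; row names act as cell barcodes
meta <- data.frame(
  DF.classifications_0.25_0.21_691 = c("Singlet", "Doublet", "Singlet", "Singlet"),
  row.names = c("cell_A", "cell_B", "cell_C", "cell_D")
)

# Find the classification column without hard-coding pN, pK or nExp
df_col <- grep("^DF\\.classifications", colnames(meta), value = TRUE)
stopifnot(length(df_col) == 1)   # fail loudly if several runs left several columns

# Barcodes of the cells to keep
singlet_cells <- rownames(meta)[meta[[df_col]] == "Singlet"]
singlet_cells
```

With a real Seurat object, the resulting barcodes could then be passed to `subset()` through its `cells` argument, for example `subset(seurat_obj, cells = singlet_cells)`, where `seurat_obj` is a placeholder name.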

@SavannahVictoria-d8i 11 months ago

Thank you for your tutorial. Could you please tell me if the paper tells us how to mark doublets in the raw data?

@Surajcxscsingh A year ago

So we are only removing heterotypic doublets, not homotypic ones
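To make the distinction concrete: heterotypic doublets combine two different cell types, homotypic doublets combine two cells of the same type, and the expected doublet count is usually lowered by an estimated homotypic fraction before classification. The sketch below assumes that fraction is estimated as the sum of squared cluster proportions; the cluster labels and the starting count are made up, and this is an illustration of the arithmetic, not the package's code.

```r
# Made-up cluster labels for 10 cells
clusters <- c(rep("T", 5), rep("B", 3), rep("Mono", 2))

# Chance that two randomly paired cells come from the same cluster
cluster_freq <- table(clusters) / length(clusters)
homotypic_prop <- sum(cluster_freq^2)      # 0.25 + 0.09 + 0.04 = 0.38

# Reduce a hypothetical expected doublet count accordingly
n_exp <- 1000
n_exp_adj <- round(n_exp * (1 - homotypic_prop))
n_exp_adj                                  # 620
```

The more a dataset is dominated by one cluster, the larger the homotypic fraction and the fewer doublets the method is asked to call.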

@blackmatti86 2 years ago

Can I still run DoubletFinder on 'SCTransform normalised' sample? If yes, is it as simple as setting 'sct = TRUE' in 'sweep.res.list_pbmc

@Bioinformagician A year ago

DoubletFinder can be used on a Seurat object that has been normalized with SCTransform during the pre-processing steps. And yes, it is as simple as setting sct = TRUE.

@anaarsenijevic3207 A year ago

Hello, Thanks for the great tutorial! I have one question, maybe I missed it, but - why do you use the nsclc data when calculating the pK value (starting from line 47) rather than pbmc that you used in the steps before that? Thank you!

@RupakDeySarkar A year ago

@anaarsenijevic3207, she did use the pbmc Seurat object in line 47. Only the name of the list she created contains "nsclc"; you can name it anything you want.

@kendy17 2 years ago

You're awesome keep up the amazing work!

@Bioinformagician 2 years ago

Thank you :)

@ravimore5786 A year ago

Thank you very much for this workflow. It's really helpful to understand the process and steps involved in the doubletfinder. I appreciate your efforts to educate the researcher through this activity.

@Ob-xt4ej A year ago

Thank you for the tutorial. I ran the pK identification code and got pK = 0.2. The number of doublets is the same, but the shape of the graph is different. I wonder if I can move on to the next step or if I need to fix this issue. Thank you!

@Bioinformagician A year ago

Did you use Strategies for pK optimization? Did you find your optimum pK to be 0.2?

@kimiaslk9348 7 months ago

you are amazing thank you so much

@parmenideskim9739 2 years ago

A really great video!!! Thank you very much !!!

@tulikabhardwaj484 2 years ago

Waiting for your metagenomics and metatranscriptomics one.

@Bioinformagician 2 years ago

I will surely consider making a video on this in the near future :)

@tushardhyani3931 2 years ago

Thank you for this video !!

@pariaalipour61 2 years ago

Thank you so much for this helpful video. I have a question. At the last step, where we detect doublets and remove them, how can we go back to the first step to do the integration? I'm not sure how to transfer the needed assay to the data.

@Bioinformagician 2 years ago

You should use the "integrated" assay (if you used the CCA method to integrate) and move forward with the steps just as you would process data in the 'RNA' assay of the Seurat object.

@pariaalipour61 2 years ago

@@Bioinformagician When I run DoubletFinder, the integration still needs to be done. I mean, after removing doublets from every individual sample, what approach do I need to take? Should I move forward with the subsetted samples and integrate them? Thanks

@kanahia7460 2 years ago

I do really enjoy your channel 🤠 I am doing same analysis and it is very kind of you that you share your approach and code! Many thanks 👍

@Bioinformagician 2 years ago

I am glad to hear my videos have been helpful! Thank you for your kind words :)

@tulikabhardwaj484 2 years ago

Thanks thanks thanks a lot

@blackmatti86 2 years ago

What do you do when running 'bcmvn_pbmc

@Bioinformagician A year ago

I am unable to answer why you get NULL at find.pK step as I cannot recreate this error.

@rahmaqadeer9178 A year ago

Did you sort this out? I also get the same 'null' as I run this although my data is stored in this variable when I print it

@blackmatti86 A year ago

@@rahmaqadeer9178 No, didn’t manage to fix this

@beatriceplougastel-douglas1861 A year ago

I am also getting ' bcmvn_nsclc % select(pK)' my numeric value for the pK is 20
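A numeric pK of 20 is larger than any plausible pK, since pK is a proportion, which hints at a factor being converted to its level index rather than to the value its label encodes. This is a guess at the cause, assuming the pK column is stored as a factor; the toy factor below is made up.

```r
# Toy factor standing in for a pK column
pK <- factor(c("0.005", "0.01", "0.21"))
best <- pK[3]

as.numeric(best)                 # 3: the position of the level, not the pK value
as.numeric(as.character(best))   # 0.21: the value the label encodes
```

Converting through `as.character()` first is the usual base R idiom for recovering the numeric value from a factor.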

@NBAasDOGG A year ago

@@rahmaqadeer9178 The problem is that ParamSweep cannot find your normalized RNA counts. Here’s how to fix it: Instead of using "NormalizeData(sobj, normalization.method = "LogNormalize", scale.factor = 10000)" Do the following: "sobj