Super helpful as a biologist with some CS training. Thank you!!!
@jimmylao349 1 year ago
Very good explanation, I got the final script to try. Thanks
@prasadchaskar8542 2 years ago
Thanks a lot for the tutorial. Could you please add a tutorial on trajectory analysis?
@Bioinformagician 2 years ago
Working on that. Please stay tuned! :)
@prasadchaskar8542 2 years ago
Thanks a lot.
@learningtime1367 2 years ago
Thanks so much! Can you please do a video on GO/KEGG analysis for bulk RNA-seq? Thanks again.
@Bioinformagician 2 years ago
Thanks for the suggestion. I have plans to make a video covering this topic. Please stay tuned :)
@pegahhejazi8399 1 year ago
Hello, thank you for the super helpful tutorial. I have a question about my own dataset. I have 3 groups (3 replicates each): young, old+treatment, and old w/ treatment. Does this tutorial apply to comparing 3 groups? If not, do you have other tutorials for that kind of dataset?
@raghavsharma4347 1 year ago
Why do you have a young dataset? Is it meant to be a control? You will need to set your design formula to ~ age + treatment, and your contrast will need to compare treatment against no treatment.
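A minimal sketch of what an additive design such as ~ age + treatment encodes, written in plain Python purely for illustration (the tutorial itself uses DESeq2 in R). The sample names and the 0/1 coding are hypothetical, and the sketch assumes the three groups are young untreated, old untreated and old treated, which the question does not state outright.

```python
# Hypothetical samples: (sample name, age, treatment). Assumed layout, not from the thread.
samples = [
    ("young_1", "young", "untreated"),
    ("old_1", "old", "untreated"),
    ("old_trt_1", "old", "treated"),
]


def design_row(age, treatment):
    """Dummy-code one sample for the additive design ~ age + treatment.

    Columns: intercept, age == old, treatment == treated.
    Reference levels are 'young' and 'untreated'.
    """
    return [1, int(age == "old"), int(treatment == "treated")]


design = {name: design_row(age, trt) for name, age, trt in samples}

# Treated old sample minus untreated old sample: the intercept and age
# columns cancel, leaving only the treatment column.
treatment_contrast = [
    a - b for a, b in zip(design["old_trt_1"], design["old_1"])
]

for name, row in design.items():
    print(name, row)
print("treated - untreated (old):", treatment_contrast)
```

Under this coding, the treatment effect is estimated only from the old samples, while the young samples inform the age term.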
@aravindsundar4968 1 year ago
Great tutorial! Thanks for sharing.
@bondjams8084 2 years ago
Thank you so much! Your videos are so good!
@jakobhansen5477 11 months ago
Thank you for a great video! What if I have very different cell counts in the clusters I want to compare? I would expect very different expression just due to the different cell counts. Will a normalization step in DESeq2 cancel out this difference?
@davidepasini3807 2 years ago
Hi, thanks for the video and the nice explanation. This video comes at the right time, as I had been planning to try this kind of analysis. I watched and tried your tutorial, and I wondered how much the number of cells per sample matters. For example, in your case (looking at B cells) you have 864 cells for ind 1015 and 81 for ind 1039. Does this affect the analysis?
@Bioinformagician 2 years ago
If I understand you correctly, you are asking whether the number of cells per sample affects the analysis. I would think not, because we are aggregating rather than averaging the counts across all cells to the sample level. So the number of cells should not affect the count values.
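One point worth checking numerically: when counts are summed, the raw pseudobulk totals do grow with the number of cells, and it is the size-factor normalization that absorbs the difference. DESeq2's default size-factor estimation is a median-of-ratios method. The sketch below is a simplified plain-Python re-implementation of that idea on made-up numbers, not DESeq2's own code, and all names in it are hypothetical.

```python
import math
from statistics import median

# Toy per-cell counts for two genes in two samples (made-up numbers).
# Sample "B" has ten times as many cells as "A", with identical per-cell expression.
cells = {
    "A": [{"gene1": 2, "gene2": 4}] * 10,
    "B": [{"gene1": 2, "gene2": 4}] * 100,
}
genes = ["gene1", "gene2"]

# Pseudobulk by summing counts over all cells of a sample.
pseudobulk = {
    s: {g: sum(cell[g] for cell in cs) for g in genes}
    for s, cs in cells.items()
}


def size_factors(counts, genes):
    """Median-of-ratios size factors (simplified illustration)."""
    samples = list(counts)
    # Per-gene geometric mean across samples; genes with a zero count are skipped.
    geo = {}
    for g in genes:
        vals = [counts[s][g] for s in samples]
        if all(v > 0 for v in vals):
            geo[g] = math.exp(sum(math.log(v) for v in vals) / len(vals))
    return {s: median(counts[s][g] / geo[g] for g in geo) for s in samples}


sf = size_factors(pseudobulk, genes)
normalized = {
    s: {g: pseudobulk[s][g] / sf[s] for g in genes} for s in pseudobulk
}

print("summed counts:", pseudobulk)
print("size factors:", sf)
print("normalized:", normalized)
```

In this toy case the summed counts for "B" are ten times those for "A", yet the normalized values agree. With very few cells in a sample the sums also become noisier, which this sketch does not model.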
@anguscampbell3020 1 year ago
@@Bioinformagician There are a number of methods which argue that the dropout in scRNA-seq data needs to be accounted for. It would be great if you could do a tutorial on MAST, which is supposed to account for this and differentiate between biological and technical variability in cell-specific UMI.
@saraalidadiani5881 6 months ago
Thank you for the nice video. Just a question: how do I account for two covariates, like sex and age, in differential gene expression analysis of single-cell RNA-seq data? Thanks!
@subhasen2611 2 years ago
Thanks for the nice tutorials. Will you be adding any tutorial on trajectory analysis / cell fate decisions?
@Bioinformagician 2 years ago
Yes, I will be making a video covering these topics. Thanks for the suggestion! :)
@tushardhyani3931 2 years ago
Thank you for this video!!
@mayconmarcao4554 2 years ago
Graceful tutorial! I wonder which would be better as input for modeling a phenotype prediction: i) pseudobulk or ii) single-cell expression levels? Thanks for your existence =].
@Bioinformagician 2 years ago
What is the outcome that you are hoping to predict? I do not have experience with statistical modeling, so I am afraid I might not have useful input.
@mayconmarcao4554 2 years ago
@@Bioinformagician I think I misunderstood the pseudobulk concept. Pseudobulk turns a single-cell matrix into a patient-based matrix (like bulk RNA-seq). What I had thought was that with pseudobulk I'd be able to concatenate similar cells within a cell cluster to increase gene expression signals. But in that case pseudobulk would represent subclusters, not patients. Do you know if I can adapt the pseudobulk strategy to aggregate subclusters?
@熊飞-b5k 2 years ago
Hi, thanks for the video, this is very helpful. Will you be adding any tutorial for monocle3? Thank you again for these wonderful videos.
@Bioinformagician 2 years ago
Yes, I definitely have plans on making videos using monocle3.
@熊飞-b5k 2 years ago
@@Bioinformagician Hi there, in my study I am facing another problem: is it possible to compare two conditions without repetition within a certain cell type? Which analysis method or package could be used? Hoping for your reply.
@Bioinformagician 2 years ago
@@熊飞-b5k Can you explain what you mean by "compare two conditions without repetition within a certain cell type?" Do you mean you want to restrict the comparison between two conditions to only certain clusters?
@zahraabdi1613 2 years ago
@@熊飞-b5k I have the same problem. If you have found the solution, would you mind explaining it to me, please?
@wi1lhunting 3 months ago
Very good videos, but why is my 'cts' not a table? Is it because of Seurat v5? Can you tell me the answer? Thank you!!!
@ncedilemankahla9758 4 months ago
Excellent video.
@singhh5050 2 years ago
Hi! Do you think that pseudobulk analysis or GSEA is better for downstream analysis of scRNA-seq data? Especially when considering that there may be two different conditions (experimental and control). What are the advantages and disadvantages of using each method?
@Bioinformagician 2 years ago
Pseudobulking and GSEA are completely different methods serving different purposes. Either downstream analysis can make sense, depending on the goal of your analysis. Typically, pseudobulking is performed to find differentially expressed genes, after which we use enrichment methods to find which pathways/GO terms are enriched.
@singhh5050 2 years ago
@@Bioinformagician Okay, that makes sense!! Thanks so much :)
@rosaicelalunaramirez1284 2 years ago
Thank you for the great tutorials, they've helped a lot with my research. I am currently working with my own single-cell data obtained from 6 samples (3 controls and 3 experimental). I have tried your tutorial, but I get stuck at the part where you include ind, the individual identification. Cell Ranger only gives me the cell barcode sequence followed by -1, so I tried using that and adding the condition. It looked like this: CONTROL_ACCAACAGTGCATTAC-1. But when I use the aggregate expression function, it gives me 12,972 columns, as if it were treating each cell as an individual sample. How can I perform your analysis without an identification number? Or how can I assign one? Thank you!!
@Bioinformagician 2 years ago
The goal is to aggregate counts at the sample level. In my case, each sample belongs to an individual, hence counts are aggregated to the ind level. In your case, you might not need ind information. You could simply add a 'sample' column to your metadata, merge all samples and aggregate counts to the sample level.
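The 12,972-column result follows directly from the grouping key: a label built from condition plus barcode is unique to each cell, so nothing gets summed. Here is a minimal plain-Python sketch of the difference, assuming a hypothetical per-cell 'sample' label. The barcodes other than the one quoted above, the sample names and the counts are all made up, and the real workflow would do this in Seurat rather than by hand.

```python
from collections import defaultdict

# Hypothetical per-cell records: (barcode, sample, condition, count for one gene).
cells = [
    ("ACCAACAGTGCATTAC-1", "ctrl_1", "CONTROL", 3),
    ("TTGACCCAGTAGGCCA-1", "ctrl_1", "CONTROL", 1),
    ("GGTGTTAAGCATCAGG-1", "ctrl_2", "CONTROL", 2),
    ("CATTGCCTCAAGGTGG-1", "trt_1", "TREATED", 5),
]


def aggregate(cells, key):
    """Sum counts over all cells that share the same grouping key."""
    totals = defaultdict(int)
    for barcode, sample, condition, count in cells:
        totals[key(barcode, sample, condition)] += count
    return dict(totals)


# Condition + barcode is unique per cell, so every cell stays its own column.
per_cell = aggregate(cells, lambda b, s, c: f"{c}_{b}")

# A sample column gives one pseudobulk column per sample.
per_sample = aggregate(cells, lambda b, s, c: s)

print(len(per_cell), "columns:", per_cell)
print(len(per_sample), "columns:", per_sample)
```

The sample label has to come from somewhere outside the barcode, for example from which Cell Ranger run each cell was loaded from, and should be recorded before the samples are merged.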
@urmom.com629 1 year ago
@@Bioinformagician How do you "merge all samples"?
@wanisajad785 9 months ago
@Bioinformagician: Are you suggesting using raw counts (slot = counts) for un-integrated data and normalized counts (slot = data) for an integrated Seurat object?
@koushikponnanna8312 5 days ago
Even with integrated data, the RNA assay with slot = counts can be used.
@blackmatti86 2 years ago
Your videos have been truly instrumental in helping me grasp the concepts of bioinformatic data analysis, especially for single-cell RNA-seq. As far as I understand, scRNA-seq (or scATAC-seq) can be divided into droplet-based (e.g. 10X) and plate-based approaches, e.g. SMART-seq2. I have noticed that there seems to be a fair number of help guides and instructions for the former method, but not so many for the latter. Is there a resource you know of that can guide a novice through a single-cell (or single-nucleus) RNA-seq experiment performed using a plate approach (e.g. single cells FACS-sorted into 384-well plates)? Thank you! xx
@Bioinformagician 2 years ago
To get an overall idea of the pipeline, check this out: www2.stat.duke.edu/~sayan/Sta613/2018/singlecellrnaseq-170131050320.pdf This paper performs a comparative analysis of 10X and Smart-seq2: www.sciencedirect.com/science/article/pii/S1672022921000486#s0055 Seurat also provides a vignette on integrating multiple datasets across different technologies (including Smart-seq2): satijalab.org/seurat/archive/v3.1/integration.html This can give you an idea of how these datasets are processed before integration. Hope this helps!
@blackmatti86 2 years ago
@@Bioinformagician Thank you ❤️
@xiaosajackxu4242 1 year ago
If I have 4 conditions, how do I modify the code to find DEGs that are enriched/depleted in at least one condition?
@zahraabdi1613 2 years ago
It was great! Thanks so much ❤ What should I do if my Seurat object doesn't have an 'ind' column? I mean, each cell just has information about its cluster and condition, but no individual information.
@Bioinformagician 2 years ago
Can you tell me where you downloaded your data from?
@baymin4827 1 year ago
Your videos have been very helpful to me! What should I do if my Seurat object doesn't have 'ind' column? I am analyzing my own dataset. Thanks in advance
@maytelopez-cascales6113 1 year ago
Very nice tutorial. I have a question: how could I do a differential expression analysis contrasting counts that come from different experiments? I have already done the pseudobulk with the single-cell experiments, and I want to compare them with the counts from my RNA-seq. Could I make a matrix with data coming from two different techniques? Will you make a tutorial about that? Thanks.
@raghavsharma4347 1 year ago
You can add the counts from your RNA-seq as another sample, then adjust your contrasts so that it is your RNA-seq data minus your single-cell datasets.
@张凯-z4w 2 years ago
good video! thank you sooooooo much!!!!
@abassohilebo2213 2 years ago
Thank you for the video. Can you organize a workshop?
@Bioinformagician 2 years ago
I haven't given any thought to organizing one yet. I shall think about it.
@abassohilebo2213 2 years ago
@@Bioinformagician Please do. People tend to love workshops more, and it will double if not triple your subscribers.
@bigteeth5644 2 years ago
Hey there! First of all, I'd love to express my thanks to you! Your videos are helpful for our analysis. However, I ran into some problems trying to follow your tutorial. Our dataset is an aggregated snRNA-seq dataset from six samples. We performed doublet removal, SoupX, SCTransform normalization and integration. Some of the values in the 'RNA' assay are not integers. When I was searching for a solution, I read in the DESeq2 vignette that we should use un-normalized data. Do you have any suggestions on this issue? Thank you!
@Bioinformagician 2 years ago
Which slot in the 'RNA' assay are you referring to, i.e. counts, data or scale.data? For the demonstration here, we used the 'counts' slot, which stores un-normalized raw counts, to aggregate across samples.
@khr1138 2 years ago
It's because of SoupX: it turns the raw counts into non-integer values. Use the round() function on the counts you pass to DESeqDataSetFromMatrix!
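To make the rounding step concrete, here is a small plain-Python sketch of the idea. In the actual workflow this would be R's round() applied to the aggregated count matrix. The sample names, gene names and values are made up, and round_counts is a hypothetical helper.

```python
# Hypothetical pseudobulk counts after ambient-RNA correction (made-up values).
corrected = {
    "sample_1": {"geneA": 120.37, "geneB": 0.42, "geneC": 15.0},
    "sample_2": {"geneA": 98.81, "geneB": 2.6, "geneC": 7.49},
}


def round_counts(counts):
    """Round every value to the nearest whole number and store it as an int."""
    return {
        sample: {gene: int(round(value)) for gene, value in genes.items()}
        for sample, genes in counts.items()
    }


rounded = round_counts(corrected)

# Count-based models such as DESeq2 expect non-negative integers.
all_integer = all(
    isinstance(v, int) and v >= 0
    for genes in rounded.values()
    for v in genes.values()
)
print(rounded, all_integer)
```

Rounding after aggregation, rather than per cell, keeps more of the corrected signal, because many small per-cell fractions would otherwise round to zero before being summed.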
@Iman_1987 2 years ago
Could you please demonstrate isoform analysis with Nanopore data? Thanks.
@thwoals456 2 years ago
Hello, thank you so much for your video! I have one question. I followed your single-cell tutorial video using my own single-cell data. However, there is no 'ind' column in my Seurat object. Could you tell me how to make that column? Additionally, I did scRNA-seq for one control sample and two treatment samples (3 in total). Is it possible to make an 'ind' column for the control sample? And can the ratio of control to treatment samples (1:2) affect the downstream analysis? Sorry for my many questions.
@Bioinformagician 2 years ago
The 'ind' column was already present in the dataset, I did not create it. Did you download the data the same way I did it in the tutorial? Are the two treatment samples replicates or separate samples?
@thwoals456 2 years ago
@@Bioinformagician Instead of using the data in your video, I used my scRNA-seq data for pseudo-bulk analysis. So I asked how to make the column similar to the 'ind' column. And "the former" is my reply to your second question. I have two replicates of the treatment sample.
@Bioinformagician 2 years ago
@@thwoals456 Oh I get it now. So basically the "ind" column is nothing but information about the samples in my dataset (ind stands for individuals). If your dataset has sample information, you could use that column to aggregate your counts to the sample level.
@mischmuuu 2 years ago
Thank you for this great tutorial! Is it possible to do a pseudo-bulk DE analysis with only one single-cell sample per condition? How would the statistics work?
@akundiraghukiranvydhyanath9939 2 years ago
I'm afraid that won't be possible. DESeq2 requires at least 2 biological replicates. The alternative would be edgeR, but you have to supply a dispersion value.
@Bioinformagician 2 years ago
DESeq2 is not designed to work without replicates.
@albanaisai3429 2 years ago
Hi there, great video. Do you know how to use kallisto?
@SerorONG 2 years ago
Hey there, great tutorial! May I ask how you got so proficient with RegEx (regular expressions)? I feel that it's one of the few core skills that would help immensely and is highly transferable, especially during the initial stages of data processing. Could you recommend any resources for learning RegEx?
@Bioinformagician 2 years ago
I first learnt regex when I learnt Perl. The more I kept using regex, the more it started to make sense. I use regexr (regexr.com/) often to practice and build my regex. Here are a few resources that could help you practice it more - 1. regexone.com/ 2. regexlearn.com/ 3. www.hackerrank.com/domains/regex Hope this helps!
@bumpingbell 2 years ago
Hi, I am analyzing differentially expressed genes in a snRNA-seq dataset (GSE159812), for subsequent pathway analysis. Using FindMarkers, I get extremely small p-values for differentially expressed genes. However when I aggregate the counts by cell type & sample and perform pseudo-bulk analysis, less than 0.1% genes are significant (p
@Bioinformagician 2 years ago
FindMarkers tends to produce inflated significance (p-values that are too small) because each cell is treated as an independent sample, even though cells within a sample are not truly independent of each other. In pseudo-bulk analysis, counts are aggregated at the sample level instead. The two approaches will not give you the same differentially expressed genes: single-cell methods tend to capture variation between cells, whereas pseudo-bulking captures variation among samples (between populations). Single-cell methods also tend to call highly expressed genes as differentially expressed and show low sensitivity for genes with low expression. Did you aggregate counts by sample only, or by both sample and cell type?
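A toy illustration of why treating cells as independent samples overstates the evidence. This sketch uses Welch's t statistic as a simple stand-in (FindMarkers defaults to a Wilcoxon rank-sum test, and DESeq2 fits a negative binomial model), and all values, sample names and helper functions are made up. Cells scatter tightly around their own sample's mean, while samples differ from each other more than the conditions do.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic for two groups of observations."""
    return (mean(y) - mean(x)) / sqrt(variance(x) / len(x) + variance(y) / len(y))


def make_cells(sample_mean, n_cells=60):
    """Toy cells: values scatter tightly around their sample's own mean."""
    return [sample_mean + offset for offset in (-1, 0, 1)] * (n_cells // 3)


# Two samples per condition; between-sample spread exceeds the condition effect.
control = {"c1": make_cells(10), "c2": make_cells(14)}
treated = {"t1": make_cells(12), "t2": make_cells(16)}

# Cell-level test: every cell counts as an independent observation.
t_cells = welch_t(
    [v for cells in control.values() for v in cells],
    [v for cells in treated.values() for v in cells],
)

# Sample-level test: one value per sample, as in a pseudobulk analysis.
t_samples = welch_t(
    [mean(cells) for cells in control.values()],
    [mean(cells) for cells in treated.values()],
)

print(f"cell-level t = {t_cells:.2f}, sample-level t = {t_samples:.2f}")
```

The same two-unit difference between conditions yields a much larger test statistic at the cell level, simply because 120 correlated cells are counted as 120 replicates instead of 2.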
@bumpingbell 2 years ago
@@Bioinformagician I meant that I aggregated counts across the cells to the sample level, and for each cell type I made comparisons between 8 case and 8 control samples. I think pseudo-bulk is closer to my expectations, but as mentioned, only a few genes have significant adjusted p-values with this method (which is surprising to me). This makes comparing single genes barely possible. If we use pathway analysis, where we can just input the log2FC of each gene from pseudo-bulk, we may not need to care about the statistical significance of each gene, but we still need to filter by adjusted p-value for the input to be valid. Am I right in this sense? We're using Ingenuity Pathway Analysis. (Sorry if this is a bit off-topic in any way)
@smachead 2 years ago
Hi there! I was wondering why you are using the normalised and scaled data to generate the aggregate counts - should we not use the raw data?
@Bioinformagician 2 years ago
I am using the "counts" slot, which stores raw counts, to generate aggregate counts. cts
@NicholasJohnson-m9l 1 year ago
Aren't we not supposed to normalize first? DESeq requires raw read counts. Or is the counts slot raw?
@raghavsharma4347 1 year ago
When she aggregates counts, the function pulls data from the raw counts. Normalized counts are only used for the Seurat pipeline, but not used for differential expression analysis.
@sreejas1302 1 year ago
Thank you so much. Is it possible to convert bulk RNA-seq into scRNA-seq in R? Please reply.
@raghavsharma4347 1 year ago
Why are you trying to convert bulk data to single cell? Are you trying to compare data between two datasets?