Jesus, I was desperately struggling with integration. You saved my life. Your video is very clear and understandable. Thank you so much.
@Bioinformagician 2 years ago
Thank you, I am happy to hear my video was helpful!
@qbaseathens 2 years ago
@@Bioinformagician you are truly amazing
@juliabalewska6846 9 months ago
You are a hero, explaining things in a very clear way. Big thanks!
@tahadinc1302 2 years ago
I love your tutorials. They helped me understand the fundamentals of single-cell RNA seq. You're a great teacher. Thank you!
@PranavKatragadda-w4r 3 months ago
This is so good, I was paralyzed and stood up to turn it up.
@seakayaker20 1 year ago
Really nice tutorial. Admittedly, I was lost between 24:33 and 25:19 when you say 'clearly see'. I've played the section back many times and still don't follow. I'd love to see a more detailed explanation of this. Many thanks and keep up the great work!
@shrabantimazumder3039 1 year ago
Thank you so much for the excellent video. It is very helpful for understanding why and how to remove the batch effect.
@ravimore5786 2 years ago
I like how your scRNA-seq sessions explain the basics and logic behind each and every step in detail. This is really helpful for beginners in this field. Thank you very much for educating the bioinformatics community with your expertise.
@mostafamalmir3621 2 years ago
Your tutorials are very useful to me!!! Thanks a lot.
@mrbarakgut 11 months ago
You're the best. The vid is really thoughtful and clear. Thanks!
@amitrupani9898 2 years ago
There is always something new to learn from your videos. Keep it up and coming. :) Cheers!
@amitrupani9898 2 years ago
I didn't really understand the need for "re"-normalizing the samples at 26:36 (we had already performed normalization once at 21:00). Just curious. Also, you should be able to show the plots side by side (or stacked) just by using the pipe operator, for instance p1 | p2 (side by side) and p1 / p2 (one above the other).
@Bioinformagician 2 years ago
@@amitrupani9898 As far as I understand, NormalizeData() works off the counts slot and overwrites the data slot. Splitting the object by patient should not affect the normalized counts, because normalization depends on each cell's library size, not on the number of cells in the object. Hence, you are right: we would not need a second normalization after splitting. However, my thought process behind doing it this way is:
1. It is good practice, before integrating your data, to make sure normalization has been performed on each object separately. Say you read in objects separately, perform QC and filtering, and then integrate without merging and normalizing as part of the standard workflow (when you know for sure your data has batch effects, or you have data from different conditions or modalities that you want to integrate). Then it is important to normalize and find variable features for each object individually.
2. Running log-normalization twice or a few times BEFORE INTEGRATION (I want to emphasize this point) will not change your normalized values; it simply recomputes them from the raw counts and overwrites the data slot each time.
Long story short, was it necessary to "re-normalize" after we had normalized the merged Seurat object? No. However, I wanted this code to reflect best practice for integrating data and to apply to scenarios where, unlike in our case, no prior normalization was performed. Thank you for pointing this out to me and also for showing me how the pipe operator and forward slash can be used to arrange plots. This is so cool, I am definitely using this from now on!
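The observation in the reply that splitting by patient leaves log-normalized values unchanged can be checked with a small base-R sketch. It reimplements the formula Seurat documents for its default `LogNormalize` method (each cell's counts divided by that cell's total counts, multiplied by `scale.factor = 10000`, then natural-log transformed with `log1p`). The toy matrix and patient labels are made up, and this is an illustration, not Seurat's own code.

```r
# Base-R sketch of the "LogNormalize" formula (assumed to mirror
# NormalizeData(normalization.method = "LogNormalize", scale.factor = 10000)).
log_normalize <- function(counts, scale_factor = 1e4) {
  # Divide each cell's (column's) counts by that cell's total counts,
  # multiply by the scale factor, then apply log(1 + x).
  log1p(sweep(counts, 2, colSums(counts), "/") * scale_factor)
}

# Toy count matrix: 3 genes x 4 cells, with a made-up patient label per cell.
counts <- matrix(c(5, 0, 3,
                   2, 8, 1,
                   0, 4, 6,
                   7, 1, 2),
                 nrow = 3,
                 dimnames = list(c("GeneA", "GeneB", "GeneC"), paste0("cell", 1:4)))
patient <- c("P1", "P1", "P2", "P2")

# Route 1: normalize all cells together (the merged object).
merged_norm <- log_normalize(counts)

# Route 2: normalize each patient's cells separately, then recombine columns.
per_patient <- lapply(split(colnames(counts), patient),
                      function(cells) log_normalize(counts[, cells, drop = FALSE]))
split_norm <- do.call(cbind, per_patient)[, colnames(counts)]

# Each cell is scaled only by its own library size, so both routes agree.
all.equal(merged_norm, split_norm)  # TRUE
```

The same does not hold for FindVariableFeatures(): variable features are ranked using statistics computed across the cells in the object, so the features selected per patient can differ from those selected on the merged object.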
@iheartmcyrus 2 years ago
You are a lifesaver!! Could you do tutorials on how to run integration via Harmony and pseudotime analysis in the future, please?
@Bioinformagician 2 years ago
Those are definitely in the pipeline, will put out videos on these topics soon :)
@kitdordkhar4964 2 years ago
You are awesome! I can understand the group.by argument better now. Thanks!
@Bioinformagician 2 years ago
That's awesome! I am glad this video was helpful! :)
@lisaszmolyan4381 1 year ago
Wow, thank you so much, this is exactly what I was looking for and so clearly explained! Keep up the great work! Thanks a lot :)
@mahimabose 2 years ago
Hi, this was indeed very useful. You are a lifesaver. Could you also make tutorials on pseudotime analysis and RNA velocity analysis packages like Monocle 3, Velocyto, etc. in the future? Thanks
@Bioinformagician 2 years ago
Thank you, I am happy to hear my video was informative! I plan to make videos on the topics you mentioned and hopefully will be able to post them soon. Thanks for the suggestion! :)
@nayande2151 2 years ago
@@Bioinformagician Did you make a video on pseudotime analysis and RNA velocity estimation from single-cell data?
@saafvaaf3286 11 months ago
Thank you for uploading these, they are very useful.
@dannyk938 6 months ago
Thank you for the walkthrough; your videos have helped me a lot so far as someone with no programming or data science background. I am trying to do a horizontal integration of a KO and a rescue sample. I made it through to the very end, but when I run the IntegrateData step I get the following error: "Error in .subscript.2ary(x, i, , drop = TRUE) : subscript out of bounds". I understand that this error occurs when trying to access an index that doesn't exist, but I'm not sure what's causing it in my data. Any ideas on where to look would be much appreciated. Thanks!
@kritisen 2 years ago
Wow this was a superb tutorial! Many thanks for putting this together
@theresahutchins2035 1 year ago
Your videos are so amazing wowie!
@MsZhang666 2 years ago
Thank you very much again!!! So great
@ziqifu2232 2 years ago
Fantastic introduction!
@jaskaransingh2813 8 months ago
Hi, great tutorial. I have one fundamental question: if the individual Seurat objects being merged have already been through normalization, variable feature selection, scaling, and RunPCA, do we have to run these steps again after merging (the ones we run before integration)?
@tushardhyani3931 2 years ago
Thank you for this video !!
@harishnarasimhan7367 2 years ago
These tutorials are great, looking forward to the next ones. Could the gzip function be used instead to unzip the files?
@Bioinformagician 2 years ago
These are two different compression formats, so gunzip cannot be used to uncompress .zip files, and vice versa.
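GEO supplementary downloads named like `GSE..._RAW.tar` are tar bundles whose contents are gzipped, so the extraction can also be done from R with `utils::untar()` instead of a terminal (a `.zip` archive would go through `utils::unzip()` instead). A minimal sketch; the helper name is hypothetical, and the assumption that some bundles contain per-sample `.tar.gz` archives is handled by unpacking those too.

```r
# Minimal sketch (hypothetical helper name): unpack a GEO "_RAW.tar" bundle
# from R, then unpack any per-sample .tar.gz archives found inside it.
extract_geo_raw <- function(tar_path, exdir = sub("\\.tar$", "", tar_path)) {
  if (!file.exists(tar_path)) stop("Archive not found: ", tar_path)
  # Outer bundle: a plain .tar archive.
  utils::untar(tar_path, exdir = exdir)
  # untar() also detects gzip-compressed tarballs, so nested .tar.gz
  # archives are unpacked next to where they were found.
  inner <- list.files(exdir, pattern = "\\.tar\\.gz$", recursive = TRUE, full.names = TRUE)
  for (archive in inner) utils::untar(archive, exdir = dirname(archive))
  # Individual .gz files (e.g. matrix.mtx.gz) are left compressed, since
  # Read10X() and ReadMtx() read gzipped inputs directly.
  gz_files <- list.files(exdir, pattern = "\\.gz$", recursive = TRUE, full.names = TRUE)
  gz_files[!grepl("\\.tar\\.gz$", gz_files)]
}

# Usage (archive name assumed from the GEO accession mentioned in this thread):
# gz_files <- extract_geo_raw("GSE180665_RAW.tar")
```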
@chiranjitdas3959 1 year ago
Just a question: when you initially ran the basic Seurat pipeline for normalization, scaling, etc., you merged the datasets from different patients and tissue types. But later, when you ran the integration steps, you split the data by patient and again ran the normalization and variable feature steps. Is it necessary to run the normalization and variable feature steps a second time before integration, and if so, why? (We had already run those steps initially.)
@priyaamadhukaran4745 1 year ago
Hi, very helpful video. I am a beginner and would like you to make a detailed video on where the downloaded files are supposed to be saved and how to open them in Seurat, etc. At 5:52 in the video you use a screen: what is it, and how did you get it (data bash 80x24)? It is used to convert the tar file into normal files without tar. I see you bringing up the screen as you work in Seurat.
@张凯-z4w 2 years ago
Your tutorials are great! I have a question. I want to find the differences between primary and metastatic tumors using scRNA-seq data; do I need to integrate these two datasets? Thank you!
@Bioinformagician 2 years ago
The goal of integration is to align cell types from one condition/tumor type with the same cell types in another condition/tumor type. This can aid cell type identification and the comparison of specific cell types across conditions/tumor types. If that is what you are hoping to do, then yes, you should integrate your data.
@riyarakshit1166 7 months ago
Hi... I am using R v4.3.3 and Seurat v5 and was trying to analyze scRNA-seq data, but the commands for unzipping the .tar.gz files are not working on my laptop. I found your explanations very easy to understand. Please help.
@maanasss 1 year ago
Hey BioinforMAGICIAN, the tutorials are truly amazing and extremely easy to understand for a novice like me. I am a dentist learning to conduct scRNA-seq on dental tissues, and I am strictly following the steps performed in the videos. I have a query regarding merging datasets. I have merged 3 datasets from 3 different patients, but when I view the metadata, orig.ident does not show whether a row is from patient 1, 2, or 3; all of them show "SeuratProject". I am unable to tell which rows are from which sample, so I cannot use different colors in the UMAPs. Can you please let me know how I can address this? Thanks in advance.
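One common way to get a per-sample column is to label the cells when creating or merging the objects. As a hedged sketch: the `project` argument of CreateSeuratObject() is what Seurat falls back to for `orig.ident` (its default is "SeuratProject"), and merge() with `add.cell.ids` prefixes every barcode with a sample tag that can later be split back off. The helper below (hypothetical name) does that splitting in base R, assuming the tag and barcode are joined by `_` and the tag itself contains no underscore.

```r
# Hypothetical helper: recover the sample tag from barcodes that were
# prefixed at merge time, e.g. "patient1_AAACCTGAGAAACCAT-1" -> "patient1".
sample_from_barcode <- function(barcodes) {
  # Drop everything from the first underscore onwards.
  sub("_.*$", "", barcodes)
}

# Usage with Seurat (illustrative; assumes three objects p1, p2, p3):
# merged_seurat <- merge(p1, y = c(p2, p3),
#                        add.cell.ids = c("patient1", "patient2", "patient3"))
# merged_seurat$sample <- sample_from_barcode(colnames(merged_seurat))
# DimPlot(merged_seurat, group.by = "sample")

sample_from_barcode(c("patient1_AAACCTGAGAAACCAT-1", "patient3_TTTGTCATCTGCGGCA-1"))
```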
@xiaosajackxu4242 2 years ago
Great job! I have a quick question: let's say we integrate single-cell datasets "object_A" and "object_B" into "object_AB". In the integrated "object_AB", we have 10 clusters with cluster labels "AB-1, AB-2, AB-3 ... AB-10". If I want to transfer these cluster labels to a UMAP projection of the original "object_A" based on the corresponding cells' names (or barcode IDs), what code can I use? Note that the cell names (or barcode IDs) of object_A did not change in "object_AB". Thanks!
@Bioinformagician 2 years ago
Create a separate data.frame with cell barcodes and the corresponding cluster labels from the integrated object_AB like this - cell_cluster_mapping
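The barcode-to-label lookup described in the reply can be sketched in base R with match(), assuming, as the question states, that the barcodes are identical in both objects. The helper name and the `AB_cluster` column name are hypothetical.

```r
# Hypothetical helper: for each target barcode, return the label of the same
# barcode in the source object (NA if the barcode is absent there).
map_labels_by_barcode <- function(target_barcodes, source_barcodes, source_labels) {
  labels <- as.character(source_labels)[match(target_barcodes, source_barcodes)]
  # Name by barcode so the vector can be aligned to cells by name.
  names(labels) <- target_barcodes
  labels
}

# Usage with Seurat objects (illustrative):
# ab_labels <- map_labels_by_barcode(colnames(object_A), colnames(object_AB), Idents(object_AB))
# object_A <- AddMetaData(object_A, metadata = ab_labels, col.name = "AB_cluster")
# DimPlot(object_A, group.by = "AB_cluster")

map_labels_by_barcode(c("c2", "c9", "c1"), c("c1", "c2", "c3"), c("AB-1", "AB-2", "AB-3"))
```

Cells of object_A that were filtered out before integration come back as NA, which makes them easy to spot on the plot.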
@kimayatekade5267 2 months ago
Hey, great video, thanks! Can one use SelectIntegrationFeatures on log-normalized data? I thought it was only for SCTransformed data. Please correct me if I am wrong :)
@明明-v1y 1 year ago
Thanks! Good course!
@bigteeth5644 2 years ago
Thank you so much for putting together all these tutorials! They are super helpful! I used to use the FindIntegrationAnchors method to integrate data until I got some questions about the downstream analysis. Some bioinformaticians suggested that I use SCTransform to normalize data and Harmony to integrate data. Do you have any comments on this? Thank you!
@Bioinformagician 2 years ago
SCTransform performs more effective normalization and effectively removes technical effects from the data. SCTransform replaces NormalizeData(), ScaleData(), and FindVariableFeatures(), so I would recommend using it over standard log-normalization. In terms of choosing an integration method, I don't have a strong opinion. I guess if I needed batch-corrected expression values to be returned, I would choose CCA (more computationally intensive), and if not, I might go with Harmony.
@faisalaziz8411 7 months ago
Great work.
@navyav.b8572 7 months ago
anchors
@zkzhang4131 11 months ago
Thank you!
@naVn1111 5 months ago
I could not find the link to the QC video; could you please put it in the description? Thanks.
@mahamoussa5712 1 year ago
Actually, you are the best bioinformatician! Do you use your laptop for data integration, or do you use a supercomputer? I cannot run this on my laptop.
@Bioinformagician 1 year ago
I performed the demo on my laptop. CCA can be very slow and computationally intensive. Try the rpca method; it runs significantly faster.
@treponema6977 2 years ago
Thank you for making these tutorials, they are very helpful. Can you provide some information about the computational resources you used for this dataset? Thanks in advance.
@Bioinformagician 2 years ago
I have mentioned the software/tools/packages used to perform this analysis in the video. Hardware-wise, I have a MacBook Pro with an Apple M1 Pro chip and 16 GB of RAM.
@treponema6977 2 years ago
@@Bioinformagician I have 16 GB of RAM too, but I couldn't finish the tutorial because of RAM issues.
@Bioinformagician 2 years ago
@@treponema6977 Are you using the same dataset?
@treponema6977 2 years ago
@@Bioinformagician Yes, exactly the same dataset, running on Ubuntu 22.04, R version 4.2.1, RStudio 2022.07.1 Build 554. I don't know what is causing the high RAM use; in the end I had to create a 32 GB swap file to finish the tutorial.
@Bioinformagician 2 years ago
@@treponema6977 Wow, that's strange! I cannot think of anything that could be causing your memory issue if you have the exact same config.
@alpr1864 2 years ago
Hey! Thank you for this informative video! I have a question. The goal of these steps is to integrate/merge multiple datasets into one unified Seurat object. After performing these steps, I guess that to proceed with the standard single-cell RNA-seq workflow, I need to normalize "seurat.integrated" with the NormalizeData() function in Seurat and then find the variable genes with FindVariableFeatures(). I think that after scaling, dimensionality reduction, clustering, and naming the clusters, I am ready to present a UMAP representing the cells of those samples. Am I right or not? Thank you in advance!
@Bioinformagician 2 years ago
Yes, the approach seems sensible: merge the datasets first, visualize, and determine whether integration is really required. Also, make sure there is no unwanted biological variation, like cell cycle effects. If you do find such unwanted variation, then you will have to regress it out. Check out this article, which explains how to check for it and regress out the variation: github.com/hbctraining/scRNA-seq_online/blob/master/lessons/06_SC_SCT_normalization.md Once the data is integrated, the standard workflow steps you mentioned above make sense.
@alpr1864 2 years ago
@@Bioinformagician Thanks for your response. I followed the standard workflow. However, RStudio does not let me normalize the data after the integrated Seurat object has been created. P.S. I think we had already normalized our data during the creation of the integrated Seurat object, and I guess that is the reason for R's error; however, I am not sure!
@chintanbhavsar5681 2 years ago
I'm trying to see if it is possible to use Seurat for proteomics data. Using this Seurat object, I plan to run cell-cell communication pipelines like NATMI, LIANA, or CellCall on my proteomics dataset. Any insight into this would be very helpful, as I'm just getting started.
@Bioinformagician 2 years ago
Unfortunately, my experience with proteomics is very limited, and I do not want to mislead you with suggestions I am not confident about. Perhaps digging up some papers on proteomics data analysis would be helpful.
@hathormaat8078 2 years ago
Thanks for the amazing tutorial. However, I have one question: when running: seurat.integrated
@Bioinformagician 2 years ago
How large are the datasets you are trying to integrate, and how much memory are you using? Also, are you using the CCA method to integrate? If yes, try 'rpca'; it is computationally less intensive. Also check out this thread: github.com/satijalab/seurat/issues/1355
@josyulavijaysai2223 1 year ago
Hi, I really like the information, thanks a lot. I was wondering if there is a way to perform differential expression between the samples within each cluster rather than between the clusters?
@mdnaveedkn 9 months ago
Hi, please make a video on vertical integration of scRNA-seq and scATAC-seq from the same cells ❤
@cats_like_felix 1 year ago
Hi, thanks so much for the videos. Can I ask, please: I'm trying to merge datasets where one dataset is missing a prefix on the rownames that was added to separate features between samples from different species that were run at the same time. Is there a way to add or remove a prefix from all the rownames (features) of one dataset? Thanks again!
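Adding or stripping a fixed prefix on feature names is plain string work in base R. A hedged sketch, assuming the change is made on the count matrix's rownames before CreateSeuratObject() (renaming features inside an existing Seurat object is more involved); the helper names and the `hs-` prefix are made-up examples. Seurat also replaces underscores in feature names with dashes (with a warning) when it creates an object, so a prefix written with `_` in the raw files may appear with `-` in the object.

```r
# Hypothetical helpers for a fixed (non-regex) prefix on feature names.
add_prefix <- function(features, prefix) {
  # Only prefix names that do not already carry the prefix.
  ifelse(startsWith(features, prefix), features, paste0(prefix, features))
}

remove_prefix <- function(features, prefix) {
  # Strip the prefix where present; leave other names untouched.
  ifelse(startsWith(features, prefix), substring(features, nchar(prefix) + 1), features)
}

# Usage on a count matrix before building the Seurat object (illustrative):
# rownames(counts) <- add_prefix(rownames(counts), "hs-")
# seurat_obj <- CreateSeuratObject(counts = counts)

add_prefix(c("ACTB", "hs-GAPDH"), "hs-")
remove_prefix(c("hs-ACTB", "GAPDH"), "hs-")
```

Using startsWith() rather than a regular expression avoids surprises when the prefix contains regex metacharacters such as `.`.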
@kylereese6463 1 year ago
Hi, your videos are incredibly helpful for my undergrad research. At 18:05, you use the function PercentageFeatureSet with the pattern set to '^MT-'. I looked through the data we're using in the video and couldn't find any variable containing the substring 'MT'. Where exactly is the regex pulling that pattern from? Thank you! Additionally, do you know of a way to get the gene expression values for each Ensembl gene for each sample in this example?
@Bioinformagician 1 year ago
The PercentageFeatureSet function calculates the percentage of all counts belonging to a subset of features (i.e. genes). So here we are calculating the percentage of counts corresponding to mitochondrial genes, whose names start with MT-. I have explained these single-cell RNA-seq basics in this video: kzbin.info/www/bejne/a3mlq5qpr52kr80
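In other words, the `^MT-` pattern is matched against the feature (gene) names, i.e. the rownames of the count matrix, not against a metadata variable. A base-R sketch of the same calculation, assuming human gene symbols (mouse mitochondrial symbols start with lowercase `mt-`, and Ensembl IDs would not match this pattern at all); the helper name and toy matrix are made up.

```r
# Percentage of each cell's counts coming from features whose names match
# `pattern` (a base-R sketch of the quantity PercentageFeatureSet() reports).
percent_features <- function(counts, pattern = "^MT-") {
  matched <- grepl(pattern, rownames(counts))
  colSums(counts[matched, , drop = FALSE]) / colSums(counts) * 100
}

# Toy matrix: two mitochondrial genes and one nuclear gene, two cells.
counts <- matrix(c(10, 80, 10,
                   0, 50, 0),
                 nrow = 3,
                 dimnames = list(c("MT-CO1", "ACTB", "MT-ND1"), c("cell1", "cell2")))
percent_features(counts)  # returns c(cell1 = 20, cell2 = 0)

# The corresponding Seurat call (metadata column name illustrative):
# seurat_obj[["percent.mt"]] <- PercentageFeatureSet(seurat_obj, pattern = "^MT-")
```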
@kowshicroy1418 1 year ago
Thank you so much
@meetukaur0909 1 year ago
I have made the Seurat objects just like you said, but when running the merged_seurat step I get an error saying that the Seurat object is not found. IDK why.
@joshuagrant4569 2 years ago
Really useful tutorial, thank you! By any chance do you know if it is possible to merge two h5 files to run this analysis on the merged matrix?
@Bioinformagician 2 years ago
You could read each h5 file into a Seurat object and then merge the two Seurat objects.
@purplepandaoverlord7780 2 years ago
Hi. Non-computational, struggling lab person here. I am trying to do an snRNA-seq analysis, but I am using just one sample. How can I make a Seurat object from a Seurat list again so I can skip the integration step (which makes a new Seurat object by default)? I need to use the object rather than a list for subsequent steps, and I can't find an answer to this anywhere online. Thank you!
@alyaahessin7784 2 years ago
Thank you so much for such useful tutorials. Every time I download the files from GEO, they do not show up directly in R like yours do; could you advise how I can transfer them from my downloads to RStudio so I can follow the tutorial with you?
@Bioinformagician 2 years ago
After downloading files from GEO, I load the data into R using commands that read the files in. What commands are you using to read your files into R?
@alyaahessin7784 2 years ago
@@Bioinformagician Thank you for replying, would you please share with me the command you used to read files in?
@germanovicente4616 1 year ago
Hi! I have performed quality control individually for each data set in my analysis, but when I try and merge the seurat objects, I get an error saying: Error in `.rowNamesDF
@manjushagovindh4527 1 year ago
Hi, I have a question: for single-cell RNA data taken from GEO there are 3 raw data files (count matrix, barcodes, and gene expression). Should we use all 3 files or only the count matrix? Or should we load all 3 into R and do the analysis?
@Bioinformagician 1 year ago
I had previously created a video that would answer your question: kzbin.info/www/bejne/aanGhaOnht-IrbM
@tahadinc1302 2 years ago
I see that the RAM usage in your RStudio is pretty low. How do you keep RAM usage that low even though you have all those data structures in the environment?
@Bioinformagician 2 years ago
I think that's because I am not running memory intensive processes on all those data structures at once. I am sure my RAM usage must be going up when I am running memory intensive Seurat functions. It must be coming back down when I am wrangling or just visualizing my data.
@tahadinc1302 2 years ago
@@Bioinformagician Thank you for letting me know and I am looking forward to future episodes!
@abdullahugurlu2622 1 year ago
Didn't we already use filtered data at the beginning? Why did we do QC and filtering again?
@kuldeepmakwana7242 1 year ago
Hi! I have a slightly different question. How can I create an AnnData object file from a Seurat object in order to run RNA velocity estimation?
@shubhamoyghosh6005 2 years ago
Hi, it was very useful. I'm wondering whether Scanpy has similar methods for integration.
@Bioinformagician 2 years ago
I am sure there must be...
@anamikapandey4769 2 years ago
Thank you for this video. I have one question: if there is no tar.gz file in the provided GEO accession, how should I start? Please suggest, as I am quite puzzled. The files provided in the accession are peak tables. My aim is to identify the expression of a particular gene in a particular cell. Please suggest. Thank you.
@Bioinformagician 2 years ago
Can you confirm the data you are looking at is an RNA-Seq dataset?
@anamikapandey3613 1 year ago
@@Bioinformagician Yes, it is an RNA-seq dataset, ma'am.
@sreejas1302 1 year ago
Hi, after integrating the dataset by CCA, how can we extract the correlation coefficients of the integrated dataset?
@chrisdoan3210 2 years ago
Hi @Bioinformagician. I have 2 datasets, from a healthy and a diseased person, and I would like to compare them and see differentially regulated genes. Could I use the integration workflow? Thank you so much!
@Bioinformagician 2 years ago
Yes, you can.
@chrisdoan3210 2 years ago
@@Bioinformagician This advice confused me: "Integration is more complicated where it is attempting to find cells with similar expression profiles and uses them as anchors, but it is only appropriate in certain situations. Merging is just putting 2 data sets in the same Seurat object, so is a lot simpler." What do you think about this?
@surinderpal9498 1 year ago
Hello Ma'am, I have been facing this error for the last 4 days and can't resolve it. Please help me solve it, thanks.. > merged_seurat_filtered
@ljing65 1 year ago
After I ran the code to create the Seurat object, I got this: "Warning: path[1]="GSE180665_RAW/HB17_background_filtered_feature_bc_matrix/matrix.mtx.gz": No such file or directory Error: Cannot find expression matrix at GSE180665_RAW/HB17_background_filtered_feature_bc_matrix/matrix.mtx.gz" Any idea? Thank you very much.
@ljing65 1 year ago
solved.
@arianescajeda639 1 year ago
Great videos. I am getting this error: Error in validityMethod(as(object, superClass)) : object 'CsparseMatrix_validate' not found each time I try running: marged_seurat.s
@Bioinformagician 1 year ago
It seems the issue is stemming from "Matrix" package. Can you try to re-install or update the package and see if you still get the error?
@arianescajeda639 1 year ago
@@Bioinformagician You are so nice for answering. I uninstalled and reinstalled it, but it did not work. Is it possible to skip this part or use another method?
@ahmedadelelbaz1694 1 year ago
Is this different from RunCCA?
@MrQiushenfeng 2 years ago
In the first for loop, I am seeing "Error in url(description = uri) : URL scheme unsupported by this method"
@Bioinformagician 2 years ago
Can you send me the command you are trying to run? Also, are you sure the paths to the matrix, features, and barcodes files you are providing are correct?
@Bioinformagician 2 years ago
Apparently, another user encountered the same issue and was able to solve it. Quoting the user (@Alp R): Solved! The problem was from the new version of R. For windows users, you can install R version 4.0.5 (2021-03-31). For more info: github.com/satijalab/seurat/issues/5687 Hope this helps you as well!
@johnreddy1817 2 years ago
Changing the R version to 4.0.5 didn't work. You can also use the Read10X function to solve the above issue. for(x in dirs) { name
@dotheneedful55 1 year ago
Thank you so much for this information. At 5:25, I am having trouble with the ReadMtx command. I keep receiving Error: Cannot find expression matrix at ....Rproj.usermatrix.mtx.gz. I've tried a variety of solutions. Do you have any hints?
@dotheneedful55 1 year ago
I figured it out. I simply had the wrong working directory
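The "Cannot find expression matrix" errors in this thread are path problems, and in this last case the fix was simply the working directory. A small base-R check before calling ReadMtx() can make such failures self-explanatory by reporting the missing paths together with getwd(). The helper name is hypothetical, and the file names are assumed to be the usual 10x `matrix.mtx.gz`, `features.tsv.gz` and `barcodes.tsv.gz`; adjust them to the files actually present in each sample folder.

```r
# Hypothetical helper: build the three 10x file paths for one sample folder
# and stop early, naming the working directory, if any file is missing.
tenx_paths <- function(sample_dir,
                       files = c(mtx = "matrix.mtx.gz",
                                 features = "features.tsv.gz",
                                 cells = "barcodes.tsv.gz")) {
  paths <- vapply(files, function(f) file.path(sample_dir, f), character(1))
  absent <- paths[!file.exists(paths)]
  if (length(absent) > 0) {
    stop("Not found (working directory is ", getwd(), "): ",
         paste(absent, collapse = ", "))
  }
  paths
}

# Usage with Seurat (folder name taken from an error message above):
# paths <- tenx_paths("GSE180665_RAW/HB17_background_filtered_feature_bc_matrix")
# counts <- ReadMtx(mtx = paths[["mtx"]], cells = paths[["cells"]],
#                   features = paths[["features"]])
```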