Pathway enrichment analysis tutorial in R with clusterProfiler()

  14,230 views

Biostatsquid

A day ago

Comments: 26
@singh_nimisha 1 year ago
Laura, you teach us like we are a bunch of kids. I find it awesome! You are so sweet! This helped me so much, Ma'am! Thank you.
@xlxeat 1 year ago
The tutorial is very helpful, even though I had run the enrichment pipeline lots of times before. Your code gave me useful tips!
@markrenton6981 9 months ago
Has anyone tried changing all of the mouse .gmt files to .RDS? I can get all of them to do it except for the GO CC set. Anyone else run into this problem? It will read the .gmt file, but when I execute the saveRDS() function, the file just doesn't appear in the folder like it did for all of the other .gmt files.
@tanmoychatterjee7922 24 days ago
Please, ma'am, don't use pre-written code; it is not helpful. We need to see how to write the R script.
@xiaofeili7379 7 months ago
This is a great tutorial. I have a question: what if I want to analyze mouse data and GSEA doesn't have a murine KEGG gene set?
@aidaht1 1 year ago
Your channel and videos are great! I liked your website as well, thanks so much for your help. I have a question. I have conducted differential expression analysis on TCGA-PRAD and a microarray dataset (GPL570) to get differentially expressed genes between Normal and Cancer tissues. After that I drew a Venn diagram to get the common DEGs between these two datasets. However, my common DEGs are just gene symbols; I don't have logFC or p.value for them (I have these for each of the datasets, but not after drawing the Venn diagram). How can I do PEA with clusterProfiler for my common DEGs obtained from the Venn diagram? Thanks in advance.
@biostatsquid 1 year ago
Hi! Thanks so much for your feedback, I'm glad you found them useful! I think the best option is to perform PEA independently for each of the two datasets (careful, remember to subset the background genes to the genes present in each dataset separately). Then maybe you can use a similar approach and see which pathways overlap. Otherwise, you might consider doing GSEA (video coming up soon!) on your selected gene list, ranking them by a consensus metric - e.g., some kind of average (but careful if you are considering log2FC, as the sign is also important). This paper on concordant integrative gene set enrichment analysis might help: pubmed.ncbi.nlm.nih.gov/24564564/ Hope this helped!:)
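A minimal sketch of the "see which pathways overlap" step suggested above, assuming each dataset has already been run through PEA separately and yields a results table with `ID` and `p.adjust` columns. The two tables, the pathway names and the cutoff are made up for illustration only.

```r
# Hypothetical PEA results for the two datasets (all values are made up).
res_tcga <- data.frame(
  ID = c("PATHWAY_A", "PATHWAY_B", "PATHWAY_C"),
  p.adjust = c(0.001, 0.03, 0.20),
  stringsAsFactors = FALSE
)
res_array <- data.frame(
  ID = c("PATHWAY_B", "PATHWAY_C", "PATHWAY_D"),
  p.adjust = c(0.01, 0.04, 0.002),
  stringsAsFactors = FALSE
)

padj_cutoff <- 0.05  # assumed significance threshold

# Significant pathways in each dataset, then the ones found in both.
sig_tcga <- res_tcga$ID[res_tcga$p.adjust < padj_cutoff]
sig_array <- res_array$ID[res_array$p.adjust < padj_cutoff]
shared_pathways <- intersect(sig_tcga, sig_array)

print(shared_pathways)
```

The overlap is taken at the pathway level rather than the gene level, so each dataset keeps its own statistics and its own background.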
@warmtaylor 10 months ago
Should the code in this chunk: # Subset to those pathways that have p adj < cutoff and gene count > cutoff (you can also do this in the enricher function) target_pws genecount_cutoff]) # select only target pathways have p adjusted < 0.05 and at least 6 genes res_df genecount_cutoff) as there are some cases when one of the two directions (up or down) of pathways with the same name does not pass the padj_cutoff, so directly filtering the values themselves would be more accurate?
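The code in the comment above was cut off, so the following is only a sketch of one reading of the question: filtering by pathway ID versus filtering each row on its own values. The data frame, the `diffexpressed` column and the cutoffs are hypothetical; `ID`, `p.adjust` and `Count` are assumed to be the result columns.

```r
# Hypothetical results: PATHWAY_A appears once per direction (values made up).
res_df <- data.frame(
  ID = c("PATHWAY_A", "PATHWAY_A", "PATHWAY_B"),
  diffexpressed = c("UP", "DOWN", "UP"),
  p.adjust = c(0.01, 0.30, 0.02),
  Count = c(10, 8, 3),
  stringsAsFactors = FALSE
)
padj_cutoff <- 0.05
genecount_cutoff <- 5

# Rows that pass both cutoffs on their own values.
passes <- res_df$p.adjust < padj_cutoff & res_df$Count > genecount_cutoff

# Option 1: collect passing pathway IDs, then keep every row with those IDs.
# The DOWN row of PATHWAY_A is kept even though its own p.adjust is 0.30.
target_pws <- unique(res_df$ID[passes])
by_id <- res_df[res_df$ID %in% target_pws, ]

# Option 2: keep only the rows that pass the cutoffs themselves.
by_row <- res_df[passes, ]

print(by_id)
print(by_row)
```

Which option is preferable depends on the plot: filtering by ID keeps both directions of a pathway side by side, while filtering by row shows only the significant direction.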
@pulcinella96 1 year ago
Laura, thank you so much for all of your amazing content. It's helped me so much during my MSc course. Just wanted you to know that all of your hard work is much appreciated!
@biostatsquid 1 year ago
Thank you so much for your comment, this means a lot! Glad it helped:)
@mihacerne7313 1 year ago
SQUUUUUUUUUUUUIDTAAAAAAASTICCCCCC
@KeshavSharma-lh7zf 4 months ago
Can I follow the same approach for proteomics data?
@shrivastava3892 3 months ago
The differential expression data that you loaded in the R script initially, which has approximately 30 thousand genes and four variables: is it pre-processed data, e.g. with duplicates removed and adjusted p-values and logFC? Or is it the raw tT data saved from an R script?
@praveenkhatri4084 1 year ago
Very informative. I was wondering, if I want to run GSEA for a plant, e.g. soybean, how do I do that, as the ORG.db library is not available for it? Can you please help me with that?
@biostatsquid 1 year ago
Hi Praveen, thank you for your comment! Actually, I have no experience working with non-model organisms, but I think perhaps another tool might be of more use? I saw a few people recommend the agriGO enrichment tool for plant species - www.biostars.org/p/112022/ www.biostars.org/p/261449/ but if you want to stick with clusterProfiler, you can always create a custom gene set, as long as you keep the format clusterProfiler needs:) Good luck!
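A minimal sketch of the custom gene set idea mentioned above, assuming the two-column `TERM2GENE` format (gene set name, then gene ID) that `clusterProfiler::enricher()` accepts. All set names and gene IDs here are made up, and the relaxed cutoffs are only there so that the toy example returns something.

```r
# Hypothetical custom gene sets: column 1 = gene set name, column 2 = gene ID.
term2gene <- data.frame(
  term = c("SET_1", "SET_1", "SET_1", "SET_2", "SET_2"),
  gene = c("GeneA", "GeneB", "GeneC", "GeneC", "GeneD"),
  stringsAsFactors = FALSE
)

genes_of_interest <- c("GeneA", "GeneB")     # e.g. your DEGs (made up)
background_genes <- unique(term2gene$gene)   # assumed background/universe

# Only run the enrichment if clusterProfiler is installed.
if (requireNamespace("clusterProfiler", quietly = TRUE)) {
  res <- clusterProfiler::enricher(
    gene = genes_of_interest,
    universe = background_genes,
    TERM2GENE = term2gene,
    minGSSize = 1,       # relaxed for this tiny toy example
    pvalueCutoff = 1,
    qvalueCutoff = 1
  )
  print(as.data.frame(res))
}
```

The gene IDs in the custom table must use the same identifier type as the input gene list; otherwise no genes will be matched.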
@Myri912 4 months ago
Hello! Thank you very much for the video, it has helped me a lot. However, I have a query, as I have run the whole script on my computer with my own SDR data. Everything seems to be correct except when I run the last step "target_pws
@Myri912 4 months ago
I have another query. I tried another dataset and I get this result directly when running clusterProfiler: --> No gene can be mapped.... --> Expected input gene ID: HSD11B2,PTPN11,ABCG1,GALE,WASL,PLA2G12A --> return NULL... --> No gene can be mapped.... --> Expected input gene ID: APBB1,BID,GALT,NDUFA1,ABCB4,RUNX1 --> return NULL... It's like my genes don't match... how can that happen? Thanks in advance!
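One possible cause of a "No gene can be mapped" message is that the input IDs are not in the same format as the IDs in the gene sets. The following is a diagnostic sketch only: the input genes are hypothetical, and the gene set IDs are copied from the message above.

```r
# Hypothetical input genes (made up, written in a different letter case).
my_genes <- c("Hsd11b2", "Ptpn11", "Abcg1")

# IDs the gene sets expect, taken from the message above.
geneset_genes <- c("HSD11B2", "PTPN11", "ABCG1", "GALE", "WASL", "PLA2G12A")

# How many input genes match the gene set IDs exactly?
n_exact <- sum(my_genes %in% geneset_genes)

# Diagnostic only: how many would match if letter case were ignored?
n_case_insensitive <- sum(toupper(my_genes) %in% toupper(geneset_genes))

cat("Exact matches:", n_exact, "\n")
cat("Case-insensitive matches:", n_case_insensitive, "\n")
```

If the exact overlap is zero, it is worth checking the identifier type (symbols, Ensembl IDs, Entrez IDs) and the species of both the input and the gene sets. Upper-casing is only a quick check here, not a substitute for a proper ID or ortholog conversion.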
@MinuMathews-dc3oy 6 months ago
When i put in df
@biostatsquid 6 months ago
That's probably because your file is in a different folder, or not there at all. Make sure to download the file, put it in a folder and then set in_path to the full path of that folder. You can check if the file is there with list.files(in_path), for example. Hope this helps!
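A small sketch of the check described above, using `list.files()` and `file.exists()`. The folder and file name are stand-ins: a temporary folder and a made-up CSV are created here only so that the example runs on its own.

```r
# Stand-in for your own data folder; replace with the full path to your folder.
in_path <- tempdir()
file_name <- "example_results.csv"  # hypothetical file name

# Create a tiny made-up file so this sketch is self-contained.
write.csv(
  data.frame(gene_symbol = "GeneA", log2fc = 1.2),
  file.path(in_path, file_name),
  row.names = FALSE
)

# List what is actually in the folder.
print(list.files(in_path))

# Build the full path and read the file only if it exists.
full_path <- file.path(in_path, file_name)
if (file.exists(full_path)) {
  df <- read.csv(full_path)
} else {
  stop("File not found: ", full_path)
}
print(df)
```

Using `file.path()` to join the folder and file name avoids missing or doubled slashes in the path.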
@shawsheryl5092 1 year ago
Aaaaaaaaawesome!!!!! I've finished watching all your videos about pathway analysis and they really help a lot!! I'm really grateful for your excellent explanation!!!! But I wonder if I could apply GSEA to proteomic analysis? I've got the expression matrix of the proteins, but I don't know if I can match the protein IDs with the gene set... could you please give me some suggestions? I'd appreciate it a lot!!
@biostatsquid 1 year ago
Thanks for your comment! Glad you liked the videos:) Unfortunately, I have never applied GSEA to proteomics (which I believe is called PSEA;) so I cannot give you a sure answer, but here are some suggestions to try out:
- Follow the same steps as for GSEA, but before running GSEA, convert gene symbols to protein IDs. There are a few tools to do this within R, or you could also use the UniProt Retrieve/ID Mapping tool. I think this should work if the IDs match and you use gene sets based on protein-coding genes.
- You might want to check out this publication presenting PSEA-Quant: www.ncbi.nlm.nih.gov/pmc/articles/PMC5352860/ It allows you to perform PSEA (it's a web-based tool as far as I know), but most importantly, if you check the methods you might figure out how to download protein sets from the tool itself.
Hope you find a solution! Let me know! Good luck!
@shawsheryl5092 1 year ago
@biostatsquid Thanks for your suggestions! I'm sorry to reply so late; I wasn't confident in my results. First I checked the PSEA-Quant article, but I failed to open the URL they provided.🥲 Then I tried to find protein datasets that directly match UniProt IDs so that I would lose as little information as possible. But when I tried to run the analysis with UniProt IDs in clusterProfiler, it showed an error. I even tried to make my own gmt file (using UniProt IDs directly) to use in GSEA, but it failed too. And I'm not professional enough to build my own package... (keep learning💪) Finally, I chose to convert the UniProt IDs into Entrez IDs, and got my results. But I doubt the reliability of this method, because some proteins come from the same gene, and some of them go up while others go down, which may cancel each other out. Fortunately, in my protein set there are only 2 proteins from the same gene and I removed them, so to some degree the result still has some value as a reference.
@joeyoviedo5202 1 year ago
Wow, so glad I found your channel, very high quality content. I would love to see more workflows using other clusterProfiler functions. Also, it would be cool to have workflow options for generating data visualizations that are good for comparing exposure groups and exposure windows using overlapping significant DEGs. Thank you! Have a squidtastic🦑day!
@biostatsquid 1 year ago
Thank you so much for your comment! Glad you like the videos. Great suggestions - will definitely add them to my list;) Quick question - what do you mean by 'exposure groups' and 'exposure windows'?
@joeyoviedo5202 1 year ago
@biostatsquid Hi, so I just mean, for example, when there are, let's say, 3 exposure windows (i.e. 24H, 3 days, 7 days) and 3 exposure groups (i.e. different concentrations of treatment, or possibly different tissue/cell types, etc.). Does that help explain what I mean? lol. And it's so nice to chat with you! Cheers!
@biostatsquid 1 year ago
@joeyoviedo5202 Oh I see, so like ways to visualise comparisons of DEGs at different time points and possibly groups? That's a really good idea, will definitely add that to my todo list;) Thanks for the suggestion!