Gene Set Enrichment Analysis (GSEA) - simply explained!

29,987 views

Biostatsquid

Comments: 64
@mocabeentrill 1 year ago
Hi Biostatsquid. This is the most straightforward explanation of GSEA I've heard. Thank you for your hard work.
@Bee-zp5vo 1 year ago
Hi ma'am, could you make a video on the new-generation "topology-based methods" for pathway enrichment analysis that you mentioned in this video at 7:26?
@sanjaisrao484 1 year ago
Thanks, ma'am! Please upload a GSEA analysis in R.
@mihacerne7313 1 year ago
Squidtastic!
@imanmevaa9679 6 days ago
Thank you for your work! Your video helped me better understand a paper I am presenting to my lab. Clear and complete explanations 👏
@lianahayrapetyan4191 1 month ago
Thanks for the great explanation. What if the genes are enriched at both ends of the ranked list, so the distribution is not random and the term is still significant? That is something found in the STRING biological database. How would these terms be interpreted biologically? Do some genes from our gene set contribute to the upregulation of the term, and some to the downregulation?
@arfaarashid 1 year ago
Hi Biostatsquid, thanks for the video! I had a question about the number of genes that these analyses are performed on. In a workshop I did on functional analysis, my input contained around 20,000 genes. Is this normal for GSEA? Or should the input size be around 20 or 100? Thanks again
@biostatsquid 1 year ago
Hi Arfaa! Great question. 20,000 genes sounds more than fine for GSEA. Actually, GSEA makes more sense with many input genes, more than just 20 (in that case it wouldn't take that long to research what each gene does).
@genuinity 1 year ago
Thanks so much for both videos, such clear and concise explanations, please continue making videos.🙃
@Brain837 1 month ago
Very well explained and easy to follow! I really enjoyed the video
@apedike 1 year ago
So glad I discovered this channel! Looking forward to all these videos.
@biostatsquid 1 year ago
Thank you! Glad you enjoyed it:)
@pygmypuffdraws2753 7 months ago
That was super helpful, thank you so much!
@funnyarian 6 months ago
Squidtastic!! How accurate is it to say that the ranked list has the most upregulated genes at the top and the most downregulated at the bottom (as you said in the video and image)? I would change it to: at the top we have the most significant upregulated genes, and at the bottom the most significant downregulated genes. The most significant gene (by pval/padj) is not necessarily the most upregulated/downregulated one, is it?
@biostatsquid 6 months ago
Hi! Great point. If you rank them by sign(log2FC) * -log10(p-val), it's exactly what you said: you'd be ranking them from most significant & upregulated > less significant upregulated > less significant downregulated > most significant downregulated. Does this make sense? And yes, exactly, the gene with the highest sign(log2FC) * -log10(p-val) may not be the most upregulated, but rather the most significant:)
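The ordering described in this reply can be checked on a toy example. What follows is a minimal Python sketch, assuming four made-up genes; the gene names, fold changes, p-values, and the helper `signed_score` are hypothetical and not taken from the video.

```python
import math

# Hypothetical differential-expression results: gene -> (log2FC, p-value).
# All names and numbers are made up for illustration.
results = {
    "GENE_A": (4.0, 0.04),   # strongly upregulated, barely significant
    "GENE_B": (1.2, 1e-12),  # mildly upregulated, very significant
    "GENE_C": (-0.8, 1e-9),  # mildly downregulated, very significant
    "GENE_D": (-3.5, 0.30),  # strongly downregulated, not significant
}


def signed_score(log2fc, pval):
    """Ranking metric from the reply: sign(log2FC) * -log10(p-value)."""
    sign = (log2fc > 0) - (log2fc < 0)  # +1, 0 or -1
    return sign * -math.log10(pval)


scores = {gene: signed_score(fc, p) for gene, (fc, p) in results.items()}

# Highest score first: significant upregulated genes at the top,
# significant downregulated genes at the bottom.
ranked = sorted(scores, key=scores.get, reverse=True)

# The gene with the largest fold change, ignoring significance.
most_upregulated = max(results, key=lambda gene: results[gene][0])

print(ranked)            # ['GENE_B', 'GENE_A', 'GENE_D', 'GENE_C']
print(most_upregulated)  # GENE_A
```

GENE_A has the largest fold change but ranks second, because the metric rewards significance rather than effect size.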
@pabloaguirreazorin8324 5 months ago
Hi Biostatsquid. What do you use to get the ranked list: p-value or adjusted p-value? If it is the p-value, why?
@biostatsquid 5 months ago
Hi! Thanks for your comment:) I normally use -log10(p-adj) * sign(log2FC), maybe this will help: www.biostars.org/p/375584/ www.biostars.org/p/298312/
@MeWatchingYouTubeVideos 1 year ago
Thank you so much! Perfect for beginners to quickly grasp it!
@simrangambhir782 1 year ago
Thank you very much for the explanation😃
@mariabirkisdottir1035 1 month ago
Great explanation! Thanks a lot
@nanditasatish2297 1 year ago
Love this channel
@juliangrandvallet5359 1 year ago
AMAZING!!!
@jehadyasin04 1 year ago
Truly amazing videos!
@gabrielecarciofi7886 28 days ago
Thank you so much for this explanation. I was lost trying to understand where the data came from, now I get it. Thank youuuu ❤❤❤ you're a brilliant angel
@NAVYAB-eb2jp 4 months ago
Thank you for explaining it well. Can you please provide information on the inputs needed to perform ssGSEA?
@Scientific_Updates 1 year ago
Dear Biostatsquid, thanks for the video, your explanation is really nice. I have a question: some online platforms for performing GSEA, e.g. the Broad Institute's GSEA, require an organism database, and they do not contain one for bacterial genomes. I have RNA-seq data on which I need to perform GSEA, but I am unable to because no database is available in the required input format. Please suggest a solution. Thanks in advance
@biostatsquid 1 year ago
Hi! Thanks for your feedback. I have not really worked with prokaryotes, but FUNAGE-Pro could be a possible solution - 'comprehensive web server for gene set enrichment analysis of prokaryotes' pubmed.ncbi.nlm.nih.gov/35641095/ funagepro.molgenrug.nl/ Hope it works!
@Scientific_Updates 1 year ago
@biostatsquid Thanks for your response. I have performed the analysis through FUNAGE-Pro, but its functional enrichment analysis didn't work in my case. I am trying clusterProfiler and goseq, but they all need an org database, which I don't have.
@goodoo6745 1 year ago
I love the way you explain the whole concept in simple terms. Could you elaborate more on how to rank the gene list from the FC and p-values of the differential expression? I am trying to make the .rnk file to be imported into GSEA.
@biostatsquid 1 year ago
Thanks for your comment! That is a great question, I think many people will have the same issue. I am working on a GSEA tutorial which will show you exactly how to do it, but consider this an advance on the full script!:P I work with the package fgsea: bioconductor.org/packages/release/bioc/html/fgsea.html You can read the documentation for more detailed instructions and examples, but for example, if you want to use the sign of log2FC multiplied by the -log10(pval) as the ranking to order your gene list, you can do something like: rankings
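The reply above is cut off before its code. As a rough illustration only, and not the author's fgsea script, here is a minimal Python sketch that turns differential-expression results into the two-column, tab-separated gene/score layout used by .rnk files. The table values and the helper name `to_rnk_lines` are hypothetical.

```python
import io
import math

# Hypothetical differential-expression rows: (gene, log2FC, p-value).
de_table = [
    ("GENE_A", 2.1, 1e-8),
    ("GENE_B", -1.7, 1e-5),
    ("GENE_C", 0.3, 0.5),
]


def to_rnk_lines(rows):
    """Score each gene as sign(log2FC) * -log10(pval), then sort descending."""
    scored = []
    for gene, log2fc, pval in rows:
        sign = (log2fc > 0) - (log2fc < 0)
        scored.append((gene, sign * -math.log10(pval)))
    scored.sort(key=lambda item: item[1], reverse=True)
    # One "gene<TAB>score" line per gene.
    return [f"{gene}\t{score:.4f}" for gene, score in scored]


rnk_lines = to_rnk_lines(de_table)

# Written to an in-memory buffer here; swapping in
# open("ranked_genes.rnk", "w") would write an actual file.
buffer = io.StringIO()
buffer.write("\n".join(rnk_lines) + "\n")
print(buffer.getvalue())
```

A p-value of exactly 0 would make `math.log10` fail, so such values need to be replaced with a small positive number before scoring.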
@davidguardamino 1 year ago
Hi! Great video. I have seen that it is very popular to use the fold change to rank the genes... so here, when using FC * -log(p-value), is it a convention? (Sorry if my question is very odd, I am new to this)
@biostatsquid 1 year ago
Hi David. Not at all, that is a great question! So it depends on what you have. If you rank all genes by fold change, you also include genes with a very high p-value (for example, gene X with p-val = 0.8). So yeah, perhaps your gene X has an amazing fold change, meaning there is a big difference between the two groups you are comparing, but with a p-val of 0.8, that big change is just not significant. So using sign(log2FC) * -log10(pval) is a way of taking this into account. -log10(p-val) will transform those p-values (going from 0 to 1) into a more manageable scale (basically instead of pval 0.00000000000000001 you have a -log10(pval) of 17). The sign(log2FC) just makes that manageable number positive (if upregulated, so log2FC > 0) or negative (if downregulated, so log2FC < 0). This way, your genes will be ranked from downregulated, SIGNIFICANT genes -> downregulated, less significant genes -> non-significant genes -> upregulated, less significant genes -> upregulated, SIGNIFICANT genes. Of course, you can also pre-filter your genes to only include significant ones (e.g., using pval < 0.01 or 0.05), and then just sort them by FC without worrying about the significance. Does this make sense? Hopefully this helped. Thanks for the question!
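The reply makes two points: -log10 turns tiny p-values into readable numbers, and pre-filtering by significance followed by sorting on fold change is an alternative. A small Python sketch shows both. The three genes and the `ALPHA` cutoff name are assumptions made for illustration.

```python
import math

# Hypothetical results: (gene, log2FC, p-value). Values are made up.
genes = [
    ("GENE_X", 5.0, 0.8),     # huge fold change, not significant
    ("GENE_Y", 1.5, 1e-17),   # modest fold change, extremely significant
    ("GENE_Z", -2.0, 0.003),  # downregulated and significant
]

# -log10 puts tiny p-values on a readable scale: 1e-17 becomes about 17.
neg_log10_p = {gene: -math.log10(pval) for gene, _, pval in genes}

# Alternative from the reply: keep only significant genes,
# then sort by fold change alone.
ALPHA = 0.05  # assumed cutoff; the reply mentions 0.01 or 0.05
significant = [row for row in genes if row[2] < ALPHA]
by_fold_change = [
    gene for gene, _, _ in sorted(significant, key=lambda row: row[1], reverse=True)
]

print(neg_log10_p)
print(by_fold_change)
```

GENE_X drops out of the filtered list despite its large fold change, which is the situation the reply describes for a gene with p-val = 0.8.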
@rishikeshlotke 10 months ago
@biostatsquid Hi, I tried to work with the formula you present at 3:49 for the gene ALDOB from your table. From my calculation based on your formula, the rank for ALDOB comes to -27.1066. In your orange ranked table at 3:49, I see the ranking is done by just using -log10(pval), but in the next slide at 3:51 ALDOB has a positive ranked value of 11.3. Could you explain what I am doing wrong or missing here? Also, does it make any sense to use adjusted p-values (FDR) instead of regular p-values for such a ranking calculation? Why or why not? Thanks in advance for your clarification.
@cintiapalu1929 4 months ago
Amazing, I will definitely recommend it to my colleagues - thanks for such nice work
@jennyhu5011 11 months ago
What does the list of background genes do?
@biostatsquid 11 months ago
Hi Jenny, thanks so much for your question - I don't think I mentioned it in this video, so sorry for the confusion! In GSEA, we just need a list of all the genes we're interested in, and a list of gene sets. The background genes are used to filter out of the gene sets any genes that were not measured in our experiment, to avoid bias. E.g., if you download cancer hallmark gene sets, some pathways may contain genes that were not measured in your experiment for whatever reason (e.g., if you have liver samples, brain-related genes may be very downregulated or not expressed). So we must remove all those genes from the gene set list we use for our analysis. Hopefully this made sense! You can read more about it in my PEA blog post, and I think I also explain it in the PEA video:)
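The filtering step described in this reply amounts to intersecting each gene set with the genes that were actually measured. A minimal Python sketch follows. The pathway names, the unmeasured gene names, and the `min_size` parameter are assumptions added for illustration.

```python
# Genes measured in the (hypothetical) experiment: the background.
measured_genes = {"ALDOB", "GENE_A", "GENE_B", "GENE_C"}

# Hypothetical gene sets as they might come from a downloaded collection.
gene_sets = {
    "PATHWAY_1": {"ALDOB", "GENE_A", "GENE_NOT_MEASURED"},
    "PATHWAY_2": {"GENE_NOT_MEASURED", "OTHER_UNMEASURED"},
}


def restrict_to_background(sets, background, min_size=1):
    """Drop unmeasured genes from every set.

    Sets left with fewer than `min_size` genes are removed entirely
    (`min_size` is an assumption of this sketch).
    """
    restricted = {}
    for name, members in sets.items():
        kept = members & background  # set intersection
        if len(kept) >= min_size:
            restricted[name] = kept
    return restricted


filtered_sets = restrict_to_background(gene_sets, measured_genes)
print(filtered_sets)
```

PATHWAY_2 disappears because none of its genes were measured, while PATHWAY_1 keeps only ALDOB and GENE_A.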
@amrsalaheldinabdallahhammo663 1 year ago
Simply genius :) ... Keep on making videos and entertaining us
@svitlanatretyak4438 8 months ago
Thanks for the info! Really helpful 🙌🏻 In my experiment multiple conditions were tested and I used multiple comparison tests. Thus, I have no FC value. Can I simply use the F-statistics (or p-val/adjusted p-val) for my list of genes to perform GSEA? Have you ever had this problem? Thanks in advance!
@eloisadalsin2300 4 months ago
up
@ZullyPulido 4 months ago
You're the best!! Greetings from Colombia :)
@karoljacek8858 1 year ago
Great material! Do you know of any topology-based methods that work on single-cell datasets (or pseudo-bulk single-cell data)?
@SashaAnronikov 1 year ago
Such an awesome video. Informative and clear to follow. Thank you so much
@danielgladish2502 4 months ago
Great video! Really helpful for getting an understanding of the analysis workflow! A small suggestion for improvement concerns the terminology, specifically referring to genes in the ranked list as being overrepresented. As you said in the video, one is not filtering any genes, so when looking at your gene set in GSEA, you aren't looking at the proportion of genes that are part of your list, but rather at where the genes are located in the unfiltered ranked list containing all the genes.
@biostatsquid 4 months ago
Totally agree! Thanks for your comment:)
@enraegen561 1 year ago
I thoroughly enjoy the illustrations. Thank you! :D
@anmolpardeshi3138 1 year ago
Hi, thanks for the amazing depiction! I was wondering if you could clarify the "permutation" step used in GSEA or FCS analysis. Thanks.
@biostatsquid 1 year ago
Hi Anmol, thank you so much for your comment. As for your question on the gene permutation step, I think the best explanation is the one given by Anthony Castanza in this discussion: "The gene_set permutation mode, which we acknowledge is inferior to the phenotype permutation mode, tests gene sets on the basis of how likely it is that a random gene set of a given size was to be enriched within the given dataset. The results from this distribution of random enrichment scores calculated as a result of sampling random gene sets that would be the same size as the set of interest, are then compared to the true enrichment score of the identically sized real set to determine if the observed enrichment is more extreme than would be expected if the true set, like the random sets, had no functional connection to a given process. In this permutation mode, GSEA constructs a "null" distribution of sets that are random and therefore are assumed to have no coordinated biological function, therefore the null hypothesis would be that the given real set has no coordinated biological function within the data, an enrichment more extreme than that observed in the null distribution (sets that we "know" are random and have no coordinated biological function) would allow us to reject the null hypothesis and say that the set does have a coordinated function at [pValue] level of probability." groups.google.com/g/gsea-help/c/dveYVGQGMS0/m/l5l2sli6CwAJ? Hope this helped!
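The gene-set permutation idea in the quoted explanation can be sketched in a few lines of Python: score the real set, score many random sets of the same size, and count how often a random set looks at least as extreme. This is a simplified illustration under stated assumptions, not the GSEA implementation. It uses an unweighted running-sum score and a two-sided comparison, whereas GSEA itself also weights hits by the ranking metric and normalizes the scores. The function names and toy gene list are hypothetical.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running sum: step up on set members, down on others.

    Assumes the set contains at least one, but not all, of the ranked genes.
    """
    n_hits = sum(gene in gene_set for gene in ranked_genes)
    n_miss = len(ranked_genes) - n_hits
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        running += 1.0 / n_hits if gene in gene_set else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def gene_set_permutation_pvalue(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Compare the observed score with scores of random same-size gene sets."""
    rng = random.Random(seed)
    members = [gene for gene in ranked_genes if gene in gene_set]
    observed = enrichment_score(ranked_genes, set(members))
    extreme = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(ranked_genes, len(members)))
        if abs(enrichment_score(ranked_genes, random_set)) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)


# Toy ranked list of 100 genes; the "pathway" is the 10 top-ranked genes.
ranked_genes = [f"GENE_{i}" for i in range(100)]
top_set = set(ranked_genes[:10])
observed, pval = gene_set_permutation_pvalue(ranked_genes, top_set)
print(observed, pval)
```

Because the toy pathway sits entirely at the top of the list, random sets of ten genes essentially never match its score, so the estimated p-value is very small.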
@jayashreelaxmekuppuswami8600 1 year ago
How does the KS test answer the question of whether the ranked list is random or not? Isn't that a test of normality of a distribution? How can it inform us about the randomness or non-randomness of a ranked list? Please explain
@biostatsquid 1 year ago
Hi Jayashree, thank you for your question, I will elaborate a bit more than in the video. The KS test checks whether a sample follows a reference distribution, or whether two samples follow the same distribution. It has many uses, for example, as you mention, to test for normality. In this case, however, we use it to check whether the genes from a certain pathway are spread randomly across the ranked list or not. So for example, we check the distribution of genes related to 'ATP synthesis' in our ranked list (sorting genes from most to least upregulated). If most of the genes involved in ATP synthesis are upregulated in one condition, they will be located at the top of the list, so their distribution across our ranked list is clearly not random. Therefore, we conclude that ATP synthesis is a differential pathway between our two conditions. The KS test will sort out the statistics for us, giving us p-values to help us decide when a pathway is statistically significant for our comparison. Hope this was a bit clearer!
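To make the "random or not" idea concrete, here is a hedged Python sketch of a KS-like running-sum statistic. It walks down the ranked list, steps up at pathway genes and down at all others, and reports the largest deviation from zero. The gene names, the two toy pathways, and the function names are hypothetical. With `weight=0` every hit counts equally (the classic KS-like form), and a positive `weight` scales hits by the ranking score, as the weighted GSEA statistic does.

```python
def running_sum(ranked, gene_set, weight=0.0):
    """Running-sum profile over `ranked`, a list of (gene, score) sorted by score.

    Assumes the set contains at least one, but not all, of the ranked genes.
    """
    total_hit = sum(abs(score) ** weight for gene, score in ranked if gene in gene_set)
    n_miss = sum(1 for gene, _ in ranked if gene not in gene_set)
    profile, current = [], 0.0
    for gene, score in ranked:
        if gene in gene_set:
            current += abs(score) ** weight / total_hit  # step up at a pathway gene
        else:
            current -= 1.0 / n_miss  # step down otherwise
        profile.append(current)
    return profile


def max_deviation(profile):
    """Largest deviation from zero, keeping its sign."""
    return max(profile, key=abs)


# Toy ranked list: 20 genes with scores from 10 down to -9.
ranked = [(f"GENE_{i}", 10.0 - i) for i in range(20)]

# Hypothetical pathways: one clustered at the top, one spread evenly.
atp_synthesis = {"GENE_0", "GENE_1", "GENE_2", "GENE_3"}
scattered = {"GENE_2", "GENE_7", "GENE_12", "GENE_17"}

es_clustered = max_deviation(running_sum(ranked, atp_synthesis))
es_scattered = max_deviation(running_sum(ranked, scattered))
print(es_clustered, es_scattered)
```

The clustered pathway pushes the running sum far from zero, while the evenly spread one keeps it close to zero. A permutation or KS-type test then judges whether such a deviation is larger than chance would produce.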
@swifttaylor3107 11 months ago
YOU UNDERSTOOD ME, THANK YOU
@duupu8417 1 year ago
So helpful. Thanks a lot.
@jfromtheusa 1 year ago
Wow, this is such a clear description!
@jgk9111 11 months ago
The best video
@mercedesdebernardi4215 3 months ago
Your videos are helping me so much!!! Keep it up!!