Gene Set Enrichment Analysis (GSEA)

Gene Set Enrichment Analysis (GSEA) - simply explained!

Views: 29,987

Biostatsquid

1 day ago

Comments: 64

@mocabeentrill 1 year ago

Hi Biostatsquid. This is the most straightforward explanation of GSEA I've heard. Thank you for your hard work.

@Bee-zp5vo 1 year ago

Hi ma'am, could you make a video on the new-generation "topology-based methods" for pathway enrichment analysis that you mentioned in this video at 7:26?

@sanjaisrao484 1 year ago

Thanks, ma'am. Please upload a video on GSEA analysis in R.

@mihacerne7313 1 year ago

Squidtastic!

@imanmevaa9679 6 days ago

Thank you for your work! Your video helped me better understand a paper I am presenting to my lab. Clear and complete explanations 👏

@lianahayrapetyan4191 1 month ago

Thanks for the great explanation. What if the genes of a set are enriched at both ends of the ranked list, so the set is still significant and not randomly distributed? That is something found in the STRING biological database. How would these terms be interpreted biologically? Would some genes from our gene set contribute to the upregulation of the term, and some to its downregulation?

@arfaarashid 1 year ago

Hi Biostatsquid, thanks for the video! I had a question about the number of genes that these analyses are performed on. In a workshop I did on functional analysis, my input contained around 20,000 genes. Is this normal for GSEA? Or should the input size be around 20 or 100? Thanks again

@biostatsquid 1 year ago

Hi Arfaa! Great question. 20,000 genes sounds more than fine for GSEA. Actually GSEA makes more sense with many input genes, more than just 20 (in that case it wouldn't take that long to research what each gene does)

@genuinity 1 year ago

Thanks so much for both videos, such a clear and concise explanation. Please continue making videos.🙃

@Brain837 1 month ago

Very well explained and easy to follow! I really enjoyed the video

@apedike 1 year ago

So glad I discovered this channel! Looking forward to all these videos.

@biostatsquid 1 year ago

Thank you! Glad you enjoyed it:)

@pygmypuffdraws2753 7 months ago

That was super helpful, thank you so much!

@funnyarian 6 months ago

Squidtastic!! How accurate is it to say that the top of the ranked list holds the most upregulated genes and the bottom the most downregulated ones (as you said in the video and image)? I would change it to: at the top we have the most significant upregulated genes, and at the bottom the most significant downregulated genes. After all, the most significant gene (by pval/padj) is not necessarily the most upregulated/downregulated one, is it?

@biostatsquid 6 months ago

Hi! Great point. If you rank them by sign(log2FC) * -log10(p-val) it's exactly what you said: you'd be ranking them from most significant & upregulated > less significant upregulated > less significant downregulated > most significant downregulated. Does this make sense? And yes, exactly, the gene with the highest sign(log2FC) * -log10(p-val) may not be the most upregulated, but rather the most significant upregulated one:)

@pabloaguirreazorin8324 5 months ago

Hi Biostatsquid. What do you use to get the ranked list: p-value or adjusted p-value? If it is p-value, why?

@biostatsquid 5 months ago

Hi! Thanks for your comment:) I normally use -log10(p-adj) * sign(log2FC), maybe this will help: www.biostars.org/p/375584/ www.biostars.org/p/298312/
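For readers who want to see the arithmetic behind -log10(p-adj) * sign(log2FC), here is a minimal Python sketch. It is only an illustration, not the author's script: the gene names, values, and the `floor` used to avoid log10(0) are all assumptions made for the example.

```python
import math

def rank_score(log2fc, padj, floor=1e-300):
    """Signed significance score: -log10(padj) * sign(log2FC)."""
    # Clamp padj away from zero so log10 never receives 0 (floor is an assumed value).
    p = max(padj, floor)
    sign = (log2fc > 0) - (log2fc < 0)  # +1, 0 or -1
    return -math.log10(p) * sign

# Hypothetical differential expression results: gene -> (log2FC, adjusted p-value)
results = {
    "GENE_A": (2.5, 1e-10),   # significant, upregulated
    "GENE_B": (0.3, 0.5),     # not significant, slightly up
    "GENE_C": (-1.8, 1e-6),   # significant, downregulated
    "GENE_D": (-4.0, 0.9),    # large change, but not significant
}

scores = {gene: rank_score(lfc, padj) for gene, (lfc, padj) in results.items()}

# Highest score first: significant upregulated genes at the top,
# significant downregulated genes at the bottom.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Note that GENE_D has the largest fold change but lands in the middle of the list, because its adjusted p-value is high.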

@MeWatchingYouTubeVideos 1 year ago

Thank you so much! Perfect for beginners to quickly grasp it!

@simrangambhir782 1 year ago

thank you very much for the explanation😃

@mariabirkisdottir1035 1 month ago

Great explanation! Thanks a lot

@nanditasatish2297 1 year ago

love this channel

@juliangrandvallet5359 1 year ago

AMAZING!!!

@jehadyasin04 1 year ago

Truly amazing videos!

@gabrielecarciofi7886 28 days ago

Thank you so much for this explanation. I was lost trying to understand where the data came from, now I get it. Thank youuuu ❤❤❤ you're a brilliant angel

@NAVYAB-eb2jp 4 months ago

Thank you for explaining it so well. Could you please provide information on the inputs needed to perform ssGSEA?

@Scientific_Updates 1 year ago

Dear Biostatsquid, thanks for the video, your explanation is really nice. A few online platforms for performing GSEA, e.g. the Broad Institute's GSEA, require an organism database, and they do not include databases for bacterial genomes. I have RNA-seq data that I need to run GSEA on, but I am unable to because no database is available in the required input format. Please suggest an alternative. Thanks in advance.

@biostatsquid 1 year ago

Hi! Thanks for your feedback. I have not really worked with prokaryotes, but FUNAGE-Pro could be a possible solution - 'comprehensive web server for gene set enrichment analysis of prokaryotes' pubmed.ncbi.nlm.nih.gov/35641095/ funagepro.molgenrug.nl/ Hope it works!

@Scientific_Updates 1 year ago

@biostatsquid Thanks for your response. I have run the analysis through FUNAGE-Pro, but its functional enrichment analysis didn't work in my case. I am trying clusterProfiler and goseq, but both need an org database, which I don't have.

@goodoo6745 1 year ago

I love the way you explain the whole concept in simple terms. Could you elaborate on how to rank the gene list using the FC and p-values from the differential expression analysis? I am trying to make the .rnk file to import into GSEA.

@biostatsquid 1 year ago

Thanks for your comment! That is a great question, I think many people will have the same issue. I am working on a GSEA tutorial which will show you exactly how to do it, but consider this a preview of the full script!:P I work with the package fgsea bioconductor.org/packages/release/bioc/html/fgsea.html You can read the documentation for more detailed instructions and examples, but for example, if you want to use the sign of log2FC multiplied by the -log10(pval) as ranking to order your gene list, you can do something like: rankings
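The snippet in the reply above is cut off, so the original code is not shown here. As a rough, unofficial illustration of the .rnk question, the Python sketch below writes precomputed ranking scores as two tab-separated columns (gene, score), which is my understanding of the layout the GSEA desktop application expects for ranked lists. The gene names, scores, and four-decimal formatting are assumptions, and this is not the author's fgsea script.

```python
import csv
import io

def write_rnk(scores, handle):
    """Write gene/score pairs as two tab-separated columns, highest score first."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for gene, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        writer.writerow([gene, f"{score:.4f}"])

# Hypothetical precomputed scores, e.g. sign(log2FC) * -log10(pval)
scores = {"GENE_A": 10.0, "GENE_C": -6.0, "GENE_B": 0.301}

# An in-memory buffer stands in for a real file here; with a file you would use
# something like open("my_ranking.rnk", "w", newline="") (hypothetical file name).
buffer = io.StringIO()
write_rnk(scores, buffer)
rnk_text = buffer.getvalue()
print(rnk_text)
```

Sorting before writing is a convenience for inspecting the file by eye.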

@davidguardamino 1 year ago

Hi! Great video. I have seen that it is very popular to use the fold change to rank the genes... so here, when using FC * -log(p-value), is it a convention? (Sorry if my question is very odd, I am new to this)

@biostatsquid 1 year ago

Hi David. Not at all, that is a great question! It depends on what you have. If you rank all genes, you also include genes with a very high p-value (for example, gene X with p-val = 0.8). Gene X may have an amazing fold change, meaning there is a big difference between the two groups you are comparing, but with a p-val of 0.8 that big change is just not significant. Using sign(log2FC) * -log10(p-val) is a way of taking this into account. -log10(p-val) transforms the p-values (which range from 0 to 1) onto a more manageable scale (basically, instead of a p-val of 0.00000000000000001 you have a -log10(p-val) of 17). sign(log2FC) then makes that number positive (if upregulated, log2FC > 0) or negative (if downregulated, log2FC < 0). This way, your genes are ranked from downregulated, significant genes -> downregulated, less significant genes -> non-significant genes -> upregulated, less significant genes -> upregulated, significant genes. Of course, you can also pre-filter your genes to only include significant ones (e.g., using pval < 0.01 or 0.05), and then just sort them by FC without worrying about the significance. Does this make sense? Hopefully this helped. Thanks for the question!
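The pre-filtering alternative mentioned at the end of the reply above can be sketched in a few lines of Python. This is a hedged illustration only: the gene names, values, and the 0.05 cutoff are made up for the example.

```python
# Hypothetical results: (gene, log2FC, p-value)
results = [
    ("GENE_X", 6.0, 0.8),     # large change, but not significant
    ("GENE_A", 2.5, 1e-10),
    ("GENE_C", -1.8, 1e-6),
    ("GENE_E", 0.9, 0.03),
]

ALPHA = 0.05  # assumed significance cutoff

# Step 1: keep only genes that pass the significance cutoff.
significant = [row for row in results if row[2] < ALPHA]

# Step 2: sort the survivors by log2 fold change, most upregulated first.
by_fold_change = sorted(significant, key=lambda row: row[1], reverse=True)
ranked_genes = [gene for gene, _, _ in by_fold_change]
print(ranked_genes)
```

GENE_X is dropped despite having the largest fold change, which is the same effect the signed -log10(p-val) score achieves by pushing it toward the middle of the list.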

@rishikeshlotke 10 months ago

@biostatsquid Hi, I tried to work with the formula you present at 3:49 for the gene ALDOB from your table. From my calculation based on your formula, the rank for ALDOB comes to -27.1066. In your orange ranked table at 3:49, I see the ranking is done by just using -log10(pval) but in the next slide at 3:51 ALDOB has a positive ranked value of 11.3. Could you explain what I am doing wrong or missing here? Also, does it make any sense to use adjusted p-values (FDR) instead of regular p-values for such a ranking calculation? Why or why not? Thanks for your clarification in advance.

@cintiapalu1929 4 months ago

Amazing, I will definitely recommend it to my colleagues. Thanks for such nice work

@jennyhu5011 11 months ago

What does the list of background genes do?

@biostatsquid 11 months ago

Hi Jenny, thanks so much for your question - I don't think I mentioned it in this video, so sorry for the confusion! In GSEA, we just need a list of all the genes we're interested in, and a list of gene sets. The background genes are used to filter out the genes that were not measured in our experiment from the gene sets, to avoid bias. E.g., if you download cancer hallmark gene sets, some pathways may contain genes that were not measured in your experiment for whatever reason (e.g., if you have liver samples, brain-related genes may be very downregulated or not expressed). So we must remove all those genes from the gene set list we use for our analysis. Hopefully this made sense! You can read more about it in my PEA blogpost/I think I also explain it in the PEA video:)
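The filtering step described in the reply above could look roughly like this in Python. It is a sketch under assumptions: the gene set names, gene names, and the `min_size` parameter are hypothetical and not taken from the video.

```python
def restrict_to_background(gene_sets, background, min_size=1):
    """Keep only genes that were measured; drop sets left with too few genes."""
    background = set(background)
    filtered = {}
    for name, genes in gene_sets.items():
        kept = sorted(set(genes) & background)
        if len(kept) >= min_size:
            filtered[name] = kept
    return filtered

# Hypothetical gene sets, e.g. downloaded from a pathway collection
gene_sets = {
    "EXAMPLE_PATHWAY_1": ["GENE_A", "GENE_B", "GENE_Z"],
    "EXAMPLE_PATHWAY_2": ["GENE_Y", "GENE_Z"],
}

# Background: every gene that was actually measured in the experiment
background = ["GENE_A", "GENE_B", "GENE_C"]

filtered = restrict_to_background(gene_sets, background)
print(filtered)
```

Here EXAMPLE_PATHWAY_2 disappears entirely because none of its genes were measured, and EXAMPLE_PATHWAY_1 loses GENE_Z.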

@amrsalaheldinabdallahhammo663 1 year ago

Simply genius :) ... Keep on making videos and entertain us

@svitlanatretyak4438 8 months ago

Thanks for the info! Really helpful 🙌🏻 In my experiment multiple conditions were tested and I used multiple comparison tests, so I do not have FC values. Can I simply use the F-statistics (or p-val/p-val adj) for my list of genes to perform GSEA? Did you ever have this problem? Thanks in advance!

@ZullyPulido 4 months ago

You're the best!! Greetings from Colombia :)

@karoljacek8858 1 year ago

Great material! Do you know of any topology-based methods that work on single-cell datasets (or pseudo-bulk single-cell data)?

@SashaAnronikov 1 year ago

such an awesome video. informative and clear to follow. thank you so much

@danielgladish2502 4 months ago

Great video! Really helpful for getting an understanding of the analysis workflow! A small critique / suggestion for improvement that I think could be made is in terminology being used, specifically referring to genes in the ranked list as being overrepresented. As you said in the video, one is not filtering any genes, so when looking at your gene set in GSEA, you aren't looking at the proportion of the genes being part of your list, but rather where are the genes located in the unfiltered ranked list containing all the genes.

@biostatsquid 4 months ago

Totally agree! Thanks for your comment:)

@enraegen561 1 year ago

I thoroughly enjoy the illustrations. Thank you! :D

@anmolpardeshi3138 1 year ago

Hi. Thanks for the amazing depiction! I was wondering if you could clarify the "permutation" step used in GSEA or FCS analysis. Thanks.

@biostatsquid 1 year ago

Hi Anmol, thank you so much for your comment. As for your question about the gene permutation step, I think the best explanation is the one given by Anthony Castanza in this discussion: The gene_set permutation mode, which we acknowledge is inferior to the phenotype permutation mode, tests gene sets on the basis of how likely it is that a random gene set of a given size was to be enriched within the given dataset. The results from this distribution of random enrichment scores calculated as a result of sampling random gene sets that would be the same size as the set of interest, are then compared to the true enrichment score of the identically sized real set to determine if the observed enrichment is more extreme than would be expected if the true set, like the random sets, had no functional connection to a given process. In this permutation mode, GSEA constructs a "null" distribution of sets that are random and therefore are assumed to have no coordinated biological function, therefore the null hypothesis would be that the given real set has no coordinated biological function within the data, an enrichment more extreme than that observed in the null distribution (sets that we "know" are random and have no coordinated biological function) would allow us to reject the null hypothesis and say that the set does have a coordinated function at [pValue] level of probability. groups.google.com/g/gsea-help/c/dveYVGQGMS0/m/l5l2sli6CwAJ? Hope this helped!
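To make the gene-set permutation idea quoted above more concrete, here is a simplified Python sketch. It is not GSEA's implementation: the statistic is a stand-in (the mean ranking score of the set rather than GSEA's enrichment score), the comparison is two-sided by absolute value, and the gene names, scores, permutation count, and seed are all assumptions.

```python
import random

def set_statistic(scores, gene_set):
    """Stand-in enrichment statistic: mean ranking score of the set's genes."""
    values = [scores[g] for g in gene_set]
    return sum(values) / len(values)

def gene_set_permutation_pvalue(scores, gene_set, n_perm=1000, seed=0):
    """Compare the real set against random gene sets of the same size."""
    rng = random.Random(seed)
    genes = list(scores)
    observed = set_statistic(scores, gene_set)
    hits = 0
    for _ in range(n_perm):
        # A random set is assumed to have no coordinated biological function.
        random_set = rng.sample(genes, len(gene_set))
        if abs(set_statistic(scores, random_set)) >= abs(observed):
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical ranking scores running from 10 down to -10
scores = {f"GENE_{i}": 10.0 - i for i in range(21)}

# A set made of the four top-ranked genes should look very non-random
top_set = ["GENE_0", "GENE_1", "GENE_2", "GENE_3"]
observed, p_value = gene_set_permutation_pvalue(scores, top_set)
print(observed, p_value)
```

The p-value is the fraction of random same-size sets that look at least as extreme as the real one, which is the null distribution described in the quote.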

@jayashreelaxmekuppuswami8600 1 year ago

How does the KS test answer the question of whether the ranked list is random or not? Isn't that a test of normality of a distribution? How can it inform us about the randomness or non-randomness of a ranked list? Please explain.

@biostatsquid 1 year ago

Hi Jayashree, thank you for your question, I will elaborate a bit more than in the video. The KS test checks whether two samples follow the same distribution. It has many uses, for example, as you mention, to test for normality. In this case, however, we use it to check whether the genes from a certain pathway are randomly distributed across the ranked list or not. So for example, we check the distribution of genes related to 'ATP synthesis' in our ranked list (sorting genes from most to least upregulated). If most of the genes involved in ATP synthesis are upregulated in one condition, they will be located at the top of the list, so their distribution across our ranked list is clearly not random. Therefore, we conclude that ATP synthesis is a differential pathway between our two conditions. The KS-like statistic, together with the permutation step, sorts out the statistics for us, giving us p-values to help us decide when a pathway is statistically significant for our comparison. Hope this was a bit clearer!
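The "random or not" idea in the reply above can be illustrated with an unweighted KS-like running sum in Python. This is a simplified sketch, not GSEA itself: as far as I know GSEA weights each step by the ranking metric and gets its p-values from permutations, both of which are left out here. The gene names and sets are hypothetical.

```python
def running_sum_es(ranked_genes, gene_set):
    """Unweighted KS-like enrichment score over a ranked gene list."""
    members = set(gene_set) & set(ranked_genes)
    n_total = len(ranked_genes)
    n_hits = len(members)
    if n_hits == 0 or n_hits == n_total:
        raise ValueError("gene set must overlap the ranked list without covering all of it")
    step_up = 1.0 / n_hits                  # step taken when a gene is in the set
    step_down = 1.0 / (n_total - n_hits)    # step taken when it is not
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        running += step_up if gene in members else -step_down
        # The score is the largest deviation of the running sum from zero.
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical ranked list, most upregulated first
ranked = ["G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8"]

es_top = running_sum_es(ranked, ["G1", "G2"])      # set clustered at the top
es_spread = running_sum_es(ranked, ["G2", "G7"])   # set spread across the list
print(es_top, es_spread)
```

A set clustered at the top pushes the running sum far from zero, while a set scattered through the list keeps it close to zero, which is what "follows a random distribution" means in this context.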