Volcano plots with ggplot2 for differential gene expression | Beginner-friendly R

17,207 views

Biostatsquid

1 day ago

Comments: 34
@GlykeriaSpyrou · 1 year ago
One remark! I think for the horizontal dashed line you should set `yintercept = c(-log10(0.05))` so that the line appears at the right height. Thanks a lot for the tutorial! I just made my own volcano plot :)
@biostatsquid · 1 year ago
Thanks so much for your comment, totally right!:)
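A minimal sketch of the threshold lines discussed above, assuming a data frame with hypothetical columns `log2fc` and `pvalue` and illustrative cut-offs of |log2FC| > 0.6 and p < 0.05 (neither cut-off comes from the thread). Because the y-axis shows -log10(p), the horizontal line has to sit at `-log10(0.05)` (about 1.3), not at 0.05.

```r
library(ggplot2)

# Hypothetical example data: 200 genes with simulated log2 fold changes and p-values
set.seed(1)
df <- data.frame(
  log2fc = rnorm(200, sd = 1.5),
  pvalue = runif(200)^3
)

# Illustrative cut-offs; adjust to your own analysis
fc_cutoff <- 0.6
p_cutoff  <- 0.05

p <- ggplot(df, aes(x = log2fc, y = -log10(pvalue))) +
  geom_point(alpha = 0.5) +
  # Vertical dashed lines at -/+ the fold-change cut-off
  geom_vline(xintercept = c(-fc_cutoff, fc_cutoff), linetype = "dashed") +
  # The y-axis is on the -log10 scale, so the p-value cut-off is transformed too
  geom_hline(yintercept = -log10(p_cutoff), linetype = "dashed")

p  # display the plot
```

Writing the line at `yintercept = 0.05` would put it almost on the x-axis, since it would correspond to p = 10^-0.05 ≈ 0.89.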
@npavlovi1 · 1 year ago
I can't tell you how helpful this was! Thank you so so much!
@joeyoviedo5202 · 1 year ago
Awesome tutorial for volcano plots in R using ggplot2 and useful customizable functions to help make it publication-ready. Thank you!
@mihacerne7313 · 2 years ago
Fantastic video, wish I had someone like you while I was still at university! You can't imagine how ugly my plots were :O
@CynthiaFrancis-sv4rc · 7 months ago
Absolutely amazing! Thank you for doing this! Great job
@henverbrunetta6387 · 1 year ago
Fantastic video, the instructions were super easy to follow. Thank you very much!
@hozifaelgadal623 · 7 months ago
Thank you very much, that was very informative and a joy to watch.
@abc_ratio · 3 months ago
Wow, it was such a nice video, just the one I was looking for.
@sanjaisrao484 · 1 year ago
You are so cool, please keep uploading more, it's so helpful. Thank you very much!
@francescacatellani1021 · 1 year ago
Really helpful, thank you!
@amritabhattacharjee4596 · 6 months ago
Hi. This is a nice video. I am new to data visualisation and I find it hard to memorise the code, or to understand how to use it with different datasets. Could you please share some tips on how you do that?
@biostatsquid · 6 months ago
Hi, thanks so much for your comment! My recommendation is... don't memorise code! You'll end up remembering the most common functions and bits and pieces anyway if you use them a lot - but a lot of bioinformatics is just googling:) As for what to use in which case and with which data... honestly, it comes with practice. Seeing and reading what other people do with similar problems / datasets definitely helps, e.g., from publications, tools, github repos... if you encounter a problem, odds are someone already did too! And probably solved it:) Good luck, you'll see how it gets easier the more you do it! Just have fun with it:)
@josuesalinasochoa8826 · 1 year ago
You are my hero
@abelinoskosmos1679 · 10 months ago
I'm running into a problem with adding labels to the volcano plot, and I've searched the internet (i.e., Stack Overflow) for my error message, but it's nowhere to be found. I use this to add my gene name labels: `geom_text_repel(max.overlaps = Inf)` But I get this error alongside a blank plot: `Error in replace_null(unclass(data), label = "a", angle = 0) : could not find function "replace_null"` In addition: `Warning message: Removed 11240 rows containing missing values (geom_text_repel()).` The funny thing is that when I use geom_text instead, it works, but the labels obviously overlap because there are so many. Ideally I would like geom_text_repel to work; it would make things so much easier. I thought about using geom_text and shifting each of my 30 labels individually, but of course that's not ideal. Do you have any idea? Thanks
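A sketch of the usual labelling pattern, assuming a hypothetical `delabel` column that holds the gene name for the genes to label and `NA` for all other rows (the example data are made up). `geom_text_repel()` drops rows whose label is `NA`, which is what the "Removed ... rows containing missing values" warning reports, so that warning on its own is expected; `na.rm = TRUE` silences it. The thread does not resolve the `replace_null` error; a "could not find function" error raised from inside a package often points to mismatched package versions, so updating ggplot2 and ggrepel together is a reasonable first thing to try (an assumption, not a confirmed fix).

```r
library(ggplot2)
library(ggrepel)

# Hypothetical example data: 500 genes
set.seed(2)
df <- data.frame(
  gene_symbol = paste0("gene", 1:500),
  log2fc      = rnorm(500, sd = 1.5),
  pvalue      = runif(500)^3
)

# Label only the 30 genes with the smallest p-values; every other row gets NA
top_genes  <- head(df$gene_symbol[order(df$pvalue)], 30)
df$delabel <- ifelse(df$gene_symbol %in% top_genes, df$gene_symbol, NA)

p <- ggplot(df, aes(x = log2fc, y = -log10(pvalue), label = delabel)) +
  geom_point(alpha = 0.5) +
  # Rows with NA labels are skipped; na.rm = TRUE suppresses the "Removed ... rows" warning
  geom_text_repel(max.overlaps = Inf, na.rm = TRUE)

p  # display the plot
```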
@aninditabhattacharjee9043 · 1 month ago
First and foremost, thank you for such a detailed video. I am having an unusual difficulty with my data. I have 2345 significantly differentially expressed genes, and when I plot all of the significant genes, I can see all of their names on the volcano plot. However, when I try to label just 30 or 100 genes, the code runs without errors but I can't see the labels. Could you give me some advice on how to solve this? Thank you in advance.
@morgantee583 · 1 year ago
Hello, thank you for this helpful video. I am making a plot following your instructions, but when I make the top30deg list, only the downregulated genes get added to my plot and none of the upregulated ones, even though they have larger values. Do you have any advice? Here is my code #script to add column with the TOP90 differentially expressed genes names top30degs
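The code in this comment is cut off, so the cause can't be confirmed. One common way to end up with only downregulated genes is ranking by the signed fold change: sorting `log2fc` in ascending order puts the most negative (downregulated) genes first. A sketch of three alternatives, assuming hypothetical columns `gene_symbol`, `log2fc` and `padj` and a made-up six-gene table:

```r
# Hypothetical example data
df <- data.frame(
  gene_symbol = c("A", "B", "C", "D", "E", "F"),
  log2fc      = c(-4, -3, -2, 5, 6, 1),
  padj        = c(1e-3, 1e-4, 0.2, 1e-6, 1e-5, 0.03)
)
n <- 2

# Pitfall: an ascending sort on the signed log2FC returns the most downregulated genes
top_signed <- head(df[order(df$log2fc), "gene_symbol"], n)          # "A" "B"

# Option 1: rank by adjusted p-value (independent of direction)
top_by_padj <- head(df[order(df$padj), "gene_symbol"], n)           # "D" "E"

# Option 2: rank by the magnitude of the change, so both directions compete
top_by_abs_fc <- head(df[order(-abs(df$log2fc)), "gene_symbol"], n) # "E" "D"

# Option 3: top n upregulated plus top n downregulated
top_up   <- head(df[order(-df$log2fc), "gene_symbol"], n)
top_down <- head(df[order(df$log2fc), "gene_symbol"], n)
top_both <- c(top_up, top_down)                                     # "E" "D" "A" "B"
```

Which option fits depends on what the labels are meant to highlight: option 1 matches "most significant", options 2 and 3 match "largest change".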
@sanjaisrao484 · 1 year ago
Ma'am, could you please upload a video on heatmaps for DGE? Thanks
@biostatsquid · 1 year ago
Hi Sanjai, thank you for your comment, it is definitely high up on my list, hope to work on it soon:)
@Aoffyfeefy · 1 year ago
Thank you so much. ❤❤❤
@rafidahmohdariff · 1 year ago
Thank you for this video. I would like to ask: do we need replicates in order to run a volcano analysis? If we only have one set of data, does that mean we cannot run it? Thank you so much.
@biostatsquid · 1 year ago
Great question! I'm assuming you're using a volcano plot to visualize the results of statistical tests comparing two different conditions (e.g., treatment vs. control) for a large number of variables (e.g., genes, proteins). If you have a single data point for each condition (e.g., one sample for treatment and one sample for control, so 1 replicate), you can still compute a fold change, but you can't estimate the variability within each condition, so you won't be able to calculate meaningful statistics or perform traditional hypothesis testing. So you can plot it using a volcano plot if you'd like to, but it's what you plot that's the problem: the p-values and log2FC won't be robust, and the interpretation and usefulness of the plot will be limited. Hopefully this helped!
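A small base-R illustration of the point about replicates, using made-up expression values for a single gene: with one sample per condition a fold change can still be computed, but there is no within-group variability to estimate, so a standard two-sample t-test refuses to run.

```r
# Made-up normalised expression for one gene, one sample per condition
control   <- 30
treatment <- 60

# A fold change is still computable...
log2fc <- log2(treatment / control)   # 1

# ...but the standard deviation of a single value is undefined
sd(control)                           # NA

# ...so a two-sample t-test (Welch, the default) stops with an error
res <- tryCatch(
  t.test(treatment, control),
  error = function(e) conditionMessage(e)
)
res                                   # "not enough 'x' observations"
```

Dedicated tools for count data use the same logic: without replicates, any p-value depends on an assumed rather than estimated dispersion.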
@niiinjaaa3241 · 1 month ago
Thank you!
@huiminlu8436 · 10 months ago
I love the instructions, thank you so much.
@malteg.9220 · 2 years ago
I tried to make my volcano plot following 4 different guides before, and since I'm terrible with R, it didn't work until I watched this one. I do have another question though: I need to label only the upregulated and downregulated genes, instead of the top 30. Sadly I can't manage to do that. Does anyone have a line of code that would help me with my problem?
@biostatsquid · 2 years ago
Hi Malte, glad you found it useful! That's a really good point, sometimes we want to show ALL significant genes. There are several ways of doing this. One of the easiest I can think of uses these two lines of code: 1) Create a list of your significant genes (e.g., called 'significant_genes'). You can do this by taking the column 'gene_symbol' of your dataframe, but only for those rows (genes) that have diffexpressed 'UP' or 'DOWN' (check 8:27 in the video if you want to know how to create this column). This works: significant_genes
@biostatsquid · 2 years ago
Btw! Forgot to say, check how big your list is first (so how many significant genes you have) - if there are a lot you might want to reduce the list because R might go a bit 'crazy' trying to display so many labels...
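The code in the reply above is cut off in this copy. A minimal sketch of the idea it describes, assuming the `gene_symbol` and `diffexpressed` columns named in the reply (the five-gene table and the `delabel` column name are hypothetical): collect the genes marked UP or DOWN, check how many there are, then build a label column that is `NA` for everything else. The label column is then mapped with `label = delabel` inside `aes()`.

```r
# Hypothetical example data with the columns described in the reply
df <- data.frame(
  gene_symbol   = c("TP53", "MYC", "GAPDH", "EGFR", "ACTB"),
  diffexpressed = c("UP", "DOWN", "NO", "UP", "NO")
)

# 1) Genes called significant in either direction
significant_genes <- df$gene_symbol[df$diffexpressed %in% c("UP", "DOWN")]

# Check the size first: thousands of labels will overcrowd the plot
length(significant_genes)   # 3

# 2) Label column: gene name for significant genes, NA for the rest
df$delabel <- ifelse(df$gene_symbol %in% significant_genes, df$gene_symbol, NA)
df$delabel                  # "TP53" "MYC" NA "EGFR" NA
```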
@mohamedirfan5480 · 11 months ago
Hi Dr Laura. What is the lowest logFC threshold one can use?
@biostatsquid · 11 months ago
Hi! Thanks for your question, it's a really good one. I guess... it depends on you and your data! Remember that fold change is the ratio of the change, so if your gene expression doubles between conditions (e.g., 30 to 60), FC = 2, so log2FC = 1. So logFC = 1 is already quite a big change. The logFC of a 50% increase (e.g., 30 to 45) is about 0.58. If you have a lot of DEGs, you might want to go with a higher threshold to filter them down. If you have very few, you might want to lower it, but at some point it stops being biologically meaningful. E.g., even if significant, does a fold change of 1.2 (e.g., 30 to 36, logFC ≈ 0.26) tell you much? (Maybe it does.) If your variable is thought to be constant between conditions, maybe it is an interesting finding to report, but if you are looking for a marker gene, for example, you might want to go with a different one with a higher logFC! Hope this answered your question! (Btw I'm not a doctor! You can just call me Laura:)
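A quick way to sanity-check a logFC threshold is to compute log2 of the expression ratio directly, and to go the other way with `2^x`. The values below use 30 as a made-up baseline expression level.

```r
baseline <- 30
after    <- c(60, 45, 36, 35)

fc     <- after / baseline   # 2, 1.5, 1.2, ~1.17
log2fc <- log2(fc)
round(log2fc, 2)             # 1.00 0.58 0.26 0.22

# The other direction: which fold change a given logFC threshold corresponds to
round(2^c(1, 0.58, 0.26), 2) # 2.00 1.49 1.20
```

Because the scale is logarithmic, a threshold of 0.58 already means "at least a 1.5-fold change", and halving the threshold does not halve the fold change.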
@TheMothaDuckingHamMan · 11 months ago
:( I can't get the labels onto my top ten proteins. I keep removing it and I've tried a couple of different code snippets. Why would this be happening? I am using the 'if in top 30' command, which should override the NA, but it just stays NA.
@alima9353 · 10 months ago
Yes, same 😢 It doesn't work with geom_text_repel but does with geom_text, but I need the top 30 and the latter only includes 10.
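The code behind these comments isn't shown, so this is a guess rather than a diagnosis. One frequent reason a label column stays entirely `NA` is that the "top genes" object is a data frame instead of a character vector of names: `%in%` then compares each gene name against whole columns and never finds a match. A sketch with hypothetical columns `gene_symbol` and `padj`:

```r
# Hypothetical example data
df <- data.frame(
  gene_symbol = c("A", "B", "C", "D"),
  padj        = c(0.04, 0.001, 0.5, 0.0001)
)

# Pitfall: head() of the re-ordered data frame is still a data frame,
# so %in% finds no match and every label ends up NA
top_df <- head(df[order(df$padj), ], 2)
df$gene_symbol %in% top_df                   # FALSE FALSE FALSE FALSE

# Fix: pull the gene names out as a character vector first
top_genes  <- head(df$gene_symbol[order(df$padj)], 2)   # "D" "B"
df$delabel <- ifelse(df$gene_symbol %in% top_genes, df$gene_symbol, NA)
df$delabel                                   # NA "B" NA "D"
```

Extracting with `df$gene_symbol[...]` also behaves the same on tibbles, where `df[, "gene_symbol"]` would return a one-column tibble instead of a vector.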
@philipfleischhauer5011 · 2 years ago
Hello, thanks for this nice video, it was very helpful. I have one question: my p-values are in scientific notation (e.g., 5.44100706155786E-92), so changing the column label isn't working for me. The problem shows up especially when I try to change it with "RNAseq_ggplot_2$pvalue < 1]
@biostatsquid · 2 years ago
Hi Philip, thanks for your feedback! About scientific notation: at the beginning of my scripts, I always set `options(scipen = 999)`, which disables scientific notation in printed output. See if it works for you. Otherwise I think format() might also work for you; there's more info here: stackoverflow.com/questions/9397664/force-r-not-to-use-exponential-notation-e-g-e10 Hope it solves your problem!
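A short base-R sketch of why scientific notation is usually only a display issue: a value such as 5.44100706155786E-92 is stored as an ordinary number, so comparisons work no matter how it prints. If the column was read in as text (an assumption about the cause here, not something the thread confirms), `as.numeric()` converts it; `options(scipen = 999)` and `format()` only change how numbers are printed.

```r
# A p-value written in scientific notation parses as an ordinary double
pval <- as.numeric("5.44100706155786E-92")
is.numeric(pval)       # TRUE
pval < 0.05            # TRUE

# If a column came in as character, convert it before filtering or plotting
pvals_chr <- c("5.44100706155786E-92", "0.03", "0.2")
pvals     <- as.numeric(pvals_chr)
pvals < 0.05           # TRUE TRUE FALSE

# Printing only: format() a number without scientific notation
format(0.00001, scientific = FALSE)   # "0.00001"
```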