Volcano plots with ggplot2 for differential gene expression | Beginner-friendly R

15,288 views

Biostatsquid

1 day ago

In this video I explain how to create and customise your own volcano plot in R with ggplot2. I give you a step-by-step explanation and the code you need to visualise your DEG (differentially expressed gene) results.
Data & step-by-step code explanation: biostatsquid.c...
Bacher T cell data: rdrr.io/github...
Hope you like it!
--------------------------------------------------------------------------------------------------------------------
Watched it already?
If you liked this video or found it useful, please let me know! Your comments and feedback are very much appreciated😊 Like, subscribe and share with someone who might find it useful!
If you have questions, don't hesitate to leave me a comment down below, I will answer as soon as I can:)
--------------------------------------------------------------------------------------------------------------------
For more biostatistics tools and resources, you can visit: biostatsquid.com/
for more
• simple and clear explanations of biostatistics methods
• computational biology tools
• easy step-by-step tutorials in R and Python
to analyse and visualise your biological data!
Or follow me on Instagram at @biostatsquid
Don’t forget to subscribe if you don’t want to miss another video from me!
--------------------------------------------------------------------------------------------------------------------
More volcano plots resources:
In this video, I explain how to create your own volcano plot in R:
• Volcano plots with ggp...
This is also a really cool tool made with R Shiny that lets you customise your own volcano plot online:
huygens.scienc...
And here is a nice, short explanation on how to interpret volcano plots:
www.htgmolecul...

Comments: 32
@GlykeriaSpyrou 10 months ago
One remark! I think for the dashed y-intercept line you should set `yintercept = c(-log10(0.05))`, so that the line is drawn at p = 0.05 on the -log10 scale of the y axis. Thanks a lot for the tutorial! I just made my own volcano plot :)
@biostatsquid 10 months ago
Thanks so much for your comment, totally right!:)
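To make the threshold-line fix concrete, here is a minimal sketch of a volcano plot in ggplot2. The data are simulated, and the column names (`gene_symbol`, `log2fc`, `pvalue`, `diffexpressed`) and cut-offs (|log2FC| > 0.6, p < 0.05) are assumptions for illustration, not necessarily the exact values used in the video. Because the y axis shows -log10(p-value), the horizontal line has to be drawn at `-log10(0.05)` rather than at `0.05`.

```r
library(ggplot2)

# Simulated DEG results; the column names are assumptions, rename to match your data
set.seed(42)
df <- data.frame(
  gene_symbol = paste0("gene", 1:500),
  log2fc = rnorm(500, sd = 1.5),
  pvalue = runif(500)^3
)

# Classify genes with example cut-offs: |log2FC| > 0.6 and p < 0.05
df$diffexpressed <- "NO"
df$diffexpressed[df$log2fc > 0.6 & df$pvalue < 0.05] <- "UP"
df$diffexpressed[df$log2fc < -0.6 & df$pvalue < 0.05] <- "DOWN"

p <- ggplot(df, aes(x = log2fc, y = -log10(pvalue), colour = diffexpressed)) +
  geom_point(size = 1) +
  # Dashed vertical lines at the fold-change cut-offs
  geom_vline(xintercept = c(-0.6, 0.6), linetype = "dashed", colour = "grey40") +
  # The y axis is -log10(p), so the p = 0.05 line sits at -log10(0.05), about 1.3
  geom_hline(yintercept = -log10(0.05), linetype = "dashed", colour = "grey40") +
  scale_colour_manual(values = c(DOWN = "#00AFBB", NO = "grey", UP = "#bb0c00")) +
  labs(x = "log2 fold change", y = "-log10(p-value)", colour = "Expression")

print(p)
```

Drawing the line at `yintercept = 0.05` would place it almost on the x axis, since the axis is in -log10 units.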
@CynthiaFrancis-sv4rc 3 months ago
Absolutely amazing! Thank you for doing this! Great job
@hozifaelgadal623 4 months ago
Thank you very much, that was very informative and a joy to watch.
@sanjaisrao484 1 year ago
You are so cool, please keep uploading more, it's so helpful. Thank you very much
@mihacerne7313 1 year ago
Fantastic video, wish I had someone like you while I was still at university! You can't imagine how ugly my plots were :O
@josuesalinasochoa8826 1 year ago
You are my hero
@abc_ratio 2 days ago
Wow, it was such a nice video, just the one I was looking for
@TheMothaDuckingHamMan 7 months ago
:( I can't get the labels on my top ten proteins. I keep removing it and I've tried a couple of codes. Why would this be happening? I am using the command that sets the label if the gene is in the top 30, which should override the NA, but it just stays as NA.
@alima9353 7 months ago
Yes, same 😢 It doesn't work with geom_text_repel but does with geom_text, but I need the top 30 and the latter only includes 10.
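Neither commenter's code is shown, so the cause is a guess, but one common reason for a label column that stays `NA` is that the vector of top gene names is not a plain character vector. On a tibble, `df[idx, "gene_symbol"]` returns a one-column tibble, and `%in%` against it matches nothing; a factor column passed through `ifelse()` can also come out as integer codes. The sketch below uses simulated data, and the column names (`gene_symbol`, `log2fc`, `pvalue`) and the label column `delabel` are assumptions. It picks the 30 smallest p-values with `$`, which always returns a vector, and passes only the labelled rows to `geom_text_repel()` from the ggrepel package.

```r
library(ggplot2)
library(ggrepel)

# Simulated DEG results; column names are assumptions
set.seed(1)
df <- data.frame(
  gene_symbol = paste0("gene", 1:500),
  log2fc = rnorm(500, sd = 1.5),
  pvalue = runif(500)^3,
  stringsAsFactors = FALSE
)

# Names of the 30 genes with the smallest p-values.
# df$col always returns a plain vector (also for tibbles); df[idx, "col"]
# on a tibble returns a one-column tibble, and %in% then matches nothing.
top30 <- head(df$gene_symbol[order(df$pvalue)], 30)

# Label column: gene name for the top 30, NA for all other genes.
# as.character() stops a factor column from turning into integer codes.
df$delabel <- ifelse(df$gene_symbol %in% top30, as.character(df$gene_symbol), NA)

p <- ggplot(df, aes(x = log2fc, y = -log10(pvalue))) +
  geom_point(colour = "grey50", size = 1) +
  # Give ggrepel only the labelled rows, so no NA labels are dropped with a warning
  geom_text_repel(
    data = df[!is.na(df$delabel), ],
    aes(label = delabel),
    max.overlaps = Inf,
    size = 3
  )

print(p)
```

Subsetting the `data` argument also avoids the "Removed ... rows containing missing values" warning, because the NA rows never reach the text layer.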
@amritabhattacharjee4596 3 months ago
Hi. This is a nice video. I am new to data visualisation and I find it very complex as to how to memorise the code or understand how to use it with various datasets. Could you please share some tips on how you do that?
@biostatsquid 3 months ago
Hi, thanks so much for your comment! My recommendation is... don't memorise code! You'll end up remembering the most common functions and bits and pieces anyway if you use them a lot - but a lot of bioinformatics is just googling:) As for what to use in which case and with which data... honestly, it comes with practice. Seeing and reading what other people do with similar problems / datasets definitely helps, e.g., from publications, tools, github repos... if you encounter a problem, odds are someone already did too! And probably solved it:) Good luck, you'll see how it gets easier the more you do it! Just have fun with it:)
@morgantee583 8 months ago
Hello, thank you for this helpful video. I am making a plot following your instructions, but when I go to make the top30deg list, only the downregulated genes get added to my plot, none of the upregulated even though they have larger values. Do you have any advice? Here is my code #script to add column with the TOP90 differentially expressed genes names top30degs
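The code in this comment is cut off, so this is only a guess, but a common reason for getting only down-regulated genes is ranking by `log2fc` in ascending order, which puts the most negative fold changes first. The sketch below uses simulated data with assumed column names and shows two alternatives: ranking by absolute fold change, or taking a fixed number of the most significant genes from each direction.

```r
# Simulated DEG results; column names are assumptions
set.seed(7)
df <- data.frame(
  gene_symbol = paste0("gene", 1:500),
  log2fc = rnorm(500, sd = 1.5),
  pvalue = runif(500)^3,
  stringsAsFactors = FALSE
)

# Pitfall: ascending order puts the most negative log2FC first,
# so the "top" genes are all down-regulated
head(df$log2fc[order(df$log2fc)], 5)

# Option 1: rank by absolute fold change (both directions count equally)
top_by_abs <- head(df$gene_symbol[order(-abs(df$log2fc))], 30)

# Option 2: a fixed number from each side, most significant first
up   <- df[df$log2fc > 0, ]
down <- df[df$log2fc < 0, ]
top_up   <- head(up$gene_symbol[order(up$pvalue)], 15)
top_down <- head(down$gene_symbol[order(down$pvalue)], 15)
top30 <- c(top_up, top_down)

# Label column for geom_text_repel(aes(label = delabel))
df$delabel <- ifelse(df$gene_symbol %in% top30, df$gene_symbol, NA)
```

Option 2 guarantees both sides of the plot get labels even when one direction has much stronger fold changes.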
@abelinoskosmos1679 7 months ago
I'm running into a problem with adding labels to the volcano plot, and I've scoured the internet (i.e. Stack Overflow) for my error message but it's nowhere to be found. I use `geom_text_repel(max.overlaps = Inf)` to add my gene name labels, but I get this error alongside a blank plot: `Error in replace_null(unclass(data), label = "a", angle = 0) : could not find function "replace_null"` In addition: `Warning message: Removed 11240 rows containing missing values (geom_text_repel()).` Funny thing is, when I use geom_text instead it works, but the labels obviously overlap because there are so many. Ideally I would like geom_text_repel to work; it would make things so much easier. I thought about using geom_text and shifting each of my 30 labels individually, but of course that's not ideal. Do you have any idea? Thanks
@philipfleischhauer5011 1 year ago
Hello, thanks for this nice video, it was very helpful. I have one question: my p-values are in scientific E notation (e.g. 5.44100706155786E-92), so changing the column label is not possible for me. It is especially a problem when I try to change it with "RNAseq_ggplot_2$pvalue < 1]
@biostatsquid 1 year ago
Hi Philip, thanks for your feedback! About scientific notation. At the beginning of my scripts, I always set the options(scipen = 999) which disables scientific notation. See if it works for you. Otherwise I think format() might also work for you, you have more info here: stackoverflow.com/questions/9397664/force-r-not-to-use-exponential-notation-e-g-e10 Hope it solves your problem!
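Scientific notation only changes how R prints a number, not the stored value, so filters such as `pvalue < 0.05` work the same either way. If the column was imported as text, `as.numeric()` converts strings like "5.44100706155786E-92" correctly. A small sketch; apart from the value quoted in Philip's comment, the example values are made up.

```r
# A p-value column read in as text (the first value is from the comment above)
raw <- c("5.44100706155786E-92", "0.03", "0.2")

# as.numeric() understands E notation, so this becomes an ordinary number
pvalue <- as.numeric(raw)

# Comparisons use the stored value, regardless of how it is printed
pvalue < 0.05        # TRUE TRUE FALSE
-log10(pvalue)       # first value is about 91.26

# Printing options: discourage scientific notation for the whole session...
options(scipen = 999)
# ...or format a single number without it
format(1e-5, scientific = FALSE)   # "0.00001"
```

`options(scipen = 999)` stays in effect for the rest of the session, while `format()` only affects the one value it is given.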
@malteg.9220 1 year ago
I tried to do my volcano plot watching 4 different guides before, and since I am shit with R, it didn't work until I watched this one. I do have another question though: I need to label only the upregulated and downregulated genes, instead of the top 30. Sadly I can't manage to do that. Does anyone have a line of code that would help me with my problem?
@biostatsquid 1 year ago
Hi Malte, glad you found it useful! That's a really good point you make, sometimes we want to show ALL significant genes. There are several ways of doing this. One of the easiest I can think of is using these two lines of code: 1) Create a list of your significant genes (e.g., called 'significant_genes'). You can do this by taking the column 'gene_symbol' of your dataframe, but only for those rows (genes) that have diffexpressed 'UP' or 'DOWN' (check 8:27 in the video if you want to know how to create this column). This works: significant_genes
@biostatsquid 1 year ago
Btw! Forgot to say, check how big your list is first (so how many significant genes you have) - if there are a lot you might want to reduce the list because R might go a bit 'crazy' trying to display so many labels...
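A sketch of the two steps described in the reply, assuming a table with `gene_symbol`, `log2fc` and `pvalue` columns plus a `diffexpressed` column holding "UP", "DOWN" or "NO". The data and thresholds are simulated for illustration, and this is not necessarily the exact code from the video.

```r
# Simulated DEG results; column names and cut-offs are assumptions
set.seed(3)
df <- data.frame(
  gene_symbol = paste0("gene", 1:200),
  log2fc = rnorm(200, sd = 1.5),
  pvalue = runif(200)^3,
  stringsAsFactors = FALSE
)
df$diffexpressed <- "NO"
df$diffexpressed[df$log2fc > 0.6 & df$pvalue < 0.05] <- "UP"
df$diffexpressed[df$log2fc < -0.6 & df$pvalue < 0.05] <- "DOWN"

# 1) Names of all significant genes (UP or DOWN)
significant_genes <- df$gene_symbol[df$diffexpressed %in% c("UP", "DOWN")]

# Check how many labels you are about to draw before plotting
length(significant_genes)

# 2) Label column: gene name if significant, NA otherwise
df$delabel <- ifelse(df$gene_symbol %in% significant_genes, df$gene_symbol, NA)
```

The `delabel` column can then be passed to `geom_text_repel(aes(label = delabel))`; if `length(significant_genes)` is in the hundreds, the plot will likely be unreadable.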
@sanjaisrao484 1 year ago
Ma'am, could you please upload a video on heatmaps for DGE? Thanks
@biostatsquid 1 year ago
Hi Sanjai, thank you for your comment, it is definitely high up on my list, hope to work on it soon:)
@joeyoviedo5202 1 year ago
Awesome tutorial for volcano plots in R using ggplot2 and useful customizable functions to help make it publication-ready. Thank you!
@henverbrunetta6387 11 months ago
fantastic video, super easy to follow the instructions. thank you very much
@npavlovi1 9 months ago
I can't tell you how helpful this was! Thank you so so much!
@francescacatellani1021 8 months ago
Really helpful, thank you!
@rafidahmohdariff 1 year ago
Thank you for this video. I would like to ask: do we need replicate data in order to run a volcano analysis? If we only have one data set, does it mean that we cannot run the volcano analysis? Thank you so much.
@biostatsquid 1 year ago
Great question! I'm assuming you're using a volcano plot to visualize the results of statistical tests comparing two different conditions (e.g., treatment vs. control) for a large number of variables (e.g., genes, proteins). If you have a single data point for each condition (e.g., one sample for treatment and one sample for control, so 1 replicate), you can still calculate a fold change, but you have no way to estimate the variability within each condition, so you won't be able to perform traditional hypothesis testing or get meaningful p-values. So you can plot it using a volcano plot if you'd like to, but it's what you plot that's the problem - the p-values/log2FC won't be really robust, and the interpretation and usefulness of the plot will be limited. Hopefully this helped!
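To illustrate the point about replicates with base R: a fold change can be computed from a single sample per condition, but a test statistic needs an estimate of within-group variance. A plain two-sample `t.test()` is used here only as a simple stand-in; RNA-seq tools such as DESeq2, edgeR or limma use their own statistical models, and the expression values below are hypothetical.

```r
# One sample per condition (no replicates); hypothetical expression values
control <- 30
treated <- 60
log2(treated / control)   # a fold change can still be computed: 1

# ...but a two-sample t-test cannot estimate the variance from n = 1
msg <- tryCatch(t.test(control, treated)$p.value,
                error = function(e) conditionMessage(e))
print(msg)   # R refuses: not enough 'x' observations

# With three replicates per condition the test runs and returns a p-value
ctrl_reps    <- c(28, 30, 33)
treated_reps <- c(55, 61, 64)
t.test(ctrl_reps, treated_reps)$p.value
```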
@Aoffyfeefy 1 year ago
Thank you so much. ❤❤❤
@mohamedirfan5480 8 months ago
Hi Dr Laura. What is the least log FC threshold one can use?
@biostatsquid 8 months ago
Hi! Thanks for your question, it's a really good one. I guess... it depends on you and your data! Remember that fold change is the ratio of the change, so if your gene expression doubles (30 > 60) between conditions, FC = 2 -> so log2FC = 1. So log2FC = 1 is already quite a big change. The log2FC of a 50% increase (e.g., 30 > 45) is about 0.58. If you have a lot of DEGs, you might want to go with a higher threshold to filter out stuff. If you have very few, you might want to lower it - but at some point it stops being biologically meaningful. E.g., even if significant, does a fold change of about 1.2 (e.g., 30 > 35, log2FC ≈ 0.22) tell you much? (Maybe it does.) If your variable is thought to be constant between conditions, maybe it is an interesting finding to report, but if you are looking for a marker gene, for example, you might want to go with a different one with a higher logFC! Hope this answered your question! (Btw I'm not a doctor! You can just call me Laura:)
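As a quick arithmetic check, the log2 fold changes for the example expression changes in this reply (30 to 60, 30 to 45, 30 to 35) can be computed directly. The helper name `compute_log2fc` is hypothetical, introduced only for this sketch.

```r
# log2 fold change: log2(treated / control); compute_log2fc is a hypothetical helper
compute_log2fc <- function(control, treated) log2(treated / control)

# The example changes from the reply: 30 -> 60, 30 -> 45, 30 -> 35
changes <- data.frame(control = c(30, 30, 30), treated = c(60, 45, 35))
changes$fold_change <- changes$treated / changes$control
changes$log2fc <- compute_log2fc(changes$control, changes$treated)
print(round(changes, 2))
# log2fc column: 1.00, 0.58, 0.22

# The fold change that a given |log2FC| cut-off corresponds to is 2^cutoff
2^c(1, 0.6)   # 2.00 and about 1.52
```

This is why a |log2FC| cut-off of around 0.6 roughly corresponds to a 1.5-fold change, and a cut-off of 1 to a doubling (or halving).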
@huiminlu8436 6 months ago
I love the instructions, thank you so much.