Phylogenetic Analysis of ITS sequences in R

17,868 views

Russell Gray

4 years ago

A beginning-to-end tutorial of gathering ITS sequence data, reading it into R, aligning the data, and performing analyses/building phylogenetic trees.
Take my Ecology in R course for more valuable tutorials on ecology and phylogenetic analyses in R:
www.udemy.com/course/ecology-...
***Note: This is a simple exercise which should be interpreted as an example of using sequences in R, but is in no way a reflection of how to use sequences in any real-world analysis.
Link to code & data:
github.com/RussellGrayxd/Phyl...
To pre-emptively address errors:
- I said Tyrosine instead of Thymine for the T in the codon sequences, sorry about that. I was a bit tired when I made this video.
- I said I'm going to tick 10 sequences and only ticked 9... same excuse.
Thanks for understanding.

Comments: 68
@marksanda9786 4 years ago
Thank you Russ, this made my day.
@shayan9882 2 years ago
I can't believe how much easier this was in comparison to my attempts with the msa package, thank you!
@joydeepnag885 8 months ago
Thank you very much!!! If only people kept it short and sweet like this. Kudos... :)
@TomasDuqueAcosta 1 year ago
Thanks for the detailed explanation, I could create a tree really easily with your video.
@vasilikiskiada2332 3 years ago
Nicely explained! Thank you
@ruddhidavidwans292 2 years ago
Thank you so much for this detailed video. It helped me a lot with my analysis. It would be a great help if you could also show how to analyze publicly available RNAseq data from NCBI GEO.
@jimmychurchward890 2 years ago
Such a big help, thank you!!
@abubakarbashir7951 4 years ago
Nice job, keep it up.
@DG-xg8vg 4 years ago
Good job, thanks for sharing!
@andreacassarino1342 4 years ago
Nice job
@alexisjose7515 4 years ago
Excellent!
@archimedemulega7086 3 years ago
Thanks!
@siffo10 4 years ago
Really good stuff. Is there an attr()-like function that will allow one to pull the geographic location of each of the sequences? Is that included in the metadata?
@siffo10 3 years ago
@Joe Partington-Smith Geographic location.
@margauxk952 3 years ago
Great video! Do you have any recommendations of packages or code for MLST analysis in R?
@RJG_Ecology 3 years ago
Hey Margaux, yes, there are two packages that are used for MLST in R: 1) MLSTar bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2887-1 github.com/iferres/MLSTar and 2) STRAIN bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2887-1 I'm not very familiar with it, but there is also 3) mlstverse github.com/ymatsumoto/mlstverse and 4) StrainR github.com/jbisanz/StrainR This blog may be helpful too: www.r-bloggers.com/2017/01/descriptive-analysis-of-mlst-data-for-mrsa/
@drali87 1 year ago
How do we define the node values?
@SvaraMandira 11 days ago
Thanks for the detailed explanation. How do you run the bootstrap?
@RJG_Ecology 11 days ago
You can use the msa package and a few others for the Bayesian and ML analyses you would usually see in software like MEGA. Here's an RPubs post that goes through the basic process: rpubs.com/mvillalobos/L01_Phylogeny
@Hekateras 2 years ago
Very helpful guide. Question: Why use neighbor-joining instead of something like Maximum Likelihood to build your tree?
@RJG_Ecology 2 years ago
I was just using the defaults for the tutorial. The packages have multiple methods that can be applied. I'm not sure whether this one has Bayesian inference, but I agree that either that or MLE would probably be optimal!
@nobodyreally1634 2 years ago
Does anyone know how to build the tree using Maximum Likelihood instead of neighbor-joining?
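Editor's sketch for the maximum likelihood questions above. One common route in R is the phangorn package; the package choice, the GTR model, and the Laurasiatherian example alignment bundled with phangorn are all assumptions made here for illustration, not the method used in the video. The idea is to start from a neighbor-joining tree and let optim.pml() improve the topology and model parameters by likelihood.

```r
library(phangorn)

# Example phyDat alignment shipped with phangorn (stand-in for your own data)
data(Laurasiatherian)

# Starting tree: neighbor-joining on maximum likelihood distances
dm <- dist.ml(Laurasiatherian)
start_tree <- NJ(dm)

# Likelihood of the starting tree under the default model
fit <- pml(start_tree, data = Laurasiatherian)

# Optimize model parameters and topology (NNI rearrangements) under GTR
fit_ml <- optim.pml(fit, model = "GTR", rearrangement = "NNI",
                    control = pml.control(trace = 0))

# The optimized tree is an ordinary "phylo" object
ml_tree <- fit_ml$tree
```

To use your own sequences, an aligned DNAbin object can be converted with as.phyDat() before calling dist.ml() and pml().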
@judithestherbairdlujano1493 4 years ago
Thanks for the video Russ! Does your Udemy course include how to run phylogenetic analysis using the maximum likelihood method?
@RJG_Ecology 4 years ago
It does not, but I think I may add this in the near future. If you're already a student of the course, you can add a question regarding this on the message board and I would be happy to post some code to walk you through it.
@dr.ahmedelaswad5453 4 years ago
Great job! How do you get the values for the nodes?
@RJG_Ecology 4 years ago
Hey Ahmed, you can find node values within the phylo object (which I named "tre" in the tutorial) by using the function nodepath(). In this case you would run nodepath(tre), and it will show the initial node first (the root of the entire tree), then the secondary nodes where the first branches are rooted (in this case my three secondary nodes are 17, 19, and 20), and so on...
@dr.ahmedelaswad5453 4 years ago
@@RJG_Ecology Thank you very much, Russ.
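Editor's sketch of the nodepath() reply above, using a small made-up four-tip tree (the Newick string is an assumption, not the tutorial's data). In ape, tips are numbered 1 to Ntip and internal nodes follow, so the root is node Ntip + 1, and nodepath() returns one root-to-tip path of node numbers per tip.

```r
library(ape)

# Toy tree standing in for the tutorial's "tre" object
tre <- read.tree(text = "((A,B),(C,D));")

# One vector of node numbers per tip, each running from the root to that tip
paths <- nodepath(tre)

# In ape's numbering the root is the first internal node
root <- Ntip(tre) + 1

# Internal nodes visited on the way to each tip (root first)
internal_on_path <- lapply(paths, function(p) p[p > Ntip(tre)])
```

Calling plot(tre) followed by nodelabels() draws the same node numbers on the tree, which makes the paths easier to read.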
@michellecheng6549 1 year ago
Thanks!! How can we group sequences into different colors based on their taxonomic group?
@RJG_Ecology 1 year ago
Can you elaborate a bit more on what you want to do? In the meantime, here is the ggtree documentation; it might have what you're looking for. 4va.github.io/biodatasci/r-ggtree.html
@nailagulzar4328 2 years ago
Hi. It was easy. Thanks. Can you please provide some information that would allow me to do diversity estimation using phylogenetic trees? (I don't have any count matrix. I only have sequences from HIV patients.) What R package can do that? Is there a GUI tool that can do diversity estimation and a statistical test (t-test)?
@RJG_Ecology 2 years ago
There are tons of phylogenetics R packages. For tree building and visualization I would say the phylotools, phytools, ape, and ggtree packages are the most helpful. The RevGadgets package has a mix of everything. See their paper here: besjournals.onlinelibrary.wiley.com/doi/epdf/10.1111/2041-210X.13750
@khanofficial2249 2 years ago
Hi sir, this was really informative. Can I apply this function to tree data?
@RJG_Ecology 2 years ago
By tree data, do you mean ".tre" files? You can just read those in with the read.tree function and combine them with merge_tree if they have common variables.
@khanofficial2249 2 years ago
@@RJG_Ecology Thank you for the reply, sir. My teacher asked me to calculate phylogenetic diversity from the tree data he sent me. I can use R, but I have been trying since last week and have not found a way to do it, which has left me more confused. Any guidance on phylogenetic diversity would be much appreciated. Thank you, sir.
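Editor's sketch for the phylogenetic diversity question above. One widely used measure is Faith's PD, the summed branch length of the subtree connecting a set of tips. The version below uses only ape, a made-up Newick tree, and a hypothetical helper named faith_pd(); it measures the subtree down to the most recent common ancestor of the chosen tips, whereas some implementations also count the path back to the root, so check which convention your teacher expects.

```r
library(ape)

# Toy tree with branch lengths; a real file would be read with read.tree("file.tre")
tree <- read.tree(text = "((A:1,B:2):1,(C:3,D:1):2);")

# Hypothetical helper: prune to the chosen tips, then sum the remaining branch lengths
faith_pd <- function(phy, tips) {
  sub <- keep.tip(phy, tips)
  sum(sub$edge.length)
}

pd_total <- sum(tree$edge.length)        # all tips: every branch counts
pd_ab <- faith_pd(tree, c("A", "B"))     # two sister tips
pd_ac <- faith_pd(tree, c("A", "C"))     # two tips from different clades
```

Sets of distantly related tips give a larger value than sets of close relatives, which is the property the measure is meant to capture.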
@MrAraxon 3 years ago
Great job! I am developing an algorithm in R to create phylogenetic trees and calculate values that interest me, like homoplasy, CI, RI, etc. In 2019 I used a function called "matord" but I can't find it anymore. Specifically, I needed it to calculate two matrices for CI and RI. Is there any way to find out something about this function? The packages that I used to create the phylogenetic trees and calculate the homoplasy and the distance are: phangorn, ape, ade4, graphics, and seqinr. Nicely explained! Thank you very much!
@RJG_Ecology 3 years ago
Hey Nic, the function matord doesn't ring any bells for me... do you know specifically what package it was from, or do you know what the function does? If the purpose is, as the name suggests, to order a matrix, there are simple ways to do that in R depending on how you're trying to order the values. There seems to be a custom object within a function of the clusterSeq package with the name "matord", but that's about all I could find: rdrr.io/bioc/clusterSeq/src/R/associatePosteriors.R
@RJG_Ecology 3 years ago
Also, there's this custom function: gist.github.com/pedroj/1872314
@MrAraxon 3 years ago
@@RJG_Ecology I created this algorithm to test the relation between distance and homoplasy. The general concept of the algorithm is to look for the most central strain of a given group of strains. This strain is the one that minimizes the average distance within a square distance matrix. Once the most central strain has been found, the other strains are sorted in order of increasing distance. Adding one strain at a time, it is possible to have an increasing number of strains coming into play. At each addition, homoplasy and the average distance of the strains from the most central strain are calculated and plotted. This procedure allows one to consider carefully the trend of homoplasy and distance, as well as the Rescaled Index.
@RJG_Ecology 3 years ago
@@MrAraxon Not sure if you've seen this package yet, but maybe it has some helpful functionality? www.ncbi.nlm.nih.gov/pmc/articles/PMC6412054/
@ticklishpineapples 2 years ago
Do you have any suggestions for renaming the tip labels from GenBank accession numbers to genus names?
@RJG_Ecology 2 years ago
Hi Pamela, yes, actually! The taxonomizr package (see tutorial here: cran.r-project.org/web/packages/taxonomizr/readme/README.html) has a two-step process for this purpose: the function "accessionToTaxa" converts accession numbers to taxonomic IDs, and then "getTaxonomy" converts taxonomic IDs to taxonomy. They have examples of how to do so in the link. Let me know if you run into any issues!
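Editor's sketch of the final relabeling step only. The taxonomizr lookup described above needs a local database download, so the accession-to-genus table here is a hand-written, hypothetical stand-in (the accession numbers and genera are made up); in practice that table would come from accessionToTaxa() and getTaxonomy().

```r
library(ape)

# Toy tree whose tips are labeled with made-up accession numbers
tre <- read.tree(text = "((MN000001:0.1,MN000002:0.2):0.05,MN000003:0.3);")

# Hypothetical lookup table: accession number -> genus
genus_lookup <- c(MN000001 = "Amanita",
                  MN000002 = "Boletus",
                  MN000003 = "Russula")

# Replace each tip label, keeping the accession where no genus is known
new_labels <- unname(genus_lookup[tre$tip.label])
tre$tip.label <- ifelse(is.na(new_labels), tre$tip.label, new_labels)
```

Matching by name rather than by position matters here, because the order of tips in the tree rarely matches the order of rows in a metadata table.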
@vasilikiskiada2332 3 years ago
Hello, I would like to add bootstrap values to my tree. Any idea how to do that? Thank you.
@RJG_Ecology 3 years ago
Hey Vasilik, yes! Bootstrap values need to be appended to the phylo object itself as node labels, and then drawn in ggtree with a node-label layer such as geom_nodelab(). The top answer on this Stack Overflow question addresses this in detail, as well as how you can apply it yourself, with coded examples: stackoverflow.com/questions/22749634/how-to-append-bootstrapped-values-of-clusters-tree-nodes-in-newick-format-in
@RJG_Ecology 3 years ago
Check this guy's response too: www.researchgate.net/post/SOLVED_How_do_you_export_bootstrap_node_support_in_Rs_ape_package
@vasilikiskiada2332 3 years ago
@@RJG_Ecology Thank you very much. I may have found a solution by calculating bootstrap values with boot.phylo() and assigning them to the phylo object with full_join(), but I will also take a look at the page you are suggesting.
@Andi-mg2eh 2 years ago
@@vasilikiskiada2332 Would you mind sharing your solution?
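Editor's sketch of the bootstrap approach discussed in this thread, not the commenter's actual solution. It uses ape's boot.phylo() on the woodmouse example alignment that ships with ape; the data set, the default K80 distance, 100 replicates, and the helper name build_tree are assumptions. The key point is that the same tree-building function is applied to the original alignment and to every resampled alignment.

```r
library(ape)

# Example DNAbin alignment shipped with ape (stand-in for your aligned ITS data)
data(woodmouse)

# Tree-building recipe reused for the original and each bootstrap replicate
build_tree <- function(aln) nj(dist.dna(aln))

tre <- build_tree(woodmouse)

# Resample alignment columns 100 times and count how often each clade recurs
set.seed(1)
bs <- boot.phylo(tre, woodmouse, build_tree, B = 100, quiet = TRUE)

# Store the support values as node labels on the phylo object
tre$node.label <- bs
```

With the values stored in node.label, plot(tre, show.node.label = TRUE) in ape displays them, and they are also written out by write.tree().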
@mariachalsev9219 2 years ago
I keep getting this error: in f(p.profile[, anchors[2, n - 1]:anchors[1, n], drop = FALSE], : Alignment larger (16,317,302,694) than the maximum allowable size (2,147,483,647). Could you help me understand why? I've already tried DECIPHER in two different versions: 2.20.0 and 2.22.0
@RJG_Ecology 2 years ago
At what line of code are you getting this error, and with what data?
@ramshaazhar7338 2 years ago
Can you please share this code?
@RJG_Ecology 2 years ago
The link to the code and data is in the description already.
@Gayensubrata89 2 years ago
I am getting an error: Error in gray(valgris[numclass]) : invalid gray level, must be in [0,1]. How do I solve that?
@RJG_Ecology 2 years ago
What lines of code are you running when you get the error?
@Gayensubrata89 2 years ago
@@RJG_Ecology temp
@RJG_Ecology 2 years ago
@@Gayensubrata89 It looks like the ade4 "table.paint" function has been updated and the argument "cleg" removed. Just remove that and it should work, i.e. table.paint(temp, clabel.row=.4, clabel.col=.4)+ scale_color_viridis()
@Gayensubrata89 2 years ago
@@RJG_Ecology No, it is not working. I am still getting the error: Error in gray(valgris[numclass]) : invalid gray level, must be in [0,1].
@RJG_Ecology 2 years ago
@@Gayensubrata89 Please show me the code you ran to get that error.
@uguremre3287 2 years ago
I got this error: could not find function "OrientNucleotides". Could you please help me?
@RJG_Ecology 2 years ago
Hey Ugur, this error occurs in R because you have not loaded the library that provides the function. In this case, the library is "DECIPHER". Make sure you have installed DECIPHER using:
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("DECIPHER")
and then load it using:
library(DECIPHER)
@uguremre3287 2 years ago
@@RJG_Ecology Thank you for replying, Russ. But I got a new error like this: in f(p.profile[, anchors[2, n - 1]:anchors[1, n], drop = FALSE], : Alignment larger (9,174,518,227) than the maximum allowable size (2,147,483,647). How can I fix it?
@RJG_Ecology 2 years ago
@@uguremre3287 The maximum allowable size for alignments with DECIPHER's AlignSeqs() is 2,147,483,647. Anything larger will need a different approach, such as FindSynteny() followed by AlignSynteny().
@uguremre3287 2 years ago
@@RJG_Ecology I tried to run AlignSynteny() but I couldn't figure it out :(
@uguremre3287 2 years ago
Error in AlignSynteny(apricot) : synteny must be an object of class 'Synteny'
@gembarry8280 4 years ago
Your video is good; however, the poor visual quality makes it difficult to follow the R commands.
@agricultureenginner8852 3 years ago
Thanks for sharing, nice job, but the link is not working (github.com/RussellGrayxd/Phylogenetics). Where can I find the code for RStudio?
@RJG_Ecology 3 years ago
The link is working fine on my end. Check your browser and firewall settings; it could also be your connection. Can you access GitHub by itself? github.com/