Roary pan genome tutorial | Bioinformatics tutorial on Pangenome analysis of bacterial genomes

15,392 views

Bioinformatics Coach

This video shows you the step-by-step process of performing pangenome analysis using the tools Prokka and Roary.
Download the script and materials from here: / roary-pan-genome-75792858
Support my work
www.buymeacoff...
www.paypal.com...
/ bigdataanalytics
One-on-One coaching (Video Conferencing)
calendly.com/b...
One-on-One coaching (Audio Call)
clarity.fm/vin...
Get more bioinformatics tutorials on Patreon
/ bigdataanalytics
Subscribe to my channels
Bioinformatics: / @bioinformaticscoach
Data Science: / @datasciencecoach
Short Clips: / @bioinformaticscforbeg...
Reach out
bioinformaticscoach@gmail.com
Materials
________________________________________________________________________________________________
How to install anaconda
Linux • Install , Configure an...
MacOS • Install, Configure and... - GUI Installer
• Install, Configure and... - Command line installer
Roary github page
github.com/san...
Python script: github.com/san...
Data for the S. aureus strains can be downloaded from here:
www.ncbi.nlm.n... V521 strain
www.ncbi.nlm.n... M48 strain
www.ncbi.nlm.n... P10 strain
www.ncbi.nlm.n... AR465 strain
www.ncbi.nlm.n... NRS1 strain
www.ncbi.nlm.n... R50 strain
Chapters
00:10 Outline
01:33 Explanation and importance of pangenome analysis
03:04 PC Requirement
04:02 Add conda channels
05:19 Create conda environment and install tools
06:36 Activate conda environment
07:15 Set working directory
07:57 Download roary_plot.py Python script: github.com/san...
09:40 Install python dependencies
13:44 Download genome sequences
17:31 Perform genome annotation using prokka
24:29 Perform pangenome analysis using roary
33:52 Roary output
34:04 Interpret results
34:09 Gene presence and absence file
36:19 Pangenome matrix
38:17 Pangenome pie chart
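The annotation step listed in the chapters can be sketched as a small helper that builds one Prokka command per genome. This is only an illustration, not the script from the video: the `genomes/` and `annotation/` folder names and the helper `prokka_commands` are assumptions, while `--kingdom`, `--outdir`, `--prefix` and `--cpus` are standard Prokka options.

```python
from pathlib import Path


def prokka_commands(fasta_files, out_root="annotation", cpus=4):
    """Build one Prokka command line per genome FASTA file.

    The directory layout (out_root/<strain>) is an illustrative assumption,
    not something taken from the video.
    """
    commands = []
    for fasta in sorted(Path(f) for f in fasta_files):
        strain = fasta.stem  # e.g. "V521" from "V521.fasta"
        commands.append([
            "prokka",
            "--kingdom", "Bacteria",
            "--outdir", str(Path(out_root) / strain),
            "--prefix", strain,
            "--cpus", str(cpus),
            str(fasta),
        ])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for cmd in prokka_commands(["genomes/V521.fasta", "genomes/M48.fasta"]):
        print(" ".join(cmd))
```

Each Prokka run writes a `.gff` file into its output folder; those GFF files are what Roary takes as input in the next step.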
#bioinformatics #genomics #bioinformática #microbiology #datascience

Comments: 58
@bioinformaticscoach 2 years ago
Download the script and materials from here: www.patreon.com/posts/roary-pan-genome-75792858
@Vanilla_sky_8781 8 months ago
Thanks for your video, Coach 😊 4:11 add conda channels 6:45 activate environment 7:36 set working directory 8:05 download Python script 9:50 install Python libraries 13:52 download sequences 17:39 perform genome annotation
@Vanilla_sky_8781 8 months ago
24:29 perform pangenome analysis using Roary 29:44 FastTree➡️image 34:20 gene presence and absence file 36:21 result: pangenome matrix 38:21 result: pangenome pie chart 🎉 Thank you again
@michaelmugo5126 1 month ago
I really enjoyed this! I had a lot of trouble following the Roary GitHub page, but with your explanation it was quite straightforward. Kudos!
@forbesavila8006 1 year ago
First, thanks man, your videos help me a lot. Can you please continue this topic and teach us how to use Scoary?
@edwardoseigyimah8847 3 months ago
Thank you very much, sir. Could you consider doing a tutorial on pangenome analysis of different species of bacteria using the GET_HOMOLOGUES tools?
@yushanlin2745 2 years ago
Coach, do you know how to do functional annotation after pangenome analysis, for example functional annotation of the core genes with KEGG and GO? This has confused me for a long time.
@kalonjitshisekedi6037 2 years ago
Very informative video. Thanks!
@genhub9288 5 months ago
Hi Coach, I tried with GFFs generated by Bakta but got this warning (MSG: ##feature-ontology header tag parsing unimplemented) and quite different results than with Prokka. Can't Bakta be used with Roary? 🤔
@CHARLES-ADOLFUSHIRIMA 5 months ago
Hello Coach, can you please explain how to solve this error? Use of uninitialized value in require at /usr/lib/x86_64-linux-gnu/perl5/5.36/Encode.pm line 70.
@gianmarcocastillohuaccho244 6 months ago
gold
@VIDYARASHMIHANEHALLI 1 year ago
When I try to run Roary, it says a file is missing, i.e. the rule module. I am not able to understand this. If someone knows about this, could you please help?
@rianrafsan7433 2 years ago
Update: Please activate the conda environment AFTER you add the channels, and THEN install the software (Prokka, Roary, or any other). Otherwise you will get an error.
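The order suggested in the comment above (channels first, then the environment, then the installation) can be written down as a short checklist of shell commands. This is a sketch under assumptions: the environment name `pangenome` comes from commands quoted elsewhere in this thread, the channel list is the commonly used Bioconda pair rather than something confirmed by the video, and `conda_setup_commands` is a hypothetical helper that only prints the steps.

```python
def conda_setup_commands(env_name="pangenome", packages=("prokka", "roary")):
    """Return the setup steps in the order suggested in the comment:
    add channels, create and activate the environment, then install."""
    # Assumed channel list; the channel added last gets the highest priority.
    channels = ["bioconda", "conda-forge"]
    steps = [f"conda config --add channels {ch}" for ch in channels]
    steps.append(f"conda create -y -n {env_name}")
    steps.append(f"conda activate {env_name}")
    steps.append("conda install -y " + " ".join(packages))
    return steps


if __name__ == "__main__":
    # Print the steps so they can be pasted into a terminal one at a time.
    print("\n".join(conda_setup_commands()))
```

Running the steps one at a time in a terminal makes it easier to see which of them stalls if conda hangs at "Solving environment".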
@christyakachukwu2660 2 years ago
Please, can one replicate this for the malaria parasite?
@mofaophoka2601 3 years ago
Coach, can you do a video on the use of the .Rtab files and of query_pan_genome, to help understand the results further?
@bioinformaticscoach 3 years ago
The Roary web page has instructions on using the .Rtab files: sanger-pathogens.github.io/Roary/
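As a rough illustration of what the .Rtab file holds, the sketch below reads a tab-separated 0/1 presence/absence matrix (gene name in the first column, one column per isolate) and labels each gene by how many isolates carry it. The function name `classify_genes` is hypothetical, and the cut-offs (99%, 95%, 15%) are assumed to mirror the core, soft core, shell and cloud categories that Roary reports in its summary statistics.

```python
import csv
import io


def classify_genes(rtab_text):
    """Label each gene as core, soft core, shell or cloud from a 0/1 matrix."""
    reader = csv.reader(io.StringIO(rtab_text), delimiter="\t")
    header = next(reader)
    n_isolates = len(header) - 1  # the first column holds the gene name
    labels = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        fraction = sum(int(value) for value in row[1:]) / n_isolates
        if fraction >= 0.99:
            labels[row[0]] = "core"
        elif fraction >= 0.95:
            labels[row[0]] = "soft core"
        elif fraction >= 0.15:
            labels[row[0]] = "shell"
        else:
            labels[row[0]] = "cloud"
    return labels
```

For a real run, the text would come from something like `open("gene_presence_absence.Rtab").read()` in the Roary output folder.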
@luisrendon5792 3 years ago
Hey, thanks for this video, it has been useful. I have a question: can I run this in the Mac terminal? I started this on my Mac and at the beginning I didn't have problems, but I got a problem at the step "Download the polishing tool pilon". The error is: wget is not found. How can I solve this? Thanks
@bioinformaticscoach 3 years ago
Yes. wget is a downloader. You can use another downloader such as curl. Just make sure you have installed it first.
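If neither wget nor curl is available, a download can also be done from Python's standard library, which ships with the conda environment. This is a minimal sketch; `download` is a hypothetical helper, and the URL and file name in the usage line are placeholders, not links from the video.

```python
import shutil
import urllib.request


def download(url, destination):
    """Save the resource at `url` to the local path `destination`."""
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        # Stream the response to disk instead of loading it all into memory.
        shutil.copyfileobj(response, out)
    return destination
```

Usage would look like `download("https://example.org/genome.fasta", "genome.fasta")`, with the real link substituted.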
@anindorahman2600 2 years ago
Hello sir, I am having a problem installing Roary: it shows "Solving environment" for a very long time. Please help me with this. Another problem is that when I run FastTree after Roary, it runs, but after some time it ends with a "Killed" message, and the mytree.newick file remains empty. How can I solve this problem, sir? Thanks in advance
@muhammadnafees6192 2 years ago
When I run conda create -n pangenome prokka roary, the result is: Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source. Collecting package metadata (repodata.json): done Solving environment: \ Did you resolve this, and how can I? Please
@remousocloo5213 1 year ago
Hi @Vincent, I installed Roary via Docker but core_genome_alignment.aln was not generated. I was not able to install Roary using conda; it is stuck at "Solving environment". I tried updating conda but it still did not work. Please kindly advise
@bioinformaticscoach 1 year ago
Which operating system do you use?
@remousocloo5213 1 year ago
I use Ubuntu Linux as a virtual machine
@bioinformaticscoach 1 year ago
Check the Roary page. There are instructions for Ubuntu users: github.com/sanger-pathogens/Roary
@remousocloo5213 1 year ago
@bioinformaticscoach Okay, thanks. I managed to generate the .aln using the -e option
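The fix mentioned in this thread, passing `-e` so that Roary writes a core gene alignment, can be sketched as a command builder. The helper name `roary_command` and the output folder `roary_output` are assumptions for illustration; `-e`, `-n`, `-p` and `-f` are Roary's options for core gene alignment, the faster MAFFT-based alignment, thread count and output directory.

```python
from pathlib import Path


def roary_command(gff_dir, out_dir="roary_output", threads=4):
    """Build a Roary command that also writes a core gene alignment."""
    gff_files = sorted(str(p) for p in Path(gff_dir).glob("*.gff"))
    if len(gff_files) < 2:
        # A pangenome comparison only makes sense with more than one genome.
        raise ValueError("need at least two .gff files")
    return [
        "roary",
        "-e",                # build a multi-FASTA alignment of the core genes
        "-n",                # use the faster MAFFT-based alignment with -e
        "-p", str(threads),  # number of threads
        "-f", out_dir,       # output directory
        *gff_files,
    ]
```

The returned list can be passed to `subprocess.run(..., check=True)` once the GFF files from the annotation step are collected in one folder.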
@MamtaPuraswani 1 year ago
Can you please do a session on Kraken also?
@bioinformaticscoach 1 year ago
You can check my metagenomics playlist
@muhammadnafees6192 2 years ago
Hello Coach, when I run conda create -n pangenome prokka roary, the result is: Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source. Collecting package metadata (repodata.json): done Solving environment: \ Please help me solve this problem. Thanks in advance
@bioinformaticscoach 2 years ago
Try updating conda
@muhammadnafees6192 2 years ago
@bioinformaticscoach Thanks for your response. I updated, but that did not resolve the problem. Should I create separate environments for Prokka and Roary?
@joseluisvillalpandoaguilar7546 1 year ago
Hello, could you try the following command to install Roary: conda install -c bioconda/label/cf201901 roary
@chin-soonphan4979 3 years ago
\\wsl$\Ubuntu\home\phan\documents\sequences (the sequences folder contains 1.fasta). I type: prokka --cpus 4 --kingdom Bacteria --prefix files1 sequences/1.fasta But this message came out: "sequences/1.fasta is not a readable non-empty FASTA file". Do you know what went wrong?
@bioinformaticscoach 3 years ago
It's likely that the correct path to the FASTA file was not specified.
@mofaophoka2601 3 years ago
Try renaming the file; it will work
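Before rerunning Prokka after an error like the one in this thread, it can help to confirm that the path resolves from the current working directory and that the file looks like FASTA. The following is a small diagnostic sketch; `check_fasta` is a hypothetical helper and it only performs basic checks (existence, size, leading `>`).

```python
from pathlib import Path


def check_fasta(path):
    """Return a short diagnosis for a FASTA path; 'ok' if it looks usable."""
    p = Path(path)
    if not p.is_file():
        # Show the absolute path so a wrong working directory is easy to spot.
        return f"not found: {p.resolve()}"
    if p.stat().st_size == 0:
        return "file is empty"
    with p.open("r", errors="replace") as handle:
        first_line = handle.readline()
    if not first_line.startswith(">"):
        return "first line does not start with '>'"
    return "ok"
```

Calling `check_fasta("sequences/1.fasta")` from the same directory in which Prokka is run shows whether the relative path is the problem.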
@mofaophoka2601 3 years ago
Coach, regarding the gene_count_summary.py script, what is the basis of the results? I tried to use that Python script, but the Venn diagrams did not reflect the shared gene counts in the CSV file in terms of intersections.
@bioinformaticscoach 3 years ago
The gene_count_summary.py script is used for comparing genes across three samples. You said the counts did not reflect what was in the CSV. Please give additional details of what you saw, then we can discuss.
@mofaophoka2601 3 years ago
@bioinformaticscoach Please share your email with me so that I can send you my results and my CSV. What I mean is: when I take 3 strains and check the CSV file, I see more than 50 shared genes between 2 strains, but after running the Python script, the intersection count is 2. Unless there is a dependency I still need; I installed pandas, matplotlib, seaborn and biopython. I also opened the script, but I am not good at Python and could not understand the code. I am not sure about the intersection gene counts, as the Venn diagrams showed low numbers compared with what I see with the naked eye in my gene_presence_absence CSV file. Unless I have misunderstood what the gene counts mean.
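One way to cross-check counts like these by hand is to tally shared gene clusters directly from gene_presence_absence.csv, where an empty cell under a strain's column means the cluster is absent from that strain. The sketch below is not the gene_count_summary.py script from the video; `shared_gene_count` and the strain column names used with it are hypothetical.

```python
import csv
import io


def shared_gene_count(csv_text, present_in, absent_from=()):
    """Count gene clusters found in every strain of `present_in`
    and in none of the strains of `absent_from`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    count = 0
    for row in reader:
        # A non-empty cell means the strain carries a gene from this cluster.
        in_all = all((row.get(s) or "").strip() for s in present_in)
        in_none = not any((row.get(s) or "").strip() for s in absent_from)
        if in_all and in_none:
            count += 1
    return count
```

A Venn diagram region for two strains may count only the clusters that are absent from the third strain, which is the `absent_from` case here, so it can be much smaller than the number of clusters the two strains share overall.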
@christyakachukwu2660 2 years ago
Please, I want to understand pangenomes. Can one build a pangenome using different genera of bacteria?
@christyakachukwu2660 2 years ago
@bioinfocoach I asked this because we were asked to build a pangenome using the genomes of rice and tomato, but I felt it was not right
@bioinformaticscoach 2 years ago
@christyakachukwu2660 Yes, it's possible. What I would do in your case is look for research articles on pangenomes, read them, and look at what has been done. This will guide you on what to do.
@christyakachukwu2660 2 years ago
@bioinformaticscoach Thanks, but the project has passed. However, I want to build a pangenome for Plasmodium falciparum; I don't know if you will guide me through it
@davidphilips1823 2 months ago
Using core_gene_alignment.aln to generate mytree.newick takes longer than usual, so I used accessory_binary_genes.fa.newick instead to generate the three visualizations. If I use this instead, will the visualization be wrong?
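For reference, the tree-building step discussed in this comment can be sketched as follows. The file names come from the comments on this page, `-nt` and `-gtr` are FastTree's options for nucleotide input and the GTR model, and the `runner` parameter is a hypothetical hook added only so the function can be tried without FastTree installed.

```python
import subprocess


def build_tree(alignment="core_gene_alignment.aln", tree_file="mytree.newick",
               runner=subprocess.run):
    """Run FastTree on a nucleotide alignment and save the Newick tree."""
    command = ["FastTree", "-nt", "-gtr", alignment]
    with open(tree_file, "w") as out:
        # FastTree writes the tree to standard output, so redirect it to a file.
        runner(command, stdout=out, check=True)
    return command
```

Note that accessory_binary_genes.fa.newick is built from accessory gene presence and absence rather than from core gene sequences, so the two trees can differ and plots based on them need not look the same.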
@mofaophoka2601 3 years ago
I have a problem with Roary. It runs well, but then a message (the Parallel citation notice) pops up and the program stops. What is the solution to this problem? I work on a remote server; it could be the low CPU count of my computer
@bioinformaticscoach 3 years ago
Try using a small number of CPUs, e.g. 4, and see if it runs
@mofaophoka2601 3 years ago
@bioinformaticscoach My laptop has 2 CPUs; Prokka was able to annotate with 2 CPUs. One important point: I tested Roary with 4 genomes and the results came out well. Now I am using 20 genomes, and even when I use 4 again like before, the message persists.
@mofaophoka2601 3 years ago
@bioinformaticscoach I am using a Core i5 laptop but working remotely on a server.
@bioinformaticscoach 3 years ago
From what I see it is running, but it is probably a bit slow, so you have to increase the number of CPUs. The time taken will also depend on how many samples you have.
@mofaophoka2601 3 years ago
@bioinformaticscoach Alright, let me see how I can get a better machine tomorrow. Any recommendations?
@oluwatoyosietal3616 3 years ago
Can this approach be used in crop plants too?
@bioinformaticscoach 3 years ago
I am not sure about it. The original paper mentions prokaryotes. You can read the full paper here: academic.oup.com/bioinformatics/article/31/22/3691/240757 You can check other tools as well. You can download the book I showed in the presentation and read it to get the list of tools.
@shivasatija9889 3 years ago
Can this workflow be used for viral genomes?
@bioinformaticscoach 3 years ago
It depends on whether the tools have been designed to also work on viral genomes. Maybe you can find out and let us know. Thanks