Whole Genome Sequence Analysis | Bacterial Genome Analysis | Bioinformatics for Beginners

27,271 views

Bioinformatics Coach

3 years ago

This tutorial shows you how to analyze the whole genome sequence of a bacterium.
Thank me with a Coffee: www.buymeacoffee.com/informat...
Book a Session (One-on-One)
calendly.com/bioinformaticscoach
Get more bioinformatics tutorials on Patreon: / bigdataanalytics
One-on-One coaching (Audio Call): clarity.fm/vincentappiah
Support my work
www.paypal.com/paypalme/thein...
Subscribe to my channels
Bioinformatics: / @bioinformaticscoach
Data Science: / @datasciencecoach
Short Clips: / @bioinformaticsclips
Reach out
bioinformaticscoach@gmail.com
Materials
__________
Source of Data: www.ncbi.nlm.nih.gov/pmc/arti...
GitHub repository of the pipeline:
github.com/vappiah/bacterial-genomics-tutorial
How to set up Linux on Windows
• Quickly Install and us...
How to install Anaconda in Linux
Linux • Install , Configure an...
MacOS • Install, Configure and... - GUI Installer
• Install, Configure and... - Command line installer
How to install BRIG
• A bioinformatics tutor...
Other useful videos
• Roary pan genome tutor... Pan genome Analysis
How to find antimicrobial resistance genes
• Antimicrobial Resistan...
Genome visualization
• Genome Visualization T...
Reading fasta and genbank files
• Python for Bioinformat...
Chapters
00:01 Introduction
00:26 Analysis workflow
00:48 Where to find the scripts
01:59 Setting up the analysis pipeline
08:43 Running the commands
download the example data
fastqc quality control
trim fastq reads with sickle
genome assembly with spades
evaluating the genome assembly
reference guided scaffolding of the contigs
genome sequence annotation with prokka
multi locus sequence typing
antimicrobial resistance genes detection with abricate
pangenome analysis with roary
genome comparison with BRIG
47:38 Explaining results for ANI-Dendrogram
52:36 Explaining results for Pangenome Analysis
58:01 MLST output
58:37 AMR output
58:59 Genome map
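The chapter list amounts to a fixed order of command-line steps. The Python sketch below only prints one plausible command per step so the order is easy to see; it does not run anything. The sample and file names (sample_1.fastq, reference.fasta, the *_out directories, other_genome.gff) are hypothetical placeholders rather than the names used in the GitHub repository, the flags are limited to basic options of each tool, and Pilon polishing and BRIG (a GUI tool) are left out.

```python
"""Dry-run sketch of the chapter workflow: print each command in order.

Sample and file names are hypothetical placeholders, not the repository's
actual names; check each tool's --help before running anything for real.
"""
import shlex

SAMPLE = "sample"            # hypothetical sample name
REF = "reference.fasta"      # hypothetical reference genome
R1, R2 = f"{SAMPLE}_1.fastq", f"{SAMPLE}_2.fastq"
CONTIGS = "spades_out/contigs.fasta"
SCAFFOLDS = "ragtag_out/ragtag.scaffolds.fasta"


def pipeline_steps():
    """Return (step name, argv list) pairs in the order the chapters list them."""
    return [
        # FastQC expects the -o output directory to exist already.
        ("quality control", ["fastqc", "-o", "fastqc_out", R1, R2]),
        ("trimming", ["sickle", "pe", "-t", "sanger", "-f", R1, "-r", R2,
                      "-o", "trimmed_1.fastq", "-p", "trimmed_2.fastq",
                      "-s", "trimmed_single.fastq"]),
        ("assembly", ["spades.py", "-1", "trimmed_1.fastq", "-2", "trimmed_2.fastq",
                      "--careful", "-o", "spades_out"]),
        ("assembly evaluation", ["quast.py", "-r", REF, "-o", "quast_out", CONTIGS]),
        ("scaffolding", ["ragtag.py", "scaffold", REF, CONTIGS, "-o", "ragtag_out"]),
        ("annotation", ["prokka", "--outdir", "prokka_out", "--prefix", SAMPLE,
                        SCAFFOLDS]),
        ("MLST", ["mlst", SCAFFOLDS]),
        ("AMR genes", ["abricate", SCAFFOLDS]),
        # Roary compares several genomes: pass one GFF per genome.
        ("pangenome", ["roary", "-f", "roary_out",
                       f"prokka_out/{SAMPLE}.gff", "other_genome.gff"]),
    ]


if __name__ == "__main__":
    for name, argv in pipeline_steps():
        print(f"# {name}")
        print(shlex.join(argv))
```

Keeping each command as an argument list (rather than one string) means it could later be passed straight to subprocess.run without shell quoting issues.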
Tools Used
Anaconda for software installation
SPAdes for genome assembly
Prokka for genome annotation
Roary for pangenome analysis
ABRicate for antimicrobial resistance gene detection
BRIG for sequence visualization
Python for data visualization
Pilon for assembly polishing
FastQC for quality control
QUAST for evaluating genome assembly
RagTag for reference-guided scaffolding
quality control
genome assembly
genome annotation
prokka
roary pangenome analysis
genome visualization
A bioinformatics tutorial on Bacterial Whole Genome Sequence Analysis | Comparative Genomics
bacteria
#bioinformatics #bioinformática #bioinformaticsforbeginners #linuxforbeginners #genomesequencing #genomics #genome #microbiology
#Bioinformatics #Microbiology #DataScience #Genomics

Comments: 95
@elizabethgyamfi1617 3 years ago
Great work. Simplified presentation. Well done
@ayoajayi280 3 years ago
Hello.
@humphreyaddy7716 11 months ago
I wish I had discovered this channel long ago. I have all the resources to become good in the area of bioinformatics.
@naveedkhan-fi6ux 1 year ago
A great piece of work. Awesome explanation, easy to follow. I wish you could upload a video on fungal comparative genomics to sort out the effectors.
@kubrateksen8845 2 years ago
Amazing, we are waiting for more videos.
@yusufomowumi4771 2 years ago
This video was very helpful. Can you do a tutorial on how to detect contamination from reads? Thank you!
@ehecatl3830 2 years ago
Thanks, Dr. You are very good!
@ldipotet 3 years ago
That's amazing work you have done here!! Congrats
@bioinformaticscoach 3 years ago
Thanks. Expect more videos like this soon.
@ldipotet 3 years ago
@bioinformaticscoach An interesting challenge would be to put all these commands in a CWL pipeline.
@bioinformaticscoach 3 years ago
@ldipotet Yes. That will be interesting. Maybe we can take it up in the future.
@alita2220 10 months ago
This is an amazing tutorial, thank you! Because the sequence data is short + long reads, I am changing a few tools for PacBio HiFi data. It teaches me how to fish! It would be great if in the future there were videos on calling variants!
@bioinformaticscoach 10 months ago
You can watch the tutorials on snippy, bcftools and freebayes.
@leonmaric5055 2 years ago
Very helpful! Greetings
@biozarrice 3 years ago
Good Morning. I think your bioinformatics tutorials are amazing. Could you do a tutorial on genome annotation of eukaryotic organisms?
@bioinformaticscoach 3 years ago
Thanks Fernando for the suggestion. I will consider that.
@johirislam8174 8 months ago
Hello. Does this lecture cover WGS data analysis from start to finish in Linux? I mean from quality check to variant calling and variant annotation?
@bioinformaticscoach 1 year ago
One-on-one coaching: clarity.fm/vincentappiah
Reach out: bioinformaticscoach@gmail.com
@yushanlin2745 2 years ago
Thank you for such an amazing video, it really helped me a lot with my research. I have a question: is reordering indispensable for bacterial assembly? Does skipping reordering affect pangenome analysis? I have finished MLST and virulence gene detection, and there it doesn't matter. My data is Illumina NovaSeq paired-end, 2×150 bp. I read the RagTag paper and found that its data is long-read genome sequencing (average 15 kbp) from plants. Looking forward to your reply. Thanks again.
@bioinformaticscoach 2 years ago
Reordering is not really necessary for pangenome analysis. But I advise you to do it if you want to generate a draft sequence of your sample, because it maps your sequence to a reference genome and reorders the contigs using the reference genome as a template. So the sequence you get is better than the raw assembly contigs.
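To make the idea of reference-guided reordering concrete, here is a toy Python sketch. Each contig is assumed to already have a start position and strand on the reference (a real tool such as RagTag computes these by aligning the contigs); contigs are sorted by that position, minus-strand contigs are reverse-complemented, and the pieces are joined with a run of N's. The fixed gap length and the function names are assumptions for illustration, not RagTag's implementation.

```python
# Toy illustration of reference-guided ordering and orientation.
# Alignment positions are supplied by hand here; a real scaffolder
# computes them by aligning the contigs to the reference.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def order_contigs(contigs, placements, gap=100):
    """Join placed contigs in reference order.

    contigs:    dict contig_id -> sequence
    placements: dict contig_id -> (reference_start, strand), strand '+' or '-'
    gap:        number of N's inserted between neighbours (a fixed,
                assumed value; real tools may size gaps differently)
    Returns (scaffold_sequence, list_of_unplaced_contig_ids).
    """
    placed = sorted(placements, key=lambda cid: placements[cid][0])
    pieces = []
    for cid in placed:
        seq = contigs[cid]
        if placements[cid][1] == "-":
            seq = reverse_complement(seq)  # orient to match the reference
        pieces.append(seq)
    scaffold = ("N" * gap).join(pieces)
    unplaced = [cid for cid in contigs if cid not in placements]
    return scaffold, unplaced
```

For example, with contigs {"c1": "AAAC", "c2": "GGGT", "c3": "TTTT"} and placements {"c1": (5000, "+"), "c2": (10, "-")}, calling order_contigs with gap=2 puts the reverse-complemented c2 first and returns ("ACCCNNAAAC", ["c3"]); c3 could not be placed, which mirrors how unplaced contigs are kept separate in a scaffolded assembly.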
@SobinSGupta-vq3zn 2 years ago
Your tutorial is amazing, but when I try to follow the same steps, it automatically loads the sequence you used for the demonstration. How can I work with my own sequence following the same steps shown in the video? Please reply as early as possible. I will be really thankful.
@billclintonaglomasa6543 3 years ago
Great.
@MrManikprabhu 1 year ago
Hi, is it possible to make a Venn diagram for five or six genomes?
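A Venn diagram for n genomes has 2^n - 1 regions (31 for five genomes, 63 for six), so it quickly becomes hard to read. One option is to count genes per exact combination of genomes, the numbers an UpSet-style plot displays, and plot or tabulate those counts instead. This Python sketch assumes Roary's tab-separated 0/1 presence/absence matrix (gene_presence_absence.Rtab: first column the gene name, then one column per genome); the function name is hypothetical.

```python
import csv
import io
from collections import Counter


def presence_combinations(rtab_text):
    """Count genes per exact combination of genomes that carry them.

    Expects a tab-separated 0/1 matrix: first column is the gene name, then
    one column per genome (the layout assumed for Roary's Rtab output).
    Returns a Counter mapping a tuple of genome names to a gene count;
    each key corresponds to one region of a Venn diagram.
    """
    reader = csv.reader(io.StringIO(rtab_text), delimiter="\t")
    header = next(reader)
    genomes = header[1:]
    counts = Counter()
    for row in reader:
        if not row:
            continue  # skip blank lines
        present = tuple(g for g, flag in zip(genomes, row[1:]) if flag.strip() == "1")
        counts[present] += 1
    return counts


if __name__ == "__main__":
    demo = "Gene\tA\tB\tC\ng1\t1\t1\t1\ng2\t1\t1\t1\ng3\t1\t0\t0\ng4\t0\t1\t1\n"
    for combo, n in presence_combinations(demo).most_common():
        print(" & ".join(combo), n)
```

With real data, the text would come from something like Path("roary_out/gene_presence_absence.Rtab").read_text() (a hypothetical path); the resulting counts work for any number of genomes, unlike a Venn layout.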
@muhammadshafiq3242 2 years ago
Hello, Sir, I have a problem with trimming. Could you kindly help me? When I write the skip it does not run for trimming.
@bioinformaticscoach 1 year ago
One-on-one coaching: calendly.com/bioinformaticscoach
@jesusgiovanimamani5671 4 months ago
Thank you so much. But I have some doubts: I'm using the macOS terminal, and I failed to install the environment.yaml. Is the problem the type of OS? Is this tutorial only for Linux commands?
@bluefox_genshin 2 months ago
Hi, I'm also experiencing the same thing. :(
@raselbarua4578 3 years ago
Good job
@bioinformaticscoach 3 years ago
Thanks
@josephwestley789 2 years ago
Hello, this is a great tutorial, thank you for putting it together! I am encountering an error when trying to run ./polish.sh. I am getting "Unable to access jarfile /bacterial-genomics-tutorial/apps/pilon.jar". Do you have any idea what might be causing this error? Thanks in advance! EDIT: I am doing this in WSL1, by the way.
@bioinformaticscoach 2 years ago
This pipeline was designed to run directly in bash. If you are getting this error, then you have to modify the script and put the path of the pilon jar file in it, or check that the pilon jar file has been downloaded.
@josephwestley789 2 years ago
@bioinformaticscoach EDIT: Thank you for your reply, I had not extracted the jar folder contents. I have done so now, and it appears to be running!
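"Unable to access jarfile" means java could not find a jar at the given path. A minimal Python sketch that checks the path before building the Pilon command is shown here; the file names and the 8G memory setting are hypothetical, while --genome, --frags, --outdir, --output and --changes are standard Pilon options. Note that a path beginning with /bacterial-genomics-tutorial points at the filesystem root, not at a folder inside your home directory.

```python
from pathlib import Path


def pilon_command(jar, genome, bam, outdir, prefix, memory="8G"):
    """Build a Pilon command line after checking that the jar file exists.

    All paths are hypothetical examples; point `jar` at wherever
    pilon.jar was actually downloaded and extracted.
    """
    jar_path = Path(jar).expanduser().resolve()
    if not jar_path.is_file():
        raise FileNotFoundError(
            f"pilon jar not found at {jar_path}; download/extract it or fix the path"
        )
    return [
        "java", f"-Xmx{memory}", "-jar", str(jar_path),
        "--genome", genome,   # assembly to polish
        "--frags", bam,       # sorted, indexed paired-end alignments
        "--outdir", outdir,
        "--output", prefix,
        "--changes",          # also write a file listing the edits Pilon made
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True); failing early with the resolved absolute path makes a wrong or relative path obvious.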
@muhammadshafiq3242 2 years ago
Very nice tutorial. Can I use XFTP and XShell instead of Anaconda to do this kind of analysis? Thanks
@bioinformaticscoach 2 years ago
Anaconda is used to install the tools, so it is important that you install it. But if you have a server that has all the tools installed, then you don't need to install it. XFTP and XShell are used to log in to SSH servers; you use them if you are accessing a remote server.
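Whether the tools come from Anaconda or from a server where an administrator installed them, a quick way to confirm they are available is to look them up on the PATH. This Python sketch uses shutil.which; the executable names are those commonly installed by bioconda and may differ on your system.

```python
import shutil

# Executable names as commonly installed from bioconda; adjust for your setup.
TOOLS = ["fastqc", "sickle", "spades.py", "quast.py", "ragtag.py",
         "prokka", "mlst", "abricate", "roary"]


def missing_tools(tools=TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [t for t in tools if shutil.which(t) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("all tools found" if not missing else "missing: " + ", ".join(missing))
```

Run it after activating the conda environment (or after logging in to the server); an empty list means every listed executable was found.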
@muhammadshafiq7141 2 years ago
@bioinformaticscoach Yes, we have these servers provided by the school. Can I use the methods you used in this tutorial through XShell and XFTP? I just watched the video.
@bioinformaticscoach 2 years ago
@muhammadshafiq7141 You may want to ask your system admin about this. I personally use a Linux system, so I use my terminal to log in. I have also used MobaXterm on Windows.
@manishvictor5293 2 years ago
Dear Dr. Vappiah, very nice GitHub page and description of it in the video. I am having a problem with ./polish.sh: the program runs fine, but at the end it returns "cp: cannot stat 'pilon_stage1.fasta': No such file or directory" and "cat: polishing_process/pilon_stage1.changes: No such file or directory". Please can you help sort out the error?
@bioinformaticscoach 2 years ago
It's likely you missed a step. Try to start the analysis from the beginning.
@syedahafsaali5208 3 years ago
I have to work on a project this semester, and I want to do a bioinformatics study on microbial data. I just need a direction: what studies can I do using bioinformatics techniques or machine learning?
@bioinformaticscoach 3 years ago
That will be nice. First, you need to read some papers to learn what kinds of bioinformatics studies are done in the field. That is why I made this video on bacterial genome analysis. You can do a similar analysis with the pipeline I demonstrated and use my explanation as a guide. For machine learning, you first need to identify your area of interest and look at how machine learning is being applied in that area. There are lots of datasets available for you to use. Just identify your area of interest and you will be able to connect the dots. For example, if I were interested in cancer studies, I would look at how machine learning is used to predict cancer, find out what datasets are available, and choose the one that works for me.
@syedahafsaali5208 3 years ago
@bioinformaticscoach Thank you so much. May God bless you
@bioinformaticscoach 3 years ago
@syedahafsaali5208 You are welcome. You can also share it with those who need it.
@naveedkhan-fi6ux 1 year ago
Hi dear..... I was following your guide for BRIG, but I am not able to compare my genomes because it shows an error about the genome size being too big. My species' genome size is 41 Mb, so what other tool can I use for genome comparison?
@bioinformaticscoach 1 year ago
You can use Circos. Alternatively, you can book a session with me and we can discuss further.
@purvagohil2240 2 years ago
Is this possible for single-end reads from Ion Torrent?
@bioinformaticscoach 2 years ago
Yes, the procedure can be applied. This paper may help: dl.acm.org/doi/10.1145/3093338.3093362
@ldipotet 2 years ago
Hi Vincent, I was trying out your pipeline and found that in my setup spades.py fails with the --careful option, so I had to run it with the --isolate option, and the result is the same as yours with --careful. I guess it is due to the spades.py version or some other aspect of this environment, BUT in my setup --careful triggers an execution in standard mode and raises various exceptions related to some internal compression processes. I'm new to this kind of ecosystem, so what determines the version of the installed software? In your environment.yaml you never specify any version. In my case I do it in my Dockerfile, which first installs my platform and then tailors everything I need for each specific channel. Thanks in advance; a hint on this would be appreciated.
@bioinformaticscoach 2 years ago
Hi @ldg, thanks for the message. For Anaconda, if you don't specify a software version, it installs the most recent version that is compatible with the other packages. The --careful option works with SPAdes 3.14 and upwards, so if you got the error, then it's likely your SPAdes version is lower than 3.14. Thanks for the suggestion as well.
@ldipotet 2 years ago
@bioinformaticscoach The version I am running: SPAdes genome assembler v3.15.3. About --isolate, the manual says: "This option is not compatible with --only-error-correction or --careful options." Thank you so much for your answer and for clarifying version management in Anaconda.
@bioinformaticscoach 2 years ago
@ldipotet Yes, that's true about the compatibility. So you have to choose.
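Because --isolate and --careful cannot be combined (per the manual text quoted above), a small wrapper can reject that combination before SPAdes starts. The Python sketch below is a hypothetical helper, not part of the tutorial's scripts; it also covers single-end input (-s) and the --iontorrent flag, which bears on the earlier Ion Torrent question.

```python
def spades_command(outdir, r1=None, r2=None, single=None,
                   careful=False, isolate=False, iontorrent=False, threads=4):
    """Build a spades.py argument list (hypothetical helper).

    Rejects --isolate together with --careful, which the SPAdes manual
    says are not compatible, and requires either a read pair or a
    single-end file.
    """
    if isolate and careful:
        raise ValueError("--isolate cannot be combined with --careful; choose one")
    argv = ["spades.py", "-o", outdir, "-t", str(threads)]
    if r1 and r2:
        argv += ["-1", r1, "-2", r2]      # paired-end reads
    elif single:
        argv += ["-s", single]            # single-end reads
    else:
        raise ValueError("give paired reads (r1 and r2) or single-end reads")
    if careful:
        argv.append("--careful")
    if isolate:
        argv.append("--isolate")
    if iontorrent:
        argv.append("--iontorrent")       # required for Ion Torrent data
    return argv
```

For example, spades_command("out", r1="a.fq", r2="b.fq", isolate=True) gives ["spades.py", "-o", "out", "-t", "4", "-1", "a.fq", "-2", "b.fq", "--isolate"]. Check spades.py --version and the manual for your installed release before relying on either option.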
@rajneeshdadwal 1 year ago
I wish to extract the draft genome from the RagTag output. How can I do that?
@bioinformaticscoach 1 year ago
I do this using some Python code. You can modify the extract_reordered.py file and use it to extract the draft sequence. If you still have issues, you can book a session with me.
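A minimal sketch of such an extraction in plain Python (no Biopython) follows. It assumes RagTag's naming, where scaffolds built on a reference sequence get a "_RagTag" suffix, and reuses the filter 'RagTag' in id and ID in id that appears in the extract_reordered.py traceback quoted in the comments; unlike a bare [0] index, it raises a descriptive error when nothing matches. The function names and output ID are hypothetical, and this is not the repository's script.

```python
from pathlib import Path


def read_fasta(path):
    """Minimal FASTA reader: return a list of (record_id, sequence) tuples."""
    records, name, chunks = [], None, []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def extract_scaffold(path, ref_id, out_path, new_id, width=60):
    """Write the RagTag scaffold built on `ref_id` to `out_path` as `new_id`.

    Returns the scaffold length. Note that substring matching means
    ref_id "NC_1" would also match "NC_10_RagTag".
    """
    if not Path(path).is_file():
        raise FileNotFoundError(f"{path} not found; did the RagTag step finish?")
    records = read_fasta(path)
    hits = [seq for rid, seq in records if "RagTag" in rid and ref_id in rid]
    if not hits:
        ids = ", ".join(rid for rid, _ in records[:10])
        raise ValueError(f"no RagTag scaffold matching {ref_id!r}; first IDs: {ids}")
    seq = hits[0]
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    Path(out_path).write_text(f">{new_id}\n" + "\n".join(lines) + "\n")
    return len(seq)
```

If the error lists IDs that do not contain your reference accession, the ID used for matching probably does not match the header of the reference FASTA.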
@anindorahman2600 2 years ago
Hello sir, I have a query. "conda env create -f environment.yaml" isn't working. It stays in the "Solving environment" state for 2-3 days. I have tried everything, but it doesn't work. Can you please help with this?
@bioinformaticscoach 2 years ago
Try updating your conda. If you still have issues, you can book a session with me and we can look at it.
@anindorahman2600 2 years ago
When are you available, sir? I want to book a session... It still doesn't work.
@bioinformaticscoach 2 years ago
@@anindorahman2600 You can request a session here: clarity.fm/vincentappiah
@ayoajayi280 3 years ago
Hello. I love this presentation. I am a beginner; can someone please quickly take me through the system requirements, how I can get or install Linux, and how I can install some of the tools for genome analysis? Thanks.
@bioinformaticscoach 3 years ago
First of all, there are different flavors of Linux (Ubuntu, CentOS, etc.). They are all free to download and install. You can install one in a virtual machine using VirtualBox. Once you do that, send a notice and we'll pick it up from there.
@ayoajayi280 3 years ago
@bioinformaticscoach Thanks. What are the minimum system requirements for analyzing bacterial genomes and installing those tools?
@bioinformaticscoach 3 years ago
@ayoajayi280 I recommend a Core i7 3.40 GHz, 16 GB RAM (32 GB or higher would be great) and 1 TB of storage. I recommend installing Linux as the main operating system instead of the VirtualBox approach.
@nmg1909 3 years ago
@bioinformaticscoach I love your presentation here. I have been searching for a bacterial population dataset for my research, "Biocorrosion detection in structures". I would appreciate it if you could point me to a link where I can get a microbial population dataset. Thanks.
@bioinformaticscoach 3 years ago
@nmg1909 What I do is search for papers on bacterial genome analysis. Usually they list the genomes used, and you can download them. Here is an example of a dataset: cge.cbs.dtu.dk/services/evolution_data.php We can discuss this further on my Facebook page (web.facebook.com/Bioinformatics-Coach-100614805459525) or Twitter (@BioinfoCoach).
@user-zl4rp4cj2o 2 months ago
Hello, I am having issues creating the environment from env.yaml in conda, even after updating conda... it says: warning libmamba Problem type not implemented SOLVER_RULE_STRICT_REPO_PRIORITY. I am using WSL.
@user-zl4rp4cj2o 2 months ago
Could not solve for environment specs. The following packages are incompatible... it's talking about bioperl.
@user-zl4rp4cj2o 2 months ago
I have installed perl, but it's showing the same issue.
@luisrendon5792 2 years ago
Hello, I'm still having problems with the reorder_contigs.sh step... I've repeated the pipeline several times but I can't continue. How can I solve this? Thanks
@bioinformaticscoach 2 years ago
What's the error message that is displayed?
@luisrendon5792 2 years ago
@bioinformaticscoach When I execute ./reorder_contigs.sh I get no results; this is the message: FileNotFoundError: [Errno 2] No such file or directory: 'P7741_reordered/ragtag.scaffolds.fasta'
@bioinformaticscoach 2 years ago
@luisrendon5792 It's likely you missed one of the steps. I would like you to take your time and repeat them. Also, are you running the commands on Linux or macOS?
@graphiomics 2 years ago
I also got an error in this step when I used my own sequences: "ragtag.scaffolds.fasta" is not found. It seems something is wrong with the reference genome. Your help is urgently needed. Thanks a lot for this tutorial.
@sidratahir3645 1 year ago
@bioinformaticscoach
Traceback (most recent call last):
  File "/home/sar/bacterial-genomics-tutorial/extract_reordered.py", line 10, in
    allseq=[i for i in SeqIO.parse(fastafile,'fasta')]
  File "/home/sar/miniconda3/envs/bacterial-genomics-tutorial/lib/python3.10/site-packages/Bio/SeqIO/__init__.py", line 605, in parse
    return iterator_generator(handle)
  File "/home/sar/miniconda3/envs/bacterial-genomics-tutorial/lib/python3.10/site-packages/Bio/SeqIO/FastaIO.py", line 183, in __init__
    super().__init__(source, mode="t", fmt="Fasta")
  File "/home/sar/miniconda3/envs/bacterial-genomics-tutorial/lib/python3.10/site-packages/Bio/SeqIO/Interfaces.py", line 48, in __init__
    self.stream = open(source, "r" + mode)
FileNotFoundError: [Errno 2] No such file or directory: 'P7741_reordered/ragtag.scaffolds.fasta'
This error occurred while running ./reorder_contigs.sh.
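The replies in this thread suggest that an earlier step was skipped. A small Python sketch that checks each step's expected output in pipeline order and reports the first one missing can narrow that down. The polishing and reordering paths come from the error messages in these comments; the assembly path and the function name are hypothetical placeholders to adapt to your own layout.

```python
from pathlib import Path

# Expected output of each step, in pipeline order.
EXPECTED = [
    ("assembly", "P7741_assembly/contigs.fasta"),                   # hypothetical path
    ("polishing", "polishing_process/pilon_stage1.changes"),        # from the polish.sh error
    ("reordering", "P7741_reordered/ragtag.scaffolds.fasta"),       # from the reorder_contigs.sh error
]


def first_missing_step(expected, base="."):
    """Return (step, path) for the earliest step whose output is missing, or None."""
    for step, rel in expected:
        if not (Path(base) / rel).exists():
            return step, rel
    return None


if __name__ == "__main__":
    result = first_missing_step(EXPECTED)
    print("all outputs present" if result is None
          else f"rerun from step '{result[0]}' (missing {result[1]})")
```

Because the check stops at the first gap, it points at the step to rerun rather than at the later scripts that merely fail as a consequence.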
@nickalbbar 11 months ago
First of all, thanks for this very, very helpful video. I was following the pipeline but got stuck on an error at the step where you run the reorder_contigs script. It starts to run, but then I get the following message:
Traceback (most recent call last):
  File "extract_reordered.py", line 13, in
    reordered=[i for i in allseq if 'RagTag' in i.id and ID in i.id][0]
IndexError: list index out of range
Then it doesn't generate P7741.reordered.fasta. I've tried repeating the process but can't find a solution. What should I do?
@bioinformaticscoach 11 months ago
Hi @nickalbbar. Are you running the pipeline on your own dataset or the data provided in the tutorial?
@nickalbbar 11 months ago
@bioinformaticscoach I'm using the dataset provided in the tutorial.
@bioinformaticscoach 11 months ago
Hi @nickalbbar. I am investigating the issue. I will get back to you
@nickalbbar 11 months ago
@bioinformaticscoach OK! Once again, thank you so much.
@bioinformaticscoach 11 months ago
@nickalbbar In the meantime, you can watch this tutorial. I am sure it will be useful: kzbin.info/www/bejne/enzNeIitpaiHeqM
@abdullahijama690 3 years ago
Thanks for your tutorial; I have learnt a lot from it. I had a problem while working through bacterial-genomics-tutorial. When I run conda env create --quiet -f environment.yaml, I get this message:
Solving environment: ...working... failed
ResolvePackageNotFound:
  - sratoolkit
@bioinformaticscoach 3 years ago
I have made a modification to the yaml file. Please run the command again and let me know if it works.
@bioinformaticscoach 3 years ago
You have to download the updated yaml file, or manually edit the yaml file and remove the line with sratoolkit.
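Editing the yaml by hand works; the Python sketch below does the same line removal programmatically. It treats environment.yaml as plain text and drops dependency entries such as "  - sratoolkit" or "  - sratoolkit=2.10", leaving every other line untouched. It is a hypothetical helper, and it does not handle channel-prefixed entries like bioconda::sratoolkit.

```python
import re


def drop_dependency(yaml_text, package):
    """Remove list entries for `package` (optionally version-pinned) from
    environment.yaml text. Plain line filtering; no YAML parser needed.
    """
    # Matches "  - name", "  - name=1.2", "  - name>=1.2", but not "  - name-extra".
    pattern = re.compile(rf"^\s*-\s*{re.escape(package)}\s*(?:[=<>!].*)?$")
    return "".join(
        line for line in yaml_text.splitlines(keepends=True)
        if not pattern.match(line.rstrip("\n"))
    )
```

Usage: text = Path("environment.yaml").read_text(), then Path("environment.yaml").write_text(drop_dependency(text, "sratoolkit")), and rerun conda env create -f environment.yaml.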
@sheynjila2457 1 year ago
I am encountering the following problems when installing the Python packages:
bacterial-genomics-tutorial> conda env create --quiet -f environment.yaml
Retrieving notices: ...working... done
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... failed
ResolvePackageNotFound:
  - porechop
  - mash
  - samtools
  - spades
  - perl-db-file
  - roary
  - sra-tools
  - perl-padwalker
  - sickle-trim
  - bwa
  - mafft
  - minimap2
  - mummer
@bioinformaticscoach 1 year ago
Try updating your conda before installing the packages.
@sheynjila2457 1 year ago
@bioinformaticscoach Thanks for the reply. I have updated conda, but nothing has changed.