WGS Variant Calling: Variant calling with GATK - Part 1 | Detailed NGS Analysis Workflow

45,478 views

Bioinformagician


Comments
@fesehaabebe-akele2684 1 year ago
Excellent work!! What makes this young lady stand out is that she started with raw data and showed every step in clear terms. I have seen many "tutorials", and almost all started with munged data without showing what was done or how, leaving the beginner at a loss. Highly commendable work!
@enricoperez-kp8zc 1 year ago
Oh my god! You are awesome. You are the best. I have finally found someone who can explain the theory of what needs to be done, then do it on the command line, and then explain why each term is used. The pace is good, but I agree with others and would suggest giving a bit more time to see what is written in the terminal. I learned more from watching your videos on GATK variant calling and analysis over the past two days than in the past two years of trying to do it by myself. You make writing a script so easy, with explanations along the way. Please keep doing these types of tutorials so we can all benefit from them. They make my life easier. My next tutorial with you is on RNA-seq. I have already used STAR, and it would be nice if you could do a tutorial on it, but if you think HISAT2 is the way to go then I am with you. Any tutorial on HISAT-3N coming soon on your channel? It would also be nice to explain how to install HISAT-3N step by step. I am still in limbo with this one.
@kubracelikbas6606 1 year ago
It would be very nice if you could also provide a somatic variant analysis workflow.
@taniadas3301 2 years ago
I was waiting for this tutorial for so long 🥺
@PsycheSnacks657 2 years ago
Many, many thanks! You are helping a lot of folks out there. 😀
@jeetnanshi4357 10 months ago
Hi, great informational content in the video. Just one suggestion: could you use a pointer or animations at each step so that it becomes more interactive and we can keep track of what you are talking about? Thanks.
@tiagominuzzi5499 1 year ago
Great video! Thank you very much. I'm starting to work in this field, and this video helped me a lot.
@srushtikunsavalikar3307 8 months ago
You are helping a lot! Thank you so much 🤗
@emantoraih7700 2 months ago
You are the best. Is there any possibility of making similar videos for somatic mutations? Thanks.
@mockondo3011 10 months ago
Great tutorial, the best I've seen. I just had one quick question: what if I don't have a known variants file to perform base quality recalibration? Should I just continue with the next steps using the deduplicated BAM file? I hope you can help me with this. In any case, thanks for the magnificent overview.
@dongholee2144 3 months ago
Did you figure this out?
@Sjjeien 2 months ago
Hi, I was an aspiring bioinformatician. But watching all these tutorials, I feel that I would surely struggle to get a job in this competitive field, even though I learned Python and R for this master's. I think it's better, as we are competing with CS grads rather than life-science grads... So due to this uncertainty, I'm about to take microbiology / biochemistry... Not all our dreams come true. For me, bioinfo was one such dream.
@kumarparas5700 9 months ago
Excellent video, ma'am 👍
@nikver1102 1 year ago
This is high-quality content! Very helpful and crystal clear! Thanks so much! I was wondering... could we use fastp instead of FastQC? I think you can also trim with fastp if needed... is that correct? Again, thank you for this content, it is magical 😀
@Bioinformagician 1 year ago
I haven't used fastp before. If it does the same thing as FastQC and allows you to perform trimming as well, then go for it.
@RaviKumar-wg9zm 1 year ago
Thank you, Bioinformagician ma'am.
@Dalibenamor-j8f 17 days ago
Thank you for this very instructive tutorial. Do we use exactly the same pipeline (packages and tools) to process WES data? Thank you in advance.
@harshasatuluri4540 2 years ago
Thanks, madam, I was waiting for this tutorial!
@QueenieTsang-h6g 1 year ago
Thanks for this very informative tutorial! I am trying to run the bwa index reference step, and it looks like it is stuck on the "[bwa_index] construct SA from BWT and Occ ..." step. How long does this step usually take to run?
@javiflaja4063 1 year ago
It depends on the computational power you have and the size of the reference genome... For a human reference genome, it usually takes between 30 min and 1 hr.
@BilalAhmad-gb7ui 2 years ago
It would be great if you could make another part on producing publication-ready graphs and things like annotations from the raw VCF file.
@Bioinformagician 2 years ago
Currently working on it!
@kavoosmomeni4165 9 months ago
Very helpful and informative, thank you!
@ayeshatariq8094 2 years ago
Hi, thank you so much. Can you please make a video about how to compare two or more VCF files? E.g. I want to extract variants that are different in a sample vs. a group.
@Bioinformagician 2 years ago
Thanks for the suggestion, will surely plan a video on it.
@isadoramachadoghilardi3168 9 months ago
Thank you very much for this wonderful explanation!! I have a question: can I identify CNVs in exome sequencing? I love your videos!
@RaviKumar-wg9zm 1 year ago
You are doing a great job, ma'am. That was really helpful 😊👍
@andreslavore3928 5 months ago
Excellent video!! How do you run GATK for polyploid data for which variant calling was done using FreeBayes? How do you extract several different haplotypes from this kind of data? Thanks.
@johirislam8174 1 year ago
Nice. So do you have a complete pipeline for WGS data analysis from beginning to end? I mean from sequencing FASTQ data to variant calling?
@ОПривет-ъ2ъ 1 year ago
Amazing tutorial! Thanks so much for your work!
@kajalpanchal8239 2 years ago
Hey! You are really helping a lot! Thaaaank you so much!
@moluscosm 1 year ago
I have a question: can this demo workflow be run on a computer with 16 GB of RAM?
@dorrarjaibi 1 year ago
Thank you for the explanation ❤ I have a question: if I'm using GATK to detect SNPs in a virus, for BQSR do I use the resources from the GATK resource bundle for human, or ones for the virus? I didn't find anything on the GATK website.
@MuhammadUsama-u2s 12 hours ago
This tutorial is very nice. I just have one question: my VCF file does not contain the dbSNP IDs. Is that normal, or am I making a mistake?
@ayushsafar6289 2 years ago
You are really helping a lot! Thaaaank you so much!
@AnastasiaPetukhova-o9v 1 year ago
Hi! Could you please make another video on how to call germline CNV and CV, or point me to a good tutorial on this topic?
@Sumit516 4 months ago
Can this pipeline be used for variant calling in bacterial genomes? Bacterial genomes are haploid, as opposed to those of eukaryotes.
@anikeshbanik345 1 month ago
How much time does the alignment-to-reference step take? I am learning this to research somatic variants in a breast cancer cell line.
@ryanwelch2831 5 months ago
How long did the GATK BaseRecalibrator step take to make the table? Thank you!
@Subhash_mahamkali 5 months ago
Is the base quality recalibration step very important? I am using this pipeline on a sorghum WGS data set. I first called variants without this step (filtering is done). I then used that same VCF file for the BQSR step and, for some odd reason, there are no SNPs at all in my .g.vcf file.....
@himanisareen205 1 year ago
Do you have any video that covers the prerequisites you mentioned, like Linux coding and so on?
@ramachandran8106 2 years ago
Hi, could you upload GWAS tutorial videos, please.......
@moluscosm 1 year ago
Does anyone else get "failed: Operation timed out." when trying to download the reads...???
@ДмитрийХолмс-щ9р 1 year ago
Thank you, dear 😊
@nourlarifi1689 1 year ago
Thank you again for your videos. Could we apply this workflow to scRNA-seq data?
@kathy_kath 1 year ago
Hi. Thanks for the tutorial, it helps me a lot! I have a question: how many GB is the resulting .sam file?
@stemcell1167 1 year ago
Hi, BQSR is too slow. Any idea how to speed it up? I am using the latest version of GATK.
@Harshraj19988 1 year ago
Perfect. Good job.
@rajeshsingh-xv7wy 1 year ago
Ma'am, when are you going to upload the same material for somatic mutation calling?
@hassanchoudhary9594 6 months ago
How can we set the ploidy argument in GATK? Can you share the script for that? Thanks.
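On the ploidy question, a minimal sketch: GATK4's HaplotypeCaller accepts a `--sample-ploidy` argument (default 2). The file paths below are hypothetical placeholders, not the ones used in the video, and the command is only printed (a dry run), not executed.

```shell
# Hypothetical paths; replace them with your own files.
ref="ref/genome.fa"
bam="aligned/sample_sorted_dedup.bam"
out="results/sample_raw_variants.vcf"

# 1 would suit a haploid genome (e.g. a bacterium); GATK's default is 2.
ploidy=1

# Build the command as a string and print it instead of running it.
cmd="gatk HaplotypeCaller -R $ref -I $bam -O $out --sample-ploidy $ploidy"
echo "$cmd"
```

To actually run it, GATK must be installed and on the PATH; whether the rest of the pipeline (for example BQSR resources) suits a non-human organism is a separate question this sketch does not address.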
@lauras7670 1 year ago
Hi, thanks so much for your video. When I run HaplotypeCaller I get the error "Unable to trim uncertain bases without flow order information" and I cannot find anything about it on the GATK website. My .bam files are validated... I was wondering if you happen to know how to solve this problem. Thanks!
@tarkkrloglu2406 1 year ago
Hi, thank you for the tutorials. I have one question. Is there any difference between WES and WGS data analysis? Can I use this workflow for WES data analysis?
@Bioinformagician 1 year ago
There are some tweaks you'd have to do for WES data. Check out this thread: gatk.broadinstitute.org/hc/en-us/community/posts/4411453286811-Is-there-different-pipelines-between-WGS-variant-calling-and-WES-variant-calling-
@miannuman8914 2 years ago
Hello, you are doing a great job, but please type the commands in the terminal a little more slowly so we can follow them easily. Hope you understand my point.
@sakshimehta5467 7 months ago
Hi, could you please tell us how to remove genome duplication in our sample?
@leearmstrong1581 2 years ago
Could you make a video on using the limma package for gpr files from GEO? It would be so helpful. Or do you do private tuition? Thanks a million!
@taniadas3301 2 years ago
I would like to attend too if she offers tuition.
@Bioinformagician 2 years ago
I will surely consider making a video using the limma package and no, I do not provide private tutoring or tutoring of any type. Thanks!
@pratibhagour4336 2 years ago
Thanks a ton for putting up this video. Could you please make videos on bisulfite sequencing data analysis and also ChIP-seq data analysis? Thanks in advance :)
@Bioinformagician 2 years ago
Sure, definitely have plans to make videos covering these topics :)
@silvereyes000 4 months ago
Where did she get the reference genome from? She didn't mention anything about it. Does anyone else know?
@Grzegorz-f1b 6 months ago
❤❤❤ Thank you for this lesson 👌💪🙏
@beatricefulton4581 1 year ago
This is really helpful, thank you! Your videos are amazing. Unfortunately, I am having trouble getting CreateSequenceDictionary to work. It says: "Neither file nor parent directory exist".
@nourlarifi1689 1 year ago
I have the same issue. How did you solve it, please?
@diegopolancoalonso4507 1 year ago
Me too, did you find a way to solve it? @nourlarifi1689
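One plausible cause of this message (an assumption, since the thread does not show the exact command) is that the reference path or the output folder passed to the tool does not exist, for example because of a typo or an unset shell variable. A small pre-flight check, with a hypothetical helper name `preflight` and hypothetical paths, can rule that out before GATK is called:

```shell
# preflight REF OUT: succeed only if REF is readable and OUT's folder exists.
preflight() {
    ref_path="$1"
    out_path="$2"
    if [ ! -r "$ref_path" ]; then
        echo "reference not found or unreadable: $ref_path"
        return 1
    fi
    out_dir=$(dirname "$out_path")
    if [ ! -d "$out_dir" ]; then
        echo "output directory does not exist: $out_dir"
        return 1
    fi
    echo "ok"
}

# Hypothetical location of the reference FASTA.
ref="/home/user/demo/supporting_files/hg38/hg38.fa"
dict="${ref%.fa}.dict"

# Dry run: print the GATK command only when both paths look sane.
if preflight "$ref" "$dict"; then
    echo "gatk CreateSequenceDictionary -R $ref -O $dict"
fi
```

If the check fails, printing the variables with `echo` right before the GATK call is usually the quickest way to spot an empty or misspelled path.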
@vinaydeep26 1 year ago
GATK HaplotypeCaller runs very slowly. Any tips to make it run faster?
@kubracelikbas6606 1 year ago
Thanks so much 🌸
@desaishailesh3527 1 year ago
HaplotypeCaller is taking a long time. Is there a problem? I am using GATK 4.0, and it is going through each chromosome and position. I have a 32 GB system, yet after more than 3 hours it still has not finished chromosome 2 of one whole genome.
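A common way to shorten wall-clock time is to split the work by genomic interval: HaplotypeCaller's `-L` argument restricts a run to one region, so several chromosomes can be processed in parallel jobs. The sketch below only prints the commands; the helper name `scatter_cmds`, the paths, and the chromosome list are hypothetical.

```shell
# scatter_cmds REF BAM OUTDIR CHR...: print one HaplotypeCaller command per chromosome.
scatter_cmds() {
    sc_ref="$1"
    sc_bam="$2"
    sc_out="$3"
    shift 3
    for chr in "$@"; do
        echo "gatk HaplotypeCaller -R $sc_ref -I $sc_bam -L $chr -O $sc_out/$chr.vcf.gz"
    done
}

# Hypothetical inputs; chromosome names must match the contig names in the reference.
scatter_cmds ref/genome.fa aligned/sample.bam results chr1 chr2 chr3
```

Each printed line could be submitted as its own job, and the per-chromosome VCFs combined afterwards (GATK ships a MergeVcfs tool for that). How many jobs fit at once depends on the available cores and memory, which this sketch does not try to estimate.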
@NishantShekhar-k5m 1 year ago
Could you please tell me... is 16 GB of RAM enough for running GATK?
@riyakokate9333 2 years ago
How do I add an executable to the PATH? I am using Windows for running this. Would you please tell me how to use Docker to run these commands?
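For the PATH part of the question, a minimal sketch for a Unix-like shell (on Windows this would be typed inside WSL or a Linux container; the Broad Institute also publishes a `broadinstitute/gatk` Docker image). The helper name `add_to_path` and the install folder are hypothetical.

```shell
# add_to_path DIR: append DIR to PATH for the current session if it is not already listed.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                          # already present, nothing to do
        *) PATH="$PATH:$1"; export PATH ;;
    esac
}

# Hypothetical folder where the GATK release was unpacked.
add_to_path "/home/user/tools/gatk"

# Report where the shell now finds gatk, if anywhere.
command -v gatk || echo "gatk not found on PATH"
```

The change lasts only for the current terminal session; putting the same `add_to_path` call in a shell startup file such as `~/.bashrc` would make it persistent.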
@kabongontumba9492 8 months ago
Amazing, amazing!
@UmerBaig117 2 years ago
At 37:27, you did not use the $ sign with the variable data: "{data}/recal_data.table". Will this variable work without the $ sign? Also, can you explain the difference between running HaplotypeCaller on one sample and running HaplotypeCaller with -ERC GVCF mode, and when to use which?
@Bioinformagician 2 years ago
GATK recommends running HaplotypeCaller in GVCF mode when you are processing more than one sample. With just one sample, running HaplotypeCaller normally should be sufficient.
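On the `$` question, which the reply does not cover: in a POSIX-style shell a variable is substituted only when it is written as `$data` or `${data}`; `{data}` without the `$` stays literal text. The sketch below shows that, and also prints (without running) the two HaplotypeCaller modes being discussed. All paths are hypothetical and are not taken from the video.

```shell
# Hypothetical project layout; substitute your own paths.
data="/home/user/demo/data"
results="/home/user/demo/results"
ref="/home/user/demo/ref/genome.fa"
bam="/home/user/demo/aligned/sample.bam"

# Without the $, the braces are plain text and nothing is substituted.
literal="{data}/recal_data.table"
# With the $, the shell inserts the value of the variable.
expanded="${data}/recal_data.table"
echo "literal : $literal"
echo "expanded: $expanded"

# Dry run: the two HaplotypeCaller modes, printed rather than executed.
single_sample="gatk HaplotypeCaller -R $ref -I $bam -O ${results}/raw_variants.vcf"
gvcf_mode="gatk HaplotypeCaller -R $ref -I $bam -O ${results}/sample.g.vcf.gz -ERC GVCF"
echo "$single_sample"
echo "$gvcf_mode"
```

The first two lines print `literal : {data}/recal_data.table` and `expanded: /home/user/demo/data/recal_data.table`. In GVCF mode each sample gets its own intermediate file, and the samples are genotyped together in a later step.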
@Hypertyz 1 year ago
Thanks a lot :D
@GüllüElifÖzdemir 1 year ago
Hello, I am having a serious problem. I followed your video from start to finish. However, at the FastQC step I constantly get an error saying that the middle line didn't start with + for the SRR062634_2_filt.fast.gz file. Then, instead of performing this step on the command line, I ran it manually in the FastQC tool. The process completed, but this time it reported a problem with the per-base sequence quality of the SRR062634_1 file. So I couldn't get the same FastQC results as you. I continued with the remaining steps, but after alignment I could not get any results from the samtools view and flagstat steps. Could you please help me solve this problem? I would also like to thank you for preparing such nice content.
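The "middle line didn't start with +" message refers to the FASTQ layout, in which every record has four lines: a header starting with `@`, the sequence, a separator starting with `+`, and a quality string as long as the sequence. A file that breaks this pattern is often incomplete or corrupted (an interrupted download is one possible cause, though the thread does not confirm it). The sketch below, with a hypothetical helper name `check_fastq`, reports the first record that violates the layout.

```shell
# check_fastq FILE: report the first place where the 4-line FASTQ layout is broken.
# Reads plain or gzip-compressed files; returns 0 if the layout looks fine, 1 otherwise.
check_fastq() {
    fq_file="$1"
    fq_reader="cat"
    case "$fq_file" in
        *.gz) fq_reader="gzip -dc" ;;
    esac
    $fq_reader "$fq_file" | awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" {
            printf "line %d: header does not start with @\n", NR; bad = 1; exit
        }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 3 && substr($0, 1, 1) != "+" {
            printf "line %d: separator does not start with +\n", NR; bad = 1; exit
        }
        NR % 4 == 0 && length($0) != seqlen {
            printf "line %d: quality length differs from sequence length\n", NR; bad = 1; exit
        }
        END {
            if (!bad && NR % 4 != 0) {
                printf "file ends mid-record (%d lines)\n", NR; bad = 1
            }
            exit bad
        }'
}
```

It would be called as `check_fastq reads.fastq.gz`, where the file name is a placeholder. For compressed files, `gzip -t` on the file is a quick separate test of whether the archive itself is intact; if either check fails, downloading the reads again is the usual first step.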
@naveedkhan-fi6ux 2 years ago
This is great work... it's great to watch... Would you please also make a video on the 3000 Rice Genomes Project, explaining how to analyse SNP variation/haplotype variation within the 3000 rice lines (not just comparing one line with the reference)? I also want to perform haplotype analysis for my target gene using the 3000 Rice Genomes Project data and check the haplotype diversity within the 3000 rice lines. Does anyone have ideas on this? I would appreciate it if someone could help me out.
@mananchelvan157 1 year ago
Hi, any updates on doing SNP analysis on the 3000 lines?
@naveedkhan-fi6ux 1 year ago
@mananchelvan157 No, still no update. Maybe this is not her area of interest. I also really need this analysis.
@ayushsafar6289 2 years ago
what about SRR __ 2 ?
@RuqaiyaTasneem-z5w 6 months ago
Your bases at the end are of low quality, so why not trim them?
@ayushsafar6289 2 years ago
Could you help with ancestry?
@nishapaudel5572 1 month ago
Getting started with GATK4. GATK - properly pronounced "Gee-ay-tee-kay" (/dʒi•eɪ•ti•keɪ/) and not "Gat-kay" (/ɡæt•keɪ/) - stands for Genome Analysis Toolkit.
@MsZhang666 2 years ago
Really really really helpful!!!!! I succeeded!!! hhhhh😘
@wajidiqbalwajidiqbal13 2 years ago
It would be great if you could make another part on producing publication-ready graphs and things like annotations from the raw VCF file.
@Bioinformagician 2 years ago
Thanks for the suggestion!