2. Local Alignment (BLAST) and Statistics

124,965 views

MIT OpenCourseWare

MIT 7.91J Foundations of Computational and Systems Biology, Spring 2014
View the complete course: ocw.mit.edu/7-9...
Instructor: Christopher Burge
In this lecture, Professor Burge reviews classical and next-generation sequencing. He then introduces local alignment (BLAST) and some of the associated statistics.
License: Creative Commons BY-NC-SA
More information at ocw.mit.edu/terms
More courses at ocw.mit.edu

Comments: 49
@sfmambero 5 years ago
56:45 statistics of alignment
@DeanJordan-qp8kw 7 months ago
Timestamp: 33:01 Goal by End of Session: 33:01 edit: complete! :) coming back in one hour to finish the lecture
@hgz11 2 years ago
good question by the young lady...helped me clarify things in my head.
@sfmambero 5 years ago
31:30 Local alignment
@coltonhenry3636 1 year ago
The girl from 43:14 with her hand up is so f☠️ing relatable 😂😂
@phartatmisassa5035 9 years ago
I was thinking about ways to optimize the matching algorithm in assembly, and this is what I came up with.

Let Mx, Nx, My, and Ny be CPU registers of 64 bits each. Recall that the base pairs are A-T and G-C, that A and G are purines, and that T and C are pyrimidines.

Let registers Mx and Nx record whether the base at bit position i belongs to the A-T pair (0) or the G-C pair (1).
Let registers My and Ny record whether the base at bit position i is a purine, A or G (0), or a pyrimidine, T or C (1).

Each base is then represented by bit i of two registers: Mx and My together hold one sequence, and Nx and Ny hold the other sequence we are comparing it to.

Let sequence M be: ACTCCGAT... (just to keep things short)
Reg Mx would equal: 01011100...
Reg My would equal: 01111001...
I hope that makes sense; if not, just ask and I will try to clarify.

Let sequence N be: GCTAAGAT...
Reg Nx would equal: 11000100...
Reg Ny would equal: 01100001...

Now the fun part: you can compare a sequence (or sub-sequence) of 64 bases at once with these instructions (MMX might make things faster if you can do bitwise instructions with it; I haven't got there yet in my studies). This is the formula I worked out to get a 1 for a match and a 0 for a mismatch at each base/bit position (~ is not, ^ is xor, | is or):

~ ( (Mx ^ Nx) | (My ^ Ny) )

In pseudo x64 assembly, with Mx in r8, Nx in r9, My in r10, and Ny in r11, that would be:
-----------
xor r8, r9    ;; r8 = Sx = Mx ^ Nx, 1 at bit i if mismatch
xor r10, r11  ;; r10 = Sy = My ^ Ny, 1 at bit i if mismatch
or  r8, r10   ;; 1 at bit i if mismatch in Sx, Sy, or both
not r8        ;; x64 not takes a single operand and inverts it in place
mov rax, r8   ;; rax now has 1 in bit i if there was a match at bit/base i
------------
If the CPU supports the popcnt instruction (on Intel it arrived alongside SSE4.2, and it has its own CPUID feature flag), it returns the number of 1s in a register, something like:
------------
popcnt rbx, rax ;; puts the number of matches between Seq M and Seq N in rbx
-------------
You can then divide the number of matches (in rbx) by 64 and multiply by 100 to get the percentage of matching bases in Seq M and Seq N. You can also do bit scanning or masking to evaluate exactly which bases matched, since that is already represented in rax.

You can "clap" together 64 bases at once using this method, though it takes some preprocessing of a sequence represented as chars (bytes in ASCII). You can also speed it up by parallelizing over many sub-sequences on different cores, or maybe use GPGPU with OpenGL/CL. The same thing can be done in C: just use an unsigned 64-bit integer such as uint64_t (unsigned long is only 32 bits on some platforms). Don't use signed types; it gets confusing and the two's complement will most probably screw things up.
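For readers who want to try the idea above without writing assembly, here is a minimal Python sketch of the same two-bit encoding and the formula ~((Mx ^ Nx) | (My ^ Ny)). The function names (`pack`, `match_mask`, `count_matches`) are made up for illustration and are not from the lecture or the comment. Python integers are unbounded, so the sketch masks the result down to the bits actually in use.

```python
# Two-bit encoding described in the comment:
# x bit = 0 for A or T, 1 for G or C;
# y bit = 0 for purines (A, G), 1 for pyrimidines (T, C).
X_BIT = {"A": 0, "T": 0, "G": 1, "C": 1}
Y_BIT = {"A": 0, "G": 0, "T": 1, "C": 1}


def pack(seq):
    """Pack up to 64 bases into two integers (x, y), first base in the highest used bit."""
    if len(seq) > 64:
        raise ValueError("at most 64 bases fit in one 64-bit word")
    x = y = 0
    for base in seq:
        x = (x << 1) | X_BIT[base]
        y = (y << 1) | Y_BIT[base]
    return x, y


def match_mask(m, n):
    """Return an integer with a 1 bit at every position where m and n agree."""
    if len(m) != len(n):
        raise ValueError("sequences must have the same length")
    mx, my = pack(m)
    nx, ny = pack(n)
    used = (1 << len(m)) - 1  # only the low len(m) bits carry bases
    return ~((mx ^ nx) | (my ^ ny)) & used


def count_matches(m, n):
    """Population count of the match mask (the job popcnt does in hardware)."""
    return bin(match_mask(m, n)).count("1")


if __name__ == "__main__":
    m, n = "ACTCCGAT", "GCTAAGAT"
    print(format(match_mask(m, n), "08b"), count_matches(m, n))
```

One design point worth noting: when fewer than 64 bases are loaded, the unused bit positions are zero in both sequences and would be counted as matches, which is why the sketch applies the `used` mask before counting and why a percentage should be taken over the real sequence length.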
@5050view 7 years ago
@Rohan Patgiri: What's your problem dude? People are here to learn and discuss the lecture. You are not in the gossip section!
@ProfBeckmann 3 years ago
Is it much faster, coding it in assembly?
@Mahi-nw5vh 2 years ago
Be my sensei
@ProfBeckmann 3 years ago
Thanks! Awesome lecture. Nice to see the dideoxy NTP structures shown - super helpful! I would like to hear more about the errors each methodology is prone to. It would also be nice to see a read length table for each technology. Can you make a video explaining how to set up and sort the barcoding reactions referenced? How long are the barcodes?
@josephmargaryan 3 months ago
But the BLAST algorithm is more complex than this, right? It uses k-mers to find exact matches, and even then, it uses hash lookups to extend the seeds. I would have preferred him to talk about the seed extension algorithm, such as how and when the seed extension stops. Does it consider match, mismatch, and gap penalties during the extension, and does it keep extending even if no match is found? What about the affine gap model? When does it make sense to open a gap even though it incurs a higher penalty than extending one?
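As a rough illustration of the questions in this comment, the sketch below shows one common way to describe seed-and-extend: index the subject's k-mers, look up exact word hits from the query, and extend each hit without gaps in both directions until the running score falls more than a fixed amount (an "X-drop") below the best score seen. This is a simplified toy, not NCBI's implementation; the function names, the word size `k=4`, the scores `+1/-3`, and `xdrop=5` are all illustrative assumptions. Gapped extension with affine penalties, which BLAST applies in a later stage, is not covered here.

```python
from collections import defaultdict


def kmer_index(subject, k):
    """Map every k-mer of the subject to the list of positions where it starts."""
    index = defaultdict(list)
    for j in range(len(subject) - k + 1):
        index[subject[j:j + k]].append(j)
    return index


def extend(query, subject, qi, sj, step, match, mismatch, xdrop):
    """Ungapped extension in one direction; returns (best_score, length_at_best)."""
    score = best = 0
    length = best_len = 0
    i, j = qi, sj
    while 0 <= i < len(query) and 0 <= j < len(subject):
        score += match if query[i] == subject[j] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > xdrop:
            break  # score dropped too far below the best: stop extending
        i += step
        j += step
    return best, best_len


def seed_and_extend(query, subject, k=4, match=1, mismatch=-3, xdrop=5):
    """Return sorted (score, query_start, subject_start, length) ungapped hits."""
    index = kmer_index(subject, k)
    hits = set()
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], ()):
            right, rlen = extend(query, subject, i + k, j + k, 1, match, mismatch, xdrop)
            left, llen = extend(query, subject, i - 1, j - 1, -1, match, mismatch, xdrop)
            score = k * match + left + right
            hits.add((score, i - llen, j - llen, k + llen + rlen))
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    for hit in seed_and_extend("ACGTACGT", "TTACGTACGTTT"):
        print(hit)
```

In this sketch the extension keeps going through mismatches as long as the running score stays within `xdrop` of the best score so far, and the reported segment is trimmed back to the point where that best score was reached.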
@coconutade111 8 years ago
You guys are awesome! thank you for providing this class!
@rabiafidan1502 5 years ago
In Illumina sequencing, we add four dNTPs and one of them is attached, then we reverse termination and add four dNTPs again. Think about a situation where there are C repeats, for example CCCCACCCTCCCCCGCCCCCG. In this situation, the concentration of dGTPs will be lower than that of the other dNTPs. Doesn't this increase the risk of error on Cs and misincorporation? Each time you need dGTP, its concentration is lower.
@luisheribertogarciaislas1785 7 years ago
Thank you for sharing! I'm new to bioinformatics and I find it really interesting!
@JRush374 7 years ago
Luis Heriberto Garcia Islas I just got done with this bioinformatics lecture series from UC Davis and it was really good. Fundamental Algorithms in Bioinformatics: kzbin.info/aero/PL_w_qWAQZtAbh8bHfzXYpdnVKCGCDmmoL
@luisheribertogarciaislas1785 7 years ago
Thank you Josh! I'll start to take a look at it right away! Regards!
@EDUARDO12348 9 years ago
Break the query into words of 2-3 letters and find where the best matches lie. Then evaluate based on the best fits.
@kishorkumarsarker8270 4 years ago
Could you please help me find the previous lecture, meaning the first lecture?
@Ilya_Sss 4 years ago
Here is the first lecture: kzbin.info/www/bejne/onvdqpV7jdJ8oJI
@sumitkumar-el3kc 4 years ago
1:01:39 Does Pi and Rj being 1/4 mean we're assuming an equal distribution of all four bases in the query as well as the subject?
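Setting every Pi and Rj to 1/4 does correspond to assuming that all four bases are equally frequent in both sequences. As a hedged illustration of what that assumption feeds into, the sketch below solves the standard Karlin-Altschul condition, sum over i,j of Pi * Rj * exp(lambda * s_ij) = 1, for its positive root by bisection. The function name `solve_lambda`, the bracketing bounds, and the +1/-1 scoring are assumptions chosen for this example. With uniform frequencies and +1/-1 scores the equation reduces to (1/4)e^λ + (3/4)e^(-λ) = 1, whose positive solution is λ = ln 3 ≈ 1.0986.

```python
import math


def solve_lambda(p, r, score, hi=10.0, tol=1e-12):
    """Positive root of sum_ij p[i] * r[j] * exp(lam * score(i, j)) = 1, by bisection.

    p and r map each letter to its frequency in the query and the subject.
    The expected score must be negative for a positive root to exist.
    """
    def f(lam):
        total = sum(p[a] * r[b] * math.exp(lam * score(a, b)) for a in p for b in r)
        return total - 1.0

    lo = 1e-6  # start just above the trivial root at lam = 0
    if f(lo) >= 0 or f(hi) <= 0:
        raise ValueError("no positive root bracketed; check scores and frequencies")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    uniform = {base: 0.25 for base in "ACGT"}
    lam = solve_lambda(uniform, uniform, lambda a, b: 1 if a == b else -1)
    print(lam, math.log(3))
```

Passing different dictionaries for `p` and `r` would model a query and a subject with unequal base composition, in which case lambda generally moves away from ln 3.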
@NirvanSengupta 10 years ago
Good overview of traditional Sanger sequencing and next-generation sequencing technologies like Roche 454, Illumina, ...
@soheilh 4 years ago
Thank you
@Yzhang250 6 years ago
When the lady asked whether we can do something better than m times n, I was thinking about KMP. I know KMP is for exact matching, but can we modify it to make it work?
@marwanmohamed6575 6 years ago
What kind of statistics should I learn to understand this? I know nothing about statistics.
@mitocw 6 years ago
One of these two prerequisites for this course might help: 18.440 Probability and Random Variables, 6.041 Probabilistic Systems Analysis and Applied Probability. See the course on MIT OpenCourseWare for more details: ocw.mit.edu/7-91JS14. Best wishes on your studies!
@bikidas6690 4 years ago
Any book for bioinformatics?
@Joeythegoats 3 years ago
"Bioinformatics and Functional Genomics" is the best so far
@sumitkumar-el3kc 4 years ago
How is it O(mn) when the last bases of the query are never touching the first bases of the subject? Am I missing something??
@sumitkumar-el3kc 4 years ago
Or do we also go right to left in the alignment after left to right? Please help anyone??
@turnerburchard5272 4 years ago
@@sumitkumar-el3kc O notation isn't that precise. You could say it is something like O(mn-n), but big-O drops lower-order terms, so the difference is inconsequential and we simplify it to O(mn). In practice, the algorithm won't take exactly mn computations, but it will be on the same scale, so the estimate is easy.
@sumitkumar-el3kc 4 years ago
@@turnerburchard5272 understood, thank you, I'm new to these concepts.
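The O(mn) estimate in this thread can be checked by counting comparisons directly. The sketch below is an assumption about the procedure rather than a transcription of the lecture: it slides the query across the subject at every possible offset (every diagonal, including partial overlaps at both ends), keeps the best-scoring ungapped segment, and counts how many base-to-base comparisons it makes. The function name and the +1/-1 scores are illustrative. Under that assumption every query base meets every subject base exactly once, so the count is exactly m * n.

```python
def best_ungapped_local(query, subject, match=1, mismatch=-1):
    """Return (best_segment_score, cells_compared) over all diagonals."""
    m, n = len(query), len(subject)
    best = 0
    cells = 0
    # d = j - i identifies a diagonal; there are m + n - 1 of them.
    for d in range(-(m - 1), n):
        run = 0
        i = max(0, -d)
        j = i + d
        while i < m and j < n:
            step = match if query[i] == subject[j] else mismatch
            run = max(0, run + step)  # restart the segment when the score goes negative
            best = max(best, run)
            cells += 1
            i += 1
            j += 1
    return best, cells


if __name__ == "__main__":
    score, cells = best_ungapped_local("ACGT", "TTACGTTT")
    print(score, cells, 4 * 8)
```

If the partial overlaps at the ends were skipped instead, the count would be somewhat smaller than m * n, but still proportional to it, which is the sense in which the running time is O(mn).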
@MrAntihumanism 10 years ago
Very good to listen to.
@phartatmisassa5035 9 years ago
Thank you!
@maheribrahim685 3 years ago
i'm a biology major with no cs background and very little stat and math knowledge, is it okay if i get nothing at all, and how to change that?!
@josephmargaryan 3 months ago
What do you mean by you get nothing at all? Do you not understand anything at all?
@michaelchoi5880 7 years ago
Thank you very much
@AhmedIbrahim-co9rw 5 years ago
What book did the professor use in his lectures? Is there a textbook, or are slides available, to facilitate studying? Thanks
@mitocw 5 years ago
Visit the course on MIT OpenCourseWare to see all the materials (readings, lecture notes, assignments, projects) available for the course at: ocw.mit.edu/7-91JS14. Best wishes on your studies!
@shubhrabhattacharya223 6 years ago
Where can I get the pdf of the book he referred to?
@mitocw 6 years ago
Visit the course on MIT OpenCourseWare for the materials at: ocw.mit.edu/7-91JS14.
@gurpchirp 2 years ago
yo i am fucking lost. can someone list prereqs?
@josephmargaryan 3 months ago
I am taking this course right now. It says it's intended for people with no background in biology. In one month, I will be at the oral exam, where I will have to choose one out of 8 topics for me to have 20 minutes of preparatory time before going in and giving a presentation. Oh snap
@ZortLF2 6 years ago
What a dismally bad lecture. Blasted through the biology stuff dropping jargon in the first lecture of the semester, wasted time splitting hairs for the algorithm stuff. Why spend 10 minutes talking about lambda just to conclude with saying it's not important???
@josephmargaryan 3 months ago
It is important. It scales the score
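To make "it scales the score" concrete, here is a small sketch of the standard Karlin-Altschul expressions in which lambda appears: the expected number of chance hits, E = K * m * n * exp(-lambda * S), and the bit score, S' = (lambda * S - ln K) / ln 2. The function names are made up for illustration, and the value of K used in the demo is a hypothetical placeholder, since K depends on the scoring system and base composition.

```python
import math


def evalue(score, m, n, lam, K):
    """Expected number of chance local alignments with score >= `score`."""
    return K * m * n * math.exp(-lam * score)


def bit_score(score, lam, K):
    """Raw score rescaled by lambda and K, so that E = m * n * 2 ** (-bit_score)."""
    return (lam * score - math.log(K)) / math.log(2)


if __name__ == "__main__":
    lam = math.log(3)  # lambda for +1/-1 scoring with uniform base frequencies
    K = 0.1            # hypothetical placeholder, not a value from the lecture
    for s in (10, 11, 12):
        print(s, evalue(s, 100, 1_000_000, lam, K), bit_score(s, lam, K))
```

With lambda = ln 3, each extra point of raw score divides the expected number of chance hits by 3, which is one way to see why the same raw score means different things under different scoring schemes.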
@solomongrundy2211 5 years ago
im a nigga
@tartanhandbag 8 years ago
while i'm sure this presenter has great knowledge, his presentation style is so dull and dry that even though the material is mind blowing, i have once again failed to make it all the way through the video on my 3rd attempt.
@ProfBeckmann 3 years ago
just up the speed of playback bruh. 😎
@fahlihamza7937 8 years ago
address address searched sddcdse the same time as the