superb explanation. Just started my masters in Computational biology. Thanks
@amy-cv2nwКүн бұрын
Thank you so much for the simple explanation, this was so helpful!
@Cat-ct9hn Жыл бұрын
Thank you very much for your video! I'm taking a bioinformatics course and this is the clearest explanation I've come across so far, excellent material.
@niemasd Жыл бұрын
Thank you for the kind words! Glad you're enjoying it 😄
@YaBoiMoustafa Жыл бұрын
I’m studying for a bioinformatics exam and reversing the BWT is almost guaranteed to be on the final. I was scared of the question but this video made it super simple. I’m hoping this actually works with any string because if it does then that’s amazing!!!! Thank you
@YaBoiMoustafa Жыл бұрын
Just tested it with another string, it worked! I’m surprised it’s so easy. Thank you!
@niemasd Жыл бұрын
@@YaBoiMoustafa No problem, glad the video helped! Good luck on the exam 😄
@RLLLx8 ай бұрын
This is impressive! How do people come up with this kind of algorithm?!
@onescYT4 ай бұрын
right? we are standing on the shoulders of giants
@niemasd2 жыл бұрын
Someone had asked how we can possibly sort the Last column in linear time to get the First column, but I can't seem to find the comment anymore. Here's the explanation: All you need to do is keep track of the counts of each letter in your alphabet, which you can do in linear time. Here's a Python implementation, which would be O(n + length of alphabet), and given that n >> "length of alphabet" (not in this simple example, but in reality), it reduces to O(n): pastebin.com/DMrL3fKU Note that ord(c) converts a character to its ASCII integer, and chr(i) converts an ASCII integer to the corresponding character. Here's a link that lets you step through this code in the browser: pythontutor.com/render.html#code=bwt%20%3D%20%22ANNB%24AA%22%0Acount%20%3D%20%5B0%20for%20_%20in%20range%28256%29%5D%0Afor%20c%20in%20bwt%3A%0A%20%20%20%20count%5Bord%28c%29%5D%20%2B%3D%201%0Afor%20i%20in%20range%28256%29%3A%0A%20%20%20%20for%20_%20in%20range%28count%5Bi%5D%29%3A%0A%20%20%20%20%20%20%20%20print%28chr%28i%29%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false
@Xn_Fdez6 ай бұрын
Gracias por el video máquina! Muy buena explicación.
@julianbiju1761 Жыл бұрын
Question: Thank you for making this video, it really has helped! A quick question: As you have subscripted the numbers, we can identify that B_1 is followed by A_3. However, programmatically, the burrows-wheeler encoding given to us will not have these subscripts(I.e. perhaps we receive an encoding ABBBKAAAAAA where we can't distinguish between any of the duplicate characters). Given your example, assuming we observe the first column corresponding to B_1 we see A_3, we will in reality have no idea whether that is A_1, A_2, or A_3. How do we reconcile this in our code?
@niemasd Жыл бұрын
No problem; I'm glad you enjoyed the video! Great question! You just go down the first and the last column and subscript each element like I do at time 1:52. I didn't go into it in this video, but the i-th occurrence of symbol x in the first column *is exactly* the i-th occurrence of symbol x in the last column. So even though the BWT you're given doesn't have the subscripted numbers, you just fill them in yourself (as I do in the last column in the video, which *is* the BWT)
@mlemImlem3 ай бұрын
very good explanation thank you for teaching this
@nottofind7 ай бұрын
What is the DOI of the original Burrows Wheeler Transformation Paper? I can't find it :/ Great explanation though!
@niemasd7 ай бұрын
Great question! BWT was originally not intended for this task (it was originally intended for data compression), and the original BWT paper can be found here: www.eecs.harvard.edu/~michaelm/CS222/burrows-wheeler.pdf *To my knowledge*, this is the first paper that applies BWT to the "match a bunch of short strings to a single long string" problem in Bioinformatics: doi.org/10.1186%2Fgb-2009-10-3-r25 Also, *to my knowledge*, this is the first paper that applies BWT to genomic data: doi.org/10.1089/cmb.2005.12.943