Advanced Data Structures: Inverting the BWT

  Рет қаралды 24,059

Niema Moshiri

Niema Moshiri

Күн бұрын

Пікірлер: 18
@arpitmathur2933
@arpitmathur2933 3 жыл бұрын
superb explanation. Just started my masters in Computational biology. Thanks
@amy-cv2nw
@amy-cv2nw Күн бұрын
Thank you so much for the simple explanation, this was so helpful!
@Cat-ct9hn
@Cat-ct9hn Жыл бұрын
Thank you very much for your video! I'm taking a bioinformatics course and this is the clearest explanation I've come across so far, excellent material.
@niemasd
@niemasd Жыл бұрын
Thank you for the kind words! Glad you're enjoying it 😄
@YaBoiMoustafa
@YaBoiMoustafa Жыл бұрын
I’m studying for a bioinformatics exam and reversing the BWT is almost guaranteed to be on the final. I was scared of the question but this video made it super simple. I’m hoping this actually works with any string because if it does then that’s amazing!!!! Thank you
@YaBoiMoustafa
@YaBoiMoustafa Жыл бұрын
Just tested it with another string, it worked! I’m surprised it’s so easy. Thank you!
@niemasd
@niemasd Жыл бұрын
@@YaBoiMoustafa No problem, glad the video helped! Good luck on the exam 😄
@RLLLx
@RLLLx 8 ай бұрын
This is impressive! How do people come up with this kind of algorithm?!
@onescYT
@onescYT 4 ай бұрын
right? we are standing on the shoulders of giants
@niemasd
@niemasd 2 жыл бұрын
Someone had asked how we can possibly sort the Last column in linear time to get the First column, but I can't seem to find the comment anymore. Here's the explanation: All you need to do is keep track of the counts of each letter in your alphabet, which you can do in linear time. Here's a Python implementation, which would be O(n + length of alphabet), and given that n >> "length of alphabet" (not in this simple example, but in reality), it reduces to O(n): pastebin.com/DMrL3fKU Note that ord(c) converts a character to its ASCII integer, and chr(i) converts an ASCII integer to the corresponding character. Here's a link that lets you step through this code in the browser: pythontutor.com/render.html#code=bwt%20%3D%20%22ANNB%24AA%22%0Acount%20%3D%20%5B0%20for%20_%20in%20range%28256%29%5D%0Afor%20c%20in%20bwt%3A%0A%20%20%20%20count%5Bord%28c%29%5D%20%2B%3D%201%0Afor%20i%20in%20range%28256%29%3A%0A%20%20%20%20for%20_%20in%20range%28count%5Bi%5D%29%3A%0A%20%20%20%20%20%20%20%20print%28chr%28i%29%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false
@Xn_Fdez
@Xn_Fdez 6 ай бұрын
Gracias por el video máquina! Muy buena explicación.
@julianbiju1761
@julianbiju1761 Жыл бұрын
Question: Thank you for making this video, it really has helped! A quick question: As you have subscripted the numbers, we can identify that B_1 is followed by A_3. However, programmatically, the burrows-wheeler encoding given to us will not have these subscripts(I.e. perhaps we receive an encoding ABBBKAAAAAA where we can't distinguish between any of the duplicate characters). Given your example, assuming we observe the first column corresponding to B_1 we see A_3, we will in reality have no idea whether that is A_1, A_2, or A_3. How do we reconcile this in our code?
@niemasd
@niemasd Жыл бұрын
No problem; I'm glad you enjoyed the video! Great question! You just go down the first and the last column and subscript each element like I do at time 1:52. I didn't go into it in this video, but the i-th occurrence of symbol x in the first column *is exactly* the i-th occurrence of symbol x in the last column. So even though the BWT you're given doesn't have the subscripted numbers, you just fill them in yourself (as I do in the last column in the video, which *is* the BWT)
@mlemImlem
@mlemImlem 3 ай бұрын
very good explanation thank you for teaching this
@nottofind
@nottofind 7 ай бұрын
What is the DOI of the original Burrows Wheeler Transformation Paper? I can't find it :/ Great explanation though!
@niemasd
@niemasd 7 ай бұрын
Great question! BWT was originally not intended for this task (it was originally intended for data compression), and the original BWT paper can be found here: www.eecs.harvard.edu/~michaelm/CS222/burrows-wheeler.pdf *To my knowledge*, this is the first paper that applies BWT to the "match a bunch of short strings to a single long string" problem in Bioinformatics: doi.org/10.1186%2Fgb-2009-10-3-r25 Also, *to my knowledge*, this is the first paper that applies BWT to genomic data: doi.org/10.1089/cmb.2005.12.943
@kaushalrao101
@kaushalrao101 4 жыл бұрын
very helpful, thanks!
@gedgod111
@gedgod111 3 жыл бұрын
Thank you very much
Advanced Data Structures: Pattern Matching Using the BWT
8:10
Niema Moshiri
Рет қаралды 18 М.
Burrows-Wheeler Transform, part 1
29:16
Ben Langmead
Рет қаралды 11 М.
Deadpool family by Tsuriki Show
00:12
Tsuriki Show
Рет қаралды 5 МЛН
How Much Tape To Stop A Lamborghini?
00:15
MrBeast
Рет қаралды 259 МЛН
Advanced Data Structures: Count-Min Sketches
7:19
Niema Moshiri
Рет қаралды 28 М.
25. Burrows- Wheeler- Transform ( BWT) decoding with example
16:43
Advanced Data Structures: Burrows-Wheeler Transform (BWT)
7:26
Niema Moshiri
Рет қаралды 35 М.
Inverting Burrows-Wheeler
13:26
Bioinformatics Algorithms: An Active Learning Approach
Рет қаралды 16 М.
How Bzip2 Works (Burrows Wheeler Transform) - Computerphile
10:41
Computerphile
Рет қаралды 49 М.
encoding and decoding lempel ziv
7:09
Heba Sayed
Рет қаралды 8 М.
Advanced Data Structures: Aho-Corasick Automaton
9:55
Niema Moshiri
Рет қаралды 61 М.
Burrows-Wheeler Transform
37:00
Ben Langmead
Рет қаралды 83 М.
Burrow Wheeler Transform made simple
5:27
Made Simple
Рет қаралды 10 М.
FM Index
37:52
Ben Langmead
Рет қаралды 35 М.
Deadpool family by Tsuriki Show
00:12
Tsuriki Show
Рет қаралды 5 МЛН