*The new Compressor Head episode is finally here!* We know you've been waiting on pins and needles for the next episode of Compressor Head, and it's finally here! Colt McAnlis returns as the algorithm obsessed geek who loves to share his knowledge with the programmers of the world. He’s joined by fellow Googler Mike Burrows, co-inventor of the Burrows-Wheeler transform, an almost magical algorithm that will turn your head inside out. Stay tuned to the Google Developers KZbin channel (kzbin.info) for the soon to be released episodes 5 and 6, coming your way Friday mornings from the Googleplex in Mountain View CA. #CompressorHead #PerfMatters
@roberttompkins11729 жыл бұрын
Nice
@haroldlcarpenter84819 жыл бұрын
chrome-dome nerd
@ColtMcAnlis9 жыл бұрын
Compressor Head is BACK! And I'm going "bananas" over Episode 4! (....you won't get the joke until you watch the episode....)
@ColtMcAnlis9 жыл бұрын
Karthik Kumar Viswanathan I've ranted about that exact problem before. There's a plethora of existing formats out there to handle this, devs just aren't embracing them (protobuffs, gRPC, captn proto, flatbuffers etc)
@danielschreck90834 жыл бұрын
great video, one error: Magnus does the sort wrong, and the final matrix comes out wrong as a result: at 7:58, for example, with 3 characters in each row, Magnus sorts NAN before NAB. The final matrix (8:17) has last column BNNAAA (doesn't match the original last column), should be NNBAAA if sorts are done correctly in the invert phase.
@АдисХоджаяров4 жыл бұрын
that's why Compressor Head said: "I don't think he got a shot at that one".
@vonkruel9 жыл бұрын
I remember reading that article in Dr. Dobb's Journal, and implementing this algorithm was a great exercise. It's a very special algorithm. I _loved_ that story about declining to re-submit the paper, and saying "it's my policy not to explain these decisions." Golden!
@amitsubhash1586 Жыл бұрын
This was released 8 years aho and I am still able to learn from it , thank you team google!
@yuvrajgarg39215 жыл бұрын
Thanks Google Glad to hear words directly from Sir Michael Burrows .
@jackninja19 жыл бұрын
These video's are amazing, really like the amount of detail and the length that they have, keep up the good work!
@mgehad8 жыл бұрын
I can't find the words to describe this. this such so interesting and fun. You're such an amazing head man! Thanks.
@BryceDoesLife7 жыл бұрын
My left ear loved that intro.
@asedaaboagye58779 жыл бұрын
Another awesome episode!
@silkwurm4 жыл бұрын
Entropy doesn’t “fail to take into account the order of the symbols”, that’s a problem with character codes, not entropy. Entropy is defined for the joint distribution, order and all
@boushaft34595 жыл бұрын
Hey man, great video, really entertaining!
@runlaketahoe9 жыл бұрын
Will have to put #houseofcards on hold for a bit to watch Compressor Head!
@RakhiDhavale9 жыл бұрын
Great to know about BWT !
@matthewpobox3 жыл бұрын
what was the point of 2:55?
@zihuatanejo77414 жыл бұрын
from coursera algorithm Ⅱ,the specification of the last assignment is pretty arcane...
@yohanrizk33989 жыл бұрын
Woah this stuff is coool! I was following along the delta coding part, and I might be totally be wrong about this , but isn't the delta coding of [9,2,1,3,4,8,0,6,7,5] equal to [9,-7,-1,2,1,4,-8,6,1,-2] ? your calculation has -1 at the end. Great Video, BWT really is something else.
@diegonayalazo3 жыл бұрын
Thanks
@yohanrizk33988 жыл бұрын
Also... "Competetive sorter" is misspelled , the correct spelling is Competitive
@heidimetzler6 жыл бұрын
Error around 7:50: Linux implementation uses Huffman, not arithmetic coding. Great video.
@stgigamovement5 жыл бұрын
It originally did use Fenwick arithmetic coding, but the good news is that bwtc was invented which omits the first RLE step of bzip1 and bzip2 and uses arithmetic coding by default.
@simonchou45972 жыл бұрын
Why the entropy from 0 to 9 is 4 at 1:28? 3.32 is what I calculated(base 2)
@rustix33 жыл бұрын
8:37 The last letter of the first row 'B' should be switched with last letter of the 3rd row 'N'.
@jelouodsa13959 жыл бұрын
Colt McAnlis I think there is a better way when doing inverse transforming the BWT output by following the index ... than sorting it over and over again.... It would be great if you share that in the next episode... :)
@ColtMcAnlis9 жыл бұрын
Jelou odsa Lots of perf optimizations/approaches that we didn't get to talk about in this episode. My favorite has to be RadixZip.
@Duhdad9 жыл бұрын
Great job!! I Love Compressor Head. Keep feeding my grandkids. .Hey. Wasn't Magus riding a bike last we saw?? I/O Bytes? Is he going for a triathalon thingy?? GO Mag ~(8^(!)~
@josh27319 жыл бұрын
One part I did not understand about the sort was what way was he sorting the data. Does it have to be sorted using a specific sorting method(first to last, in clusters, both ends same time) to work. Or as long as it is the same sorting method used in sorting and unsorting
@ColtMcAnlis9 жыл бұрын
You want a lexicographical sort (A-Z sorting). Doesn't matter what method you use, as long as it lexicographically results in the same matrix. There's other crazy variants of BWT that use different types of sort methods in order to achieve different results (cryptographic, etc) but we didn't have time to go into those ;)
@petergrajciar15048 жыл бұрын
In 7:58 I think , that it was wrong sorting. 1, [ABA] 2, [ANA] 3, [ANA] , 4, [BAN] 5, [NAN] 6, [NAB]. It must be: 1, [ABA] 2, [ANA] 3, [ANA] , 4, [BAN] 5, [NAB] 6, [NAN]. So if you will continue in sorting , than result it will be: 1, [NABANA] 2, [NANABA] 3, [BANANA] , 4, [ABANAN] 5, [ANABAN] 6, [ANANAB]. BANANA is in the third line , but when you doing lexical sorting on 4:10 , you have BANANA on fourth line. Sorry for English :)
@MightyPuff7 жыл бұрын
I agree with you.
@zrouth9 жыл бұрын
This shit is Bananas
@Chester_Snorflex7 жыл бұрын
8:09 third row doesnt have a 'B' in it?
@shawn5767 жыл бұрын
Can't explain how they came up with it = they killed someone and stole it
@SiddharthKulkarniN9 жыл бұрын
I am 'gelous', and not because it's a pun :) :)
@delucabruno8 жыл бұрын
Im sory but I don't think that sorting everything back N times is very efficient, cause if I want to compress the whole Harry Potter book, it would take a lot of time to decode..
@blueskyresearch67019 жыл бұрын
BWT would be a great way to prep data for RLE. I know RLE am I in kindergarten or what.
@stgigamovement4 жыл бұрын
Bzip1, bzip2, and bwtc do use rle after bwt, while bzip1 and bzip2 also have rle as the first step
@Wandelaarke9 жыл бұрын
And again, thank you Google. Go free education about cool stuff that teachers don't understand themselves!
@CasperBang9 жыл бұрын
Why Move-To-Front rather than RLE after the BW transform? Seems like the clustering of BW makes it a perfect candidate for applying RLE to it, followed by i.e. Huffman encoding?!
@ColtMcAnlis9 жыл бұрын
Casper Bang RLE works best as an statistical encoder when there's long runs of similar characters, that aren't oddly broken up. Remember that BWT doesn't output homogenous runs of characters. There's no guarantee of that. You may (luckily) end up with a run of 20 of the same character, only to be disrupted by a 20 character run of individual, non-run characters. MTF takes this type of statistical changes into account. If you get 20 "E"s broken up by a single "Z" you don't lose as much encoding efficiency as you would having to restart a run in RLE mode. And to be honest, there's really cool ways to merge MTF and RLE together to make some fancy algorithms. RadixZip introduced something like that : citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.106.3977
@CasperBang9 жыл бұрын
Colt McAnlis Ah I see, thanks for the info - just used my ACM subscription to get to the RadixZip paper, it's the second time I hear you mention it so it must be a worthy read. :) Just hope it isn't too encompassed by patents and the like.
@Michael_190567 жыл бұрын
The strongest encoding methods for BWT do not use any secondary transform at all. M03, for instance, generally achieves the strongest compression of all the BWT encoding techniques and it uses no secondary transforms whatsoever. Instead, the symbols are encoded as they appear in the BWT but using the statistics gathered from the contexts under which that symbol appeared within the original steam. see: www.michael-maniscalco.com/download.htm www.juergen-abel.info/files/preprints/preprint_post_bwt_stages.pdf