How Huffman Trees Work - Computerphile

  Рет қаралды 252,609

Computerphile

Computerphile

Күн бұрын

How do we derive the most compact codes for a situation? Huffman Trees can help. Professor Brailsford explains how computer scientists like their trees to be upside down.
"Entropy in Compression - Computerphile" precedes this: • Entropy in Compression...
EXTRA BITS: More on Huffman Trees: • EXTRA BITS/TRITS - Huf...
Error Correction: • Error Correction - Com...
/ computerphile
/ computer_phile
This video was filmed and edited by Sean Riley.
Computer Science at the University of Nottingham: bit.ly/nottscom...
Computerphile is a sister project to Brady Haran's Numberphile. See the full list of Brady's video projects at: bit.ly/bradycha...

Пікірлер: 209
@TonyManso
@TonyManso 11 жыл бұрын
I just love his enthusiasm and ability to simplify things that might otherwise be difficult to understand
@Cathalion
@Cathalion 9 жыл бұрын
"And now a bit on chemistry...." The mans level of knowledge is astounding lol.
@bobbyd9115
@bobbyd9115 6 жыл бұрын
Just wrote a file compression program in smalltalk for my class. I love how he explains the algorithm.
@jonnypanteloni
@jonnypanteloni 9 жыл бұрын
I don't know what I'm looking at after the tree diagram. I would love to learn and take notes on what this all means - I watch computerphile in my spare time when I am not editing videos, photos and other work for clients. It's good for the mind!
@AvZNaV
@AvZNaV 8 жыл бұрын
I used H for enthalpy in my chemistry classes, S for entropy and Q for heat
@cyberb4ss
@cyberb4ss 8 жыл бұрын
Title confused me, because I'm so used to seeing upside down trees!
@joealias2594
@joealias2594 10 жыл бұрын
Why did he start with an example where every probability is equal? That makes the encoding totally arbitrary and really defeats the whole point.
@timsiwula
@timsiwula 9 жыл бұрын
Does anyone have a link to the original 12 page paper David Huffman published in 1952? Thanks!!
@Toksyuryel
@Toksyuryel 11 жыл бұрын
What's left unspoken in this video is that the codes are all padded to the same length afterwards. After that padding is applied, the Bass, Tuna, and Barracuda codes you suggested are no longer unique: they are all 100.
@AlanKey86
@AlanKey86 11 жыл бұрын
I was a little confused at the start of the video - I thought I'd skipped a bit by mistake. But then, almost immediately, I remembered about that good ol' weather station and it all came clear. Maybe there should've been a "Previously on Computerphile..." intro sequence
@PiratWeber
@PiratWeber 11 жыл бұрын
I could listen to his voice for hours and hearing about computer science.
@stellarfirefly
@stellarfirefly 11 жыл бұрын
Prof. Brailsford should use a simple word with obvious repeating characters, such as "abracadabra", to create a Huffman tree. He could then later explain how Morse Code was a very early attempt to use a Huffman-type encoding. (For example, the letter "e" in Morse Code, which is the most used letter in the English language, is simply a single "dot".)
@Kelimion
@Kelimion 11 жыл бұрын
They occur in different probabilities, so if you're sending a rather large text describing each of the fish, coding the fish that occur more often in fewer bits will result in a total shorter than the naïve coding which doesn't take the probabilities into account.
@Kelimion
@Kelimion 11 жыл бұрын
No problem :) The receiver might then have seen 11 at the end of a packet and know another 1 or 0 is needed to make sense of it. They'll wait for the next packet to arrive and add that to the buffer. Basically they'll always consume the buffer a bit at a time until they get a code they recognise. For the termination code, if you can't send a header because you don't know how much you'll send, you'd add a code with the lowest possible probability (it occurs once in the whole stream).
@taesheren
@taesheren 11 жыл бұрын
It was not directed at you specifically. I just thought I'd put it out there, because I know it is a common misconception.
@JimCullen
@JimCullen 11 жыл бұрын
I believe he explained it in the first video on this topic. When you are sending the message, if at any point the sent message is equal to one of the messages that they recognise, it stops. So in your example, if they wanted to send Shark, they would send 11. However, in order to send 11, you must first send the first 1, and then the second 1. After the first 1 has been sent, it recognises that as Bass, and stops.
@Kelimion
@Kelimion 11 жыл бұрын
Put differently: In your example you'd need extra symbols inbetween each code to tell 0+0 apart from 00. Huffman codes can always be told apart. Because the shorter codes are used for things sent more frequently, the average bits used tends toward the optimal amount you could possibly use to send things without any ambiguity.
@mkaatr
@mkaatr 11 жыл бұрын
It is used for compression. For example if you want to send a message stating the order of fishes such as : Code,Bass,Tuna,Code,Tuna,Bass,Code,Code,Tuna ... then instead you could send: 01011001101000110 which is much shorter (notice that these are bits, so there are 3 numbers to be send instead of 36 numbers for the whole list). The other side would decode the message and generate the original list.
@Keduce22
@Keduce22 11 жыл бұрын
My last computer science project of the year was a huffman compression algorithm encode/decode. My solution was so hacked together. The provided code for c++ contained c code which works perfectly but is not exactly right in terms of good coding practices. Then my group member did his own solutions. My code was the most buggy thimg ever. It took weeks to get it right and for the automarker to accept it.
@MeepMeep
@MeepMeep 5 жыл бұрын
Studying for my midterm by watching Computerphile feels wrong. I'm not supposed to enjoy studying
@MeepMeep
@MeepMeep 6 жыл бұрын
THIS MAKES SO MUCH SENSE THANK YOU
@razorborne
@razorborne 11 жыл бұрын
you could represent them all with 3 bits, but then you'd have to send more data on average. while it represents a couple of the results with 4 bits, the two that you're most likely to send are represented by 1 and 2 bits each, so 66% of the time you're beating the 3-bit line. another 11% of the time you're matching it, and 22% of the time you're one over. so while you're increasing the maximum amount you might have to send, you're reducing the average.
@Kelimion
@Kelimion 11 жыл бұрын
No problem. That's why you wouldn't use 100 for barracuda, because you can't distinguish it from bass+cod+cod. The useful property of Huffman codes is that you can always tell codes apart, and that the running average of bits used converges to the optimum amount. (For powers of 2 probabilities, otherwise use arithmatic coding for example, where you can use any probability you like).
@parkamark
@parkamark 11 жыл бұрын
Peter's reply is right, but to explain it another way, Huffman Codes are "instantaneously decodable". Your codes are not this. Take the bitstream: 011001. Under your coding scheme, does this mean Cod (0), Bass (1), Tuna (10), Cod (0), Bass (1) OR Cod (0), Bass (1), Barracuda (100), Bass (1) ? An instantaneously decodable code is one which, as you read the bit stream, once you have a match, that's your answer, and you know where the start of the next code word is.
@Kelimion
@Kelimion 11 жыл бұрын
The codes are constructed so you always know if you've already received it, or if you're waiting for more bits. So with the 1 on the end you know it's not a valid code, yet, and wait for the buffer to fill up to tell you what follows.
@Disthron
@Disthron 10 жыл бұрын
Is that kind of reel paper still in use or is that all old stock? I can't remember the last time I saw those types of printer paper. I thought they were only used in dot matrix printers.
@thecassman
@thecassman 11 жыл бұрын
The fish example is a bit contrived as a proof but... Think of it this way; if i wanted to send "Tuna Cod" as a message in your coding above then i would send "100"... But that's also the code for Barracuda, so the receiver wouldn't know which you meant. In the coding of the video "Tuna Cod" would be sent as "1100" which is a sequence not shared by any other possibility, so it wouldn't be confused with anything else.
@StSin666
@StSin666 11 жыл бұрын
Not a comment for this video, but a request for Computerphile. I would like to see more videos about hardware, i.e. different architectures of a CPU, von Neumann and Harvard. Difference between a CPU and an GPU. Logic gates, FPGAs, parallell computing, sensors, I/Os, AI hardware and so on.
@Kelimion
@Kelimion 11 жыл бұрын
It's the average length of bits needed for things you're sending using those probabilities. In his calculation for that average he takes the individual probabilities into account, whereas you didn't.
@MrCOPYPASTE
@MrCOPYPASTE 11 жыл бұрын
Imagine that you have a sequence of 16 zeros, if you use your coding you will have 16 * 3 bits as a result but if you use only one bit for each zero it will produce 16 bits, the key part of this algorithm is that it replaces the most frequent data with shorter codes not fixed sized ones.
@millionsteve
@millionsteve 11 жыл бұрын
I agree....the camera work in most of these computerphile videos do that. it's like it's filmed for dramatic effect than education.
@qwertyfinger
@qwertyfinger 9 жыл бұрын
so it actually approximates numbers of bits between integers? that's really cool. the first example kind of misses that point, much easier to see with the non-powers of two.
@TheMoolica
@TheMoolica 11 жыл бұрын
So Huffman was Australian?
@GilliamHimself
@GilliamHimself 11 жыл бұрын
A numberphile about logarithms would be very helpful
@HenkJanBakker
@HenkJanBakker 11 жыл бұрын
LOL. Eye rolling sure brings the 'I don't get it' to a whole new level.
@TylerLarson
@TylerLarson 11 жыл бұрын
Compression. Given the relative frequency of a a given set of "words", determine the smallest way to store some sequence of words.
@MrAnonymousCitizen
@MrAnonymousCitizen 11 жыл бұрын
This man's voice is wonderful.
@samposyreeni
@samposyreeni 9 жыл бұрын
Why don't you do Shannon-Fano as well? That's of course what the Zip/DEFLATE format uses, so huge practical appeal. Thought not quite optimal. :)
@uriituw
@uriituw 11 жыл бұрын
I loved studying Huffman encoding in uni.
@jimbean5657
@jimbean5657 11 жыл бұрын
Right you could divide into two spaces... I was thinking more along the lines of a Mediasite Video presentation style since KZbin presenters are advancing in their production values. Anyhow... sorry for interrupting the thread
@LudicrousTachyon
@LudicrousTachyon 11 жыл бұрын
With the 1/3 example. It seems the efficiency is lower mostly because the shorter codes aren't exhausted before moving on to longer codes. There are no 2, 3, or 4 digit codes beginning with 0.
11 жыл бұрын
That... Actually made an amazing amount of sense. Kudos
@Beesman88
@Beesman88 11 жыл бұрын
you can't differ 0 and 00 so you can use clasic info sending for 5 states (000,001,010,011,100) and you got 101,110and111 leftover - also you will always send 3 bits. But with this method he showed - you send 2.2~something bits on average. so you are eficient.. this is ofcourse not really mindblow you but if it would be sending 10MB per second or sending 6MB per second per average - you see where it is going..
@Tat2ice
@Tat2ice 11 жыл бұрын
Huffman codes, which are part of Information Theory, are used to compress data, with is a "computer-related" idea. Also, I find Numberphile to be related to purer mathematics, rather than its applications. But that's my opinion :)
@TheLatterPartOfToday
@TheLatterPartOfToday 11 жыл бұрын
I absolutely love this channel (and numberphile of course). Thanks for another great video!
@MichielVanuytsel
@MichielVanuytsel 11 жыл бұрын
That depends on the possiblities of the different answers :) Huffman code gives you the smallest average length (or power length, not sure)
@vicpc55
@vicpc55 11 жыл бұрын
Yes, there are, but they tend to be less important than having a smaller average length. With a shorter maximum code, for example, you may need a smaller buffer on the receiving end to decode.
@eideticex
@eideticex 11 жыл бұрын
Imagine catalog cards for library books. Each one specifies exactly where in the library you can find the book it represents. We do the same in programming for memory space. There's also usually a NOP in everything that matters for coherent behavior: no operation, just sits there acting like nothing happened preserving it's state..
@geordonworley5618
@geordonworley5618 11 жыл бұрын
That would be really cool to watch, but I imagine they won't go into depth on anything like that. They might do something like talk overall about FPGAs (probably using a lot of broad analogies), but never actually talking in depth about the topic. The fact that you are aware of all of these things likely means you already know at least the overall purpose of these things, so I would suggest to Computerphile to cover these topics in brief, as they do normally, for the majority of their viewers.
@edwaars
@edwaars 8 жыл бұрын
Using this on my exam tomorrow! thanks.
@TheMohawkNinja
@TheMohawkNinja 11 жыл бұрын
So how does this work in a computer, since the codes are of different length? You can't just send no signal, and have it be perceived, right?
@he1986
@he1986 11 жыл бұрын
Damn. What part of youtube am I on now? A polite conversation between a comment and a reply.
@StSin666
@StSin666 11 жыл бұрын
I'm currently writing a master thesis on FPGAs. I love hardware stuff =)
@yukhui
@yukhui 11 жыл бұрын
I don't know anything about this subject but I'm wondering is there any particular reason for using 2 sig figs in showing the sum of the probabilities equal to 1.0?
@taesheren
@taesheren 11 жыл бұрын
Just thought I'd point out that mushrooms (and other fungi) is not plants. They are considered a separate form of organism from plant life.
@Kelimion
@Kelimion 11 жыл бұрын
You'd first send a header, letting the other party know how many entities to expect when the bits are expanded. And/or you'd add a termination code, which tells you when the list is done. Then you just send the codes sequentially until you fill up a packet, cut there, and send the rest in the next packet(s).
@JBinero
@JBinero 11 жыл бұрын
Maybe it's because the 1/3rd happens 3 times as much as the 1/9th, so it kinda makes sense to give it also a way shorter code.
@shaihuld
@shaihuld 11 жыл бұрын
Best teacher ever, can't wait for it to get deeper ^^ !
@gpaluk
@gpaluk 11 жыл бұрын
Guess what... Math is relevant to computer science too :p IBM and such did a lot of the work so that you didn't have to think about this, but it's systems like this that make computer optimizations a field of study and make your program code run faster.
@kolnder
@kolnder 11 жыл бұрын
Can someone pls help me understand that? first, did they made a video about the whole entropie and bit length things? second, if you are sending those codes, what happens if you send a message ending ith a 1, and then one of the coded fish names, how do you know wich fish it is? e.g. ...1110... this could be a 1 from the last message and then tuna or just shark?
@DiaStarvy
@DiaStarvy 11 жыл бұрын
If you want to send a list (say for fish caught in the week), something like "cod, bass, tuna" (0101) would be indistinguishable from "cod, shark, bass" (0101) in your system. Using a Huffman tree solves this problem.
@L33tH4ks
@L33tH4ks 10 жыл бұрын
Awesome video Brady!
@Mobin92
@Mobin92 11 жыл бұрын
I didn't get it... Whats the use of those codes? Especially in the fishing example it doesn't make any sense to me. It needs 4 bits to indicate 5 different fishes, but 3 bits would have been enough... ?
@daedra40
@daedra40 11 жыл бұрын
Nice one professor, I was actually confused with the entropy symbol, and you saved me :P
@Kelimion
@Kelimion 11 жыл бұрын
How would you tell 00 apart from 0+0? Or 11 from 1+1? Or 01 from 0+1. That's why.
@symbolxchannel
@symbolxchannel 11 жыл бұрын
Isn't it important that the codes are not the same length? I mean a "computer" would usually request (wait for) packets of specific sizes?
@BGBTech
@BGBTech 11 жыл бұрын
audio and video compression are also pretty big users. (granted, many newer video codecs use arithmetic coding instead...).
@growlerjack
@growlerjack 11 жыл бұрын
so i think my brain just melted, and to think i want to do this sort of stuff in uni next year, damn i think i need to get on and learn haha
@Hiimstring3
@Hiimstring3 11 жыл бұрын
This is very interesting stuff, thanks for doing the video!
@TheWeepingCorpse
@TheWeepingCorpse 11 жыл бұрын
so is 10 = tuna or is10 = to bass followed by cod? how would the receiver distinguish between them?
@christiandinkel8481
@christiandinkel8481 11 жыл бұрын
In the end of the video, shouldn't it say ternary rather than trinary? Not that I'd know, I just always assumed.
@jsnadrian
@jsnadrian 11 жыл бұрын
This video should have really contained some sort of motivation for why Huffman trees matter. I understand what Pro Brailsford is talking about ... but if it weren't for his smooth voice and smarty-pants accent, I'd hardly care.
@c.danielpremkumar8495
@c.danielpremkumar8495 2 жыл бұрын
10.28 why is there a MINUS sign in the formula for "H" ?
@ten.seconds
@ten.seconds 11 жыл бұрын
So, is unicode a similar approach to these?
@Retr0id
@Retr0id Жыл бұрын
Shout out to everyone revisiting in 2023 ;)
@exiletomars
@exiletomars 11 жыл бұрын
Are you going to do a video on recursion?
@Teekles
@Teekles 11 жыл бұрын
If you quantize the universe to say everything is separate, two versions of the exact same thing never occur. The problem is where does the number 2 come from. If you can't get to 2 by adding 1+1, because two versions of 1 don't exist, the only other way you can get to 2 is to multiply. The only problem is the square root of two is irrational, and no group of numbers can ever create it. So how do you prove the existence of more than one thing? Impossible.
@lloydgush
@lloydgush 11 жыл бұрын
Yes! S for entropy, and H for enthalpy!
@TehGordonFreeman
@TehGordonFreeman 11 жыл бұрын
"Here's an unsolved problem, I've done it, please give me a PhD"... New signature :)
@scotianbank
@scotianbank 10 жыл бұрын
This Washington my last project for programming 101 in Java
@JanCRefsgaard
@JanCRefsgaard 11 жыл бұрын
from wikipedia: A mushroom is the fleshy, spore-bearing fruiting body of a fungus,
@Tfin
@Tfin 11 жыл бұрын
000, 010, 100, 101, 110? Just group lowest from left rather than lowest from right.
@adrienperie6119
@adrienperie6119 11 жыл бұрын
*eyes rolling in infinite tiredness*
@pdieraue
@pdieraue 11 жыл бұрын
Those codes are ambiguous. There's no way to tell the difference between "Bass, Cod" and "Tuna". Huffman codes don't have that problem because the way you construct them prevents any of the codes from being a prefix of another code.
@zelexi
@zelexi 11 жыл бұрын
Prediction, next video on arithmetic coding.
@Dobby2719
@Dobby2719 10 жыл бұрын
I have a question if anyone can answer... why doesn't he make the first two probabilities have a "code" of 0 and 1 and the next four be 00, 01, 10, 11 and so on for how ever many there are to save on bit space? Why do they all have to be the same amount of bits long?
@davidparker1960
@davidparker1960 10 жыл бұрын
Huffman coding guarantees lossless compression. If he had just set the probabilities without building a tree from the probabilities then the receiver of the data has no way of reversing the code back to its original state. Example, for A = 1 B = 0 C = 10, then how would you know if 110 meant AAB or AC?
@Dobby2719
@Dobby2719 10 жыл бұрын
Oh i see... thanks!
@P4INKiller
@P4INKiller 11 жыл бұрын
Yup, makes sense.
@joaosebastiaoca
@joaosebastiaoca 11 жыл бұрын
How he calculated the L in the fish trip?
@KarlRamstedt
@KarlRamstedt 11 жыл бұрын
And how do Huffman codes not fall into this same trap?
@FernieCanto
@FernieCanto 11 жыл бұрын
I think you are mixing up two different things. This video has to do basically with how information is stored and transmitted, not with how programs are run. When you're talking about algorithms, "efficiency" refers to executing as few operations as possible; but with information, it refers to using as few bits as possible to represent data.
@Booskop.
@Booskop. 11 жыл бұрын
When did funghi become plants?
@idjles
@idjles 11 жыл бұрын
if you understood this, then you have understood almost half of how computers compress zip and mp3 files
@seanski44
@seanski44 11 жыл бұрын
Tuna is 110
@JanCRefsgaard
@JanCRefsgaard 11 жыл бұрын
Until rather recently when we redecorated the tree of life ;), back when animals and trees were different 'kingdoms' and archaea and bacteria was the same, oh, the good ol days ;)
@Keduce22
@Keduce22 11 жыл бұрын
Programming is a form of mathematics, they go hand in hand. Programming is a way to solve a problem logically. You could even go further to a proof which states that any algorithm is in fact a Turing Machine and every turing machine can be written as a single statement which would resemble more what people expet mathematics to look like. No really a practical thing to do though.
@RyanJensenEE
@RyanJensenEE 11 жыл бұрын
Numbering the huffman tree reminds me of drawing multiplexers... :3
@ThePharphis
@ThePharphis 11 жыл бұрын
He's right about H and S in chemistry. i cringed when he wrote H and said it was entropy just out of habit :p
@onwul
@onwul 11 жыл бұрын
Why average length is 2.22..? If you add all lengths and divide by the count: (1+2+3+4+4)/5 you get 2.8 . Im not saying that calculations in the video are wrong, its just I thought thats the way you calculate average of numbers.
@TheGrunt76
@TheGrunt76 11 жыл бұрын
Couldn't agree more! Great stuff :)
@theminechesser
@theminechesser 11 жыл бұрын
love this guy great vid
Huffman Codes: An Information Theory Perspective
29:11
Reducible
Рет қаралды 230 М.
The Most Difficult Program to Compute? - Computerphile
14:55
Computerphile
Рет қаралды 1,4 МЛН
小丑妹妹插队被妈妈教训!#小丑#路飞#家庭#搞笑
00:12
家庭搞笑日记
Рет қаралды 37 МЛН
Миллионер | 1 - серия
34:31
Million Show
Рет қаралды 1,7 МЛН
Новый уровень твоей сосиски
00:33
Кушать Хочу
Рет қаралды 4,8 МЛН
EXTRA BITS/TRITS - Huffman Trees - Computerphile
5:38
Computerphile
Рет қаралды 56 М.
Punch Card Programming - Computerphile
14:55
Computerphile
Рет қаралды 881 М.
Error Detection and Flipping the Bits - Computerphile
8:24
Computerphile
Рет қаралды 119 М.
Cracking Websites with Cross Site Scripting - Computerphile
8:34
Computerphile
Рет қаралды 1,5 МЛН
Characters, Symbols and the Unicode Miracle - Computerphile
9:37
Computerphile
Рет қаралды 2 МЛН
The "Best" OS For Game Development
13:53
Timothy Cain
Рет қаралды 18 М.
Fishy Codes: Bletchley's Other Secret - Computerphile
15:57
Computerphile
Рет қаралды 221 М.
Has Generative AI Already Peaked? - Computerphile
12:48
Computerphile
Рет қаралды 992 М.
小丑妹妹插队被妈妈教训!#小丑#路飞#家庭#搞笑
00:12
家庭搞笑日记
Рет қаралды 37 МЛН