A Short Introduction to Entropy, Cross-Entropy and KL-Divergence

352,492 views

Aurélien Géron

Entropy, Cross-Entropy and KL-Divergence are often used in Machine Learning, in particular for training classifiers. In this short video, you will understand where they come from and why we use them in ML.
Paper:
"A mathematical theory of communication", Claude E. Shannon, 1948, pubman.mpdl.mpg...
Errata:
At 5:05, the sign is reversed on the second line, it should read: "Entropy = -0.35 log2(0.35) - ... - 0.01 log2(0.01) = 2.23 bits"
At 8:43, the sum of predicted probabilities should always add up to 100%. Just pretend that I wrote, say, 23% instead of 30% for the Dog probability and everything's fine.
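A minimal Python sketch of the entropy formula in this erratum; the full distribution from the video isn't listed above, so apart from the 0.35 and 0.01 endpoints the probabilities below are placeholder assumptions:

```python
import math

def entropy(p):
    """Entropy in bits of a discrete distribution: H = -sum(p_i * log2(p_i))."""
    assert abs(sum(p) - 1.0) < 1e-9, "probabilities must add up to 100%"
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# Placeholder distribution (NOT the exact one from the video): it only reuses
# the 0.35 and 0.01 endpoints mentioned in the erratum above.
p = [0.35, 0.20, 0.15, 0.10, 0.09, 0.05, 0.05, 0.01]
print(f"Entropy = {entropy(p):.2f} bits")
```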
The painting on the first slide is by Annie Clavel, a great French artist currently living in Los Angeles. The painting is reproduced with her kind authorization. Please visit her website: www.annieclavel....

Comments: 469
@revimfadli4666 4 years ago
This feels like a 1.5-hour course conveyed in just 11 minutes, I wonder how much entropy it has :)
@grjesus9979 3 years ago
hahaha
@anuraggorkar5595 3 years ago
Underrated Comment
@klam77 3 years ago
ahhh....too clever. the comment has distracted my entropy from the video. Negative marks for you!
@Darkev77 3 years ago
@@klam77 Could you elaborate on his joke please?
@ashrafg4668 3 years ago
@@Darkev77 The idea here is that most other resources (videos, blogs) take a very long time (and more importantly say a lot of things) to convey the ideas that this video did in a short time (and with just the essential ideas). This video, thus, has low entropy (vs most other resources that have much higher entropy).
@jennyread9464 6 years ago
Fantastic video, incredibly clear. Definitely going to subscribe! I do have one suggestion. I think some people might struggle a little bit around 2m22s where you introduce the idea that if P(sun)=0.75 and P(rain)=0.25, then a forecast of rain reduces your uncertainty by a factor of 4. I think it's a little hard to see why at first. Sure, initially P(rain)=0.25 while after the forecast P(rain)=1, so it sounds reasonable that that would be a factor of 4. But your viewers might wonder why you can’t equally compute this as, initially P(sun)=0.75 while after the forecast P(sun)=0. That would give a factor of 0! You could talk people through this a little more, e.g. say imagine the day is divided into 4 equally likely outcomes, 3 sunny and 1 rainy. Before, you were uncertain about which of the 4 options would happen but after a forecast of rain you know for sure it is the 1 rainy option - that’s a reduction by a factor of 4. However after a forecast of sun, you only know it is one of the 3 sunny options, so your uncertainty has gone down from 4 options to 3 - that’s a reduction by 4/3.
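A quick numeric check of the reduction-factor argument above, in Python, using the same 75%/25% split; the log2 of each factor is the number of bits of information the forecast carries:

```python
import math

p_sun, p_rain = 0.75, 0.25

# "Uncertainty reduction factor" of a forecast = 1 / P(forecast outcome),
# and the information it carries is log2 of that factor.
for outcome, p in [("rain", p_rain), ("sun", p_sun)]:
    factor = 1 / p
    bits = math.log2(factor)          # same as -log2(p)
    print(f"forecast = {outcome}: uncertainty / {factor:.2f}  ->  {bits:.3f} bits")

# rain: factor 4    -> 2 bits     (1 rainy slot out of 4 equally likely slots)
# sun:  factor 4/3  -> 0.415 bits (3 sunny slots remain out of 4)
```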
@AurelienGeron 6 years ago
Thanks Jenny! You're right, I went a bit too fast on this point, and I really like the way you explain it. :)
@god-son-love 6 years ago
Shouldn't one use information gain to check the extent of the reduction? IG = (-(3/4)log2(3/4) - (1/4)log2(1/4)) - (-1·log2(1) - 0·log2(0)) = 0.811 bits
@dlisetteb 6 years ago
thank youuuuuuuuuuuuuuuuu
@rameshmaddali6208 6 years ago
Actually I understood the concept better from your comment than from the video itself :) thanks a lot
@maheshwaranumapathy 6 years ago
Awesome, great insight. I did struggle to get it at first; checked out the comments and bam! Thanks :)
@ArxivInsights 6 years ago
As a Machine Learning practitioner & YouTube vlogger, I find these videos incredibly valuable! If you want to freshen up on those so-often-needed theoretical concepts, your videos are much more efficient and clear than reading through several blog posts/papers. Thank you very much!!
@AurelienGeron 6 years ago
Thanks! I just checked out your channel and subscribed. :)
@pyeleon5036 6 years ago
I like your video too! Especially the VAE one
@fiddlepants5947 5 years ago
Arxiv, it was actually your video on VAE's that encouraged me to check out this video for KL-Divergence. Keep up the good work, both of you.
@grjesus9979 4 years ago
Thank you, at first I messed up trying to understand it, but after reading your comment I understand it. Thank you! 😊
@xintongbian 6 years ago
I've been googling KL Divergence for some time now without understanding anything... your video conveys the concept effortlessly. Beautiful explanation.
@agarwaengrc 1 year ago
Haven't seen a better, clearer explanation of entropy and KL-Divergence, ever, and I've studied information theory before, in 2 courses and 3 books. Phenomenal, this should be made the standard intro for these concepts, in all university courses.
@yb801 1 year ago
Thank you, I have always been confused about these three concepts; you make them really clear for me.
@SagarYadavIndia 1 year ago
Beautiful short video, explaining in about 10 minutes a concept that usually takes a 2-hour explanation.
@summary7428 3 years ago
This is by far the best and most concise explanation of the fundamental concepts of information theory we need for machine learning.
@JakeMiller2020 4 years ago
I always seem to come back to watch this video every 3-6 months, when I forget what KL Divergence is conceptually. It's a great video.
@metaprog46and2 4 years ago
Phenomenal job turning a seemingly esoteric concept into one that's simple & easy to understand. Great choice of examples too. Very information-dense yet super accessible for most people (I'd imagine).
@chenranxu6941 3 years ago
Wow! It's just incredible to convey so much information while still keeping everything simple & well-explained, and within 10 min.
@timrault 6 years ago
Hey Aurélien, thanks so much for this great video! I have a few questions: 1/ I struggle with the concept of uncertainty. In the example where p(sun)=0.75 and p(rain)=0.25, what would be my uncertainty? 2/ At 6:42, I don't understand why using 2 bits for the sunny weather means that we are implicitly predicting that it'll be sunny every four days on average. 3/ Would it be a bad idea to try to use a cross-entropy loss for something different from classification (i.e. where the targets wouldn't be one-hot vectors)? I think there is a possibility that we could find a predicted distribution q different from the true distribution p which would also minimise the value of the cross-entropy, but I'm not sure.
@paulmendoza9736 1 year ago
I want to like this video 1000 times. To the point, no BS, clear, understandable.
@AladinxGonca 5 months ago
You are the most talented tutor I've ever seen
@michaelzumpano7318 1 year ago
Wow! This was the perfect mix of motivated examples and math utility. I watched this video twice. The second time I wrote it all out. 3 full pages! It’s amazing that you could present all these examples and the core information in ten minutes without it feeling rushed. You’re a great teacher. I’d love to see you do a series on Taleb’s books - Fat Tails and Anti-Fragility.
@rampadmanabhan4258 4 years ago
Great video! However, I have a doubt about the part around 7:11 onwards. I don't understand the point where you say that "the code doesn't use messages starting with 1111, and hence the sum of predicted probabilities is not 1". Could you explain this?
@aa-xn5hc 6 years ago
You are a genius at creating clarity.
@billmo6824 2 years ago
Really, I definitely cannot come up with an alternative way to explain this concept more concisely.
@sushilkhadka8069 3 months ago
Wow, best explanation ever. I found this while I was in college; I just come back once a year to refresh my intuition.
@colletteloueva13 1 year ago
One of the most beautiful videos I've watched and understood a concept :')
@GreenCowsGames 1 year ago
I am new to information theory and computer science in general, and this is the best explanation I could find about these topics by far!
@011azr 6 years ago
Sir, you have a talent for explaining stuff in a crystal clear manner. You make something that is usually explained with a huge pile of math equations into something this simple. Great job, please continue making more YouTube videos!
@robinranabhat3125 6 years ago
You are a 3Blue1Brown kind of guy. Nowadays I see a lot of YouTubers making machine learning videos by repeating the words found in research papers and Wikipedia. You are different.
@bhargavasavi 4 years ago
Grant Sanderson is like the Morgan Freeman of visual mathematics... I wish his videos had existed during my earlier days in college.
@glockenspiel_ 4 years ago
Thank you, very well explained! I decided to get into machine learning during this hard quarantine period, without many expectations. Thanks to the clear and friendly explanations in your book I am learning, improving and, not least, enjoying it a lot. So thank you so much!
@Dr.Roxirock 1 year ago
I really enjoyed the way you explain it. It's so inspiring watching and learning difficult concepts from the author of such an incredible book in the ML realm. I wish you could teach other concepts via video as well. Cheers, Roxi
@s.r8081 3 years ago
Fantastic! This short video really explains the concepts of entropy, cross-entropy, and KL-Divergence clearly, even if you knew nothing about them before. Thank you for the clear explanation!
@jdm89s13 5 years ago
This 11-ish minute presentation so clearly and concisely explained what I had a hard time understanding from a one hour lecture in school. Excellent video!
@陈亮宇-m1s 6 years ago
I came to find Entropy, but I received Entropy, Cross-Entropy and KL-Divergence. You are so generous!
@fberron 3 years ago
Finally I understood Shannon's theory of information. Thank you Aurélien
@DailyHomerClips 5 years ago
This is by far the best description of those 3 terms; can't be thankful enough.
@sagnikbhattacharya1202 6 years ago
You make the toughest concepts seem super easy! I love your videos!!!
@frankcastle3288 3 years ago
I have been using cross-entropy for classification for years and I just understood it. Thanks Aurélien!
@Dinunzilicious 3 years ago
Incredible video, easily one of the top three I've ever stumbled across in terms of concise educational value. Also love the book, great for anyone wanting this level of clarity on a wide range of ML topics. Not sure if this will help anyone else, but I was having trouble understanding why we choose 1/p as the "uncertainty reduction factor," and not, say, 1-p or some other metric. What helped me gain an intuition for this was realizing 1/p is the number of equally likely outcomes in a uniform distribution where every event has probability p. So the information, -log(p), is how many bits that event would be "worth" were it part of a uniform distribution. This uniform distribution is also the maximum-entropy distribution that event could possibly come from given its probability... though you can't reference entropy without first explaining information.
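A tiny Python sketch of the intuition in the comment above: an event with probability p carries -log2(p) bits, which is the per-outcome cost of a uniform distribution over 1/p equally likely outcomes (the 1/8 example below is just an illustration):

```python
import math

def bits_of_information(p):
    """Information content of an event with probability p, in bits."""
    return -math.log2(p)

# An event with probability 1/8 is "worth" 3 bits: it picks one option
# out of the 8 equally likely options of a uniform distribution.
p = 1 / 8
print(bits_of_information(p))   # 3.0
print(math.log2(1 / p))         # same thing: log2 of the "reduction factor"
```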
@voraciousdownloader 4 years ago
Really the best explanation of KL divergence I have seen so far !! Thank you.
@ykkim77 4 years ago
This is the best explanation of the topics that I have ever seen. Thanks!
@jamesjenkins9480 2 years ago
I've learned about this before, but this is the best explanation I've come across. And it was a helpful review, since it's been a while since I used this. Well done.
@hassanmatout741 6 years ago
This channel will skyrocket, no doubt. Thank you so much! Clear, visualized and well explained at a perfect pace! Everything is high quality! Keep it up, sir!
@thegamersschool9978 2 years ago
I am reading your book, and oh man, what a book!!! At first I wondered how the book and the video had exactly the same examples, until I saw your book in the later part of the video and realized it's you. It's so great to listen to you after reading you!!
@jackfan1008 6 years ago
This explanation is absolutely fantastic. Clear, concise and comprehensive. Thank you for the video.
@ramensusho 4 months ago
The number of bits I received is way higher than I expected!! Nice video
@maryamzarabian4617 2 years ago
Thank you for the useful video, and also really thanks for your book. You express very difficult machine learning concepts as if they were a piece of cake.
@Rafayak 5 years ago
Finally, someone who understands, and doesn't just regurgitate the Wikipedia page :) Thanks a lot!
@swapanjain892 6 years ago
You have no idea how much this video has helped me. Thanks for making such quality content and keep creating more.
@alirezamarahemi2352 2 years ago
Not only is this video fantastic at explaining the concepts, but the book "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems" (O'Reilly Media, 2019), by the same author (Aurélien Géron), is also the best book I've studied on machine learning.
@hazzaldo 5 years ago
Great video. Finally I somewhat understand Entropy and Cross-Entropy. Many thanks. I do have a few questions, and would really appreciate any help. One thing I was confused about seemed like a small inconsistency in the explanation (or maybe I just understood it wrong). At 6:00 you mentioned: "So if you compute the average number of bits that we send every day, you get 2.42 bits. That's our new and improved Cross-Entropy". First question: is this the way to calculate Cross-Entropy: Σ p * number of bits in message? Second question, which is where I found the inconsistency: at 6:25 you mentioned: "That's really bad! It's roughly twice the Entropy". Did you mean: twice the Cross-Entropy (NOT the entropy)? Third question: at 7:11 you mention: "Note that our code doesn't use messages starting with 1111. So that's why if you add up all the predicted probabilities in this example, they don't add up to 100%". I didn't understand this point. What does not using 1111 messages have to do with the predicted probabilities adding up to 100%? And what does this mean? What's the significance of it? Fourth question: at 7:25 you mention the formula for the Cross-Entropy calculation, but as per my first question, the Cross-Entropy computation at 6:00 seems to differ from it (if that is indeed a Cross-Entropy calculation at 6:00). Many thanks in advance.
@AurelienGeron 5 years ago
Thanks Hazzaldo N! I'll try to answer below: First question: yes, the cross-entropy is the mean number of bits per message, and we can compute it by summing, for each possible message, the number of bits in that message times the probability of that message. Second question: I did mean twice the entropy, because to know whether the cross-entropy is good or not, we need to compare it to the entropy. We computed the entropy earlier (at 4:59), and it was 2.23 bits (it only depends on the distribution of probabilities of each weather condition, not on the specific code we use to communicate the weather information, and it does not change if we swap all the probabilities around, as I did at 6:12). The cross-entropy is 4.58 when the weather conditions are reversed, and that's roughly double the entropy (2.23*2=4.46). Third question: suppose there are four weather conditions, and my code uses four types of messages to communicate the weather: 00, 01, 10, 11. When will this code be optimal? The answer is: when the 4 possible weather conditions are equally likely, each with a 25% chance every day. Ideally (in order for the code to be optimal), every message should encode a weather condition whose probability is equal to 1/2^m where m is the message length (in this case 2 bits, so the optimal probability is 1/4=25%). If there are just 3 possible weather conditions, and I just use the messages 00, 01 and 10 to communicate them, then the "optimal probabilities" of these messages will add up to 75%. There's no way I can use this code and get a KL divergence equal to 0 (i.e., I will necessarily have cross-entropy > entropy), no matter what the distribution of weather conditions is. But if I use the messages 0, 10 and 11, then the corresponding "optimal probabilities" are 50%, 25% and 25%, so this code *can* be optimal if I'm lucky and the 3 weather conditions also have a distribution of 50%, 25% and 25%. Fourth question: indeed, they look different, but they are equivalent. The optimal probability of the event that a message encodes is q=1/2^m, as explained above. The negative binary log of that probability is thus simply m. So if the corresponding weather condition occurs with probability p, then -p * log2(q) = p*m. At 6:00, I used p*m, and at 7:25, I used -p*log2(q). Hope this helps!
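A minimal Python sketch of the equivalence described in this reply, i.e. the cross-entropy computed both as the average message length Σ p·m and as -Σ p·log2(q) with q = 1/2^m; the probabilities and code lengths below are illustrative assumptions, not the exact code from the video:

```python
import math

# Hypothetical setup: true probabilities p and the code's message lengths m
# (in bits) for each weather condition. These numbers are illustrative only.
p = {"sunny": 0.5, "rainy": 0.25, "cloudy": 0.25}
m = {"sunny": 2,   "rainy": 2,    "cloudy": 2}

# Cross-entropy as "average number of bits sent per day": sum of p * m
ce_bits = sum(p[w] * m[w] for w in p)

# Same thing via the formula: q = 1/2^m is the probability the code
# implicitly predicts, and cross-entropy = -sum p * log2(q)
q = {w: 1 / 2 ** m[w] for w in m}
ce_formula = -sum(p[w] * math.log2(q[w]) for w in p)

print(ce_bits, ce_formula)   # both 2.0 bits; the entropy of p is only 1.5 bits
```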
@fahdciwan8709 4 years ago
Phew!! As a newbie to Machine Learning without a background in maths, this video saved me; otherwise I never expected to grasp the Entropy concept.
@shannenlam679 4 years ago
Great vid, such clarity. Just like to check one point at 10:00: isn't -log(0.25) = 2 (considering log base 2)?
@AurelienGeron 3 years ago
Yes, sorry for the confusion, I am using the natural logarithm here, instead of the binary logarithm (or the decimal logarithm). In Machine Learning, the natural logarithm is often used instead of the binary log when computing the cross-entropy. It doesn't change much since the natural log is equal to the binary log times a constant: ln(x) = log2(x) * ln(2). So it makes little difference for optimization: minimizing ln(x) will also minimize log2(x).
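A quick Python check of the constant-factor relation mentioned in this reply:

```python
import math

p = 0.25
print(-math.log(p))              # 1.386...  natural log, as used at 10:00
print(-math.log2(p))             # 2.0       binary log, i.e. bits
print(math.log(p) / math.log(2)) # -2.0      converting: ln(p) / ln(2) == log2(p)
```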
@bingeltube 6 years ago
Very recommendable! Finally, I found someone who could explain these concepts of entropy and cross-entropy in very intuitive ways.
@YYchen713 2 years ago
Fantastic video! Now all the dots are connected! I have used this loss function for NN machine learning without knowing the math behind it! This is so enlightening!
@areejabdu3125 6 years ago
This explanation really helps the learner understand such vague scientific concepts; thanks for the clear explanation!!
@shuodata 5 years ago
Best Entropy and Cross-Entropy explanation I have ever seen
@williamdroz6890 6 years ago
Thanks Aurélien. Is cross-entropy as a loss function always a better choice than a simple Euclidean distance (for classification in ML)?
@AurelienGeron 6 years ago
I'm careful not to say "always" as I'm sure people can find some counter-examples, but I'll risk saying "almost always". :) In practice cross-entropy converges faster than simple Euclidean distance (or mean squared error, or mean absolute error, etc.) on most real-life classification problems. This is largely due to the fact that it has very strong gradients, especially when the classifier is both wrong and highly confident, whereas Euclidean distance would not penalize bad mistakes too much.
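A rough numeric illustration of the "strong gradients when wrong and confident" point, for a single two-class example; the exact loss forms below are simplifying assumptions, not the video's notation:

```python
# Gradient of the loss w.r.t. the predicted probability q of the true class,
# for one two-class example whose true label is 1.
# Cross-entropy:  L = -log(q)       -> dL/dq = -1/q
# Squared error:  L = (1 - q)^2     -> dL/dq = -2(1 - q)
for q in (0.5, 0.1, 0.01):          # increasingly wrong-and-confident predictions
    grad_ce = -1 / q
    grad_se = -2 * (1 - q)
    print(f"q={q:<5} d(cross-entropy)/dq = {grad_ce:8.1f}   d(squared error)/dq = {grad_se:5.2f}")
# The cross-entropy gradient blows up as q -> 0, while the squared-error
# gradient stays bounded, which matches the intuition in the reply above.
```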
@williamdroz6890 6 years ago
Thank you for the quick answer, I understand it better now :)
@rcphillips88 6 years ago
Thanks for the video! I wanted to clarify something. In this example, we are taking the output of the weather report as 100% correct, right? If there's a 25% chance of rain, and the weather station is correct 70% of the time, I imagine things get more complex in terms of bits transmitted?
@AurelienGeron 6 years ago
Great question! Indeed, in this video I assume that the weather station is always 100% correct, and that the communications are perfect: whatever data is sent by the station is received by the recipient (no noise, no loss). Shannon's theory covers these cases of course, but I wanted to keep things simple.
@unleasedflow8532 3 years ago
Nicely conveyed what there is to learn about the topic. I think I absorbed it all. Best tutorial, keep dropping videos like this.
@deteodskopje 4 years ago
Very nice. Really short, yet clearly grasping the point of these concepts. Subscribed. I was really excited when I found this channel. I mean, the book Hands-On Machine Learning is maybe the best book you can find these days.
@matthewwilson2688 6 years ago
This is the best explanation of entropy and KL I have found. Thanks
@LC-lj5kd 6 years ago
Your tutorials are always unbeatable: quite clear, with great examples. Thanks for your work.
@julioreyram 3 years ago
I'm amazed by this video, you are a gifted teacher.
@sajil36 5 years ago
Dear Aurélien Géron, I have the following questions. It would be great if you could answer these too. 1. What about continuous systems, where the set of possible states is not discrete? Is it possible to use entropy in such cases? 2. What if we have no idea about the probability distribution of the weather states? In that case, how can we assign more bits to rare events and fewer bits to frequent events? 3. In the cross-entropy calculation, why is the same number of bits assumed for each state rather than a varying number of bits (more bits for rare events and fewer bits for frequent events)?
1 year ago
The best video on cross-entropy on YouTube so far.
@sagarsaxena7202 5 years ago
Great work on the explanation. I have been pretty confused about this concept and the implications of information theory for ML. This video does the trick, clarifying the concepts while connecting information theory to how it is used in ML. Thanks much for the video.
@GuilhermeKodama 5 years ago
The best explanation I have ever had on the topic. It was really insightful.
@sanjaykrish8719 4 years ago
Aurélien has a knack for making things simpler. Check out his Deep Learning using TensorFlow course on Udacity. It's amazing.
@romanmarakulin7448 5 years ago
Thank you so much! Not only did it help me understand KL-Divergence, it also makes the formula easy to remember. From now on I will place the signs in the right places. Keep it up!
@CowboyRocksteady 1 year ago
I'm loving the slides and explanation. I noticed the name in the corner and thought, oh nice, I know that name. Then suddenly... it's the author of that huge book I love!
@salman3112 6 years ago
Your channel has become one of my favorite channels. Your explanation of CapsNet and now this is just amazing. I am going to get your book too. Thanks a lot. :)
@ramonolivier57 4 years ago
Excellent explanation and discussion. Thank you very much!!
@laura_uzcategui 5 years ago
Really good explanation, the visuals were also great for understanding! Thanks Aurelien.
@akshiwakoti7851 4 years ago
Hats off! One of the best teachers ever! This definitely helped me better understand it both mathematically and intuitively just in a single watch. Thanks for reducing my 'learning entropy'. My KL divergence on this topic is near zero now. ;)
@elvisng1977 2 years ago
This video is so clear and so well explained, just like his book!
@DEEPAKKUMAR-sk5sq 5 years ago
Please do a video on 'PAC learning'. It seems very complex, and your way of explaining can make it easy!!
@mohamadnachabe1 5 years ago
This was the best intuitive explanation of entropy and cross entropy I've seen. Thanks!
@khaledelsayed762 2 years ago
Very elegant, indicating how cognizant the presenter is.
@JoeVaughnFarsight 1 year ago
Thank you Aurélien Géron, that was a very beautiful presentation!
@m.y.s4260 4 years ago
5:12 A tiny typo: the entropy should have a negative sign
@sufyansiddiqui6130 6 years ago
Hi Aurélien, this is really an awesome video. Subscribed to your channel to watch more. Quick question: if the true distribution is known, then why are we predicting the distribution inside the log? Is it a hypothesized true distribution?
@samzhao1827 4 years ago
Very few people can explain like you, to be honest! I read so many decision tree tutorials that all talk about the same thing (information gain), but after reading their articles I still had zero understanding. Big thanks for this video!
@saraths9044 2 years ago
Great video. I do have a doubt though: if the weather forecast says it is going to rain today, our uncertainty actually reduces to zero, right?
@leastactionlab2819 4 years ago
Great video to learn interpretations of the concept of cross-entropy.
@gowthamramesh2443 5 years ago
Kinda feels like 3Blue1Brown's version of machine learning fundamentals. Simply amazing.
@AurelienGeron 5 years ago
Thanks a lot, I'm a huge fan of 3Blue1Brown! 😊
@sarvagyagupta1744 3 years ago
So correct me if I'm wrong, but is the number of bits equivalent to how "valuable" the information is? Like in a completely uncertain system, each piece of information is equally valuable, but in a biased system, the less likely information is more valuable than the other. Can we say that?
@Darkev77 3 years ago
Yes, that's a good way to put it, and I believe it's quite logical: if someone tells you that the sun will rise in the east, that's not much information since it's well known (very probable / high bias), but if someone tells you it's rising in the west, now that's very valuable information (since it's very unlikely to be true).
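A tiny Python sketch of the "surprise" measure being discussed, -log2(p); the probabilities below are made up purely for illustration:

```python
import math

# Self-information ("surprise") in bits: rare events carry more information.
for event, p in [("sun rises in the east", 0.999999),
                 ("sun rises in the west", 0.000001)]:
    print(f"{event}: {-math.log2(p):.6f} bits")
# ~0 bits for the near-certain event, ~20 bits for the near-impossible one.
```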
@meerkatj9363 6 years ago
I've seen all your videos now. You've taught me a lot of things, and these were some good moments. Can't wait for more. Thanks so much.
@-long- 5 years ago
Guys, this is the best explanation of Entropy, Cross-Entropy and KL-Divergence.
@DiogoSanti 4 years ago
Awesome video! Hope you deliver more content here very soon!
@klingefjord 3 years ago
I don't quite get where the predicted distribution comes from at 7:08. Why are we implicitly assuming it will be sunny every four days because we're using a two bit message?
@OmriHarShemesh 6 years ago
I really enjoyed your book and these videos! Keep them coming! Even though part of my PhD had to do with Information Theory, I enjoyed the way you explain IT and Cross-Entropy in a very practical way. It helped me understand why it is used in machine learning the way it is. Looking forward to more great videos (and maybe a second book?)!
@AurelienGeron 6 years ago
Thanks Omri, I'm glad you enjoyed the book & videos. :) I recently watched a great series of videos by Grant Sanderson (3Blue1Brown) about the Fourier Transform, and I loved the way he presents the topic: I thought I already knew the topic reasonably well, but it's great to see it from a different angle. Cheers!
@OmriHarShemesh 6 years ago
Yes, the Fourier transform is a fascinating and multifaceted topic ;) In physics we use it very often for very surprising reasons. I'm looking for a book similar to yours which focuses specifically on NLP with Python and is very well written and modern. Do you have any recommendations? Thanks! Omri
@MrFurano 6 years ago
To-the-point and intuitive explanation and examples! Thank you very much! Salute to you!
@Amin-nh4wg 6 years ago
Thank you so much for this amazing video. I'm having a hard time understanding your statement at 7:00 about the probability (of 25%) implicitly assigned to sunny weather. Why is that so, and how does the number of bits used to represent certain messages relate to the probability of the event occurring?
@christoph3333 5 years ago
Okay, I probably got this wrong but let me try: The message length for sunny weather is 2 bits. Every bit of a sent message should correspond to a bit of useful information if my coding scheme is ideal. So ideally my 2 bit message should divide the recipient's uncertainty by a factor of 2² = 4. This is only the case if the probability for sunny weather is 1/4 = 0.25.
@Yu-nu3fu 3 years ago
Thank you! I have one question: why is there no need to normalize the predicted distribution (make it a probability distribution) when calculating the cross-entropy? What about multi-label classification?
@davidbeauchemin3046 6 years ago
Awesome video, you made the concept of entropy so much clearer.
@ashutoshnirala5965 4 years ago
Thank you for such a wonderful and to-the-point video. Now I know: entropy, cross-entropy, KL divergence, and also why cross-entropy is such a good choice as a loss function.
@pyeleon5036 6 years ago
It's so good to watch your video! Thank you so much!
@CosmiaNebula 5 years ago
I feel _triggered_ that you didn't use a Huffman code at 6:06. It would have made the sum of the predicted distribution equal to 100% at 7:04. Otherwise this video is 10/10, good job.
@vman049 4 years ago
If we had used Huffman coding, is it still possible to reduce the cross-entropy any further?
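A minimal Huffman-coding sketch in Python, related to the comment above: for any Huffman code the implied probabilities 2^-length add up to exactly 100%, because the code tree is full; the weather distribution below is an illustrative assumption, not the one from the video.

```python
import heapq
from collections import Counter

def huffman_code_lengths(probs):
    """Return a dict of symbol -> Huffman code length in bits."""
    # Each heap entry: (probability, tie_breaker, symbols_in_subtree)
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = Counter()
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, i2, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1          # every merge adds one bit to these symbols
        heapq.heappush(heap, (p1 + p2, i2, syms1 + syms2))
    return dict(lengths)

# Assumed weather distribution (illustrative, not the exact one from the video)
probs = {"sunny": 0.5, "rainy": 0.25, "cloudy": 0.125, "foggy": 0.125}
lengths = huffman_code_lengths(probs)          # sunny: 1, rainy: 2, cloudy: 3, foggy: 3
implied = {s: 2 ** -l for s, l in lengths.items()}   # probability the code "predicts"
print(sum(implied.values()))                   # 1.0 -- the predictions add up to 100%
```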
@danyalkhaliq915 4 years ago
Super clear... I've never heard this explanation of Entropy and Cross-Entropy before!
@misnik1986 3 years ago
Thank you so much, Monsieur Géron, for this simple and clear explanation.
@srikumarsastry7473 6 years ago
Such a clear explanation! Need more of them!
@michaelding5970 4 years ago
The best explanation I've seen on this topic.
@ilanaizelman3993 5 years ago
Thanks! For people who are looking for the ML explanation: the cross-entropy is computed with -log(0.25).
@PerisMartin 6 years ago
Your explanations are so much better than those of other "famous" ML vloggers (...looking at you, Siraj Raval!). You truly know what you are talking about; even my grandma could understand this!! Subscribed, liked and belled. More, please!
@AurelienGeron 6 years ago
Thanks Martin, I'm glad you enjoyed this presentation! My agenda is packed, but I'll do my best to upload more videos asap. :)
@SoroushNilton 1 year ago
Hi Aurélien, amazing video. I am also reading Hands-On ML and it's so good. However, in this video I could not wrap my head around something. As you said, we use a log for the cross-entropy; however, the log base 10 of 0.25 is not what you have written (1.386). The only way I could get to 1.386 was to take the natural log (base e) of 0.25, which was not explained in the video.