And since B is the max element, this justifies the interpretation of the log-sum-exp as a 'smooth max operator'.
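(To make the 'smooth max' reading concrete, a short derivation using $B = \max_i x_i$ as above:
$$\log \sum_{i=1}^{n} e^{x_i} = B + \log \sum_{i=1}^{n} e^{x_i - B},$$
and since the max term contributes $e^{0} = 1$ to the sum, we have $1 \le \sum_i e^{x_i - B} \le n$, so
$$\max_i x_i \;\le\; \log \sum_i e^{x_i} \;\le\; \max_i x_i + \log n.$$
The log-sum-exp is thus always within $\log n$ of the true max.)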
@aditya3984 · 6 days ago
Really well explained, thanks.
@aditya3984 · 6 days ago
Great video.
@awenzhi · 10 days ago
I'm confused about the derivative of a vector function at 5:40. I think the gradient of a function f: R^n → R^m should be a matrix of size m×n, but I'm not sure about it.
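For reference, under the standard convention the derivative of $f : \mathbb{R}^n \to \mathbb{R}^m$ is indeed the $m \times n$ Jacobian
$$J_{ij} = \frac{\partial f_i}{\partial x_j}, \qquad J \in \mathbb{R}^{m \times n},$$
where row $i$ holds the partial derivatives of output $f_i$. The gradient of a scalar function is the $m = 1$ special case, a single row of length $n$ (usually written as a vector in $\mathbb{R}^n$); some texts use the transposed layout, so conventions can differ from the video's.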
@andrefreitas9936 · 18 days ago
2:59 Actually, the pseudocode you are using is 0-indexed.
@Gwittdog · 19 days ago
Wonderful Lecture. Thank you
@warpdrive9229 · 21 days ago
Cuz world wars!
@RahulSinghChhonkar · 22 days ago
For anyone having any doubts about the relation between NLL and cross-entropy, this is a must watch!!!
@Josia-p5m · 24 days ago
This helped a lot. Fantastic intuitive explanation.
@kamperh · 24 days ago
Super happy that it helped! :)
@nschweiz1 · 26 days ago
Great video series! The algorithm video was the one that finally got me to "get" DTW!
@adityasonale1608 · 1 month ago
Your content is amazing !!!
@kamperh · 1 month ago
Thanks Aditya!
@tgzhu3258 · 1 month ago
So if I have a list of categorical inputs where the order does imply closeness, then I should not use one-hot encoding but should instead use numerical values to represent the categories. Is that right?
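As a hypothetical illustration of the trade-off the question describes (example categories and values chosen just for this sketch):

```python
import numpy as np

# Ordered categories: the order carries meaning.
sizes = ["small", "medium", "large"]

# Ordinal encoding: one number per category, preserving the order,
# so distances reflect closeness: |small - medium| < |small - large|.
ordinal = {c: i for i, c in enumerate(sizes)}  # {"small": 0, "medium": 1, "large": 2}

# One-hot encoding: every pair of categories is equally far apart,
# so the ordering information is discarded.
one_hot = {c: np.eye(len(sizes))[i] for i, c in enumerate(sizes)}

print(ordinal["large"] - ordinal["small"])                  # 2: order preserved
print(np.linalg.norm(one_hot["large"] - one_hot["small"]))  # sqrt(2), same for any pair
```

Whether the ordinal spacing (equal gaps between neighbours) is actually appropriate depends on the data, so this is a modelling choice rather than a rule.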
@tgzhu3258 · 1 month ago
Love this series! You explain the concepts really well and dive into the details!
@kamperh · 1 month ago
So super grateful for the positive feedback!!!
@viswanathvuppala4526 · 1 month ago
You look like Benedict Cumberbatch
@kamperh · 1 month ago
The nicest thing that anyone has ever said!
@molebohengmokapane3311 · 1 month ago
Thanks for posting Herman, super insightful!
@kamperh · 1 month ago
Thanks a ton for the feedback! :)
@Alabsi3A · 1 month ago
You are good
@kamperh · 1 month ago
The nicest thing anyone has ever said ;)
@cuongnguyenuc1776 · 1 month ago
Awesome, really great video!!
@pleasebitt · 1 month ago
I am not a student at your university, but I am glad that you are such a good prof.
@kamperh · 1 month ago
Very happy you find this helpful!! 😊
@rahilnecefov2018 · 2 months ago
I learned a lot as an Azerbaijani student. Thanks a lot <3
@rrrmil · 2 months ago
Really great explanations. I also really like your calm way of explaining things. I get the feeling that you distill everything important before recording the video. Keep up the great work!
@kamperh · 2 months ago
Thanks a ton for this!! I enjoy making the videos, but it definitely takes a bit of time :)
@liyingyeo5920 · 2 months ago
Thank you
@rahilnecefov2018 · 2 months ago
bro just keep teaching, that is great!
@josephengelmeier9856 · 2 months ago
These videos are sorely underrated. Your explanations are concise and clear, thank you for making this topic so easy to understand and implement. Cheers from Pittsburgh.
@kamperh · 2 months ago
Thanks so much for the massive encouragement!!
@Aruuuq · 2 months ago
Working in NLP myself, I very much enjoy your videos as a refresher on current developments. Continuing from your epilogue, will you cover the DPO process in detail?
@kamperh · 2 months ago
Thanks for the encouragement @Aruuuq! Yep, I still have one more video in this series to make (hopefully next week). It won't explain every little detail of the RL part, but hopefully the big stuff.
@OussemaGuerriche · 2 months ago
Your way of explaining things is very good.
@shylilak · 2 months ago
Thomas 🤣
@MuhammadSqlain · 2 months ago
good sir
@TechRevolutionNow · 2 months ago
Thank you very much, professor.
@ozysjahputera7669 · 2 months ago
One of the best explanations on PCA relationship with SVD!
@martinpareegol5263 · 3 months ago
Why is it preferred to formulate the problem as minimizing the cross-entropy rather than minimizing the NLL? Are there properties that make that more efficient?
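For what it's worth, with one-hot targets the two objectives coincide: if $y$ is one-hot with $y_k = 1$, the cross-entropy reduces to
$$H(y, \hat{y}) = -\sum_i y_i \log \hat{y}_i = -\log \hat{y}_k,$$
which is exactly the negative log-likelihood of the correct class, so minimizing one minimizes the other.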
@chetterhummin1482 · 3 months ago
Thank you, really great explanation, I think I can understand it now.
@zephyrus1333 · 3 months ago
Thanks for lecture.
@adosar7261 · 3 months ago
With regards to the clock analogy (0:48): "If you know where you are on the clock then you will know where you are in the input". Why not just a single clock with very small frequency? A very small frequency will guarantee that even for large sentences there will be no "overlap" at the same position in the clock for different positions in the input.
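A rough numerical sketch of one issue with a single slow clock (not from the video; the sizes here are arbitrary): neighbouring positions map to nearly identical values, so the model must resolve tiny differences, whereas the standard multi-frequency sinusoidal encoding separates neighbours at the fast scales while the slow scales keep positions distinct over long ranges.

```python
import numpy as np

max_len = 1000

# One very slow clock: adjacent positions are almost indistinguishable.
slow = np.sin(np.arange(max_len) * 2 * np.pi / max_len)
print(abs(slow[1] - slow[0]))  # ~0.0063: a tiny gap the model must resolve

# Multi-frequency sinusoidal encoding (as in Vaswani et al., 2017).
d_model = 64
pos = np.arange(max_len)[:, None]
freqs = 1.0 / 10000 ** (np.arange(0, d_model, 2) / d_model)
pe = np.zeros((max_len, d_model))
pe[:, 0::2] = np.sin(pos * freqs)  # even dimensions: sines, fast to slow
pe[:, 1::2] = np.cos(pos * freqs)  # odd dimensions: cosines
print(np.linalg.norm(pe[1] - pe[0]))  # a much larger gap between neighbours
```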
@ex-pwian1190 · 3 months ago
The best explanation!
@frogvonneumann9761 · 3 months ago
Great explanation!! Thank you so much for uploading!
@Le_Parrikar · 3 months ago
Great video. That meow from the cat though
@kobi981 · 3 months ago
Thanks! Great video.
@harshadsaykhedkar1515 · 4 months ago
This is one of the better explanations of how the heck we go from maximum likelihood to using NLL loss to log of softmax. Thanks!
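For reference, the chain the comment describes, written out: maximizing the likelihood of i.i.d. data is the same as minimizing the negative log-likelihood,
$$\max_\theta \prod_n p_\theta(y_n \mid x_n) \;\Longleftrightarrow\; \min_\theta \, -\sum_n \log p_\theta(y_n \mid x_n),$$
and with softmax outputs $p_\theta(k \mid x) = e^{z_k} / \sum_j e^{z_j}$ the per-example term becomes a log-softmax:
$$-\log p_\theta(k \mid x) = -z_k + \log \sum_j e^{z_j}.$$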
@shahulrahman2516 · 4 months ago
Great Explanation
@shahulrahman2516 · 4 months ago
Thank you
@yaghiyahbrenner8902 · 4 months ago
Sticking to a simple Git workflow is beneficial, particularly using feature branches. However, adopting a 'Gitflow' working model should be avoided as it can become a cargo cult practice within an organization or team. As you mentioned, the author of this model has reconsidered its effectiveness. Gitflow can be cognitively taxing, promote silos, and delay merge conflicts until the end of sprint work cycles. Instead, using a trunk-based development approach is preferable. While this method requires more frequent pulls and daily merging, it ensures that everyone stays up-to-date with the main branch.
@kamperh · 4 months ago
Thanks a ton for this, very useful. I think we ended up doing this type of model anyway. But good to know the actual words to use to describe it!
@basiaostaszewska7775 · 4 months ago
A very clear explanation, thank you very much!
@bleusorcoc1080 · 4 months ago
Does this algorithm work with negative instances? I mean, can I use vectors with both negative and positive values?
@kundanyalangi2922 · 4 months ago
Good explanation. Thank you Herman
@niklasfischer3146 · 4 months ago
Hello Herman, first of all a very informative video! I have a question: How are the weight matrices defined? Are the matrices simply randomized in each layer? Do you have any literature on this? Thank you very much!
@kamperh · 4 months ago
This is a good question! These matrices will start out being randomly initialised, but then -- crucially -- they will be updated through gradient descent. Stated informally, each parameter in each of the matrices will be wiggled so that the loss goes down. Hope that makes sense!
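A minimal sketch of that idea in plain NumPy (hypothetical layer sizes and a squared-error loss, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Randomly initialise a layer's weight matrix (scaled Gaussian init).
n_in, n_out = 4, 3
W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in))

x = rng.normal(size=n_in)        # a dummy input
target = rng.normal(size=n_out)  # a dummy regression target

# One gradient-descent step: every entry of W is "wiggled" in the
# direction that decreases the loss L = 0.5 * ||W x - target||^2.
y = W @ x                         # forward pass through the linear layer
grad_W = np.outer(y - target, x)  # dL/dW for the squared-error loss
W -= 0.1 * grad_W                 # update; the loss decreases for small steps
```

Real frameworks use particular init schemes (e.g. Xavier/Glorot or He initialisation) and compute these gradients automatically, but the update has this shape.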