Amazing content! I almost didn't click on the video because of the title "Intro to ml sd" but I'm glad I watched it to learn about the complexities of the facebook-friends recommender design. I came for an intro but got the real content I was seeking. Thanks!
@bhaskartripathi · 11 days ago
Great video! But I would have loved it if you had also spent a minute on why float32 vs. bfloat16 is used in backpropagation. Still brilliant as always!
@chrisogonas · 11 days ago
Incredibly useful! Thanks Damien.
@Zoronoa01 · 17 days ago
Thank you for this great explanation!
@AnshumanAwasthi-kd7qx · 1 month ago
I clicked to learn, but you're speaking Embeddings. Speak English 😂
@MahSan-nv4jv · 1 month ago
Solid explanation, and the problem is well explored. Please share diverse problems when possible. Thank you so much!
@rachadlakis1 · 1 month ago
Thanks
@subodhsharma5097 · 1 month ago
Very insightful!
@nikhilshaganti5585 · 1 month ago
Thank you for the video. In the code at 06:20, shouldn't we deduct 1 from cumcount to make sure we are not counting the current row?
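On the cumcount question: a minimal pandas sketch (toy data, not the video's actual code) suggests no subtraction is needed, because `groupby().cumcount()` is zero-based and therefore already excludes the current row:

```python
import pandas as pd

# Toy event log: three events for user "a", one for user "b"
df = pd.DataFrame({"user": ["a", "a", "b", "a"]})

# cumcount returns, for each row, the number of PREVIOUS rows
# in the same group (0 for the first occurrence), so the current
# row is never counted and no "- 1" is required.
df["prior_events"] = df.groupby("user").cumcount()
print(df["prior_events"].tolist())  # [0, 1, 0, 2]
```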
@Pedritox0953 · 1 month ago
Great video!
@WillMoody-crmstorm · 1 month ago
Holy moly. Thank you. I thought these concepts were beyond me until watching this video. You have a serious gift for explanation
@beincheekym8 · 1 month ago
thank you for the clear and concise video!
@madhu819-j6o · 2 months ago
How do you convert a decimal number to bfloat16 format in Verilog?
@TheMLTechLead · 2 months ago
I have no idea!
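For reference on the question above: not Verilog, but a Python sketch of the standard float32 → bfloat16 conversion (keep the top 16 bits, round-to-nearest-even), which maps directly onto a hardware implementation. The function name is illustrative:

```python
import struct

def float_to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x (round-to-nearest-even)."""
    # Reinterpret the float32 bit pattern as a 32-bit unsigned integer
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    # bfloat16 keeps the top 16 bits of float32; adding 0x7FFF plus the
    # lowest kept bit before truncating implements round-to-nearest-even
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

print(hex(float_to_bfloat16_bits(1.0)))   # 0x3f80
print(hex(float_to_bfloat16_bits(-2.5)))  # 0xc020
```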
@bougfou972 · 2 months ago
Wow, very clear explanation. Thank you very much for this format (much clearer than a Medium article).
@math_in_cantonese · 2 months ago
I have a question: for pos=0 and "horizontal_index"=2, shouldn't it be PE(pos, 2) = sin(pos / 10000^(2/d_model))? I believe you used the same symbol "i" for two different ways of indexing, right? 7:56
@TheMLTechLead · 2 months ago
Yeah, you are right; I realized I made that mistake. I need to reshoot it.
@AlainDrolet-e4z · 9 days ago
Thank you Damien, and math_in_cantonese. I'm in the middle of writing a short article discussing positional encoding. Damien, feel proud that you are the first reference I quote in the article!

I was just going crazy trying to nail down the exact meaning of "i". In Damien's video it is clear he means "i" as the dimension index, and the values shown with sin/cos match. But I could not reconcile that reading with the equation formulation below:

PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))

If PE(pos, 0) refers to the first column (column zero) and, say, PE(pos, 5) refers to the sixth column (column 5), then 5 = 2i+1 => i = (5-1)/2 = 2. So "i" is more like the index of a (sin, cos) pair of dimensions, and its range is d_model/2.

The original sin (😄, pun intended) is in "Attention Is All You Need". There they simply state:

> where pos is the position and i is the dimension

This seems wrong: 2i and 2i+1 are the dimensions.

In any case, a big thank you Damien. I have watched many of your videos; they are quite useful in ramping me up on LLMs and the rest.
Merci beaucoup
Alain
@TemporaryForstudy · 2 months ago
Nice, but I have one doubt: how does adding sine and cosine values ensure that we are encoding the positions? How did the authors come to this conclusion, and why not other values?
@TheMLTechLead · 2 months ago
The sine and cosine functions provide smooth and continuous representations, which help in learning relative positions effectively. For example, the encodings for positions k and k+1 will be similar, reflecting their proximity in the sequence.

The frequency-based sinusoidal functions allow the encoding to generalize to sequences of arbitrary length without needing to re-learn positional information for different sequence lengths. The model can understand relative positions beyond the length of sequences seen during training.

The combination of sine and cosine functions ensures that each position has a unique encoding, and the near-orthogonality of these functions helps in distinguishing between different positions effectively, even for long sequences.

The different frequencies used in the positional encodings allow the model to capture both short-term and long-term dependencies within the sequence: higher-frequency components help in understanding local relationships, while lower-frequency components help in capturing global structure.

Also, sinusoidal functions are differentiable, which is crucial for backpropagation during training. This ensures that the model can learn to use the positional encodings effectively through gradient-based optimization methods.
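The uniqueness and smoothness properties described above can be checked numerically. A minimal NumPy sketch of the standard sinusoidal encoding (function name and sizes are illustrative):

```python
import numpy as np

def positional_encoding(max_pos, d_model):
    """PE[pos, 2i]   = sin(pos / 10000**(2i/d_model))
       PE[pos, 2i+1] = cos(pos / 10000**(2i/d_model))"""
    pe = np.zeros((max_pos, d_model))
    i = np.arange(d_model // 2)               # (sin, cos) pair index
    freq = 1.0 / 10000 ** (2 * i / d_model)   # one frequency per pair
    pos = np.arange(max_pos)[:, None]
    pe[:, 0::2] = np.sin(pos * freq)
    pe[:, 1::2] = np.cos(pos * freq)
    return pe

pe = positional_encoding(100, 64)
# Smoothness: neighboring positions end up closer than distant ones
d_near = np.linalg.norm(pe[10] - pe[11])
d_far = np.linalg.norm(pe[10] - pe[50])
print(d_near, d_far)
```

A nice side effect of the (sin, cos) pairing is that the distance between two encodings depends only on the offset |p - q|, not on the absolute positions.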
@do-yeounlee7202 · 2 months ago
Thanks for the clear explanation. I've watched a few of your videos and follow you on LinkedIn, and I can say that you're killing it, brother. I also love the simplicity of the infographics in your videos. Do you get them from elsewhere or do you make them yourself?
@TheMLTechLead · 2 months ago
I make them myself. It takes me most of my time!
@do-yeounlee7202 · 2 months ago
@TheMLTechLead Respect! What do you use to make them?
@TheMLTechLead · 2 months ago
@do-yeounlee7202 I use canva.com
@karnaghose4784 · 2 months ago
Great explanation 👍🏻
@bassimeledath2224 · 2 months ago
Excellent. Good ML system design videos are hard to find on YouTube, so I really appreciate this!
@TheMLTechLead · 2 months ago
Glad it was helpful!
@adityagupta4465 · 2 months ago
Really well explained. You've earned a subscriber 🎉
@passportkaya · 3 months ago
Not really. I'm a US citizen and have been all over Europe. I'd say it's the same.
@TheMLTechLead · 3 months ago
How long have you lived in Europe, and in which countries exactly?
@sebastianguerrero5626 · 3 months ago
nice content, keep it up!
@TheMLTechLead · 3 months ago
Thanks, will do!
@EmpreendedoresdoBEM · 3 months ago
very clear explanation. thanks
@naatcollections7976 · 3 months ago
I like your channel
@TheMLTechLead · 3 months ago
Thank you!
@godzilllla2452 · 3 months ago
I've got it now. I wonder why we can't calculate the x gradient by starting the backward pass closer to x instead of going through all the activations.
@TheMLTechLead · 3 months ago
I am not sure I understand the question.
@mateuszsmendowski2677 · 3 months ago
One of the best explanations on YouTube. Substantively and visually at the highest level :) Are you able to share those slides, e.g. via Git?
@TheMLTechLead · 3 months ago
I cannot share the slides, but you can see the diagrams in my newsletter: newsletter.theaiedge.io/p/understanding-the-self-attention
@zeeshankhanyousafzai5229 · 3 months ago
❤
@milleniumsalman1984 · 3 months ago
too good
@milleniumsalman1984 · 3 months ago
great video
@milleniumsalman1984 · 3 months ago
good video
@Snerdy0867 · 3 months ago
Phenomenal visuals and explanations. Best video on this concept I've ever seen.
@TheMLTechLead · 3 months ago
I like reading that!
@IkhukumarHazarika · 3 months ago
Is it an RNN? 😅
@IkhukumarHazarika · 3 months ago
Love the way you teach every point. Please keep teaching this way!
@IkhukumarHazarika · 3 months ago
More good content indeed, good one ❤
@AbuzarbhuttaG · 3 months ago
💯💯💯
@faysoufox · 3 months ago
Thank you for your videos.
@math_in_cantonese · 3 months ago
I will use your videos as an interview refresher... It is so easy to forget the details when everyday work floods in over a period of years.
@TheMLTechLead · 3 months ago
I am glad to read that!
@math_in_cantonese · 3 months ago
Thanks, I forgot some details about gradient boosted algorithms and I was too lazy to look them up.
@vivek2319 · 3 months ago
Please make more videos!
@TheMLTechLead · 3 months ago
Well, I do!
@jairjuliocc · 3 months ago
Thank you. Can you explain the entire self-attention flow (from positional encoding to final next-word prediction)? I think it will be an entire series 😅
@TheMLTechLead · 3 months ago
It is coming! It will take time.
@CrypticPulsar · 3 months ago
Thank you, Damien!!
@va940 · 3 months ago
Very good advice ❤
@va940 · 3 months ago
Awesome
@elmoreglidingclub3030 · 3 months ago
Excellent!! Very good explanation. I need to work on my ear for French, but pausing and backing up the video helped. Great stuff!!
@TheMLTechLead · 3 months ago
My accent and my speaking skills are my weaknesses. I am working on them and I think I am improving!
@elmoreglidingclub3030 · 3 months ago
@TheMLTechLead Thanks for your reply, but absolutely no apology necessary!! I think it is an excellent video with helpful information. Much appreciation for posting. I am a professor in a business school, always looking for insights into how to teach the technical side of technology in the context of business. Your explanation has been very helpful.
@Gowtham25 · 3 months ago
It's really good and useful... Expecting a "training an LLM from scratch" video next, and interested in the KAN-former...
@astudent8885 · 3 months ago
ML is a black box, but boosting seems (potentially) more interpretable if we can make the trees more sparse and orthogonal.
@TheMLTechLead · 3 months ago
Tree-based methods can naturally be used to compute Shapley values without approximation: shap.readthedocs.io/en/latest/tabular_examples.html
@astudent8885 · 3 months ago
Do you mean that the new tree is predicting the error? In that case, wouldn't you subtract the new prediction from the previous predictions?
@TheMLTechLead · 3 months ago
So we have an ensemble of trees F that predicts y, such that F(x) = ŷ. The residual is e = y - F(x). We add a tree T that predicts this residual: T(x) = ê = e + error = y - F(x) + error. Therefore F(x) + T(x) = y + error, so the new tree's prediction is added, not subtracted.
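The residual-fitting idea above can be sketched in a few lines of NumPy. This is a toy illustration, not any particular library's implementation: one-split "stumps" stand in for real trees, and the data, learning rate, and function names are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = np.sin(x) + 0.1 * rng.normal(size=200)

def fit_stump(x, residual):
    """One-split 'tree': the residual mean on each side of the best threshold."""
    best = None
    for t in np.linspace(-3, 3, 61):
        left = x <= t
        if left.all() or not left.any():
            continue
        pred = np.where(left, residual[left].mean(), residual[~left].mean())
        sse = ((residual - pred) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, residual[left].mean(), residual[~left].mean())
    _, t, left_val, right_val = best
    return lambda x: np.where(x <= t, left_val, right_val)

# Boosting loop: each new tree T fits the current residual e = y - F(x),
# and its prediction is ADDED to the ensemble: F <- F + lr * T(x)
F = np.zeros_like(y)
lr = 0.5
mse = []
for _ in range(50):
    tree = fit_stump(x, y - F)
    F = F + lr * tree(x)
    mse.append(((y - F) ** 2).mean())

print(mse[0], mse[-1])  # training error shrinks as trees are added
```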