Amazing content! I almost didn't click on the video because of the title "Intro to ml sd" but I'm glad I watched it to learn about the complexities of the facebook-friends recommender design. I came for an intro but got the real content I was seeking. Thanks!
@bhaskartripathi · 11 days ago
Great video! But I would have loved it if you had also spent a minute on why float32 vs. bfloat16 is used in backpropagation. Still brilliant as always!
@chrisogonas · 11 days ago
Incredibly useful! Thanks Damien.
@Zoronoa01 · 17 days ago
Thank you for this great explanation!
@AnshumanAwasthi-kd7qx · 1 month ago
I clicked to learn, but you're speaking Embeddings. Speak English 😂
@MahSan-nv4jv · 1 month ago
Solid explanation, and the problem is well explored. Please share diverse problems when possible. Thank you so much!
@rachadlakis1 · 1 month ago
Thanks
@subodhsharma5097 · 1 month ago
Very insightful!
@nikhilshaganti5585 · 1 month ago
Thank you for the video. In the code at 06:20, shouldn't we deduct 1 from cumcount to make sure we are not counting the current row?
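On the cumcount question: a minimal pandas sketch (toy data, not the video's actual code) suggests no subtraction is needed, because `groupby().cumcount()` is zero-based and therefore already excludes the current row:

```python
import pandas as pd

# Toy event log: three events for user "a", one for user "b"
df = pd.DataFrame({"user": ["a", "a", "b", "a"]})

# cumcount returns, for each row, the number of PREVIOUS rows
# in the same group (0 for the first occurrence), so the current
# row is never counted and no "- 1" is required.
df["prior_events"] = df.groupby("user").cumcount()
print(df["prior_events"].tolist())  # [0, 1, 0, 2]
```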
@Pedritox0953 · 1 month ago
Great video!
@WillMoody-crmstorm · 1 month ago
Holy moly. Thank you. I thought these concepts were beyond me until watching this video. You have a serious gift for explanation
@beincheekym8 · 1 month ago
thank you for the clear and concise video!
@madhu819-j6o · 2 months ago
How do you convert a decimal number to bfloat16 format in Verilog?
@TheMLTechLead · 2 months ago
I have no idea!
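For reference on the question above: not Verilog, but a Python sketch of the standard float32 → bfloat16 conversion (keep the top 16 bits, round-to-nearest-even), which maps directly onto a hardware implementation. The function name is illustrative:

```python
import struct

def float_to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x (round-to-nearest-even)."""
    # Reinterpret the float32 bit pattern as a 32-bit unsigned integer
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    # bfloat16 keeps the top 16 bits of float32; adding 0x7FFF plus the
    # lowest kept bit before truncating implements round-to-nearest-even
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

print(hex(float_to_bfloat16_bits(1.0)))   # 0x3f80
print(hex(float_to_bfloat16_bits(-2.5)))  # 0xc020
```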
@bougfou972 · 2 months ago
Wow, very clear explanation. Thank you very much for this format (much clearer than a Medium article).
@math_in_cantonese · 2 months ago
I have a question: for pos=0 and "horizontal_index"=2, shouldn't it be PE(pos, 2) = sin(pos / 10000^(2/d_model))? I believe you used the same symbol "i" for two different ways of indexing, right? 7:56
@TheMLTechLead · 2 months ago
Yeah, you are right; I realized I made that mistake. I need to reshoot it.
@AlainDrolet-e4z · 9 days ago
Thank you Damien, and math_in_cantonese. I'm in the middle of writing a short article discussing positional encoding. Damien, feel proud that you are the first reference I quote in the article!

I was just going crazy trying to nail down the exact meaning of "i". In Damien's video it is clear he means "i" as the dimension index, and the values shown with sin/cos match. But I could not reconcile that reading with the equation formulation below:

PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))

If PE(pos, 0) refers to the first column (column zero) and, say, PE(pos, 5) refers to the sixth column (column 5), then 5 = 2i+1 => i = (5-1)/2 = 2. So "i" is more like the index of a (sin, cos) pair of dimensions, and its range is d_model/2.

The original sin (😄, pun intended) is in "Attention Is All You Need". There they simply state:

> where pos is the position and i is the dimension

This seems wrong: 2i and 2i+1 are the dimensions.

In any case, a big thank you Damien. I have watched many of your videos; they are quite useful in ramping me up on LLMs and the rest.
Merci beaucoup
Alain
@TemporaryForstudy · 2 months ago
Nice, but I have one doubt: how does adding sine and cosine values ensure that we are encoding the positions? How did the authors come to this conclusion, and why not other values?
@TheMLTechLead · 2 months ago
The sine and cosine functions provide smooth and continuous representations, which help in learning relative positions effectively. For example, the encodings for positions k and k+1 will be similar, reflecting their proximity in the sequence.

The frequency-based sinusoidal functions allow the encoding to generalize to sequences of arbitrary length without needing to re-learn positional information for different sequence lengths. The model can understand relative positions beyond the length of sequences seen during training.

The combination of sine and cosine functions ensures that each position has a unique encoding, and the near-orthogonality of these functions helps in distinguishing between different positions effectively, even for long sequences.

The different frequencies used in the positional encodings allow the model to capture both short-term and long-term dependencies within the sequence: higher-frequency components help in understanding local relationships, while lower-frequency components help in capturing global structure.

Also, sinusoidal functions are differentiable, which is crucial for backpropagation during training. This ensures that the model can learn to use the positional encodings effectively through gradient-based optimization methods.
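The uniqueness and smoothness properties described above can be checked numerically. A minimal NumPy sketch of the standard sinusoidal encoding (function name and sizes are illustrative):

```python
import numpy as np

def positional_encoding(max_pos, d_model):
    """PE[pos, 2i]   = sin(pos / 10000**(2i/d_model))
       PE[pos, 2i+1] = cos(pos / 10000**(2i/d_model))"""
    pe = np.zeros((max_pos, d_model))
    i = np.arange(d_model // 2)               # (sin, cos) pair index
    freq = 1.0 / 10000 ** (2 * i / d_model)   # one frequency per pair
    pos = np.arange(max_pos)[:, None]
    pe[:, 0::2] = np.sin(pos * freq)
    pe[:, 1::2] = np.cos(pos * freq)
    return pe

pe = positional_encoding(100, 64)
# Smoothness: neighboring positions end up closer than distant ones
d_near = np.linalg.norm(pe[10] - pe[11])
d_far = np.linalg.norm(pe[10] - pe[50])
print(d_near, d_far)
```

A nice side effect of the (sin, cos) pairing is that the distance between two encodings depends only on the offset |p - q|, not on the absolute positions.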
@do-yeounlee7202 · 2 months ago
Thanks for the clear explanation. I've watched a few of your videos and follow you on LinkedIn, and I can say that you're killing it, brother. I also love the simplicity of the infographics in your videos. Do you get them from elsewhere or do you make them yourself?
@TheMLTechLead · 2 months ago
I make them myself. It takes me most of my time!
@do-yeounlee7202 · 2 months ago
@TheMLTechLead Respect! What do you use to make them?
@TheMLTechLead · 2 months ago
@do-yeounlee7202 I use canva.com
@karnaghose4784 · 2 months ago
Great explanation 👍🏻
@bassimeledath2224 · 2 months ago
Excellent. Good ML system design videos are hard to find on YouTube, so I really appreciate this!
@TheMLTechLead · 2 months ago
Glad it was helpful!
@adityagupta4465 · 2 months ago
Really well explained. You've earned a subscriber 🎉
@passportkaya · 3 months ago
Not really. I'm a US citizen and have been all over Europe. I'd say it's the same.
@TheMLTechLead · 3 months ago
How long have you lived in Europe, and in which countries exactly?
@sebastianguerrero5626 · 3 months ago
nice content, keep it up!
@TheMLTechLead · 3 months ago
Thanks, will do!
@EmpreendedoresdoBEM · 3 months ago
very clear explanation. thanks
@naatcollections7976 · 3 months ago
I like your channel
@TheMLTechLead · 3 months ago
Thank you!
@godzilllla2452 · 3 months ago
I've got it now. I wonder why we can't calculate the x gradient by starting the backward pass closer to x instead of going through all the activations.
@TheMLTechLead · 3 months ago
I am not sure I understand the question.
@mateuszsmendowski2677 · 3 months ago
One of the best explanations on YouTube. Substantively and visually at the highest level :) Are you able to share those slides, e.g. via Git?
@TheMLTechLead · 3 months ago
I cannot share the slides, but you can see the diagrams in my newsletter: newsletter.theaiedge.io/p/understanding-the-self-attention
@zeeshankhanyousafzai5229 · 3 months ago
❤
@milleniumsalman1984 · 3 months ago
too good
@milleniumsalman1984 · 3 months ago
great video
@milleniumsalman1984 · 3 months ago
good video
@Snerdy0867 · 3 months ago
Phenomenal visuals and explanations. Best video on this concept I've ever seen.
@TheMLTechLead · 3 months ago
I like reading that!
@IkhukumarHazarika · 3 months ago
Is it an RNN? 😅
@IkhukumarHazarika · 3 months ago
Love the way you teach every point. Please keep teaching this way!
@IkhukumarHazarika · 3 months ago
More good content indeed, good one ❤
@AbuzarbhuttaG · 3 months ago
💯💯💯
@faysoufox · 3 months ago
Thank you for your videos.
@math_in_cantonese · 3 months ago
I will use your videos as an interview refresher... It is so easy to forget the details when everyday work floods in over a period of years.
@TheMLTechLead · 3 months ago
I am glad to read that!
@math_in_cantonese · 3 months ago
Thanks, I forgot some details about gradient boosted algorithms and I was too lazy to look them up.
@vivek2319 · 3 months ago
Please make more videos!
@TheMLTechLead · 3 months ago
Well, I do!
@jairjuliocc · 3 months ago
Thank you. Can you explain the entire self-attention flow (from positional encoding to final next-word prediction)? I think it will be an entire series 😅
@TheMLTechLead · 3 months ago
It is coming! It will take time.
@CrypticPulsar · 3 months ago
Thank you, Damien!!
@va940 · 3 months ago
Very good advice ❤
@va940 · 3 months ago
Awesome
@elmoreglidingclub3030 · 3 months ago
Excellent!! Very good explanation. I need to work on my ear for French, but pausing and backing up the video helped. Great stuff!!
@TheMLTechLead · 3 months ago
My accent and my speaking skills are my weaknesses. I am working on them and I think I am improving!
@elmoreglidingclub3030 · 3 months ago
@TheMLTechLead Thanks for your reply, but absolutely no apology necessary!! I think it is an excellent video with helpful information. Much appreciation for posting. I am a professor in a business school, always looking for insights into how to teach the technical side of technology in the context of business. Your explanation has been very helpful.
@Gowtham25 · 3 months ago
It's really good and useful... Expecting a "training an LLM from scratch" video next, and interested in the KAN-former...
@astudent8885 · 3 months ago
ML is a black box, but boosting seems (potentially) more interpretable if we can make the trees more sparse and orthogonal.
@TheMLTechLead · 3 months ago
Tree-based methods can naturally be used to compute Shapley values without approximation: shap.readthedocs.io/en/latest/tabular_examples.html
@astudent8885 · 3 months ago
Do you mean that the new tree is predicting the error? In that case, wouldn't you subtract the new prediction from the previous predictions?
@TheMLTechLead · 3 months ago
So we have an ensemble of trees F that predicts y, such that F(x) = ŷ. The residual is e = y - F(x). We add a tree T that predicts this residual: T(x) = ê = e + error = y - F(x) + error. Therefore F(x) + T(x) = y + error, so the new tree's prediction is added, not subtracted.
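The residual-fitting idea above can be sketched in a few lines of NumPy. This is a toy illustration, not any particular library's implementation: one-split "stumps" stand in for real trees, and the data, learning rate, and function names are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = np.sin(x) + 0.1 * rng.normal(size=200)

def fit_stump(x, residual):
    """One-split 'tree': the residual mean on each side of the best threshold."""
    best = None
    for t in np.linspace(-3, 3, 61):
        left = x <= t
        if left.all() or not left.any():
            continue
        pred = np.where(left, residual[left].mean(), residual[~left].mean())
        sse = ((residual - pred) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, residual[left].mean(), residual[~left].mean())
    _, t, left_val, right_val = best
    return lambda x: np.where(x <= t, left_val, right_val)

# Boosting loop: each new tree T fits the current residual e = y - F(x),
# and its prediction is ADDED to the ensemble: F <- F + lr * T(x)
F = np.zeros_like(y)
lr = 0.5
mse = []
for _ in range(50):
    tree = fit_stump(x, y - F)
    F = F + lr * tree(x)
    mse.append(((y - F) ** 2).mean())

print(mse[0], mse[-1])  # training error shrinks as trees are added
```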