"What's the point of going fast if you don't know the direction?" seriously amazing advice!
@cedricvillani8502 Жыл бұрын
That's what makes racing so much fun
@Verrisin Жыл бұрын
running away
@rafagd Жыл бұрын
He's stochastically gradient-descending himself.
@lolcat6910 ай бұрын
hearing this just after I've fallen off my skate lmfao, actually good advice
@flobuilds Жыл бұрын
4 hours of fun let's go
@eboubaker3722 Жыл бұрын
Go where, go where??
@Humble_Electronic_Musician11 ай бұрын
My Favorite quote: 2:46:46 "It is not difficult per se: it is complicated"
@pauleagle97 Жыл бұрын
After around 7 days of digesting the calculus, designing and implementing an algorithm (Python), I got it all working today. It outperformed the finite difference computation, yielding the same cost reduction 30 times faster! I learned so much and so fast, I could hardly imagine any university giving me that. Your contribution to the ML community is terrific. Thank you lad, and I can't wait to see the rest of your streams dedicated to this fascinating topic 💪
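For readers curious about that comparison, here is a minimal, self-contained sketch of the finite-difference approach on a one-weight toy model (names such as cost, eps, and rate are illustrative and not taken from the stream's code). Finite differencing needs extra cost evaluations per parameter per update, which is why the analytic backprop gradient pulls ahead so dramatically.

```c
// Minimal sketch: fit y = 2*x with a single weight using a
// finite-difference estimate of the gradient of the MSE cost.
#include <stdio.h>

static float xs[] = {1, 2, 3, 4};
static float ys[] = {2, 4, 6, 8};
#define N 4

// Mean squared error of the one-weight model y ~= w*x.
static float cost(float w)
{
    float c = 0.0f;
    for (int i = 0; i < N; ++i) {
        float d = w*xs[i] - ys[i];
        c += d*d;
    }
    return c / N;
}

int main(void)
{
    float w    = 0.5f;   // arbitrary initial weight
    float eps  = 1e-3f;  // finite-difference step
    float rate = 1e-2f;  // learning rate
    for (int i = 0; i < 1000; ++i) {
        // dC/dw approximated as (C(w+eps) - C(w)) / eps.
        float dw = (cost(w + eps) - cost(w)) / eps;
        w -= rate*dw;
    }
    printf("w = %f, cost = %f\n", w, cost(w)); // w converges towards 2
    return 0;
}
```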
@chillydill47038 ай бұрын
Big props to tsoding for drawing out how things actually work. Excellent teaching!
@yc699bk Жыл бұрын
Andrej Karpathy has a great explanation on how backprop works
@flickering.embers Жыл бұрын
Thanks for your incredible videos, as always! Just one thing: the approach you took for deriving the backpropagation formula (defining intermediate cost functions) is very interesting, but I think there were some mistakes along the way. At 2:10:00 e_i is left out of the differentiation on the grounds that it is a constant. But according to the definition of e_i, its value depends on the activation values of layer 2, which in turn depend on the weights and biases of the current layer 1, so it cannot be a constant. Since a_i^(1) - e_i = ∂C^(2)/∂a_i^(1), the correct way to differentiate a_i^(1) - e_i with respect to w^(1) would be to compute the second partial derivative ∂²C^(2)/∂w^(1)∂a_i^(1) (which immediately shows that the result of this computation cannot coincide with the one based on the chain rule, since the latter only consists of joining first partial derivatives together). If we carry on with the calculation, ∂C^(2)/∂a_i^(1) has a bunch of a_i^(2)'s multiplied together, so for each term we would have to plug in ∂a_i^(2)/∂w^(1) = a_i^(2) * (1 - a_i^(2)) * w^(2) * a_i^(1) * (1 - a_i^(1)) * x_i. I have not followed this line of thought through, but I suspect the result would be fairly complicated. Anyway, if my calculations are correct, the formula actually used in the code differs from the conventional one by a factor of two. So for a neural network with n layers, I would predict that the weights and biases of neurons in the first layer will be updated roughly 2^n times more rapidly than they would have been if the conventional formula had been used. I didn't have time to test it out, but it would be interesting to actually see how this variant formula performs compared to the conventional one.
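For reference, a sketch of the conventional chain-rule expression the comment compares against, in generic notation with a_i^{(l)} = σ(z_i^{(l)}) and z_i^{(l)} = Σ_k w_{ik}^{(l)} a_k^{(l-1)} + b_i^{(l)}; this is not lifted from the stream's grad.tex.

```latex
% Conventional backprop: derivative of the cost with respect to one weight
% of layer l, built purely out of first partial derivatives.
\frac{\partial C}{\partial w_{ik}^{(l)}}
  = \frac{\partial C}{\partial a_i^{(l)}}\,
    \sigma'\!\left(z_i^{(l)}\right)\,
    a_k^{(l-1)}
```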
@Bolvarsdad Жыл бұрын
As a first year in applied AI, I can't wait until the day when I can read what the fuck you just said and be like "Oh yeah that's entirely sensible". I'm taking Calc 2 and Calc 3 next year as elective courses to better understand third year material such as gradients, hats off to you sir.
@TsodingDaily Жыл бұрын
Feel free to submit a patch: github.com/tsoding/nn.h/blob/63ae4a1ba3396215be67196b789593e4c56f9139/nn.h#L306-L360 And also correct the tex file: github.com/tsoding/ml-notes/blob/fd64cea05648e1480b6f9556a887237def0c6300/papers/grad.tex
@Сергей-ц2т2х Жыл бұрын
Definitely the best programming channel on YouTube. You have helped me understand a great many things and shown me a new way of programming. Thank you! P.S. I get the impression that you speak Russian.
@foxcirc Жыл бұрын
Your way of explaining things is great and I really learned a lot about derivatives (and LaTeX) in this video. Love your content and community ❤
@DGHere12 Жыл бұрын
I was never able to grasp this concept before; from this video I have learnt a lot in depth. Your videos are awesome.
@sharokhkeshawarz2122 Жыл бұрын
Bro, I just started learning machine learning a few days ago, this is gonna be helpful!
@RickeyBowers Жыл бұрын
Thank you for posting to the tubes - I missed this stream.
@anthonypace535410 ай бұрын
Great content! But there is one more step needed at 41:10: you could remove many superfluous multiplications if you move the constant 2 outside the sum, so 1/n becomes 2/n. I watched later videos where you address that people told you to remove the 2, but that doesn't seem necessary, as you will be modifying the outside constant with the learning rate anyways, and topped with a different activation function I think you are doing fine. Again, great content! Your personality is awesome!
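In symbols, a sketch with a generic weight w and MSE over n samples:

```latex
% The constant 2 produced by differentiating the square factors out of the sum.
C = \frac{1}{n}\sum_{i=1}^{n}\left(a_i - y_i\right)^2
\;\Rightarrow\;
\frac{\partial C}{\partial w}
  = \frac{1}{n}\sum_{i=1}^{n} 2\left(a_i - y_i\right)\frac{\partial a_i}{\partial w}
  = \frac{2}{n}\sum_{i=1}^{n} \left(a_i - y_i\right)\frac{\partial a_i}{\partial w}
```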
@jdobdob894711 ай бұрын
I love learning from raw code, especially in C. Awesome.
@CodePhiles Жыл бұрын
The journey through this was amazing, even when I got lost. Thank you for your effort and insights; looking forward to finishing watching and understanding the whole series.
@mohammedalhaddar2182 Жыл бұрын
It was the first time in my life I've seen a GCC optimization give that big of a performance jump! I can't believe I'm learning all of this for free.
@ecosta Жыл бұрын
My greatest regret from college was not investing enough time to learn LaTeX. The language is ugly, the tools are verbose, but the end result is so amazing. No word processor could describe math (or science in general) in such an elegant way while providing you with full control. And I feel really sorry for whoever has to use the "Web 2.0" tools to write their college stuff nowadays.
@ffuuzzuu5943 Жыл бұрын
If you still wanna learn a language with the power of LaTeX, but without the verbosity and ugliness, have a look at Typst. It's newer, and obviously far less widely used, but it addresses many of LaTeX's problems, and its syntax is quite nice.
@623-x7b Жыл бұрын
You can copy and paste equations from Symbolab into math mode if that makes it easier.
@raphaelmorgan2307 Жыл бұрын
my spouse is a math major rn and they're using LaTeX! And for both of our stats classes we're using R and markdown lol
@eukaryote0 Жыл бұрын
Pretending to be a non-scientist: successfully failed.
@lkda01 Жыл бұрын
Wow, I realized that the usage of ++i instead of i++ simply makes both the engineer and the mathematician happy. Love it!
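A tiny sketch of why the choice is purely stylistic in this kind of loop (standard C, nothing from the stream's code): the for statement discards the value of the increment expression, so both forms iterate identically.

```c
#include <stdio.h>

int main(void)
{
    // The for loop throws away the value of the increment expression,
    // so ++i and i++ behave the same here; ++i simply never asks for
    // the old value in the first place.
    for (size_t i = 0; i < 3; ++i) printf("pre  %zu\n", i);
    for (size_t i = 0; i < 3; i++) printf("post %zu\n", i);
    return 0;
}
```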
@bakaneko39 Жыл бұрын
Very good video! Would love to see you re-implement this with BLAS!
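For anyone curious what that would roughly look like, a hedged sketch of a single-precision matrix product through the CBLAS interface (this is not nn.h's API; the link flag, e.g. -lopenblas, depends on which BLAS implementation is installed):

```c
// Row-major sgemm: C = alpha*A*B + beta*C, with A (M x K) and B (K x N).
#include <stdio.h>
#include <cblas.h>

int main(void)
{
    enum { M = 2, K = 3, N = 2 };
    float A[M*K] = {1, 2, 3,
                    4, 5, 6};
    float B[K*N] = {1, 0,
                    0, 1,
                    1, 1};
    float C[M*N] = {0};

    // C = 1.0*A*B + 0.0*C
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                M, N, K, 1.0f, A, K, B, N, 0.0f, C, N);

    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) printf("%6.1f ", C[i*N + j]);
        printf("\n");
    }
    return 0;
}
```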
@blacklistnr1 Жыл бұрын
When you said "weights and biases" I was expecting an ad and was surprised you have a sponsor... Can back-prop de-ad my brain? :))
@FirstnameLastname-cw1my Жыл бұрын
Sir, your videos are really good. I am not into machine learning right now and still I am getting everything. Thanks a lot for the quality content... Have a great day :]
@joseppratmonell4969 Жыл бұрын
Question 1: in the backprop method, why do you not divide the activations (.as) by n too, as you do for the weights and biases? Question 2: should we divide by (n-j) instead of just n?
@ratchet1freak Жыл бұрын
1. Because 3b1b said so, and if you work them out you will find that each activation is actually an input to every neuron of the next layer, so it makes sense not to diminish their contribution.
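In symbols, a sketch of that accumulation (generic notation, not the names used in nn.h): the gradient with respect to an activation is a plain sum over the next layer's neurons, with no averaging.

```latex
% Each neuron j of layer l+1 receives a_k^{(l)}, so its contributions add up.
\frac{\partial C}{\partial a_k^{(l)}}
  = \sum_{j} \frac{\partial C}{\partial a_j^{(l+1)}}\,
    \sigma'\!\left(z_j^{(l+1)}\right)\,
    w_{jk}^{(l+1)}
```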
@dohyun0047 Жыл бұрын
Thanks for the awesome video and quality. Everything made more sense when I watched the 3b1b video you mentioned and had an ah-ha moment. Thank you so much~!!!!! Really appreciate it.
@AhmedRefaat532 Жыл бұрын
I've been following you since the Haskell days
@anon_y_mousse Жыл бұрын
The thing that stuck out most in the entire stream was something you said early on about everyone using cheat sheets. My thought on why teachers don't like them is that they're trying to teach you not only how to do it but to understand it, and if you don't need to refer to the cheat sheet to do it then presumably you understand how it works. In other words, the best cheating method is to memorize it. It just so happens to have the added benefit of making you faster too. As for the math, I've got an idea I'd like to try about reorganizing the graphs as it processes that *might* enable faster growth. "Might" being the keyword as it's probably a stupid idea. I really should get a GitHub account so that I could post things and share.
@galbalandroid Жыл бұрын
The problem is, most teachers don't teach in an intuitive way but rather just throw definitions at you, so it's your responsibility to find a way to understand it intuitively, which is why most just stick to cheat sheets.
@anon_y_mousse Жыл бұрын
@@galbalandroid Indeed, and there's nothing wrong with that, in my opinion, but it makes sense for teachers to want you to understand the material, whether *they* have any idea how to impart that knowledge or not. Also, if you understand the material you should have all of what's on the cheat sheet memorized and thus be faster at solving the equations.
@TsodingDaily Жыл бұрын
> if you understand the material you should have all of what's on the cheat sheet memorized No.
@stewartzayat7526 Жыл бұрын
But differentiating functions usually boils down to a lot of mechanical work for which you do not need any understanding of what differentiating is, but do need a lot of formulas and rules. You either have those memorized, which comes naturally if you differentiate enough functions, or you don't have them memorized, in which case you look at a cheat sheet and arrive at the same result. There is no benefit to having them memorized other than that you are faster. You don't need to memorize those formulas to understand what a derivative is.
@anon_y_mousse Жыл бұрын
@@stewartzayat7526 If you truly understand it then you can derive those formulae yourself, and in such cases you don't need the cheat sheet. Faster is always better, memorizing is the ultimate cheat.
@Anonymous-fr2op9 ай бұрын
3:30:18 idk why tf j+bits in the second assignment crashes my program, I'm using C++ btw. If anybody has any idea why, please explain. Cuz imo the index is perfectly in bounds as well, yet it just doesn't wanna work.
@joseppratmonell4969 Жыл бұрын
Since a_i^(0) = x_i and x_i is a constant, the partial derivative through a_i^(0) ends up being all constants, which is 0. That's why we do not compute it. And thanks to a_i^(n) being compared against y_i, which we know, we can start backpropagation from there. I think I understood something... at least 1% hahaha. The inner layers have to take into account the partial derivatives of the activations, since those, like the weights and biases, are things we do not know, and that's how the layers actually chain with each other, at least in my mind hahaha. And thanks for this awesome series!
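In symbols, a sketch of those two boundary conditions for a per-sample squared-error cost (generic notation with L as the last layer, not the variable names from the code):

```latex
% The inputs are constants, and the cost is defined directly on the last layer.
\frac{\partial a_i^{(0)}}{\partial w} = \frac{\partial x_i}{\partial w} = 0,
\qquad
C = \sum_i \left(a_i^{(L)} - y_i\right)^2
\;\Rightarrow\;
\frac{\partial C}{\partial a_i^{(L)}} = 2\left(a_i^{(L)} - y_i\right)
```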
@postmodernist1848 Жыл бұрын
Omg a long one! Just like the old days
@filipposcaramuzza2953 Жыл бұрын
Thank god, it's the first time I've heard someone else find the derivative notation as low-SNR as I do. In high school we used the good old f', and suddenly at uni those nerds started using d/dx f(x) and ruined our game.
@grawlixes Жыл бұрын
you are legit one of the best teachers I've ever watched. thanks!
@AlguienMas555 Жыл бұрын
I'm keeping this moment for myself: 32:50
@renefernandez360 Жыл бұрын
Before this I did know that matrix multiplication was an integral part of AI, but I never got to understand it the way you are explaining it. But this gets me thinking: isn't AI a heuristic algorithm with extra steps? You have a randomized initial solution that you want to optimize or drive to a local optimum, you need a function to know how good your solution is (objective or cost function), and you randomly make small changes towards a better solution until you reach a certain number of iterations, at which point you have either reached a local optimum or are still heading towards one.
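A minimal sketch of that "heuristic with extra steps" view on a toy one-parameter cost (purely illustrative, nothing here comes from nn.h): random local search keeps only improving moves, while gradient descent keeps the same skeleton but replaces the random perturbation with a step along the negative gradient.

```c
// Random local search: randomized start, a cost function, and small
// changes kept only when they improve the cost.
#include <stdio.h>
#include <stdlib.h>

static float cost(float w) { return (w - 3.0f)*(w - 3.0f); } // minimum at w = 3

int main(void)
{
    srand(42);
    float w = (float)rand()/(float)RAND_MAX;           // randomized initial solution
    for (int i = 0; i < 10000; ++i) {                   // fixed iteration budget
        float step = ((float)rand()/(float)RAND_MAX - 0.5f)*0.1f; // small random change
        if (cost(w + step) < cost(w)) w += step;        // keep only improving moves
    }
    printf("w = %f, cost = %f\n", w, cost(w));          // w ends up near 3
    return 0;
}
```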
@bayesianmonk6 ай бұрын
The law of total derivative is the one you are referring to.
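For reference, the law of total derivative in its usual form: if C depends on w both directly and through intermediate quantities a_1, ..., a_m that themselves depend on w, then

```latex
\frac{dC}{dw}
  = \frac{\partial C}{\partial w}
  + \sum_{j=1}^{m} \frac{\partial C}{\partial a_j}\,\frac{d a_j}{d w}
```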
@bhavik_1611 Жыл бұрын
CPU at a constant 90*+ and a 6.7 GiB porn folder 💀 my guy is up to something
@lucifer-5ybtn Жыл бұрын
Can anybody help me out with one thing? I haven't had the need for "high level math" till now in any of my projects. But I have been looking into AI/ML lately, along with deep learning, and thought that I should pick up math now. My math skills ARE HORRIBLE. It's because my basics aren't clear, as I couldn't care any less about the "education system". I have been thinking about starting math (and physics too) from scratch and doing a self-study kind of thing, in the sense that I wouldn't have to mug up a ton of unnecessary rules and then spit them out in some exam. My sole focus shall be on concepts which I can apply to "real world" stuff. Is there anyone here who can suggest a path for learning math and physics from scratch?
@atharvparlikar8765 Жыл бұрын
I'm no maths expert, but I understand it fairly well as a CS student who's been decent at it since school. I'm also working on a deep learning / autograd library written in Python, so I might be able to help you with neural nets and their ins and outs. As for maths I'm not very confident, but I'll help as much as I can. First off, be very specific about how much maths you know right now, and we'll see how we can go further.
@adama7752 Жыл бұрын
Calculus 101. Just integrals and derivatives.
@anon-fz2bo Жыл бұрын
Do some linear algebra & calculus, & discrete math for theory stuff (graph theory and such). I suck ass too, but I had to learn these things for my CS degree.
@mentalsquirt1423 Жыл бұрын
3blue1brown and statquests
@lucifer-5ybtn Жыл бұрын
@@atharvparlikar8765 No trig, no calculus; basic algebra, basic geometry. But I am good at building logic: I can easily understand what a (programming) problem is and how to come up with a solution. But I'd say assume that I'm a complete beginner who does not know anything more than the 4 basic operations and fractions.
@thedebapriyakar Жыл бұрын
pure legend
@hamzadlm6625 Жыл бұрын
You are a universal blessing upon our lives.
@xhivo97 Жыл бұрын
Since Odin supports matrices at a language level, would that make it a good fit for ML?
@xhivo97 Жыл бұрын
Never mind, I think it only supports matrices up to 16.
@freestyle85 Жыл бұрын
It would be nice to add separate activation functions like ReLU for the inner layers, because a sigmoid activation slows down learning. In practice, more than one hidden layer with sigmoid isn't really used, but with ReLU it can be.
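A minimal sketch of what such a drop-in could look like, as plain C helper functions not tied to nn.h's actual activation handling (compile with -lm for expf):

```c
#include <math.h>

// Sigmoid and its derivative expressed through the activation a = sigmoidf(x);
// the derivative a*(1-a) saturates towards 0 near both ends of the curve.
float sigmoidf(float x)         { return 1.0f / (1.0f + expf(-x)); }
float dsigmoidf_from_a(float a) { return a * (1.0f - a); }

// ReLU and its derivative: the gradient is 1 for positive inputs,
// so stacking hidden layers does not keep shrinking it the way sigmoids do.
float reluf(float x)            { return x > 0.0f ? x : 0.0f; }
float drelu_from_a(float a)     { return a > 0.0f ? 1.0f : 0.0f; }
```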
@JohnWasinger Жыл бұрын
28:50 You speak German too? What are you using to edit your LaTeX document? A Vim plugin perhaps?
@GegoXaren Жыл бұрын
He's using Emacs. He just runs the latexpdf command, and the PDF reader auto updates.
@MrOnePieceRuffy Жыл бұрын
double time stream, poggers
@michaelhaag3367 Жыл бұрын
man, I love you, hope everything is fine where you are❤
@MatthewPherigo Жыл бұрын
Is your screen capture lagging behind your cam and audio? Sometimes you seem to react to something on screen before it appears.
@RahulPawar-dl9wn Жыл бұрын
4:04:09 what is this "-O3" parameter?
@nibrobb Жыл бұрын
Optimization level. -O3 is the most aggressive optimization level; it unrolls for loops and uses lots of tricks to go faster. Read the man pages for detailed info.
@kuyajj68 Жыл бұрын
How to become smart like tsoding?
@GegoXaren Жыл бұрын
Time. Patience. Practice.
@SomeSubhuman Жыл бұрын
Also a couple extra IQ points
@johanngambolputty5351 Жыл бұрын
Oh no, please don't tell me you're calling the glorified chain rule the most important ML algorithm, it hurts my soul. Most important for neural nets perhaps, but obviously not super applicable to other ML techniques, especially if they're non-differentiable.
@TsodingDaily Жыл бұрын
Please, forgive me. That's what you have to do on YouTube to survive.
@johanngambolputty5351 Жыл бұрын
@@TsodingDaily Forgiven :) Also, I don't think the stochastic part of SGD is the "shuffling"; I think it's because you're sampling a subset of your dataset. It's an estimator of the full cost (and the gradient that you calculate is an estimator of the true gradient), which is like an average of costs across individual datapoints anyway. If you like, the stochastic, or random, part is the error from the true cost. Haven't overly looked into it, but you essentially just want an update rule that is robust to this error, I think.
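In symbols, a sketch with C_i denoting the cost on sample i and B a sampled mini-batch:

```latex
% The full gradient is an average over all n samples; a mini-batch estimates it.
\nabla C(w) = \frac{1}{n}\sum_{i=1}^{n} \nabla C_i(w)
\;\approx\;
\frac{1}{|B|}\sum_{i \in B} \nabla C_i(w),
\qquad B \subset \{1,\dots,n\}
```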
@silencedspec1711 Жыл бұрын
Just finished the last video and we have more!
@garygonsalves6956 Жыл бұрын
Thank you very much for the educational videos! Is the source code for each video available somewhere?
@JJacquesLoheac Жыл бұрын
Very interesting. Great job for a non-AI guy.
@__noob__coder__ Жыл бұрын
More machine learning stuff from now on. Please 😅
@gasun1274 Жыл бұрын
this couldn't have come at a better time lol
@AMith-lv2cv Жыл бұрын
Zozi got hacked?! wth, bitcoin support? Is this new on the channel or did I miss smth..
@UnrealCatDev Жыл бұрын
It's kinda off to name carry of, shouldn't it be carry on?
@antropod5 ай бұрын
It is black boxes all the way down
@anashussain3540 Жыл бұрын
Which IDE did you use?
@DuskyDaily Жыл бұрын
Emacs
@DuskyDaily Жыл бұрын
@Numerius because vim is the best editor of all time
@anashussain3540 Жыл бұрын
@@DuskyDaily Thank you, I like your typing, very attractive :)
@spectrespect Жыл бұрын
@Numerius Ed? Oh man, ed is hell.
@XELER53 Жыл бұрын
Khello Khello
@MuhammadJamil-ho6wl Жыл бұрын
He is not writing code, he just copy-pastes from his mind.
@edgaremmanuel319711 ай бұрын
I tried the framework on XOR with 3 inputs.
@edgaremmanuel319711 ай бұрын
It did not work; is there something I am missing?
@edgaremmanuel319711 ай бұрын
I trained the model using 70% of the data.
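A possible explanation, sketched with the full truth table in plain C (this is not nn.h's data format, just an illustration): 3-input XOR is the parity of the bits and has only 8 distinct samples, so training on 70% of them leaves the held-out rows undetermined by the training data, and the network has no reason to get them right.

```c
// Full truth table for 3-input XOR (parity of the three bits): only 8 rows.
// Holding out 30% of these means evaluating on rows the network never saw
// and cannot infer, regardless of the framework used.
float xor3[8][4] = {
    // x1, x2, x3, x1 ^ x2 ^ x3
    {0, 0, 0, 0},
    {0, 0, 1, 1},
    {0, 1, 0, 1},
    {0, 1, 1, 0},
    {1, 0, 0, 1},
    {1, 0, 1, 0},
    {1, 1, 0, 0},
    {1, 1, 1, 1},
};
```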
@lkda01 Жыл бұрын
lol im lost
@alonner-gaon6012 Жыл бұрын
I opened an issue on the GitHub repo which I think will interest you @Tsoding