Neural Networks for beginners: How to write general notation!

252 views

jen foxbot

A day ago

We're learning more about the math behind neural networks, the foundation of Large Language Models (LLMs), the type of AI model that powers ChatGPT!
This is part 3, where we learn how to write the general notation for a neural network of any size!
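For anyone following along at the keyboard, here is a minimal sketch in Python/NumPy of the standard feedforward notation for a network of any size: a^(1) = x, then a^(l+1) = g(Θ^(l) a^(l)) for each layer l, where g is the sigmoid activation. The layer sizes, weights, and variable names below are made-up examples for illustration, not necessarily the exact ones used in the video.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, thetas):
    """Forward propagation for a network of any size.

    x      : input vector, shape (n,)
    thetas : list of weight matrices; thetas[l] maps layer l to layer l+1
             and has shape (units in layer l+1, units in layer l + 1),
             where the extra column handles the bias unit.
    Returns h_theta(x), the hypothesis (prediction).
    """
    a = x
    for theta in thetas:
        a = np.concatenate(([1.0], a))   # prepend the bias unit a_0 = 1
        a = sigmoid(theta @ a)           # a^(l+1) = g(Theta^(l) a^(l))
    return a

# Example: a 3-4-1 network (3 inputs, one hidden layer of 4 units, 1 output)
rng = np.random.default_rng(0)
thetas = [rng.standard_normal((4, 4)),   # layer 1 -> 2: 4 units, 3 inputs + bias
          rng.standard_normal((1, 5))]   # layer 2 -> 3: 1 unit, 4 units + bias
print(forward(np.array([0.5, -1.2, 2.0]), thetas))
```

Because the loop just repeats the same rule for every weight matrix in the list, the same function covers a network with 2 layers or 200: that's the point of writing the notation generally.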

Comments: 5
@dc1049
@dc1049 3 months ago
Really enjoying the content and I appreciate your approach in making it conceptually accessible! I'm going to have follow-up questions soon, but first I'd like to chew on it.
@mininao
@mininao 4 months ago
wow, super clear and enlightening! thank you so much for making this video :)!
@JenFoxBot
@JenFoxBot 4 months ago
yayyy so glad to hear!! thank you for sharing :D
@andresj5512
@andresj5512 4 months ago
I'm not going to lie, since I'm a potato when it comes to maths, everything after the addition of layer 4 is a blurry hieroglyph hahaha. My question is: we have a number of inputs on layer 1, with a number of layers doing calculations on how influential the output of the previous layer is to the result (hope this is correct). How do we avoid getting all the results at once? What is the function that "filters" undesired results? From what I got, we are only saying, from 0 to 1, how strong the influence from the previous layer is, i.e. the result should be closer to "this". But if we are not discarding any inputs, it's hard for me to see how the result could be anything other than the neural network's training +/- some difference according to the input. I know I'm missing much of the "real work" that is done inside the artificial neuron, but do we always take its result into the next layer, or not necessarily? Thanks for the video!
@JenFoxBot
@JenFoxBot 4 months ago
If I understand your question correctly, you're wondering how we get to a single prediction from all those big layers? The output of a neural network is that h_theta(x) function --> this is the prediction of our neural network, the hypothesis function. For example, for LLMs, the prediction is "what is the next most likely word?". During training, we train the hypothesis function over MANY rounds, or epochs (typically >>1000). It starts out very, very inaccurate, and then the full training program does gradient descent (i.e., is the next prediction better or worse than the last? If better, keep going in that direction; if worse, try a different direction). After training, we test the hypothesis function on a new dataset to check accuracy. If it's good, we may release it into the world! But yes, a lot of complicated layers go into the hypothesis function, which then outputs a single number between 0 and 1. Simplifying a bit, but hopefully that helps. Plz LMK if I did not capture your question!
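To make the "many rounds of gradient descent" idea concrete, here is a minimal sketch for the simplest single-output case: a sigmoid output trained on a made-up toy dataset. The dataset, learning rate, and epoch count are all illustrative assumptions, not anything from the video.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy dataset: learn the rule "is x1 + x2 > 1?"
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(200, 2))
y = (X[:, 0] + X[:, 1] > 1.0).astype(float)

Xb = np.hstack([np.ones((200, 1)), X])   # prepend a bias column of 1s
theta = np.zeros(3)                      # weights, including the bias term

alpha = 0.5                              # learning rate
for epoch in range(2000):                # many rounds (epochs) of training
    h = sigmoid(Xb @ theta)              # h_theta(x): predictions in (0, 1)
    grad = Xb.T @ (h - y) / len(y)       # gradient of the logistic cost
    theta -= alpha * grad                # step "downhill" along the gradient

# After training, h_theta(x) outputs a single number between 0 and 1
print(sigmoid(np.array([1.0, 0.9, 0.8]) @ theta))  # should be close to 1
```

The loop is the whole story in miniature: the predictions start out poor (theta is all zeros), and each epoch nudges the weights in whichever direction reduces the error, exactly the "better or worse than last time" intuition above.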
Neural Networks for beginners: cost function!
18:39
jen foxbot
626 views
Neural Networks for beginners: intro to back propagation!
16:45
Shocking Graduation Speech 😳📽️@CarrolltonTexas
00:43
Глеб Рандалайнен
11M views
Can you beat this impossible game?
00:13
LOL
56M views
Want some tea? #чайбудешь
00:14
ПАРОДИИ НА ИЗВЕСТНЫЕ ТРЕКИ
2M views
neural networks for beginners: generalized backprop!
15:51
jen foxbot
967 views
Transistors in 100 Seconds
1:40
V Electronics
3.7K views
the most valuable skill you can learn!
1:54
jen foxbot
494 views
Quantum Connectedness!
3:33
jen foxbot
949 views
yayy for radical self acceptance!!
2:39
jen foxbot
165 views
Physical constants and the miracle of life!
3:03
jen foxbot
126 views
Google Data Center 360° Tour
8:29
Google Cloud Tech
5M views