The intuition for the denoising autoencoder is presented really well: add noise to the original image and recreate the clean image with the AE 💯
@alfcnz4 күн бұрын
😊😊😊
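A minimal sketch of that intuition, assuming a generic fully-connected autoencoder (layer sizes and names are illustrative, not the course's exact model): corrupt the input with Gaussian noise, then train the AE to reconstruct the clean original.

```python
import torch
from torch import nn

# Hypothetical fully-connected autoencoder; sizes are illustrative only.
class DenoisingAE(nn.Module):
    def __init__(self, d_in=784, d_hidden=30):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
        self.decoder = nn.Linear(d_hidden, d_in)

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DenoisingAE()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

y = torch.rand(64, 784)                  # stand-in for a batch of clean images
y_tilde = y + 0.3 * torch.randn_like(y)  # corrupt with additive Gaussian noise

y_hat = model(y_tilde)                   # reconstruct from the noisy input
loss = criterion(y_hat, y)               # compare against the *clean* target
loss.backward()
optimiser.step()
```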
@donthulasumanth541522 күн бұрын
@23:59 The number of trainable params in the decoder is 2: one for w1·cos(z) and one for w2·sin(z), respectively.
@alfcnz19 күн бұрын
Precisely 🙂
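A hedged sketch of such a two-parameter decoder (a guess at the structure discussed around 23:59, not the notebook's actual code): the only learnable quantities are the two scalars scaling cos(z) and sin(z).

```python
import torch
from torch import nn

class TinyDecoder(nn.Module):
    """Maps a scalar latent z to the 2-D point (w1*cos(z), w2*sin(z))."""
    def __init__(self):
        super().__init__()
        self.w1 = nn.Parameter(torch.randn(()))  # trainable scalar
        self.w2 = nn.Parameter(torch.randn(()))  # trainable scalar

    def forward(self, z):
        return torch.stack((self.w1 * torch.cos(z), self.w2 * torch.sin(z)), dim=-1)

decoder = TinyDecoder()
print(sum(p.numel() for p in decoder.parameters()))  # -> 2
```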
@donthulasumanth5415Ай бұрын
53:34 Trying to answer what happened here: after 4 rounds of rotations (linear transformations) and squishing (with ReLU), the final affine transformation leads to the projection of the points onto a straight line. That still isn't a linearly separable space, so I'm missing the intuition on what's special about this piece 🙃
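For anyone who wants to poke at that step outside the notebook, here is a rough, untrained sketch of the structure described above (a toy reproduction with made-up data, not the course's code): four affine-plus-ReLU stages followed by a final affine map, printing the output range after each layer so the squishing is visible.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in for the coloured 2-D points from the notebook (not the actual data).
points = torch.randn(1000, 2)

# Four "rotate and squish" stages (affine map + ReLU), then a final affine map.
net = nn.Sequential(
    *[layer for _ in range(4) for layer in (nn.Linear(2, 2), nn.ReLU())],
    nn.Linear(2, 2),
)

h = points
for i, layer in enumerate(net):
    h = layer(h)
    print(f"layer {i} ({layer.__class__.__name__}): output range "
          f"[{h.min().item():.2f}, {h.max().item():.2f}]")
```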
@hasanrantsАй бұрын
Alfredo, thank you very much for this concrete explanation. Yann's lectures are full of dense knowledge, and I had some doubts and gaps that were filled by this practicum. Appreciated!
@alfcnzАй бұрын
🥳🥳🥳
@donthulasumanth5415Ай бұрын
My guess: the five lines at the end of the video are basis vectors in the unwarped or transformed space, where we can draw lines to separate the colored dots. Support vector machines apply the same principle, but with pre-defined functions, and they are not as sensitive as neural nets.
@alfcnzАй бұрын
Where did I get those vectors? 😀
@donthulasumanth5415Ай бұрын
@@alfcnz comment check
@donthulasumanth5415Ай бұрын
@@alfcnz Ideally, if we consider a 5-D space, there are basis vectors, i.e. unit vectors that tell us which direction to move in that space to reach any given vector. In the 2-D case these are i, j; in a 5-D space they would be i, j, k, l, m. If we start with 2 features, a neural network finds combinations of those two to produce 5 features, and the corresponding weights take us to w1·i, w2·j, w3·k, w4·l, w5·m. I may be wrong or unclear, and I need to rethink this problem at the end of the playlist 😀.
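One hedged way to make the "2 features expanded to 5" idea concrete, assuming the expansion is a plain 2→5 linear layer (my assumption, not something confirmed in the thread): each row of that layer's weight matrix is a vector living in the original 2-D plane, so five rows give five directions one could draw there.

```python
import torch
from torch import nn

torch.manual_seed(0)
expand = nn.Linear(2, 5)   # hypothetical first layer: 2 features -> 5 features

# Each row of the weight matrix is a vector in the 2-D input space;
# the corresponding hyperplane w·x + b = 0 is a line in that plane.
W, b = expand.weight.detach(), expand.bias.detach()
for i, (w_i, b_i) in enumerate(zip(W, b)):
    print(f"unit {i}: direction {w_i.tolist()}, offset {b_i.item():.2f}")
```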
@ankitnmnaik2292 ай бұрын
Thanks a lot for this.
@alfcnzАй бұрын
You're welcome! 😀😀😀
@hasanrants2 ай бұрын
1:57:06 What Yann LeCun did in 1989 without Python, PyTorch, or Jupyter notebooks is clearly spectacular.
@alfcnz2 ай бұрын
🤩🤩🤩
@hasanrants2 ай бұрын
Starting this playlist on 3rd October 2024. Extremely thankful for open-source education. I'm excited, as well as afraid of the mathematics involved.
@alfcnz2 ай бұрын
Haha, don’t be! Also, Marc’s Mathematics for Machine Learning book is a good resource to check out. See my blog on suggested visual prerequisites.
@jono_gg3 ай бұрын
Hi Alfredo, thank you for your content! Is there a place where we can find the first part of this course? I would like to take the whole course while I go through the book as well 😊
@alfcnz3 ай бұрын
No, the first part of the course is not online, nor is the professor who taught it interested in uploading it. Moreover, so far I’ve only pushed half of my part.
@jono_gg3 ай бұрын
@@alfcnz got it, thank you! 🙏🏼
@alfcnz3 ай бұрын
I mentioned the topic names and chapters so that one can look them up in the book. Also, not everyone’s passion is education, so … those videos are not necessarily very polished. 😅😅😅
@jono_gg3 ай бұрын
@@alfcnz 😂😂 ok then, I better stick with the book. Thank you again!!
@TitusAugust-l6n3 ай бұрын
Lewis Carol Thompson Brenda Allen Laura
@alfcnz3 ай бұрын
Who? 🧐
@heyyou11433 ай бұрын
Mass 🎉 video
@marvinmeng72133 ай бұрын
[NYUDLFL24-alumni] 1:02:47 Question about the dimensionality of the weight matrix W_g. We say that W_g should have dimension [d_f x d_g], since the transpose of W_g has dimension [d_g x d_f]. But since z_g has dimension [d_g x 1] and z_f has dimension [d_f x 1], isn't the forward pass given as z_g = W_g * z_f? If so, shouldn't W_g have dimension [d_g x d_f], i.e. [d_g x 1] = [d_g x d_f] * [d_f x 1], instead of [d_f x d_g]? Unless the forward pass should be z_f^T * W_g?
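A quick shape check in PyTorch (with illustrative dimensions, not the slide's actual numbers) agrees with the reasoning in the question: for a forward pass z_g = W_g z_f on column vectors, W_g must be [d_g x d_f]; incidentally, nn.Linear(d_f, d_g) stores its weight in exactly that [d_g, d_f] shape.

```python
import torch
from torch import nn

d_f, d_g = 3, 5                      # illustrative dimensions
z_f = torch.randn(d_f)               # input vector of size d_f

layer = nn.Linear(d_f, d_g, bias=False)
print(layer.weight.shape)            # torch.Size([5, 3])  i.e. [d_g, d_f]

z_g = layer.weight @ z_f             # [d_g, d_f] @ [d_f] -> [d_g]
print(z_g.shape)                     # torch.Size([5])
print(torch.allclose(layer(z_f), z_g))  # True: the layer computes W_g z_f
```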
@FamilyYoutubeTV-x6d3 ай бұрын
Yann is awesome.
@alfcnz3 ай бұрын
He is! 🥳🥳🥳
@muhammadharris44703 ай бұрын
Great resource! The text needs to have a background to be legible, though.
@alfcnz3 ай бұрын
Thanks! 😀 That’s why we provide the slides 😊
@rodrigofernandezaragones36073 ай бұрын
Gelatos, Alfredo! I used to eat ice cream at Fredo in Argentina. Atcold. Thank you so much, Alfredo, for sharing all this stuff. You are making history. Gracias!
@alfcnz3 ай бұрын
😊😊😊
@lyuzongyao47703 ай бұрын
9/11/2024 Update: Answer given by ChatGPT-4o to the example question Yann gives at 1:35:58, "what is the country that has a common border with Germany with the largest commercial exchanges with China?": Poland.
@manuelbradovent35624 ай бұрын
Thanks, I am also learning something new!
@alfcnz4 ай бұрын
😀😀😀
@sk7w4tch3r4 ай бұрын
Thanks!
@alfcnz4 ай бұрын
🥳🥳🥳
@siddharthagrawal83004 ай бұрын
The part where he mentioned FMA made me burst out laughing! Can't wait to join in-person classes!
@siddharthagrawal83004 ай бұрын
and then with the jentai lmfao
@alfcnz4 ай бұрын
👀
@OpenAITutor4 ай бұрын
I can totally see how a quantum computer could be used to perform gradient descent in all directions simultaneously, helping to find the true global minimum across all valleys in one go! 😲 It's mind-blowing to think about the potential for quantum computing to revolutionize optimization problems like this!
@dimitri304 ай бұрын
Thank you so much, it's amazing how easy you make this to understand.
@alfcnz4 ай бұрын
You're very welcome! 😀😀😀
@iamumairjaffer4 ай бұрын
Thank you, Alfredo! I just started this course, and I think it's incredibly detailed and amazing. I can't express my excitement in words! ❤️❤️❤️
@alfcnz4 ай бұрын
That’s awesome! 🥳🥳🥳
@iamumairjaffer4 ай бұрын
Amazing ❤❤❤❤❤
@alfcnz4 ай бұрын
😀😀😀
@PedroAugusto-kg1ss4 ай бұрын
Just finished all videos. Really amazing. Thank you for sharing.
@alfcnz4 ай бұрын
Glad you like them! 🥳🥳🥳
@RC-iz8bw5 ай бұрын
I love the Fluide Glacial cover with Super-Dupont ... we really are from the same generation!
@amit07prakash5 ай бұрын
Hey, I’m kinda new to this field, but I’ve tried to catch up. Which of the playlists do you recommend I follow? The one with Dr Yann LeCun or the current one? 🙏
@alfcnz5 ай бұрын
These are videos from my undergrad course. On my website I explain where to start and what to watch.
@MsLaula12125 ай бұрын
This ASMR is a bit odd.
@alfcnz4 ай бұрын
👀
@hyphenpointhyphen5 ай бұрын
Why can't we use counters for the loops in neural nets? Wouldn't a loop make the network more robust, in the sense of stabilizing the output?
@alfcnz5 ай бұрын
You need to add a timestamp if you’re expecting an answer to a specific part of the video. Otherwise it’s impossible for me to understand what you’re talking about.
@hyphenpointhyphen5 ай бұрын
@@alfcnz Sorry, around 34:39 - thanks for replying
@TomChenyangJI5 ай бұрын
Only a few words on his own masterpiece, haha.
@alfcnz5 ай бұрын
🤭🤭🤭
@PedroAugusto-kg1ss5 ай бұрын
Hello! First of all, thank you for uploading the material. A very, very good course. However, in this part on EBMs I'm a little bit confused. Let's suppose that I've trained a denoising AE (or another variation) with a bunch of y's. After training, how do I use it in practice? Would I pick a random z and use it to generate a y_tilde? From which distribution would I sample such a z?
@НиколайНовичков-е1э5 ай бұрын
Thank you, Alfredo! :)
@alfcnz5 ай бұрын
You’re welcome! 😀
@sudarshantak26805 ай бұрын
❤
@alfcnz5 ай бұрын
😀😀😀
@НиколайНовичков-е1э5 ай бұрын
Thank you, Alfredo. You made such a great visualization!
@alfcnz5 ай бұрын
😊😊😊
@aloklal995 ай бұрын
How were neural nets trained before 1985, i.e. before backprop was invented?
@alfcnz5 ай бұрын
I have a few videos on that in my most recent playlist, second chapter. There, I explain how the Perceptron (a binary neuron with an arbitrary number of inputs) used an error-correction strategy for learning. Let me know if you have any other questions. 😇😇😇 Chapter 2, videos 4-6: kzbin.info/www/bejne/nWXWhIhsd55se80
@aloklal995 ай бұрын
@@alfcnz thanks! 🙏
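For reference, a minimal sketch of the Perceptron error-correction rule mentioned above, in a generic textbook form (not the exact code from those videos): the weights are nudged only when a prediction is wrong, and no gradients are involved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data: the label is the sign of x1 + x2.
X = rng.uniform(-1, 1, size=(200, 2))
y = np.where(X.sum(axis=1) > 0, 1, -1)

w = np.zeros(2)
b = 0.0
for _ in range(20):                      # a few passes over the data
    for x_i, y_i in zip(X, y):
        y_hat = 1 if w @ x_i + b > 0 else -1
        if y_hat != y_i:                 # error correction: update only on mistakes
            w += y_i * x_i
            b += y_i

accuracy = np.mean(np.where(X @ w + b > 0, 1, -1) == y)
print(f"training accuracy: {accuracy:.2f}")
```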
@НиколайНовичков-е1э5 ай бұрын
Thank you, Alfredo :)
@alfcnz5 ай бұрын
🤗🤗🤗
@NewGirlinCalgary6 ай бұрын
Amazing Lecture!
@alfcnz6 ай бұрын
🥳🥳🥳
@НиколайНовичков-е1э6 ай бұрын
Thank you Alfredo, you have made a very clear explanation of this topic. :)
@alfcnz4 ай бұрын
Glad it was helpful! 😀😀😀
@НиколайНовичков-е1э6 ай бұрын
Thank you, Alfredo!
@alfcnz6 ай бұрын
🥰🥰🥰
@housebyte6 ай бұрын
This principle of running differential equations backward is used in diffusion, when you find the Lagrange loss function from the score, which is the time-reversing Langevin dynamics equation. Cost and energy, or momentum and energy: both are deterministic, reversible dynamical systems.
@alfcnz6 ай бұрын
Without a timestamp I have no clue what you’re referring to.
@НиколайНовичков-е1э6 ай бұрын
Thank you, Alfredo! I am happy that you are back
@alfcnz6 ай бұрын
🥳🥳🥳
@НиколайНовичков-е1э6 ай бұрын
Hello, Alfredo! :)
@alfcnz6 ай бұрын
Long time no see! 👋🏻
@dimitri306 ай бұрын
Thank you for sharing. I have one question about the NNs on scrambled data. If I had to make a prediction, I would have said we'd get an accuracy of about 15%, not more, since the number of pixels can help determine which digit it corresponds to. So is that enough to get an accuracy of 83-85%, or is there something else? I assumed the fully connected neural network would have duplicated the filters, but there is no change with the scrambled data.
@alfcnz6 ай бұрын
I don’t understand the question. Try asking in your native language.
@dimitri306 ай бұрын
@@alfcnz Yes, of course. I think my French explanation was not clear either. I would have assumed that with scrambled data we would get an accuracy of around 15%, not more (which is more than 10% thanks to the fact that, by counting the number of pixels, the model can get an idea of which digit is the most probable). I have trouble understanding how the model can achieve results as "good" as 85% on scrambled data. Does the model count the number of pixels and determine it that way, or is there something else? I had assumed that in reality the dense model would work like a ConvNet by learning the same kernels multiple times; essentially, we would have weight redundancy to get something similar to a ConvNet. Is it because of the lack of parameters in the dense network? If we had given it a lot more parameters, would it have come back to being a ConvNet with weight redundancy to "simulate" the filter's movement? Thank you
@alfcnz6 ай бұрын
There's a lot going on in this question. First, let's address the fully-connected model. The model does not care whether you scramble the input or not. If smartly initialised, the model will learn *the same* weights but in a permuted order. That's why the model's performance is (basically) the same before and after the permutation. Are you following up to here? Do you have any specific questions on this first part of my answer?
@dimitri306 ай бұрын
@alfcnz Thanks for your reply. I'm sorry for wasting your time, I just didn't pay enough attention to the fact that this is a DETERMINISTIC shuffle.
@alfcnz6 ай бұрын
Oh, yes! It is! The point here was to show how convolutional nets should be used only when specific assumptions hold for the input data. 😊😊😊
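For anyone who wants to verify the permutation argument above numerically, here is a small self-contained sketch (my own illustration, not the course notebook): a fully-connected layer whose weight columns are permuted the same way as the pixels produces exactly the same outputs on the shuffled images.

```python
import torch
from torch import nn

torch.manual_seed(0)

d = 28 * 28
perm = torch.randperm(d)                 # one fixed, deterministic pixel shuffle
x = torch.randn(16, d)                   # a batch of flattened "images"

fc = nn.Linear(d, 10)                    # first layer of a fully-connected classifier

# A second layer whose weight columns are permuted the same way as the pixels.
fc_perm = nn.Linear(d, 10)
with torch.no_grad():
    fc_perm.weight.copy_(fc.weight[:, perm])
    fc_perm.bias.copy_(fc.bias)

# Same outputs: permuted weights on scrambled inputs match the original model.
print(torch.allclose(fc(x), fc_perm(x[:, perm]), atol=1e-6))  # True
```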
@Acn0w6 ай бұрын
Thanks a lot! Your content is giving me motivation to get back into this field. Keep it up please 👏🙏
@alfcnz6 ай бұрын
Happy to hear that! I'll keep feeding knowledge to my subscribers! 😇😇😇
@PicaPauDiablo16 ай бұрын
Thank you for posting this. Looks like a great hour is ahead.
@alfcnz6 ай бұрын
You bet! 😎😎😎
@tantzer61136 ай бұрын
@14:04 Paraphrase: Missing a positive (i.e., a false negative) is more critical (i.e., worse) than a FALSE POSITIVE. (Note: "falsely identify a negative case" means "falsely identify AS A POSITIVE what is actually a negative case.")
@alfcnz6 ай бұрын
This is true _for the specific case_ of medical diagnosis. The contrary is true for other applications, such as spam detection.
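A tiny numeric illustration of that asymmetry, with made-up counts purely for intuition: the same confusion matrix looks acceptable or unacceptable depending on whether you care more about recall (medical screening) or precision (spam filtering).

```python
# Hypothetical confusion-matrix counts for a binary classifier (made-up numbers).
tp, fp, fn, tn = 80, 30, 20, 870

recall = tp / (tp + fn)      # how many actual positives we caught
precision = tp / (tp + fp)   # how many of our alarms were real

# In medical diagnosis the 20 false negatives (missed patients) dominate the cost;
# in spam filtering the 30 false positives (good mail thrown away) dominate instead.
print(f"recall = {recall:.2f}, precision = {precision:.2f}")  # recall = 0.80, precision = 0.73
```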
@TemporaryForstudy6 ай бұрын
Loved the video ❤. Hey, I am working as an NLP engineer in India. Do you have any remote opportunities for me? Let me know if you have something.
@alfcnz6 ай бұрын
Thanks for your appreciation! 🥰 Currently, I’m video editing and writing a textbook. Not sure these tasks are suitable for opportunities. 🥺
@Palatino-Art6 ай бұрын
@TemporaryForstudy *I am from India too, learning machine learning. Can I get your contact?*
@CyberwizardProductions6 ай бұрын
that's the entire reason to teach :) learn how to do something and pass it on
@alfcnz6 ай бұрын
🥳🥳🥳
@tantzer61136 ай бұрын
@5:24 Paraphrase: "So, what is the accuracy of a classifier that classifies everything as HAM, detecting no SPAM, thus yielding NO POSITIVES?"
@alfcnz6 ай бұрын
Yup, precisely! 😊😊😊
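Spelling that exercise out with made-up numbers: if, say, 90% of the dataset is ham, predicting ham for everything already scores 90% accuracy while catching zero spam, which is exactly why raw accuracy can be misleading here.

```python
n_ham, n_spam = 900, 100                      # hypothetical dataset composition

# "Classifier" that predicts ham for everything: no positives at all.
correct = n_ham                               # every ham is right, every spam is wrong
accuracy = correct / (n_ham + n_spam)
print(f"accuracy of the all-ham classifier: {accuracy:.0%}")  # 90%
```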
@monanasery19926 ай бұрын
Thank you so much for sharing this series. I especially loved the vintage ConvNets and the brain part :) I have a question: I didn't understand how we define the number of feature maps. For example, at 1:27:00, how did we go from 6 feature maps in layer 2 to 12 feature maps in layer 3? (By the way, there are 16 feature maps in layer 3 (C3) in the architecture of LeNet-5 in this paper: yann.lecun.com/exdb/publis/pdf/lecun-98.pdf, Fig. 2, the architecture of LeNet-5.)
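On the feature-map count: it is a design choice (a hyperparameter) rather than something derived from the previous layer; in the LeNet-5 paper linked above, the convolution after the 6-map layer simply declares 16 output maps. A rough PyTorch rendering of that one transition, ignoring the paper's sparse C3 connection table:

```python
from torch import nn

# 6 input feature maps -> 16 output feature maps, 5x5 kernels, as in LeNet-5's C3.
# The "16" is chosen by the architect, not computed from the 6.
c3 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5)
print(c3.weight.shape)   # torch.Size([16, 6, 5, 5])
```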
@wolpumba40996 ай бұрын
*Summary*

*Probability Recap:*
* *[**0:00**]* *Degree of Belief:* Probability represents a degree of belief in a statement, not just true or false.
* *[**0:00**]* *Propositions:* Lowercase letters (e.g., cavity) represent propositions (statements). Uppercase letters (e.g., Cavity) are random variables.
* *[**5:15**]* *Full Joint Probability Distribution:* Represented as a table, it shows probabilities for all possible combinations of random variables.
* *[**10:08**]* *Marginalization:* Calculating the probability of a subset of variables by summing over all possible values of the remaining variables.
* *[**17:04**]* *Conditional Probability:* The probability of an event happening given that another event has already occurred. Calculated as the ratio of the joint probability to the probability of the conditioning event.
* *[**16:14**]* *Prior Probability:* The initial belief about an event before observing any evidence.
* *[**16:40**]* *Posterior Probability:* Updated belief about an event after considering new evidence.

*Naive Bayes Classification:*
* *[**32:48**]* *Assumption:* Assumes features (effects) are conditionally independent given the class label (cause). This simplifies probability calculations.
* *[**32:48**]* *Goal:* Predict the most likely class label given a set of observed features (evidence).
* *[**44:04**]* *Steps:*
  * Calculate the joint probability of each class label and the observed features using the naive Bayes assumption.
  * Calculate the probability of the evidence (observed features) by summing the joint probabilities over all classes.
  * Calculate the posterior probability of each class label by dividing its joint probability by the probability of the evidence.
  * Choose the class label with the highest posterior probability as the prediction.
* *[**36:24**]* *Applications:*
  * *Digit Recognition:* Classify handwritten digits based on pixel values as features.
  * *[**47:34**]* *Spam Filtering:* Classify emails as spam or ham based on the presence of specific words.
* *[**33:56**]* *Limitations:*
  * *Naive Assumption:* The assumption of feature independence is often unrealistic in real-world data.
  * *[**42:11**]* *Data Sparsity:* Can struggle with unseen feature combinations if the training data is limited.

*Next Steps:*
* *[**1:05:58**]* *Parameter Estimation:* Learn the probabilities (parameters) of the model from training data.
* *[**59:53**]* *Handling Underflow:* Use techniques like logarithms and softmax to prevent numerical underflow when multiplying small probabilities.

I used Gemini 1.5 Pro to summarize the transcript.
@alfcnz6 ай бұрын
The timestamps are a bit off. The first two titles should not be simultaneous, nor at the very beginning. Similarly, Gemini thinks the first two items under Naïve Bayes Classification are also simultaneous. I can see, though, how these could be helpful if refined a bit.
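To make the summarised steps above concrete, here is a toy, hand-rolled Naive Bayes example with invented word probabilities (nothing from the lecture itself), computing the joint, the evidence, and the posterior in exactly that order.

```python
# Toy Naive Bayes: classify an email from two binary word features.
# All probabilities below are invented for illustration.
p_class = {"spam": 0.3, "ham": 0.7}                      # priors P(C)
p_word_given_class = {                                   # likelihoods P(w_i = 1 | C)
    "spam": {"free": 0.8, "meeting": 0.1},
    "ham":  {"free": 0.1, "meeting": 0.6},
}

email = {"free": 1, "meeting": 0}                        # observed evidence

# Joint P(C, evidence) under the conditional-independence assumption.
joint = {}
for c in p_class:
    prob = p_class[c]
    for word, present in email.items():
        p = p_word_given_class[c][word]
        prob *= p if present else (1 - p)
    joint[c] = prob

evidence = sum(joint.values())                           # P(evidence), marginalizing over C
posterior = {c: joint[c] / evidence for c in joint}
print(posterior)                                         # {'spam': ~0.89, 'ham': ~0.11}
print(max(posterior, key=posterior.get))                 # 'spam'
```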