The intuition for the denoising autoencoder is presented really well: add noise to the original image and recreate the clean image with the AE 💯
@alfcnz4 күн бұрын
😊😊😊
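A minimal sketch of that intuition, assuming a generic fully-connected autoencoder (layer sizes and names are illustrative, not the course's exact model): corrupt the input with Gaussian noise, then train the AE to reconstruct the clean original.

```python
import torch
from torch import nn

# Hypothetical fully-connected autoencoder; sizes are illustrative only.
class DenoisingAE(nn.Module):
    def __init__(self, d_in=784, d_hidden=30):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
        self.decoder = nn.Linear(d_hidden, d_in)

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DenoisingAE()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

y = torch.rand(64, 784)                  # stand-in for a batch of clean images
y_tilde = y + 0.3 * torch.randn_like(y)  # corrupt with additive Gaussian noise

y_hat = model(y_tilde)                   # reconstruct from the noisy input
loss = criterion(y_hat, y)               # compare against the *clean* target
loss.backward()
optimiser.step()
```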
@donthulasumanth541522 күн бұрын
@23:59 The number of trainable params in the decoder is 2: one for w1·cos(z) and one for w2·sin(z), respectively.
@alfcnz19 күн бұрын
Precisely 🙂
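A hedged sketch of such a two-parameter decoder (a guess at the structure discussed around 23:59, not the notebook's actual code): the only learnable quantities are the two scalars scaling cos(z) and sin(z).

```python
import torch
from torch import nn

class TinyDecoder(nn.Module):
    """Maps a scalar latent z to the 2-D point (w1*cos(z), w2*sin(z))."""
    def __init__(self):
        super().__init__()
        self.w1 = nn.Parameter(torch.randn(()))  # trainable scalar
        self.w2 = nn.Parameter(torch.randn(()))  # trainable scalar

    def forward(self, z):
        return torch.stack((self.w1 * torch.cos(z), self.w2 * torch.sin(z)), dim=-1)

decoder = TinyDecoder()
print(sum(p.numel() for p in decoder.parameters()))  # -> 2
```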
@donthulasumanth5415Ай бұрын
53:34 Trying to answer what happened here: after 4 rounds of rotations (linear transformations) and squishing (with ReLU), the final affine transformation leads to the projection of the points onto a straight line. That still isn't a linearly separable space, so I'm missing the intuition on what's special about this piece 🙃
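For anyone who wants to poke at that step outside the notebook, here is a rough, untrained sketch of the structure described above (a toy reproduction with made-up data, not the course's code): four affine-plus-ReLU stages followed by a final affine map, printing the output range after each layer so the squishing is visible.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in for the coloured 2-D points from the notebook (not the actual data).
points = torch.randn(1000, 2)

# Four "rotate and squish" stages (affine map + ReLU), then a final affine map.
net = nn.Sequential(
    *[layer for _ in range(4) for layer in (nn.Linear(2, 2), nn.ReLU())],
    nn.Linear(2, 2),
)

h = points
for i, layer in enumerate(net):
    h = layer(h)
    print(f"layer {i} ({layer.__class__.__name__}): output range "
          f"[{h.min().item():.2f}, {h.max().item():.2f}]")
```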
@hasanrantsАй бұрын
Alfredo, thank you very much for this concrete explanation. Yann's lectures are full of dense knowledge, and I had some doubts and gaps that were filled by this practicum. Appreciated!
@alfcnzАй бұрын
🥳🥳🥳
@donthulasumanth5415Ай бұрын
My guess: the five lines at the end of the video are basis vectors in the unwarped or transformed space, where we can draw lines to separate the colored dots. Support vector machines apply the same principle, but with pre-defined functions, and they are not as sensitive as neural nets.
@alfcnzАй бұрын
Where did I get those vectors? 😀
@donthulasumanth5415Ай бұрын
@@alfcnz comment check
@donthulasumanth5415Ай бұрын
@@alfcnz Ideally, if we consider a 5-D space, there are basis vectors, i.e. unit vectors that tell us which direction to move in that space to reach any given vector. In the 2-D case these are i, j; in a 5-D space they would be i, j, k, l, m. If we start with 2 features, a neural network finds combinations of those two to produce 5 features, and the corresponding weights take us to w1·i, w2·j, w3·k, w4·l, w5·m. I may be wrong or unclear, and I need to rethink this problem at the end of the playlist 😀.
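One hedged way to make the "2 features expanded to 5" idea concrete, assuming the expansion is a plain 2→5 linear layer (my assumption, not something confirmed in the thread): each row of that layer's weight matrix is a vector living in the original 2-D plane, so five rows give five directions one could draw there.

```python
import torch
from torch import nn

torch.manual_seed(0)
expand = nn.Linear(2, 5)   # hypothetical first layer: 2 features -> 5 features

# Each row of the weight matrix is a vector in the 2-D input space;
# the corresponding hyperplane w·x + b = 0 is a line in that plane.
W, b = expand.weight.detach(), expand.bias.detach()
for i, (w_i, b_i) in enumerate(zip(W, b)):
    print(f"unit {i}: direction {w_i.tolist()}, offset {b_i.item():.2f}")
```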
@ankitnmnaik2292 ай бұрын
Thanks a lot for this.
@alfcnzАй бұрын
You're welcome! 😀😀😀
@hasanrants2 ай бұрын
1:57:06 What Yann LeCun did in 1989 without Python, PyTorch, or Jupyter notebooks is clearly spectacular.
@alfcnz2 ай бұрын
🤩🤩🤩
@hasanrants2 ай бұрын
Starting this playlist on 3rd October 2024. Extremely thankful for open-source education. I'm excited, as well as afraid of the mathematics involved.
@alfcnz2 ай бұрын
Haha, don’t be! Also, Marc’s Mathematics for Machine Learning book is a good resource to check out. See my blog on suggested visual prerequisites.
@jono_gg3 ай бұрын
Hi Alfredo, thank you for your content! Is there a place where we can find the first part of this course? I would like to take the whole course while I go through the book as well 😊
@alfcnz3 ай бұрын
No, the first part of the course is not online, nor is the professor who taught it interested in uploading it. Moreover, so far I’ve only pushed half of my part.
@jono_gg3 ай бұрын
@@alfcnz got it, thank you! 🙏🏼
@alfcnz3 ай бұрын
I mentioned the topic names and chapters so that one can look them up in the book. Also, not everyone’s passion is education, so … those videos are not necessarily very polished. 😅😅😅
@jono_gg3 ай бұрын
@@alfcnz 😂😂 ok then, I better stick with the book. Thank you again!!
@TitusAugust-l6n3 ай бұрын
Lewis Carol Thompson Brenda Allen Laura
@alfcnz3 ай бұрын
Who? 🧐
@heyyou11433 ай бұрын
Mass 🎉 video
@marvinmeng72133 ай бұрын
[NYUDLFL24-alumni] 1:02:47 Question about the dimensionality of the weight matrix W_g. We say that W_g should have dimension [d_f x d_g], since the transpose of W_g has dimension [d_g x d_f]. But since z_g has dimension [d_g x 1] and z_f has dimension [d_f x 1], isn't the forward pass given as z_g = W_g * z_f? If so, shouldn't W_g have dimension [d_g x d_f], i.e. [d_g x 1] = [d_g x d_f] * [d_f x 1], instead of [d_f x d_g]? Unless the forward pass should be z_f^T * W_g?
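A quick shape check in PyTorch (with illustrative dimensions, not the slide's actual numbers) agrees with the reasoning in the question: for a forward pass z_g = W_g z_f on column vectors, W_g must be [d_g x d_f]; incidentally, nn.Linear(d_f, d_g) stores its weight in exactly that [d_g, d_f] shape.

```python
import torch
from torch import nn

d_f, d_g = 3, 5                      # illustrative dimensions
z_f = torch.randn(d_f)               # input vector of size d_f

layer = nn.Linear(d_f, d_g, bias=False)
print(layer.weight.shape)            # torch.Size([5, 3])  i.e. [d_g, d_f]

z_g = layer.weight @ z_f             # [d_g, d_f] @ [d_f] -> [d_g]
print(z_g.shape)                     # torch.Size([5])
print(torch.allclose(layer(z_f), z_g))  # True: the layer computes W_g z_f
```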
@FamilyYoutubeTV-x6d3 ай бұрын
Yann is awesome.
@alfcnz3 ай бұрын
He is! 🥳🥳🥳
@muhammadharris44703 ай бұрын
Great resource! The text needs to have a background to be legible, though.
@alfcnz3 ай бұрын
Thanks! 😀 That’s why we provide the slides 😊
@rodrigofernandezaragones36073 ай бұрын
Gelatos, Alfredo! I used to eat ice cream at Fredo in Argentina. Atcold. Thank you so much, Alfredo, for sharing all this stuff. You are making history. Gracias!
@alfcnz3 ай бұрын
😊😊😊
@lyuzongyao47703 ай бұрын
9/11/2024 Update: Answer given by ChatGPT-4o to the example question Yann gives at 1:35:58, "what is the country that has a common border with Germany with the largest commercial exchanges with China?": Poland.
@manuelbradovent35624 ай бұрын
Thanks, I am also learning something new!
@alfcnz4 ай бұрын
😀😀😀
@sk7w4tch3r4 ай бұрын
Thanks!
@alfcnz4 ай бұрын
🥳🥳🥳
@siddharthagrawal83004 ай бұрын
The part where he mentioned FMA made me burst out laughing! Can't wait to join in-person classes!
@siddharthagrawal83004 ай бұрын
and then with the jentai lmfao
@alfcnz4 ай бұрын
👀
@OpenAITutor4 ай бұрын
I can totally see how a quantum computer could be used to perform gradient descent in all directions simultaneously, helping to find the true global minimum across all valleys in one go! 😲 It's mind-blowing to think about the potential for quantum computing to revolutionize optimization problems like this!
@dimitri304 ай бұрын
Thank you so much, it's amazing how easy you make this to understand.
@alfcnz4 ай бұрын
You're very welcome! 😀😀😀
@iamumairjaffer4 ай бұрын
Thank you, Alfredo! I just started this course, and I think it's incredibly detailed and amazing. I can't express my excitement in words! ❤️❤️❤️
@alfcnz4 ай бұрын
That’s awesome! 🥳🥳🥳
@iamumairjaffer4 ай бұрын
Amazing ❤❤❤❤❤
@alfcnz4 ай бұрын
😀😀😀
@PedroAugusto-kg1ss4 ай бұрын
Just finished all videos. Really amazing. Thank you for sharing.
@alfcnz4 ай бұрын
Glad you like them! 🥳🥳🥳
@RC-iz8bw5 ай бұрын
I love the Fluide Glacial cover with Super-Dupont ... we really are from the same generation!
@amit07prakash5 ай бұрын
Hey, I’m kinda new to this field, but I’ve tried to catch up. Which of the playlists do you recommend I follow? The one with Dr Yann LeCun or the current one? 🙏
@alfcnz5 ай бұрын
These are videos from my undergrad course. On my website I explain where to start and what to watch.
@MsLaula12125 ай бұрын
This ASMR is a bit odd.
@alfcnz4 ай бұрын
👀
@hyphenpointhyphen5 ай бұрын
Why can't we use counters for the loops in neural nets? Wouldn't a loop make the network more robust, in the sense of stabilizing the output?
@alfcnz5 ай бұрын
You need to add a timestamp if you’re expecting an answer to a specific part of the video. Otherwise it’s impossible for me to understand what you’re talking about.
@hyphenpointhyphen5 ай бұрын
@@alfcnz Sorry, around 34:39 - thanks for replying
@TomChenyangJI5 ай бұрын
Only a few words on his own masterpiece, haha.
@alfcnz5 ай бұрын
🤭🤭🤭
@PedroAugusto-kg1ss5 ай бұрын
Hello! First of all, thank you for uploading the material. A very, very good course. However, in this part on EBMs I'm a little bit confused. Let's suppose that I've trained a denoising AE (or another variation) with a bunch of y's. After training, how do I use it in practice? Would I pick a random z and use it to generate a y_tilde? From which distribution would I sample such a z?
@НиколайНовичков-е1э5 ай бұрын
Thank you, Alfredo! :)
@alfcnz5 ай бұрын
You’re welcome! 😀
@sudarshantak26805 ай бұрын
❤
@alfcnz5 ай бұрын
😀😀😀
@НиколайНовичков-е1э5 ай бұрын
Thank you, Alfredo. You made such a great visualization!
@alfcnz5 ай бұрын
😊😊😊
@aloklal995 ай бұрын
How were neural nets trained before 1985, i.e. before backprop was invented?
@alfcnz5 ай бұрын
I have a few videos on that in my most recent playlist, second chapter. There, I explain how the Perceptron (a binary neuron with an arbitrary number of inputs) used an error-correction strategy for learning. Let me know if you have any other questions. 😇😇😇 Chapter 2, videos 4-6: kzbin.info/www/bejne/nWXWhIhsd55se80
@aloklal995 ай бұрын
@@alfcnz thanks! 🙏
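For reference, a minimal sketch of the Perceptron error-correction rule mentioned above, in a generic textbook form (not the exact code from those videos): the weights are nudged only when a prediction is wrong, and no gradients are involved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data: the label is the sign of x1 + x2.
X = rng.uniform(-1, 1, size=(200, 2))
y = np.where(X.sum(axis=1) > 0, 1, -1)

w = np.zeros(2)
b = 0.0
for _ in range(20):                      # a few passes over the data
    for x_i, y_i in zip(X, y):
        y_hat = 1 if w @ x_i + b > 0 else -1
        if y_hat != y_i:                 # error correction: update only on mistakes
            w += y_i * x_i
            b += y_i

accuracy = np.mean(np.where(X @ w + b > 0, 1, -1) == y)
print(f"training accuracy: {accuracy:.2f}")
```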
@НиколайНовичков-е1э5 ай бұрын
Thank you, Alfredo :)
@alfcnz5 ай бұрын
🤗🤗🤗
@NewGirlinCalgary6 ай бұрын
Amazing Lecture!
@alfcnz6 ай бұрын
🥳🥳🥳
@НиколайНовичков-е1э6 ай бұрын
Thank you Alfredo, you have made a very clear explanation of this topic. :)
@alfcnz4 ай бұрын
Glad it was helpful! 😀😀😀
@НиколайНовичков-е1э6 ай бұрын
Thank you, Alfredo!
@alfcnz6 ай бұрын
🥰🥰🥰
@housebyte6 ай бұрын
This principle of running differential equations backward is used in diffusion, when you find the Lagrange loss function from the score, which is the time-reversing Langevin dynamics equation. Cost and energy, or momentum and energy: both are deterministic, reversible dynamical systems.
@alfcnz6 ай бұрын
Without a timestamp I have no clue what you’re referring to.
@НиколайНовичков-е1э6 ай бұрын
Thank you, Alfredo! I am happy that you are back
@alfcnz6 ай бұрын
🥳🥳🥳
@НиколайНовичков-е1э6 ай бұрын
Hello, Alfredo! :)
@alfcnz6 ай бұрын
Long time no see! 👋🏻
@dimitri306 ай бұрын
Thank you for sharing. I have one question about the NNs on scrambled data. If I had to make a prediction, I would have said we'd get an accuracy of about 15%, not more, since the number of pixels can help determine which digit it corresponds to. So is that enough to get an accuracy of 83-85%, or is there something else? I assumed the fully connected neural network would have duplicated the filters, but there is no change with the scrambled data.
@alfcnz6 ай бұрын
I don’t understand the question. Try asking in your native language.
@dimitri306 ай бұрын
@@alfcnz Yes, of course. I think my French explanation was not clear either. I would have assumed that with scrambled data we would get an accuracy of around 15%, not more (which is more than 10% thanks to the fact that, by counting the number of pixels, the model can get an idea of which digit is the most probable). I have trouble understanding how the model can achieve results as "good" as 85% on scrambled data. Does the model count the number of pixels and determine it that way, or is there something else? I had assumed that in reality the dense model would work like a ConvNet by learning the same kernels multiple times; essentially, we would have weight redundancy to get something similar to a ConvNet. Is it because of the lack of parameters in the dense network? If we had given it a lot more parameters, would it have come back to being a ConvNet with weight redundancy to "simulate" the filter's movement? Thank you
@alfcnz6 ай бұрын
There's a lot going on in this question. First, let's address the fully-connected model. The model does not care whether you scramble the input or not. If smartly initialised, the model will learn *the same* weights but in a permuted order. That's why the model's performance is (basically) the same before and after the permutation. Are you following up to here? Do you have any specific questions on this first part of my answer?
@dimitri306 ай бұрын
@alfcnz Thanks for your reply. I'm sorry for wasting your time, I just didn't pay enough attention to the fact that this is a DETERMINISTIC shuffle.
@alfcnz6 ай бұрын
Oh, yes! It is! The point here was to show how convolutional nets should be used only when specific assumptions hold for the input data. 😊😊😊
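For anyone who wants to verify the permutation argument above numerically, here is a small self-contained sketch (my own illustration, not the course notebook): a fully-connected layer whose weight columns are permuted the same way as the pixels produces exactly the same outputs on the shuffled images.

```python
import torch
from torch import nn

torch.manual_seed(0)

d = 28 * 28
perm = torch.randperm(d)                 # one fixed, deterministic pixel shuffle
x = torch.randn(16, d)                   # a batch of flattened "images"

fc = nn.Linear(d, 10)                    # first layer of a fully-connected classifier

# A second layer whose weight columns are permuted the same way as the pixels.
fc_perm = nn.Linear(d, 10)
with torch.no_grad():
    fc_perm.weight.copy_(fc.weight[:, perm])
    fc_perm.bias.copy_(fc.bias)

# Same outputs: permuted weights on scrambled inputs match the original model.
print(torch.allclose(fc(x), fc_perm(x[:, perm]), atol=1e-6))  # True
```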
@Acn0w6 ай бұрын
Thanks a lot! Your content is giving me motivation to get back into this field. Keep it up please 👏🙏
@alfcnz6 ай бұрын
Happy to hear that! I'll keep feeding knowledge to my subscribers! 😇😇😇
@PicaPauDiablo16 ай бұрын
Thank you for posting this. Looks like a great hour is ahead.
@alfcnz6 ай бұрын
You bet! 😎😎😎
@tantzer61136 ай бұрын
@14:04 Paraphrase: Missing a positive (i.e., a false negative) is more critical (i.e., worse) than a FALSE POSITIVE. (Note: "falsely identify a negative case" means "falsely identify AS A POSITIVE what is actually a negative case.")
@alfcnz6 ай бұрын
This is true _for the specific case_ of medical diagnosis. The contrary is true for other applications, such as spam detection.
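A tiny numeric illustration of that asymmetry, with made-up counts purely for intuition: the same confusion matrix looks acceptable or unacceptable depending on whether you care more about recall (medical screening) or precision (spam filtering).

```python
# Hypothetical confusion-matrix counts for a binary classifier (made-up numbers).
tp, fp, fn, tn = 80, 30, 20, 870

recall = tp / (tp + fn)      # how many actual positives we caught
precision = tp / (tp + fp)   # how many of our alarms were real

# In medical diagnosis the 20 false negatives (missed patients) dominate the cost;
# in spam filtering the 30 false positives (good mail thrown away) dominate instead.
print(f"recall = {recall:.2f}, precision = {precision:.2f}")  # recall = 0.80, precision = 0.73
```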
@TemporaryForstudy6 ай бұрын
Loved the video ❤. Hey, I am working as an NLP engineer in India. Do you have any remote opportunities for me? Let me know if you have something.
@alfcnz6 ай бұрын
Thanks for your appreciation! 🥰 Currently, I’m video editing and writing a textbook. Not sure these tasks are suitable for opportunities. 🥺
@Palatino-Art6 ай бұрын
@TemporaryForstudy *I am from India too, learning machine learning. Can I get your contact?*
@CyberwizardProductions6 ай бұрын
that's the entire reason to teach :) learn how to do something and pass it on
@alfcnz6 ай бұрын
🥳🥳🥳
@tantzer61136 ай бұрын
@5:24 Paraphrase: "So, what is the accuracy of a classifier that classifies everything as HAM, detecting no SPAM, thus yielding NO POSITIVES?"
@alfcnz6 ай бұрын
Yup, precisely! 😊😊😊
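Spelling that exercise out with made-up numbers: if, say, 90% of the dataset is ham, predicting ham for everything already scores 90% accuracy while catching zero spam, which is exactly why raw accuracy can be misleading here.

```python
n_ham, n_spam = 900, 100                      # hypothetical dataset composition

# "Classifier" that predicts ham for everything: no positives at all.
correct = n_ham                               # every ham is right, every spam is wrong
accuracy = correct / (n_ham + n_spam)
print(f"accuracy of the all-ham classifier: {accuracy:.0%}")  # 90%
```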
@monanasery19926 ай бұрын
Thank you so much for sharing this series. I especially loved the vintage ConvNets and the brain part :) I have a question: I didn't understand how we define the number of feature maps. For example, at 1:27:00, how did we go from 6 feature maps in layer 2 to 12 feature maps in layer 3? (By the way, there are 16 feature maps in layer 3 (C3) in the architecture of LeNet-5 in this paper: yann.lecun.com/exdb/publis/pdf/lecun-98.pdf, Fig. 2, the architecture of LeNet-5.)
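On the feature-map count: it is a design choice (a hyperparameter) rather than something derived from the previous layer; in the LeNet-5 paper linked above, the convolution after the 6-map layer simply declares 16 output maps. A rough PyTorch rendering of that one transition, ignoring the paper's sparse C3 connection table:

```python
from torch import nn

# 6 input feature maps -> 16 output feature maps, 5x5 kernels, as in LeNet-5's C3.
# The "16" is chosen by the architect, not computed from the 6.
c3 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5)
print(c3.weight.shape)   # torch.Size([16, 6, 5, 5])
```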
@wolpumba40996 ай бұрын
*Summary*

*Probability Recap:*
* *[**0:00**]* *Degree of Belief:* Probability represents a degree of belief in a statement, not just true or false.
* *[**0:00**]* *Propositions:* Lowercase letters (e.g., cavity) represent propositions (statements). Uppercase letters (e.g., Cavity) are random variables.
* *[**5:15**]* *Full Joint Probability Distribution:* Represented as a table, it shows probabilities for all possible combinations of random variables.
* *[**10:08**]* *Marginalization:* Calculating the probability of a subset of variables by summing over all possible values of the remaining variables.
* *[**17:04**]* *Conditional Probability:* The probability of an event happening given that another event has already occurred. Calculated as the ratio of the joint probability to the probability of the conditioning event.
* *[**16:14**]* *Prior Probability:* The initial belief about an event before observing any evidence.
* *[**16:40**]* *Posterior Probability:* Updated belief about an event after considering new evidence.

*Naive Bayes Classification:*
* *[**32:48**]* *Assumption:* Assumes features (effects) are conditionally independent given the class label (cause). This simplifies probability calculations.
* *[**32:48**]* *Goal:* Predict the most likely class label given a set of observed features (evidence).
* *[**44:04**]* *Steps:*
  * Calculate the joint probability of each class label and the observed features using the naive Bayes assumption.
  * Calculate the probability of the evidence (observed features) by summing the joint probabilities over all classes.
  * Calculate the posterior probability of each class label by dividing its joint probability by the probability of the evidence.
  * Choose the class label with the highest posterior probability as the prediction.
* *[**36:24**]* *Applications:*
  * *Digit Recognition:* Classify handwritten digits based on pixel values as features.
  * *[**47:34**]* *Spam Filtering:* Classify emails as spam or ham based on the presence of specific words.
* *[**33:56**]* *Limitations:*
  * *Naive Assumption:* The assumption of feature independence is often unrealistic in real-world data.
  * *[**42:11**]* *Data Sparsity:* Can struggle with unseen feature combinations if the training data is limited.

*Next Steps:*
* *[**1:05:58**]* *Parameter Estimation:* Learn the probabilities (parameters) of the model from training data.
* *[**59:53**]* *Handling Underflow:* Use techniques like logarithms and softmax to prevent numerical underflow when multiplying small probabilities.

I used Gemini 1.5 Pro to summarize the transcript.
@alfcnz6 ай бұрын
The timestamps are a bit off. The first two titles should not be simultaneous, nor at the very beginning. Similarly, Gemini thinks the first two items under Naïve Bayes Classification are also simultaneous. I can see, though, how these could be helpful if refined a bit.
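To make the summarised steps above concrete, here is a toy, hand-rolled Naive Bayes example with invented word probabilities (nothing from the lecture itself), computing the joint, the evidence, and the posterior in exactly that order.

```python
# Toy Naive Bayes: classify an email from two binary word features.
# All probabilities below are invented for illustration.
p_class = {"spam": 0.3, "ham": 0.7}                      # priors P(C)
p_word_given_class = {                                   # likelihoods P(w_i = 1 | C)
    "spam": {"free": 0.8, "meeting": 0.1},
    "ham":  {"free": 0.1, "meeting": 0.6},
}

email = {"free": 1, "meeting": 0}                        # observed evidence

# Joint P(C, evidence) under the conditional-independence assumption.
joint = {}
for c in p_class:
    prob = p_class[c]
    for word, present in email.items():
        p = p_word_given_class[c][word]
        prob *= p if present else (1 - p)
    joint[c] = prob

evidence = sum(joint.values())                           # P(evidence), marginalizing over C
posterior = {c: joint[c] / evidence for c in joint}
print(posterior)                                         # {'spam': ~0.89, 'ham': ~0.11}
print(max(posterior, key=posterior.get))                 # 'spam'
```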