Awesome as always! Worth noting that the added columns ("Sex_male", "Sex_female", etc.) are now bools rather than ints, so you need to explicitly coerce df[indep_cols] at around 25:42 -- t_indep = tensor(df[indep_cols].astype(float).values, dtype=torch.float)
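A quick sketch of the fix on a tiny made-up frame (my own toy data, assuming pandas >= 2.0, where get_dummies returns bool columns):

```python
import pandas as pd
import torch

# Toy stand-in for the lesson's Titanic frame
df = pd.DataFrame({"Sex": ["male", "female"], "Age": [22.0, 38.0]})
df = pd.get_dummies(df, columns=["Sex"])

# Recent pandas makes Sex_female / Sex_male bool columns, so coerce to
# float before building the tensor:
indep_cols = ["Age", "Sex_female", "Sex_male"]
t_indep = torch.tensor(df[indep_cols].astype(float).values, dtype=torch.float)
```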
@mohamedahmednaji5544 (4 months ago)
❤
@senditco (6 months ago)
I might be cheating a lil because I've already done a deep learning subject at Uni, but this course so far is fantastic. It's really helping me flesh out what I didn't fully understand before.
@zzznavarrete (a year ago)
Amazing as always Jeremy
@howardjeremyp (a year ago)
Glad you think so!
@blenderpanzi (10 months ago)
Semi off topic: what I really dislike about Python is the lack of types (or that type hints are optional). It makes things difficult to understand when you're learning complicated new material like this. Is that argument a float or a tensor? What is the shape of the tensor? If that were in the type of the function argument, the code would be much easier to read while learning this stuff.
@Kevin-mw6kc (9 months ago)
If Python had a strong type system it would be misaligned with its purpose.
@noahchristie5267 (7 months ago)
You can enforce more typing and type restrictions with external tools and checkers.
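For example, optional hints already go a long way. A hedged sketch (my own, with a function name styled after the lesson's, not copied from it):

```python
import torch
from torch import Tensor

def calc_preds(coeffs: Tensor, indeps: Tensor) -> Tensor:
    # The hints document "these are tensors" and tools like mypy can check
    # callers; shapes still aren't captured, so a comment has to carry that:
    # coeffs is (n_coeff,), indeps is (n_rows, n_coeff).
    return torch.sigmoid((indeps * coeffs).sum(axis=1))

preds = calc_preds(torch.rand(3) - 0.5, torch.rand(5, 3))
```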
@minkijung3 (a year ago)
Thank you so much for this lecture, Jeremy🙏
@420_gunna (10 months ago)
Does anyone have an alternate way of explaining what he's trying to get across at 1:05:00?
@mdidactics (7 months ago)
If the coefficients are too large or too small, they create gradients that are either too steep or too gentle. When the gradient is too gentle, a small horizontal step won't take you down very far, and gradient descent will take a long time. If the gradient is too steep, a small horizontal step corresponds to a big vertical drop and a big swoop up the other side of the valley, so you might even end up further from the minimum. What you want is something in between.
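You can see the same effect with step size on a toy function (my own illustration, not from the lesson): gradient descent on f(x) = x**2, whose gradient is 2*x.

```python
def descend(lr, steps=20, x=1.0):
    # Plain gradient descent on f(x) = x**2; the minimum is at x = 0.
    for _ in range(steps):
        x -= lr * 2 * x
    return x

too_small = descend(0.01)   # crawls toward 0, still far away after 20 steps
good = descend(0.1)         # converges quickly
too_big = descend(1.1)      # each step overshoots the valley and diverges
```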
@jimshtepa5423 (a year ago)
why use sigmoid and not just round the absolute values of predictions to either 1 or 0?
@hausdorffspace (11 months ago)
Clipping the values might not be too bad if they are mostly in the range of 0 to 1, but if they were evenly spread out between 0 and 1000 (for example) then most of the values would get clipped to 1. You would have to scale them down first, and that means you would need to know how much to scale them by. With the sigmoid, it doesn't matter how big the numbers get, they will be squashed to the range of 0 to 1. Also, the sigmoid is differentiable, which makes it easy to calculate a gradient.
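A small demonstration of both points (my own toy values): the sigmoid squashes arbitrarily large inputs into (0, 1), and PyTorch can differentiate through it.

```python
import torch

xs = torch.tensor([-1000.0, -3.0, 0.0, 3.0, 1000.0])
probs = torch.sigmoid(xs)   # every real input lands in (0, 1), no rescaling needed

x = torch.tensor(0.0, requires_grad=True)
torch.sigmoid(x).backward() # differentiable: d/dx sigmoid at 0 is 0.25
```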
@tegsvid8677 (2 years ago)
Why do we divide layer1 by n_hidden?
@rizakhan2938 (11 months ago)
54:43 Looks like it's tough to control here. Good question!
@ansopa (a year ago)
coeffs = torch.rand(n_coeff) - 0.5 — what is the point of subtracting 0.5 from the coefficients? Is there a problem with the values being just between 0 and 1? Thanks a lot.
@m-gopichand (10 months ago)
torch.rand() generates random numbers in the range 0 to 1; subtracting 0.5 is a simple way to center the random values around zero. I believe that helps gradient descent optimize.
@emirkanmaz6059 (10 months ago)
Shifting the range to [-0.5, 0.5) means the weights can take positive and negative values. There are different strategies; you can google "weight initialization strategy". Libraries do this automatically for ReLU, tanh, etc.
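A quick check of what the shift does (my own sanity-check snippet): the values end up uniform over [-0.5, 0.5) and roughly centered on zero.

```python
import torch

torch.manual_seed(0)
# torch.rand samples uniformly from [0, 1); subtracting 0.5 recenters it.
coeffs = torch.rand(10_000) - 0.5
```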
@thomasdeniffel2122 (4 months ago)
In `one_epoch` at 44:09, there is a `coeffs.grad.zero_()` missing :-)
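A hedged sketch of where the call belongs (my own stand-in data and function bodies, styled after the lesson's names rather than copied from the notebook):

```python
import torch

torch.manual_seed(1)
t_indep = torch.rand(10, 3)   # hypothetical independent variables
t_dep = torch.rand(10)        # hypothetical dependent variable
coeffs = (torch.rand(3) - 0.5).requires_grad_()

def calc_loss(coeffs, indep, dep):
    preds = (indep * coeffs).sum(axis=1)
    return torch.abs(preds - dep).mean()

def one_epoch(coeffs, lr=0.1):
    loss = calc_loss(coeffs, t_indep, t_dep)
    loss.backward()
    with torch.no_grad():
        coeffs.sub_(coeffs.grad * lr)
        coeffs.grad.zero_()   # without this, gradients accumulate across epochs
    return loss.item()

losses = [one_epoch(coeffs) for _ in range(3)]
```

If `zero_()` is omitted, `backward()` keeps adding each epoch's gradient onto the last one, so later steps use a stale, ever-growing gradient.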
@iceiceisaac (8 months ago)
I get different results after the matrix section.
@thomasdeniffel2122 (4 months ago)
Adding a dimension at kzbin.info/www/bejne/laO7q5iNppl2bNk is very important, as otherwise the minus in the loss function would do incorrect broadcasting, leading to a model that achieves at most 0.55 accuracy. The error is silent, because the mean in the loss function hides it.
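Here is the silent-broadcasting trap in isolation (my own toy shapes): a (5,) vector minus a (5, 1) column broadcasts to a full (5, 5) matrix instead of erroring, and taking the mean afterwards hides it.

```python
import torch

preds = torch.rand(5)           # shape (5,), e.g. predictions as a flat vector
targ = torch.rand(5, 1)         # shape (5, 1), e.g. targets kept as a column
wrong = preds - targ            # silently broadcasts to (5, 5)!
right = preds[:, None] - targ   # shapes match: result is (5, 1)
```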
@navrajsharma2425 (8 months ago)
27:18 Why don't we have a constant in our model? How can we know that there's not going to be a constant in the equation? Can someone explain this to me?
@Deco354 (8 months ago)
I think the dummy variables effectively act as a constant because they’re either 1 or 0
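You can check this directly (my own toy frame): the two sex dummies sum to 1 on every row, so a constant is always expressible as a combination of their coefficients.

```python
import pandas as pd

df = pd.DataFrame({"Sex": ["male", "female", "female"]})
d = pd.get_dummies(df, columns=["Sex"], dtype=float)

# Sex_female + Sex_male == 1 for every row, so these columns can jointly
# play the role of an intercept term.
row_sums = d[["Sex_female", "Sex_male"]].sum(axis=1)
```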
@mustafaaytugkaya3020 (a year ago)
@howardjeremyp I haven't examined the best-performing Gender Surname Model for the Titanic dataset in detail, but something seems rather strange to me. Isn't using the survival status of other family members a data leak? After all, at inference time, which would be before the Titanic incident, I would not have this information.
@twisted_cpp (a year ago)
Depends on how you look at it. If you're trying to predict whether a person survived, and you already have a list of confirmed survivors and casualties, then it's probably a good way to make the prediction: if Mrs X died, then it's safe to assume Mr X died as well. Or if their children died, it's safe to assume both parents died, considering that women and children boarded the lifeboats first.
@hausdorffspace (11 months ago)
This is going to sound very pedantic, but you use the word "rank" where I think "order" would be more correct. Rank usually means the number of independent columns in a matrix. At about 1:02:00, you say that the coefficients vector is a rank-2 matrix, but I would say its rank is 1 and its order is 2.
@leee5487 (7 months ago)
torch.rand(n_coeff, n_hidden) — how does one set of coeffs output 20 (n_hidden) values? I mean, mathematically, a single set of coefficients multiplied by a specific set of values will always equal the same thing, right?
@bobuilder4444 (6 months ago)
I'm assuming you are in the section about neural nets (before deep learning). The name n_hidden is a bit misleading: there's only one hidden layer, but that hidden layer is the linear combination of n_hidden ReLUs. Each ReLU has its own coefficients to learn, which we store in a matrix of size n_coeff by n_hidden.
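In other words, it's not one set of coefficients but n_hidden of them, one per matrix column. A small shape check (my own toy sizes):

```python
import torch

n_coeff, n_hidden = 12, 20
layer1 = torch.rand(n_coeff, n_hidden) - 0.5  # one column of coeffs per ReLU
x = torch.rand(2, n_coeff)                    # two rows of data

# Each of the 20 columns produces a different linear combination of the
# same inputs; the ReLU then zeroes out the negative ones.
hidden = torch.relu(x @ layer1)
```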
@mattambrogi8004 (5 months ago)
Great lesson! I found myself a bit confused by the predictions and loss.backward() at ~37:00. I did some digging to clear up my confusion, which might be helpful for others:
- At 37:00, when we're creating the predictions, Jeremy says we're going to add up (each independent variable * coef) over the columns. There's nothing wrong with how he said this, it just didn't click for my brain: we're creating a prediction for each row by adding up the indep_vars*coeffs, so at the end we have a predictions vector with the same number of predictions as we have rows of data.
- This is what we then calculate the loss on. Using the loss, we do gradient descent to see how much changing each coef would have changed the loss (backprop). Then we apply those changes to update the coefs, and that's one epoch.
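The first bullet in one line of code (my own toy shapes): broadcasting multiplies the same coefficient vector against every row, and summing over axis 1 collapses each row to a single number.

```python
import torch

t_indep = torch.rand(8, 3)      # 8 rows of data, 3 independent variables
coeffs = torch.rand(3) - 0.5    # one coefficient per column

# Row-wise weighted sum: one prediction per row of data.
preds = (t_indep * coeffs).sum(axis=1)
```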
@blenderpanzi (10 months ago)
1:02:38 Does trn_dep[:, None] do the same as trn_dep.reshape(-1, 1)? For me reshape seems a tiny bit less cryptic (though the -1 is still cryptic).
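For a contiguous 1-D tensor the two are indeed equivalent (my own quick check): `[:, None]` inserts a new axis of size 1, and `reshape(-1, 1)` asks for one column with the row count inferred.

```python
import torch

t = torch.arange(6)
a = t[:, None]          # insert a new trailing axis of size 1
b = t.reshape(-1, 1)    # -1 means "infer this dimension from the others"
```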
@michaelphines1 (a year ago)
If the gradients are updated inline, don't we have to reset the gradients after each epoch?
@LordMichaelRahl (a year ago)
Just to note, this has been fixed in the actual downloadable "Linear model and neural net from scratch" notebook.
@garfieldnate (2 years ago)
It drives me absolutely batty to do matrix work in Python because it's so difficult to get the dimension stuff right. I always end up adding asserts and tests everywhere, which is sort of fine but I would rather not need them. I really want to have dependent types, meaning that the tensor dimensions would be part of the type checker and invalid operations would fail at compile time instead of run time. Then you could add smart completion, etc. to help get everything right quickly.
@howardjeremyp (2 years ago)
You might be interested in hasktorch, which does exactly that!
@garfieldnate (2 years ago)
@@howardjeremyp Hey that's pretty neat! Wish it worked in Python, though :D
@c.c.s.1102 (2 years ago)
What helped me was reading the PyTorch source code with the `??` operator and thinking about the operations in terms of linear algebra. It's hard to keep all of the ranks in mind. At the end of the day I just have to keep hacking through the errors.
@blenderpanzi (10 months ago)
What does Categorify do? I looked it up and didn't understand. Is it converting names ("male", "female") to numbers (1, 2) or something?
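I believe that's roughly it: as far as I understand, fastai's Categorify replaces each category string with a small integer code (keeping a mapping so it can be undone). You can see the same idea with pandas' categorical dtype (my own analogy, not fastai's actual implementation):

```python
import pandas as pd

s = pd.Series(["male", "female", "female", "male"], dtype="category")
codes = s.cat.codes             # each category string becomes a small integer
cats = list(s.cat.categories)   # the mapping back from code to string
```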
@VolodymyrBilyachat (5 months ago)
Instead of splitting code into cells, I like to run the notebook in VS Code so I can debug as normal.
@СтаниславКолчин-н7ы (5 months ago)
It's awesome! Thanks a lot.
@DevashishJose (a year ago)
Thank you for this lecture, Jeremy.
@alyaaka82 (7 months ago)
I was wondering why you replaced the NaNs in the data frame with the mode rather than the mean?
@AzhaanNazim (3 months ago)
What would be the mean of names?
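Right: the mean only exists for numeric columns, while the mode is defined for every column type, so one fill rule covers the whole frame. A toy demonstration (my own made-up frame, not the lesson's data):

```python
import pandas as pd

df = pd.DataFrame({"Embarked": ["S", "C", None, "S"],
                   "Age": [22.0, None, 30.0, 26.0]})

# df.mode() works for strings and numbers alike; take the first row in
# case some column has several modes.
modes = df.mode().iloc[0]
filled = df.fillna(modes)
```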
@anthonypercy1770 (8 months ago)
Simply brilliant workshop... I had to change/add dtype=float, e.g. pd.get_dummies(tst_df, columns=["Sex","Pclass","Embarked"], dtype=float), to get it to work, maybe due to a later version of pandas?
@ukaszgandecki9106 (a month ago)
Thanks! This should be pinned; I had the same problem. For people googling, the error was: "can't convert np.ndarray of type numpy.object_"
@tumadrep00 (a year ago)
What a great lesson given by the one and only Mr. Random Forests