Hey! How can I sign up for the 2024 version of the course?
@UUalead 3 months ago
Thanks a lot
@lonewolf00000 3 months ago
Can you please release the latest version? I love this course, but I feel the 2022 version is too outdated now.
@minma02262 3 months ago
Is the Jacobian way of computing the gradient too cumbersome and error-prone to implement in code? The lecture doesn't explain exactly which part is cumbersome.
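My reading of the "cumbersome" part, shown with a toy example (this is my own illustration with made-up names, not the lecture's code): even for a simple elementwise op like ReLU, the full Jacobian is an n×n matrix that is almost entirely zeros, and you would have to materialize and multiply such matrices for every operation, whereas the backward pass only ever needs the Jacobian applied to a vector.

```python
import numpy as np

n = 5
x = np.random.default_rng(0).normal(size=n)
g_out = np.ones(n)                        # upstream gradient dL/dy

# Full-Jacobian route: ReLU's Jacobian is an n x n matrix, almost all zeros.
J = np.diag((x > 0).astype(float))        # n^2 entries just to encode a mask
g_in_jacobian = J.T @ g_out

# Direct route: never build J, just do the elementwise product the Jacobian encodes.
g_in_direct = g_out * (x > 0)

print(np.allclose(g_in_jacobian, g_in_direct))   # True, without the n x n matrix
```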
@minma02262 3 months ago
Just finished this video, and 20-plus videos to go! Great course!
@fire_nakamura 4 months ago
I've been looking for material deriving the local gradient of a CNN layer, so thank you for the great material. One thing I'm uncertain about is at 44:45, the derivative with respect to the vector x. Given that x is a column vector and W is an m×n matrix, I thought the following would be true: since the columns of W correspond to the rows of x, each row of y = Wx interacts with x as

y1 = w11·x1 + w12·x2
y2 = w21·x1 + w22·x2

so x1 is multiplied by w11 and w21, and x2 is multiplied by w12 and w22. Therefore the derivative with respect to x should use the transpose Wᵀ, not W. Am I wrong in this understanding?
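A minimal numerical check of the point in question (my own sketch, not code from the lecture; the scalar loss L = sum(y) and all variable names are assumptions): the Jacobian of y = Wx with respect to x is W, but the gradient that flows back to x is Wᵀ times the upstream gradient, which is where the transpose shows up.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))   # m x n matrix
x = rng.normal(size=2)        # the column vector x (length n)

y = W @ x                     # y_i = w_i1*x_1 + w_i2*x_2 + ...
# Pretend the downstream loss is L = sum(y), so the upstream gradient dL/dy is all ones.
g_y = np.ones_like(y)

# Claimed gradient w.r.t. x: the transpose of W applied to the upstream gradient.
g_x = W.T @ g_y

# Check by nudging each x[j] a little and measuring how L changes.
eps = 1e-6
g_x_numeric = np.zeros_like(x)
for j in range(x.size):
    xp, xm = x.copy(), x.copy()
    xp[j] += eps
    xm[j] -= eps
    g_x_numeric[j] = ((W @ xp).sum() - (W @ xm).sum()) / (2 * eps)

print(np.allclose(g_x, g_x_numeric, atol=1e-4))   # True: the W-transpose form checks out
```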
@fire_nakamura 4 months ago
here to learn English
@cleyang 4 months ago
The img2col reshape operation for overlapping image blocks, `[4,4,3,3].reshape(16,9)`, expands the data, meaning we copy data from the same location rather than reuse it, right? Then when computing gradients we will have two gradients for the same location. But shouldn't the convolution accumulate gradients at the same location when performing gradient descent?
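A small 1-D sketch of exactly this point (my own toy example, not the course's implementation): the copies created by im2col each receive their own gradient, and the backward "col2im" step scatter-adds them back, so gradients at overlapping locations are indeed accumulated.

```python
import numpy as np

x = np.arange(6, dtype=float)      # a 1-D "image" of length 6
k = 3                              # window size, stride 1
# Index of every element of every overlapping window, shape (num_windows, k).
idx = np.arange(len(x) - k + 1)[:, None] + np.arange(k)[None, :]

cols = x[idx]                      # im2col: overlapping entries are real copies

# Suppose the upstream gradient for every copied entry is 1.
g_cols = np.ones_like(cols)

# col2im backward: scatter-add each copy's gradient back to its source location.
g_x = np.zeros_like(x)
np.add.at(g_x, idx, g_cols)

print(g_x)   # [1. 2. 3. 3. 2. 1.] -- positions used by several windows sum their gradients
```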
@ankurkumarsrivastava6958 4 months ago
code? notebook?
@feishen.3176 5 months ago
Great illustration!!! For the universal function approximation part, is it better to use sum_i ±w2_i·ReLU(w1_i·x + b1_i) + b2?
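A small sketch of what that sum looks like when fitted (my own toy experiment, not from the lecture; the sin(x) target, the 50 random hidden units, and the least-squares solve are all assumptions): the ± is effectively absorbed into letting the output weights w2 take either sign.

```python
import numpy as np

# Approximate sin(x) on [-3, 3] with y_hat(x) = sum_i w2[i]*relu(w1[i]*x + b1[i]) + b2.
rng = np.random.default_rng(0)
n_hidden = 50
w1 = rng.normal(size=n_hidden)
b1 = rng.uniform(-3, 3, size=n_hidden)

x = np.linspace(-3, 3, 200)
target = np.sin(x)

H = np.maximum(0.0, np.outer(x, w1) + b1)          # hidden ReLU features, (200, 50)
A = np.hstack([H, np.ones((len(x), 1))])           # extra column for the bias b2
coef, *_ = np.linalg.lstsq(A, target, rcond=None)  # fitted w2 picks up both signs

print(np.max(np.abs(A @ coef - target)))             # small max error over the grid
print((coef[:-1] > 0).sum(), (coef[:-1] < 0).sum())  # both positive and negative w2
```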
@mystmuffin3600 5 months ago
39:39 errata: the mapping is from an n×k matrix (not vector) to a real-valued scalar (not vector).
@timkris1574 5 months ago
Impressive lecture! There are just too many annoying ads…
@YumingDing-x6n 5 months ago
47:24 so funny. lol
@AnEnderNon 5 months ago
omg tysm
@akashprajapathi6056 6 months ago
Sir, please provide the implementation code as well.
@sarracen1a 7 months ago
Amazing, this saved my final project 😭
@oraz. 7 months ago
This is the best autodiff lesson
@alirezaolama5806 7 months ago
Nice lecture! Can anyone suggest some resources to read more about this material, please? Especially reverse-mode AD by extending the computational graph?
@programmer1379 7 months ago
Loved this lecture. Rich in information about ML library design and why it's important to take the time to think about the design choices up front. Some design choices are difficult to roll back.
@lonewolf00000 7 months ago
This is a great course, please provide an updated version.
@GerardoHuerta-w2j 6 months ago
Does this version not work?
@lonewolf00000 6 months ago
@GerardoHuerta-w2j This is at least a year old.
@lonewolf00000 3 months ago
I feel this is outdated now in 2024
@programmer1379 7 months ago
The "pretend and check" method is a nice way to approach the derivation problem. What's important here is that there is an easy way to check: namely, nudge the value of a parameter a tiny bit and re-evaluate; the estimate we get is expected to be very close to the "pretended" value. Love the content of the course, thank you for making it public!
@hieuha3724 8 months ago
Found this course too late. Thank you so much!
@MeridianLights 8 months ago
What’s wrong with the other comments?? I found this to be super clear and helpful. Thank you! Please make more implementation videos like this!
@kelvin4210 8 months ago
Can I sign up for the course now? Can't find the entry.
@soumitrapandit3444 8 months ago
This is fantastic stuff! Thank you so much for making it available!
@TianhaoLu-v4n 9 months ago
fantastic!
@soumitrapandit3444 9 months ago
This was amazing. Thank you so much!
@rpraver1 9 months ago
Where is the code?
@rpraver1 9 months ago
Why does scaled_dot_product transpose Q instead of K, which is what the original paper does?
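For reference, here is the row-major formulation from the original paper that the comment is comparing against (a sketch with the batch and head dimensions dropped; I don't have the lecture's code at hand, but if the sequence dimension lives in the columns instead of the rows, the same scores come out as Qᵀ K, which may be why Q is the one transposed there):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V, with rows = time steps."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)      # (T_q, T_k): similarity of each query to each key
    return softmax(scores) @ V

rng = np.random.default_rng(0)
T, d = 5, 8
Q, K, V = (rng.normal(size=(T, d)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)   # (5, 8)
```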
@_jasneetsingh_ 9 months ago
At 32:40 there is a good explanation of why we need multi-head attention.
@_jasneetsingh_ 9 months ago
Your interpretation/perspective of attention is nice. In the past I had only understood the mechanics of the transformer, rather than the abstracted view you presented. Thank you.
@howardsmith4128 9 months ago
Great lesson, thanks!
@yimingsun6638 10 months ago
Where can we find the notebook you are writing live?
@parsakhavarinejad 10 months ago
great work man
@acsport5728 10 months ago
0:40 After this 24-lecture course, will I be able to make an LLM chatbot?
@axe863 11 months ago
High-throughput, highly parallelizable work: GPU. Sequential tasks with low overhead: CPU. Low latency: FPGA.
@stupid_1boy580 11 months ago
thank you so much
@chengluo2040 11 months ago
Considering that reverse-mode AD builds a computational graph, will it use more memory than the backprop method? How can this issue be avoided?
@programmer1379 7 months ago
Yes, the extra space is needed to store the nodes of the computation graph. The benefit is that you can now take partials of partials and optimize the graph, for example by fusing nodes wherever possible. It's a trade-off: if extra storage is an issue, you can use in-place updates of the partials with backpropagation.
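A toy illustration of that trade-off (my own minimal sketch, not the course's library): every node holds references to its inputs, which is exactly the extra memory the question is about. The graph-extension variant another comment mentions would store the partials as new graph nodes rather than plain numbers, but the memory point is the same.

```python
class Node:
    """Minimal reverse-mode AD node: it keeps its inputs alive (that is the memory
    cost of building the graph) so the backward pass can revisit them later."""
    def __init__(self, value, inputs=(), local_grads=()):
        self.value = value
        self.inputs = inputs            # references to parent Nodes -> extra memory
        self.local_grads = local_grads  # d(output)/d(input) for each input
        self.grad = 0.0

    def __add__(self, other):
        return Node(self.value + other.value, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Node(self.value * other.value, (self, other), (other.value, self.value))

    def backward(self, upstream=1.0):
        self.grad += upstream
        for inp, g in zip(self.inputs, self.local_grads):
            inp.backward(upstream * g)

x, y = Node(2.0), Node(3.0)
z = x * y + x           # the graph hanging off z keeps x, y, and x*y alive
z.backward()
print(x.grad, y.grad)   # 4.0 2.0  (dz/dx = y + 1, dz/dy = x)
```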
@DmitryOt 1 year ago
When explaining temporal convolution networks (12:30 and later) you emphasize that units in the next layer can only look back (in time) when scanning the previous layer, and cannot look forward. I do not understand where this restriction comes from. At training time we have the whole sequence available from the very beginning, so we can look both backward and forward without any problem. At inference time it is a bit more difficult, but still possible: when we have already produced 10 words and need to produce the 11th, unit 5 can scan units 3-7, i.e. 2 back and 2 forward, with ease. And if we don't have future tokens yet, we can use padding.
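For what it's worth, a tiny sketch of what "only look back" means operationally (my own example, assuming the usual autoregressive setup where output t may only depend on inputs up to t, so the same network can be reused step by step at generation time when future tokens genuinely don't exist yet):

```python
import numpy as np

def causal_conv1d(x, w):
    """1-D convolution where output[t] = sum_j w[j] * x[t - j]:
    pad only on the left so no future positions leak into output[t]."""
    k = len(w)
    x_pad = np.concatenate([np.zeros(k - 1), x])
    return np.array([
        sum(w[j] * x_pad[(k - 1) + t - j] for j in range(k))
        for t in range(len(x))
    ])

x = np.arange(1.0, 7.0)          # [1, 2, 3, 4, 5, 6]
w = np.array([0.0, 1.0, 0.0])    # kernel that "looks back exactly one step"
print(causal_conv1d(x, w))       # [0. 1. 2. 3. 4. 5.] -- output t never sees x[t+1:]
```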
@amortalbeing 1 year ago
thanks a lot
@amortalbeing 1 year ago
Thanks a lot man
@amortalbeing 1 year ago
Thanks. It was great, but you forgot to add the positional embeddings.
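In case it helps anyone reading along, a minimal sketch of the sinusoidal positional embeddings being referred to (the standard formulation from the original Transformer paper, added onto the token embeddings; this is not code from the video):

```python
import numpy as np

def sinusoidal_positional_embedding(seq_len, d_model):
    """PE[t, 2i] = sin(t / 10000^(2i/d)),  PE[t, 2i+1] = cos(t / 10000^(2i/d))."""
    pos = np.arange(seq_len)[:, None]                 # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]              # (1, d_model/2)
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

tokens = np.zeros((10, 16))                                # placeholder token embeddings
tokens = tokens + sinusoidal_positional_embedding(10, 16)  # added before the first layer
print(tokens.shape)                                        # (10, 16)
```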