Lecture 23 - Model Deployment
42:53
A year ago
Lecture 19 - RNN Implementation
54:34
Lecture 15 - Training Large Models
46:37
Lecture 14 - Implementing Convolutions
1:19:42
Lecture 12 - GPU Acceleration
44:20
2 years ago
Lecture 11 - Hardware Acceleration
45:22
Lecture 10 - Convolutional Networks
1:08:31
Lecture 4 - Automatic Differentiation
1:03:35
Comments
@kenpaul-h2w 28 days ago
46:53 V_(2->4) bar is wrong, isn't it?
@Yitongchen-vn4dy A month ago
Great work man
@ethanq.8007 2 months ago
Does anyone know how I can enroll in 2024?
@usmanusmonov2340 2 months ago
Hey! How can I sign up for the 2024 version of the course?
@UUalead 3 months ago
Thanks a lot
@lonewolf00000 3 months ago
Can you please release the latest version? I love this course, but I feel the 2022 version is too outdated now.
@minma02262 3 months ago
Is the Jacobian way of computing the gradient too cumbersome and error-prone to implement in code? The lecture doesn't explain exactly which part is cumbersome.
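A rough, hypothetical illustration (not the lecture's code) of what "cumbersome" likely refers to: for y = Wx the full Jacobian with respect to W is a mostly-zero 3-D tensor, whereas reverse mode only ever needs the adjoint-Jacobian product, which collapses to a small outer product.

```python
import numpy as np

# Hypothetical illustration (not the lecture's code): for y = W @ x with W of
# shape (m, n), the Jacobian dy/dW is an (m, m, n) tensor that is mostly zeros,
# while reverse mode only needs the adjoint-Jacobian product, an outer product.
m, n = 3, 4
W = np.random.randn(m, n)
x = np.random.randn(n)
y_bar = np.random.randn(m)                 # incoming adjoint dL/dy

# Explicit Jacobian: dy_i / dW_jk = delta_ij * x_k
J = np.zeros((m, m, n))
for i in range(m):
    J[i, i, :] = x

grad_W_full = np.einsum("i,ijk->jk", y_bar, J)   # contract adjoint with the Jacobian
grad_W_vjp = np.outer(y_bar, x)                  # what reverse mode actually computes
print(np.allclose(grad_W_full, grad_W_vjp))      # True
```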
@minma02262 3 months ago
Just finished this video, and 20-plus videos to go! Great course!
@fire_nakamura 4 months ago
I've been looking for material that derives the local gradient of a CNN layer, so thank you for this great material. One thing I'm uncertain about is 44:45, the derivative with respect to the vector x. Given that x is a column vector and W is an (m×n) matrix, I thought the following was true: since the columns of W correspond to the rows of x, the product works out component-wise to y1 = w11*x1 + w12*x2 and y2 = w21*x1 + w22*x2, so x1 is multiplied by w11, w21 and x2 is multiplied by w12, w22. Therefore the derivative with respect to x should be the transposed matrix W^T, not W. Am I wrong in this understanding?
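A small numerical check of the question above (a hypothetical snippet, not the lecture's notation): the adjoint passed back to x for y = Wx is indeed W^T ȳ; saying "dy/dx = W" and "x_bar = W^T y_bar" are the same fact read under different layout conventions.

```python
import numpy as np

# Hypothetical check: for y = W @ x, the adjoint sent back to x is W.T @ y_bar,
# i.e. it involves the transpose the comment expects.
m, n = 3, 2
W = np.random.randn(m, n)
x = np.random.randn(n)
v = np.random.randn(m)                    # stand-in for the incoming adjoint dL/dy
L = lambda z: v @ (W @ z)                 # scalar loss L = v^T W z

eps = 1e-6
grad_fd = np.array([(L(x + eps * e) - L(x - eps * e)) / (2 * eps)
                    for e in np.eye(n)])  # central finite differences w.r.t. x
print(np.allclose(grad_fd, W.T @ v))      # True: x_bar = W^T y_bar
```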
@fire_nakamura 4 months ago
here to learn English
@cleyang 4 months ago
The im2col reshape operation for the blocked, overlapping image, `[4,4,3,3].reshape(16,9)`, expands the data, meaning we copy values at the same location rather than reusing them, right? Then when computing gradients we will have two gradients for the same location. But shouldn't convolution accumulate gradients at the same location during the backward pass?
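A hypothetical sketch of the concern above: im2col does copy overlapping pixels, each copy receives its own gradient, and the backward pass scatter-adds those copies back into the image, which is exactly the accumulation being asked about.

```python
import numpy as np

# Hypothetical sketch: im2col copies overlapping pixels, each copy receives its
# own gradient, and the backward pass scatter-adds the copies back into the
# image with np.add.at.
H = Wd = 4; K = 3
img = np.arange(H * Wd, dtype=float).reshape(H, Wd)

# Index maps for im2col: one row per 3x3 patch, one column per patch element.
rows, cols = [], []
for i in range(H - K + 1):
    for j in range(Wd - K + 1):
        ii, jj = np.meshgrid(np.arange(i, i + K), np.arange(j, j + K), indexing="ij")
        rows.append(ii.ravel()); cols.append(jj.ravel())
rows, cols = np.array(rows), np.array(cols)      # each of shape (4, 9)
patches = img[rows, cols]                        # (4, 9): copied data, not views

# Backward: pretend every patch element received gradient 1.0.
grad_img = np.zeros_like(img)
np.add.at(grad_img, (rows, cols), np.ones_like(patches))  # overlapping copies are summed
print(grad_img)   # interior pixels sit in several patches, so their gradient > 1
```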
@ankurkumarsrivastava6958 4 months ago
code? notebook?
@feishen.3176 5 months ago
Great illustration!!! Is it better to use Sum{±w2·ReLU(w1·x + b1) + b2} for the universal function approximation part?
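A minimal sketch of the sum-of-ReLU form mentioned above, where the ± sign simply falls out of the sign of each fitted outer weight; the random inner weights and the least-squares fit are assumptions for illustration only, not the lecture's construction.

```python
import numpy as np

# Minimal sketch: f(x) ~= sum_i c_i * ReLU(w_i * x + b_i) + c_0.
# The +- sign in the comment is absorbed into the sign of each fitted c_i.
rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 200)[:, None]          # (200, 1) inputs
target = np.sin(x).ravel()                            # function to approximate

w1 = rng.normal(size=(1, 50))                         # inner weights
b1 = rng.uniform(-np.pi, np.pi, size=(1, 50))         # inner biases
hidden = np.maximum(w1 * x + b1, 0.0)                 # ReLU features, (200, 50)
features = np.hstack([hidden, np.ones((200, 1))])     # append a constant column

c, *_ = np.linalg.lstsq(features, target, rcond=None) # signed outer weights and bias
print("max abs error:", np.max(np.abs(features @ c - target)))
```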
@mystmuffin3600 5 months ago
39:39 errata: mapping from an n×k matrix (not vector) to a real-valued scalar (not vector).
@timkris1574 5 months ago
Impressive lecture! There are just too many annoying ads…
@YumingDing-x6n 5 months ago
47:24 so funny. lol
@AnEnderNon 5 months ago
omg tysm
@akashprajapathi6056 6 months ago
Sir, please provide the implementation code as well.
@sarracen1a 7 months ago
Amazing, this saved my final course project 😭
@oraz. 7 months ago
This is the best autodiff lesson
@alirezaolama5806 7 months ago
Nice lecture! Can anyone suggest resources to read more about these topics, please? Especially reverse-mode AD by extending the computational graph?
@programmer1379 7 months ago
Loved this lecture. Rich in information about ML library design and why it's important to take the time to think about design choices up front. Some design choices are difficult to roll back.
@lonewolf00000 7 months ago
This is a great course; please provide an updated version.
@GerardoHuerta-w2j 6 months ago
Does this version not work?
@lonewolf00000 6 months ago
@@GerardoHuerta-w2j This is at least a year old.
@lonewolf00000 3 months ago
I feel this is outdated now in 2024.
@programmer1379 7 months ago
The "pretend and check" method is a nice way to approach the derivation problem. What's important here is that there is an easy way to check: nudge the value of a parameter a tiny bit and re-evaluate; the value we get is expected to be very close to the "pretend and check" value. Love the content of the course, thank you for making it public!
@hieuha3724 8 months ago
Found this course too late. Thank you so much!
@MeridianLights 8 months ago
What’s wrong with the other comments?? I found this to be super clear and helpful. Thank you! Please make more implementation videos like this!
@kelvin4210 8 months ago
Can I sign up for the course now? I can't find the entry point.
@soumitrapandit3444 8 months ago
This is fantastic stuff! Thank you so much for making it available!
@TianhaoLu-v4n 9 months ago
fantastic!
@soumitrapandit3444 9 months ago
This was amazing. Thank you so much!
@rpraver1 9 months ago
Where is the code?
@rpraver1 9 months ago
Why does scaled_dot_product transpose Q instead of K, which is what the original paper does?
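For reference, a hedged sketch of the formulation from the paper, softmax(QK^T/√d)V, with the rows of Q, K, V indexed by time; code that transposes Q instead may simply be storing its axes the other way around.

```python
import numpy as np

# Hedged sketch of softmax(Q K^T / sqrt(d)) V with time along the rows of Q, K, V.
def scaled_dot_product_attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                             # (T_q, T_k) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                                        # (T_q, d_v)

Q, K, V = np.random.randn(4, 8), np.random.randn(6, 8), np.random.randn(6, 8)
print(scaled_dot_product_attention(Q, K, V).shape)            # (4, 8)
```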
@_jasneetsingh_ 9 months ago
At 32:40 there is a good explanation of why we need multi-head attention.
@_jasneetsingh_ 9 months ago
Your interpretation/perspective of attention is nice. In the past I only understood the mechanics of the transformer, rather than the abstracted view you presented. Thank you.
@howardsmith4128 9 months ago
Great lesson, thanks!
@yimingsun6638 10 months ago
Where can we find the notebook you are writing live?
@parsakhavarinejad 10 months ago
Great work, man
@acsport5728 10 months ago
0:40 After finishing this 24-lecture series, will I be able to build an LLM chatbot?
@axe863 11 months ago
High-throughput, highly parallelizable tasks: GPU. Sequential tasks with low overhead: CPU. Low latency: FPGA.
@stupid_1boy580 11 months ago
thank you so much
@chengluo2040 11 months ago
Considering that reverse-mode AD builds a computational graph, will it use more memory than the backprop method? How can this issue be avoided?
@programmer1379 7 months ago
Yes, the extra space is needed to store the nodes of the computation graph. The benefit is that you can now take partials of partials and optimize the graph, for example by fusing nodes whenever possible. It's a trade-off: if extra storage is an issue, you can use in-place updates of partials with backpropagation.
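A toy illustration of that trade-off (a hypothetical mini-implementation, not the course's needle API): the graph-based version keeps every intermediate Node alive until the backward pass, which is the extra memory, while the partials themselves are still accumulated in place.

```python
# Hypothetical mini reverse-mode AD: the stored parents/grad_fns are the graph
# (the extra memory); .grad is still updated in place during backward().
class Node:
    def __init__(self, value, parents=(), grad_fns=()):
        self.value = value        # forward result kept for the backward pass
        self.parents = parents    # upstream nodes: this is the stored graph
        self.grad_fns = grad_fns  # map an output adjoint to each parent's adjoint
        self.grad = 0.0

def mul(a, b):
    return Node(a.value * b.value, (a, b),
                (lambda g: g * b.value, lambda g: g * a.value))

def add(a, b):
    return Node(a.value + b.value, (a, b), (lambda g: g, lambda g: g))

def backward(out):
    out.grad = 1.0
    stack = [out]                 # naive traversal; real code uses reverse topological order
    while stack:
        node = stack.pop()
        for parent, gf in zip(node.parents, node.grad_fns):
            parent.grad += gf(node.grad)   # accumulate in place
            stack.append(parent)

x = Node(3.0)
backward(add(mul(x, x), x))       # y = x^2 + x
print(x.grad)                     # 2*3 + 1 = 7
```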
@DmitryOt A year ago
When explaining temporal convolution networks (12:30 and later), you emphasize that units on the next layer can only look back (in time) when scanning the previous layer and cannot look forward. I do not understand where this restriction comes from. At training time we have the whole sequence available from the very beginning, so we can look both back and forward without any problem. At inference time it is a bit more difficult, but still possible: when we have already produced 10 words and need to produce the 11th, unit 5 can scan units 3-7, so 2 back and 2 forward, with ease. And if we don't have the future tokens yet, we can use padding.
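One hedged note on where that restriction usually comes from: when the network is trained autoregressively to predict the next token, a unit that can look forward would see the very token it is supposed to predict, so implementations left-pad and look back only, at training time as well as at inference time. A plain-NumPy sketch of such a causal convolution (assumed setup, not the lecture's code):

```python
import numpy as np

# Causal temporal convolution: left-pad so y[t] depends only on x[t-k+1 .. t].
def causal_conv1d(x, w):
    k = len(w)
    x_pad = np.concatenate([np.zeros(k - 1), x])   # pad the past, never the future
    return np.array([x_pad[t:t + k] @ w for t in range(len(x))])

x = np.arange(6, dtype=float)
w = np.array([0.5, 0.3, 0.2])
print(causal_conv1d(x, w))   # each y[t] uses only x[t-2], x[t-1], x[t]
```

If the model is not being asked to predict future elements of the same sequence, nothing forces this restriction and a symmetric (non-causal) window is fine.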
@amortalbeing A year ago
thanks a lot
@amortalbeing A year ago
Thanks a lot man
@amortalbeing A year ago
Thanks, it was great, but you forgot to add the positional embeddings.
@PriyaMittal-tx7xw A year ago
Can you please provide a GitHub link for the code?
@zhiqiqin1286 A year ago
great!
@bohao8002 A year ago
very nice
@xxxiu13 A year ago
Great lecture, thank you!
@hehehe5198 A year ago
very nice lecture