Vanishing & Exploding Gradient explained | A problem resulting from backpropagation

121,990 views

deeplizard

Let's discuss a problem that crops up time and time again during the training process of an artificial neural network. This is the problem of unstable gradients, most popularly referred to as the vanishing gradient problem.
We're first going to answer the question: what is the vanishing gradient problem, anyway? Here, we'll cover the idea conceptually. We'll then discuss how this problem occurs. Then, with the understanding we'll have developed up to that point, we'll look at the problem of exploding gradients, which we'll see is actually very similar to the vanishing gradient problem, so we'll be able to take what we learned about that problem and apply it to this new one.
🕒🦎 VIDEO SECTIONS 🦎🕒
00:00 Welcome to DEEPLIZARD - Go to deeplizard.com for learning resources
00:28 Gradient review
01:18 Agenda
01:45 The vanishing gradient problem
03:27 The cause of the vanishing gradients
05:30 Exploding gradient
07:13 Collective Intelligence and the DEEPLIZARD HIVEMIND
💥🦎 DEEPLIZARD COMMUNITY RESOURCES 🦎💥
👋 Hey, we're Chris and Mandy, the creators of deeplizard!
👉 Check out the website for more learning material:
🔗 deeplizard.com
💻 ENROLL TO GET DOWNLOAD ACCESS TO CODE FILES
🔗 deeplizard.com/resources
🧠 Support collective intelligence, join the deeplizard hivemind:
🔗 deeplizard.com/hivemind
🧠 Use code DEEPLIZARD at checkout to receive 15% off your first Neurohacker order
👉 Use your receipt from Neurohacker to get a discount on deeplizard courses
🔗 neurohacker.com/shop?rfsn=648...
👀 CHECK OUT OUR VLOG:
🔗 / deeplizardvlog
❤️🦎 Special thanks to the following polymaths of the deeplizard hivemind:
Tammy
Mano Prime
Ling Li
🚀 Boost collective intelligence by sharing this video on social media!
👀 Follow deeplizard:
Our vlog: / deeplizardvlog
Facebook: / deeplizard
Instagram: / deeplizard
Twitter: / deeplizard
Patreon: / deeplizard
YouTube: / deeplizard
🎓 Deep Learning with deeplizard:
Deep Learning Dictionary - deeplizard.com/course/ddcpailzrd
Deep Learning Fundamentals - deeplizard.com/course/dlcpailzrd
Learn TensorFlow - deeplizard.com/course/tfcpailzrd
Learn PyTorch - deeplizard.com/course/ptcpailzrd
Natural Language Processing - deeplizard.com/course/txtcpai...
Reinforcement Learning - deeplizard.com/course/rlcpailzrd
Generative Adversarial Networks - deeplizard.com/course/gacpailzrd
🎓 Other Courses:
DL Fundamentals Classic - deeplizard.com/learn/video/gZ...
Deep Learning Deployment - deeplizard.com/learn/video/SI...
Data Science - deeplizard.com/learn/video/d1...
Trading - deeplizard.com/learn/video/Zp...
🛒 Check out products deeplizard recommends on Amazon:
🔗 amazon.com/shop/deeplizard
🎵 deeplizard uses music by Kevin MacLeod
🔗 / @incompetech_kmac
❤️ Please use the knowledge gained from deeplizard content for good, not evil.

Comments: 98
@deeplizard
@deeplizard 6 years ago
Machine Learning / Deep Learning Fundamentals playlist: kzbin.info/aero/PLZbbT5o_s2xq7LwI2y8_QtvuXZedL6tQU
Keras Machine Learning / Deep Learning Tutorial playlist: kzbin.info/aero/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL
BACKPROP VIDEOS:
Backpropagation explained | Part 1 - The intuition kzbin.info/www/bejne/jnaWnKWcaKiEotU
Backpropagation explained | Part 2 - The mathematical notation kzbin.info/www/bejne/aJ62qqaIrZJkmZI
Backpropagation explained | Part 3 - Mathematical observations kzbin.info/www/bejne/fWbFZZ2Id7CBrtk
Backpropagation explained | Part 4 - Calculating the gradient kzbin.info/www/bejne/kKOYp5x3j6yhmqc
Backpropagation explained | Part 5 - What puts the “back” in backprop? kzbin.info/www/bejne/rnTPfJKVeNaNpLM
@yuxiaofei3442
@yuxiaofei3442 5 years ago
The voice is so nice and confident.
@deeplizard
@deeplizard 5 years ago
Thanks, yu!
@himanshutanwani5118
@himanshutanwani5118 5 years ago
@@deeplizard lol, was that intentional? xD
@jackripper6066
@jackripper6066 2 years ago
I was stuck on this concept for hours and didn't click on this video at first because of the view count, but I was wrong. This is the clearest and simplest explanation I've found. Thanks a lot!
@lonewolf2547
@lonewolf2547 5 years ago
I landed here after checking Andrew's videos about this (which were confusing), but this video explained it very clearly and simply.
@prateekkumar151
@prateekkumar151 5 years ago
Same here, I didn't like his explanation. This was very clear. Thanks!
@milindbebarta2226
@milindbebarta2226 a year ago
Yep, his explanations aren't clear sometimes. It's frustrating.
@MarkW_
@MarkW_ 3 years ago
Perhaps a small addition to the explanation for vanishing gradients in this video, from a computer architecture point of view. When a network is trained on an actual computer system, the variable types (e.g. floats) have a limited 'resolution' that they can express due to their numerical representation. This means that adding or subtracting a very small value from a value 'x' could actually result in 'x' (unchanged), meaning that the network has stopped training that particular weight. For example: 0.29 - 0.000000001 could become 0.29. With neural networks moving towards smaller variable types (e.g. 16-bit floats instead of 32-bit), this problem is becoming more pronounced. For a similar reason, floating point representations usually do not become zero; they just approach zero.
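For anyone who wants to see this limited-precision effect concretely, here is a minimal NumPy sketch (illustrative only; it reuses the 0.29 example from the comment above):
```python
import numpy as np

w32 = np.float32(0.29)                          # a weight stored in 32-bit floating point
new_w32 = np.float32(w32 - np.float32(1e-9))    # subtract a "vanished" gradient update
print(new_w32 == w32)                           # True: the update is below float32 resolution, so w is unchanged

w16 = np.float16(0.29)                          # the same weight in 16-bit floating point
new_w16 = np.float16(w16 - np.float16(1e-4))    # even a much larger update is lost
print(new_w16 == w16)                           # True

# Machine epsilon: the smallest relative spacing each type can represent
print(np.finfo(np.float32).eps)                 # ~1.19e-07
print(np.finfo(np.float16).eps)                 # ~9.77e-04
```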
@RabbitLLLord
@RabbitLLLord 2 months ago
Dude, this is super insightful. Thanks!
@cmram23
@cmram23 3 years ago
The best and the simplest explanation of Vanishing Gradient I have found so far.
@NikkieBiteMe
@NikkieBiteMe 5 years ago
I fiiiinally understand how the vanishing gradient problem occurs
@mita1498
@mita1498 4 years ago
Very good channel indeed!!!
@GirlKnowsTech
@GirlKnowsTech 3 years ago
00:30 What is the gradient?
01:18 Introduction
01:45 What is the vanishing gradient problem?
03:28 How does the vanishing gradient problem occur?
05:31 What about exploding gradients?
@deeplizard
@deeplizard 3 years ago
Added to the description. Thanks so much!
@deepaksingh9318
@deepaksingh9318 6 years ago
What an easy explanation.. 👍 I am just loving this playlist and don't want it to ever end 😁
@deeplizard
@deeplizard 6 years ago
Thanks, deepak! Be sure to check out the Keras series as well 😎 kzbin.info/aero/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL
@anirudhgangadhar6158
@anirudhgangadhar6158 a year ago
The best explanation of exploding and vanishing gradients I have come across so far. Great job!
@neoblackcyptron
@neoblackcyptron 5 years ago
Wow, this explanation was really clear and to the point. Subbed immediately, going to check out all the videos over time.
@100deep1001
@100deep1001 4 years ago
Underrated channel! Thanks for posting these videos :)
@HassanKhan-kq4lj
@HassanKhan-kq4lj a month ago
I have my deep learning exam tomorrow. I started studying just one day before the exam and couldn't understand anything. Then I found your video. Now I understand this concept. Thanks a lot 😭😭😭
@farshad-hasanpour
@farshad-hasanpour 3 years ago
This channel never lets us down. Great work!
@prithviprakash1110
@prithviprakash1110 2 years ago
Great job explaining this; I understood something I was unsure about in a very decisive and clear way. Thanks!
@tymothylim6550
@tymothylim6550 3 years ago
Thank you very much for this video! I learnt about these similar problems of vanishing and exploding gradients and how they affect the convergence of weight values to their optimal values!
@craigboger
@craigboger 3 years ago
Thank you so much for your series and explanations!
@karimafifi5501
@karimafifi5501 4 years ago
The intro is so relaxing. It is like you are in another dimension.
@panwong9624
@panwong9624 6 years ago
cannot wait to watch the next video that addresses the vanishing and exploding gradient problem. :)
@deeplizard
@deeplizard 6 years ago
Have you got around to it yet? This is the one - Weight initialization: kzbin.info/www/bejne/bpzVlWingLuqY7M
@sciences_rainbow6292
@sciences_rainbow6292 3 years ago
Your videos are just perfect! The voice, the explanation, the animation! Genius of pedagogy :)
@carinacruz3945
@carinacruz3945 4 years ago
The best source to understand machine learning concepts in an easy way. =)
@billycheung7095
@billycheung7095 4 years ago
Well explained. Thanks for your work.
@harshitraj8409
@harshitraj8409 2 years ago
Crystal Clear Explanation.
@jsaezmarti
@jsaezmarti 3 years ago
Thanks for explaining the concept so clearly :D
@user-ho4be1ef9m
@user-ho4be1ef9m 2 years ago
I finally understand vanishing and exploding gradients from your video!! Thanks :)
@MrSoumyabrata
@MrSoumyabrata 3 years ago
Thank you for such a nice video. Understood the concept.
@loneWOLF-fq7nz
@loneWOLF-fq7nz 5 years ago
Best explanation!!! Good work!
@entertainment8067
@entertainment8067 2 years ago
Thanks for an amazing tutorial, love from Afghanistan
@albertoramos9586
@albertoramos9586 2 years ago
Thank you so much!!!
@aidynabirov7728
@aidynabirov7728 a year ago
Awesome video!
@laurafigueroa2852
@laurafigueroa2852 3 years ago
Thank you!
@abdullahalmazed5387
@abdullahalmazed5387 5 months ago
Awesome explanation
@ritukamnnit
@ritukamnnit 3 years ago
Good explanation. Keep up the great work :)
@islamicinterestofficial
@islamicinterestofficial 4 years ago
Thank you so much
@bashirghariba879
@bashirghariba879 4 years ago
Good description
@_seeker423
@_seeker423 4 years ago
I have yet to see a video that explains this so clearly. One question: do (1) both vanishing and exploding gradients lead to underfitting, or (2) does vanishing lead to underfitting and exploding lead to overfitting?
@Sikuq
@Sikuq 3 years ago
Excellent #28 follow-up to your playlist #23-27. Thanks.
@hossainahamed8789
@hossainahamed8789 3 years ago
loved it
@absolute___zero
@absolute___zero 4 years ago
There is a bigger problem with gradient descent, and it is not the vanishing or exploding thing. It is that it gets stuck in local minima, and that hasn't been solved yet. There are only partial solutions like simulated annealing or GAs, UKF, Monte Carlo and stuff like that, which involve randomness. The only way to find a better minimum is to introduce randomness.
@alphatrader5450
@alphatrader5450 5 years ago
Great explanation! Background gives me motion sickness though.
@deeplizard
@deeplizard 5 years ago
Thanks for the feedback! I'll keep that in mind.
@petraschubert8220
@petraschubert8220 4 years ago
Great video, thanks for that! But I have one long-standing question about backpropagation. I can adjust which layer's weight I hit by summing up the components. But which weight will actually be updated then? Will the layers in between the components of my chain rule update as well? I would be very grateful for an answer, thanks!
@yongwoo1020
@yongwoo1020 6 years ago
I would have labeled your layers or edges "a", "b", "c", etc. when you were discussing the cumulative effect on gradients that are earlier in the network (gradient = a * b * c * d, etc.). It can be a bit confusing, since convention has us thinking one way and the notation reinforces that, while the conversation is about backprop, which runs counter to that convention. The groundwork is laid for a very basic misunderstanding that could be cured with simple labels. Great video, btw.
@deeplizard
@deeplizard 6 years ago
Appreciate your feedback, Samsung Blues.
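To make the multiplicative effect concrete, here is a rough Python sketch in the spirit of the gradient = a * b * c * d labeling suggested above (the per-layer factors are made up purely for illustration):
```python
import numpy as np

rng = np.random.default_rng(0)

def earliest_layer_gradient(num_layers, low, high):
    # Roughly: the gradient for an early weight is a product of one factor per
    # later layer, as in the "gradient = a * b * c * d" labeling above.
    factors = rng.uniform(low, high, size=num_layers)
    return np.prod(factors)

# Factors mostly below 1 -> the product shrinks toward zero (vanishing gradient)
print(earliest_layer_gradient(30, low=0.1, high=0.9))   # tiny, roughly on the order of 1e-11

# Factors above 1 -> the product blows up (exploding gradient)
print(earliest_layer_gradient(30, low=1.1, high=2.0))   # huge, roughly on the order of 1e+5
```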
@fritz-c
@fritz-c 4 years ago
I spotted a couple of slight typos in the article for this video.
we don't perse,
↓
we don't per se,
further away from it’s optimal value
↓
further away from its optimal value
@deeplizard
@deeplizard 4 years ago
Fixed, thanks Chris! :D
@justchill99902
@justchill99902 5 years ago
Hello! Question: I might sound silly here, but do we ever have a weight update in the positive direction? I mean, say the weight was 0.3 and then after the update it turned into 0.4? Since while updating we always "subtract" the gradient * a very low learning rate, the product that we subtract from the actual weight will always be very small... so unless this product is negative (which only happens when the gradient is negative), we will never add some value to the current weight but will always reduce it, right? So to make some sense out of it, when do we get negative gradients? Does this generally happen?
@ink-redible
@ink-redible 5 years ago
Yes, we do get negative gradients. If you look at the formula for backprop, you will easily find when the gradient turns out to be negative.
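For anyone following along, here is a minimal sketch of the standard gradient descent update (the numbers are illustrative only) showing how a negative gradient increases the weight, as in the 0.3 to 0.4 example from the question:
```python
learning_rate = 0.01

def gradient_descent_step(weight, gradient):
    # Standard update rule: w <- w - learning_rate * dLoss/dw
    return weight - learning_rate * gradient

print(gradient_descent_step(0.3, 2.5))    # ≈ 0.275 -> positive gradient, the weight decreases
print(gradient_descent_step(0.3, -10.0))  # ≈ 0.4   -> negative gradient, the weight increases
```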
@gideonfaive6886
@gideonfaive6886 4 years ago
{ "question": "Vanishing Gradient is mostly related to ………… and is usually caused by having too many values …………… in calculating the …………", "choices": [ "earlier weights, less than one, gradient", "earlier weights, less than one, loss", "back propagation, less than one, gradient", "back propagation, less than one, loss" ], "answer": "earlier weights, less than one, gradient", "creator": "Hivemind", "creationDate": "2020-04-21T21:46:35.500Z" }
@deeplizard
@deeplizard 4 years ago
Thanks, Gideon! Just added your question to deeplizard.com/learn/learn/video/qO_NLVjD6zE :)
@dennismuller371
@dennismuller371 5 years ago
A gradient gets subtracted from the weights to update them. This gradient can be really small and hence have no impact. It can also become really large. How come, since the gradient gets subtracted, an exploding gradient creates values that are larger than their former values? Should it not be something like a negative weight then? Nice videos btw. :)
@deeplizard
@deeplizard 5 years ago
Hey Dennis - Good question and observation. Let me see if I can help clarify. Essentially, vanishing gradient = small gradient update to the weight. Exploding gradient = large gradient update to the weight. With exploding gradient, the large gradient causes a relatively large weight update, which possibly makes the weight completely "jump" over its optimal value. This update could indeed result in a negative weight, and that's fine. The "exploding" is just in terms of how large the gradient is, not how large the weight becomes. In the video, I did illustrate the exploding gradient update with a larger positive number, but it probably would have been more intuitive to show the example with a larger negative number. Does this help clarify?
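Here is a tiny numerical sketch of the "jump" described in the reply above (the weight, gradients, and learning rate are made-up illustrative values):
```python
learning_rate = 0.1
optimal_w = 0.3          # pretend this is the weight's ideal value

w = 0.5
small_grad = 0.4         # a typical, well-behaved gradient
exploded_grad = 40.0     # an "exploded" gradient

print(w - learning_rate * small_grad)     # ≈ 0.46 -> a small step toward the optimum
print(w - learning_rate * exploded_grad)  # -3.5   -> a huge step that jumps far past the optimum and ends up negative
```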
@ahmedelhamy1845
@ahmedelhamy1845 2 years ago
@@deeplizard I think that gradients can't be greater than 0.25 when using sigmoid as the activation function, as its derivative ranges from 0 to 0.25, so it will never exceed 1 by any means. I think the gradient explosion comes from a weight initialization problem, where weights are initialized with large values. Correct me or clarify, please?
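As a rough sanity check on the numbers in this question (a hedged sketch, not an official answer from the video): the sigmoid derivative is indeed bounded by 0.25, but each backpropagated factor also roughly includes a weight, so large weights can still push individual factors above 1:
```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-10, 10, 10001)
d_sigmoid = sigmoid(x) * (1 - sigmoid(x))
print(d_sigmoid.max())   # 0.25 at x = 0: the sigmoid derivative really is bounded by 0.25

# Each backpropagated factor is (roughly) weight * activation derivative,
# so a large weight can still push a single factor above 1:
w = 6.0
print(w * 0.25)          # 1.5 -> with many such layers, the product can still explode
```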
@RandomGuy-hi2jm
@RandomGuy-hi2jm 4 years ago
What can we do to prevent it???? I think we should use the ReLU activation function.
@deeplizard
@deeplizard 4 years ago
The answer is revealed in the following video. deeplizard.com/learn/video/8krd5qKVw-Q
@mikashaw7926
@mikashaw7926 3 years ago
OMG I UNDERSTAND NOW
@abihebbar
@abihebbar 3 years ago
In the case of an exploding gradient, when it gets multiplied by the learning rate (between 0.0001 and 0.01), the result will be much smaller (usually less than 1). When this is then subtracted from the existing weight, wouldn't the updated weight still be less than 1? In that case, how is it different from a vanishing gradient?
@absolute___zero
@absolute___zero 4 years ago
Vanishing gradient is not a problem. It is a feature of stacking something upon something that depends on something else, and so on. It is like falling dominoes, but with an increasing piece at each step. Because by chaining function after function after function after function after function, where all the functions are summing and at the end doing a ReLU, you are going to get your output values blowing up! Vanishing gradients are not a problem; it is how it is supposed to work. The math is right: at the lower layers you can't use big gradients, because they are going to affect the output layer exponentially. And also, cut the first minute and a half of the video, because it is just a waste of time.
@garrett6064
@garrett6064 3 years ago
Hmm... possible solutions: my guess is (A) too many layers in the net, or (B) a separate learning rate for each layer, used in conjunction with a function to normalize the learning rate and thus the gradient. But this is just my guess.
@deeplizard
@deeplizard 3 years ago
Nice! Not sure what type of impact these potential solutions may have. A known solution is revealed in the next episode ✨
@garrett6064
@garrett6064 3 years ago
@@deeplizard After spending a further ten seconds thinking about this, I decided I didn't like my "solutions". But I did think that "every first solution should be: use fewer layers" was not a bad philosophy. 😆
@hamidraza1584
@hamidraza1584 3 years ago
Does this problem occur in simple neural networks or in RNN/LSTM networks??
@fosheimdet
@fosheimdet a year ago
Why is this an issue? If the partial derivative of the loss w.r.t. a weight is small, its change should also be small, so that we step in the direction of steepest descent of the loss function. Is the problem of vanishing gradients that we effectively lose the ability to train certain weights of our network, reducing the dimensionality of our model?
@sidbhatia4230
@sidbhatia4230 5 years ago
Vanishing gradient is dependent on the learning rate of the model, right?
@askinc102
@askinc102 6 years ago
If the gradient (of loss) is small, doesn't it imply that a very small update is required?
@deeplizard
@deeplizard 6 years ago
Hey sandesh - Good question. Michael Nielsen addresses this question in chapter 5 of his book, and I think it's a nice explanation. I'll link to the full chapter, but I've included the relevant excerpt below. Let me know if this helps clarify.
neuralnetworksanddeeplearning.com/chap5.html
"One response to vanishing (or unstable) gradients is to wonder if they're really such a problem. Momentarily stepping away from neural nets, imagine we were trying to numerically minimize a function f(x) of a single variable. Wouldn't it be good news if the derivative f′(x) was small? Wouldn't that mean we were already near an extremum? In a similar way, might the small gradient in early layers of a deep network mean that we don't need to do much adjustment of the weights and biases?
Of course, this isn't the case. Recall that we randomly initialized the weight and biases in the network. It is extremely unlikely our initial weights and biases will do a good job at whatever it is we want our network to do. To be concrete, consider the first layer of weights in a [784,30,30,30,10] network for the MNIST problem. The random initialization means the first layer throws away most information about the input image. Even if later layers have been extensively trained, they will still find it extremely difficult to identify the input image, simply because they don't have enough information. And so it can't possibly be the case that not much learning needs to be done in the first layer. If we're going to train deep networks, we need to figure out how to address the vanishing gradient problem."
@VinayKumar-hy6ee
@VinayKumar-hy6ee 5 years ago
As the gradients of the sigmoid or tanh functions are very low, wouldn't that lead to the vanishing gradient effect, and why are these chosen as activation functions, deeplizard?
@abdulbakey8305
@abdulbakey8305 5 years ago
@@deeplizard So what is the reason for the derivative to come out small, if it is not that the function is at an optimum w.r.t. that particular weight?
@kareemjeiroudi1964
@kareemjeiroudi1964 5 years ago
May I know who edits your videos? Because whoever does, he/she has humor 😃
@deeplizard
@deeplizard 5 years ago
Haha thanks, kareem! There are two of us behind deeplizard. Myself, which you hear in this video, and my partner, who you'll hear in other videos like our PyTorch series. Together, we run the entire assembly line from topic creation and recording to production and editing, etc. :)
@absolute___zero
@absolute___zero 4 years ago
4:22 This is all wrong. Updates can be positive or negative; this depends on the direction of the slope. If the slope is positive, the update is going to be subtracted; if the slope is negative, the weight update is going to be added. So weights don't just get smaller and smaller, they are updated in small quantities, that's it; if you wait long enough (like weeks or months) you are going to get your ANN training completed just fine. You should learn about derivatives of composite functions, and you will then understand that vanishing gradient is not a problem. It can be a problem, though, if you use the float32 data type (single precision), because it has considerable error when using a long chain of calculations. Switching to double will help with the vanishing gradient "problem" (in quotes).
@himanshupoddar1395
@himanshupoddar1395 5 years ago
But exploding gradients can be handled by batch normalization, can't they?
@gideonfaive6886
@gideonfaive6886 4 years ago
{ "question": "Vanishing gradient impair our training by making our weight being updated and their values getting further away from the optimal weights value", "choices": [ "False", "True", "BLANK_SPACE", "BLANK_SPACE" ], "answer": "False", "creator": "Hivemind", "creationDate": "2020-04-21T21:53:20.326Z" }
@deeplizard
@deeplizard 4 years ago
Thanks, Gideon! Just added your question to deeplizard.com/learn/video/qO_NLVjD6zE :)
@driesdesmet1069
@driesdesmet1069 3 years ago
Nothing about the sigmoid function? I thought this was also one of the causes of a vanishing/exploding gradient?
@coolguy-dw5jq
@coolguy-dw5jq 6 years ago
Is there any reason behind the name of your channel?
@deeplizard
@deeplizard 6 years ago
_I could a tale unfold whose lightest word_ _Would harrow up thy soul._
@stormwaker
@stormwaker 5 years ago
Love the series, but please refrain from the zoom-in zoom-out background animation. It makes me distracted, nauseous even.
@deeplizard
@deeplizard 5 years ago
Thank you for the feedback!
@rishabbanerjee5152
@rishabbanerjee5152 4 years ago
math-less machine learning is so good :D
@zongyigong6658
@zongyigong6658 4 years ago
What if some intermediate gradients are large (> 1) and some are small (< 1)? They could balance out to give the early gradients still-normal sizes. The described problem seems not to stem from a defect in the theory but rather from practical numerical implementation perspectives. The heuristic explanation is a bit hand-wavy. When should we expect to have a vanishing problem and when should we expect to have an exploding problem? Is one vs. the other purely random? If they are not random but depend on the nature of the data or the NN architecture, what are the dependencies and why?
@VinayKumar-hy6ee
@VinayKumar-hy6ee 5 years ago
As the gradients of the sigmoid or tanh functions are very low, wouldn't that lead to the vanishing gradient effect, and why are these chosen as activation functions?
@deeplizard
@deeplizard 5 years ago
Hey Vinay - Check out the video on bias below to see how the result from a given activation function may actually be "shifted" to a greater number, which in turn might help to reduce vanishing gradient. kzbin.info/www/bejne/fpbXd5yeqL2Gr9U Let me know if this helps.
@abdulmukit4420
@abdulmukit4420 4 years ago
The background image is really unnecessary and is rather disturbing
@midopurple3665
@midopurple3665 a year ago
You are great, I feel like marrying you for your intelligence.
@lovelessOrphenKoRn
@lovelessOrphenKoRn 5 years ago
I wish you wouldn't talk so fast. Can't keep up. Or subtitles would be nice.
@deeplizard
@deeplizard 5 years ago
There are English subtitles automatically created by YouTube that you can turn on for this video. Also, you can slow down the speed in the video settings to 75% or 50% of the real-time speed to see if that helps.
@mouhamadibrahim3574
@mouhamadibrahim3574 2 years ago
I want to marry you
@albertodomino9420
@albertodomino9420 3 years ago
Please talk slowly.