Thank you Rupert! Excellent, excellent explanation and intuition for this :)
@xagent6327 • 12 days ago
The solution to pad with zeros fixed the number of channels, but how did they then reduce the dimensions from 64x64 to 32x32?
@siddhantpassi8237 • a month ago
Amazing video!
@rogercomix5648 • a month ago
I liked it, but you did not explain the role of the 3×3 kernel and how it scans the pixels of the image at each layer. Also, the reason for the downsampling is that it is more expensive to increase the size of the kernel at each layer, so we downsample the image to get the same relative size differential as if we had increased the kernel size. Apart from that, it's brilliant.
@samruddhisaoji7195 • a month ago
9:02 I have a doubt: how do the numbers of features on the LHS and RHS match? LHS = w*h*c, RHS = (w/2)*(h/2)*(2*c), thus RHS = 2*LHS.
@Bryanvas25 • a month ago
Actually RHS = (1/2)*LHS, and yes, I also don't understand that part.
@samruddhisaoji7195 • a month ago
@Bryanvas25 Yes, you're right about RHS = LHS/2. My bad!
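For reference, a quick count (assuming, as this thread does, that the downsampling halves each spatial dimension while doubling the channel count) confirms the corrected figure:

```latex
\frac{w}{2} \cdot \frac{h}{2} \cdot (2c) \;=\; \frac{w\,h\,c}{2}
\qquad\Longrightarrow\qquad
\text{RHS} = \tfrac{1}{2}\,\text{LHS}
```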
@caleharrison5387 • a month ago
Thanks, this is really good. One thing that would be helpful is if the example itself were convolved like the algorithm, to make it easier to visualise the algo.
@frommarkham424 • 2 months ago
U NETS RULEEEEEEEEEEEEE
@mehdiaraghi4457 • 2 months ago
The best explanation that I've ever seen. You answered all the questions I had. Kudos to you.
@ABCEE1000 • 2 months ago
Would you please make a presentation on 3D U-Net? That would be really appreciated.
@ABCEE1000 • 2 months ago
Man, I like you! You are the best! I love how you simplify things and how careful you are to deliver the idea perfectly. Please keep these great presentations up.
@hemalathat8773 • 2 months ago
I LIKED THE ANIMATIONS AND YOUR PRESENTING STYLE IN THE VIDEO. THANKS.
@Atreyuwu • 2 months ago
I found this while looking up UNet ELI5... 😭😭
@shinobidattebayo7650 • 2 months ago
Nice effort, but the background music is distracting.
@ciciy-wm5ik • 2 months ago
At 2:09, image1 - image2 = image3 does not imply image1 + image3 = image2.
@gunasekhar8440 • a month ago
I mean, we need to assume it like that. In the paper they let H(x) be our desired mapping, x the input, and F(x) some transformation, so F(x) = H(x) - x.
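A minimal sketch of that residual formulation in PyTorch (hypothetical channel count and layer sizes, not the exact block from the video or the paper): the convolutions model F(x), and the skip connection adds x back, so the block outputs F(x) + x ≈ H(x).

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Computes y = F(x) + x, so the conv stack only has to learn the residual F(x) = H(x) - x."""
    def __init__(self, channels: int):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.f(x) + x)  # skip connection adds the input back

x = torch.randn(1, 64, 32, 32)       # hypothetical feature map: (batch, channels, h, w)
print(ResidualBlock(64)(x).shape)    # torch.Size([1, 64, 32, 32])
```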
@liliznotatnikiem6755 • 3 months ago
I'm interested in multiclass problems (recognising a bike, a human AND a house). Also, what would you choose instead of a confusion matrix?
@prammar1951 • 3 months ago
Everyone is praising the video; maybe it's just me, but I really didn't understand what the residual connection hopes to achieve, or how it does that. The video didn't make it clear to me.
@TheJDen • 18 days ago
"Residuals" are what mathematicians call the difference between the actual and predicted data values. Imagine you had a simple dataset that looked linear but with some oscillating variation (like putting x + sin(3x) into a graphing calculator). One option to model this data would be to train a network on each x and y. In that case, the model would have to learn both the underlying linear trend (x) and the oscillation (sin(3x)).

Alternatively, we could estimate the slope of the line (without the variations). We could then repeatedly feed the estimated height of the line at x into the network whenever it is training on an (x, y) pair. This way, the model only has to learn the oscillation: the difference between the line and the data, i.e. the residual (sin(3x)). It makes the model's job easier because it doesn't have to learn and keep track of the linear trend (x), since we remind it every few steps.

In more complex settings like the one shown in the video, it means the model doesn't have to learn both how to maintain a good representation of the flower and how to make the resolution higher, only how to make the resolution higher (because it always has access to the original flower).
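A toy version of that x + sin(3x) intuition (made-up hyperparameters, purely illustrative): the linear trend is added back explicitly, so the network only has to model the residual oscillation.

```python
import torch
import torch.nn as nn

x = torch.linspace(-3, 3, 256).unsqueeze(1)
y = x + torch.sin(3 * x)        # data = linear trend + oscillating residual
trend = x                       # crude estimate of the linear part

net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

for _ in range(2000):
    opt.zero_grad()
    pred = trend + net(x)       # the network only models the residual, sin(3x)
    loss = nn.functional.mse_loss(pred, y)
    loss.backward()
    opt.step()

print(loss.item())              # should end up close to zero
```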
@louisdante8457 • 3 months ago
7:53 Why is there a need to preserve the time complexity per layer?
@samruddhisaoji7195 • a month ago
The number of elements in the input and output of a convolution layer should remain the same, as later we will be performing an element-wise operation.
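One possible reading of 7:53 (this follows the usual ResNet-style bookkeeping, not necessarily the video's exact argument): the cost of a k×k convolution layer scales with width, height, input channels and output channels, so halving the spatial size while doubling the channels leaves the per-layer cost roughly unchanged.

```latex
\text{cost} \;\propto\; w \cdot h \cdot c_{\text{in}} \cdot c_{\text{out}} \cdot k^2,
\qquad
\frac{w}{2} \cdot \frac{h}{2} \cdot (2c) \cdot (2c) \cdot k^2 \;=\; w\,h\,c^{2}\,k^{2}
```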
@boughouyasser7471 • 3 months ago
Make a video on I-JEPA
@dhanushs4833 • 3 months ago
Great video mate, would love to see more brilliant stuff like this ❤❤
@MuhammadHamza-o3r • 3 months ago
Very well explained
@pranavgandhiprojects • 3 months ago
Hey, just saw this first video from your channel and immediately subscribed :) Great explanation with visuals.
@HelloIamLauraa • 4 months ago
best explainer!! great video, I had an "aaaaááaaa" moment at 8:05
@faaz12356 • 4 months ago
Very useful and great explanation.
@HarshChinchakar • 4 months ago
This is one of the best videos I've ever come across on YouTube, ngl. GG
@wege8409 • 4 months ago
6:38 this is the part that really made me understand, thank you
@FORCP-bq5fo • 4 months ago
Love it bro!!
@terjeoseberg990 • 4 months ago
You didn't explain how the skip connections are connected across. What is the data that's transferred and how is it incorporated into the output half of the U-Net?
@AaronNicholsonAI • 5 months ago
Thanks a whole big ton!
@sathvikmalgikar2842 • 5 months ago
so simple and straightforward
@SakshamGupta-em2zw • 5 months ago
Love the Music
@SakshamGupta-em2zw • 5 months ago
And love that you used manim, keep it up
@VikashSingh-vd9cp • 5 months ago
Best video for understanding the U-Net model.
@paruldhariwal • 5 months ago
It was really the most simplified and to the point video I watched on this topic. Great work!!
@luisluiscunha • 5 months ago
You are very funny!
@mincasurong • 5 months ago
Great summary, Great thanks
@atifadib • 5 months ago
If you want to just use the decoder, how would you do it?
@ozzafar1982 • 5 months ago
Great explanation, thanks!
@jaybrodnax • 5 months ago
I feel like this is more a description for experts than an actual explanation of how and why it works. Questions I'm left with:
- What is the purpose of downsampling/upsampling? (I'm guessing performance?)
- How is segmentation actually done by the U-Net?
- How is feature extraction actually done?
- What are max pooling layers?
- What does "channel doubling" mean, and what does it achieve?
- How does the encoder know "these are the pixels where the bike is"?
- Why is it beneficial to connect the encoder features to the decoder features at each step, versus only in the last step?
- How does U-Net achieve anything other than downscaling/upscaling efficiency?
- Where are the actual operations that derive features?
- How is U-Net applied to various use cases like diffusion? What does diffusion add or change, for example?
@abansalah4677 • 5 months ago
(Disclaimer: I am a beginner, and this is not intended to be a complete answer.) You should read about convolutional layers and pooling layers to better understand this video. At any rate:

A colored image has three channels: R, G, and B. A convolutional layer is specified by some spatial parameters (stride, kernel size, padding) and by how many filters it has; the number of filters is the number of channels of the output. You can think of each filter as trying to capture different information. Doubling the channels therefore means using double the number of filters while using a stride of 2.

The segmentation is done just like any ML task: the training data consists of pairs of images and their annotated versions. I think it's often hard to decipher the inner workings of a particular neural network, and your question can/should be asked in a more general way: how do neural networks learn?
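A minimal sketch of that "double the filters, stride 2" step (hypothetical channel counts, not the exact layers from the video): 128 filters give 128 output channels, and the stride of 2 halves the height and width.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 64, 64)  # (batch, channels, height, width)

# 128 filters -> 128 output channels; stride 2 halves each spatial dimension
down = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=2, padding=1)

print(down(x).shape)  # torch.Size([1, 128, 32, 32])
```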
@TechHuntBD • 5 months ago
Nice explanation
@LucaBovelli • 5 months ago
Bro, why did you stop making videos? I need you lmao (it's a painful lmao).
@LucaBovelli • 5 months ago
Dude, thankssssss! I thought this was another one of those things that'll take me 2 hours of YouTube to *not* understand, but you saved me.
@s4lome792 • 6 months ago
Clearly explained. What caused my confusion in the first place: in the figure in the original paper, why does the segmentation mask not have the same dimensionality as the input image?
@mridulsehgal7773 • 6 months ago
The best video you can get explaining U-Net.
@Atreyuwu • 2 months ago
Not even close lol
@usaid3569 • 6 months ago
Great video champ
@rezadadbin4684 • 6 months ago
Fucking fabulous
@notrito • 6 months ago
If anyone wonders how the features are concatenated when the sizes don't match... they crop it.
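A sketch of that crop-and-concatenate step (shapes are made up; the original U-Net centre-crops the encoder features to the decoder's size before concatenating along the channel axis):

```python
import torch

def center_crop(enc: torch.Tensor, target_h: int, target_w: int) -> torch.Tensor:
    """Crop the encoder feature map to the decoder's spatial size."""
    _, _, h, w = enc.shape
    top = (h - target_h) // 2
    left = (w - target_w) // 2
    return enc[:, :, top:top + target_h, left:left + target_w]

enc = torch.randn(1, 64, 64, 64)   # encoder features arriving via the skip connection
dec = torch.randn(1, 64, 56, 56)   # upsampled decoder features (smaller, from unpadded convs)

skip = center_crop(enc, dec.shape[2], dec.shape[3])
merged = torch.cat([skip, dec], dim=1)   # concatenate along the channel axis
print(merged.shape)                      # torch.Size([1, 128, 56, 56])
```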
@ingenuity8886 • 6 months ago
Thank you very much bro...
@SarraAissaoui-sp3sm • 6 months ago
I clicked thumbs down for wasting one minute of my precious time on the intro. Get to the F point!!