Comments
@HadbbdbdDhhdbd A day ago
Helpful
@Manar-Sg 4 days ago
thank you so much!
@oblivitus. 4 days ago
brilliant! thank you for this illustration!
@DannyGeisz-vb2dt 11 days ago
Thank you Rupert! Excellent, excellent explanation and intuition for this :)
@xagent6327 12 days ago
The solution to pad with zeros fixed the number of channels, but how did they then reduce the dimensions from 64x64 to 32x32?
@siddhantpassi8237 A month ago
Amazing video!
@rogercomix5648 A month ago
I liked it, but you did not explain the role of the 3×3 kernel and how it scans the pixels of the image at each layer. The reason for the downsampling is that it is more expensive to increase the size of the kernel at each layer, so we downsample the image to get the same relative size differential as if we had increased the kernel size. Apart from that, it's brilliant.
@samruddhisaoji7195 A month ago
9:02 I have a doubt: how do the numbers of features in the LHS and RHS match? LHS = w*h*c. RHS = (w/2)*(h/2)*(2*c). Thus RHS = 2*LHS
@Bryanvas25 A month ago
Actually RHS = (1/2) * LHS, and yes, I also don't understand that part.
@samruddhisaoji7195 A month ago
@Bryanvas25 Yes, you're right that RHS = LHS/2. My bad!
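The arithmetic in this thread is easy to check directly. A quick sketch with made-up dimensions (64×64×64 is just an illustrative example, not a value from the video):

```python
# Element count before and after a typical U-Net downsampling step,
# assuming spatial dimensions halve while channels double.
w, h, c = 64, 64, 64

lhs = w * h * c                      # elements before downsampling
rhs = (w // 2) * (h // 2) * (2 * c)  # elements after downsampling

print(lhs, rhs, rhs / lhs)  # 262144 131072 0.5
```

So the downsampled tensor holds half as many numbers: the two spatial halvings shrink the count by 4×, and doubling the channels only recovers a factor of 2.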
@caleharrison5387 A month ago
Thanks, this is really good. One thing that would be helpful is if the example image were itself convolved the way the algorithm does it, to make the algorithm easier to visualise.
@frommarkham424 2 months ago
U NETS RULEEEEEEEEEEEEE
@mehdiaraghi4457 2 months ago
The best explanation that I've ever seen. You answered all the questions I've had. Kudos to you.
@ABCEE1000 2 months ago
Would you please make a presentation on 3D U-Net? That would be really appreciated.
@ABCEE1000 2 months ago
Man, I like you! You are the best! I love how you simplify things and how careful you are to deliver the idea perfectly. Please keep these great presentations up.
@hemalathat8773 2 months ago
I LIKED THE ANIMATIONS AND YOUR PRESENTING STYLE IN THE VIDEO. THANKS.
@Atreyuwu 2 months ago
I found this while looking up UNet ELI5... 😭😭
@shinobidattebayo7650 2 months ago
Nice effort, but the background music is distracting.
@ciciy-wm5ik 2 months ago
At 2:09, image1 - image2 = image3 does not imply image1 + image3 = image2.
@gunasekhar8440 A month ago
I mean, we need to set it up that way. In the paper they let h(x) be our desired mapping, x the input, and f(x) some learned transformation, so f(x) = h(x) - x.
@liliznotatnikiem6755 3 months ago
I'm interested in multiclass problems (recognising bike, human AND house). Also, what would you choose instead of a confusion matrix?
@prammar1951 3 months ago
Everyone is praising the video; maybe it's just me, but I really didn't understand what the residual connection hopes to achieve, or how it achieves it. The video didn't make it clear.
@TheJDen 18 days ago
“Residuals” are what mathematicians call the difference between the actual and predicted data values. Imagine you had a simple dataset that looked linear but with some oscillating variation (like put x + sin(3x) into graphing calculator). One option to model this data would be to train a network on each x and y. In this case, the model would have to learn the underlying linear trend (x), and the oscillation (sin(3x)). Alternatively, we could estimate the slope of the line (without variations). We could then repeatedly feed the estimated height of the line at x into the network whenever it is training on an x y pair. This way, the model only has to learn the oscillation, the difference between the line and the variation, the residual (sin(3x)). It makes the model’s job easier because it doesn’t have to learn and keep track of the linear trend (x) since we remind it every few steps. In more complex things like he showed in the video it means it doesn’t have to learn both how to maintain a good representation of a flower and make resolution higher, only how to make resolution higher (because it always has access to original flower).
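The x + sin(3x) example above can be sketched numerically. This is only an illustration of the decomposition, not a trained network; the skip connection simply adds the input back so the "model" only has to supply the residual:

```python
import numpy as np

# Toy illustration of residual learning using the x + sin(3x) example.
x = np.linspace(0.0, 2.0 * np.pi, 100)
y = x + np.sin(3.0 * x)    # the full mapping a plain network would have to learn

residual = y - x           # what a residual block actually has to learn: sin(3x)
prediction = x + residual  # skip connection adds the identity part back

assert np.allclose(prediction, y)
```

The residual target oscillates around zero with small magnitude, which is much easier to fit than the full mapping that also carries the linear trend.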
@louisdante8457 3 months ago
7:53 Why is there a need to preserve the time complexity per layer?
@samruddhisaoji7195 A month ago
The number of elements in the input and output of a convolution layer should remain the same, as later we will be performing an element-wise operation.
@boughouyasser7471 3 months ago
Make a video on I-JEPA
@dhanushs4833 3 months ago
Great video mate, would love to see more brilliant stuff like this ❤❤
@MuhammadHamza-o3r 3 months ago
Very well explained
@pranavgandhiprojects 3 months ago
Hey, just saw this first video from your channel and immediately subscribed :) Great explanation with visuals.
@HelloIamLauraa 4 months ago
best explainer!! great video, I had an "aaaaááaaa" moment at 8:05
@faaz12356 4 months ago
Very useful and great explanation.
@HarshChinchakar 4 months ago
This is one of the best videos I've ever come across on YouTube ngl, GG.
@wege8409 4 months ago
6:38 this is the part that really made me understand, thank you
@FORCP-bq5fo 4 months ago
Love it bro!!
@terjeoseberg990 4 months ago
You didn't explain how the skip connections are connected across. What is the data that's transferred and how is it incorporated into the output half of the U-Net?
@AaronNicholsonAI 5 months ago
Thanks a whole big ton!
@sathvikmalgikar2842 5 months ago
so simple and straightforward
@SakshamGupta-em2zw 5 months ago
Love the Music
@SakshamGupta-em2zw 5 months ago
And love that you used manim, keep it up
@VikashSingh-vd9cp 5 months ago
Best video for understanding the U-Net model.
@paruldhariwal 5 months ago
It was really the most simplified and to the point video I watched on this topic. Great work!!
@luisluiscunha 5 months ago
You are very funny!
@mincasurong 5 months ago
Great summary, Great thanks
@atifadib 5 months ago
If you wanted to use just the decoder, how would you do it?
@ozzafar1982 5 months ago
Great explanation, thanks!
@jaybrodnax 5 months ago
I feel like this is more a description for experts than an actual explanation of how and why it works. Questions I'm left with:
What is the purpose of downsampling/upsampling (I'm guessing performance)?
How is segmentation actually done by the U-Net?
How is feature extraction actually done?
What are max pooling layers?
What does "channel doubling" mean, and what does it achieve?
How does the encoder know "these are the pixels where the bike is"?
Why is it beneficial to connect the encoder features to the decoder features at each step, versus only in the last step?
How does U-Net achieve anything other than downscaling/upscaling performance efficiency? Where are the actual operations to derive features?
How is U-Net specifically applied for various use cases like diffusion? What does diffusion add or change, for example?
@abansalah4677 5 months ago
(Disclaimer: I am a beginner, and this is not intended to be a complete answer.) You should read about convolutional layers and pooling layers to better understand this video. At any rate: a colored image has three channels: R, G, and B. A convolutional layer is specified by some spatial parameters (stride, kernel size, padding) and by how many filters it has; the number of filters is the number of channels of the output. You can think of each filter as trying to capture different information. Doubling the channels therefore means using double the number of filters while using a stride of 2. The segmentation is learned like any ML task: the training data consists of pairs of images and their annotated versions. I think it's often hard to decipher the inner workings of a particular neural network, and your question can/should be asked in a more general way: how do neural networks learn?
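As a rough sketch of the "halve the resolution, double the channels" step discussed in this thread, here is the standard output-size formula for a strided convolution. The kernel size, stride, padding, and channel counts below are illustrative assumptions, not values taken from the video:

```python
# Output spatial size of a convolution: floor((n + 2p - k) / s) + 1.
def conv_out(size, kernel=3, stride=2, padding=1):
    return (size + 2 * padding - kernel) // stride + 1

h_in, w_in, c_in = 64, 64, 64
h_out, w_out = conv_out(h_in), conv_out(w_in)  # stride 2 halves each spatial dim
c_out = 2 * c_in  # using twice as many filters doubles the output channels

print(h_out, w_out, c_out)  # 32 32 128
```

This also answers the shape question: it is the stride (or equivalently a pooling layer), not the padding, that takes 64×64 down to 32×32; padding only keeps the kernel from shrinking the edges.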
@TechHuntBD 5 months ago
Nice explanation
@LucaBovelli 5 months ago
Bro, why did you stop making videos? I need you lmao (it's a painful lmao).
@LucaBovelli 5 months ago
Dude, thanksssss! I thought this was another one of those things that'll take me 2 hours of YouTube to *not* understand, but you saved me.
@s4lome792 6 months ago
Clearly explained. What caused my confusion in the first place: in the figure in the original paper, why does the segmentation mask not have the same dimensionality as the input image?
@mridulsehgal7773 6 months ago
The best video you can get explaining U-Net.
@Atreyuwu 2 months ago
Not even close lol
@usaid3569 6 months ago
Great video champ
@rezadadbin4684 6 months ago
Fucking fabulous
@notrito 6 months ago
If anyone wonders how to concatenate the features when their sizes don't match: they crop them.
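A minimal sketch of that crop-and-concatenate step, with hypothetical shapes (the 68→64 mismatch is just an example of the kind produced by unpadded convolutions in the original U-Net):

```python
import numpy as np

# Center-crop an encoder feature map so it can be concatenated with a
# smaller decoder feature map along the channel axis.
def center_crop(feat, target_h, target_w):
    _, h, w = feat.shape  # (channels, height, width)
    top = (h - target_h) // 2
    left = (w - target_w) // 2
    return feat[:, top:top + target_h, left:left + target_w]

enc = np.zeros((64, 68, 68))  # skip-connection features from the encoder
dec = np.zeros((64, 64, 64))  # smaller upsampled features in the decoder

cropped = center_crop(enc, 64, 64)
merged = np.concatenate([cropped, dec], axis=0)  # stack along channels
print(merged.shape)  # (128, 64, 64)
```

Cropping from the center keeps the two feature maps spatially aligned; the alternative used by many later implementations is to pad the convolutions so the sizes already match and no crop is needed.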
@ingenuity8886 6 months ago
Thank you very much bro...
@SarraAissaoui-sp3sm 6 months ago
I clicked the thumbs down for wasting one minute of my precious time on the intro. Get to the F point!!