225 - Attention U-net. What is attention and why is it needed for U-Net?

54,704 views

DigitalSreeni

Comments: 113
@psmanici4 2 years ago
I'm a biologist by training; each of my degrees was in wet-laboratory biology research. I'm doing a secondment which involves images, and I have never had any exposure to lectures or formal training in programming. I just want to leave a note to say how useful your videos are and how proud you should feel of the work you put into this channel.
@giroyu 2 years ago
I can't believe I couldn't find this channel and your precious hard work till now. Thank you so much for your work. Please keep going!
@DigitalSreeni 2 years ago
Thank you so much!
@hejarshahabi114 2 years ago
The way you explain such complex models is so nice and easy to understand. Thanks for the videos you make.
@nagamanigonthina9306 1 year ago
This video is very clear, with a detailed explanation. It cleared all my doubts. Thanks a lot.
@DigitalSreeni 1 year ago
You are most welcome
@angelceballos8714 3 years ago
Looking forward to the next tutorial! Thanks again!
@DigitalSreeni 3 years ago
Coming soon! :)
@100000Andy 10 months ago
Your videos are amazing! You explain this so well that anyone can understand it! I am really impressed.
@ParniaSh 3 years ago
I love how clearly you explained the attention mechanism. Thank you!
@DigitalSreeni 3 years ago
Glad it was helpful!
@perioguatexgaming1333 2 years ago
I think you are the only one on all of YouTube who has explained the concept of attention and how to implement it. Thank you, sir.
@DigitalSreeni 2 years ago
Thanks :)
@birdshotbill 2 years ago
Your videos have been incredibly useful to me. I am studying for a master's in deep learning and computer vision, and your content is excellent for my learning and interest in the subject. Keep up the amazing work; I hope your channel continues to grow and receive the recognition it deserves!
@perioguatexgaming1333 2 years ago
I am in undergrad, and you know it's so difficult to actually understand, and even more so to implement, these things. This guy is a saviour.
@visuality2541 2 years ago
This is extremely helpful; very detailed, easy, and clear. Thank you very much Sir!
@husammasalkhi7817 1 year ago
Very, very good video; I greatly appreciate you going through the steps of the gate and also the code for it.
@timtomov3361 3 years ago
As you said, g is coming from a lower level and therefore has a lower resolution: 64x64 vs. 128x128 in the skip connection. However, isn't the number of features larger at the lower levels in the "normal" design? I.e., wouldn't you have g = 64x64x(2*numFeatures) and x = 128x128x(numFeatures)?
@tshele1488 2 years ago
It is rather x that comes from the earlier layers, and it has better spatial information. Also, the skip connection is x, not g.
@tedonk03 2 years ago
This is a really great explanation, thanks so much for this. However, just to note, add is not the same as concatenate. Concatenate appends the tensors along a certain dimension, while add actually sums the tensor values element-wise.
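To see that difference concretely, here is a minimal Keras sketch (an illustration added for this point, not code from the video):

    import tensorflow as tf
    from tensorflow.keras import layers

    a = tf.ones((1, 64, 64, 128))
    b = tf.ones((1, 64, 64, 128))

    # Add: element-wise sum, the shape is unchanged -> (1, 64, 64, 128)
    print(layers.add([a, b]).shape)

    # Concatenate: stacks along the channel axis -> (1, 64, 64, 256)
    print(layers.concatenate([a, b], axis=-1).shape)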
@legendgrable9762 1 month ago
I looked for this and am now applying it to early detection of prostate cancer.
@mansubatabassum6629 6 months ago
This is a really great explanation, thank you.
@anonyme3029 3 years ago
Thank you for explaining in a very understandable way
@leonardocesar1619 2 years ago
Amazing explanation! Thank you so much
@benjaminp.9572 2 years ago
It is very helpful. Thank you.
@DigitalSreeni 2 years ago
Glad to hear that!
@ShivaniAMehta 2 years ago
Thank you so much, sir, for this video. It is a perfect blend of conceptual intuition and implementation. Best wishes ahead.
@vivaliberte 1 year ago
Very good explanation, thanks.
@siddharthct4799 3 years ago
Thank you for the videos.
@niladrichakraborti5443 1 year ago
Very helpful!!
@yongcheng7397 3 years ago
Very nice and useful video, thank you
@DigitalSreeni 3 years ago
Glad it was helpful!
@sahilbhardwaj2360 3 years ago
Looking forward to the next video :)
@masterlitheon 3 years ago
Very good explanation!
@DigitalSreeni 3 years ago
Glad you think so!
@ela_bd 3 years ago
So useful, thanks.
@francisferri2732 1 year ago
I love your videos ❤
@soumyadrip 3 years ago
Thank you for the video
@arizmohammadi5354 2 years ago
Great, like always!
@vaibhavsingh1049 3 years ago
This is great content.
@rbhambriiit 1 year ago
Thanks for the great lectures. Small feedback/correction on "concatenating or adding is the same thing": the Add layer sums two input tensors element-wise, while Concatenate appends two tensors along an axis.
@edmald1978 3 years ago
Amazing explanation!!!!!!
@mujeebpa 2 years ago
Clear and detailed explanation... Could you please post a video on single- and multi-head attention as well?
@shalini286r9 3 years ago
Great explanation. Can you share code for Attention U-Net, and what modifications should we make when we use AGs in U-Net?
@DigitalSreeni 3 years ago
That would be the content I'd be covering next week. Please wait until July 14th.
@kasikrit 3 years ago
Very good explanation
@RabiulIslam-tw6qc 1 year ago
Wonderful.
@牛煜烁 2 years ago
Really brilliant tutorial, I really appreciate it. Just wondering, is there somewhere I could download these slides? Thanks.
@SakvaUA 2 years ago
So why apply a 1x1 conv with stride 2 to x, which discards 3/4 of the information, instead of upsampling g? That way you wouldn't need the last upsampling step.
@hanjiang4643 3 years ago
Thanks for your clear explanation! But what if I want to cite your figures from this video? Should I just add the YouTube channel link as a reference?
@franciscofreiri 2 years ago
@DigitalSreeni Should I change the sigmoid function to softmax in the attention mechanism if I'm segmenting multi-class images?
@gerhitchman 2 years ago
Cool and clear explanation, but I'm not sure how this would work for multi-class semantic segmentation; it seems we would need to generate many attention weight masks.
@hafizaaymen2291 1 year ago
Very well explained 👍 Please make a video on the VGG16 model.
@ParniaSh 3 years ago
Just a very minor correction: to reduce the dimensions of x by half, we need a 2x2 conv with a stride of 2, not a 1x1.
@DigitalSreeni 3 years ago
The output shape is defined by the stride, not by the kernel size of the convolution. Here are a few lines of code if anyone wants to experiment. In fact, these types of snippets can be fun and educational.

    import tensorflow as tf
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv2D

    model = Sequential()
    # 2x2 kernel, stride 1, 'same' padding: the output stays 256x256
    model.add(Conv2D(1, kernel_size=(2, 2), padding='same', strides=(1, 1), input_shape=(256, 256, 1)))
    model.summary()
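As a counterpart to the snippet above (a sketch in the same setup, added here for illustration): keeping even a 1x1 kernel but setting the stride to 2 is what halves the output.

    import tensorflow as tf
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv2D

    model = Sequential()
    # 1x1 kernel with stride 2: the output is 128x128x1, half of the 256x256 input
    model.add(Conv2D(1, kernel_size=(1, 1), padding='same', strides=(2, 2), input_shape=(256, 256, 1)))
    model.summary()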
@bingbingsun6304 2 years ago
Insightful explanation. But is there any mathematical support for this? How is it related to probability and information theory?
@charlesd4572 1 year ago
I think this is superb, but aren't the numbers of feature maps the wrong way round between g and x? Surely the skip connection should have half the channel dimension of the lower level.
@ezioauditore5705 3 years ago
Loved the video... Could you also make a video on squeeze-and-attention networks for segmentation?
@giacomobenvenuti3172 2 years ago
Question: in your AG schematic, shouldn't the skip tensor x have fewer features than the gating input g? Thank you for these great lectures!!
@gpietra 1 year ago
I agree.
@fahd2372 1 year ago
AFAIK they should have the same number of features, because to get g you send the layer that g comes from (the layer right beneath the skip connection) through a conv layer that halves the number of filters.
@tilkesh 2 years ago
Thank you.
@DigitalSreeni 2 years ago
Welcome!
@cipherxen2 1 year ago
Isn't it better to use a 2x2 convolution for a stride of (2,2)?
@drforest 10 months ago
Thanks!
@DigitalSreeni 10 months ago
Thank you very much.
@jagdishgg 1 year ago
My Colab session crashes every time I train the model. I am using a free Colab account with a GPU.
@mingbka5134 4 months ago
I think the number of channels of input g should be double the number of channels of input x, shouldn't it?
@angelachristabel7684 1 year ago
You have no idea how much this video helped me in my exams. Implementations of attention masks in U-Net were scarce, and the ones I found didn't really explain step by step. You did, however, and in such a nice and simple way! Thank you soooo much, you're my savior 🥲
@DigitalSreeni 1 year ago
Glad it helped!
@rajroy2426 1 year ago
Shouldn't g have double the channel count of the skip connection? @9:14
@trustsfundbaby7344 8 months ago
I know this is a bit old, but yeah, I'm pretty sure he got it backwards. I'm looking at my implementation of ResUNet++, and the bridge block outputs more channels, not fewer.
@zahrakhalid4731 1 year ago
What are global and local attention in a convolutional neural network?
@Ajaysharma-yv7zp 3 years ago
Thanks sir... Great content
@DigitalSreeni 3 years ago
Glad you liked it
@Ajaysharma-yv7zp 3 years ago
@DigitalSreeni Yes!! Sir, please also try to make videos on optimization of features extracted using transfer learning models, with the Bat algorithm, Grey Wolf Optimizer, or any other. Thanks again.
@vishawjeetjyoti6566 9 months ago
Can someone tell me if it's called channel-wise attention or spatial attention? I'm confused about it.
@DigitalSreeni 9 months ago
In my code, the attention mechanism is implemented using both channel-wise attention and spatial attention. The attention mechanism is applied at multiple levels during the upsampling process.

Channel-wise attention is incorporated through the use of gating signals and attention blocks. The gating_signal function creates a channel-wise gating signal by applying a 1x1 convolution to the input. The attention block (attention_block function) performs channel-wise attention by combining information from the downsampled and upsampled paths.

Spatial attention is achieved through the use of upsampling layers and concatenation operations.
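For readers who want to see those pieces together, here is a minimal Keras sketch of a gating signal and attention gate. The function names follow the reply above, but the specific layer choices (batch normalization, sigmoid, the 2x stride and upsampling) are common attention U-Net conventions assumed for illustration, not the video's verbatim code.

    import tensorflow as tf
    from tensorflow.keras import layers

    def gating_signal(x, out_channels):
        # 1x1 conv creates a channel-wise gating signal matching the skip's channel count
        g = layers.Conv2D(out_channels, (1, 1), padding='same')(x)
        g = layers.BatchNormalization()(g)
        return layers.Activation('relu')(g)

    def attention_block(x, g, inter_channels):
        # Bring the skip connection x (e.g. 128x128) down to g's resolution (e.g. 64x64)
        theta_x = layers.Conv2D(inter_channels, (1, 1), strides=(2, 2))(x)
        phi_g = layers.Conv2D(inter_channels, (1, 1), padding='same')(g)
        # Aligned features reinforce each other in the element-wise sum
        f = layers.Activation('relu')(layers.add([theta_x, phi_g]))
        # Collapse the channels to a single spatial attention map with values in [0, 1]
        psi = layers.Conv2D(1, (1, 1), activation='sigmoid')(f)
        # Upsample the map back to x's resolution and weight the skip features
        psi_up = layers.UpSampling2D(size=(2, 2))(psi)
        return layers.multiply([psi_up, x])  # the 1-channel map broadcasts over x's channels

In a decoder, the tensor returned by attention_block would replace the raw skip connection before it is concatenated with the upsampled decoder features.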
@MrDagonzin 3 years ago
Thanks for the video, Mr. Sreeni. I'm really interested in this segmentation topic, and your series of videos is amazing. I would like to ask you for a tool to segment images by hand, to start a personal project. I remember you mentioned one in a past video, but I can't find it. Thanks again.
@DigitalSreeni 3 years ago
You can use any image processing software that lets you paint over your image and then save the overlay as a mask. These masks are your manually segmented results. For example, you can use APEER. Maybe the first few minutes of this video can help you understand the process: kzbin.info/www/bejne/nnrQl6ivgZeUhtU
@fahadsherwani2434 2 years ago
Thank you sir
@manavmadan793 3 years ago
Hi, very good explanation. But how is 0.2 < 0.1? And what do unaligned weights mean?
@DigitalSreeni 3 years ago
I am not sure what you mean by 0.2 < 0.1. If you are referring to my example of aligned large weights getting larger whereas aligned small weights get smaller, then maybe I could have used better words to explain. In summary, if both weights are large then you get a large sum. If they are both small then you get a smaller sum. If one is large and one is small, then you get something in between. Therefore, the final weights can reflect the attention given to objects of interest, as the weights would be aligned at these locations. In the video I did not say 0.2 < 0.1; I said the sum would be small. What I missed was to say 'the sum would be small in comparison to the sum of aligned weights.' I thought that was implied, but apparently not. Thanks for pointing it out.
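As a toy numeric illustration of that point (the numbers here are made up, not from the video):

    import numpy as np

    x_weights = np.array([0.9, 0.8, 0.1, 0.9])  # hypothetical skip-connection activations
    g_weights = np.array([0.8, 0.9, 0.2, 0.1])  # hypothetical gating activations

    # Aligned large pairs sum high (1.7), aligned small pairs sum low (0.3),
    # and a mixed pair (0.9 + 0.1) lands in between (1.0)
    print(x_weights + g_weights)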
@pankjain25 1 year ago
Is this channel attention or spatial attention?
@armstrongaboah4504 3 years ago
I like your content
@omamuzoodeh9459 2 years ago
Please, can it be used for multi-class segmentation?
@yoyomcg 3 years ago
Is there a way to add an attention layer to U-Net models from segmentation_models?
@DigitalSreeni 3 years ago
You can import a model from segmentation_models and deconstruct and reconstruct it using attention. It involves some work, and I do not recommend it, as I am not convinced of the real effectiveness of attention.
@rs9130 3 years ago
Please make a video on FCN implementation.
@McQLin 1 year ago
Why is it y = layers.multiply([upsample_psi, x]) and not upsample_psi multiplied with g, i.e. layers.multiply([upsample_psi, g])?
@DigitalSreeni 1 year ago
In an attention U-Net, the output of the attention mechanism is typically a weighting factor (also called attention coefficients or attention maps) that is used to modulate the feature maps of the U-Net. This weighting factor is computed by passing the input feature maps through a set of convolutional layers and then applying a softmax activation to obtain a set of values between 0 and 1 that represent the importance of each feature map.

To use the attention weighting factor to modulate the feature maps, we need to multiply it with the original feature maps. In other words, we want to scale each feature map by its corresponding attention coefficient. This can be done using the Keras multiply layer, which multiplies two tensors element-wise. In the case of the attention U-Net, the two tensors that we want to multiply are the upsampled attention coefficients (upsample_psi) and the feature maps of the U-Net (x). We want to multiply these two tensors element-wise to obtain a set of scaled feature maps that take into account the attention coefficients. This is expressed as y = layers.multiply([upsample_psi, x]).

We don't want to multiply the attention coefficients with the gating signal g, because the gating signal is used to compute the attention coefficients and is not itself a set of feature maps. The gating signal g is used to modulate the feature maps at a later stage in the network, after the attention mechanism has been applied. Specifically, the gating signal g is concatenated with the upsampled and modulated feature maps to produce the final output of the attention U-Net.
@TheMayankDixit 2 years ago
Very nice, sir.
@vishnuvinod7683 2 years ago
Thank you for your video. I have a doubt: what is the difference between a U-Net with attention and a U-Net transformer? Or are they the same?
@amankushwaha8927 2 years ago
They are different architectures. Transformers use scaled dot-product attention, which doesn't involve convolution layers, whereas U-Net is a convolution-based model.
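For contrast, the transformer half of that comparison in a few lines: a generic scaled dot-product attention sketch (the shapes are made up for illustration):

    import tensorflow as tf

    q = tf.random.normal((1, 16, 64))  # (batch, query positions, depth)
    k = tf.random.normal((1, 16, 64))  # (batch, key positions, depth)
    v = tf.random.normal((1, 16, 64))  # (batch, key positions, depth)

    # Similarity of every query with every key, scaled by sqrt(depth); no convolutions involved
    scores = tf.matmul(q, k, transpose_b=True) / tf.sqrt(64.0)
    weights = tf.nn.softmax(scores, axis=-1)  # attention weights sum to 1 per query
    out = tf.matmul(weights, v)               # weighted sum of the values
    print(out.shape)                          # (1, 16, 64)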
@kaveh3480 2 years ago
Thanks!
@nisrinadinda5253 3 years ago
Hi sir, thank you for your amazing explanation! I want to ask: why, after the ReLU activation, did you do the 1x1 convolution with n_filters = 1? Why didn't you apply a 1x1 convolution with n_filters = 128?
@victorstepanov8483 2 years ago
Hi! As I understand, this step is required to transform the data acquired by summing X and G, which are both 64 x 64 x 128 at this point (128 being the number of features for each pixel), into a matrix of weights for each pixel, which must be 64 x 64 x 1 (1 being the weight of each pixel). So after applying the ReLU you need a way to 'squash' these 128 features to just 1 value: the weight of the current pixel. And this is exactly what a convolution with n_filters = 1 does.
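A quick shape check of that squashing step (a standalone sketch, not the exact code from the video):

    import tensorflow as tf
    from tensorflow.keras.layers import Conv2D

    f = tf.random.normal((1, 64, 64, 128))  # the summed x + g features after ReLU
    psi = Conv2D(1, kernel_size=(1, 1))(f)  # n_filters = 1 squashes 128 features to 1
    print(psi.shape)                        # (1, 64, 64, 1): one weight per pixel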
@zakirshah7895 3 years ago
Can we use this with a CNN for image classification?
@DigitalSreeni 3 years ago
I am not sure; I haven't thought about that concept. Maybe there is some literature out there.
@padmavathiv2429 3 years ago
Thank you sir, nice explanation. Is the U-Net architecture good only for definite shapes?
@sakethsathvik4183 3 years ago
Hi, I watch all your videos, really useful, thank you. I implemented two architectures: the recurrent residual U-Net (R2U-Net) and its attention-gated variant. But I am getting better performance from R2U-Net than from the attention-gated R2U-Net. Is this possible? Why so?
@DigitalSreeni 3 years ago
In general, I find simple U-Net to be more effective and efficient compared to many of its variants. R2U-Net and Attention U-Net may give you better results depending on the type of details in your objects and background. Unfortunately, I am not aware of any papers that talk about when R2U-Net, Attention U-Net, etc. will work better and when they fail. In summary, what you found out about R2U-Net and its attention equivalent is possible.
@sakethsathvik4183 3 years ago
@DigitalSreeni Thank you.
@yujanshrestha3841 3 years ago
Amazing content! Just curious, do you do any consulting? I have a few ML problems that an expert like yourself could help me with.
@McQLin 1 year ago
What is spatial information?
@DigitalSreeni 1 year ago
Spatial information in the context of Attention U-Net refers to the spatial relationships between elements of an image, such as pixels or objects. The Attention U-Net uses attention mechanisms to dynamically weight different regions of an image based on their spatial information. This allows the network to focus on the most relevant features for each task and improve accuracy of image segmentation tasks.
@caiyu538 3 years ago
nice
@Champignon1000 3 years ago
I love your videos, thank you. I don't mean to be rude or anything, but could you say "ok" a little less?
@DigitalSreeni 3 years ago
ok :)
@pesky_mousquito 2 years ago
Note that this attention is not exactly like the self-attention in transformers.
@XX-vu5jo 3 years ago
No coding tutorial?
@DigitalSreeni 3 years ago
This is the intro video; the coding video will be out on July 14th, in a couple of days.
@AlainFavre-n4k 1 year ago
Well... I'm a bit disappointed by this video! I know that segmenting mitochondria in liver cells is a challenge. It comes across from this tutorial that U-Net with attention does not seem to solve that problem.