If I didn't already know what was going on, I'd be supremely confused by this explanation at 3:31... The channel @AnimatedAI has a great explanation of 1x1 convolutions. The way I think about it is that the previous layer produced a stack of filter outputs. So maybe you have one filter detecting vertical edges, another doing horizontal edges, another doing angled edges, another detecting red-to-black transitions, another for yellow colors, etc. Now you've got a stack of those images, and you want to go through each pixel and combine the results of those filters with some weights. So if you want to get complete edge detections, you might add the horizontal edge channel + vertical edge channel + diagonal edge channel (and not include the yellow channel results or the red-to-black channel). That's what the 1x1 convolution is doing: mixing the results of the various filters. Maybe you have a second 1x1 filter channel that is trying to isolate yellow objects next to red objects (like mustard bottles next to ketchup bottles, idk). Then the second filter channel might heavily weight the yellow-channel pixels and the red-black-channel pixels but ignore the other channels. You inherently need mixing like this if you want to eventually get to "detect a dog's face".
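If it helps to see that mixing concretely, here is a toy NumPy sketch (the channel names, weights, and sizes are all made up for illustration):

```python
import numpy as np

# A stack of 5 hypothetical filter outputs: vert, horiz, diag, red-to-black, yellow.
H, W, C_in, C_out = 6, 6, 5, 2
feature_maps = np.random.rand(H, W, C_in)

# A 1x1 convolution is just a (C_out x C_in) weight matrix applied at every pixel.
weights = np.array([
    [1.0, 1.0, 1.0, 0.0, 0.0],   # "complete edges": vert + horiz + diag, ignore colors
    [0.0, 0.0, 0.0, 0.8, 1.2],   # "yellow next to red": weight only the color channels
])

# einsum performs the per-pixel weighted sum across channels:
mixed = np.einsum('hwc,oc->hwo', feature_maps, weights)
print(mixed.shape)  # (6, 6, 2): same spatial size, new channel count
```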
@lennonli9100 · 4 years ago
Isn't it just a regular filter with 1x1 dimensions that isn't used for edge detection, but to change the channel dimension or add non-linearity?
@lvdiful · 1 month ago
For the example at 6:12 with the 32 1x1 filters: even though the channel number drops down to 32, each output channel is almost identical to the others, differing only by the values of its 1x1 filter. Is this correct? What is a typical use case for this example?
@JagtarSingh-pv9mn · 5 years ago
Is there information sharing happening across the channels in this case?
@katyhessni · 1 year ago
Thanks
@gauravfotedar · 3 months ago
I don't see the point of what he said about 1x1 convolutions reducing the number of channels. For a convolution filter of any size (3x3, 5x5, or anything else), the number of output channels is always determined by the number of filters, not the filter size. So if you have 192 input channels and you use 32 filters of size 3x3, that will also reduce the channel dimension to 32, just like using 32 1x1 filters. So why decouple reducing height/width from reducing channels? Filters of any size do both at the same time anyway.
@gauravfotedar · 3 months ago
Okay, one use case is explained in the next video on the Inception motivation.
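One concrete difference is cost, which is the point of the decoupling. Here is a rough multiply count as a sketch, reusing the 192-in / 32-out numbers from the question above (the 28x28 spatial size is an assumption):

```python
# Rough multiply count for a conv layer: each output value needs k*k*C_in
# multiplies, and there are H*W*C_out output values.
H, W, C_in, C_out = 28, 28, 192, 32

def conv_mults(k):
    return H * W * C_out * k * k * C_in

print(conv_mults(3))  # 32 filters of 3x3: ~43.4M multiplies
print(conv_mults(1))  # 32 filters of 1x1: ~4.8M multiplies (9x cheaper)
```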
@siddhantvats9088 · 3 years ago
Curious: if we use filters of some other dimension but fewer of them, won't that also reduce the resulting channel count?
@nikilragav · 4 months ago
yes.
@subhamjha8917 · 1 year ago
In what situations is it useful? Can you please provide some case study/example?
@whitesaladchips · 7 months ago
To reduce the number of channels.
@deeplearningexplained · 5 months ago
1. The GoogLeNet Inception network.
2. ResNet, when they get to more than 50 layers.
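For the ResNet case, the 1x1 layers form the "bottleneck" block. Here is a rough parameter count as a sketch (the 256/64 channel numbers follow the ResNet-50-style bottleneck; biases and batch norm are ignored):

```python
# Weights in a conv layer with a k x k kernel, c_in input and c_out output channels.
def params(k, c_in, c_out):
    return k * k * c_in * c_out

plain = 2 * params(3, 256, 256)    # two plain 3x3 convs on 256 channels
bottleneck = (params(1, 256, 64)   # 1x1 reduce: 256 -> 64
              + params(3, 64, 64)  # cheap 3x3 at the reduced width
              + params(1, 64, 256))  # 1x1 expand back: 64 -> 256
print(plain, bottleneck)  # 1179648 vs 69632: roughly 17x fewer weights
```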
@MeAndCola · 5 years ago
Is this also the case for normal-sized filters? Filters aren't applied 2D, once per channel, but rather 3D, over all the channels at once?
@XxXMrGuiTarMasTerXxX · 4 years ago
As far as I understood, yeah. But I think sometimes (especially at the beginning of the network) the filter is shared over the three RGB channels. That is, instead of, for example, a 3x3x3 filter, you only have a 3x3x1 filter and the parameters are shared. However, this is a trick, and the filters are applied in 3D.
@MrAmgadHasan · 1 year ago
Yes. Every filter in a CNN has 3 dimensions (height, width, depth), with the depth being equal to the depth of the input feature maps.
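A toy NumPy check of that (the shapes are made up): one 3x3 filter on a depth-32 input must itself have depth 32, and each position it visits yields a single number:

```python
import numpy as np

x = np.random.rand(6, 6, 32)   # input feature maps, depth 32
w = np.random.rand(3, 3, 32)   # ONE 3x3 filter spans the full depth of 32
patch = x[0:3, 0:3, :]         # a 3x3 window taken across ALL 32 channels
out_value = np.sum(patch * w)  # elementwise multiply, then sum: one real number
print(out_value)
```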
@btobin86 · 5 years ago
I'm not quite sure what he means when he says the output is the # of filters. Isn't the output of one of those 1x1x32 (in this case) filters just a single real number?
@chaitragn3379 · 5 years ago
Maybe it's a channel (R, G, B), and the number of filters is different.
@anasfirdousi · 5 years ago
The output after applying the filters = (n - f + 1) x (n - f + 1) x (# of filters).
n = input dimension, which is 6 x 6, so n = 6.
f = filter dimension, which is 1 x 1, so f = 1.
# of filters = 32.
So the final output after applying all filters will be (6 - 1 + 1) x (6 - 1 + 1) x 32 = 6 x 6 x 32.
The formula n - f + 1 works when stride = 1. Watch: kzbin.info/www/bejne/qZ6rkmVqaZd0npY
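A quick sanity check of that formula in Python (Z below is just an illustrative filter count, echoing the reply further down):

```python
# Output size of a valid convolution with stride 1: (n - f + 1) per spatial dim.
n, f, num_filters = 6, 1, 32
print((n - f + 1, n - f + 1, num_filters))  # (6, 6, 32)

# With Z different 1x1x32 filters you would get 6 x 6 x Z instead:
Z = 16
print((n - f + 1, n - f + 1, Z))  # (6, 6, 16)
```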
@NikhilAngadBakshi · 5 years ago
Yes, the output for each filter at each location is a single real number. Therefore, for one 1x1 filter, the output over all locations in the input image volume is an image with depth = 1. Usually we have multiple filters, and hence the output depth is equal to the number of filters.
@sandyz1000 · 5 years ago
(1 x 1 x 32) is the volume of one filter, and you have n x (1 x 1 x 32), where n = the number of filters. Each (1 x 1 x 32) filter gives a scalar value for each pixel position across the 32 input channels, and n is the number of channels in the output.
@טללהט-ו7ו · 2 years ago
This is old, but I will answer for future viewers: in my understanding the # of filters is NOT 32; the number of filters is the number of different 1x1x32 filters you applied. So if you applied Z different 1x1x32 filters, you will get 6x6xZ here.
@sandipansarkar9211 · 3 years ago
good explanation
@nikhilrana8800 · 5 years ago
When we multiply a 1x1x32 filter with the 6x6x32 input, we get a number for each of the 32 channels; we then have to take the sum and apply the ReLU function to it. Am I right?
@sammathew243 · 5 years ago
Yes
@shaelanderchauhan1963 · 3 years ago
@sammathew243 And the ReLU output will be the number itself if it is greater than zero, and 0 if the number is less than or equal to 0, right?
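A toy sketch of this exchange at a single pixel position (random numbers, just to show the mechanics):

```python
import numpy as np

pixel = np.random.randn(32)  # the 32 channel values at one (x, y) position
filt = np.random.randn(32)   # the 1x1x32 filter weights

z = np.sum(pixel * filt)     # multiply per channel, then sum across the 32 channels
a = max(z, 0.0)              # ReLU: z if z > 0, else 0
print(z, a)
```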
@urarakono442 · 2 years ago
Does the yellow block of size 1x1x32 have the same number repeated over its 32 voxels?
@MrAmgadHasan · 1 year ago
No. It can have 32 different weights.
@MabrookAlas · 1 year ago
Great 👍🏼
@sadenb · 5 years ago
Can a Siamese network be built on top of 1x1 convolutions if we have precomputed 1-D features?
@sandyz1000 · 5 years ago
You can use an Inception network, which uses 1x1 convolutions, to compute the embedding for the Siamese network. Siamese, which means same or similar, refers to the twin branches; in the final layer a contrastive loss / triplet loss is used to optimise the network so that similar vectors end up at a smaller distance than dissimilar vectors, with a certain margin.
@som6553 · 7 months ago
@sandyz1000 When should one use it, and when not?
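For readers wondering what the triplet loss mentioned above looks like, here is a minimal NumPy sketch (the 128-d embedding size and the 0.2 margin are assumptions, borrowed from common FaceNet-style setups):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    d_pos = np.sum((anchor - positive) ** 2)  # squared distance to the same identity
    d_neg = np.sum((anchor - negative) ** 2)  # squared distance to a different identity
    return max(d_pos - d_neg + margin, 0.0)   # push d_pos + margin below d_neg

# Three toy 128-d embeddings, as if produced by an embedding network:
a, p, n = np.random.randn(3, 128)
print(triplet_loss(a, p, n))
```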