*DeepMind x UCL | Deep Learning Lectures | 3/12 | Convolutional Neural Networks for Image Recognition*
*My takeaways:*
*1. Plan for this lecture* 0:10
*2. Background* 1:30
2.1 How can we feed images to a neural network? 2:55
2.2 A feedforward network can't learn images well, because it learns from the correlations between individual pixels, not from features in the images 4:50
2.3 Locality and translation invariance in inputs: images, sounds, text, molecules, etc. 7:12
2.4 How to take advantage of the topological structure in inputs from 2.3: weight sharing and hierarchy 10:48
2.5 History: data-driven research, ImageNet challenge 12:17
-2012: AlexNet
-2013: Improvements on AlexNet
-2014: New architecture breakthroughs such as VGG and GoogLeNet, which are fundamentally different from AlexNet
-2015: New architecture breakthrough: ResNet
-2015+: Performance saturates. Combining the predictions of many models (ensembles), new building blocks, etc. have been tried
*3. Building blocks* 19:46
3.1 From fully connected to locally connected 19:53
3.2 Receptive field, feature map, kernel/filter, channel 24:50
3.3 Valid convolution: output size = input size - kernel size + 1 28:05
3.4 Full convolution: output size = input size + kernel size - 1 28:50
3.5 Same convolution (desirable to reduce computation and create feature hierarchies): output size = input size 29:35
3.6 Strided convolution: kernel slides along the image with a step >1 31:14
3.7 Dilated convolution: kernel is spread out, step >1 between kernel elements 32:27
3.9 Depth-wise convolution 34:00
3.10 Pooling: compute mean or max over small windows to reduce resolution 34:48
*4. Convolutional neural networks (CNN)* 35:32
4.1 Computational graphs: recap from lecture 2 36:20
*5. Going deeper: case studies* 38:10
5.1 LeNet-5 (1998) for handwritten digits 38:10
5.2 AlexNet (2012) for ImageNet dataset 39:35
-Very few connections between groups to reduce GPU communication
-It is very difficult to train deep neural networks (e.g. 4-5 layers) with saturating activation functions such as sigmoid and tanh. AlexNet uses ReLU.
-Regularization: dropout, weight decay
5.3 What limits the number of layers in a CNN: computational complexity, optimization difficulties 46:44
5.4 VGG (2014) 48:03
-Stack many convolution layers before pooling
-Use same convolutions to avoid resolution reduction
-Stack 3x3 kernels to get the same receptive field as 5x5 kernels, but with fewer parameters
-Use data parallelism during training, whereas AlexNet uses model parallelism
-Error plateaus after 16 layers (VGG has a few versions: VGG-11, VGG-13, VGG-16, VGG-19); this is due to optimization difficulties 51:18
5.5 Challenges of depth: computational complexity, optimization difficulties 52:07
5.6 Techniques for improving optimization 52:45
-Careful initialization
-Sophisticated optimizer: e.g. variants of gradient descent
-Normalization layers
-Network design: e.g. ResNet
5.7 GoogLeNet/Inception (2014) 55:27
-Inception block: multiple convolutions with different kernel sizes in parallel with each other, or with pooling
-Inception v2 introduces Batch Normalization (BN): it reduces sensitivity to initialization and makes the network more robust to large learning rates. Moreover, because BN is applied on a batch of data, it introduces stochasticity/noise into the network and acts as a regularizer. The downside is that BN introduces a dependency between different images in the batch at test time, which can be a source of bugs 56:39
5.8 ResNet (2015) 1:00:08
-Residual connections help train deeper networks: e.g. 152 layers
-Bottleneck block: reduces the number of parameters 1:02:00
-ResNet v2 avoids all nonlinearities in the residual pathway: helps train even deeper networks, e.g. 1000 layers 1:03:05
-Bottleneck block makes ResNet cheap to compute 1:03:55
5.9 DenseNet (2016) 1:05:03
-Connect layers to all previous layers
5.10 Squeeze-and-Excitation Networks (SENet, 2017) 1:05:36
-Incorporate global context
5.11 AmoebaNet (2018) 1:06:33
-Neural architecture search
-Search acyclic graphs composed of predefined layers
5.12 Other techniques for reducing complexity 1:07:50
-Depthwise convolution
-Separable convolution
-Inverted bottleneck (MobileNet v2, MNasNet, EfficientNet)
*6. Advanced topics* 1:09:40
6.1 Data augmentation 1:09:58
6.2 Visualizing CNN 1:11:44
6.3 Other topics to explore (not in this lecture): pre-training and fine-tuning; group equivariant CNN; recurrence and attention 1:14:36
*7. Beyond image recognition* 1:16:32
7.1 What else can we do with CNN 1:16:45
-Object detection; semantic segmentation; instance segmentation
-Generative adversarial networks (GANs), autoencoders, autoregressive models
-Representation learning, self-supervised learning
-Video, audio, text, graphs
-Reinforcement learning agents
*Ending* 1:18:55
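As a rough illustration of the size formulas in 3.3 to 3.5 and the 3x3-versus-5x5 point in 5.4, here is a minimal Python sketch. The function names are hypothetical, and the setup (stride-1 convolutions, the same number of input and output channels in every layer, biases ignored) is an assumption made for the example, not something taken from the lecture.

```python
def conv_output_size(input_size: int, kernel_size: int, mode: str) -> int:
    """Output length along one spatial dimension for a stride-1 convolution."""
    if mode == "valid":  # kernel stays fully inside the input
        return input_size - kernel_size + 1
    if mode == "full":   # every partial overlap counts (input is zero-padded)
        return input_size + kernel_size - 1
    if mode == "same":   # padded just enough to keep the resolution
        return input_size
    raise ValueError(f"unknown mode: {mode}")


def stacked_receptive_field(kernel_size: int, num_layers: int) -> int:
    """Receptive field of num_layers stacked stride-1 convolutions."""
    return num_layers * (kernel_size - 1) + 1


def conv_weights(kernel_size: int, channels: int, num_layers: int = 1) -> int:
    """Weight count without biases, assuming `channels` inputs and outputs per layer."""
    return num_layers * kernel_size * kernel_size * channels * channels


if __name__ == "__main__":
    for mode in ("valid", "full", "same"):
        print(mode, conv_output_size(32, 5, mode))
    # Two stacked 3x3 layers versus a single 5x5 layer.
    print(stacked_receptive_field(3, 2), stacked_receptive_field(5, 1))
    print(conv_weights(3, 64, num_layers=2), conv_weights(5, 64))
```

Under these assumptions, two stacked 3x3 layers with 64 channels cover the same 5x5 receptive field as one 5x5 layer while using 73,728 weights instead of 102,400.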
@Amy_Yu2023 4 years ago
Thanks for sharing. Very helpful.
@simondemeule3934 4 years ago
Thank you for taking the time to type this out
@leixun 4 years ago
@simondemeule3934 You are welcome!
@bazsnell3178 4 years ago
You didn't need to do all this; you could have downloaded the slides from the link that appears when you click the SHOW MORE text at the top of the page. All of the lectures have accompanying slides.
@ans1975 3 years ago
It's impossible not to love this guy: besides clarity, kindness shines through these words!
@lukn4100 3 years ago
Great lecture and big thanks to DeepMind for sharing this great content.
@franciswebb7522 4 years ago
Really enjoyed this. Can anyone explain how we achieve 96 channels at 43:58? Would this layer contain 32 different 11x11 kernels that are fed in each input channel (RGB)?
@dkorzhenkov 4 years ago
No, this layer consists of 96 filters with a kernel size of 11x11. Each of these filters takes all three channels (RGB) as input and returns a single channel. Therefore, the total number of learnable weights for this layer (ignoring biases) equals input_dim*output_dim*kernel_size*kernel_size = 3*96*11*11 = 34,848.
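To double-check that arithmetic, here is a small Python sketch. `conv_layer_params` is a hypothetical helper written for this example, and the optional bias term (one per filter) is an assumption about the layer rather than something stated in the lecture.

```python
def conv_layer_params(in_channels: int, out_channels: int,
                      kernel_size: int, bias: bool = False) -> int:
    """Learnable parameters of a 2D convolution layer with square kernels."""
    # Each output filter has one kernel_size x kernel_size slice per input channel.
    weights = in_channels * out_channels * kernel_size * kernel_size
    # Optionally add one bias per output filter (assumption, see lead-in).
    return weights + (out_channels if bias else 0)


if __name__ == "__main__":
    print(conv_layer_params(3, 96, 11))             # weights only
    print(conv_layer_params(3, 96, 11, bias=True))  # weights plus biases
```

The key point is that the 96 output channels come from 96 separate filters, each spanning all 3 input channels, not from 32 kernels per colour channel.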
@pervezbhan1708 2 years ago
kzbin.info/www/bejne/qJC0YmWLfsuAoqc
@djeurissen111 4 years ago
Increasing the depth in combination with skip connections feels like flattening an iteration. In the previous lecture it was already mentioned that neural networks cannot do multiplication, which is essentially repeatedly applying addition a certain number of times. What if we make the ConvNets recursive: take image => compute some features and a probability that we are done => take image and features => compute features => repeat until done...
@bartekbinda1114 3 years ago
Thank you for this great informative lecture, really helped me to understand the topic better and prepare for my bachelor thesis :)
@DanA-st2ed 3 years ago
I'm doing CAD of TB using CXR for my bachelor's too, good luck!
@ninadesianti9587 4 years ago
Thank you for sharing the lecture. Great presentation!
@esaprakasa 4 years ago
Clear and detailed presentation!
@_rkk 3 years ago
Excellent lecture. Thank you, Sander.
@lizgichora6472 3 years ago
Very interesting, Architectural Structure and infrastructure, thank you very much.
@1chimaruGin0_0 4 years ago
Are there any lectures offered by this man?
@Iamine1981 4 years ago
ImageNet was a Classification problem involving 1000 distinct classes. Why did Research stop at 1000 classes only, and not expand the set of classes to way more than that? Was K = 1000 chosen as an optimal number based on the availability of data, or were there other concerns? Thank you. Amine
@paulcurry8383 3 years ago
AFAIK 1000 classes is an arbitrary choice, so most likely it came down to availability of data, and 1000 being a sufficiently large number of classes to make the dataset “challenging”.
@shahrozkhan4683 4 years ago
Great presentation!
@luisg.camara2525 4 years ago
Really good and engaging lecture. As a note, and being a misophonia sufferer, I had trouble with the constant mouth smacking sounds.
@whatever_mate 2 years ago
Came in the comments just to see if I was alone in this :(