DeepMind x UCL | Deep Learning Lectures | 3/12 | Convolutional Neural Networks for Image Recognition

63,313 views

Google DeepMind

A day ago

Comments: 30
@leixun (4 years ago)
*DeepMind x UCL | Deep Learning Lectures | 3/12 | Convolutional Neural Networks for Image Recognition*

*My takeaways:*

*1. Plan for this lecture* 0:10

*2. Background* 1:30
2.1 How can we feed images to a neural network? 2:55
2.2 Fully connected feedforward networks don't handle images well: they have to learn the spatial correlations between pixels from scratch rather than exploiting them to extract features 4:50
2.3 Locality and translation invariance in inputs: images, sounds, text, molecules, etc. 7:12
2.4 How to take advantage of the topological structure of the inputs in 2.3: weight sharing and hierarchy 10:48
2.5 History: data-driven research, the ImageNet challenge 12:17
- 2012: AlexNet
- 2013: improvements on AlexNet
- 2014: new architecture breakthroughs such as VGG and GoogLeNet, fundamentally different from AlexNet
- 2015: another architecture breakthrough: ResNet
- 2015+: performance saturates; combining the predictions of many models (ensembles), new building blocks, etc. have been tried

*3. Building blocks* 19:46
3.1 From fully connected to locally connected 19:53
3.2 Receptive field, feature map, kernel/filter, channel 24:50
3.3 Valid convolution: output size = input size - kernel size + 1 28:05
3.4 Full convolution: output size = input size + kernel size - 1 28:50
3.5 Same convolution (desirable to reduce computation and create feature hierarchies): output size = input size 29:35
3.6 Strided convolution: the kernel slides along the image with a step > 1 31:14
3.7 Dilated convolution: the kernel is spread out, with a step > 1 between kernel elements 32:27 (output sizes for 3.3-3.7 are worked out in the sketch after this comment)
3.8 Depthwise convolution 34:00
3.9 Pooling: compute the mean or max over small windows to reduce resolution 34:48

*4. Convolutional neural networks (CNNs)* 35:32
4.1 Computational graphs: recap from lecture 2 36:20

*5. Going deeper: case studies* 38:10
5.1 LeNet-5 (1998) for handwritten digits 38:10
5.2 AlexNet (2012) for the ImageNet dataset 39:35
- Very few connections between the two groups, to reduce GPU communication
- It is very difficult to train deep networks (e.g. 4-5 layers) with saturating activation functions such as sigmoid and tanh; AlexNet uses ReLU
- Regularization: dropout, weight decay
5.3 What limits the number of layers in a CNN: computational complexity, optimization difficulties 46:44
5.4 VGG (2014) 48:03
- Stack many convolution layers before pooling
- Use same convolutions to avoid reducing resolution
- Stack 3x3 kernels to get the same receptive field as 5x5 kernels with fewer parameters
- Uses data parallelism during training, whereas AlexNet used model parallelism
- Error plateaus after 16 layers (VGG has several versions: VGG-11, VGG-13, VGG-16, VGG-19); this is due to optimization difficulties 51:18
5.5 Challenges of depth: computational complexity, optimization difficulties 52:07
5.6 Techniques for improving optimization 52:45
- Careful initialization
- Sophisticated optimizers: e.g. variants of gradient descent
- Normalization layers
- Network design: e.g. ResNet
5.7 GoogLeNet/Inception (2014) 55:27
- Inception block: multiple convolutions with different kernel sizes in parallel with each other, or with pooling
- Inception v2 introduced Batch Normalization (BN): it reduces sensitivity to initialization and makes the network more robust to large learning rates. Moreover, because BN is applied over a batch of data, it introduces stochasticity/noise into the network and acts as a regularizer. The downside is that BN introduces a dependency between the different images in a batch, which can be a source of bugs, especially at test time 56:39
5.8 ResNet (2015) 1:00:08
- Residual connections help to train deeper networks, e.g. 152 layers
- Bottleneck block: reduces the number of parameters 1:02:00
- ResNet v2 avoids all nonlinearities in the residual pathway: helps to train even deeper networks, e.g. 1000 layers 1:03:05
- The bottleneck block makes ResNet cheap to compute 1:03:55
5.9 DenseNet (2016) 1:05:03
- Connect each layer to all previous layers
5.10 Squeeze-and-Excitation Networks (SENet, 2017) 1:05:36
- Incorporate global context
5.11 AmoebaNet (2018) 1:06:33
- Neural architecture search
- Searches over acyclic graphs composed of predefined layers
5.12 Other techniques for reducing complexity 1:07:50
- Depthwise convolution
- Separable convolution
- Inverted bottleneck (MobileNet v2, MnasNet, EfficientNet)

*6. Advanced topics* 1:09:40
6.1 Data augmentation 1:09:58
6.2 Visualizing CNNs 1:11:44
6.3 Other topics to explore (not in this lecture): pre-training and fine-tuning; group-equivariant CNNs; recurrence and attention 1:14:36

*7. Beyond image recognition* 1:16:32
7.1 What else can we do with CNNs 1:16:45
- Object detection; semantic segmentation; instance segmentation
- Generative adversarial networks (GANs), autoencoders, autoregressive models
- Representation learning, self-supervised learning
- Video, audio, text, graphs
- Reinforcement learning agents

*Ending* 1:18:55
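The output-size rules in items 3.3-3.7 above all follow from one general formula. Here is a minimal, framework-free Python sketch of it (the function name and example sizes are illustrative, not from the lecture):

```python
# A minimal sketch (not from the lecture) of the output-size rules in 3.3-3.7.
# One spatial dimension is shown; the same formula applies to each dimension.

def conv_output_size(input_size, kernel_size, stride=1, padding=0, dilation=1):
    # Dilation spreads the kernel out: effective size = dilation*(k - 1) + 1.
    effective_kernel = dilation * (kernel_size - 1) + 1
    return (input_size + 2 * padding - effective_kernel) // stride + 1

n, k = 32, 5  # e.g. a 32-pixel-wide input and a 5-wide kernel

print(conv_output_size(n, k))                        # valid:   n - k + 1 = 28
print(conv_output_size(n, k, padding=k - 1))         # full:    n + k - 1 = 36
print(conv_output_size(n, k, padding=(k - 1) // 2))  # same:    n = 32 (odd k)
print(conv_output_size(n, k, stride=2))              # strided: 14
print(conv_output_size(n, k, dilation=2))            # dilated: n - 2*(k - 1) = 24
```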
@Amy_Yu2023 (4 years ago)
Thanks for sharing. Very helpful.
@simondemeule3934 (4 years ago)
Thank you for taking the time to type this out
@leixun (4 years ago)
@simondemeule3934 You are welcome!
@bazsnell3178 (4 years ago)
You didn't need to do all this; you could have downloaded the slides via the link under the SHOW MORE text at the top of the page. All of the lectures have accompanying slides.
@ans1975 (3 years ago)
It's impossible not to love this guy: besides clarity, kindness shines through these words!
@lukn4100 (3 years ago)
Great lecture and big thanks to DeepMind for sharing this great content.
@franciswebb7522 (4 years ago)
Really enjoyed this. Can anyone explain how we get 96 channels at 43:58? Would this layer contain 32 different 11x11 kernels applied to each input channel (RGB)?
@dkorzhenkov (4 years ago)
No, this layer consists of 96 filters with a kernel size of 11x11. Each of these filters takes all three channels (RGB) as input and returns a single channel. Therefore, the total number of learnable parameters for this layer equals 3*96*11*11 = input_dim*output_dim*kernel_size*kernel_size
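For concreteness, a small Python sketch of that count (the helper name is illustrative; the reply counts weights only, so the bias term here is an added assumption):

```python
# A small sketch of the parameter count described above (names are illustrative).
# The reply counts weights only; the bias term below is an extra assumption.

def conv2d_param_count(in_channels, out_channels, kernel_size, bias=True):
    weights = in_channels * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)

# AlexNet's first layer as described in the reply: 3 -> 96 channels, 11x11 kernels
print(conv2d_param_count(3, 96, 11, bias=False))  # 3*96*11*11 = 34848 weights
print(conv2d_param_count(3, 96, 11))              # 34848 + 96 biases = 34944
```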
@pervezbhan1708 (2 years ago)
kzbin.info/www/bejne/qJC0YmWLfsuAoqc
@djeurissen111 (4 years ago)
Increasing the depth in combination with skip connections feels like unrolling an iteration. In the previous lecture it was already mentioned that neural networks cannot do multiplication, which is essentially applying addition repeatedly, a certain number of times. What if we made ConvNets recursive: take the image => compute some features and a probability that we are done => take the image and features => compute new features => repeat until done... (see the sketch below)
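For what it's worth, here is a hypothetical Python sketch of that loop, in the spirit of adaptive computation; `conv_block` and `halting_head` are placeholders for small trainable networks, not anything from the lecture:

```python
# Hypothetical sketch of the "repeat until done" idea above, in the spirit of
# adaptive computation. `conv_block` and `halting_head` are placeholders for
# small trainable networks (not anything defined in the lecture).

def recursive_convnet(image, conv_block, halting_head, max_steps=10, threshold=0.5):
    features = None
    for _ in range(max_steps):
        # The same block (shared weights) is applied at every step,
        # conditioned on the image and the features from the previous step.
        features = conv_block(image, features)
        p_done = halting_head(features)  # scalar in [0, 1]: "are we done?"
        if p_done > threshold:
            break
    return features
```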
@bartekbinda1114 (3 years ago)
Thank you for this great, informative lecture. It really helped me understand the topic better and prepare for my bachelor's thesis :)
@DanA-st2ed (3 years ago)
I'm doing CAD of TB using CXR for my bachelor's as well. Good luck!
@ninadesianti9587 (4 years ago)
Thank you for sharing the lecture. Great presentation!
@esaprakasa (4 years ago)
Clear and detailed presentation!
@_rkk (3 years ago)
Excellent lecture. Thank you, Sander.
@lizgichora6472 (3 years ago)
Very interesting: architectural structure and infrastructure. Thank you very much.
@1chimaruGin0_0 (4 years ago)
Is there any other lecture offered by this man?
@Iamine1981 (4 years ago)
ImageNet was a classification problem involving 1000 distinct classes. Why did research stop at 1000 classes only, and not expand the set of classes to many more than that? Was K = 1000 chosen as an optimal number based on the availability of data, or were there other concerns? Thank you. Amine
@paulcurry8383 (3 years ago)
AFAIK 1000 classes is an arbitrary choice, so most likely availability of data, with 1000 being a sufficiently large number of classes to make the dataset "challenging".
@shahrozkhan4683 (4 years ago)
Great presentation!
@luisg.camara2525 (4 years ago)
Really good and engaging lecture. As a note, and being a misophonia sufferer, I had trouble with the constant mouth smacking sounds.
@whatever_mate (2 years ago)
Came to the comments just to see if I was alone in this :(
@jaredbeckwith (4 years ago)
Just finished a Cats vs Dogs CNN
@basheeralwaely9658 (4 years ago)
Nice!
@郭绍彤 (3 years ago)
Repeat and repeat again, we can rule Canada!
@mawkuri5496 (3 years ago)
the thumbnail looks like a girl with a beard
@kyalvidigi1398 (4 years ago)
Second