*DeepMind x UCL | Deep Learning Lectures | 4/12 | Advanced Models for Computer Vision*
*My takeaways:*
*1. Why we need to go beyond image classification 0:15*
*2. Plan for this lecture 3:40*
*3. Tasks beyond classification 5:20*
3.1 Object detection 7:22
-Model: Faster R-CNN, two-stage detector 14:40
---Identify good candidate bounding boxes
---Classify and refine
-Model: RetinaNet, one-stage detector 20:55
3.2 Semantic segmentation 28:12
-Model: U-Net 32:45
3.3 Instance segmentation 36:31
-Reference model: Mask R-CNN
3.4 Metrics and benchmarks 37:48
-Classification: percentage of correct predictions; Top-1: the top prediction is the correct class; Top-5: the correct class is among the top-5 predictions
-Object detection and segmentation: Intersection-over-union (IoU)
-Object detection and segmentation datasets: Cityscapes, COCO
3.5 Training tricks 42:58
-Transfer learning
*4. Beyond single input image: motion is an important cue 50:26*
4.1 Pairs of images 59:16
-Model: FlowNet 1:00:35
4.2 Video input 1:03:56
-Apply 2D model to each frame 1:04:12
-3D convolutions 1:05:48
4.3 Applications:
-Action recognition 1:09:30
--Model: SlowFast 1:11:25
4.4 Training tricks 1:14:35
-Transfer learning
4.5 Challenges: difficult to obtain labels; large memory requirements; high latency; high energy consumption 1:15:53
*5. Beyond strong supervision 1:20:23*
5.1 Data labelling is tedious 1:20:36
5.2 Self-supervised learning 1:21:40
-Standard loss: learn mapping between inputs and output distributions/values
-Metric learning: learn to predict distances between inputs given some similarity measure (e.g. same person or not)
-State-of-the-art representation learning vs supervised learning on accuracy and number of parameters 1:29:41
*6. Open questions 1:30:16*
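The metrics listed under 3.4 can be sketched in a few lines of plain Python. This is only a minimal illustration, not code from the lecture: the function names `iou` and `top_k_accuracy` and the `(x_min, y_min, x_max, y_max)` box format are assumptions made for this example.

```python
from typing import Sequence, Tuple

# Assumed box format for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Width and height of the overlap, clamped to zero when boxes are disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int], k: int = 1) -> float:
    """Fraction of samples whose true label is among the k highest scores."""
    correct = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the top k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        correct += label in top_k
    return correct / len(labels)
```

For example, two 2x2 boxes offset by one unit overlap in a 1x1 square, so `iou((0, 0, 2, 2), (1, 1, 3, 3))` is 1/7. Top-5 is simply `top_k_accuracy(scores, labels, k=5)`.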
@TheAero 1 year ago
These lectures are interesting; however, it feels like they are way too high-level.
@Marcos10PT 4 years ago
The object detection part was a little confusing: too much explanation with only a few images to refer to. I think a more visual explanation would work better 😊
@wy2528 4 years ago
I feel the same
@gringo6969 4 years ago
Great course! Thanks DeepMind, thanks Viorica! Just a little remark: it's hard to tell what she is showing with the laser pointer on the projection :)
@sayakchakrabarty 4 years ago
This lecture series is great and bound to spread great knowledge
@susmitislam1910 3 years ago
Great lectures, thanks! One small request: because the video shows only the lecturer's face and the computer slides, not the projector screen in the class, some complicated slides are hard to follow, as the teacher is obviously pointing at parts of them as she speaks. If it's not inconvenient, please use a mouse pointer instead, so that it's clearer to YouTube viewers. Thanks again!
@abhishekyadav479 4 years ago
Correct me if I'm wrong, but Faster R-CNN is a one-stage detector and end-to-end differentiable, as opposed to what is given in the lecture.
@ArshedNabeel 4 years ago
abhishek yadav It’s a single unit for the forward pass; but during training, the RPN (region proposal network) is trained separately using objectness scores for loss.
@ninadesianti9587 4 years ago
Oh my goodness. I’m so left behind in this field. I don’t know how to catch up. Thank you for the lesson!
@farhanhubble 4 years ago
Thank you for sharing this. It's a great walkthrough of how computer vision has improved.
@dsazz801 2 years ago
Thank you so much for the kind, simple, and well-explained lecture! The open questions part was great and gave some insight into the near future :)
@lukn4100 3 years ago
Great lecture and big thanks to DeepMind for sharing this great content.
@DatascienceConcepts 4 years ago
Quite useful
@ben6 4 years ago
I don't get why practically everyone in the CV research community quotes FPS without naming the hardware. Even children know that FPS is hardware-dependent: they play games, and the same game at the same graphics settings runs at a different FPS on different machines, sometimes even on the same machine depending on cooling. Changing the hardware can change the FPS drastically, from 1 to 1000. The number is totally meaningless without the hardware. I guess it's my job to guess which card you used and how many? For 2018 I would assume one or two NVIDIA GTX 1080 Tis.
@ArshedNabeel 4 years ago
Ben B You raise a very valid point, this is one of my pet peeves about CV literature too! Numbers like FPS or even running time are too dependent on the underlying hardware to be meaningful without context. A more meaningful measure would perhaps be ‘#computations per forward pass’ or something similar. In this particular case, the 5fps claim comes directly from the Faster RCNN paper, which came out in 2015. The exact details of the hardware are not mentioned in the paper (or at least I couldn’t find it). I assume it will be quite a bit faster on contemporary GPUs.
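One way to make the "#computations per forward pass" idea concrete is to count multiply-accumulate operations (MACs), which do not depend on the hardware. The sketch below is only an illustration under stated assumptions: it covers a single dense 2D convolution (no groups, bias ignored), and the function name `conv2d_macs` and the layer sizes in the example are hypothetical, not taken from the lecture or the Faster R-CNN paper.

```python
def conv2d_macs(h_out: int, w_out: int, c_in: int, c_out: int,
                k_h: int, k_w: int) -> int:
    """Multiply-accumulates for one dense 2D convolution layer.

    Each of the h_out * w_out * c_out output values is a dot product
    over a k_h * k_w * c_in input window. Bias terms are ignored.
    """
    return h_out * w_out * c_out * c_in * k_h * k_w


# Hypothetical layer: 3x3 kernel, 64 -> 128 channels, 56x56 output map.
macs = conv2d_macs(h_out=56, w_out=56, c_in=64, c_out=128, k_h=3, k_w=3)
print(f"{macs:,} MACs")  # prints "231,211,008 MACs"
```

Summing this over all layers gives a per-image cost that can be compared across papers, although real-world speed still depends on memory bandwidth and how well the hardware is utilised.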
@thomasdeniffel2122 4 years ago
thank you!
@lizgichora6472 3 years ago
Thank you.
@danielsoeller 3 years ago
I really like the series, and I try to watch half an episode per day. For me (not a native English speaker), this lecture was quite hard to follow. I don't want to offend, I just want to give feedback. I think it was great that you gave it a shot; keep at it and you will become better :)
@mortenkallese4024 4 years ago
I hardly think the length of this video is a coincidence?!?!
@mohammadelassal8079 4 years ago
What do you mean? 😅
@ben6 4 years ago
Wow! We blink to reduce activity in the brain. I'm going to close my eyes when I think about things now. :)