SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization (Paper Explained)

10,599 views

Yannic Kilcher

A day ago

Comments: 61
@WalidHefny-qs8bw 4 years ago
I have got to say I am deeply grateful for how you express jargon in other words. I realize scientists have to use formally-recognized vocabulary in their papers, but I do believe this hinders communicating science -- at least for beginners like myself.
@immortaldiscoveries3038 4 years ago
Yeah, because I have no freaking clue what he said in the video, and I'm the top AGI researcher on Earth as it stands, so...
@WannabePianistSurya 4 years ago
Inspired by your videos, I started a research paper study group, presenting papers in a similar way to your videos. Thank you for putting up these kinds of videos for us plebs who don't have access to high-quality education (I'm from India). Your work is invaluable.
@DeepGamingAI 4 years ago
Any vacancies in your study group? 👀 (Would love to have people to discuss these topics with since learning in isolation isn't proving to be very productive.)
@arjunashok4956 4 years ago
@@DeepGamingAI +1. Same question. I am an undergraduate and I have worked on a few research projects.
@manojb8876 4 years ago
Same here! Please let us know
@rahuldeora1120 4 years ago
@@manojb8876 Yes let us know
@WannabePianistSurya 4 years ago
Hello, I am glad to see so many people are interested. Please message me on LinkedIn, name: "Surya Kant Sahu" (not sure if I can post a link here). I'll send you a link to the group.
@PortfolioCasWognum 4 years ago
Hi Yannic, thank you for your consistent, high-quality videos! I would be very interested in hearing your take on "Making sense of sensory input" by Richard Evans et al. (DeepMind). I was lucky enough to attend a presentation by Richard Evans on the paper at my university, and more recently, especially after your video on Chollet's paper, I have repeatedly found myself thinking that it might be (the start of) a big missing piece in current AI research, yet it doesn't seem to get any attention.
@andresfernandoaranda5498 4 years ago
Your contributions are valuable, thanks for doing these vids))
@creatiffshik 4 years ago
Maybe they should just go ahead and allow triple connections? Two papers down the line... Great breakdown!
@abhirajkanse6418 4 years ago
Bruh, the exact same idea (attention between different-sized features) came to my mind while you were explaining the paper. Sounds quite interesting; I won't be surprised to see a paper on it soon.
@chris--tech 4 years ago
The first thing I do every morning is check out your videos to see which new papers have recently been proposed in deep learning. By the way, I'm a student from Beijing; it's helpful to keep up with the newest progress, and thanks for sharing.
@slackstation 4 years ago
As I was watching, I had the intuition: what if we could vary the blocks for learning on each image? Then maybe we could learn from the tags on those images what the best route of learning blocks is for that tag or combination of tags. We could learn that scenes with many or overlapping things tend to do well with this route, outdoor scenes do better with that route, etc. Then I saw your proposal at minute 28. It feels good for an intuition to be in line with someone of your expertise. As always, great work. This gave me a good insight into the reasoning and architecture of ResNet models.
@YannicKilcher 4 years ago
I like your idea ;)
@florianhonicke5448 4 years ago
Nice idea to compute the perfect network using attention. We should try that out.
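For anyone who wants to picture what the routing-by-attention idea discussed in the video could look like in code, here is a minimal sketch. It is purely illustrative: the module name, the pooled-score attention, and all shapes are assumptions, not something from the SpineNet paper or the video.

```python
# Minimal sketch (assumptions, not from the paper): soft attention over candidate
# feature maps of different resolutions, producing one mixed input for the next block.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftRouting(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Linear(channels, 1)  # one scalar score per candidate map

    def forward(self, candidates, target_size):
        # candidates: list of tensors [B, C, H_i, W_i] at different scales
        resized = [F.interpolate(c, size=target_size, mode="bilinear",
                                 align_corners=False) for c in candidates]
        # score each candidate from its global-average-pooled features
        scores = torch.stack([self.score(r.mean(dim=(2, 3))) for r in resized], dim=1)
        weights = torch.softmax(scores, dim=1)            # [B, N, 1]
        stacked = torch.stack(resized, dim=1)             # [B, N, C, H, W]
        return (weights[..., None, None] * stacked).sum(dim=1)
```

Each block would call something like this on the set of earlier feature maps it is allowed to read from, so the "wiring" would be learned jointly with the weights instead of searched with RL.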
@Tehom1 4 years ago
The paper mentions a reward for the super-NN that controls the topology of the NN, implying that the super-NN is always optimizing for the best final performance. I have to wonder whether it might be better for the super-NN to spend most of its time actively learning to predict final performance from the initial configuration, and only optimize after that is learned well. But I like your design better.
@swanbosc5371 4 years ago
A year ago I read the OctConv paper and thought of an architecture a little like what you are proposing there. However, I didn't use an attention layer to route information: after each block, half of the features would be kept and the other half would be up- or down-sampled to be stacked with other feature maps. I started to investigate adding SE blocks to take care of the "dynamic" routing of information. Never finished testing this, though.
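If I read the comment above correctly, such a block could look roughly like the sketch below; the exact split, pooling, and SE placement are guesses at what was meant, not a tested design.

```python
# Rough sketch of the commenter's idea as described above (details are guesses):
# keep half the channels at the current scale, downsample the other half and
# stack it with a coarser map, with an SE gate handling the "dynamic" routing.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SEGate(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                       # x: [B, C, H, W]
        gate = self.fc(x.mean(dim=(2, 3)))      # per-channel gates in [0, 1]
        return x * gate[:, :, None, None]

def split_and_route(x, coarser):
    """Keep half of x at its own scale; send the other half down one scale."""
    kept, passed = torch.chunk(x, 2, dim=1)
    passed = F.avg_pool2d(passed, kernel_size=2)     # halve the spatial resolution
    routed = torch.cat([coarser, passed], dim=1)     # stack with the coarser map
    return kept, routed                              # an SEGate could reweight `routed`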
@tylertheeverlasting 4 years ago
The double input to layers is technically not a double input, because there is almost always some small number of layers in between. The double skip output is a new thing, though (as far as I know). Typically the double outputs go to the FPN/U-Net block, but the backbone usually has a single skip output.
@herp_derpingson 4 years ago
15:21 Reminds me of the paper "SqueezeNet"; it used to be quite popular back in 2017.
16:50 There must be some bandwidth limit for gradient information in a float32. We can't just reduce the dimensions of a matrix to 1x1 and expect it to have the same performance. I wonder what happens if we use float64 for these bottlenecks.
20:19 Doesn't our brain look like this too? In our brain the neurons are pretty much fully connected locally. It would be interesting if someone made a network which has skip connections going from all layers to all layers, even backwards. OK, maybe not backwards.
28:15 Very similar idea, now also add skip connections.
Also, although the network has the same number of parameters, it does not mean that it consumes the same amount of floating-point operations. I think it should take significantly more due to the large number of upsample and downsample operations.
@YannicKilcher 4 years ago
Yeah, I thought of SqueezeNet too, but I guess that was also hand-designed, so not as fancy :) I think they explicitly measure FLOPs and show that theirs consumes less, but I agree, compared to something like a VGG, the TPU processing this jumbled mess must be constantly tripping up :D
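As an aside, the "skip connections from all layers to all layers (forward only)" idea mentioned above is roughly what DenseNet does inside a block. A minimal sketch, with layer widths picked arbitrarily for illustration:

```python
# Minimal DenseNet-style block: each layer reads the concatenation of the input
# and every earlier layer's output, i.e. forward-only all-to-all skip connections.
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_channels, growth, num_layers):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_channels + i * growth, growth, kernel_size=3, padding=1),
                nn.ReLU(inplace=True))
            for i in range(num_layers))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)
```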
@samkumar2377 3 years ago
Really, this is very helpful. I'd like to see it combined with capsule networks.
@adamantidus 4 years ago
Thank you, Yannic, for the great job you are doing! I am not a big fan of this kind of research. Of course, if you explore enough you will eventually come up with a weird architecture that happens to work better. This, however, comes at the cost of sacrificing the intuition behind the model and our (already limited) understanding of the whole thing. I think Yannic made a good point when he said that the idea of making the spine model deeper by simply concatenating blocks goes in the opposite direction of the core idea in the paper. I also agree with Yannic that it is quite likely that the performance boost comes from the fact that the connections have been doubled, and that this should have been explored further. Finally, Yannic's idea of implementing a sort of dynamic routing via attention is very interesting. It should even be computationally cheaper than using RL to explore architectures. The whole paper is interesting though. Thanks again for reviewing it!!
@YannicKilcher 4 years ago
thanks for the feedback :)
@oneman7094 4 years ago
It would be interesting to see how the found architecture differs from one image dataset to another. It seems to me that the found architecture could not be worse than ResNet-50 (the search could just return that), and that this is just hyperparameter overfitting.
@deterministicalgorithmslab1744 4 years ago
I think they did the ablation with a ResNet with 2 connections because each ResNet layer already has 2 connections, one residual and one transformational. There are no residual connections in SpineNet.
@dshlai 4 years ago
I wonder how this new backbone compares to the backbones used in more modern detection and segmentation networks (SENet or CSP).
@samanthaqiu3416 4 years ago
7:10 Those skip connections seem to make the bottleneck ENTIRELY POINTLESS, since the goal was to force it to learn high-level global features. Why is my conclusion wrong? Or do you agree that skip connections defeat the purpose of the bottleneck?
@YannicKilcher 4 years ago
You might be right, but on the other hand, skip connections have no (or few) learnable parameters, so any computation still has to be done by the bottleneck.
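To make that concrete, here is what a generic ResNet-style bottleneck block looks like (a sketch from memory, not code from the paper): the skip path is a parameter-free identity, so all learnable computation happens in the 1x1 → 3x3 → 1x1 path. This is also the "one residual and one transformational" connection mentioned in an earlier comment.

```python
# Generic ResNet-style bottleneck (sketch): the skip is an identity with no
# parameters; the learnable work is in the bottlenecked transform path.
import torch.nn as nn

class Bottleneck(nn.Module):
    def __init__(self, channels, mid_channels):
        super().__init__()
        self.transform = nn.Sequential(
            nn.Conv2d(channels, mid_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid_channels), nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, mid_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(mid_channels), nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # residual (identity) connection + transformational path
        return self.relu(x + self.transform(x))
```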
@UsmanAhmed-sq9bl 4 years ago
Awesome. Great video. Keep going. 🎊👍
@noninvasive_rectal_probe8990 4 years ago
Waiting for the ultimate YannicNet-69 with routing by attention 🤗
@Ting3624 4 years ago
The juice of this video: 27:00
@siyn007 4 years ago
You said it in the last video. It seems like this field is starting to become more and more difficult to navigate with just a laptop. I wonder what the next field is where you only need a laptop... RL at the moment?
@YannicKilcher 4 years ago
idk, let's just all become cattle ranchers :)
@siyn007 4 years ago
@@YannicKilcher XD
@alexanderchebykin6448 4 years ago
The idea you propose looks like this paper (openreview.net/pdf?id=BkXmYfbAZ), except with weird block sizes.
@manojb8876 4 years ago
Link doesn't work
@alexanderchebykin6448 4 years ago
@@manojb8876 Thanks; it didn't work because the bracket became part of the link. Seems to work now.
@Ronschk 4 years ago
I don't think 1x1 convolutions were introduced in the ResNet paper (as you say around 15:12), but in "Network in Network" (arxiv.org/pdf/1312.4400.pdf). Not that it matters that much :P
@YannicKilcher 4 years ago
True, thanks :)
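For readers unfamiliar with the term: a 1x1 convolution (popularized by Network in Network) is just a linear map across channels applied independently at each spatial position, which is why it is used to shrink or expand channel counts cheaply. A tiny example, with shapes chosen arbitrarily for illustration:

```python
# A 1x1 convolution mixes channels per pixel without any spatial mixing.
import torch
import torch.nn as nn

x = torch.randn(1, 256, 32, 32)              # [batch, channels, height, width]
reduce_channels = nn.Conv2d(256, 64, kernel_size=1)
y = reduce_channels(x)
print(y.shape)                               # torch.Size([1, 64, 32, 32])
```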
@joddden 3 years ago
Why did you write the "c" in "cat" last?
@speed100mph 4 years ago
Isn't your idea very similar to the Inception network?
@aishwaryabalwani7545 4 years ago
Thought the same too - except that attention allows the network to be a little more "dynamic" than InceptionNet...
@G12GilbertProduction 4 years ago
But meta-supervised networks architecture in 153×26 for the block segments it's outfront for other 153 block company analysed by hyperstructure resource decrease in not themselves source data, but covered outerspace of RNN.
@pablovela2053 4 years ago
I'd love for you to go through the HRNet paper (arxiv.org/abs/1908.07919), as it explores a similar concept.
@samjoel4152 4 years ago
Yannic's idea is awesome... but we need heavy computational resources to do this, I guess 😅
@444haluk 4 years ago
This paper is pseudo meta-learning: "Try every possible combination, but eliminate some combinations via RL, and voilà, the reward is ready."
@ahmadchamseddine6891 4 years ago
Terrorists can use it for their own purposes as much as any military/intelligence agency!! 35:28
@sheggle 4 years ago
Is your proposal not just Google's 600B parameter language model?
@YannicKilcher 4 years ago
mine's only 599
@marat61 4 years ago
It is a very Chinese article.
@StanislavSchmidt1 4 years ago
Hi Yannic, thanks for the effort you're putting into your videos. A couple of comments I have: 1. I find the fact that you giggle quite a lot throughout your videos a bit distracting, but maybe it's just me. 2. I see you say things like "to up the number of features", but I think saying "to increase the number of features" would sound better. Maybe a native speaker could give their opinion here.
@Coolguydudeness1234 4 years ago
Don't mind the giggles personally
@rbain16 4 years ago
I don't mind them either. The criticisms seem rather trivial, no offense.
SupSup: Supermasks in Superposition (Paper Explained)
1:00:07
Yannic Kilcher
8K views
Deep Ensembles: A Loss Landscape Perspective (Paper Explained)
46:32
Yannic Kilcher
23K views
ConvNeXt: A ConvNet for the 2020s | Paper Explained
40:08
Aleksa Gordić - The AI Epiphany
17K views
Neural Architecture Search without Training (Paper Explained)
35:06
Yannic Kilcher
28K views
Big Bird: Transformers for Longer Sequences (Paper Explained)
34:30
Yannic Kilcher
24K views
NVAE: A Deep Hierarchical Variational Autoencoder (Paper Explained)
34:12
Object-Centric Learning with Slot Attention (Paper Explained)
42:39
Yannic Kilcher
17K views
But what is a neural network? | Chapter 1, Deep learning
18:40
3Blue1Brown
17M views
Gradient Origin Networks (Paper Explained w/ Live Coding)
42:16
Yannic Kilcher
10K views