It was like a small literature review section in itself.
@rogerfreitasramirezjordan7188 3 years ago
This is what YouTube is for. Clear explanations and a beautiful intro! Tim's intro is fundamental for understanding what comes later.
@MachineLearningStreetTalk 3 years ago
Thanks!
@AICoffeeBreak 3 years ago
Thanks, this episode is 🔥! You ask many questions I've had in mind lately.
@aurelius2515 3 years ago
This was definitely one of the better episodes - covered a lot of ground in some good detail with excellent content and good guiding questions and follow-up questions.
@tinyentropy 2 years ago
You guys are so incredible. Thank you so much. We appreciate this every single second. ☺️☺️☺️
@beliefpropagation6877 3 years ago
Thank you for acknowledging the serious problems with calling images from Instagram "random", as claimed in the SEER paper!
@Self-Duality 2 years ago
Diving deep into this topic myself! So complex yet elegant… 🤔🤩
@maltejensen7392 3 years ago
Such high quality content, so happy I found this channel!
@drpchankh 3 years ago
Great episode and discussion! I think this discussion should also have included GAN latent discovery. Unsupervised learning is every data scientist's nirvana in production. On a side note, modern GANs can potentially span multiple domains, though current work is mainly centred on single-domain datasets such as faces, bedrooms, etc. The latent variables or feature spaces are discovered in an unsupervised fashion by the networks, though much work remains on better encoder and generator/discriminator architectures. The current best models can reconstruct a scene with different view angles, different lighting, different colours, etc., BUT they still CANNOT conjure up a structurally meaningful texture/structure for the scene; e.g. a bed, table, or curtain gets contorted beyond recognisably being a bed or table. It will be interesting to see if latent features discovered in GANs can help in unsupervised learning too.
@drpchankh 3 years ago
GANs are unsupervised learning algorithms that use a supervised loss as part of the training :)
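A minimal sketch, assuming PyTorch, of the "supervised loss" mentioned above: the discriminator is trained with binary cross-entropy against labels that come for free (real = 1, generated = 0), which is why GAN training looks supervised even though no human annotation is involved.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(real_logits, fake_logits):
    # Labels are "free": we know which images are real and which were generated.
    real_targets = torch.ones_like(real_logits)
    fake_targets = torch.zeros_like(fake_logits)
    return (F.binary_cross_entropy_with_logits(real_logits, real_targets)
            + F.binary_cross_entropy_with_logits(fake_logits, fake_targets))

def generator_loss(fake_logits):
    # The generator is trained to make the discriminator label its samples as real.
    return F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))
```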
@minma02262 3 years ago
My gawd. I love this episode!!!
@strategy_gal 3 years ago
What a very interesting topic! It's amazing to know why these vision algorithms actually work!
@yuviiiiiiiiiiiiiiiii 3 years ago
Here from Lex Fridman's shout out in his latest interview with Ishan Misra.
@MachineLearningStreetTalk 3 years ago
❤
@sugamtyagi101 3 years ago
An agent always has a goal. No matter how broad or big, the data samples it collects from the real world will be skewed towards that broader goal, so data collected by such an agent will also carry an inductive bias. The collection of data is therefore never completely disentangled from the task. Even if you mount a camera on a monkey or a snail, there will be a pattern (i.e. a bias) to the data that is collected. In contrast, taking completely random samples of images, say from a camera whose position in the world and view direction are drawn from a random number generator, would give a very uniform distribution. But in that sense, is that even intelligence? I think any form of intelligence ultimately imbues some sort of intrinsic bias. Human beings, being the most general intelligence machines, with goals that are themselves learnt over time, also collect visual data in a converging fashion with age. Though still very general, humans too have a direction. PS: Excellent video. Thanks for picking this up.
@ayushthakur736 3 years ago
Loved the episode. :)
@mfpears 3 years ago
23:00 The tendency of mass to clump together and increase spatial and temporal continuity...
@abby5493 3 years ago
Amazing video 😍
@akshayshrivastava97 3 years ago
Great discussion! A follow-up question on one thing I didn't quite understand (perhaps I'm missing something obvious)... With reference to 6:36, from what I heard/read in the video/paper, these attention masks were gathered from the last self-attention layer of a ViT. The DINO paper showed that one of the heads in the last self-attention layer is paying attention to areas that correspond to actual objects in the original image. That seems weird; I'd think that by the time you reach the last few layers, the image representation would have been altered in ways that make the original image irrecoverable. Would it be accurate to say this implies the original image representation either makes it through to the last layer(s) or is somehow recovered?
@dmitryplatonov 3 years ago
It is recovered. It traces back to the inputs which trigger the most attention.
@akshayshrivastava97 3 years ago
@@dmitryplatonov thanks.
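For readers wondering what "gathered from the last self-attention layer" means concretely, here is a rough sketch, assuming PyTorch and access to the per-head attention tensor of a ViT's final block (the function and shapes are illustrative, not the DINO codebase): the visualised masks are simply the attention weights that the [CLS] token assigns to each patch token, one map per head, reshaped to the patch grid.

```python
import torch

def cls_attention_maps(last_attn, patch_grid=(14, 14)):
    # last_attn: [batch, heads, tokens, tokens] attention from the final ViT block.
    # Row 0 is the [CLS] query; columns 1: are the patch tokens.
    h, w = patch_grid
    cls_to_patches = last_attn[:, :, 0, 1:]                                 # [batch, heads, h*w]
    maps = cls_to_patches.reshape(last_attn.shape[0], last_attn.shape[1], h, w)
    # Upsample the coarse per-head maps to image resolution for visualisation.
    return torch.nn.functional.interpolate(maps, scale_factor=16, mode="nearest")
```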
@LidoList 2 years ago
Correction: at 13:29 you expanded BYOL as "Bring Your Own Latent". It should actually be the Bootstrap Your Own Latent (BYOL) augmentation technique.
@MachineLearningStreetTalk 2 years ago
Yep sorry
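Since the acronym came up, a minimal sketch (assuming PyTorch) of the two ingredients that let BYOL "bootstrap" its own latent: a negative-cosine loss between an online predictor and a stop-gradient target projection, and a target network updated as an exponential moving average of the online network.

```python
import torch
import torch.nn.functional as F

def byol_loss(online_prediction, target_projection):
    # Negative cosine similarity; the target branch receives no gradient (stop-gradient).
    p = F.normalize(online_prediction, dim=-1)
    z = F.normalize(target_projection.detach(), dim=-1)
    return 2 - 2 * (p * z).sum(dim=-1).mean()

@torch.no_grad()
def ema_update(target_net, online_net, tau=0.996):
    # Target weights slowly track the online weights instead of being trained directly.
    for t, o in zip(target_net.parameters(), online_net.parameters()):
        t.data.mul_(tau).add_(o.data, alpha=1 - tau)
```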
@nathanaelmercaldo2198 3 years ago
Splendid video! Really like the intro music. Would anyone happen to know where to find the music used?
I was wondering if quantum computing will help with the latent variables mentioned at 1:24:54
@zahidhasan6990 3 years ago
"It doesn't matter when I am not around, i.e. what happens in 100 years." - Modified from Misra.
@sabawalid 3 years ago
Are a "cartoon banana" and a "real banana" subtypes of the same category, namely "banana"? There's obviously some relation between the two, but Ishan Misra is absolutely right: a "cartoon banana" is a different category and is not a subtype of "banana" (it cannot be eaten, it does not smell or taste like a banana, etc.). Interesting episode, as usual, Tim Scarfe.
@tfaktas 2 years ago
What software are you using for annotating/presenting the papers?
@angelvictorjuancomuller809 3 years ago
Hi, awesome episode! Can I ask which paper the figure at 1:15:51 is from? It's supposed to be DINO but I can't find it in the DINO paper. Thanks in advance!
@MachineLearningStreetTalk 3 years ago
Page 2 of the DINO paper. Note that the "DINO" paper's full title is "Emerging Properties in Self-Supervised Vision Transformers" (arXiv:2104.14294v2).
@angelvictorjuancomuller809 3 years ago
@@MachineLearningStreetTalk Thanks! I was looking at another DINO paper (arXiv:2102.09281).
@MadlipzMarathi 3 years ago
here from lex.
@rubyabdullah9690 3 years ago
What if you create a simulation of an early world (when there is no technology, etc.) and then create an agent that learns about the environment, making the agent and the world's rules as close as possible to the real world, and then try to learn with something like Tesla's monster architecture, but unlabelled? It's kinda super duper hard to make, but I think that's the best approach to creating an Artificial General Intelligence :v
@himanipku22 3 years ago
44:23 Is there a paper somewhere that I can read on this?
@MachineLearningStreetTalk 3 years ago
You mean the statement from Ishan that you could randomly initialise a CNN and it would already know that cats are more similar to each other than they are to dogs? Hmm. The first paper that comes to mind is arxiv.org/abs/2003.00152, but I think there must be something more fundamental. Can anyone think of a paper?
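A quick way to sanity-check that claim yourself; a sketch assuming PyTorch/torchvision, with the image tensors (`cat_a`, `cat_b`, `dog`) left as hypothetical inputs:

```python
import torch
import torch.nn.functional as F
import torchvision

model = torchvision.models.resnet18(weights=None)  # randomly initialised, no pretraining
model.fc = torch.nn.Identity()                     # use the pooled features as the embedding
model.eval()

@torch.no_grad()
def embed(img):                                    # img: [3, 224, 224] tensor
    return F.normalize(model(img.unsqueeze(0)), dim=-1)

# Hypothetical inputs: cat_a, cat_b, dog are preprocessed image tensors.
# F.cosine_similarity(embed(cat_a), embed(cat_b)) tends to exceed
# F.cosine_similarity(embed(cat_a), embed(dog)), because even random convolutions
# preserve low-level statistics (colour, texture, shape) that cluster similar classes.
```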
@_ARCATEC_ 3 years ago
It's interesting how useful simple edits like crop, rotation, contrast, edge and curve adjustments, plus the appearance of dirty pixels within intentionally low-resolution images, are while self-supervised learning is being applied. 🍌🍌🍌😂 So true 💓 the map is not the territory.
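For concreteness, a rough sketch (assuming torchvision) of the kind of augmentation stack those "simple edits" form in SimCLR/BYOL-style self-supervised training, where two distorted views of the same image are produced and pulled together in the loss:

```python
import torchvision.transforms as T

augment = T.Compose([
    T.RandomResizedCrop(224, scale=(0.2, 1.0)),                  # crop
    T.RandomHorizontalFlip(),
    T.RandomApply([T.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),   # contrast / colour
    T.RandomGrayscale(p=0.2),
    T.GaussianBlur(kernel_size=23),                              # "dirty", low-detail pixels
    T.ToTensor(),
])

# view_a, view_b = augment(img), augment(img)  # two views of one image
```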
@shivarajnidavani5930 3 years ago
Fake blur is very irritating. Hurts to see
@massive_d 3 years ago
Lex gang
@MachineLearningStreetTalk 3 years ago
We are humbled to get the shout-out from Lex!
@fast_harmonic_psychedelic 3 years ago
There's a lot of emphasis on these "us vs them", "humans vs the machine" themes in your introduction, which I think is excessive and biased. It's not man and machine. It's just us. They are us. We're them.
@SimonJackson13 3 years ago
Radix sort O(n)
@SimonJackson13 3 years ago
When k < log(n) it's fantastic.
@SimonJackson13 3 years ago
For a cube root of bits in range a 6n FILO stack list sort time is indicated.
@MachineLearningStreetTalk 3 years ago
We meant that O(N log N) is the provable lower bound for comparison-based sorting, but great call-out on radix 😀
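For anyone following the exchange, a small sketch of the distinction: radix sort never compares two elements, so it runs in O(n·k) for k-digit keys (effectively linear for fixed-width integers) and is not subject to the O(n log n) lower bound that applies to comparison sorts.

```python
def radix_sort(values):
    # LSD radix sort over 8-bit digits (base 256), for non-negative integers.
    if not values:
        return values
    shift, max_val = 0, max(values)
    while (max_val >> shift) > 0:
        buckets = [[] for _ in range(256)]
        for v in values:
            buckets[(v >> shift) & 0xFF].append(v)   # stable bucketing by the current byte
        values = [v for bucket in buckets for v in bucket]
        shift += 8
    return values

# radix_sort([170, 45, 75, 90, 802, 24, 2, 66]) -> [2, 24, 45, 66, 75, 90, 170, 802]
```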
@fast_harmonic_psychedelic 3 years ago
machines are just an extension of nature just like a tree, a beehive, or a baby
@MachineLearningStreetTalk 3 years ago
For those who want to learn more from Ishan and more academic detail on the topics covered in the show today, Alfredo Canziani just released another show twitter.com/alfcnz/status/1409481710618693632 😎