Loved your series on self-supervised learning. Are you also planning to cover DINOv2? I am particularly curious about the emergence property of the model -- how it is able to regress semantically consistent features for different parts of objects (and not just a simple FG-BG separation as in DINOv1)!
@江楓漁火-e5u 4 months ago
Hi, I'm a bit confused about the centering method you described in this video (3:25). In the video, you add the center to the online network's output, which is different from what I've seen in other implementations of DINO (kzbin.info/www/bejne/nmTMm2Z8aMiDf80si=BUj7iQMXKaEs0Nr1&t=1296). Most implementations subtract the center from the output. Could you please clarify whether there's an error in the video or whether this is a different approach to centering?
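For reference, the DINO paper subtracts a running center from the teacher output before the sharpened softmax, and updates that center as an exponential moving average over batches. A minimal sketch of that centering step (function names and the temperature/momentum values here are illustrative, not from the video):

```python
import torch

def teacher_probs(teacher_out, center, tau=0.04):
    # DINO centers the teacher output by SUBTRACTING the center,
    # then applies a sharpened softmax (low temperature tau).
    return torch.softmax((teacher_out - center) / tau, dim=-1)

def update_center(center, teacher_out, momentum=0.9):
    # The center is an EMA of the batch mean of teacher outputs.
    batch_mean = teacher_out.mean(dim=0, keepdim=True)
    return center * momentum + batch_mean * (1 - momentum)
```

With this convention, centering pushes the teacher distribution away from collapse onto a single dominant dimension, while the low temperature counteracts collapse onto the uniform distribution.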
@nasosgerontopoulos5267 1 year ago
Very good content. Congrats 👍. Reading papers can be tough for many people, and such videos make it a lot easier to keep up with these state-of-the-art advancements. As a fellow researcher, do you think investing time in self-supervised learning research is worth it right now? Considering that my team and I do not have access to the kind of computational power that Meta and Google have, I am not sure we can keep up.
@borismeinardus 1 year ago
Hey, thanks! 😊 I think it is worth it! SSL is a broad field, and SSL in the context of multi-modal learning is very relevant. Yes, you will likely not be able to build the largest foundation models and go for scale, but you can definitely work on more nuanced research. E.g., ImageBind is a great example of a simple idea that does not require all the data and compute in the world. Btw, I also have a video on that paper :) kzbin.info/www/bejne/h4KtZHyIZcabg80si=VYxxIQPiyAXnlsw9