Yannic is like that nurse that mashes the potato with a spoon and gives it to you so that you toothless nerds can get fed
@YannicKilcher 4 years ago
This made me laugh so hard :D
@cerebralm 4 years ago
LOL
@theoboyer3812 4 years ago
That's a funny summary of what a teacher is
@cerebralm 4 years ago
@@theoboyer3812 I heard concise explanation described as "you have to build IKEA furniture without using all the pieces, but it still has to be sturdy when you're done"
@swordwaker7749 3 years ago
Ahh... more like a chef. Papers in their original form can be hard to digest without... some help. BTW, this paper is like dragon meat.
@finlayl2505 4 years ago
Relationship ended with conv nets, transformers are my best friend now
@hoaxuan7074 4 years ago
A fast-transform fixed-filter-bank neural network trained as an autoencoder works quite well as a GAN. Noise in, image out. I guess with "filter" in the title...
@lunacelestine8574 3 years ago
That made my day
@dasayan05 4 years ago
25:57 convolutions are for losers, we're all for locally applied linear transformations... 😂
@xtli5965 3 years ago
They actually updated the paper: they no longer use super-resolution co-training and locality-aware initialization, and instead use relative positional embeddings and a modified normalization. They also tried larger images with local self-attention to reduce the memory bottleneck. The most confusing part of this paper for me is the UpScale and AvgPool operations: outputs from a transformer are supposed to be global features, so it feels strange to directly upsample or pool them the way we do with convolutional feature maps.
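For anyone stuck on the same point: both ops only make sense because the (B, N, C) token sequence is first reshaped back into its HxW grid, operated on spatially, and re-flattened. A minimal sketch of that pattern, assuming PyTorch; the function names and shapes are mine, not the paper's:

```python
import torch
import torch.nn.functional as F

def upscale_tokens(x: torch.Tensor) -> torch.Tensor:
    """UpScale via pixel-shuffle: (B, N, C) -> (B, 4N, C/4)."""
    B, N, C = x.shape
    H = W = int(N ** 0.5)                            # assumes a square grid
    grid = x.transpose(1, 2).reshape(B, C, H, W)     # tokens -> 2D feature map
    grid = F.pixel_shuffle(grid, upscale_factor=2)   # (B, C//4, 2H, 2W)
    return grid.flatten(2).transpose(1, 2)           # back to a token sequence

def avgpool_tokens(x: torch.Tensor) -> torch.Tensor:
    """AvgPool the same way: (B, N, C) -> (B, N/4, C)."""
    B, N, C = x.shape
    H = W = int(N ** 0.5)
    grid = x.transpose(1, 2).reshape(B, C, H, W)
    grid = F.avg_pool2d(grid, kernel_size=2)         # (B, C, H/2, W/2)
    return grid.flatten(2).transpose(1, 2)
```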
@puneetsingh5219 4 years ago
Yannic is on fire 🔥🔥
@rallyram 4 years ago
Why do you think they go with the WGAN gradient penalty instead of spectral normalization, as per Appendix A.1?
@hk2780 4 years ago
So why should we not use convolutions if we're using locally applied linear functions anyway? I don't get the point. Also, why do they use the 16-patch cropping? To be honest, it's almost the same as a convolution with a 16x16 kernel and stride 16. And then they say they don't use convolutions, while doing the same thing a convolution does. Sounds like it's becoming more of a con-artist thing.
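The equivalence this comment points at is easy to verify numerically. A small sketch, assuming PyTorch; the sizes are toy values of mine, not the paper's:

```python
import torch
import torch.nn as nn

B, C, H, W, D, P = 2, 3, 32, 32, 192, 16   # toy sizes; P is the patch size
x = torch.randn(B, C, H, W)

# "Cut PxP patches, flatten each, apply one shared linear layer":
linear = nn.Linear(C * P * P, D)
patches = (x.unfold(2, P, P).unfold(3, P, P)    # (B, C, H/P, W/P, P, P)
             .permute(0, 2, 3, 1, 4, 5)
             .reshape(B, -1, C * P * P))
out_linear = linear(patches)                    # (B, num_patches, D)

# The same map written as a convolution with kernel = stride = P:
conv = nn.Conv2d(C, D, kernel_size=P, stride=P)
with torch.no_grad():
    conv.weight.copy_(linear.weight.reshape(D, C, P, P))
    conv.bias.copy_(linear.bias)
out_conv = conv(x).flatten(2).transpose(1, 2)   # (B, num_patches, D)

assert torch.allclose(out_linear, out_conv, atol=1e-4)
```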
@dl9926 3 years ago
but that would be so expensive, wouldn't it?
@wilsonthurmanteng9 4 years ago
Hi Yannic, fast reviews as usual! I would just like to hear your thoughts on the loss functions of the recent Continuous Conditional GAN paper that was accepted at ICLR 2021.
@MightyElemental 1 year ago
I was attempting to build a TransGAN for a university project and ended up with a very similar method. The only thing missing was the localized attention. No way was I gonna get that 💀
@dasayan05 4 years ago
1:11 "which bathroom do the TransGANs go to ?"
@nguyenanhnguyen7658 3 years ago
There is no high-res benchmark for TransGAN vs. StyleGAN2, so we don't know if it is worth trying.
@G12GilbertProduction 3 years ago
12:51 Wait... 3 samples for the 1×156-pixel upsampled patch of data correlates between the r² (alpha) and r² (beta) + ... r² (omega) channel transformers, or even a 156-layer architecture base to finitely decode what he was recreating, up to 9 samples, right?
@tnemelcfljdsqkf9529 3 years ago
Thank you a lot for your work, it's helping me a lot! Which software are you using to take notes on top of the paper like this? :)
@raunaquepatra3966 4 years ago
I didn't get the point of data augmentations for generators. Isn't the number of input samples practically infinite? I mean, I can feed in as many random vectors as I want and get as many samples as needed?
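For context: in these GAN papers the augmentation is applied to what the discriminator sees, i.e. both real and generated batches, precisely because the real data is finite; the only requirement is that it stays differentiable so generator gradients flow through it. A minimal sketch of that idea, assuming PyTorch; the specific augmentations are toy examples of mine, not the paper's:

```python
import torch
import torch.nn.functional as F

def diff_augment(x: torch.Tensor) -> torch.Tensor:
    """Differentiable augmentation applied to both real and fake batches
    right before the discriminator."""
    # Brightness jitter: plain addition, so gradients pass straight through.
    x = x + (torch.rand(x.size(0), 1, 1, 1, device=x.device) - 0.5)
    # Random shift by up to 2 px via zero-padding + cropping (slicing is
    # differentiable too).
    padded = F.pad(x, [2, 2, 2, 2])
    dx, dy = torch.randint(0, 5, (2,)).tolist()
    return padded[:, :, dy:dy + x.size(2), dx:dx + x.size(3)]
```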
@tedp9146 4 years ago
How exactly is the classification head attached to the last transformer block?
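In ViT-style discriminators the usual pattern is a learned [CLS] token prepended to the patch tokens, with a linear layer reading out that token's final state; a sketch of that generic recipe, not necessarily the paper's exact code:

```python
import torch
import torch.nn as nn

class TransformerDiscriminatorHead(nn.Module):
    """ViT-style readout: prepend a learned [CLS] token, run the encoder,
    and attach a linear head to that token's output."""
    def __init__(self, dim: int = 192, depth: int = 4):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, 1)            # real/fake logit

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        cls = self.cls_token.expand(tokens.size(0), -1, -1)
        x = self.encoder(torch.cat([cls, tokens], dim=1))
        return self.head(x[:, 0])                # classify from [CLS] only
```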
@Snehilw 4 years ago
Great explanation!
@minhuang8848 4 years ago
Dang, you learned some Chinese phonemes, didn't you? Pronunciation was pretty on point!
@dasayan05 4 years ago
YannicNet has been trained for several years now on AuthorName dataset. No wonder output quality is good
@minhuang8848 4 years ago
@@dasayan05 That's the good-ass Baidu language models lol
@florianhonicke5448 4 years ago
I like your jokes a lot. It's much easier for me to learn something when it's fun!
@syslinux2268 4 years ago
What is your opinion on MIT's new "Liquid Neural Network"?
@YannicKilcher 4 years ago
Haven't looked at it yet, but I will
@syslinux2268 4 years ago
@@YannicKilcher It's similar to an RNN, but instead of scaling into billions of parameters it focuses more on higher-quality neurons. Fewer parameters, but with results as good as or even better than average-sized neural networks.
@dasayan05 4 years ago
@@syslinux2268 paper link? is it public yet?
@shivamraisharma1474 4 years ago
Do you mean the paper on liquid time-constant networks?
@syslinux2268 4 years ago
@@shivamraisharma1474 Yep. It's just too long to type.
@MrMIB983 3 years ago
Great video
@romanliu4629 3 years ago
An arithmetic question: what are the parameter sizes of the linear transforms before the "Scaled Dot-Product Attention" that produce Q, K, and V when synthesizing 256² images? If we reverse the roles of the "flattened" spatial axes and the channel axis, how is that related to, or different from, a 1×1 convolution? And why flatten and reshape features and upscale images via pixel-shuffle, which can disrupt spatial information and lead to checkerboard artefacts?
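On the 1×1-convolution part: a per-token linear map on (B, N, C) tokens is exactly a 1×1 convolution on the corresponding (B, C, H, W) map, which can be checked directly. A sketch assuming PyTorch, with toy sizes of mine:

```python
import torch
import torch.nn as nn

B, C, H, W = 2, 192, 8, 8                    # toy sizes
x = torch.randn(B, C, H, W)

# A per-token linear map, e.g. a fused Q/K/V projection; its parameter
# count is C * 3C weights + 3C biases, independent of H and W.
linear = nn.Linear(C, 3 * C)
tokens = x.flatten(2).transpose(1, 2)        # (B, H*W, C)
out_linear = linear(tokens)                  # (B, H*W, 3C)

# The same map written as a 1x1 convolution on the unflattened grid:
conv = nn.Conv2d(C, 3 * C, kernel_size=1)
with torch.no_grad():
    conv.weight.copy_(linear.weight.unsqueeze(-1).unsqueeze(-1))
    conv.bias.copy_(linear.bias)
out_conv = conv(x).flatten(2).transpose(1, 2)

assert torch.allclose(out_linear, out_conv, atol=1e-4)
```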
@tylertheeverlasting 4 years ago
What would have been the issue with an ImageGPT-like generator? Would it be too slow to train due to serial generation?
@YannicKilcher 4 years ago
Apparently, transformer generators have just been unstable for GANs so far
@KanugondaRohithKumar-h6r 1 year ago
Sir, can you please provide the TransGAN training and testing code?
@АлексейТучак-м4ч 4 years ago
what if we changed the feedforward layer in the transformer to another transformer? like a nested transformer
@raunaquepatra3966 4 years ago
In the super-resolution auxiliary task, how is the LR image calculated? In particular, how is the number of channels matched to the network's input channels? E.g. SR image 64x64x3, LR image 8x8x3 (but the network needs 8x8x192)?
@raunaquepatra3966 4 years ago
couldn't find anything in the paper either 😔
@array5946 4 years ago
17:15 - is cropping a differentiable operation?
@udithhaputhanthri2002 3 years ago
I think what he's saying is: if cropping is a differentiable operation, we can use it.
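It is: cropping is just tensor slicing, and slicing passes gradients through to the kept region (and zeros elsewhere). A quick check in PyTorch:

```python
import torch

x = torch.randn(1, 3, 8, 8, requires_grad=True)
x[:, :, 2:6, 2:6].sum().backward()   # crop = slice, then backprop
print(x.grad[0, 0])                  # ones in the 4x4 window, zeros outside
```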
@Kram1032 4 years ago
Can't help but think that the upsampling stuff is kinda like implicit convolutions... Not that it'd be particularly reasonable not to do this, but it's setting up a similar localized-attention-type deal.
@WhatsAI 3 years ago
Hey Yannic, love the video! May I ask what tools you're using to read this paper, highlight the lines, and record it? Thanks! :)
@CosmiaNebula 3 years ago
Likely a Microsoft Surface with a pen. Then any PDF annotator would work; even Microsoft Edge's PDF reader has that. As for screen recording, try OBS Studio. Would you make your own paper-reading videos?
@WhatsAI 3 years ago
@@CosmiaNebula Thank you for the answer! I would indeed like to try that style of video sometime, but my initial question was mainly because I would love to use something similar in meetings to show explanations, math, etc.
@chuby35 4 years ago
Could this be used with VQ-VAE-2, so the lower-res "images" fed into this TransGAN are actually the latent-space representations produced by VQ-VAE-2?
@timoteosozcelik 4 years ago
How do they give the LR image as input to Stage 2? What I've understood so far is that the number of channels decreases over the stages (which means Stage 2 has more than 3 channels), but the LR image will have only 3 channels.
@chuby35 4 years ago
Probably the same trick used for the upsampling, the other way around: scaling down the image by moving the information into more channels. (Since this aux task is only used to teach the upsampling to work properly, I don't think the LR images lose any information; it's just rearranged into these "super-pixels".) But I haven't looked at the code yet, so my guess is as good as any. :)
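The "other way around" trick guessed at here exists as a standard lossless rearrangement (pixel-unshuffle); a quick sketch of the idea in PyTorch, not taken from the paper's code:

```python
import torch
import torch.nn as nn

lr = torch.randn(1, 3, 8, 8)                  # a toy LR image
down = nn.PixelUnshuffle(downscale_factor=2)  # the inverse of PixelShuffle
x = down(lr)                                  # (1, 12, 4, 4): half the
                                              # resolution, 4x the channels
# Nothing is lost, only rearranged: the inverse recovers the image exactly.
assert torch.equal(nn.PixelShuffle(2)(x), lr)
```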
@timoteosozcelik 4 years ago
@@chuby35 That makes sense, thanks. But directly applying what you said made me question the necessity (meaning) of Stage 3 in such cases. I checked the code but couldn't see anything about that.
@lucasferos 4 years ago
Thank you
@etiennetiennetienne 4 years ago
Cool bag of tricks! Instead of this hardcoded mask, could it just be an initialization problem? I.e., is the probability of predicting a positional-encoding vector that agrees with a far-away vector simply low at the beginning of training?
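For reference, a hardcoded locality mask of the kind being discussed can be written as an additive attention mask; this is my own rendering of the idea, not the paper's code:

```python
import torch

def locality_mask(h: int, w: int, radius: int) -> torch.Tensor:
    """Additive attention mask for an h*w token grid: 0 where two tokens
    are within `radius` on the grid (Chebyshev distance), -inf elsewhere.
    Added to the attention logits before the softmax."""
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    pos = torch.stack([ys.flatten(), xs.flatten()], dim=1)   # (N, 2)
    dist = (pos[:, None] - pos[None, :]).abs().amax(-1)      # (N, N)
    mask = torch.full((h * w, h * w), float("-inf"))
    mask[dist <= radius] = 0.0
    return mask
```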
@pastrop2003 3 years ago
For the generator network, do I understand correctly that when you use the example of a 4-pixel image that starts the generation and say that every pixel is a token going into a transformer, you imply that each of these tokens has an embedding with dimensionality equal to the number of channels? I.e., if one starts with a 2x2 image with 64 channels, every pixel (token) has a 64-dimensional embedding going into the transformer?
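That reading matches the usual setup: an MLP maps the latent vector to H*W*C values, which are then viewed as H*W tokens of dimension C. A toy sketch with the comment's sizes (my own code, not the paper's):

```python
import torch
import torch.nn as nn

z = torch.randn(1, 128)                   # latent vector (toy size)
to_tokens = nn.Linear(128, 2 * 2 * 64)    # a 2x2 grid with 64 channels
tokens = to_tokens(z).reshape(1, 4, 64)   # 4 pixel-tokens, 64-dim each
# These tokens (plus positional embeddings) go into the transformer;
# the channel count doubles as the embedding dimension.
```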
@simonstrandgaard5503 4 years ago
Impressive
@SequoiaAlexander 3 years ago
Thanks for the video about this paper. Just what I was looking for. I will kindly suggest that the comment about the bathrooms would likely make some trans people uncomfortable. It is an unfortunate name for this research. Maybe best to leave it at that. Cheers and thanks for your work.
@nasenach 3 years ago
Actually "FID score" is also kinda wrong, since the D already stands for distance here... Ok, nerd out.
@SHauri-jb4ch 3 years ago
I like your videos, but if you read the word "trans" and your first association is that it has something to do with bathrooms, you should reflect on your prejudices. When you reach this many people, you should be aware that you 'could' have a diverse audience. Unfortunately, we're not that far along in ML yet, and in my opinion microaggressions like this contribute to it staying that way.
@sudo42b 4 years ago
Wow
@phoenixwithinme 4 years ago
Yannic gets them, the ML papers, like in the targeted distance. 😂
@ihoholko9522 4 years ago
Hi, what program are you using for paper reading?
@alpers.2123 4 years ago
OneNote
@ihoholko9522 4 years ago
@@alpers.2123 Thanks
@paiwanhan 3 years ago
"Gan" actually means the act of copulation in Mandarin. So TransGAN is even more unfortunate.
@pratik245 3 years ago
AI papers are like news articles now... so many and so similar.
@bluestar2253 3 years ago
Convnets are dead, long live transformers! -- reminded me of the late 80s "AI is dead, long live neural nets!" Karma is a bitch.
@siquod 4 years ago
But what will transGANs do to the ganetics of organic crops if their pollen gets into the wild?
@G12GilbertProduction 3 years ago
"Discriminator" sounds even more cancel-worthy. xD
@JFIndustries 3 years ago
The joke about the name was really unnecessary
@xanderx8289 4 years ago
TransGen
@tarmiziizzuddin337 4 years ago
"Convolutions are for losers"... 😅
@panhuitong 4 years ago
"Convolutions are for losers"... feeling sad about that
@circuitguy9750 3 years ago
For the sake of your colleagues and students, I hope you realize how your "trans bathroom" joke is harmful, disrespectful, and unprofessional.