TransGAN: Two Transformers Can Make One Strong GAN (Machine Learning Research Paper Explained)

33,675 views

Yannic Kilcher

1 day ago

Comments
@Ronnypetson 4 years ago
Yannic is like that nurse that mashes the potato with a spoon and gives it to you so that you toothless nerds can get fed
@YannicKilcher 4 years ago
This made me laugh so hard :D
@cerebralm 4 years ago
LOL
@theoboyer3812 4 years ago
That's a funny summary of what a teacher is
@cerebralm 4 years ago
@@theoboyer3812 I've heard concise explanation described as "you have to build IKEA furniture without using all the pieces, but it still has to be sturdy when you're done"
@swordwaker7749 3 years ago
Ahh... more like a chef. Papers in their original form can be hard to digest without... some help. BTW, the paper is like dragon meat.
@finlayl2505 4 years ago
Relationship ended with conv nets, transformers are my best friend now
@hoaxuan7074 4 years ago
A Fast Transform fixed-filter-bank neural network trained as an autoencoder works quite well as a GAN. Noise in, image out. I guess with "filter" in the title...
@lunacelestine8574 3 years ago
That made my day
@dasayan05 4 years ago
25:57 Convolutions are for losers, we're all for locally applied linear transformations... 😂
@xtli5965 3 years ago
They actually updated the paper: they no longer use super-resolution co-training and locality-aware initialization, and instead use relative positional embeddings and a modified normalization. They also tried larger images with local self-attention to reduce the memory bottleneck. The most confusing part of this paper for me is the UpScale and AvgPool operations, since outputs from a transformer are supposed to be global features, so it feels strange to directly upsample or pool them the way we do with convolutional features.
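For anyone puzzled by the same thing, here is a minimal PyTorch sketch (my own illustration with assumed shapes, not the authors' code) of how a token sequence can be treated as a 2D grid so that pixel-shuffle upscaling and average pooling apply directly:

```python
import torch
import torch.nn as nn

# Hypothetical shapes: a transformer stage whose H*W tokens are treated as an
# H x W grid of C-dimensional "pixels", so image-style resampling applies.
B, H, W, C = 4, 8, 8, 256
tokens = torch.randn(B, H * W, C)                    # (batch, num_tokens, channels)

grid = tokens.transpose(1, 2).reshape(B, C, H, W)    # back to a 2D feature map

# UpScale (generator side): pixel-shuffle trades channels for resolution
upscaled = nn.PixelShuffle(2)(grid)                  # (B, C/4, 2H, 2W)

# AvgPool (discriminator side): halve the resolution of the token grid
pooled = nn.AvgPool2d(2)(grid)                       # (B, C, H/2, W/2)

# Flatten back to token sequences for the next transformer stage
up_tokens = upscaled.flatten(2).transpose(1, 2)      # (B, 4*H*W, C/4)
down_tokens = pooled.flatten(2).transpose(1, 2)      # (B, H*W/4, C)
```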
@puneetsingh5219 4 years ago
Yannic is on fire 🔥🔥
@rallyram 4 years ago
Why do you think they went with the WGAN gradient penalty instead of spectral normalization, as per Appendix A.1?
@hk2780 4 years ago
So why should we avoid convolutions if we're using locally linear functions anyway? I don't get the point. Also, why do they use those 16 crop things? To be honest, it's almost the same as a convolution with a 16×16 kernel and stride 16. And then they say they don't use convolutions, while doing the same thing a convolution does. Sounds like it becomes more of a con artist thing.
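That last point can actually be checked numerically; here is a small sketch (hypothetical shapes, weights shared by hand) showing that a 16×16 patch embedding and a convolution with a 16×16 kernel and stride 16 compute the same function:

```python
import torch
import torch.nn as nn

# Patchify + linear projection (the "no convolution" path) vs. a strided conv.
B, C, H, W, D = 2, 3, 64, 64, 192
x = torch.randn(B, C, H, W)

proj = nn.Linear(C * 16 * 16, D, bias=False)
conv = nn.Conv2d(C, D, kernel_size=16, stride=16, bias=False)
conv.weight.data = proj.weight.data.view(D, C, 16, 16)            # share the weights

# Cut into 16x16 patches, flatten each, project
patches = x.unfold(2, 16, 16).unfold(3, 16, 16)                   # (B, C, 4, 4, 16, 16)
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, 16, -1)    # (B, 16 patches, C*16*16)
out_linear = proj(patches)                                        # (B, 16, D)

# Strided convolution path
out_conv = conv(x).flatten(2).transpose(1, 2)                     # (B, 16, D)

print(torch.allclose(out_linear, out_conv, atol=1e-5))            # True
```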
@dl9926 3 years ago
But that would be so expensive, wouldn't it?
@wilsonthurmanteng9 4 years ago
Hi Yannic, fast reviews as usual! I would just like your thoughts on the loss functions of the recent Continuous Conditional GAN paper that was accepted at ICLR 2021.
@MightyElemental 1 year ago
I was attempting to build a TransGAN for a university project and ended up with a very similar method. The only thing missing was the localized attention. No way was I gonna get that 💀
@dasayan05 4 years ago
1:11 "which bathroom do the TransGANs go to?"
@nguyenanhnguyen7658 3 years ago
There is no high-res benchmark for TransGAN vs. StyleGAN2, so we don't know if it's worth trying.
@G12GilbertProduction 3 years ago
12:51 Wait... 3 samples for the 1 × 156 pixel upsampled patch of data is corigates between the r² (alpha) and r² (beta) + ... r² (omega) channel transformers, or even 156 layer architecture base to finitely decoding he was recreating themself upper to 9 samples, right?
@tnemelcfljdsqkf9529 3 years ago
Thanks a lot for your work, it's helping me a lot! Which software are you using to take notes on top of the paper like this? :)
@raunaquepatra3966 4 years ago
I didn't get the point of data augmentations for generators. Isn't the number of input samples practically infinite? I mean, I can feed in as many random vectors and get as many samples as needed?
@tedp9146 4 years ago
How exactly is the classification head attached to the last transformer block?
@Snehilw 4 years ago
Great explanation!
@minhuang8848 4 years ago
Dang, you learned some Chinese phonemes, didn't you? Pronunciation was pretty on point!
@dasayan05 4 years ago
YannicNet has been trained for several years now on the AuthorName dataset. No wonder the output quality is good
@minhuang8848 4 years ago
@@dasayan05 That's the good-ass Baidu language models lol
@florianhonicke5448 4 years ago
I like your jokes a lot. It is much easier for me to learn something when it is fun!
@syslinux2268 4 years ago
What is your opinion on MIT's new "Liquid Neural Network"?
@YannicKilcher 4 years ago
Haven't looked at it yet, but I will
@syslinux2268 4 years ago
@@YannicKilcher Similar to an RNN, but instead of scaling into billions of parameters it focuses on higher-quality neurons. Fewer parameters, but with results as good as or better than average-sized neural networks.
@dasayan05 4 years ago
@@syslinux2268 Paper link? Is it public yet?
@shivamraisharma1474 4 years ago
Do you mean the paper on liquid time-constant neural networks?
@syslinux2268 4 years ago
@@shivamraisharma1474 Yep. It's just too long to type.
@MrMIB983 3 years ago
Great video
@romanliu4629 3 years ago
An arithmetic question: what is the parameter count of the linear transforms before the "Scaled Dot-Product Attention" that produce Q, K, and V when synthesizing 256² images? If we reverse the roles of the "flattened" spatial axes and the channel axis, how is this related to, or different from, a 1×1 convolution? And why flatten and reshape features and upscale images via pixel-shuffle, which can disrupt spatial information and lead to checkerboard artefacts?
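On the first two parts of that question, a rough sketch (my own illustrative numbers, not from the paper): each of the Q/K/V projections is a C×C matrix applied token-wise, so its size is independent of how many tokens the 256² image is split into, and the same map can be written as a 1×1 convolution over the reshaped feature grid. The pixel-shuffle/checkerboard concern is a separate issue this doesn't address.

```python
import torch
import torch.nn as nn

# Token-wise Q/K/V projections vs. an equivalent 1x1 convolution.
B, H, W, C = 1, 32, 32, 384                  # e.g. a 32x32 token grid, 384 channels
tokens = torch.randn(B, H * W, C)

qkv = nn.Linear(C, 3 * C, bias=False)        # Q, K, V in one projection
print(sum(p.numel() for p in qkv.parameters()))   # 3 * C^2 = 442368, independent of H*W

conv1x1 = nn.Conv2d(C, 3 * C, kernel_size=1, bias=False)
conv1x1.weight.data = qkv.weight.data.view(3 * C, C, 1, 1)   # same weights

grid = tokens.transpose(1, 2).reshape(B, C, H, W)
out_conv = conv1x1(grid).flatten(2).transpose(1, 2)   # (B, H*W, 3C)
out_lin = qkv(tokens)                                 # (B, H*W, 3C)
print(torch.allclose(out_conv, out_lin, atol=1e-5))   # True
```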
@tylertheeverlasting 4 years ago
What would have been the issue with an ImageGPT-like generator? Would it be too slow to train due to serial generation?
@YannicKilcher 4 years ago
Apparently, transformer generators have just been unstable for GANs so far
@KanugondaRohithKumar-h6r 1 year ago
Sir, can you please provide the TransGAN training and testing code?
@АлексейТучак-м4ч 4 years ago
What if we change the feedforward layer in the transformer to another transformer? Like a nested transformer
@raunaquepatra3966 4 years ago
In the super-resolution auxiliary task, how is the LR image calculated? Especially, how is the number of channels matched to the input channels of the network? E.g. SR image 64×64×3, LR image 8×8×3???? (but the network needs 8×8×192)
@raunaquepatra3966 4 years ago
Couldn't find anything in the paper either 😔
@array5946 4 years ago
17:15 - is cropping a differentiable operation?
@udithhaputhanthri2002 3 years ago
I think what he says is: if cropping is a differentiable operation, we can use it.
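For what it's worth, plain cropping (tensor slicing) is differentiable in the autograd sense: gradients flow into the kept region and are zero everywhere else. A tiny check (my own example):

```python
import torch

x = torch.randn(1, 3, 8, 8, requires_grad=True)
crop = x[:, :, 2:6, 2:6]          # a 4x4 crop
crop.sum().backward()

print(x.grad[0, 0, 2:6, 2:6])     # ones inside the crop
print(x.grad[0, 0, 0, 0])         # zero outside the crop
```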
@Kram1032 4 years ago
Can't help but think that the upsampling stuff is kinda like implicit convolutions... Not that it'd be particularly reasonable to not do this, but it's setting up a similar localized-attention type of deal.
@WhatsAI 3 years ago
Hey Yannic, love the video! May I ask what tools you are using to read this paper, highlight the lines, and record it? Thanks! :)
@CosmiaNebula 3 years ago
Likely a Microsoft Surface with a pen. Then any PDF annotator would work; even Microsoft Edge's PDF reader has that. As for screen recording, try OBS Studio. Would you make your own paper-reading videos?
@WhatsAI 3 years ago
@@CosmiaNebula Thank you for the answer! I would indeed like to try that style of videos sometime, but my initial question was mainly because I would love to use something similar in meetings to show explanations, math, etc.
@chuby35 4 years ago
Could this be used with VQ-VAE-2, so the lower-res "images" that are fed into this TransGAN are actually the latent-space representations produced by VQ-VAE-2?
@timoteosozcelik 4 years ago
How do they give the LR image as input to Stage 2? What I've understood so far is that the number of channels decreases over the stages (which means Stage 2 has more than 3 channels), but the LR image will have only 3 channels.
@chuby35 4 years ago
Probably using the same trick used for the upsampling, the other way around: scaling down the image by moving the information into more channels. (Since this aux task is only used for teaching the upsampling to work properly, I don't think the LR images lose any information; it's just rearranged into these "super-pixels".) But I haven't looked at the code yet, so my guess is as good as any. :)
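A small sketch of that reverse trick (space-to-depth / pixel-unshuffle) with the shapes mentioned above; this is a guess at how the channel counts could be matched, not something taken from the TransGAN repo:

```python
import torch
import torch.nn as nn

lr_image = torch.randn(1, 3, 64, 64)           # e.g. a 64x64 RGB image

unshuffle = nn.PixelUnshuffle(8)               # factor 8: 3 * 8 * 8 = 192 channels
grid = unshuffle(lr_image)                     # (1, 192, 8, 8) -- nothing is discarded
tokens = grid.flatten(2).transpose(1, 2)       # (1, 64, 192): an 8x8 grid of 192-dim tokens

# Round trip recovers the original image exactly
recovered = nn.PixelShuffle(8)(grid)
print(torch.equal(recovered, lr_image))        # True
```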
@timoteosozcelik 4 years ago
@@chuby35 It makes sense, thanks. But applying what you said directly made me question the necessity (meaning) of Stage 3 in such cases. I checked the code, but couldn't see anything about that.
@lucasferos 4 years ago
Thank you
@etiennetiennetienne 4 years ago
Cool bag of tricks! Instead of this hardcoded mask, could it just be an initialization problem? If the probability of predicting a positional-encoding vector agreeing with a far-away vector is low at the beginning of training?
@pastrop2003 3 years ago
For the generator network, do I understand correctly that when you use the example of a 4-pixel image that starts the generation and then say that every pixel is a token going into a transformer, you imply that each of these tokens has an embedding with dimensionality equal to the number of channels? I.e. if one starts with a 2x2 image with 64 channels, every pixel (token) has a 64-dimensional embedding going into the transformer?
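That reading matches my understanding (an assumption on my part, not checked against the code): each spatial position is one token and the channel dimension is its embedding. A tiny sketch with those exact numbers:

```python
import torch
import torch.nn as nn

# A 2x2 feature map with 64 channels becomes 4 tokens of dimension 64.
B, C, H, W = 1, 64, 2, 2
feature_map = torch.randn(B, C, H, W)

tokens = feature_map.flatten(2).transpose(1, 2)   # (1, 4, 64): 4 tokens, 64-dim each

layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
out = layer(tokens)                               # (1, 4, 64), same shape out
```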
@simonstrandgaard5503 4 years ago
Impressive
@SequoiaAlexander 3 years ago
Thanks for the video about this paper. Just what I was looking for. I will kindly suggest that the comment about the bathrooms would likely make some trans people uncomfortable. It is an unfortunate name for this research. Maybe best to leave it at that. Cheers and thanks for your work.
@nasenach 3 years ago
Actually "FID score" is also kinda wrong, since the D stands for distance here... Ok, nerd out.
@SHauri-jb4ch 3 years ago
I like your videos, but if you read the word "trans" and your first association is that it has something to do with bathrooms, you should reflect on your prejudices. When you reach this many people, you should be aware that you 'could' have a diverse audience. Unfortunately, we're not that far along in ML yet, and in my opinion microaggressions like this contribute to keeping it that way.
@sudo42b 4 years ago
Wow
@phoenixwithinme 4 years ago
Yannic gets them, the ML papers, like in the targeted distance. 😂
@ihoholko9522 4 years ago
Hi, what program are you using for reading papers?
@alpers.2123 4 years ago
OneNote
@ihoholko9522 4 years ago
@@alpers.2123 Thanks
@paiwanhan 3 years ago
Gan actually means the act of copulation in Mandarin. So TransGAN is even more unfortunate.
@pratik245 3 years ago
AI papers are like news articles now... so many and so similar
@bluestar2253 3 years ago
"Convnets are dead, long live transformers!" -- reminded me of the late-80s "AI is dead, long live neural nets!" Karma is a bitch.
@siquod 4 years ago
But what will transGANs do to the ganetics of organic crops if their pollen gets into the wild?
@G12GilbertProduction 3 years ago
Discriminator sounds like more cancelingly. ×D
@JFIndustries 3 years ago
The joke about the name was really unnecessary
@xanderx8289 4 years ago
TransGen
@tarmiziizzuddin337 4 years ago
"Convolutions are for losers".. 😅
@panhuitong 4 years ago
"Convolution is for the loser"..... feeling sad about that
@circuitguy9750 3 years ago
For the sake of your colleagues and students, I hope you realize how your "trans bathroom" joke is harmful, disrespectful, and unprofessional.
@ioannispanop 2 years ago
Very nice transphobic joke! #unsub
@FreakFolkerify 4 years ago
Can it turn my male dog into Female?