Generate long form video with Transformers | Phenaki from Google Brain explained

Views: 11,365

AI Coffee Break with Letitia

Days ago

Comments: 26
@automatescellulaires8543 2 years ago
This will revolutionize the meme market.
@DerPylz 2 years ago
Thank you for also explaining Phenaki! I was curious about a non-diffusion model for video generation! 🎊
@davidyang102 2 years ago
Because it resumes generation from only a few frames, it will lose context. Imagine generating a paragraph and then the next one using only the last word you generated. Luckily, images capture a lot of information, so it's not that obvious. But, for example, you can't do a video that looks around 360 degrees if it's generated in two iterations. Very dreamlike.
@federicolusiani7753 2 years ago
Thank you for your video, great content as always! One question: in the video, you say that the video encoder is auto-regressive, so that it can be used on an arbitrary number of video patches. But aren't standard transformer encoders already able to process inputs of arbitrary length? Usually the auto-regressive architecture is used in the decoder, because at inference time we need it to generate the output causally. Am I missing something?
@AICoffeeBreak 2 years ago
Thanks for this great question. Transformer sequence length is an interesting topic, which we've already discussed here: kzbin.info/www/bejne/jqnXpGSfqc2opqs Basically, even if a transformer can take in or generate variable-length sequences, it still has a predefined maximum input/output length due to practical limitations (compute time and memory). You are asking whether a causal model could generate infinitely long video and, for practical reasons, the answer is no. Unmodified causal attention means that each new token attends to the whole generated past. For very long sequences, the attention window grows linearly, and the total computation time and memory grow quadratically. So because of limited compute time and memory, we cannot generate indefinitely, unless one applies tricks like the Phenaki authors do with MaskGIT, attending only to a small fraction of the tokens of the previously generated output.
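To make the scaling argument in the reply above concrete, here is a minimal sketch that only counts query-key pairs. It is not Phenaki's or MaskGIT's implementation: the function names and the fixed `window` parameter are assumptions made for illustration, and a sliding window is just one generic way to restrict attention to part of the past.

```python
def full_causal_pairs(n: int) -> int:
    """Query-key pairs when token i attends to every token 0..i (itself included)."""
    return n * (n + 1) // 2


def windowed_causal_pairs(n: int, window: int) -> int:
    """Query-key pairs when each token attends to at most `window` recent tokens.

    `window` is a hypothetical parameter; it counts the current token too.
    """
    return sum(min(i + 1, window) for i in range(n))


if __name__ == "__main__":
    for n in (1_000, 2_000, 4_000):
        # Doubling n roughly quadruples the full cost but only doubles the windowed cost.
        print(n, full_causal_pairs(n), windowed_causal_pairs(n, window=256))
```

With full causal attention, the total work grows roughly with the square of the sequence length, while a bounded window keeps it roughly linear, at the price of losing access to older context.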
@Handelsbilanzdefizit 2 years ago
Maybe there will be a way to visualize memories and dreams, by using Electroencephalography (EEG) and Neural Networks. So you can see what others think. Or see what others see, through their eyes.
@mrinmoybanik5598 2 years ago
Good luck collecting the training dataset 🙂
@johnkintner 2 years ago
Researchers have already used fMRI to do something similar! This was a while ago :D
@rewixx69420 2 years ago
I so badly want infinite video generation with diffusion models.
@AICoffeeBreak 2 years ago
Soon. Just give Google some time to mount more TPUs in their racks. 😅
@AICoffeeBreak 2 years ago
twitter.com/_akhaliq/status/1595645248243650560?t=PHepVXOP40pPdc5q3upUbQ&s=19 What about this? I didn't look into it.
@summary7428 1 year ago
Great video, but I think it was wrongly placed in your (awesome) diffusers playlist =)
@AICoffeeBreak 1 year ago
You are right, it is not a diffusion model. It's about content generation. 😅 I was more comfortable with it being in this playlist (especially as the last video in the row) rather than being nowhere close to its fellow competition. But sure, I do not have the Paella video in the list, although Paella can be argued to be a diffusion model. I need to clean up.
@elev007 2 years ago
Great explanation, thank you 🙏
@barberb 2 years ago
Thank you, Letitia
@TheGatoskilo 1 year ago
I wonder how they pad the video tensors with variable sequence lengths.
@AICoffeeBreak 1 year ago
Do you see this as problematic?
@TheGatoskilo 1 year ago
I just wonder about the implementation level: for these padding values, as well as for masking the tokens, did someone decide that we fill these tensors with 0s? Does it matter what we fill those vectors with? What if these padded/masked 0s overlap with actual data? How do we effectively instruct the model to disentangle masked values from 0s that correspond to actual data?
@TheGatoskilo 1 year ago
@@AICoffeeBreak No, I just wonder how it works in the implementation
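On the padding question in this thread: the thread does not say how Phenaki handles it, so the following is only a generic sketch of a common approach, with hypothetical helper names (`pad_sequences`, `masked_mean`). The idea is to carry a separate boolean mask alongside the padded values, so the model never has to infer padding from the fill value itself.

```python
def pad_sequences(seqs, pad_value=0.0):
    """Right-pad sequences to equal length and return (padded, mask).

    mask[i][j] is True for real data and False for padding, so a real 0.0
    is never confused with a padded 0.0.
    """
    max_len = max(len(s) for s in seqs)
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in seqs]
    mask = [[True] * len(s) + [False] * (max_len - len(s)) for s in seqs]
    return padded, mask


def masked_mean(values, mask):
    """Average only the positions the mask marks as real data."""
    kept = [v for v, m in zip(values, mask) if m]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    # The first sequence contains a genuine 0.0; the second is padded.
    padded, mask = pad_sequences([[0.0, 4.0], [3.0]])
    print(padded, mask)
    print(masked_mean(padded[0], mask[0]), masked_mean(padded[1], mask[1]))
```

Because the mask decides which positions count, the fill value is arbitrary in this sketch: padding with -99.0 instead of 0.0 gives the same masked results. In attention layers the same mask is typically used to exclude padded positions before the softmax, and masked tokens in MaskGIT-style models are commonly represented by a dedicated mask token rather than by zeros.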
@TimScarfe 2 years ago
Amazeballs ❤
@AICoffeeBreak 2 years ago
Thanks, Tim! 😅 Happy to see MLST release content after what feels like a long time!
@sadface7457 2 years ago
Hello MsCB 👋
@AICoffeeBreak 2 years ago
Hello Sad Face! Haven't seen you in the comments in a long time! 👋
@DerPylz 2 years ago
Woohooo! Sad Face is back! 🎊
@VivaLaRevoMW3PS3 2 years ago
Boring video. There are already like 100 of this kind; people want to know where or how they can use it.