Understood a little bit this time, and it's impressive how the parallax and the robotic arm control work.
@AICoffeeBreak 8 months ago
I love your perseverance!
@DeepFindr 8 months ago
So we can expect a coffee bean game soon? ;-)
@AICoffeeBreak 8 months ago
That would be so cool! DeepMind, give us access!
@sasankpotluri4422 8 months ago
Thank you very much for an amazing video. This looked very close to the V-JEPA paper.
@AICoffeeBreak 8 months ago
Yes, it totally reminds me of what Yann LeCun says about the importance of pure observation (no embodiment or RL).
@harumambaru 8 months ago
What an amazing time to be alive! :)
8 months ago
I have both a video game and a machine learning background, so this research is extra curious to me. But I would take a more granular approach rather than try to generate a whole game from little data: I would instead convert a standard game creation pipeline into multiple generation and revision stages. What do you think, Letitia?
@AICoffeeBreak 8 months ago
Did you casually call 300,000 hours of YouTube data "little data"? I do not know if I understood that right, just wondering while being amused. :)
Now to the very good point you raised. Sure, game development is a pipeline, and instead of trying to generate everything at once (image, actions -> frames), one could help individual steps in that pipeline. For example, NVIDIA has neural nets that predict plausible object or game character animations. I am not advocating against deep learning helping individual steps in game development; for the present day, I think that is the way to go. But the aim for the long term would be to generate everything in one go (Genie style). It is hard and requires a lot of data, but when this succeeds (and I do not expect it to succeed within just a few years), it spares a lot of pipeline engineering and having to get things right at each step. This matters especially because pipelines suffer from error propagation: if the first step is wrong, the whole pipeline is wrong and hard to recover from. Imagine if we thought about image generation as a pipeline too: get the coloring right, then the lighting, then the textures, all in stages. With enough data, we do not need pipelines to break problems down into smaller ones. In the same way that diffusion models generate images or videos in one go, Genie could too, with actions as prompts.
Btw, Genie also makes the point Yann LeCun keeps making, namely that observing data alone (no RL or embodiment) can really discover important features, such as actions.
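To make the "image, actions -> frames" idea above concrete, here is a minimal PyTorch-style sketch of a Genie-like interface: a latent action model that infers a discrete action from two consecutive frames (no action labels needed), and a dynamics model that predicts the next frame from the frame history plus that action. All module names, layer choices, sizes, and the 8-action codebook are illustrative assumptions, not DeepMind's actual architecture (which uses spatiotemporal transformers over video tokens).

```python
import torch
import torch.nn as nn

# Sketch only: the real Genie components are ST-transformers over video tokens;
# the layers below are simple stand-ins with made-up sizes.

class LatentActionModel(nn.Module):
    """Infers a discrete latent action from two consecutive frame embeddings."""
    def __init__(self, frame_dim=512, num_actions=8):
        super().__init__()
        self.to_logits = nn.Linear(2 * frame_dim, num_actions)

    def forward(self, frame_t, frame_t1):
        logits = self.to_logits(torch.cat([frame_t, frame_t1], dim=-1))
        return logits.argmax(dim=-1)  # index into a small codebook of latent actions


class DynamicsModel(nn.Module):
    """Predicts the next frame embedding from the frame history and a latent action."""
    def __init__(self, frame_dim=512, num_actions=8):
        super().__init__()
        self.action_embed = nn.Embedding(num_actions, frame_dim)
        self.core = nn.GRU(frame_dim, frame_dim, batch_first=True)
        self.head = nn.Linear(frame_dim, frame_dim)

    def forward(self, frame_history, action):
        # frame_history: (batch, time, frame_dim); action: (batch,) integer latent actions
        x = frame_history + self.action_embed(action).unsqueeze(1)
        out, _ = self.core(x)
        return self.head(out[:, -1])


# "Playing" = prompting with a start frame and feeding latent actions step by step.
dynamics = DynamicsModel()
frames = torch.randn(1, 1, 512)        # a single prompt-frame embedding
for a in [0, 3, 3, 1]:                 # player-chosen latent actions
    next_frame = dynamics(frames, torch.tensor([a]))
    frames = torch.cat([frames, next_frame.unsqueeze(1)], dim=1)
```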
@AICoffeeBreak 8 months ago
I know that Genie's outputs today are very low resolution and extremely short. 😅 I just think of them as the first silly GAN-generated faces from 10 years ago. But wait 10 more years and this could really go somewhere.
8 months ago
@@AICoffeeBreak I meant little "input" data, not "training" data, of course. The less you provide, the less control you have over minute details. If I want to make an impactful game, I want lots of dials to turn. Just providing a prompt with a video would not be enough for commercial games for now. But in the future, maybe there will be a greater variety of inputs, and prompts might become like GDDs (game design documents); then it could generate hyper-casual games.
@AICoffeeBreak 8 months ago
Haha, now this makes sense. 😅
@DerPylz 8 months ago
Thanks so much for the explanation! I'm a bit sceptical about this approach and the use cases, but I'm excited to be surprised by where this will go in the future.
@AICoffeeBreak 8 months ago
What are you sceptical about exactly?
@DerPylz 8 months ago
@@AICoffeeBreak I think AI can work great for the procedural generation of background stuff in large open worlds, but I don't quite foresee it actually creating gameplay. Small games are usually either all about a specific gameplay idea or about telling a story (oftentimes with environmental storytelling), so letting an AI take over there would mean losing a lot of control (as even the best models today often still just ignore parts of the prompt at random). But maybe I'm a bit too narrow-minded here and I'll be surprised :D
@AICoffeeBreak 8 months ago
@@DerPylz Maybe gameplay is more promptable from text than we think. Of course, the source of entropy must come from somewhere, and we need human prompters to nudge it a bit in the direction it should go. But I estimate that a lot of the details would just be placed by the AI.
@Micetticat 8 months ago
This is a Genie-al architecture!
@AICoffeeBreak 8 months ago
@MachineLearningStreetTalk 8 months ago
🧠🤌💪
@AICoffeeBreak 8 months ago
@hannesstark5024 8 months ago
Thanks
@AICoffeeBreak 8 months ago
@tomoki-v6o 8 months ago
Reminds me of deep Q-learning. I don't know why.
@zenithparsec 8 months ago
It's a Game Genie. Or is that too obvious? Or too obtuse?