As a 3D artist, filmmaker, and actor, I'm super excited about Sora. I can't wait to play around with this tech. It's pretty crazy how all these modalities are happening at once: image, video, voice, sound effects, and music, all the pipelines needed to create media. There will be a time, not far off, when we can plug in a prompt and Sora 5 will create all the needed departments. As the human working with this, I would of course be heavily involved in the iterative generation and direction of each piece of media, and in the end the edit would be mine. I wonder how much 'authorship' a creator will have, or be given.
@boonkiathan · 6 months ago
But prior to commercially utilizing Sora's output, there must be clarity on the source of the training data. It can't just be OpenAI pushing it to creators and the creators saying they trust OpenAI. This is almost the exact same issue as text generation for fun and brainstorming; fair use, I suppose.
@char_art · 4 months ago
🤡🤡🤡
@jonkraghshow · 6 months ago
Really great interview. Thanks to all.
@erniea5843 · 6 months ago
Cool interview, awesome to see a glimpse into the innovation being done to develop these video models.
@leslietetteh7292 · 6 months ago
Interesting video! It really highlights the potential of using 3D tokens with time as an added dimension :). My experience with diffusion models and video generation didn't show anything quite like Sora's temporal coherence. Looking ahead, I'm excited about the prospect of evolving from polygon rendering to photorealism via image-to-image inference. While I might be biased because of my interest in that rendering approach, I think incorporating 'possibility' as an additional dimension, as suggested by "imagining higher dimensions", could address issues like the leg-switching effects we currently see. Such physics-consistent behavior could potentially be borrowed from game-engine scenarios, where, unlike an apple that behaves predictably when dropped, a leg has specific movement constraints (also affected by perspective shifts). It's a speculative route, but it might be worth exploring if it promises substantial improvements.
@tianjiancai1118 · 6 months ago
Maybe internal 3D modeling should be introduced to solve the issue you mentioned (leg switching, or so-called "entity inconsistency").
@leslietetteh7292 · 6 months ago
@tianjiancai1118 How so? (NB: are you familiar with how diffusion models work? It's just learning to denoise an image, or a cube in this case. I'm only suggesting that it learn to denoise the branching possibilities rather than a cube, so it knows what is not a possibility; suggesting, not guaranteeing, that the idea will work. There are things like ControlNets, though, so if this internal 3D modelling is a valid idea, please share.)
@tianjiancai1118 · 6 months ago
Sorry, to clarify: internal 3D modeling is hard to achieve in a diffusion model (as far as I know). What I mean is something like a totally new architecture.
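For anyone following the "learning to denoise a cube" framing in the thread above, here is a minimal, hypothetical PyTorch sketch of the generic noise-prediction objective applied to a spacetime volume of video latents. The Conv3d stand-in, the toy tensor shapes, and the linear noise schedule are illustrative assumptions, not Sora's actual architecture or training code.

```python
# Minimal sketch (assumptions: toy shapes, Conv3d stand-in denoiser, linear
# noise schedule). Not Sora's actual code; just the generic "denoise a cube"
# objective discussed in the thread above.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy batch of video latents: (batch, channels, time, height, width)
clean = torch.randn(2, 4, 8, 32, 32)

# Hypothetical denoiser; a real system would use a diffusion transformer
# over spacetime patches rather than a single 3D convolution.
denoiser = nn.Conv3d(4, 4, kernel_size=3, padding=1)

# Corrupt the whole spacetime cube with one noise level per sample.
t = torch.rand(clean.shape[0]).view(-1, 1, 1, 1, 1)   # noise level in [0, 1]
noise = torch.randn_like(clean)
noisy = (1.0 - t) * clean + t * noise                  # simple interpolation schedule

# Noise-prediction loss: learn to recover the noise that was added.
loss = F.mse_loss(denoiser(noisy), noise)
loss.backward()
print(f"toy denoising loss: {loss.item():.4f}")
```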
@garsett · 6 months ago
Smart! 😊 Personalisation and aesthetics. Cool. But also PRACTICAL worldbuilding, please. How can this help create quality lifestyles? Happy communities? A convivial society?
@amritbro · 6 months ago
I'm definitely following these three talented guys on X. Really great interview, and without a doubt Sora is already making an impact in Hollywood like Pixar once did during the Steve Jobs era.
@EnigmaCodeCrusher · 6 months ago
Great interview
@JustinHalford · 6 months ago
Compute and data are converging on becoming interchangeable sides of the same coin. Flops are all you need.
@oiuhwoechwe · 6 months ago
I'm old. These guys look like they just left high school.
@voncolborn9437 · 6 months ago
Haha, I'm 71. I know exactly what you mean. The average age of the developers of the first Mac was 28. The AI community seems so young, but that gives these super-smart people a lot of years to get things straightened out.
@mosicr · 6 months ago
They almost have. Peebles is just out of university.
@AIlysAI · 6 months ago
Really, all these amazing things are possible just with transformers; there isn't much innovation beyond applying transformers to X and scaling it. The most innovative thing they did was the tokenization method, treating video as boxes; the rest is mechanics.
@leslietetteh7292 · 6 months ago
Adding another axis in the form of imaginary numbers improved our ability to model higher-dimensional interactions before. That's negative, bordering on bias: if it isn't innovation, then why didn't everyone else do it?
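As a rough illustration of the "tokenization method as boxes" mentioned above, the sketch below chops a video tensor into non-overlapping spacetime boxes and flattens each into a token vector. The tensor shapes, patch sizes, and the helper name to_spacetime_patches are made-up assumptions for illustration, not the actual implementation.

```python
# Sketch of "boxes" tokenization (spacetime patches). Shapes and patch sizes
# are illustrative assumptions, not the real configuration.
import torch

def to_spacetime_patches(video: torch.Tensor, pt: int = 2, ph: int = 4, pw: int = 4) -> torch.Tensor:
    """Split a (B, C, T, H, W) video into non-overlapping pt x ph x pw boxes,
    each flattened into one token of length C * pt * ph * pw."""
    b, c, t, h, w = video.shape
    x = video.reshape(b, c, t // pt, pt, h // ph, ph, w // pw, pw)
    # Bring the three patch-grid axes forward, keep per-box contents together.
    x = x.permute(0, 2, 4, 6, 1, 3, 5, 7)        # (B, T/pt, H/ph, W/pw, C, pt, ph, pw)
    return x.reshape(b, -1, c * pt * ph * pw)    # (B, num_tokens, token_dim)

video = torch.randn(1, 4, 8, 32, 32)             # toy latent video
tokens = to_spacetime_patches(video)
print(tokens.shape)                              # torch.Size([1, 256, 128])
```

A diffusion transformer would then attend over these tokens jointly across space and time, which is one way to read the "boxes" remark.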
@phen-themoogle7651 · 6 months ago
The Matrix basically
@BadWithNames123 · 6 months ago
vocal fry contest
@jeffspaulding43 · 6 months ago
Our subconscious does a much better job at modeling physics. Your conscious mind imagines the apple falling vaguely; your subconscious mind can learn to juggle several apples without dropping them, so it knows when they will be where.
@leslietetteh7292 · 6 months ago
We perceive possibility (which can be thought of as an extra dimension, an idea from "imagining extra dimensions"). I would think that if trained on branching "possibilities", it would model physics much more consistently. But especially with polygon rendering to photoreal image-to-image inference on the horizon, there's more of a focus on speeding up inference these days (see Meta's amazing work on "Imagine Flash" with Emu). With this sort of temporal consistency, if OpenAI manages to get inference speed up, you could just use a traditional video-game physics engine with photoreal inference laid on top. It'll probably sell a lot, especially if they map electrical signals through the spinal cord to touch input and replicate that. Seeing and touching the real world through VR will be epic, and yeah, it will probably sell loads. You could train the next generation of AI engineers (think deep-sea or deep-space repair) in a simulation that looks and behaves identically to the real world.
@tianjiancai1118 · 6 months ago
Branching possibilities introduce cost that grows exponentially, so knowing how to (relatively) precisely predict something is also important. Humans certainly learn possibility, and we learn certainty too.
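A quick back-of-envelope illustration of that exponential cost: even a modest branching factor explodes over a handful of steps (the numbers below are arbitrary assumptions).

```python
# Arbitrary illustrative numbers: B candidate "possibilities" per step, T steps.
B, T = 4, 10
print(f"{B}**{T} = {B**T:,} branches to evaluate")   # 4**10 = 1,048,576 branches to evaluate
```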
@leslietetteh7292 · 6 months ago
@tianjiancai1118 Certainly. I'm almost sure it would have a positive effect on modelling what are essentially 4D interactions, but with the sort of inference speed-ups we're seeing now, I'm pretty sure image-to-image inference, polygon rendering to photorealistic, is the way to go for the easy win.
@tianjiancai1118 · 6 months ago
You mentioned an "easy win". I would argue that any generation without understanding its nature can't be precise enough. Inference speed is important, but inference quality is also important to achieve indistinguishable (or so-called no-mistake) results. Even though you can speed up inference and offer real-time generation, there are still cases requiring reasonable results.
@leslietetteh7292 · 6 months ago
@tianjiancai1118 "Imagine Flash: Accelerating Emu Diffusion Models with Backward Distillation" is a really good paper by Meta that you should read; it achieves super-fast inference without really compromising on quality. There are some pretty good demos of the quality they're achieving with real-time inference.
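For a concrete, if simplified, picture of "polygon rendering with photoreal inference laid on top", here is a hedged single-frame sketch using the Hugging Face diffusers image-to-image pipeline on a game-engine render. The checkpoint, prompt, file names, and strength value are placeholder assumptions, and a single-frame pass like this says nothing about the temporal consistency or the distillation-based speed-ups (as in Imagine Flash) the thread is discussing.

```python
# Hedged sketch: run an image-to-image diffusion pass over a game-engine
# render to "photorealize" it. Checkpoint, prompt, file names, and strength
# are placeholder assumptions; real-time use would need a distilled model.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",      # placeholder checkpoint
    torch_dtype=torch.float16,
).to("cuda")

engine_frame = Image.open("engine_render.png").convert("RGB")  # hypothetical render

result = pipe(
    prompt="photorealistic city street, natural lighting",
    image=engine_frame,
    strength=0.45,        # low strength keeps the engine's geometry, restyles surfaces
    guidance_scale=7.0,
).images[0]

result.save("photoreal_frame.png")
```

Per-frame passes like this tend to flicker across a video, which is exactly where the temporal-consistency point raised earlier in the thread comes in.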
@davidh.65 · 6 months ago
Why would they hype Sora up and then not even have a timeline for releasing a product??
@tianjiancai1118 · 6 months ago
Because they are still working on preventing misuse.