As a 3D artist, filmmaker, and actor, I'm super excited about Sora. I can't wait to play around with this tech. It's pretty crazy how all these modalities are happening at once: image, video, voice, sound effects, and music, all the pipelines needed to create media. There will be a time, not far off, when we can plug in a prompt and Sora 5 will create all the needed departments. As the human working with this, I would of course be heavily involved in the iterative generation and direction of each piece of media, and in the end the edit would be mine. I wonder how much 'authorship' a creator will have, or be given.
@boonkiathan · 6 months ago
But prior to commercially utilizing Sora's output, there must be clarity on the source of the training data. It can't just be OpenAI pushing it to creators and the creators saying they trust OpenAI. This is almost the exact same issue as text generation for fun and brainstorming; fair use, I suppose.
@char_art · 4 months ago
🤡🤡🤡
@jonkraghshow · 6 months ago
Really great interview. Thanks to all.
@erniea5843 · 6 months ago
Cool interview, awesome to see a glimpse into the innovation being done to develop these video models.
@leslietetteh7292 · 6 months ago
Interesting video! It really highlights the potential of using 3D tokens with time as an added dimension :). My experience with diffusion models and video generation didn't show anything quite like Sora's temporal coherence. Looking ahead, I'm excited about the prospect of evolving from polygon rendering to photorealism via image-to-image inference. While I might be biased because of my interest in that rendering approach, I think incorporating 'possibility' as an additional dimension, as suggested by "imagining higher dimensions", could address issues like the leg-switching effects we currently see. Such physics-consistent behavior could potentially be borrowed from game-engine scenarios, where, unlike an apple that behaves predictably when dropped, a leg has specific movement constraints (also affected by perspective shifts). It's a speculative route, but it might be worth exploring if it promises substantial improvements.
@tianjiancai1118 · 6 months ago
Maybe internal 3D modeling should be introduced to solve the issue you mentioned (leg switching, or so-called "entity inconsistency").
@leslietetteh7292 · 6 months ago
@tianjiancai1118 How so? (NB: are you familiar with how diffusion models work? It's just learning to denoise an image, or a cube in this case. I'm only suggesting that it learn to denoise the branching possibilities rather than a cube, so it knows what is not a possibility; suggesting, not guaranteeing, that the idea will work. There are things like ControlNets, though, so if this internal 3D modelling is a valid idea, please share.)
@tianjiancai1118 · 6 months ago
Sorry, to clarify: internal 3D modeling is hard to achieve in a diffusion model (as far as I know). What I mean is something like a totally new architecture.
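For anyone following the "learning to denoise a cube" framing in the thread above, here is a minimal, hypothetical PyTorch sketch of the generic noise-prediction objective applied to a spacetime volume of video latents. The Conv3d stand-in, the toy tensor shapes, and the linear noise schedule are illustrative assumptions, not Sora's actual architecture or training code.

```python
# Minimal sketch (assumptions: toy shapes, Conv3d stand-in denoiser, linear
# noise schedule). Not Sora's actual code; just the generic "denoise a cube"
# objective discussed in the thread above.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy batch of video latents: (batch, channels, time, height, width)
clean = torch.randn(2, 4, 8, 32, 32)

# Hypothetical denoiser; a real system would use a diffusion transformer
# over spacetime patches rather than a single 3D convolution.
denoiser = nn.Conv3d(4, 4, kernel_size=3, padding=1)

# Corrupt the whole spacetime cube with one noise level per sample.
t = torch.rand(clean.shape[0]).view(-1, 1, 1, 1, 1)   # noise level in [0, 1]
noise = torch.randn_like(clean)
noisy = (1.0 - t) * clean + t * noise                  # simple interpolation schedule

# Noise-prediction loss: learn to recover the noise that was added.
loss = F.mse_loss(denoiser(noisy), noise)
loss.backward()
print(f"toy denoising loss: {loss.item():.4f}")
```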
@garsett · 6 months ago
Smart! 😊 Personalisation and aesthetics. Cool. But also PRACTICAL worldbuilding, please. How can this help create quality lifestyles? Happy communities? A convivial society?
@amritbro · 6 months ago
I'm definitely following these three talented guys on X. Really great interview, and without a doubt Sora is already making an impact in Hollywood like Pixar once did during the Steve Jobs era.
@EnigmaCodeCrusher · 6 months ago
Great interview
@JustinHalford · 6 months ago
Compute and data are converging on becoming interchangeable sides of the same coin. Flops are all you need.
@oiuhwoechwe · 6 months ago
I'm old. These guys look like they just left high school.
@voncolborn9437 · 6 months ago
Haha, I'm 71. I know exactly what you mean. The average age of the developers of the first Mac was 28. The AI community seems so young, but that gives these super-smart people a lot of years to get things straightened out.
@mosicr · 6 months ago
They almost have. Peebles is just out of university.
@AIlysAI · 6 months ago
Really, all these amazing things are possible just with transformers; there isn't much innovation beyond applying transformers to X and scaling it. The most innovative thing they did was the tokenization method, treating video as boxes; the rest is mechanics.
@leslietetteh7292 · 6 months ago
Adding another axis in the form of imaginary numbers improved our ability to model higher-dimensional interactions before. That's negative, bordering on bias: if it isn't innovation, then why didn't everyone else do it?
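As a rough illustration of the "tokenization method as boxes" mentioned above, the sketch below chops a video tensor into non-overlapping spacetime boxes and flattens each into a token vector. The tensor shapes, patch sizes, and the helper name to_spacetime_patches are made-up assumptions for illustration, not the actual implementation.

```python
# Sketch of "boxes" tokenization (spacetime patches). Shapes and patch sizes
# are illustrative assumptions, not the real configuration.
import torch

def to_spacetime_patches(video: torch.Tensor, pt: int = 2, ph: int = 4, pw: int = 4) -> torch.Tensor:
    """Split a (B, C, T, H, W) video into non-overlapping pt x ph x pw boxes,
    each flattened into one token of length C * pt * ph * pw."""
    b, c, t, h, w = video.shape
    x = video.reshape(b, c, t // pt, pt, h // ph, ph, w // pw, pw)
    # Bring the three patch-grid axes forward, keep per-box contents together.
    x = x.permute(0, 2, 4, 6, 1, 3, 5, 7)        # (B, T/pt, H/ph, W/pw, C, pt, ph, pw)
    return x.reshape(b, -1, c * pt * ph * pw)    # (B, num_tokens, token_dim)

video = torch.randn(1, 4, 8, 32, 32)             # toy latent video
tokens = to_spacetime_patches(video)
print(tokens.shape)                              # torch.Size([1, 256, 128])
```

A diffusion transformer would then attend over these tokens jointly across space and time, which is one way to read the "boxes" remark.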
@phen-themoogle7651 · 6 months ago
The Matrix basically
@BadWithNames123 · 6 months ago
vocal fry contest
@jeffspaulding43 · 6 months ago
Our subconscious does a much better job at modeling physics. Your conscious mind imagines the apple falling vaguely; your subconscious mind can learn to juggle several apples without dropping them, so it knows when they will be where.
@leslietetteh7292 · 6 months ago
We perceive possibility (which can be thought of as an extra dimension, an idea from "imagining extra dimensions"). I would think that if trained on branching "possibilities", it would model physics much more consistently. But especially with polygon rendering to photoreal image-to-image inference on the horizon, there's more of a focus on speeding up inference these days (see Meta's amazing work on "Imagine Flash" with Emu). With this sort of temporal consistency, if OpenAI manages to get inference speed up, you could just use a traditional video-game physics engine with photoreal inference laid on top. It'll probably sell a lot, especially if they map electrical signals through the spinal cord to touch input and replicate that. Seeing and touching the real world through VR will be epic, and yeah, it will probably sell loads. You could train the next generation of AI engineers (think deep-sea or deep-space repair) in a simulation that looks and behaves identically to the real world.
@tianjiancai1118 · 6 months ago
Branching possibilities introduce cost that grows exponentially, so knowing how to (relatively) precisely predict something is also important. Humans certainly learn possibility, and we learn certainty too.
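A quick back-of-envelope illustration of that exponential cost: even a modest branching factor explodes over a handful of steps (the numbers below are arbitrary assumptions).

```python
# Arbitrary illustrative numbers: B candidate "possibilities" per step, T steps.
B, T = 4, 10
print(f"{B}**{T} = {B**T:,} branches to evaluate")   # 4**10 = 1,048,576 branches to evaluate
```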
@leslietetteh7292 · 6 months ago
@tianjiancai1118 Certainly. I'm almost sure it would have a positive effect on modelling what are essentially 4D interactions, but with the sort of inference speed-ups we're seeing now, I'm pretty sure image-to-image inference, polygon rendering to photorealistic, is the way to go for the easy win.
@tianjiancai1118 · 6 months ago
You mentioned an "easy win". I would argue that any generation without understanding its nature can't be precise enough. Inference speed is important, but inference quality is also important to achieve indistinguishable (or so-called no-mistake) results. Even though you can speed up inference and offer real-time generation, there are still cases requiring reasonable results.
@leslietetteh7292 · 6 months ago
@tianjiancai1118 "Imagine Flash: Accelerating Emu Diffusion Models with Backward Distillation" is a really good paper by Meta that you should read; it achieves super-fast inference without really compromising on quality. There are some pretty good demos of the quality they're achieving with real-time inference.
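For a concrete, if simplified, picture of "polygon rendering with photoreal inference laid on top", here is a hedged single-frame sketch using the Hugging Face diffusers image-to-image pipeline on a game-engine render. The checkpoint, prompt, file names, and strength value are placeholder assumptions, and a single-frame pass like this says nothing about the temporal consistency or the distillation-based speed-ups (as in Imagine Flash) the thread is discussing.

```python
# Hedged sketch: run an image-to-image diffusion pass over a game-engine
# render to "photorealize" it. Checkpoint, prompt, file names, and strength
# are placeholder assumptions; real-time use would need a distilled model.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",      # placeholder checkpoint
    torch_dtype=torch.float16,
).to("cuda")

engine_frame = Image.open("engine_render.png").convert("RGB")  # hypothetical render

result = pipe(
    prompt="photorealistic city street, natural lighting",
    image=engine_frame,
    strength=0.45,        # low strength keeps the engine's geometry, restyles surfaces
    guidance_scale=7.0,
).images[0]

result.save("photoreal_frame.png")
```

Per-frame passes like this tend to flicker across a video, which is exactly where the temporal-consistency point raised earlier in the thread comes in.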
@davidh.65 · 6 months ago
Why would they hype Sora up and then not even have a timeline for releasing a product??
@tianjiancai1118 · 6 months ago
Because they are still working on preventing misuse.