DiT: The Secret Sauce of OpenAI's Sora & Stable Diffusion 3

Views: 61,381

Days ago

Comments: 155

@bycloudAI 6 months ago

Don't miss out on these exciting upgrades designed to elevate your content creation experience with DomoAI! Go try out: discord.gg/sPEqFUTn7n

@TheBigLou13 6 months ago

Is DomoAI self-hosted or do you have to rely on/trust a third party with your data?

@doritime5229 6 months ago

it's not free, I can't try anything

@amortalbeing 6 months ago

5:43, OpenAI itself said that the more compute they threw at sora, the better the results got, so you are right in that the compute here does absolutely matter!

@carkawalakhatulistiwa 6 months ago

More and more GPU

@hitmusicworldwide 6 months ago

So far what a waste of compute

@Words-. 6 months ago

@hitmusicworldwide How so?

@ADMNtek 6 months ago

pretty sure Jen-Hsun started salivating when he heard that.

@hitmusicworldwide 6 months ago

@Words-. Marvel's Avengers: Endgame used GPU farms at $80.00 per GPU (compared to $30k per H100) with Autodesk Maya. Their efforts not only employed a small town's worth of people; the film cost 400 million dollars to produce, including talent salaries, and has earned 2.7 BILLION dollars. Now THAT is a productive use of compute. The resulting Maya GPU-generated video looks 2.7 billion times better as well. Sora uses 4,200-10,500 Nvidia H100 GPUs and produces this crap? Create a Python script that selects video clips and runs Maya routines instead. Do the math.

@ovum 6 months ago

The hell with the Frieren thumbnail lmao

@Chris-iu6ws 6 months ago

Yeah not cool

@the_gobbo 6 months ago

It is absolute perfection obviously, peak thumbnail right there

@anywallsocket 6 months ago

I’m a simple man 😂

@capitandonculo 6 months ago

honestly loved the thumbnail, may be the only reason i clicked (wtf is DiT?), and it was a great fucking video

@alexmehler6765 6 months ago

report him for misleading thumbnail

@imerence6290 6 months ago

Are you telling me Stable diffusion had ADHD 💀

@Happ1ness 6 months ago

[Insert "always has been" meme here]

@PhotoBomber 6 months ago

Lol brilliant

@Dogo.R 6 months ago

Yes, just like in humans it seems to mostly be an education and environment problem.

@adesa756 6 months ago

@mehulagarwal858 6 months ago

Hey, great video as always. Would love to see a deep dive video into the Stable Diffusion 3 architecture and other DIT methods!

@cybertruck2008 6 months ago

Those facebook users are probably bots, because we are reaching the dead internet theory

@TerraGlide 6 months ago

Does that mean you’re a bot too?

@Titangrille 6 months ago

my words exactly. Best I can do is hope we are not there already. Hello stranger from another part of the world with the same interest as me.

@H0mework 6 months ago

@Titangrille HELLO HUMAN. YES I agree with your opinion, beep beep.

@ianglenn2821 6 months ago

yep, the irony of "those are my words exactly" haha there's only one person on the internet, and it's us

@zeroking981 6 months ago

@TerraGlide or say something a bot can't say, like some Austrian painter

@manuffls1756 6 months ago

In the interview, the developers mentioned that Sora will still need quite a while until its release. In the subsequent interview, however, the chief technology officer stated that there might be expanded access this year, possibly even in the coming months. I believe this is yet another instance of the conflict between scientific caution and the realization that, if high prices are initially charged as announced, there can be significant earnings.

@mktwos 6 months ago

i hope sora will never be released to the public, the consequences would be disastrous.

@manuffls1756 6 months ago

@mktwos Basically you're right. I would like to play with it, but preferably without anyone else being able to, haha. Being on YouTube and knowing that in a year you will probably be searching every remotely exotic video for evidence of AI is not pleasant. On the other hand, let's be honest: even if Sora waits another year, other companies will have published corresponding algorithms by then... so for OpenAI it's once again not so much a question of whether you should do it, but whether you should want to control this development by being the first

@incription 6 months ago

@mktwos we will likely get an open source equivalent of sora next year at the latest. and by then openai will have an even better model

@autohmae 6 months ago

@mktwos very big maybe, with an embedded watermark in each frame which isn't visible to normal people (thus using a form of steganography)

@Vivaerti 6 months ago

Corporations such as OpenAI care for the money and only for the money. They couldn't care less about the small guys (artists), and they care more for the potential profit they could rake in from having Sora go public once it's finished. Remember: to them, we are only consumers, and they don't care for what we say.

@adityajoshi287 6 months ago

Just letting you know, Yes, I would like to see your video on DiffiT and HDiT architecture if you make one! Love your videos!!

@gemstone7818 6 months ago

yeah it would be nice to hear more about DiT architectures

@rakly3473 6 months ago

"Can you even tell which one is the real and fake image?" - Uh, yes, it was pretty obvious. I didn't even have to 'nitpick' about it, I could tell pretty much instantly. I completely disagree with the notion that we are high up on the curve. If you actually work with AI, not just use MidJourney (I'm talking about Krita and text completion models, for example to make 'Agents'), you will see there is still much progress day over day. I generate a couple thousand images each and every day (work related), so I do have quite some experience from seeing so many of them. I haven't even seen an upscaler yet that satisfies my requirements. The only way to get better upscales right now is more VRAM on a single device, which isn't practical. My current workflow is pretty convoluted. I have multiple GPUs all rendering different image layers, each with their own focus (like attention). I have to use multiple GPUs, one for each layer, to make it viable. Rendering each layer one after another on a single GPU would not be practical. - So instead of using a GPU with tons of VRAM or a very complex workflow with multiple GPUs, I'm still seeing great progress being made in efficiency, for example SD Cascade. I think your focus on what progress is, is too narrow. Don't just look at what is possible; you have to take into account how much progress is being made in what is possible with limited resources. Which isn't just the number of GPUs either. Also the amount of, and quality of, the required input.

@jghifiversveiws8729 6 months ago

I definitely would like for you to explore the other DiTs

@slackstation 6 months ago

Definitely want to see an indepth look at DiT

@chrisfleitas615 6 months ago

Domo AI can get expensive very fast. Went through a month's tokens from the basic subscription in an hour transforming my client's 30 sec ad into an anime. The results were good.

@claxvii177th6 6 months ago

My exact reaction when I started using if ai in comfyui

@NikoKun 6 months ago

The "Sigmoid Curve" is FAR too misleading a representation to actually give people any reliable indication of where we are, unless their goal is a skeptical narrative. It's impossible to tell where we are on the curve, or what the curve truly looks like at this time, as it's significantly different with every advance, and how we arbitrarily classify advances also changes the appearance of the curves. Such representations can ONLY be applied retrospectively, when analyzing the past; they have no value for reliable prediction. Maybe the curve has multiple slow spots along it, or maybe multiple sigmoid curves chain together. Somewhere I saw an explanation that overlaid a whole bunch of sigmoid curves related to recent technological advances, and when you average out all their overlaps, you end up with the same exponential singularity curve guys like Kurzweil predicted. Tho I can't seem to find that.

@technolus5742 6 months ago

exactly, a few years ago progress was slow, this guy could have said the exact same thing and be completely wrong 🤦‍♂

@vedforeal7835 6 months ago

The only vid someone needs to understand current ai situation

@Anderson-f4t6c 6 months ago

Frieren thumbnail, nice

@CosmicAnew 6 months ago

I can still tell the difference between fake and real images but that's because I study photos and paint them. Something about the noise and the lighting is really off in the first image, most people wouldn't light something slightly green since it's not a natural type of light. I think if you don't have a background with visual art or something similar, it's probably really difficult to tell real and fake images apart. (0:14)

@MemeMultiverseGo 6 months ago

Venturing into the world of storytelling and creative videos, VideoGPT becomes the invisible hand that refines my content, making it resonate with a professional vibe.

@TimeLordRaps 6 months ago

"The only thing they added was space-time relation" is such an understatement.

@kronux3831 5 months ago

To me, the biggest hurdles to overcome with image generation are character consistency (which is exactly what it sounds like) and object transfer (the ability to select a specific object in one image, such as a shirt, and have it be included as part of the resultant generation). AI image generation doesn't need to look perfect, and I'm not too sure how much of a return companies will get from marginally increasing quality, when solving the issues I described above would lead to greater direct application. One immediate possibility would be how useful these advances would be for creating AI-generated comics or animations. If I were a betting man, I'd say these two concepts are what most AI image generation companies are working on right now.

@nikroth 6 months ago

Transformers is a very good movie. You can't go wrong to watch it with someone (except for some movie snobs). Transformers and chill is the new 10/10.

@deltamico 6 months ago

If only someone came up with a proven optimal way of using transformers for something and called it optimus

@nikroth 6 months ago

@deltamico HEHEHEH :D

@VisionaryPathway 6 months ago

Did you see the MIT paper that just published? Using "Distribution Matching Distillation" (DMD), 30x faster image generation is achieved vs. Stable Diffusion, and at the same/higher quality of image. How's that for near the top of sigmoid curve 😉

@OMGLittleB 6 months ago

yep it's pretty awesome

@cdkw8254 6 months ago

Love how the ai took over even sponsors

@Kuchenrolle 6 months ago

+1 for the follow-up video on the more technical details of DiT.

@SliceOfFish 6 months ago

Ability to generate more complex scenes is cool but I don't see much difference between SD3 and SDXL in terms of image quality.

@noobicorn_gamer 6 months ago

It's much harder to tell what's fireship and what's bycloud on recommended feed these days than AI image progress :D

@brainstormsurge154 6 months ago

Just watched your video on Mamba. After watching this it makes me wonder about the Mamba model being used more and more for its precision. Things like pixel art or voxel style need a lot more precision than regular diffusion or other image/video generation has, at least from what I've seen. Although part of that is that people make those by giving themselves limiters, such as only drawing on a bitmap within certain parameters or with programs that automate those parameters. That won't be as much of a limiting factor if AI is now getting access to a command line, which means it's only a matter of time before the AI has access to a program where the parameters it works with are constrained and narrowed to get the results people want.

@lio1234234 6 months ago

I'd love to have a video from you on those architectures!

@the_gobbo 6 months ago

THE THUMBNAIL THO LOL i lov it

@Neonagi 6 months ago

The only thing you could really say is we are at the top of the 'current' sigmoid curve, until it's broken by yet another sigmoid curve. Using sigmoid curves doesn't work for predicting the future, they're only useful for looking at past processes.

@draken5379 6 months ago

If you like DomoAI, you can find the open source communities that they take all their workflows from, and learn to do it yourself for free :)

@7satsu 6 months ago

GPT-6 -> AGI -> Supercomputers at consumer level -> Quantum Computing -> ASI

@DaKussh 3 months ago

It's funny because the results have been arithmetically inversed but the VRAM requirements for the most basic stuff has been increasing between 25% and 50% between each iteration.

@MilesBellas 5 months ago

yes.... dive into DiT, Diff it and HDit, CorrDiff, eDiff-I etc... 😊

@oneinazillion 6 months ago

Just because companies can afford lots of compute does not mean that they have a commercially viable/environmentally sustainable product. These are great "experiments" for sure and kudos to the amazing work being done by these scientists and engineers. But to me, these are still experimental. I would call an AI product successful when I can run it on my phone's compute or something like an OS that can greatly augment general purpose tasks without ever having to connect to a cloud subscription.

@lukasgruber1280 6 months ago

lower pic was Pulp Fiction so the other one had to be fake

@dengyun846 6 months ago

I wish you would go into more detail on the actual mechanisms for those who can follow it.

@DrW1ne 6 months ago

You reached the peak of chill in this video. I like the vibe.

@disguisedpuppy 6 months ago

I am starting to get confused between these channels

@AlyphRat 6 months ago

Am I crazy, or is the editing oddly similar to Fireship?

@canyoupleaserunfast 6 months ago

I'd like to know more about diffIT and hdit ^ _ ^

@rje4242 6 months ago

Ringo and Honey Bunny are the real image. Not shown: wallet with "bad mother fucker" stamped on it.

@aleanscm9350 6 months ago

The most probable cause is that we are limiting the growth of ai, at least we are starving it with novelty

@jeffg4686 6 months ago

@4:06 - the other ones don't follow your prompt because they don't want you to compete with them - the others are heavily funded by corporations - all the wealthy are aligned and they don't like competition...

@yudi8204 6 months ago

Fireship but 8 times longer

@manutebol956 6 months ago

pulp fiction reference nice

@tommysalami420 6 months ago

lmao I livestreamed teaching the chatbots to use stable diffusion XD

@waterbot 6 months ago

DiffiT video WHEN?

@Malorianarms999 6 months ago

Banger

@cdkw8254 6 months ago

I love how some millionaire guy was like "let's just throw money at it" and it actually works

@levimccallum9006 6 months ago

Can you explain the DiT usage in Pixart sigma?

@asterlofts1565 6 months ago

Their secret, I think, is that they are open source... because PEOPLE THEMSELVES MODIFY THIS AND IMPROVE IT AS THEY WANT.

@frederikcalsius5014 6 months ago

Please do DiffIT and HDiT videos!

@SumitRana-life314 6 months ago

Man, I picked the one below and got stumped when the above one was fake. I swear these are getting so good that someday you can do it with both images as real and I would still not get it.

@jatiquep5543 6 months ago

Is this fireship second channel

@l.halawani 6 months ago

I've been waiting for this video!

@nonetrix3066 6 months ago

I think we are far from perfecting image generation even with SD3, it still struggles with background details and hands to a lesser extent

@southcoastinventors6583 6 months ago

Can't really say that about a base model. They were showing it off on Reddit 3 days ago and it is amazing, and once it's fine-tuned with Juggernaut or Dreamshaper it will be amazing, especially with ControlNet and inpainting

@Vivaerti 6 months ago

I don't know, take a look at Niji journey V6 or NovelAI's latest model, they can do some neat backgrounds and hands. They still mess up the hands but nowhere as much as before.

@WhhhhhhjuuuuuH 6 months ago

"attention is that we need" i see what you did there 😅

@MuslimFriend2023 6 months ago

Brilliant style. All the best insh'Allah :)

@user-up4wj9vi3w 6 months ago

its finally over for artists

@Vivaerti 6 months ago

It isn't really, ai art still has some issues with anatomy and I doubt it will take over as the main form of art. Though that isn't as noticeable now if you use NovelAI's latest model which can generate hands that are accurate and consistent.

@user-up4wj9vi3w 6 months ago

@Vivaerti obviously ai art can be spotted, but that doesn't stop corpos from firing some if not all of their artists and having the remaining ones work with ai

@Vivaerti 6 months ago

@user-up4wj9vi3w Corpos are corpos, they care for the money and only money. It may be possible to spot the signs of AI art now, but slowly but surely it will be harder to tell as image generation improves over time.

@hitmusicworldwide 6 months ago

It's all fun n games until you realize that a Sora 2 min 720p blurry and unintentional-artifact-filled video requires 720k H100 GPUs @ $30k each, whilst Avengers Endgame using Autodesk Maya generates 4k masterpieces with GPUs that you can buy for $80.00 on eBay and 3D animators that work for less than 3 H100s. And Avengers Endgame generated 2.8 BILLION dollars from a $356 million production cost and paid a lot of humans' grocery bills.

@thatvexiol 6 months ago

Source for the 720k h100 gpu pls + if you're to be believed then Blackwell gpu have already made that 720k to 120k Blackwell gpu and will make a single gpu in 4 years Who cares about paying the groceries of humans ? We care about the growth of companies and moni 🤑

@christophkogler6220 4 months ago

that sounds like the training setup, not inference

@Miss0Demon 6 months ago

Oh boy I can’t wait for easy to make AI blackmail material and mass unemployment!

@jonmichaelgalindo 6 months ago

I'm worried SD3 will never release though.

@southcoastinventors6583 6 months ago

They were showing it off 3 days ago as a beta build on Reddit, where they were doing people's prompts. Still in early beta; they said maybe released in a month. Most likely the last open source image generator from Stability due to Emad not paying Amazon on time for using their clusters.

@Vivaerti 6 months ago

It will, it seems to still be in the beta phase at the moment, but once it's released, I know a certain community that is gonna go crazy for it and no I'm not talking about safe stuff. If you know, you know.

@MilesBellas 5 months ago

Stability AI must be saved.

@unlomtrash 2 months ago

Not anymore. We have black forest

@jollyamvxgifts379 6 months ago

background music?

@CUBETechie 6 months ago

I predict some r34 content

@jasonhemphill8525 6 months ago

Nooooo. On my good christian diffusion model? No way.

@JazevoAudiosurf 6 months ago

how about MoE DiT SSM b1.58 transformers

@TheLiverX 6 months ago

Finally

@JotaroKujoJoJo-qx2yc 6 months ago

Why does this video type look like fireship's video.... Did fireship steal the idea (or get inspired) from this channel... Or did this channel steal it (or get inspired) from fireship????

@Aurelloyell 6 months ago

this one is a hot one

@comic--sans 6 months ago

why do you only have on screen subtitles in certain parts of your videos? it kinda defeats the purpose of subtitles in my opinion.

@rayhere7925 6 months ago

It's ok. There, there now. Shhh... Go back to sleep.

@dannyyyXYZ 6 months ago

Ai generated subtitles as well

@comic--sans 6 months ago

@rayhere7925 I see this everywhere and it drives me insane. people realize you can make ai subtitles using real captions instead of on screen ones right? youtubers seem to care more about viewer retention than accessibility.

@spencernorman2626 6 months ago

The thumbnail......😂

@AmirHamzah_MAHBAR 6 months ago

What is that game in @5:07?

@c0d3_m0nk3y 6 months ago

ClearConnect VR , according to Bing Copilot.

@MrTurbo_ 6 months ago

I'm worried a 4090 is not gonna be enough for stable diffusion 3 anymore lol. And knowing nvidia they probably won't be releasing a GPU with more than 24GB of ram any time soon

@Vivaerti 6 months ago

Eh, sites that allow image generation will sort this out like they did when this technology was first released.

@patriciogarcia5442 6 months ago

Dive into DiffiT HDiT plz 🙏🏼

@omkarjamdar4076 6 months ago

Can I still participate in the giveaway? Coz I need a PC and all I have is an i5 7th-gen laptop

@FileNotFound404 6 months ago

AI is genuinely getting to the point where it isn't even fun anymore. the only reason I ever found it interesting was that I thought it wouldn't really go anywhere, but now I see it so often, and I'm constantly trying to figure out if something is "AI". Like I just want it to stop now before it starts taking jobs from people. Especially considering that, basically every field I'm talented in is being attacked by these corporations looking to give even less money to the ones who deserve it.

@Neonagi 6 months ago

It's like wagon makers and horse maintainers lamenting the creation of the motor vehicle. As all things, not everyone wins with a new technology, and we are forced to adapt and overcome.

@Vivaerti 6 months ago

I still find it fun to try and see how many concepts or poses it can do accurately, but it is annoying sifting through thousands of ai posts on Pixiv.

@phobosmoon4643 6 months ago

i dont think its the sigmoid curve of ai image generation its the sigmoid curve of agi

@tommysalami420 6 months ago

I actually helped figure this out :3

@falsechord 6 months ago

ai art devs should focus on composition not quality. at this point the only composition method is control net which works great with humans but...what about everything else buildings, landscapes, dragons, eldritch creatures.

@BinaryDood 6 months ago

sketch it yourself first... it's literally just shapes

@falsechord 6 months ago

@BinaryDood there are things that have the same shape, like a basketball and a soccer ball. if i draw 2 circles in 2 different locations the ai won't be able to tell which type of ball to place in which circle. this is just a basic example, a fox and a wolf is another.

@BinaryDood 6 months ago

@falsechord idk, read "Picture This" by Molly Bang and see what you can come up with

@MortyMortyMorty 6 months ago

Stolen Fireship thumbnails to farm views! Clever little kid!

@morphidevtalk 6 months ago

WHITE THEME DISCORD WTF

@seriousOmajan 6 months ago

TBH I'm dumbfounded how much you rely on low res gifs with text on top while talking about image/video generation. It seems that it is basically useless to you or you have no idea how to pivot it to your workflow.

@deltamico 6 months ago

By using those gifs he culturally bonds with the viewer. But I agree incorporating generation would signify more experience with the given subject