This new type of illusion is really hard to make

749,095 views

Steve Mould

Comments: 2,100
@SteveMould
@SteveMould 3 ай бұрын
That was a real Parker Twisty Square. The sponsor is Jane Street. Find out about their internship at: jane-st.co/internship-stevemould NOTE THE URL ON SCREEN IS INCORRECT! This is the correct URL. I'd call it a Parker URL but Matt got it right.
@blobbo.
@blobbo. 3 ай бұрын
ok
@thetechfury
@thetechfury 3 ай бұрын
I like your video Steve Mould. Keep up the good work. ^w^
@yuyurolfer
@yuyurolfer 3 ай бұрын
Nice how you and Matt uploaded your linked videos at the same time
@h1234e1234
@h1234e1234 3 ай бұрын
Server not found. Maybe AI was not so much of a hype. XD Joking :P
@thetechfury
@thetechfury 3 ай бұрын
@@yuyurolfer Indeed.
@CriticalMonkey623
@CriticalMonkey623 3 ай бұрын
Okay, hear me out. THIS is AI art. Not people using AI to just generate whatever they put in a prompt. But actual human creativity and ingenuity using AI as a tool to create something which previously would have been extremely difficult, if not impossible. There are a lot of ethical and aesthetic problems with generative AI in its current state, but this is the first time I've seen something made with AI and thought "that's beautiful".
@candycryptid2832
@candycryptid2832 3 ай бұрын
I agree!
@bl4cksp1d3r
@bl4cksp1d3r 3 ай бұрын
it is interesting, yeah, I will argue that in this specific case AI is DEFINITELY used as a tool to find a solution. My problem from day one always was with people who say they are AI artists. But that's clearly not what this video is about
@KindOfWitch
@KindOfWitch 3 ай бұрын
yass queenses this is totes the stuff
@taffy4801
@taffy4801 3 ай бұрын
A novel solution to a novel problem. Well put.
@candycryptid2832
@candycryptid2832 3 ай бұрын
@@bl4cksp1d3r I AGREE
@alexholker1309
@alexholker1309 3 ай бұрын
15:14 Bias and hallucination in the context of generative AI aren't simply human fallibilities, they're the mechanism by which it functions: you're handing an algorithm a block of random noise and hoping it has such strong biases that it can tell you exactly what the base image looked like even though there never was a base image.
@truejim
@truejim 3 ай бұрын
Well said. Also: bias and hallucination are so commonplace in our own neural networks (our brains) that we even give them categories and names, such as “overgeneralization”, “confirmation bias”, “sunk cost fallacy”, or the catch-all “brain fart”. All neural networks (including our own) apply learned patterns in contexts where the learned pattern shouldn’t be applied. That’s why (to your point) the neural network driving diffusion can denoise noise that was never there in the first place.
@istinaanitsi3342
@istinaanitsi3342 3 ай бұрын
This is a transition to an image that existed in another parallel reality, and your brain can exist in several such realities at once if you train it to be unbiased. That this can be reproduced on a computer is less impressive than ancient Chinese, in which that option is mandatory; you are just fixated on your own language, and that is what makes you capable of being surprised
@truejim
@truejim 3 ай бұрын
@@istinaanitsi3342 I think that’s the premise of Blake Crouch’s novel “Dark Matter”. 😀
@istinaanitsi3342
@istinaanitsi3342 3 ай бұрын
@@truejim Haven't read it, but the phrase "dark matter" just shows science's inability to understand the world, so they replace knowledge with dark words
@WavyCats
@WavyCats 3 ай бұрын
​@@truejim Very true. The ability for humans to recognise faces, even in places where there is no face, can be said to be one of our biases, yet a useful one at that, which makes me wonder whether hallucination and bias in reasoning is not merely a flaw, but something that may have inadvertently assisted in our survival throughout history.
@PixelSodaCreates
@PixelSodaCreates 3 ай бұрын
The rabbit/duck illusion got a serious glow-up
@oliviervancantfort5327
@oliviervancantfort5327 3 ай бұрын
Too bad the cover image of the video was edited to make the transformation more dramatic. The left rabbit ear on the second cube was basically erased on the duck image...
@MollyHJohns
@MollyHJohns 2 ай бұрын
​@@oliviervancantfort5327 oof now that you pointed it out 😢
@Neptutron
@Neptutron 3 ай бұрын
Hey Steve and Matt, thank you guys for featuring our research - it was a lot of fun working with you! I'm Ryan Burgert, the author of Diffusion Illusions - I'll try to answer as many questions as I can in the comments!
@lmeeken
@lmeeken 3 ай бұрын
One thing I wasn't clear on. They describe taking the first images of two iterative prompt responses, flipping and layering them, and then using that single image as the first step in two different prompts (in this case, for penguin and giraffe). But how do you end up with a single image, rather than two different images that just used the same starting point?
@I.swareitsnotpersonal
@I.swareitsnotpersonal 3 ай бұрын
@Neptutron hey, I’m just wondering from an artist perspective, how this might be used to make artworks. I’ve made previous comment about it. I just wanted to say your work sounds amazing and looks amazing! Although 😅I’m a little worried about people wanting to steal and profit from the other artist’s artwork. 👍
@venanziadorromatagni1641
@venanziadorromatagni1641 3 ай бұрын
I just wish they hadn’t used Midjourney pics. That company is pretty exploitative, both towards copyright holders AND towards their customers.
@koalachick8029
@koalachick8029 3 ай бұрын
For the diffusion array: could I put in a bunch of images and a "goal" image and have the machine output the correct arrays?
@Zutia
@Zutia 3 ай бұрын
Hi, important ethical question. Can you say with 100% certainty that your copy of Stable Diffusion is entirely divorced from stolen artwork?
@mccoydtromb
@mccoydtromb 3 ай бұрын
I would love to hear this kind of illusion done with audio, such as reversing the audio file and hearing different text, or a piece of music!
@blackwing1362
@blackwing1362 3 ай бұрын
Or something with the Yanny or Laurel thing but on purpose.
@danklemonsoda
@danklemonsoda 2 ай бұрын
4 different sounds when overlayed making a completely different one would be cool
@KINIIKIO
@KINIIKIO 2 ай бұрын
⁠​⁠​⁠​⁠@@blackwing1362 if u increase/decrease the pitch of that audio, you will be able to hear each word on purpose
@jasondashney
@jasondashney 2 ай бұрын
I'm not sure that would work, because these images can be based on something that vaguely sort of kind of resembles a penguin or a giraffe, but I don't think our brains give us the same leeway for sounds. I don't think there's a pareidolia for sounds, is there?
@danklemonsoda
@danklemonsoda 2 ай бұрын
​@@jasondashneywords, sentences, we can derive words from really distorted sounds
@davetech9403
@davetech9403 3 ай бұрын
Those blocks would sell really well in gift shops. Especially in Zoos.
@WitchOracle
@WitchOracle 2 ай бұрын
I would buy so many of them for real (I like to have a basket of fidgets, puzzles, and tactile art pieces on my coffee table and these would fit right in)
@vicnaum
@vicnaum 2 ай бұрын
@@WitchOracle I think there is a method where you can 3D print it in place (no assembly), and also transfer a color to the first layer from a piece of inkjet-printed paper (it was on the TeachingTech channel, I think)
@gakulon
@gakulon 3 ай бұрын
Loved the Matt Parker jumpscare in the image sequence
@doq
@doq 3 ай бұрын
I literally pushed pause right on that frame and lost it 💀
@arnabbiswasalsodeep
@arnabbiswasalsodeep 3 ай бұрын
Not a jumpscare but easter egg. Also its Maths Parker²
@AnasHart
@AnasHart 3 ай бұрын
2:52 at 0.25x speed
@larshofmann7516
@larshofmann7516 3 ай бұрын
Ah yes, the Parker Scare
@WillBinge
@WillBinge 3 ай бұрын
I thought I saw him
@etunimenisukunimeni1302
@etunimenisukunimeni1302 3 ай бұрын
This wasn't a video about how diffusion models work and are trained... but you still managed to explain both better than the majority of videos on YT about the subject. Can you make a video explaining how you became so damn good at explaining things? Oh, and this is the coolest application of image generators I've seen to date. Brilliant idea leveraging the intermediate diffusion steps to sneakily steer the result into multiple directions simultaneously!
@theblinkingbrownie4654
@theblinkingbrownie4654 3 ай бұрын
Im not him but I'll guess it's due to how many years he has been explaining such a variety of topics.
@-danR
@-danR 3 ай бұрын
This was the _least_ illuminating Steve Mould video I have ever seen. Most of them are exceptionally lucid, even in a single pass. I lost my bearings past the "keep adding noise..." stage.
@etunimenisukunimeni1302
@etunimenisukunimeni1302 3 ай бұрын
@@-danR Can't blame you, it's a weird process that seems completely backwards the first time you learn about it. It sounds so stupid that first they make this giant model that can remove noise only to put most of it back in, but it's the only way to iterate enough to get a clear picture. Anyway, it was only background information, and it doesn't really matter if it didn't become crystal clear how all of it works - the important thing is that the model is trained to be good at removing noise from a grainy picture. If you then start from a random mess and tell the model it's an extremely noisy picture of a cat, it will turn it into a picture of a cat by taking the supposed noise away. And because it happens in steps, you can alternate the subject between a cat and a dog in every other step, and it becomes both a cat and a dog in the end (obviously oversimplified)
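A minimal toy sketch of that alternating trick (not the real Diffusion Illusions code; denoise_step below is a hypothetical stub standing in for a prompt-conditioned U-Net step):

```python
# Toy illustration of alternating the prompt between denoising steps.
# The denoiser here is a made-up stub, not a real diffusion model.
import numpy as np

def denoise_step(image, prompt, step):
    # Hypothetical stand-in: a real step would run a U-Net conditioned on the
    # prompt's text embedding and subtract the predicted noise.
    rng = np.random.default_rng((hash(prompt) + step) % (2**32))
    return 0.9 * image + 0.1 * rng.standard_normal(image.shape)

image = np.random.default_rng(0).standard_normal((64, 64, 3))  # start from pure noise
for step in range(50):
    if step % 2 == 0:
        image = denoise_step(image, "a photo of a penguin", step)
    else:
        image = np.rot90(image, 2)            # flip for the second target
        image = denoise_step(image, "a photo of a giraffe", step)
        image = np.rot90(image, 2)            # flip back before the next step
```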
@TheClintonio
@TheClintonio 3 ай бұрын
This is a really educational video on AI which _should_ help most people understand and realise that these LLM and diffusion models are not General AI (i.e. "truly intelligent") and are just simple mathematical models. I studied AI and ML long before LLMs became a thing and have always been aware of this, but convincing people of it is very hard in a short timeframe.
@Zutia
@Zutia 3 ай бұрын
Honestly, as long as this shit continues to be trained by stealing work from actual human artists I don't care. I'm genuinely disappointed in Matt and Steve.
@DaveEtchells
@DaveEtchells 3 ай бұрын
How do you know that we aren’t simply somewhat more complex mathematical models? 😉
@theninja4137
@theninja4137 3 ай бұрын
​@Zutia what is and isn't stealing in this context is something that still needs to be established. As the training images are not directly used, but just statistics on them (the training images are not actually stored in the final model, it's therefore impossible for it to "copy-paste" parts of them into the output image), so it doesn’t conflict with the current copyright. And if we change copyright in that regard, we also need to consider what that implies for artists being inspired by each other
@FENomadtrooper
@FENomadtrooper 3 ай бұрын
I don't see many people calling them general AI, but I do run into hordes of people on the internet vehemently claiming that an LLM is not even a type of AI at all.
@raccoon1302
@raccoon1302 3 ай бұрын
@@Zutia being disappointed in Steve for covering an extremely interesting and relevant application of a novel technology is quite frankly nuts. touch grass
@dekumarademosater2762
@dekumarademosater2762 3 ай бұрын
So a person could do this too - rough outline sketch of penguin, of a giraffe; flip one, work out an average rough from both; flip one back, do more detail on both, flip one. Repeat till you're happy or you give up. But some people just do it in their head - amazing!
@rayscotchcoulton
@rayscotchcoulton 3 ай бұрын
Was thinking the same thing. With enough trial and error with both your original image and whatever secondary image that sort of manifests itself, this seems absolutely doable. It feels like an artist expression that humans could absolutely be trained in, but just haven't really ever largely pursued.
@cmmartti
@cmmartti 3 ай бұрын
It's a common trend to do this with names or words in fancy script, so that it reads the same flipped upside down. I've seen a bunch on YouTube and he does it in a few seconds (I couldn't tell you what it's called, it was something I saw in passing).
@mistrsportak9940
@mistrsportak9940 3 ай бұрын
​@@rayscotchcoultonWith enough trial and error, a monkey can write the Hamlet
@seav80
@seav80 3 ай бұрын
@@cmmartti those are called ambigrams
@cmmartti
@cmmartti 3 ай бұрын
@@seav80 There we go!
@protocol6
@protocol6 3 ай бұрын
A Mould-Parker crossover video about double image illusions in which you create several of them and you didn't do one that morphed from Parker to Mould?
@hundredfireify
@hundredfireify 3 ай бұрын
This is pushing it too far imho
@freemanmarco3373
@freemanmarco3373 3 ай бұрын
​@@hundredfireify nah😭😭😭. we need that
@megaing1322
@megaing1322 3 ай бұрын
I don't think the tech is currently up to that, since the models don't have a concept of steve or Matt.
@dside_ru
@dside_ru 3 ай бұрын
​@@megaing1322one word: embeddings.
@megaing1322
@megaing1322 3 ай бұрын
@@dside_ru One word: Models More random words you want to throw at me with no real relation to what I said?
@APrettyGoodChannel
@APrettyGoodChannel 3 ай бұрын
The reason some text models struggle with counting the number of r characters in a word like strawberry is because they don't see the word, they receive a vector which was trained to represent the different meanings of the word when looked at through different filters, similar to these illusions, which is what attention QKV projections do (extracting information from the vector which is layered in there). Sometimes the vector would have managed to store information about a word such as spelling and rhyming which the model can use, but oftentimes not, it depends on chance with how often things appear in the training data. The model could count it if the word was split into individual letters with spaces between them, because each would encode into a unique vector.
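A small illustrative sketch of that point (the token split and IDs below are invented for illustration, not a real tokenizer's output):

```python
# Why letter-counting is hard for a token-based model: it never sees the
# characters, only opaque token IDs. (Token split and IDs are hypothetical.)
word = "strawberry"
subword_tokens = ["straw", "berry"]   # hypothetical split a tokenizer might produce
token_ids = [4671, 8729]              # made-up IDs; this is all the model receives

# A character-level split makes the same question trivial:
char_tokens = list(word)
print(char_tokens.count("r"))         # -> 3
```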
@LarryFain-y9w
@LarryFain-y9w 3 ай бұрын
write words the way they sound so the AI can say them easier
@RamsesTheFourth
@RamsesTheFourth 3 ай бұрын
@@LarryFain-y9w Phonetic consistency in the English language would be great news for all non-native English speakers.
@Pandora_The_Panda
@Pandora_The_Panda 3 ай бұрын
Wouldn't work because english has too many different accents and dialects, unfortunately.
@anchpop
@anchpop 3 ай бұрын
not quite, the model receives a stream of tokens which are not semantically meaningful. a model whose tokens mapped 1-1 with english characters would have no problem counting the number of r characters in strawberry. what you are referring to is a part of the model that converts chunks of the tokens stream into token embeddings
@RamsesTheFourth
@RamsesTheFourth 3 ай бұрын
@@Pandora_The_Panda Only one accent and dialect would be standard; others would not. Or each country would have its own standard.
@zero01101
@zero01101 3 ай бұрын
this is absolutely the best explanation of the u-net and text encoder and how they work together i've ever heard
@ex5tube
@ex5tube 3 ай бұрын
I'm a software engineer and a midjourney user, and I've watched maybe 50 - 100 videos on LLM and generative AI. In 17 minutes you managed to provide the best simple explanation for how generative AI works with LLMs to produce images from prompts. Steve, you should teach a paid course on this stuff.
@ScottiStudios
@ScottiStudios 3 ай бұрын
I was going to comment the same thing. Such a compact and simple yet comprehensive explanation. Well done.
@istinaanitsi3342
@istinaanitsi3342 3 ай бұрын
oh, oh, oh
@evansdm2008
@evansdm2008 3 ай бұрын
Yes, same. I’m a software engineer also. Exactly as Steve says, I feel satisfied with that explanation.
@Just_A_Dude
@Just_A_Dude 3 ай бұрын
I'm saving this video for the next time someone calls generative AI a "collage tool cut-and-pasting other people's images."
@kevinbrown2701
@kevinbrown2701 3 ай бұрын
I'm NOT a software engineer (I can barely string a string together), and yet it still made sense to me! I was left with one burning question though: Where can I buy these things to show other people?
@KeplersDream
@KeplersDream 3 ай бұрын
It's like a sci fi version of the old Mad Magazine 'fold-in' pictures, if anyone remembers them.
@frankhooper7871
@frankhooper7871 3 ай бұрын
73 years old and remember them well 😊
@stigcc
@stigcc 2 ай бұрын
I 'member!
@truejim
@truejim 3 ай бұрын
4:30 Minor nit. I don’t think the token embedding is really embedding based on semantics. It’s embedding based on how humans have used tokens in our writing. Since we tend to use semantically similar tokens in linguistically similar ways, the embedding does tend to cluster semantically similar tokens near each other. But it will also cluster tokens that aren’t semantically similar, merely because they’re used in the same way linguistically. For example “the” and “his” will be near each other in the embedding space not because they’re similar in meaning, but because they’re interchangeable in many sentences.
@muschgathloosia5875
@muschgathloosia5875 3 ай бұрын
What else is semantics then? The model is essentially doing what linguists do but using raw statistics instead of pattern recognition.
@truejim
@truejim 3 ай бұрын
@@muschgathloosia5875 A purely semantic embedding would cluster tokens based only on similar *meaning*. Embeddings such as Word2Vec cluster tokens based on how the token is used in written English. So two tokens can be embedded near each other because they have similar meaning, *or* because they’re interchangeable in a sentence. “I ate his pie” vs “I ate that pie”. The words ‘his’ and ‘that’ don’t mean similar things, yet they’re still clustered near each other. The neural network is being trained on how words are used, not what they mean. It just so happens that words with similar meaning are also often interchangeable in a sentence.
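A minimal sketch of that clustering-by-usage idea, using made-up 4-dimensional vectors (real embeddings such as Word2Vec's are learned from co-occurrence and have hundreds of dimensions):

```python
# Cosine similarity over hypothetical embeddings: "his" and "that" sit close
# because they are interchangeable in many sentences, not because they mean
# similar things. The vectors are invented purely for illustration.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

emb = {
    "his":  np.array([0.9, 0.1, 0.0, 0.2]),
    "that": np.array([0.8, 0.2, 0.1, 0.2]),
    "pie":  np.array([0.1, 0.9, 0.7, 0.0]),
}
print(cosine(emb["his"], emb["that"]))  # high: both fit "I ate ___ pie"
print(cosine(emb["his"], emb["pie"]))   # lower: rarely interchangeable
```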
@marigold2257
@marigold2257 3 ай бұрын
@@muschgathloosia5875 It's not understanding the semantics, because the way it arranges things has nothing to do with semantics and everything to do with frequency of use together. If for every string of words I had a die I could roll that would give me a word to write down, then I could generate sentences. If that die was weighted via analysis of how often words are used together, then my writing would look human, and because of how language works it would look like semantic understanding. But I don't understand anything; I'm just rolling a die based on frequency of word use
@coolguyflex
@coolguyflex 3 ай бұрын
@@marigold2257 LLMs are not Markov chains. They capture very complex and subtle relations between words. An LLM works by analyzing its training data and representing it numerically in a way that lets it reuse it to satisfy prompts. But the training process forces the model to be efficient with its organisation. The model is unable to learn all word patterns, so it has to instead find and learn subtle higher-order concepts that are simple to memorize but can be used to satisfy many prompts. It's like getting a kid to solve a thousand exam questions. They can't possibly learn all the answers, so they will be forced to pick up patterns in the answers, allowing them to answer questions they haven't seen before. These patterns will be artifacts of the way the questions are framed, as well as real knowledge about the subject of the test. It's difficult to examine exactly what the model knows, but it's possible to show that it organizes its knowledge in a way that encodes concepts similar to our semantic concepts. For example, age may be represented as a geometric direction where words further along that direction are semantically older. Does that mean the model "understands" age? That's a philosophical question. But it means the model can use the concept of age in ways similar to what we do. People often take poor math abilities as an example that the LLM isn't actually reasoning like we are. I think that's mostly a training artefact. There is not enough pressure on the model to learn mathematical concepts, so it instead learns shortcuts to produce plausible answers. However, concepts like age, sex, and size are quite well represented because they are very useful for answering the types of prompts the model was trained for.
@muschgathloosia5875
@muschgathloosia5875 3 ай бұрын
@@marigold2257 I'm not claiming it has any 'understanding' I'm just saying that the vector of tokens created is probably relevant to semantics more than just happenstance. I'm not putting any merit on the output of a generative model just the intermediary organization of the data.
@Dialethian
@Dialethian 3 ай бұрын
Oh the overlap with mundane cryptography could be interesting. The order of words could be scrambled between two outputs. The idea of synthesizing sound that says different things if you understand different languages is kinda horrifying.
@ianmoore5502
@ianmoore5502 3 ай бұрын
Or sounds which mean the same thing in multiple languages. What a time to be alive!
@Gabu_
@Gabu_ 3 ай бұрын
That's already a real thing!
@noahjacobs7486
@noahjacobs7486 3 ай бұрын
We could create infinite laurel/yanny prompts or images that have hidden details for color blind individuals
@istinaanitsi3342
@istinaanitsi3342 3 ай бұрын
That would be a problem for the English language; Russian has built-in protection against that kind of silliness, and children know how to use it in games in Russian
@istinaanitsi3342
@istinaanitsi3342 3 ай бұрын
@@minhuang8848 Your brain is just playing tricks on you
@Levaaant
@Levaaant 3 ай бұрын
0:27 can we get the link to that please?
@Brightguy858
@Brightguy858 3 ай бұрын
It's GitHub
@iropiupiu9642
@iropiupiu9642 3 ай бұрын
I want the link too (._.)
@miguelabro
@miguelabro 2 ай бұрын
Get this comment up
@amarissimus29
@amarissimus29 3 ай бұрын
In the settings of automatic1111, you can enable a clipskip slider right up top next to your model, vae, etc. Very useful if you're playing around with CLIP, especially when you've got novel length prompts. Doesn't really help you understand how the vector spaces really work, but it does help you to pretend to understand how they work.
@eler90
@eler90 3 ай бұрын
Just don't forget that "but it works either way" really means that scientists have tried, I would assume, thousands of ideas regarding the network architectures, hyperparameters, etc., and only some of those ideas have worked well enough to allow for the next step. Showcasing results is one thing, developing the models another. It's hard work.
@macronencer
@macronencer 3 ай бұрын
The one where you combine the four transparencies together is a very cool new form of steganography. Excellent!
@BarnabyPine
@BarnabyPine 3 ай бұрын
Salvador Dali has a painting which looks like a woman in a dress going through a door in some kind of cubic world. When you go to take a picture of it, it looks like a pixelated Abraham Lincoln
@1alexandermichael
@1alexandermichael 3 ай бұрын
That is an example of a hybrid image
@RFC3514
@RFC3514 3 ай бұрын
That's basically a highpass / lowpass image (close up you see the fine details, further away you only see the big blocks). They're not hard to make. There's one in this video at 14:36 (not pixelated, but it's still the same highpass / lowpass concept). P.S. - I'm pretty sure the woman in Dali's Lincoln painting isn't in a dress, unless the fabric is incredibly thin. 😉
@maxnami
@maxnami 3 ай бұрын
Canadian artist Rob Gonsalves used to do that kind of painting. Search for his works on the internet.
@ghislainbugnicourt3709
@ghislainbugnicourt3709 3 ай бұрын
@@RFC3514 I agree with you but "they're not hard to make" is misleading. Some are hard to make. One example I like from Dali is The Hallucinogenic Toreador, where the same effect is used but with a smaller scale difference. I believe that's much harder to make, and that's without even considering the artistic aspect.
@caesarpizza1338
@caesarpizza1338 3 ай бұрын
@user-gt5df8yt1v What painting is this?
@jenshaglof8180
@jenshaglof8180 3 ай бұрын
Steve! This video actually taught me how text-to-image AI works. I've seen many videos about it but it still seemed like magic to me. Now, I actually understand the underlying process. Thank you so much!!!
@Tigrou7777
@Tigrou7777 3 ай бұрын
The idea of generating images by removing noise is just as crazy as LLMs that generate text by predicting the next word (these are gross simplifications, but that's basically what it is).
@istinaanitsi3342
@istinaanitsi3342 3 ай бұрын
That's how a stone or wood carver works; what's so special about that
@vectoralphaSec
@vectoralphaSec 3 ай бұрын
AI is just mathematical magic. It's amazing.
@istinaanitsi3342
@istinaanitsi3342 3 ай бұрын
@@vectoralphaSec Mathematics is the foundation of the world, but to you it's apparently just garbage
@nio804
@nio804 3 ай бұрын
It's even weirder than text prediction because the image model is trained to predict what noise was *added* to an image to make it noisier, and then by running that "backwards" on random noise you just happen to get an unreasonably efficient image generator.
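A runnable toy of that training objective (a plain linear layer stands in for the real U-Net, which would also be conditioned on the timestep and the prompt):

```python
# One training step of the standard noise-prediction (epsilon) objective:
# add known noise to a clean sample, then train the network to predict it.
import torch

model = torch.nn.Linear(64, 64)            # stand-in for a denoising U-Net
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

clean = torch.randn(8, 64)                 # pretend batch of flattened images
alpha_bar = torch.rand(8, 1)               # per-sample noise level for a random timestep
noise = torch.randn_like(clean)
noisy = alpha_bar.sqrt() * clean + (1 - alpha_bar).sqrt() * noise

pred_noise = model(noisy)                  # predict what noise was added
loss = torch.nn.functional.mse_loss(pred_noise, noise)
loss.backward()
opt.step()
```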
@istinaanitsi3342
@istinaanitsi3342 3 ай бұрын
@@nio804 It's not prediction, it's guessing at general patterns of signal contamination. I'm sure it only works under predefined conditions; it's an ordinary self-promotional trick
@lordfly88
@lordfly88 2 ай бұрын
Wow! That was incredible! It went from the most mind-bending optical puzzles to such a fantastic explanation of the whole thing. This is what YouTube is truly meant for.
@wholelottavideo8381
@wholelottavideo8381 3 ай бұрын
Amazing stuff. A word about your video editing. You have to give viewers enough time to assimilate the starting image before progressing to the secondary. Probably an extra second would do. When editing, you know what you are looking at, but a viewer doesn't. Wanted to rewind and pause all the time.
@MrBelles104
@MrBelles104 3 ай бұрын
I saw both video thumbnails pop up in my feed, noting the similarities, and I loved the opportunity you had to collab with Stand-up Maths!
@SimonJ57
@SimonJ57 3 ай бұрын
Seeing the pair have 3 different images (maybe a 4th) depending on the other square's orientation absolutely Blew, my, mind. And I would love to buy some.
@jonathanlevi2458
@jonathanlevi2458 3 ай бұрын
Dude, that was deep and understandable. Thanks!
@gameeverything816
@gameeverything816 3 ай бұрын
Whoa whoa @ 11:01 you just gonna gloss over that?! That was awesome! I wanna see more of that, that was wild!
@Cyrribrae
@Cyrribrae 2 ай бұрын
Yea that was the part that really blew my mind and was only briefly mentioned. Just so much cool stuff that just was out of reach before
@MerchantMarineGuy
@MerchantMarineGuy 3 ай бұрын
5:12 sports…..what??!??
@torchy_
@torchy_ 3 ай бұрын
...jewish people...?????
@jackpoco
@jackpoco 3 ай бұрын
And again at 7:44
@Tiniuc
@Tiniuc 3 ай бұрын
12:10 dear god, a jigsaw puzzle with multiple answers!!!
@zacharydefeciani7890
@zacharydefeciani7890 3 ай бұрын
I saw that hidden matt parker at 2:52
@standupmaths
@standupmaths 3 ай бұрын
Highlight of the video.
@yuyurolfer
@yuyurolfer 3 ай бұрын
@@standupmaths It's him!
@zacharydefeciani7890
@zacharydefeciani7890 3 ай бұрын
@standupmaths there better be a steve mould in your video somewhere 😉
@legoworks-cg5hk
@legoworks-cg5hk 3 ай бұрын
​@@standupmathswhy don't you have that tick?
@loudej
@loudej 22 күн бұрын
This is the comment I was looking for
@maybud60
@maybud60 3 ай бұрын
Steve, it takes extraordinary talent to break down complex ideas into digestible pieces. Respect! Fascinating stuff.
@JimCoder
@JimCoder 3 ай бұрын
I suspect our own minds are filtering noise from those images to make sense of them. Then from another perspective that same noise becomes signal, yielding a different perceived image. Fascinating stuff reminiscent of Hofstadter's Godel Escher Bach.
@alquinn8576
@alquinn8576 3 ай бұрын
I wonder how my cat sees the world. Sometimes i think very different from me since they don't have the higher level concepts to make sense of nearly all of the human artifacts around them; i.e. it doesn't fit into their umwelt. I think the closest i came to understanding what that was like was when i overdosed on edibles and tried using my smart phone but nothing on it made any sense (I was trying to google what to do if you overdose on edibles, but I couldn't tell the app icons apart from one another).
@BrightBlueJim
@BrightBlueJim 3 ай бұрын
The cover of which is what I was reminded of when looking at the 3D robot dog: the book cover art is a 3D figure that appears as a 'G' in one orientation, an 'E' in another, and a 'B' in another.
@Yahboykatra
@Yahboykatra 3 ай бұрын
Love how this video describes generative ai images so well! Appreciate the video!
@SnakeSolidPL
@SnakeSolidPL 3 ай бұрын
1:01 poor rabbit being called trash by Steve
@NotBroihon
@NotBroihon 3 ай бұрын
😢
@Jus10Ed
@Jus10Ed 3 ай бұрын
It's cute. Kind of looks like a stained glass window.
@izzard
@izzard 3 ай бұрын
6:14 Heeeyyy… I thought we weren't using Lenna anymore?!
@BrightBlueJim
@BrightBlueJim 3 ай бұрын
What do you mean, "we"?
@apppples
@apppples 3 ай бұрын
​@@BrightBlueJim people who respect the wishes of exploited women whose images were used without their consent is a pretty good stand in for the word "we" in this context
@ker6349
@ker6349 3 ай бұрын
​@@BrightBlueJimpeople who understand that there are a significant number of better test images, including those which are made and distributed with the permission of the subject of the photograph. Lenna was publicly fine with it for a while IIRC but now she thinks it's unnecessary for a variety of reasons
@Mykasan
@Mykasan 3 ай бұрын
can you add an epilepsy warning for 2:46? i'm not particularly sensitive to rapid light changes but i know some people who are.
@landsgevaer
@landsgevaer 3 ай бұрын
This is my Video Of The Year! Excellent explanation of generative image AI with a pretty neat application too. Loved it! 💛
@binky_bun
@binky_bun 3 ай бұрын
The image in the thumbnail seems odd to me. Look at the top middle square. On the left it's got a light smudge near the top, but on the middle image of the transition that light smudge has become a dark heavy smudge, which in the right example becomes the rabbit's right ear. How come it changes intensity if it's the same image? Nothing about the duck's head changes, so why has the cloud changed, unless it's a different tile altogether?
@user-xsn5ozskwg
@user-xsn5ozskwg 3 ай бұрын
It's edited to be more convincing, because just like the underlying tech it's not actually as impressive to laymen as the people selling it want it to be.
@ker6349
@ker6349 3 ай бұрын
@@user-xsn5ozskwg it's the YouTuber making a profit off of views doing the thumbnail, homie, not the dudes who made the tech
@oliviervancantfort5327
@oliviervancantfort5327 3 ай бұрын
Too bad the cover image was edited. The left rabbit ear has basically disappeared on the duck image...
@robadkerson
@robadkerson 3 ай бұрын
You could have a puzzle that's a different picture no matter how you put it together
@jimburton5592
@jimburton5592 3 ай бұрын
I actually made one of those once. Didn't even take too long to design, and each of the possible ways to assemble the puzzle resulted in a unique image. Granted, it was only a 1 piece puzzle. But hey, it's a proof of concept, right?
@robadkerson
@robadkerson 3 ай бұрын
@@jimburton5592 nice! Not every arrangement has to work, you could even "seek" different solutions
@bobbob0507
@bobbob0507 3 ай бұрын
In other words, a normal puzzle
@KillerKatz12
@KillerKatz12 3 ай бұрын
@@bobbob0507 Well no because normal puzzles only make it possible for you to have one solution so you don’t get confused why your picture doesn’t look right. Basically the pieces only fit with certain pieces even if you try to jam them into a different one it will be slightly off size.
@moonrock41
@moonrock41 3 ай бұрын
You'd probably need to limit the number of pictures to two, but it would still be considerably more challenging since you'd need to determine which picture the pieces you've assembled are intended for.
@voradorhylden3410
@voradorhylden3410 3 ай бұрын
This is awesome. This is art. Something awe inspiring and flips how you look at things. Forces a new perspective. Nicely done!
@BrightBlueJim
@BrightBlueJim 3 ай бұрын
But it's not art. The root word for "art" is the same as that for "artifact" and "artificial", which means (to me) that for something to be art, it must be man-made. Which makes the AI itself art, but not the picture. Sort of.
@Nicola-cg1rg
@Nicola-cg1rg 3 ай бұрын
Great explanation of diffusion models and how text prompts work! One of your better videos of late!
@NickDClements
@NickDClements 3 ай бұрын
0:40 Those are SKEWBITS!, by Make Anything! Well, the auxetic cube he first modeled that led to SKEWBITS. Your original 'Self-assembling material' video inspired him to try and make an auxetic cube that he could 3D print. He made the files available for download, someone else then printed them, used it for this purpose, and now they are in this video. YouTube is amazing!
@alex.g7317
@alex.g7317 3 ай бұрын
At 2:00 I have never understood generative AI more. I love this explanation.
@chaos.corner
@chaos.corner 3 ай бұрын
The topic puts me in mind of ambigrams. I've created a few and it's all about getting enough features to trigger the word recognition in one orientation without destroying the recognition in the other direction. And vice versa.
@BrightBlueJim
@BrightBlueJim 3 ай бұрын
Which is what I hate, hate, Hate, Hate, HATE about AI. What used to be a clever thing is now something you can make just by writing the appropriate prompt.
@natchu96
@natchu96 3 ай бұрын
​@BrightBlueJim Well...doing normally time-consuming tasks extremely quickly is pretty much what computers were created for...
@gaelonhays1712
@gaelonhays1712 3 ай бұрын
I've been doing drawings that do this for years; this was really cool to see. _This_ is what AI is meant to be used for. It's not gonna take over every human job, because humans will always find ways to use it that it couldn't think of on its own.
@LexanPanda
@LexanPanda 3 ай бұрын
I hit the bell on your channel years ago and watch every video, but this one didn't show up in my notifications, nor was it recommended alongside other videos like your videos usually are to me. I'm glad this was a collab with Matt or I may have gone quite a while without seeing it.
@G33v3s
@G33v3s 3 ай бұрын
You need to get Vi Hart in on this action with a hexaflexagon that has actual images on each orientation
@zackcinq-mars2129
@zackcinq-mars2129 3 ай бұрын
Wow, completely agree! That would be so cool!
@Commentator-jh7wl
@Commentator-jh7wl 3 ай бұрын
I wish her brain hadn't melted years ago :(
@DangerDurians
@DangerDurians 3 ай бұрын
Stopping it halfway is exactly how you would do it with physical media: do a sketch, reorient, edit the sketch, repeat
@skilletborne
@skilletborne 20 күн бұрын
Right??? A real artist could have done it, but they were too lazy to
@maxlibz
@maxlibz 3 ай бұрын
2:46 epilepsy warning
@resourceress7
@resourceress7 3 ай бұрын
Yes, PLEASE PIN this comment. Thanks
@DavidTheHypnotist
@DavidTheHypnotist 3 ай бұрын
Thank you! That gave me a freaking headache!
@oliparkhouse
@oliparkhouse 3 ай бұрын
@SteveMould Would you mind editing in a 'flashing imagery warning' at the start of the video? YouTube's editor should allow a text box to be input ahead of the section with flashing, and shouldn't require you to re-upload. Thanks @maxlibz, kudos for putting the warning up. YouTube showed the comment just before the flashing began. Though I'm not epileptic, flashing imagery can trigger or worsen my migraines. Your effort has made a difference already. Thank you!
@yeahiagree1070
@yeahiagree1070 3 ай бұрын
@@oliparkhouse epilepsy is not a fashion accessory for you to wear to make yourself more interesting. Shut up
@piotrcthlu
@piotrcthlu 3 ай бұрын
Thank you, this should be Pinned.
@kyar0s539
@kyar0s539 3 ай бұрын
THANK YOU. This was really mindbreaking and inspiring. Love your channel, and loved this video. Love this kind of reflexion+ type where you take a very complex AI subject and decompose it bit by bit.
@BrightBlueJim
@BrightBlueJim 3 ай бұрын
I don't know how you do it, but every video of yours I see is fantastic.
@74Gee
@74Gee 3 ай бұрын
Your description of diffuser, large language and clip models, and how they relate/interact was the best I've heard so far. I can only imagine the enlightening journey it took to explain this so succinctly.
@illusion-xiii
@illusion-xiii 3 ай бұрын
Is it just me, or around 1:40 does it really look like the illusion is going to resolve into Yoda for a moment?
@GreenCat188
@GreenCat188 3 ай бұрын
Nyoda Cat?
@MrMattie725
@MrMattie725 3 ай бұрын
Are we ignoring that the rotated first-draft giraffe at 11:38 was already the most stereotypical penguin image one could think of? :o
@rai2880
@rai2880 3 ай бұрын
And the reverse penguin was also a giraffe
@moncef2733
@moncef2733 Ай бұрын
I think it's an editing mistake; they swapped the two images without noticing because it was too blurry xD
@NitoTerrania
@NitoTerrania 20 күн бұрын
What the...???? How the heck did YouTube fail to recommend this video to me??? I subscribed. I watch almost every single video on this channel. And somehow this video was not recommended to me at all? Not even once?
@MikkoRantalainen
@MikkoRantalainen 3 ай бұрын
Great video! I consider this video to be mostly about creative visual hack that depends on human visual understanding but it also happens to be one of the best introductions to noise diffusion image generators, too.
@RjWolf3000
@RjWolf3000 3 ай бұрын
That rotating set that created 3 or more images is interesting. Could AI generate a bunch of layers that, when rotated, show an animated scene? That could make a really interesting sign or clock with a mechanical animation.
@joshlake1882
@joshlake1882 3 ай бұрын
I had the same idea, I’d love to make a rotating layer display.
@Simple_But_Expensive
@Simple_But_Expensive 3 ай бұрын
You mentioned not training with human data to eliminate bias, but I have seen mathematical arguments that bias is unavoidable. There were several papers and videos, but the only one I remember was an episode of Nova discussing how use of AI in predictive law enforcement in Oakland, California led to heavy handed responses in one neighborhood while ignoring rising crime in another. Admittedly, the math was way over my head, but it seemed pretty convincing. The problem basically lies not in the training data itself, but in the selection of training data. Something along the lines of having university students select a set of images of men. The students unconsciously biased the data set by selecting a majority of younger more attractive and apparently more affluent white men by 58%. Another example was Google’s AI refusing to show any white men in images of the founding fathers of the USA. (Which is confusing because they were all old white men. Talk about bias!) Trying to select the data completely randomly only proved that we can only generate pseudo random numbers, yielding pseudo random sets. The bias can be minimized, but never completely eliminated. In the end, any AI will be a reflection of us, both the good and the bad in all of us. That is what is scary about AI.
@DanKaschel
@DanKaschel 3 ай бұрын
I think this overstates the severity of the problem. Sometimes AI is thought of as a really sophisticated calculator, and indications that its answers might be incorrect are an existential threat. But AI is maybe more like... Marketing. We get iteratively better at creating AI that will achieve our goals, and with time we will build more and more expertise at accelerating that process. The fact that AI in its current form is not capable of solving certain problems perfectly is scary in the sense that we can't cure cancer with medicine. It's unfortunate, but not necessarily unsolvable and certainly not intrinsic (except to specific approaches).
@Gabu_
@Gabu_ 3 ай бұрын
Your very first point is "It isn't a problem with the training data, it's just a problem with the training data"... Maybe think a bit longer on your argument.
@LesenundDenken
@LesenundDenken 3 ай бұрын
The internet, like history, art, and pretty much any human cultural artifact, is humanity's Caliban's mirror.
@Yottenburgen
@Yottenburgen 3 ай бұрын
The google thing was likely instructional bias tbh rather than something trained into it. But that really just points into bias on both parts, what you put in and what is already inside of it.
@Yottenburgen
@Yottenburgen 3 ай бұрын
@@Gabu_ Human training data is different, it means random quality datasets that humans have a 100% hand in creating, it matters what you put in but every dataset even if it isn't explicitly human consolidated is biased. Even if a LLM were to create its own dataset it would still be human based as it inherited a human bias.
@Lampe2020
@Lampe2020 3 ай бұрын
6:35 You've accidentally created a demon cat XD
@AnaSnyder-j4x
@AnaSnyder-j4x 3 ай бұрын
This content is always full of useful and practical knowledge.
@unfa00
@unfa00 3 ай бұрын
Steve, your clear explanation makes me want to try and make such a puzzle myself. My idea is I could model something and animate it so I can easily switch between two different states and paint digitally. Like painting on 4 separate cards while seeing them all juxtaposed. It seems possible to do manually with digital painting. Way, way harder to do with purely physical tools I guess. I'd wager an artist could make these, maybe even better than the AI can. The drawings that portray one thing, and then another thing when upside down, have been made by human artists already. The process you've described on how AI does it makes it seem to me like I could do it, even being mediocre at painting/drawing.
@dibbidydoo4318
@dibbidydoo4318 3 ай бұрын
The upside-down thing is the classic, but there are more complicated tricks you can do with GenAI.
@johannesstephanusroos4969
@johannesstephanusroos4969 3 ай бұрын
​@@dibbidydoo4318 Could you tell me how, please? I have a friend who's obsessed with ducks, so anything that changes from a duck to something else and back would be amazing. I'd really like to make one for them
@zynskeyfolf
@zynskeyfolf 2 ай бұрын
A wild unfa spotted
@justpassnthru
@justpassnthru 3 ай бұрын
I remember, back in the 70's, there was a drawing of a "prom queen" with crown and all but when turned upside down was a picture of an old woman. It was a classic. Very simplistic compared to this but the same idea.
@almendratlilkouatl
@almendratlilkouatl 3 ай бұрын
oh yeah, and if you put it on the side you can see the beatles and aleister crowley riding a whale on the pyramid of Tolotsin the ancient god of fire and the sun and if you fold it at 33 degrees you get the masonic token to unlock the next level
@ranjitkonkar9067
@ranjitkonkar9067 3 ай бұрын
I remember that. Still used as an optical illusion example. Except that you didn't have to turn it around, did you? Just took a shift in perspectives to suddenly start seeing the other one.
@michalpifko3516
@michalpifko3516 3 ай бұрын
@@ranjitkonkar9067 There are two commonly used optical illusions that show a young/old woman. One involves rotating the image (the one OP was talking about) and it often comes with text that says "before 6 beers/after 6 beers". The other is the one you are probably remembering (you can see a profile of an old woman or a young woman looking away from the picture).
@khutikhuti
@khutikhuti 3 ай бұрын
Sneaking in a Matt Parker pic in the images there 😂👌
@Nakatoa0taku
@Nakatoa0taku 3 ай бұрын
Your sponsor sounds like insider trading with extra steps 😂
@PeetSneekes
@PeetSneekes 3 ай бұрын
You know, I never want to watch these videos, but when I do, I’m mesmerized, fascinated and happier. Thank you!
@Coksnuss
@Coksnuss 3 ай бұрын
What an excellent video that explains quite accurately (enough) how generative models work at a fundamental level.
@Grim_Beard
@Grim_Beard 3 ай бұрын
04:00 Sorry, Steve, but this is a very misleading explanation of Large Language Models (LLMs). LLMs do _not_ 'understand' text, and they _don't_ have semantic knowledge (e.g. that 'blue boat' means that the boat is blue). The model doesn't know what a boat is, or what blue is, or what it means for a boat to be blue. All it knows is that certain words (actually tokens, which might be words, parts of words, or combinations of words) go together at certain frequencies. LLMs do not have 'meanings', just probabilities of tokens occurring together.
@Grim_Beard
@Grim_Beard 3 ай бұрын
@Singularity606 Unsure why you think I feel "so strongly about this". I just thought Steve, who generally likes to give accurate information, might want to, you know, give accurate information. He can't correct errors if no-one points them out. Also unsure why you're giving misinformation about LLMs, which do _not_ have semantic knowledge. The fact that a prompt like 'blue boat' can be used to generate an image of a blue boat does not mean that either the LLM or the diffusion model has any semantic knowledge. No more than a checkout recognising a barcode as belonging to a banana and displaying a price means that the till knows what a 'banana' is or has any concept of either food or money.
@Grim_Beard
@Grim_Beard 3 ай бұрын
@Singularity606 No, I'm talking about meaning not 'qualia' (which is a silly concept invented by a philosopher who doesn't understand cognitive neuroscience or psychology). You know what a boat is, what it does, how it works, where you're likely to find one, what it's used for, and so on. To you, 'boat' is not just a token that appears in some sentences, it _means_ something. LLMs don't have that. In an LLM 'boat' is just a token, that is statistically associated with other tokens.
@Grim_Beard
@Grim_Beard 3 ай бұрын
@Singularity606 The word is literally just a token in the LLM's data set. The LLM has no understanding of meaning, it only (1) calculates statistical associations between tokens in training and then (2) uses them to generate output. This is not controversial, it's very basic, fundamental stuff about how LLMs work.
@alansmithee419
@alansmithee419 3 ай бұрын
​@@Grim_Beard It's very basic, fundamental stuff about how LLMs are *trained.* That does not necessarily tell us anything about how it actually performs that task internally within the model. AIs are often called a black box for this reason, and we are perpetually confused as to just *how* they perform so well. Perhaps the reason for this is that understanding is not so difficult to achieve as we'd expect. If you ask the LLM what a boat is it will tell you. If you ask the LLM what will happen if a broken boat is placed in water it will tell you. If you ask the LLM what a good tool for moving items over seas is it will tell you (it's a boat). These imply understanding of some form to me, even if it is not the exact same as the understanding we have. Yes internally it's "just a token." But it knows the relationship of that token to other tokens and how they can be put together to form coherent messages, and it can derive information about the world from these relationships. That is language, and (to me) that is understanding. Even if it is not a language any human speaks, being more numerical in nature, it remains a language with meaningful syntax and the ability to perform the task of any human language. The LLM understands this language, and we simply translate for it on either side of the process. Words in the human brain are "just electrical signals" that we know the relationship of to other electrical signals and how they interact with each other to allow us to form coherent messages, and we can derive information about the world from these electrical signals. We have more types of data than the AI, but that doesn't inherently mean that we understand and they don't, just that they understand less or differently. Ultimately the only way you can claim that AI doesn't understand (or does, my above statement that they do is just as subjective as your statement that they don't) is to first provide a solid definition of what you mean by "understanding." The word has no set definition, so unless you tell people what specifically you mean when you say that you are not communicating your thoughts in their full form. And in any case you cannot state this not understanding as being a known fact that others are incorrect about. They are simply using a different definition of this ill defined word to you. They are not wrong.
@shiuay6165
@shiuay6165 3 ай бұрын
​@@alansmithee419Thank you very much, that's exactly what I wanted to respond to this comment and yours saved me quite some time ! I find it weird that people will go and "correct" people like that, while being so horribly confident in their "knowledge", saying things like "this is basic knowledge/facts about LLMs". This guy even has 10 likes wtf, how can anyone not think a minute about defining what is "semantics", "understanding" or even "knowing" before arguing if current LLMs have such things. Guys, please define the terms you are using before asking if LLMs have those !
@TheLurker
@TheLurker 3 ай бұрын
Hey! Just a heads up that this video uses the Lenna image at 6:14. This is a playboy centerfold that was used for decades as a test image in digital image processing, but it's generally frowned upon to use it now, because it's a vestige of misogyny from the 1970s in tech. Its use has also historically privileged lighter skin tones over darker ones. It's worth going and reading about the history of this image and how it got into such wide use, and why folks consider it harmful in this day and age if you want to know more.
@Hyperlooper
@Hyperlooper 3 ай бұрын
Bring back lenna
@morphentropic
@morphentropic 3 ай бұрын
Which folks?
@alquinn8576
@alquinn8576 3 ай бұрын
i'm here for the flippy image stuff, not this woke BS
@Hyperlooper
@Hyperlooper 3 ай бұрын
​@@morphentropicyou know, "folks". Same ones who don't mind your cat getting eaten.
@jay_13875
@jay_13875 3 ай бұрын
Nobody asked
@anoobis117
@anoobis117 3 ай бұрын
It's not really an illusion in my opinion. It's just a fancy way of putting images together creatively. An illusion would imply there is some sort of visual trickery involved to make you think what you're seeing is something else, or that it exploits the visual cortex to produce hallucinatory artifacts. This does not do either.
@JimC
@JimC 3 ай бұрын
I agree with you completely. But what do we call it instead? I can't think of another word.
@handsbasic
@handsbasic 3 ай бұрын
i think we conclude that all visual perception is an illusion because of our object recognition meat “software.” i don’t think it’s such a radical conclusion.
@Foxmasker
@Foxmasker 3 ай бұрын
@@JimC A double image? Idk
@itsdonaldo
@itsdonaldo 3 ай бұрын
That red rabbit is amazing. The red hair looked naturally red and the red coat was a neat touch to match the top hat. That lil bugger needs a movie
@alansmithee419
@alansmithee419 3 ай бұрын
And the movie shall be called "lil bugger"
@DaleHawkins
@DaleHawkins 3 ай бұрын
Thanks!
@andrewcullen7671
@andrewcullen7671 3 ай бұрын
In the storied traditions of computational neuroscience, this video is a competent procedural explanation for the process of visual imagination. I wrote about this in my Master's thesis because I have aphantasia, and wanted to understand what other people could do, that I struggle with. In most people, the brain can generate real visual images in the occipital lobe based on words from the temporal lobe, eyes closed, no visual data. This process is how people have visual hallucinations - the brain generating visual data based on low-quality information. This is also why hallucinations are more common in one's peripheral vision and low light. People with aphantasia, including some hyperverbal autistic people, often require high quality visual data, so they can't imagine anything with their eyes closed, even picturing something that happened earlier that day, or their loved one's face. But the process of visual imagination works very much like diffusion. If a person pictures an apple, they may get a fuzzy red blob at first, and then the brain fills in more and more details based on previous experiences with apples. if I try this, I just think of the definition of an apple. Weirdly, I'm an abstract surrealist painter and art teacher - no visual imagination. I can't remember what my mom looks like.
@glennac
@glennac 3 ай бұрын
You would have been a great case study for Oliver Sacks or V. S. Ramachandran. Both have written fascinating books about neuroscience and the many divergent ways the brain functions in certain individuals. May I ask, if you can’t “picture” your mother visually when you two are apart, what cues do you rely on to establish that relationship? Do you “hear” or recall her voice? Are there behavioral mannerisms of hers that reinforce your relationship with her when you two are apart? Thank you for sharing your experience. 🙏🏼
@WindsorMason
@WindsorMason 3 ай бұрын
Fascinating!
@tdata545
@tdata545 3 ай бұрын
With the Duck and Rabbit, I can see both and where both transforms in each form. But these overlays are crazy.
@ramadrian248
@ramadrian248 3 ай бұрын
12:04 You're welcome
@m.sierra5258
@m.sierra5258 3 ай бұрын
1:23 Where can I get the Yoda VS weeping angel illusion? I really want that one for myself, it's awesome
@rayscotchcoulton
@rayscotchcoulton 3 ай бұрын
AI thoughts and comments aside, the angel-statue-to-Yoda transformation at 1:24 is absurdly clean and made me laugh out loud
@seekyunbounded9273
@seekyunbounded9273 3 ай бұрын
15:00 It's about how it's handled. If it's handled by turning words into tokens, it literally can't see what the word is made of and will just rely on the probabilities the input text taught it
@jeremiasrobinson
@jeremiasrobinson 3 ай бұрын
I made images that morph into each other using mirrors and anamorphism. As the viewer changes their position, the image morphs.
@jeremiasrobinson
@jeremiasrobinson 3 ай бұрын
I feel like I had to take a similar approach to these robots.
@VitorMiguell
@VitorMiguell 3 ай бұрын
You posted it somewhere?
@jeremiasrobinson
@jeremiasrobinson 3 ай бұрын
@@VitorMiguell I have videos of prototypes. I was living outdoors when I was making these, and they ended up only lasting for a few days each time I made them because if the temperature or humidity or something changes then the little boxes I made these in changed shape just enough to mess it up. I plan on making a better one soon, though, as now I live in a house and I can potentially make one large enough to put your head inside of the get the proper morphing effect. The problem with looking at it from outside of something is that when you move, something blocks your view, so there is an interruption in the morph. Keep that in mind as you look at this prototype. kzbin.info/www/bejne/Y6TXhKiBotiYm5o
@StevenGallman-g9c
@StevenGallman-g9c 3 ай бұрын
You explain even the most difficult concepts so well.
@CuJixBeatZ
@CuJixBeatZ 3 ай бұрын
Now that's a great explanation of how diffusion models work with the noise! Felt like I learned something new
@Gna-rn7zx
@Gna-rn7zx 3 ай бұрын
Thumbnail is a bit misleading... the duck image was altered.
@baki2200
@baki2200 3 ай бұрын
The rabbit image was altered too! Look at the duck beak
@CountJeffula
@CountJeffula 3 ай бұрын
I wonder what Plato would think of the fact that we are quite literally creating a Theory of Forms where abstract ideas are no longer merely figments of human imagination, but destinations in a multidimensional vector space that can be visited repeatedly and used in increasingly novel ways. I’m sure Aristotle would need to think on it for a while given his views on Plato’s theory.
@brixiu5
@brixiu5 3 ай бұрын
I'm a little bit annoyed that the thumbnail is so obviously edited. The duck on the left has part of the square erased to make it look like the tool was better than it actually was.
@jeffrey5464
@jeffrey5464 2 ай бұрын
6:34 wasn't expecting a horror movie when I clicked on a Steve Mould video
@HolyGarbage
@HolyGarbage 2 ай бұрын
This was incredibly fascinating. Thank you Steve and Matt.
@harrygreenfeld4964
@harrygreenfeld4964 3 ай бұрын
As long as all the data that gets scraped gets due credit or paid as necessary, not a problem.
@cmyk8964
@cmyk8964 3 ай бұрын
Yes, that’s the biggest problem with GenAI in its current state. It’s created from mostly pirated copyrighted works or sensitive personal data.
@user-xsn5ozskwg
@user-xsn5ozskwg 3 ай бұрын
Considering one of the illusions had Yoda something tells me even the most ethical proponents of the tech aren't interested in that.
@hurrdurr7861
@hurrdurr7861 3 ай бұрын
Your old world ideas of ownership of visuals is long gone.
@miauzure3960
@miauzure3960 3 ай бұрын
Why is there no "AI" in the title!! This is one of the best explanations of AI diffusion
@IMakeUnitaleThings
@IMakeUnitaleThings 3 ай бұрын
Probably because AI and images is usually a bad thing and would minimize views, like putting nfts into the title or something like that
@J8_official
@J8_official 3 ай бұрын
Are AI tools being developed for languages other than English? Sentence structures and grammar differ in other languages, so what the AI has learned about interpreting a sentence in English will not be valid.
@Dialethian
@Dialethian 3 ай бұрын
I haven't checked, but I expect larger text models would have some ability with other languages from the scraped datasets. They might be passable at Spanish due to accidental sampling, getting the grammar structure right more often than the spelling at this point. Chinese might be harder since the model is optimized around character clusters, not logographs?
@len9505
@len9505 3 ай бұрын
I would think they just need data on those languages.
@almendratlilkouatl
@almendratlilkouatl 3 ай бұрын
yes, only english, even the chinesians and russianic peoples are using only english data, the oldwolrdorder is expecting all other languages by 2028
@Yotanido
@Yotanido 3 ай бұрын
I've never actually tried this, so I decided to give Stable Diffusion some German. (Specifically I'm using Dreamshaper XL Turbo, CFG 3, 7 steps) It... works? My prompt "Ein Foto einer Tasse aus Baumrinde" (A photo of a cup made of tree bark) did generate an image of a cup and it had some tree parts in the background. The cup was definitely porcelain though. Using the same seed (noise pattern) with the English equivalent did yield a cup that was actually made of bark. So it's significantly worse, but it understood the gist of it. What tools like ChatGPT do to generate an image from prompts in other languages, is that they just translate to English first. It's not ideal, but I suspect you would have to specifically train the image models for each language individually, and that would be WAY too expensive.
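For anyone who wants to repeat that same-seed comparison, a hedged sketch using Hugging Face diffusers (the checkpoint name is a placeholder, swap in whichever SDXL-Turbo-style model you actually use; the step count and CFG value are just the ones mentioned above):

```python
# Reuse the same starting noise for two prompts by re-seeding the generator.
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo",              # placeholder checkpoint id
    torch_dtype=torch.float16,
).to("cuda")

prompts = [
    "Ein Foto einer Tasse aus Baumrinde",  # German
    "A photo of a cup made of tree bark",  # English
]
for i, prompt in enumerate(prompts):
    gen = torch.Generator("cuda").manual_seed(42)   # same seed -> same initial noise
    image = pipe(prompt, num_inference_steps=7, guidance_scale=3.0,
                 generator=gen).images[0]
    image.save(f"cup_{i}.png")
```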
@Skibidtoulet1234
@Skibidtoulet1234 3 ай бұрын
From my experience with ChatGPT, its Spanish is pretty rough, and it used to (and may still) make grammar mistakes that make it hard to understand. Most AIs are built for English because of its relative simplicity and flexibility.
@xyznihall
@xyznihall 3 ай бұрын
Great video. Loved the explanation of diffusion models!
@chrisimir
@chrisimir 3 ай бұрын
my brain diffusion model couldn't stop detecting your hoodie's logo as video subtitles
@autotwilo
@autotwilo 3 ай бұрын
LLMs don't discover semantics, they identify probabilistic associations. Semantic meaning is never a part of their learning - these are closer to information entropy representations of bodies of text.
@benjaminavdicevic
@benjaminavdicevic 3 ай бұрын
This! Steve is misinforming his viewers.
@Raygun9000
@Raygun9000 3 ай бұрын
This did strike me as odd until I noticed the script was ai generated. 😉
@thp4983
@thp4983 3 ай бұрын
From a philosophical perspective semantic meaning is nothing but a set of probabilistic associations, i.e. one persons semantic understanding is all but guaranteed to differ from someone elses. Language and understanding of the same is in constant flux, any endeavor to model semantics that do not account for this flux, will never be able to capture language nor understanding. If your understanding of "semantics" is a rigid target, then your semantics of that target, is fundamentally different from the semantics of language. One could say that your weighted associations of the term, differ from those of another's understanding and semantics, and how would you capture this difference in semantics, without identifying these weighted associations and their probabilities?
@dibbidydoo4318
@dibbidydoo4318 3 ай бұрын
if semantics wasn't probabilistic associations then why do certain words mean different things to different people?
@natchu96
@natchu96 3 ай бұрын
@@dibbidydoo4318 Or even mean different things to the same people when in a different context. Telling someone they've gotten the first-place prize has severely different connotations if they're participating in a game show as opposed to if they're in a group waiting to be executed.
@jarydm87
@jarydm87 3 ай бұрын
What about a generative ai song that sounds legible and good played forward and in reverse?
@frostden
@frostden 3 ай бұрын
JOIN THE NAVY!
@jarydm87
@jarydm87 3 ай бұрын
YVAN EHT NIOJ
@dibbidydoo4318
@dibbidydoo4318 3 ай бұрын
it's quite similar to that sora video where you can choose the end frame of the video and the start frame of the video so you can create a loop.
@redandpigradioshows
@redandpigradioshows 3 ай бұрын
This video recaptures the fascination I had for AI before the investment bubble killed it. One day the bubble will pop and we'll be back to this kind of application
@gaggix7095
@gaggix7095 3 ай бұрын
This stuff is like 1 year old. Nothing has changed, you can still use SD on your PC.
@TheChrisLeone
@TheChrisLeone 2 ай бұрын
The scannable QR code art is unbelievable, I've seen some that are cityscapes, bookshelves, groups of people, works of art, you name it! A lot of them looked nothing like a QR code but were totally scannable
@OllAxe
@OllAxe 3 ай бұрын
I gotta ask, did you record this in 24 or 30 fps by accident and then apply optical flow to upsample it to 60 fps?