37C3 - Self-cannibalizing AI

7,072 views

media.ccc.de

4 months ago

Artistic Strategies to expose generative text-to-image models
What occurs when machines learn from one another and engage in self-cannibalism within the generative process? Can an image model identify the happiest person or determine ethnicity from a random image? Most state-of-the-art text-to-image implementations rely on the same few limited datasets, models, and algorithms. These models, initially appearing as black boxes, reveal complex pipelines of multiple linked models and algorithms upon closer examination. We use artistic strategies like feedback, misuse, and hacking to crack open the inner workings of image-generation models. This includes recursively confronting models with their own output, deconstructing text-to-image pipelines, labelling images, and discovering unexpected correlations. During the talk, we will share our experiments on investigating Stable Diffusion pipelines, manipulating aesthetic scoring in large public text-to-image datasets, probing NSFW classification, and utilizing Contrastive Language-Image Pre-training (CLIP) to reveal biases and problematic correlations inherent in the daily use of these models.
The talk will be structured around various experiments we have done under the umbrella of generative AI models. We will begin with a general idea of how we, as artists/programmers, perceive these models and our research on how these constructs work. Then we will elaborate on our exploration of the Stable Diffusion pipeline and datasets. Throughout our investigation, we discovered that some essential parts are all based on the same few datasets, models, and algorithms. This led us to think that by digging deeper into a few specific mechanisms, we might be able to reflect on the bigger picture of the political discourse surrounding generative AI models. We deconstructed the models into three parts essential to understanding how they work: dataset, embedding, and diffusion. Our examples are primarily based on Stable Diffusion, but many of the concepts carry over to other generative models.
As datasets and machine-learning models grow in scale and complexity, understanding their nuances becomes challenging. Large datasets, like the one used to train Stable Diffusion, are filtered using algorithms that often themselves employ machine learning. To "enhance" image generation, LAION's extensive dataset was filtered with an aesthetic prediction algorithm that uses machine learning to score the aesthetics of an image, with a strong bias towards watercolor and oil paintings. Besides aesthetic scoring, images are also scored with a not-safe-for-work (NSFW) classifier that outputs the probability of an image containing explicit content. This algorithm comes with its own discriminatory tendencies, which we explore in the talk; it also raises the question of how, and by whom, we want our datasets to be filtered and constructed.
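To make the filtering step concrete, here is a minimal sketch of such CLIP-based dataset scoring. The two scoring heads below are untrained stand-ins: LAION's actual aesthetic and NSFW predictors are separately trained small models, so this only illustrates the shape of the pipeline, not the real weights or thresholds.

import torch
import open_clip
from PIL import Image

# CLIP image encoder; LAION's predictors operate on embeddings like these.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-L-14", pretrained="openai")

# Hypothetical stand-ins for the trained scoring heads (random weights here).
aesthetic_head = torch.nn.Linear(768, 1)  # predicts a score, roughly 1..10
nsfw_head = torch.nn.Linear(768, 1)       # predicts NSFW probability (after sigmoid)

def keep_image(path, min_aesthetic=5.0, max_nsfw=0.5):
    """Filter decision: keep an image only if it scores "aesthetic enough"
    and "safe enough". Thresholds are illustrative."""
    image = preprocess(Image.open(path)).unsqueeze(0)
    with torch.no_grad():
        emb = model.encode_image(image)
        emb = emb / emb.norm(dim=-1, keepdim=True)
        aesthetic = aesthetic_head(emb).item()
        p_nsfw = torch.sigmoid(nsfw_head(emb)).item()
    return aesthetic >= min_aesthetic and p_nsfw <= max_nsfw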
Many generative models are built upon Contrastive Language-Image Pre-training (CLIP) and its open-source counterpart, Open-CLIP, which learn to relate images and texts. These models encode images and text into a shared vector space and calculate distances between words and images. However, they rely on an enormous number of text-image pairs during training, which can carry the biases of those pairs into the model. We conducted experiments involving various "false labelling" scenarios and identified correlations. For instance, we used faces from ThisPersonDoesNotExist to rank faces by "happiness", probed which ethnicities and occupations the model assigns to different looks, and analyzed stock images of culturally diverse food. The results often align with human predictions, but does that mean anything?
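As an illustration, a minimal sketch of this kind of zero-shot labelling probe using Open-CLIP; the label texts and image path are placeholders, not the data from the talk.

import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

# Arbitrary labels: CLIP will happily rank any image against any text.
labels = ["a photo of a very happy person", "a photo of an unhappy person"]
text = tokenizer(labels)
image = preprocess(Image.open("face.jpg")).unsqueeze(0)  # placeholder path

with torch.no_grad():
    img_emb = model.encode_image(image)
    txt_emb = model.encode_text(text)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_emb @ txt_emb.T).softmax(dim=-1)

for label, p in zip(labels, probs[0]):
    print(f"{label}: {p:.2%}")  # a confident answer, whether or not it means anything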
In the third part, we take a closer look at the image-generation process, focusing on the Stable Diffusion pipeline. Generative AI models like Stable Diffusion can not only generate images from text descriptions but also process existing images. Depending on the settings, they can reproduce input images with great accuracy. However, when this AI reproduction is recursively used as input, errors accumulate with each iteration. We observed that, depending on the parameters and settings, images gradually transform into purple patterns or collapse into a limited set of mundane concepts. This raises questions about the models' tendency to default to learned patterns.
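A sketch of such a feedback loop with the diffusers library; the model ID, prompt, strength, and iteration count are assumptions for illustration, not the presenters' exact setup.

import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

image = Image.open("input.jpg").convert("RGB").resize((512, 512))
prompt = "a photograph"  # deliberately generic

for step in range(50):
    # Feed each output back in as the next input; re-encoding errors
    # compound, and the image drifts toward the model's learned defaults.
    image = pipe(prompt=prompt, image=image, strength=0.5).images[0]
    image.save(f"iteration_{step:03d}.png")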
Ting-Chun Liu
Leon-Etienne Kühr
events.ccc.de/congress/2023/h...
#37c3 #ArtBeauty

Comments: 19
@movAX13h 4 months ago
I don't think it's possible to train a net with a perfect balance between all categories of input images. So it's easy to understand why they all tend towards one type when not asked for a certain direction.
@sfdntk 4 months ago
Yet another truly superb CCC talk, well done guys. Fascinating subject, and the presentation was one of the best I've seen, you communicated a huge amount of information quickly and efficiently, didn't stumble over your words or mumble, and you were funny and engaging. I have to say, the concentration of amazing talks at CCC is unlike any other conference, so many disparate topics get covered by so many brilliant presenters, it's fantastic.
@arbeitslos4247 4 months ago
SD is at the moment the best method for liberal free speech open source generative AI without content filters. I cannot stress enough how important it is to archive all model versions before censorship, greed and politics destroy them.
@phieyl7105 3 months ago
In that video we are taking the loop midway through the diffusion cycle to get back to the fundamental latent space. We can treat different latent spaces as different roots of the space they define - they operate within their intervals. Deriving these roots from an original image results in a loss of information, as you would expect from a derivation function. What this study should show us is that we have the opportunity to create languages based on these complex vector spaces. Now, in terms of how this will affect the internet, you have to account for AI drift and the prospect of using AI to automate hypothesis generation. Either way would lead to a richer internet rather than a static one.
@5mxg 4 months ago
I wonder what kind of image analysis decides that the presentation video input is corrupted and blanks it out during the animations.
@TheBigLou13 3 months ago
Maybe YouTube tries to indicate what is shown and gets a meaning overflow... Or it's just a video encoding problem between the laptop and the beamer/recorder...
@haffolderhaus 4 months ago
Very interesting talk. As long as you can still recognize the errors of the AI, there are still ways to change things. I do not use AI imaging processes, so the errors shown are not (yet) relevant for me - but ultimately they are for society. The passport photo of every German with an ID card or passport is stored. At some point there will be software that can draw "phantom images" (composite sketches) from voice input - and this AI could have been trained with all of the images mentioned above. There shouldn't be any mistakes!
@Daniel-ve8oi 4 months ago
A great talk. Informative and funny - very well done!
@Monstermoerder1 2 months ago
Is there a reason that whenever the transitions are on screen, the screen turns black? These are some of the visually most interesting parts of the talk, and they're just not shown?
@joachimbergt9842 3 months ago
Could the purple/green degradation be related to the color space of the subtracted noise? If the noise is generated as YUV, the purple is on high U & V, and green on low U & V. In photography there is the concept of luma and chroma noise, and computer image encoding also uses YUV internally (instead of RGB).
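(For reference, a quick numerical check of the color claim above, using an approximate YUV-to-RGB conversion; this only verifies the color geometry of YUV, not what the diffusion pipeline actually does internally.)

def yuv_to_rgb(y, u, v):
    # Approximate BT.601-style conversion; y in [0, 1], u/v in [-0.5, 0.5].
    r = y + 1.140 * v
    g = y - 0.395 * u - 0.581 * v
    b = y + 2.032 * u
    return tuple(max(0.0, min(1.0, c)) for c in (r, g, b))

print(yuv_to_rgb(0.5, 0.4, 0.4))    # high U & V -> (0.96, 0.11, 1.0): purple/magenta
print(yuv_to_rgb(0.5, -0.4, -0.4))  # low U & V  -> (0.04, 0.89, 0.0): green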
@noxirixon 4 months ago
Where can I see the full 10-minute latent space animation example from minute 31?
@kreterakete 4 months ago
So long, and thanks for all the fish
@RealisticExpectations 4 months ago
They learn at much higher rates. We learned that in 2022.
@n.i.g.e.l 4 months ago
Weird question - is anyone else getting this? When he shows the SD XL Turbo images at around 26:20, they look oddly familiar to me - I'm just naming one timestamp, but it feels like this for all the noise images it generates.
@NoXnk117 3 months ago
Why is there a picture of mine? :D Somebody used Google Images in his presentation...
@EstateCritique 4 months ago
1st
@mx338 4 months ago
If the computer creates the image, you're not an artist. A technically interesting talk nonetheless.
@dieSpinnt 4 months ago
If you program a filter and the computer then changes or creates an image, you are an artist. Some are great artists even by "accident", like Mandelbrot ;) If you reorder/remix images/music to produce something new, you are in most cases an artist. If you program a computer to statistically alter images ... which you stole ... and sell them as your own (or sell the illusion of creating something new), then you are a scammer, an egoist and a thief. Criminals belong behind bars. Not generated, but real ones ;)
@thewhitefalcon8539 3 months ago
You can be an artist by spraying paint randomly on a canvas, if there's a meaning to it. I think computers can be viewed the same. It's not artistically impressive to make a beach scene through Midjourney, but when you trick the model into doing something completely new, like the self-cannibalization loop animation, that is art.