I can't make any comments on legality because I have complicated feelings I'll put in the tags, but OP is right and the puzzle art is a really good analogy for how Stable Diffusion works. Like they said, the puzzle piece fitting happens at the pixel level to create "new" art.
As an example, this was taken directly from DALL-E 2 (a popular model and dataset) showing its variation function.
[Alt text: On the left, the original "Girl with a Pearl Earring" painting. On the right, AI-generated variations of the painting.]
The problem is that if you describe the image you want well enough, it will generate a straight-up stolen image instead of an amalgamation of images in a new, transformative way.
I use DALL-E to play a game with my AI team at work (yes, I'm one of those evil Artificial Intelligence engineers) where we generate art based on a piece of media and try to guess the media portrayed. It's a fun little game that helps get me going every morning.
Here is my entry for "Final Fantasy VII" along with the prompt I used to generate it.
[Alt text: The image prompt reads "A dark and moody anime image of a man with yellow spiky hair holding a large sword staring up at a skyscraper." The man also has a yellow cape despite that not being in the prompt.]
So... that yellow cape, huh? I wonder where that came from...
Honestly? I don't know enough anime to tell you offhand which yellow-caped characters led DALL-E to decide this render of Cloud needed one, but I can guarantee it pulled in enough anime images and fanart to have learned it somewhere.
And that's really the problem here. I did not tell the AI to add a cape. The cape is not a creation of the AI either because, as OP said, true sci-fi-level AI does not exist. The cape is there because the artwork it was trained on had enough anime men in capes to give Cloud one.
That's not a good thing.
How AI Datasets ACTUALLY Work
I love the irony of techbros shouting “YOU NEO-LUDDITES JUST DON’T KNOW HOW THE TECH WORKS” when they obviously don’t know how AI datasets that generate images work.
This is not true “artificial intelligence.” It doesn’t see images, form an understanding of them, then create something new. It’s not like a person looking at photos of frogs and then making a new painting of a frog.
That’s what tech bros seem to not get. True artificial intelligence WOULD be able to do that.
But the machine being fed real art is NOT artificially intelligent. What it does is take dozens of images and break them up into teeny tiny pieces, like a 5000-piece jigsaw puzzle but on a near pixel level. It also takes the understanding of those images based on what people say those images are.
Most puzzle-makers have the same die-cut, which means you can take pieces from multiple puzzles and put them together into something new. I won’t link it here so this post reaches the most people, but look up:
Puzzle Montage Art by Tim Klein
[Images: examples of Klein's montage puzzle art]
What he did is EXACTLY what AI image generators do, except instead of using two or three artworks, one AI-generated image might use hundreds. And this is what those who actually understand the technology are trying to get across.
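The same-die-cut idea above can be sketched as a toy program. This is purely an illustration of the puzzle analogy, not the actual diffusion algorithm; the tiny 4x4 letter "images," the tile size, and every name here are invented for the example:

```python
# Toy illustration of the "same die-cut" puzzle analogy: because every
# "puzzle" is cut into tiles of the same shape, tiles from different
# source images slot together into a new composite.

TILE = 2  # tile edge length; the 4x4 "images" below are invented toy data

# Two tiny 4x4 "artworks," encoded as letters standing in for pixels.
frog   = [list(row) for row in ["FFFF", "FFFF", "FFFF", "FFFF"]]
castle = [list(row) for row in ["CCCC", "CCCC", "CCCC", "CCCC"]]

def tiles(img, size=TILE):
    """Cut an image into (row, col) -> tile pieces of a fixed shape."""
    out = {}
    for r in range(0, len(img), size):
        for c in range(0, len(img[0]), size):
            out[(r, c)] = [row[c:c + size] for row in img[r:r + size]]
    return out

def assemble(pieces, height, width, size=TILE):
    """Rebuild an image from tiles; any same-cut tile fits any slot."""
    img = [[None] * width for _ in range(height)]
    for (r, c), tile in pieces.items():
        for dr in range(size):
            for dc in range(size):
                img[r + dr][c + dc] = tile[dr][dc]
    return img

# Mix pieces from both sources: frog tiles on the left, castle on the right.
mixed = {**tiles(frog),
         **{k: v for k, v in tiles(castle).items() if k[1] >= TILE}}
composite = assemble(mixed, 4, 4)
print(["".join(row) for row in composite])  # left half F's, right half C's
```

Because both sources were cut on the same grid, a tile from either image drops into any slot, which is exactly the property Klein's montages exploit.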
Right now, most things that exist are fed into image datasets. The works in Midjourney's and Stable Diffusion's datasets number in the literal billions. Datasets have stolen so much art that most people can't fathom that kind of statistic, because we're just not capable of thinking in those kinds of numbers.
That’s why techbros think an art generator just takes inspiration from the works it has.
In reality, the reason techbros say it takes them hours to get a particular set of prompts “just right” is that they are educating the machine. Basically, if you give it a set of prompts and it gives you something you don’t want, it is internally assigning labels to the pieces of the puzzle it gave you. Eventually, when you DO get what you want, the same datasets that produced those images will scrape them again and assign labels to pieces based on what is perceived as “correct.”
This is also why it is impossible to remove images from a dataset. Any image used to create an AI work must also exist within the work itself. Removing any one particular image necessitates finding and removing any child data produced from that image, or else the machine can literally just re-scrape it.
Currently, because a dataset uses the data from hundreds of images to create a new work and does not compensate the original artists for the use of their art, this qualifies as theft under international copyright law.
I hope providing a real-world example of how datasets work is helpful.
#i kind of see the eventual place of AI art being akin to fanfiction someday#just transformative enough to exist#but not transformative enough to make money off of#and i don't see that as a bad thing necessarily#i do believe in advancing the field and understanding of AI#and as someone who does know how Stable Diffusion works at a pretty robust level#the technology is super cool and innovative#but we need data to make these things work#and as someone who scrapes AO3 for data on the regular#these tech bros are making my life harder and pretty freakin annoying#because people get spooked#sometimes rightly so#a lot of the time not so rightly#and independent researchers like myself have a hard time doing our research