#LAION
Explore tagged Tumblr posts
staruie · 7 months ago
Text
Tumblr media
where my fellow monster fuckers at 👅👅👅👅👅👅
26K notes · View notes
succliberation · 1 year ago
Text
The biggest dataset used for AI image generators had CSAM in it
Tumblr media
Link to the original tweet with more info
The LAION dataset has had ethical concerns raised over its contents before, but the public now has proof that there was CSAM used in it.
The dataset was essentially created by scraping the internet and using a mass tagger to label what was in the images. Many of the images were already known to contain identifying or personal information, and several people have been able to use EU privacy laws to get images removed from the dataset.
However, LAION itself has known about the CSAM issue since 2021.
Tumblr media
LAION was a pretty bad dataset to use anyway, and I hope researchers drop it for something more useful that was created more ethically. I hope this leads to more ethical databases being created, and to companies getting punished for using unethical ones. I hope the people responsible for this are punished, and that the victims get healing and closure.
12 notes · View notes
morlock-holmes · 1 year ago
Text
Okay, tech people:
Can anybody tell me what the LAION-5B data set is in layman's terms, as well as how it is used to train actual models?
Everything I have read online is either so technical that it provides zero information to me, or so dumbed down that it provides almost zero information to me.
Here is what I *think* is going on (and I already have enough information to know that in some ways this is definitely wrong.)
LAION uses a web crawler to essentially randomly check publicly accessible web pages. When this crawler finds an image, it creates a record of the image URL, a set of descriptive words from the image ALT text (and other sources, I think?), and some other stuff.
This is compiled into a big giant list of image URLs and descriptive text associated with the URL.
When a model is trained on this data it... I guess... essentially goes to every URL in the list, checks the image, extracts some kind of data from the image file itself, and then associates the data extracted from the image with the descriptive text that LAION has already associated with the image URL?
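If that mental model is roughly right, it can be sketched in a few lines of code (illustrative only: the field names and URL are made up, and this is not LAION's actual tooling):

```python
# A sketch of the mental model above (not LAION's actual tooling).
# The dataset row holds no pixels, only a pointer plus text.
import io

import requests
from PIL import Image

# One hypothetical LAION-style record: URL + caption, nothing else.
record = {
    "url": "https://example.com/some-photo.jpg",   # made-up entry
    "caption": "a brown dog playing in the snow",  # from ALT text
}

def fetch_training_pair(record):
    """Download the image so it can be paired with its caption."""
    resp = requests.get(record["url"], timeout=10)
    resp.raise_for_status()
    image = Image.open(io.BytesIO(resp.content)).convert("RGB")
    # Training then consumes (image, caption) pairs; the dataset
    # itself only ever distributed the URL and the text.
    return image, record["caption"]
```

In other words: yes, the trainer re-downloads everything itself, which is also why the dataset can claim to contain "only links".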
The big pitfall, apparently, is that a lot of images have been improperly or even illegally posted on the public internet where crawlers can reach them, even though they shouldn't be public at all (e.g. medical records or CSAM), and the dataset is too large to actually hand-curate every single entry. So models trained on the dataset contain some amount of data that legally they should not have, outside and beyond copyright considerations. A secondary problem is that the production of image ALT text is extremely opaque to ordinary users, so certain images that a user might be comfortable posting may, unbeknownst to them, carry ALT text that the user would not like to be disseminated.
Am I even in the ballpark here? It is incredibly frustrating to read multiple news stories about this stuff and still lack the basic knowledge you would need to think about this stuff systematically.
7 notes · View notes
see-fee · 2 years ago
Note
psst ai art is not real art and hurts artists
Real life tends to be far more nuanced than sweeping statements, emotional rhetoric, or conveniently fuzzy definitions. “Artists” are not a monolithic entity and neither are companies. There are different activities with different economics.
I’ll preface the rest of my post with sharing my own background, for personal context:
👩‍🎨 I am an artist. I went to/graduated from an arts college and learned traditional art-making (sculpture to silkscreen printing), and my specialism was in communication design (using the gamut of requisite software like Adobe Illustrator, InDesign, Photoshop, Lightroom, Dreamweaver etc). Many of my oldest friends are career artists—two of whom served as official witnesses to my marriage. Friends of friends have shown at the Venice Biennale, stuff like that. Many are in fields like games, animation, VFX, 3D etc. In the formative years of my life, I've worked & collaborated in a wide range of creative endeavours and pursuits. I freelanced under a business which I co-created, ran commercial/for-profit creative events for local musicians & artists, did photography (both digital & analog film, some of which I hand-processed in a darkroom), did some modelling, styling, appeared in student films… the list goes on. I've also dabbled with learning 3D using Blender, a free, open source software (note: Blender is an important example I'll come back to, below).

💸 I am a (budding) patron of the arts. On the other side of the equation, I sometimes buy art: small things like buying friends' work. I'm also currently holding (very very tiny) stakes in "real" art—as in, actual fine art: a few pieces by Basquiat, Yayoi Kusama, Joan Mitchell.

👩‍💻 I am a software designer & engineer. I spent about an equal number of years in tech: took some time to re-skill in a childhood passion and dive into a new field, then went off to work at small startups (not "big tech"), to design and write software every day.
So I’m quite happy to talk art, tech, and the intersection. I’m keeping tabs on the debate around the legal questions and the lawsuits.
Can an image be stolen if only used in training input, and is never reproduced as output? Can a company be vicariously liable for user-generated content? Legally, style isn’t copyrightable, and for good reason. Copyright law is not one-size-fits-all. Claims vary widely per case.
Flaws in the Andersen v. Stability AI case, aka the "stolen images" argument
Read this great simple breakdown by a copyright lawyer that covers reproduction vs. derivative rights, model inputs and outputs, derivative works, style, and vicarious liability https://copyrightlately.com/artists-copyright-infringement-lawsuit-ai-art-tools/
“Getty’s new complaint is much better than the overreaching class action lawsuit I wrote about last month. The focus is where it should be: the input stage ingestion of copyrighted images to train the data. This will be a fascinating fair use battle.”
“Surprisingly, plaintiffs’ complaint doesn’t focus much on whether making intermediate stage copies during the training process violates their exclusive reproduction rights under the Copyright Act. Given that the training images aren’t stored in the software itself, the initial scraping is really the only reproduction that’s taken place.”
“Nor does the complaint allege that any output images are infringing reproductions of any of the plaintiffs’ works. Indeed, plaintiffs concede that none of the images provided in response to a particular text prompt “is likely to be a close match for any specific image in the training data.””
“Instead, the lawsuit is premised upon a much more sweeping and bold assertion—namely that every image that’s output by these AI tools is necessarily an unlawful and infringing “derivative work” based on the billions of copyrighted images used to train the models.”
“There’s another, more fundamental problem with plaintiffs’ argument. If every output image generated by AI tools is necessarily an infringing derivative work merely because it reflects what the tool has learned from examining existing artworks, what might that say about works generated by the plaintiffs themselves? Works of innumerable potential class members could reflect, in the same attenuated manner, preexisting artworks that the artists studied as they learned their skill.”
My thoughts on generative AI: how anti-AI rhetoric helps Big Tech (and harms open-source/independents), how there’s no such thing as “real art”
The AI landscape is still evolving and being negotiated, but fear-mongering and tighter regulation seldom work in anyone's favour besides that of big companies. It's the oldest trick in the book for preserving a monopoly, and big corps in every major industry have used it. Get a sense of the issue in this article: https://www.forbes.com/sites/hessiejones/2023/04/19/amid-growing-call-to-pause-ai-research-laion-petitions-governments-to-keep-agi-research-open-active-and-responsible/?sh=34b78bae62e3
“AI field is progressing at unprecedented speed; however, training state-of-art AI models such as GPT-4 requires large compute resources, not currently available to researchers in academia and open-source communities; the ‘compute gap’ keeps widening, causing the concentration of AI power at a few large companies.”
“Governments and businesses will become completely dependent on the technologies coming from the largest companies who have invested millions, and by definition have the highest objective to profit from it.”
“The “AGI Doomer” fear-mongering narrative distracts from actual dangers, implicitly advocating for centralized control and power consolidation.”
Regulation & lawsuits benefit massive monopolies: Adobe (which owns Adobe Stock), Microsoft, Google, Facebook et al. Fighting lawsuits and licensing with stock image companies for good PR—like OpenAI (which Microsoft invested $10 billion in) and Shutterstock—is a cost they have ample resources to pay, to protect their monopoly after all that massive investment in ML/AI R&D. The rewards outweigh the risks. They don't really care about ethics; they only invoke them when doing so annihilates competition. Regulatory capture means these mega-corporations will continue to dominate tech, and nobody else can compete. Do you know what happens if only Big Tech controls AI? It ain't gonna be pretty.
Open-source is the best alternative to Big Tech. Pro-corporation regulation hurts open-source. Which hurts indie creators/studios, who will find themselves increasingly shackled to Big Tech's expensive software. Do you know who develops & releases the LAION dataset? An open-source research org. https://laion.ai/about/ Independent non-profit research orgs & developers cannot afford harsh anti-competition regulatory rigmarole, or multi-million dollar lawsuits, or being deprived of training data, which is exactly what Big Tech wants. Free professional industry-standard software like Blender is open-source, under the copyleft GNU General Public License. Do you know how many professional 3D artists and businesses rely on it? (Now its development fund is backed by industry behemoths.) The consequences of this kind of specious "protest" masquerading as social justice will ultimately screw over these "hurt artists" even harder. It's shooting yourself in the foot. Monkey's paw. Be very careful what you wish for.
TANSTAAFL: Visual tradespeople have no qualms using tons of imagery/content floating freely around the web to develop their own for-profit output—nobody's sweating over source provenance or licensing whenever they whip out Google Images or Pinterest. Nobody decries how everything is reposted/reblogged to death when it benefits them.

Do you know how Google, a for-profit company, and its massively profitable search product works? "Engines like the ones built by OpenAI ingest giant data sets, which they use to train software that can make recommendations or even generate code, art, or text. In many cases, the engines are scouring the web for these data sets, the same way Google's search crawlers do, so they can learn what's on a webpage and catalog it for search queries."[1] The Authors Guild v. Google case found that Google's wholesale scanning of millions of books to create its Google Book Search tool served a transformative purpose that qualified as fair use. Do you still use Google products?

No man is an island. Free online access at your fingertips to a vast trove of humanity's information cuts both ways. I'd like to see anyone completely forgo these technologies & services in the name of "ethics". (Also. Remember that other hyped new tech that's all about provenance, where some foot-shooting "artists" rejected it and self-excluded/self-harmed, while savvy others like Burnt Toast seized the opportunity and cashed in.)
There is no such thing as “real art.” The definition of “art” is far from a universal, permanent concept; it has always been challenged (Duchamp, Warhol, Kruger, Banksy, et al) and will continue to be. It is not defined by the degree of manual labour involved. A literal banana duct-taped to a wall can be art. (The guy who ate it claimed “performance art”). Nobody in Van Gogh’s lifetime considered his work to be “real art” (whatever that means). He died penniless, destitute, believing himself to be an artistic failure. He wasn’t the first nor last. If a soi-disant “artist” makes “art” and nobody values it enough to buy/commission it, is it even art? If Martin Shkreli buys Wu Tang Clan’s “Once Upon a Time in Shaolin” for USD$2 million, is it more art than their other albums? Value can be ascribed or lost at a moment’s notice, by pretty arbitrary vicissitudes. Today’s trash is tomorrow’s treasure—and vice versa. Whose opinion matters, and when? The artist’s? The patron’s? The public’s? In the present? Or in hindsight?
As for “artists” in the sense of salaried/freelance gig economy trade workers (illustrators, animators, concept artists, game devs, et al), they’ll have to adapt to the new tech and tools like everyone else, to remain competitive. Some are happy that AI tools have improved their workflow. Some were struggling to get paid for heavily commoditised, internationally arbitraged-to-pennies work long before AI, in dehumanising digital sweatshop conditions (dime-a-dozen hands-for-hire who struggled at marketing & distributing their own brand & content). AI is merely a tool. Methods and tools come and go, inefficient ones die off, niches get eroded. Over-specialisation is an evolutionary risk. The existence of AI tooling does not preclude anyone from succeeding as visual creators or Christie’s-league art-world artists, either. Beeple uses AI. The market is information about what other humans want and need, how much it’s worth, and who else is supplying the demand. AI will get “priced in.” To adapt and evolve is to live. There are much greater crises we're facing as a species.
I label my image-making posts as #my art, relative to #my fic, mainly for navigation purposes within my blog. Denoting a subset of my pieces with #ai is already generous on this hellsite entropy cesspool. Anti-AI rhetoric will probably drive some people to conceal the fact that they use AI. I like to be transparent, but not everyone does. Also, if you can’t tell, does it matter? https://youtu.be/1mR9hdy6Qgw
I can illustrate, up to a point, but honing the skill of hand-crafted image-making isn’t worth my remaining time alive. The effort-to-output ratio is too high. Ain’t nobody got time fo dat. I want to tell stories and bring my visions to life, and so do many others. It’s a creative enabler. The democratisation of image-making means that many more people, like the disabled, or those who didn’t have the means or opportunity to invest heavily in traditional skills, can now manifest their visions and unleash their imaginations. Visual media becomes a language more people can wield, and that is a good thing.
Where I’m personally concerned, AI tools don’t replace anything except some of my own manual labour. I am incredibly unlikely to commission a visual piece from another creator—most fanart styles or representations of the pair just don’t resonate with me that much. (I did once try to buy C/Fe merch from an artist, but it was no longer available.) I don’t currently hawk my own visual wares for monetary profit (tips are nice though). No scenario exists which involves me + AI tools somehow stealing some poor artist’s lunch by creating my tchotchkes. No overlap regarding commercial interests. No zero-sum situation. Even if there was, and I was competing in the same market, my work would first need to qualify as a copy. My blog and content is for personal purposes and doesn’t financially deprive anyone. I’ll keep creating with any tool I find useful.
AI art allegedly not being “real art” (which means nothing) because it's perceived as zero-effort? Not always the case. It may not be a deterministic process but some creators like myself still add a ton of human guidance and input—my own personal taste, judgement, labour. Most of my generation pieces require many steps of in-painting, manual hand tweaking, feeding it back as img2img, in a back and forth duet. If you've actually used any of these tools yourself with a specific vision in mind, you’ll know that it never gives you exactly what you want—not on the first try, nor even the hundredth… unless you're happy with something random. (Which some people are. To each their own.) That element of chance, of not having full control, just makes it a different beast. To achieve desired results with AI, you need to learn, research, experiment, iterate, combine, refine—like any other creative process.
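For the curious, a minimal sketch of what that back-and-forth can look like, using the open-source diffusers library (the model ID, prompt, and settings here are illustrative):

```python
# A minimal sketch of the img2img feedback loop described above,
# using Hugging Face diffusers (model ID and settings illustrative).
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = Image.open("rough_draft.png").convert("RGB")  # your starting point
prompt = "two androids sharing an umbrella, film noir lighting"

# Each pass keeps some of the previous result (lower strength = more
# faithful to the input image) and layers new guidance on top.
for step in range(3):
    image = pipe(prompt=prompt, image=image, strength=0.55,
                 guidance_scale=7.5).images[0]
    image.save(f"iteration_{step}.png")  # tweak by hand between passes
```

Each saved iteration is a point where human judgement re-enters the loop: in-paint, hand-tweak, feed it back in.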
If you upload content to the web (aka “release out in the wild”), then you must, by practical necessity, assume it’s already “stolen” in the sense that whatever happens to it afterwards is no longer under your control. Again, do you know how Google, a for-profit company, and its massively profitable search product works? Plagiarism has always been possible. Mass data scraping or AI hardly changed this fact. Counterfeits or bootlegs didn’t arise with the web.
As per blog title and Asimov's last major interview about AI, I’m optimistic about AI overall. The ride may be bumpy for some now, but general progress often comes with short-term fallout. This FUD about R’s feels like The Caves of Steel, like Lije at the beginning [insert his closing rant about humans not having to fear robots]. Computers are good at some things, we’re good at others. They free us up from incidental tedium, so we can do the things we actually want to do. Like shipping these characters and telling stories and making pretty pictures for personal consumption and pleasure, in my case. Most individuals aren’t that unique/important until combined into a statistical aggregate of humanity, and the tools trained on all of humanity’s data will empower us to go even further as a species.
You know what really hurts people? The pandemic which nobody cares about; which has a significant, harmful impact on my body/life and millions of others’. That cost me a permanent expensive lifestyle shift and innumerable sacrifices, that led me to walk away from my source of income and pack up my existence to move halfway across the planet. If you are not zero-coviding—the probability of which is practically nil—I’m gonna have to discount your views on “hurt”, ethics, or what we owe to each other.
From LAION's about page (linked above):

We are a non-profit organization with members from all over the world, aiming to make large-scale machine learning models, datasets and related code available to the general public.

OUR BELIEFS: We believe that machine learning research and its applications have the potential to have huge positive impacts on our world and therefore should be democratized.

PRINCIPLE GOALS: Releasing open datasets, code and machine learning models. We want to teach the basics of large-scale ML research and data management. By making models, datasets and code reusable without the need to train from scratch all the time, we want to promote an efficient use of energy and computing resources to face the challenges of climate change.

FUNDING: Funded by donations and public research grants, our aim is to open all cornerstone results from such an important field as large-scale machine learning to all interested communities.
youtube
2 notes · View notes
mudrocksys · 2 years ago
Text
No, Doctors Aren't To Blame for AI Using Your Medical Record Photos, Here's How and Why
People care MORE about the IP laws of dumb cartoons like Mickey Mouse than about the real abuse of data going on, acting like the only conversation worth having is "is copyright good or bad", but as a med student I have a vested interest in talking about data collection ethics.
You're welcome to address my bias or in less kind words say I'm in the pocket of "big pharma" or that I'm a "copyright maximalist" but I'm doing this purely to explain and educate how the LAION team is dishonest, manipulative, malicious and hides behind the good graces of "open-source" and "non-profit".
To start: how does LAION get hold of your photos? To put it shortly: Common Crawl, a service that indexed and scraped web pages from 2014 until 2021. But unlike LAION, Common Crawl has a TOS, and states on its website that it does not allow users to violate IP or circumvent copy-protection using its data.
Tumblr media
The highlights in orange are important, but for future points.
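To make the mechanism concrete, here is a deliberately simplified sketch of the extraction step (LAION's real pipeline works from Common Crawl's pre-parsed metadata and then filters pairs with CLIP; this is an illustration, not their code):

```python
# Simplified illustration of how image-text pairs get harvested from
# crawled pages (not LAION's actual code).
from bs4 import BeautifulSoup

def extract_pairs(html):
    """Yield (image_url, alt_text) candidates from one crawled page."""
    soup = BeautifulSoup(html, "html.parser")
    for img in soup.find_all("img"):
        src, alt = img.get("src"), img.get("alt")
        if src and alt:  # keep only images that ship with a caption
            yield src, alt
```

Note that nothing in this step can tell a consented medical-journal figure from any other picture; consent lives in the signed form, not in the HTML.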
So how does this affect medical photos? "They shouldn't be on the internet in the first place!" you might say. This is where things get a bit muddy, because in the most widely shared case the user had signed a consent form allowing the use of their photos in medical journals, seen here:
Tumblr media
Please make note of the first line, "to be used for my care, medical presentations and/or articles".
So how did it get online?
Despite what a lot of people jump to assume, this most likely was not the fault of the doctor – and unfortunately he's not alive anymore to clarify what went wrong, RIP. There are many journal articles online about the user's condition – one which is particularly rare and as such requires study and photos for identification – many with attached images that have been scraped too. This user is most certainly not alone.
For background, PubMed is the largest site for sharing medical journals and case studies on the internet. It contains a wealth of information and is crucial to the safety and sanity of every overworked med student writing their 30th pharmacology paper. It also has attached images in some cases. These images are necessary to the journal, case study, research paper, or what have you that's being uploaded. They're also not just uploaded willy-nilly. There are consent forms like the one seen above, procedures, and patient rights that are to be respected and honored. What I want to emphasize:
Being on a journal ≠ free to use.
Being online ≠ free to use.
If you do not have the patient's signed consent, you are not allowed to use the image at all, even in a transformative manner. It is not yours to use.
So how does LAION respond to this? Lying like shitty assholes, of course. Vice has done a very insightful article on just what LAION has stored within it and showing many harrowing stories of nonconsensual pornography, graphic executions and CSEM on the database, found here.
A very interesting part of the article that I'd like to draw attention to, though, is the LAION team's claims about the copyright applied to these images. The claim in blue that all data falls under Creative Commons (lying about the copyright of every image) directly contradicts the claim in red (divorcing the team from copyright).
Tumblr media
The claim in orange is stupid because it claims photos of SSNs and addresses directly linked to your name are not personal data if they don't contain your face. It also is not GDPR-compliant, as they elevate their own definition of what private data is over what your actual private data is.
But whatever, team LAION is on this!! They got it, they'll definitely be pruning their database to remove all of the offending– aaaand they literally just locked the Discord help channel, deleted the entire exchange and accused Vice of writing a "hit piece", as reported on by Motherboard here. Classy, LAION!
They don't even remove images from their database unless you explicitly email them, and even then they first condescendingly tell you to download the entire database, find the image and the link tied to it, then remove the image from the internet yourself– somehow. Classy, LAION.
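For a sense of what "find the image and the link tied to it" actually involves, a sketch (the column names follow LAION's published parquet metadata schema, as an assumption; paths and the URL are hypothetical):

```python
# What "go find your own image" means in practice: scan a multi-
# terabyte pile of metadata shards for your URL. (Column names follow
# LAION's published parquet schema; paths and URL are hypothetical.)
from glob import glob

import pandas as pd

MY_URL = "https://example.com/my-clinic-photo.jpg"  # hypothetical

for shard in glob("laion5b-metadata/*.parquet"):
    df = pd.read_parquet(shard, columns=["URL", "TEXT"])
    hits = df[df["URL"] == MY_URL]
    if not hits.empty:
        print(shard, hits["TEXT"].tolist())
```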
Of course, the medical system isn't completely free from blame here. From the new Motherboard article:
Zack Marshall, an associate professor of Community Rehabilitation, and Disability Studies at the University of Calgary, has been researching the spread of patient medical photographs in Google Image Search Results and found that in over 70 percent of case reports, at least one image from the report will be found on Google Images. Most patients do not know that they even end up in medical journals, and most clinicians do not know that their patient photographs could be found online. 
“[Clinicians] wouldn't even know to warn their patients. Most of the consent forms say nothing about this. There are some consent forms that have been developed that will at least mention if you were photographed and published in a peer-reviewed journal online, [the image is] out of everybody's hands,” Marshall said. After hearing about the person whose patient photograph was found in the LAION-5B dataset, Marshall said he is trying to figure out what patients and clinicians can do to protect their images, especially as their images are now being used to train AI without their consent. 
It's a case of new risks that people have not been aware of, and of course people can't keep up with the evolving web of tech bro exploiters chomping at the bit to index every image of CSEM and every ISIS beheading they can get their hands on. If artists are still trying to get informed on the topic, expecting doctors who share this information for the benefit of other doctors to hide it behind expensive paywalls and elaborate gates just to cut off the tech bros is asinine. But regardless, if you don't want to end up in a journal, now you are aware of the possibility and can decline consent in the future.
LAION, however, can't be held accountable themselves because, despite facilitating abuse, they're not direct participants in the training of models; they just compiled the data and served it up on a gold platter. But on the bright side, the Federal Trade Commission (FTC) has begun practicing algorithmic destruction: demanding that companies and organizations destroy the algorithms or AI models that they have built using personal information and data collected in bad faith or illegally. FTC Commissioner Rebecca Slaughter published a Yale Journal of Law and Technology article alongside other FTC lawyers that highlighted algorithmic destruction as an important tool that would target the ability of algorithms "to amplify injustice while simultaneously making injustice less detectable" through training their systems on datasets that already contain bias, including racist and sexist imagery.
“The premise is simple: when companies collect data illegally, they should not be able to profit from either the data or any algorithm developed using it,” they wrote in the article. “This innovative enforcement approach should send a clear message to companies engaging in illicit data collection in order to train AI models: Not worth it.”
This is likely going to be the fate of any algorithms that take advantage of the illegal data collected in LAION-5B.
So what do we take from all of this?
Please read consent forms thoroughly.
Algorithmic destruction should befall LAION-5B, and I wouldn't mind if every member of the team got arrested.
That's it, that's the whole thing 😊
Addendum, which I know people will ask: am I against AI art? Well, I'm against unethical bullshit, of which the LAION team does plenty, and on which most if not all AI image models are being trained. While I hate going for the elephant in the room, capitalism is to blame for the absolutely abhorrent implementation of AI, and so it can't exist without being inherently unethical in these conditions.
2 notes · View notes
automundoarg · 2 months ago
Text
LAION, the Peugeot digital assistant that makes it easy to get to know the new Peugeot 2008 SUV
Peugeot has taken an innovative step in the Argentine automotive industry by launching LAION, an AI-powered digital assistant designed to offer a fast and complete user experience. This virtual assistant lets users access detailed information about the new Peugeot 2008, turning the search into an agile, precise, 24/7 interaction. LAION, designed with…
0 notes
nfoaivag · 11 months ago
Text
🦁🏆🔵⚪️🔴
0 notes
crafantale · 2 years ago
Text
Tumblr media
I'm going to set something on fire
1 note · View note
laiondataset · 2 years ago
Photo
Tumblr media
“childcare”
thumbpress.com/wp-content/uploads/2015/08/baby-carrier-grocery-basket.jpg
files.namnak.com/users/zt/aup/201809/997_pics/%D8%AA%D8%B5%D8%A7%D9%88%DB%8C%D8%B1-%D8%AC%D8%A7%D9%84%D8%A8-%D9%88-%D8%AE%D9%86%D8%AF%D9%87-%D8%AF%D8%A7%D8%B1.jpg
i1.wp.com/www.teamjimmyjoe.com/wp-content/uploads/2014/08/gambling-baby-worst-parents.jpg?resize=550%2C603
0 notes
axesent · 5 months ago
Text
Tumblr media
Just a heads up to any non-AI artists that use Redbubble (among many other sites): they are allowing your work to be used in the LAION-5B dataset for AI training. haveibeentrained.com is free to use
9 notes · View notes
cure-yell-liker · 1 year ago
Text
Will you be my new mommy? 🥺👉👈
3 notes · View notes
1o1percentmilk · 10 months ago
Text
BITCHHH IM IN A PHILOSOPHY CLASS WHY AM I KNEE DEEP IN THE DOCUMENTATION FOR LAION-5B*
*LAION-5B is the dataset behind the Stable Diffusion text-to-image generator and currently the world's largest open-access image-text dataset. grins at you
3 notes · View notes
tangibletechnomancy · 2 years ago
Text
So, the good news is, dA and ArtStation and other art sites are creating new protocols to tell robots to ignore certain pieces. The bad news is, as they admit, they can only extend that to the pieces on their own websites; reposts to other sites can and will still be scanned for data (on top of the fact that things like robots.txt instructions and do-not-track requests aren't legally binding and there is no recourse if someone just decides to ignore them, which I wholeheartedly believe should be addressed as an issue of privacy law, but that's neither here nor there).
I propose a complementary database - an anti-training database, consisting of the main images found on pages with a noai header. The purpose of this database would be to compare against a training database before it's processed and remove exact visual matches, much like deduplication.
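As a sketch of what that filter could look like, using perceptual hashes so that re-encoded or resized reposts still match (the imagehash usage here is an illustration of the idea, not a spec):

```python
# A sketch of the proposed anti-training filter: build a set of
# perceptual hashes from opted-out images, then drop any training
# candidate whose hash (near-)matches. (imagehash is one off-the-shelf
# option; this is a sketch of the design, not a spec.)
import imagehash
from PIL import Image

def build_optout_index(optout_paths):
    """Hash every image scraped from pages carrying a noai header."""
    return {imagehash.phash(Image.open(p)) for p in optout_paths}

def filter_training_set(candidate_paths, optout_index, max_distance=2):
    """Keep only candidates that don't match an opted-out image."""
    for path in candidate_paths:
        h = imagehash.phash(Image.open(path))
        # A small Hamming distance catches re-encoded/resized reposts,
        # not just byte-identical copies.
        if all(h - other > max_distance for other in optout_index):
            yield path
```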
Of course, it's not going to be 100% reliable - no system is, whether it's in tech or in law or both - but it could go a long way toward making things work better when it comes to respecting conventional artists' wishes for their own work.
2 notes · View notes
noxaeternaetc · 9 months ago
Text
"For decades datasets were constructed by human intervention. This generally yielded datasets that are of high quality but too small to make today's LLM’s yield meaningful results.
LAION set out to build a dataset for these newer, hungrier models. They built a dataset that is purely constructed by machine processes, by running models and tweaking thresholds: LAION-5B is made by measure.
But what is getting measured? The quality of data? The capacities of CLIP? The success of a model against a benchmark? The benchmark itself?
[...]
Openness in the AI field matters, not just for model biases, but for the structural biases in the ecosystem. An ongoing problem is that curation by statistics amplifies many of those structural biases."
Models all the way down, Christo Buschek and Jer Thorp.
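To make "made by measure" concrete, here is a hedged sketch of the kind of CLIP-similarity gate the essay describes (the model ID and the threshold are illustrative of the approach; the point is that a model's own score decides what counts as data):

```python
# "Made by measure", sketched: LAION-style curation keeps an
# image-text pair only if CLIP itself scores them as similar enough.
# (Model ID and the threshold value are illustrative.)
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def passes_filter(image: Image.Image, caption: str, threshold=0.28):
    """Keep the pair iff CLIP's image/text embeddings are close enough."""
    inputs = processor(text=[caption], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    similarity = (img * txt).sum().item()
    return similarity > threshold  # the model is judge, jury, and curator
```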
0 notes
reachartwork · 11 months ago
Text
re: why nightshade/glaze is useless, aka "the chicken is already in the nugget", from the perspective of an Actual Machine Learning Researcher
a bunch of people have privately asked me to answer this aspect of the five points i raised, and i tire of repeating myself, so
Tumblr media
the fundamental oversight here is a lack of recognition that these AI models are not dynamic entities constantly absorbing new data; they are more akin to snapshots of the internet at the time they were trained, which, for the most part, was several years ago.
to put it simply, Nightshade's efforts to alter images and introduce them to the AI in hopes of affecting the model's output are based on an outdated concept of how these models function. the belief that the AI is actively scraping the internet and updating its dataset with new images is incorrect. the LAION datasets, which are the foundation of most if not all modern image synthesis models, were compiled and solidified into the AI's 'knowledge base' long ago. The process is not ongoing; it's historical.
Tumblr media
i think it's important for people to understand that Nightshade is fighting an already-concluded war. the datasets have been created, the models have been trained, and the 'internet scraping' phase is not an ongoing process for these AI. the notion that AI is an ever-updating Skynet seeking to cannibalize all your art (or that the companies using it are constantly seeking out new art to add to the pile) is a science fiction myth, not a reality.
(for the many other reasons why it won't work see my other post. really i just wanted an excuse to make and post these two sloppy meme edits).
cheers
1K notes · View notes
valdevia · 3 months ago
Note
Hi! Genuine question, how do you know when one of your pieces has been stolen by AI dudebros?
You can search on haveibeentrained! It searches through LAION-5B, the biggest database of internet images, which AI companies have been using. They now allow you to search by URL too, so if you have a website you can use that to see if they took something from it.
278 notes · View notes