#it’s literally just the low quality bulk to beef up the training data
snickerdoodlles · 1 year ago
*pinches nose bridge* even if there weren’t 6 degrees of separation between AO3 and generative AI, has anyone in this tag even considered that if it were possible for individuals to fuck up generative AI or its training datasets just by writing a/b/o fic, then fascists, bigots, or even just internet trolls could and would fuck it up worse with hate speech?
#honestly my first thought here is that you lot need to take a statistics class
#you’re not even data bombing???????
#ao3 is such a small fraction of the common crawl data even as a whole. it *can’t*
#and it’s currently requesting to be left out of that anyways now hello??????
#not that that even fucking matters???????
#ao3 is not used to train AI
#the *common crawl* was used in the first stage of training some AIs
#which happened to include ao3 amongst the TERABYTES of information within it
#and it’s not like the common crawl is the only thing used to train these models??
#it’s literally just the low quality bulk to beef up the training data
#not to mention at that stage all the data is broken down into strings of integers
#the LLM’s not even learning *your* words it’s literally just learning words
#this is just the base stage training there’s still 3 more stages of training for AIs after that
#all of which use much more curated data
#some of those stages might include common crawl data but…no? not really highly unlikely not really useful
#it’s a web scrape it’s low quality by definition
#like. Wikipedia is *right there* and much more useful to them
#ao3 just isn’t good training data
#a/b/o isn’t even ‘corrupting’ AI???????????
#it’d be corrupting AI if ‘knot’ was associated with it over like. rope knots or something
#or if it had a predisposition to spitting out omegaverse unprompted
#but the examples I’ve seen are just Literally people asking it to write omegaverse
#…an LLM giving you exactly what you ask for for even a niche topic means it’s acting exactly the way its trainers want it to
#not that that’s even my fucking point here
#i get the frustrations behind AI training datasets but we as individuals can’t fuck these things up and that’s a *good* thing
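
For anyone wondering what "broken down into strings of integers" looks like in practice, here is a toy sketch. It assumes a simple word-level vocabulary. Real LLM tokenizers split text into subword pieces instead, so `build_vocab` and `encode` are hypothetical helpers made up for illustration, not any library's API.

```python
def build_vocab(texts):
    """Assign each distinct word an integer ID, in order of first appearance."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def encode(text, vocab, unk_id=-1):
    """Turn text into a list of integer IDs; unseen words map to unk_id."""
    return [vocab.get(word, unk_id) for word in text.lower().split()]


# Two tiny "documents" standing in for a training corpus (made-up data).
corpus = ["the knot in the rope", "the rope held"]
vocab = build_vocab(corpus)

# The same word gets the same ID no matter who wrote it or where.
ids = encode("the knot held", vocab)
print(ids)  # [0, 1, 4]
```

Once the text is encoded, nothing in the ID sequence records which site or author a word came from. The model only sees the integers.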