I appreciate the input, and I understand where you're coming from and already agree with some of what you're saying! Common Crawl's datasets exist for public use for exactly the kind of analysis you describe, this is a good thing, yes. Fully agree with you there and have never disagreed with that. The part where I get lost is...correct me if I'm wrong, but to your way of thinking, the datasets are vast enough to dilute any chance of regurgitated phrases or direct plagiarism, so no harm, no foul? Nobody's livelihood is threatened, therefore all is well?
Assuming that is more or less where you're coming from...again, I hear you, but I have trouble engaging with that stance because it doesn't actually address my problem. I'm not worried about regurgitation. My problem is not with the output at all.
My problem is with scraped data being used for massive profit on a massive scale with no permission or compensation. Full stop, that is where "is this ethical" begins and ends for me. Is it ethical to use someone else's work to generate billions-- not an exaggeration-- of dollars of profit? No. What if it's ten million "someone else"s, is it ethical then? No. What if it's JSTOR developing their own tool, relying on the tool that uses the unpaid work of ten million someone elses-- is that ethical? No. Can that be ethical? No. It can be used to do good things, but it cannot be used ethically; these statements are not mutually exclusive.
From where I'm sitting, the size of the dataset or dilution of any one piece in relation to the whole is not relevant except to indicate how many people have been exploited to develop these tools. I used sand as my analogy for a reason-- it's easy to look at the sandbox of generative AI and say, "no single grain meaningfully influenced the building of that castle. The amount any particular grain contributed to the whole is minimal." No one is hurt when the robot builds a sand castle, so who cares about the individual grains?
Me. I do. The castle could not exist without those individual grains, every single one of which took a human person some amount of time to make (time, and education, and practice, and labor, and thought, and energy; we're talking hours and days and years of work) and every single one of which is being used to generate enormous profit without permission or compensation.
That's my problem. You may not agree that this is a reasonable concern, and that's okay! We'll agree to disagree.
I'll address fair use under the cut, because I think I may not have been super clear on what I meant about that, and trying to explain it got a little long. It doesn't change anything up here, though, so if you wanna skip it that's totally cool. (And yes, let's assume we're talking exclusively about text-based stuff lol, image stuff is a topic for another post. My stance is the same, though.) Anyway, "fair use" in this context refers to a legal doctrine, not a moral judgment.
When I say there are fair use problems with generative AI, I mean that from a legal perspective. You may already have known that, I don't know-- you disagree that there are problems under fair use, but...your post doesn't really discuss fair use at all? Legally? You do sort of touch on one of the factors, the fourth one, and to be clear, it's a solid argument. Another argument would be that use of copyrighted materials in developing and training generative AI is transformative. That's up for debate, but it is an argument I've seen and I understand the reasoning behind it. I also understand why we wouldn't want it to fall under scrutiny.
But there are also arguments against fair use here, enough that several copyright lawsuits to that effect have already been brought against Microsoft and OpenAI and I think a couple of other corporations. (Disclaimer-- I'm an accountant, not a lawyer. What I'm saying is effectively recapping what I've read previously from actual lawyers, and I'm googling as I go to make sure I am not flat-out wrong on the face of this, lol.)
In evaluating a claim under the fair use doctrine, courts typically look at four factors:
1. Purpose and character of the use, including whether the use is for profit,
2. Nature of the copyrighted work,
3. Amount and substantiality of the portion used in relation to the copyrighted work as a whole, and
4. Effect of the use upon the potential market for or value of the copyrighted work.
Currently, I believe the defense of AI (and your stance, I think?) has mostly been riding on that last one. No chance of plagiarism means no effect on the market value of the original works! They're diluted beyond recognition! That's points in AI's favor.
But the third point up there is basically asking, "how much of the copyrighted material was used to create the work claimed to be protected under fair use?" and this one is the reason fanartists are, by and large, able to make some money on their fanworks while fanauthors really are not. A drawing is a still image, so it "uses" only small pieces of the original work overall in its creation; a written story, on the other hand, can be (and has been) argued to have "used" a significant portion of the original work. If I paint fanart of something for...idk, Supernatural or some other long-running show and sell it, well, I didn't use a substantial amount of the show to create the art. It's a still image; in context of the show it'd be a single frame among millions. But if I write a 500,000 word fanfic that draws on multiple characters and events and plot points from multiple seasons...that's a lot more of the source material! If I sell that, I'm way more likely to get sued than if I painted something.
So-- the amount of source material used in comparison to the whole of the source material and the profit generated are both problems under fair use. Here again is the core of my argument as to why the current setup is inherently, inescapably unethical.
When it comes to data scraping, the original works in their entirety have been used. And they are being used to generate enormous profit. Microsoft gave ten billion dollars to OpenAI last year, that is not insignificant. Profit and substantiality are problems under the fair use doctrine, and-- again-- enough lawyers have agreed with that statement to take multiple cases to court over this. So far, the courts have not ruled in their favor and I can see why, but my point is simply that this is a fair use issue! We don't have to agree one way or the other on what bits are more or less important-- I'm just explaining why I said what I did and why I do stand by it. Yes, there are arguments to be made in either direction, but if you are familiar with fair use, you will see issues here.
But ultimately, fair use isn't really part of my argument. More just an aside. Maybe generative AI is perfectly defensible on all counts under fair use and I've just got my head up my ass, it's whatever. I'm interested to see how the various cases play out. Either way, even if generative AI is 100% defensible under the fair use doctrine, I do not agree that its use in its current setup is ethical.
If you've made it this far, kudos, and thank you for listening. Again, I absolutely do see your point, and I'm sorry, but I disagree. Theft for profit cannot be diluted to a point where it can be called ethical.
Why is JSTOR using AI? AI is deeply environmentally harmful and steals from creatives and academics.
Thanks for your question. We recognize the potential harm that AI can pose to the environment, creatives, and academics. We also recognize that AI tools, beyond our own, are emerging at a rapid rate inside and outside of academia.
We're committed to leveraging AI responsibly and ethically, ensuring it enhances, rather than replaces, human effort in research and education. Our use of AI aims to provide credible, scholarly support to our users, helping them engage more effectively with complex content. At this point, our tool isn't designed to rework content belonging to creatives and academics. It's designed to allow researchers to ask direct questions and deepen their understanding of complex texts.
Our approach here is a cautious one, mindful of ethical and environmental concerns, and we're dedicated to ongoing dialogue with our community to ensure our AI initiatives align with our core values and the needs of our users. Engagement and insight from the community, positive or negative, helps us learn how we might improve our approach. In this way, we hope to lead by example for responsible AI use.
For more details, please see our Generative AI FAQ.
#i am well aware that the logical end point of my problem is ''this technology should not exist in its current state at all'' #and i'm well aware that mine is not a popular stance #but i say this as someone who works with a lot of small businesses (''small'' meaning under $25MM/yr): if your business cannot afford #to pay its employees & contractors living wages #then your business is a failure. you have failed. if the only way you make profit is by exploiting and undervaluing others' work #then your profit is stolen wages #this generative ai dataset nonsense is the same thing but instead of wages it's...royalties. i suppose. residuals. #i don't think there's a fully accurate term for it yet; the law has not caught up #my point is: i cannot claim to support everyone's right to receive the fair value of their labor #and then turn around and cheerfully ask a robot to build me a sandcastle out of stolen fucking labor #that does not fucking follow. i am sorry but those are incompatible stances. #i am not normally this inflexible #but the only way this follows is if you believe art (including written art) is not actually work with any value #in which case #i'm going to break into your home and take an enormous shit in the vegetable drawer of your refrigerator #but also you are factually wrong - it is valuable work - as proven by OpenAI's bottom fucking line #currently built on massive art theft #long post #and yes i am aware of OpenAI Global's corporate structure #it does not actually change my stance #frankly even if they were still a nonprofit-- which now they are a for-profit subsidiary of their parent non-profit (gee i wonder why) #(just kidding i don't have to wonder)-- even if they were still a nonprofit i'd have the same problem #nonprofits still generate profit; the difference is they can't distribute those profits to shareholders #but they can pay them to their employees and executives (: #ai bs
Yeah, I mean, as I see it, the competition is the reason WHY one would have a problem with others profiting off one's IP. Like, if I create something and then someone else is profiting from it, that reduces the amount I can profit from it... because people have limited money to spend. (Which I guess is why unpaid fanworks are okay, because like I said they don't actually compete; they don't take money that people could be paying the original artist.)
But I dunno... the fair use thing is hotly debated and I'm not gonna claim I understand it completely.
And from what I can see, neither does most of the U.S. legal system. What can get ruled (in a court case) to be valid fair use seems to depend entirely on the court and the judge and perhaps their current mood.
From what I've read, the idea of fair use was introduced to allow certain important types of commentary.
One example is quoting a piece of writing when you're doing journalism and that piece of writing is a relevant piece of evidence in the story. (I remember a time when some shady "news" outlets tried to challenge this, suing for copyright infringement whenever anyone tried to point out lies in their broadcasts. Luckily they failed for the most part, and this type of commentary is very much accepted as fair use now.)
Another example, of course, is parody. We had socially accepted parodies long before fanfic was considered legal. Usually they would change names to be safe (like when MAD magazine would do a whole comic-format spoof making fun of a popular movie, renaming Ferris Bueller to "Fearless Buller" and so on). But in some cases they didn't even bother with renaming. (I remember a Far Side cartoon featuring the crew of the Starship Enterprise, which was explicitly called the Starship Enterprise in the comic, no attempt to rename it.) I remember thinking about this, as a kid in the '90s, wondering why jokes and mockery seemed to be the only legal type of copyright infringement.
The OTW has done a lot of legal fighting since then to get fanworks accepted as valid fair use. The type of nonprofit use that happens on AO3 doesn't tend to get challenged in court anymore. But it's still not accepted on the same level as MAD magazine parodies or SNL spoofs, because it's still generally considered infringement to make money off that.
When I try to decide whether I consider something fair use, my usual thought experiment is to put myself in the shoes of the copyright owner. Like:
If I made up a whole imaginary world, populated by my own OCs... and then I published and sold my own stories about it, in the form of novels and comics and so on... what type of fanworks would I be ok with?
How would I feel if someone used my OCs or my original world in their own writing or art, without asking for my permission?
Would the answer change depending on what type of work it was-- written or visual?
Would it depend on whether and how they gave me credit?
Would it change based on whether or how they made a profit from it?
Personally, for me, I would be okay with any of it, AS LONG AS I was credited for whatever ideas were taken from my work.
Monetizing would be okay, from my viewpoint. (I'd be feeling a deep sense of unfairness if the fan creator was making MORE money off my ideas than I was. But I'd accept that this probably meant they were contributing something valuable of their own... and I'd still consider it good publicity, as long as they made it clear where the ideas came from.)
For me, the only thing that would get me actually fighting would be if someone used my ideas WITHOUT crediting me. (And even then, I'd do some deep analysis of whether the ideas were really my property, or whether the other person might have just come up with something very similar on their own. There are limited ways a story can go, and minds tend to gravitate independently toward certain types of ideas, and I'm not going to challenge similarity unless it copies specific details in a specific way.)
But that's just my view. Copyright law in general is stricter than I would tend to be in defending my own copyright.
And what's socially acceptable within fandom can be both stricter and more lenient than actual law.
(More lenient when dealing with fanworks of properties owned by the rich and powerful, like Disney or Paramount. But stricter when dealing with the works of small, struggling artists, or ideas copied from one fanwork by another.)
(And to make it even more confusing, the exact line between "struggling artists" and "rich copyright owners" is also a topic of debate.)
(And I'm not even gonna get into the whole argument over how AI "art" programs are trained.)
(It's all just such a complicated mess, and I do not think it's possible to have a simple opinion on any of it.)
I've been thinking about how the idea that "fanworks are not made for profit" is a big part of the reason why they are usually not attacked for copyright and trademark violation.
(Which is why AO3 cracks down so hard on users advertising paid commissions -- because OTW worked hard to fight corporate lawyers for the right to share fanworks, and only won us the freedoms we have by promising that fanworks were not made for money.)
But the weird part is...
if the owner of the canon property is worried about fan art and fan fiction competing with their official material...
then, wouldn't being free make it much MORE competitive?
I mean, I don't know how often people will actually decide to read fanfiction and look at fan art instead of buying the canon book or movie or show or comic or whatever
But I'm pretty sure that, if it does happen, it's largely because the fanfic and fan art are free
so... ????
I mean, I'm guessing the reason they get way more litigious when there's profit involved, is the same reason why there's usually more legal red tape around selling something than giving it away free...
Which I think is mostly based around the fact that an activity will usually not become widespread enough to be a problem if it doesn't somehow make someone money
But there are exceptions.
There are things people will gladly put lots and lots of effort into, even if it doesn't bring in any money.
And fanworks are definitely among these things.
So in this case... is there any actual reason why for-profit fanworks would be more of a problem than nonprofit fanworks?
I guess maybe... people having limited amounts of money...
Maybe for-profit fanworks would compete with official material just because people might not have enough money to buy BOTH ...