#currently it kinda sucks at all three. but this would help a metric ton with the latter and at least a little with the former
Explore tagged Tumblr posts
walugus-grudenburg · 11 months ago
I'm hoping megacorps stop using shitty mass-gathered data for their ML algorithms. (ML is Machine Learning, sometimes known as AI. I'll say AI for brevity, but it's a poor name: the thing is perfectly Artificial yet functionally very different from Intelligence.) The current trend of unlabeled, zero-QA datasets is horrid and often causes severe stupidity (use Google Docs or similar and you'll see what I mean). It is extraordinarily expensive to get curated, quality-tested datasets that you own to train an AI on. But it not only solves 99% of the moral issues with AI (if you own what it's trained on, the "is it stealing" debate goes from a very subjective and contentious battle to pretty much vanishing entirely!), it also increases the quality to an incredible degree (though not necessarily a cost-effective one)! Now, I'm no machine learning scientist or businessperson, but surely at some point going that route is worth it to these companies just to get the courts off their backs, right? Sure, it's immensely expensive, but they're megacorps. They have the funds. They already spend so much on compute for these, surely they can afford some big data. (An additional benefit: since the data is better, it won't take as much of it, so less compute for the same quality. This decreases long-term costs some (though unfortunately not by as much as it costs to build the datasets) and also helps the environment some by using less power.)
2 notes
walugus-grudenburg · 11 months ago
This is an example of why unregulated mass-fed (scraped or user-generated) data is BAD for making AIs! Big corporations, sweetie, you're poisoning them! These unlabeled, zero-QA datasets are horrid and create messes like these! It is extraordinarily expensive to get curated, quality-tested datasets that you own to train an AI on. But it not only solves 99% of the moral issues with AI (if you own what it's trained on, the "is it stealing" debate goes from a very subjective and contentious battle to vanishing entirely!), it also increases the quality to an incredible degree! Now, I'm no machine learning scientist or businessperson, but surely at some point going that route is worth it to these companies just to get the courts off their backs, right? Sure, it's immensely expensive, but they're megacorps. (An additional benefit: since the data is better, it won't take as much of it, so less compute for the same quality. This decreases long-term costs some (though unfortunately not by as much as it costs to build the datasets) and also helps the environment some by using less power.)
googledocs you are getting awfully uppity for something that can’t differentiate between “its” and “it’s” correctly
225K notes