#BIG DATA
Fifty per cent of web users are running ad blockers. Zero per cent of app users are running ad blockers, because adding a blocker to an app requires that you first remove its encryption, and that’s a felony. (Jay Freeman, the American businessman and engineer, calls this “felony contempt of business-model”.)

So when someone in a boardroom says, “Let’s make our ads 20 per cent more obnoxious and get a 2 per cent revenue increase,” no one objects that this might prompt users to google, “How do I block ads?” After all, the answer is, you can’t. Indeed, it’s more likely that someone in that boardroom will say, “Let’s make our ads 100 per cent more obnoxious and get a 10 per cent revenue increase.” (This is why every company wants you to install an app instead of using its website.)

There’s no reason that gig workers who are facing algorithmic wage discrimination couldn’t install a counter-app that co-ordinated among all the Uber drivers to reject all jobs unless they reach a certain pay threshold. No reason except felony contempt of business-model: the threat that the toolsmiths who built that counter-app would go broke or land in prison for violating DMCA 1201, the Computer Fraud and Abuse Act, trademark, copyright, patent, contract, trade secrecy, nondisclosure and noncompete or, in other words, “IP law”.

IP isn’t just short for intellectual property. It’s a euphemism for “a law that lets me reach beyond the walls of my company and control the conduct of my critics, competitors and customers”. And “app” is just a euphemism for “a web page wrapped in enough IP to make it a felony to mod it, to protect the labour, consumer and privacy rights of its user”.
grison-in-space · 2 months
look computational psychiatry is a concept with a certain amount of cursed energy trailing behind it, but I'm really getting my ass chapped about a fundamental flaw in large scale data analysis that I've been complaining about for years. Here's what's bugging me:
When you're trying to understand a system as complex as behavioral tendencies, you cannot substitute large amounts of "low quality" data (data correlating more weakly with a trait of interest, say, or data that only measures one of several potential interacting factors that combine to create outcomes) for "high quality" data that inquires more deeply into the system.
The reason for that is this: when we're trying to analyze data as scientists, we leave things we're not directly interrogating as randomized as possible on the assumption that either there is no main effect of those things on our data, or that balancing and randomizing those things will drown out whatever those effects are.
But the problem is this: sometimes there are not only strong effects in the data you haven't considered, but those effects also correlate: either with one of the main effects you do know about, or simply with one another.
This means that there is structure in your data. And you can't see it, which means that you can't account for it. Which means whatever your findings are, they won't generalize the moment you switch to a new population structured differently. Worse, you are incredibly vulnerable to sampling bias, because the moment your sample fails to reflect the structure of the population, you're up shit creek without a paddle. Twin studies are notoriously prone to this, because white and middle- to upper-class twins are vastly more likely to be identified and recruited for them; those are the people who respond to study queries and are easy to get hold of. GWAS data is also extremely prone to this issue. And so is anything like ChatGPT that you train on unbelievably big machine-learning datasets in the hope of "training out" the noise.
These approaches presuppose that sampling depth is enough to "drown out" any conflicting main effects or interactions. What that typically does instead is obscure the impact of meaningful causative agents (hidden behind conflicting correlated factors you can't control for) and overstate the value of whatever significant main effects do manage to survive and fall out, even if they explain a pitiably small proportion of the variation in the population.
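Here's a minimal sketch of that failure mode as a toy simulation (the setup and numbers are assumed for illustration, not taken from any of the studies above): a hidden factor drives both the measured predictor and the outcome, the naive fit credits the hidden factor's effect to the predictor, and the "finding" evaporates in a differently structured population.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Training population: an unmeasured factor (think socioeconomic
# background) correlates with both the measured predictor and the outcome.
hidden = rng.normal(size=n)
measured = 0.6 * hidden + rng.normal(size=n)  # what the study records
outcome = 1.0 * hidden + 0.1 * measured + rng.normal(size=n)  # true effect of `measured` is 0.1

# Naive one-variable fit: the hidden factor's effect gets credited
# to the measured predictor.
slope_biased = np.polyfit(measured, outcome, 1)[0]

# New population, structured differently: the hidden factor no longer
# tracks the measured predictor, so the inflated "effect" disappears.
hidden2 = rng.normal(size=n)
measured2 = rng.normal(size=n)
outcome2 = 1.0 * hidden2 + 0.1 * measured2 + rng.normal(size=n)
slope_new = np.polyfit(measured2, outcome2, 1)[0]

print(f"apparent effect, training population: {slope_biased:.2f}")  # ~0.54, five times the true 0.1
print(f"same analysis, new population:        {slope_new:.2f}")  # ~0.10
```

And note that sampling more people from the first population doesn't fix this; it just tightens the error bars around the wrong number.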
It's a natural response to the wondrous power afforded by modern advances in computing, but it's not a great way to understand a complex natural world.
The almost overnight surge in electricity demand from data centers is now outstripping the available power supply in many parts of the world, according to interviews with data center operators, energy providers and tech executives. That dynamic is leading to years-long waits for businesses to access the grid as well as growing concerns of outages and price increases for those living in the densest data center markets.

The dramatic increase in power demands from Silicon Valley’s growth-at-all-costs approach to AI also threatens to upend the energy transition plans of entire nations and the clean energy goals of trillion-dollar tech companies. In some countries, including Saudi Arabia, Ireland and Malaysia, the energy required to run all the data centers they plan to build at full capacity exceeds the available supply of renewable energy, according to a Bloomberg analysis of the latest available data.

By one official estimate, Sweden could see power demand from data centers roughly double over the course of this decade — and then double again by 2040. In the UK, AI is expected to suck up 500% more energy over the next decade. And in the US, data centers are projected to use 8% of total power by 2030, up from 3% in 2022, according to Goldman Sachs, which described it as “the kind of electricity growth that hasn’t been seen in a generation.”
21 June 2024
Each time you search for something like “how many rocks should I eat” and Google’s AI “snapshot” tells you “at least one small rock per day,” you’re consuming approximately three watt-hours of electricity, according to Alex de Vries, the founder of Digiconomist, a research company exploring the unintended consequences of digital trends. That’s ten times the power consumption of a traditional Google search, and roughly equivalent to the amount of power used when talking for an hour on a home phone. (Remember those?) Collectively, De Vries calculates that adding AI-generated answers to all Google searches could easily consume as much electricity as the country of Ireland.
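(A rough back-of-the-envelope check, using an outside figure the article doesn't cite: Google is commonly said to handle on the order of 9 billion searches a day. At 3 watt-hours each, that's about 27 gigawatt-hours a day, or roughly 10 terawatt-hours a year; Ireland's total annual electricity consumption is around 30 terawatt-hours, so the comparison is at least the right order of magnitude.)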
[...]
This insatiable hunger for power is slowing the transition to green energy. When the owner of two coal-fired power plants in Maryland filed plans to close last year, PJM, the regional grid operator, asked them to keep running until at least 2028 to ensure grid reliability. Meanwhile, AI is also being used to actively increase fossil fuel production. Shell, for example, has aggressively deployed AI to find and produce deep-sea oil. “The truth is that these AI models are contributing in a significant way to climate change, in both direct and indirect ways,” says Tom McBrien, counsel for the Electronic Privacy Information Center, a digital policy watchdog. Even before Google’s AI integration this spring, the average internet user’s digital activity generated 229 kilograms of carbon dioxide a year. That means the world’s current internet use already accounts for about 40 percent of the per capita carbon budget needed to keep global warming under 1.5 degrees Celsius.
20 June 2024
Popular large language models (LLMs) like OpenAI’s ChatGPT and Google’s Bard are energy intensive, requiring massive server farms to crunch enough data to train the powerful programs. Cooling those same data centers also makes the AI chatbots incredibly thirsty. New research suggests training for GPT-3 alone consumed 185,000 gallons (700,000 liters) of water. An average user’s conversational exchange with ChatGPT basically amounts to dumping a large bottle of fresh water out on the ground, according to the new study. Given the chatbot’s unprecedented popularity, researchers fear all those spilled bottles could take a troubling toll on water supplies, especially amid historic droughts and looming environmental uncertainty in the US.
[...]
Water consumption issues aren’t limited to OpenAI or AI models. In 2019, Google requested more than 2.3 billion gallons of water for data centers in just three states. The company currently has 14 data centers spread out across North America, which it uses to power Google Search, its suite of workplace products, and more recently, its LaMDA and Bard large language models. LaMDA alone, according to the recent research paper, could require millions of liters of water to train, more even than GPT-3, because several of Google’s thirsty data centers are housed in hot states like Texas. The researchers issued a caveat with this estimate, though, calling it an “approximate reference point.”
Aside from water, new LLMs similarly require a staggering amount of electricity. A Stanford AI report released last week looked at differences in energy consumption among four prominent AI models, estimating that OpenAI’s GPT-3 released 502 metric tons of carbon during its training. Overall, the energy needed to train GPT-3 could power an average American’s home for hundreds of years.
pb-dot · 1 year
Some Thoughts on the Reddit Blackout
Like many new arrivals on Tumblr these days, I used to be a Redditor until recent developments encouraged me to take my business elsewhere, and I have been following the story as thoroughly as I can without actually giving Reddit any more traffic. With the Reddit admin corps now taking up a suite of strategies lifted straight from the Depression-era railroad baron playbook, I figured the time has come to talk a little about the wider implications of this whole story.
The Tech sector is, to the best of my understanding, in a vulnerable place right now. After the Web 2.0 gold rush and years of consolidation and growth from the biggest actors, your Alphabets, Twitters, Metas, and so on, many of the larger sites and services are approaching the largest size they can realistically expect to reach. How, for instance, could Facebook or Twitter grow much more now that everyone and their mother is on Facebook and Twitter? Prior to the Musk buyout, Twitter seemingly settled on upping engagement, making sure people were on Twitter longer and invested more energy and emotion in the platform, usually by making damn sure the discourse zapping through that hellhole was as polarizing and hostile as possible. Meta, meanwhile, has been making bank on user data as advertisers, AI folks, and any number of other actors salivate over getting their hands on the self-updating contact and interest registry that is Facebook.
With the rise of what we apparently have decided to call AI, data is now more valuable than ever. I consider this to be yet another Tech Hype Bubble on the level of NFTs or Metaverses, but, as with those two, I can imagine it's hard to explain that when you are a Tech CEO and your shareholders ask you "Hey, how do you plan on earning us money off of this AI/NFT/Metaverse thing?" This is not to say CEO Steve Huffman isn't handling this whole thing with the grace of a three-legged hippo, but merely to suggest that his less-than-laudable decisions and actions in this mess don't arise from his character alone but are also a result of wider systemic issues.
One of these issues is the complicated role user data plays in modern websites and services. Since its inception as a publicly accessible space, the question of how to monetize the Internet has been a tricky one for site and service owners. Selling ad space on your website or service has long been the go-to, but this presents its own issues: having to curate content that is considered ad-friendly, malicious or careless actors making the service or website less attractive for customers, and finally having to convince your advertisers that they get what they pay for in the first place, i.e. "how do I know people even look at our ads?" All of this is before you even stop to consider how ads massively favor large, established actors.
It's no wonder, then, that several startups in the era of internet mass adoption chose to forgo ads, or at least massively deprioritize them and/or relaunch them as "promoted posts," in an attempt to escape the stigma around ads. Meta/Facebook is probably the biggest fish in this particular pond, but we also see other services such as Twitter and Reddit follow the same pattern.
What makes this work is that the data these platforms collect from their users isn't all that valuable on a person-to-person basis. Knowing that so-and-so is 32 years old, lives in a traditionally conservative part of the city, goes to Starbucks a lot, and listens to Radiohead isn't particularly useful information for anyone but a dedicated but lazy stalker. Viewed as an aggregate, however, large collections of data on a large population become quite valuable, especially if you're working with, say, targeted ads or political campaigns. Look no further than the Cambridge Analytica data scandal for an example.
Now, all this is to illustrate the strange position the user occupies in Web 2.0. We tend to think of ourselves as the customer of Facebook, Reddit, Tumblr, and so on, but that isn't the case. After all, we don't pay for these services, and if we do, it's to buy freedom from ads or other minor service modifications. It is more correct to say that we make up the product itself. This is true in two respects: first, an active social community is vital for social media to not be entirely pointless; and second, we generate the data that the platform holder seeks to monetize. This hybrid product/participant role doesn't map cleanly to traditional understandings of "worker," but I argue it is a closer fit than "customer."
All of this is to say that it is immensely gratifying to see the Reddit Blackout taking the shape of a strike rather than the more typical boycott model we've seen in the internet-based protests of yesteryear. Much of this, I think, we can thank the participating Reddit moderators for. While the regular platform user can be *argued* to be a worker, the moderator inarguably is one, and the fact that they aren't paid for their efforts is more a credit to the prosocial nature of humans than to the corporate acumen of the platform holders. Either way, moderating a subreddit is work; if the subreddit is large, it's quite a lot of work, and moderators keeping malicious actors, scammers, and hatemongers out of everyone's hair is a must for any decently sized social space to not be an objectively terrible experience. So if you are going to withhold your labor (moderating for free), as a worker can, it would be plain irresponsible to leave the place open for said bad apples to ruin everyone's bunches; thus the shutdowns.
I don't think it's a controversial take to claim that the Reddit admins also view this more as a strike than a boycott, given their use of scabs, intimidation, and other strikebreaking tactics in an attempt to break the thing up. This is nothing new, and the fact that Reddit admins are willing to stoop to these scumbag tactics tells us that their bluster about the shutdown not affecting their bottom line is nothing more than shareholder-placating hot air.
As this entire screed has perhaps demonstrated, I believe the Reddit Blackout is important. My stay at Tumblr so far has been excellent and will probably continue past this strike no matter what outcome it has, but to others in my situation, or to those entirely alien to the Reddit biome, I ask you to consider: if we do not stop the kind of consumer- and user-unfriendly bullshit Reddit has been pulling with the API change, where will it pop up next? Who's to say the next bright idea in corpo-hell isn't "Hey boss, how about we charge these nerd losers a dollar per reblog? And maybe a fiver for a Golden Reblog (TM)?"
This is perhaps getting into grandstanding, but I believe we are way past due for a renegotiation of what it means to be a platform holder and a platform user on this hot mess of an internet. If we as users do not take an active, strong stance on the matter, the Steve Huffmans, Elon Musks, and Mark Zuckerbergs of the world will decide without us. One does not have to be a fortune teller to see that the digital world this would create would not have our best interests in mind any more than the current one does.
So, in closing, I wish to extend my wholehearted support to the participating Moderators of Reddit and everyone who has decided to take their business elsewhere for the duration of the shutdown. Even without getting into the nitty-gritty of the API situation, this is a fight worth having, and may we through it make a world that's just a little bit less shitty.
Become Ungovernable
Become Unprofitable
Stay that way.