#BIG DATA
Fifty per cent of web users are running ad blockers. Zero per cent of app users are running ad blockers, because adding a blocker to an app requires that you first remove its encryption, and that’s a felony. (Jay Freeman, the American businessman and engineer, calls this “felony contempt of business-model”.)

So when someone in a boardroom says, “Let’s make our ads 20 per cent more obnoxious and get a 2 per cent revenue increase,” no one objects that this might prompt users to google, “How do I block ads?” After all, the answer is, you can’t. Indeed, it’s more likely that someone in that boardroom will say, “Let’s make our ads 100 per cent more obnoxious and get a 10 per cent revenue increase.” (This is why every company wants you to install an app instead of using its website.)

There’s no reason that gig workers who are facing algorithmic wage discrimination couldn’t install a counter-app that co-ordinated among all the Uber drivers to reject all jobs unless they reach a certain pay threshold. No reason except felony contempt of business model, the threat that the toolsmiths who built that counter-app would go broke or land in prison, for violating DMCA 1201, the Computer Fraud and Abuse Act, trademark, copyright, patent, contract, trade secrecy, nondisclosure and noncompete or, in other words, “IP law”.

IP isn’t just short for intellectual property. It’s a euphemism for “a law that lets me reach beyond the walls of my company and control the conduct of my critics, competitors and customers”. And “app” is just a euphemism for “a web page wrapped in enough IP to make it a felony to mod it, to protect the labour, consumer and privacy rights of its user”.
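The counter-app imagined above is trivial as engineering goes; the barrier is legal, not technical. As a purely hypothetical sketch (no real gig-work platform exposes an interface like this, and every name and number below is invented), the coordination logic amounts to drivers agreeing on a shared pay floor and auto-declining anything under it:

```python
# Toy sketch of the "counter-app" coordination described above.
# Purely hypothetical: no real gig-work API works like this; the point of
# the post is that building it is easy and the obstacle is IP law.
from dataclasses import dataclass

@dataclass
class JobOffer:
    job_id: str
    payout: float      # what the platform offers, in dollars
    est_hours: float   # driver's estimate of time required

def agreed_floor(driver_votes: list[float]) -> float:
    """Collectively agreed minimum hourly rate (here: the median vote)."""
    votes = sorted(driver_votes)
    return votes[len(votes) // 2]

def should_accept(offer: JobOffer, floor_per_hour: float) -> bool:
    """Accept only if the effective hourly rate clears the shared floor."""
    return offer.payout / offer.est_hours >= floor_per_hour

floor = agreed_floor([22.0, 25.0, 30.0, 24.0, 27.0])   # made-up driver votes
offer = JobOffer("ride-123", payout=9.50, est_hours=0.6)
print(f"floor ${floor}/h ->", "accept" if should_accept(offer, floor) else "reject")
```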
The almost overnight surge in electricity demand from data centers is now outstripping the available power supply in many parts of the world, according to interviews with data center operators, energy providers and tech executives. That dynamic is leading to years-long waits for businesses to access the grid as well as growing concerns of outages and price increases for those living in the densest data center markets.

The dramatic increase in power demands from Silicon Valley’s growth-at-all-costs approach to AI also threatens to upend the energy transition plans of entire nations and the clean energy goals of trillion-dollar tech companies. In some countries, including Saudi Arabia, Ireland and Malaysia, the energy required to run all the data centers they plan to build at full capacity exceeds the available supply of renewable energy, according to a Bloomberg analysis of the latest available data.

By one official estimate, Sweden could see power demand from data centers roughly double over the course of this decade — and then double again by 2040. In the UK, AI is expected to suck up 500% more energy over the next decade. And in the US, data centers are projected to use 8% of total power by 2030, up from 3% in 2022, according to Goldman Sachs, which described it as “the kind of electricity growth that hasn’t been seen in a generation.”
21 June 2024
look computational psychiatry is a concept with a certain amount of cursed energy trailing behind it, but I'm really getting my ass chapped about a fundamental flaw in large scale data analysis that I've been complaining about for years. Here's what's bugging me:
When you're trying to understand a system as complex as behavioral tendencies, you cannot substitute large amounts of "low quality" data (data correlating more weakly with a trait of interest, say, or data that only measures one of several potential interacting factors that combine to create outcomes) for "high quality" data that inquires more deeply into the system.
The reason for that is this: when we're trying to analyze data as scientists, we leave things we're not directly interrogating as randomized as possible on the assumption that either there is no main effect of those things on our data, or that balancing and randomizing those things will drown out whatever those effects are.
But the problem is this: sometimes there are not only strong effects in the data you haven't considered, but also they correlate: either with one of the main effects you do know about, or simply with one another.
This means that there is structure in your data. And you can't see it, which means that you can't account for it. Which means whatever your findings are, they won't generalize the moment you switch to a new population structured differently. Worse, you are incredibly vulnerable to sampling bias, because the moment your sample fails to reflect the structure of the population you're up shit creek without a paddle. Twin studies are notoriously prone to this because white and middle to upper class twins are vastly more likely to be identified and recruited for them; those are the people who respond to study queries and are easy to get hold of. GWAS data is also extremely prone to this issue. So is anything you train machine learning models like ChatGPT on, where you're compiling unbelievably big datasets to try to "train out" the noise.
These approaches presuppose that sampling depth is enough to "drown out" any other conflicting main effects or interactions. What it actually typically does is obscure the impact of meaningful causative agents (hidden behind conflicting correlation factors you can't control for) and overstate the value of whatever significant main effects do manage to survive and fall out, even if they explain a pitiably small proportion of the variation in the population.
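To make that concrete, here is a minimal simulation sketch (mine, not the original poster's; all numbers are invented): an unmeasured "group" variable correlates with both the measured predictor and the outcome, so the effect you estimate in a convenience sample does not hold in a population with a different group mix, no matter how big the sample gets.

```python
# Minimal sketch of the failure mode described above (invented numbers):
# a hidden "group" variable drives both the measured predictor x and the
# outcome y. The x -> y slope you estimate depends on the group mix of
# your sample, so it fails to generalize, and more data doesn't fix it.
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, p_group1):
    """Draw a population where the outcome tracks the hidden group, not x."""
    group = rng.random(n) < p_group1                             # hidden structure
    x = rng.normal(loc=np.where(group, 1.0, -1.0), scale=1.0)   # predictor correlates with group
    y = rng.normal(loc=np.where(group, 2.0, 0.0), scale=1.0)    # outcome also correlates with group
    return x, y

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    return np.polyfit(x, y, 1)[0]

# Convenience sample: 90% group 1 (the people who answer study queries)
x_s, y_s = simulate(50_000, p_group1=0.9)
# Target population, structured differently: 30% group 1
x_t, y_t = simulate(50_000, p_group1=0.3)

print(f"estimated x->y slope in biased sample:     {slope(x_s, y_s):.2f}")  # ~0.26
print(f"estimated x->y slope in target population: {slope(x_t, y_t):.2f}")  # ~0.46
# Same data-generating process, different hidden structure, different "effect".
```

Piling on more rows drawn from the same skewed sampling process just tightens the error bars around the wrong number.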
It's a natural response to the wondrous power afforded by modern advances in computing, but it's not a great way to understand a complex natural world.
#sciblr #big data #complaints #this is a small meeting with a lot of clinical focus which is making me even more irritated natch #see also similar complaints when samples are systematically filtered
Each time you search for something like “how many rocks should I eat” and Google’s AI “snapshot” tells you “at least one small rock per day,” you’re consuming approximately three watt-hours of electricity, according to Alex de Vries, the founder of Digiconomist, a research company exploring the unintended consequences of digital trends. That’s ten times the power consumption of a traditional Google search, and roughly equivalent to the amount of power used when talking for an hour on a home phone. (Remember those?) Collectively, De Vries calculates that adding AI-generated answers to all Google searches could easily consume as much electricity as the country of Ireland.
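For a rough sense of scale, here is a back-of-envelope calculation using the per-query figures quoted above; the daily search volume and Ireland's annual electricity consumption are round-number assumptions of mine, not figures from the article:

```python
# Back-of-envelope on the per-query figures quoted above. SEARCHES_PER_DAY
# and IRELAND_TWH_PER_YEAR are assumed round numbers, not from the article.
AI_WH_PER_QUERY = 3.0        # quoted: ~3 Wh per AI-assisted search
CLASSIC_WH_PER_QUERY = 0.3   # quoted: an AI answer costs ~10x a traditional search
SEARCHES_PER_DAY = 9e9       # assumption: on the order of 9 billion Google searches/day
IRELAND_TWH_PER_YEAR = 30.0  # assumption: Ireland's annual electricity use, roughly 30 TWh

extra_wh_per_query = AI_WH_PER_QUERY - CLASSIC_WH_PER_QUERY
extra_twh_per_year = extra_wh_per_query * SEARCHES_PER_DAY * 365 / 1e12  # 1 TWh = 1e12 Wh

print(f"extra electricity if every search gets an AI answer: ~{extra_twh_per_year:.0f} TWh/year")
print(f"share of Ireland's annual consumption: ~{extra_twh_per_year / IRELAND_TWH_PER_YEAR:.0%}")
```

Even with these conservative inputs, the increment lands in the same order of magnitude as a small country's entire grid, which is the comparison De Vries is drawing.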
[...]
This insatiable hunger for power is slowing the transition to green energy. When the owner of two coal-fired power plants in Maryland filed plans to close last year, PJM, the regional grid operator, asked them to keep running until at least 2028 to ensure grid reliability. Meanwhile, AI is also being used to actively increase fossil fuel production. Shell, for example, has aggressively deployed AI to find and produce deep-sea oil. “The truth is that these AI models are contributing in a significant way to climate change, in both direct and indirect ways,” says Tom McBrien, counsel for the Electronic Privacy Information Center, a digital policy watchdog. Even before Google’s AI integration this spring, the average internet user’s digital activity generated 229 kilograms of carbon dioxide a year. That means the average user’s internet activity already accounts for about 40 percent of the per capita carbon budget needed to keep global warming under 1.5 degrees Celsius.
20 June 2024
#ai #artificial intelligence #google #big data #energy #internet #climate change #destroy your local AI data centre
Popular large language models (LLMs) like OpenAI’s ChatGPT and Google’s Bard are energy intensive, requiring massive server farms to provide enough computing power to train the powerful programs. Cooling those same data centers also makes the AI chatbots incredibly thirsty. New research suggests training for GPT-3 alone consumed 185,000 gallons (700,000 liters) of water. An average user’s conversational exchange with ChatGPT basically amounts to dumping a large bottle of fresh water out on the ground, according to the new study. Given the chatbot’s unprecedented popularity, researchers fear all those spilled bottles could take a troubling toll on water supplies, especially amid historic droughts and looming environmental uncertainty in the US.
[...]
Water consumption issues aren’t limited to OpenAI or AI models. In 2019, Google requested more than 2.3 billion gallons of water for data centers in just three states. The company currently has 14 data centers spread out across North America which it uses to power Google Search, its suite of workplace products, and more recently, its LaMDA and Bard large language models. LaMDA alone, according to the recent research paper, could require millions of liters of water to train, more than GPT-3, because several of Google’s thirsty data centers are housed in hot states like Texas; the researchers issued a caveat with this estimate, though, calling it an “approximate reference point.”
Aside from water, new LLMs similarly require a staggering amount of electricity. A Stanford AI report released last week looked at differences in energy consumption among four prominent AI models, estimating that OpenAI’s GPT-3 released 502 metric tons of carbon during its training. Overall, the energy needed to train GPT-3 could power an average American’s home for hundreds of years.