#data bias
Explore tagged Tumblr posts
itellmyselfsecrets · 1 month ago
Text
“Several studies conducted over the past decade or so show that letters of recommendation are another seemingly gender-neutral part of a hiring process that is in fact anything but. One U.S. study found that female candidates are described with more communal (warm; kind; nurturing) and less active (ambitious; self-confident) language than men. And having communal characteristics included in your letter of recommendation makes it less likely that you will get the job, particularly if you’re a woman: while ‘team-player’ is taken as a leadership quality in men, for women the term ‘can make a woman seem like a follower’.” - Caroline Criado Perez (Invisible Women: Data Bias in a World Designed for Men)
151 notes · View notes
brownwomanisland · 2 months ago
Text
You ever randomly remember excerpts from Invisible Women: Exposing Data Bias in a World Designed for Men by Caroline Criado-Perez and get mad all over again? Like there's no place on Earth where a woman won't be overlooked, ignored, taken for granted, expected to be in pain, or expected to suffer and die.
2 notes · View notes
skannar · 1 year ago
Text
I love good Audiobooks on new tech.
2 notes · View notes
jcmarchi · 1 month ago
Text
Reducing AI Hallucinations with MoME: How Memory Experts Enhance LLM Accuracy
New Post has been published on https://thedigitalinsider.com/reducing-ai-hallucinations-with-mome-how-memory-experts-enhance-llm-accuracy/
Artificial Intelligence (AI) is transforming industries and reshaping our daily lives. But even the most intelligent AI systems can make mistakes. One big problem is AI hallucinations, where the system produces false or made-up information. This is a serious issue in healthcare, law, and finance, where getting things right is critical.
Though Large Language Models (LLMs) are incredibly impressive, they often struggle with staying accurate, especially when dealing with complex questions or retaining context. Addressing this issue requires a new approach, and the Mixture of Memory Experts (MoME) offers a promising solution. By incorporating advanced memory systems, MoME improves how AI processes information, enhancing accuracy, reliability, and efficiency. This innovation sets a new standard for AI development and leads to smarter and more dependable technology.
Understanding AI Hallucinations
AI hallucinations occur when a model produces outputs that may seem plausible but are factually incorrect. These errors arise from the way these models process data, relying on statistical patterns rather than a genuine understanding of the content. For instance, a chatbot might confidently provide incorrect medical advice, or an AI-generated report could misinterpret crucial legal information. Such mistakes can lead to significant consequences, including misdiagnoses, flawed decisions, or financial losses.
Traditional LLMs are built to predict the next word or sentence based on patterns learned from their training data. While this design enables them to generate fluent and coherent outputs, it often prioritizes what sounds plausible over what is accurate. These models may invent information to fill the gaps when dealing with ambiguous or incomplete inputs. Additionally, biases present in the training data can compound these problems, resulting in outputs that perpetuate inaccuracies or reflect underlying biases.
Efforts to address these issues, such as fine-tuning models or using Retrieval-Augmented Generation (RAG), have shown some promise but are limited in handling complex and context-sensitive queries. These challenges highlight the need for a more advanced solution capable of adapting dynamically to different inputs while maintaining contextual accuracy. The MoME offers an innovative and reliable approach to addressing the limitations of traditional AI models.
What is MoME?
The MoME is a new architecture that transforms how AI systems handle complex tasks by integrating specialized memory modules. Unlike traditional models that rely on activating all components for every input, MoME uses a smart gating mechanism to activate only the memory modules that are most relevant to the task at hand. This modular design reduces computational effort and improves the model’s ability to process context and handle complex information.
Fundamentally, MoME is built around memory experts, dedicated modules designed to store and process contextual information specific to particular domains or tasks. For example, in a legal application, MoME might activate memory modules specializing in case law and legal terminology. By focusing only on the relevant modules, the model produces more accurate and efficient results.
This selective engagement of memory experts makes MoME particularly effective for tasks that require deep reasoning, long-context analysis, or multi-step conversations. By efficiently managing resources and zeroing in on contextually relevant details, MoME overcomes many challenges traditional language models face, setting a new benchmark for accuracy and scalability in AI systems.
Technical Implementation of MoME
The MoME is designed with a modular architecture that makes it efficient and flexible for handling complex tasks. Its structure includes three main components: memory experts, a gating network, and a central processing core. Each memory expert focuses on specific types of tasks or data, such as legal documents, medical information, or conversational contexts. The gating network acts as the decision-maker, selecting the most relevant memory experts based on the input. This selective approach ensures the system only uses the necessary resources, improving speed and efficiency.
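The article does not give implementation details, but the experts-plus-gating design it describes maps closely onto a standard mixture-of-experts layer. The following is a minimal, hypothetical PyTorch sketch under that assumption: a top-k softmax gate routes each input to a couple of feed-forward "memory experts." The module names, sizes, and routing rule are illustrative, not MoME's actual implementation.

```python
# Hedged sketch of the gating-plus-memory-experts idea; all names and sizes
# here are assumptions for illustration, not the real MoME architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryExpert(nn.Module):
    """One domain-specific memory module (e.g. legal, medical, conversational)."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        return self.net(x)

class MoMELayer(nn.Module):
    """Routes each input to its top-k most relevant memory experts."""
    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList([MemoryExpert(dim) for _ in range(num_experts)])
        self.gate = nn.Linear(dim, num_experts)        # the "gating network"
        self.top_k = top_k

    def forward(self, x):                              # x: (batch, dim)
        scores = F.softmax(self.gate(x), dim=-1)       # relevance of every expert
        weights, idx = scores.topk(self.top_k, dim=-1) # keep only the top-k experts
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):      # only selected experts do any work
            mask = (idx == i).any(dim=-1)
            if mask.any():
                w = (weights * (idx == i)).sum(dim=-1, keepdim=True)
                out[mask] += w[mask] * expert(x[mask])
        return out

layer = MoMELayer(dim=64, num_experts=4, top_k=2)
print(layer(torch.randn(8, 64)).shape)                 # torch.Size([8, 64])
```

The design point mirrored here is sparsity: the gate's scores decide which experts run at all, so adding more experts grows capacity without making every forward pass proportionally more expensive.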
A key feature of MoME is its scalability. New memory experts can be added as required, allowing the system to handle various tasks without significantly increasing resource demands. This makes it suitable for tasks requiring specialized knowledge and adaptability, such as real-time data analysis or personalized AI applications.
Training MoME involves several steps. Each memory expert is trained on domain-specific data to ensure it can handle its designated tasks effectively. For instance, a memory expert for healthcare might be trained using medical literature, research, and patient data. Using supervised learning techniques, the gating network is then trained to analyze input data and determine which memory experts are most relevant for a given task. Fine-tuning is performed to align all components, ensuring smooth integration and reliable performance across various tasks.
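As a rough sketch of the supervised step just described, the snippet below shows one hypothetical way the gating network from the previous example could be taught to route inputs: each training pair is an input embedding and the index of the domain expert that should handle it. The labels, loss, and optimizer are assumptions for illustration, not details taken from MoME itself.

```python
# Hedged illustration of supervised gate training; the labelling scheme,
# loss, and optimizer are assumptions, not a published training recipe.
import torch
import torch.nn as nn

def train_gate(gate: nn.Module, batches, epochs: int = 3, lr: float = 1e-3):
    """Teach the gate to map an input embedding to the index of the memory
    expert (domain) that should handle it, from labelled examples."""
    opt = torch.optim.Adam(gate.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for embeddings, expert_labels in batches:   # (batch, dim) floats, (batch,) long
            logits = gate(embeddings)               # (batch, num_experts)
            loss = loss_fn(logits, expert_labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return gate

# Toy usage with random data standing in for real domain-labelled examples:
gate = nn.Linear(64, 4)
fake_batches = [(torch.randn(32, 64), torch.randint(0, 4, (32,))) for _ in range(10)]
train_gate(gate, fake_batches)
```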
Once deployed, MoME continues to learn and improve through reinforcement mechanisms. This enables it to adapt to new data and changing requirements, maintaining its effectiveness over time. With its modular design, efficient activation, and continuous learning capabilities, MoME provides a flexible and reliable solution for complex AI tasks.
How MoME Reduces AI Errors
MoME handles the issue of AI errors, such as hallucinations, by using a modular memory design that ensures the model retains and applies the most relevant context during the generation process. This approach addresses one of the primary reasons for errors in traditional models: the tendency to generalize or fabricate information when faced with ambiguous inputs.
For example, consider a customer service chatbot tasked with handling multiple interactions from the same user over time. Traditional models often struggle to maintain continuity between conversations, leading to responses that lack context or introduce inaccuracies. MoME, on the other hand, activates specific memory experts trained in conversational history and customer behavior. When a user interacts with the chatbot, MoME’s gating mechanism ensures that the relevant memory experts are dynamically engaged to recall previous interactions and tailor responses accordingly. This prevents the chatbot from fabricating information or overlooking critical details, ensuring a consistent and accurate conversation.
Similarly, MoME can reduce errors in medical diagnostics by activating memory modules trained on healthcare-specific data, such as patient histories and clinical guidelines. For instance, if a doctor consults an AI system to diagnose a condition, MoME ensures that only the relevant medical knowledge is applied. Instead of generalizing all medical data, the model focuses on the specific context of the patient’s symptoms and history, significantly lowering the risk of producing incorrect or misleading recommendations.
By dynamically engaging the correct memory experts for the task, MoME addresses the root causes of AI errors, ensuring contextually accurate and reliable outputs. This architecture sets a higher standard for precision in critical applications like customer service, healthcare, and beyond.
Challenges and Limitations of MoME
Despite its transformative potential, MoME faces several challenges. Implementing and training MoME models requires advanced computational resources, which may limit accessibility for smaller organizations. The complexity of its modular architecture also introduces additional considerations in terms of development and deployment.
Bias is another challenge. Since the performance of memory experts depends on the quality of their training data, any biases or inaccuracies in the data can influence the model’s outputs. Ensuring fairness and transparency in MoME systems will require rigorous data curation and ongoing monitoring. Addressing these issues is essential to building trust in AI systems, particularly in applications where impartiality is critical.
Scalability is another area that requires attention. As the number of memory experts increases, managing and coordinating these modules becomes more complex. Future research must optimize gating mechanisms and explore hybrid architectures that balance scalability with efficiency. Overcoming these challenges will be essential to realize MoME’s full potential.
The Bottom Line
In conclusion, the MoME is a significant step forward in addressing the limitations of traditional AI models, particularly when it comes to reducing errors like hallucinations. Using its modular memory design and dynamic gating mechanisms, MoME delivers contextually accurate and reliable outputs, making it an invaluable tool for critical applications in healthcare, customer service, and beyond.
While challenges such as resource requirements, data bias, and scalability remain, MoME’s innovative architecture provides a solid foundation for future advancements in AI. With ongoing improvements and careful implementation, MoME has the potential to redefine how AI systems operate, paving the way for smarter, more efficient, and trustworthy AI solutions across industries.
0 notes
rachel-sylvan-author · 9 months ago
Text
"Invisible Women" by Caroline Criado-Perez
Thank you @womensbookclub_paris for the rec! ❤️
0 notes
fluffyhummel · 1 month ago
Photo
something something biased data presentation... no yellow at all, just jumps from These Countries Allow 15 Year Old Kids to Do the Icky to And Heres Everyone Who Does It Right
also didn't i read something the other day about kids being allowed to marry at fourteen or less in certain us states? somehow i cannot imagine they'd let people do that and then come back and have rules saying they have to wait several years to have the wedding night. i bet theres some sort of 'religious freedom' exception for marriage, might have to look that up...
Age of consent by country
121 notes · View notes
filehulk · 2 years ago
Text
Natural Language Processing with ChatGPT: Unlocking Human-Like Conversations
Natural Language Processing (NLP) has witnessed significant advancements in recent years, empowering machines to understand and generate human-like text. One remarkable breakthrough in this domain is ChatGPT, a cutting-edge language model that leverages state-of-the-art techniques to engage in conversational exchanges. In this article, we delve into the underlying technology of ChatGPT, its…
View On WordPress
1 note · View note
selfindulgentcompetition · 4 months ago
Text
TRYING AGAIN WITH CLEARER WORDING. PLS READ BEFORE VOTING
*Meaning: When did you stop wearing a mask to a majority of your public activities? Wearing a mask when you feel sick or very rarely for specific events/reasons counts as “stopping”
[More Questions Here]
624 notes · View notes
ouaw-facts-i-just-made-up · 4 months ago
Text
YOU, the person who watches Once Upon A Witchlight, are autistic
207 notes · View notes
itellmyselfsecrets · 15 days ago
Text
“A UK Department for Transport study highlighted the stark difference between male and female perceptions of danger, finding that 62% of women are scared walking in multi-story car parks, 60% are scared waiting on train platforms, 49% are scared waiting at the bus stop, and 59% are scared walking home from a bus stop or station.
The figures for men are 31%, 25%, 20% and 25%, respectively. Fear of crime is particularly high among low-income women, partly because they tend to live in areas with higher crime rates, but also because they are likely to be working odd hours and often come home from work in the dark. Ethnic-minority women tend to experience more fear for the same reasons, as well as having the added danger of (often gendered) racialised violence to contend with.” - Caroline Criado Perez (Invisible Women: Data Bias in a World Designed for Men)
2 notes · View notes
markscherz · 7 months ago
Note
Are you familiar with this frog?
[two photos of the frog in question]
Yeah pretty sure that's Larry from down the pub. 'Ullo, Larry!
But in all seriousness, I'm afraid I cannot help without location information. Orientation within Bufonidae without location is a nightmare. If this is Africa, we're talking genus Sclerophrys. If it's the USA, it's probably Anaxyrus. If it's Europe, it's probably Bufo. If it's South America we're in Rhinella territory. And so on, and so forth.
171 notes · View notes
skannar · 1 year ago
Text
0 notes
jcmarchi · 3 months ago
Text
Tackling Misinformation: How AI Chatbots Are Helping Debunk Conspiracy Theories
New Post has been published on https://thedigitalinsider.com/tackling-misinformation-how-ai-chatbots-are-helping-debunk-conspiracy-theories/
Misinformation and conspiracy theories are major challenges in the digital age. While the Internet is a powerful tool for information exchange, it has also become a hotbed for false information. Conspiracy theories, once limited to small groups, now have the power to influence global events and threaten public safety. These theories, often spread through social media, contribute to political polarization, public health risks, and mistrust in established institutions.
The COVID-19 pandemic highlighted the severe consequences of misinformation. The World Health Organization (WHO) called this an “infodemic,” where false information about the virus, treatments, vaccines, and origins spread faster than the virus itself. Traditional fact-checking methods, like human fact-checkers and media literacy programs, could not keep up with the volume and speed of misinformation. This urgent need for a scalable solution led to the rise of Artificial Intelligence (AI) chatbots as essential tools in combating misinformation.
AI chatbots are not just a technological novelty. They represent a new approach to fact-checking and information dissemination. These bots engage users in real-time conversations, identify and respond to false information, provide evidence-based corrections, and help create a more informed public.
The Rise of Conspiracy Theories
Conspiracy theories have been around for centuries. They often emerge during uncertainty and change, offering simple, sensationalist explanations for complex events. These narratives have always fascinated people, from rumors about secret societies to government cover-ups. In the past, their spread was limited by slower information channels like printed pamphlets, word-of-mouth, and small community gatherings.
The digital age has changed this dramatically. The Internet and social media platforms like Facebook, Twitter, YouTube, and TikTok have become echo chambers where misinformation thrives. Algorithms designed to keep users engaged often prioritize sensational content, allowing false claims to spread quickly. For example, a report by the Center for Countering Digital Hate (CCDH) found that just twelve individuals and organizations, known as the “disinformation dozen,” were responsible for nearly 65% of anti-vaccine misinformation on social media in 2023. This shows how a small group can have a huge impact online.
The consequences of this unchecked spread of misinformation are serious. Conspiracy theories weaken trust in science, media, and democratic institutions. They can lead to public health crises, as seen during the COVID-19 pandemic, where false information about vaccines and treatments hindered efforts to control the virus. In politics, misinformation fuels division and makes it harder to have rational, fact-based discussions. A 2023 study by the Harvard Kennedy School’s Misinformation Review found that many Americans reported encountering false political information online, highlighting the widespread nature of the problem. As these trends continue, the need for effective tools to combat misinformation is more urgent than ever.
How AI Chatbots Are Equipped to Combat Misinformation
AI chatbots are emerging as powerful tools to fight misinformation. They use AI and Natural Language Processing (NLP) to interact with users in a human-like way. Unlike traditional fact-checking websites or apps, AI chatbots can have dynamic conversations. They provide personalized responses to users’ questions and concerns, making them particularly effective in dealing with conspiracy theories’ complex and emotional nature.
These chatbots use advanced NLP algorithms to understand and interpret human language. They analyze the intent and context behind a user’s query. When a user submits a statement or question, the chatbot looks for keywords and patterns that match known misinformation or conspiracy theories. If a user mentions a claim about vaccine safety, for example, the chatbot cross-references that claim against a database of verified information from reputable sources like the WHO and CDC, or independent fact-checkers like Snopes.
One of AI chatbots’ biggest strengths is real-time fact-checking. They can instantly access vast databases of verified information, allowing them to present users with evidence-based responses tailored to the specific misinformation in question. They offer direct corrections and provide explanations, sources, and follow-up information to help users understand the broader context. These bots operate 24/7 and can handle thousands of interactions simultaneously, offering scalability far beyond what human fact-checkers can provide.
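To make the keyword-and-database idea concrete, here is a deliberately tiny, hypothetical sketch that matches a user statement against a toy table of verified fact-checks by word overlap. Real systems rely on semantic matching and far larger, curated databases; the entries, scoring rule, and threshold below are illustrative assumptions only.

```python
# Toy claim-matcher: cross-references a user statement against a small,
# made-up database of fact-checks. Entries and heuristics are assumptions.
import re
from dataclasses import dataclass

@dataclass
class FactCheck:
    claim: str
    verdict: str
    source: str

FACT_DB = [
    FactCheck("vaccines cause autism", "False", "WHO"),
    FactCheck("5g towers spread covid", "False", "CDC"),
]

def match_claim(user_text: str, db=FACT_DB, threshold: float = 0.5):
    """Return the best-matching fact-check, or None if nothing is close enough."""
    words = set(re.findall(r"[a-z0-9]+", user_text.lower()))
    best, best_score = None, 0.0
    for fc in db:
        claim_words = set(fc.claim.split())
        overlap = len(words & claim_words) / len(claim_words)  # crude word-overlap score
        if overlap > best_score:
            best, best_score = fc, overlap
    return best if best_score >= threshold else None

print(match_claim("I heard that vaccines cause autism, is that true?"))
# FactCheck(claim='vaccines cause autism', verdict='False', source='WHO')
```

A production chatbot would wrap the matched verdict and source in a conversational, sourced correction rather than returning the record directly.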
Several case studies show the effectiveness of AI chatbots in combating misinformation. During the COVID-19 pandemic, organizations like the WHO used AI chatbots to address widespread myths about the virus and vaccines. These chatbots provided accurate information, corrected misconceptions, and guided users to additional resources.
AI Chatbots Case Studies from MIT and UNICEF
Research has shown that AI chatbots can significantly reduce belief in conspiracy theories and misinformation. For example, MIT Sloan Research shows that AI chatbots, like GPT-4 Turbo, can dramatically reduce belief in conspiracy theories. The study engaged over 2,000 participants in personalized, evidence-based dialogues with the AI, leading to an average 20% reduction in belief in various conspiracy theories. Remarkably, about one-quarter of participants who initially believed in a conspiracy shifted to uncertainty after their interaction. These effects were durable, lasting for at least two months post-conversation.
Likewise, UNICEF’s U-Report chatbot was important in combating misinformation during the COVID-19 pandemic, particularly in regions with limited access to reliable information. The chatbot provided real-time health information to millions of young people across Africa and other areas, directly addressing COVID-19 and vaccine safety concerns.
The chatbot played a vital role in enhancing trust in verified health sources by allowing users to ask questions and receive credible answers. It was especially effective in communities where misinformation was extensive and literacy levels were low, helping to reduce the spread of false claims. This engagement with young users proved vital in promoting accurate information and debunking myths during the health crisis.
Challenges, Limitations, and Future Prospects of AI Chatbots in Tackling Misinformation
Despite their effectiveness, AI chatbots face several challenges. They are only as effective as the data they are trained on, and incomplete or biased datasets can limit their ability to address all forms of misinformation. Additionally, conspiracy theories are constantly evolving, requiring regular updates to the chatbots.
Bias and fairness are also among the concerns. Chatbots may reflect the biases in their training data, potentially skewing responses. For example, a chatbot trained on Western media might not fully understand non-Western misinformation. Diversifying training data and ongoing monitoring can help ensure balanced responses.
User engagement is another hurdle. It can be difficult to convince individuals whose beliefs are deeply ingrained to interact with AI chatbots. Transparency about data sources and offering verification options can build trust. Using a non-confrontational, empathetic tone can also make interactions more constructive.
The future of AI chatbots in combating misinformation looks promising. Advancements in AI technology, such as deep learning and AI-driven moderation systems, will enhance chatbots’ capabilities. Moreover, collaboration between AI chatbots and human fact-checkers can provide a robust approach to misinformation.
Beyond health and political misinformation, AI chatbots can promote media literacy and critical thinking in educational settings and serve as automated advisors in workplaces. Policymakers can support the effective and responsible use of AI through regulations encouraging transparency, data privacy, and ethical use.
The Bottom Line
In conclusion, AI chatbots have emerged as powerful tools in fighting misinformation and conspiracy theories. They offer scalable, real-time solutions that surpass the capacity of human fact-checkers. Delivering personalized, evidence-based responses helps build trust in credible information and promotes informed decision-making.
While challenges such as data bias and user engagement persist, advancements in AI and collaboration with human fact-checkers hold promise for an even stronger impact. With responsible deployment, AI chatbots can play a vital role in developing a more informed and truthful society.
0 notes
littlespoonevan · 25 days ago
Text
**I’m aware some of these are vastly different genres and are not necessarily comparable (eg some people value drama more than comedy and vice versa) and they all also have very different stakes in terms of the overall story they’re trying to tell so let that sway you if you want! These are legit just the seasons of shows that have made me go 10/10 no notes 👏 both from a storytelling pov and in terms of my own personal enjoyment
66 notes · View notes
mostlysignssomeportents · 2 years ago
Text
The surprising truth about data-driven dictatorships
Here’s the “dictator’s dilemma”: they want to block their country’s frustrated elites from mobilizing against them, so they censor public communications; but they also want to know what their people truly believe, so they can head off simmering resentments before they boil over into regime-toppling revolutions.
These two strategies are in tension: the more you censor, the less you know about the true feelings of your citizens and the easier it will be to miss serious problems until they spill over into the streets (think: the fall of the Berlin Wall or Tunisia before the Arab Spring). Dictators try to square this circle with things like private opinion polling or petition systems, but these capture a small slice of the potentially destabilizing moods circulating in the body politic.
Enter AI: back in 2018, Yuval Harari proposed that AI would supercharge dictatorships by mining and summarizing the public mood — as captured on social media — allowing dictators to tack into serious discontent and defuse it before it erupted into unquenchable wildfire:
https://www.theatlantic.com/magazine/archive/2018/10/yuval-noah-harari-technology-tyranny/568330/
Harari wrote that “the desire to concentrate all information and power in one place may become [dictators’] decisive advantage in the 21st century.” But other political scientists sharply disagreed. Last year, Henry Farrell, Jeremy Wallace and Abraham Newman published a thoroughgoing rebuttal to Harari in Foreign Affairs:
https://www.foreignaffairs.com/world/spirals-delusion-artificial-intelligence-decision-making
They argued that — like everyone who gets excited about AI, only to have their hopes dashed — dictators seeking to use AI to understand the public mood would run into serious training data bias problems. After all, people living under dictatorships know that spouting off about their discontent and desire for change is a risky business, so they will self-censor on social media. That’s true even if a person isn’t afraid of retaliation: if you know that using certain words or phrases in a post will get it autoblocked by a censorbot, what’s the point of trying to use those words?
The phrase “Garbage In, Garbage Out” dates back to 1957. That’s how long we’ve known that a computer that operates on bad data will barf up bad conclusions. But this is a very inconvenient truth for AI weirdos: having given up on manually assembling training data based on careful human judgment with multiple review steps, the AI industry “pivoted” to mass ingestion of scraped data from the whole internet.
But adding more unreliable data to an unreliable dataset doesn’t improve its reliability. GIGO is the iron law of computing, and you can’t repeal it by shoveling more garbage into the top of the training funnel:
https://memex.craphound.com/2018/05/29/garbage-in-garbage-out-machine-learning-has-not-repealed-the-iron-law-of-computer-science/
When it comes to “AI” that’s used for decision support — that is, when an algorithm tells humans what to do and they do it — then you get something worse than Garbage In, Garbage Out — you get Garbage In, Garbage Out, Garbage Back In Again. That’s when the AI spits out something wrong, and then another AI sucks up that wrong conclusion and uses it to generate more conclusions.
To see this in action, consider the deeply flawed predictive policing systems that cities around the world rely on. These systems suck up crime data from the cops, then predict where crime is going to be, and send cops to those “hotspots” to do things like throw Black kids up against a wall and make them turn out their pockets, or pull over drivers and search their cars after pretending to have smelled cannabis.
The problem here is that “crime the police detected” isn’t the same as “crime.” You only find crime where you look for it. For example, there are far more incidents of domestic abuse reported in apartment buildings than in fully detached homes. That’s not because apartment dwellers are more likely to be wife-beaters: it’s because domestic abuse is most often reported by a neighbor who hears it through the walls.
So if your cops practice racially biased policing (I know, this is hard to imagine, but stay with me /s), then the crime they detect will already be a function of bias. If you only ever throw Black kids up against a wall and turn out their pockets, then every knife and dime-bag you find in someone’s pockets will come from some Black kid the cops decided to harass.
That’s life without AI. But now let’s throw in predictive policing: feed your “knives found in pockets” data to an algorithm and ask it to predict where there are more knives in pockets, and it will send you back to that Black neighborhood and tell you to throw even more Black kids up against a wall and search their pockets. The more you do this, the more knives you’ll find, and the more you’ll go back and do it again.
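A toy simulation makes the loop visible (this is an illustration of the dynamic, not a model of any real police department or product): two neighborhoods with identical true contraband rates, where each year's stops are allocated in proportion to cumulative past "finds."

```python
# Hypothetical feedback-loop demo: equal true rates, biased starting data.
import random

random.seed(1)
TRUE_RATE = {"A": 0.05, "B": 0.05}   # identical underlying rates
finds = {"A": 10, "B": 1}            # seeded bias: neighborhood A was over-policed
TOTAL_STOPS = 100

for year in range(10):
    total_finds = finds["A"] + finds["B"]
    for hood in finds:
        # "predictive" allocation: patrol where past finds were highest
        stops = round(TOTAL_STOPS * finds[hood] / total_finds)
        finds[hood] += sum(random.random() < TRUE_RATE[hood] for _ in range(stops))
    print(year, finds)
# Neighborhood A keeps receiving roughly 90% of the stops every year, even
# though the true rates are identical: the seeded bias never washes out,
# it just keeps getting "confirmed" by the data the system generates.
```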
This is what Patrick Ball from the Human Rights Data Analysis Group calls “empiricism washing”: take a biased procedure and feed it to an algorithm, and then you get to go and do more biased procedures, and whenever anyone accuses you of bias, you can insist that you’re just following an empirical conclusion of a neutral algorithm, because “math can’t be racist.”
HRDAG has done excellent work on this, finding a natural experiment that makes the problem of GIGOGBI crystal clear. The National Survey On Drug Use and Health produces the gold standard snapshot of drug use in America. Kristian Lum and William Isaac took Oakland’s drug arrest data from 2010 and asked Predpol, a leading predictive policing product, to predict where Oakland’s 2011 drug use would take place.
[Image ID: (a) Number of drug arrests made by Oakland police department, 2010. (1) West Oakland, (2) International Boulevard. (b) Estimated number of drug users, based on 2011 National Survey on Drug Use and Health]
Then, they compared those predictions to the outcomes of the 2011 survey, which shows where actual drug use took place. The two maps couldn’t be more different:
https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1740-9713.2016.00960.x
Predpol told cops to go and look for drug use in a predominantly Black, working class neighborhood. Meanwhile the NSDUH survey showed the actual drug use took place all over Oakland, with a higher concentration in the Berkeley-neighboring student neighborhood.
What’s even more vivid is what happens when you simulate running Predpol on the new arrest data that would be generated by cops following its recommendations. If the cops went to that Black neighborhood and found more drugs there and told Predpol about it, the recommendation gets stronger and more confident.
In other words, GIGOGBI is a system for concentrating bias. Even trace amounts of bias in the original training data get refined and magnified when they are output through a decision support system that directs humans to go and act on that output. Algorithms are to bias what centrifuges are to radioactive ore: a way to turn minute amounts of bias into pluripotent, indestructible toxic waste.
There’s a great name for an AI that’s trained on an AI’s output, courtesy of Jathan Sadowski: “Habsburg AI.”
And that brings me back to the Dictator’s Dilemma. If your citizens are self-censoring in order to avoid retaliation or algorithmic shadowbanning, then the AI you train on their posts in order to find out what they’re really thinking will steer you in the opposite direction, so you make bad policies that make people angrier and destabilize things more.
Or at least, that was Farrell et al’s theory. And for many years, that’s where the debate over AI and dictatorship has stalled: theory vs theory. But now, there’s some empirical data on this, thanks to “The Digital Dictator’s Dilemma,” a new paper from UCSD PhD candidate Eddie Yang:
https://www.eddieyang.net/research/DDD.pdf
Yang figured out a way to test these dueling hypotheses. He got 10 million Chinese social media posts from the start of the pandemic, before companies like Weibo were required to censor certain pandemic-related posts as politically sensitive. Yang treats these posts as a robust snapshot of public opinion: because there was no censorship of pandemic-related chatter, Chinese users were free to post anything they wanted without having to self-censor for fear of retaliation or deletion.
Next, Yang acquired the censorship model used by a real Chinese social media company to decide which posts should be blocked. Using this, he was able to determine which of the posts in the original set would be censored today in China.
That means that Yang knows what the “real” sentiment in the Chinese social media snapshot is, and what Chinese authorities would believe it to be if Chinese users were self-censoring all the posts that would be flagged by censorware today.
From here, Yang was able to play with the knobs, and determine how “preference-falsification” (when users lie about their feelings) and self-censorship would give a dictatorship a misleading view of public sentiment. What he finds is that the more repressive a regime is — the more people are incentivized to falsify or censor their views — the worse the system gets at uncovering the true public mood.
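A toy version of that distortion (not Yang's data or code, just a hedged illustration of the mechanism): generate posts with a known sentiment distribution, drop everything a censor would flag, and compare the average the regime observes with the real one.

```python
# Illustrative only: how censorship/self-censorship skews observed sentiment.
import random

random.seed(0)
# sentiment scores: -1 = strongly critical of the regime, +1 = supportive
true_posts = [random.uniform(-1, 1) for _ in range(10_000)]

def is_censored(score: float, tolerance: float) -> bool:
    """Posts more critical than the tolerance threshold get blocked or self-censored."""
    return score < -tolerance

for tolerance in (0.8, 0.5, 0.2):   # smaller tolerance = more repressive regime
    visible = [s for s in true_posts if not is_censored(s, tolerance)]
    print(f"tolerance {tolerance}: true mean {sum(true_posts)/len(true_posts):+.3f}, "
          f"observed mean {sum(visible)/len(visible):+.3f}")
# The more repressive the regime, the further the observed average drifts
# from the true public mood.
```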
What’s more, adding additional (bad) data to the system doesn’t fix this “missing data” problem. GIGO remains an iron law of computing in this context, too.
But it gets better (or worse, I guess): Yang models a “crisis” scenario in which users stop self-censoring and start articulating their true views (because they’ve run out of fucks to give). This is the most dangerous moment for a dictator, and depending on how the dictatorship handles it, they either get another decade of rule, or they wake up with guillotines on their lawns.
But “crisis” is where AI performs the worst. Trained on the “status quo” data where users are continuously self-censoring and preference-falsifying, AI has no clue how to handle the unvarnished truth. Both its recommendations about what to censor and its summaries of public sentiment are the least accurate when crisis erupts.
But here’s an interesting wrinkle: Yang scraped a bunch of Chinese users’ posts from Twitter — which the Chinese government doesn’t get to censor (yet) or spy on (yet) — and fed them to the model. He hypothesized that when Chinese users post to American social media, they don’t self-censor or preference-falsify, so this data should help the model improve its accuracy.
He was right — the model got significantly better once it ingested data from Twitter than when it was working solely from Weibo posts. And Yang notes that dictatorships all over the world are widely understood to be scraping western/northern social media.
But even though Twitter data improved the model’s accuracy, it was still wildly inaccurate, compared to the same model trained on a full set of un-self-censored, un-falsified data. GIGO is not an option, it’s the law (of computing).
Writing about the study on Crooked Timber, Farrell notes that as the world fills up with “garbage and noise” (he invokes Philip K Dick’s delighted coinage “gubbish”), “approximately correct knowledge becomes the scarce and valuable resource.”
https://crookedtimber.org/2023/07/25/51610/
This “probably approximately correct knowledge” comes from humans, not LLMs or AI, and so “the social applications of machine learning in non-authoritarian societies are just as parasitic on these forms of human knowledge production as authoritarian governments.”
The Clarion Science Fiction and Fantasy Writers’ Workshop summer fundraiser is almost over! I am an alum, instructor and volunteer board member for this nonprofit workshop whose alums include Octavia Butler, Kim Stanley Robinson, Bruce Sterling, Nalo Hopkinson, Kameron Hurley, Nnedi Okorafor, Lucius Shepard, and Ted Chiang! Your donations will help us subsidize tuition for students, making Clarion — and sf/f — more accessible for all kinds of writers.
Libro.fm is the indie-bookstore-friendly, DRM-free audiobook alternative to Audible, the Amazon-owned monopolist that locks every book you buy to Amazon forever. When you buy a book on Libro, they share some of the purchase price with a local indie bookstore of your choosing (Libro is the best partner I have in selling my own DRM-free audiobooks!). As of today, Libro is even better, because it’s available in five new territories and currencies: Canada, the UK, the EU, Australia and New Zealand!
[Image ID: An altered image of the Nuremberg rally, with ranked lines of soldiers facing a towering figure in a many-ribboned soldier's coat. He wears a high-peaked cap with a microchip in place of insignia. His head has been replaced with the menacing red eye of HAL9000 from Stanley Kubrick's '2001: A Space Odyssey.' The sky behind him is filled with a 'code waterfall' from 'The Matrix.']
Image: Cryteria (modified) https://commons.wikimedia.org/wiki/File:HAL9000.svg
CC BY 3.0 https://creativecommons.org/licenses/by/3.0/deed.en
 — 
Raimond Spekking (modified) https://commons.wikimedia.org/wiki/File:Acer_Extensa_5220_-_Columbia_MB_06236-1N_-_Intel_Celeron_M_530_-_SLA2G_-_in_Socket_479-5029.jpg
CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0/deed.en
 — 
Russian Airborne Troops (modified) https://commons.wikimedia.org/wiki/File:Vladislav_Achalov_at_the_Airborne_Troops_Day_in_Moscow_%E2%80%93_August_2,_2008.jpg
“Soldiers of Russia” Cultural Center (modified) https://commons.wikimedia.org/wiki/File:Col._Leonid_Khabarov_in_an_everyday_service_uniform.JPG
CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0/deed.en
831 notes · View notes
rulesforthedance · 2 months ago
Text
My bisexual girlfriend has observed that there are more frog enthusiasts among bi people than in the general population.
"A lot" could mean: you often seek out pictures and videos of frogs or information about them, you go looking for frogs in the wild, you have pet frogs, or you buy frog-related items.
If you're not bisexual but are some closely-related identity, you can decide if it makes the most sense for the purposes of this poll to align yourself with bi people or everyone else.
44 notes · View notes