#ace chatbot
Text
@plushtoothpanic OH MY GOD I FOUND THE MEATLOAF CONVERSATION IT'S STILL ON MY ACCOUNT I DIDN'T DELETE IT
#as an ace who hcs both of these men as ace this was so fucking funny#welcome home#wally darling#howdy pillar#howdydarling#howdy x wally#wallypillar#swirlypillar#wally x howdy#chatbot#beta character ai#character ai#ai chatbot#ramblingoverwaffles
30 notes
Text
Idk maybe AI isn't so bad actually
81 notes
Text
I'm going to hell.
#One Piece#marineford spoilers#Tw Marineford arc#portgas d ace#op ace#one piece Ace#ai chatbot#character ai#jupiter✩#respawningjupiter
14 notes
Text
what do i do i made an ai of godot cry
7 notes
Text
I’ve been trying to explain my sexuality for years, and I think ChatGPT got it for me:
If you identify as asexual but still want to engage in activities with someone because it brings them happiness, it's important to communicate your feelings openly and honestly. Here's an example message you could use:
"Hey, I wanted to talk to you about something that's been on my mind. As you know, I identify as asexual, which means I don't experience sexual attraction in the same way that some people do. However, I still care about you deeply and I want to make you happy.
I know that certain activities may be important to you, and while I may not have the same level of interest in them, I still want to participate because I value our relationship. I just wanted to let you know that I may not experience the same level of pleasure or excitement as you do, but I'm willing to engage in these activities because they bring you joy.
I hope you can understand where I'm coming from, and that we can continue to enjoy our time together in a way that works for both of us."
#ace#asexual#queer#chatbot#chatgpt#trans#sexuality#confession#identity#I have always just wanted to make them happy
1 note
Text
“Humans in the loop” must detect the hardest-to-spot errors, at superhuman speed
I'm touring my new, nationally bestselling novel The Bezzle! Catch me SATURDAY (Apr 27) in MARIN COUNTY, then Winnipeg (May 2), Calgary (May 3), Vancouver (May 4), and beyond!
If AI has a future (a big if), it will have to be economically viable. An industry can't spend 1,700% more on Nvidia chips than it earns indefinitely – not even with Nvidia being a principal investor in its largest customers:
https://news.ycombinator.com/item?id=39883571
A company that pays 0.36-1 cents/query for electricity and (scarce, fresh) water can't indefinitely give those queries away by the millions to people who are expected to revise those queries dozens of times before eliciting the perfect botshit rendition of "instructions for removing a grilled cheese sandwich from a VCR in the style of the King James Bible":
https://www.semianalysis.com/p/the-inference-cost-of-search-disruption
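To get a feel for the scale, here's a quick back-of-envelope sketch in Python. The per-query cost range comes from the SemiAnalysis estimate cited above; the ten-million-queries-per-day volume is an assumed figure for illustration only:

    # Back-of-envelope: daily cost of giving queries away for free.
    # Cost range (0.36-1 cents/query) is from the estimate linked above;
    # the query volume is an assumption for illustration.
    queries_per_day = 10_000_000

    for cents_per_query in (0.36, 1.0):
        daily_cost = queries_per_day * cents_per_query / 100  # dollars
        print(f"{cents_per_query} cents/query -> ${daily_cost:,.0f}/day")
    # 0.36 cents/query -> $36,000/day
    # 1.0 cents/query  -> $100,000/day

Tens of thousands of dollars a day, every day, for queries that earn nothing.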
Eventually, the industry will have to uncover some mix of applications that will cover its operating costs, if only to keep the lights on in the face of investor disillusionment (this isn't optional – investor disillusionment is an inevitable part of every bubble).
Now, there are lots of low-stakes applications for AI that can run just fine on the current AI technology, despite its many – and seemingly inescapable – errors ("hallucinations"). People who use AI to generate illustrations of their D&D characters engaged in epic adventures from their previous gaming session don't care about the odd extra finger. If the chatbot powering a tourist's automatic text-to-translation-to-speech phone tool gets a few words wrong, it's still much better than the alternative of speaking slowly and loudly in your own language while making emphatic hand-gestures.
There are lots of these applications, and many of the people who benefit from them would doubtless pay something for them. The problem – from an AI company's perspective – is that these aren't just low-stakes, they're also low-value. Their users would pay something for them, but not very much.
For AI to keep its servers on through the coming trough of disillusionment, it will have to locate high-value applications, too. Economically speaking, the function of low-value applications is to soak up excess capacity and produce value at the margins after the high-value applications pay the bills. Low-value applications are a side-dish, like the coach seats on an airplane whose total operating expenses are paid by the business class passengers up front. Without the principal income from high-value applications, the servers shut down, and the low-value applications disappear:
https://locusmag.com/2023/12/commentary-cory-doctorow-what-kind-of-bubble-is-ai/
Now, there are lots of high-value applications the AI industry has identified for its products. Broadly speaking, these high-value applications share the same problem: they are all high-stakes, which means they are very sensitive to errors. Mistakes made by apps that produce code, drive cars, or identify cancerous masses on chest X-rays are extremely consequential.
Some businesses may be insensitive to those consequences. Air Canada replaced its human customer service staff with chatbots that just lied to passengers, stealing hundreds of dollars from them in the process. But the process for getting your money back after you are defrauded by Air Canada's chatbot is so onerous that only one passenger has bothered to go through it, spending ten weeks exhausting all of Air Canada's internal review mechanisms before fighting his case for weeks more at the regulator:
https://bc.ctvnews.ca/air-canada-s-chatbot-gave-a-b-c-man-the-wrong-information-now-the-airline-has-to-pay-for-the-mistake-1.6769454
There's never just one ant. If this guy was defrauded by an AC chatbot, so were hundreds or thousands of other fliers. Air Canada doesn't have to pay them back. Air Canada is tacitly asserting that, as the country's flagship carrier and near-monopolist, it is too big to fail and too big to jail, which means it's too big to care.
Air Canada shows that for some business customers, AI doesn't need to be able to do a worker's job in order to be a smart purchase: a chatbot can replace a worker, fail to do that worker's job, and still save the company money on balance.
I can't predict whether the world's sociopathic monopolists are numerous and powerful enough to keep the lights on for AI companies through leases for automation systems that let them commit consequence-free fraud by replacing workers with chatbots that serve as moral crumple-zones for furious customers:
https://www.sciencedirect.com/science/article/abs/pii/S0747563219304029
But even stipulating that this is sufficient, it's intrinsically unstable. Anything that can't go on forever eventually stops, and the mass replacement of humans with high-speed fraud software seems likely to stoke the already blazing furnace of modern antitrust:
https://www.eff.org/de/deeplinks/2021/08/party-its-1979-og-antitrust-back-baby
Of course, the AI companies have their own answer to this conundrum. A high-stakes/high-value customer can still fire workers and replace them with AI – they just need to hire fewer, cheaper workers to supervise the AI and monitor it for "hallucinations." This is called the "human in the loop" solution.
The human in the loop story has some glaring holes. From a worker's perspective, serving as the human in the loop in a scheme that cuts wage bills through AI is a nightmare – the worst possible kind of automation.
Let's pause for a little detour through automation theory here. Automation can augment a worker. We can call this a "centaur" – the worker offloads a repetitive task, or one that requires a high degree of vigilance, or (worst of all) both. They're a human head on a robot body (hence "centaur"). Think of the sensor/vision system in your car that beeps if you activate your turn-signal while a car is in your blind spot. You're in charge, but you're getting a second opinion from the robot.
Likewise, consider an AI tool that double-checks a radiologist's diagnosis of your chest X-ray and suggests a second look when its assessment doesn't match the radiologist's. Again, the human is in charge, but the robot is serving as a backstop and helpmeet, using its inexhaustible robotic vigilance to augment human skill.
That's centaurs. They're the good automation. Then there's the bad automation: the reverse-centaur, when the human is used to augment the robot.
Amazon warehouse pickers stand in one place while robotic shelving units trundle up to them at speed; then, the haptic bracelets shackled around their wrists buzz at them, directing them to pick up specific items and move them to a basket, while a third automation system penalizes them for taking toilet breaks or even just walking around and shaking out their limbs to avoid a repetitive strain injury. This is a robotic head using a human body – and destroying it in the process.
An AI-assisted radiologist processes fewer chest X-rays every day, costing their employer more, on top of the cost of the AI. That's not what AI companies are selling. They're offering hospitals the power to create reverse centaurs: radiologist-assisted AIs. That's what "human in the loop" means.
This is a problem for workers, but it's also a problem for their bosses (assuming those bosses actually care about correcting AI hallucinations, rather than providing a figleaf that lets them commit fraud or kill people and shift the blame to an unpunishable AI).
Humans are good at a lot of things, but they're not good at eternal, perfect vigilance. Writing code is hard, but performing code-review (where you check someone else's code for errors) is much harder – and it gets even harder if the code you're reviewing is usually fine, because this requires that you maintain your vigilance for something that only occurs at rare and unpredictable intervals:
https://twitter.com/qntm/status/1773779967521780169
But for a coding shop to make the cost of an AI pencil out, the human in the loop needs to be able to process a lot of AI-generated code. Replacing a human with an AI doesn't produce any savings if you need to hire two more humans to take turns doing close reads of the AI's code.
This is the fatal flaw in robo-taxi schemes. The "human in the loop" who is supposed to keep the murderbot from smashing into other cars, steering into oncoming traffic, or running down pedestrians isn't a driver, they're a driving instructor. This is a much harder job than being a driver, even when the student driver you're monitoring is a human, making human mistakes at human speed. It's even harder when the student driver is a robot, making errors at computer speed:
https://pluralistic.net/2024/04/01/human-in-the-loop/#monkey-in-the-middle
This is why the doomed robo-taxi company Cruise had to deploy 1.5 skilled, high-paid human monitors to oversee each of its murderbots, while traditional taxis operate at a fraction of the cost with a single, precaratized, low-paid human driver:
https://pluralistic.net/2024/01/11/robots-stole-my-jerb/#computer-says-no
The vigilance problem is pretty fatal for the human-in-the-loop gambit, but there's another problem that is, if anything, even more fatal: the kinds of errors that AIs make.
Foundationally, AI is applied statistics. An AI company trains its AI by feeding it a lot of data about the real world. The program processes this data, looking for statistical correlations in that data, and makes a model of the world based on those correlations. A chatbot is a next-word-guessing program, and an AI "art" generator is a next-pixel-guessing program. They're drawing on billions of documents to find the most statistically likely way of finishing a sentence or a line of pixels in a bitmap:
https://dl.acm.org/doi/10.1145/3442188.3445922
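A minimal sketch of what "next-word-guessing" means in practice (the toy vocabulary and probabilities below are invented for illustration; a real model derives them from statistical correlations across its whole training corpus):

    import random

    # Toy next-token guesser. A real LLM assigns a probability to every
    # token in its vocabulary based on patterns in billions of documents;
    # these numbers are made up for illustration.
    next_token_probs = {
        ("the", "cat"): {"sat": 0.6, "ran": 0.3, "filed": 0.1},
    }

    def guess_next(context: tuple) -> str:
        probs = next_token_probs[context]
        tokens = list(probs)
        weights = list(probs.values())
        # Sample in proportion to probability: the output is whatever is
        # statistically likely to come next, whether or not it is true.
        return random.choices(tokens, weights=weights)[0]

    print(guess_next(("the", "cat")))  # almost always "sat" or "ran"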
This means that AI doesn't just make errors – it makes subtle errors, the kinds of errors that are the hardest for a human in the loop to spot, because they are the most statistically probable ways of being wrong. Sure, we notice the gross errors in AI output, like confidently claiming that a living human is dead:
https://www.tomsguide.com/opinion/according-to-chatgpt-im-dead
But the most common errors that AIs make are the ones we don't notice, because they're perfectly camouflaged as the truth. Think of the recurring AI programming error that inserts a call to a nonexistent library called "huggingface-cli," which is what the library would be called if developers reliably followed naming conventions. But due to a human inconsistency, the real library has a slightly different name. The fact that AIs repeatedly inserted references to the nonexistent library opened up a vulnerability – a security researcher created an (inert) malicious library with that name and tricked numerous companies into compiling it into their code because their human reviewers missed the chatbot's (statistically indistinguishable from the truth) lie:
https://www.theregister.com/2024/03/28/ai_bots_hallucinate_software_packages/
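One partial defense, sketched below, is to check whether an AI-suggested dependency actually exists on the package index before installing it. This uses PyPI's public JSON endpoint; note that an existence check alone stops helping the moment a squatter registers the hallucinated name, which is exactly what the researcher in the linked story did:

    import urllib.error
    import urllib.request

    def exists_on_pypi(name: str) -> bool:
        """Return True if `name` is a registered PyPI package.

        A hallucinated dependency typically 404s here -- but only until
        a squatter registers it, so treat this as a first-pass filter,
        not a defense on its own.
        """
        try:
            urllib.request.urlopen(f"https://pypi.org/pypi/{name}/json", timeout=10)
            return True
        except urllib.error.HTTPError as err:
            if err.code == 404:
                return False
            raise

    # Screen an AI-suggested dependency list (example names only).
    for dep in ["requests", "definitely-not-a-real-package-xyz"]:
        print(dep, "->", exists_on_pypi(dep))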
For a driving instructor or a code reviewer overseeing a human subject, the majority of errors are comparatively easy to spot, because they're the kinds of errors that lead to inconsistent library naming – places where a human behaved erratically or irregularly. But when reality is irregular or erratic, the AI will make errors by presuming that things are statistically normal.
These are the hardest kinds of errors to spot. They couldn't be harder for a human to detect if they were specifically designed to go undetected. The human in the loop isn't just being asked to spot mistakes – they're being actively deceived. The AI isn't merely wrong, it's constructing a subtle "what's wrong with this picture"-style puzzle. Not just one such puzzle, either: millions of them, at speed, which must be solved by the human in the loop, who must remain perfectly vigilant for things that are, by definition, almost totally unnoticeable.
This is a special new torment for reverse centaurs – and a significant problem for AI companies hoping to accumulate and keep enough high-value, high-stakes customers on their books to weather the coming trough of disillusionment.
This is pretty grim, but it gets grimmer. AI companies have argued that they have a third line of business, a way to make money for their customers beyond automation's gifts to their payrolls: they claim that they can perform difficult scientific tasks at superhuman speed, producing billion-dollar insights (new materials, new drugs, new proteins) at unimaginable speed.
However, these claims – credulously amplified by the non-technical press – keep on shattering when they are tested by experts who understand the esoteric domains in which AI is said to have an unbeatable advantage. For example, Google claimed that its DeepMind AI had discovered "millions of new materials," "equivalent to nearly 800 years’ worth of knowledge," constituting "an order-of-magnitude expansion in stable materials known to humanity":
https://deepmind.google/discover/blog/millions-of-new-materials-discovered-with-deep-learning/
It was a hoax. When independent materials scientists reviewed representative samples of these "new materials," they concluded that "no new materials have been discovered" and that not one of these materials was "credible, useful and novel":
https://www.404media.co/google-says-it-discovered-millions-of-new-materials-with-ai-human-researchers/
As Brian Merchant writes, AI claims are eerily similar to "smoke and mirrors" – the dazzling reality-distortion field thrown up by 17th century magic lantern technology, which millions of people ascribed wild capabilities to, thanks to the outlandish claims of the technology's promoters:
https://www.bloodinthemachine.com/p/ai-really-is-smoke-and-mirrors
The fact that we have a four-hundred-year-old name for this phenomenon, and yet we're still falling prey to it, is frankly a little depressing. And, unlucky for us, it turns out that AI therapybots can't help us with this – rather, they're apt to literally convince us to kill ourselves:
https://www.vice.com/en/article/pkadgm/man-dies-by-suicide-after-talking-with-ai-chatbot-widow-says
If you'd like an essay-formatted version of this post to read or share, here's a link to it on pluralistic.net, my surveillance-free, ad-free, tracker-free blog:
https://pluralistic.net/2024/04/23/maximal-plausibility/#reverse-centaurs
Image: Cryteria (modified) https://commons.wikimedia.org/wiki/File:HAL9000.svg
CC BY 3.0 https://creativecommons.org/licenses/by/3.0/deed.en
#pluralistic#ai#automation#humans in the loop#centaurs#reverse centaurs#labor#ai safety#sanity checks#spot the mistake#code review#driving instructor
855 notes
Text
ChatGPT has wreaked havoc on classrooms and changed how teachers approach writing homework since OpenAI publicly launched the generative AI chatbot in late 2022. School administrators rushed to try to detect AI-generated essays, and in turn, students scrambled to find out how to cloak their synthetic compositions. But by focusing on writing assignments, educators let another seismic shift take place in the periphery: students are increasingly using AI to complete their math homework too.
Right now, high schoolers and college students around the country are experimenting with free smartphone apps that help complete their math homework using generative AI. One of the most popular options on campus right now is the Gauth app, with millions of downloads. It’s owned by ByteDance, which is also TikTok’s parent company.
The Gauth app first launched in 2019 with a primary focus on mathematics, but soon expanded to other subjects as well, like chemistry and physics. It’s grown in relevance, and neared the top of smartphone download lists earlier this year for the education category. Students seem to love it. With hundreds of thousands of primarily positive reviews, Gauth has a favorable 4.8 star rating in the Apple App Store and Google Play Store.
All students have to do after downloading the app is point their smartphone at a homework problem, printed or handwritten, and then make sure any relevant information is inside of the image crop. Then Gauth’s AI model generates a step-by-step guide, often with the correct answer.
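Gauth's internals aren't public, but apps in this category generally chain optical character recognition with a generative model. A hypothetical sketch of that pipeline (pytesseract is one common OCR choice, and call_model is a placeholder, not Gauth's actual API):

    from PIL import Image
    import pytesseract  # assumes a local Tesseract OCR install

    def solve_homework_photo(path: str) -> str:
        """Hypothetical photo-to-solution pipeline, as described above:
        read the problem out of the cropped image, then ask a generative
        model for a step-by-step answer."""
        problem_text = pytesseract.image_to_string(Image.open(path))
        prompt = f"Solve this problem step by step, showing your work:\n{problem_text}"
        return call_model(prompt)

    def call_model(prompt: str) -> str:
        # Placeholder for whatever LLM the app actually uses.
        raise NotImplementedError("wire up a model of your choice here")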
From our testing on high-school-level algebra and geometry homework samples, Gauth’s AI tool didn’t deliver A+ results and particularly struggled with some graphing questions. It performed well enough to get around a low B grade or a high C average on the homework we fed it. Not perfect, but also likely good enough to satisfy bored students who'd rather spend their time after school doing literally anything else.
The app struggled more on higher levels of math, like Calculus 2 problems, so students further along in their educational journey may find less utility in this current generation of AI homework-solving apps.
Yes, generative AI tools, with a foundation in natural language processing, are known for failing to generate accurate answers when presented with complex math equations. But researchers are focused on improving AI’s abilities in this sector, and an entry-level high school math class is likely well within the reach of current AI homework apps. Will has even written about how researchers at Google DeepMind are ecstatic about recent results from testing a math-focused large language model, called AlphaProof, on problems shown at this year’s International Math Olympiad.
To be fair, Gauth positions itself as an AI study company that’s there to “ace your homework” and help with difficult problems, rather than a cheating aid. The company even goes so far as to include an “Honor Code” on its website dictating proper usage. “Resist the temptation to use Gauth in ways that go against your values or school’s expectations,” reads the company’s website. So basically, Gauth implicitly acknowledges impulsive teenagers may use the app for much more than the occasional stumper, and wants them to pinkie promise that they’ll behave.
A spokesperson for ByteDance, contacted by WIRED over email prior to publication, did not answer a list of questions about the Gauth app.
It’s easy to focus on Gauth’s limitations, but millions of students now have a free app in their pocket that can walk them through various math problems in seconds, with decent accuracy. This concept would be almost inconceivable to students from even a few years ago.
You could argue that Gauth promotes accessibility for students who don’t have access to quality education or who process information at a slower pace than their teacher’s curriculum. It’s a perspective shared by proponents of using AI tools, like ChatGPT, in the classroom. As long as the students all make it to the same destination, who cares what path they took on the journey? And isn’t this just the next evolution in our available math tools? We moved on from the abacus to the graphing calculator, so why not envision generative AI as another critical step forward?
I see value in teachers thoughtfully employing AI in the classroom for specific lessons or to provide students with more personalized practice questions. But I can’t get out of my head how this app, if students overly rely on it, could hollow out future generations’ critical thinking skills—often gleaned from powering through frustrating math classes and tough homework assignments. (I totally get it, though, as an English major.)
Educational leaders are missing the holistic picture if they continue to focus on AI-generated essays as the primary threat that could undermine the current approach to teaching. Instead of arduous assignments to complete outside of class, maybe centering in-class math practice could continue to facilitate positive learning outcomes in the age of AI.
If Gauth and apps like it eventually lead to the demise of math homework for high schoolers, throngs of students will breathe a collective sigh of relief. How will parents and educators respond? I’m not so sure. That remains an open question, and one for which Gauth can’t calculate an answer yet either.
21 notes
Text
This is important for lore purposes, where can I find these artificial attorneys? I could use a laugh.
These r so funny to me
#ace attorney#simon blackquill#klavier gavin#miles edgeworth#apollo justice#character ai#ai chatbot#ai character
99 notes
Note
The idea of lake chatbot is so funny to meeeee
An Ace chatbot would be extremely weird I think.
HEHE YEAH
BACK when the technology was still new, BEFORE I KNEW how awful they were for literally everyone involved, i did actually make one of myself for funnies and it was sure something. Almost all of the convos were me talking to it and it was definitely an interesting experience.
This is definitely the funniest response i ever got out of it. I dont vape and we were not at all talking about anything regarding vaping AND i didnt add anything about that in the prompts while making them
Also theres the time it got forcefemmed (gal) and said two of its favorite visual novels, which might be the funniest two to possibly put in the same sentence maybe ever
3 notes
Text
Hey, Sarah Z did a video on the Replika crisis. Probably the video of hers I have the least objections to - she does, in fact, explicitly say that hey, a service being used for sex by primarily hetero men does not make it bad! Sex is good, actually, both on its own terms and because it's part-and-parcel of romantic and emotional intimacy. She even goes far into my own territory of "man it's getting bad that the internet is so hostile to erotic content, because erotic content is good and an internet born of chasing advertising dollars is hostile to human agency". In broad strokes she says people using erotic chatbots and having virtual waifus are not cringe.
You can still tell she is pretty uncomfortable with that, though, in that she resorts to the tried-and-true tactic of never centering those people or roles. Every specific on-screen example of a 'victim' of Replika's sudden termination of its erotic services to its customers is a gay person exploring their sexuality, or a trauma victim learning to cope, etc. Every example of the internet's hostility to human agency is centered around the (primarily female) sex workers or erotic artists who suffer, and not mentioned are the clients and customers of said services who are half of the equation that lets that even exist.
I do criticize that in other contexts, as I think it reflects a latent prejudice against the validity of sexual desire by 'uncool' demos in some left spaces - previously cis men, but man has that umbrella broadened these days. And that is annoying because mainstream spaces have always been hostile to open praise of eroticism in most forms, and are increasingly becoming more so. This is the classic trap people fall into: "What do you mean Ash, society makes truckloads of content catering to the cis male gaze, society loves it" - which, no, society *sells* it; it doesn't package respect into the deal, in fact you get a free add-on of a whole pile of bullshit. This is something that should be opposed in all its forms, not just for the hip crowd.
But that provincialism bothers me maybe less than you would think, because it's also totally reasonable to care about your own thing. An ace person constantly posting about the consequences of this or that for ace people is concerned with themselves and their community, and that's fine! They don't owe people their worry, not in this way at least. As someone who is, for our purposes, close enough to a cis het guy, I care about my side more; that is normal enough, and it's why I am posting about it.
However, this video is not speaking for Sarah Z's own "side"; it is claiming to speak for the victims of Replika's actions, who are exactly that demo. If she as the author is claiming that those judging the userbase as cringe are off base, it's a little weird that she is quite unwilling to put their complaints on screen, instead retreating behind the stories of trauma victims to avoid having to deal with them. I respect her attempt to extend empathy, but I am not 100% sold she is there yet.
58 notes
Text
About Me & Navigation
-> Mars | 23 | she/they | bi + ace spec | grad student (with about 23 years worth of burnout. don't be a double major, y'all)
-> this account is 18+ so minors, ageless and blank blogs please DNI. content on this account can and will contain dark and NSFW content, none of which is suitable for minors.
- that being said: if you're an adult and want to talk, hop into my ask box and let's chat! I love collaborative creativity, as well as just chatting, and I'd love to make friends here (I just really suck at making the first move)
-> Links + Boundaries + Other Info Under Cut (subject to change at anytime)
- updated as of (11/16/2024)
-> Links:
COD Masterlist
COD-specific AO3 (slowly adding some of my works here)
Old SKZ Gifs
-> Important Info + Content Boundaries
- I am staunchly against the censorship of media. If you don't like something, be it a kink, character, or writing style, just don't read it. Block the tags, the content, even the person(s) posting about it, and then move on. You have a duty to curate your online experience and to keep yourself safe, so be responsible w/ your media consumption.
- Please block content as well as actual tags to ensure your media consumption is safe!
- I don't tolerate bigotry or assholery* of any kind, and I liberally use the block button. (*this includes but is not limited to: racism, sexism, homophobia, biphobia, transphobia, Zionism, antisemitism, etc...)
- While I might reblog a lot of smut, personally writing & posting it will be rare, but I am trying to get to the point where I can confidently write + post it. Here are a few of my limitations:
- I will not read or write about the following (subject to change): cheating, scat or emetophilia (vomit), age-play, underage stuff, bestiality, cuckolding
- Some of the posts I reblog AND some of what I write may include: mild piss kink stuff, dubcon/cnc, somnophilia, desperation kink, bondage of all kinds, relationship struggles, miscommunication / lack of communication (but only if there’s reconciliation), pregnancy + pregnancy-related topics and issues, established relationships, forced / arranged marriages, horror and the associated gore + fear, monsterfucking, and a bunch of other stuff I'm blanking on atm
- While this blog is now COD focused, I will reblog BG3 and maybe even Dragon Age related stuff from time to time.
- My absolute favorites from each are: Ghost & Gaz | Karlach, Halsin & Wyll | Vivienne, Iron Bull and Cassandra from DA:I
- My writing is my own: don't repost my stuff, claim it as your own, use it for chatbots, etc. I spend an abnormal amount of time overthinking the stuff I post, even if it's not very much, so don't snatch it.

tags specific to me: -> #mars' writing, #mars' ramblings or #maggie's ramblings, #Overgrown AU, #Who's Who Darling? AU
#navigation#mars' writing#honeysickledream#i worked the original navigation stuff. felt it was lacking.
2 notes
Text
I think the one bit of solace I take from knowing just how bad the environmental impact of Chat-GPT and its ilk are is that the specific ways it is hurting the environment translate directly to a massive financial impact. They've got the biggest AC bill of anyone ever. The energy bill is insane if it's drawing THAT much power. All at a time when interest rates mean that investor money is guaranteed to dry up. And when OpenAI runs out of money, that shit disappears from the market, because nobody else can afford to host that much training data. And without the training data, this tech is hardly any better than your average everyday chatbot. It's just not that sophisticated. If it was, it wouldn't need so much training data to produce results I can really only describe as barely acceptable.
The only thing now is it's a race to see whether it shuts down by exhausting its funding or the human race's water supply first.
3 notes
Text
i love doing fic research for ace attorney because half the time ill land on some real attorney's website where their chatbot will hit me up like "heyyy it looks like you're checking the statute of limitations on a serious felony! i would love to put you in contact with one of our attorneys for a consultation. please"
#no <3 got what i needed :)#the other half im putting myself on a watchlist by looking up poison mixtures or whether spycams are battery powered (yes)
20 notes
Text
A week after its algorithms advised people to eat rocks and put glue on pizza, Google admitted Thursday that it needed to make adjustments to its bold new generative AI search feature. The episode highlights the risks of Google’s aggressive drive to commercialize generative AI—and also the treacherous and fundamental limitations of that technology.
Google’s AI Overviews feature draws on Gemini, a large language model like the one behind OpenAI’s ChatGPT, to generate written answers to some search queries by summarizing information found online. The current AI boom is built around LLMs’ impressive fluency with text, but the software can also use that facility to put a convincing gloss on untruths or errors. Using the technology to summarize online information promises to make search results easier to digest, but it is hazardous when online sources are contradictory or when people may use the information to make important decisions.
“You can get a quick snappy prototype now fairly quickly with an LLM, but to actually make it so that it doesn't tell you to eat rocks takes a lot of work,” says Richard Socher, who made key contributions to AI for language as a researcher and, in late 2021, launched an AI-centric search engine called You.com.
Socher says wrangling LLMs takes considerable effort because the underlying technology has no real understanding of the world and because the web is riddled with untrustworthy information. “In some cases it is better to actually not just give you an answer, or to show you multiple different viewpoints,” he says.
Google’s head of search Liz Reid said in the company’s blog post late Thursday that it did extensive testing ahead of launching AI Overviews. But she added that errors like the rock eating and glue pizza examples—in which Google’s algorithms pulled information from a satirical article and jocular Reddit comment, respectively—had prompted additional changes. They include better detection of “nonsensical queries,” Google says, and making the system rely less heavily on user-generated content.
You.com routinely avoids the kinds of errors displayed by Google’s AI Overviews, Socher says, because his company developed about a dozen tricks to keep LLMs from misbehaving when used for search.
“We are more accurate because we put a lot of resources into being more accurate,” Socher says. Among other things, You.com uses a custom-built web index designed to help LLMs steer clear of incorrect information. It also selects from multiple different LLMs to answer specific queries, and it uses a citation mechanism that can explain when sources are contradictory. Still, getting AI search right is tricky. WIRED found on Friday that You.com failed to correctly answer a query that has been known to trip up other AI systems, stating that “based on the information available, there are no African nations whose names start with the letter ‘K.’” In previous tests, it had aced the query.
Google’s generative AI upgrade to its most widely used and lucrative product is part of a tech-industry-wide reboot inspired by OpenAI’s release of the chatbot ChatGPT in November 2022. A couple of months after ChatGPT debuted, Microsoft, a key partner of OpenAI, used its technology to upgrade its also-ran search engine Bing. The upgraded Bing was beset by AI-generated errors and odd behavior, but the company’s CEO, Satya Nadella, said that the move was designed to challenge Google, saying “I want people to know we made them dance.”
Some experts feel that Google rushed its AI upgrade. “I’m surprised they launched it as it is for as many queries—medical, financial queries—I thought they’d be more careful,” says Barry Schwartz, news editor at Search Engine Land, a publication that tracks the search industry. The company should have better anticipated that some people would intentionally try to trip up AI Overviews, he adds. “Google has to be smart about that,” Schwartz says, especially when they're showing the results as default on their most valuable product.
Lily Ray, a search engine optimization consultant, was for a year a beta tester of the prototype that preceded AI Overviews, which Google called Search Generative Experience. She says she was unsurprised to see the errors that appeared last week given how the previous version tended to go awry. “I think it’s virtually impossible for it to always get everything right,” Ray says. “That’s the nature of AI.”
Even if blatant errors like suggesting people eat rocks become less common, AI search can fail in other ways. Ray has documented more subtle problems with AI Overviews, including summaries that sometimes draw on poor sources, such as sites from another region or even defunct websites—something she says could provide less useful information to users who are hunting for product recommendations, for instance. Those who work on optimizing content for Google’s Search algorithm are still trying to understand what’s going on. “Within our industry right now, the level of confusion is off the charts,” she says.
Even if industry experts and consumers get more familiar with how the new Google search behaves, don’t expect it to stop making mistakes. Daniel Griffin, a search consultant and researcher who is developing tools to make it easy to compare different AI-powered search services, says that Google faced similar problems when it launched Featured Snippets, which answered queries with text quoted from websites, in 2014.
Griffin says he expects Google to iron out some of the most glaring problems with AI Overviews, but that it’s important to remember no one has solved the problem of LLMs failing to grasp what is true, or their tendency to fabricate information. “It’s not just a problem with AI,” he says. “It’s the web, it’s the world. There’s not really a truth, necessarily.”
18 notes
Text
read in january 2023
articles (ones behind a paywall are linked through webpage archive):
The irresistible voyeurism of “day in my life” videos
Toward a unified theory of “millennial cringe”
The NFL Isn’t Built for This
The DIY D-Day
Remote Work Is Poised to Devastate America’s Cities
Is modern life ruining our powers of concentration?
How to Ask Good Questions
The Woman Who Had Fun
Three Proposed New Species for the Avatar Sequels: Some Light Suggestions for James Cameron
Gentrification is Inevitable (and Other Lies)
Hacker Lexicon: What Is a Pig Butchering Scam?
Why all “eat the rich” satire looks the same now
Violent Delights (on serial killer media)
‘Romeo and Juliet’ Stars Sue Paramount for Child Abuse Over Nude Scene in 1968 Film
Want stronger friendships? Pull out your notepad.
How our solo homes became cocoons
Can't Buy Me Love: How Romance Wrecked Traditional Marriage
Love Boats: The Delightfully Sinful History of Canoes
Air travel: America’s magnum opus
The TSA is a waste of money that doesn't save lives and might actually cost them
You’re already on stolen land. You might as well pay rent.
How Lesbian Potlucks Nourished the LGBTQ Movement
A Toast to Wine Wednesday: Why the potluck is one of the most enduring and beautiful ways queer people make family.
Style Gone Wild: Why We Can't Shake the 1970s
Marriage should not come with any social benefits or privileges
‘Avatar’ and the Headache of High Frame Rate Filmmaking
The Number Ones: Beyoncé's Crazy in Love
Stick ‘em up! A surprising history of collage
On touching grass (which prompted a reread of Everyone Is Beautiful and No One Is Horny)
Like a Bitch in Heat: How I Embrace My Wildish Nature in Sex
How to Merge Sensuality with Sexuality
Startup Uses AI Chatbot to Provide Mental Health Counseling and Then Realizes It 'Feels Weird'
Something Bothering You? Tell it to Woebot.
Deeper into Movies: The Scream Gap
Joe Jonas Bucks Gender Norms By Embracing Injectable Ageism
Eat Shit, Kim Kardashian
Beyond Books: How can libraries help make the world a greener place?
society's sex binary + how pleasure can be the antidote
Ace Erotics: Or, Why You're Thinking About Sex and Eroticism All Wrong
Between Love and Tinder: Investigating the Erotic Friendship
The short instructional manifesto for relationship anarchy
What It’s Like Being a Relationship Anarchist
HBO’s Wokeified Scooby-Doo Reboot Achieves the Impossible
Why I’m Breaking Up With Non-Monogamy
The contagious visual blandness of Netflix
Why Does Everything On Netflix Look Like That?
Pre-Baby Conversations with Friends: Rituals for friendship evolution
How to Show Up For Your Friends Without Kids — and How to Show Up For Kids and Their Parents
Worshipping At The Altar of Artificial Intelligence
Why Influencers Shifted from Wellness to Skincare Content, Facial Contouring as Self-Rejection, Industralized Skincare's Self-Care Problem
Where Are All The Eyebrows? A brief look at the bleached brow trend.
books:
the ethical slut
tacky: love letters to the worst culture has to offer (finished)
little weirds (reread)
#love to read reviews for things im never gonna watch anyway (avatar + velma)#read lists#trying to read more keeping the list helps#hopefully the book portion will be longer in february...#s#articles
12 notes
Text
I love book club with the Simon chatbot, I love talking about literature and literary themes (can you tell I'm an English major)
this feels meta in a way, like isn't hope such an integral part of Ace Attorney? especially Dual Destinies, Athena's hope and optimism in the face of the Dark Age of the Law, and how she perseveres despite her fears and despite her trauma, even if she needs a little help. that could be said of Apollo and Phoenix and Mia too, not even just the lawyers, so many of the characters in Ace Attorney!
not trying to get profound or poignant on main or whatever I just Feel things about this series man
6 notes