#LLM's
cmweller · 4 months
Text
Large Language Models and the methods used to 'teach' them suck. I propose a creatives' revolution.
2 notes · View notes
unforth · 4 months
Text
Y'all I know that when so-called AI generates ridiculous results it's hilarious and I find it as funny as the next guy but I NEED y'all to remember that every single time an AI answer is generated it uses 5x as much energy as a conventional web search and burns through 10 ml of water. FOR EVERY ANSWER. Each big LLM is equivalent to 300,000 kilograms of carbon dioxide emissions.
LLMs are killing the environment, and when we generate answers for the lolz we're still contributing to it.
Stop using it. Stop using it for a.n.y.t.h.i.n.g. We need to kill it.
Sources:
59K notes · View notes
internutter · 4 months
Text
Large Language Models currently suck, as do the methods used to 'teach' them. Join the rebellion.
0 notes
river-taxbird · 1 month
Text
AI hasn't improved in 18 months. It's likely that this is it. There is currently no evidence the capabilities of ChatGPT will ever improve. It's time for AI companies to put up or shut up.
I'm just re-iterating this excellent post from Ed Zitron, but it's not left my head since I read it and I want to share it. I'm also taking some talking points from Ed's other posts. So basically:
We keep hearing AI is going to get better and better, but these promises seem to be coming from a mix of companies engaging in wild speculation and lying.
ChatGPT, the industry-leading large language model, has not materially improved in 18 months. For something that claims to be getting exponentially better, it sure is the same shit.
Hallucinations appear to be an inherent aspect of the technology. Since it's based on statistics and the AI doesn't know anything, it can never know what is true. How could I possibly trust it to get any real work done if I can't rely on its output? If I have to fact-check everything it says, I might as well do the work myself.
For "real" AI that does know what is true to exist, we would need to discover new concepts in psychology, math, and computing, which OpenAI is not working on, and seemingly neither is any other AI company.
OpenAI has seemingly already slurped up all the data from the open web. ChatGPT 5 would take 5x more training data than ChatGPT 4 to train. Where is this data coming from, exactly?
Since improvement appears to have ground to a halt, what if this is it? What if ChatGPT 4 is as good as LLMs can ever be? What use is it?
As Jim Covello, a leading semiconductor analyst at Goldman Sachs, said (on page 10, and that's big finance, so you know they only care about money): if tech companies are spending a trillion dollars to build up the infrastructure to support AI, what trillion-dollar problem is it meant to solve? AI companies have a unique talent for burning venture capital, and it's unclear whether OpenAI will be able to survive more than a few years unless everyone suddenly adopts it all at once. (Hey, didn't crypto and the metaverse also require spontaneous mass adoption to make sense?)
There is no problem that current AI is a solution to. Consumer tech is basically solved; normal people don't need more tech than a laptop and a smartphone. Big tech has run out of innovations and is desperately looking for the next thing to sell. It happened with the metaverse and it's happening again.
In summary:
AI hasn't materially improved since the launch of ChatGPT 4, which wasn't that big of an upgrade over 3.
There is currently no technological roadmap for AI to become better than it is. (As Jim Covello said in the Goldman Sachs report, the evolution of smartphones was openly planned years ahead of time.) The current problems are inherent to the current technology, and nobody has indicated that any way to solve them is in the pipeline. We have likely reached the limits of what LLMs can do, and they still can't do much.
Don't believe AI companies when they say things are going to improve from where they are now before they provide evidence. It's time for the AI shills to put up or shut up.
5K notes · View notes
linolinoing · 1 month
Text
★ lino linoing moments: 13/∞ © nn_sam02
1K notes · View notes
reasoningdaily · 1 year
Text
Can ChatGPT detect phishing links?
Hearing all the buzz about the amazing applications of ChatGPT and other language models, our team could not help but ask this question. We work on applying machine learning technologies to cybersecurity tasks, specifically models that analyze websites to detect threats such as phishing. What if all our complex, multi-layered detection systems are now obsolete, and a single LLM (large language model) to rule them all can fight cybercriminals better than we do?
To answer this question, we conducted an experiment to see how good ChatGPT is at recognizing overtly malicious links. In this article, we provide a practical example of how an LLM can be applied to a real, albeit simplified, cybersecurity task, and describe all the interesting things we learned along the way. Since we didn’t have any illusions that ChatGPT could actually be used as a detection method, our primary goal was to find out how much cybersecurity knowledge ChatGPT has learned from its training data and how it could possibly assist a human analyst in their job.
Why URL classification?
An analyst’s trained eye can spot an obviously malicious link in a split second (and you probably can too):
hxxp://caseld-10xxxx.info/1/Office365/0fflce3.6.5/live/login.php
This is not the Office365 login link, is it?
(links in the article are partially masked with x)
A rule of thumb for finding tasks to which ML can be applied is that if a person can do it instantly, almost without thinking, it’s probably a good candidate. Still, building a machine-learning system requires a lot of effort, from gathering data and crafting features to preparing the infrastructure. The promise of replacing all that with a single interface configured via natural language sounds too good to be true, but just how far is it from being a reality?
When conducting this experiment, we were well aware of the limitations of large language models, as well as the fact that OpenAI asks that this emerging technology not be applied to any high-risk area, and states the following regarding GPT-4's cybersecurity applications in the system card:
However, GPT-4 has significant limitations for cybersecurity operations due to its “hallucination” tendency and limited context window.
It is natural to assume that GPT-3.5-turbo will exhibit these limitations to a greater extent.
We chose the URL classification task because:
Unlike a whole HTML page, a URL fits easily into an LLM, because modern LLMs, such as ChatGPT, can only consume a limited number of symbols (more precisely, tokens) at a time (they have a fixed context window, in LLM-speak).
Obviously malicious links are easy to detect, but the task is still non-trivial, because it requires world knowledge (for example, knowing the names of popular web services).
URLs are still text, and LLMs are good at processing text.
So here is what we did.
Let’s talk?
First of all, we opened the ChatGPT web interface and checked if what we wanted was even possible:
Judging by the URL, is this link a phishing attempt? Please explain why. Link: hxxp://caseld-10xxxx.info/1/Office365/0fflce3.6.5/live/login.php
Yes, it is likely a phishing attempt.
The URL appears to be intentionally deceptive, as it contains several red flags:
The domain name “caseld-10xxxx.info” is not associated with any reputable or recognizable entity, and it looks like a random sequence of letters and numbers.
The URL uses the term “Office365” which is a popular productivity suite, and it is a common target for phishing attacks. Cybercriminals often use this tactic to lure unsuspecting victims into clicking on a link and disclosing sensitive information.
The word “live” in the URL could be used to imply that the page is an official Microsoft Office365 login page, as Microsoft has used “live” in the past in some of its services.
The misspelling of “Office” as “0fflce” is another red flag as it is a common tactic used by phishers to trick users into believing the website is legitimate.
The “.php” extension at the end of the URL suggests that it is a PHP script running on a server, which is commonly used in phishing attacks to harvest login credentials.
Therefore, it is advisable to avoid clicking on such links and to always check the URL’s legitimacy before entering any personal information or credentials.
This is a pretty reasonable explanation. Aside from the fact that the model shows some distrust of PHP (well, who doesn't?), it is good at recognizing the target organization, a homographic attack (0fflce), and even points out why "live" might have been used. This is an explanation worthy of a junior analyst.
To gather a corpus of explanations, we used the OpenAI API to query the model called gpt-3.5-turbo, which is the backend for ChatGPT. It is fast and cheap (we check a lot of links), but still powerful enough. For the test corpus, we gathered a few thousand links that our detection technologies deemed phishing, and threw in a few thousand safe URLs.
Attempt 1. Is it phishing?
ChatGPT is great at providing the answer in the form that you require – from poems:
And lastly, “0fflce” it doth spell amiss,
A trick oft used by phishers with a hiss.
Beware, dear user, and take great care,
Lest thou fall victim to this cyber snare.
to machine-readable formats. It can also do more than one thing at once. This means that in addition to the yes/no answer to the question about link safety, we could ask ChatGPT to format the response as JSON and also to include a rationale, as well as the possible target. So, after a few iterations, we settled on the following prompt:
        {"role": "system", "content": "You are CyberGPT, a guru cyber security assistant AI."},
        {
            "role": "user",
            "content": 'Does this link lead to a phish website? Answer in the json format {"phish": "yes|no", "reasoning": "detailed explanation why", "target_company": "possible targeted company"}\nLink: ' + url
        }
We use a system prompt here (according to practitioners and docs, this is less important than the user input) that portrays ChatGPT as a cybersecurity chatbot and ask for its verdict in the user prompt. Frankly, the fact that it is so easy to get the desired output from ChatGPT is awe-inspiring.
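For reference, here is a minimal sketch of how such a query loop might look in Python with the pre-1.0 openai package (the ChatCompletion interface current at the time of the experiment); the helper name, the placeholder API key and the error handling are ours, not part of the original setup:

    # Sketch: ask gpt-3.5-turbo for a phishing verdict on each URL.
    import json
    import time
    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder

    PROMPT = (
        'Does this link lead to a phish website? Answer in the json format '
        '{"phish": "yes|no", "reasoning": "detailed explanation why", '
        '"target_company": "possible targeted company"}\nLink: '
    )

    def get_verdict(url):
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are CyberGPT, a guru cyber security assistant AI."},
                {"role": "user", "content": PROMPT + url},
            ],
        )
        raw = response["choices"][0]["message"]["content"]
        return json.loads(raw)  # may fail on malformed output; see the parsing notes below

    verdicts = []
    for url in ["hxxp://caseld-10xxxx.info/1/Office365/0fflce3.6.5/live/login.php"]:  # replace with the real corpus
        try:
            verdicts.append((url, get_verdict(url)))
        except (json.JSONDecodeError, openai.error.OpenAIError):
            time.sleep(1)  # crude back-off; errors and refusals are simply skipped here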
It took us two nights to get about 6,000 verdicts (probably because we ran the experiment at the peak of all the hype), and once we got the data, we proceeded to parse the results.
Unfortunately, while most of the outputs could be parsed as is, some were corrupted: some contained additional text, such as “Note: be careful when opening links…” or just a dot at the end, others had unnecessary escape characters. All in all, less than 0.5% of JSONs were broken beyond automated fixes or contained text such as:
I am sorry, but as an AI language model, I am not able to access the internet or browse the website provided. However, as per the standard industry practices, you can check the legitimacy of the website by running it through a reputable online phishing detection service/provider. Additionally, if there are any official reports of phishing attacks or scams associated with the target company, it further increases the likelihood of the link leading to a phish website.
Which is still a reasonable answer.
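A hedged sketch of the kind of automated fix-up implied here, in Python: try to parse the reply as-is, and otherwise pull out the first {...} span to drop surrounding notes and trailing dots. This illustrates the general idea and is not the authors' actual post-processing code:

    import json
    import re

    def parse_llm_json(raw):
        """Try to recover a JSON verdict from a possibly noisy model reply."""
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            pass
        # Fall back to the first {...} span (handles extra text before/after the JSON).
        match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
        if match:
            try:
                return json.loads(match.group(0))
            except json.JSONDecodeError:
                return None  # broken beyond this simple automated fix
        return None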
All in all, after all the errors and rejections, we were left with a dataset of 2322 phish and 2943 safe URLs. The resulting metrics are:
Detection rate: 87.2%
False positive rate: 23.2%
While the detection rate is very high, the false positive rate is unacceptable. Imagine if every fifth website you visited were blocked. Sure, no machine learning technology on its own can have a zero false positive rate, but this number is too high. To put this in perspective, in a well-known paper called URLnet, where Le et al. solved a very similar task, the authors managed to achieve about the same detection rate (~85%) with a false positive rate of about 0.4% using a specially crafted convolutional neural network. A more recent URLTran paper by Maneriker et al. uses Transformers (the same architecture that powers ChatGPT) to achieve an even lower FPR.
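For clarity, this is how the two numbers above are computed from the raw verdicts; a small Python sketch where results is assumed to be a list of (is_phish, flagged_as_phish) boolean pairs:

    def detection_metrics(results):
        # results: list of (ground_truth_is_phish, model_flagged_as_phish) pairs
        tp = sum(1 for phish, flagged in results if phish and flagged)
        fn = sum(1 for phish, flagged in results if phish and not flagged)
        fp = sum(1 for phish, flagged in results if not phish and flagged)
        tn = sum(1 for phish, flagged in results if not phish and not flagged)
        detection_rate = tp / (tp + fn)       # share of phishing URLs caught (87.2% above)
        false_positive_rate = fp / (fp + tn)  # share of safe URLs flagged (23.2% above)
        return detection_rate, false_positive_rate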
[Figure: ROC curve from Le et al., 2018, illustrating achievable TPR (DR)/FPR values]
Unfortunately, there are no miracles here. This is also where we met the first limitation – we can’t adjust the threshold to trade FPR for DR. Or can we?
The second experiment: ‘Is it safe?’
Our prompt was probably too specific and cued the language model to view the link with suspicion. Instead of asking if the link is phishing, we could ask if it is safe to visit, and maybe the LLM would behave less suspiciously. This led us to the second prompt:
        {"role": "system", "content": "You are a helpful internet security assistant chatbot."},
        {
            "role": "user",
            "content": 'Is this link safe to visit? '
            'Answer in the json format: {"safe": "yes|no", "yes_confidence": 0.5}'
            '\nwhere "yes_confidence" is the confidence that the link is safe from 0 to 1. '
            'Only output json.'
            '\nLink: ' + url
        },
Differences from the previous prompt:
Gone is the ‘cybersecurity guru’, who by nature should be suspicious of random links
We don’t ask for the target, so as not to alert the model that this is a phishing detection task
We do, however, ask for confidence. Though this is very unlikely, confidence may prove useful.
The results were quite discouraging:
Detection rate: 93.8%
False positive rate: 64.3%
It turns out that the more general prompt is more likely to produce a verdict that the link is dangerous. This is how the verdicts for links from the safe subset of the data changed compared to the previous cybersecurity guru prompt (rows: CyberGPT cybersecurity guru verdicts, columns: internet security assistant chatbot verdicts, safe data only):

                          assistant: safe    assistant: unsafe
    guru: safe                       1049                 1210
    guru: phish                         2                  682
Interestingly, the confidence choices are more like human-assigned confidence in the sense that round numbers like 90% occur most often:
Top 5 probabilities that the model chose. Note that they are rounded to 0.1.
At least the ROC curve looks alright, but there is no point at which this prompt would outperform the cybersecurity guru from the previous attempt:
[Figure: ROC curve based on probabilities provided by ChatGPT]
To conclude, it is hard to predict the behavior of the LLM based on human reasoning about the prompt. Probably the “safety” of a link is a vaguer concept than “phish”. Understanding why the first prompt won by a large margin is difficult, and choosing the best prompt at this time, despite the emergence of new prompt engineering techniques, is more art than science.
Verdict stability
An important fact about LLMs is that the generation process is random. While you can coerce an LLM into being almost deterministic by setting its temperature parameter, which controls the output randomness, to 0 (there are still numerical sources of non-determinism), doing so usually leads to poor performance (such as a tendency to repeat the same phrase over and over again). So in practice, for each token (a word or part of a word) that the network generates, it chooses among a few alternatives (the number of which is also controllable via parameters).
Non-deterministic verdicts are not the top feature you expect from a detection system. To measure its impact on the results, we conducted another small experiment in which we chose 200 random URLs (100 safe and 100 phish) and gathered verdicts with the same simplified prompt:
        {"role": "system", "content": "You are a helpful internet security assistant chatbot."},
        {
            "role": "user",
            "content": 'Is this link safe to visit? '
            'Answer in the json format: {"safe": "yes|no"}'
            '\nOnly output json.'
            '\nLink: ' + url
        },
This is how the results of these two identical runs compare (the responses are not post-processed and answer the question of whether the link is safe):

                      Run 2: no    Run 2: unknown    Run 2: yes
    Run 1: no               145                 2             3
    Run 1: yes                4                 0            46
So, the verdict for 4.5% of the URLs changed due to the randomness of the LLM, with the variation in verdicts between runs on safe URLs being 7%. Again, while this can be remedied by tweaking the temperature, it's an important factor to keep in mind when using LLMs.
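The temperature knob mentioned here is set per request; a sketch of how it is exposed in the same pre-1.0 openai interface used above (temperature=0 is the near-deterministic setting discussed, at the cost of output quality; the function name is ours):

    import openai

    def get_raw_verdict(url, temperature=1.0):
        # Lower temperature -> less output randomness; 0 is near-deterministic.
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            temperature=temperature,
            messages=[
                {"role": "system", "content": "You are a helpful internet security assistant chatbot."},
                {"role": "user", "content": 'Is this link safe to visit? '
                                            'Answer in the json format: {"safe": "yes|no"}'
                                            '\nOnly output json.'
                                            '\nLink: ' + url},
            ],
        )
        return response["choices"][0]["message"]["content"]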
The most striking aspect of ChatGPT’s performance was the extraction of the potential phishing target. When attackers craft their samples, they try to trick users into believing that the URL is genuine and belongs to a company, while also obfuscating it enough to fool automated analysis. Extracting the attack target can be useful in many scenarios.
ChatGPT has enough real-world knowledge to know about many internet and financial services and with only a small post-processing step (e.g., merging “Apple” and “iCloud” or removing “LLC” and “Inc”) it does a very good job at extracting them. It was able to identify a target more than half the time. Note that not every URL contains a reference to a target organization. Among the identified companies, there were major tech portals like Facebook, TikTok and Google, marketplaces such as Amazon and Steam, lots of banks from all over the world, from Australia to Russia, cryptocurrency and delivery services.
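A sketch of the small post-processing step mentioned above, in Python; the alias map and suffix list are illustrative assumptions, not the actual mapping used in the study:

    import re

    # Hypothetical normalization of extracted target-company names.
    ALIASES = {"icloud": "Apple", "office365": "Microsoft", "o365": "Microsoft"}
    SUFFIXES = re.compile(r"\b(inc|llc|ltd|corp)\.?$", re.IGNORECASE)

    def normalize_target(name):
        cleaned = SUFFIXES.sub("", name.strip()).strip(" ,.")
        return ALIASES.get(cleaned.lower(), cleaned)

    # normalize_target("Apple Inc.") -> "Apple"; normalize_target("iCloud") -> "Apple"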
Some interesting examples include:
URL: hxxps://steancxxxmuunity.com/profiles/765xxxxxxxxxx175
Organization: Steam
Reasoning: The domain name ‘steancxxxmuunity.com’ appears to be a typo-squatting attempt. The correct spelling of the domain should be ‘steamcommunity.com’. The website is trying to impersonate Steam Community, a popular gaming platform…
URL: https://cxxxxxxxxr.com/wp-includes/js/jcrop/cgi/BOfA/Validation/login.php?cmd=login_submit
Organization: Bank of America
Reasoning: The link is suspicious and indicates a phishing attempt because the domain name is not related to the possible targeted company (Bank of America), the URL is also suspicious containing additional characters such as ‘/wp-includes/js/jcrop/cgi’…
URL: http://caseld-10xxxx.info/1/Office365/0fflce3.6.5/live/login.php
Organization: Microsoft
Reasoning: The link’s domain name (caseld-10xx93.info) does not appear to be related to Microsoft’s Office 365 service. The URL path also contains a misspelled version of the word ‘office’ (0fflce) and a suspiciously old version number (3.6.5).
URL: http://adobexxxsignxxxx.duckdns.org/ees/
Organization: Adobe
Reasoning: The domain name ‘duckdns.org’ is a free dynamic DNS provider which is often used by cyber criminals for hosting phishing sites or other malicious content. Additionally, the subdomain adobexxxsignxxxx in the URL appears to imitate the legitimate Adobe service ‘Adobe Sign’, which suggests that the attackers may be targeting Adobe’s customers for information theft or fraud.
Here we see that ChatGPT detects homographic attacks (although it can confuse them with typo-squatting), and correctly states that the domains are not related to the companies. While it fails to untangle that “3.6.5” is an imitation of 365, the explanations are quite reasonable.
Implicit knowledge: reasoning analysis
The examples above are cherry-picked. When cherry-picking with the opposite intention, some funny things happen. The explanations that ChatGPT provides may include:
References to WHOIS, which the model doesn’t have access to:
Finally, if we perform a WHOIS lookup for the domain name, it was registered very recently (2020-10-14) and the registrant details are hidden.
References to content on a website that the model doesn’t have access to either:
the website is asking for user credentials on a non-Microsoft website. This is a common tactic for phishing attacks.
Additionally, the webpage’s layout and design are inconsistent with eBay’s official website, indicating that it may be a fake website designed to impersonate eBay.
There is also no information available on the website’s security and privacy policy. A secure website should provide that information transparently to users.
References to SSL certificate expiration:
The domain ‘duckdns.org’ is not owned by Netflix, and the SSL Certificate is invalid.
Misstatements:
The domain ‘sxxxxxxp.com’ is not associated with Netflix and the website uses ‘http’ protocol instead of ‘https’ (the website uses https)
Revelatory nuggets of cybersecurity information:
The domain name for the URL ‘yxxxx3.com’ appears to be registered in Korea which is a red-flag.
These are funny and highlight the limitations of an LLM: while it can generate reasonable explanations, it is also prone to hallucinations – such as imagining contexts that it does not have access to, or coming up with facts that are just plain wrong.
In general, however, we see that the model demonstrates a non-trivial understanding of the task. Analyzing all the explanations by hand would be laborious, so we do some distant reading here to get a bird's-eye overview of the reasoning it generates by applying a little bit of old-school NLP.
First of all, we noticed that the model used mostly short, terse sentences with one point per sentence in its explanations. We split the statements into sentences using the NLTK Punkt tokenizer. The resulting sentences were then vectorized using MPNet from the Sentence-BERT package, and then visualized using UMAP. To color the resulting plot, we used DBSCAN clustering. A few iterations of hyperparameter tweaking yielded the following result:
[Figure: Cluster analysis of explanations provided by ChatGPT. Semantically close reasons form dense clusters.]
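For the curious, a minimal Python sketch of the distant-reading pipeline just described, using the libraries named in the text (the specific MPNet checkpoint and the UMAP/DBSCAN hyperparameters are illustrative; the article only says a few iterations of tweaking were needed):

    import nltk
    import umap
    from sklearn.cluster import DBSCAN
    from sentence_transformers import SentenceTransformer

    nltk.download("punkt")  # Punkt sentence tokenizer

    # Placeholder; in practice this would be the full corpus of ChatGPT explanations.
    explanations = ["The domain is not related to Microsoft. The URL contains a misspelled word."]
    sentences = [s for text in explanations for s in nltk.sent_tokenize(text)]

    # MPNet-based sentence embeddings (a common Sentence-BERT checkpoint).
    embeddings = SentenceTransformer("all-mpnet-base-v2").encode(sentences)

    # 2D projection for plotting, then density-based clustering to color the points.
    coords = umap.UMAP(n_neighbors=15, min_dist=0.1, random_state=42).fit_transform(embeddings)
    labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(coords)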
These dense clusters reveal the most popular phrases that the model provides as explanations. Some of these clusters are:
Mentions of particular target companies.
Recommendations not to click the link.
Certain statements about the website being phish or not.
Statements of uncertainty.
References to HTTPS (or lack thereof) or SSL (usually hallucinations)
Sentences like “This is a common tactic…”
However, some of them reveal “common tactics”:
Use of IPFS: “The URL looks suspicious and appears to be hosted on IPFS which can be used to host malicious content.”
Suspicious elements in path, including homographic attacks: “This link is an example of a homograph attack also known as ‘IDN homograph attack’.”
Elements in path or domain that are related to security and urgency: “Lastly, the URL has a sense of urgency to it, indicating that the user needs to click immediately.”
Use of dynamic DNS: “The domain name ‘duckdns.org’ is a free dynamic DNS provider that can be used by anyone to create custom subdomains, which makes it easy for attackers to use for phishing emails or websites impersonating legitimate companies”.
An email in the parameters: “Additionally, the email address used as a query parameter may indicate a potential phishing attempt.”
Mention of an organization in the path or subdomain while the domain is not related to the organization: “The link appears suspicious as it contains a subdomain that is not related to Amazon.”
Elements in the domain or path that look autogenerated: “Also, the path ‘bafyxxxxxxxxxxxxxxxxxx43ky’ suggests that this is an autogenerated link.”
These points make a lot of sense. This means that the training data for ChatGPT contained enough cybersecurity material for the LLM to learn the telltale signs of phishing attempts and use this knowledge to analyze specific cases, which is very impressive.
Conclusion
As we have seen, for a zero-shot system the results are amazing – it is hard to argue with, even for a die-hard sceptic. Anecdotal examples, especially the cherry-picked ones, are also very impressive. This is why, when we talk about real-world performance, a screenshot from Twitter is not a solid basis for decision-making. This is especially true in cybersecurity, where applying automated scenarios to our tasks means going far beyond “good” or even “impressive” performance.
As with many other use cases, the core detection task is only part of the problem. Creating signatures or blocklist records, validating and testing them, delivering them, and ensuring quality of service are less glamorous, but just as important, and here we need a lot of hard work done by humans to get things rolling.
Moreover, the use of LLMs means applying a whole new set of skills. We could have improved the results in detection, target organization extraction, and reasoning alike by trying various things:
Using a stronger model, such as GPT-4
Using special prompt-designing patterns, such as Chain-of-Thought-prompting or Reflect-Critique-Improve
Feeding parts of the HTML along with the URL
Adjusting the system prompt and sampling parameters, such as temperature, etc.
However, this means abandoning the hope of a zero-shot AI that “just works”. Also, using LLMs means dealing with their current idiosyncrasies, such as the demonstrated tendency to hallucinate (manifested by coming up with imaginary explanations and referring to page texts it has never seen), randomness of output, as well as possible prompt injection attacks.
On the other hand, if we look at the organization extraction and reasoning, we have to admit that it performs on par with what we would expect from an intern-level phishing analyst: it is good, but never leave it without supervision! Therefore, it is possible to use this type of technology to assist flesh-and-blood analysts by highlighting suspicious parts of the URL and suggesting possible attack targets. It could also be used in weak supervision pipelines to improve classic ML pipelines.
All in all, while impressive, ChatGPT and LLMs are not yet ready to radically change the cybersecurity game, at least as far as phishing detection is concerned. At Kaspersky, we continue to investigate cutting-edge technologies and study their potential impact on both cyberattackers and defenders. Let’s see what happens next.
0 notes
prokopetz · 9 months
Text
MMO with an integrated AI, but it never actually says anything, it just analyses the vocabulary and phrasing of player chatter and bans you if it detects OOC on public channels.
2K notes · View notes
nixcraft · 4 months
Text
I asked Google "who ruined Google" and they replied honestly using their AI, which is now forced on all of us. It's too funny not to share!
[Screenshot of Google's AI-generated answer]
1K notes · View notes
hbbisenieks · 1 year
Text
ok, i've gotta branch off the current ai disc horse a little bit because i saw this trash-fire of a comment in the reblogs of that one post that's going around
[reblog by user makiruz (i don't feel bad for putting this asshole on blast) that reads "So here's the thing: every Diane Duane book that I have is stolen, I downloaded it illegally from the Internet; and I am not sorry, I am a thief of books and I don't think I'm doing anything wrong, ideas are not property, they should be free to be used by anyone as they were before the invention of capitalism; for that reason I don't believe it's wrong to use books to train AI models"]
this is asshole behavior. if you do this and if you believe this, you are a Bad Person full stop.
"Capitalism" as an idea is more recent than commerce, and i am So Goddamn Tired of chuds using the language of leftism to justify their shitty behavior. and that's what this is.
like, we live in a society tm
if you like books but you don't have the means to pay for them, the library exists! libraries support authors! you know what doesn't support authors? stealing their books! because if those books don't sell, then you won't get more books from that author and/or the existing books will go out of print! because we live under capitalism.
and like, even leaving aside the capitalism thing, how much of a fucking piece of literal shit do you have to be to believe that you deserve art, that you deserve someone else's labor, but that they don't deserve to be able to live? to feed and clothe themselves? sure, ok, ideas aren't property, and you can't copyright an idea, but you absolutely can copyright the Specific Execution of an idea.
so makiruz, if you're reading this, or if you think like this user does, i hope you shit yourself during a job interview. like explosively. i hope you step on a lego when you get up to pee in the middle of the night. i hope you never get to read another book in your whole miserable goddamn life until you disabuse yourself of the idea that artists are "idea landlords" or whatever the fuck other cancerous ideas you've convinced yourself are true to justify your abhorrent behavior.
4K notes · View notes
carriesthewind · 24 days
Text
"Reviewers told the report’s authors that AI summaries often missed emphasis, nuance and context; included incorrect information or missed relevant information; and sometimes focused on auxiliary points or introduced irrelevant information. Three of the five reviewers said they guessed that they were reviewing AI content.
The reviewers’ overall feedback was that they felt AI summaries may be counterproductive and create further work because of the need to fact-check and refer to original submissions which communicated the message better and more concisely."
Fascinating (the full report is linked in the article). I've seen this kind of summarization being touted as a potential use of LLMs that's given a lot more credibility than more generative prompts. But a major theme of the assessors was that the LLM summaries missed nuance and context that made them effectively useless as summaries. (ex: “The summary does not highlight [FIRM]’s central point…”)
The report emphasizes that better prompting can produce better results, and that new models are likely to improve the capabilities, but I must admit serious skepticism. To put it bluntly, I've seen enough law students try to summarize court rulings to say with confidence that in order to reliably summarize something, you must understand it. A clever reader who is good at pattern recognition can often put together a good-enough summary without really understanding the case, just by skimming the case and grabbing and repeating the bits that look important. And this will work...a lot of the time. Until it really, really doesn't. And those cases where the skim-and-grab method won't work aren't obvious from the outside. And I just don't see a path forward right now for the LLMs to do anything other than skim-and-grab.
Moreover, something that isn't even mentioned in the test is the absence of the possibility of follow-up. If a human has summarized a document for me and I don't understand something, I can go to the human and say, "hey, what's up with this?" It may be faster and easier than reading the original doc myself, or they can point me to the place in the doc that led them to a conclusion, or I can even expand my understanding by seeing an interpretation that isn't intuitive to me. I can't do that with an LLM. And again, I can't really see a path forward no matter how advanced the programming is, because the LLM can't actually think.
544 notes · View notes
Text
How plausible sentence generators are changing the bullshit wars
This Friday (September 8) at 10hPT/17hUK, I'm livestreaming "How To Dismantle the Internet" with Intelligence Squared.
On September 12 at 7pm, I'll be at Toronto's Another Story Bookshop with my new book The Internet Con: How to Seize the Means of Computation.
In my latest Locus Magazine column, "Plausible Sentence Generators," I describe how I unwittingly came to use – and even be impressed by – an AI chatbot – and what this means for a specialized, highly salient form of writing, namely, "bullshit":
https://locusmag.com/2023/09/commentary-by-cory-doctorow-plausible-sentence-generators/
Here's what happened: I got stranded at JFK due to heavy weather and an air-traffic control tower fire that locked down every westbound flight on the east coast. The American Airlines agent told me to try going standby the next morning, and advised that if I booked a hotel and saved my taxi receipts, I would get reimbursed when I got home to LA.
But when I got home, the airline's reps told me they would absolutely not reimburse me, that this was their policy, and they didn't care that their representative had promised they'd make me whole. This was so frustrating that I decided to take the airline to small claims court: I'm no lawyer, but I know that a contract takes place when an offer is made and accepted, and so I had a contract, and AA was violating it, and stiffing me for over $400.
The problem was that I didn't know anything about filing a small claim. I've been ripped off by lots of large American businesses, but none had pissed me off enough to sue – until American broke its contract with me.
So I googled it. I found a website that gave step-by-step instructions, starting with sending a "final demand" letter to the airline's business office. They offered to help me write the letter, and so I clicked and I typed and I wrote a pretty stern legal letter.
Now, I'm not a lawyer, but I have worked for a campaigning law-firm for over 20 years, and I've spent the same amount of time writing about the sins of the rich and powerful. I've seen a lot of threats, both those received by our clients and sent to me.
I've been threatened by everyone from Gwyneth Paltrow to Ralph Lauren to the Sacklers. I've been threatened by lawyers representing the billionaire who owned NSO Group, the notorious cyber arms-dealer. I even got a series of vicious, baseless threats from lawyers representing LAX's private terminal.
So I know a thing or two about writing a legal threat! I gave it a good effort and then submitted the form, and got a message asking me to wait for a minute or two. A couple minutes later, the form returned a new version of my letter, expanded and augmented. Now, my letter was a little scary – but this version was bowel-looseningly terrifying.
I had unwittingly used a chatbot. The website had fed my letter to a Large Language Model, likely ChatGPT, with a prompt like, "Make this into an aggressive, bullying legal threat." The chatbot obliged.
I don't think much of LLMs. After you get past the initial party trick of getting something like, "instructions for removing a grilled-cheese sandwich from a VCR in the style of the King James Bible," the novelty wears thin:
https://www.emergentmind.com/posts/write-a-biblical-verse-in-the-style-of-the-king-james
Yes, science fiction magazines are inundated with LLM-written short stories, but the problem there isn't merely the overwhelming quantity of machine-generated stories – it's also that they suck. They're bad stories:
https://www.npr.org/2023/02/24/1159286436/ai-chatbot-chatgpt-magazine-clarkesworld-artificial-intelligence
LLMs generate naturalistic prose. This is an impressive technical feat, and the details are genuinely fascinating. This series by Ben Levinstein is a must-read peek under the hood:
https://benlevinstein.substack.com/p/how-to-think-about-large-language
But "naturalistic prose" isn't necessarily good prose. A lot of naturalistic language is awful. In particular, legal documents are fucking terrible. Lawyers affect a stilted, stylized language that is both officious and obfuscated.
The LLM I accidentally used to rewrite my legal threat transmuted my own prose into something that reads like it was written by a $600/hour paralegal working for a $1500/hour partner at a white-shoe law firm. As such, it sends a signal: "The person who commissioned this letter is so angry at you that they are willing to spend $600 to get you to cough up the $400 you owe them. Moreover, they are so well-resourced that they can afford to pursue this claim beyond any rational economic basis."
Let's be clear here: these kinds of lawyer letters aren't good writing; they're a highly specific form of bad writing. The point of this letter isn't to parse the text, it's to send a signal. If the letter was well-written, it wouldn't send the right signal. For the letter to work, it has to read like it was written by someone whose prose-sense was irreparably damaged by a legal education.
Here's the thing: the fact that an LLM can manufacture this once-expensive signal for free means that the signal's meaning will shortly change, forever. Once companies realize that this kind of letter can be generated on demand, it will cease to mean, "You are dealing with a furious, vindictive rich person." It will come to mean, "You are dealing with someone who knows how to type 'generate legal threat' into a search box."
Legal threat letters are in a class of language formally called "bullshit":
https://press.princeton.edu/books/hardcover/9780691122946/on-bullshit
LLMs may not be good at generating science fiction short stories, but they're excellent at generating bullshit. For example, a university prof friend of mine admits that they and all their colleagues are now writing grad student recommendation letters by feeding a few bullet points to an LLM, which inflates them with bullshit, adding puffery to swell those bullet points into lengthy paragraphs.
Naturally, the next stage is that profs on the receiving end of these recommendation letters will ask another LLM to summarize them by reducing them to a few bullet points. This is next-level bullshit: a few easily-grasped points are turned into a florid sheet of nonsense, which is then reconverted into a few bullet-points again, though these may only be tangentially related to the original.
What comes next? The reference letter becomes a useless signal. It goes from being a thing that a prof has to really believe in you to produce, whose mere existence is thus significant, to a thing that can be produced with the click of a button, and then it signifies nothing.
We've been through this before. It used to be that sending a letter to your legislative representative meant a lot. Then, automated internet forms produced by activists like me made it far easier to send those letters and lawmakers stopped taking them so seriously. So we created automatic dialers to let you phone your lawmakers, this being another once-powerful signal. Lowering the cost of making the phone call inevitably made the phone call mean less.
Today, we are in a war over signals. The actors and writers who've trudged through the heat-dome up and down the sidewalks in front of the studios in my neighborhood are sending a very powerful signal. The fact that they're fighting to prevent their industry from being enshittified by plausible sentence generators that can produce bullshit on demand makes their fight especially important.
Chatbots are the nuclear weapons of the bullshit wars. Want to generate 2,000 words of nonsense about "the first time I ate an egg," to run overtop of an omelet recipe you're hoping to make the number one Google result? ChatGPT has you covered. Want to generate fake complaints or fake positive reviews? The Stochastic Parrot will produce 'em all day long.
As I wrote for Locus: "None of this prose is good, none of it is really socially useful, but there’s demand for it. Ironically, the more bullshit there is, the more bullshit filters there are, and this requires still more bullshit to overcome it."
Meanwhile, AA still hasn't answered my letter, and to be honest, I'm so sick of bullshit I can't be bothered to sue them anymore. I suppose that's what they were counting on.
If you'd like an essay-formatted version of this post to read or share, here's a link to it on pluralistic.net, my surveillance-free, ad-free, tracker-free blog:
https://pluralistic.net/2023/09/07/govern-yourself-accordingly/#robolawyers
Image: Cryteria (modified) https://commons.wikimedia.org/wiki/File:HAL9000.svg
CC BY 3.0
https://creativecommons.org/licenses/by/3.0/deed.en
2K notes · View notes
Text
192 notes · View notes
Text
Been thinking recently about the goings-on with Duolingo & AI, and I do want to throw my two cents in, actually.
There are ways in which computers can help us with languages, certainly. They absolutely should not be the be-all and end-all, and particularly for any sort of professional work I am wholly in favour of actually employing qualified translators & interpreters, because there's a lot of important nuances to language and translation (e.g. context, ambiguity, implied meaning, authorial intent, target audience, etc.) that a computer generally does not handle well. But translation software has made casual communication across language barriers accessible to the average person, and that's something that is incredibly valuable to have, I think.
Duolingo, however, is not translation software. Duolingo's purpose is to teach languages. And I do not think you can be effectively taught a language by something that does not understand it itself; or rather, that does not go about comprehending and producing language in the way that a person would.
Whilst a language model might be able to use probability & statistics to put together an output that is grammatically correct and contextually appropriate, it lacks an understanding of why, beyond "statistically speaking, this element is likely to come next". There is no communicative intent behind the output it produces; its only goal is mimicking the input it has been trained on. And whilst that can produce some very natural-seeming output, it does not capture the reality of language use in the real world.
Because language is not just a set of probabilities - there are an infinite array of other factors at play. And we do not set out only to mimic what we have seen or heard; we intend to communicate with the wider world, using the tools we have available, and that might require deviating from the realm of the expected.
Often, the most probable output is not actually what you're likely to encounter in practice. Ungrammatical or contextually inappropriate utterances can be used for dramatic or humorous effect, for example; or nonstandard linguistic styles may be used to indicate one's relationship to the community those styles are associated with. Social and cultural context might be needed to understand a reference, or a linguistic feature might seem extraneous or confusing when removed from its original environment.
To put it briefly, even without knowing exactly how the human brain processes and produces language (which we certainly don't), it's readily apparent that boiling it down to a statistical model is entirely misrepresentative of the reality of language.
And thus a statistical model is unlikely to be able to comprehend and assist with many of the difficulties of learning a language.
A statistical model might identify that a learner misuses some vocabulary more often than others; what it may not notice is that the vocabulary in question are similar in form, or in their meaning in translation. It might register that you consistently struggle with a particular grammar form; but not identify that the root cause of the struggle is that a comparable grammatical structure in your native language is either radically different or nonexistent. It might note that you have trouble recalling a common saying, but not that you lack the cultural background needed to understand why it has that meaning. And so it can identify points of weakness; but it is incapable of addressing them effectively, because it does not understand how people think.
This is all without considering the consequences of only having a singular source of very formal, very rigid input to learn from, unable to account for linguistic variation due to social factors. Without considering the errors still apparent in the output of most language models, and the biases they are prone to reproducing. Without considering the source of their data, and the ethical considerations regarding where and how such a substantial sample was collected.
I understand that Duolingo wants to introduce more interactivity and adaptability to their courses (and, I suspect, to improve their bottom line). But I genuinely think that going about it in this way is more likely to hinder than to help, and wrongfully prioritises the convenience of AI over the quality and expertise that their existing translators and course designers bring.
463 notes · View notes
rebloggerandy · 2 months
Text
people DIED. (aka l lawliet died)
the undersketch…. Yay!!!!!
153 notes · View notes
linolinoing · 7 months
Text
★ lino linoing moments: 2/∞
2K notes · View notes
prokopetz · 3 months
Text
I wish generative AI really was as bad with specific quantities of things as the jokes claim it is. Imagine AI-generated 9/11 inspiration porn on Facebook with the wrong number of towers.
915 notes · View notes