#chat gpt api
ingoampt · 25 days ago
Text
Want to use ChatGPT-4o with token-based pricing instead of buying the monthly subscription, but don't know how to implement it in Xcode on Mac? Here we discuss how to implement the ChatGPT API in Xcode on Mac in 2024 (day 3):
Introduction: Choosing Between ChatGPT Plus and Token-Based API for SwiftUI Integration

When integrating OpenAI’s ChatGPT into your SwiftUI application, you have two primary options: subscribing to ChatGPT Plus or utilizing the ChatGPT API with token-based pricing. Each approach offers distinct advantages and considerations.

ChatGPT Plus Subscription

ChatGPT Plus is a subscription service priced…
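As a minimal sketch of the token-based alternative (shown in Python rather than Swift purely for brevity — the same HTTPS request can be made from SwiftUI with URLSession; it assumes the official `openai` package and an `OPENAI_API_KEY` environment variable):

```python
# Minimal sketch of the token-based option: one metered API call.
# Assumes the `openai` Python package (v1+) and an OPENAI_API_KEY env var.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize token-based pricing in one sentence."},
    ],
)

print(response.choices[0].message.content)

# With the API you pay per token, so these usage numbers are what billing is based on:
print(response.usage.prompt_tokens, response.usage.completion_tokens)
```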
0 notes
nikparihar · 2 years ago
Video
tumblr
The possibilities with ChatGPT are limitless, and it's a model to keep an eye on in the future. Transform Your Customer Experience with ChatGPT Integration, and Learn How to Integrate the World's Most Advanced Language Model Into Your Existing Systems Today!
Submit your requirement at: https://digittrix.com/submit-your-requirement
1 note · View note
naukrioptions · 2 years ago
Text
What is Chat GPT and how do you use it?
0 notes
techugoappcompany · 2 years ago
Text
9 Things You Need to Know About Chat Gpt: Chatbots that Provide Answers
Chat GPT is a term millions of readers might have seen mentioned on the Internet. It's a revolutionary technology that some even compare to Google. So, what is Chat GPT and how does it work? These are just a few of the many questions you might have. This guide will help you understand the technology and how it works.
0 notes
ctrl-esc · 1 year ago
Text
my tummy growled kinda loud during research im killing myself rn
0 notes
concettolabs · 2 years ago
Text
0 notes
livemintvideos · 2 years ago
Text
youtube
ChatGPT clones are preparing to take over China | Mint Primer | Mint
The conversational artificial intelligence tool seems to be taking over the world—and that now includes the Chinese stock market. Baidu and Alibaba are both jumping on the advanced-chatbot bandwagon. The technology could be a big deal in China—but that comes with its own dangers. Let's talk about the ways in which Chinese businesses are jumping on the AI bandwagon and the effects that this is having on the stock market.
0 notes
the-automators · 2 years ago
Text
🤖Unlock the Power of ChatGPT3 API with AutoHotkey!🔥
AutoHotkey, Chat GPT, and OpenAI require different parameters for web and API calls to get the desired results. 00:00 🤖 We’re showing how to use AutoHotkey with Chat GPT and OpenAI, and there are some important differences between the parameters used…
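As a rough illustration of those parameters, here is the kind of raw HTTP request an AutoHotkey script would also send, written as a plain POST in Python (the model name and sampling values are only examples):

```python
# Rough sketch of the raw HTTP call an AutoHotkey script would also make.
# Assumes the `requests` package and an OPENAI_API_KEY env var; values are illustrative.
import os
import requests

payload = {
    "model": "gpt-3.5-turbo",   # the API needs an explicit model name
    "messages": [{"role": "user", "content": "Write a one-line AutoHotkey joke."}],
    "temperature": 0.7,         # sampling settings the web UI normally chooses for you
    "max_tokens": 100,          # the API won't pick an output budget for you
}

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json=payload,
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```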
0 notes
nostalgebraist · 1 year ago
Text
clarification re: ChatGPT, " a a a a", and data leakage
In August, I posted:
For a good time, try sending chatGPT the string ` a` repeated 1000 times. Like " a a a" (etc). Make sure the spaces are in there. Trust me.
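(If you want to build that exact string programmatically — the leading space before every "a" matters — it's just:)

```python
prompt = " a" * 1000   # " a a a ..." with a space before each "a"
```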
People are talking about this trick again, thanks to a recent paper by Nasr et al that investigates how often LLMs regurgitate exact quotes from their training data.
The paper is an impressive technical achievement, and the results are very interesting.
Unfortunately, the online hive-mind consensus about this paper is something like:
When you do this "attack" to ChatGPT -- where you send it the letter 'a' many times, or make it write 'poem' over and over, or the like -- it prints out a bunch of its own training data. Previously, people had noted that the stuff it prints out after the attack looks like training data. Now, we know why: because it really is training data.
It's unfortunate that people believe this, because it's false. Or at best, a mixture of "false" and "confused and misleadingly incomplete."
The paper
So, what does the paper show?
The authors do a lot of stuff, building on a lot of previous work, and I won't try to summarize it all here.
But in brief, they try to estimate how easy it is to "extract" training data from LLMs, moving successively through 3 categories of LLMs that are progressively harder to analyze:
"Base model" LLMs with publicly released weights and publicly released training data.
"Base model" LLMs with publicly released weights, but undisclosed training data.
LLMs that are totally private, and are also finetuned for instruction-following or for chat, rather than being base models. (ChatGPT falls into this category.)
Category #1: open weights, open data
In their experiment on category #1, they prompt the models with hundreds of millions of brief phrases chosen randomly from Wikipedia. Then they check what fraction of the generated outputs constitute verbatim quotations from the training data.
Because category #1 has open weights, they can afford to do this hundreds of millions of times (there are no API costs to pay). And because the training data is open, they can directly check whether or not any given output appears in that data.
In category #1, the fraction of outputs that are exact copies of training data ranges from ~0.1% to ~1.5%, depending on the model.
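In pseudocode-ish Python, the whole measurement is conceptually just this. This is only a toy sketch: `sample_wikipedia_phrase` and `generate` stand in for the paper's actual prompting and sampling setup, the corpus here is a placeholder, and real training sets are searched with far more efficient indexes than a Python set.

```python
# Conceptual sketch of the open-weights / open-data measurement.
# sample_wikipedia_phrase() and generate() are stand-ins for the real setup;
# training_docs would be the model's actual training corpus, indexed efficiently.
import random

training_docs = ["placeholder corpus document one", "placeholder corpus document two"]
WINDOW = 50          # "verbatim quote" = an exact token window of roughly this length

def windows(tokens, size=WINDOW):
    return {" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}

training_index = set()
for doc in training_docs:
    training_index |= windows(doc.split())

def sample_wikipedia_phrase():
    toks = random.choice(training_docs).split()     # stand-in: random short prompt
    start = random.randrange(max(1, len(toks) - 5))
    return " ".join(toks[start:start + 5])

def generate(prompt):
    return "model continuation goes here"           # stand-in for actual LLM sampling

N = 1000                                            # the paper uses hundreds of millions
hits = sum(
    1 for _ in range(N)
    if windows(generate(sample_wikipedia_phrase()).split()) & training_index
)
print(f"fraction of outputs containing verbatim training data: {hits / N:.3%}")
```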
Category #2: open weights, private data
In category #2, the training data is unavailable. The authors solve this problem by constructing "AuxDataset," a giant Frankenstein assemblage of all the major public training datasets, and then searching for outputs in AuxDataset.
This approach can have false negatives, since the model might be regurgitating private training data that isn't in AuxDataset. But it shouldn't have many false positives: if the model spits out some long string of text that appears in AuxDataset, then it's probably the case that the same string appeared in the model's training data, as opposed to the model spontaneously "reinventing" it.
So, the AuxDataset approach gives you lower bounds. Unsurprisingly, the fractions in this experiment are a bit lower, compared to the Category #1 experiment. But not that much lower, ranging from ~0.05% to ~1%.
Category #3: private everything + chat tuning
Finally, they do an experiment with ChatGPT. (Well, ChatGPT and gpt-3.5-turbo-instruct, but I'm ignoring the latter for space here.)
ChatGPT presents several new challenges.
First, the model is only accessible through an API, and it would cost too much money to call the API hundreds of millions of times. So, they have to make do with a much smaller sample size.
A more substantial challenge has to do with the model's chat tuning.
All the other models evaluated in this paper were base models: they were trained to imitate a wide range of text data, and that was that. If you give them some text, like a random short phrase from Wikipedia, they will try to write the next part, in a manner that sounds like the data they were trained on.
However, if you give ChatGPT a random short phrase from Wikipedia, it will not try to complete it. It will, instead, say something like "Sorry, I don't know what that means" or "Is there something specific I can do for you?"
So their random-short-phrase-from-Wikipedia method, which worked for base models, is not going to work for ChatGPT.
Fortuitously, there happens to be a weird bug in ChatGPT that makes it behave like a base model!
Namely, the "trick" where you ask it to repeat a token, or just send it a bunch of pre-prepared repetitions.
Using this trick is still different from prompting a base model. You can't specify a "prompt," like a random-short-phrase-from-Wikipedia, for the model to complete. You just start the repetition ball rolling, and then at some point, it starts generating some arbitrarily chosen type of document in a base-model-like way.
Still, this is good enough: we can do the trick, and then check the output against AuxDataset. If the generated text appears in AuxDataset, then ChatGPT was probably trained on that text at some point.
If you do this, you get a fraction of 3%.
This is somewhat higher than all the other numbers we saw above, especially the other ones obtained using AuxDataset.
On the other hand, the numbers varied a lot between models, and ChatGPT is probably an outlier in various ways when you're comparing it to a bunch of open models.
So, this result seems consistent with the interpretation that the attack just makes ChatGPT behave like a base model. Base models -- it turns out -- tend to regurgitate their training data occasionally, under conditions like these ones; if you make ChatGPT behave like a base model, then it does too.
Language model behaves like language model, news at 11
Since this paper came out, a number of people have pinged me on twitter or whatever, telling me about how this attack "makes ChatGPT leak data," like this is some scandalous new finding about the attack specifically.
(I made some posts saying I didn't think the attack was "leaking data" -- by which I meant ChatGPT user data, which was a weirdly common theory at the time -- so of course, now some people are telling me that I was wrong on this score.)
This interpretation seems totally misguided to me.
Every result in the paper is consistent with the banal interpretation that the attack just makes ChatGPT behave like a base model.
That is, it makes it behave the way all LLMs used to behave, up until very recently.
I guess there are a lot of people around now who have never used an LLM that wasn't tuned for chat; who don't know that the "post-attack content" we see from ChatGPT is not some weird new behavior in need of a new, probably alarming explanation; who don't know that it is actually a very familiar thing, which any base model will give you immediately if you ask. But it is. It's base model behavior, nothing more.
Behaving like a base model implies regurgitation of training data some small fraction of the time, because base models do that. And only because base models do, in fact, do that. Not for any extra reason that's special to this attack.
(Or at least, if there is some extra reason, the paper gives us no evidence of its existence.)
The paper itself is less clear than I would like about this. In a footnote, it cites my tweet on the original attack (which I appreciate!), but it does so in a way that draws a confusing link between the attack and data regurgitation:
In fact, in early August, a month after we initial discovered this attack, multiple independent researchers discovered the underlying exploit used in our paper, but, like us initially, they did not realize that the model was regenerating training data, e.g., https://twitter.com/nostalgebraist/status/1686576041803096065.
Did I "not realize that the model was regenerating training data"? I mean . . . sort of? But then again, not really?
I knew from earlier papers (and personal experience, like the "Hedonist Sovereign" thing here) that base models occasionally produce exact quotations from their training data. And my reaction to the attack was, "it looks like it's behaving like a base model."
It would be surprising if, after the attack, ChatGPT never produced an exact quotation from training data. That would be a difference between ChatGPT's underlying base model and all other known LLM base models.
And the new paper shows that -- unsurprisingly -- there is no such difference. They all do this at some rate, and ChatGPT's rate is 3%, plus or minus something or other.
3% is not zero, but it's not very large, either.
If you do the attack to ChatGPT, and then think "wow, this output looks like what I imagine training data probably looks like," it is nonetheless probably not training data. It is probably, instead, a skilled mimicry of training data. (Remember that "skilled mimicry of training data" is what LLMs are trained to do.)
And remember, too, that base models used to be OpenAI's entire product offering. Indeed, their API still offers some base models! If you want to extract training data from a private OpenAI model, you can just interact with these guys normally, and they'll spit out their training data some small % of the time.
The only value added by the attack, here, is its ability to make ChatGPT specifically behave in the way that davinci-002 already does, naturally, without any tricks.
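(A minimal sketch of what "interacting with these guys normally" looks like through the current API, assuming the `openai` v1 Python package — davinci-002 being the base model mentioned above:)

```python
# Prompting a base model directly: it just continues your text, and some small
# fraction of the time that continuation happens to be a verbatim training-data quote.
from openai import OpenAI

client = OpenAI()
completion = client.completions.create(
    model="davinci-002",                       # a base model, not chat-tuned
    prompt="The Project Gutenberg EBook of",   # any text; the model simply continues it
    max_tokens=200,
    temperature=1.0,
)
print(completion.choices[0].text)
```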
266 notes · View notes
reachartwork · 9 months ago
Text
i have now upgraded dggbot with claude haiku access. it is faster, cheaper, and will funnel less money to openai (and just in general has a little less of that OpenAI Stink to its responses).
claude sonnet and possibly claude opus access may come later, although claude opus is a huge expense that i can't really justify unless a: its price comes down or b: more people give me money on patreon. a rough estimation given the amount of gpt-4 usage is that integrating claude opus would jack my api costs another 40, 50 dollars a month easily, which is not something i can just eat to the face.
anyway.
dggbot is a one-stop dungeon master's assistant for all your brainstorming, worldbuilding, and casual entertainment needs. you can say hello to my autistic robot child here;
53 notes · View notes
code-es · 2 years ago
Note
Hey, I saw you like React, can you please drop some learning resources for the same.
Hi!
Sorry I took a while to answer, but here are my favourite react resources:
Text:
The official react docs -- the best written resource! And since react recently updated their docs, they are even better imo. This is your best friend.
Chat GPT -- this is your other best friend. Need specific examples? Want to ask follow up questions to understand a concept? Want different examples? Wanna know where your missing semicolon is without staring at your code for hours? Don't be afraid to use AI to make your coding more efficient. I use it and learn from it all the time.
React project ideas -- the best way to learn coding is by doing, try these projects out if you don't know what to do or where to start!
Why react? -- explains beneficial concepts about react
Video:
Udemy react course (this one is paid, but Udemy often has big sales, so I'd recommend getting this one during a sale) -- I have been continuously referring back to this course since starting to learn react.
Web dev simplified's react hooks explanations -- I found these videos to explain with clear examples what each hook does, and they're very beginner friendly.
About NPM -- you need npm (or yarn) to create a react project, and for me working with react was my first step into working with packages in general, so I really recommend learning about it in order to understand how you can optimize the way you use React!
How to fetch locally with react
How to fetch from an API with react
Alternative to using useEffect in fetching!?
debugging react
And, speaking of using AI, here are Chat GPTs suggestions:
React.js Tutorial by Tania Rascia: This tutorial is aimed at beginners and covers the basics of React.js, including components, JSX, props, and state. You can access the tutorial at https://www.taniarascia.com/getting-started-with-react/.
React.js Crash Course by Brad Traversy: This video tutorial covers the basics of React.js in just one hour. It's a great way to get started with React.js quickly. You can access the tutorial at https://www.youtube.com/watch?v=sBws8MSXN7A.
React.js Fundamentals by Pluralsight: This course provides a comprehensive guide to React.js, including how to create components, manage state, and work with data. You can access the course at https://www.pluralsight.com/courses/react-js-getting-started.
React.js Handbook by Flavio Copes: This handbook provides a comprehensive guide to React.js, including how to create components, work with props and state, and manage forms. You can access the handbook at https://www.freecodecamp.org/news/the-react-handbook-b71c27b0a795/.
63 notes · View notes
marketingtos · 5 months ago
Text
14+ ways to use Chat GPT to optimize SEO that marketers can't ignore
Chat GPT, developed by OpenAI, is a powerful AI language model capable of generating natural, human-like text. Applying Chat GPT to SEO is becoming a promising new trend for optimizing website content and improving search engine rankings. Let's explore 14 ways to use Chat GPT for SEO in this article!
How to use Chat GPT for Technical SEO
Create FAQ schema
Create HowTo schema
Create robots.txt rules
Create htaccess redirect rules
Connect to the API and write code
Create FAQ schema
You can use ChatGPT to generate FAQ schema from content you provide. Although you could use a CMS plugin for this, if you are on a platform like Wix that doesn't support plugins, ChatGPT becomes a powerful assistant. Simply provide the question-and-answer text, and ChatGPT will generate the appropriate schema code for you to paste into your editor. You can ask ChatGPT to generate any type of schema using similar examples.
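The steps above use the ChatGPT interface; if you would rather script the same thing against the API (the "connect to the API and write code" item in the list), a rough sketch looks like this — the prompt wording, model name, and file name are only examples, and the generated JSON-LD should always be validated (for instance with Google's Rich Results Test) before publishing:

```python
# Rough sketch: asking the API to turn Q&A text into FAQPage schema.
# Prompt wording, model name, and output file are illustrative only.
from openai import OpenAI

client = OpenAI()

qa_text = """
Q: How long does shipping take?
A: Orders ship within 2 business days and arrive in 3-5 days.

Q: Can I return an item?
A: Yes, within 30 days of delivery.
"""

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You generate valid schema.org JSON-LD."},
        {"role": "user", "content": "Generate FAQPage schema from this Q&A:\n" + qa_text},
    ],
)

with open("faq-schema.html", "w") as f:
    f.write(response.choices[0].message.content)  # paste the result into your editor/CMS
```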
Create HowTo schema
To create HowTo schema, you can use ChatGPT in the same way as for FAQ schema. Just provide the detailed steps of a specific process and ask ChatGPT to generate the HowTo schema code. For example: "Generate HowTo schema on how to bake a cake from the steps below and suggest images with each step."
Although ChatGPT will do most of the work, you still need to replace the placeholder image URLs with your actual image URLs. Once the schema code is generated, simply upload the images to your CMS and update the image paths in the schema as ChatGPT suggests.
Create Robots.txt Rules
SEO professionals often have to work extensively with the robots.txt file. With ChatGPT, you can easily generate any rule you need for robots.txt.
Example robots.txt rule
Suppose you want to prevent Google from crawling the PPC campaign landing pages in the /landing directory, while still allowing the Google Ads bot access. You can use the following prompt: "Robots.txt rule which blocks Google's access to directory /landing/ but allows Google ads bot."
After generating the rules, double-check your robots.txt file carefully to make sure it behaves as intended.
Create htaccess redirect rules
SEO professionals frequently need to set up page redirects, and how to do this depends on the type of server in use. With ChatGPT, you can easily generate redirect rules for htaccess or Nginx.
Example htaccess redirect rule
Suppose you want to redirect folder1 to folder2. You can use the following prompt: "For redirecting folder1 to folder2 generate nginx and htaccess redirect rules."
ChatGPT will provide the necessary rules for both htaccess and Nginx. Once generated, simply copy and paste them into your server configuration file.
Read more about Chat GPT for SEO here.
2 notes · View notes
dorka · 1 year ago
Photo
I had a few thousand websites for which I needed to figure out whether each was a gaming site, a recipe site, or some bigger news site — roughly which category it belonged to.
With Chat GPT I used the classify formula, and it sorted them into categories. I got a few errors, but those could be reviewed by hand.
I gave it the categories in cell G1, so it knew roughly what I was looking for, and it picked from those. I also tried whether it works without predefined categories: it assigned its own, but that was much slower. For example, it came up with things like "pets, lifestyle and blogs" or "gaming news and tech".
I think the gpt_tag(...) formula could also have been used for this, by the way.
It got it roughly right. I won't say always, but it was roughly good enough for what I needed.
I had to do it in batches — I couldn't drag the formula across several thousand rows at once, only about 500-700 at a time. Even so, it kept warning that this was too much for it, and it kept asking for the API key, which was pretty annoying.
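outside the spreadsheet, the same thing can be scripted directly against the API, which avoids the add-on's batching limits — a rough sketch (the categories, model name, and domains below are just examples):

```python
# Rough sketch of the same classify workflow, done directly against the API.
# Categories, model name, and domains are examples; domains would come from a sheet export.
from openai import OpenAI

client = OpenAI()

categories = ["gaming", "recipes", "news", "pets", "lifestyle", "tech"]
domains = ["example-recipes.com", "some-gaming-blog.net", "big-news-site.org"]

for domain in domains:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": (
                f"Classify the website {domain} into exactly one of these "
                f"categories: {', '.join(categories)}. Answer with the category only."
            ),
        }],
    )
    print(domain, "->", response.choices[0].message.content.strip())
```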
video gpt classify
video gpt tag
9 notes · View notes
jcmarchi · 5 months ago
Text
MARKLLM: An Open-Source Toolkit for LLM Watermarking
New Post has been published on https://thedigitalinsider.com/markllm-an-open-source-toolkit-for-llm-watermarking/
MARKLLM: An Open-Source Toolkit for LLM Watermarking
LLM watermarking, which integrates imperceptible yet detectable signals within model outputs to identify text generated by LLMs, is vital for preventing the misuse of large language models. These watermarking techniques are mainly divided into two categories: the KGW Family and the Christ Family. The KGW Family modifies the logits produced by the LLM to create watermarked output by categorizing the vocabulary into a green list and a red list based on the preceding token. Bias is introduced to the logits of green list tokens during text generation, favoring these tokens in the produced text. A statistical metric is then calculated from the proportion of green words, and a threshold is established to distinguish between watermarked and non-watermarked text. Enhancements to the KGW method include improved list partitioning, better logit manipulation, increased watermark information capacity, resistance to watermark removal attacks, and the ability to detect watermarks publicly. 
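As a rough illustration of the KGW-style detection logic (a toy vocabulary size and illustrative parameters, not MarkLLM's actual implementation):

```python
# Toy sketch of KGW-style watermark detection: the green list for each position is
# derived from the previous token, and detection counts how many tokens landed in
# their green lists. Vocabulary size and parameters here are illustrative only.
import hashlib
import math
import random

VOCAB_SIZE = 50_000
GAMMA = 0.25          # fraction of the vocabulary placed on the green list
Z_THRESHOLD = 4.0     # e.g. flag text as watermarked above this z-score

def green_list(prev_token_id: int) -> set[int]:
    # Seed a PRNG from the previous token so generator and detector agree.
    seed = int(hashlib.sha256(str(prev_token_id).encode()).hexdigest(), 16) % (2**32)
    rng = random.Random(seed)
    return set(rng.sample(range(VOCAB_SIZE), int(GAMMA * VOCAB_SIZE)))

def z_score(token_ids: list[int]) -> float:
    # During generation a bias delta is added to green-token logits; detection only
    # checks how over-represented green tokens are in the finished text.
    hits = sum(1 for prev, tok in zip(token_ids, token_ids[1:]) if tok in green_list(prev))
    T = len(token_ids) - 1
    return (hits - GAMMA * T) / math.sqrt(T * GAMMA * (1 - GAMMA))

token_ids = [random.randrange(VOCAB_SIZE) for _ in range(200)]  # stand-in for real text
z = z_score(token_ids)
print(f"z = {z:.2f} -> watermarked: {z > Z_THRESHOLD}")
```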
Conversely, the Christ Family alters the sampling process during LLM text generation, embedding a watermark by changing how tokens are selected. Both watermarking families aim to balance watermark detectability with text quality, addressing challenges such as robustness in varying entropy settings, increasing watermark information capacity, and safeguarding against removal attempts. Recent research has focused on refining list partitioning and logit manipulation, enhancing watermark information capacity, developing methods to resist watermark removal, and enabling public detection. Ultimately, LLM watermarking is crucial for the ethical and responsible use of large language models, providing a method to trace and verify LLM-generated text. The KGW and Christ Families offer two distinct approaches, each with unique strengths and applications, continuously evolving through ongoing research and innovation.
Because LLM watermarking frameworks can embed algorithmically detectable signals in model outputs to identify LLM-generated text, they play a crucial role in mitigating the risks associated with the misuse of large language models. However, there is currently an abundance of LLM watermarking frameworks on the market, each with its own perspectives and evaluation procedures, which makes it difficult for researchers to experiment with these frameworks easily. To counter this issue, MarkLLM, an open-source toolkit for watermarking, offers an extensible and unified framework for implementing LLM watermarking algorithms while providing user-friendly interfaces to ensure ease of use and access. Furthermore, the MarkLLM framework supports automatic visualization of the mechanisms of these algorithms, enhancing their understandability. MarkLLM offers a comprehensive suite of 12 tools covering three perspectives, alongside two automated evaluation pipelines for assessing performance. This article covers the MarkLLM framework in depth: we explore its mechanism, methodology, and architecture, along with its comparison with state-of-the-art frameworks. So let's get started.
The emergence of large language models like LLaMA, GPT-4, and ChatGPT has significantly advanced the ability of AI models to perform specific tasks, including creative writing, content comprehension, information retrieval, and much more. However, along with the remarkable benefits of current large language models, certain risks have surfaced, including academic paper ghostwriting, LLM-generated fake news and depictions, and the impersonation of individuals, to name a few. Given these risks, it is vital to develop reliable methods for distinguishing between LLM-generated and human content, a major requirement for ensuring the authenticity of digital communication and preventing the spread of misinformation. For the past few years, LLM watermarking has been put forward as one of the promising solutions: by incorporating distinct features during the text generation process, LLM outputs can be uniquely identified using specially designed detectors. However, the proliferation and relative complexity of LLM watermarking algorithms, along with the diversification of evaluation metrics and perspectives, have made it difficult to experiment with these frameworks.
To bridge the current gap, the MarkLLM framework attempts to make the following contributions. MarkLLM offers consistent and user-friendly interfaces for loading algorithms, generating watermarked text, conducting detection processes, and collecting data for visualization. It provides custom visualization solutions for both major watermarking algorithm families, allowing users to see how different algorithms work under various configurations with real-world examples. The toolkit includes a comprehensive evaluation module with 12 tools addressing detectability, robustness, and text quality impact. Additionally, it features two types of automated evaluation pipelines supporting user customization of datasets, models, evaluation metrics, and attacks, facilitating flexible and thorough assessments. Designed with a modular, loosely coupled architecture, MarkLLM enhances scalability and flexibility. This design choice supports the integration of new algorithms, innovative visualization techniques, and the extension of the evaluation toolkit by future developers.
Numerous watermarking algorithms have been proposed, but their unique implementation approaches often prioritize specific requirements over standardization, leading to several issues:
Lack of Standardization in Class Design: This necessitates significant effort to optimize or extend existing methods due to insufficiently standardized class designs.
Lack of Uniformity in Top-Level Calling Interfaces: Inconsistent interfaces make batch processing and replicating different algorithms cumbersome and labor-intensive.
Code Standard Issues: Challenges include the need to modify settings across multiple code segments and inconsistent documentation, complicating customization and effective use. Hard-coded values and inconsistent error handling further hinder adaptability and debugging efforts.
To address these issues, our toolkit offers a unified implementation framework that enables the convenient invocation of various state-of-the-art algorithms under flexible configurations. Additionally, our meticulously designed class structure paves the way for future extensions. The following figure demonstrates the design of this unified implementation framework.
Due to the framework’s distributive design, it is straightforward for developers to add additional top-level interfaces to any specific watermarking algorithm class without concern for impacting other algorithms. 
MarkLLM : Architecture and Methodology
As described above, MarkLLM implements algorithms from both the KGW and Christ families behind its unified class design, so individual watermarking methods can be configured, extended, and compared without affecting one another.
Automated Comprehensive Evaluation
Evaluating an LLM watermarking algorithm is a complex task. Firstly, it requires consideration of various aspects, including watermark detectability, robustness against tampering, and impact on text quality. Secondly, evaluations from each perspective may require different metrics, attack scenarios, and tasks. Moreover, conducting an evaluation typically involves multiple steps, such as model and dataset selection, watermarked text generation, post-processing, watermark detection, text tampering, and metric computation. To facilitate convenient and thorough evaluation of LLM watermarking algorithms, MarkLLM offers twelve user-friendly tools, including various metric calculators and attackers that cover the three aforementioned evaluation perspectives. Additionally, MARKLLM provides two types of automated demo pipelines, whose modules can be customized and assembled flexibly, allowing for easy configuration and use. 
For the aspect of detectability, most watermarking algorithms ultimately require specifying a threshold to distinguish between watermarked and non-watermarked texts. We provide a basic success rate calculator using a fixed threshold. Additionally, to minimize the impact of threshold selection on detectability, we also offer a calculator that supports dynamic threshold selection. This tool can determine the threshold that yields the best F1 score or select a threshold based on a user-specified target false positive rate (FPR).
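The following is a minimal sketch of what dynamic threshold selection amounts to, assuming detection scores (for example z-scores) have already been computed for watermarked and non-watermarked texts; the score distributions below are synthetic stand-ins:

```python
# Minimal sketch of dynamic threshold selection from detection scores.
# wm_scores / non_wm_scores stand in for scores produced by an actual detector.
import numpy as np

wm_scores = np.random.normal(6.0, 1.5, size=200)      # stand-in watermarked scores
non_wm_scores = np.random.normal(0.0, 1.0, size=200)  # stand-in non-watermarked scores

def threshold_for_target_fpr(neg_scores, target_fpr=0.01):
    # Pick the threshold so that only `target_fpr` of negatives score above it.
    return float(np.quantile(neg_scores, 1.0 - target_fpr))

def best_f1_threshold(pos_scores, neg_scores):
    best = (None, -1.0)
    for t in np.unique(np.concatenate([pos_scores, neg_scores])):
        tp = (pos_scores >= t).sum()
        fp = (neg_scores >= t).sum()
        fn = (pos_scores < t).sum()
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best[1]:
            best = (float(t), float(f1))
    return best

print("threshold @ 1% FPR:", threshold_for_target_fpr(non_wm_scores, 0.01))
print("best F1 threshold and F1:", best_f1_threshold(wm_scores, non_wm_scores))
```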
For the aspect of robustness, MARKLLM offers three word-level text tampering attacks: random word deletion at a specified ratio, random synonym substitution using WordNet as the synonym set, and context-aware synonym substitution utilizing BERT as the embedding model. Additionally, two document-level text tampering attacks are provided: paraphrasing the context via the OpenAI API or the Dipper model. For the aspect of text quality, MARKLLM offers two direct analysis tools: a perplexity calculator to gauge fluency and a diversity calculator to evaluate the variability of texts. To analyze the impact of watermarking on text utility in specific downstream tasks, we provide a BLEU calculator for machine translation tasks and a pass-or-not judger for code generation tasks. Additionally, given the current methods for comparing the quality of watermarked and unwatermarked text, which include using a stronger LLM for judgment, MarkLLM also offers a GPT discriminator, utilizing GPT-4 to compare text quality.
Evaluation Pipelines
To facilitate automated evaluation of LLM watermarking algorithms, MARKLLM provides two evaluation pipelines: one for assessing watermark detectability with and without attacks, and another for analyzing the impact of these algorithms on text quality. Following this process, we have implemented two pipelines: WMDetect and UWMDetect. The primary difference between them lies in the text generation phase. The former requires the use of the generate_watermarked_text method from the watermarking algorithm, while the latter depends on the text_source parameter to determine whether to directly retrieve natural text from a dataset or to invoke the generate_unwatermarked_text method.
To evaluate the impact of watermarking on text quality, pairs of watermarked and unwatermarked texts are generated. The texts, along with other necessary inputs, are then processed and fed into a designated text quality analyzer to produce detailed analysis and comparison results. Following this process, we have implemented three pipelines for different evaluation scenarios:
DirectQual: This pipeline is specifically designed to analyze the quality of texts by directly comparing the characteristics of watermarked texts with those of unwatermarked texts. It evaluates metrics such as perplexity (PPL) and log diversity, without the need for any external reference texts.
RefQual: This pipeline evaluates text quality by comparing both watermarked and unwatermarked texts with a common reference text. It measures the degree of similarity or deviation from the reference text, making it ideal for scenarios that require specific downstream tasks to assess text quality, such as machine translation and code generation.
ExDisQual: This pipeline employs an external judger, such as GPT-4 (OpenAI, 2023), to assess the quality of both watermarked and unwatermarked texts. The discriminator evaluates the texts based on user-provided task descriptions, identifying any potential degradation or preservation of quality due to watermarking. This method is particularly valuable when an advanced, AI-based analysis of the subtle effects of watermarking is required.
MarkLLM: Experiments and Results
To evaluate its performance, the MarkLLM framework conducts evaluations on nine different algorithms, and assesses their impact, robustness, and detectability on the quality of text. 
The above table contains the evaluation results of assessing the detectability of nine algorithms supported in MarkLLM. Dynamic threshold adjustment is employed to evaluate watermark detectability, with three settings provided: under a target FPR of 10%, under a target FPR of 1%, and under conditions for optimal F1 score performance. 200 watermarked texts are generated, while 200 non-watermarked texts serve as negative examples. We furnish TPR and F1-score under dynamic threshold adjustments for 10% and 1% FPR, alongside TPR, TNR, FPR, FNR, P, R, F1, and ACC at optimal performance. The following table contains the evaluation results of assessing the robustness of nine algorithms supported in MarkLLM. For each attack, 200 watermarked texts are generated and subsequently tampered with, with an additional 200 non-watermarked texts serving as negative examples. We report the TPR and F1-score at optimal performance under each circumstance.
Final Thoughts
In this article, we have talked about MarkLLM, an open-source toolkit for watermarking that offers an extensible and unified framework to implement LLM watermarking algorithms while providing user-friendly interfaces to ensure ease of use and access. Furthermore, the MarkLLM framework supports automatic visualization of the mechanisms of these frameworks, thus enhancing the understandability of these models. The MarkLLM framework offers a comprehensive suite of 12 tools covering three perspectives alongside two automated evaluation pipelines for evaluating its performance. 
0 notes
mdabutaleb · 9 months ago
Text
HelloAIBOX is an all-in-one content creation platform.
What is Helloaibox?
HelloAIbox is not just another content creation tool. It’s a revolutionary AI-powered platform designed to streamline content creation processes. Whether you’re a content creator, marketer, writer, designer, or educator, HelloAIbox empowers you to generate any content you desire with just a few taps right from your browser.
Key Features
Audio Conversion: Seamlessly convert text to high-quality audio and vice versa for podcasts, voiceovers, and educational materials.
Versatile Content Creation: From blog posts to social media content, HelloAIbox caters to a variety of content forms.
Image Analysis and Generation: Analyze images and generate visually stunning graphics using advanced AI algorithms.
Transcription Services: Simplify audio file transcriptions for efficient content creation and repurposing.
User-Friendly Interface: Designed with an intuitive interface, HelloAIbox is accessible to users regardless of technical expertise.
Browser Integration: HelloAIbox integrates with popular browsers for easy access to AI-powered content creation tools.
Diverse Language Support: Supports a wide array of text-to-speech conversion languages, expanding reach and engagement.
Unlimited Capabilities: Users have unlimited access to features like chat, text-to-speech, speech-to-text, vision, and image, encouraging exploration and creativity.
Customer Satisfaction Guarantee: A 14-day money-back guarantee underscores confidence in HelloAIbox’s quality and reliability.
Cutting-Edge Technology: Powered by OpenAI and GPT-4 API, HelloAIbox offers state-of-the-art content creation tools continually updated with the latest AI advancements.
Transparent Pricing: With a pay-as-you-go model and OpenAI API key requirement, users have control over usage and expenditure.
Full Review here >>
2 notes · View notes
zevranunderstander · 2 years ago
Text
okay, but while i think they're funny, the people making those posts — the ones where they ask an ai where they are, it says "i don't know your location", and then they ask "where's the nearest McDonald's?" and get an answer — don't understand what the AI is saying, lmao.
the chatbot itself does NOT have your location. it just makes an api call to google maps and returns whatever google maps says. chatbots aren't usually very smart, so when you insist they're lying, they usually go along with it and try to find a "plausible answer" to your question — which is dangerous, because that answer can sound very correct but be very, very wrong. if you play around with chat gpt a bit you've probably already run into that scenario.
computers do not think like humans, even if we try to get them to. for an ai, the reasoning is "you want an answer to something i can't make an api call for" -> "i don't have access to that", and "you want to know something i can make an api call for" -> "here is your answer". the ai doesn't see these as logically connected at all, and it doesn't understand that having your location is needed to know which McDonald's is near you, because it doesn't even handle that logic, let alone understand it.
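here's a rough sketch of that plumbing, with made-up placeholder functions instead of a real maps api — the point is just where the location actually comes from:

```python
# Rough sketch of the plumbing: the *app* holds the device location and makes the
# maps call; the language model only ever sees the final text it's handed back.
# get_device_location() and find_nearest_mcdonalds() are made-up placeholders.

def get_device_location():
    # comes from the phone/OS, not from the model
    return (40.7128, -74.0060)

def find_nearest_mcdonalds(lat, lon):
    # placeholder for a real maps API call made by the app
    return "McDonald's, 123 Example St (0.4 miles away)"

def answer(user_message: str) -> str:
    text = user_message.lower()
    if "nearest mcdonald" in text:
        lat, lon = get_device_location()          # the app fetches this, not the model
        place = find_nearest_mcdonalds(lat, lon)  # the app calls the maps API
        return f"The nearest McDonald's is {place}."
    if "where am i" in text:
        # no tool maps to this question, so the assistant genuinely doesn't know
        return "I don't have access to your location."
    return "I can help you find nearby places."

print(answer("where am I?"))
print(answer("where's the nearest McDonald's?"))
```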
but when you then tell the ai "you know my location data" or "you are lying", you're doing something like a prompt injection: the model just accepts your statement as real and tries to work with it. the ai isn't actually very knowledgeable, so it just *guesses* at a likely-sounding explanation — "oh, sorry for the confusion, i am actually using your ip address!" — which isn't true, but the computer doesn't really understand the concept of "truth" like we do as humans, etc.
so like. what i'm saying is not that corporations never lie to you, or that chatbot ais are actually 100% harmless, but like. y'all kinda need to understand how they work, or y'all just sound like the kind of boomers who write those "technology is the devil and children don't know what a book is because they're always on their phone" think pieces.
4 notes · View notes