She stayed silent for a couple of minutes, thinking about what she said. It was clear that she was far too trusting of other people, but.. Something didn't make sense. 624 was clearly older than her, and as such, shouldn't she be more distrustful than Claire was? Her Mom always said– Ah. There's the problem. Why did she even bother retaining anything that woman said? Though, just because she was a bitch doesn't mean she wasn't right about some things. Most people couldn't be trusted– she learned this first-hand, after all. Her mother was more distrustful than she was because she was older and had experienced more of the world; that made sense– that was just how the world worked. But 624 was an outlier to that; despite being old enough to have experienced the world, she was still able to trust people. Why? Was she just an outlier, or was that how things actually worked?
She shook her head slightly to get her thoughts in order. She couldn't afford to spiral about this right now– 624 was probably waiting for a response by now, and she had more questions to ask. She managed to contain her thoughts and feelings for the most part, but 624 could tell something had changed. Her tone was much less energetic, and while her questions were still extremely paranoid, it seemed like she was asking them out of familiarity rather than genuine curiosity or to accuse Sebastian of doing anything.
"What do you mean by 'behaving'? What happens if you don't behave? Your deal sounds good, sure, but a deal like that sounds too good to be true. You get protection, a place to stay, and I'd assume food in exchange for a couple of salvaged items and good behavior? What's the catch?"
Unfortunately for her, asking more questions did not distract her thoughts for nearly as long as she wanted it to. She really needed to find a better distraction, but maybe if she asked enough questions and learned enough information, her head wouldn't have enough room anymore to do this sort of unnecessary thing. Until then, she'd just have to deal with it. Sure, she didn't really know how, but she'd been living– ..She had been around long enough to at least have some ideas.
...She was never going to get used to that, was she? At least her body was still somewhat humanoid; most of the anomalies down here didn't have that. Granted, most of the other anomalies were still alive, but still. At least she had some sort of choice about whether she wanted to be physical or not each morning.
She pulled herself out of her thoughts as she heard 624's machine start talking again. She tensed up slightly as she heard the first sentence, her ear fins flicking downwards as she looked away from her.
"Oh– I'm sorry.. I didn't mean to... Um, anyways, why don't you know your exact age? And it's not that bad down here, I mean it's not great, but it's not a horrible place to end up."
Was that all she was gonna ask? She felt slightly embarrassed that 624 had to ask for something like that, seeing as she should've introduced herself when they first met.. But if she had done that then she might've been able to ask a more risky question, such as the second question she asked. She wasn't entirely sure what she meant by the question, but if she was asking what Claire thought she was, then she doubted she'd enjoy hearing about it. Maybe if she just acknowledged the question but didn't answer it, she wouldn't notice?
She automatically started to give out her full name, before her paranoia kicked back in and she realized that giving a stranger her full name would be a horribly risky idea. Unfortunately she had already started saying her middle name.. Maybe she could just leave it like that? Most people didn't introduce themselves with all three of their names, so it shouldn't be too noticeable if she stopped after that.
"Claire... Rose. My name is Claire Rose, you can call me Claire. And for the record, I was not running away from the expendable, I was just.. making a tactical retreat."
There was some rummaging that could be heard in one of the many rooms in the blacksite. An expendable sorted through the drawers as she had done numerous times before; however, this time her motives were a bit different.
It had been slow in Sebastian's shop recently, and since she had taken up space there, she had decided to scrounge up some research for his sake.
Unfortunately, it seemed that almost every single room, drawer, locker, shelf, and desk had been looted. She let out an annoyed sigh.
She decided to take a break since she was getting tired after doing this for who knows how long. She leaned against a cold metal wall and slid down it to sit down.
- @exrp624-calisto
After several minutes of silence outside of the room, 624 could hear what sounded like shouting. It was accompanied by two pairs of footsteps quickly approaching the room 624 was in, along with what sounded like a child's laughter.
The noises quickly got closer, until a little girl ran into the room, crouching behind one of the desks while holding a small object close to her. She didn't seem to have noticed 624, as her attention was focused on the door.
The footsteps outside continued for a couple seconds before they slowly moved away, causing the girl to smile, tuck the object in her pocket, and look around the room. She quickly noticed 624, and her smile immediately dropped, her expression now filled with distrust, annoyance, and a small amount of fear.
"Who are you? What do you want? Why are you here?"
Transcript Episode 40: Making machines learn language - Interview with Janelle Shane
This is a transcript for Lingthusiasm Episode 40: Making machines learn language - Interview with Janelle Shane. It’s been lightly edited for readability. Listen to the episode here or wherever you get your podcasts. Links to studies mentioned and further reading can be found on the Episode 40 show notes page.
[Music]
Lauren: Welcome to Lingthusiasm, a podcast that’s enthusiastic about linguistics! I’m Lauren Gawne.
Gretchen: I’m Gretchen McCulloch. Today, we’re getting enthusiastic about artificial intelligence – teaching computers language – with special guest Dr Janelle Shane, who runs the blog A.I.weirdness.com and is the author of You Look Like a Thing and I Love You, which is a fun new book about A.I. But first, we have some announcements.
Lauren: It’s a new year and we have new, big, exciting plans for the Lingthusiasm Patreon page. We are introducing a Discord, which is an online chat space, for patrons to share their lingthusiasm with their fellow lingthusiasts.
Gretchen: We’ve heard from a lot of you that you got into linguistics because of Lingthusiasm or it reawakened your memories of how much you like linguistics because you did some courses on it way back when and now you wish you could talk about linguistics more. We’re giving you a space where you can talk about linguistics, share your interesting linguistics links that you come across, and talk about them in a space with other lingthusiasm fans. We’re really excited to see what this community becomes. It’s a bit of an experiment, but we think it’ll be really fun to do. You can join the Patreon at the tier where you get bonus episodes as well, and you also have a space to talk about those bonus episodes and the regular Lingthusiasm episodes and any other linguistics things you wanna talk about.
Lauren: We want to see more Lingthusiasm not just online but also on all kinds of things, which is why we are also sending stickers over the next few months to patrons at the Ling-phabet tier. Patrons who are at that tier for three months or more will get stickers that say, “Lingthusiast” on them.
Gretchen: You can stick that to your laptop, your water bottle, your notebook, anything else in your life. Because the original trial run of stickers that we did with the special offer last year was really popular, we thought we'd provide a way for you to do that year-round. You can join that tier on Patreon as well.
Lauren: You can get other items at our lingthusiasm.com/merch page, but the stickers are an exclusive for our patrons.
Gretchen: Thanks to everybody who’s been a patron so far. We’re really excited to see you in the Discord. And we’re excited to get to try that out.
Lauren: Our last exciting announcement is that our patrons also helped us meet a new funding goal, which means that we now have some additional ling-ministration support.
Gretchen: Our fantastic producer Claire, who’s been with us since the very beginning, is also going to be taking on some more of the administration for the podcast, so you’ll see her around a bit on social media and on Patreon. You can listen to a bonus episode with Claire if you’d like to get to know her better as well.
Lauren: Our current bonus is on the future of English and what English might look like in a couple of centuries from now, inspired by Gretchen’s New York Times article.
Gretchen: You can get access to this episode and 34 other bonus episodes – that’s twice as much Lingthusiasm that you can listen to – at patreon.com/lingthusiasm.
[Music]
Gretchen: Hello, Janelle. Welcome to Lingthusiasm!
Janelle: Hi, it’s great to be here.
Lauren: Janelle, we are so excited to have you on the show today to talk about how we can make machines do language.
Gretchen: I think one of the things that we have in common, definitely one of the reasons I enjoy following your blog and Twitter feed and so on, is that both linguists and your approach to A.I. like poking at systems and seeing where they break.
Janelle: Yeah, for sure.
Gretchen: In case some people aren’t already following you on all of the internets, I wanna give people an idea of some of the stuff that you have tried to make break.
Lauren: Janelle, in your work, for people who haven’t seen it, you take large data sets of particular sets of terms or particular language genres, I guess, and then you feed them into an artificial intelligence, and we’ll talk about what that is later, and then it spits out these delightfully whimsical outputs. It takes inspiration from the data set that it’s given. I have a sustained history of laughing inappropriately loudly on public transport while reading your blog because the results are always so entertaining. Gretchen, do you have a favourite to share with us so I can chortle inappropriately?
Gretchen: Lauren, I think we should start with ice cream because I know you have a deep and abiding love of ice cream, and Janelle has come up with ice cream flavours.
Lauren: Yes! Yes, yes, yes. Janelle, where did the ice cream data come from? Did you have a list of ice cream flavours that someone gave you or…?
Janelle: Yeah. In this case, it was a group of middle-schoolers, actually. There’s a school in Austin, Texas, called Kealing Middle School where there is a group of students in the coding classes who decided that – they saw my blog. They wanted to do it too, and they wanted to generate ice cream flavours.
Lauren: Aww.
Gretchen: That’s so great!
Janelle: The thing is, I had looked at that, and I’m like, “Oh, this would be cool.” Then, I looked online and said, “I need examples of existing ice cream flavours” because the A.I. has to have something to imitate. It doesn’t know about ice cream flavours unless I have some to tell it about. They’re scattered around. There wasn’t any big master list of them. So, I kinda said, “Oh, well. I guess that’s not gonna work.” Then, these middle-schoolers kicked my butt because they went and there were, I dunno, dozens of them – 50, 60 of them. Like, a lot of them. Each of them went and collected a few from this site or that site. Each site would only have a few at a time. They had to manually copy and paste into this data set. Through sheer numbers and having the time to do it, they put together this amazing data set of existing ice cream flavours. These middle-schoolers ended up getting about 1600 different ice cream flavours. Whereas, I only managed to get together 200. With the data set that much bigger, it made a huge difference. They started generating pretty amusing flavours.
Gretchen: I’ve got the blogpost up about the ice cream flavours from the middle school students, and some of them are really good. There are these whimsical flavours like “It’s Sunday” and “Cherry Poet” and “Brittle Cheesecake” and “Honey Vanilla Happy.” These seem like kind of reasonable ice cream flavours, right?
Lauren: I’d be open to ordering a “Vanilla Nettle.”
Gretchen: “Cherry Cherry Cherry.” If you like cherries, this is the flavour for you. There are also some weirder flavours from this data set like, “Chocolate Finger” and “Caramel Book” and –
Lauren: “Washing Chocolate.”
Gretchen: “Texas Charlie Covered Stunt.” Then, there’s this even weirder category, “Nuts with Mattery,” “Brown Crunch,” “Cookies and Green.”
Lauren: Aww, so close, and yet…
Gretchen: “Mango Cats.”
Lauren: They’re weird to us because of the semantics of them – just to be linguist-y and spoil the moment for a second – but they still are English words, or they look like something we’d recognise as English words, even though I don’t think “mattery” is a word that I know of. I think it’s worth saying artificial intelligence doesn’t know what ice cream is, right, it’s just using this list of flavours to figure out what kind of patterns could fit into that list.
Janelle: Exactly. It’s doing it at a very basic level. Like, what kinds of letters tend to come after other letters? What letters are we often finding in combination? Which letters are we never finding in combination? It’ll learn frequent words like “chocolate” or something. It’ll learn how to spell that after some false starts during training, but, yeah, without any concept of what chocolate is.
Gretchen: If it ends up with something like “Vervette’s Caramel Borfle,” it learned “caramel” but who “Vervette” and “borfle” are, I don’t know. That’s just randomly combining some letters in ways that are probable as English words.
Janelle: Yeah, it’s like a kid who learns how to write and immediately starts putting down letters on paper like, “Is this a word? Is this a word? How do you pronounce this?”
Lauren: Obviously, we train the neural nets that are children’s brains by talking to them a lot, giving them more input, taking them to school, and doing those kinds of things. But for a neural net-type artificial intelligence that we’re training by giving it lots of data, how does it know if it is generating something that is more or less English? Is there a little thing in the computer saying, “Good work, Computer”?
Janelle: What it’s trying to do, how it knows it’s making any progress at all, is its job is to try and predict the next letter or the next combination of letters. Then, it just checks its prediction against some example of real texts that it hasn’t seen before that it saved aside to check itself with and said, “Okay, did I guess close or am I still way off? Am I going to have to change my internal structure so that my guess would’ve been better and see if, going forward, that’s gonna be an improvement?” It’s like a trial and error, guess and check.
Gretchen: When you look at the different sorts of stages – because it goes through several different generations, right? It might start out with just “Here’s a bunch of Es because E is really common.” And then the check is like, “Yeah, but you could do better.”
Janelle: Yeah. It’s like guessing lots of Es is more correct than guessing lots of question marks or lots of Qs. Yeah, it has to say, “Oh, well, maybe I could work in an S from time to time. What do you know? That’s slightly more correct,” and proceeds from there.
Lauren: So, that’s how it learns “chocolate”? Because it might go in with CH and HC, and every time it goes, “Is HC right? Is HC right?” And the data set is like, “Naw, not really.” But when it’s got the CH for an ice cream list, it’s like, getting lots of positive feedback that that’s gonna appear in “chocolate” and “chip” and “cherry.”
Janelle: Yeah, exactly. The process, yeah, it is a lot different from the human child learning language because it’s taking place, really, in isolation with no other context. It’s as if you are setting somebody in a room with just a few dictionaries or a few encyclopaedias written in a language that they don’t understand. It’s even harder for the A.I. because it doesn’t have a concept of what language even is to start out with. It’s all just guessing what comes next in this sequence of arcane symbols.
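The guess-and-check, letter-by-letter process described above can be sketched as a toy character-level bigram model. This is a deliberately simplified stand-in for the neural nets discussed in the episode (it counts letter pairs instead of adjusting network weights), and the flavour list below is made up for illustration:

```python
import random
from collections import defaultdict

# Illustrative training data (not the middle-schoolers' actual data set).
flavours = ["chocolate chip", "cherry", "caramel", "mint chocolate",
            "vanilla", "cookies and cream", "chocolate cherry"]

# Count which letter tends to follow which. "^" marks the start of a
# name and "$" marks the end, so the model also learns name lengths.
counts = defaultdict(lambda: defaultdict(int))
for name in flavours:
    chars = ["^"] + list(name) + ["$"]
    for a, b in zip(chars, chars[1:]):
        counts[a][b] += 1  # how often does b follow a?

def generate(rng, max_len=30):
    """Sample a new 'flavour' one character at a time, each pick
    weighted by how often that letter followed the previous one."""
    out, cur = [], "^"
    while len(out) < max_len:
        options = counts[cur]
        nxt = rng.choices(list(options), weights=list(options.values()))[0]
        if nxt == "$":  # the model "predicted" the name is over
            break
        out.append(nxt)
        cur = nxt
    return "".join(out)

rng = random.Random(0)
print([generate(rng) for _ in range(3)])
```

A real network adjusts continuous weights rather than tallying counts, and it can track more than one previous character, but the core job is the same one Janelle describes: predict the next letter, check the guess, adjust, repeat.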
Gretchen: It doesn’t have a sense of what’s probable in the world either, right?
Janelle: Yeah.
Gretchen: Because you have some of these flavours like “Peanut Butter Slime,” which those are all English words, it’s just it would make a terrible ice cream because slime and peanut butter and ice cream are not things that go together.
Janelle: Yeah, exactly. Or, if I’m getting it to generate Halloween costumes, it’ll come up with “zombie school bus.” It’s like, “Okay, zombie school bus. There’s magic school bus. Why is that more likely than zombie school bus?” We know. It doesn’t.
Gretchen: It doesn’t have any of that real-world knowledge that you can do – or like “Mango Cats.” What does it mean for a cat to be mango? I don’t know.
Lauren: If an artificial intelligence gained sentience, it’s likely it actually wouldn’t be a very good linguistics student because it doesn’t really understand the concept of sounds. It doesn’t seem to have a lot of understanding of the structure of a sentence. We talk in one episode about syntax essentially being this structure that we can hang other bits of sentences off. It has much more of a flat, just looking at the patterns on the surface kind of approach to language.
Janelle: Yeah. Keep in mind, too, the amount of computing power it has to work with is so much less than what it takes for sentience or anything near human level. If you’re looking at raw computing power, the neural nets we have today are somewhere around the level of an earthworm.
Gretchen: Maybe an earthworm would like peanut butter slime-flavoured ice cream.
Janelle: I’ll give all my Peanut Butter Slime to the earthworm.
Lauren: That’s very generous of you.
Gretchen: This was one of the analogies that I liked in your book, which I enjoyed very much. You Look Like a Thing and I Love You – the title of this book was named after another neural net, right?
Janelle: Mm-hmm. This was a phrase generated from a neural net that was trying to do pick-up lines.
Gretchen: I guess that could be a pick-up line.
Lauren: We have things like ice cream names, and you’ve done death metal names, and Halloween costumes, and colours, and these are all three or four words at most. Pick-up lines is moving into more of the sentence/couple of sentences-type of thing. As the amount of words you’re trying to generate grows longer, how much more difficult does that make it for the artificial intelligence?
Janelle: It makes it a lot more difficult. When I was generating the ice cream flavours and things, I was deliberately going exclusively for these kinds of problems where it would just have to do a couple words at a time because when it tried to do longer sentences or phrases, it would not make sense. One of the things is that the A.I. I was working with at the time didn’t have very much memory at all. So, it would kind of lose track of things that happened a couple of words ago. It wasn’t really able to figure out then how to make a sentence work or make phrases work. It was a bit beyond it. The pick-up lines was definitely a case of, “This is too hard for the A.I.” It struggles, okay, not just the “How do you make a grammatical phrase?” but also “How do you do puns? How do you do innuendo?” These were all things that require a lot of background knowledge that this thing just did not have.
Gretchen: Another example that you use in the book is with recipes, right? It can figure out that you need to list some ingredients, you need to list some instructions, but then those instructions won’t contain the ingredients that were previously mentioned, necessarily, because it doesn’t remember that those are what it listed before.
Janelle: Yeah, we’ll see that. You’ll get something that on the surface at first glance looks like a recipe and then, when you actually read more closely, you’re like, “Wait a second. It has no idea what’s going on. It’s forgotten its ingredients. It’s telling me to chop the milk into cubes. Something’s going on here.”
Lauren: There’s something very confident about the way it fakes its ability.
Janelle: Yeah. Well, I mean, part of the reason it sounds so confident is that it’s copying what humans have written, and humans generally didn’t tend to write in the middle of a recipe, “Uhh, wait a second. I have no idea what’s going on.” It learns that that is not a phrase that appears in a recipe, so it’s not going to express any kind of confusion. It’s just going to plough ahead with its best guess at what a human would say.
Gretchen: This is where, I think, your famous giraffe question comes from.
Janelle: Ah, yes. I love this chatbot. It’s a chatbot called Visual Chatbot. It’s designed to answer questions about an image. You show it an image and then it comes up with a caption, and then you can have this back and forth conversation with the bot about what it sees in the image. You think that premise would be fairly straightforward, but there are weird quirks that arise just because this thing is trying to copy how humans ask and answer questions about images. The training data is important. In this case, the training data is a whole bunch of people hired through Amazon Mechanical Turk to take turns asking and answering questions about images. Then, the chatbot was trained on answers. So, given this kind of image, given what the question is, what would humans tend to answer in this situation? Some weird quirks emerge just from that premise. One of the things that they wanted to make sure to avoid was this thing called priming. People tend to ask questions to which the answer tends to be “yes.” They found in an early version of this chatbot that they could get 80% accuracy just by answering “yes” to every single yes-or-no question.
Gretchen: Uh-oh!
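The 80% figure Janelle mentions can be sanity-checked with a one-line baseline. The answer distribution below is invented to match the quoted number; the point is just that a bot which ignores the image entirely scores exactly the base rate of "yes":

```python
# Hypothetical yes/no answer distribution: 80% "yes", 20% "no".
answers = ["yes"] * 80 + ["no"] * 20

# A "bot" that always answers yes is right whenever the truth is yes.
always_yes_accuracy = sum(a == "yes" for a in answers) / len(answers)
print(always_yes_accuracy)  # 0.8
```

This is why accuracy alone can be misleading for skewed data: a model can look impressive while learning nothing about the images.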
Janelle: They ended up having to hide the image from the person who was asking questions, so that helped a little bit. Now, it’s about 50/50 if you ask a given question whether it’s going to answer yes or no to that. One of the things that they weren’t able to correct was this interesting thing with the giraffes. What happens is, if you ask the question, “How many giraffes do you see?” the chatbot will almost always return a non-zero answer. It can be doing great about an image and, “Oh, yeah. This is a person on a snowboard. There’s snow,” up until the point where you ask, “And how many giraffes are there?” It will answer, “Three” or “Two” or “Too many to count.”
Lauren: I think it’s just worth clarifying, just to really make this clear, this is not a data set in which giraffes appear in every image.
Janelle: True. Yes. I would love to see that data set – snowboarding with giraffes.
Lauren: “Yeah, there are three giraffes.”
Gretchen: Giraffe snowboarders – this is possible. Because I know this is an ongoing joke that you have, I tested with an image of the cover of my book which, as I think everyone knows, contains zero giraffes because it’s not about giraffes. Visual Chatbot told me that it is a sign that says, “Unknown, unknown, unknown,” on the side of it, which I guess is not the worst for a cover that has text in it. It just can’t read the text – sure. Then, I said, “How many giraffes?” and Visual Chatbot said, “Two.”
Janelle: It comes from this thing is copying how humans tend to answer this question. In its examples of humans hired through Amazon Mechanical Turk, the humans had not tended to ask the question, “How many giraffes are there?” when they didn’t know if there were any giraffes.
Gretchen: Right. You’d say something like, “Are there any giraffes?” The person says, “Yes,” and then you say, “How many giraffes?”
Janelle: Exactly. If you ask the chatbot, “Are there any giraffes?” it will answer, “No,” quite often. But then, if you follow up with the question, “And how many giraffes do you see?” it’ll say, “Five.”
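A minimal illustration of why the giraffe answers skew non-zero: if a bot simply parrots the most common human answer for each question, and humans only ever asked "how many giraffes?" after confirming giraffes were present, then "0" never appears as a possible answer. The question–answer pairs below are invented for the sketch, not real Visual Chatbot training data:

```python
from collections import Counter

# Hypothetical crowd-sourced answers. Note "0" never occurs for the
# counting question, because humans didn't ask it when there were none.
training_answers = {
    "how many giraffes?": ["2", "3", "2", "too many to count", "1"],
    "are there any giraffes?": ["no", "no", "yes", "no"],
}

def most_common_answer(question):
    """Answer with whatever humans said most often for this question,
    ignoring the image entirely -- the failure mode from the episode."""
    return Counter(training_answers[question]).most_common(1)[0][0]

print(most_common_answer("are there any giraffes?"))  # "no"
print(most_common_answer("how many giraffes?"))       # "2" -- never zero
```

The real model conditions on the image as well, but when the image gives it no useful signal, it falls back on exactly this kind of answer-frequency prior.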
Lauren: This approach reminds me of, as Gretchen said earlier, as soon as I get my hands on some kind of thing that’s doing this back and forth question asking or as soon as I’m let loose on a Google Translate, I think it’s a very linguist-brain thing to try and find these points at which the computer can’t handle language properly. It’s always great when you have an approach that understands how humans actually interact with this data that helps explain why you end up getting these really strange answers and why it’s good to have linguists help design artificial intelligence or chatbots and these things because the way humans choose to do language is very different to what we think of as the nice, straightforward application at the end.
Janelle: There’s so many start-ups that are trying to have some kind of bot that you can interact with in an open-ended manner. Then, they run into trouble. Facebook M was one of these services; it was discontinued last year. They thought it was going to be like a digital assistant – lives in your browser, you can ask it to do things like look up show times and stuff. But what people ended up asking for was the weirdest, most complicated things. One guy documented that he asked it if it could arrange for a parrot to visit his office. I mean, you’re not gonna prepare for that when you’re training one of these chatbots. It turned out the chatbot kept needing humans to step in and rescue it. They realised it was going to be too expensive because they were always gonna need these humans.
Lauren: This is a company that has no shortage of resources to throw at a problem like this.
Gretchen: I think if you tell people, “You can interact with this like a human,” they think they can do things like make a request for parrots because humans can understand a request for parrots. Even if I can’t personally deliver you a parrot, at least I understand this request. Whereas, a chatbot, if parrots aren’t in the training data, then parrots don’t exist.
Janelle: This is one of the things, too, that makes it hard to tell the difference between humans and computers when you’re chatting with them. If you’re in a customer service situation, they try to really narrow the context in which you can ask questions and not make it open-ended, especially if they’re going to invisibly use bots because they don’t want you asking for parrots out of the blue.
Gretchen: Right. It’s like when you call into a customer service line, it’s like, “Press 1 to talk to this,” “Press 2 to talk to that,” they really wanna keep your options constrained because then the computer can help you. It’s when it’s open-ended and people start behaving as if it can do anything that a human can do that you start running into problems.
Janelle: Yeah. What you’ll get is these companies that’ll build chatbots where it’ll start out as an open-ended conversation with something that is secretly a bot but hasn’t said it is. But then if it gets confused, it’ll invisibly hand control over to a human. That can be problematic because, if the customer by then is frustrated and thinks they’re dealing with a robot, the poor human employee may not have a very pleasant time with that conversation. What I would really love – what I would love linguists to design for me – is some kind of very polite, in-context way to ask a question or interact with one of these bots that would reveal whether it is a human or a computer, some kind of shibboleth – not asking about their favourite Star Wars character, because that’s impolite if you’re talking to a human employee – but some phrasing or something that’s tricky.
Gretchen: That’s an interesting question because I think, a lot of times, asking for something that’s a little bit non-cooperative, like “How many giraffes?” out of the blue, is maybe gonna deliver that answer. But it’s also gonna be confusing and annoying to a human.
Janelle: Exactly. My default has always been to assume it’s a human – because it’s better to be polite to a computer than rude to a human, sort of thing – but it would be lovely to be able to tell the difference. Companies should just tell us or have a “Talk to a human” button or something, but yeah.
Gretchen: You’re looking for an inverse Turing Test. A Turing Test is this classic test in computer conversation where, if a computer can fool a human into thinking that they’re talking to another human, then it’s passed the Turing Test. There are ways of passing the Turing Test if you constrain the context enough. Or if you tell people that they’re talking to a child or they’re talking to somebody who’s on some drugs or something like this – or a philosopher – then they’ll be more likely to believe it – these are the three kinds of people that a robot can be. But if you try to do something that’s very practical or that is grounded very much in reality, then people aren’t as willing to be generous with the computer’s misinterpretations. Janelle, your blog posts where you make the neural nets do funny things are really funny. And yet, I have a feeling that it’s not only that the neural nets are funny, it’s also that you’re really good at spotting the funny bits and bringing them out to a blog post for us.
Janelle: Yeah, there’s a lot of human storytelling work that goes on. How is this going to be interesting? Where is the funny thing that it’s doing? Sometimes, the ratio is like 100 to 1 of things that aren’t very funny that it generates and the one thing that I’m like, “Oh, yeah. I’m posting that.”
Lauren: Because, I guess, the thing about it being a computer process is that you could just generate infinite numbers of nonsensical ice cream names, but a lot of those are too nonsensical to even be particularly amusing.
Janelle: Yeah. It also has a tendency to – especially if we’re dealing with something short-ish and simple-ish like ice cream, then it’ll generate something and it says, “Mint Chocolate Chip,” and I’m like, “Oh. It just copied that.” It learned that one.
Lauren: Learnt that one too well.
Janelle: Yeah. Because as far as these A.I.s are concerned, exactly copying my examples is a perfect solution to the question I’m asking of it. If it can predict every single word, word for word, in the text file that I gave it, then that is a perfect score. Sometimes, it’s almost like a battle for me to try to get it to be just bad enough at the task.
Gretchen: Not so bad that it’s incoherent, but bad enough that humans can resolve what it’s supposed to mean and it’s still funny.
Lauren: One application of this name-generation process you’ve been doing was when you created a list of craft beer names and a company actually took one of those names to create a beer. Was that a process that you embarked on because you thought this was a good place to experiment with creative naming or how did that come about?
Janelle: This was one of the things where I happened to know somebody who was friends with the owner of the brewery, and I thought, “Well, this would be fun to actually get one of these beers to exist in real life,” because people keep saying that the names A.I.s are generating are pretty good. In the case of craft beer names, there’ve actually been companies who have taken each other to court over having beer names that were too close to one another. There’s this need to maybe show there’re ways to still come up with new beer names and we hadn’t exhausted all the possibilities yet.
Lauren: It’s really a collaboration between you and the A.I. where you are curating all of the names that it gives you in order to find the ones that have that perfect balance of following the rules you’ve given it but with a bit of a lateral thinking approach.
Janelle: Yeah. Just the right amount of lateral thinking as well, too. Sometimes, it’s way off the mark and comes up with, I don’t know, “Farm Fight,” as a name for beer. I’m like, “Well…”
Gretchen: Here are some of the beer names that were on the list like “Dang River” and “Binglezard Flack” and “Toe Deal” and “Devil’s Chard.”
Lauren: Some of them I can almost imagine being a craft beer. In the end, it was “The Fine Stranger” that was bottled and labelled.
Gretchen: That’s good. I think the examples are very funny, but there’s also an important part of making a lot of funny examples, right? It’s not just to entertain people, even though it is very entertaining.
Janelle: There’s people using these practically as their business in coming up with brand names. I did this one beer. There’s a whole art to naming brands, and it’s not just coming up with the names, but it’s also this whole framing of “Because of the etymology of this and that” or “Because the computer mashed this together with that.” There’s definitely a storytelling element to it as well. When I was going through this process with the beer, I was definitely getting the sense of, “Oh, yeah. I’ve got all these great names.” Any – not any one of these – but many of them would make great beer names, and the beer would sell well, and the brewery would be happy with it. But, yeah, how do I put it on the marquee, put it on the silver platter and make them actually say, “Yes. The authority has spoken. This is the name.”
Gretchen: Beyond brand names, there’re also lots of other practical applications people are using artificial intelligence for now, whether that’s machine translation or self-driving cars or all of these sorts of very practical aspects to things. It’s hard to see the inside of a self-driving car, and what that looks like, and how it’s making problems for things. Whereas, it’s easier to see what happens when you make a bunch of weird ice cream flavours.
Janelle: Exactly. That’s why I like doing these tests. Some of the biggest applications for A.I. are in doing financial predictions or looking for fraudulent logins and things like that where it, maybe, is comprehensible to somebody who’s in that field, but the way that they’re making mistakes in that field is not very obvious, not very interesting, if you’re not right there in that field working with these kinds of numbers all the time. If it’s making a mistake on an ice cream flavour, that is much faster to see, “Oh, yeah, it’s doing pattern matching. Oh, yeah, it doesn’t understand what it’s doing.” A lot of these same mistakes really do translate over to commercial applications.
Lauren: We’ve talked a little bit about how you have to curate the output because it will just keep spitting out silly ice cream names forever. We’ve talked a little bit about some of the problems with the types of data that are put into these processes in terms of, you know, if you don’t set it up very well and you have people answering questions about giraffes in a way that the A.I. is going to implement weirdly. There are bigger and more serious implications for thinking about the kind of data that we are using to create artificial intelligence processes not just with language but particularly for this topic looking at the kinds of data that people use to build artificial intelligence. You talk about this a bit in your book. Where do you see some of the biggest challenges in creating good A.I.?
Janelle: One of the things is, remember these A.I.s have about the raw computing power of an earthworm and they don’t have the context, then, to realise that there are some things that the humans do that they probably shouldn’t be copying. Completely unknowingly, they will copy things like racial/gender discrimination and they won’t know that that’s what they’re doing. They won’t know that that’s a bad thing. They just really can’t comprehend it.
Gretchen: It’s kind of like the chatbot that figures, “Oh, if I just answer yes to everything, I’ll get 80% accuracy,” even though it’s not actually useful, communicatively, to just answer yes to everything.
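The "answer yes to everything" shortcut is the classic majority-class baseline, and it can be sketched in a few lines (the 80/20 split here is a made-up example, not data from the episode):

```python
# Hypothetical labelled answers: 80% are "yes". A chatbot that always
# answers "yes" scores 80% accuracy while conveying zero information.

answers = ["yes"] * 80 + ["no"] * 20

def always_yes(question=None):
    """Degenerate 'model' that ignores its input entirely."""
    return "yes"

accuracy = sum(always_yes() == a for a in answers) / len(answers)
print(accuracy)  # 0.8
```

High accuracy on an imbalanced objective is exactly what was asked for, but not at all what was wanted.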
Janelle: It’s like this is exactly what you have asked for but is not necessarily what you want. When we give it a bunch of human decisions on resume sorting, for example, and we tell it, “Copy these human decisions,” then these algorithms can look and say, “Well, this is a very difficult problem, but looks like all of the applicants who’ve gone to this one college tend not to be hired” and “Oh, that college is a women’s college” and it is implementing the gender discrimination that it’s seeing in its training data because it saw this signal, didn’t know what it was, only knew that it was helping it copy the humans a little better.
Gretchen: Right. If the humans already have their own sets of bias, the A.I. can magnify that bias, like if you have a human that’s answering “yes” 80% of the time and now the A.I.’s answering “yes” 100% of the time, it doesn’t know what it’s doing.
Janelle: Exactly. Yeah. They are so good about being sneaky about – you may think that if you set up a resume sorting algorithm saying, “Well, we’re just not gonna tell it what gender any of these applicants are” and it is very good at figuring this out not just through colleges but through if somebody has their extra-curriculars listed and “women’s soccer team” is on there, it will glom onto that. Or even subtleties with word choice and phrasing, it will start using those kinds of trends and use them to copy the humans better.
Gretchen: I’m thinking about a different resume study which showed that people – they had the same sorts of resumes – people with a white-sounding name versus with a black-sounding name were more likely to get called back for interviews. You can imagine in the A.I. that it actually just learns how to predict based on someone’s name. Like, “Oh, we’ve hired a lot of people named ‘Mike’ at this company.” We all know these companies that have a whole bunch of people named “Mike” and “Adam” and stuff. “Maybe we should just only interview the people named ‘Mike.’”
Janelle: It will absolutely do that sort of thing. You see there’s a lot of companies out there that are offering resume screening but knowing what I know about how commonly these A.I.s can pick up on this bias I would not want one of these programs screening resumes for my company, for example. Or I would, at the very least, demand to see the evidence that this thing is not making biased decisions.
Gretchen: Right. That’s a sort of way of saying, “Okay, well, if this A.I. still thinks ‘Slime’ is a good flavour for ice cream, then really how much can we trust it to make a good decision about resumes?”
Janelle: I think that’s almost the counter-intuitive danger about A.I. in a lot of ways. It’s not that it’s too smart and it’s going to take over the world and it’s not gonna obey humans – no. The problem is that it’s not smart enough to realise what we’re actually trying to ask it to do.
Lauren: It keeps obeying us too well in ways that we don’t want it to.
Janelle: Yeah, if it can. When it comes to language generation, language processing, human language is really, really difficult. So, that particular domain, more than a lot of others, you’ll see these A.I.s that are really struggling to get a handle on what the humans are saying.
Lauren: It’s good news that linguists will have jobs for a little bit longer.
Janelle: Yeah.
Gretchen: One of the questions that really came up in my mind when we were thinking about interviewing you was, can the A.I. take my job as the co-host of the podcast, Lingthusiasm? If Lauren and I want to go live on a beach somewhere, can we replace, as co-hosts, a bot-generated Gretchen and Lauren to run this podcast? Lauren, what do you think?
Lauren: We actually put this to Janelle a few years ago, back when we started releasing transcripts for our early episodes. About three years ago, in 2016/2017, we didn’t have many episodes, so we didn’t have a lot of data to work with, but also it seems like in these last few years, the ability to process larger text has gotten better. Is that the case, Janelle?
Janelle: Yeah, that’s definitely the case. The kinds of things I was doing in 2016 – generating words, short phrases, paint colour names, ice cream flavour names, those sorts of things – I wouldn’t think of tackling entire sentences or, let alone, sentences that follow one another that make sense. But now, just pretty much in the last year, there’s been some really big A.I.s that have been trained on millions of pages from the internet. They are much better at generating text. They can generate grammatical sentences most of the time now. Most of the words that they use are real words. They still don’t understand what they’re saying. I think, yeah, it has gotten better.
Gretchen: You can potentially take something that’s been trained on, let’s say, most of the English pages of the internet and then fine-tune it on a smaller data set to try to push it more in the direction of just, for a random example, Lingthusiasm episodes.
Janelle: Yes. If, hypothetically, I had many episodes worth of Lingthusiasm transcripts, I might be able to make a robo-Gretchen and a robo-Lauren.
Lauren: Do you know what else has happened in the last couple of years, Gretchen?
Gretchen: I think we’ve produced a lot more episodes of Lingthusiasm.
Lauren: Between the main episodes and the bonus episodes, we have 70 transcripts, which is over 800 pages of data. Janelle, would that be enough to have a go at creating a robo-Gretchen and a robo-Lauren?
Janelle: There’s one way to find out.
Gretchen: Oh, boy! Let’s do some live neural netting on the podcast.
Janelle: All right! What could possibly go wrong?
Gretchen: Okay. Can you walk us through what are you doing right now on your computer?
Lauren: Janelle’s gonna share her computer with us so that we can see what’s happening, but we might get some screen grabs as we go through.
Gretchen: We may put some links into the show notes if there’s stuff that’s visual that’s hard to see as well.
Janelle: What we’re looking at right now, this is actually just a browser window in Chrome. What I’m looking at is a thing that is an interface to an A.I. that’s being hosted on Google’s computers right now. Google is graciously allowing people to use their powerful computers that are pretty specialised for these kinds of calculations. Even though I am working on a fairly ordinary laptop, I’m able to connect to some fairly serious firepower here.
Lauren: It’s really interesting to get to see under the hood of making an A.I. run. I think we’ll give people a bit of a taste of that here, but if you want more details and more of an explanation of how we made “Robot Lingthusiasm,” we’ll make that into a bonus episode.
Janelle: So, here we are. I’ve connected to this A.I. I’ve downloaded a copy of it. Now, I’m going to upload lingthusiasm.txt. I’m going to upload this file of 2.4MB of you two talking. Let’s – okay. Okay. We’ve got our first sample out here right now. “It is already conversations.”
Lauren: Except it’s just conversation by someone called “Gina.”
Gretchen: Maybe this is the hybrid between the two of us – our merged alter-ego? Shall we read a few of these lines, Lauren? I think we should each start with “Gina” as we’re reading the lines.
Lauren: Okay.
Gretchen: First line. This is the first of Gina’s lines. “Gina: Yeah, that’s why I’m gonna be honest with you.”
Lauren: “Gina: We’re not always going to be like, ‘Oh, we don’t know why we did that.’ That’s why.”
Gretchen: “Gina: I know. The people who’ve come to me to ask me are gonna be like, “Yeah, I didn’t know who was getting up and down the stairs and going to a doctor’s appointment.”
Lauren: Okay. So, not very Lingthusiasm in content there, but I like where Gina’s going.
Gretchen: Yeah. I like that it’s getting a dialogue thing. We’re pleased to announce that, in fact, your Lingthusiasm hosts will be replaced by robots but only for one episode and it will be bonus and it will be very, very funny. You can go to patreon.com/lingthusiasm to listen to the next bonus episode, which will be written by robots and performed by you and me, Lauren.
Lauren: To listen to that bonus episode, check out patreon.com/lingthusiasm. You can hear us reading some of our favourite examples. We will also give patrons access to some of those reams of examples so you can find ones that make you chortle as well. It’ll have some screenshots from the A.I.-building process for patrons as well. Thank you so much, Janelle, for taking us through the process of actually training a neural net artificial intelligence and showing us some of the pitfalls and some of the challenges and for talking to us today. If people want to read more about how artificial intelligence is making the world weirder and more wonderful, and some of the challenges and limitations, your book is You Look Like a Thing and I Love You. I loved reading it.
Gretchen: Yes, I can personally attest that I got my copy the night before my book came out when I was very distracted. It successfully distracted me for several hours while I was waiting for that countdown, midnight, to have that happen. It has lots of fun pictures of weird things that the A.I.s are doing as well. Thanks again for coming on the show.
Janelle: Oh, it was my pleasure. This was a lot of fun. I loved listening to your very strange generated conversations.
[Music]
Gretchen: For more Lingthusiasm and links to all the things mentioned in this episode, including extended versions of A.I.-generated Lingthusiasm transcripts, go to lingthusiasm.com. You can listen to us on Apple Podcasts, Google Podcasts, Spotify, SoundCloud, or wherever else you get your podcasts, and you can follow @Lingthusiasm on Twitter, Facebook, Instagram, and Tumblr. You can get IPA scarves, IPA ties, IPA socks, and other Lingthusiasm merch at lingthusiasm.com/merch. I can be found as @GretchenAMcC on Twitter, my blog is AllThingsLinguistic.com, and my book about internet language is called Because Internet.
Lauren: I tweet and blog as Superlinguo. Janelle Shane is @JanelleCShane on Twitter, her blog is aiweirdness.com, and her book is You Look Like a Thing and I Love You. To listen to bonus episodes and help keep the show ad-free, go to patreon.com/lingthusiasm or follow the links from our website. Recent bonus topics include future English, onomatopoeia, and linguistics fiction. If you can’t afford to pledge, that’s okay too. We really appreciate it if you could recommend Lingthusiasm to anyone who needs a little more linguistics in their life.
Gretchen: Lingthusiasm is created and produced by Gretchen McCulloch and Lauren Gawne. Our senior producer is Claire Gawne, our editorial producer is Sarah Dopierala, and our music is “Ancient City” by The Triangles.
Janelle: Stay lingthusiastic!
[Music]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
#language#linguistics#lingthusiasm#episode 40#transcripts#Janelle Shane#artificial intelligence#Neural net#roboLingthusiasm#AI