#linguistic corpus
Text
Personally I believe the world would benefit greatly from the existence of corpora from fanfiction.
Imagine a corpus that draws its data from every single ao3 fanfiction.
The number of occurrences for the most painfully specific words would be off the charts.
The collocations would be insane.
I am haunted by the possibility of a corpus in which the word "oh" occurs a million times and half of them are followed by another "oh".
Delightful
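For a rough sense of how a count like that could be run, here is a minimal Python sketch. The sample text and the `tokenize` and `bigram_counts` helpers are invented for illustration; a real AO3 corpus would need a proper tokenizer and far more text. Note that this version counts pairs across sentence boundaries, which may or may not be what you want.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (a crude, illustrative tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def bigram_counts(tokens):
    """Count every pair of adjacent tokens."""
    return Counter(zip(tokens, tokens[1:]))


# Invented sample text, standing in for a real fanfiction corpus.
sample = "Oh. Oh, he thought. Oh no. Oh, oh, oh."
tokens = tokenize(sample)

word_counts = Counter(tokens)
pairs = bigram_counts(tokens)

oh_total = word_counts["oh"]
oh_followed_by_oh = pairs[("oh", "oh")]
print(f'"oh" occurs {oh_total} times; {oh_followed_by_oh} of them are followed by another "oh"')
```

Dividing `oh_followed_by_oh` by `oh_total` gives the share of "oh" tokens that are followed by another "oh".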
#corpus#corpus linguistics#linguistics#linguistic corpus#corpora#ao3#fanfiction#archive of our own#text corpus
3 notes
Text
Bonus 90: Don't you love to do a "do" episode?
We do love the word "do", and we hope you do too. Don't you want to know what the word "do" can do?
In this bonus episode, Gretchen and Lauren get enthusiastic about the word "do"! We talk about the various functions of "do" as illustrated by lyrics from ABBA and other pop songs, what makes the word "do" so unique in English compared to other languages, and the drama of how "do" caught on and then almost got driven out again. Listen to this episode about do, and get access to many more bonus episodes by supporting Lingthusiasm on Patreon.
#linguistics#lingthusiasm#language#podcast#podcasts#bonus#bonus episodes#bonuses#do#english grammar#auxiliary#grammaticalisation#syntax#morphosyntax#do support#corpus
68 notes
Text
What is corpus linguistics?
Linguists need large databases of language in use in order to study how language works.
A language database is called a corpus (Latin ‘body’, pl. corpora), and a corpus consists of a collection of texts (they don’t have to be actual text; a “text” is any cohesive discourse event).
The field of corpus linguistics focuses on how to build and use corpora and run statistical analyses on the resulting data.
Here’s a very accessible introduction to corpus linguistics:
58 notes
Text
hapax legomenon and automated email replies
While I’ve been on leave in 2023 I’ve had an automated email reply set up to direct people who email me to the most relevant alternative contact. Because I know that some people are stuck emailing me (sorry bosses, sorry mailing lists), I wanted to add a reminder about the magic of email filters, and couldn’t resist using it to share a little fact about corpus linguistics:
Sick of this automated reply?
If you’d like to not get automated replies from me, you can filter them by creating a rule. The best rule will probably be to filter anything that has the phrase "a hapax legomenon is a word or an expression that occurs only once within a corpus of texts" in the body of the email. That’s rare enough that it currently doesn’t turn up anywhere on the internet when I search it as a string with DuckDuckGo.
Of course, by the time I’m back from leave this post will be up and my autoreply won’t technically be correct anymore!
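Going by the definition quoted in the autoreply, finding hapax legomena is mostly a counting job. A minimal Python sketch, assuming a very simple tokenizer; the `hapax_legomena` function name and the mini-corpus are made up for illustration:

```python
import re
from collections import Counter


def hapax_legomena(text):
    """Return the words that occur exactly once in the text, in order of first appearance."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # dict.fromkeys keeps one copy of each token in first-seen order.
    return [word for word in dict.fromkeys(tokens) if counts[word] == 1]


# Invented mini-corpus for illustration only.
corpus = "the cat sat on the mat and the dog sat too"
print(hapax_legomena(corpus))
```

What counts as a hapax depends on the corpus: a word that occurs once in this sentence may be common in a larger collection.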
29 notes
Text
Book of the Day - A Practical Handbook of Corpus Linguistics
Today’s Book of the Day is A Practical Handbook of Corpus Linguistics, edited by Magali Paquot and Stefan Th. Gries in 2021 and published by Springer. Magali Paquot is a permanent FNRS research associate at the Centre for English Corpus Linguistics, UCLouvain. She is co‐editor in chief of the International Journal of Learner Corpus Research, a founding member of the Learner Corpus Research…
#A Practical Handbook of Corpus Linguistics#AI#AI developer#Artificial Intelligence#Business#business coach#business consultant#Coaching#consulti#Corpus Linguistics#Discourse Analysis#linguistics#Magali Paquot#quantitative analysis#Raffaello Palandri#Stefan Th. Gries
70 notes
Text
i'm thinking of doing a corpus study with taylor's discography and this popped up in the keywords results and to be fair it reads like poetry
#this is for a paper i have to write for a corpus linguistics class#if i can find something interesting to say lol#rambles
3 notes
Text
Got accepted to my first academic conference!
#I will be presenting some original corpus linguistics research re: patterns of metaphor in 'Onegin'#(I started the research in question over winter break)#(now I need to finish it and turn it into a conference paper — I'm so excited!)
46 notes
Text
so we've got terms like "actually autistic" to distance from vocal but unhelpful crowds like autism speaks, and i think that's great.
but i feel like i need one that's like "actually AI" so i can tag and follow stuff about the actual science and helpful application of machine learning stuff without the Build-Your-Own Waifu crowd invading my dash
4 notes
Text
Not me getting a tumblr api key and melting my brain with python bc I want to know more about the tumblr-specific flavour of irony and self-deprecation inherent to 'normal to want and possible to achieve'.
no, of course I'm not doing it so I can spend time on tumblr and call it research for uni, thus soothing my catholic-wired guilt-sodden brain, how dare you
1 note
Text
Kat: Yeah. Computers are super, super good at counting. They’re super, super good at finding and identifying these strings. But they’re not very good at the analysis bit. We don’t want our computer to do the analysis for us. We want to be very aware of the kind of software and the kind of programming that goes into it that give us the results. Because we as humans are fantastically sensitive to language. That’s where the human element comes in. It’s why we don’t just leave it all to the computers to just do as they will with it.
Gretchen: It’s really a lot more of a partnership between the computer showing you some things and the human making meaning out of that.
Kat: Exactly. It’s meant to be a partnership where you play to each other’s strengths. You let the computer do the bit it’s good at, and then you do the bit you’re good at.
Excerpt from Lingthusiasm episode: Corpus linguistics and consent - Interview with Kat Gupta
Listen to the episode, read the full transcript, or check out more links about language and technology, and the history of language
#langauge#linguistics#lingthusiasm#episode 61#quotes#podcasts#corpus linguistics#kat gupta#language and technology
173 notes
Text
The many senses of run
How do you define the word run? You probably think of something like ‘fast pedestrian motion’, but what about the use of run in these examples?
There are three boats that run from the mainland to the Island
On my way to the elevator, I ran into Pete
the bench, which numerous times rebuked the Attorney General for letting his witnesses run on
The tears ran down my face
Colors on the towels…
#cognitive linguistics#corpus linguistics#historical linguistics#language change#prototype theory#prototypes#semantic maps#semantics
13 notes
Text
like severus snape, james potter and harry potter once said... leviCORPUS!
hi, tumblr's lovely people,
I know, I know... you've missed me. I'm well aware that in my farewell post, I said this break won't take too long but oh, boy, wasn't I lying? but no worries, as Georgina Sparks says in Gossip Girl you can tell Jesus that the freak is back. aww, I can almost hear your virtual screams, didn't notice you love me that much. I love you too, fellow teacher candidates. anyway, a bit of seriousness now, please. we're still here for education and education only. it's not like you're following me to bore you to death about Taylor Swift, Gossip Girl, Harry Potter, etc.
speaking of Harry Potter, you must have been wondering what this post is all about and how it connects to a spell from a fictional world. well, actually it doesn't. I thought it was funny when I wrote it. however, even though this post has nothing to do with our Hogwarts houses, it does have a connection to the title. as you can see from the word I capitalized, we are gathered here to talk about our Corpus assignment. I won't trouble you by explaining what Corpus is, probably you already know. you're as much of a genius as I am, duh? although, I want to tell you all about our task.
for this assignment, we had to prepare a worksheet to teach and reinforce the topic we chose from the units in the book sent to us using Corpus tools. I was paired up with my classmate Ayşe (I'll tag her blog here, so you can check it out) and first, we started by choosing our topic. after deciding that the most effective unit to prepare the most effective work is Unit 3, where we can talk about our hobbies and free time activities and at the same time focus on likes and dislikes, we divided our process into four parts, taking into account the sample work our teacher posted on Google Classroom. afterward, we started to brainstorm and develop our activities step by step, considering whether they met the requirements in the checklist sent to us beforehand.
to tell you the truth, I (as always) loved the outcome and was very proud of us. there were aspects of the assignment that were really challenging and difficult to understand, but we overcame them by keeping in mind the elements of the task that gave us the capacity for creativity and a deeper understanding of the needs of a new generation of learners. it was also a new experience for me to prepare activities and to combine them with an educational tool like Corpus, which I didn't know before, and I'm so glad that our assignment gave us this space where we can be challenged sweetly, but in return, we can acquire a different teaching skill. besides, how can you not love an assignment that gives you a reason to combine its name with a Harry Potter universe spell?
well, I'll go now before I talk more and turn this post into a 7-book series. if you're curious about our work, you can check the link. also, that sweet little pink and lilac-designed worksheet has my blood, sweat, and tears on it. if you try to steal our ideas, I'll know and I'll make sure the teachers' guardian angels haunt you for the rest of your life. so, beware.
thank you so much for tolerating me and reading until the end. you know I probably wouldn't. sending you virtual hugs and so much love.
until next time,
with love... and obviously education, Doğa.
oh, and also... LEVICORPUS! 🪄
#technology#corpus linguistics#21st century learning#english teacher#education#english learning#learning english
0 notes
Text
Image Descriptions in Alt Text
For The Class That I Made This Blog For, we had to present on a Hot Topic in TESOL right now. Our professor asked for a data-driven approach, and encouraged us to use AI LLMs like ChatGPT or Claude to analyse our data. I . . . did half of that. I manually found and created corpora and then used AntConc, the classic Corpus Linguistics program, to analyze the data by hand. It took too much time. I was consumed by the work. Luckily, my professor allowed me to use my explanation of how I found a Hot Topic to be my presentation for my hot topic. Very chill.
Links:
AntConc is software that lets you study large text corpora. It's awesome. Love it so much.
TESOL Quarterly is one of the most respected journals in the TESOL profession.
IATEFL is the European version of TESOL International, and is also highly respected in the field.
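For anyone curious what a concordancer like AntConc shows you, here is a rough Python sketch of a keyword-in-context (KWIC) view. It is not AntConc's actual implementation; the `kwic` function, its `width` parameter, and the sample sentences are assumptions made for illustration, and punctuation is simply dropped.

```python
import re


def kwic(text, keyword, width=3):
    """Return keyword-in-context lines: up to `width` tokens of context on each side of every hit."""
    tokens = re.findall(r"[\w']+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


# Illustrative sample text only.
sample = "The tears ran down my face. On my way to the elevator, I ran into Pete."
for line in kwic(sample, "ran"):
    print(line)
```

The computer finds and lines up the hits; reading the contexts and deciding what they mean is still the human's job.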
1 note
Text
i dont think its morally reprehensible to use "ai" for school assignments so long as u use it smartly. the thing about "ai" (language algorithms) is that they don't know shit. so you gotta be the one who knows shit. u can tell it, write x amount of words introducing y topic and mention points z and w that u the human have researched. but ya gotta do the research. u gotta double check what it writes. u gotta actually have your sources. if you ask it to give u sources, it combines words into things that look like sources. odds are there's nothing real about them! you're gonna screw yourself over something fierce if you rely on a language algorithm to know shit. it doesn't understand any of the data it has. it doesn't know what's true and what's not! it just combines data in reasonably believable ways. it's a tool, nothing more, nothing less.
#it was fucking funny in the corpus linguistics course i took how people were talking about applying ai in corpus linguistics#and the teacher had to tell em that ai is literally an application OF corpus linguistics
0 notes