#linguistic corpus
Text
Personally I believe the world would benefit greatly from the existence of corpora from fanfiction.
Imagine a corpus that draws its data from every single ao3 fanfiction.
The number of occurrences for the most painfully specific words would be off the charts.
The collocations would be insane.
I am haunted by the possibility of a corpus in which the word "oh" occurs a million times and half of them are followed by another "oh".
Delightful
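For the curious, the "oh" followed by another "oh" count can be sketched in a few lines. This is only a rough illustration: the function name `followed_by_self`, the simple regex tokenizer, and the sample sentence are all made up here, and a real fanfiction corpus would need far more careful tokenization.

```python
import re

def followed_by_self(text, word):
    """Return (total occurrences of word, occurrences immediately followed by the same word)."""
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = sum(1 for t in tokens if t == word)
    # Walk adjacent token pairs and count the ones where the word repeats.
    repeated = sum(1 for a, b in zip(tokens, tokens[1:]) if a == word and b == word)
    return total, repeated

# Invented sample text, not real corpus data.
sample = "Oh. Oh, no. He looked up and said oh, oh, oh."
total, repeated = followed_by_self(sample, "oh")
print(f"'oh' occurs {total} times; {repeated} of them are followed by another 'oh'")
```

Punctuation is ignored here, so "Oh. Oh" counts as a repeat; whether that is what you want is a design choice.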
#corpus#corpus linguistics#linguistics#linguistic corpus#corpora#ao3#fanfiction#archive of our own#text corpus
3 notes
Text
Bonus 90: Don't you love to do a "do" episode?
We do love the word "do", and we hope you do too. Don't you want to know what the word "do" can do?
In this bonus episode, Gretchen and Lauren get enthusiastic about the word "do"! We talk about the various functions of "do" as illustrated by lyrics from ABBA and other pop songs, what makes the word "do" so unique in English compared to other languages, and the drama of how "do" caught on and then almost got driven out again. Listen to this episode about do, and get access to many more bonus episodes by supporting Lingthusiasm on Patreon.
#linguistics#lingthusiasm#language#podcast#podcasts#bonus#bonus episodes#bonuses#do#english grammar#auxiliary#grammaticalisation#syntax#morphosyntax#do support#corpus
68 notes
Text
What is corpus linguistics?
Linguists need large databases of language in use in order to study how language works.
A language database is called a corpus (Latin ‘body’, pl. corpora), and a corpus consists of a collection of texts (they don’t have to be actual text; a “text” is any cohesive discourse event).
The field of corpus linguistics focuses on how to build and use corpora and run statistical analyses on the resulting data.
Here’s a very accessible introduction to corpus linguistics:
58 notes
Text
hapax legomenon and automated email replies
While I’ve been on leave in 2023 I’ve had an automated email reply set up to direct people who email me to the most relevant alternative contact. Because I know that some people are stuck emailing me (sorry bosses, sorry mailing lists), I wanted to add a reminder about the magic of email filters, and couldn’t resist using it to share a little fact about corpus linguistics:
Sick of this automated reply?
If you’d like to not get automated replies from me, you can filter them by creating a rule. The best rule will probably be to filter anything that has the phrase "a hapax legomenon is a word or an expression that occurs only once within a corpus of texts" in the body of the email. That’s rare enough that it currently doesn’t turn up anywhere on the internet when I search it as a string with DuckDuckGo.
Of course, by the time I’m back from leave this post will be up and my autoreply won’t technically be correct anymore!
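As a small illustration of the definition quoted in the autoreply, here is a minimal sketch that lists the hapax legomena in a piece of text. The function name `hapax_legomena`, the naive tokenizer, and the sample sentence are assumptions made for this example only.

```python
from collections import Counter
import re

def hapax_legomena(text):
    """Return the sorted list of words that occur exactly once in the text."""
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return sorted(word for word, n in counts.items() if n == 1)

# Invented sample text, not real corpus data.
sample = "the cat saw the dog and the dog saw a bird"
print(hapax_legomena(sample))
```

What counts as "once" depends on the corpus and on how words are tokenized and normalized, so the result is always relative to those choices.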
29 notes
Text
On Poimandrēs
The classical Hermetic literature (the Corpus Hermeticum, the Asclepius, the Stobaean Hermetic Fragments, etc.) generally takes one of several formats: a monologue-type musing or speech (e.g. CH III or CH VII), a letter from a teacher to a student or between students sharing their wisdom (e.g. CH XIV or CH XVI), or a dialogue between a teacher and student (e.g. CH I or CH IV). By far the most…
#amenemhat iii#coptic#Corpus Hermeticum#egyptian#greek#howard jackson#linguistics#ma`at#nimaatre#peter kingsley#poemander#poimandres#pymander#zosimos of panopolis
11 notes
Text
Book of the Day - A Practical Handbook of Corpus Linguistics
Today’s Book of the Day is A Practical Handbook of Corpus Linguistics, edited by Magali Paquot and Stefan Th. Gries in 2021 and published by Springer. Magali Paquot is a permanent FNRS research associate at the Centre for English Corpus Linguistics, UCLouvain. She is co‐editor in chief of the International Journal of Learner Corpus Research, a founding member of the Learner Corpus Research…
#A Practical Handbook of Corpus Linguistics#AI#AI developer#Artificial Intelligence#Business#business coach#business consultant#Coaching#consulti#Corpus Linguistics#Discourse Analysis#linguistics#Magali Paquot#quantitative analysis#Raffaello Palandri#Stefan Th. Gries
69 notes
Text
i'm thinking of doing a corpus study with taylor's discography and this popped up in the keywords results and to be fair it reads like poetry
#this is for a paper i have to write for a corpus linguistics class#if i can find something interesting to say lol#rambles
3 notes
Text
Got accepted to my first academic conference!
#I will be presenting some original corpus linguistics research re: patterns of metaphor in 'Onegin'#(I started the research in question over winter break)#(now I need to finish it and turn it into a conference paper — I'm so excited!)
46 notes
Text
i will say yesterday i caught part of bad's stream where he was explaining how ai and llms work and it was a pretty good explanation aside from the dirt block placing
#bell.txt#not to act like an authority on how they work but i know more than most random people because of where i went to school#and also bc ive done linguistics so i understand better than most how a corpus works
3 notes
Text
so we've got terms like "actually autistic" to distance from vocal but unhelpful crowds like autism speaks, and i think that's great.
but i feel like i need one that's like "actually AI" so i can tag and follow stuff about the actual science and helpful application of machine learning stuff without the Build-Your-Own Waifu crowd invading my dash
4 notes
Text
Kat: Yeah. Computers are super, super good at counting. They’re super, super good at finding and identifying these strings. But they’re not very good at the analysis bit. We don’t want our computer to do the analysis for us. We want to be very aware of the kind of software and the kind of programming that goes into it that give us the results. Because we as humans are fantastically sensitive to language. That’s where the human element comes in. It’s why we don’t just leave it all to the computers to just do as they will with it.
Gretchen: It’s really a lot more of a partnership between the computer showing you some things and the human making meaning out of that.
Kat: Exactly. It’s meant to be a partnership where you play to each other’s strengths. You let the computer do the bit it’s good at, and then you do the bit you’re good at.
Excerpt from Lingthusiasm episode: Corpus linguistics and consent - Interview with Kat Gupta
Listen to the episode, read the full transcript, or check out more links about language and technology, and the history of language
#langauge#linguistics#lingthusiasm#episode 61#quotes#podcasts#corpus linguistics#kat gupta#language and technology
173 notes
Text
The many senses of run
How do you define the word run? You probably think of something like ‘fast pedestrian motion’, but what about the use of run in these examples?
There are three boats that run from the mainland to the Island
On my way to the elevator, I ran into Pete
the bench, which numerous times rebuked the Attorney General for letting his witnesses run on
The tears ran down my face
Colors on the towels…
#cognitive linguistics#corpus linguistics#historical linguistics#language change#prototype theory#prototypes#semantic maps#semantics
13 notes
Text
Reading the Hermetica: CH XVI
For this week’s Reading the Hermetica discussion, we’re continuing our reading and discussion of the Corpus Hermeticum (CH), specifically Book 16 (CH XVI). This text is entitled “Definitions of Asklēpios to King Ammōn (on God, matter, vice, fate, the Sun, intellectual essence, divine essence, mankind, the arrangement of the plenitude, the seven stars, and mankind according to the image)”. As…
#animism#astrology#Corpus Hermeticum#daimones#language#linguistics#nativism#reading the hermetica#solar#sun
2 notes
Text
something interesting i learned from a second-language acquisition lecture today, apparently if a lexical (vocab) item appears less than 20 times per million words it’s not worth teaching lol
#basically something that uncommon would just be learned on a case by case basis when you come across it#but it's not useful to spend lesson time on in the classroom#ofc if you don't have access to a corpus or something it's hard to guess how often something appears#but still! interesting stuff#linguistics#sasha.txt
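If you do have a corpus handy, the per-million arithmetic is quick to sketch. The helper names `per_million` and `worth_teaching` and the example counts are invented for illustration; the 20-per-million cutoff is simply the figure from the lecture as reported above, not an established standard.

```python
def per_million(count, corpus_size):
    """Normalize a raw count to occurrences per million words."""
    return count * 1_000_000 / corpus_size

def worth_teaching(count, corpus_size, threshold=20.0):
    """Apply the rule of thumb from the lecture: teach items at or above the threshold."""
    return per_million(count, corpus_size) >= threshold

# Hypothetical numbers: a word seen 150 times in a 10-million-word corpus.
rate = per_million(150, 10_000_000)
print(rate, worth_teaching(150, 10_000_000))
```

Normalizing to a per-million rate is what makes counts comparable across corpora of different sizes.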
5 notes
Text
Just casually watching an ex mormon video and she pulls out the CORPUS LINGUISTICS?? I see you girl
#i mean The Corpus Linguistics guy was a byu professor for like 20 years#so kinda funny to see one of his corpora brought up in a video criticising the mormon church
1 note