#macrocomparison
Text
From Greenberg (1957: 42–43):
All available grammatical information should be systematically examined, but vocabulary leads most swiftly to the correct hypotheses as a general rule. The effectiveness of mass comparison of basic vocabulary, for all its apparent simplicity, is illustrated in Table 3 by only a few forms from all the contemporary languages of Europe. Note that, even by the time the second word has been examined, the correct hypothesis emerges.
I'm not impressed by this assertion. Sure, comparing just '1' and '2' will easily suggest some language groups like Germanic, Baltic, Slavic and Finnic, and leave Basque isolated so far. But Hungarian egy (also mistranscribed here as /ed/ and not /ɛɟ/ or even /edʲ/) will not be readily identified as (and probably isn't) cognate to Finnic *üksi. One might rather end up thinking it is a reduced reflex of Slavic #jadin (actually *edinъ, but we will not exactly be doing any real reconstruction based on just this data). Két also shows no highly compelling similarity to *kaksi; it might show more if G. had bothered to mention that the inflected stem of the latter is kahte-. So just by comparing the citation forms of the first two numerals, the soundest conclusion might be that Hungarian is either a divergent Slavic language, or an isolate that has borrowed '1' from Slavic!
Do we really retrieve the correct sorting of the other IE languages from just these two words either, if we pay no heed to G. having already sorted the data according to what we should think? I see no consistent way to divide Celtic from Romance; rather, e.g. Irish and French seem to come close to each other. And nothing whatsoever even suggests Balto-Slavic. Instead, I think, we'd end up with a basic division Slavic / Rest by the different words for '1', which then suggests that the cluster *tv- in Germanic is archaic and that the groups with only *d- are a single unit; and Greek, of course, subgroups with Baltic by virtue of suffixal -as in '1' and evidence for a close front element in '2'. Maybe it would remain there even after considering the rest of this data, thanks to e.g. #afci 'ear', #(m-)aki 'eye', and the i-suffix in #dand-i 'tooth'!
This has probably been noted before, just adding to the record.
Text
My baldfaced optimism™ on the limits of the comparative method does not even end at the prospect of perhaps unearthing some partial fragments of things like Proto-Indo-Uralic. Eventually I could be in principle even sold on some "global etymologies" going back to Proto-World or Proto-Exo-African or therearound, if I trusted the basic data to not have been mishandled too badly.
Currently I very much do not grant this, though. The only global etymologies I know to have been proposed in any real detail are those of Bengtson & Ruhlen (1994). Their method in seeking these, alas, does not seem to have been to work step by step through established reconstructions or etymologies or even families, but rather the old prescientific approach of just cherry-picking words from individual languages, and no, we know that that has an astronomically high rate of false positives.
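To put a rough number on why cherry-picking from individual languages yields so many false positives, here is a minimal sketch. It assumes (and this is purely an illustrative assumption, not anything from B&R or the literature) that each language independently has some small probability `p` of containing a chance look-alike for a given target form and meaning. The function name and the example figures are hypothetical.

```python
def chance_match_probability(p: float, n_languages: int) -> float:
    """Probability that at least one of n_languages contains a chance
    look-alike, assuming each does so independently with probability p.
    (Independence and a uniform p are simplifying assumptions.)"""
    return 1.0 - (1.0 - p) ** n_languages


# Hypothetical figures: a 1% per-language chance of an accidental
# look-alike, searched across 1000 individual languages.
print(chance_match_probability(0.01, 1000))
```

Even with a per-language chance this low, the probability of finding at least one "match" somewhere is close to 1, so a single look-alike pulled from a single language carries next to no evidential weight.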
Simple example: in positing that some word like Kaingang /(in-)tso/ '(my) leg' comes from Proto-World #tsaku, the evidential value of this is vanishingly little before we know if there is any reason to reconstruct the word even for Proto-Jê. Per recent comparative work (e.g. Nikulin 2020) it appears that the native Proto-Jê word for 'foot/leg' is rather *par > South Jê *pãn > Kaingang /pẽn/. Something is also very wrong here anyway because Kaingang doesn't even have a phoneme /ts/: the word must have been either misattributed or mistranscribed. And in B&R we find no source for the data to even check whichever might be the case. Once again, garbage in, garbage out.
There is power in numbers, and many independent lines of evidence could in principle support a hypothesis even if many of these turned out to be junk ("recyclables in, perhaps some non-garbage out"?). But so far that also fares much worse than it theoretically could. Bengtson & Ruhlen's entry for #tsaku lists 103 reflexes, unpacking to maybe 200-300-ish individual languages, given that some reflexes are from decent-sized proto-languages like Proto-Bantu or Proto-Andic. A good Proto-World etymology, however, it seems to me, could not possibly be supported by a mere low-triple-digit number of reflexes. It would need to have thousands, with thorough attestation in all sorts of language families around the world. This is probably not doable just by eyeballing for vague phonetic and semantic similarities, though, and would need detailed lexicographical, reconstruction and etymological work to be done on every established family involved. Maybe not necessarily on the level of Indo-European or Uralic, but at least on the level of what we currently have on the likes of Turkic or Dravidian. With this in mind, I do not foresee actually plausible Proto-World etymologies coming along anytime soon.
As a corollary, a rigorous argument for anything to do with Proto-World would also require handling more data (and more source literature) than one or two or six bold maverick linguists could possibly hope to accomplish in a lifetime. Still only finitely much though. Probably getting here would "just" take an extensively collaborative framework for collecting the etymologies of every language family on Earth. This would pare the task much further down. Instead of comparing the vocabularies of all individual languages on Earth (Glottolog currently tracks 8533 varieties), a wannabe Proto-Worlder could just look at 100 or so reconstructed proto-languages, a much more tractable comparison task. But then focusing on intermediate speculative groupings like Proto-Amerind or Proto-Australian should be more tractable yet; still more so focusing on families just beyond the horizon like Austric or Penutian; and as said most established ground-level families so far still could use quite a lot of work. There is no royal road to etymology.
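For a sense of scale, here is a minimal sketch of how much the comparison task shrinks. It assumes the work scales with the number of pairwise comparisons between units, which is my simplification rather than a claim about how anyone actually proceeds; the function name is hypothetical, and the figures 8533 and 100 are the ones quoted above.

```python
from math import comb


def pairwise_comparisons(n_units: int) -> int:
    """Number of distinct pairs among n_units vocabularies."""
    return comb(n_units, 2)


varieties = 8533        # Glottolog figure quoted above
proto_languages = 100   # the "100 or so" reconstructed proto-languages

print(pairwise_comparisons(varieties))
print(pairwise_comparisons(proto_languages))  # 4950
```

Under that assumption the task drops from roughly 36 million pairs to 4950, a reduction by a factor of several thousand, and each smaller intermediate grouping cuts it down further still.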
Text
Welp, there goes Kortlandt into the wild speculation stage of retirement: he has now posted an essay according to which Indo-European would be a branch of Uralic. More specifically, Indo-European would be a sister branch of Finno-Ugric, with Samoyedic a more distant relative. This is based just on archeological considerations, while linguistically it is obvious nonsense if you know anything about the structure of Samoyedic (on the level of claiming that Latin is actually Germanic and is more closely related to the modern Germanic languages than Gothic is).
The list of Indo-Uralic grammatical formants is still interesting, but it also certainly hides a lot of detail by listing things like three “locatives” (nominal? postpositional? etc.), two “nominalizers” (action noun? actor noun? etc.) and four “participles” (active? passive? etc.).
Link
TIL about Allan Bomhard’s collection of comparative linguistics works on Internet Archive. Lots of interesting stuff here, maybe mostly as research history reading and/or macrocomparison fodder however.
A few picks though that I can readily vouch for as both entirely mainstream and not too badly outdated include Hajdú (1975), Finno-Ugrian Languages and Peoples (still a decent introduction to Uralic studies) or Subrahmanyam (1983), Dravidian Comparative Phonology (still a standard reference work).
Link
Newest suggestion from the "Transeurasian" rebranding of Altaicists. I've seen relatively little productive discussion of this so far. To start with it is surely probable that something spread with millet agriculture as detected in the archeological evidence, and this does seem to fit the rough degree of divergence between the suggested branches of Altaic. I'm less sure if this has anything at all to do with Mongolic or especially Turkic, whose homeland seems to be often located more by where people want it to be than where there is any actual evidence for it (and as always, agriculture can be transmitted also independently of language).
The clearest indicator of problems must be the absolutely waffling and slightly nonsensical take in their article's map 1b, with Proto-Turkic being placed as a big sausage from Beijing to north-central Kazakhstan:
In fact, even their own Supplementary Data 4 fails to support any of this furthest eastern range. Rather, it has been placed there by a slight terminological abuse, where the lineage of Turkic counts as "Proto-Turkic" as soon as it splits from Proto-Transeurasian:
Therefore, the Proto-Turkic homeland on the map in Fig. SI 4.10 can be considered as a dynamic entity, gradually expanding from Southeast to Northwest from the Middle Neolithic to the Early Iron Age.
But this makes the map and the argument rather misleading since it's not that range (3) is reconstructed by the evidence of the descendant ranges. Instead, apparently the descendant ranges have been tweaked to better fit the Proto-Transeurasian range they want to find. So should we trust any of them?
I am also not impressed at all by the alleged native layer of "agripastoral" vocabulary in Turkic. The section on agriculture in particular seems to be mostly made up of look-alikes with meanings different from the other words (which do correspond better), e.g. 'field' is compared with 'island', 'sour' is compared with Turkic–Mongolic 'to filter' (both are steps in cheesemaking, but I don't think that makes them cognate); or, in some cases, general action verbs with no especially agricultural meaning ('to soak', with four separate verbs with this meaning suggested altogether; 'to crush'). The only really solid-looking case is *tari- 'to cultivate', which is, however, absent from Japanese and Korean.
It would not be hard to find some suggestive Uralic correspondences to many of these either, off the top of my head e.g.
– *muda 'field': cf. Uralic *muďa 'earth'
– *saga- 'to ferment': cf. Ugric *čawa- id.
– *pisi- 'to sprinkle': cf. west Uralic *piśa-, perhaps 'to drizzle' (Finnic pisara 'drop', Mordvinic piźe- 'to rain'); or Ob-Ugric *pëśəɣ 'to drop'
A section on domesticated animals is also amazingly weak, finding only three items where none is unambiguously domesticated and all classic livestock are absent:
– Japonic–Korean *ina/u 'dog' ~ Tungusic 'wolf'
– Mongolic *toru 'young pig' ~ Turkic *tōru 'young ruminant' (~ North Tungusic torokī 'wild boar', probably ← Mongolic)
– Khitan–Tungusic *uli 'pig'
Text
“Kassite might be a rest [sic] of Altaic on its Near Eastern homeland”
well, there’s a bold new linguistic relationship hypothesis—
“Kassite da-kaš ‘star’ ~ Proto-Altaic *kiūčV > Proto-Turkic *Kučɨk ‘the constellation of Cancer’”
idontknowwhatiwasexpecting.jpg
(it’s not even interestingly bad, just junk all around…I rate this at ca. 80 mNylands)